Framework Layout

Something that often seems to get overlooked when designing test automation frameworks is how easy it is for other people to pick up the framework and write tests with it. The harder something is to work with, the slower the automation process will be.

Making a framework where it is easy to discover everything available to use has become very important to me. I now field fewer questions like “hey, is there a class that represents the change password page?”, and have more time to talk about test design philosophy and good practices.

Making a framework easy to discover is a relatively straightforward process. The current way I do this is to create classes that represent the different concepts in the framework. While this can vary from framework to framework, the core concepts that are included in everything I create are:

  • Framework
  • Pages
  • Workflows

Framework
The framework class is our jumping-in point.  Almost every action that a test writer can do will start with calling the framework.  Everything else is contained within this class.  When working with a web portal, I usually instantiate this as the variable “portal”.  You could call it whatever you like though; there are plenty of names that would make sense.  The specific product name would also be a great choice.
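As a minimal sketch (class names like Portal, LoginPage and ChangePasswordPage are invented examples; a real framework would also hand a WebDriver instance to each page object):

```java
// Hypothetical page classes; real ones would contain page interactions.
class LoginPage { /* login page interactions live here */ }
class ChangePasswordPage { /* change password interactions live here */ }

// The jumping-in point: every page object hangs off this class, so a test
// writer can discover everything available through IDE auto-completion.
class Portal {
    public final LoginPage loginPage = new LoginPage();
    public final ChangePasswordPage changePasswordPage = new ChangePasswordPage();
}
```

A test writer only ever has to remember one variable name, and the IDE does the rest of the discovery for them.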

Pages
All of our page objects are instantiated and accessible through this class.  It keeps all the pages in one neat place, and allows the test writer to locate all of the pages that are available.


This is such a simple quality of life improvement that will make it so much easier for your test writers to find things on their own.  Do not underestimate how important being able to find things quickly is!

Workflows
Workflows are processes that are larger than the scope of a single page.  They allow us to combine multiple pages and areas of the framework into a single method call.  This reduces the amount of repeated code that exists for common processes.  A good example of this is a Workflow method I typically use to get a brand-new user registered and through to their dashboard.


You’re probably familiar with this kind of method.  It’s a “just get through all this stuff please” method.  What it contains depends on the exact structure of the web portal I’m working on, but it typically has these steps in it:

  • Register a user
  • Accept the terms and conditions (yes, this is a separate step where I work)
  • Skip through the welcome/tutorial pages
  • Check that we end up on the dashboard (the logged in user’s homepage)

All the Workflow does is utilise the page objects within the framework to carry out a series of actions that a single page object would not (and should not!) be capable of doing on its own.  I’ve called these “helpers” in the past, but I feel that Workflows is a much more accurate description of what they are.
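Stripped to its bones, that kind of Workflow can be sketched like this (every class and method name here is invented for illustration; real page objects would drive a browser rather than these stubs):

```java
// Stub page objects standing in for real browser-driving pages.
class RegistrationPage { void registerUser(String email) { /* fill and submit the form */ } }
class TermsPage { void acceptTermsAndConditions() { /* tick the box, confirm */ } }
class TutorialPages { void skipAll() { /* click through the welcome pages */ } }
class DashboardPage { boolean isDisplayed() { return true; /* check for a dashboard element */ } }

class Workflows {
    private final RegistrationPage registrationPage = new RegistrationPage();
    private final TermsPage termsPage = new TermsPage();
    private final TutorialPages tutorialPages = new TutorialPages();
    private final DashboardPage dashboardPage = new DashboardPage();

    // One call takes a brand-new user all the way to the dashboard,
    // in the correct order, every time.
    boolean registerUserAndReachDashboard(String email) {
        registrationPage.registerUser(email);
        termsPage.acceptTermsAndConditions();
        tutorialPages.skipAll();
        return dashboardPage.isDisplayed();
    }
}
```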

Notice that we’re just combining actions from different pages here.  There’s really no new interactions.

For the above example, we’re saving a test writer from having to call 4 separate page interactions here.  We’re also making it so that the tester doesn’t have to remember the specific order that these actions are taken in.  This saves both them and me a lot of time when it comes to debugging tests and trying to figure out where things are going wrong.  Are they using the workflow?  If they are, it’s probably a genuine issue rather than them not entirely understanding how those individual actions are meant to be combined.

Testing Tools Are the Most Valuable Things I Make

Tools are great.  Tools can make daily work life so much less tedious for you and your colleagues.

Where I work we have various long, complex processes that are incredibly easy to get wrong, leaving you to spend half an hour trying to figure out what the problem was.  Was it step one?  Step five?  Who knows – start again!

The biggest culprit for this kind of frustration is our “order” making system.  It’s big, complex, and requires filling in around 25 fields in a CSV each time you want to make a single order.  Several of these fields – when grouped together – must be unique across every other order made on the system.  Several fields only accept very specific values.  Making an order is like navigating a minefield, especially for new members of QA, who are likely to encounter this system days after starting.

If you’ve read the title of this blog post, you know where this is going.  So how does a QA/test tool help out here?

The Manual Process

This order system we have has several steps to it that require very specific actions.

  1. Create a CSV with very finicky requirements for several field values
  2. Zip it up in a file that requires a very specific name (that changes on a daily basis)
  3. Make sure you password-protect that zip file with a very specific password
  4. Send that file either via email or SFTP to one of several different locations depending on environment
  5. Wait a while for the file to be consumed and your order to be created

Each order could take upwards of 20 minutes of faff to complete when done entirely manually.

In our tool we’ve made it possible to complete the entire process from start to finish in under 10 seconds.

Default Everything

For CSV creation, the majority of fields are presented to the tester as free-text boxes with default values.  Where possible these values are randomised, but always randomised to valid and realistic choices – e.g. we pick from a pool of first names, surnames, and so on.

For the majority of the times we have to run through this process, we end up only filling in 2 of the 25 fields by hand.  The default values get us most of the way.  That’s a huge time saving.
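The defaulting logic can be as simple as picking from hard-coded pools of valid values (a sketch; the name pools here are invented examples):

```java
import java.util.List;
import java.util.Random;

// Randomised-but-valid defaults: every generated value is realistic,
// so most fields never need editing by hand.
class DefaultValues {
    private static final List<String> FIRST_NAMES = List.of("Alice", "Bob", "Priya", "Marcus");
    private static final List<String> SURNAMES = List.of("Smith", "Jones", "Patel", "Nowak");
    private static final Random RANDOM = new Random();

    private static String randomFrom(List<String> pool) {
        return pool.get(RANDOM.nextInt(pool.size()));
    }

    // Each CSV field in the tool starts pre-filled with one of these.
    static String defaultFirstName() { return randomFrom(FIRST_NAMES); }
    static String defaultSurname()   { return randomFrom(SURNAMES); }
}
```

Randomising within a valid pool also helps with the uniqueness constraints: two orders generated seconds apart are unlikely to collide.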

Lock It Down (But Not Too Much)

For CSV fields that only allow specific values, we lock them down using select boxes, just like on a website.  The select list is populated with all the valid values, as well as some invalid ones (that are clearly marked as such).  This allows the tester to see all the available values, whilst also being able to send an invalid value for more extensive testing of negative paths as required.

To make the tool as versatile as possible, I would recommend avoiding making it “too helpful”.  What I mean by this is locking field values down too heavily.  Always allow bad values if the tester really wants to put them in.  We want to guide them to the happy path, but allow them to veer off when they need to.

Abstract Out Tedium

The tool completely abstracts the next few parts of the process.  Zipping up and sending the file is done with the click of a button.  There’s no value in our testers going through this process by hand; it’s tedious busywork.  As long as testers understand enough about what’s going on in the background, they really don’t need to go through it each time.

Environment choice is handled in much the same way.  Testers get to pick an easily recognised name from a list rather than having to remember various server URLs and the folder locations on them.  The tool just handles it.

Good Feedback

After the order is sent to the server, the tool goes into a “checking if stuff worked” mode.  It loops until it finds a specific entry in the database that tells us the request succeeded or failed.
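The loop itself can be sketched like this (the database lookup is stubbed as a supplier, and all names are illustrative; the real tool queries the orders table):

```java
import java.util.function.Supplier;

// Poll until a status row appears for the order, or give up after a timeout.
class OrderStatusPoller {
    static String waitForResult(Supplier<String> statusLookup,
                                long timeoutMillis, long intervalMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            String status = statusLookup.get();   // null while still in progress
            if (status != null) {
                return status;                    // e.g. "SUCCESS" or "FAILED"
            }
            try {
                Thread.sleep(intervalMillis);     // still waiting: bar stays blue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "INTERRUPTED";
            }
        }
        return "TIMED_OUT";                       // give up and flag it red
    }
}
```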

The first piece of feedback we use here is a progress bar.  This bar switches colours depending on what’s happening.  Blue means the order request is in progress.  Green means success.  Red means the process errored/failed in some way.  It’s basic stuff, but its importance shouldn’t be overlooked.

For any failures, we present them to the tester in an output log within the GUI.  This output includes any information that was stored about the specific problem straight from the process that consumes the orders.  This saves a lot of time explaining to our testers where the errors are stored in the database, as well as having to track down the specific row each time a problem occurs.  Easy access to available information is key here.




How Would You Approach Exploratory API Testing?

Day 2 of MoT’s 30 Days of Testing asks the question “How would you approach exploratory API Testing?”.  From the perspective of someone who has never done testing that I feel falls into the realm of ‘real’ exploratory testing, this is a tough question to answer.  I’m going to approach this as “You’ve been told to test an API you have no documentation for.  You work at the company creating the API.”

There are a couple of things I should tell you before getting into the exploratory testing:

  1.  The company I work for is a software developer with many different departments.  We’re all friendly, and knowing more about something is as easy as going to someone’s desk and asking if they’ve got time to help you.
  2. We’re big into writing specs for things, and having test scripts based on those specs.

The second point typically means that exploratory testing is moved to the sideline.  We will do some loose exploratory testing as we follow the script – based on any quirks we notice in the implementation – but 95% of the time we’re executing test cases that we knew existed far in advance of any code being delivered.

With that out of the way, how would I go about exploratory testing an API with no information about it?

Find Out More

APIs are tricky to test exploratively due to their nature.  There’s no GUI to lead you down your exploratory path.  So how can we make it easier on ourselves, as someone in direct contact with the people creating the API?  Talk to them.  Even without a solid spec to work from, business analysts should have an idea of the problem the client wants to solve with the API.  The first thing I do is talk to the BAs and make some notes during that conversation, which I translate into a mind map later on.  This gets me one part of the puzzle: what we intended to make.

After speaking with the BAs, the next port of call is the developers who worked on implementing the API.  Armed with the knowledge of what needed to be made, I ask the developers how they implemented solutions to each of the problems they were presented with.  This will typically result in a list of methods/paths that allow access to those solutions via the API.  At this point, I ask them to provide sample requests, as well as all the parameters that it’s possible to set in each request.  This is important because often there are optional parameters that developers may not think to provide in an example request.  Optional parameters can potentially change how a request is processed, and lead to failures or exceptions that you would not be able to find without knowing the parameter existed in the first place.

Document Stuff

Having had those discussions, I go back to my desk and think about what I’ve learned.  My basic pre-testing discovery period is now complete.  I spend some time attempting to match up what was implemented with what was intended.  If I see any areas where I think the two have significantly diverged, I mark those in some way for testing as soon as I’m confident the very basics of the API are up and running.  This allows for any potential implementation problems to be found and talked about quickly.

I also make sure that what I’ve learned is available to my colleagues.  We keep test scripts in Excel, and maintain an internal Wiki where we write articles about what projects are, where their documentation is, and any information likely to be helpful for someone picking up testing the project for the first time.


My tool of choice for any kind of API testing is RestAssured.  I’m primarily an automated tester, so I’m more comfortable in an IDE than I am in Postman or SoapUI.  Any tests I made note of earlier that seem like they would be good long-term tests are written using RestAssured, where they can be re-run any time we need to do a release of the API.


The easiest way to deal with a situation where you need to test an API where there’s no specification is to ask people about it.  This may all be cheating when it comes to exploratory testing, I’m not sure.  I can only come at this from a perspective I’m familiar with based on the culture where I work.  If we don’t know, we ask.

I’m looking forward to the other posts about this subject and hopefully learning about what people do when there’s nobody to ask.

What is API Testing?

To preface this, I’ve never really read a great deal about API testing.  My opinion here is based on a little bit of reading, and a lot of doing.

If you Google the definition of an API, you get the following:

a set of functions and procedures that allow the creation of applications which access the features or data of an operating system, application, or other service.

That is a little broad.  It encompasses so many different things that it would be tricky to address them in a post that doesn’t make everyone fall asleep.

So let’s narrow the scope of our definition here to what I think the MoT “30 Days of API Testing” event is focusing on: Web API Testing.

What is a Web API?

A web API is essentially a service provided by a company that wishes to give external third parties access to certain functions or data over the web.

Let me use a couple of real-world examples here.  Where I work, we provide a client-facing API that allows them to make some basic calls over the Internet.  Those calls include:

  • Making a new user (alters the state on our system)
  • Requesting score data for a user (reads existing data on our system)

So a client typically makes a new user over the API.  That user then takes various actions that influence their score (outside of the API, via an app).  The client can then request to see the scores for the same user.
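Purely as an invented illustration (none of these paths or fields are the real API), that exchange might look like:

```
POST /api/users
  → 201 Created   {"userId": 123}

GET /api/users/123/score
  → 200 OK        {"userId": 123, "score": 42}
```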

Let’s go into a little more detail about what’s happening in each of those examples.  Web APIs are all about requests, responses, and changes.  I won’t go into too much detail here, as there are much better resources available online for these things.  As a quick overview:

Requests
Like in life, a request to an API is an “I would like you to do this for me” message.  In plain English this could be as simple as “Hey API, how long have you been awake?”.

Responses
These are what an API sends back to you after a request.  There’s a wide range of codes that can be returned, and the content will vary based on which code it is.  Continuing our example above, the response here would be 200 with a body of “I’ve been awake for 7 days”.

State Changes
Sometimes the requests we make change the “state” on the API’s server/database.  Requests that create something, delete something, or update something are examples of this.  Requests that change the system often influence the responses from other requests, so they need to be tested in conjunction with those other calls.

What is Web API Testing?

With the above information in mind, let’s think about what testing for something like that would be like.

Testing an API includes doing the following things:

  • Making calls directly to it, in the same way that the client will be making those calls.
  • Checking that the response to a call is what is defined in the spec, including
    • The status code
    • The response body (if any)
    • The data types are correct (e.g. float, integer, date-time)
  • Checking error codes are returned when appropriate
  • Checking that changes occur where appropriate
  • Checking that appropriate security is in place
  • Checking that paths/urls are as defined
  • Checking combinations of calls to see if any state-changing ones break the others

A good spec document should contain information helping you to test all of the above, and make your API testing experience quite pleasant.


Xpath: Friend Or Foe?

Over the years I’ve seen varying opinions on whether you should use Xpath locators as anything but a last resort.  Back when I started, I avoided most usage of Xpath, only falling back on it in situations where it really wasn’t practical to have IDs added.

These days I don’t mind Xpath being used in situations where it makes sense.  Think of fake tables (made of divs and spans), or cases where multiple copies of the same HTML structure exist and adding identical IDs to each copy would not be valid HTML.

A few years ago, I would have expected using Xpath locators to lead to extremely fragile tests, but I’ve learned a few of the nuances of the language that lead to strong and readable locators.  I believe they’re a great asset to testing web pages, as long as they’re used in a responsible and sensible manner.

Too many times when reading through WebDriver questions on Stack Overflow, I see this kind of thing:
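An invented example in the same spirit:

```java
driver.findElement(By.xpath("/html/body/div[2]/div/div[1]/div/table/tbody/tr[3]/td[2]/div/span/a"));
```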


This is the kind of thing you should never do, and never suggest anyone else do.  If you write an Xpath like this and have to fix it months down the line, you’re going to have a difficult job remembering what it was doing.  Even worse, if a team member has to fix it, they won’t have a clue!

Just by looking at that locator, try to figure out what it’s locating.  It’s a link contained within a hugely complicated number of other elements.  Who really knows what it does, though?

So how do you do Xpath better?

The first thing to know is that you don’t have to navigate through every parent element on the page before getting to the one you want.  Xpaths starting with “//” are relative: they look for an element anywhere below the current context, contained within any number of other elements.  Since driver.findElement requests begin at the root of the document, a locator starting with “//” effectively means “find this element anywhere on the page”.

Let’s take a simplified example; a page that shows a couple of products on it with their names and prices.  In this example we’ll assume that we know the names of the products that we’re expecting to find on the page, and that we want to verify that their costs are as we expect.

Let’s take a look at the structure of the page we’ve got.  We don’t have any IDs to work with, so we need to get a little more creative.  Where do we start?  We know we need to locate the price for a particular item and check that it’s what we expect.  If we just checked that the price was somewhere on the page, it would not be specific enough for our case.  It would only prove that the price existed somewhere on the page, and not necessarily next to the product we expect it to belong to.
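For this walkthrough, assume markup along these lines (the structure and prices are invented for illustration):

```html
<div class="product">
  <p>Product 1</p>
  <p class="item-price">£4.99</p>
</div>
<div class="product">
  <p>Product 2</p>
  <p class="item-price">£8.99</p>
</div>
```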

So let’s locate our product first, and see where we need to go from there.

We can locate our product fairly simply, by using the text we know it should have, and the element we know it exists inside:

driver.findElement(By.xpath("//p[text()='Product 2']"));

That’s a fairly simple Xpath, and it’s obvious what it does by looking at it. It finds a P element with the text ‘Product 2’.

I would encourage you to follow along with this using an online Xpath testing tool.  Tools like this are something I use all the time when debugging Xpath selector issues.  Copy and paste the Xpath into the Xpath field, and the HTML from above into the XML field.  You’ll be able to see what Xpath sees at each step of the process.

Next we’ll make use of Xpath’s powerful axes features. These allow you to navigate the document based on where you currently are in it. We’re currently at a P tag contained within a DIV that contains additional information we’re interested in checking. The DIV is the parent tag of the P, so we can use the parent axis to travel back up the document to the DIV.

driver.findElement(By.xpath("//p[text()='Product 2']/.."));

/.. is a shortcut for using the parent axis, so we’ll use that here. We now have the DIV that contains all the product information we’re interested in and we can now locate only elements inside that DIV and nowhere else. We know that any information we collect now is related directly to the product with the name “Product 2”.

Looking at the parent DIV in the HTML above, we can see that the price is located inside a P tag with the class “item-price”.  Let’s add that to our Xpath:

driver.findElement(By.xpath("//p[text()='Product 2']/../p[@class='item-price']"));

This Xpath is slightly more complicated now, but still fairly readable.  We get a P element with the text “Product 2”, go back to its parent, and then inside that parent we go forward into a P element with the class “item-price”.  If this were to break and someone else in your team had to fix it, it would give them vital clues about what exactly it is that’s broken.  The code is failing to find the item-price, and they need to fix the Xpath so that it can find the price again.

It’s descriptive of what it’s doing, and that is an incredibly important aspect of making good tests.  People need to be able to quickly understand what your code is doing.  Test automation projects live and die based on how easily other members of the team can understand and modify your code when necessary.  If nobody can maintain the code, all value in the tests is lost the moment something breaks.

This article has been more about the philosophy behind writing understandable Xpaths, rather than teaching the basics of how to write them.  If you’re interested in understanding more about Xpath, there are a few decent resources out there, including this one over at guru99 that deals specifically with its use in Selenium WebDriver.

Simple Tip: Pass By Locators Into Methods For Maximum Versatility

This may seem like an obvious tip to many, but I’ve seen a few applications for automated tester roles at my company that fail to do this.  When creating methods that interact with elements on a web page, remember to use Selenium WebDriver’s By locators.

These By locators are super versatile, and will allow you to have one method that deals with any element that can be located.

So what happens when we stop using By and try to pass other things into our methods instead?  We end up in situations like this:
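Something along these lines (a sketch, assuming driver is the page’s WebDriver field):

```java
// Both elements are hard-wired to By.id, so this method can only ever
// work with elements that happen to have an ID.
public String clickElementAndGetText(String elementToClickId, String elementToGetTextId) {
    driver.findElement(By.id(elementToClickId)).click();
    return driver.findElement(By.id(elementToGetTextId)).getText();
}
```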

Notice how each of the elements we interact with inside the clickElementAndGetText method is limited to being located via its ID?  This severely limits where we can use this method.  The only application is where both elements we want to use can be located via an ID.  What if one has to be located via its HTML tag instead?  Well, we can’t use this method to do that.  We could make another method that changes the last findElement call to use By.tagName instead, but think about how many methods you would have to create to cover all the possible combinations of element location strategies!

Instead, just make your methods pass By locators around.
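The same sketch, with the locators decided by the caller (again assuming a driver field):

```java
// Any location strategy now works: By.id, By.tagName, By.cssSelector, By.xpath...
public String clickElementAndGetText(By elementToClick, By elementToGetText) {
    driver.findElement(elementToClick).click();
    return driver.findElement(elementToGetText).getText();
}
```

Now clickElementAndGetText(By.id("submit"), By.tagName("h1")) and clickElementAndGetText(By.cssSelector(".menu"), By.id("status")) both go through one method.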

Look how versatile this method is when the location strategies are decided outside of the method and passed in as arguments!

Page Object Model Inheritance

Hopefully you already know the importance of using the Page Object Model to keep your code organised.  To keep your code maintainable, it’s good practice to have each page inherit from a single abstract “BasePage” class.  This gives us one central place that defines the general actions our pages can perform, and allows our concrete pages to inherit those behaviours, such as:

  • Finding elements
  • Clicking elements
  • Checking whether an element is present
  • Waiting for an element to become present
  • Filling in text fields
  • Selecting items from select lists

But these are all things you can access through the WebDriver API itself, right?  Yes, but to help write more readable and maintainable code it’s good to limit how often we make direct calls to the WebDriver API.  It’s better to have methods that you directly control, which will allow you to make broad changes really easily if they become necessary.

For example, if a new version of Firefox is released that for some reason breaks WebDriver’s ability to know whether a page is ready or not, it’s much easier to fix that in your code by altering your “findElement” method to first wait for that element to be present before trying to use it.  Events like this tend to be few and far between, but when they happen this saves a lot of time versus refactoring the hundreds of driver.findElement calls you’ve got littered throughout your code.

This is our abstract BasePage class, containing a couple of simple methods that allow us to make very quick alterations to every base action that we take in our framework.
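A minimal sketch of what that can look like (the exact method set will vary):

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public abstract class BasePage {
    protected final WebDriver driver;

    protected BasePage(WebDriver driver) {
        this.driver = driver;
    }

    // One central place to change if element lookup ever needs a wait added.
    protected WebElement findElement(By locator) {
        return driver.findElement(locator);
    }

    protected void clickElement(By locator) {
        findElement(locator).click();
    }

    protected void fillField(By locator, String text) {
        findElement(locator).sendKeys(text);
    }

    protected boolean isElementPresent(By locator) {
        return !driver.findElements(locator).isEmpty();
    }
}
```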

When we make a concrete page object, we extend from BasePage:
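For example (the element IDs here are assumptions):

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage extends BasePage {
    private static final By EMAIL_FIELD = By.id("email");
    private static final By PASSWORD_FIELD = By.id("password");
    private static final By LOGIN_BUTTON = By.id("login");

    public LoginPage(WebDriver driver) {
        super(driver);
    }

    // Every interaction goes through the inherited BasePage helpers.
    public void fillEmailAddressField(String email) {
        fillField(EMAIL_FIELD, email);
    }

    public void fillPasswordField(String password) {
        fillField(PASSWORD_FIELD, password);
    }

    public void clickLoginButton() {
        clickElement(LOGIN_BUTTON);
    }
}
```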

We now have a simple system for sharing a set of common actions between all of our page objects.  If we have to change the way we do things down the line, it’s now a one or two line change, rather than hours of work.

When we want to interact with the page, we create an instance of LoginPage and pass in our WebDriver instance:
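Something like this (browser choice and credentials are examples only):

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// One WebDriver instance, shared with the page object via its constructor.
WebDriver driver = new FirefoxDriver();

LoginPage loginPage = new LoginPage(driver);
loginPage.fillEmailAddressField("goose@example.com");
loginPage.fillPasswordField("Password1");
loginPage.clickLoginButton();
```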

Basic Page Object Model

Page objects are one of the simplest ways to organise your test automation code.  The basic principle is to have a class that represents each of your pages.  You then create methods to take the actions that a user would.

Without some way of logically organising your code, it soon becomes impossible to maintain.  The Page Object Model allows you to know where code should exist.  Code for logging a user in?  It should exist in a LoginPage class.  Code for registering a user?  It should exist in a RegistrationPage class.

If we take a look at a typical login form, we can see the elements (parts of the page) that the user needs to interact with to log in:


  • The email address field
  • The password field
  • The keep me logged in checkbox
  • The login button

So now that we know some of the things we’ll need to interact with on the page, let’s think about method naming.  The naming of your methods is important.  Well named methods will make it clear what your code is doing at a glance.

For the above interactions, we can make the following methods:

  • fillEmailAddressField
  • fillPasswordField
  • tickKeepMeLoggedIn
  • clickLoginButton

Notice how each of the methods we are going to create has a logical name that says exactly what it will be doing.
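Put together, a sketch of the page object might look like this (the element IDs are assumptions):

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage {
    private final WebDriver driver;

    // The single shared WebDriver instance is passed in here.
    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    public void fillEmailAddressField(String emailAddress) {
        driver.findElement(By.id("email")).sendKeys(emailAddress);
    }

    public void fillPasswordField(String password) {
        driver.findElement(By.id("password")).sendKeys(password);
    }

    public void tickKeepMeLoggedIn() {
        driver.findElement(By.id("keep-me-logged-in")).click();
    }

    public void clickLoginButton() {
        driver.findElement(By.id("login")).click();
    }
}
```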


In the code above, the constructor has a parameter that allows us to pass in our WebDriver instance.  There are a few ways of managing the use of a single WebDriver instance across your page objects, but this is my preferred method.

This is a very basic implementation of a Page Object Model, and I would recommend reading more posts on this and other sites in order to pick up some other good practices for using the Page Object Model, and WebDriver in general.

How I Started In Test Automation

The first job that I worked as a QA/Software Tester was for a tiny company.  There were 3 software developers at the time I joined, and I was their first tester.  The product was a website that let users do some basic filing tasks, to keep track of their data a little more easily.  It would remind them about things like their car insurance renewal dates.

Looking back on it, it was a reasonably straightforward system that was little more complicated than a website backed by a MySQL database.

Around 6 months into the job, my employers dropped the bombshell that I would be expected to set up automated tests to cover the website’s functionality.  I wasn’t thrilled at the idea of this, but I went about researching ways of accomplishing the task I’d been set.

At that point in my life, I had always been interested in programming, but had never got past the “tinkering” phase.  I would read a few tutorials for a language, set out with lofty goals of what I wanted to accomplish, and lose interest within a week or two.  I had to translate that limited experience into something that would allow our company to run regression tests automatically all the way through our strict 2 week sprints.

So where did I start out?  I started with Ruby and Watir.  Well, in reality Watir came first.  I saw a few references to it in books I was reading at the time, and they had positive things to say about it.  It ran on Ruby, and therefore Ruby was the language I started with.  After a few days of trying it out, I was sold.  Watir was simple and intuitive, and Ruby was extremely forgiving.  Ruby has a reputation for being easy to start learning, and that is definitely true.

Within a month (or two) I had a framework set up, and almost every part of the site covered with at least a couple of tests.  I use the word “framework” incredibly loosely here.  My code was bad, its structure was bad, it was unreliable, but it was a start.

  • I had no idea how to structure things.
  • There were raw calls to Watir everywhere.
  • Almost everything that wasn’t a test was in one huge file called basics.rb (this thing haunts my dreams).
  • The browser object was global, which is a terrible idea if you’re ever going to want to run your tests in parallel.
  • The tests relied on accounts in known states, and would break the moment anything unexpected happened.
  • In an attempt to fix the above, branching code was written to try to deal with various error states in accounts.  This made it difficult to know when things were genuinely going wrong versus when the test code was just coping with the problems on its own.
  • So much more awfulness.

It was really bad, but… it got the job done.  I would run it against every build, and things would inevitably fail and require attention, or a human “no this is actually fine, ignore it”.  That was still significantly faster than trying to regression test functionality every time the developers released a new build, which was daily.

The main problem was that the automation effort was too difficult to maintain.  A small change to the website would result in hours of altering element locators and logic to get the tests to stop ending in errors.  With a little knowledge this is incredibly easy to avoid: implementing a Page Object Model (POM) and some basic abstraction solves it.

So that’s what I ended up doing.  A class was made to represent each page, and Basics.rb was split up so that all the code within it had a proper home.  Each page object contained the code required to take different actions on a given page.  The tests then instantiated the page objects, and made calls to the methods, e.g.

What would have been:

$browser.text_field(:id => "username").set("Goose")
$browser.text_field(:id => "password").set("Password1")
$browser.button(:id => "submit").click
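The page object version turned those three lines into a single call along these lines (the class and method names here are illustrative, not the originals):

```ruby
login_page = LoginPage.new($browser)
login_page.log_in("Goose", "Password1")
```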



This was immediately a lot more maintainable.  It still wasn’t great, but it was definitely an improvement.  Lots of little improvements like this added up over time.  The framework became more reliable and needed less of my time per sprint.  New tests would be added to cover new functionality, but the older tests were less prone to causing errors and failures where there were none.

So yeah, my career in test automation started with being thrown in at the deep end, making something fairly terrible and gradually making it invaluable.  At the end it was still awful, but we wouldn’t have been able to get all the testing in a sprint done without it.

As a little bit of context: it’s been 7 or 8 years since I started this journey, and I’m now at the point where I don’t write automated tests anymore.  I set up test frameworks, and leave the test writing to other people in the department.

If there’s a lesson to be learned here, I’d say it’s that making mistakes is fine.  I’m hoping I can fill this blog/site with tips on how to not be as bad at this as I was back then.