Xpath: Friend Or Foe?

Over the years I’ve seen varying opinions on whether you should use Xpath locators as anything but a last resort.  When I started out, I took the position that I would avoid Xpath wherever possible, and only fall back on it where it really wasn’t practical to have IDs added.

These days I don’t mind Xpath being used in situations where it makes sense: fake tables (made of divs and spans), for example, or pages where multiple copies of the same HTML structure exist and adding identical IDs to each copy would not be valid HTML.

A few years ago, I would have expected using Xpath locators to lead to extremely fragile tests, but I’ve learned a few of the nuances of the language that lead to strong and readable locators.  I believe they’re a great asset to testing web pages, as long as they’re used in a responsible and sensible manner.

Too many times when reading through WebDriver questions on Stack Overflow, I see this kind of thing:

driver.findElement(By.xpath("html/body/div[1]/div[1]/div[2]/form/div[1]/div[2]/div[1]/div[22]/a")).click();

This is the kind of thing you should never do, and never suggest anyone else do.  If you write an Xpath like this and have to fix it months down the line, you’re going to have a difficult job remembering what it was doing.  It’s even worse if a team member has to fix it: they won’t have a clue!

Just by looking at that locator, try to figure out what it’s locating.  It’s a link buried inside a long chain of other elements.  Who really knows what it does, though?

So how do you do Xpath better?

The first thing to know is that you don’t have to navigate through every parent element on the page before getting to the one you want.  An Xpath starting with “//” searches every descendant of the document root, so it matches an element anywhere on the page, contained within any number of other elements.  In other words, handing driver.findElement an Xpath that starts with “//” means “Find this element anywhere on the page”.  (If you call findElement on an element rather than on the driver and only want to search inside that element, start the Xpath with “.//” instead; a plain “//” still searches the whole document.)
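
To see the difference in practice, here is a small sketch that uses the XPath 1.0 engine bundled with the JDK (javax.xml.xpath) rather than a browser, so it can be run on its own.  The two page snippets and the “View basket” link are made up for illustration; the second snippet is simply the first with one extra wrapper div, the kind of change a redesign might introduce.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AnywhereXpathDemo {
    // Hypothetical markup, not taken from a real site.
    static final String ORIGINAL =
        "<html><body><div><form><a href='/basket'>View basket</a></form></div></body></html>";
    // The same page after a redesign wrapped the form in one more div.
    static final String REDESIGNED =
        "<html><body><div><div><form><a href='/basket'>View basket</a></form></div></div></body></html>";

    // Returns how many nodes the expression matches in the given markup.
    static int countMatches(String markup, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(markup)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String absolute = "/html/body/div/form/a";
        String anywhere = "//a[text()='View basket']";
        System.out.println("absolute, original page:   " + countMatches(ORIGINAL, absolute));
        System.out.println("absolute, redesigned page: " + countMatches(REDESIGNED, absolute));
        System.out.println("anywhere, original page:   " + countMatches(ORIGINAL, anywhere));
        System.out.println("anywhere, redesigned page: " + countMatches(REDESIGNED, anywhere));
    }
}
```

The absolute path finds the link on the original page but nothing after the redesign, while the “//” version finds it on both.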

Let’s take a simplified example; a page that shows a couple of products on it with their names and prices.  In this example we’ll assume that we know the names of the products that we’re expecting to find on the page, and that we want to verify that their costs are as we expect.

Let’s take a look at the structure of the page we’ve got.  We don’t have any IDs to work with, so we need to get a little more creative.  Where do we start?  We know we need to locate the price for a particular item and check that it’s what we expect.  If we just checked that the price was somewhere on the page, it would not be specific enough for our case.  It would only prove that that price existed somewhere on the page, and not necessarily near the product we expect it to be for.

So let’s locate our product first, and see where we need to go from there.

We can locate our product fairly simply, by using the text we know it should have, and the element we know it exists inside:

driver.findElement(By.xpath("//p[text()='Product 2']"));

That’s a fairly simple Xpath, and it’s obvious what it does by looking at it. It finds a P element with the text ‘Product 2’.

I would encourage you to follow along with this using xpathtester.com.  This is a website I use all the time when debugging Xpath selector issues.  Copy and paste the Xpath into the Xpath field, and the HTML from above into the XML field.  You’ll be able to see what Xpath sees at each step of the process.

Next we’ll make use of Xpath’s powerful axes features. These allow you to navigate the document based on where you currently are in it. We’re currently at a P tag contained within a DIV that contains additional information we’re interested in checking. The DIV is the parent tag of the P, so we can use the parent axis to travel back up the document to the DIV.

driver.findElement(By.xpath("//p[text()='Product 2']/.."));

/.. is a shortcut for using the parent axis, so we’ll use that here. We now have the DIV that contains all the product information we’re interested in and we can now locate only elements inside that DIV and nowhere else. We know that any information we collect now is related directly to the product with the name “Product 2”.

Looking at the parent DIV in the HTML above, we can see that the price is located inside a P tag with the class “item-price”.  Let’s add that to our Xpath:

driver.findElement(By.xpath("//p[text()='Product 2']/../p[@class='item-price']"));

This Xpath is slightly more complicated now, but still fairly readable.  We get a P element with the text “Product 2”, go back to its parent, and then inside that parent we go forward into a P element with the class “item-price”.  If this were to break and someone else in your team had to fix it, that would give them vital clues about what exactly it is that’s broken.  The code is failing to find the item-price, and they need to fix the Xpath so that it can find the price again.
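
If you’d like to check the three steps without a browser, the sketch below runs them through the XPath engine that ships with the JDK (javax.xml.xpath).  The markup is a hypothetical stand-in for the product page (two products, no IDs, prices in a P with the class “item-price”); the names and prices are invented, and the real page will almost certainly differ.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ProductPriceXpathDemo {
    // Hypothetical product page: no IDs, each product in its own div.
    static final String PAGE =
        "<html><body>"
        + "<div><p>Product 1</p><p class='item-price'>9.99</p></div>"
        + "<div><p>Product 2</p><p class='item-price'>19.99</p></div>"
        + "</body></html>";

    // Returns the first node the expression matches, or null if nothing matches.
    static Node first(String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Step 1: find the P that holds the product name.
        System.out.println(first("//p[text()='Product 2']").getNodeName());
        // Step 2: step up to the DIV that wraps this product.
        System.out.println(first("//p[text()='Product 2']/..").getNodeName());
        // Step 3: step back down to the price inside that same DIV.
        System.out.println(
            first("//p[text()='Product 2']/../p[@class='item-price']").getTextContent());
    }
}
```

Because the final expression goes through the product’s own parent DIV, it can only ever pick up the price that belongs to “Product 2”, never the one for “Product 1”.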

It’s descriptive of what it’s doing, and that is an incredibly important aspect of making good tests.  People need to be able to quickly understand what your code is doing.  Test automation projects live and die based on how easily other members of the team can understand and modify your code when necessary.  If nobody can maintain the code, all the value in the tests is lost the moment something breaks.

This article has been more about the philosophy behind writing understandable Xpaths, rather than teaching the basics of how to write them.  If you’re interested in understanding more about Xpath, there are a few decent resources out there, including this one over at guru99 that deals specifically with its use in Selenium WebDriver.

Simple Tip: Pass By Locators Into Methods For Maximum Versatility

This may seem like an obvious tip to many, but I’ve seen a few applications for automated tester roles at my company that fail to do this.  When creating methods that interact with elements on a web page, remember to use Selenium WebDriver’s By locators.

These By locators are super versatile, and will allow you to have one method that deals with any element that can be located.

So what happens when we stop using By and try to pass other things to our methods instead?  We get into situations like this:

Notice how each of the elements we interact with inside the clickElementAndGetText method can only be located via its ID?  This severely limits where we can use the method.  It only works where both elements we want to use can be located via an ID.  What if one has to be located via its HTML tag instead?  Well, we can’t use this method to do that.  We could make another method that changes the last findElement call to use By.tagName instead, but think about how many methods you would have to create to cover all the possible combinations of element location strategies!

Instead, just make your methods pass By locators around.

Look how versatile this method is when the location strategies are decided outside of the method and passed in as arguments!
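
As a rough sketch of the two approaches being contrasted here, the class below holds an ID-only variant next to the By-based clickElementAndGetText.  Only the method name clickElementAndGetText comes from this post; the parameter names, the ById variant’s name and the element IDs are assumptions.  So that the example runs without Selenium on the classpath, By, WebElement and WebDriver are tiny stand-ins that copy just the parts of the real org.openqa.selenium types used here, and FakeDriver is a made-up in-memory “browser”.

```java
import java.util.HashMap;
import java.util.Map;

class ElementActions {
    // Limited version: only usable when BOTH elements happen to have IDs.
    static String clickElementAndGetTextById(WebDriver driver, String idToClick, String idToRead) {
        driver.findElement(By.id(idToClick)).click();
        return driver.findElement(By.id(idToRead)).getText();
    }

    // Versatile version: the caller picks the location strategy for each element.
    static String clickElementAndGetText(WebDriver driver, By toClick, By toRead) {
        driver.findElement(toClick).click();
        return driver.findElement(toRead).getText();
    }

    public static void main(String[] args) {
        FakeDriver driver = new FakeDriver();
        driver.addElement(By.id("show-greeting"), "Show greeting");
        driver.addElement(By.tagName("h1"), "Hello!");
        // One element by ID, the other by tag name: no extra method needed.
        System.out.println(clickElementAndGetText(driver, By.id("show-greeting"), By.tagName("h1")));
    }
}

// Stand-ins for Selenium's types. In a real project, delete everything below
// and import org.openqa.selenium.By, WebElement and WebDriver instead.
final class By {
    private final String description;
    private By(String description) { this.description = description; }
    static By id(String id) { return new By("id=" + id); }
    static By tagName(String tagName) { return new By("tagName=" + tagName); }
    @Override public String toString() { return description; }
}

interface WebElement {
    void click();
    String getText();
}

interface WebDriver {
    WebElement findElement(By by);
}

// Fake in-memory page, only here so the example can run on its own.
class FakeDriver implements WebDriver {
    private final Map<String, String> textByLocator = new HashMap<>();
    int clicks = 0;

    void addElement(By locator, String text) { textByLocator.put(locator.toString(), text); }

    @Override public WebElement findElement(By by) {
        if (!textByLocator.containsKey(by.toString())) {
            throw new IllegalArgumentException("No element found for " + by);
        }
        return new WebElement() {
            @Override public void click() { clicks++; }
            @Override public String getText() { return textByLocator.get(by.toString()); }
        };
    }
}
```

With the By-based version, switching an element from an ID to a tag name, a CSS selector or an Xpath is a change at the call site only; the method itself never needs to know.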

Page Object Model Inheritance

Hopefully you already know the importance of using the Page Object Model to keep your code organised.  To keep your code maintainable it’s a good practice to have page objects inherit from a single abstract “BasePage” object.  This allows us to have one central place that defines general actions that our pages can perform, and allows our concrete pages to inherit those behaviours, such as:

  • Finding elements
  • Clicking elements
  • Checking whether an element is present
  • Waiting for an element to become present
  • Filling in text fields
  • Selecting items from select lists

But these are all things you can access through the WebDriver API itself, right?  Yes, but to help write more readable and maintainable code it’s good to limit how often we make direct calls to the WebDriver API.  It’s better to have methods that you directly control, which allow you to make broad changes really easily if they become necessary.

For example, if a new version of Firefox is released that for some reason breaks WebDriver’s ability to know whether a page is ready or not, it’s much easier to fix that in your code by altering your “findElement” method to first wait for that element to be present before trying to use it.  This kind of event tends to be few and far between, but it saves a lot of time versus refactoring the hundreds of driver.findElement calls you’ve got littered throughout your code.

This is our abstract BasePage class, containing a couple of simple methods that allow us to make very quick alterations to every base action that we take in our framework.

When we make a concrete page object, we extend from BasePage:

We now have a simple system for sharing a set of common actions between all of our page objects.  If we have to change the way we do things down the line, it’s now a one or two line change, rather than hours of work.

When we want to interact with the page, we create an instance of LoginPage and pass in our WebDriver instance:
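
A minimal sketch of how the three pieces described in this post (the abstract BasePage, a concrete LoginPage, and the test code that creates it) might fit together is shown below.  The helper names find, click and fill, the element IDs and the email address are all assumptions.  To keep the example runnable without Selenium, By, WebElement and WebDriver are small stand-ins that mimic only the parts of the real org.openqa.selenium types used here, and RecordingDriver is a made-up fake that just logs what it was asked to do.

```java
import java.util.ArrayList;
import java.util.List;

abstract class BasePage {
    protected final WebDriver driver;

    protected BasePage(WebDriver driver) { this.driver = driver; }

    // Every lookup goes through here, so a framework-wide change
    // (an explicit wait, logging, retries) is made in one place.
    protected WebElement find(By locator) { return driver.findElement(locator); }

    protected void click(By locator) { find(locator).click(); }

    protected void fill(By locator, String text) { find(locator).sendKeys(text); }
}

class LoginPage extends BasePage {
    // Hypothetical locators; use whatever your login form actually exposes.
    private static final By EMAIL = By.id("email");
    private static final By PASSWORD = By.id("password");
    private static final By LOGIN_BUTTON = By.id("login");

    LoginPage(WebDriver driver) { super(driver); }

    void fillEmailAddressField(String email) { fill(EMAIL, email); }
    void fillPasswordField(String password) { fill(PASSWORD, password); }
    void clickLoginButton() { click(LOGIN_BUTTON); }
}

class PageObjectDemo {
    public static void main(String[] args) {
        RecordingDriver driver = new RecordingDriver();
        // The page object receives the driver through its constructor.
        LoginPage loginPage = new LoginPage(driver);
        loginPage.fillEmailAddressField("goose@example.com");
        loginPage.fillPasswordField("Password1");
        loginPage.clickLoginButton();
        driver.log.forEach(System.out::println);
    }
}

// Stand-ins for Selenium's types. In a real project, delete everything below
// and import org.openqa.selenium.By, WebElement and WebDriver instead.
final class By {
    private final String description;
    private By(String description) { this.description = description; }
    static By id(String id) { return new By("id=" + id); }
    @Override public String toString() { return description; }
}

interface WebElement {
    void click();
    void sendKeys(CharSequence... keysToSend);
}

interface WebDriver {
    WebElement findElement(By by);
}

// Fake driver that records each interaction instead of driving a browser.
class RecordingDriver implements WebDriver {
    final List<String> log = new ArrayList<>();

    @Override public WebElement findElement(By by) {
        return new WebElement() {
            @Override public void click() { log.add("click " + by); }
            @Override public void sendKeys(CharSequence... keysToSend) {
                log.add("type '" + String.join("", keysToSend) + "' into " + by);
            }
        };
    }
}
```

Note that LoginPage never touches the driver directly.  If lookups ever need to wait for an element first, only BasePage.find has to change.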

Basic Page Object Model

Page objects are one of the simplest ways to organise your test automation code.  The basic principle is to have a class that represents each of your pages.  You then create methods to take the actions that a user would.

Without some way of logically organising your code, it soon becomes impossible to maintain.  The Page Object Model allows you to know where code should exist.  Code for logging a user in?  It should exist in a LoginPage class.  Code for registering a user?  It should exist in a RegistrationPage class.

If we take a look at the following login form, we can see the elements (parts of the page) that the user needs to interact with to log in.

[Image: the login form]

  • The email address field
  • The password field
  • The keep me logged in checkbox
  • The login button

So now that we know some of the things we’ll need to interact with on the page, let’s think about method naming.  The naming of your methods is important.  Well-named methods will make it clear what your code is doing at a glance.

For the above interactions, we can make the following methods:

  • fillEmailAddressField
  • fillPasswordField
  • tickKeepMeLoggedIn
  • clickLoginButton

Notice how each of the methods we are going to create has a logical name that says exactly what it will be doing.
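
A bare-bones LoginPage with those four methods might look something like the sketch below.  The four method names come from the list above; the element IDs are hypothetical, and so is the choice to have tickKeepMeLoggedIn check the box’s state before clicking.  So that the example runs without Selenium, By, WebElement and WebDriver are small stand-ins copying only the parts of the real org.openqa.selenium types used here, and FakeDriver is a made-up fake that logs interactions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LoginPage {
    private final WebDriver driver;

    // Hypothetical locators; use whatever your login form actually exposes.
    private final By emailAddressField = By.id("email");
    private final By passwordField = By.id("password");
    private final By keepMeLoggedInCheckbox = By.id("keep-me-logged-in");
    private final By loginButton = By.id("login");

    // The WebDriver instance is handed in by whoever creates the page object.
    LoginPage(WebDriver driver) { this.driver = driver; }

    void fillEmailAddressField(String email) { driver.findElement(emailAddressField).sendKeys(email); }

    void fillPasswordField(String password) { driver.findElement(passwordField).sendKeys(password); }

    // Only clicks when the box is not already ticked, so calling it twice is harmless.
    void tickKeepMeLoggedIn() {
        WebElement checkbox = driver.findElement(keepMeLoggedInCheckbox);
        if (!checkbox.isSelected()) {
            checkbox.click();
        }
    }

    void clickLoginButton() { driver.findElement(loginButton).click(); }
}

// Stand-ins for Selenium's types. In a real project, delete everything below
// and import org.openqa.selenium.By, WebElement and WebDriver instead.
final class By {
    private final String description;
    private By(String description) { this.description = description; }
    static By id(String id) { return new By("id=" + id); }
    @Override public String toString() { return description; }
}

interface WebElement {
    void click();
    void sendKeys(CharSequence... keysToSend);
    boolean isSelected();
}

interface WebDriver {
    WebElement findElement(By by);
}

// Fake driver: logs interactions and treats every click as a toggle.
class FakeDriver implements WebDriver {
    final List<String> log = new ArrayList<>();
    private final Set<String> selected = new HashSet<>();

    @Override public WebElement findElement(By by) {
        String key = by.toString();
        return new WebElement() {
            @Override public void click() {
                log.add("click " + key);
                if (!selected.remove(key)) {
                    selected.add(key);
                }
            }
            @Override public void sendKeys(CharSequence... keysToSend) {
                log.add("type '" + String.join("", keysToSend) + "' into " + key);
            }
            @Override public boolean isSelected() { return selected.contains(key); }
        };
    }
}
```

A test then reads almost like the manual steps: create the page with new LoginPage(driver), fill the two fields, tick the checkbox and click the button.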

 

In the code above the constructor has a parameter that allows us to pass in our WebDriver instance.  There are a few ways of managing the use of a single instance of WebDriver in your page objects, but this is my preferred method.

This is a very basic implementation of a Page Object Model, and I would recommend reading more posts on this and other sites in order to pick up some other good practices for using the Page Object Model, and WebDriver in general.

How I Started In Test Automation

The first job that I worked as a QA/Software Tester was for a tiny company.  There were 3 software developers at the time I joined, and I was their first tester.  The product was a website that let users do some basic filing tasks, to keep track of their data a little more easily.  It would remind them about things like their car insurance renewal dates.

Looking back on it, it was a reasonably straightforward system that was little more complicated than a website backed up by a MySQL database.

Around 6 months into the job, my employers dropped the bombshell that I would be expected to set up automated tests to cover the website’s functionality.  I wasn’t thrilled at the idea of this, but I went about researching ways of accomplishing the task I’d been set.

At that point in my life, I had always been interested in programming, but had never got past the “tinkering” phase.  I would read a few tutorials for a language, set out with lofty goals of what I wanted to accomplish, and lose interest within a week or two.  I had to translate that limited experience into something that would allow our company to run regression tests automatically all the way through our strict 2 week sprints.

So where did I start out?  I started with Ruby and Watir.  Well, in reality Watir came first.  I saw a few references to it in books I was reading at the time, and they had positive things to say about it.  It ran on Ruby, and therefore Ruby was the language I started with.  After a few days of trying it out, I was sold.  Watir was simple and intuitive, and Ruby was extremely forgiving.  Ruby has a reputation for being easy to start learning, and that is definitely true.

Within a month (or two) I had a framework set up, and almost every part of the site covered with at least a couple of tests.  I use the word “framework” incredibly loosely here.  My code was bad, its structure was bad, it was unreliable, but it was a start.

  • I had no idea how to structure things.
  • There were raw calls to Watir everywhere.
  • Almost everything that wasn’t a test was in one huge file called basics.rb (this thing haunts my dreams).
  • The browser object was global, which is a terrible idea if you’re ever going to want to run your tests in parallel.
  • The tests relied on accounts in known states, and would break the moment anything unexpected happened.
  • In an attempt to fix the above, branching code was written to try to deal with various error states in accounts.  This made it difficult to know when things were genuinely going wrong, because the test code was just coping with the problems on its own.
  • So much more awfulness.

It was really bad, but… it got the job done.  I would run it against every build, and things would inevitably fail and require attention, or a human “no this is actually fine, ignore it”.  That was still significantly faster than trying to regression test functionality every time the developers released a new build, which was daily.

The main problem was that the automation effort was too difficult to maintain.  A small change to the website would result in hours of altering element locators and logic to get the tests to stop ending in errors.  With a little knowledge this is incredibly easy to avoid.  Implementing a Page Object Model (POM) and some basic abstraction solves it.

So that’s what I ended up doing.  A class was made to represent each page, and basics.rb was split up so that all the code within it had a proper home.  Each page object contained the code required to take different actions on a given page.  The tests then instantiated the page objects, and made calls to the methods, e.g.

What would have been:

$browser.text_field(:id => "username").set("Goose")
$browser.text_field(:id => "password").set("Password1")
$browser.button(:id => "submit").click

became

loginPage.fill_username_field("Goose")
loginPage.fill_password_field("Password1")
loginPage.click_login_button

This was immediately a lot more maintainable.  It still wasn’t great, but it was definitely an improvement.  Lots of little improvements like this added up over time.  The framework became more reliable and needed less of my time per sprint.  New tests would be added to cover new functionality, but the older tests were less prone to causing errors and failures where there were none.

So yeah, my career in test automation started with being thrown in at the deep end, making something fairly terrible and gradually making it invaluable.  At the end it was still awful, but we wouldn’t have been able to get all the testing in a sprint done without it.

As a little bit of context: it’s been 7 or 8 years since I started this journey, and I’m now at the point where I don’t write automated tests anymore.  I set up test frameworks, and leave the test writing to other people in the department.

If there’s a lesson to be learned here, I’d say it’s that making mistakes is fine.  I’m hoping I can fill this blog/site with tips on how to not be as bad at this as I was back then.