Xpath: Friend Or Foe?

Over the years I’ve seen varying opinions on whether you should use Xpath locators as anything but a last resort.   Back when I started I adopted the standpoint that I would avoid most usage of Xpath, but in situations where it really wasn’t practical to have IDs added I would use it as a last resort.

These days I don’t mind Xpath being used in situations where it makes sense.  Things like fake tables (made of divs and spans), and where multiple copies of the same HTML structure exist, and adding identical IDs to those would not be valid HTML.

A few years ago, I would have expected using Xpath locators to lead to extremely fragile tests, but I’ve learned a few of the nuances of the language that lead to strong and readable locators.  I believe they’re a great asset to testing web pages, as long as they’re used in a responsible and sensible manner.

Too many times when reading through WebDriver questions on Stack Overflow, I see this kind of thing:

driver.findElement(By.xpath("html/body/div[1]/div[1]/div[2]/form/div[1]/div[2]/div[1]/div[22]/a")).click();

This is the kind of thing you should never do, and never suggest anyone else do. If you write an Xpath like this and have to fix it months down the line, you’re going to have a difficult job remembering what this was doing.  Even worse if a team member has to fix it, they won’t have a clue!

Just by looking at that locator, try to figure out what it’s locating.  It’s a link contained within a hugely complicated number of other elements.  Who really knows what it does though.

So how do you do Xpath better?

The first thing to know is that you don’t have to navigate through every parent element on the page before getting to the one you want.  Xpaths starting with “//” are relative and allow you to start looking for an element anywhere on the page, contained within any number of other elements.  “//” starts based on the current context, and all driver.findElement requests begin at the root of the document, so using a relative Xpath request means “Find this element anywhere on the page”.

Let’s take a simplified example; a page that shows a couple of products on it with their names and prices.  In this example we’ll assume that we know the names of the products that we’re expecting to find on the page, and that we want to verify that their costs are as we expect.

Let’s take a look at the structure of the page we’ve got.  We don’t have any IDs to work with, so we need to get a little more creative.  Where do we start?  We know we need to locate the price for a particular item and check that it’s what we expect.  If we just checked that the price was somewhere on the page, it would not be specific enough for our case.  It would only prove that that price existed somewhere on the page, and not necessarily near the product we expect it to be for.

So let’s locate our product first, and see where we need to go from there.

We can locate our product fairly simply, by using the text we know it should have, and the element we know it exists inside:

driver.findElement(By.xpath("//p[text()='Product 2']")

That’s a fairly simple Xpath, and it’s obvious what it does by looking at it. It finds a P element with the text ‘Product 2’.

I would encourage you to follow along with this using xpathtester.com .  This is a website I use all the time when debugging Xpath selector issues.  Copy and paste the Xpath into the Xpath field, and the HTML from above into the XML field.  You’ll be able to see what Xpath sees at each step of the process.

Next we’ll make use of Xpath’s powerful axes features. These allow you to navigate the document based on where you currently are in it. We’re currently at a P tag contained within a DIV that contains additional information we’re interested in checking. The DIV is the parent tag of the P, so we can use the parent axis to travel back up the document to the DIV.

driver.findElement(By.xpath("//p[text()='Product 2']/..")

/.. is a shortcut for using the parent axis, so we’ll use that here. We now have the DIV that contains all the product information we’re interested in and we can now locate only elements inside that DIV and nowhere else. We know that any information we collect now is related directly to the product with the name “Product 2”.

Looking at the parent DIV in the HTML above, we can see that the price is located inside a P tag with the class “item-price”.  Let’s add that to our Xpath:

driver.findElement(By.xpath("//p[text()='Product 2']/../p[@class='item-price']")

This Xpath is slightly more complicated now, but still fairly readable.  We get a P element with the text “Product 2”, go back to its parent, and then inside that parent we go foward into a P element with the class “item-price”.  If this were to break and someone else in your team had to fix it, that would give them vital clues about what exactly it is that’s broken.  The code is failing to find the item-price, and they need to fix the Xpath so that it can find the price again.

It’s descriptive of what it’s doing, and that is an incredibly important aspect of making good tests.  People need to be able to quickly understand what your code is doing.  Test automation projects live and die based on how easily other members of the team can understand and modify your code when necessary.  If nobody can maintain the code all value in the tests are lost the moment something breaks.

This article has been more about the philosophy behind writing understandable Xpaths, rather than teaching the basics of how to write them.  If you’re interested in understanding more about Xpath there’s a few decent resources about them out there, including this one over at guru99 that deals specifically with its use in Selenium. WebDriver.

Simple Tip: Pass By Locators Into Methods For Maximum Versatility

This may seem like an obvious tip to many, but I’ve seen a few applications for automated tester roles at my company that fail to do this.  When creating methods that interact with elements on a web page, remember to use Selenium WebDriver’s By locators.

These By locators are super versatile, and will allow you to have one method that deals with any element that can be located.

So what happens when we stop using By and try to pass other things to our methods instead?  We get in situations like this:

Notice how each of the elements we interact with inside the clickElementAndGetText method is limited to being located via their ID?  This limits where we can use this method quite extremely.  The only application is where both elements we want to use can be located via an ID.  What if one has to be located via its html tag instead?  Well we can’t use this method to do that.  We could make another method that changes the last findElement call to use By.tagName instead, but think about how many methods you would have to create to accomplish all the possible combinations of element location strategies!

Instead, just make your methods pass By locators around.

Look how versatile this method is when the location strategies are decided outside of the method and passed in as arguments!

Page Object Model Inheritance

Hopefully you already know the importance of using the Page Object Model to keep your code organised.  To keep your code maintainable it’s a good practice to have page objects inherit from a single abstract “BasePage” object.  This allows us to have one central place that defines general actions that our pages can perform, and allows our concrete pages to inherit those behaviours, such as:

  • Finding elements
  • Clicking elements
  • Checking whether an element is present
  • Waiting for an element to become present
  • Filling in text fields
  • Selecting items from select lists

But these are all things you can access through the WebDriver API itself right?  Yes, but to help write more readable and maintainable code it’s good to limit how much we make direct calls to the WebDriver API.  It’s better to have methods that you directly control, that will allow you to make broad changes really easily if they become necessary.

For example, if a new version of Firefox is released that for some reason breaks WebDriver’s ability to know whether a page is ready or not, it’s much easier to fix that in your code by altering your “findElement” method to first wait for that element to be present before trying to use it.  This kind of event tends to be few and far between, but it saves a lot of time versus refactoring the hundreds of driver.findElement calls you’ve got littered throughout your code.

This is our abstract BasePage class, containing a couple of simple methods that allow us to make very quick alterations to every base action that we take in our framework.

When we make a concrete page object, we extend from BasePage:

We now have a simple system for sharing a set of common actions between all of our page objects.  If we have to change the way we do things down the line, it’s now a one or two line change, rather than hours of work.

When we want to interact with the page, we create an instance of LoginPage and pass in our WebDriver instance: