XPath (XML Path Language) is a query language for selecting nodes from an XML document. If you work with XML data, XSLT transformations, web scraping, or test automation tools like Selenium, XPath is an essential skill. This complete guide covers XPath 1.0 — the most widely supported version — from basic paths to advanced axes and functions.
What is XPath?
XPath is a W3C standard language that uses path expressions to navigate the elements and attributes of an XML document, much as a file system path such as /home/user/documents/file.txt navigates folders. XPath 1.0 is natively supported in all modern browsers via the document.evaluate() API, is built into XSLT, and is available in Java, Python, .NET and many other platforms.
Sample XML Document
We will use this bookstore XML for all examples throughout this guide:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book id="1" category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book id="2" category="children">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book id="3" category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
Basic XPath Syntax
XPath uses a path notation similar to file systems:
/ — Selects from the root node (absolute path)
// — Selects nodes anywhere in the document (descendant)
. — Selects the current node
.. — Selects the parent of the current node
@ — Selects an attribute
* — Wildcard that matches any element
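As a rough illustration of these symbols, the sketch below uses Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath. It assumes the sample bookstore document is pasted inline as a string (the name BOOKSTORE_XML is made up for this example). ElementTree does not accept absolute paths on an element, so every path here starts from the current node with a dot.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample document (BOOKSTORE_XML is a name chosen for this sketch)
BOOKSTORE_XML = """<bookstore>
  <book id="1" category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book id="2" category="children"><title lang="en">Harry Potter</title><author>J. K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book id="3" category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # root is the <bookstore> element

children = root.findall('./*')                      # . and *: every child element of the current node
titles = root.findall('.//title')                   # //: title elements at any depth
parents = root.findall('.//title/..')               # ..: the parent of each title, i.e. the books
with_category = root.findall('.//book[@category]')  # @: books that carry a category attribute

print([child.tag for child in children])
print([title.text for title in titles])
print([book.get('id') for book in parents])
print(len(with_category))
```

ElementTree cannot return attribute nodes directly, so attribute values are read with get() on the matched elements. A full XPath 1.0 engine such as lxml lifts these restrictions.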
Essential XPath Expressions
/bookstore — Root bookstore element
/bookstore/book — All book elements directly under bookstore
//book — All book elements anywhere in the document
//book/@id — The id attribute of all book elements
//book/title/text() — Text content of all title elements
//* — Every element in the document
//@* — Every attribute in the document
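The expressions above can be tried out with a short Python sketch. It assumes the third-party lxml package is installed and that the sample document is pasted inline as a string (BOOKSTORE_XML is a name chosen for this example, and the XML declaration is omitted for brevity).

```python
from lxml import etree

# Inline copy of the sample document (BOOKSTORE_XML is a name chosen for this sketch)
BOOKSTORE_XML = """<bookstore>
  <book id="1" category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book id="2" category="children"><title lang="en">Harry Potter</title><author>J. K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book id="3" category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE_XML)

root_elements = root.xpath('/bookstore')     # list containing the root element
books = root.xpath('/bookstore/book')        # book elements directly under bookstore
book_ids = root.xpath('//book/@id')          # attribute values come back as strings
titles = root.xpath('//book/title/text()')   # text nodes come back as strings
all_elements = root.xpath('//*')             # every element in the document
all_attributes = root.xpath('//@*')          # every attribute value in the document

print(book_ids)
print(titles)
print(len(all_elements), len(all_attributes))
```

Element-selecting expressions return lists of element objects, while attribute and text() expressions return lists of strings.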
Predicates (Filtering)
Predicates are conditions inside square brackets [] that filter nodes:
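//book[1] — First book child of its parent (here, the first book in the document)
//book[last()] — Last book child of its parent (here, the last book in the document)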
//book[position()<3] — First two books
//book[@category] — Books that have a category attribute
//book[@category='web'] — Books with category="web"
//book[price>30] — Books where price is greater than 30
//book[year=2005] — Books published in 2005
//book[@id='1']/title — Title of the book with id=1
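To see which books each predicate keeps, here is a minimal sketch that assumes lxml is installed and the sample document is available inline. The helper titles_for() and the name BOOKSTORE_XML are hypothetical, introduced only for this example.

```python
from lxml import etree

# Inline copy of the sample document (BOOKSTORE_XML is a name chosen for this sketch)
BOOKSTORE_XML = """<bookstore>
  <book id="1" category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book id="2" category="children"><title lang="en">Harry Potter</title><author>J. K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book id="3" category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE_XML)


def titles_for(expression):
    """Return the title text of every book element matched by the expression."""
    return [book.findtext('title') for book in root.xpath(expression)]


first = titles_for('//book[1]')
last = titles_for('//book[last()]')
first_two = titles_for('//book[position()<3]')
web = titles_for("//book[@category='web']")
over_30 = titles_for('//book[price>30]')      # a price of 30.00 is not greater than 30
from_2005 = titles_for('//book[year=2005]')

print(first, last, first_two)
print(web, over_30, from_2005)
```

Note that //book[price>30] matches only "Learning XML", because the comparison is strict and the first book costs exactly 30.00.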
XPath Axes
Axes define the relationship between the context node and the nodes being selected. XPath 1.0 has 13 axes; the eight used most often are:
| Axis | Selects | Example |
|---|---|---|
| child:: | Child elements (default) | child::book |
| parent:: | Parent element | parent::bookstore |
| ancestor:: | All ancestors (parent, grandparent...) | ancestor::bookstore |
| descendant:: | All descendants | descendant::title |
| following-sibling:: | All siblings after current node | following-sibling::book |
| preceding-sibling:: | All siblings before current node | preceding-sibling::book |
| attribute:: | Attributes of current node | attribute::category (same as @category) |
| self:: | Current node itself | self::book |
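Axes are easiest to understand when evaluated from a specific context node. The sketch below, which assumes lxml is installed and uses an inline copy of the sample document (BOOKSTORE_XML is a name chosen for this example), takes the second book as the context node and walks several axes from it.

```python
from lxml import etree

# Inline copy of the sample document (BOOKSTORE_XML is a name chosen for this sketch)
BOOKSTORE_XML = """<bookstore>
  <book id="1" category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book id="2" category="children"><title lang="en">Harry Potter</title><author>J. K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book id="3" category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE_XML)

# Context node: the book with id="2"
context_book = root.xpath("//book[@id='2']")[0]

child_tags = [el.tag for el in context_book.xpath('child::*')]
parent_name = context_book.xpath('name(parent::*)')
next_titles = context_book.xpath('following-sibling::book/title/text()')
previous_titles = context_book.xpath('preceding-sibling::book/title/text()')
category = context_book.xpath('attribute::category')   # same as @category
is_book = context_book.xpath('self::book')             # non-empty because the context node is a book

# Ancestors of the title element inside the context book, collected as a set of names
title = context_book.xpath('child::title')[0]
ancestor_names = {el.tag for el in title.xpath('ancestor::*')}

descendant_titles = root.xpath('descendant::title')

print(child_tags, parent_name)
print(next_titles, previous_titles, category)
print(ancestor_names, len(descendant_titles))
```

Calling xpath() on an element rather than on the whole tree is what sets the context node for relative expressions in lxml.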
XPath Built-in Functions
XPath 1.0 includes a rich set of built-in functions across four categories:
Node Functions
count(//book) — Number of book elements (returns 3)
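name(//*[1]) — Name of the first matching element in document order (returns bookstore)
last() — Size of the current context, i.e. the position of the last node in the set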
position() — Position of current node
String Functions
contains(title, "XML") — True if title contains "XML"
starts-with(title, "Harry") — True if title starts with "Harry"
string-length(title) — Length of title text
normalize-space(title) — Trims and collapses whitespace
concat("Hello", " ", "World") — Concatenates strings
substring(title, 1, 5) — Extracts 5 chars starting at position 1
Number Functions
sum(//price) — Sum of all prices (returns 99.94)
floor(3.9) — Returns 3 (round down)
ceiling(3.1) — Returns 4 (round up)
round(3.5) — Returns 4 (rounds to the nearest integer; .5 rounds up)
Boolean Functions
not(@category) — True when element has no category attribute
boolean(//book) — True when there is at least one book
true() — Always returns true
false() — Always returns false
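The functions listed above can be evaluated directly, since an XPath expression may return a number, string or boolean instead of a node set. This sketch assumes lxml is installed and uses an inline copy of the sample document (BOOKSTORE_XML is a name chosen for this example). lxml maps XPath numbers to Python floats, so count(//book) comes back as 3.0.

```python
from lxml import etree

# Inline copy of the sample document (BOOKSTORE_XML is a name chosen for this sketch)
BOOKSTORE_XML = """<bookstore>
  <book id="1" category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book id="2" category="children"><title lang="en">Harry Potter</title><author>J. K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book id="3" category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE_XML)

# Node functions
book_count = root.xpath('count(//book)')
first_name = root.xpath('name(//*[1])')

# String functions
xml_titles = root.xpath('//book[contains(title, "XML")]/title/text()')
has_harry = root.xpath('boolean(//book[starts-with(title, "Harry")])')
title_length = root.xpath('string-length(//book[1]/title)')
prefix = root.xpath('substring(//book[1]/title, 1, 5)')
cleaned = root.xpath('normalize-space("  Learning   XML ")')
greeting = root.xpath('concat("Hello", " ", "World")')

# Number functions
total = root.xpath('sum(//price)')            # floating-point sum, approximately 99.94
rounded = [root.xpath(e) for e in ('floor(3.9)', 'ceiling(3.1)', 'round(3.5)')]

# Boolean functions
without_category = root.xpath('count(//book[not(@category)])')
any_books = root.xpath('boolean(//book)')

print(book_count, first_name, xml_titles, has_harry)
print(title_length, prefix, cleaned, greeting)
print(round(total, 2), rounded, without_category, any_books)
```

Because sum() works on floating-point numbers, it is safer to round or compare with a tolerance than to test the result for exact equality.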
Combining Predicates
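You can combine several conditions inside one predicate with the logical operators and and or and the not() function. Predicates can also be chained, as in //book[@category='web'][price>30], in which case each predicate filters the result of the previous one: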
// Books with price over 30 AND in the web category
//book[price>30 and @category='web']
// Books published in 2003 OR 2005
//book[year=2003 or year=2005]
// Books that do NOT have a category attribute
//book[not(@category)]
// Books whose title contains "XML" and price is under 40
//book[contains(title,'XML') and price<40]
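The combined predicates above can be checked against the sample document with a sketch like the following. It assumes lxml is installed. The helper titles_for() and the name BOOKSTORE_XML are hypothetical and exist only in this example.

```python
from lxml import etree

# Inline copy of the sample document (BOOKSTORE_XML is a name chosen for this sketch)
BOOKSTORE_XML = """<bookstore>
  <book id="1" category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book id="2" category="children"><title lang="en">Harry Potter</title><author>J. K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book id="3" category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE_XML)


def titles_for(expression):
    """Return the title text of every book element matched by the expression."""
    return [book.findtext('title') for book in root.xpath(expression)]


pricey_web = titles_for("//book[price>30 and @category='web']")
year_2003_or_2005 = titles_for('//book[year=2003 or year=2005]')
uncategorised = titles_for('//book[not(@category)]')   # empty: every sample book has a category
cheap_xml = titles_for("//book[contains(title,'XML') and price<40]")

# Chained predicates: each one filters the result of the previous one
chained = titles_for("//book[@category='web'][price>30]")

print(pricey_web, year_2003_or_2005)
print(uncategorised, cheap_xml, chained)
```

For conditions that do not depend on position, chaining two predicates selects the same nodes as joining them with and.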
Practical XPath Use Cases
Web Scraping with Python (lxml):
from lxml import etree

# Parse the sample document from disk
tree = etree.parse('bookstore.xml')

# Get all book titles
titles = tree.xpath('//book/title/text()')

# Get books over 30 with their prices
expensive = tree.xpath('//book[price>30]')
for book in expensive:
    print(book.find('title').text, book.find('price').text)
Test Automation with Selenium:
// Find button by text
//button[text()='Submit']
// Find input by placeholder
//input[@placeholder='Search...']
// Find element containing specific text
//*[contains(text(),'Add to Cart')]
// Find nth table row
//table/tbody/tr[3]
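Locator expressions like these can be checked without starting a browser. The sketch below assumes lxml is installed and evaluates the four expressions against a small, made-up HTML page (the PAGE markup is purely illustrative and not taken from any real site).

```python
from lxml import html

# Hypothetical page used only to exercise the locator expressions
PAGE = """<html><body>
<input placeholder="Search...">
<button>Submit</button>
<div><span>Add to Cart</span></div>
<table><tbody>
<tr><td>row 1</td></tr><tr><td>row 2</td></tr><tr><td>row 3</td></tr>
</tbody></table>
</body></html>"""

doc = html.fromstring(PAGE)

buttons = doc.xpath("//button[text()='Submit']")
search_inputs = doc.xpath("//input[@placeholder='Search...']")
cart_elements = doc.xpath("//*[contains(text(),'Add to Cart')]")
third_rows = doc.xpath("//table/tbody/tr[3]")

print(len(buttons), len(search_inputs))
print([el.tag for el in cart_elements])
print(third_rows[0].text_content())
```

In Selenium's Python bindings the same strings would typically be passed as driver.find_element(By.XPATH, expression), where driver is an already-started WebDriver session (not shown here).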
Summary
XPath is a powerful, concise language for navigating XML documents. Master the basic path syntax, learn the 13 axes, and practice with predicates and functions. Use our free XML Tools Suite to run XPath queries against your own XML documents directly in your browser — with live results and no setup required.