1. Overview
XPath is a standard syntax recommended by the W3C. It provides a set of expressions for navigating XML documents.
In this tutorial, we’ll go over the basics of XPath using the support built into the standard JDK.
We’ll take a simple XML document, process it, and learn how to extract the necessary information from it.
2. A Simple XPath Parser
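First, we’ll need the following imports and a small class that holds the XML file we want to query:
import java.io.File;
import java.io.FileInputStream;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DefaultParser {
    private File file;

    public DefaultParser(File file) {
        this.file = file;
    }

    // Used by the parsing snippets in the following sections
    public File getFile() {
        return file;
    }
}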
Now, let’s take a closer look at the elements we’ll find in the DefaultParser:
FileInputStream fileIS = new FileInputStream(this.getFile());
DocumentBuilderFactory builderFactory = newSecureDocumentBuilderFactory();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Tutorials/Tutorial";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
Let’s break down the above code:
To produce a DOM object tree from our XML document, we first create a builderFactory instance with newSecureDocumentBuilderFactory(), a helper method that creates a DocumentBuilderFactory and configures it for secure XML parsing.
The helper disables potentially dangerous features related to external entities and DTDs. This is a best practice when processing XML from untrusted sources, because it prevents attacks such as XML External Entity (XXE) injection.
DocumentBuilderFactory builderFactory = newSecureDocumentBuilderFactory();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
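The body of newSecureDocumentBuilderFactory() isn’t shown here. The following is a minimal sketch of what such a helper might look like, assuming the JDK’s built-in parser. The class name SecureFactorySketch and the exact set of features are assumptions for illustration, not necessarily what the original helper does:

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class SecureFactorySketch {

    // Hypothetical helper: one possible way to harden a DocumentBuilderFactory.
    static DocumentBuilderFactory newSecureDocumentBuilderFactory()
            throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Turn on the JAXP secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any document that contains a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Don't process XInclude directives or expand entity references.
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }
}
```

With this configuration, a document that declares a DOCTYPE should be rejected with a parsing exception instead of being processed.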
Having an instance of the DocumentBuilder class, we can parse XML documents from several different input sources, such as an InputStream, a File, a URI string, or a SAX InputSource:
Document xmlDocument = builder.parse(fileIS);
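To illustrate the different input sources, here’s a small sketch that parses the same XML from an InputStream and from a SAX InputSource. The class name ParseSourcesSketch and the inline XML string are assumptions made so the example runs without a file on disk:

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseSourcesSketch {

    // Inline stand-in for the XML file used in the tutorial.
    static final String XML = "<Tutorials><Tutorial tutId=\"01\"/></Tutorials>";

    // parse(InputStream): the same overload we use with FileInputStream.
    static Document fromStream(DocumentBuilder builder) throws Exception {
        return builder.parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    // parse(InputSource): handy when the XML is already in memory as text.
    static Document fromInputSource(DocumentBuilder builder) throws Exception {
        return builder.parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        System.out.println(fromStream(builder).getDocumentElement().getNodeName());
        System.out.println(fromInputSource(builder).getDocumentElement().getNodeName());
    }
}
```

DocumentBuilder also offers parse(File) and parse(String uri), which work the same way for files and URIs.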
A Document (org.w3c.dom.Document) represents the entire XML document. It’s the root of the document tree and provides our primary access to the data. Next, we create the XPath object:
XPath xPath = XPathFactory.newInstance().newXPath();
With the XPath object, we can compile expressions and evaluate them against our document to extract what we need from it:
xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
We can compile an XPath expression passed as a string and define what kind of data we expect to receive by passing one of the XPathConstants values: NODESET, NODE, STRING, NUMBER, or BOOLEAN.
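As a quick illustration of how the requested return type changes the result, here’s a sketch that evaluates several expressions against a small in-memory document. The class name ReturnTypesSketch, the eval() helper, and the inline XML are assumptions for this example:

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReturnTypesSketch {

    static final String XML =
        "<Tutorials>"
      + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
      + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
      + "</Tutorials>";

    // Parses the inline XML and evaluates the expression with the given return type.
    static Object eval(String expression, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.compile(expression).evaluate(doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        NodeList all = (NodeList) eval("/Tutorials/Tutorial", XPathConstants.NODESET);
        Node first = (Node) eval("/Tutorials/Tutorial[1]", XPathConstants.NODE);
        String title = (String) eval("/Tutorials/Tutorial[1]/title", XPathConstants.STRING);
        Double count = (Double) eval("count(/Tutorials/Tutorial)", XPathConstants.NUMBER);
        Boolean hasXml = (Boolean) eval("boolean(//title[text()='XML'])", XPathConstants.BOOLEAN);

        System.out.println(all.getLength());     // prints 2
        System.out.println(first.getNodeName()); // prints Tutorial
        System.out.println(title);               // prints Guava
        System.out.println(count);               // prints 2.0
        System.out.println(hasXml);              // prints true
    }
}
```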
3. Let’s Start
Now that we’ve reviewed the base components we’ll be using, let’s dive into some code with a simple XML example for testing purposes:
<?xml version="1.0"?>
<Tutorials>
    <Tutorial tutId="01" type="java">
        <title>Guava</title>
        <description>Introduction to Guava</description>
        <date>04/04/2016</date>
        <author>GuavaAuthor</author>
    </Tutorial>
    <Tutorial tutId="02" type="java">
        <title>XML</title>
        <description>Introduction to XPath</description>
        <date>04/05/2016</date>
        <author>XMLAuthor</author>
    </Tutorial>
</Tutorials>
3.1. Retrieve a Basic List of Elements
The first method is a simple use of an XPath expression to retrieve a list of nodes from the XML:
FileInputStream fileIS = new FileInputStream(this.getFile());
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Tutorials/Tutorial";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
We can retrieve the list of Tutorial elements contained in the root node by using the expression above. Alternatively, the expression "//Tutorial" retrieves all Tutorial elements in the document, no matter at which level of the tree they’re located.
Passing NODESET to the evaluate() call makes it return a NodeList. This is an ordered collection of nodes that we can access by passing an index as a parameter.
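To show how the returned NodeList can be consumed, here’s a minimal sketch that walks it by index and collects each tutorial’s title. The class name NodeListSketch, its helper methods, and the inline XML are assumptions for this example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {

    static final String XML =
        "<Tutorials>"
      + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
      + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
      + "</Tutorials>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Collects the <title> text of every /Tutorials/Tutorial element, in document order.
    static List<String> titles(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.compile("/Tutorials/Tutorial")
            .evaluate(doc, XPathConstants.NODESET);

        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodeList.getLength(); i++) {
            // The expression selects elements only, so this cast is safe here.
            Element tutorial = (Element) nodeList.item(i);
            titles.add(tutorial.getElementsByTagName("title").item(0).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(parse(XML))); // prints [Guava, XML]
    }
}
```

Note that NodeList isn’t an Iterable, so an index-based loop with getLength() and item(i) is the usual way to traverse it.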
3.2. Retrieving a Specific Node by Its ID
We can look for an element based on any given ID just by filtering:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(this.getFile());
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Tutorials/Tutorial[@tutId=" + "'" + id + "'" + "]";
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
Using this kind of expression, we can filter for any element we need, as long as we use the correct syntax. The bracketed part of the expression is called a predicate, and predicates are an easy way to locate specific data in a document, for example:
/Tutorials/Tutorial[1]
/Tutorials/Tutorial[last()]
/Tutorials/Tutorial[position()<4]
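To see what a few predicates select, here’s a small sketch that evaluates them against a two-element document. The class name PredicateSketch, its helper methods, and the inline XML are assumptions for this example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateSketch {

    static final String XML =
        "<Tutorials>"
      + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
      + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
      + "</Tutorials>";

    static Document load() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    // Returns the tutId attribute of the first element matched by the expression.
    static String tutIdOf(String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        Element match = (Element) xPath.compile(expression)
            .evaluate(load(), XPathConstants.NODE);
        return match.getAttribute("tutId");
    }

    // Returns how many nodes the expression matches.
    static int countOf(String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xPath.compile(expression)
            .evaluate(load(), XPathConstants.NODESET);
        return matches.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tutIdOf("/Tutorials/Tutorial[1]"));            // prints 01
        System.out.println(tutIdOf("/Tutorials/Tutorial[last()]"));       // prints 02
        System.out.println(tutIdOf("/Tutorials/Tutorial[@tutId='02']"));  // prints 02
        System.out.println(countOf("/Tutorials/Tutorial[position()<4]")); // prints 2
    }
}
```

Keep in mind that positions in XPath start at 1, not 0.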
3.3. Retrieving Nodes by a Specific Tag Name
Now we’ll go a step further and introduce axes. Let’s see how this works by using one in an XPath expression:
Document xmlDocument = builder.parse(this.getFile());
this.clean(xmlDocument);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//Tutorial[descendant::title[text()=" + "'" + name + "'" + "]]";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
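With the expression used above, we’re looking for every Tutorial element that has a descendant title element whose text matches the value passed in the name variable.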