PHP DOM: Using XPath


In a recent article I discussed PHP’s implementation of the DOM and introduced various functions to pull data from and manipulate an XML structure. I also briefly mentioned XPath, but didn’t have much space to discuss it. In this article, we’ll look closer at XPath, how it functions, and how it is implemented in PHP. You’ll find that XPath can greatly reduce the amount of code you have to write to query and filter XML data, and will often yield better performance as well. I’ll use the same DTD and XML from the previous article to demonstrate the PHP DOM XPath functionality. To quickly refresh your memory, here’s what the DTD and XML look like:

<!ELEMENT library (book*)> 
<!ELEMENT book (title, author, genre, chapter*)> 
  <!ATTLIST book isbn ID #REQUIRED> 
<!ELEMENT title (#PCDATA)> 
<!ELEMENT author (#PCDATA)> 
<!ELEMENT genre (#PCDATA)> 
<!ELEMENT chapter (chaptitle,text)> 
  <!ATTLIST chapter position NMTOKEN #REQUIRED> 
<!ELEMENT chaptitle (#PCDATA)> 
<!ELEMENT text (#PCDATA)>
<?xml version="1.0" encoding="utf-8"?> 
<!DOCTYPE library SYSTEM "library.dtd"> 
<library> 
  <book isbn="isbn1234"> 
    <title>A Book</title> 
    <author>An Author</author> 
    <genre>Horror</genre> 
    <chapter position="first"> 
      <chaptitle>chapter one</chaptitle> 
      <text><![CDATA[Lorem Ipsum...]]></text> 
    </chapter> 
  </book> 
  <book isbn="isbn1235"> 
    <title>Another Book</title> 
    <author>Another Author</author> 
    <genre>Science Fiction</genre> 
    <chapter position="first"> 
      <chaptitle>chapter one</chaptitle> 
      <text><![CDATA[<i>Sit Dolor Amet...</i>]]></text> 
    </chapter> 
  </book> 
</library>

Basic XPath Queries

XPath is a syntax for querying an XML document. In its simplest form, you define a path to the element you want. Using the XML document above, the following XPath query will return a collection of all the book elements present:
//library/book
That’s it. The leading double slash tells XPath to look for library elements anywhere in the document (library happens to be the root element here, so /library/book would select the same nodes), and the single slash indicates that book is a child of library. It’s pretty straightforward, no? But what if you want to specify a particular book? Let’s say you want to return any books written by “An Author”. The XPath for that would be:
//library/book/author[text() = "An Author"]/..
You can use text() here in square brackets to perform a comparison against the value of a node, and the trailing “/..” indicates we want the parent element (i.e. move back up the tree one node). XPath queries can be executed using one of two DOMXPath methods: query() and evaluate(). Both perform the query, but the difference lies in the type of result they return. query() will always return a DOMNodeList, whereas evaluate() will return a typed result if possible. For example, if your XPath query is to return the number of books written by a certain author rather than the actual books themselves, then query() will return an empty DOMNodeList. evaluate() will simply return the number so you can use it immediately instead of having to pull the data from a node.
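The difference between the two methods can be seen in a small, self-contained sketch. It assumes a trimmed copy of the library XML held in a string (the DTD is left out for brevity); the variable names are only illustrative.

```php
<?php
// Trimmed copy of the library XML from above (DTD omitted for brevity).
$xml = <<<XML
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList holding the matching <book> elements.
$books = $xpath->query('//library/book/author[text() = "An Author"]/..');
$firstIsbn = $books->item(0)->getAttribute('isbn');

// evaluate() returns the typed result of count(), which PHP exposes as a float.
$total = $xpath->evaluate('count(//library/book/author[text() = "An Author"]/..)');
```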

Code and Speed Benefits with XPath

Let’s do a quick demonstration that returns the number of books written by an author. The first method we’ll look at will work, but doesn’t make use of XPath. This is to show you how it can be done without XPath and why XPath is so powerful.
<?php
public function getNumberOfBooksByAuthor($author) { 
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}
The next method achieves the same result, but uses XPath to select just those books that are written by a specific author:
<?php
public function getNumberOfBooksByAuthor($author)  { 
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query); 
    return $result->length;
}
Notice how this time we have removed the need for PHP to test against the value of the author. But we can go one step further still and use the XPath function count() to count the occurrences of this path.
<?php
public function getNumberOfBooksByAuthor($author)  { 
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}
We’re able to retrieve the information we needed with only one line of XPath, and there is no need to perform laborious filtering with PHP. Indeed, this is a much simpler and more succinct way to write this functionality! Notice that evaluate() was used in the last example. This is because the function count() returns a typed result. Using query() will return a DOMNodeList, but you will find that it is an empty list. Not only does XPath make your code cleaner, but it also comes with speed benefits. I found that version 1 was 30% faster on average than version 2 but version 3 was about 10% faster than version 2 (about 15% faster than version 1). While these measurements will vary depending on your server and query, using XPath in its purest form will generally yield a considerable speed benefit as well as making your code easier to read and maintain.
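If you want to take your own measurements, a rough harness along the following lines may help. It is only a sketch: the three closures mirror the three versions above, the two-book document and the 1000-iteration loop are arbitrary assumptions, and timings from such a small document say little about real workloads.

```php
<?php
$xml = '<library>'
     . '<book><title>A Book</title><author>An Author</author></book>'
     . '<book><title>Another Book</title><author>Another Author</author></book>'
     . '</library>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$author = 'An Author';

// The three counting strategies from the text, wrapped as closures.
$versions = [
    'dom loop' => function () use ($dom, $author) {
        $total = 0;
        foreach ($dom->getElementsByTagName('author') as $element) {
            if ($element->nodeValue == $author) {
                $total++;
            }
        }
        return $total;
    },
    'xpath query' => function () use ($xpath, $author) {
        return $xpath->query("//library/book/author[text() = '$author']/..")->length;
    },
    'xpath count' => function () use ($xpath, $author) {
        return (int) $xpath->evaluate("count(//library/book/author[text() = '$author']/..)");
    },
];

$counts = [];
$timings = [];
foreach ($versions as $name => $fn) {
    $start = microtime(true);
    for ($i = 0; $i < 1000; $i++) {
        $counts[$name] = $fn(); // keep the last result so the versions can be compared
    }
    $timings[$name] = microtime(true) - $start; // elapsed seconds for 1000 runs
}

print_r($counts);
print_r($timings);
```

All three entries in $counts should agree; only the timings are expected to differ from machine to machine.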

XPath Functions

There are quite a few functions that can be used with XPath and there are many excellent resources which detail what functions are available. If you find that you are iterating over DOMNodeLists or comparing nodeValues, you will probably find an XPath function that can eliminate a lot of the PHP coding. You’ve already seen how count() works. Let’s use the id() function to return the titles of the books with the given ISBNs. The XPath expression you will need to use is:
id("isbn1234 isbn1235")/title
Notice here that the values you are searching for are enclosed within quotes and delimited with a space; there is no need for a comma to delimit the terms.
<?php
public function findBooksByISBNs(array $isbns) { 
    $ids = join(" ", $isbns);
    $query = "id('$ids')/title"; 

    $xpath = new DOMXPath($this->domDocument); 
    $result = $xpath->query($query); 

    $books = array();
    foreach ($result as $node) {
        $book = array("title" => $node->nodeValue);
        $books[] = $book;
    }
    return $books; 
}
Executing complex functions in XPath is relatively simple; the trick is to become familiar with the functions that are available.
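One practical detail: id() can only resolve values once the parser knows that isbn is an ID attribute, which normally means the DTD has to be applied when the document is loaded. The sketch below assumes an inline, cut-down DTD (the article uses an external library.dtd) and turns on validateOnParse before loading.

```php
<?php
// Inline DTD: an assumption for this sketch, trimmed to the elements used here.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
]>
<library>
  <book isbn="isbn1234"><title>A Book</title></book>
  <book isbn="isbn1235"><title>Another Book</title></book>
</library>
XML;

$dom = new DOMDocument();
$dom->validateOnParse = true; // validate against the DTD so isbn is registered as an ID
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);

// Collect the title text of every book whose ID matches one of the listed values.
$titles = [];
foreach ($xpath->query('id("isbn1234 isbn1235")/title') as $node) {
    $titles[] = $node->nodeValue;
}
```

If no DTD is available, marking the attribute with DOMElement::setIdAttribute() on each book element is another way to make the IDs known.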

Using PHP Functions With XPath

Sometimes you may find that you need some greater functionality that the standard XPath functions cannot deliver. Luckily, PHP DOM also allows you to incorporate PHP’s own functions into an XPath query. Let’s consider returning the number of words in the title of a book. In its simplest form, we could write the method as follows:
<?php
public function getNumberOfWords($isbn) {
    $query = "//library/book[@isbn = '$isbn']"; 

    $xpath = new DOMXPath($this->domDocument); 
    $result = $xpath->query($query); 

    $title = $result->item(0)->getElementsByTagName("title")
        ->item(0)->nodeValue; 

    return str_word_count($title); 
}
But we can also incorporate the function str_word_count() directly into the XPath query. There are a few steps that need to be completed to do this. First of all, we have to register a namespace with the XPath object. PHP functions in XPath queries are called through “php:functionString”, with the name of the PHP function you want to use passed as the first argument inside the parentheses. The namespace to be registered for the php prefix is http://php.net/xpath. The namespace must be set to this; any other value will result in errors. We then need to call registerPHPFunctions(), which tells PHP that whenever it comes across a function namespaced with “php:”, it is PHP that should handle it. The actual syntax for calling the function is:
php:functionString("nameoffunction", arg, arg...)
Putting this all together results in the following reimplementation of getNumberOfWords():
<?php
public function getNumberOfWords($isbn) {
    $xpath = new DOMXPath($this->domDocument);

    //register the php namespace
    $xpath->registerNamespace("php", "http://php.net/xpath"); 

    //ensure php functions can be called within xpath
    $xpath->registerPHPFunctions();

    $query = "php:functionString('str_word_count',(//library/book[@isbn = '$isbn']/title))"; 

    return $xpath->evaluate($query); 
}
Notice that you don’t need to call the XPath function text() to provide the text of the node. php:functionString does this automatically by passing the string value of the node to the PHP function. However, the following is just as valid:
php:functionString('str_word_count',(//library/book[@isbn = '$isbn']/title[text()]))
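registerPHPFunctions() also accepts a function name (or an array of names) to restrict which PHP functions an XPath expression may call. The sketch below assumes a one-book document and allows only str_word_count(); the result is cast to an integer because the value handed back by evaluate() is not guaranteed to be an int.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<library><book isbn="isbn1234"><title>A Book</title></book></library>');

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');

// Only str_word_count() may be called from XPath expressions on this object.
$xpath->registerPHPFunctions('str_word_count');

$isbn = 'isbn1234';
$result = $xpath->evaluate(
    "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title)"
);

// Normalise the result to an integer before using it.
$words = (int) $result;
```

Restricting the list is a sensible default whenever any part of the XPath expression is built from user input.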
Registering PHP functions is not restricted to the functions that come with PHP. You can define your own functions and call those from within the XPath. The difference in the next example is that the function is called with “php:function” rather than “php:functionString”, so it receives the matched nodes themselves rather than their string values. Also, it is only possible to provide either functions on their own or static methods. Calling instance methods is not supported. Let’s use a regular function that is outside the scope of the class to demonstrate the basic functionality. The function we will use will return only books by “George Orwell”. It must return true for every node you wish to include in the query.
<?php
function compare($node) {
    return $node[0]->nodeValue == "George Orwell";
}
The argument passed to the function is an array of DOMElements. It is up to the function to iterate through the array and determine whether the node being tested should be returned in the DOMNodeList. In this example, the node being tested is /book and we are using /author to make the determination. Now we can create the method getGeorgeOrwellBooks():
<?php
public function getGeorgeOrwellBooks() { 
    $xpath = new DOMXPath($this->domDocument); 
    $xpath->registerNamespace("php", "http://php.net/xpath"); 
    $xpath->registerPHPFunctions(); 

    $query = "//library/book[php:function('compare', author)]"; 
    $result = $xpath->query($query); 

    $books = array(); 
    foreach($result as $node) { 
        $books[] = $node->getElementsByTagName("title")
            ->item(0)->nodeValue; 
    } 

    return $books;
}
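If compare() were a static method, then you would need to amend the XPath query so that it names the class as well (ClassName here stands for whichever class defines the method):
//library/book[php:function('ClassName::compare', author)]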
In truth, all of this functionality can be easily coded up with just XPath, but the example shows how you can extend XPath queries to become more complex. Calling an object method is not possible within XPath. If you find you need to access some object properties or methods to complete the XPath query, the best solution would be to do what you can with XPath and then work on the resulting DOMNodeList with any object methods or properties as necessary.
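As a rough illustration of the static-method form, the sketch below uses a hypothetical BookFilters class and a small sample document containing a George Orwell title; neither the class name nor the sample data comes from the library XML above.

```php
<?php
// Hypothetical helper class for this sketch; the name BookFilters is an assumption.
class BookFilters
{
    // Receives an array of the nodes matched by the second argument of php:function.
    public static function isByOrwell(array $nodes): bool
    {
        return isset($nodes[0]) && $nodes[0]->nodeValue === 'George Orwell';
    }
}

// Sample data for illustration only.
$dom = new DOMDocument();
$dom->loadXML(
    '<library>'
    . '<book><title>1984</title><author>George Orwell</author></book>'
    . '<book><title>A Book</title><author>An Author</author></book>'
    . '</library>'
);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions();

// The static method is referenced as 'ClassName::method' inside php:function.
$query = "//library/book[php:function('BookFilters::isByOrwell', author)]";

$titles = [];
foreach ($xpath->query($query) as $book) {
    $titles[] = $book->getElementsByTagName('title')->item(0)->nodeValue;
}
```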

Summary

XPath is a great way of cutting down the amount of code you have to write and of speeding up the execution of that code when working with XML data. Although not part of the official DOM specification, the additional functionality that the PHP DOM provides allows you to extend the normal XPath functions with custom functionality. This is a very powerful feature, although as your familiarity with XPath functions increases you may find that you come to rely on it less and less.

Frequently Asked Questions (FAQs) about PHP DOM using XPath

What is XPath and how is it used in PHP DOM?

XPath, or XML Path Language, is a query language that is used for selecting nodes from an XML document. In PHP DOM, XPath is used to navigate through elements and attributes in an XML document. It allows you to locate and select specific parts of an XML document through a variety of methods such as selecting nodes by name, by the value of their attributes, or by their position in the document. This makes it a powerful tool for parsing and manipulating XML data in PHP.

How do I create an instance of DOMXPath?

To create an instance of DOMXPath, you first need to create an instance of the DOMDocument class. Once you have a DOMDocument object, you can create a new DOMXPath object by passing the DOMDocument object to the DOMXPath constructor. Here’s an example:

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

How do I select nodes using XPath?

You can select nodes using the query() method of the DOMXPath object. The query() method takes an XPath expression as a parameter and returns a DOMNodeList object containing all the nodes that match the expression. For example:

$nodes = $xpath->query('//book/title');

This will select all the <title> elements that are children of <book> elements.

What is the difference between query() and evaluate() methods in DOMXPath?

Both query() and evaluate() methods are used to evaluate XPath expressions. The difference lies in the type of result they return. The query() method returns a DOMNodeList of all nodes matching the XPath expression. On the other hand, evaluate() returns a typed result such as a boolean, number, or string, depending on the XPath expression. If the expression results in a node set, evaluate() will return a DOMNodeList.

How do I handle namespaces in XPath queries?

To handle namespaces in XPath queries, you need to register the namespace with the DOMXPath object using the registerNamespace() method. This method takes two parameters: the prefix and the namespace URI. Once the namespace is registered, you can use the prefix in your XPath queries. For example:

$xpath->registerNamespace('x', 'http://www.example.com');
$nodes = $xpath->query('//x:book/x:title');
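
The most common stumbling block is a document with a default namespace: unprefixed names in an XPath expression do not match its elements. A minimal sketch, assuming a small document that declares http://www.example.com as its default namespace:

```php
<?php
$xml = '<library xmlns="http://www.example.com">'
     . '<book><title>A Book</title></book>'
     . '</library>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Without a prefix, the name tests only match elements that are in no namespace.
$unprefixed = $xpath->query('//book/title')->length;

// The prefix "x" is arbitrary; only the URI has to match the one in the document.
$xpath->registerNamespace('x', 'http://www.example.com');
$prefixed = $xpath->query('//x:book/x:title')->length;
```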

How can I select attributes using XPath?

You can select attributes in XPath by using the @ symbol followed by the attribute name. For example, to select all href attributes of <a> elements, you can use the following XPath expression: //a/@href.
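
Applied to the library document, the same idea can be sketched as follows. The query returns DOMAttr nodes, whose value property holds the attribute text; the two-book document is a trimmed copy used only for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<library>'
    . '<book isbn="isbn1234"><title>A Book</title></book>'
    . '<book isbn="isbn1235"><title>Another Book</title></book>'
    . '</library>'
);
$xpath = new DOMXPath($dom);

// Each result is a DOMAttr node rather than an element.
$isbns = [];
foreach ($xpath->query('//book/@isbn') as $attribute) {
    $isbns[] = $attribute->value;
}
```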

How do I use XPath functions in PHP DOM?

XPath provides a number of functions that you can use in your XPath expressions. These functions can be used to manipulate strings, numbers, node sets, and more. To use an XPath function in PHP DOM, simply include the function in your XPath expression. For example, to select all book elements that have a price element with a value greater than 30, you can use the number() function like this: //book[number(price) > 30].

Can I use XPath with HTML documents in PHP DOM?

Yes, you can use XPath with HTML documents in PHP DOM. However, because HTML is not always well-formed XML, you may encounter issues when trying to use XPath with HTML. To avoid these issues, you can use the loadHTML() method of the DOMDocument class to load the HTML document. This method will parse the HTML and correct any well-formedness errors, allowing you to use XPath with the resulting DOMDocument object.
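
A minimal sketch of this, assuming a deliberately sloppy HTML fragment held in a string. Parser warnings are collected internally so that they do not clutter the output.

```php
<?php
// The <p> elements are left unclosed on purpose.
$html = '<html><body><p>First<p>Second <a href="/one">one</a> <a href="/two">two</a></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Gather the href value of every link in the repaired document.
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attribute) {
    $hrefs[] = $attribute->value;
}
```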

How do I handle errors when using XPath in PHP DOM?

When using XPath in PHP DOM, errors can occur for a variety of reasons, such as a malformed XPath expression or a failure to load the XML document. To handle these errors, you can use the libxml_use_internal_errors() function to enable user error handling. This function will cause libxml errors to be stored internally, allowing you to handle them in your code. You can then use the libxml_get_errors() function to retrieve the errors and handle them as needed.
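
For the document-loading case, the pattern might look like the sketch below. The XML string is malformed on purpose, and the message format is just one possible way to report the collected errors.

```php
<?php
libxml_use_internal_errors(true); // store libxml errors instead of raising PHP warnings

$dom = new DOMDocument();
$loaded = $dom->loadXML('<library><book></library>'); // mismatched tags on purpose

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError object with level, code, message, line and column.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors(); // reset the buffer so later operations start clean

if ($loaded === false) {
    // Handle the failure here, for example by logging $messages.
}
```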

Can I modify XML documents using XPath in PHP DOM?

While XPath itself does not provide a way to modify XML documents, you can use XPath in conjunction with the DOM API to modify XML documents. You can use XPath to select the nodes you want to modify, and then use the methods provided by the DOM API to make the modifications. For example, you can use the removeChild() method of the DOMNode class to remove a node, or the setAttribute() method of the DOMElement class to change the value of an attribute.
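
A short sketch of that workflow, assuming a trimmed two-book document: one XPath query selects the book to remove, and another selects the book whose attribute is changed. The replacement value isbn9999 is made up for the example.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<library>'
    . '<book isbn="isbn1234"><title>A Book</title><genre>Horror</genre></book>'
    . '<book isbn="isbn1235"><title>Another Book</title><genre>Science Fiction</genre></book>'
    . '</library>'
);
$xpath = new DOMXPath($dom);

// Copy the matches into an array before removing nodes; a defensive habit
// that also works when the list being iterated is a live one.
foreach (iterator_to_array($xpath->query('//book[genre = "Horror"]')) as $book) {
    $book->parentNode->removeChild($book);
}

// Change an attribute on the remaining book.
foreach ($xpath->query('//book[@isbn = "isbn1235"]') as $book) {
    $book->setAttribute('isbn', 'isbn9999');
}

$remaining = $xpath->query('//book')->length;
$newIsbn = $xpath->evaluate('string(//book/@isbn)');
```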

Tim Smith

Tim Smith is a freelance web designer and developer based in Harrogate, North Yorkshire in the UK. He started out programming the Spectrum 48k and has never looked back. When he's not boring his other half with the details of his latest project, Tim can be found with his beloved Korgs making music, reading a good book or in the pub.
