Bending XML to Your Will

If you’ve ever worked with the Twitter or Facebook APIs, looked at RSS feeds from a website, or made use of some kind of RPC call, you’ve undoubtedly experienced working with XML. Extensible Markup Language (XML) is a big building block of today’s web, with hundreds of XML-based languages having been developed, including XHTML, Atom, and SOAP, just to name a few. I myself have to work with quite a few third-party systems to send and receive data, and the preferred format for all of them is XML.

Knowing how to process XML data is a crucial programming skill today, and thankfully, PHP offers multiple ways to read, filter, and even generate XML. In this article I’ll explain what exactly XML is, in case you haven’t had any experience with it yet, and then dive into a few ways you can use PHP to bend XML to your will.

What does XML do?

The short answer to the question “What does XML do?” is nothing. It does nothing at all. XML is simply a markup language, similar to HTML. Whereas HTML was designed to display data, however, XML was designed to provide a structured way to transport and store data.

Let’s take a look at a simple XML example that contains information on particular sports teams:

<?xml version="1.0" encoding="UTF-8" ?>
<roster>
 <team>
   <name>Bengals</name>
   <division>AFC North</division>
   <colors>Black and Orange</colors>
   <stadium location="Cincinnati">Paul Brown Stadium</stadium>
   <coach>Marvin Lewis</coach>
 </team>
 <team>
  <name>Titans</name>
  <division>AFC South</division>
  <colors>Blue and White</colors>
  <stadium location="Tennessee">LP Field</stadium>
  <coach>Mike Munchak</coach>
 </team>
</roster>

As you can see from the example, XML is human-readable and self-descriptive. Unlike HTML, XML has no predefined tags, allowing you to invent your own. Anyone, whether they are a programmer or not, can look at this example and understand the data. The software that you create has the job of writing or parsing the information in the XML document.

Sharing information between various platforms, databases, and programming languages can be a frustrating endeavor, but since XML is just a plain text file, it allows your data to be independent from the software in use. Because XML is such a wide-spread standard, it also gives you the freedom to develop your application without worrying about incompatibility on the other end.

If you’re still a bit shaky on XML and what its place in web development is, take a look at this great introduction to XML, A Really, Really, Really Good Introduction to XML.

Types of XML Parsers

There are two basic types of XML parsers: tree-based parsers and event-based parsers (sometimes called stream parsers). Tree-based parsers read the entire XML document into memory, structure the data into a tree-like format, and allow you to access the tree elements. Event-based parsers, on the other hand, read the XML sequentially and raise an event every time they reach a start or end tag. This allows you to apply a function pertinent to your application when an event occurs for a specific element. Because they don’t store the entire XML document in memory, event-based parsers are generally faster and less resource-intensive than tree-based ones. Tree-based parsers, in turn, are generally easier to use and require less code.

PHP 5 has a plethora of tools to choose from that work with XML, including the XML Parser (a.k.a. SAX or Expat Parser), DOM, SimpleXML, XMLReader, XMLWriter, and the XSL extensions. For the sake of brevity I’ll look at just two of the most widely used parsers, the XML Parser and SimpleXML extensions, which coincidentally gives us one of each type of parser.

Using the XML Parser Extension

The first example I’ll show you involves using the XML Parser extension, an event-based parser. To start, let’s use the same XML example from earlier and parse it with the extension. Imagine you have been given the task of parsing the XML into a simple list to display on a web page. Create the file nfl.xml with the example XML as its contents.

Create another file called xmlParserExample.php with the following code:

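<?php
$xmlFile = "nfl.xml";

// Create the parser and keep element names in their original case.
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

// Register the static NFLParser methods as callbacks. The class name must be
// a quoted string; a bare NFLParser would be read as an undefined constant.
xml_set_element_handler($parser, array("NFLParser", "openTag"),
    array("NFLParser", "closeTag"));
xml_set_character_data_handler($parser,
    array("NFLParser", "characterData"));

// Feed the file to the parser in 4 KB chunks.
$fp = fopen($xmlFile, "r") or die("Could not open $xmlFile");
while ($data = fread($fp, 4096)) {
    xml_parse($parser, $data, feof($fp))
        or die (sprintf("XML Error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
}
fclose($fp);
xml_parser_free($parser);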

class NFLParser {
    protected static $element;
    protected static $attrs;

    public static function openTag($parser, $elementName, $elementAttrs) {
        self::$element = $elementName;
        self::$attrs = $elementAttrs;

        switch($elementName) {
            case "team":
                echo "<ul>";
                break;
            case "division":
                echo "<li>Division: ";
                break;
            case "name":
                echo "<li>Team Name: ";
                break;
            case "colors":
                echo "<li>Team Colors: ";
                break;
            case "stadium":
                echo "<li>Stadium: ";
                break;
            case "coach":
                echo "<li>Head Coach: ";
        }
    }

    public static function closeTag($parser, $elementName) {
        self::$element = null;
        self::$attrs = null;
    
        if ($elementName == "team") {
            echo "</ul>";
        }
        elseif($elementName != "roster") {
            echo "</li>";
        }
    }

    public static function characterData($parser, $data) {
        echo $data;
        if (self::$element == "stadium") {
            echo " (" . self::$attrs["location"] . ")";
        }
    }
}

The xml_parser_create() function creates a new XML parser handle that is used throughout the code. The next function, xml_parser_set_option(), sets options for the parser. In this case, the XML_OPTION_CASE_FOLDING option is set to false because it is enabled by default. Case folding is the process of converting a sequence of characters to uppercase; with it enabled, the handlers would receive element names such as TEAM and STADIUM. By setting this option to false I preserve the tag names exactly as they appear in the XML file.
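To see the effect of case folding in isolation, here is a minimal sketch. The helper name collectElementNames() is made up for this illustration, and the XML is passed in as a string rather than read from nfl.xml.

```php
<?php
// Hypothetical helper: returns the element names exactly as the parser
// reports them to the start-element handler.
function collectElementNames(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) { $names[] = $name; },
        function ($parser, $name) { /* closing tags are ignored here */ }
    );
    xml_parse($parser, $xml, true); // true: the whole document is in this one call
    return $names;
}

$xml = '<roster><team><name>Bengals</name></team></roster>';

echo implode(", ", collectElementNames($xml, true)), "\n";  // ROSTER, TEAM, NAME
echo implode(", ", collectElementNames($xml, false)), "\n"; // roster, team, name
```

With folding left on, the switch statement in openTag() would need to compare against "TEAM", "DIVISION", and so on, which is why the example turns it off.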

The xml_set_element_handler() function sets the parser’s start and end element handlers. This function accepts three parameters: the first parameter is the parser reference, the second parameter is the callback function that will handle opening tags (the static openTag() method of the NFLParser class in the example), and the third parameter is the callback that will handle closing tags (the closeTag() method).

PHP passes three parameters to openTag(): the parser, the name of the element for which this handler is called, and an associative array of any attributes for the element. Two parameters are provided to closeTag(): the parser and the name of the element.

The xml_set_character_data_handler() function specifies the function that will handle character data for an element. The function accepts two parameters: the parser and the name of the callback function which, in this example, is the static characterData() method. The characterData() method is passed two parameters: the parser, and the character data from the element.
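One detail worth knowing is that the parser may deliver the text of a single element in more than one call to the character data handler, for example around an entity such as &amp;amp;. The sketch below is one way to cope with that, not the approach used in the article’s listing: it buffers the text and only uses it when the closing tag arrives. The TextCollector class and its method names are assumptions made for this illustration.

```php
<?php
// Hypothetical class: accumulates character data until the element closes,
// so text delivered in several pieces is reassembled before it is used.
class TextCollector
{
    private string $buffer = '';
    public array $values = [];

    public function open($parser, string $name, array $attrs): void
    {
        $this->buffer = ''; // a new element starts with an empty buffer
    }

    public function text($parser, string $data): void
    {
        $this->buffer .= $data; // may run more than once per element
    }

    public function close($parser, string $name): void
    {
        $text = trim($this->buffer);
        if ($text !== '') {
            $this->values[$name] = $text;
        }
        $this->buffer = '';
    }
}

$collector = new TextCollector();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($parser, [$collector, 'open'], [$collector, 'close']);
xml_set_character_data_handler($parser, [$collector, 'text']);
xml_parse($parser, '<team><colors>Black &amp; Orange</colors></team>', true);

print_r($collector->values); // colors => Black & Orange
```

Using an object instead of static methods also keeps the state per parser, which helps if you ever need to parse two documents in the same script.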

The remaining bit of code in the example reads in the XML file and calls the xml_parse() function which starts the parsing process. xml_parse() accepts three parameters: the parser, a chunk of data to parse, and a boolean parameter which indicates whether it is the last piece of data.
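The chunked loop can be tried without a file by splitting a string by hand. This is a sketch under a few assumptions: parseInChunks() is a hypothetical helper, the XML is inline, and errors are thrown as exceptions instead of calling die().

```php
<?php
// Hypothetical helper: feeds $xml to the parser $chunkSize bytes at a time
// and returns the element names seen, in document order.
function parseInChunks(string $xml, int $chunkSize): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) { $names[] = $name; },
        function ($parser, $name) { /* nothing to do on close */ }
    );

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // The third argument is true only for the final piece of data.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(sprintf(
                "XML Error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }
    return $names;
}

$xml = '<roster><team><name>Bengals</name><coach>Marvin Lewis</coach></team></roster>';
$small = parseInChunks($xml, 16);           // many small pieces
$whole = parseInChunks($xml, strlen($xml)); // one piece
var_dump($small === $whole);                // bool(true)

try {
    parseInChunks('<roster><team></roster>', 8); // mismatched closing tag
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

The parser keeps its state between calls, so the handlers fire in the same order no matter where the chunk boundaries fall.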

The last function called is xml_parser_free(); just like in file handling, it is always a good idea to free up the reference handle when you’re finished.

I chose to encapsulate the methods in the class NFLParser so I could track the current element and its attributes in $element and $attrs, making them available to the characterData() method without polluting the global namespace.

Execute your script and you should have a nice HTML list of all the data from the XML.

<ul>
 <li>Team Name: Bengals</li>
 <li>Division: AFC North</li>
 <li>Team Colors: Black and Orange</li>
 <li>Stadium: Paul Brown Stadium (Cincinnati)</li>
 <li>Head Coach: Marvin Lewis</li>
</ul>
<ul>
 <li>Team Name: Titans</li>
 <li>Division: AFC South</li>
 <li>Team Colors: Blue and White</li>
 <li>Stadium: LP Field (Tennessee)</li>
 <li>Head Coach: Mike Munchak</li>
</ul>

Well, interpreting XML with PHP using the event-driven parser wasn’t too bad, but what if there was an even easier way to slice up XML, a simpler way if you will?

Using SimpleXML

The SimpleXML extension was introduced in PHP 5 and takes a lot of the tedium of XML manipulation away. SimpleXML is a tree-based object-oriented parser, so it’s a slower and more resource-intensive way to parse XML, but any speed lost using this extension will be long forgotten once you see how “simple” it truly is to use.

Create a file called simpleXMLExample.php and enter the code below:

<?php
$xmlFile = "nfl.xml";

// simplexml_load_file() returns false if the file cannot be read or parsed.
$xml = simplexml_load_file($xmlFile);
if ($xml === false) {
    die("Could not load $xmlFile");
}

foreach ($xml->team as $element) {
    // Attributes of an element are exposed through attributes().
    $attr = $element->stadium->attributes();
    $location = $attr->location;

    echo "<ul>\n";
    echo " <li>Team Name: " . $element->name . "</li>\n";
    echo " <li>Division: " . $element->division . "</li>\n";
    echo " <li>Team Colors: " . $element->colors . "</li>\n";
    echo " <li>Stadium: " . $element->stadium . " (" . $location . ")</li>\n";
    echo " <li>Head Coach: " . $element->coach . "</li>\n";
    echo "</ul>\n";
}

Executing this script will produce the same output but without the need to write much of the parsing code.
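SimpleXML offers a couple of other shortcuts that are handy for this kind of data. The sketch below assumes the XML is supplied inline through simplexml_load_string() rather than loaded from nfl.xml, and it trims the document down to the elements it uses.

```php
<?php
$xml = simplexml_load_string(
    '<roster>'
    . '<team><name>Bengals</name>'
    . '<stadium location="Cincinnati">Paul Brown Stadium</stadium></team>'
    . '<team><name>Titans</name>'
    . '<stadium location="Tennessee">LP Field</stadium></team>'
    . '</roster>'
);

// Attributes can also be read with array syntax. Elements and attributes are
// SimpleXMLElement objects, so cast to string when you need plain text.
$stadium  = $xml->team[0]->stadium;
$location = (string) $stadium['location'];

// xpath() returns an array of matching elements.
$names = [];
foreach ($xml->xpath('/roster/team/name') as $node) {
    $names[] = (string) $node;
}

echo $stadium . " (" . $location . ")\n"; // Paul Brown Stadium (Cincinnati)
echo implode(", ", $names) . "\n";        // Bengals, Titans
```

The explicit string casts matter mostly when you compare values or store them in an array; echo and string concatenation perform the conversion for you.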

You might be wondering why you would use an extension like XML Parser if SimpleXML is so… well, simple? I liken this question to a construction worker who goes to his job with only a hammer in his belt. Sure, he’ll get by hammering nails for a while, but eventually he’ll be faced with a screw. Even though one tool might be easier to use, that doesn’t make it the ideal choice for every situation.

Summary

In this article you learned a little bit about XML and how it’s used around the web. More importantly, though, you learned about the two basic types of XML parsers, tree-based and event-based parsers. PHP offers several different XML parsing extensions, two of which are XML Parser and SimpleXML. Each offers trade-offs with performance, ease of use, and the amount of code the programmer needs to write. Hopefully seeing how both extensions are used will help you confidently choose the best approach the next time you need to consume XML.

Image via Ken Durden/Shutterstock

Stephen Thorpe

Stephen Thorpe is originally from London but now living in Tennessee. He works at an Internet and Telephone company as an applications developer primarily using PHP and MySQL.
