An Undocumented “Feature”

Suppose we write the following code, whose simple purpose is to go through an XML document and replace every “foo” element with an empty “bar” element:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // drop whitespace-only text nodes
$dom->loadXML('
  <root>
  <foo>This</foo>
  <foo />
  <foo />
  </root>'
);

$document = $dom->documentElement;
$foos = $document->getElementsByTagName('foo');

for ($i = 0; $i < $foos->length; $i++) {
  $bar = $dom->createElement('bar');
  $document->replaceChild($bar, $foos->item($i));
}

echo $dom->saveXML($document);

We are quite surprised when the script outputs:

<root><bar/><foo/><bar/></root>

Why did it skip the middle element? Because the DOMNodeList class has an undocumented “feature”: the list is live (the term the DOM specification uses), so whenever the document changes, the contents of the list are recomputed. When we replace the first “foo” node, it drops out of the list, the second “foo” node becomes item 0, and the length of the list falls from 3 to 2. But $i has already been incremented to 1, so the loop skips the old second node, operates on the old third node (now item 1), and then exits normally because $i, now 2, is no longer less than the length, now 1.
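To watch the list change underneath us, here is a minimal sketch. It uses a compact version of the same document, and the variable names $lengths and $firstIsEmpty are just for illustration:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');

$root = $dom->documentElement;
$foos = $root->getElementsByTagName('foo');

// Record the list length before any change: three <foo> elements.
$lengths = array($foos->length);

// Replace the first <foo>; it drops out of the live list right away.
$root->replaceChild($dom->createElement('bar'), $foos->item(0));
$lengths[] = $foos->length;

// item(0) is now the element that used to be second (an empty <foo/>).
$firstIsEmpty = ($foos->item(0)->textContent === '');

echo implode(' -> ', $lengths), "\n"; // prints "3 -> 2"
```

We never asked for a new list; the same $foos object simply reports different contents after the replacement.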

The solution to this problem is to save a reference to each node in an array, then loop over the array:

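// Copy the nodes out of the live list before changing the document.
$nodes = array();
for ($i = 0; $i < $foos->length; $i++) {
  $nodes[$i] = $foos->item($i);
}

for ($i = 0; $i < count($nodes); $i++) {
  $bar = $dom->createElement('bar');
  $document->replaceChild($bar, $nodes[$i]);
}

echo $dom->saveXML($document);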

This code outputs what we intuitively expected from the original code:

<root><bar/><bar/><bar/></root>
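The copy step can probably be shortened, since DOMNodeList is traversable and PHP's iterator_to_array() accepts any traversable object. The following is a sketch of that variant, not the code used above; it rebuilds the same document in compact form so it can run on its own:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');

$root = $dom->documentElement;

// iterator_to_array() copies the nodes out of the live list in one call,
// so later changes to the document do not affect what we loop over.
$nodes = iterator_to_array($root->getElementsByTagName('foo'));

foreach ($nodes as $foo) {
  $root->replaceChild($dom->createElement('bar'), $foo);
}

$xml = $dom->saveXML($root);
echo $xml, "\n";
```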

Implementation: A DOMNodeIterator Class

It’s best to encapsulate this technique in a class. Here’s a simple class that does the job:

class DOMNodeIterator implements Iterator
{
  // Snapshot of the nodes, taken once in the constructor.
  protected $nodes = array();

  public function __construct(DOMNodeList $nodeList)
  {
    for ($i = 0; $i < $nodeList->length; $i++) {
      $this->nodes[$i] = $nodeList->item($i);
    }
  }

  // Return types match the Iterator interface as declared in PHP 8.
  public function current(): mixed
  {
    return current($this->nodes);
  }

  public function key(): mixed
  {
    return key($this->nodes);
  }

  public function next(): void
  {
    next($this->nodes);
  }

  public function rewind(): void
  {
    reset($this->nodes);
  }

  public function valid(): bool
  {
    // key() returns null once the pointer has moved past the last node.
    return key($this->nodes) !== null;
  }
}
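If writing out all five Iterator methods feels heavy, the same snapshot idea can also be sketched with IteratorAggregate and the built-in ArrayIterator. This is an alternative to the class above, not a replacement for it, and the class name DOMNodeSnapshot is made up for the example:

```php
<?php
// Hypothetical class name; not part of PHP's DOM extension.
class DOMNodeSnapshot implements IteratorAggregate
{
  private $nodes;

  public function __construct(DOMNodeList $nodeList)
  {
    // Copy the nodes out of the live list once, up front.
    $this->nodes = iterator_to_array($nodeList, false);
  }

  public function getIterator(): Iterator
  {
    // ArrayIterator walks the copy, never the live list.
    return new ArrayIterator($this->nodes);
  }
}

$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');
$root = $dom->documentElement;

foreach (new DOMNodeSnapshot($root->getElementsByTagName('foo')) as $foo) {
  $root->replaceChild($dom->createElement('bar'), $foo);
}

$result = $dom->saveXML($root);
echo $result, "\n";
```

Either class is used the same way: wrap the node list, then foreach over the wrapper while modifying the document freely.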

On the Other Hand, Orphan Nodes

Our iterator has one drawback: if we remove a node in the list via removeChild(), it will still exist in the iterator, but it will no longer be associated with our document. Unfortunately, the only way to check for this is to ascend the entire DOM tree each time we want to access a node, to make sure it is still a descendant of the root node. Rather than incur that overhead, we’ll leave it to the developer to use the iterator with care. We can safeguard the above code by putting the call to replaceChild() inside a try block, where $foo is the current node from the iterator; replaceChild() throws a DOMException when that node is no longer a child of $document:

try {
  $document->replaceChild($bar, $foo);
} catch (DOMException $e) {
  // Ignore only "not found" (the node is no longer a child); rethrow the rest.
  if ($e->getCode() !== DOM_NOT_FOUND_ERR) {
    throw $e;
  }
}
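For completeness, here is roughly what the tree-ascending check mentioned above might look like. The helper name isAttachedTo() is hypothetical, and the sample document is invented to show a node that is orphaned because its parent was removed:

```php
<?php
// Hypothetical helper: walks up from $node and reports whether $root is
// the node itself or one of its ancestors.
function isAttachedTo(DOMNode $node, DOMNode $root)
{
  for ($n = $node; $n !== null; $n = $n->parentNode) {
    if ($n->isSameNode($root)) {
      return true;
    }
  }
  return false;
}

$dom = new DOMDocument();
$dom->loadXML('<root><a><foo/></a><foo/></root>');
$root = $dom->documentElement;

$nodes = iterator_to_array($root->getElementsByTagName('foo'));

// Removing <a> orphans the <foo> inside it, although that <foo>
// still has a parent of its own.
$removed = $root->removeChild($root->firstChild);

$attached = array();
foreach ($nodes as $foo) {
  $attached[] = isAttachedTo($foo, $root);
}

var_dump($attached); // first entry false, second entry true
```

The cost is one walk to the root per node, which is exactly the overhead we chose not to build into the iterator.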

An Issue with PHP, or with DOM?

Stay tuned for my next post, entitled “Why the DOM Sucks.” Till next time…