An Undocumented "Feature"

Suppose we write the following code, whose simple purpose is to go through an XML document and replace every "foo" element with an empty "bar" element:

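$dom = new DOMDocument();
// Drop whitespace-only text nodes so the output stays compact.
$dom->preserveWhiteSpace = false;
$dom->loadXML('
  <root>
  <foo>This</foo>
  <foo />
  <foo />
  </root>'
);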

$document = $dom->documentElement;
$foos = $document->getElementsByTagName('foo');

for ($i = 0; $i < $foos->length; $i++) {
  $bar = $dom->createElement('bar');
  $document->replaceChild($bar, $foos->item($i));
}

echo $dom->saveXML($document);

We are quite surprised when the script outputs:
<root><bar/><foo/><bar/></root>
Why did it skip the middle element? Because the DOMNodeList class has an undocumented "feature": the list is live. Whenever the document it was drawn from changes, the list is rebuilt to match (the DOM specification calls such lists "live"). So when we replace the first "foo" node, the second "foo" node becomes the new first node, and the length of the list drops from 3 to 2. Since $i has already been incremented to 1, the for loop skips that node entirely, operates on what was originally the third node, then exits normally.
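
A quick way to see this behavior is to watch the list's length while the document changes. The following is a minimal sketch, assuming the DOM extension is enabled; the variable names $root, $before, $after and $newFirst are chosen for illustration and are not part of the script above.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');

$root = $dom->documentElement;
$foos = $root->getElementsByTagName('foo');

// Three "foo" elements to start with.
$before = $foos->length;

// Replace the first "foo"; the list shrinks without being touched directly.
$root->replaceChild($dom->createElement('bar'), $foos->item(0));
$after = $foos->length;

// Index 0 now points at what used to be the second "foo" (an empty element).
$newFirst = $foos->item(0);

echo $before, ' -> ', $after, "\n"; // prints "3 -> 2"
```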

The solution to this problem is to save a reference to each node in an array, then loop over the array:

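// Copy the nodes into a plain array first; unlike the node list, it will not change under us.
$nodes = array();
for ($i = 0; $i < $foos->length; $i++) {
  $nodes[$i] = $foos->item($i);
}

for ($i = 0; $i < count($nodes); $i++) {
  $bar = $dom->createElement('bar');
  $document->replaceChild($bar, $nodes[$i]);
}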

This code outputs what we intuitively expected from the original code:
<root><bar/><bar/><bar/></root>

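Another option, offered as a sketch rather than as the recommended fix, is to walk the list backwards. Replacing the last match never shifts the indexes of the matches before it, so no copy is needed. The variable names here are assumptions for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');

$root = $dom->documentElement;
$foos = $root->getElementsByTagName('foo');

// Start at the end: replacing item $i leaves the indexes below $i unchanged.
for ($i = $foos->length - 1; $i >= 0; $i--) {
    $root->replaceChild($dom->createElement('bar'), $foos->item($i));
}

$xml = $dom->saveXML($root);
echo $xml, "\n"; // prints "<root><bar/><bar/><bar/></root>"
```
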
Implementation: A DOMNodeIterator Class

It's best to encapsulate this technique in a class. Here's a simple class that does the job:

class DOMNodeIterator implements Iterator
{
  // Start with an empty array so an empty node list does not break current() and key().
  protected $nodes = array();

  public function __construct(DOMNodeList $nodeList)
  {
    if ($nodeList->item(0)) {
      for ($i = 0; $i < $nodeList->length; $i++) {
        $this->nodes[$i] = $nodeList->item($i);
      }
    }
  }

  public function current()
  {
    return current($this->nodes);
  }
    
  public function key()
  {
    return key($this->nodes);
  }

  public function next()
  {
    // Iterator::next() only advances the pointer; it has no return value.
    next($this->nodes);
  }

  public function rewind()
  {
    reset($this->nodes);
  }

  public function valid()
  {
    // key() returns null once the internal pointer has moved past the last node.
    return key($this->nodes) !== null;
  }
}
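
To show how such an iterator might be used, here is a condensed sketch. So that it runs on its own, it swaps the hand-written class for SPL's ArrayIterator filled by iterator_to_array(), which takes the same kind of snapshot; the helper name snapshotNodes() is hypothetical.

```php
<?php
// Hypothetical helper: copies the live list into a plain, stable iterator.
function snapshotNodes(DOMNodeList $nodeList)
{
    return new ArrayIterator(iterator_to_array($nodeList));
}

$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');
$root = $dom->documentElement;

// The snapshot does not change while we replace nodes in the document.
foreach (snapshotNodes($root->getElementsByTagName('foo')) as $foo) {
    $root->replaceChild($dom->createElement('bar'), $foo);
}

$result = $dom->saveXML($root);
```

With the DOMNodeIterator class above, the loop would read foreach (new DOMNodeIterator($foos) as $foo) instead.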

On the Other Hand, Orphan Nodes

Our iterator has one drawback: if we remove a node in the list via removeChild(), it will still exist in the iterator, but it will no longer be associated with our document. Unfortunately, the only way to check for this is to ascend the entire DOM tree each time we want to access a node, to make sure it is still a descendant of the root node. Rather than incur that overhead, we'll leave it to the developer to use the iterator with care. We can safeguard the above code by putting the call to replaceChild() inside a try block:

try {
  $document->replaceChild($bar, $foo);
} catch (DOMException $e) {
  // Ignore only the "not found" case; compare the error code rather than the message text.
  if ($e->getCode() !== DOM_NOT_FOUND_ERR) {
    throw $e;
  }
}
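
For anyone who does want the explicit check, the tree walk mentioned above might look something like this. It is only a sketch: isDescendantOf() is a hypothetical helper, not part of the DOM extension, and it costs one step per ancestor on every call.

```php
<?php
// Hypothetical helper: true if $node still hangs somewhere below $root.
function isDescendantOf(DOMNode $node, DOMNode $root)
{
    for ($n = $node->parentNode; $n !== null; $n = $n->parentNode) {
        if ($n->isSameNode($root)) {
            return true;
        }
    }
    return false;
}

$dom = new DOMDocument();
$dom->loadXML('<root><foo/></root>');
$root = $dom->documentElement;
$foo = $root->getElementsByTagName('foo')->item(0);

$attachedBefore = isDescendantOf($foo, $root); // still in the tree
$root->removeChild($foo);
$attachedAfter = isDescendantOf($foo, $root);  // orphaned: no parent left
```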

An Issue with PHP, or with DOM?

Stay tuned for my next post, entitled "Why the DOM Sucks." Till next time…

One undeniable advantage of Ruby on Rails is its terse template syntax. The cumbersome

<?php echo $something ?>

is replaced by the elegant and readable

<%= something %>

The very good news is that you can use these same tags in PHP! Just add the following to your .htaccess file (this applies to PHP 5; the asp_tags directive was removed in PHP 7.0):

php_flag asp_tags on

Voilà! Readable templates. What's curious is that no one seems to use this option. Perhaps it is because the tags are referred to as "ASP-style tags." (Can we just call them "Ruby-style tags" instead?) On all my sites, I use the long tags for blocks of code, and the short <%= %> tags for any PHP code floating in the HTML sea. In symfony, that means the template files use one tag style, and the rest of the code uses the other. I encourage everyone - yes, everyone - to start using Ruby-style tags in their templates. The more people use them, the more common it will be to have the "asp_tags" option on by default, so people on shared servers can join in the readable fun.

It's important for a second reason - arbitrary conventions should be standardized. That is, any time we are faced with a set of possibilities that are all of equal value - such as what weird punctuation our programming language should use to demarcate itself - we should pick one standard way and stick with it. That way we reduce the learning curve of all languages (or of whatever else the convention applies to). Imagine if there were one universal syntax for putting server-side code into HTML. Imagine if there were a templating language that every designer knew - because it's so simple - and that every server-side language supported. It would not be a universal programming language, since we should not be using the full power of a programming language from inside our templates; that's what the controller code is for. This universal templating language (what the hell, let's go ahead and call it UTL, because giving initials to computer-denizens makes them seem like real people) should support the following things, and probably nothing else:

  1. Variables, including objects and arrays
  2. if/else
  3. while
  4. foreach
  5. Very basic arithmetic and string operators
  6. No function or method calls

That last one may raise some eyebrows. Why would we not include functions and methods? Because we can call all those functions in the controller, and assign their return values to variables, which we then use in the template per number 1 above. Also, allowing function calls brings the capacity for arbitrarily complex logic into UTL, as well as coupling it more tightly with the mother language, both of which defeat the purpose of UTL.
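
To make the division of labor concrete, here is a rough sketch in plain PHP. UTL itself does not exist, so this only imitates its rules: the controller function does every function call, and the template part touches nothing but variables, if/else and foreach. All names (buildGuestListVars, renderGuestList, and the variables) are hypothetical, and the template uses the long <?php echo ?> form so the sketch runs regardless of the asp_tags setting.

```php
<?php
// Controller side: every function call happens here.
function buildGuestListVars(array $rawNames)
{
    return array(
        'title' => strtoupper('guest list'),
        'names' => array_map('ucfirst', $rawNames),
        'count' => count($rawNames),
    );
}

// Template side: only variables, if/else and foreach.
function renderGuestList(array $vars)
{
    extract($vars);   // turn the array keys into template variables
    ob_start();       // capture the template output as a string
?>
<h1><?php echo $title ?></h1>
<?php if ($count > 0): ?>
<ul>
<?php foreach ($names as $name): ?>
  <li><?php echo $name ?></li>
<?php endforeach ?>
</ul>
<?php else: ?>
<p>Nobody yet.</p>
<?php endif ?>
<?php
    return ob_get_clean();
}

$html = renderGuestList(buildGuestListVars(array('ann', 'bob')));
```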

Make the dream a reality. Start by using Ruby tags.

I recently wrote a model-validation plugin for symfony. I believe this plugin, or something like it, should be included in the symfony core. Here's why.

In an MVC architecture, the model layer (the M in "MVC") encapsulates the data model of our application. This includes not only data storage and access, but also validation - that is, the data model should specify what data is allowed, and what is forbidden. In symfony, for example, the save() method of every data object should first validate the object. Thus, in a strict MVC architecture, we should perform only form-specific validation in the controller. But in a web application, since almost all data that needs to be validated comes from web forms, we often "cheat" by validating the form data in the controller layer instead of validating within the model. symfony does things this way. While there are several efficient ways of validating forms in symfony, they all occur between the end-user's submission of a form, and the corresponding "update" action. Thus, symfony provides support only for controller-based validation. To be fair, we can validate within the model, by overriding the validate() method of each data class in our model. But this represents a lot of repeated code. Furthermore, if we want to use YAML files to specify validation, or if we want to set errors that can be accessed from the presentation layer, we must manually call functions to accomplish these things.

It would be better if we could use symfony's automated validation logic from within the model. This would not only allow, but encourage developers to move validation related to the data model out of the controller and into the model. Doing so not only helps us understand our application conceptually; it also protects our data from bugs in the presentation layer. As developers, we are wise to be skeptical of ourselves, especially when the security of our data may depend on it. Encapsulation can also protect us from bugs in the framework itself, like this one.

For the application I am currently developing, data security is the top priority. So I wrote the sfPropelValidateBehavior plugin. Now you can associate your validation logic with your data classes and put it in your project's "model" directory. In most cases, you can copy the configuration from your form-validation files as-is. I hope this helps you other symfony developers!

The PHP manual tells us that, though a class may implement more than one interface, "A class cannot implement two interfaces that share function names, since it would cause ambiguity." Makes perfect sense - until, that is, you realize that PHP interfaces contain no code. They're nothing but definitions. Therefore, there is nothing to be ambiguous.

Interfaces have a single purpose: by using them, functions and methods can require their arguments to implement a certain set of methods. (Actually, they merely require that methods matching certain definitions exist.) Suppose we are writing a graphics library, and we have a set of functions for manipulating and drawing objects that have a color. In that case, we may want to have an interface like the following:

interface DrawableInColor 
{ 
  public function getColor(); 
  public function setColor($color); 
  public function draw($x, $y); 
}

But suppose we also want to deal with objects that can be resized. Then we may want an interface like this:

interface DrawableAndResizable 
{ 
  public function scaleX($x_factor); 
  public function scaleY($y_factor); 
  public function draw($x, $y); 
}

So what if we have a class of objects that are colored and resizable? So long as we give the class's methods the appropriate names, the class will be an implementation of both the above interfaces. There is no ambiguity, since there is only one method named "draw." In fact, no class can have two functions by the same name, so there can never be ambiguity in declaring that a class implements two overlapping interfaces. But we can't declare it! To use our class with both sets of functions, we would need to remove the functions' interface requirements entirely. The only other solution is to rename the draw() function of one of the interfaces (to draw2(), let's say) and add a dummy function to our class, like this:

public function draw2($x, $y) 
{ 
  $this->draw($x, $y); 
}

Both "solutions" are terribly ugly, and although the core developers of PHP insist that it is not an object-oriented language, it should at least allow the OO features of the language to be used as such. The restriction on interfaces severely limits their utility in PHP, and the development team should really consider removing it in future releases.
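
For readers on newer versions: the manual now notes that this restriction applied before PHP 5.3.9, and that later versions accept two interfaces declaring the same method as long as the signatures are compatible. Here is a minimal sketch under that assumption, reusing the two interfaces above; the class ColoredBox and its properties are hypothetical.

```php
<?php
interface DrawableInColor
{
    public function getColor();
    public function setColor($color);
    public function draw($x, $y);
}

interface DrawableAndResizable
{
    public function scaleX($x_factor);
    public function scaleY($y_factor);
    public function draw($x, $y);
}

// Hypothetical class: a single draw() method satisfies both interfaces.
class ColoredBox implements DrawableInColor, DrawableAndResizable
{
    private $color = 'black';
    private $width = 1;
    private $height = 1;

    public function getColor() { return $this->color; }
    public function setColor($color) { $this->color = $color; }
    public function scaleX($x_factor) { $this->width *= $x_factor; }
    public function scaleY($y_factor) { $this->height *= $y_factor; }

    public function draw($x, $y)
    {
        return $this->color . ' box at ' . $x . ',' . $y;
    }
}

$box = new ColoredBox();
$box->setColor('red');
$box->scaleX(2);
```

On a version older than 5.3.9 the class declaration fails, and the draw2() workaround above is still needed.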