PHP


Don’t use the equality operator (==) to compare strings – use the strcmp() function, or the identity operator (===). In fact, when comparing values in PHP, use the identity operator (===) instead of the equality operator whenever possible. You may have heard that === is faster, since it incurs no type conversions, but some people have argued that it hurts code clarity. According to this theory, you should only use the identity operator when you actually need the two variables to have the same type, as when checking whether strpos() returned “false”. But the fact is that the identity operator is not only faster, it is also clearer. The identity operator says, “These two pieces of data are identical,” whereas the equality operator says, “These two pieces of data, or anything under the sun they could possibly be converted to, are equal.” Which seems like clearer code to you?

If you doubt the importance of this distinction – and I understand why you might – check out the following rather shocking examples taken from the PHP manual:

if (            0 == 'my string'
  &&            1 != 'my string'
  &&       '+010' == '10.0'
  &&   '  131e-2' == '001.3100'
  && '000e002073' == '0e459239'
  &&       '0xab' == 0253
  &&       '0xab' != '0253'
  &&       '0xab' == '171'
){
  echo 'WTF????';
}

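If you run the above code on the PHP 5 releases the manual example was written for, you will see that all the above examples do indeed evaluate to true. How can that be? When both operands are numeric strings, or when a number is compared with a string, PHP converts the strings to numbers and compares the results, so two strings that parse to the same float count as equal. Those little e’s are exponent markers, the x’s indicate hexadecimal values, and a leading 0 marks an octal literal (but not inside a string, where the leading zero is simply ignored). As for why the numeral zero is equal to any string: a string that does not start with a number converts to 0, and 0 equals 0. Later versions removed some of these cases (PHP 7 stopped treating hexadecimal strings as numeric, and in PHP 8 the comparison 0 == 'my string' is false), but the numeric-string surprises remain.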

In light of these facts, a developer should only ever use == when type conversion is expected, such as when comparing a form input to a numerical value. Furthermore, we should regard == as expressing this expectation, since, by using the equality operator, the developer has forced the reader to check for conversions. Thus, to keep your code clean and self-documenting, stop using == everywhere else. Use strcmp() for strings; it’s binary-safe, it returns 0 when the strings are equal, and it expresses the type of your arguments without the need for comments. Use === for everything else.
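Here is a small sketch of the difference. The values in $a and $b are made up for illustration; both are numeric strings, so == compares them as numbers, while === and strcmp() look at the actual characters.

```php
<?php
// Two different strings that happen to describe the same number.
$a = '1e3';
$b = '1000';

$loose  = ($a == $b);             // numeric strings are compared as numbers
$strict = ($a === $b);            // same type, but different characters
$viaCmp = (strcmp($a, $b) === 0); // strcmp() returns 0 only for equal strings

var_dump($loose);  // bool(true)
var_dump($strict); // bool(false)
var_dump($viaCmp); // bool(false)
```

Note that the result of strcmp() is itself compared with ===, since the function returns an integer rather than a boolean.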

An Undocumented “Feature”

Suppose we write the following code, whose simple purpose is to go through an XML document and replace every “foo” element with an empty “bar” element:

// loadXML() is an instance method, so create the document first.
$dom = new DOMDocument();
$dom->loadXML('
  <root>
  <foo>This</foo>
  <foo />
  <foo />
  </root>'
);

$document = $dom->documentElement;
$foos = $document->getElementsByTagName('foo');

for ($i = 0; $i < $foos->length; $i++) {
  $bar = $dom->createElement('bar');
  $document->replaceChild($bar, $foos->item($i));
}

We are quite surprised when the script outputs:

<root><bar/><foo/><bar/></root>

Why did it skip the middle element? Because the DOMNodeList class has an undocumented “feature”: the list is live. Whenever the document that a DOMNodeList belongs to is changed, the list is rebuilt to match. That means that, when we replace the first “foo” node, the second “foo” node becomes the new first node. Also, the length of the node list is now 2, not 3. But since $i has been incremented, the for loop skips what was originally the second node, operates on the third, then exits normally.
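The shrinking list is easy to observe directly. This is a minimal sketch using a compact version of the same XML; the variable names $before and $after are only there for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');
$root = $dom->documentElement;
$foos = $root->getElementsByTagName('foo');

$before = $foos->length;

// Replace the first <foo>; the list itself is never touched directly.
$root->replaceChild($dom->createElement('bar'), $foos->item(0));

$after = $foos->length;

echo $before, ' -> ', $after, "\n"; // 3 -> 2
```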

The solution to this problem is to save a reference to each node in an array, then loop over the array:

// Copy the nodes out of the live list before changing the document.
$nodes = array();
for ($i = 0; $i < $foos->length; $i++) {
  $nodes[$i] = $foos->item($i);
}

for ($i = 0; $i < count($nodes); $i++) {
  $bar = $dom->createElement('bar');
  $document->replaceChild($bar, $nodes[$i]);
}

This code outputs what we intuitively expected from the original code:

<root><bar/><bar/><bar/></root>
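The same snapshot can probably be taken in one call with iterator_to_array(), since DOMNodeList can be traversed with foreach. The sketch below assumes the compact XML shown in the string; it is an alternative to the two loops above, not a replacement for understanding them.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><foo>This</foo><foo/><foo/></root>');
$root = $dom->documentElement;

// iterator_to_array() copies the nodes out of the live list up front.
$snapshot = iterator_to_array($root->getElementsByTagName('foo'));

foreach ($snapshot as $foo) {
  $root->replaceChild($dom->createElement('bar'), $foo);
}

$xml = $dom->saveXML($root);
echo $xml, "\n"; // <root><bar/><bar/><bar/></root>
```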

Implementation: A DOMNodeIterator Class

It’s best to encapsulate this technique in a class. Here’s a simple class that does the job:

class DOMNodeIterator implements Iterator
{
  // Snapshot of the nodes, taken once at construction time.
  protected $nodes = array();

  public function __construct(DOMNodeList $nodeList)
  {
    // Copy every node out of the live list so that later changes
    // to the document cannot shift or drop entries.
    for ($i = 0; $i < $nodeList->length; $i++) {
      $this->nodes[$i] = $nodeList->item($i);
    }
  }

  public function current()
  {
    return current($this->nodes);
  }

  public function key()
  {
    return key($this->nodes);
  }

  public function next()
  {
    next($this->nodes);
  }

  public function rewind()
  {
    reset($this->nodes);
  }

  public function valid()
  {
    // key() returns null once the internal pointer has run off the end.
    return key($this->nodes) !== null;
  }
}

On the Other Hand, Orphan Nodes

Our iterator has one drawback: if we remove a node in the list via removeChild(), it will still exist in the iterator, but it will no longer be associated with our document. Unfortunately, the only way to check for this is to ascend the entire DOM tree each time we want to access a node, to make sure it is still a descendant of the root node. Rather than incur that overhead, we’ll leave it to the developer to use the iterator with care. We can safeguard the above code by putting the call to replaceChild() inside a try block:

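try {
  $document->replaceChild($bar, $foo);
} catch (DOMException $e) {
  // Swallow only the "node is no longer a child" case; rethrow the rest.
  // Checking the error code is less fragile than matching the message.
  if ($e->getCode() !== DOM_NOT_FOUND_ERR) {
    throw $e;
  }
}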
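For completeness, here is roughly what the “ascend the DOM tree” check mentioned above could look like. The helper name isDescendantOf() is hypothetical, and the cost grows with the depth of the node, which is why the iterator does not perform this check itself.

```php
<?php
// Hypothetical helper: walk up from $node and report whether $root
// is among its ancestors.
function isDescendantOf(DOMNode $node, DOMNode $root)
{
  for ($n = $node->parentNode; $n !== null; $n = $n->parentNode) {
    if ($n->isSameNode($root)) {
      return true;
    }
  }
  return false;
}

$dom = new DOMDocument();
$dom->loadXML('<root><foo/><foo/></root>');
$root = $dom->documentElement;
$first = $root->getElementsByTagName('foo')->item(0);

$attachedBefore = isDescendantOf($first, $root); // true
$root->removeChild($first);                      // $first is now an orphan
$attachedAfter = isDescendantOf($first, $root);  // false

var_dump($attachedBefore, $attachedAfter);
```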

An Issue with PHP, or with DOM?

Stay tuned for my next blog entitled “Why the DOM Sucks.” Till next time…

One undeniable advantage of Ruby on Rails is its terse template syntax. The cumbersome

<?php echo $something ?>

is replaced by the elegant and readable

<%= something %>

The very good news is that you can use these same tags in PHP 5! (The option was removed in PHP 7.0, so this does not apply to newer versions.) Just add the following to your .htaccess:

php_flag asp_tags on

Voila! Readable templates. What’s curious is that no one seems to use this option. Perhaps it is because they are referred to as “ASP-style tags.” (Can we just call them “Ruby-style tags”, instead?) On all my sites, I use the long tags for blocks of code, and the short <%= %> tags for any PHP code floating in the HTML sea. In symfony, that means the template files use one tag-style, and the rest of the code uses the other. I encourage everyone – yes, everyone – to start using Ruby-style tags in their templates. The more people use it, the more common it will be to have the “asp_tags” option on by default, so people on shared servers can join in the readable fun.
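On PHP versions where asp_tags is not available, the built-in short echo tag gives nearly the same terseness; since PHP 5.4 it works regardless of the short_open_tag setting. A minimal sketch follows, in which the value of $something is made up and the output buffering exists only so that the rendered markup can be inspected.

```php
<?php
// Hypothetical value that controller code would normally supply.
$something = 'Fish & chips';

// Buffering is only here so the rendered markup can be captured.
ob_start();
?>
<p><?= htmlspecialchars($something) ?></p>
<?php
$html = ob_get_clean();
echo $html; // <p>Fish &amp; chips</p>
```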

It’s important for a second reason – arbitrary conventions should be standardized. That is, any time we are faced with a set of possibilities that are all of equal value – such as what weird punctuation our programming language should use to demarcate itself – we should pick one standard way and stick with it. That way we reduce the learning curve of all languages (or of whatever else the convention applies to). Imagine if there were one, universal syntax for putting server-side code into HTML. Imagine if there were a templating language that every designer knew – because it’s so simple – and that every server-side language supported. It would not be a universal programming language, since we should not be using the full power of a programming language from inside our templates; that’s what the controller code is for. This universal templating language (What the hell, let’s go ahead and call it UTL, because giving initials to computer-denizens makes them seem like real people) should support the following things, and probably nothing else:

  1. Variables, including objects and arrays
  2. if/else
  3. while
  4. foreach
  5. Very basic arithmetic and string operators
  6. No function or method calls

That last one may raise some eyebrows. Why would we not include functions and methods? Because we can call all those functions in the controller, and assign their return values to variables, which we then use in the template per number 1 above. Also, allowing function calls brings the capacity for arbitrarily complex logic into UTL, as well as coupling it more tightly with the mother language, both of which defeat the purpose of UTL.
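UTL does not exist, but the division of labor can be approximated in plain PHP. In the sketch below, all names and data are hypothetical: the controller part makes every function call, and the template part restricts itself to variables, if/else and foreach.

```php
<?php
// Controller side: every function call happens here, and only plain
// values are handed to the template. All names are hypothetical.
$title = strtoupper('fruit');
$items = array('apple', 'pear');
$count = count($items);

// Template side: variables, if/else and foreach only.
// (Output buffering is just a way to capture the result.)
ob_start();
?>
<h1><?= $title ?></h1>
<?php if ($count > 0): ?>
<ul>
<?php foreach ($items as $item): ?>
  <li><?= $item ?></li>
<?php endforeach; ?>
</ul>
<?php else: ?>
<p>Nothing to show.</p>
<?php endif; ?>
<?php
$html = ob_get_clean();
echo $html;
```

Nothing in PHP enforces the restriction, of course; that is exactly the gap a real UTL would fill.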

Make the dream a reality. Start by using Ruby tags.

The PHP manual tells us that, though a class may implement more than one interface, “A class cannot implement two interfaces that share function names, since it would cause ambiguity.” Makes perfect sense – until, that is, you realize that PHP interfaces contain no code. They’re nothing but definitions. Therefore, there is nothing to be ambiguous.

Interfaces have a single purpose: by using them, functions and methods can require their arguments to implement a certain set of methods. (Actually, they merely require that methods matching certain definitions exist.) Suppose we are writing a graphics library, and we have a set of functions for manipulating and drawing objects that have a color. In that case, we may want to have an interface like the following:

interface DrawableInColor
{
  public function getColor();
  public function setColor($color);
  public function draw($x, $y);
}

But suppose we also want to deal with objects that can be resized. Then we may want an interface like this:

interface DrawableAndResizable
{
  public function scaleX($x_factor);
  public function scaleY($y_factor);
  public function draw($x, $y);
}

So what if we have a class of objects that are colored and resizable? So long as we give the class’s methods the appropriate names, the class will be an implementation of both the above interfaces. There is no ambiguity, since there is only one method named “draw.” In fact, no class can have two functions by the same name, so there can never be ambiguity in declaring that a class implements two overlapping interfaces. But we can’t declare it! To use our class with both sets of functions, we would need to remove the functions’ interface requirements entirely. The only other solution is to rename the draw() function of one of the interfaces (to draw2(), let’s say) and add a dummy function to our class, like this:

public function draw2($x, $y)
{
  $this->draw($x, $y);
}

Both “solutions” are terribly ugly, and although the core developers of PHP insist that it is not an object-oriented language, it should at least allow the OO features of the language to be used as such. The restriction on interfaces severely limits their utility in PHP, and the development team should really consider removing it in future releases.
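As far as I can tell, the restriction has since been relaxed: the manual now says it applied prior to PHP 5.3.9, and that later versions accept overlapping interfaces as long as the duplicated method has the same signature. Under that assumption, the declaration looks like the sketch below. The ColoredBox class and the paint() and stretch() functions are hypothetical and exist only to exercise both interfaces.

```php
<?php
interface DrawableInColor
{
  public function getColor();
  public function setColor($color);
  public function draw($x, $y);
}

interface DrawableAndResizable
{
  public function scaleX($x_factor);
  public function scaleY($y_factor);
  public function draw($x, $y);
}

// Hypothetical class for illustration; draw() just describes the shape.
class ColoredBox implements DrawableInColor, DrawableAndResizable
{
  private $color = 'black';
  private $width = 1;
  private $height = 1;

  public function getColor() { return $this->color; }
  public function setColor($color) { $this->color = $color; }
  public function scaleX($x_factor) { $this->width *= $x_factor; }
  public function scaleY($y_factor) { $this->height *= $y_factor; }

  public function draw($x, $y)
  {
    return sprintf('%s box %dx%d at (%d, %d)',
      $this->color, $this->width, $this->height, $x, $y);
  }
}

// Each function requires only the interface it cares about.
function paint(DrawableInColor $shape)
{
  $shape->setColor('red');
  return $shape->draw(0, 0);
}

function stretch(DrawableAndResizable $shape)
{
  $shape->scaleX(2);
  return $shape->draw(0, 0);
}

$box = new ColoredBox();
$painted = paint($box);
$stretched = stretch($box);
echo $painted, "\n", $stretched, "\n";
```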
