"Working for Opera Software's QA department gives you in-depth perspectives on the web's wild and varied coding practices," writes Hallvord R. M. Steen. "I still wasn't prepared for the curious solutions that power the menu on the new Israel Railways website."
"The coding is unbelievable," Hallvord continues. "Diving into the website's source code shows that its coders must have fallen asleep during the 'what's the point of XSLT' lesson. It's more like an XML parser/serializer stress test than a production site."
For those unaware (wake up, Railways developers!), the XSLT markup/programming language is widely used to transform one sort of DOM into another; for example, it can turn the DOM of a generic XML file into valid XHTML. Much of the benefit comes from working on DOM trees rather than strings, which makes it hard or impossible to create syntactically invalid pages. Of course, like most tools, XSLT can be horribly abused, and the Railways website does a pretty impressive job of that:
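To make the tree-versus-string point concrete, here is a minimal JavaScript sketch. It is an analogy, not XSLT, and not how any particular XSLT processor works internally. Markup is built as nested objects, and a single serializer writes every tag and escapes every piece of text. As a result, a `<` inside text content can never be mistaken for the start of a tag, and every opened element gets closed. The helper names `el`, `escapeText`, `escapeAttr` and `serialize` are hypothetical.

```javascript
// Hypothetical helpers: a tiny tree-to-markup serializer, for illustration only.

// Escape characters that would otherwise be read as markup.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values additionally need their quote character escaped.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Build an element node: a name, an attribute map, and child nodes (strings are text).
function el(name, attrs = {}, ...children) {
  return { name, attrs, children };
}

// Serialize a node. Every tag is opened and closed here, so the output is always balanced.
function serialize(node) {
  if (typeof node === "string") return escapeText(node);
  const attrs = Object.entries(node.attrs)
    .map(([k, v]) => ` ${k}="${escapeAttr(v)}"`)
    .join("");
  const inner = node.children.map(serialize).join("");
  return `<${node.name}${attrs}>${inner}</${node.name}>`;
}

const row = el("tr", { class: "nw-2r" }, el("td", { class: "nw-2c" }, "x < y"));
console.log(serialize(row));
// <tr class="nw-2r"><td class="nw-2c">x &lt; y</td></tr>
```

Gluing `<`, tag names and `>` together as plain text gives up both guarantees.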
<xsl:template name="inner-text-tag-open">
  <xsl:if test="$is-mz-impl">
    <xsl:comment>nwlt</xsl:comment>
  </xsl:if>
  <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
</xsl:template>

<xsl:template name="inner-text-tag-close">
  <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
  <xsl:if test="$is-mz-impl">
    <xsl:comment>nwgt</xsl:comment>
  </xsl:if>
</xsl:template>

<xsl:template name="inner-text-element-open">
  <xsl:param name="element-name"/>
  <xsl:call-template name="inner-text-tag-open"/>
  <xsl:value-of select="$element-name"/>
  <xsl:text disable-output-escaping="yes"> </xsl:text>
</xsl:template>

<xsl:template name="inner-text-element-close">
  <xsl:param name="element-name"/>
  <xsl:call-template name="inner-text-tag-open"/>
  /
  <xsl:value-of select="$element-name"/>
  <xsl:call-template name="inner-text-tag-close"/>
</xsl:template>
The purpose of the preceding code was to concatenate bits of text (<, /, >, etc.) into something like this:
<div></div>
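Read as plain string concatenation, the four templates do roughly what the following JavaScript transliteration does. This is a sketch under stated assumptions:

- `isMzImpl` stands in for the stylesheet's `$is-mz-impl` parameter.
- The ` / ` text in the close template is simplified to a bare `/`.
- The `divMarkup` wrapper is hypothetical. It assumes some other part of the stylesheet calls `inner-text-tag-close` to finish the start tag.
- It ignores how each XSLT processor actually handles `disable-output-escaping`.

```javascript
// Hypothetical transliteration of the XSLT templates into string functions.
// isMzImpl plays the role of the stylesheet's $is-mz-impl parameter.

function innerTextTagOpen(isMzImpl) {
  // Optional <!--nwlt--> marker, then a raw "<".
  return (isMzImpl ? "<!--nwlt-->" : "") + "<";
}

function innerTextTagClose(isMzImpl) {
  // A raw ">", then an optional <!--nwgt--> marker.
  return ">" + (isMzImpl ? "<!--nwgt-->" : "");
}

function innerTextElementOpen(name, isMzImpl) {
  // No ">" here: the start tag is left open, ending in a space, so attributes can follow.
  return innerTextTagOpen(isMzImpl) + name + " ";
}

function innerTextElementClose(name, isMzImpl) {
  return innerTextTagOpen(isMzImpl) + "/" + name + innerTextTagClose(isMzImpl);
}

// Assumption: something else in the stylesheet emits the ">" that ends the start tag.
function divMarkup(isMzImpl) {
  return innerTextElementOpen("div", isMzImpl) +
    innerTextTagClose(isMzImpl) +
    innerTextElementClose("div", isMzImpl);
}

console.log(divMarkup(false)); // <div ></div>
console.log(divMarkup(true));  // <!--nwlt--><div ><!--nwgt--><!--nwlt--></div><!--nwgt-->
```

The trailing space left by `inner-text-element-open` is where attributes would be appended. This is why the rest of the stylesheet is needed to produce anything useful.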
Of course, to get the other useful parts of the string (attributes, inner contents), the rest of the giant XSLT (and perhaps the other giant XSLT) is required.
Hallvord, who clearly spent much more time trying to understand the XSLT mess, explains further:
When they in their wisdom chose to generate markup inside text nodes with their XSLT, they ran into the familiar problem: when is < going to start a tag, and when is it going to live in a text node? Hence, < is sometimes escaped as &lt; to create proper text nodes with HTML source-as-text in them (as, for example, the instance of &lt; in the code above). Now, of course, when they set innerHTML they do not want this &lt; to appear as a literal <, so they do some pre-processing: every &lt; and &gt; they want to change into a proper < and > before setting innerHTML has a comment node next to it...
<!--nwlt-->&lt;TR class="nw-2r"&gt;<!--nwgt--> <!--nwlt-->&lt;TD class="nw-2c"&gt;<!--nwgt-->
...and their pre-processing is a simple string replace...
sHtml = sHtml
    .replace(/\<!--nwlt--\>&lt;/g, "<")
    .replace(/&gt;\<!--nwgt--\>/g, ">")
    .replace(/\<[\/]?tbody\>/gi, "");
And why they hate the poor TBODY so much that they must strip it from the markup, even though the browser will regenerate it in the DOM as soon as innerHTML is parsed, I can't even begin to imagine.
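The marker round trip can be simulated in a few lines. The sketch rests on two assumptions, which the markers suggest:

- In Mozilla, the `<` and `>` end up in ordinary text nodes.
- Serializing those nodes the way innerHTML does writes comments verbatim but escapes `&`, `<`, `>` and U+00A0 in text.

The first two replace patterns are therefore written to match `&lt;` and `&gt;`. The node objects and the `escapeTextNode`, `serializeNodes`, `markedTag` and `unmark` names are hypothetical.

```javascript
// Escape a text node roughly the way innerHTML serialization does.
function escapeTextNode(s) {
  return s.replace(/&/g, "&amp;").replace(/\u00a0/g, "&nbsp;")
          .replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize a flat list of comment and text nodes.
function serializeNodes(nodes) {
  return nodes
    .map((n) => (n.type === "comment" ? `<!--${n.data}-->` : escapeTextNode(n.data)))
    .join("");
}

// A "tag" as the stylesheet emits it in this scenario: marker comment, text, marker comment.
function markedTag(source) {
  return [
    { type: "comment", data: "nwlt" },
    { type: "text", data: `<${source}>` },
    { type: "comment", data: "nwgt" },
  ];
}

// The site's pre-processing: marked entities become real tags, then TBODY tags are dropped.
function unmark(sHtml) {
  return sHtml
    .replace(/\<!--nwlt--\>&lt;/g, "<")
    .replace(/&gt;\<!--nwgt--\>/g, ">")
    .replace(/\<[\/]?tbody\>/gi, "");
}

const nodes = [
  ...markedTag("TBODY"),
  ...markedTag('TR class="nw-2r"'),
  ...markedTag('TD class="nw-2c"'),
  { type: "text", data: "x < y" }, // genuine text content, no markers
  ...markedTag("/TD"),
  ...markedTag("/TR"),
  ...markedTag("/TBODY"),
];

console.log(unmark(serializeNodes(nodes)));
// <TR class="nw-2r"><TD class="nw-2c">x &lt; y</TD></TR>
```

The `x < y` text survives as `&lt;` because no marker precedes it. The marked TBODY tags are turned back into real tags and then stripped.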
Even More Fun
The good news for those with even more time on their hands is that this XSLT "cleverness" appears to be the tip of the iceberg. Digging through some of the Railways site's JavaScript, I noticed this function...
function escapeProperly(str)
...which calls...
function escapeProperlyCore(str, bAsUrl)
...which finally calls...
function escapeProperlyCoreCore(str, bAsUrl, bForFilterQuery, bForCallback)
And then there was this bizarre array:
var LegalUrlChars=new Array ( false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, false, false, true, false, false, true, true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, true, false, true, false, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false );
On the surface the data looks strange, but there actually is a sensible use for it:
for (var i = 0; i < strLeafName.length; i++) {
    var ch = strLeafName.charCodeAt(i);
    if (strLeafName.charAt(i) == '.' && (i == 0 || i == (strLeafName.length - 1)))
        return i;
    if (ch < 160 && (strLeafName.charAt(i) == '/' || !LegalUrlChars[ch]))
        return i;
}
return -1;
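The array is a lookup table. Index it with a character code below 160 and it answers whether that character is allowed; codes of 160 and above skip the check entirely. A table like this is easier to audit when it is generated from a readable list of allowed characters. The sketch below does that and repeats the rules of the loop above. The `ALLOWED` set is an illustrative assumption rather than a decoding of the site's array, and `buildLegalTable` and `firstIllegalCharIndex` are hypothetical names.

```javascript
// Illustrative assumption: which characters below code 160 count as legal.
// This is NOT a decoding of the site's LegalUrlChars array.
const ALLOWED =
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "0123456789" +
  " !$'()+,-._";

// Build a 160-entry boolean table indexed by character code.
function buildLegalTable(allowed) {
  const table = new Array(160).fill(false);
  for (const ch of allowed) table[ch.charCodeAt(0)] = true;
  return table;
}

const LEGAL = buildLegalTable(ALLOWED);

// Same rules as the loop above: index of the first offending character, or -1.
function firstIllegalCharIndex(leafName) {
  for (let i = 0; i < leafName.length; i++) {
    const ch = leafName.charCodeAt(i);
    // A dot may not start or end the name.
    if (leafName[i] === "." && (i === 0 || i === leafName.length - 1)) return i;
    // Below 160, "/" is always rejected and everything else must be in the table.
    if (ch < 160 && (leafName[i] === "/" || !LEGAL[ch])) return i;
  }
  return -1;
}

console.log(firstIllegalCharIndex("timetable.pdf"));            // -1
console.log(firstIllegalCharIndex(".hidden"));                  // 0
console.log(firstIllegalCharIndex("a#b"));                      // 1
console.log(firstIllegalCharIndex("\u05e9\u05dc\u05d5\u05dd")); // -1
```

The Hebrew name passes because every character code in it is above 160, so the table is never consulted.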
Err, something like that.