Just in Case

The best thing about HTML is that (historically) consumers are insanely permissive about what they accept. The most obvious example is that, casting off the iron fist of XML, hypertext markup allows tags to be case-insensitive. It is widely accepted that this helps designers more fully express themselves on the web.

Of course, this causes problems for some tools. Take the example of the tool Ben was looking at. He wanted to learn more about regular expressions and, realizing that all HTML tools are probably written with regular expressions, decided to dig into one.

He was shocked to find that it was not. In fact, the underlying functions all seemed to expect lowercase tag names. Luckily, someone had written a function to pre-digest tags.

''' <summary>
''' LowerCase all occurences of uppercased HTML tags
''' </summary>
''' <param name="content"></param>
''' <remarks></remarks>
Private Shared Sub MassageHtmlTags(ByRef content as String)
  Try
    ' lower case all anchor opening and closing links
    content=content.Replace("<A>","<a>")
    content=content.Replace("<A","<a")
    content=content.Replace("</A>","</a>")

    content=content.Replace("<HTML","<head")
    content=content.Replace("<Html","<body")
    content=content.Replace("<hTml","<body")
    content=content.Replace("<htMl","<body")
    content=content.Replace("<htmL","<body")
    content=content.Replace("<HTml","<body")
    content=content.Replace("<hTMl","<body")
    content=content.Replace("<htML","<body")
    content=content.Replace("<HtMl","<body")
    content=content.Replace("<hTmL","<body")
    content=content.Replace("<HtmL","<body")

    content=content.Replace("<TABLE","<table")
    content=content.Replace("</TABLE>","</table>")
    content=content.Replace("<TR","<tr")
    content=content.Replace("</TR>","</tr>")
    content=content.Replace("<Tr","<tr")
    content=content.Replace("</Tr>","</tr>")
    content=content.Replace("<tR","<tr")
    content=content.Replace("</tR>","</tr>")

    content=content.Replace("<TH","<th")
    content=content.Replace("</TH>","</th>")
    content=content.Replace("<Th","<th")
    content=content.Replace("</Th>","</th>")
    content=content.Replace("<tH","<th")
    content=content.Replace("</tH>","</th>")

    ' ...many more lines of this...
 
  Catch ex As Exception
      Throw Ex
  End Try
End Sub

The replacements listed above appear exactly as shown and are the only replacements for the given tag. Luckily, Ben was able to convert the entire thing into a RegExp.
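The article doesn't show Ben's regular expression, so what follows is only a minimal sketch of the idea, assuming the goal is simply to lowercase every tag name. The module, the `LowercaseTagNames` helper, and the pattern are all hypothetical, not Ben's actual code. It relies on .NET's `Regex.Replace` with a match evaluator, so each tag name is lowercased in one pass instead of enumerating case permutations by hand:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TagCase
    ' Hypothetical pattern: matches "<" or "</" followed by a tag name.
    ' Group 1 keeps the bracket (and slash); group 2 is the name to lowercase.
    ' Attributes and text content are not matched, so they are left untouched.
    Private ReadOnly TagName As New Regex("(</?)([A-Za-z][A-Za-z0-9]*)", RegexOptions.Compiled)

    ' Hypothetical helper: returns a copy of content with every tag name
    ' lowercased, e.g. "<TaBlE border=1>" becomes "<table border=1>".
    Public Function LowercaseTagNames(content As String) As String
        Return TagName.Replace(content,
            Function(m As Match) m.Groups(1).Value & m.Groups(2).Value.ToLowerInvariant())
    End Function

    Sub Main()
        ' Prints "<html><table><tr><th>Name</th></tr></table></html>"
        Console.WriteLine(LowercaseTagNames("<HtMl><TABLE><Tr><tH>Name</Th></tR></TABLE></hTmL>"))
    End Sub
End Module
```

Unlike `MassageHtmlTags`, this returns a new string instead of mutating a `ByRef` argument, and it leaves `<HTML` as `<html` rather than turning it into `<head`. It is still a regex over HTML, though, so something like `a<B` inside a script block or comment would be lowercased too.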
