If there’s one big problem with XML, it’s arguably that XML is overspecified. That’s not all bad: it means that every behavior, every option, every approach is documented, schematized, and defined. That might result in something like SOAP, which creates huge, bloated payloads, involves multiple layers of wrapping tags, integrates with discovery schemas, and has additional federation and in-built security mechanisms, each of which is itself defined in XML. And let’s not even start on XSLT and XQuery.

It also means that if you have a common task, like embedding arbitrary content in a safe fashion, there’s a well-specified and well-documented way to do it. If you did want to embed arbitrary content in a safe fashion, you could use a CDATA section: <![CDATA[Here is some arbitrary content]]>. It’s not a pretty way of doing it, but it means you don’t have to escape anything but ]]>, which is only a problem in certain esoteric programming languages with rude names.
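For the curious, here is a minimal sketch of what that looks like in practice, using Python's standard library. The `to_cdata` helper and the `<Note>` element are made up for illustration. Splitting an embedded `]]>` across two adjacent CDATA sections is one common workaround, not the only one.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    The only sequence that needs care is the terminator ']]>'. Here it is
    split so that ']]' ends one section and '>' begins the next.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Markup characters, an ampersand, and even the terminator itself.
content = "<b>Here is some arbitrary content</b> & a rude ]]> sequence"

# <Note> is an assumed element name, not something from the original files.
doc = "<Note>" + to_cdata(content) + "</Note>"

# The parser hands the content back untouched, with no child elements.
note = ET.fromstring(doc)
parsed = note.text
print(parsed == content)
```

Nothing inside the section gets escaped by hand, and the parser treats all of it as plain text rather than markup.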

So, there’s an ugly, but perfectly well specified and simple to use method of safely escaping content to store in XML. You know why we’re here. Carl W was going through some of the many, many gigs of XML data files his organization uses, and found:

<CommandLine>&amp;lt%3bPATH&amp;gt%3bSOME_VALUE_HERE&amp;lt%3b/PATH&amp;gt%3b</CommandLine>

The specific sequence of mangling operations that was performed isn’t documented anywhere, but you can figure it out. To decode this, you first have to convert the XML character entities back into actual characters, which really is just the ampersands.

Now you have: &lt%3bPATH&gt%3bSOME_VALUE_HERE&lt%3b/PATH&gt%3b.

This is obviously URL encoded. So we can reverse that, yielding &lt;PATH&gt;SOME_VALUE_HERE&lt;/PATH&gt;.

Now, we can decode the character entities here.

<PATH>SOME_VALUE_HERE</PATH>
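The three steps above can be sketched in a few lines of Python, using only the standard library. This is a reconstruction of the decoding, assuming the mangling was applied in exactly the reverse order. The original encoder isn't documented, so treat the variable names and the order as guesses that happen to fit the data.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import unquote

raw = (
    "<CommandLine>&amp;lt%3bPATH&amp;gt%3bSOME_VALUE_HERE"
    "&amp;lt%3b/PATH&amp;gt%3b</CommandLine>"
)

# Step 1: let the XML parser resolve the real entities (just &amp; -> &).
step1 = ET.fromstring(raw).text

# Step 2: undo the URL encoding (%3b -> ;).
step2 = unquote(step1)

# Step 3: decode the now-complete character entities (&lt; -> <, &gt; -> >).
step3 = html.unescape(step2)

for stage in (step1, step2, step3):
    print(stage)
```

The order matters here: until the `%3b` sequences are turned back into semicolons, the `&lt` and `&gt` fragments aren't complete entity references at all.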

XML documents nest quite neatly, so why even do this escaping rigamarole? If you don't want it as XML, why not use CDATA? Why URL encode any of this? Carl had neither the time nor the documentation to figure it out. He simply changed SOME_VALUE_HERE to NEW_VALUE_HERE, and moved on to the next problem.
