The Daily WTF: Curious Perversions in Information Technology

2023-07-04 Reply Admin

At least they used JSON (which was probably the buzzword of the year) and not tortured XML.

By the way, happy Frouth of July.

2023-07-04 Reply Admin

URLs with spaces. Proper punkers. Oi!

2023-07-04 Reply Admin

If you believe Lisp fanboys, XML is "just" verbose sexprs. JSON sans objects is rather more equivalent to sexprs (as long as you don't distinguish atoms and strings), therefore transitively, this is XML.

Of course, the equivalence between XML and sexprs is very dubious: so far as I've ever been able to tell, a mapping scheme which translates arbitrary XML (after entity expansion) into sexprs and back is not one which turns arbitrary sexprs into XML and back. So lisp fanboys would also be a WTF.

2023-07-04 Reply Admin

JSON's mistake of not permitting comments and trailing commas makes it horrible for humans to edit, although it's used like that every day.

XML fails for humans to edit because the closing tag always has to be matched up.

sexprs at least didn't require verbose closing tags, but the last line of your file was always '))))))))))))'.

YAML is sorta OK, except it's expensive to parse because of those labeled references. At least it permits a relaxed JSON when you don't feel like using indentation as a syntax.

I can't wait for the AI people to let us start writing config in natural language. Won't that be wonderful?

Nutster · 2023-07-04 Reply Admin

It sounds like the primary metric is not SLOC, but MTBF: Mean Time Between Failures.

2023-07-04 Reply Admin

XML requiring matching close tags is a WTF on its own: it's the one thing that prevents the grammar from being context free (contrary to popular belief, "possibly misnested XML" is a regular language regardless of this).

This would be less of a WTF if XML wasn't a subset of SGML, which allows a syntax that is simply </> to close the current element. I have no idea why this didn't make it into XML, and my only hypothesis is that SGML's own facility for defining what shorthands are allowed doesn't have a specific switch to allow just empty close tags without also enabling empty open tags or something.

Empty open tags are a bit less helpful: do they open another copy of the currently open element, or of the most recently closed element? Still not the most WTF bit of SGML, which is an excellent demonstration of the principle that "easy to generate manually" and "easy to parse mechanically" are very hard indeed to reconcile.

PotatoEngineer · 2023-07-04 Reply Admin

On the YAML front, I don't like thinking of it as JSON-compatible: the whitespace is significant, just like in Python. The only JSON that's YAML-compatible is one that's been pretty-printed with just the right settings, and the only YAML that's JSON-compatible is one using just the right delimiters. Whether it's written by humans or converted by machines, that JSON-YAML compatibility is awfully fragile.

Mr. TA · 2023-07-04 Reply Admin

I just looked up YAML reference out of curiosity and it's one more proof that software design on drugs is a persistent problem.

Happy independence day, fellow Americans!!

Paddles · 2023-07-05 Reply Admin

"Programming in natural language" is older than I am. It's called COBOL, and I'm in no hurry to return to it.

R3D3 · 2023-07-05 Reply Admin

JSON's mistake of not permitting comments and trailing commas makes it horrible for humans to edit, although it's used like that every day.

I agree on trailing commas, but on comments I am somewhat split. I can perfectly see situations where a consumer rejects JSON with extra keys, so people start adding custom metadata in comments. But yes, the argument is weak, and I'd prefer having them.

R3D3 · 2023-07-05 Reply Admin

I don't particularly much like XML myself. But the closing tags are really not what I'd complain about. They are verbose, but that's it.

What I'd rather complain about is the enormous complexity, that results in everyone implementing the same concepts completely differently. Though I can see some value even there.

Example: You want to represent a list of input values, let's say a mass and a position. In JSON, you'd probably come up with a substructure along the lines of

"properties": {
    "mass": 1.53,
    "position": [0, 0, 10]
}

In XML, at this level this already starts with multiple questions: Attributes or text inside nodes? In our own config files I've seen both equivalents of

<mass value="1.53"/>

and

<mass>1.53</mass>

Things get weirder for the array, because XML has no natural way to represent this. I've seen all of

<position value="0 0 10"/>     (1)

<position>0 0 10</position>    (2)

<position>                     (3)
   <x>0</x>
   <y>0</y>
   <z>10</z>
</position>

and I suppose there are many more possible variations, as people try to balance the verbosity and need for conventions of (3) against the lack of structure in (1) and (2).

Now a new requirement comes along: You have so far been representing lengths in meters and masses in kilograms, but some manager or customer is adamant about the need to represent different unit systems in the XML file. Or at the very least, the XML file should actually specify the units. For XML the extension paths are reasonably easy using attributes.

<mass value="1.53" units="kg">
<mass units="kg">1.53</mass>

For the position, same concept. These input files are now also compatible with previous versions of the software, as long as the units correspond to the previous convention, as the previous version will just ignore the added attributes.

R3D3 · 2023-07-05 Reply Admin

(continued)

So... how do you do the same in JSON?

"mass_units": "kg",
"mass": 1.53,

is just awful at first glance, but might be OK for solving the "human readable and backwards compatible" aspect, if so required given the alternatives. A separation

"unit_system_info": {
    "mass": "kg",
    "length": "meters"
},
...
"proprties": {
    "mass": 1.53,
    "position": [0, 0, 10]
}

would work, but still be awkward. Also, not robust for allowing mixed units (e.g. milimters and meters) in the file, which the XML version is, and it separates the pieces of needed information awkwardly from a human-reader perspective.

A change to

"mass": {"value": 1.53, "units": "kg"},
"position": {"value": [0, 0, 10], "units": "meters"}

lacks the backwards-compatibility property, and makes the data less human-readable due to its verbosity. A form

"mass": [1.53, "kg"]

would look natural to a human reader, but be weirdly error-prone from a parsing perspective, and lead to weird cases like

"position": [[0, 0, 10], "meters"]

which start looking very weird under pretty-printing

"position": [
    [0, 0, 10],
    "meters"
],

which again fails the human-readability aspect.

TL;DR: Whatever to complain about the capabilities of common data representation methods, whether it needs a closing tag or not is really not high on my list of concerns.

2023-07-05 Reply Admin

Honestly I like verbose closing tags for the simple reason that they make parsing by both humans and machines easier. Consider the nesting example R3D3 posted only instead of one level you have 5 or 50 and suddenly having named closing tags make thing a lot easier to figure out at a glance.

Verbosity isn't always bad. Too much information can be discarded mentally, too little can't be created.

Watson · 2023-07-05 Reply Admin

On the subject of comments in JSON, the fact that JSON was designed as a wire protocol between processes, and not as some sort of file format, is all by itself enough to explain why it doesn't have comments. It was never supposed to sit around to be edited by humans. But yeah, adding comments to JSON would just encourage people to put semantically-significant stuff in those comments.

R3D3 · 2023-07-05 Reply Admin

Verbosity isn't always bad. Too much information can be discarded mentally, too little can't be created.

Mixed bag on that. While I agree that XML can make it easier to navigate large files, the verbosity is quite frequently so extreme, that it severely hurts readability.

I'm glad that our code-base went for the

<mass value="1.53" units="kilograms"/>

method, as a find

<mass units="kilograms">1.53</mass>

to be awful for human-readability and I often need to edit these files manually for development purposes.

R3D3 · 2023-07-05 Reply Admin

Out of curiosity: What happens in various formats with "duplicate keys"?

JSON:

properties: { 
    "mass": 1.53, 
    "mass": 1.52
},

XML:

<properties>
    <mass value="1.53"/>
    <mass value="1.51"/>
</properties>

In JSON, it would be obvious that something is going wrong. From what I've been reading, real-world parsers may not treat it is an error, but silently use the first or last value, or even try to preserve the duplicate key somehow in anticipation of possibly ill-formed input. But at least it is clear, that there is an issue.

With XML, it is not clear at all. In our own code base it has gone the very weird way of just ignoring such inputs entirely... So the two would be the same to our software:

(a) <properties><mass value="1.53"/><mass value="1.52"/></properties>
(b) <properties/>

2023-07-05 Reply Admin

Personally I would have vent for <mass> <value>1.53</value> <unit>kilograms</unit> </mass>.

If XML tags are properly indented and newlined they look much better and are easier to read than attributes.

2023-07-05 Reply Admin

In XML, they're not duplicate keys unless there's a schema which says that the 'mass' element cannot be repeated. It's perfectly valid XML as-is, and indeed, that's how you'd represent an array structure that allows multiple values. So it's up to the parser... it might reject it as non-conformant with the schema, or it might be fine because it does conform to the schema, or a non-validating parser might do just about anything...

Steve_The_Cynic · 2023-07-05 Reply Admin

With XML, it is not clear at all.

Maybe. If you have a validating parser and a correct description of the schema, the double <mass ...> might be treated as invalid or as what amounts to a two-valued parameter (effectively an array of masses).

So the two would be the same to our software:

I would hope that it treats <properties><mass value="1.53"/><mass value="1.52"/></properties> as being equivalent to <properties></properties> because of the subtle difference between <properties></properties> and <properties/>. (As I understand it, the first contains an empty text element, while the second contains no text element.)

2023-07-05 Reply Admin

Those are Tableau urls - the spaces in the URL is the dashboard name so spaces are both permitted and implemented - mostly by devs who have migrated from Excel. The worrying thing there is that they are scraping against a server they already have access to, probably their own vis-server, so begs the question, why bother with a scrape? Why not just go to source and pull the data from their own data server? Unless ofc, they don't know how to as the data has been re-arranged in the tool. But then, they could just extract the sql from the desktop log, server log or even the data server profiler.

R3D3 · 2023-07-05 Reply Admin

or a non-validating parser might do just about anything...

That's my interpretation of what our logic does. At least whenever we define new data entries in the XML file, we never update any kind of schema.

Unexpected duplicate entry? Just ignore it. To be fair, the files are expected to be generated by the GUI side of the project, so it isn't a real issue. The bigger annoyance is, that this pattern is mostly a result of copy-paste code for handling the "exactly one element with this tag" case, instead of creating a utility function for it. And lack of test coverage means, that it's not an easy fix either.

I would hope that it treats <properties><mass value="1.53"/><mass value="1.52"/></properties> as being equivalent to <properties></properties> because of the subtle difference between <properties></properties> and <properties/>. (As I understand it, the first contains an empty text element, while the second contains no text element.)

Didn't catch that one... The logic is not on the level of XML parsing though, but on the level of interpreting the parsed DOM, and we never access the text elements, except when the node is not expected to contain anything but the text element.

Which in turn means that

<mass>1.53<junk><data/></junk></mass>

would probably be interpreted the same as

<mass>1.53</mass>

In JSON such "room for interpretation" cases would occur less frequently I think.

2023-07-06 Reply Admin

Funny, environment described in article is just like my current place. Just add some unlimited amount of stored procedures (Oracle) copied around gazillion schemas; and absolutely no repositories (well, some history is preserved - copied around - procedure A, A_2, A_3...). No one sees all the code (because 'security'), there is no documentation. Any simple development task takes forever, because: "Hey, do you know anything about procedure X? I see a reference, uncommented declaration, but not code. Any idea of what it does?" If you manage to track down someone knowledgeable, then best you get is hour long meeting of vague mutterings a la "yeah, i've seen it once, no idea really, it's been there before me, certainly there is documentation somewhere, let me just arrange another meeting or two..."

Monitoring, if any, is fragmented. If clients complain because some large process silently fails, then we start again at a general meeting, finding out who might know something about a critical part. Often you hear something like "Aha! I knew it, there is logging! Message "something went wrong" (literally) was sent to my email... which i do not use anymore... oopsie. No, i have no idea what exactly went wrong. This is in catch-all-exceptions block."

Of course we cannot add central logging, because security. No changes to architecture are possible, because security. Well, and this system has grown for 30 years, therefore it is pinnacle of evolution. Did I mention security? Since no privileges are given to developers, access rights are shared by sharing passwords. Instead of neccessary rights you get full rights to a schema. Some older ones are really powerful..

And of course all the development has to happen in production database - walking on eggshells, fingers crossed, praying to your deity of choice. Testing and prelive databases actually exist, but those are decades outdated, have their own mix of schemas, no connections to other key systems, and no migration is possible. Because security.

2023-07-06 Reply Admin

Btw, wtf is WTF commenting system. I get error "message too long", but no indication of what the limit might be. Textarea has hard limit, but it is somewhat larger than backend allows.

2023-07-06 Reply Admin

If you want comments in JSON, why not create a "comment" tag?

2023-07-06 Reply Admin

I know it's a matter of taste but I just hate that. It just looks like and is an attempt to cram way too much into a single tag. The proper way to handle it would be: <mass><unit>"kilograms"</unit><value>1.53</value></mass> Newlines omitted because I don't want to fight the formatting on this site.

That way you can both extend the structure later if needed and more easily deserialize it into a mass class. And it's just more human readable. Remember, unlike JSON, XML files are supposed to be both human readable and human editable without tools.

Classic WTF: The Contractor

Leave a comment on “Classic WTF: The Contractor”