The Daily WTF: Curious Perversions in Information Technology

skotl · 2015-12-09 Reply Admin

That's not a :wtf:, then - it's an anti- :wtf:

Good work, Eliza!

Quite · 2015-12-09 Reply Admin

TRWTF is that "+" was being translated as "E" and not "&"? Gotcha.

chrullrich · 2015-12-09 Reply Admin

And everyone in the office went "Yeeeeaaaacuuuuooiiy" in relief.

RFoxmich · 2015-12-09 Reply Admin

Wait a happy ending? That must be TRWTF.

dkf · 2015-12-09 Reply Admin

TRWTF is that the original code didn't handle Ø and ø. Why do they hate Danish and Norwegian so much?

mott555 · 2015-12-09 Reply Admin

This was DiscoSearch code, wasn't it! Oh wait, couldn't be, it was fixed at the end of the story.

emkael · 2015-12-09 Reply Admin

TRWTF is that if you tried the same trick in MySQL < 5.6 and with Polish "ł" character, you'd summon a bunch of foreigners claiming you don't know anything about your language.

FrostCat · 2015-12-09 Reply Admin

That's an impressive level of hard-headed refusal to believe local expertise. "We don't have any Polish people so we'll ignore a bunch of you saying the collation is wrong."

emkael · 2015-12-09 Reply Admin

Nah, I get where they were coming from - it was the Unicode specification that was flawed and the argument against custom patching in a non-compliant way is a vaild one (they could have backported the 5.2.0-based collation, though...). It was a strange case anyway, as ideally you'd want diacritic characters to both sort "properly" (i.e. l < ł < m) and yield equivalence in use cases like in the article.

Just found it a bit silly that the general tone of discussion was "MySQL non-Poles would rather believe Unicode non-Poles than MySQL Poles".

Also, coming back to the article - Eliza's co-worker's lucky the users didn't expect what one of my customers recently requested: that searching for "rene" returns "René", but searching for "rené" does not return "renegades". Fun fun fun!

Tsaukpaetra · 2015-12-09 Reply Admin

Not only that, but a full half of the comments were about sorting, when the bug is about comparison. :wtf: Sure, you might need to rebuild indexes, but that's not really a big deal (let it run as part of the upgrade, expected maintenance guys, woohoo!). Classic derailment.

FrostCat · 2015-12-09 Reply Admin

Sure, but when the Poles started saying "The polish collation is corrct, but the general one isn't" and the MySQL people reply with "the polish collection is correct, this isn't a bug", well, you have to wonder about their reading comprehension.

Jeff_Olson · 2015-12-09 Reply Admin

I'm also very disappointed that they didn't include the "Heavy Metal N" (n̈), ala Spin̈al Tap.

PWolff · 2015-12-09 Reply Admin

So, characters are like dates: never assume there could be a built-in function or an available library to handle them, always build your own implementation - that's the only way you can be sure it works.

RFoxmich · 2015-12-09 Reply Admin

lim code works -> ~works

chubertdev · 2015-12-09 Reply Admin

emkael:
Also, coming back to the article - Eliza's co-worker's lucky the users didn't expect what one of my customers recently requested: that searching for "rene" returns "René", but searching for "rené" does not return "renegades". Fun fun fun!

Sounds like a narrow-minded conversion.

This article reminds me of a couple of jobs ago, when we picked up a Greek client. That will test your limits on collation.

CoyneTheDup · 2015-12-10 Reply Admin

IBM DB2 is a little unusual with respect to dates and times: it has a DATE type and a TIME type and a TIMESTAMP type. And these are truly different ...the IBM date is not one of those "pansy" things that also contains a time: when they say DATE, they mean month, day, year...that's it.

Also, the types are not directly comparable. So suppose you have a two host variable that contains a starting and ending DATE, and you want to look for all the rows with a TIMESTAMP on that date.

Well, an anti-pattern that developed here is to do this:

  SELECT *
  FROM ATABLE
  WHERE DATE(EVENT) >= :START
    AND DATE(EVENT) <= :END

And this takes forever on a big table, because to evaluate the WHERE clause, the engine must laboriously retrieve every row and convert the timestamp to a date, then do the comparison; row by row, for all of the milllions(?) of rows that might match.

Because the WHERE is not indexable; you can't fast-search an index when every index key has to be converted to another type. Bad as the TRANSLATES in the article.

Consider this alternative, which is almost as easy:

  SELECT *
  FROM ATABLE
  WHERE EVENT >= TIMESTAMP(:START,'00.00.00')
    AND EVENT <= TIMESTAMP(:END,  '24.00.00')

Well, the engine looks at that, and EVENT is being compared to two constant expressions, which can be pre-folded into constant timestamps. Indexable; what we call "stage 1" processing. Warp speed.

I've changed this in several programs and, in a couple cases, cut program wall time 90% (not to mention CPU).

Collated Performance

Leave a comment on “Collated Performance”