• (disco)

    That's not a :wtf:, then - it's an anti- :wtf:

    Good work, Eliza!

  • (disco)

    TRWTF is that "+" was being translated as "E" and not "&"? Gotcha.

  • (disco)

    And everyone in the office went "Yeeeeaaaacuuuuooiiy" in relief.

  • (disco)

    Wait a happy ending? That must be TRWTF.

  • (disco)

    TRWTF is that the original code didn't handle Ø and ø. Why do they hate Danish and Norwegian so much?

  • (disco)

    This was DiscoSearch code, wasn't it! Oh wait, couldn't be, it was fixed at the end of the story.

  • (disco)

    TRWTF is that if you tried the same trick in MySQL < 5.6 and with Polish "ł" character, you'd summon a bunch of foreigners claiming you don't know anything about your language.

  • (disco) in reply to emkael

    That's an impressive level of hard-headed refusal to believe local expertise. "We don't have any Polish people so we'll ignore a bunch of you saying the collation is wrong."

  • (disco) in reply to FrostCat

    Nah, I get where they were coming from - it was the Unicode specification that was flawed and the argument against custom patching in a non-compliant way is a vaild one (they could have backported the 5.2.0-based collation, though...). It was a strange case anyway, as ideally you'd want diacritic characters to both sort "properly" (i.e. l < ł < m) and yield equivalence in use cases like in the article.

    Just found it a bit silly that the general tone of discussion was "MySQL non-Poles would rather believe Unicode non-Poles than MySQL Poles".

    Also, coming back to the article - Eliza's co-worker's lucky the users didn't expect what one of my customers recently requested: that searching for "rene" returns "René", but searching for "rené" does not return "renegades". Fun fun fun!

  • (disco) in reply to FrostCat

    Not only that, but a full half of the comments were about sorting, when the bug is about comparison. :wtf: Sure, you might need to rebuild indexes, but that's not really a big deal (let it run as part of the upgrade, expected maintenance guys, woohoo!). Classic derailment.

  • (disco) in reply to Tsaukpaetra

    Sure, but when the Poles started saying "The polish collation is corrct, but the general one isn't" and the MySQL people reply with "the polish collection is correct, this isn't a bug", well, you have to wonder about their reading comprehension.

  • (disco) in reply to dkf

    I'm also very disappointed that they didn't include the "Heavy Metal N" (n̈), ala Spin̈al Tap.

  • (disco)

    So, characters are like dates: never assume there could be a built-in function or an available library to handle them, always build your own implementation - that's the only way you can be sure it works.

  • (disco) in reply to PWolff

    lim code works -> ~works

  • (disco) in reply to emkael
    emkael:
    Also, coming back to the article - Eliza's co-worker's lucky the users didn't expect what one of my customers recently requested: that searching for "rene" returns "René", but searching for "rené" does not return "renegades". Fun fun fun!

    Sounds like a narrow-minded conversion.

    This article reminds me of a couple of jobs ago, when we picked up a Greek client. That will test your limits on collation.

  • (disco)

    IBM DB2 is a little unusual with respect to dates and times: it has a DATE type and a TIME type and a TIMESTAMP type. And these are truly different ...the IBM date is not one of those "pansy" things that also contains a time: when they say DATE, they mean month, day, year...that's it.

    Also, the types are not directly comparable. So suppose you have a two host variable that contains a starting and ending DATE, and you want to look for all the rows with a TIMESTAMP on that date.

    Well, an anti-pattern that developed here is to do this:

      SELECT *
      FROM ATABLE
      WHERE DATE(EVENT) >= :START
        AND DATE(EVENT) <= :END
    

    And this takes forever on a big table, because to evaluate the WHERE clause, the engine must laboriously retrieve every row and convert the timestamp to a date, then do the comparison; row by row, for all of the milllions(?) of rows that might match.

    Because the WHERE is not indexable; you can't fast-search an index when every index key has to be converted to another type. Bad as the TRANSLATES in the article.

    Consider this alternative, which is almost as easy:

      SELECT *
      FROM ATABLE
      WHERE EVENT >= TIMESTAMP(:START,'00.00.00')
        AND EVENT <= TIMESTAMP(:END,  '24.00.00')
    

    Well, the engine looks at that, and EVENT is being compared to two constant expressions, which can be pre-folded into constant timestamps. Indexable; what we call "stage 1" processing. Warp speed.

    I've changed this in several programs and, in a couple cases, cut program wall time 90% (not to mention CPU).

Leave a comment on “Collated Performance”

Log In or post as a guest

Replying to comment #:

« Return to Article