• Sherrif Roscoe (unregistered)

    Removing French letters? I think this should be named freedomizeText() instead!

  • skicow (unregistered)

    WTF?! Why is he passing the StringBuilder byref and then returning it as well?!?

  • gunnar (unregistered)

    He forgot 'Ø' and 'ø'.

  • Jacques Troux (unregistered)

    He also forgot the French ligatures "Œ" and "œ".
    Icelanders will appreciate their "ð" being changed to "o", which is a completely unrelated letter.

  • Eric (unregistered)

    I think he also forgot to replace £,€,¥, etc. with $.

  • Mike R. (unregistered)

    Surprised they didn't replace þ with anything.

  • Jean-Philippe Daigle (unregistered)

    We'd see this everywhere if programmers were paid per line of code. ;)

  • DCD (unregistered)

    Stringbuilder is an object so passing byref is unnecessary unless he wants to change the ref.

  • Phil Scott (unregistered)

    I'm willing to bet this was VB6 code "upgraded" into a .NET project. Since ByRef was the default in VB6, the upgrade wizards probably threw it in there.

    And the developer probably at one time heard that strings were slow, so they decided to replace the String being passed in with a stringBuilder.

  • Mike R. (unregistered)

    Incidentally, þ would properly be converted to "th". They'd probably replace it with something silly like "p" ... which would change its name, þorn, to something... err, less... wholesome ;)

  • Adam W. (unregistered)

    Pedantic correction: it's spelled "y'all", seeing as it's a contraction of "you all".

  • WanFactory (unregistered)

    Unless my understanding of VB(.NET?) is severely flawed, I think both the byref and the return are completely unnecessary.

    If someone knows differently please speak up, I'm just a Java guy making possibly unwarranted assumptions.

  • Phil Scott (unregistered)

    Totally of subject, but HOLY CRAP look what's on the consulting page from the oracle dude from yesterday: http://www.dba-oracle.com/redneck.htm

  • Phil Scott (unregistered)

    Errr, "totally off subject"...

  • Jim Bolla (unregistered)

    While this is certainly amusing, and its implementation may be flawed, the need for it could actually be justified if for instance one were building an app that had to talk to a legacy system that would explode on all those crazy characters.

  • josh (unregistered)

    I look forward to seeing the function that cleans up Chinese. :)

  • AvonWyss (unregistered)

    It's certainly not a very nice piece of code, but on the other hand, I have wished to find some function in the framework to convert those special chars to ASCII chars. But my search has not been successful. They didn't even bother to do a meaningful conversion: when you call System.Text.Encoding.ASCII.GetBytes("öäü"), you only get the bytes for '?' in the resulting byte array. That's not useful, and I believe this kind of problem leads to such inventions.
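
For readers who want to see the behavior described above without a .NET project at hand, here is a minimal Python analogue. It is not the .NET API: Python's `str.encode` with `errors="replace"` is only a stand-in for the lossy ASCII conversion being complained about.

```python
text = "öäü"

# Strict ASCII encoding refuses characters it cannot represent.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With errors="replace", each unencodable character becomes "?".
replaced = text.encode("ascii", errors="replace")

print(strict_failed)
print(replaced)
```

Either way the original letters are gone, which is the complaint: the encoder has no opinion about what "ö" ought to become.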

  • mike roome (unregistered)

    "Icelanders will appreciate their 'ð' being changed to 'o', which is a completely unrelated letter. "

    Unless it's uppercase, of course, in which case it gets changed to D...

    ----

    "on the other hand, I have wished to find some function in the framework to convert those special chars to ASCII chars."

    That's impossible to do in a general way, though. You can't just replace a letter with a diacritic with the corresponding letter without the diacritic, since that can change the meaning of words entirely, and any translation of non-ASCII characters to sequences of ASCII characters (for instance, German ö can be replaced with oe) is highly culture dependent. And that's without even contemplating non-Roman scripts, where the translation of individual codepoints is not only culture dependent, but also dependent on what romanisation method you use.

    If you need to store things in a legacy system that only stores 8-bit encodings, the correct way to handle it is to use UTF-8. If you need the text to be readable ASCII, then you need to constrain the input to reject any non-ASCII characters, which will allow people to choose what ASCII representation they want, rather than automatically producing something that's horribly wrong.
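
The point about diacritics can be illustrated with Unicode normalization. The sketch below is Python, not the VB.NET from the article, and `strip_combining_marks` is a hypothetical helper name. It decomposes each character (NFD) and drops the combining marks, which is roughly the "general" approach people reach for.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Decompose to NFD, then drop combining marks (accents, umlauts, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Letters built from a base letter plus a mark lose the mark.
accented = strip_combining_marks("Zoë László")

# German "schön" (beautiful) silently becomes "schon" (already).
german = strip_combining_marks("schön")

# Letters with no decomposition are left untouched.
untouched = strip_combining_marks("ðøæþ")

print(accented, german, untouched)
```

Two things fall out of this: the German word changes meaning, and ð, ø, æ and þ are separate letters rather than accented ones, so normalization leaves them alone. Any mapping for them is a per-culture decision, as argued above.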

  • Alex Papadimoulis (unregistered)

    Phil, I believe you just provided the perfect link for "Yeeeee haw".

  • Tony Perrie (unregistered)

    Also, this would have been way funnier if "them" was "'em" or as we from Appalachia say, "'em 'ere funny peacenik markings"

  • Ilya Haykinson (unregistered)

    Actually, if you search Google you can see that ya'll is a perfectly accepted alternative to y'all, albeit less frequently used.

    In my opinion, y'all should be used to mean "you-all" which is the plural of "you", while ya'll should mean "all y'all" which means "all you-all" which is the equivalent of "all of you[plural]".

    Or something like that.

  • Phil Scott (unregistered)

    Taking more pot shots:

    sbBuilder? StringBuilderBuilder?

  • Jon (unregistered)

    But they are two totally different words mind you. Y'all can be used when conversing with one to three people, or two people and up to two live animals.

    "All y'all" is only used when talking to four or more people, six sheep, or any combination thereof.

    Furthermore, pluralizing "all" so that you have "All's y'all" is used when discussing a large, yet abstract group of individuals (or livestock).

  • Michael Giagnocavo (unregistered)

    "System.Text.Encoding.ASCII.GetBytes("öäü"), you only get the bytes for '?' "

    -- Um, yes, what did you expect? What about when you pass in Han characters to ASCII.GetBytes? What do you want it to do? Passing in invalid data shouldn't do silly magic behind the scenes. Be happy it's a '?' and not an exception :).

  • Phil McCracken (unregistered)

    OK, I made a similar function that was used for a curse word filter. It replaced certain characters with other characters that they resembled like "|" with "I" so people couldn't pick a username like "SH|THEAD".

  • Josh (unregistered)

    This is way off topic too, but about the Oracle guy. Follow the redneck link given above, and look on the right. One of the banners is for guide-horses, like guide dogs, only your kids can ride them.

    http://www.guidehorse.org/

    If this site isn't a sham, and this guy gets the rates that he's asking, I need to become an Oracle DBA.

  • Paul (unregistered)

    You evil bastard... you know some dumb Americans will rip off this code for their webpages. Good for a laugh, but seriously, I think Americans vilify the French enough without making a concerted effort to corrupt their language (and that of every other European nation using the Latin alphabet as well).

  • Kylector (unregistered)

    Oh goodness, this isn't about nationalism, the French, or the Americans. It's about laughable code. Who cares if a stupid American rips it off for some "evil" use, I'm sure stupid French people rip things off for "evil" use, too. It seems as though the "rest of the world" likes to call Americans arrogant and stupid, but the "rest of the world" sounds just as arrogant and stupid by saying it. Americans are not all the same, just like Europeans are not all the same, and Asians are not all the same. To lump every person in a country into a stereotype is about as arrogant and ignorant as you can get.

  • anonymous (unregistered)

    What makes this especially funny is that the author's name is French.

    Still, copy-and-paste coding isn't always bad. The above isn't any better or worse than defining a lookup table and looping through it to do the search-and-replace. Same difference, really.

    The problem is that the code does what is intended, if that really is a problem. The code just might have been written in a situation where 7-bit ASCII output really was required. Email addresses, for example.

    I've seen plenty of good code and bad code. Truly bad code is a bunch of big-ball-of-mud files without any defined interfaces, can't be summed up in a single WTF post, and is all too common. At least the above mistake is sufficiently abstracted (in a function) that you can fix it in that one place without having to grok tens of KLOCs of other WTF.

  • foxyshadis (unregistered)

    The funny thing is that PHP has a function that does exactly this, in one quick call:

    strtr($string, 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy')

    And it's usually misused and abused there too. ("Hey, these characters look similar, they must be the same!") I'm willing to bet whoever originally wrote replaceDiacritics (which mostly aren't diacritics) used that exact example above to do it.

    That is the winning link of the week, hands down.

  • Gary Wheeler (unregistered)

    Nobody's commented on how slow this is going to be. You end up making umpteen loops through the string, one for each call of the Replace() function. If you're going to do something as stupid as this, you could at least do it efficiently (one loop through the string, a simple range check on the character, and then search through a lookup table).
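
The single-pass approach described above might look like the following. This is a Python sketch rather than the article's VB.NET, and both the `REPLACEMENTS` table and the `replace_single_pass` name are hypothetical. The table is deliberately tiny and is not the article's actual list.

```python
# Hypothetical, deliberately incomplete lookup table.
REPLACEMENTS = {"à": "a", "é": "e", "ö": "o", "Ø": "O", "ø": "o"}


def replace_single_pass(text: str, table: dict = REPLACEMENTS) -> str:
    """Walk the string once; only non-ASCII characters hit the table."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Simple range check: plain ASCII passes straight through.
            out.append(ch)
        else:
            # Unknown characters are kept as-is rather than guessed at.
            out.append(table.get(ch, ch))
    return "".join(out)


result = replace_single_pass("café Ørsted")
print(result)
```

The cost is one pass over the input plus a dictionary lookup per non-ASCII character, instead of one full scan of the string per Replace() call.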

  • God (unregistered)

    1) "Removing French letters? I think this should be named freedomizeText() instead!"

    2) "I think he also forgot to replace £,€,¥, etc. with $."

    ---

    Evidently you are both Americans.

    1) The war was unjustified, pointless and whatnot; Chiraq was right, Blair and Bush were wrong.

    2) Unless it also includes the maths to convert the numbers, that would be stupid, as £1 is not $1; it's more like $1.80, because your currency is pretty weak at this moment in time.

  • Jesus (unregistered)

    As usual, God needs to get a sense of humor.

    Why don't you just go and smite yourself.

  • Moses (unregistered)

    Oh I dunno, there was that law about not coveting your neighbour's ass. That was quite funny - it would be if you saw my neighbour - oy vey.

  • Damian Cugley (unregistered)

    I have no idea why they wanted this replaceDiacritics function, but I can think of a plausible reason: when I create web pages about photos of friends, I want to include their name in the URL, but URLs are restricted to plain US-ASCII, so a hypothetical Zoë László would have her photos in zoe-laszlo.html.

    Python, Perl and PHP have easy string-map functions (for single-byte character data, at any rate); I don’t think dot-Net does.

  • phnk (unregistered)

    Hope this function will help your 26-chars limited brains to understand the world a bit better :)

  • josh (unregistered)

    Why phnk, whatever are you

  • phnk (unregistered)

    Long story…

  • Baf (unregistered)

    I've written a function similar to this myself. My reason: sorting. I needed a way to tell a rather naive system that Õhm comes between Oglethorpe and Oldman, and the easiest way to do this was to create sort keys without diacriticals (or punctuation, or capitalization). The sort keys were used solely internally, not displayed to the user.
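
A sort key of the kind described above can be sketched in a few lines of Python. The `sort_key` name is hypothetical, and this is not the original implementation. It drops diacritical marks, punctuation and spaces, and folds case. The key is only used for ordering, and the original names are what get displayed.

```python
import unicodedata


def sort_key(name: str) -> str:
    """Internal key: no diacritical marks, punctuation, spaces or case."""
    decomposed = unicodedata.normalize("NFD", name)
    # Combining marks and punctuation are not alphanumeric, so they drop out.
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()


names = ["Oldman", "Õhm", "Oglethorpe"]

naive = sorted(names)                # plain code point order
keyed = sorted(names, key=sort_key)  # order by the stripped key

print(naive)
print(keyed)
```

In plain code point order Õhm lands after every ASCII name; with the key it sorts between Oglethorpe and Oldman. Where a real collation library is available it is usually the better tool, since correct ordering also depends on the language.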

    So if you're laughing at the supposed American insularity of this code, I can't join you until I know what it was used for. There are legitimate reasons to want to strip diacriticals from a string.

    I can, however, laugh at it for taking about 40 lines of code to do it.

  • Raymond Lewallen (unregistered)

    biggest wtf here is returning the object sent by reference.

  • non_Dev (unregistered)

    @Raymond Lewallen
    Why is that a wtf? I think I speak for all of us when I say we've learned the importance of doing things at least twice.

  • Tomalak (unregistered)

    I can think of at least one good reason to do things like this: imagine something like an intranet phone book in an international firm. Nobody wants to vgrep-cut-and-paste the German 'ö' from some dubious charset software just because the DARN thing is missing on the Swedish-layout keyboard and there is no other way to get the desired phone number.

    That does not mean that the 'ö' is not going to display in the intranet page, but the application is much more usable when it displays some "could be" matches.

    Despite the lengthy and maybe naive implementation, I can't see a WTF here.

  • Daniel (unregistered)

    biggest wtf here is returning the object sent by reference.

    Why do peeps keep saying this? Why would you assume that passing parameters by reference automatically means that you don't want to be able to use chaining?

    Or am I misunderstanding VB here? (Having managed to stay clear of it so far)

  • gurra g (unregistered) in reply to foxyshadis
    foxyshadis:
    The funny thing is that PHP has a function that does exactly this, in one quick call:

    strtr($string, 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy')

    Except it can't handle multi-character replacements such as æ->ae.

    The code in the article is easier to maintain, less error-prone and can handle special cases such as æ, so you fail.
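
The same limitation shows up in Python's translation tables, which makes for a compact illustration. This is a Python sketch with a hypothetical, incomplete mapping, not the PHP call quoted above. The two-string form of `str.maketrans` only pairs characters one to one, while the dictionary form allows a replacement of any length.

```python
# Two-string form: strictly one character in, one character out.
one_to_one = str.maketrans("äöü", "aou")

# Unequal lengths are rejected, so "æ" -> "ae" cannot be expressed this way.
try:
    str.maketrans("æ", "ae")
    rejected = False
except ValueError:
    rejected = True

# Dictionary form: each key is one character, the value may be longer.
# The entries here are illustrative only.
multi = str.maketrans({"æ": "ae", "Æ": "AE", "ö": "o"})

simple = "Köln".translate(one_to_one)
expanded = "Encyclopædia".translate(multi)

print(rejected, simple, expanded)
```

A table of explicit pairs like `multi` is also easier to review than two long parallel strings, where a single misplaced character shifts every mapping after it.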
