Admin
Removing French letters? I think this should be named freedomizeText() instead!
Admin
WTF?! Why is he passing the StringBuilder byref and then returning it as well?!?
Admin
He forgot 'Ø' and 'ø'.
Admin
He also forgot the French ligatures "Œ" and "œ".
Icelanders will appreciate their "ð" being changed to "o", which is a completely unrelated letter.
Admin
I think he also forgot to replace £,€,¥, etc. with $.
Admin
Surprised they didn't replace þ with anything.
Admin
We'd see this everywhere if programmers were paid per line of code. ;)
Admin
StringBuilder is a reference type, so passing it ByRef is unnecessary unless he wants to change the reference itself.
Admin
I'm willing to bet this was VB6 code "upgraded" into a .NET project. Since ByRef was the default in VB6, the upgrade wizards probably threw it in there.
And the developer probably at one time heard that strings were slow, so they decided to replace the String being passed in with a StringBuilder.
Admin
Incidentally, þ would be converted as "th". They'd probably replace it with something silly like "p"... which would change its name, þorn, to something... err, less... wholesome ;)
Admin
Pedantic correction: it's spelled "y'all", seeing as it's a contraction of "you all".
Admin
Unless my understanding of VB(.NET?) is severely flawed, I think both the byref and the return are completely unnecessary.
If someone knows differently please speak up, I'm just a Java guy making possibly unwarranted assumptions.
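To make the reference-type point concrete for readers who don't write VB, here is a rough Python analogy. Python has no ByRef, and `io.StringIO` is only standing in for .NET's `StringBuilder`; both substitutions are assumptions of this sketch, not a translation of the article's code.

```python
import io

def append_suffix(buf: io.StringIO) -> None:
    # The parameter refers to the same object the caller holds,
    # so writing to it is visible to the caller without ByRef or a return value.
    buf.write("-suffix")

def rebind(buf: io.StringIO) -> None:
    # Rebinding the parameter only changes this function's local name.
    # Changing the caller's variable is the one thing VB.NET's ByRef would add.
    buf = io.StringIO()
    buf.write("replaced")

sb = io.StringIO()
sb.write("text")
append_suffix(sb)
rebind(sb)
print(sb.getvalue())  # text-suffix
```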
Admin
Totally of subject, but HOLY CRAP look what's on the consulting page from the oracle dude from yesterday: http://www.dba-oracle.com/redneck.htm
Admin
Errr, "totally off subject"...
Admin
While this is certainly amusing, and its implementation may be flawed, the need for it could actually be justified if for instance one were building an app that had to talk to a legacy system that would explode on all those crazy characters.
Admin
I look forward to seeing the function that cleans up Chinese. :)
Admin
It's certainly not a very nice piece of code, but on the other hand, I have wished to find some function in the framework to convert those special chars to ASCII chars. But my search has not been successful. The framework doesn't even attempt a meaningful conversion: when you call System.Text.Encoding.ASCII.GetBytes("öäü"), you only get the bytes for '?' in the resulting byte array. That's not useful, and I believe this kind of problem leads to such inventions.
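For comparison, a sketch in Python (not .NET) of both behaviours: a plain ASCII encode substitutes '?' just as described, while Unicode NFKD decomposition followed by dropping the combining marks gives the "strip the accents" result. .NET exposes the same decomposition through `String.Normalize(NormalizationForm.FormD)`. The helper name `fold_to_ascii` is hypothetical, and note what it does to letters that have no decomposition.

```python
import unicodedata

def fold_to_ascii(text: str) -> str:
    # NFKD splits letters like "ö" into a base letter plus a combining mark (U+0308);
    # encoding to ASCII with errors="ignore" then drops the marks.
    # Letters with no decomposition (ø, ð, þ, æ, ß) are silently DROPPED, not transliterated.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

print("öäü".encode("ascii", errors="replace"))  # b'???'
print(fold_to_ascii("öäü"))     # oau
print(fold_to_ascii("Straße"))  # Strae  (ß lost)
print(fold_to_ascii("Øre"))     # re     (Ø lost)
```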
Admin
"Icelanders will appreciate their 'ð' being changed to 'o', which is a completely unrelated letter. "
Unless it's uppercase, of course, in which case it gets changed to D...
----
"on the other hand, I have wished to find some function in the framework to convert those special chars to ASCII chars."
That's impossible to do in a general way, though. You can't just replace a letter with a diacritic with the corresponding letter without the diacritic, since that can change the meaning of words entirely, and any translation of non-ascii characters to sequences of ascii characters (for instance, German ö can be replaced with oe) is highly culture dependent (and that's without even contemplating non-Roman scripts, where the translation of individual codepoints is not only culture dependent, but also dependent on what romanisation method you use).
If you need to store things in a legacy system that only stores 8-bit encodings, the correct way to handle it is to use UTF-8. If you need the text to be readable ASCII, then you need to constrain the input to reject any non-ASCII characters, which will allow people to choose what ASCII representation they want, rather than automatically producing something that's horribly wrong.
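A minimal Python sketch of the two options above, assuming the legacy field is 8-bit clean (it stores whatever bytes it is given, unchanged). Both function names are hypothetical.

```python
def to_legacy_bytes(text: str) -> bytes:
    # Encode as UTF-8 for an 8-bit-clean field: nothing is lost,
    # and decoding the stored bytes as UTF-8 restores the original string.
    return text.encode("utf-8")

def validate_ascii_input(text: str) -> str:
    # Reject rather than guess, so the user picks their own ASCII spelling.
    if not text.isascii():
        raise ValueError("please enter this field using ASCII characters only")
    return text

raw = to_legacy_bytes("Zoë")
print(raw)                  # b'Zo\xc3\xab'
print(raw.decode("utf-8"))  # Zoë
```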
Admin
Phil, I believe you just provided the perfect link for "Yeeeee haw".
Admin
Also, this would have been way funnier if "them" was "'em" or as we from Appalachia say, "'em 'ere funny peacenik markings"
Admin
Actually, if you search Google you can see that ya'll is a perfectly accepted alternative to y'all, albeit less frequently used.
In my opinion, y'all should be used to mean "you-all" which is the plural of "you", while ya'll should mean "all y'all" which means "all you-all" which is the equivalent of "all of you[plural]".
Or something like that.
Admin
Taking more pot shots:
sbBuilder? StringBuilderBuilder?
Admin
But they are two totally different words mind you. Y'all can be used when conversing with one to three people, or two people and up to two live animals.
"All y'all" is only used when talking to four or more people, six sheep, or any combination thereof.
Furthermore, pluralizing "all" so that you have "All's y'all" is used when discussing a large, yet abstract group of individuals (or livestock).
Admin
"System.Text.Encoding.ASCII.GetBytes("öäü"), you only get the bytes for '?' "
-- Um, yes, what did you expect? What about when you pass Han characters to ASCII.GetBytes? What do you want it to do? Passing in invalid data shouldn't do silly magic behind the scenes. Be happy it's a '?' and not an exception :).
Admin
OK, I made a similar function that was used for a curse word filter. It replaced certain characters with other characters that they resembled like "|" with "I" so people couldn't pick a username like "SH|THEAD".
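A sketch of that lookalike idea in Python: canonicalise the name by mapping characters to the letters they resemble, then check it against a banned list. The `LOOKALIKES` table and `BANNED_WORDS` set are hypothetical and far smaller than a real filter would need.

```python
# Hypothetical lookalike table: each character maps to the letter it resembles.
LOOKALIKES = str.maketrans({"|": "I", "1": "I", "!": "I", "0": "O", "$": "S", "@": "A"})
BANNED_WORDS = {"SHITHEAD"}  # hypothetical banned list

def is_username_allowed(name: str) -> bool:
    # Upper-case first, then map lookalikes, so "SH|THEAD" and "sh1thead"
    # both canonicalise to "SHITHEAD" before the substring check.
    canonical = name.upper().translate(LOOKALIKES)
    return not any(word in canonical for word in BANNED_WORDS)

print(is_username_allowed("SH|THEAD"))    # False
print(is_username_allowed("Oglethorpe"))  # True
```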
Admin
This is way off topic too, but about the Oracle guy. Follow the redneck link given above, and look on the right. One of the banners is for guide-horses, like guide dogs, only your kids can ride them.
http://www.guidehorse.org/
If this site isn't a sham, and this guy gets the rates that he's asking, I need to become an Oracle DBA.
Admin
You evil bastard... you know some dumb Americans will rip off this code for their webpages. Good for a laugh, but seriously, I think Americans vilify the French enough without making a concerted effort to corrupt their language (and that of every other European nation using the Latin alphabet as well).
Admin
Oh goodness, this isn't about nationalism, the French, or the Americans. It's about laughable code. Who cares if a stupid American rips it off for some "evil" use, I'm sure stupid French people rip things off for "evil" use, too. It seems as though the "rest of the world" likes to call Americans arrogant and stupid, but the "rest of the world" sounds just as arrogant and stupid by saying it. Americans are not all the same, just like Europeans are not all the same, and Asians are not all the same. To lump every person in a country into a stereotype is about as arrogant and ignorant as you can get.
Admin
What makes this especially funny is that the author's name is French.
Still, copy-and-paste coding isn't always bad. The above isn't any better or worse than defining a lookup table and looping through it to do the search-and-replace. Same difference, really.
The problem is that the code does what is intended. If it really is a problem, that code just might have been written in a situation where 7-bit ASCII output really was required. Email addresses, for example.
I've seen plenty of good code and bad code. Truly bad code is a bunch of big-ball-of-mud files without any defined interfaces; it can't be summed up in a single WTF post, and it is all too common. At least the above mistake is sufficiently abstracted (in a function) that you can fix it in that one place without having to grok tens of KLOCs of other WTF.
Admin
The funny thing is that PHP has a function that does exactly this, in one quick call:
strtr($string, 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy')
And it's usually misused and abused there too. ("Hey, these characters look similar, they must be the same!") I'm willing to bet whoever originally wrote replaceDiacritics (which mostly aren't diacritics) used that exact example above to do it.
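Python's closest analogue to that two-string strtr() call is `str.maketrans(from, to)` plus `str.translate`, which likewise maps one character to one character. This sketch reuses the character lists from the PHP line above; `strtr_like` is a hypothetical name.

```python
# Character lists copied from the PHP strtr() call quoted above.
FROM_CHARS = "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ"
TO_CHARS = "SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy"

# str.maketrans raises ValueError if the two strings differ in length,
# so a typo in either list fails loudly instead of misaligning the mapping.
TABLE = str.maketrans(FROM_CHARS, TO_CHARS)

def strtr_like(text: str) -> str:
    # One pass over the string; characters not in the table are left as-is.
    return text.translate(TABLE)

print(strtr_like("Ångström"))  # Angstrom
print(strtr_like("Ærø"))       # Æro  (Æ is not in the table)
```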
That is the winning link of the week, hands down.
Admin
Nobody's commented on how slow this is going to be. You end up making umpteen loops through the string, one for each call of the Replace() function. If you're going to do something as stupid as this, you could at least do it efficiently (one loop through the string, a simple range check on the character, and then search through a lookup table).
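Roughly what that single-pass version could look like, sketched in Python rather than VB.NET: one loop, an ASCII range check as the fast path, and a dictionary lookup only for non-ASCII characters. The `REPLACEMENTS` table is a hypothetical sample, not the article's list.

```python
# Hypothetical replacement table; values may be longer than one character.
REPLACEMENTS = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss", "æ": "ae", "þ": "th"}

def replace_one_pass(text: str) -> str:
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Fast path: plain ASCII is copied through without a table lookup.
            out.append(ch)
        else:
            # Non-ASCII: one dictionary lookup; unknown characters are kept unchanged.
            out.append(REPLACEMENTS.get(ch, ch))
    return "".join(out)

print(replace_one_pass("Fußgänger"))  # Fussganger
```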
Admin
2) "I think he also forgot to replace £,€,¥, etc. with $."
---
Evidently you are both Americans.
1) The war was unjustified, pointless and whatnot, Chirac was right, Blair and Bush were wrong
2) Unless it also includes the maths to convert the numbers, that would be stupid, as £1 is not $1; it's more like $1.80 cos your currency is pretty weak at this moment in time.
Admin
As usual, God needs to get a sense of humor.
Why don't you just go and smite yourself.
Admin
Oh I dunno, there was that law about not coveting your neighbour's ass. That was quite funny - it would be if you saw my neighbour - oy vey.
Admin
I have no idea why they wanted this replaceDiacritics function, but I can think of a plausible reason: when I create web pages about photos of friends, I want to include their name in the URL, but URLs are restricted to plain US-ASCII, so a hypothetical Zoë László would have her photos in zoe-laszlo.html.
Python, Perl and PHP have easy string-map functions (for single-byte character data, at any rate); I don’t think dot-Net does.
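For the URL case, a short Python sketch using the standard `unicodedata` and `re` modules; `slugify` is a hypothetical helper name. It shares the usual limitation of decomposition-based folding: letters such as ø or ß have no decomposition and would simply disappear.

```python
import re
import unicodedata

def slugify(name: str) -> str:
    # Decompose accented letters (ë -> e + U+0308, á -> a + U+0301) and drop the marks.
    ascii_text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lower-case, then collapse every run of non-alphanumerics into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

print(slugify("Zoë László") + ".html")  # zoe-laszlo.html
```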
Admin
Hope this function will help your 26-chars limited brains to understand the world a bit better :)
Admin
Why phnk, whatever are you
Admin
Long story…
Admin
I've written a function similar to this myself. My reason: sorting. I needed a way to tell a rather naive system that Õhm comes between Oglethorpe and Oldman, and the easiest way to do this was to create sort keys without diacriticals (or punctuation, or capitalization). The sort keys were used solely internally, not displayed to the user.
So if you're laughing at the supposed American insularity of this code, I can't join you until I know what it was used for. There are legitimate reasons to want to strip diacriticals from a string.
I can, however, laugh at it for taking about 40 lines of code to do it.
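The sort-key approach described above fits in a few lines of Python. This is only the naive key the comment describes (a production system would more likely use a locale-aware collation library), and `sort_key` is a hypothetical name.

```python
import unicodedata

def sort_key(name: str) -> str:
    # Strip diacritics: decompose, then drop nonspacing combining marks (category "Mn").
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Ignore punctuation, spaces and case, as the comment describes.
    return "".join(ch for ch in base if ch.isalnum()).casefold()

names = ["Oldman", "Õhm", "Oglethorpe"]
print(sorted(names, key=sort_key))  # ['Oglethorpe', 'Õhm', 'Oldman']
```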
Admin
biggest wtf here is returning the object sent by reference.
Admin
@Raymond Lewallen
Why is that a wtf? I think I speak for all of us when I say we've learned the importance of doing things at least twice.
Admin
I can think of at least one good reason to do things like this: imagine something like an intranet phone book in an international firm. Nobody wants to vgrep-cut-and-paste the German 'ö' from some dubious charset software just because the DARN thing is missing on the Swedish-layout keyboard and there is no other way to get the desired phone number.
That does not mean that the 'ö' is not going to display in the intranet page, but the application is much more usable when it displays some "could be" matches.
Despite the lengthy and maybe naive implementation, I can't see a WTF here.
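A sketch of such a "could be" lookup in Python: names are folded (accents and case removed) only for matching, while results are still shown with their original spelling. The function names and phone-book entries are hypothetical. Note that folding alone does not link "Müller" with the "Mueller" spelling, which is the culture-dependence problem raised earlier in the thread.

```python
import unicodedata
from typing import Dict, List

def fold(text: str) -> str:
    # Accent- and case-insensitive form, used only for matching, never for display.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()

def could_be_matches(query: str, directory: Dict[str, str]) -> List[str]:
    # Return display names (accents intact) whose folded form contains the folded query.
    q = fold(query)
    return [name for name in directory if q in fold(name)]

phone_book = {"Jörg Müller": "x4711", "Joerg Mueller": "x4712", "Anna Berg": "x1234"}
print(could_be_matches("muller", phone_book))  # ['Jörg Müller']
```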
Admin
Why do peeps keep saying this? Why would you assume that passing parameters by reference automatically means that you don't want to be able to use chaining?
Or am I misunderstanding VB here? (Having managed to stay clear of it so far)
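Returning the argument is about chaining, which is independent of ByRef; .NET's own `StringBuilder.Append` returns the same instance for exactly this reason. A rough Python analogy (not VB, and `add_item` is a hypothetical function):

```python
from typing import List

def add_item(items: List[str], item: str) -> List[str]:
    # Mutate the caller's list in place AND hand the same list back,
    # purely so calls can be nested; no pass-by-reference is involved.
    items.append(item)
    return items

shopping: List[str] = []
result = add_item(add_item(shopping, "eggs"), "milk")
print(result is shopping)  # True
print(shopping)            # ['eggs', 'milk']
```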
Admin
Except it can't handle multi-character replacements such as æ->ae.
The code in the article is easier to maintain, less error-prone and can handle special cases such as æ, so you fail.
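That limitation is specific to the one-to-one, two-string form (PHP's strtr() also has an array form that takes multi-character replacement pairs). In Python, passing a dict to `str.maketrans` allows multi-character replacement values in a single `translate` pass; the table below is a hypothetical sample.

```python
# Hypothetical table: dict values may be multi-character strings
# (or None to delete a character entirely).
FOLD = str.maketrans({"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "þ": "th", "Þ": "Th", "ß": "ss"})

print("Ærøskøbing".translate(FOLD))  # AEroskobing
print("þorn".translate(FOLD))        # thorn
```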