best story ever

I chuckled. =)

TRWTF is that in any real system with > 100 IDs, the limitations of what character strings you could generate would mean that each 'id' would be about a paragraph long. IOW, this story was made up.

Who says he knows any math?

His paragraphs suck.

Pretty obvious, isn't it?

I think you'll agree.

You can uniquely identify more than 100 IDs using only seven digits if you limit yourself to ones and zeroes -- 128 in fact. You can get another 128 by adding just one more digit.

If you use all ten decimal digits, you can do it in just three digits. If you add the Latin alphabet, two.
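The arithmetic behind those two posts: with an alphabet of k symbols, IDs of length L give k^L combinations, so the shortest usable length is the smallest L with k^L at least the number of IDs. A minimal Python sketch (the function name is made up for illustration):

```python
def min_id_length(n_ids: int, alphabet_size: int) -> int:
    """Smallest length L such that alphabet_size ** L >= n_ids."""
    length = 1
    while alphabet_size ** length < n_ids:
        length += 1
    return length


# "More than 100 IDs" means at least 101.
for name, k in [("binary", 2), ("decimal digits", 10), ("digits + Latin letters", 36)]:
    length = min_id_length(101, k)
    print(f"{name}: {length} characters, {k ** length} possible IDs")
```

Binary needs 7 characters (128 IDs), decimal needs 3 (1000), and 36 symbols need only 2 (1296).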

TRWTF is that in any real system with > 100 IDs, the limitations of what character strings you could generate would mean that each 'id' would be about a paragraph long. IOW, this story was made up.

Yeah, next time show your work for partial credit. 0 -- Fail

I'm pretty sure that this should be in "Best of the Sidebar", with a credit...

Unless the OP sent it to Alex a year earlier and gave up on waiting.

"The effect of drinking a Pan Galactic Gargle Pussy is like having your brains smashed out with a slice of lemon wrapped round a large gold brick."

Should have simply used NATO alphabet (even if it gets a bit long)

However, TRWTF is the clients. Come on, there are customers whose surnames literally mean Fornication, Cat's Elbow, Golden Stone, Cat, Accident, Executioner, and more funny names I can't think of right now. Not to mention all those Dicks in America. Some of them even used to read the Book of Moroni. Yeah, very funny, hahahaha. But fukumaku URLs are bad?
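A sketch of the NATO-alphabet suggestion, assuming IDs made of ASCII letters and digits. The letter words are the standard ICAO/NATO spelling alphabet; for digits it uses plain English names rather than the radiotelephony forms ("Niner", "Fife") to keep it short. The function name is hypothetical.

```python
# ICAO/NATO spelling alphabet for letters; plain English names for digits.
NATO_LETTERS = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray", "Y": "Yankee",
    "Z": "Zulu",
}
DIGIT_NAMES = dict(zip("0123456789", [
    "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine",
]))
SPELLING = {**NATO_LETTERS, **DIGIT_NAMES}


def spell_nato(identifier: str) -> str:
    """Spell an alphanumeric ID one symbol at a time, e.g. for reading it over the phone."""
    words = []
    for ch in identifier.upper():
        if ch not in SPELLING:
            raise ValueError(f"no spelling word for {ch!r}")
        words.append(SPELLING[ch])
    return " ".join(words)


print(spell_nato("3f9a"))  # Three Foxtrot Nine Alfa
```

It does get long: a 16-character hex ID becomes 16 words, but none of them can be rude.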

'di' and 'du' are not very japanesey phonemes.

It is in Nihon-shiki romanization, where ぢ and づ are romanized just like that.

Hepburn is a bit more common, though (at least in my experience), presumably because the romanization matches the pronunciation better. In that case, you'd have romanized those characters as ji and zu. However, that means there's no one-to-one mapping between romaji and characters, since ji and zu are also used for (the more common) じ and ず.

The real question is perhaps how we have both "shi" and "di" in there, since Nihon-shiki doesn't use shi, but si. (You'd once again need Hepburn for shi.) That suggests a mostly ad-hoc scheme intending to match pronunciation fairly closely, while still maintaining a one-to-one mapping (but I believe dzi and dzu are more common romanizations in that case).
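To make the one-to-one point concrete, here is a small sketch covering only the five kana discussed above (the table is a hand-picked excerpt, not a full romanization library). It lists which romaji strings each system produces for more than one kana:

```python
from collections import defaultdict

NIHON_SHIKI, HEPBURN = 0, 1

# Hand-picked excerpt: kana -> (Nihon-shiki, Hepburn)
ROMANIZATION = {
    "し": ("si", "shi"),
    "じ": ("zi", "ji"),
    "ず": ("zu", "zu"),
    "ぢ": ("di", "ji"),
    "づ": ("du", "zu"),
}


def collisions(system: int) -> dict:
    """Return the romaji strings that more than one kana maps to under the given system."""
    seen = defaultdict(list)
    for kana, forms in ROMANIZATION.items():
        seen[forms[system]].append(kana)
    return {romaji: kanas for romaji, kanas in seen.items() if len(kanas) > 1}


print("Nihon-shiki:", collisions(NIHON_SHIKI))  # no collisions, so it is reversible
print("Hepburn:", collisions(HEPBURN))          # ji and zu each stand for two kana
```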

This guy actually knows what he's talking about. That's TRWTF!

There are no "offensive" words (in the blanch-and-lose-consciousness-at-poppycock tradition, not in the "we will invade at midnight" military sense), there are only readers with such a weak internal sensitivity that it cannot sustain biological references (which are really passé for an 8 year old), and to which THEY choose to take offense. Let's not confuse misplaced puritanism with intrinsic value.

Captcha: "eros"... oh, you didn't? You can tell me about guns and violence all day, but not that, ewwww, gross "bedroom" stuff! How could you? I'm gonna call the FCC. My intellectual stability depends on me not hearing about things that surround the hips.

This is the first time in a while that I've actually burst out laughing at a Daily WTF. Well done.

It is in Nihon-shiki romanization, where ぢ and づ are romanized just like that.

Hepburn is a bit more common, though (at least in my experience), presumably because the romanization matches the pronunciation better. In that case, you'd have romanized those characters as ji and zu. However, that means there's no one-to-one mapping between romaji and characters, since ji and zu are also used for (the more common) じ and ず.

The real question is perhaps how we have both "shi" and "di" in there, since Nihon-shiki doesn't use shi, but si. (You'd once again need Hepburn for shi.) That suggests a mostly ad-hoc scheme intending to match pronunciation fairly closely, while still maintaining a one-to-one mapping (but I believe dzi and dzu are more common romanizations in that case).

This guy actually knows what he's talking about. That's TRWTF!

That's because he is a pokemon, and they came from Japan.

TRWTF is these guys were still making code changes days before the release.

Are you telling me a 9 month project can be regression tested in a day or two?

You obviously weren't working in this industry in 1999.

It was just as horrible as it sounds.

So you Americans are so "political correct" that a string like "fukumashita" is an insult?

PS: Ironically, I wrote "political correct" to be political correct and not writing stupid... then I'm stupid... well, at least I'm not 250 million stupid.

Oh man, I thought we voted in Pres. Obama so that everyone would like America again.

Tell that to the imdb.com forums.

I suppose none of them are Cubs fans. Fukudome? What? Is that the F-You Dome? Why are they so offensive?

Maybe fukudome -> "fuck you dummy"?

Ben:

Not here on TDWTF, but Facebook once gave me a captcha of "joder". "Joder" is Spanish for "fuck".

I suppose none of them are Cubs fans. Fukudome? What? Is that the F-You Dome? Why are they so offensive?

Maybe fukudome -> "fuck you dummy"?

Fuck You, Do Me.

Couldn't they have just added a client entered text based identifier unique per client?

They could then be pulled up with either the normal ID or a client ID plus the friendly string. It's more work for the client, but if that's what they want, they enter it, and they assume any responsibility for easily guessed identifiers AND for using any bad words that could make baby Jesus cry.

If you have a client so stupid that they can be offended by randomly generated strings that are never seen by the public all you can do is shift the burden to them - you can NEVER make them happy with something generated by code.
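One way to sketch that suggestion, assuming every account already has an internal ID and belongs to exactly one client; the class and method names are hypothetical. Aliases are scoped per client, so two clients can both call something "warehouse north" without colliding:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AccountDirectory:
    """Internal account IDs plus optional, client-chosen aliases scoped per client."""
    owners: Dict[str, str] = field(default_factory=dict)               # account_id -> client_id
    aliases: Dict[Tuple[str, str], str] = field(default_factory=dict)  # (client_id, alias) -> account_id

    def register(self, account_id: str, client_id: str) -> None:
        self.owners[account_id] = client_id

    def set_alias(self, client_id: str, alias: str, account_id: str) -> None:
        # The client picks the alias, so guessability and wording are their call.
        if self.owners.get(account_id) != client_id:
            raise KeyError(f"{account_id} does not belong to {client_id}")
        key = (client_id, alias.casefold())
        if self.aliases.get(key, account_id) != account_id:
            raise ValueError(f"alias {alias!r} is already in use for {client_id}")
        self.aliases[key] = account_id

    def lookup(self, client_id: str, id_or_alias: str) -> Optional[str]:
        """Accept either the normal ID or one of this client's aliases."""
        if self.owners.get(id_or_alias) == client_id:
            return id_or_alias
        return self.aliases.get((client_id, id_or_alias.casefold()))


directory = AccountDirectory()
directory.register("3f9a0c", "acme")
directory.set_alias("acme", "Warehouse North", "3f9a0c")
print(directory.lookup("acme", "warehouse north"))  # 3f9a0c
```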

Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

1027-4002-9530-3064

TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.
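The hex-to-decimal part of that suggestion is short in most languages. A sketch, assuming the existing IDs are plain hex strings; the group size of 4 matches the example above, and the function names are made up:

```python
def hex_to_grouped_decimal(hex_id: str, group: int = 4) -> str:
    """Re-express a hex ID in decimal, zero-padded and split into dash-separated groups."""
    digits = str(int(hex_id, 16))
    width = -(-len(digits) // group) * group   # round the length up to a multiple of `group`
    digits = digits.zfill(width)
    return "-".join(digits[i:i + group] for i in range(0, width, group))


def grouped_decimal_to_hex(grouped: str) -> str:
    """Inverse: drop the dashes and go back to lowercase hex (leading zeros are not kept)."""
    return format(int(grouped.replace("-", "")), "x")


print(hex_to_grouped_decimal("3905"))  # 0001-4597
```

Because leading zeros in the original hex string do not survive the round trip, fixed-width IDs would need re-padding on the way back.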

Long strings of digits are difficult to remember and easy to mess up when copying. Which do you find easier to remember: Your friends' and coworkers' names? Or their phone numbers? I've had plenty of times that someone tried to relate some long id number to me while I wrote it down and I've had to ask, "Wait, was that 4-7 or 7-4", etc.

The basic idea of having pronounceable words sounds like a good one to me, even allowing for the sort of problem brought up in this story.

Not one of his cited words sounds remotely rude if you pronounce them correctly. Not only was the boss a prick, he was one of the ignorant variety.

For example: "fukushita" may be pronounced foo-koo-shee-tah, "diefatsu" may be pronounced die-fahts, and "kakashite" may be pronounced kah-kah-shee-tay.

Then again, caca may be offensive to a Spanish-speaking person.

There are no "offensive" words (in the blanch-and-lose-consciousness-at-poppycock tradition, not in the "we will invade at midnight" military sense), there are only readers with such a weak internal sensitivity that it cannot sustain biological references (which are really passé for an 8 year old), and to which THEY choose to take offense. Let's not confuse misplaced puritanism with intrinsic value.

So you wouldn't mind if it generated racial epithets, then? Like if one day it churned out "wetback" or "n----r"? Or what if it produced a word offensive to homosexuals, like "sodomite"? Maybe you wouldn't mind those either, but I find that people who think it's absurd to be offended by vulgar references to sex or urination often go through the roof at other types of offensive language. Indeed, what if one day it simply generated the word "nazi"? Would you want an advertisement for your company to have the word "nazi" displayed above it?

Personally I'd object to either set of words. I'm not going to fall on the floor gasping for breath, but I'd rather not hear them. Words have meaning, and if you don't believe that, then what have you been typing in your posts? I object to the name of this web site, though obviously from the fact that I'm posting here, it doesn't disturb me enough to refrain from visiting it.

"These are only shown to users as URL parameters, no different than the session ID," Brian protested.

So apparently end users can see these words. Even if you think that your customers are morons for objecting to certain words, one dollar from a moron is just as good as one dollar from a genius. Do you insist that you will only accept money from people who share your social views?

Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

1027-4002-9530-3064

TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.

I suspect that Brian is the type of Engineer who would design heated handlebars for biking in the cold (powered by your own pedaling) when a pair of gloves would work just as well.

Oh, god, "The Mythical Man Month"... I remember that from software engineering class. My brain always put the hyphen in the wrong place, so instead of Mythical Man-Month, it was Mythical-Man Month. Like Black History Month but for Hercules and the like. Minor chuckles at that were pretty much my entire takeaway from that class....

captcha "ullamcorper"....interesting! One who works for Ullam Corp?

Years ago a co-worker told me of a similar problem: The company generated product IDs using one of those schemes where the first letter identifies the product category, the second letter identifies the factory where it is made, etc., with a letter or two tacked on the end to make it unique. And they ran into the same sort of problem where one of the generated codes -- printed on the product label -- was a vulgar word. Their solution was simpler than Barry & Brian's: make a list of offensive words, and then bounce any code against this list. And so, he said, he was given the job of typing into the computer all the obscene and vulgar words that he could think of. How do you put that on a resume?
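The bounce-against-a-list approach might look roughly like this. The blocklist entries are placeholders (building the real list was the co-worker's job), the category and factory letters are hypothetical, and the check is a substring match so that a bad word hidden inside a longer code is also caught:

```python
import secrets
import string

# Placeholder entries; typing in the real list was the co-worker's job.
BLOCKLIST = {"BAD", "RUDE"}


def is_clean(code: str) -> bool:
    """True if no blocklisted word appears anywhere inside the code."""
    upper = code.upper()
    return not any(word in upper for word in BLOCKLIST)


def make_code(category: str, factory: str, taken: set, suffix_len: int = 2,
              max_tries: int = 10_000) -> str:
    """Category letter + factory letter + random unique suffix, retried until clean."""
    for _ in range(max_tries):
        suffix = "".join(secrets.choice(string.ascii_uppercase) for _ in range(suffix_len))
        code = category + factory + suffix
        if code not in taken and is_clean(code):
            taken.add(code)
            return code
    raise RuntimeError("no clean, unused code found; lengthen the suffix")


used = set()
print(make_code("K", "F", used))  # a four-letter code starting with KF
```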

Oh, god, "The Mythical Man Month"... I remember that from software engineering class. My brain always put the hyphen in the wrong place, so instead of Mythical Man-Month, it was Mythical-Man Month. Like Black History Month but for Hercules and the like. Minor chuckles at that were pretty much my entire takeaway from that class....

Cool idea. I wonder if we could get it designated an official federal holiday?


Years ago a co-worker told me of a similar problem: The company generated product IDs using one of those schemes where the first letter identifies the product category, the second letter identifies the factory where it is made, etc., with a letter or two tacked on the end to make it unique. And they ran into the same sort of problem where one of the generated codes -- printed on the product label -- was a vulgar word. Their solution was simpler than Barry & Brian's: make a list of offensive words, and then bounce any code against this list. And so, he said, he was given the job of typing into the computer all the obscene and vulgar words that he could think of. How do you put that on a resume?

I was expecting that they'd just remove the vowels - that's what happened when I heard this story about a high school.

pronounce it using an English phonetic system.

You are funny

Slightly relevant: one time long ago, the server guys were explaining how they were going to name the file servers throughout Florida (each location only needed one back in those days).

Jacksonville would be jaxis, Tampa would be tamis, and Miami would be miais.

"The effect of drinking a Pan Galactic Gargle Pussy is like having your brains smashed out with a slice of lemon wrapped round a large gold brick."
Call the FAA, 90% of TDWTF flew past this one.
TRWTF is the guy's hand.

Since '96 actually. Do you think regression testing was just invented?

"The effect of drinking a Pan Galactic Gargle Pussy is like having your brains smashed out with a slice of lemon wrapped round a large gold brick."

<slurp> mmmm, that's some good Gargle Pussy!

"When talking by phone with people in different offices, they have to read the IDs to each other to be able to identify which accounts they are talking about."

This has got to be one of the stupidest customer requirements I ever heard (and I have been in the custom software business since long before 1999).

Home office clerk on phone: "Friedman."

Branch office clerk on phone: "Just a sec." (types) freedman "Nope." (types) freedmann "Nope." (types) freidman "Nope." (types) freidmann "Nope." (types) fredemann "Nope." (types) freedemann "Nope." (types) frehdman "Nope." (into phone) "I've tried everything. We just don't have it."

Home office clerk on phone: "Dammit. Okay, next: bloogarue..."

-Harrow.

"When talking by phone with people in different offices, they have to read the IDs to each other to be able to identify which accounts they are talking about."

This has got to be one of the stupidest customer requirements I ever heard (and I have been in the custom software business since long before 1999).

Home office clerk on phone: "Friedman."

Branch office clerk on phone: "Just a sec." (types) freedman "Nope." (types) freedmann "Nope." (types) freidman "Nope." (types) freidmann "Nope." (types) fredemann "Nope." (types) freedemann "Nope." (types) frehdman "Nope." (into phone) "I've tried everything. We just don't have it."

Home office clerk on phone: "Dammit. Okay, next: bloogarue..."

-Harrow.

Wot, no SOUNDEX?
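For reference, a sketch of American Soundex, the classic answer to the Friedman/Freedman problem: keep the first letter, map the remaining consonants to digit classes, collapse repeats, and pad to four characters. It assumes ASCII names and ignores everything else:

```python
# American Soundex digit classes; vowels, Y, H and W get no digit.
CODES = {ch: digit
         for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6")]
         for ch in letters}


def soundex(name: str) -> str:
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        raise ValueError("name contains no ASCII letters")
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:          # adjacent letters with the same digit count once
                result += code
                if len(result) == 4:
                    break
            prev = code
        elif ch not in "hw":          # a vowel (or Y) separates repeated digits; H and W do not
            prev = ""
    return result.ljust(4, "0")


for attempt in ["Friedman", "freedman", "freedmann", "freidman", "freidmann",
                "fredemann", "freedemann", "frehdman"]:
    print(attempt, soundex(attempt))  # every spelling comes out as F635
```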

There are no "offensive" words (in the blanch-and-lose-consciousness-at-poppycock tradition, not in the "we will invade at midnight" military sense), there are only readers with such a weak internal sensitivity that it cannot sustain biological references (which are really passé for an 8 year old), and to which THEY choose to take offense. Let's not confuse misplaced puritanism with intrinsic value.

So you wouldn't mind if it generated racial epithets, then? Like if one day it churned out "wetback" or "n----r"? Or what if it produced a word offensive to homosexuals, like "sodomite"? Maybe you wouldn't mind those either, but I find that people who think it's absurd to be offended by vulgar references to sex or urination often go through the roof at other types of offensive language. Indeed, what if one day it simply generated the word "nazi"? Would you want an advertisement for your company to have the word "nazi" displayed above it?

Personally I'd object to either set of words. I'm not going to fall on the floor gasping for breath, but I'd rather not hear them. Words have meaning, and if you don't believe that, then what have you been typing in your posts? I object to the name of this web site, though obviously from the fact that I'm posting here, it doesn't disturb me enough to refrain from visiting it.

"These are only shown to users as URL parameters, no different than the session ID," Brian protested.

So apparently end users can see these words. Even if you think that your customers are morons for objecting to certain words, one dollar from a moron is just as good as one dollar from a genius. Do you insist that you will only accept money from people who share your social views?

Really? In your mind, "sh*t", which is something you do hopefully at least once a day, is in the same realm as "nazi", but "gun" is not? And you do not require therapy?

So you Americans are so "political correct" that a string like "fukumashita" is an insult?

PS: Ironically, I wrote "political correct" to be political correct and not writing stupid... then I'm stupid... well, at least I'm not 250 million stupid.

This is the country where a vegan's personalized license plate, ILUVTOFU, was nixed by a bunch of bureaucrats. Given the spelling capacity of most modern American executives these days, however, I'd even question the whole premise of this idea.

TRWTF is that no pussy got gargled.

Zapakh:
Right up there with "Never, ever leave the singer in charge of the mix" has got to be "Never let the marketers make technical decisions".

But not so as well known as "Never get involved in a land war in Asia", or the slightly lesser known "Never go in against a Sicilian when death is on the line".

TRWTF is that no pussy got gargled.

A much better idea would've been to just randomly string together plain English words from a sanitized list. Easier to remember, easier to pronounce, no risk of business-destroying garglepussy. Also, two days to write a Markov chain random text generation program? It shouldn't even take two hours. And while I'm nitpicking, Japanese-learning books that use romaji are a blight upon humanity, and even losing his job is not too great a punishment for one who has sinned so.
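On the two-hours point: a character-level Markov chain word generator really is short. The sketch below assumes an order-2 chain and a tiny made-up corpus; the story never says what order or corpus Brian used, and the corpus is exactly where this story's trouble came from.

```python
import random

START, END = "^", "$"


def train(words, order=2):
    """Map each `order`-letter context to every letter seen after it in the corpus."""
    model = {}
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            model.setdefault(padded[i:i + order], []).append(padded[i + order])
    return model


def generate(model, order=2, max_len=12, rng=random):
    """Walk the chain from the start context until it emits END or hits max_len."""
    context, out = START * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)


# Tiny hypothetical corpus; a real generator would train on a large, vetted word list.
CORPUS = ["sakura", "hikari", "tomodachi", "yokohama", "kawasaki", "sumimasen", "arigatou"]
model = train(CORPUS)
rng = random.Random(42)
print([generate(model, rng=rng) for _ in range(5)])
```

Every context the walk can reach was seen in training with at least one successor (possibly END), so the lookup never comes up empty.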
OutlawProgrammer:
Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

1027-4002-9530-3064

TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.

I have in fact used a combination of these two methods for random client password generation, in the hopes that they won't end up writing them down on sticky-notes or forgetting them, as they tend to do with "secure" passwords like say "CnK5J$lm0". The point is to make a password that occupies less space in human memory while still being sufficiently random (i.e., you can memorise "1812" or "red" as a single item, so "1592-jam-xray-3012" can be remembered as a string of 4 items in your brain, while "CnK5J$lm0" is 9).

Of course, this is probably some sort of heresy, and some jerk will make erroneous assumptions about the size and content of the word/number set to call me an idiot in the comments, but that's the internet for you.
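A sketch of the number-word-word-number scheme described above, assuming a small placeholder word list (the real list and its size are not given). It uses Python's `secrets` module rather than `random`, since these are passwords, and includes an entropy estimate so the size question can be answered with numbers:

```python
import math
import secrets

# Placeholder word list; a real one would be curated (no rude words) and much longer.
WORDS = ["jam", "xray", "red", "oak", "tide", "lamp", "fern", "kite"]


def four_digits() -> str:
    return f"{secrets.randbelow(10_000):04d}"   # 0000-9999, zero-padded


def make_password(words=WORDS) -> str:
    """Four memorable chunks in the number-word-word-number shape of "1592-jam-xray-3012"."""
    return "-".join([four_digits(), secrets.choice(words), secrets.choice(words), four_digits()])


def entropy_bits(n_words: int) -> float:
    """Bits of randomness: two 4-digit numbers plus two independent word picks."""
    return 2 * math.log2(10_000) + 2 * math.log2(n_words)


print(make_password())
print(f"{entropy_bits(len(WORDS)):.1f} bits with {len(WORDS)} words, "
      f"{entropy_bits(2048):.1f} bits with 2048 words")
```

For comparison, 9 characters drawn uniformly from the 94 printable ASCII symbols carry about 59 bits, while this format with a 2048-word list carries about 48.6.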

Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

1027-4002-9530-3064

TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.

Long strings of digits are difficult to remember and easy to mess up when copying. Which do you find easier to remember: Your friends' and coworkers' names? Or their phone numbers? I've had plenty of times that someone tried to relate some long id number to me while I wrote it down and I've had to ask, "Wait, was that 4-7 or 7-4", etc.

The basic idea of having pronounceable words sounds like a good one to me, even allowing for the sort of problem brought up in this story.

Well, my boss' name is Indian and sort of hard to pronounce. We have some others in the area with 5-syllable names, and I'm pretty sure that blowing it out to 10 syllables would be just as bad as numbers. Just split the number into blocks with dashes and add some check digits: aaaac-bbbbc-ddddc-eeeec. That way you can tell which block has the bad number.
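A sketch of the per-block check digit idea. The post doesn't name an algorithm; this swaps in the Luhn mod-10 scheme, which catches every single-digit error and most adjacent transpositions (the "4-7 or 7-4" case), with 09/90 as the known exception. Block size and function names are assumptions:

```python
def luhn_check_digit(payload: str) -> str:
    """Digit that makes payload + digit pass the Luhn mod-10 check."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:            # these positions are doubled once the check digit is appended
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Luhn check over a string of digits whose last digit is the check digit."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def add_block_checks(number: str, block: int = 4) -> str:
    """Split into blocks of `block` digits and append each block's own check digit."""
    blocks = [number[i:i + block] for i in range(0, len(number), block)]
    return "-".join(b + luhn_check_digit(b) for b in blocks)


def bad_blocks(checked: str) -> list:
    """Indexes of blocks whose check digit does not match, i.e. the ones to re-read."""
    return [i for i, b in enumerate(checked.split("-")) if not luhn_valid(b)]


checked = add_block_checks("1027400295303064")
print(checked)
```

Swapping two digits inside the first block (10272 read back as 10722) makes `bad_blocks` report block 0, so only that block has to be read again.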

This could all have been avoided if he had just filled it with the Klingon Hamlet in the first place.

"They dropped our contract!" Brian shrieked, "Half our revenue is gone! You've killed our company!"
Isn't that supposed to be Barry who said that?

Ahh mah lunch!

People are trying to find meaning (or humour) in the captchas (mine's iusto, by the way). reCAPTCHAs are best for this. They give two English words, and if I get one of them completely wrong, no-one cares.

I wrote a small program. Pulls a fuckton of recaptchas, asks the user to fill them properly, then guesses which to replace with penis. Pollutes their database.

I can believe this -- twice I've had the same issue. We did, a long time ago, send data via Teletext on broadcast TV. We had a private page, not in any directory, and we got about 500 bytes of useful data transferred every 30 seconds or so. The TV station required that NO obscenities (with a very broad definition) were ever transmitted. The data was binary and encrypted. The very first test page sent had 'duck' in it. Sigh. Remove all vowels. A few days later there was one complaint -- 'shyt'. Remove the 'y'. Ran for quite a time without further complaint.

We also had to implement a 'grubby' word filter for a public information display. I spent a few days trawling porn sites and getting paid for it, and lifted all the meta tags. From the 500,000 tags collected, in the end we found ~1400 words or bases of words that might cause offense, of course in English only. Did it work -- of course not, but it kept the client happy.
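A sketch of the remove-the-vowels fix applied to binary data: encode the bytes as a base-20 number using only the consonants left after dropping A, E, I, O, U and Y. This assumes fixed-length payloads (the length is passed to the decoder so leading zero bytes survive); it is an illustration, not the Teletext system described above.

```python
ALPHABET = "BCDFGHJKLMNPQRSTVWXZ"   # A-Z minus A, E, I, O, U and Y: 20 symbols
BASE = len(ALPHABET)


def encode(data: bytes) -> str:
    """Write the bytes as one big-endian integer in base 20 using only the consonants above."""
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or ALPHABET[0]


def decode(text: str, length: int) -> bytes:
    """Inverse of encode; `length` restores leading zero bytes for fixed-size payloads."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    return n.to_bytes(length, "big")


payload = b"\x00\x01 binary \xff"
text = encode(payload)
print(text)
```

Without vowels (or Y) the output can't spell most words, which is what the fix above relied on; vowel-free abbreviations are still possible.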

A much better idea would've been to just randomly string together plain English words from a sanitized list. Easier to remember, easier to pronounce, no risk of business-destroying garglepussy. Also, two days to write a Markov chain random text generation program? It shouldn't even take two hours. And while I'm nitpicking, Japanese-learning books that use romaji are a blight upon humanity, and even losing his job is not too great a punishment for one who has sinned so.

I've heard people say this occasionally and never really understood why. It's a decent enough way to start: Japanese is completely unambiguous in its pronunciation, so provided you remember the rules (which isn't exactly difficult), the only thing you're missing out on by using romaji is hiragana reading practice (which you get plenty of later on). It's not like it teaches you bad habits that you have to unlearn later.

CAPTCHA: genitus. I don't normally post my captcha but this one seems pretty appropriate to the article subject matter.

A much better idea would've been to just randomly string together plain English words from a sanitized list. Easier to remember, easier to pronounce, no risk of business-destroying garglepussy. Also, two days to write a Markov chain random text generation program? It shouldn't even take two hours. And while I'm nitpicking, Japanese-learning books that use romaji are a blight upon humanity, and even losing his job is not too great a punishment for one who has sinned so.

I've heard people say this occasionally and never really understood why. It's a decent enough way to start: Japanese is completely unambiguous in its pronunciation, so provided you remember the rules (which isn't exactly difficult), the only thing you're missing out on by using romaji is hiragana reading practice (which you get plenty of later on). It's not like it teaches you bad habits that you have to unlearn later.

CAPTCHA: genitus. I don't normally post my captcha but this one seems pretty appropriate to the article subject matter.

Sure, the pronunciation is mostly unambiguous (trailing vowel sounds are often unvoiced), but the syllable emphasis screws up a bunch of Americans: Japanese is very flat, while English likes to emphasize syllable 1 or 2. Makes for confusing listening.

I understand that people work with what they have lying around, but the first examples of bad words seem easily avoidable: clear your cache and surf non-porn sites before using it as a corpus for your Markov chains.

Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

1027-4002-9530-3064

TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.

Long strings of digits are difficult to remember and easy to mess up when copying. Which do you find easier to remember: Your friends' and coworkers' names? Or their phone numbers? I've had plenty of times that someone tried to relate some long id number to me while I wrote it down and I've had to ask, "Wait, was that 4-7 or 7-4", etc.

The basic idea of having pronounceable words sounds like a good one to me, even allowing for the sort of problem brought up in this story.

Doing it with number groups would have been simpler and probably kept the client. Errors in data entry would have been their problem after all. As programmers, we can't fix the world, and we shouldn't die in a ditch trying.

I admit, generated words would have sounded fine to me as well, had I not run into this sort of problem before.

A better plan from 1995 - if only Brian had known.

http://www.ietf.org/rfc/rfc1760.txt

The S/KEY system is designed to facilitate this manual entry without impeding automatic methods. The one-time password is therefore converted to, and accepted as, a sequence of six short (1 to 4 letter) English words. Each word is chosen from a dictionary of 2048 words;

Dictionary for Converting Between S/KEY 6-Word and Binary Formats

{ "A", "ABE", "ACE", ..., "YELL", "YOGA", "YOKE" };
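A sketch of the encoding RFC 1760 describes: the 64-bit one-time password gets a 2-bit checksum (the sum of its 2-bit groups, keeping the two low bits), and the resulting 66 bits are read as six 11-bit indices into the 2048-word dictionary. The real dictionary is elided above, so the code uses an obviously fake placeholder list; the table from the RFC would go in its place.

```python
# Placeholder stand-in for RFC 1760's 2048-word table (elided above): NOT the real words.
DICTIONARY = [f"W{i:04d}" for i in range(2048)]


def checksum(key: int) -> int:
    """Sum of the 32 two-bit groups of the 64-bit key, keeping only the low two bits."""
    total = sum((key >> shift) & 0b11 for shift in range(0, 64, 2))
    return total & 0b11


def to_words(key: int, dictionary=DICTIONARY) -> list:
    """64-bit key + 2-bit checksum = 66 bits = six 11-bit dictionary indices."""
    bits66 = (key << 2) | checksum(key)
    return [dictionary[(bits66 >> (55 - 11 * i)) & 0x7FF] for i in range(6)]


def from_words(words, dictionary=DICTIONARY) -> int:
    """Rebuild the key and reject it if the checksum bits don't match (a misheard word)."""
    index = {w: i for i, w in enumerate(dictionary)}
    bits66 = 0
    for w in words:
        bits66 = (bits66 << 11) | index[w.upper()]
    key, check = bits66 >> 2, bits66 & 0b11
    if checksum(key) != check:
        raise ValueError("checksum mismatch")
    return key


words = to_words(0x0123456789ABCDEF)
print(words)
```

The checksum only has two bits, so it catches many misheard words but not all of them.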