• (cs)

    Koremachiko !

  • Bill (unregistered)

    HA HA HA duis is tsrif

  • fw (unregistered)

    He shoved the paper at Brian, who took it apprehensively.

    "fukushita", "kakashite", "fukumihado", "diefatsu", "tokaduki", and "fukusuka", he read, collapsing inwardly and visibly shaking as he read down the list.

    "They dropped our contract!" Brian shrieked, "Half our revenue is gone! You've killed our company!"

    Brian protested that dropping the bad-word filter and using Japanese were both Barry's idea...

    TRWTF is that poor old Barry can't get a word in edgeways

  • (cs)

    Actually, I think I remember where I saw this one before.

    The sidebar.

  • Ann Onymous (unregistered)

    Many prior grammatical indiscretions were forgiven after noticing the phrase "pored over" in this WTF rather than the swiftly-becoming-standard "poured over". Throw in a "bated breath" and I'll stop bitching about missing words.

    Of course it's classic! Can't you see it's set in 1999?

    (third attempt)

  • Anon (unregistered)

    TRWTF is the client that can't handle pseudo-obscene IDs. I suppose their workforce consists entirely of 6-year-old children who must be shielded from such naughty language.

  • Incourced (unregistered)

    Not one of his cited words sounds remotely rude if you pronounce them correctly. Not only was the boss a prick, he was one of the ignorant variety.

    Captcha: inhibeo

  • Sid (unregistered)

    TRWTF is that in any real system with > 100 IDs, the limitations of what character strings you could generate would mean that each 'id' would be about a paragraph long. IOW, this story was made up.

  • A. Cube (unregistered) in reply to Incourced
    Not one of his cited words sounds remotely rude if you pronounce them correctly. Not only was the boss a prick, he was one of the ignorant variety.

    Yeah, but the boss was a VP of Marketing--what the heck do you expect?

  • Derek (unregistered)

    TRWTF is that Brian typed in all the pages instead of just scanning and OCR.

  • Matthew (unregistered) in reply to Incourced
    Incourced:
    Not one of his cited words sounds remotely rude if you pronounce them correctly.

    They're made up words. There is no "correctly".

  • Steve the Cynic (unregistered) in reply to Incourced
    Incourced:
    Not one of his cited words sounds remotely rude if you pronounce them correctly. Not only was the boss a prick, he was one of the ignorant variety.

    Captcha: inhibeo

    The main point here is not that any of the words is rude, nor that any of them sounds rude when said by a native speaker, but that letting non-speakers of a language loose with its words is a hazard.

    Making the IDs more like account numbers might have been a good idea, perhaps dropping the hex encoding and shoving a few hyphens in to break them up.

  • anon (unregistered) in reply to Incourced
    Incourced:
    inhibeo
    How dare you speak like that about my mom?
  • Anonymous (unregistered)

    As soon as I read this in the Sidebar I knew it would be front-page material. Some people think it has appeared here before but are you guys sure you didn't just read it when it was posted in the forums? Anyway, thanks to the submitter, it's a beauty!

  • Rootbeer (unregistered) in reply to Sid
    Sid:
    TRWTF is that in any real system with > 100 IDs, the limitations of what character strings you could generate would mean that each 'id' would be about a paragraph long.

    How do you figure that?

    Even if you were to limit to the five short vowels and seven least irregular Japanese consonants, that's 35 distinct values that can be represented by a single syllable.

    Two syllables = 35^2 = 1,225 values

    Three syllables = 35^3 = 42,875 values

    If we assume a person could remember an ID of eight syllables, that allows for more than 2.2 trillion distinct values (35^8).
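
    A quick way to check that arithmetic, as a minimal sketch. It assumes exactly the 5 vowels and 7 consonants mentioned above, with every consonant+vowel pair allowed; `id_space` is a name made up for illustration.

```python
# Assumption: 5 vowels x 7 consonants, every pairing allowed.
VOWELS = 5
CONSONANTS = 7
SYLLABLES = VOWELS * CONSONANTS  # 35 distinct consonant+vowel syllables


def id_space(n_syllables: int) -> int:
    """Number of distinct IDs that fit in n_syllables syllables."""
    return SYLLABLES ** n_syllables


if __name__ == "__main__":
    for n in (2, 3, 8):
        print(f"{n} syllables = {id_space(n):,} values")
```

    Even short words cover a large ID space, so length is not what limits the scheme.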

  • Brian..or Am I Barry? (unregistered)

    Did anyone else notice how Barry and Brian switched places in the story a few times?

  • Steve H (unregistered)

    You can't divine "phonemic combinations" from textual analysis. Least of all in English, that would clearly be ridiculous. Noobs.

  • Anonymous (unregistered) in reply to Steve H
    Steve H:
    You can't divine "phonemic combinations" from textual analysis. Least of all in English, that would clearly be ridiculous. Noobs.
    Phonemic combinations? Want to try that again, or should you maybe stay quiet and let the grown-ups talk?
  • Anonymous (unregistered) in reply to Anonymous
    Anonymous:
    Steve H:
    You can't divine "phonemic combinations" from textual analysis. Least of all in English, that would clearly be ridiculous. Noobs.
    Phonemic combinations? Want to try that again, or should you maybe stay quiet and let the grown-ups talk?
    Oh, you pointed out a typo. Well done you! Now seriously, go away.
  • some guy (unregistered) in reply to Matthew
    Matthew:
    Incourced:
    Not one of his cited words sounds remotely rude if you pronounce them correctly.

    They're made up words. There is no "correctly".

    It's based on Japanese phonetics. There is exactly one correct way to pronounce each word generated. It's only ambiguous when you try to pronounce it using an English phonetic system.

  • Corey (unregistered) in reply to Anonymous

    "In a language or dialect, a phoneme is the smallest segmental unit of sound employed to form meaningful contrasts between utterances." - http://en.wikipedia.org/wiki/Phoneme.

    Now... do YOU want to try that again, or should you maybe stay quiet and let the grownups talk?

  • Anonymous (unregistered)

    A much better idea would've been to just randomly string together plain English words from a sanitized list. Easier to remember, easier to pronounce, no risk of business-destroying garglepussy. Also, two days to write a Markov chain random text generation program? It shouldn't even take two hours. And while I'm nitpicking, Japanese learning books that use romaji are a blight upon humanity, and even losing his job is not too great a punishment for one who has sinned so.
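
    The word-list idea could look roughly like the sketch below. Two assumptions to flag: the `WORDS` list is a hypothetical stand-in for a real sanitized list, and the code swaps random picking for a deterministic base-N encoding so that every numeric ID maps to a distinct phrase.

```python
# Hypothetical sanitized word list; a real one would be much longer.
WORDS = ["apple", "river", "stone", "cloud", "tiger", "maple", "candle", "harbor"]


def words_for(n: int, count: int = 3) -> str:
    """Encode integer n as `count` words, treating WORDS as base-len(WORDS) digits."""
    base = len(WORDS)
    if not 0 <= n < base ** count:
        raise ValueError("n does not fit in the requested number of words")
    picked = []
    for _ in range(count):
        n, remainder = divmod(n, base)
        picked.append(WORDS[remainder])
    # Digits come out least-significant first, so reverse them.
    return "-".join(reversed(picked))


if __name__ == "__main__":
    print(words_for(10))
```

    Since no word is ever concatenated with another, only the list itself needs screening.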

  • OutlawProgrammer (unregistered)

    Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

    1027-4002-9530-3064

    TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

    OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.
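
    The hex-to-decimal suggestion, as a minimal sketch. It assumes the existing IDs are plain hex strings, which the story implies but never spells out; `speakable_id` is a made-up name.

```python
def speakable_id(hex_id: str, group: int = 4) -> str:
    """Convert a hex ID to decimal digits split into dash-separated groups."""
    digits = str(int(hex_id, 16))
    # Left-pad with zeros so the digit count is a multiple of the group size.
    width = -(-len(digits) // group) * group  # ceiling division
    padded = digits.zfill(width)
    return "-".join(padded[i:i + group] for i in range(0, len(padded), group))


if __name__ == "__main__":
    print(speakable_id("ff"))
```

    The mapping is reversible: strip the dashes, parse as decimal, and the original number comes back.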

  • Keith (unregistered) in reply to some guy
    some guy:
    Matthew:
    Incourced:
    Not one of his cited words sounds remotely rude if you pronounce them correctly.

    They're made up words. There is no "correctly".

    It's based on Japanese phonetics. There is exactly one correct way to pronounce each word generated. It's only ambiguous when you try to pronounce it using an English phonetic system.

    Agreed. While reading the words in Japanese (no spaces injected, and they look Japanese), I found nothing offensive. Translating, of course, I found them humorously offensive.

    Still I wonder why they are using a Markovian process to transform numbers into words.

    TRWTF is, of course, the yelling out of IDs all day.

    TRRWTF is developing to the requirement of people that use the "shouting" method of communicating CS data. If only there were some sort of instant computer-to-computer method of communication.
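
    For reference, a Markov-chain word generator keyed on a numeric ID might look something like this sketch. It is not the code from the story: the tiny training list, the letter-level chain, and the seeding trick are all assumptions, and nothing here guarantees that two IDs produce different words.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each letter to the letters that follow it ('^' = start, '$' = end)."""
    chain = defaultdict(list)
    for word in words:
        padded = "^" + word + "$"
        for current, following in zip(padded, padded[1:]):
            chain[current].append(following)
    return chain


def word_from_id(chain, numeric_id, max_len=10):
    """Walk the chain with an RNG seeded by the ID, so the same ID gives the same word."""
    rng = random.Random(numeric_id)
    letters, current = [], "^"
    while len(letters) < max_len:
        current = rng.choice(chain[current])
        if current == "$":
            break
        letters.append(current)
    return "".join(letters)


if __name__ == "__main__":
    chain = build_chain(["sakura", "katana", "tanuki"])  # hypothetical corpus
    print(word_from_id(chain, 1234))
```

    The output only looks like the training corpus, which is exactly how unscreened words slip through.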

  • Bem (unregistered)

    can anyone explain how "tokaduki" is rude ????

  • lulzfish (unregistered) in reply to Bem

    It's not very rude, but it reminds me of the word "toke", and you can't have the client's employees thinking about drugs.

  • Anonymous (unregistered) in reply to Bem

    Toke A Dookie. Dookie being a slang term for that which occasionally drops out of your butt.

  • Zapakh (unregistered)

    Clbuttic!

    Right up there with "Never, ever leave the singer in charge of the mix" has got to be "Never let the marketers make technical decisions".

    One time I was poring over Apache logs, idly wondering what it would be like to generate pseudorandom text from people's IP addresses, just so I'd have something phonetic to call them. Now I know!

  • Ben (unregistered) in reply to lulzfish

    that makes sense, although if their first thought is 'toking the duki' you probably want to get rid of the employee rather than the software.

    anyone had a rude captcha yet?

  • (cs)

    So you americans are so "political correct" that a string like "fukumashita" is an insult?

    PS: Ironically, I wrote "political correct" to be political correct and not writing stupid... then I'm stupid... well, at least I'm not 250 million stupid.

  • ih8u (unregistered) in reply to Anon
    Anon:
    TRWTF is the client that can't handle pseudo-obscene IDs. I suppose their workforce consists entirely of 6-year-old children who must be shielded from such naughty language.

    I suppose none of them are Cubs fans. Fukudome? What? Is that the F-You Dome? Why are they so offensive?

    Come to think of it ... couldn't some of these wtf captchas be considered questionable?

  • Engival (unregistered) in reply to OutlawProgrammer
    OutlawProgrammer:
    Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

    1027-4002-9530-3064

    TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

    OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.

    The solution is even easier than that. Consider:

    1. You only need large crazy numbers for the online/scripting portion.
    2. Internally, a simple sequential campaign number would be enough.

    So combine them.

    CAMPAIGN-RANDOM_CRAP

    In their accounting, tell them to drop everything after the dash. In their software, use the full string.
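
    A minimal sketch of that combined ID. The dash separator and the hex suffix are assumptions, and `make_id` and `campaign_of` are made-up names.

```python
import secrets


def make_id(campaign_number: int, random_bytes: int = 8) -> str:
    """Return 'CAMPAIGN-RANDOM': a sequential number plus a random hex suffix."""
    return f"{campaign_number}-{secrets.token_hex(random_bytes)}"


def campaign_of(full_id: str) -> int:
    """The part accounting reads aloud: everything before the first dash."""
    return int(full_id.split("-", 1)[0])


if __name__ == "__main__":
    full_id = make_id(42)
    print(full_id, "->", campaign_of(full_id))
```

    The software keeps the whole string, while people only ever say the short number out loud.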

  • Futoque (unregistered) in reply to Anonymous

    There's no need to make fun of other people's comments like that. I'm sure they're doing the best they can.

  • flarp (unregistered)

    'di' and 'du' are not very japanesey phonemes.

    He shouldn't have agreed to the idea in the first place -- why would you take a simple problem about synthetic keys and introduce natural languages and other extreme complexities?

  • (cs) in reply to Sid
    Sid:
    TRWTF is that in any real system with > 100 IDs, the limitations of what character strings you could generate would mean that each 'id' would be about a paragraph long. IOW, this story was made up.

    Your math sucks dude.

  • PJ Volk (unregistered)

    Software engineering rule #1: The customer doesn't know what they want.

    Classic example of a perceived problem causing more artificial ones. Also goalposts were moved (probably contrived for effect, just to make giggle words in Japanese).

    Customer did not have a real problem (too many to do manually, not enough to automate). If they were going to pull out because of it, I would have to think there were other problems.

    Definitely solution fixation. Why multiple-syllable words? Why couldn't the dictionary be updated to remove the giggle words (Gutenberg is PD, right?)? Where's the sales guy to explain how such words could improve morale? What happens when the client starts putting context on the nonsense words (you could use some pusdiction, or most words + knob)?

    Change the solution, and those artificial problems go away.

  • gus (unregistered)

    I don't buy it.

    The first thing most coders would think of is to drop inappropriate words.

    I can't imagine a customer being happy having to pronounce nonsense words.

    I can't imagine a customer dropping you because of something so minor and easily fixable.

    Sorry, too many improbabilities. Not buying it.

  • Joey Stink Eye Smiles (unregistered)

    TRWTF is these guys were still making code changes days before the release.

    Are you telling me a 9 month project can be regression tested in a day or two?

  • LU (unregistered) in reply to gus
    gus:
    I don't buy it.

    The first thing most coders would think of is to drop inappropriate words.

    I can't imagine a customer being happy having to pronounce nonsense words.

    I can't imagine a customer dropping you because of something so minor and easily fixable.

    Sorry, too many improbabilities. Not buying it.

    Well, either you are very lucky or you are living under a rock. You wouldn't believe how stupid and overreactive people can be... Yes, there are even people who would drop a contract because of some insignificant stuff...

  • (cs) in reply to flarp
    flarp:
    'di' and 'du' are not very japanesey phonemes.

    He shouldn't have agreed to the idea in the first place -- why would you take a simple problem about synthetic keys and introduce natural languages and other extreme complexities?

    And miss the chance of playing with Markov Chains??? C'mon, how could anyone pass that!!!??!!??!!

  • Plz Send Me The Code (unregistered) in reply to ubersoldat
    ubersoldat:
    So you americans are so "political correct" that a string like "fukumashita" is an insult?

    PS: Ironically, I wrote "political correct" to be political correct and not writing stupid... then I'm stupid... well, at least I'm not 250 million stupid.

    Oh man, I thought we voted in Pres. Obama so that everyone would like America again.

  • henry (unregistered) in reply to PJ Volk

    Why not just give the client a link to an IM client? Problem solved without any changes at all

  • krupa (unregistered) in reply to ubersoldat
    ubersoldat:
    PS: Ironically, I wrote "political correct" to be political correct and not writing stupid... then I'm stupid... well, at least I'm not 250 million stupid.

    Wait... what?

  • LU (unregistered) in reply to henry
    henry:
    Why not just give the client a link to an IM client? Problem solved without any changes at all
    But the client *WANTED* pronounceable IDs! <put a child do-want-have-this-now! face expression here>
  • 日本語の学生 (unregistered) in reply to fw
    fw:
    "fukushita", "kakashite", "fukumihado", "diefatsu", "tokaduki", and "fukusuka"
    ふくした かかして ふくみはど ぢえふぁつ とかづき ふくすか

    私は分かりません。

  • (cs) in reply to flarp
    flarp:
    'di' and 'du' are not very japanesey phonemes.

    They are in Nihon-shiki romanization, where ぢ and づ are romanized just like that.

    Hepburn is a bit more common, though (at least in my experience), presumably because the romanization matches the pronunciation better. In that case, you'd have romanized those characters as ji and zu. However, that means there's no one-to-one mapping between romaji and characters, since ji and zu are also used for (the more common) じ and ず.

    The real question is perhaps how we have both "shi" and "di" in there, since Nihon-shiki doesn't use shi, but si. (You'd once again need Hepburn for shi.) That suggests a mostly ad-hoc scheme intending to match pronunciation fairly closely, while still maintaining a one-to-one mapping (but I believe dzi and dzu are more common romanizations in that case).
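
    The one-to-one point can be shown with a small table. This sketch covers only the five kana discussed here, not a full romanization scheme, and `collisions` is a made-up helper.

```python
# kana: (Nihon-shiki, Hepburn); only the kana discussed above
ROMAJI = {
    "し": ("si", "shi"),
    "じ": ("zi", "ji"),
    "ぢ": ("di", "ji"),
    "ず": ("zu", "zu"),
    "づ": ("du", "zu"),
}


def collisions(column: int) -> dict:
    """Return romanizations in the given column that map back to more than one kana."""
    seen = {}
    for kana, forms in ROMAJI.items():
        seen.setdefault(forms[column], []).append(kana)
    return {romaji: kana for romaji, kana in seen.items() if len(kana) > 1}


if __name__ == "__main__":
    print("Nihon-shiki:", collisions(0))
    print("Hepburn:", collisions(1))
```

    Within this subset, Nihon-shiki has no collisions, while Hepburn folds ぢ into ji and づ into zu.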

  • Alberto (unregistered) in reply to Engival
    Engival:
    OutlawProgrammer:
    Is there any particular reason they didn't just simplify their existing IDs? Switch from hex to decimal, throw in a few dashes, and you should wind up with something fairly easy to pronounce:

    1027-4002-9530-3064

    TEN TWENTY SEVEN! FOUR-THOUSAND TWO! NINETY-FIVE THIRTY! THIRTY SIXTY-FOUR!

    OR, a real salesperson would offer to build this customer a brand spanking new system (for a mere million dollars) where they wouldn't need to speak these IDs aloud anymore.

    The solution is even easier than that. Consider:

    1. You only need large crazy numbers for the online/scripting portion.
    2. Internally, a simple sequential campaign number would be enough.

    So combine them.

    CAMPAIGN-RANDOM_CRAP

    In their accounting, tell them to drop everything after the dash. In their software, use the full string.

    As usual, the simplest solution would be abhorred by most engineers

  • Alberto (unregistered)

    Is it possible to replace the present captcha with this generator?

  • (cs)

    Oh bugger - I remember when I first wrote an automated username generator - no forbidden characters like 0O1lI, and only pronounceable syllables...

    Luckily, the customer discovered the first "interesting" words during testing, before we went live.

    We quickly added a patch, where we generated a bunch of random names in a database table and manually filtered the offensive ones - the server picked the top one, and only generated a new one if there were no screened ones in the database table.. :D
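
    Roughly what that patch might have looked like, as a sketch only. An in-memory queue stands in for the database table, and the consonant and vowel sets are assumptions chosen to avoid the look-alike characters.

```python
import random
from collections import deque

# Assumed alphabet: lowercase only, no 'l', no digits, so none of 0 O 1 l I can appear.
CONSONANTS = "bdfghjkmnprstvz"
VOWELS = "aeiou"


def random_name(rng: random.Random, syllables: int = 3) -> str:
    """Build a name from consonant+vowel syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))


class NamePool:
    """Queue of human-approved names; falls back to generating when empty."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.screened = deque()  # stands in for the table of screened names

    def approve(self, name: str) -> None:
        self.screened.append(name)

    def next_name(self) -> str:
        if self.screened:
            return self.screened.popleft()
        return random_name(self.rng)  # unscreened fallback


if __name__ == "__main__":
    pool = NamePool()
    pool.approve("kamito")
    print(pool.next_name(), pool.next_name())
```

    The fallback is still unscreened, so the pool has to be kept topped up for the filter to mean anything.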

  • ネット日本語の先生 (unregistered) in reply to 日本語の学生
    日本語の学生:
    fw:
    "fukushita", "kakashite", "fukumihado", "diefatsu", "tokaduki", and "fukusuka"
    ふくした かかして ふくみはど ぢえふぁつ とかづき ふくすか

    私は分かりません。

    No, you're doing it all wrong. This is the Internet, you're supposed to say things like "わかんないw".

Leave a comment on “The Automated Curse Generator”
