• アレック (unregistered) in reply to Xythar
    Xythar:
    Kunrei-siki allows both variants though.

    However - the real WTF here is that he didn't use nihon-siki.

    Ugh I hate nihon-shiki. For the purpose of English speakers, if you're going to romanise Japanese but do so in a way that isn't pronounced at all how it's read (e.g. "zyosi" is actually pronounced like "josh") you might as well just leave it in hiragana in the first place. It's really only useful for the Japanese in places where they need to romanise the language (e.g. IMEs or non Unicode URLs), learning using it is dumb.

    I hate anything but nihonsiki. For the purpose of spelling and grammar it makes no sense. The words bear only a rudimentary resemblance to what they should be. E.g. the conjunctive form in hepburn/kunrei transforms ku -> ki, su -> shi, tsu -> chi, nu -> ni, fu -> hi (etc)... There's no pattern! The same conversion in Nihonsiki: ku -> ki, su -> si, tu -> ti, nu -> ni, hu -> hi. That's a lot easier to work with because it all follows the same pattern.
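
    A minimal sketch of the regularity being argued about here, assuming romanized verbs in dictionary form whose final syllable follows the u -> i mappings listed above. The function names are made up for illustration, and the Hepburn special cases are only the three spelled out in the comment (su, tsu, fu), not a full romanization table.

```python
def conjunctive_nihonsiki(verb):
    """Nihon-siki spelling: the final -u always becomes -i."""
    if not verb.endswith("u"):
        raise ValueError("expected a dictionary form ending in -u")
    return verb[:-1] + "i"


def conjunctive_hepburn(verb):
    """Hepburn spelling: same rule, plus special cases for su, tsu and fu."""
    if not verb.endswith("u"):
        raise ValueError("expected a dictionary form ending in -u")
    # "tsu" must be checked before "su", because "tsu" also ends in "su".
    for ending, replacement in (("tsu", "chi"), ("su", "shi"), ("fu", "hi")):
        if verb.endswith(ending):
            return verb[: -len(ending)] + replacement
    return verb[:-1] + "i"


print(conjunctive_nihonsiki("hanasu"), conjunctive_hepburn("hanasu"))
print(conjunctive_nihonsiki("matu"), conjunctive_hepburn("matsu"))
```

    The Nihon-siki version is a single string operation; the Hepburn version needs an exception table, which is the whole complaint.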

  • Jasper (unregistered)

    Come on, man. I have a hard time believing that this story is real.

    "They dropped our contract!" Brian shrieked, "Half our revenue is gone! You've killed our company!"
    Yeah, really. Half of the business of the company depended on this silly stuff and the biggest client swings from happy to dropping the contract overnight because of some trivial silly thing like this.

    Doesn't sound very realistic to me.

  • that much for symmetry (unregistered)

    So, this is the genesis of the TDWTF captcha generator?

  • Anonymous (unregistered) in reply to Jasper
    Jasper:
    Come on, man. I have a hard time believing that this story is real.
    "They dropped our contract!" Brian shrieked, "Half our revenue is gone! You've killed our company!"
    Yeah, really. Half of the business of the company depended on this silly stuff and the biggest client swings from happy to dropping the contract overnight because of some trivial silly thing like this.

    Doesn't sound very realistic to me.

    Spewing out a stream of profanity is not a "trivial silly thing". It may seem silly to a developer but in the realm of cross-corporation politics this is nothing less than suicide.

  • Here's a nickel, kid (unregistered) in reply to Anonymous
    Anonymous:
    Jasper:
    Come on, man. I have a hard time believing that this story is real.
    "They dropped our contract!" Brian shrieked, "Half our revenue is gone! You've killed our company!"
    Yeah, really. Half of the business of the company depended on this silly stuff and the biggest client swings from happy to dropping the contract overnight because of some trivial silly thing like this.

    Doesn't sound very realistic to me.

    Spewing out a stream of profanity is not a "trivial silly thing". It may seem silly to a developer but in the realm of cross-corporation politics this is nothing less than suicide.
    Sometimes company politics plays a role...Mgr A is just looking for some reason to stab Mgr B in the back, or is competing for resources in some way, or whatever.

    It can also be that there is more than one vendor in the competition, they're both about the same, give or take a few things, so the decision might be difficult (and include, again, company politics)...and all they need is one excuse to make the decision easy.

  • Here's a nickel, kid (unregistered) in reply to takimoto
    takimoto:
    Jay:
    I am completely sympathetic to people who object to being called a n----r and I certainly will not use that word.

    I object to being called "human" - I demand you call me Superior Overlord. See the pattern?

    Good thing that all this PC madness exists only in US - all other people do not see any problems in using words negro and negroid.

    1. All cultures have their stupidities & inconsistencies.

    2. These are often expressed as "taboos"...or, if you wish, being "Politically Correct".

    3. Racially homogeneous societies may not have PC(racial), but may have PC(class) or PC(caste) or PC(heredity) or PC(dialect) or PC(location) or PC(religion), etc.

  • OP (unregistered)

    It's a true story; there are some more details in the sidebar thread. The story has obviously been simplified--reality is always more complicated.

    Answers to some questions which have popped up more than once:

    1. Markov generation was one of many options presented to the client. Others were:
      • Numeric (999-999-999)
      • CIA "codewords" with prefix+word a la HAVE BLUE or PAVE HAVOK
      • UK post code style AnAnAnAn "H4j3d2l5"
      • manually entered
      • import of their internal account IDs
      Each of these was rejected first, for various reasons.
    2. Client was old-school-- no Internet access for those workers involved.
    3. Nobody had to type in the IDs. My understanding was that someone on their staff maintained a dictionary file mapping these to their internal identifiers, portions of which were printed and distributed to those who needed to know. Not sure about all the details on their side.
    4. We used the romanization of the book used as a source for the text.
    5. It took two days because in addition to the corpus parser and word generator, there were design docs, db changes to hold the mapping IDs, URL parameter format parsing, internal mapping between these IDs and actual entities, test cases, and lots of time searching for good text sources. We went through a lot of test runs to see what would come out.
    6. Much of the testing staff being foreign and time pressure being high, nobody noticed the fairly rare oddities.
    7. A bad-word filter was in fact implemented, and a few hundred curse words were checked in to version control. Once the target language was changed away from English, everybody thought it was unnecessary. There was actually a group of people scribbling away on a whiteboard, but nobody could agree on how bad or how marginally offensive a word needed to be to be included.
    8. Internally, the client wasn't fully supportive of the project. In their company politics there were some who thought it was a mistake to spend much effort on the Internet until it was more well established, and were ready to pounce on any excuse to shut down this particular VP's project.
    Captcha: damnum. Oh how I wish iPhone could take a safari screenshot.
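
    For anyone wondering what "corpus parser and word generator" plus a bad-word filter amounts to, here is a rough sketch. It is not the code from the story: the corpus, the banned list, the Markov order and every function name are assumptions made up for illustration. It builds a character-level Markov model from romanized words and rejects any output containing a banned substring.

```python
import random
from collections import defaultdict

# Illustrative corpus only; the real system used a whole book as its source.
CORPUS = ["fukushita", "kakashite", "hanashita", "sakura", "mochiron",
          "tanuki", "kokoro", "yamato", "takeshita", "midori"]
# Illustrative banned substrings; the story mentions a few hundred entries.
BANNED = ["fuk", "shit"]


def build_model(words, order=2):
    """Map each `order`-character state to the characters seen after it."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"  # ^ marks the start, $ the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, rng, order=2, max_len=12):
    """Walk the model from the start state until an end marker or max_len."""
    state, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[state])
        if nxt == "$":
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)


def is_clean(word, banned):
    """True if no banned substring occurs anywhere in the word."""
    return not any(b in word for b in banned)


def generate_clean(model, banned, rng, attempts=1000):
    """Retry generation until a non-empty word passes the filter."""
    for _ in range(attempts):
        word = generate(model, rng)
        if word and is_clean(word, banned):
            return word
    raise RuntimeError("no clean word found; corpus or filter too strict")


rng = random.Random(0)
model = build_model(CORPUS)
print([generate_clean(model, BANNED, rng) for _ in range(5)])
```

    The filter only helps if the banned list matches what the output language can actually produce, which is the gap point 7 describes: a list built for English stops looking necessary once the source text is Japanese.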
  • Kef Schecter (unregistered) in reply to アレック
    アレック:
    I hate anything but nihonsiki. For the purpose of spelling and grammar it makes no sense. The words bear only a rudimentary resemblance to what they should be. E.g. the conjunctive form in hepburn/kunrei transforms ku -> ki, su -> shi, tsu -> chi, nu -> ni, fu -> hi (etc)... There's no pattern! The same conversion in Nihonsiki: ku -> ki, su -> si, tu -> ti, nu -> ni, hu -> hi. That's a lot easier to work with because it all follows the same pattern.

    I think this is a relatively small advantage. Yes, Nihon-shiki does illustrate patterns more easily, but you still have to abandon those patterns when you pronounce them aloud anyway. In other words, the irregularity doesn't go away, it just gets moved over into pronunciation instead of spelling. So now you still have a counterintuitive system, and one that people can't even pronounce correctly without a bit of mental training.

    For the record, I learned using kana anyway, sidestepping the whole issue. If you know kana, then which romanization system you use when you do have to use romaji is surely a trivial matter, as any mind capable of learning the kana is capable of handling multiple romanization systems.

    - Kef
  • Postman Pat's Black And White Cat (unregistered) in reply to OP
    OP:
    UK post code style AnAnAnAn "H4j3d2l5"

    psst... that isn't the UK post code format.

    United Kingdom post code format is: A(A)N(A/N)NAA (though most leave a space before the last 3 characters).

    And if by UK you meant Ukraine, well their post code format is: NNNNN

    I don't think your suggested format "AnAnAnAn" actually matches the post code of any nation, but a shorter version - "AnAnAn" - is the format for post codes in Canada.
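
    The shapes above can be written down as regular expressions. This is a sketch that checks shape only, using exactly the formats as described in this comment; real postal validation is stricter (not every letter/digit combination is an issued code), and the dictionary keys and function name are invented for the example.

```python
import re

# Shape-only patterns; A = letter, N = digit, as in the comment above.
FORMATS = {
    "uk": re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"),  # A(A)N(A/N) NAA
    "canada": re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]"),     # AnA nAn
    "ukraine": re.compile(r"[0-9]{5}"),                            # NNNNN
    "op_style": re.compile(r"(?:[A-Z][0-9]){4}"),                  # AnAnAnAn
}


def matching_formats(code):
    """Return the names of all formats whose shape the code matches."""
    candidate = code.strip().upper()
    return [name for name, pattern in FORMATS.items()
            if pattern.fullmatch(candidate)]


for sample in ("H4j3d2l5", "SW1A 1AA", "K1A 0B1", "01001"):
    print(sample, matching_formats(sample))
```

    Under these patterns the OP's example "H4j3d2l5" matches only its own AnAnAnAn shape and none of the three national formats.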

  • oheso (unregistered) in reply to Parsons
    Parsons:
    N0G:
    Reminds me of the Japanese twins Fook Mi and Fook Yu in Austin Powers 3. Someone's going the right way for a smacked bottom!

    Okay, I know it was supposed to be a comedy, but surely neither of those names are actually erm... Japanese ?

    Fukumi. Very common girl's name. The second "u" is voiced, though, since the following consonant is voiced. (But it's often not voiced very much ... ;-) )

    Fukuyu would not be a common name, but would still be entirely possible. I would not be surprised to be introduced to someone named Fukuyu.

    Shiho is another matter all together ...

  • Barry..or Am I Brian (unregistered) in reply to Brian..or Am I Barry?
    Brian..or Am I Barry?:
    Did anyone else notice how Barry and Brian switched places in the story a few times?
    I totally did...or did I?
  • gregb (unregistered) in reply to krupa

    Political correctness is substituting words with a euphemism or jargon. Like using 'physically challenged' instead of 'disabled'. So 'you are politically correct', instead of 'you are stupid'. Which is recursively challenged.

  • Edward Royce (unregistered)

    Ok.

    "diefatsu"

    I'd buy a car with that logo.

  • Parsons (unregistered) in reply to oheso
    oheso:
    Parsons:
    N0G:
    Reminds me of the Japanese twins Fook Mi and Fook Yu in Austin Powers 3. Someone's going the right way for a smacked bottom!

    Okay, I know it was supposed to be a comedy, but surely neither of those names are actually erm... Japanese ?

    Fukumi. Very common girl's name. The second "u" is voiced, though, since the following consonant is voiced. (But it's often not voiced very much ... ;-) )

    Fukuyu would not be a common name, but would still be entirely possible. I would not be surprised to be introduced to someone named Fukuyu.

    Shiho is another matter all together ...

    well... I stand corrected :)

    (maybe I should just get back to... The Project...)

  • Choose Your Own IP (unregistered) in reply to Jay

    Can you at least grasp the concept that offensive words can only be offensive when Person A says them in order to insult Person B?

    Here we have A reading computer-generated words to B in order to transfer information.

    Or do you feel that Free Speech should bow to courtesy? How about a public reading of a novel with so-called strong language? Should the reader just skip the iffy words?

  • kosh (unregistered)

    The problem of reading binary strings aloud was partially solved by 1995 with the 11-bit to English word S/KEY encoding specified in RFC1760 (http://tools.ietf.org/html/rfc1760).

    It would remain to introduce some kind of checksum to ensure validity and possibly a length encoding for variable binary strings (cf. also Dan Bernstein's "netstrings" proposal).

    It is still possible to produce offensive statements in RFC1760 encoding (e.g. "EVIL" "DAVE" "ATE" "GAY" "KNOB" is a valid string) but perhaps less likely, and the list could easily be sanitised further.
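
    A sketch of the idea, with loud caveats: this is not the RFC 1760 algorithm or its dictionary. The word list below is a placeholder of 2048 synthetic tokens, and the framing (one length byte in front, one sum-mod-256 checksum byte at the end) is an assumption chosen only to illustrate the "checksum plus length encoding" suggestion. What it does share with S/KEY is the core trick of cutting the bit string into 11-bit chunks and looking each one up in a 2048-entry list.

```python
# Placeholder dictionary: 2048 synthetic tokens, NOT the RFC 1760 word list.
WORDS = [f"W{i:04d}" for i in range(2048)]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data):
    """Frame the bytes (length + data + checksum) and emit 11-bit words."""
    if len(data) > 255:
        raise ValueError("sketch supports at most 255 bytes")
    framed = bytes([len(data)]) + data
    framed += bytes([sum(framed) % 256])   # simple checksum, an assumption
    nbits = len(framed) * 8
    pad = (-nbits) % 11                    # zero bits to reach a multiple of 11
    bits = int.from_bytes(framed, "big") << pad
    nbits += pad
    return [WORDS[(bits >> shift) & 0x7FF]
            for shift in range(nbits - 11, -1, -11)]


def decode(words):
    """Reverse encode(); raise ValueError on length or checksum mismatch."""
    bits = 0
    for word in words:
        if word not in INDEX:
            raise ValueError(f"unknown word: {word}")
        bits = (bits << 11) | INDEX[word]
    nbits = len(words) * 11
    if nbits < 16:
        raise ValueError("too short")
    length = bits >> (nbits - 8)           # first byte is the payload length
    pad = nbits - 8 * (length + 2)
    if not 0 <= pad < 11:
        raise ValueError("length mismatch")
    framed = (bits >> pad).to_bytes(length + 2, "big")
    if sum(framed[:-1]) % 256 != framed[-1]:
        raise ValueError("checksum mismatch")
    return framed[1:-1]


spoken = encode(b"hello")
print(spoken, decode(spoken))
```

    With a real word list in place of the placeholder tokens, a misheard or mistyped word would most likely be caught by the length or checksum check rather than silently mapping to a different ID, though a one-byte sum is a weak check and is used here only to keep the sketch short.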

  • panda (unregistered) in reply to Matthew

    Actually, there is a specific and set way to pronounce words in Japanese - even though the words shown are "made up", any Japanese speaker would pronounce them exactly the same way. (Presumably this is one of the reasons why they settled on it as a source language for the generator - Japanese pronunciation is both regular and easy for western speakers.)

    For example one of the words given - "fukushita" - would be pronounced "foo-koo-she-tah" (and actually the word "fukushita" is the past tense form of the verb "fuku suru" meaning "to return to normal" and would be written 復した in Japanese). Another example would be the cited "kakashite" which would be pronounced "kah-kah-she-teh" (and now that I think of it, that is actually a conjugated version of a real verb in Japanese 欠かす, meaning "to miss") Anyway, I think that's what Incourced was getting at when s/he was talking about "pronouncing them correctly."

  • (cs)

    Fukushita ??????

    FUKUSHIMA !!!!!!

  • John (unregistered)

    I call bull-ファッキングシット.

    You can't spell a single past tense sentence in Roman characters in Japanese without the substring "shit". Which means there's no way this happened.
