• Anonymous (unregistered) in reply to Okayyy, hes in school ffs.
    Okayyy:
    Bah. Europa. I prefer scandinavia. ;) Btw, there is a very small and odd language apart from nederlands. ;) dunno what its called in english though. Flamländska in swedish. Cool language.

    Flemish ?

  • (cs) in reply to DryTyler
    DryTyler:
    To be really pedantic, in Germany we speak Deutsch. You may call it German, but I know the language I am speaking. It is Deutsch.

    ;-)

    Pedantically "Sie sprechen Deutsch" but "You speak German".

  • denz (unregistered) in reply to nobody

    no, it won't match.

  • denz (unregistered) in reply to nobody

    (Previous post didn't exactly do what I wanted...)

    but it would still allow urls like /duelish/foo/bar.html and match it with "deu"
    No it won't match.
  • woohoo (unregistered) in reply to DOA
    DOA:
    Oh, man, I nearly had an aneurism from laughing at that. That's probably what their elementary school maps look like...

    you mean the map "the world according to america"?

    I don't think the problem is that the maps in elementary schools look like this, but more that the map in the oval office (or in the mind of a certain person sitting in there) seems to look like this... ;o))

  • Anon-y-moose (unregistered) in reply to Anonymous Tart
    Anonymous Tart:
    In our system we have uk (means English innit!), french, german, (sanity takes hold after here), es, jp, ko, br, etc

    "innit" would make it chavglish, which is spoken far too much in the UK ... wikked!

  • merreborn's nemesis (unregistered) in reply to woohoo

    Really!

    Doesn't anyone else find it slightly odd that this website doesn't store this information in a useful place... like, oh, a session variable??? The real wtf is that this code needed to be written in the first place. Sessions are your friends.

    Also, (since I don't really know that ASP has sessions) any server-side cgi/scripting language without sessions or an alternative shouldn't ever be used. I'd assume that this isn't the case with ASP since it's so popular.

    And, of course, if there is no need for sessions, then why isn't the information simply created as actual pages on the server inside various files named after each language?

    CAPTCHA: doom...yep, this code makes me feel that way: doomed.

  • (cs) in reply to brendan
    brendan:
    This is C# because it is the only language that uses string(with no capital letter), and also the style of comments used is unique only to C#.

    Just to be pedantic... I agree this is C#, but your reasoning above is wrong. Delphi uses string (or String), as it's case-insensitive, and supports NDoc style /// comments.

  • (cs) in reply to Eirik
    Eirik:
    mathew:
    No, the REAL WTF is that the user's web browser already has a preference for what language they want to see the web site in, so the web server should just be using that.

    No no no, that is bad practice. Then the visitors can't choose in which language they want to see the site. You shouldn't make them have to fiddle around with the browser settings for every site on the internet.

    Why don't you all just learn english? It's so easy, I did it when I was a little baby - heck it's easier than using the toilet.

  • Marader (unregistered)

    Thank God that they did not include South Africa's 11 official languages. But at least then we could actually do some real performance testing :P

  • Mayer (unregistered)

    That's a political statement disguised as a piece of code!

  • (cs) in reply to Anon-y-moose
    Anon-y-moose:
    Anonymous Tart:
    In our system we have uk (means English innit!), french, german, (sanity takes hold after here), es, jp, ko, br, etc

    "innit" would make it chavglish, which is spoken far too much in the UK ... wikked!

    sigh English: the language that accidentally happened when a Norman knight tried to negotiate purchasable affections with a Saxon bar tart.

  • (cs) in reply to obediah
    obediah:
    Eirik:
    mathew:
    No, the REAL WTF is that the user's web browser already has a preference for what language they want to see the web site in, so the web server should just be using that.

    No no no, that is bad practice. Then the visitors can't choose in which language they want to see the site. You shouldn't make them have to fiddle around with the browser settings for every site on the internet.

    Why don't you all just learn english? It's so easy, I did it when I was a little baby - heck it's easier than using the toilet.

    sigh English: the language that accidentally happened when a Norman knight tried to negotiate purchasable affections with a Saxon bar maid.

  • William (unregistered) in reply to Pon

    The missing Swiss language (after German, French, and Italian), from http://dict.leo.org: Rhaeto-Romanic - a group of Romance languages spoken in eastern Switzerland and northeastern Italy [ling.]

    I've heard that it is a surviving remnant of vulgar latin.

    Orfay ouay mericansay, isthay eryvay oseclay otay igay-atinlay.

  • small-car? (unregistered)
    In Europe, they do things a little bit differently ... they drive tiny little cars.

    Mine's a Porsche; small, I suppose, but I'd rather it than an over-sized American 'sports' car (which, in itself, is surely a wtf-worthy term)

  • not dubya (unregistered) in reply to Chris

    Chris:
    To be pedantic, in Germany they speak German

    no,

    in germany they speak Deutsch.

    have you not heard the phrase Sprechen Sie Deutsch?

    The english world calls DeutschLAND Germany and says they speak German. however they speak Deutsch in DeutschLAND.

    their language, i think they have naming rights....

    CAPTCHA: dubya, in the US centric world they do speak German i guess....i stand corrected by DUBYA

  • mmm hash (unregistered) in reply to Whiskey Tango Foxtrot? Over.
    Whiskey Tango Foxtrot? Over.:
    *cough*HASHtable*cough*

    list is too small..overhead in setting up hashtable would eat into any gains

  • patrickniko (unregistered) in reply to Russ

    They have mod_rewrite-like ISAPI filters that will rewrite the URL before passing it on to your actual C# code, but ya, it's a wtf for sure.

  • shinmawa (unregistered) in reply to Skybert
    Skybert:
    Well, except Dutch (which should be 'nld'), all of these are actually ISO language codes (ISO 639-3 if you care).

    Except, of course, that the ISO 639-3 spec says "dut/nla Dutch". There's no 'nld' in there at all.

  • Jens Fudge (unregistered)

    The Real WTF here is: Seeing how many Americans are so self-centered, they cannot grasp the concept of different countries residing in just one continent...

    Well, here's a news flash: We've even got different kinds of money, and guess what, Denmark has fewer inhabitants than London... And we even have a Queen !!!

    ;-)

  • Adriaan Renting (unregistered) in reply to turbothy
    turbothy:
    Mario:
    I speak Dutch (Flemish), btw.

    My condolences.

    Interesting fun fact: As a Dane, reading a Dutch newspaper is quite possible (easily 75% comprehension) but listening to Dutch ... 0% comprehension.

    You might find that it actually depends a lot on which dialect of Dutch people are speaking. I think it might be easiest for you to understand Dutch from the most northern part of Holland (West Frisia) or from a Frisian speaking Dutch, as those dialects are already closer to the Scandinavian languages than, for example, the Flemish dialects.

  • Neil (unregistered)

    If European cars are small, you should see what they drive in Japan and Korea.

    Oh wait, Japan and Korea are now in Europe apparently...

  • Laie Techie (unregistered)
    1. It doesn't use ISO for all the languages

    2. The function only compares the first 3 letters of the string (and it compares them 1 by 1). It should use a built-in string comparator.

    3. If you still insist on using an array, are the languages in the optimal order? Is Spanish really the least used language for that site? Assuming all languages have an equal likelihood of being used, each test string is compared to 6 known languages each request (11/2 = 5.5). By populating the array in alphabetical order and using a binary search algorithm, that drops down to 4 (log2 11 =~ 3.5).

    4. People are talking about hash tables in this thread. That is a waste of time unless c# doesn't support sets. A hash table entry has a key and value. Since we only want to verify the existence of a key, the memory for the value is utterly wasted. Sets distributed with the standard Java package include one that uses a binary search algorithm and another which has similar performance to a hash table.
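
    A rough sketch of the set idea in point 4, assuming the eleven folder names quoted elsewhere in this thread. `LanguageFolders`, `Valid` and this version of `IsoFromUrl` are made-up names for illustration, not the original code, and `HashSet<string>` needs .NET 3.5 or later:

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class LanguageFolders
    {
        // Assumed list of folders; membership is all we need, so no values are stored.
        private static readonly HashSet<string> Valid =
            new HashSet<string>(StringComparer.Ordinal)
            {
                "dan", "deu", "dut", "eng", "fin", "fra",
                "jpn", "kor", "nor", "spa", "swe"
            };

        // Returns the three-letter folder for "/xxx" or "/xxx/...", else string.Empty.
        public static string IsoFromUrl(string url)
        {
            if (url == null || url.Length < 4 || url[0] != '/') return string.Empty;
            // The folder name must end at the end of the URL or at the next slash.
            if (url.Length > 4 && url[4] != '/') return string.Empty;

            string candidate = url.Substring(1, 3);
            return Valid.Contains(candidate) ? candidate : string.Empty;
        }
    }
    ```

    With this shape, "/eng/index.html" should come back as "eng", while "/english/index.html" and "/xx" should both come back empty. Whether a set actually beats a short loop over eleven strings is a separate question that only a profiler can answer.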

    captcha: tesla

  • OMNIVORE (unregistered) in reply to modelnine

    Why do some men,

    when flushing the toilet use their foot,

    not use their foot when flushing the urinal?


    Moral: In code, as in real life, sometimes comments can hurt.


    CAPTCHA = sanitarium (double-lobotomy anyone?)

  • Da' Man (unregistered)

    The real WTF is that this guy didn't use the 2-letter ISO language codes for languages.

    Captcha: dubya

  • This Way Out (unregistered) in reply to mbvlist

    Since this is off topic already ->

    Remembering that there was some sort of war because Americans didn't want to be under British rule, American English started to deviate because a guy called Webster thought it would be better for the Americans to differentiate themselves from the English. He went so far as to write a dictionary. I think Webster was originally of Scottish origin.

  • rojer (unregistered) in reply to Russ
    Russ:
    WTF are they doing this in the application? They should just have apache rewrite the url... something like this:

    RewriteCond %{REQUEST_URI} ^(/[^/]+)/(.*)$ [NC]
    RewriteRule ^/(.*)$ http://www.example.com/$2&lang=$1 [P,QSA]

    Now I haven't tested it, but you should probably see what I mean.

    yep. but your suggestion is itself a wtf because:

    1) that P in flags means proxy. this is VERY expensive and, assuming you're rewriting on the same host, unnecessary because apache will connect to itself to do this rewrite.
    2) you're matching REQUEST_URI in rewritecond, which doesn't make a lot of sense, because this is what is matched in the first argument of every rewriterule (yes, the one you match with the stereotypic '^(.*)$', unnecessarily slowing down regex processing by creating a match group that is never used). if you really do all your stuff in conds and want to skip uri matching in the rule, the minimal valid regex to use is just ^ by itself.
    3) you use NC in flags for rewritecond, which makes the match case insensitive, and then don't actually write anything case sensitive in your regex.
    4) match groups in rewritecond are accessed via %number. $number accesses match groups in the rule's own regex. that's just syntax, but it means your rewrite wouldn't work.
    5) it is considered better style to use full flag names instead of short ones. had you typed 'proxy' instead of just 'P', chances are you'd actually have thought about it and wouldn't have made the blunder that is item 1 in this list.

    so, my take would be:

    RewriteRule ^/([^/]+)/(.*) /$2?lang=$1 [qsappend]

    this would only take an internal redirect to process.

    or, better still, you could pass language selection as an environment variable instead of mangling the query string:

    RewriteRule ^/([^/]+)/(.*) /$2 [env:LANGUAGE=$1]

    (don't mess up the LANG variable, just in case) modifying environment variable wouldn't even require an internal redirect.

  • (unregistered) in reply to Russ

    Well, the initial post is wrong on so many levels it's hard to know where to start.

    Just running C# and IIS are probably the cause of this problem.
    Content negotiation ( in this case for languages ) is part of the HTTP specification. http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12

    Just so everyone knows, there are standard two and three letter codes for languages: http://www.loc.gov/standards/iso639-2/langcodes.html So there is no need to f*k everything up by making your own.
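
    For what it's worth, here is a minimal sketch of what honouring the browser's preference could look like, assuming the raw `Accept-Language` header value is already at hand. `AcceptLanguage.Pick` and its parameters are invented for this example and are not an ASP.NET API. It only looks at the primary subtag and the q-values, which is a simplification of what RFC 2616 section 14.4 describes:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;

    public static class AcceptLanguage
    {
        // Picks the supported language with the highest q-value, or fallback if none match.
        public static string Pick(string header, ICollection<string> supported, string fallback)
        {
            if (string.IsNullOrWhiteSpace(header)) return fallback;

            var ranked = header.Split(',')
                .Select(part => part.Trim().Split(';'))
                .Where(fields => fields[0].Trim().Length > 0)
                .Select(fields => new
                {
                    // Keep only the primary subtag: "en-gb" becomes "en".
                    Tag = fields[0].Trim().Split('-')[0].ToLowerInvariant(),
                    Q = ParseQ(fields)
                })
                .Where(entry => entry.Q > 0)
                .OrderByDescending(entry => entry.Q); // stable sort: ties keep header order

            foreach (var entry in ranked)
            {
                if (supported.Contains(entry.Tag)) return entry.Tag;
            }
            return fallback;
        }

        // Reads the "q=" parameter if present; a missing q means 1.0.
        private static double ParseQ(string[] fields)
        {
            foreach (var field in fields.Skip(1))
            {
                string p = field.Trim();
                double q;
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                    double.TryParse(p.Substring(2), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out q))
                {
                    return q;
                }
            }
            return 1.0;
        }
    }
    ```

    A call such as `AcceptLanguage.Pick("da, en-gb;q=0.8, en;q=0.7", new[] { "en", "de" }, "en")` would be expected to settle on "en". A site can still offer an explicit language switcher on top of this and let it override the header.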

  • !Z (unregistered) in reply to D-d-d-daaaaan

    Actually in America they speak Spanish too. And lots of other languages, but Spanish and Americanish are so prevalent all labelling must have both (well so it appeared to me). Coming from a much more consistently English speaking country it really stood out to me.

  • upser (unregistered)

    better than UPS shipping code check to see if the language is supported:

    if (lang.equals("us") || lang.equals("ca") || lang.equals("cz") ||
        ..... 50 lines later...
        lang.equals("dk"))
    {
      return true;
    }
    else
    {
      return false;
    }

    Now where is that package I sent??!?!

  • (cs) in reply to Marader
    Marader:
    Thank God that they did not include South Africa's 11 official languages. But at least then we could actually do some real performance testing :P

    Ethiopia has something like 70 official languages ... with different written character representations

  • Design Pattern (unregistered) in reply to Harrow

    As already pointed out by others: There is no test that the letter after the language code is a "/".

    But what happens if the url does not even contain 3 letters in the place the language code is suspected?

    I smell an IndexOutOfBounds-exception!

    Harrow:
    I have added the requirement that each page preparer memorize and use a single letter representing his target language.

    If you do this, why don't you just use the numeric value of this letter as an array index? You can optimise away the foreach loop then and the lookup will be O(1) instead of O(N)!

    VALID_FOLDERS['e'-'a'] = "eng";
    VALID_FOLDERS['g'-'a'] = "deu";
    ....
    // all other: 
    VALID_FOLDERS[i] = string.Empty;
    
    public static string IsoFromUrl (string url)
    {
      // Expect "/x/...": one lowercase ASCII letter followed by a slash.
      if(url.Length < 3 || url[2] != '/' ||
         url[1] < 'a' || url[1] > 'z')
      {
        return string.Empty;
      }
      return VALID_FOLDERS[url[1]-'a'];
    }
    

    Only disadvantage is that the array will be slightly larger (26 positions if you support only ASCII letters). Empty slots will signal unsupported languages.

  • Ray Burns (unregistered)

    The given code is highly optimized.

    I'm surprised nobody has mentioned string.Intern() and the switch statement.

    In the .NET Framework, literal string constants such as "eng" are much faster for use in switch statements than computed values like url.Substring(1,3). This is because the literal string "eng" appearing in C# always references EXACTLY the same object, no matter which source file it was compiled from or which assembly it was loaded from. This allows the switch() statement (or == comparison) to do a simple reference equality check.

    If a computed string, such as would be returned from string.Substring(), is used in a switch statement, the string.IsInterned() method first computes a hash code, then looks it up in a hash table, both of which are comparatively expensive. In addition, a computed string requires an extra memory allocation which means that garbage collection will run sooner.

    The code given IS highly optimized if the resulting string will be used in switch statements or equality comparisons. The writer of the code apparently also knew that string.char[] is optimized away by the JIT, or else they just got lucky.

    Yes, it would be possible to optimize it more by writing:

    switch((int)url[1]<<16 | (int)url[2]<<8 | (int)url[3])
    {
      case 0x656e67: return "eng"; // 656e67 is hexadecimal for ASCII "eng"
      ...
    
    }
    

    but this is arguably much less readable and more bug-prone.

    The code to switch on numeric constants will take on average perhaps 20 CPU instructions, whereas the code as written will take more like 100. But consider the alternatives:

    • url.Substring(1,3) would take the same 100 instructions but not return interned strings. It would also incur garbage collection costs.
    • table.TryGetValue(url.Substring(1,3), out result) would take hundreds of CPU instructions to execute.

    That said, I do question the decision to create such highly optimized code for something that presumably gets called only once per web request, and also the decision to return strings instead of enumerated values, but both of these decisions could be well justified, depending on what the rest of the application looks like.
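
    The difference between a string built at run time and an interned one, which the argument above leans on, can be poked at with a small sketch. `InternDemo` and `Computed` are names made up for this example. It only shows reference identity; how a particular compiler turns a switch on strings into IL is an implementation detail that this does not demonstrate:

    ```csharp
    using System;

    public static class InternDemo
    {
        // Builds "eng" at run time, so the result is a fresh object rather than the literal.
        public static string Computed()
        {
            return new string(new[] { 'e', 'n', 'g' });
        }

        public static void Main()
        {
            string literal = "eng";
            string computed = Computed();

            // Same characters, so value equality holds.
            Console.WriteLine(literal == computed);
            // Two distinct objects, so reference equality does not.
            Console.WriteLine(object.ReferenceEquals(literal, computed));
            // Interning both sides maps equal strings to one shared instance.
            Console.WriteLine(object.ReferenceEquals(
                string.Intern(literal), string.Intern(computed)));
        }
    }
    ```

    The practical upshot is that a computed string has to go through `string.Intern` (or a plain value comparison) before a reference check can succeed.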

    Final quiz: Both of the following could be optimized by replacing or refactoring the string switch(), but as they stand, which one is faster?

      void DoSomethingOne(string s)
      {
        if(s=="a") s="a";
        if(s=="b") s="b";
        ...
        for(int i=0; i<1000; i++)
        {
          ...
          switch(s)
          {
            case "a": ...
            case "b": ...
          }
        }
      }
    
      void DoSomethingTwo(string s)
      {
        ...
        for(int i=0; i<1000; i++)
        {
          ...
          switch(s)
          {
            case "a": ...
            case "b": ...
          }
        }
      }
    

    Answer:

    DoSomethingOne is faster. In fact, it is only about 20 instructions slower than if the switch statement was integer or enum-based. On the other hand, DoSomethingTwo does an extra string hash calculation and table lookup on every iteration, which takes several hundred instructions to execute.

  • S (unregistered) in reply to
    :
    Just running C# and IIS are probably the cause of this problem. Content negotiation ( in this case for languages ) is part of the HTTP specification.

    And ASP.NET also handles content negotiation etc. But since people can't/won't change the settings in their browsers (or for some reason want to read the page in a different language), there are usually options in web sites to change the language. For some reason usually you people forget that we really do have different languages in the world.

    Also, for another poster, before bashing IIS and talking about Apache's rewriting, the same can be done with IIS and ASP.NET easily. But please do tell me how to create 10 virtual hosts with standard Apache and set all of those to run all modules (e.g. PHP) with different credentials? Not so simple, is it? After all, even the Apache posse thought that Perchild module is unneeded, so they removed it (though it never even worked in the first place). So hooray to security and performance: either all modules run with the same user account OR you run 11 Apaches with one forwarding the queries to the other ones (and thus wrecking REMOTE_HOST etc stuff) OR you use CGI. Woohoo.

    Thank god I have IIS.

    (And let's not even talk about the number of bugs and holes found in Apache in the time that IIS6 has been in the market and has had none...)

  • jacobus (unregistered) in reply to VC1

    I think you mean radix search... Btw, i prefer readable, transparent code over pre-optimized code

  • jacobus (unregistered) in reply to cklam
    cklam:
    rien:
    Guybrush:
    In Germany they speak deutsch, shortened to "deu" in the table.

    you mean, in deutschland ? they speak german of course ! and when not speaking, they are driving big fast cars...

    Derrick Pallas:
    Apparently, they all speak different languages too.

    worst, they may speak different languages within the SAME country !

    In Germany, apart from german we have frisian, danish and at least one more separate language (which I can't remember right now) spoken by minorities. That does not account for all the dialects of german.

    German is also a minority language spoken in Switzerland, France (Alsace-Lorraine) and Italy (Southern Tyrol). The Austrians speak german, too, of course.

    Other noteworthy minority languages are Basque (Spain and France), Catalan (Spain), Gaelic (both Irish and Scottish "versions") (UK), Welsh (UK).

    Switzerland is a case for itself: the official languages there include German (Swiss German dialect), Italian, French and one or two more languages spoken by very small minorities (can't remember the names right now).

    In Belgium they speak French in the South and Flemish (essentially Dutch - don't flame me Flemish readers) in the North.

    The list is by no means complete.

    Correction, in Belgium flemish (dutch dialect) is spoken in the part called "Vlaanderen" which is in the western part, and french is spoken in "Wallonie" which is the eastern part of the country.

  • GrouchyAdmin (unregistered)

    Given the fact that this is obviously not running on a UNIX system, any snarky 'My god, the mod_rewrite should be used' falls flat, but still, I'm pretty damn sure that you can have almost any webserver support language plugins natively. Apache's done it since 1.3, and although I'm having issues using the proper terminology to find how to do it in IIS, this could be done in ASP quite trivially.

    Failing that, I'd just use select, but only if I had to. It's not like this is going to be too much more expensive.

  • Sebastian Ramadan (unregistered)

    That must be C#, and it appears to have a bug: It doesn't check the length. This could manifest as exceptions, or as allowing "languages" that aren't in the list (eg. http://www.company.tld/dangerous/path is valid because it starts with "dan").

    I find it difficult to believe that this part of their app was a significant bottleneck, but presumably, their initial profiling of this code indicated that strings were being compared too much here, so they eliminated the string comparisons in favour of byte comparisons. Regardless, the worst case here is verifying "spa": this code must iterate over all eleven languages. By sorting the array and using a binary search (which C# has the built-in mechanisms to do), not only does the code look nicer, but it would also reduce that worst case from eleven comparisons to four. This is probably a better optimisation than their presumed string match replacement.

    public static readonly string[] language_folder = new string[]
    {
        /* NB: These must remain sorted (ordinal order) */
        "dan",
        "deu",
        "dut",
        "eng",
        "fin",
        "fra",
        "jpn",
        "kor",
        "nor",
        "spa",
        "swe"
    };
    
    public static string IsoFromUrl(string url) {
        // Accept "/xxx" or "/xxx/..."; anything else is rejected before searching.
        int index = url.Length < 4 || (url.Length > 4 && url[4] != '/')
            ? -1
            : Array.BinarySearch(language_folder, url.Substring(1, 3), StringComparer.Ordinal);
        return index < 0 ? string.Empty : language_folder[index];
    }
    

    Four byte comparisons fit nicely into a single integer comparison, so if absolutely necessary (however unlikely that is), one could switch the bsearch on an array of strings for a bsearch on an array of UInt32s, at the cost of code readability.
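
    To make that last idea concrete, here is a hedged sketch of the integer variant, assuming ASCII-only folder names. `PackedFolders`, `Pack`, `Names` and `Keys` are names invented for the example. Three characters are packed big-endian into one `uint`, so the ordinal order of the strings and the numeric order of the keys agree and the same binary search applies:

    ```csharp
    using System;

    public static class PackedFolders
    {
        // Must stay sorted; the packed keys below inherit this order.
        private static readonly string[] Names =
            { "dan", "deu", "dut", "eng", "fin", "fra", "jpn", "kor", "nor", "spa", "swe" };

        private static readonly uint[] Keys =
            Array.ConvertAll(Names, n => Pack(n[0], n[1], n[2]));

        // Packs three characters into the low 24 bits, first character highest.
        public static uint Pack(char a, char b, char c)
        {
            return ((uint)a << 16) | ((uint)b << 8) | c;
        }

        public static string IsoFromUrl(string url)
        {
            if (url == null || url.Length < 4 || url[0] != '/') return string.Empty;
            if (url.Length > 4 && url[4] != '/') return string.Empty;
            // Non-ASCII characters would spill into neighbouring bytes and could
            // collide with a valid key, so reject them before packing.
            if (url[1] > 0x7F || url[2] > 0x7F || url[3] > 0x7F) return string.Empty;

            int index = Array.BinarySearch(Keys, Pack(url[1], url[2], url[3]));
            return index < 0 ? string.Empty : Names[index];
        }
    }
    ```

    The ASCII guard is the price of the packing trick, and it is exactly the kind of detail that makes this version harder to read than the string one.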

    CAPTCHA: damnum. On the topic of premature optimisation: You're damnumed if you do, yet damnumed if you don't...

Leave a comment on “Laying the Foundation for i18n, Brick by Brick”
