• Andy (unregistered) in reply to Neil
    Neil:
    Cyberzombie:
    "^\w*$"

    Simple enough.

    1. Inside a double quoted string, \ is typically an escape character. Even if there is no special meaning for \w you will still lose the \.

    2. Inside a regex, \w typically matches underscores too.
    And in JavaScript, \w only matches non-accented Latin word characters.

    http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode

    captcha: facilisi JS lacks the facilisi to match ελληνικές λέξεις
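
    A minimal sketch of the three points above, assuming plain JavaScript. The variable names are made up for illustration, and \p{L} with the u flag is a later addition to the language (ES2018) that was not available when this thread was written.

    ```javascript
    // "\w" is not a recognised string escape, so the backslash is silently dropped.
    const fromString = "^\w*$";   // really the four characters ^w*$
    const escaped = "^\\w*$";     // doubling the backslash keeps it
    const literal = /^\w*$/;      // a regex literal needs no doubling

    console.log(fromString === "^w*$");              // true
    console.log(new RegExp(fromString).test("www")); // true: it now means "only the letter w"
    console.log(new RegExp(fromString).test("abc")); // false
    console.log(new RegExp(escaped).test("abc"));    // true

    // \w also accepts digits and underscores, not only letters.
    console.log(literal.test("foo_bar"));            // true
    const asciiLetters = /^[A-Za-z]*$/;
    console.log(asciiLetters.test("foo_bar"));       // false

    // Without the u flag, \w is limited to [A-Za-z0-9_].
    console.log(literal.test("λέξεις"));             // false
    const unicodeLetters = /^\p{L}*$/u;              // ES2018 Unicode property escape
    console.log(unicodeLetters.test("ελληνικές"));   // true
    ```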

  • Pete (unregistered) in reply to chris
    chris:
    Peteris:
    The OR solution would have different results in cases where a record matches multiple search criteria - in that case, the current solution would include duplicate rows.
    I don't think it would. Assuming that SQL Server and the w3schools page I just read follow standard practice, UNION ALL concatenates the results of the queries, including duplicates, but UNION returns a unique set.

    True, my bad - looked at my old sql code, and it's UNION ALL where needed. But even then the statements are not identical - if the table contains multiple identical rows, then UNION would throw out duplicates, but OR criteria would return as many rows as there were in the original dataset.

  • Fyrilin (unregistered) in reply to Cyberzombie
    Cyberzombie :
    "^\w*$"

    Simple enough.

    Word, yo.

    captcha: ingenium - quite so

  • (cs)

    Assuming the last one is JavaScript, and it looks like it, it isn't so trivial. Can be done with a regex.

    See http://stackoverflow.com/questions/6800536/isalpha-replacement-for-javascript

  • JAPH (unregistered) in reply to HAL
    HAL:
    Return "Need to fix this"
    I'm sorry, Dave. I'm afraid I can't do that.
    That comment is so 2001.
  • Chris (unregistered)

    I'm the one who submitted that or/union sample. I can appreciate that sometimes less logical SQL can be faster, but I can assure you that that thought would never have entered this guy's head. The query completes close to instantly either way.

  • Alex (unregistered) in reply to MySQL geek

    Not only MySQL. Also in some specific cases in Sybase (which Mssql derives from). So this might "just" be premature optimization ;)

  • (cs) in reply to Brillant
    Brillant:
    TRWTF is to say that regular expressions are tidy and save processing power :-)))
    This. When you run a regular expression over a string, you are running a script interpreter. Coding out what you are actually doing will always be faster, and more readable to boot.
  • (cs) in reply to Cantabrigian
    Cantabrigian:
    "regex scares me a little, OK, alot actually"

    Aaaaargh! It's "a lot" - two words.

    No, it's the alot, which is better than you at everything.
  • F (unregistered) in reply to Mason Wheeler
    Mason Wheeler:
    Brillant:
    TRWTF is to say that regular expressions are tidy and save processing power :-)))
    This. When you run a regular expression over a string, you are running a script interpreter. Coding out what you are actually doing will always be faster, and more readable to boot.

    Except, of course, when you're using a language that compiles regexes, in which case it will most likely produce faster code. And in this particular case the regex will be more readable and certainly easier to check (are you sure the uppercase O hasn't been miskeyed as a zero?)

  • (cs) in reply to JAPH
    JAPH:
    HAL:
    Return "Need to fix this"
    I'm sorry, Dave. I'm afraid I can't do that.
    That comment is so 2001.
    Dave's not here, man.

    That comment is so 1971.

  • fa2k (unregistered)

    Need to fix this

  • Victor (unregistered)

    Has anyone considered turning option strict on for comments?

  • Herp (unregistered) in reply to Larry
    Larry:
    The problem with regexen is a lot like the problem with library functions: you can never be entirely sure what they're doing. So, in cases like this where the results are important, it is safest to write your own.

    I really like how this was chosen as a featured comment so that it would immediately piss off anybody who views the article. Hilarious!

  • Uhh (unregistered) in reply to Mason Wheeler

    I was under the impression that modern regex libs compiled the regex to a provably optimal version - so if you're really good, you may code a version that's as fast in processing and lacks some initialization overhead; but if you're not perfect, then your code will be slower.

  • (cs) in reply to MySQL geek
    MySQL geek:
    Actually, here UNION is the same as OR, but in the MySQL point of view, presents a better performance ;)

    Actually, while UNION and OR yield the same result in this example, the UNION requires a sort to remove duplicate rows (because it returns DISTINCT rows). It will not perform as well as an OR.

    (This conclusion is based on IBM host DB2, but I confirmed similar behavior for MySQL; see UNION vs UNION ALL performance.)

    As he notes in the link, UNION ALL will perform much better (because no sort is required) but it is not exactly the same as OR, since overlapping predicates could cause a row to be returned more than once.

    Addendum (2012-11-28 14:22): Oh, and a side-case for UNION (UNION DISTINCT):

    Suppose a table exists where there is no unique key, such that duplicate rows could exist (that is, multiple rows having exactly the same value in each column). On a table of that type, UNION is not the same as OR either, since OR would return all matching rows, but UNION would delete the duplicate rows.

    That's because UNION operates on the distinct data value of the whole row. So if two rows have the same values in all columns, UNION will remove the second row.

    This can't happen if the table has a unique key because no duplicate rows can exist.
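
    The three behaviours can be sketched without a database. This is only a model of the semantics in plain JavaScript, not how any engine executes the query; the table contents, the two predicates and the helper name distinct are all made up for illustration.

    ```javascript
    // A table with no unique key: rows 0 and 1 are exact duplicates,
    // and row 2 satisfies both predicates.
    const rows = [
      { a: "abc1", b: "xyz" },
      { a: "abc1", b: "xyz" },
      { a: "abc2", b: "abc3" },
      { a: "zzz", b: "zzz" },
    ];
    const p1 = (r) => r.a.startsWith("abc"); // roughly: a LIKE 'abc%'
    const p2 = (r) => r.b.startsWith("abc"); // roughly: b LIKE 'abc%'

    // WHERE p1 OR p2: each stored row is returned at most once.
    const viaOr = rows.filter((r) => p1(r) || p2(r));

    // UNION ALL: plain concatenation, so row 2 shows up twice.
    const unionAll = [...rows.filter(p1), ...rows.filter(p2)];

    // UNION: concatenation followed by whole-row de-duplication.
    const distinct = (list) =>
      [...new Map(list.map((r) => [JSON.stringify(r), r])).values()];
    const union = distinct(unionAll);

    console.log(viaOr.length);    // 3
    console.log(unionAll.length); // 4
    console.log(union.length);    // 2
    ```

    With a unique key on the table, the first two rows could not both exist, and OR and UNION would agree again.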

  • itzac (unregistered) in reply to Larry

    The problem with prepackaged CPUs is that you can never be sure how the gates are wired. So in cases like this, where the results are important, it is safest to lay out your own IC.

  • (cs)

    And as for "EntityTypeDescription", I don't think it's enterprise-y enough. He also needs "EntityTypeMetaType", "EntityTypeMetaTypeDescription", "EntityMetaEntityType", "EntityMetaEntityTypeDescription", "EntityMetaEntityTypeMetaType", and "EntityMetaEntityTypeMetaTypeDescription". Just to achieve that full enterprise flavor.

  • (cs) in reply to MySQL geek

    Also true for Oracle, provided the referenced columns are indexed...

  • Alistair (unregistered) in reply to @Deprecated
    @Deprecated:

    Did he really have to use charAt for two linear ranges? I think that if (c >= 'a' && c <= 'z') is less error-prone than "abcdefghijklmnpqrstuvwxyz".chartAt(c) != -1 Quick, spot the error!

    Gives false positives in EBCDIC.

  • (cs)

    After playing around with UNIONs and ORs in SSMS, it's interesting to note that the ORDER BY clause has a bigger performance hit on the UNION version than the OR version. At least in my tests.

  • grokpg (unregistered) in reply to recently departed colleague
    recently departed colleague:
    Hey Chris, sorry for all the injections, but please think a bit and may be you will realize that on old systems UNION worked faster than OR.

    captcha valid lol

    a) That's only a valid excuse if the code is running on one of these alleged old systems. b) "Think a bit" is bullshit - this isn't something you could magically figure out just by thinking if you didn't already know it worked like that.

    Conclusion: you're a worthless retard.

  • Mark (unregistered) in reply to Peteris

    Maybe if the results needed to be ranked by the number of criteria matches? Group the results and sort by highest count of duplicates. Then filter out the duplicates and you have a list sorted by relevancy. Probably a better way though.
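
    One way to read that idea: count how many criteria each row matches, sort by that count, and return each row once. A rough sketch in JavaScript rather than SQL; the column names, the sample data and the function name rankByMatches are hypothetical.

    ```javascript
    // Hypothetical searchable columns and sample rows.
    const columns = ["something1", "something2", "something3"];
    const rows = [
      { id: 1, something1: "abcdef", something2: "xyz", something3: "abc" },
      { id: 2, something1: "xyz", something2: "abcd", something3: "q" },
      { id: 3, something1: "abc", something2: "abc", something3: "abc" },
      { id: 4, something1: "no", something2: "no", something3: "no" },
    ];

    // Score each row by the number of columns that start with the prefix,
    // drop rows with no match, and list the best matches first.
    function rankByMatches(table, prefix) {
      return table
        .map((row) => ({
          row,
          hits: columns.filter((c) => row[c].startsWith(prefix)).length,
        }))
        .filter((entry) => entry.hits > 0)
        .sort((x, y) => y.hits - x.hits)
        .map((entry) => entry.row);
    }

    console.log(rankByMatches(rows, "abc").map((r) => r.id)); // [ 3, 1, 2 ]
    ```

    The hit count plays the role of "number of duplicates" that a UNION ALL of the three queries would produce for the same row.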

  • (cs) in reply to Larry
    Larry:
    The problem with regexen is a lot like the problem with library functions: you can never be entirely sure what they're doing. So, in cases like this where the results are important, it is safest to write your own.

    It's not limited to using regular expressions, though. I have seen a massive amount of code where the developer definitely was not entirely sure what they were doing.

  • Some guy (unregistered) in reply to MySQL geek

    Actually, here UNION is the same as OR, but in the MySQL point of view, presents a better performance ;)

    Same with IBM DB2. I once re-wrote a large query with 20 ORed predicates into 20 UNIONed queries. Massive performance gain - due to each sub-query being run in parallel.

    The bigger WTF is the use of 'SELECT *'.

  • Barf 4Eva (unregistered)

    Not sure how I feel about the UNION issue. That could very well be valid, if the OR statement is going to cause the RDBMS to generate a bad exec plan per not being able to logically choose the right indexes. In the example shown here, the UNION might make sense. However, if we had something that would filter down a majority of rows from the table such that it would read as follows:

    WHERE bestFilter = some_value and (something1 like 'abc%' or something2 like 'abc%' or something3 like 'abc%' )

    The UNION will end up being more costly, most likely. I have already filtered out most rows with an index that will certainly be used.

    Whether or not it will be a WTF, well... It just depends. :)

  • Barf 4Eva (unregistered)

    Then again, if the example here is close to the original, the querying of three different attributes for the same data seems a bit flaky. :)

    Perhaps the UNION is not the WTF, but everything leading up to it is... More information is needed?

  • Barf 4Eva (unregistered) in reply to Some guy
    Some guy:
    > Actually, here UNION is the same as OR, but in the MySQL point of view, presents a better performance ;)

    Same with IBM DB2. I once re-wrote a large query with 20 ORed predicates into 20 UNIONed queries. Massive performance gain - due to each sub-query being run in parallel.

    The bigger WTF is the use of 'SELECT *'.

    It is, but I think for ease of writing views that need to expose everything to the sql developer, it's not all that bad as long as you rebuild your views nightly to capture any schema changes that won't be picked up otherwise...

  • Norman Diamond (unregistered) in reply to @Deprecated
    @Deprecated:
    Larry:
    The problem with regexen is a lot like the problem with library functions: you can never be entirely sure what they're doing. So, in cases like this where the results are important, it is safest to write your own.
    In many cases it is safest NOT to write your own! EG., encryption. Or did I just win a "whooosh"?
    Yes, you just won a "whoosh". Larry's irony was obvious and well done.
    @Deprecated:
    Did he really have to use charAt for two linear ranges? I think that if (c >= 'a' && c <= 'z') is less error-prone than "abcdefghijklmnpqrstuvwxyz".chartAt(c) != -1 Quick, spot the error!
    Which error? Treating é as a non-alphabetic? Treating some funny EBCDIC characters as alphabetic when running on a mainframe? Treating a as non-alphabetic? Treating α as non-alphabetic (where did the "alpha" in "alphabetic" come from anyway?)? Treating A as non-alphabetic? Or misspelling charAt as chartAt?
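
    Several of those candidates can be checked directly. A small sketch in plain JavaScript; the variable names are made up, and the string is copied from the quoted snippet.

    ```javascript
    // The alphabet as typed in the quoted snippet.
    const typed = "abcdefghijklmnpqrstuvwxyz";
    console.log(typed.length);       // 25: one letter short
    console.log(typed.indexOf("o")); // -1: the missing letter is 'o'

    // chartAt is not a String method, so calling it throws.
    let threw = null;
    try {
      typed.chartAt("a");
    } catch (e) {
      threw = e;
    }
    console.log(threw instanceof TypeError); // true

    // Even spelled correctly, charAt takes an index and returns a string,
    // so comparing its result to -1 can never signal "not found".
    // indexOf is the call that returns -1.
    console.log(typed.charAt(0));    // "a"

    // The range comparison has its own blind spots.
    const inRange = (c) => c >= "a" && c <= "z";
    console.log(inRange("o")); // true
    console.log(inRange("A")); // false
    console.log(inRange("é")); // false
    console.log(inRange("α")); // false
    ```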
  • Norman Diamond (unregistered) in reply to Yank
    Yank:
    Steve The Cynic:
    Frak:
    Geoff:
    I'd rather right clean code
    and dirty English. Or did you left your brain home today?
    It's perfectly good English. To right something means to set it upright after it has fallen on one side (or even upside down). The RNLI (the UK's lifeboat-operating organisation, about the only UK charity I'd consider giving money to) has a stock of long-range shore-to-ship lifeboats that can right themselves.
    So you're one of those who likes to think the UK knows anything about good English?
    Well they know how to right it.

    The Linux makefile used to right clean code (not sure if it still does or not): make mrproper

  • 4c's A.G.N (unregistered) in reply to Cantabrigian
    Cantabrigian:
    "regex scares me a little, OK, alot actually"

    Aaaaargh! It's "a lot" - two words. You wouldn't write "afew", would you? Then again, it's probably better than writing "allot" (when meaning "a lot", not "share out"). </petpeeve>

    Oh, great, another grammar nazi. As if we needed more here, you guys are diamond dozen.
  • JustSomeGuy (unregistered) in reply to Yank
    Yank:
    Steve The Cynic:
    Frak:
    Geoff:
    I'd rather right clean code
    and dirty English. Or did you left your brain home today?
    It's perfectly good English. To right something means to set it upright after it has fallen on one side (or even upside down). The RNLI (the UK's lifeboat-operating organisation, about the only UK charity I'd consider giving money to) has a stock of long-range shore-to-ship lifeboats that can right themselves.
    So you're one of those who likes to think the UK knows anything about good English?

    Och aye the noo

    CAPTCHA: conventio : fellatio at a convention?

  • Friedrice the Great (unregistered) in reply to Frak
    Frak:
    Geoff:
    I'd rather right clean code
    and dirty English. Or did you left your brain home today?
    Yes, I left my brain at home today, that's why I'm reading the daily WTF. Makes it easier to learn kewl programming tricks.
  • jarfil (unregistered)
    var isChar = true; 

    Obvious WTF apart, I really HATE it when people state things in their code that are just not true.

    Do you know it's a Char? NO, not yet. So don't freaking set "isChar" to true! Make your checks, use whatever temporal variables you need, but for god's sake, don't name them "isChar" like it was the actual result... like it IS the actual result, and then you get on and on checking it on each iteration of the loop... WTF!

    Oh, right, you have to be able to break out of the loop somehow. Let me guess, maybe "break" would work? Maybe setting the counter out-of-bounds so the loop condition is no longer met? But oh no, you had to use "isChar", the freaking result variable, to break out of the loop.

    Give me a break.

  • (cs) in reply to MySQL geek
    Coded Smorgasbord:
    No word on who "Dave" is though.

    I read that as text directed at Dave, not written by Dave. When attributing something to myself I usually use a double dash at the end. -- Mark

    MySQL geek:
    Actually, here UNION is the same as OR, but in the MySQL point of view, presents a better performance ;)

    The smiley face makes me think you're kidding, but I can't parse any humor out of your statement, so I'm not 100% sure.

    Coded Smorgasbord:
    I don't mind using fairly simple regex in my code. It's tidy and saves processing power.

    Which processor are you talking about? A regex engine certainly doesn't save any CPU power when compared against a purpose-built parser. I'm surprised when engineers can't understand that terse code isn't necessarily fast code, and vice-versa.

    It may save some processing power in your wetware, however, which is entirely the point of using abstractions like regex in the first place.

    lanmind:
    ...likely the bigger problem is the LIKE operators.

    Probably not. The LIKE operator can theoretically use an index if the wildcard is at the end of the query string. Of course, the likely SQL injection means a user can submit "%" in their query string in order to force a full table scan.

    lanmind:
    Not on SQL Server:
    SELECT Field01 = 1
    UNION
    SELECT 2
    UNION
    SELECT 0
    UNION
    SELECT 4
    ORDER BY Field01
    

    I almost choked on a Frito when I read this. I had to try this myself to believe it.

    Interestingly, SQL Server will also sort the results even if you omit the ORDER BY clause. You can't make this stuff up.

    Peteris:
    The OR solution would have different results in cases where a record matches multiple search criteria - in that case, the current solution would include duplicate rows.

    It shouldn't. UNION is a set operator, and as such, it should deduplicate results.

    http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.commands/html/commands/commands89.htm

    Ralph:
    Writing an arcane overly complex query to make the computer's work easier is premature optimization, like trying to outsmart your compiler.

    +1

    Tom:
    It's never as simple as it looks.

    +1

    Victor:
    Has anyone considered turning option strict on for comments?

  • Paul F (unregistered) in reply to jarfil

    My preferred way of doing this is to return false within the loop and just have return true at the end of the function.

  • Mark (unregistered) in reply to jarfil
    jarfil:
    Give me a break.

    Or just return false within the loop itself and return true immediately after. No need for temporal or result variable and no extra condition to check during iteration. In the case of this function, the intent would be perfectly clear.

  • A. Nonymous (unregistered) in reply to planB
    planB:
    Nobody noticed the order by only has an effect on the last select of the unions. I guess that's not the effect the developer was going for

    Because they know SQL?

  • Kakan (unregistered) in reply to jarfil

    Actually, the variable isChar isn't needed at all. Just return false for the first char that's not in alpha. If the loop ends, return true.

    Like:

    for (var i = 0; i < sStr.length; i++) {
        var Char = sStr.charAt(i);
        if (alpha.indexOf(Char) == -1) { return false; }
    }
    return true;
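
    Wrapped in a function so it can be run on its own, the early-return version might look like the sketch below. The function name isAlpha and the contents of alpha are assumptions; the original article's definitions are not shown in this thread.

    ```javascript
    // Assumed lookup string: unaccented Latin letters only.
    var alpha = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns false at the first character that is not in alpha,
    // so no flag variable and no extra loop condition are needed.
    function isAlpha(sStr) {
      for (var i = 0; i < sStr.length; i++) {
        if (alpha.indexOf(sStr.charAt(i)) === -1) {
          return false;
        }
      }
      return true;
    }

    console.log(isAlpha("Hello"));  // true
    console.log(isAlpha("Hello1")); // false
    console.log(isAlpha(""));       // true: an empty string has no offending character
    ```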

  • (cs) in reply to Yank
    Yank:
    So you're one of those who likes to think the UK knows anything about good English?
    Well, I wouldn't want to speak for the Welsh, the Scots, nor the Northern Irish, because I'm English, and they're not. I think England might know something about English. You know, like being where the language actually comes from...

    (Cue Spike Milligan:

    The English, The English, The English are best. I wouldn't give tuppence for all the rest.
    )

    ("tuppence": Two pence, that is £0.02, which is worth more than your feeble $0.02, but still not worth all that much.)

  • (cs) in reply to jarfil
    jarfil:
    var isChar = true; 
    Obvious WTF apart, I really HATE it when people state things in their code that are just not true.

    Do you know it's a Char? NO, not yet. So don't freaking set "isChar" to true! Make your checks, use whatever temporal variables you need, but for god's sake, don't name them "isChar" like it was the actual result... like it IS the actual result, and then you get on and on checking it on each iteration of the loop... WTF!

    Oh, right, you have to be able to break out of the loop somehow. Let me guess, maybe "break" would work? Maybe setting the counter out-of-bounds so the loop condition is no longer met? But oh no, you had to use "isChar", the freaking result variable, to break out of the loop.

    Give me a break.

    yeah, you should initialise it to FileNotFound

  • chris (unregistered) in reply to Yank
    Yank:
    So you're one of those who likes to think the UK knows anything about good English?
    Sure we do. It's just that we also know about French and ancient Greek and we like our spelling to reflect our broad cultural background.;-)
  • (cs) in reply to Steve The Cynic
    Steve The Cynic:
    Yank:
    So you're one of those who likes to think the UK knows anything about good English?
    Well, I wouldn't want to speak for the Welsh, the Scots, nor the Northern Irish, because I'm English, and they're not. I think England might know something about English. You know, like being where the language actually comes from...

    (Cue Spike Milligan:

    The English, The English, The English are best. I wouldn't give tuppence for all the rest.
    )

    ("tuppence": Two pence, that is £0.02, which is worth more than your feeble $0.02, but still not worth all that much.)

    If Spike Milligan made this quote before 1971 then tuppence was 2d not 2p and was worth 1/120th of a pound as there were 240 pence in the old pound.

    Therefore for 2d to be worth more than 2 US cents, the exchange rate GBPUSD would have to have been more than 2.4 (2.4 US dollars to the pound sterling). I'm not sure it ever got that high, even before 1971.

    Of course, back in those days, 2d was worth something, more than 2p is worth now.

  • pencilcase (unregistered) in reply to Steve The Cynic
    Steve The Cynic:
    [...(Cue Spike Milligan:
    The English, The English, The English are best. I wouldn't give tuppence for all the rest.
    )

    ("tuppence": Two pence, that is £0.02, which is worth more than your feeble $0.02, but still not worth all that much.)

    Actually, that song was by Flanders and Swann, not Milligan. And it was written firmly tongue-in-cheek, not as an actual Yank-bashing exercise. And, since it was written in the 1950s, tuppence was actually worth £0.008333 (12 pennies in a shilling, 20 shillings in a pound). 3/10 Could Do Better.

  • Nick (unregistered) in reply to Steve The Cynic
    (Cue Spike Milligan:
    The English, The English, The English are best. I wouldn't give tuppence for all the rest.
    )

    I thought that was part of a song by Flanders and Swann. Basic googling didn't show any connection to Spike Milligan.

  • beano (unregistered) in reply to Cbuttius
    Cbuttius:
    Steve The Cynic:
    Yank:
    So you're one of those who likes to think the UK knows anything about good English?
    Well, I wouldn't want to speak for the Welsh, the Scots, nor the Northern Irish, because I'm English, and they're not. I think England might know something about English. You know, like being where the language actually comes from...

    (Cue Spike Milligan:

    The English, The English, The English are best. I wouldn't give tuppence for all the rest.
    )

    ("tuppence": Two pence, that is £0.02, which is worth more than your feeble $0.02, but still not worth all that much.)

    If Spike Milligan made this quote before 1971 then tuppence was 2d not 2p and was worth 1/120th of a pound as there were 240 pence in the old pound.

    Therefore for 2d to be worth more than 2 US cents, the exchange rate GBPUSD would have to have been more than 2.4 (2.4 US dollars to the pound sterling). I'm not sure it ever got that high, even before 1971.

    Of course, back in those days, 2d was worth something, more than 2p is worth now.

    Before 1970 the pound sterling regularly traded at above $2.40 to the pound.

  • (cs) in reply to pencilcase
    pencilcase:
    Actually, that song was by Flanders and Swann, not Milligan. And it was written firmly tongue-in-cheek, not as an actual Yank-bashing exercise. And, since it was written in the 1950s, tuppence was actually worth £0.008333 (12 pennies in a shilling, 20 shillings in a pound). 3/10 Could Do Better.
    Bah. Shows the superiority of basic googling versus long-term (i.e. decades since I last heard it) human memory.

    And yes, I know about pounds, shillings, and pence. I was born before the old system was abandoned. As opposed to my youngest colleagues, who weren't even born when the UK stopped putting "NEW PENCE" on coins, in favour of the simpler "PENCE".

  • pencilcase (unregistered) in reply to Steve The Cynic
    Steve The Cynic:
    pencilcase:
    Actually, that song was by Flanders and Swann, not Milligan. And it was written firmly tongue-in-cheek, not as an actual Yank-bashing exercise. And, since it was written in the 1950s, tuppence was actually worth £0.008333 (12 pennies in a shilling, 20 shillings in a pound). 3/10 Could Do Better.
    Bah. Shows the superiority of basic googling versus long-term (i.e. decades since I last heard it) human memory.

    And yes, I know about pounds, shillings, and pence. I was born before the old system was abandoned. As opposed to my youngest colleagues, who weren't even born when the UK stopped putting "NEW PENCE" on coins, in favour of the simpler "PENCE".

    No googling required - I am in my 50th year, and remember "New Pence", or "New Pee" and all this oldie stuff. Hence why I put "pennies", not "pence". But I couldn't see the English song being attributed to SM without reacting. Mind you, if your memory's half as bad as mine is, I'm not surprised at you being a little off the mark. Now, where did I leave my specs...?

  • faoileag (unregistered) in reply to jarfil
    jarfil:
    I really HATE it when people state things in their code that are just not true
    Are they not? I'd say the approach is: "The character at position i is a char until proven otherwise".
    jarfil:
    Oh, right, you have to be able to break out of the loop somehow. Let me guess, maybe "break" would work?
    It would work nicely, but a "break" is a "goto" and "gotos" are not allowed in a lot of coding rules.
    jarfil:
    Maybe setting the counter out-of-bounds so the loop condition is no longer met?
    Violates the single-responsibility-principle - the job of "i" is to express the position in the string and not whether the last tested character was not a char.
    jarfil:
    But oh no, you had to use "isChar", the freaking result variable, to break out of the loop
    Yes, because in quite a few circumstances this is the cleanest way to write that test!

    What I really HATE is people ranting about how the style of otherwise bug-free code is wrong.

  • Bart Fargo (unregistered) in reply to Cantabrigian
    Cantabrigian:
    "regex scares me a little, OK, alot actually"

    Aaaaargh! It's "a lot" - two words. You wouldn't write "afew", would you? Then again, it's probably better than writing "allot" (when meaning "a lot", not "share out"). </petpeeve>

    Frankly, my dear, I don't give adamn.

Leave a comment on “The New TODO and More”
