• (cs)

    Damn and in my app I really needed to get Jesus's birthday verified in a reg -- oh well back to the drawing board

  • (cs)

    There ain't nothing regular about THAT

  • (cs)

    Well, 0001 would have to be allowed in order to squeeze a date in year 1 (AD or CE, your choice) into a system that assumes the current century if no century value exists (I've actually had to code that sort of thing for historical time-line entries). 0000 does not represent a year, since the year before 0001 is 1 BC (with optional E). That being said --- oy vey!!!!

    Even if the system had no native date support (is there one?), there are easier (and more maintainable) ways of validating a formatted date than RegEx. Okay -- test for and fail on "not-digit, not-virgule" (deliberately not in character group format), or strip (replace with nuthin') before continuing if desired -- but that should be about the end of the game. As powerful as RegEx is, it is also nearly unreadable at the best of times when taken in quantity. Any code monkey coming behind can read (or learn how to read) something short like the "not-digit, not-virgule" example, and can adjust allowable dates using alternative methods (the Boss doesn't like April 7 -- ever). What happens to the RegEx when the boss doesn't like April 7?
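
    A minimal sketch of that approach in Python, assuming a month/day/year layout with slashes. The names `is_acceptable_date` and `BLOCKED_DAYS` are made up for illustration; the point is only that the "boss doesn't like April 7" rule becomes a one-line change rather than a regex rewrite.

```python
from datetime import datetime

# Hypothetical blocklist of (month, day) pairs the boss has vetoed.
BLOCKED_DAYS = {(4, 7)}

def is_acceptable_date(text, fmt="%m/%d/%Y"):
    """Return True if text is a real calendar date that is not blocklisted."""
    # Fail fast on anything that is "not-digit, not-virgule".
    if any(ch not in "0123456789/" for ch in text):
        return False
    try:
        # strptime rejects impossible dates such as 02/29/2005.
        parsed = datetime.strptime(text, fmt)
    except ValueError:
        return False
    return (parsed.month, parsed.day) not in BLOCKED_DAYS
```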

  • (unregistered)

    This is my all time favourite regexp:

    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

  • (cs)
    I can see it now:

    "Johnson!  Great job on that date validation code!  Works great.  We need a minor change, though -- it should only allow weekdays, no weekends.  Shouldn't be too hard with your programming expertise, huh?  Have it ready by tomorrow."
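
    For comparison, once the input has been parsed into a real date, the "weekdays only" change is tiny. A sketch in Python, where the helper name `is_weekday` is an assumption for illustration:

```python
from datetime import date

def is_weekday(d):
    """Return True for Monday through Friday."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return d.weekday() < 5
```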
  • (cs) in reply to

    Yes, indeed -- that parrot is deceased [:|] Thanks for the link. WOW!

  • (unregistered) in reply to
    :
    This is my all time favourite regexp:

    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html



    Blink Blink,,  Waaaaaaaaaaa[:'(]
  • (cs)

    Obviously, the built in date validation routines should have been used; it's much easier to tell exactly what he's trying to do that way. If there weren't validation routines [and there were], the more verbose approach with if statements is a million times easier to comprehend, test, and change.

    That said, I did read through the RegEx, and, on a first pass, it looks like it will do what I'm guessing was intended. As Matt says, even granting the context of using a RegEx for this, the mixture of non-capturing and capturing groupings when none are backreferenced [and none look reasonable to be backreferenced] is at least somewhat of a WTF. The weird year construction seems to be due to the fact that I don't know of a way to specify a multiple character string to not match in the middle of a RegEx that you are trying to match; he couldn't say "match any four digits in a row, except 0000." Of course, if year != "0000" is very easy and should be a clue to approach the problem in a different way.
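
    For what it's worth, engines that support negative lookahead can express "any four digits except 0000" directly. A sketch using Python's `re` module next to the plain comparison; whether the engine behind the original form offered lookahead is not something the thread establishes, and both function names are invented here.

```python
import re

# (?!0000) is a negative lookahead: it fails the match if the next four
# characters are "0000", without consuming any input itself.
YEAR_NOT_ZERO = re.compile(r"(?!0000)\d{4}", re.ASCII)

def year_ok_regex(year_text):
    """Regex version: exactly four digits, but not 0000."""
    return YEAR_NOT_ZERO.fullmatch(year_text) is not None

def year_ok_plain(year_text):
    """Plain-code version of the same rule."""
    return (
        len(year_text) == 4
        and year_text.isascii()
        and year_text.isdigit()
        and year_text != "0000"
    )
```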

  • (unregistered)
    ...which ends up allowing any four digits EXCEPT 0000. But 0001 is valid.
    That would be a good thing, 0000 isn't a valid date anyway.  Nor is 0, or 00, or 000.  There is 1 BC, then 1 AD.  There is no year 0.
  • (unregistered) in reply to
    :
    ...which ends up allowing any four digits EXCEPT 0000. But 0001 is valid.
    That would be a good thing, 0000 isn't a valid date anyway.  Nor is 0, or 00, or 000.  There is 1 BC, then 1 AD.  There is no year 0.


    The point was, this is all for a company Intranet site, and we certainly haven't been in business long enough to worry about allowing any numbers into a "request date" form before this millennium, let alone two thousand years ago.
  • (unregistered) in reply to Jeremy Morton

    The built in validation routines are tightly nestled into system.web so you would not want to use them in a desktop app

  • (unregistered) in reply to
    The built in validation routines are tightly nestled into system.web so you would not want to use them in a desktop app

    How odd - why not?  Breathes there a desktop so finely tuned that its user has deleted all the system libraries that aren't relevant to the precise setup?  Breathes there a language programmer who hasn't read up on smart linking and dead code elimination?  Surely including a few routines from one library wouldn't also include ten million others unless they were directly referenced?

    (I ask from a position of unaccustomed ignorance here, since the languages I use are smarter than this, but I realise it's possible they're not normal.)

    Incidentally, obWTF: {^(\d+)[-./](\d+)[-./](\d+)$} should do the trick; anything further inside a regex is a sign of a diseased mind.  Validate data in code, not in your regexes; that, or wait for Perl 6, which looks like it finally fixes regexes permanently.
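
    A sketch of "validate data in code, not in your regexes" in Python: the loose pattern above only splits the input into three numbers, and the calendar rules are left to the `date` constructor. The month/day/year field order and the name `parse_loose_date` are assumptions, since the thread does not say which order the form used.

```python
import re
from datetime import date

# The loose pattern from the comment: three runs of digits and two separators.
LOOSE_DATE = re.compile(r"^(\d+)[-./](\d+)[-./](\d+)$")

def parse_loose_date(text):
    """Return a date, or None if text is not a valid month/day/year date."""
    match = LOOSE_DATE.fullmatch(text)
    if match is None:
        return None
    try:
        month, day, year = (int(group) for group in match.groups())
        # date() itself rejects month 13, February 30, year 0, and so on.
        return date(year, month, day)
    except (ValueError, OverflowError):
        return None
```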
  • (unregistered) in reply to

    (Sigh... this forum software is really broken, you know that?  Anyone else seeing my last comment in Flyspeck 3pt?)

  • (unregistered) in reply to
    :
    (Sigh... this forum software is really broken, you know that?  Anyone else seeing my last comment in Flyspeck 3pt?)
    Yep.
  • (unregistered) in reply to

    :
    (Sigh... this forum software is really broken, you know that?  Anyone else seeing my last comment in Flyspeck 3pt?)

     

    another WTF   

  • (cs) in reply to

    The built in validation routines are tightly nestled into system.web so you would not want to use them in a desktop app

     

    There are other options, obviously. In VB.NET, IsDate would be an obvious choice (and C# has similar functionality).               

  • (cs) in reply to

    :
    (Sigh... this forum software is really broken, you know that?  Anyone else seeing my last comment in Flyspeck 3pt?)

     

    Evidently you have not learned what the word 'preview' means.

  • (unregistered) in reply to
    :
    This is my all time favourite regexp:

    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html



    For some reason, this one reminds me of BrainFuck. [:S]
  • (unregistered)
    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

    Holy CRAP. At any point did it ever occur to that guy that maybe, just maybe, a regex wasn't a good solution to that particular problem?
  • (cs) in reply to

    It's fast! You can't argue with fast! (It's a good argument that the grammar of the address field is too complex. It's a good argument for committal of the regex creator, too.)

    One of the biggest changes in Perl 6 will be turning regexes from an increasingly overburdened and complicated grammar into something more context-based. Sure, it won't all fit into 400 characters of line noise, but it also won't have to be recreated every time it needs to change because even the author needs an hour to puzzle it out later.

    btw, tinyanon, you should use \d{1,2}[./-]\d{1,2}[./-]\d{2,4} etc. Unless months with hundreds of days are now in vogue. ^_~

  • (cs) in reply to foxyshadis

    99% of the time you need to parse the date anyway, so just stuff it into DateTime.Parse and see if it complains ;) Faster than validating, then parsing XD

  • (unregistered)
    It's _fast_! You can't argue with fast!

    Actually I can argue with fast. Who cares how fast it is when nobody other than the author can understand it? From my point of view its effective performance is ZERO.

    http://www.codinghorror.com/blog/archives/000185.html

    code that makes sense is code which can be analyzed and maintained, and that makes it performant.
  • (cs) in reply to mjwills
    Evidently you have not learned what the word 'preview' means.
    Fug smucker, aren't you?

    Yep - first time I've ever posted without previewing.  Goes to show.

    Seems to be a bug relating to indented text.  Lemme check...  Nope.  Maybe it was the italics at the start... Nope.  Must be related to selecting a bit of text from a parent message and pasting it in.  I could probably debug that...  Naah.


  • (cs) in reply to bat

    Word, bat.

    I still haven't figured it out either... But in all (or at least most of that I can recall) the cases where it happened to me, I didn't copy, paste, or quote anything that I can recall.   Just clicked "Reply", typed in some text, and clicked "Post".  I see no need to have to preview "trivial" postings just to make sure the forum software (or more specifically, the edit control) didn't screw me in the process...

  • (cs) in reply to Blue

    Actually, after thinking about it more, I did in fact cut and paste a few times without remembering to paste into notepad and recopy.  That must be related...

    Ok, so I'm a doofus who should preview his posts.  Fair enuff.

  • (unregistered) in reply to Blue

    Don't worry, the RSS gateway is a bit munted too -- not only can't you see any of the comments, but every few days I mysteriously get a duplicate copy of the last 11 posts, for no readily apparent reason.

    (And I'm fairly sure it's not my reader, because I'm subscribed to a number of feeds and this is the only one it happens to.)  Sigh.

  • (cs) in reply to
    :
    Don't worry, the RSS gateway is a bit munted too -- not only can't you see any of the comments, but every few days I mysteriously get a duplicate copy of the last 11 posts, for no readily apparent reason.

    (And I'm fairly sure it's not my reader, because I'm subscribed to a number of feeds and this is the only one it happens to.)  Sigh.

    I get the duplicates too - Thunderbird.
  • (unregistered) in reply to foxyshadis
    \d{1,2}[./-]\d{1,2}[./-]\d{2,4}

    Better yet: \d{1,2}([./ -])\d{1,2}\1\d{2}(?:\d{2})?

    Which forces the date separator to be the same either side of the month, and only allows 2 or 4 digit years. Of course, neither solution restricts months or days to be valid.
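
    A sketch of the same-separator idea in Python. Here the hyphen sits last in the character class so it is read literally rather than as a range, and the optional second pair of year digits is grouped so the year is exactly two or four digits. The name `looks_like_date` is made up, and as noted this checks shape only, not calendar validity.

```python
import re

# ([./ -]) captures the first separator; \1 demands the same one again.
DATE_SAME_SEP = re.compile(r"\d{1,2}([./ -])\d{1,2}\1\d{2}(?:\d{2})?")

def looks_like_date(text):
    """Shape check only: does not verify that the month or day exists."""
    return DATE_SAME_SEP.fullmatch(text) is not None
```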

  • (unregistered) in reply to
    :
    \d{1,2}[./-]\d{1,2}[./-]\d{2,4}


    Better yet: \d{1,2}([./ -])\d{1,2}\1\d{2}(?:\d{2})?

    Which forces the date separator to be the same either side of the month, and only allows 2 or 4 digit years. Of course, neither solution restricts months or days to be valid.


    And of course, neither of those actually validates an international ISO standard date.
  • (unregistered) in reply to
    :
    :
    \d{1,2}[./-]\d{1,2}[./-]\d{2,4}


    Better yet: \d{1,2}([./ -])\d{1,2}\1\d{2}(?:\d{2})?

    Which forces the date separator to be the same either side of the month, and only allows 2 or 4 digit years. Of course, neither solution restricts months or days to be valid.


    And of course, neither of those actually validates an international ISO standard date.

    ...which is exactly what?

  • (unregistered) in reply to
    :

    Actually I can argue with fast. Who cares how fast it is when nobody other than the author can understand it? From my point of view its effective performance is ZERO.



    If you download and read the source code for module Mail::RFC822::Address you will notice that it is quite easy to read and understand, presuming that you have some understanding of regular expressions.  The big beast on that page is only for display purposes.

  • (cs) in reply to
    :
    :
    :
    \d{1,2}[./-]\d{1,2}[./-]\d{2,4}

    Better yet: \d{1,2}([./ -])\d{1,2}\1\d{2}(?:\d{2})?

    Which forces the date separator to be the same either side of the month, and only allows 2 or 4 digit years. Of course, neither solution restricts months or days to be valid.

    And of course, neither of those actually validates an international ISO standard date.



    ...which is exactly what?

    Here's a doc on the subject: http://www.cl.cam.ac.uk/~mgk25/iso-time.html
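
    For the extended calendar form YYYY-MM-DD covered in that document, a minimal Python sketch might look like the following. The function name `parse_iso_date` is invented for illustration, and the undashed basic form, week dates, and times of day are deliberately left out.

```python
import re
from datetime import date

# Extended calendar form only: four-digit year, two-digit month and day.
ISO_EXTENDED = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)

def parse_iso_date(text):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    match = ISO_EXTENDED.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None
```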

    (I'm starting to understand the gripes about the forum software. Is it /really/ necessary to use a bleeping word processor to compose these posts? Not to mention one that doesn't even work in one of the popular alternatives to that other wtf that people use to infect their computers with spyware.</rant>)

  • (unregistered)

    Now I don't know much about RegExp but does this code try and validate for 29th Feb only on leap years? If so, is it doing it properly (every 100 years it's not a leap year unless it's also a multiple of 400 see: http://www.codeproject.com/datetime/leap_year.asp) or just the 'is the year divisible by 4' rule?
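
    For reference, the full Gregorian rule fits in one line of ordinary code. A sketch in Python (the standard library's `calendar.isleap` implements the same rule):

```python
def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```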

  • (cs) in reply to foxyshadis
    foxyshadis:
    It's _fast_! You can't argue with fast! (It's a good argument that the grammar of the address field is too complex. It's a good argument for committal of the regex creator, too.)
    ...

    I can assure you, writing the check hardcoded will be a lot faster [:)] (at least if you're working in a compiled language and not an interpreted one where the regex is a native library; then it could become close, depending on the language)

    Anyway - I hate regex in code, it's a script thing, a quick hack, a commandline tool, but please not in code... It's hell to debug or extend something like that.

    For me it's the same as invoking a perl-interpreter to execute a small perl script because that particular thing is easier to write in perl, and sadly enough - I can't say I haven't seen such practices. Ok - the guy that did that was so nice to add a comment where he explained what the perl script did, but it was slight overkill to add a complete perl-installation to a windows-client program that was supposed to be "lightweight"... His argument was also "yeah but perl regex is fast"... [:@]

  • (cs)

    that crazy address regex could be simplified greatly by splitting it up into sensible parts, like:

    $mailto = qr/(?#...)/;
    $http = qr/(?#...)/;
    #...
    $address = qr/$mailto|$http|(?#...)/o;

    the /o on the end of the last one means it'll only be compiled once, so this should be no slower.

    (and yes, I know it's only supposed to be an example of why you don't want to do it by regex)
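
    The same decomposition idea, sketched in Python rather than Perl: small named fragments are assembled into one pattern that is compiled once. The fragments below are hypothetical and deliberately simplified; they are nowhere near the full RFC 822 grammar and are not the pieces of the regex on that page.

```python
import re

# Hypothetical, simplified building blocks (illustration only).
LOCAL_PART = r"[A-Za-z0-9._%+-]+"
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = rf"{LABEL}(?:\.{LABEL})+"

# Compiled once at import time and reused for every check.
ADDRESS = re.compile(rf"{LOCAL_PART}@{DOMAIN}")

def looks_like_address(text):
    """Rough shape check built from the named fragments above."""
    return ADDRESS.fullmatch(text) is not None
```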

  • (unregistered) in reply to
    :
    Now I don't know much about RegExp but does this code try and validate for 29th Feb only on leap years? If so, is it doing it properly (every 100 years it's not a leap year unless it's also a multiple of 400 see: http://www.codeproject.com/datetime/leap_year.asp) or just the 'is the year divisible by 4' rule?


    Yep. It's been a little while since I broke the regex down and tried to figure it out (and submitted it here) but I believe one of the three main groups in the expression was devoted solely to that.


  • (unregistered)

    I appreciate that this is kinda missing the point but there was no year "0000" but there was a year "0001" so that feature is kinda ok depending on whether we want dates going back that far...

  • (cs) in reply to Irrelevant
    Irrelevant:
    that crazy address regex could be simplified greatly by splitting it up into sensible parts, like:
    $mailto = qr/(?#...)/;
    $http = qr/(?#...)/;
    #...
    $address = qr/$mailto|$http|(?#...)/o;

    the /o on the end of the last one means it'll only be compiled once, so this should be no slower.

    (and yes, I know it's only supposed to be an example of why you don't want to do it by regex)

    Actually that is the coup-de-taut from Mastering Regular Expressions, so it's supposed to be an example of RegEx zen.  I believe the point (if memory serves me correctly) is that it doesn't have any NFA-style rollbacks, so it's really fast and doesn't end up consuming a lot of memory in the process of NFA-to-DFA conversion by a regex compiler.

  • (cs) in reply to logistix

    I'm sure you mean a coup d'état. Okay I have no idea where that link is gonna go...

    I'm all for using regular expressions instead of a series of 'instr' commands in VB.Net. Just don't make them too complex. It needs to be maintainable too.

    Drak

  • (cs) in reply to Drak

    The main thing I hate about using regex in code is that you have to escape so many of the metacharacters (i.e. \", or even worse, escaping the \'s in the expression so they become \\) that it becomes a nightmare to separate the actual regex expression from the mangling you had to do to get it into a string variable.

    Thank god C# allows the literal string construct (prefix with @), so it is no longer quite so bad for me.
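
    Python's raw string literals play a similar role to C#'s @-prefixed strings here. A small sketch showing that the doubled-backslash form and the raw form are the same pattern; the constant names are arbitrary.

```python
import re

# Ordinary string literal: every regex backslash has to be doubled.
ESCAPED = "\\d+\\.\\d+"
# Raw string literal: the pattern reads exactly as the regex engine sees it.
RAW = r"\d+\.\d+"

# Both literals produce the same characters, so either compiles identically.
DECIMAL = re.compile(RAW)
```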



  • (cs) in reply to Drak

    Drak:

    Actually, I think he meant coup de grâce.

  • (cs) in reply to JamesCurran
    JamesCurran:

    Drak:
    I'm sure you mean a coup d'état.

    Actually, I think he meant coup de grâce.

    I prefer a coup soleil, though... [:P]

  • (cs) in reply to JamesCurran

    I don't know about "coup de grace" either -- which is usually defined as a mercy stroke, designed to kill a (usually) badly wounded foe who would suffer unnecessarily otherwise. (A death blow given in other contexts may be wrongly termed a coup de grace in English, but it misses the whole "grace" part of the deal.) Coup d'état, a sudden, violent overthrow of the government, is definitely wrong. Coup de génie (stroke of genius) may fit, but it's hardly a common find in English; the same goes for chef d'oeuvre (masterpiece). The most probable fit for French-originated-but-common-in-English phrases would be "tour de force"; the effect it has on you may be likened to a "coup de foudre".

  • (cs) in reply to Stan Rogers

    I think he didn't know what he meant. [N]

  • (unregistered) in reply to Irrelevant
    Irrelevant:
    that crazy address regex could be simplified greatly by splitting it up into sensible parts, like:
    $mailto = qr/(?#...)/;
    $http = qr/(?#...)/;
    #...
    $address = qr/$mailto|$http|(?#...)/o;
    the /o on the end of the last one means it'll only be compiled once, so this should be no slower.

    No need for the /o.  The great thing about qr// is that it precompiles regexe(s|n)... :)
  • (cs)

    I believe the point of the forum software is to get you in the mood for a proper appreciation of the collection of wtfs.

    I didn't know ISO supported leaving the dashes and colons out. Nice, the Exslt and XPath specs never go over that, and probably don't support the full 'standard'.

    Why the hell would anyone include a perl binary/installer with a compiled project? wtf? PCRE exists for a reason, and will definitely be much faster than marshalling arguments into perl scripts, calling perl, and (sometimes) getting the results back. Just call it all from C/C++ and be happy.

  • (cs) in reply to Stan Rogers

    Stan Rogers:
    I don't know about coup de grace ... Coup d'état... Coup de génie ... chef d'oeuvre ... "tour de force"; ... "coup de foudre".

    Perhaps he just meant 'Coup':

    coup (ko͞o)
    n. pl. coups (ko͞oz)

    1. A brilliantly executed stratagem; a triumph.
    2. a. A coup d'état.
       b. A sudden appropriation of leadership or power; a takeover: a boardroom coup.
    3. Among certain Native American peoples, a feat of bravery performed in battle, especially the touching of an enemy's body without causing injury.

    Sorry, I overlooked the fact that d'etat wasn't in there for definition 1 the first time round[:S]

    Drak

  • (unregistered)

    Re
    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

    I once worked at one of the first companies to offer design-your-own egreet.  Well, it was a pay service, and the folks using it weren't too hip to tech, and we really really wanted people to get the messages (so we'd have happy customers).

    I was tasked with writing an email address validation routine that would specify exactly what was wrong with the address.

    It checked parts, it validated domain names, etc.

    It would return messages like "the domain name (after the @ sign) must not begin with a numeral."

    It took about 3 days and was about 400 lines of VBScript. 

    I'm not sure if it actually sold any more cards...

    It wasn't until much later that I realized a parser was probably already available (though doubtfully in VBScript).
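
    A much-reduced Python sketch of that kind of routine: each check returns a specific message instead of a bare yes/no. The function name and the particular rules are assumptions chosen for illustration, not a reconstruction of the VBScript, and they cover only a small slice of what real addresses allow.

```python
def explain_address_problem(address):
    """Return a human-readable problem description, or None if none is found."""
    if address.count("@") != 1:
        return "the address must contain exactly one @ sign"
    local, domain = address.split("@")
    if not local:
        return "the part before the @ sign must not be empty"
    if "." not in domain:
        return "the domain name (after the @ sign) must contain a dot"
    for label in domain.split("."):
        if not label:
            return "the domain name (after the @ sign) must not contain empty parts"
        if label.startswith("-") or label.endswith("-"):
            return "no part of the domain name may begin or end with a hyphen"
    # Leading digits in the domain are deliberately accepted.
    return None
```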

  • (cs) in reply to

    I saw the ex-parrot URL and then read the next line as being about a company that offered a "design-your-own egret".  I skimmed the rest of the comment looking for other references to birds, assuming this was some meme new to the blogosphere and before I knew it I'd be knee-deep in obscure species of winged creature.  All your geese are belong to us?

    Topic?  What topic?

  • (cs) in reply to

    :
    Re
    It would return messages like "the domain name (after the @ sign) must not begin with a numeral."

    There is, in fact, nothing wrong with a domain name starting with a numeral.

Leave a comment on “Irregular Expression”
