Comment On Breaking Broken

When Mickey's colleague was tasked with changing <br>s into newlines, he wanted to cover all the bases. Since <br />, <Br />, <bR />, etc. are all valid HTML, he clearly had his work cut out for him.

Re: Breaking Broken

2007-10-31 09:17 • by Tukaro
You know, I'm fairly certain that Regular Expressions could cook breakfast for me if I could figure out the right sequence.

Re: Breaking Broken

2007-10-31 09:18 • by parser (unregistered)
considering that <br /> is just as legal.. oh well.

Re: Breaking Broken

2007-10-31 09:18 • by parser (unregistered)
159340 in reply to 159339
parser:
considering that <br /> is just as legal.. oh well.

that is, <br [any number of spaces] />

Re: Breaking Broken

2007-10-31 09:19 • by parser (unregistered)
159341 in reply to 159339
parser:
considering that <br /> is just as legal.. oh well.

..that is, <br and /> separated by any number of spaces.

Re: Breaking Broken

2007-10-31 09:21 • by Jon B (unregistered)
He's lucky the tag isn't <line break goes here>

Re: Breaking Broken

2007-10-31 09:23 • by cBradley
As awesome as regular expressions are, they aren't taught in most Comp Sci programs. This looks more like a task one would assign to a junior developer, and provide some guidance to them, or a suggestion of what to use. Now, if you were to tell me this was done by a senior engineer, or provided some history of grandiose accomplishments from the perpetrator of this submission, perhaps I would be more awestruck by its "wtf"-osity. As it stands, it just appears to be an individual unaware of one of the many tools available to a developer.

Re: Breaking Broken

2007-10-31 09:28 • by RoBorg
<br style="clear:both;" />

Oops...

Re: Breaking Broken

2007-10-31 09:28 • by bstorer
159345 in reply to 159343
cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs. This looks more like a task one would assign to a junior developer, and provide some guidance to them, or a suggestion of what to use. Now, if you were to tell me this was done by a senior engineer, or provided some history of grandiose accomplishments from the perpetrator of this submission, perhaps I would be more awestruck by its "wtf"-osity. As it stands, it just appears to be an individual unaware of one of the many tools available to a developer.

What CS program did you have that didn't include regex? We had to learn the language theory and design our own regex engine.

Re: Breaking Broken

2007-10-31 09:29 • by Fuji (unregistered)
159346 in reply to 159338
Tukaro:
You know, I'm fairly certain that Regular Expressions could cook breakfast for me if I could figure out the right sequence.


It could, but the eggs would always look like they are scrambled.

Re: Breaking Broken

2007-10-31 09:29 • by dkf (unregistered)
159347 in reply to 159343
cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.
You mean there's a lot of places affiliated with WTF-U's programme? What are they teaching instead, underwater basket-weaving?

Re: Breaking Broken

2007-10-31 09:29 • by tomanyregex (unregistered)
The ifs would actually perform faster and use less memory than the regex. And the if approach could have been simplified to evaluate one character at a time. But that doesn't matter. Sure, regex makes you look really smart, but that doesn't mean it's better.

Re: Breaking Broken

2007-10-31 09:30 • by bstorer
strTagLess = Replace(strTagLess, "<bR>", vbCrLf)


If anybody used <bR> around me, I'd shoot 'em.

Re: Breaking Broken

2007-10-31 09:33 • by bstorer
159350 in reply to 159348
tomanyregex:
The ifs would actually perform faster and use less memory than the regex. And the if approach could have been simplified to evaluate one character at a time. But that doesn't matter. Sure, regex makes you look really smart, but that doesn't mean it's better.


Three Contains and twelve Replace faster than a very simple regex? If that's true, then the designer of VB's regex engine has some explaining to do.

Re: Breaking Broken

2007-10-31 09:41 • by sweavo (unregistered)
159351 in reply to 159350
bstorer:
tomanyregex:
The ifs would actually perform faster and use less memory than the regex. And the if approach could have been simplified to evaluate one character at a time. But that doesn't matter. Sure, regex makes you look really smart, but that doesn't mean it's better.


Three Contains and twelve Replace faster than a very simple regex? If that's true, then the designer of VB's regex engine has some explaining to do.


How about "well, first you have to compile the regexp"

Re: Breaking Broken

2007-10-31 09:41 • by CATS (unregistered)
You have no chance to survive make your time.
Ha ha ha ha.

Re: Breaking Broken

2007-10-31 09:43 • by Pyro (unregistered)
why bother fixing? using VB is the real WTF anyway :)

Re: Breaking Broken

2007-10-31 09:51 • by The peoples hypocrite (unregistered)
Of course, this is the shorter sequence of the much longer set of functions, which included replacing <input type="text" /> and <textarea />
or
<Input type="text" /> and <Textarea />
or
<INput type="text" /> and <TExtarea />
and so on...

Re: Breaking Broken

2007-10-31 09:53 • by xtremezone
Though technically the browser probably wouldn't care what came between <br and > so the regular expression should probably account for any number of anything, assuming there is at least one whitespace character between <br and >. :-/

<br>
<br />
<br it's cold>
<br
eak
here />
<Br avo team, this is Charlie. Paint the target/>

Re: Breaking Broken

2007-10-31 09:54 • by Christian (unregistered)
159356 in reply to 159343
cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.


They're almost certainly taught in 99.999% of courses on compilers. Aren't compiler courses still taught in a majority of CS programs?

Re: Breaking Broken

2007-10-31 09:55 • by bstorer
159357 in reply to 159351
sweavo:
bstorer:
tomanyregex:
The ifs would actually perform faster and use less memory than the regex. And the if approach could have been simplified to evaluate one character at a time. But that doesn't matter. Sure, regex makes you look really smart, but that doesn't mean it's better.


Three Contains and twelve Replace faster than a very simple regex? If that's true, then the designer of VB's regex engine has some explaining to do.


How about "well, first you have to compile the regexp"


How about "precompile the regex for superior asymptotic performance?" Or how about "never mind, .NET appears to cache the regex anyway, so you'll still get better asymptotic performance?"

Re: Breaking Broken

2007-10-31 09:59 • by Sam (unregistered)
159358 in reply to 159343
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.


If a college educational curriculum doesn't cover regular expressions and other finite state machines, it's not a CS curriculum.

Re: Breaking Broken

2007-10-31 10:01 • by XIU
159359 in reply to 159355
xtremezone:
Though technically the browser probably wouldn't care what came between <br and > so the regular expression should probably account for any number of anything, assuming there is at least one whitespace character between <br and >. :-/

<br>
<br />
<br it's cold>
<br
eak
here />
<Br avo team, this is Charlie. Paint the target/>


Well I think "<br\s*/?>" would probably do it on most sites.
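
A minimal sketch of XIU's pattern, for illustration only. The thread's code is VB, so Python's `re` module is a swap-in here, and the names `BR_TAG` and `br_to_newline` are made up. The `IGNORECASE` flag is an added assumption to cover the <Br>/<bR> variants, and `"\r\n"` stands in for `vbCrLf`. The pattern deliberately does not match tags that carry attributes, such as RoBorg's <br style="clear:both;" />.

```python
import re

# XIU's pattern as posted; IGNORECASE is an added assumption so that
# <BR>, <Br />, <bR/> and friends are matched as well.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newline(html: str, newline: str = "\r\n") -> str:
    """Replace attribute-free <br> tags with `newline`, leaving other text as-is."""
    # A function replacement avoids backslash processing of `newline`.
    return BR_TAG.sub(lambda _match: newline, html)


if __name__ == "__main__":
    sample = "one<br>two<BR />three<bR/>four"
    print(repr(br_to_newline(sample)))  # 'one\r\ntwo\r\nthree\r\nfour'
```

Because only the matched tag is substituted, the case of the surrounding text is preserved.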

Re: Breaking Broken

2007-10-31 10:04 • by Pez (unregistered)
159360 in reply to 159349
bstorer:
strTagLess = Replace(strTagLess, "<bR>", vbCrLf)


If anybody used <bR> around me, I'd shoot 'em.


If anyone used anything other than <br /> around me, I'd shoot them...

Captcha: atari - Old Skool!

Re: Breaking Broken

2007-10-31 10:06 • by Sam (unregistered)
159361 in reply to 159351
tomanyregex:
The ifs would actually perform faster and use less memory than the regex.

sweavo:
How about "well, first you have to compile the regexp"

How about getting the code right first and optimizing later? The regex is more flexible, more readable, and less prone to failure than the original code.

In addition, until you profile the code, how do you know that the regex solution is slower? First, .NET should only compile the regex once and cache the compiled regex. And, given the simplicity of the regex, it still may be faster to compile and run the regex rather than the 3 contains and 12 replace statements, especially if the string is of a very large size. The regex only needs to scan the string once, versus the 15 times in the initial code.
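
One way to act on the "profile first" advice, sketched in Python rather than the thread's VB. The list of twelve literal tags is a hypothetical reconstruction (four case variants times three spellings), not the article's actual code, and `chained_replace` / `regex_replace` are made-up names. The sketch first checks that both approaches give the same result, then times them; it makes no claim about which one wins, since that depends on the runtime and the input.

```python
import re
import timeit
from itertools import product

# Hypothetical reconstruction: br, bR, Br, BR x "<..>", "<../>", "<.. />".
CASES = ["".join(pair) for pair in product("bB", "rR")]
LITERALS = [f"<{name}{tail}>" for name in CASES for tail in ("", "/", " /")]

# One case-insensitive pattern covering exactly the same twelve spellings.
BR_TAG = re.compile(r"<br(?: ?/)?>", re.IGNORECASE)


def chained_replace(text: str, newline: str = "\r\n") -> str:
    """Twelve literal replacements, each one a full pass over the text."""
    for tag in LITERALS:
        text = text.replace(tag, newline)
    return text


def regex_replace(text: str, newline: str = "\r\n") -> str:
    """A single pass with a precompiled regex."""
    return BR_TAG.sub(lambda _match: newline, text)


if __name__ == "__main__":
    doc = "Some text<br>more<BR />text<bR/>end. " * 10_000
    # Correctness first, speed second.
    assert chained_replace(doc) == regex_replace(doc)
    for fn in (chained_replace, regex_replace):
        seconds = timeit.timeit(lambda: fn(doc), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```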

Re: Breaking Broken

2007-10-31 10:10 • by brazzy
159362 in reply to 159351
sweavo:
bstorer:

Three Contains and twelve Replace faster than a very simple regex? If that's true, then the designer of VB's regex engine has some explaining to do.


How about "well, first you have to compile the regexp"

On a text of non-negligible length, 15 operations that have to look through the entire text would give the competing regexp engine ample time to do some compilation.

Re: Breaking Broken

2007-10-31 10:10 • by linepro (unregistered)
All together now:

"The real wtf is that Mickey is using VB"

Re: Breaking Broken

2007-10-31 10:12 • by Zecc
159364 in reply to 159350
bstorer:
tomanyregex:
The ifs would actually perform faster and use less memory than the regex. And the if approach could have been simplified to evaluate one character at a time. But that doesn't matter. Sure, regex makes you look really smart, but that doesn't mean it's better.


Three Contains and twelve Replace faster than a very simple regex? If that's true, then the designer of VB's regex engine has some explaining to do.

No, no, no. XSLT is *much* faster.

Re: Breaking Broken

2007-10-31 10:13 • by HeavyWave (unregistered)
Am I the only one who would just use toLower() ? O_o

Re: Breaking Broken

2007-10-31 10:15 • by nEUrOO (unregistered)
Nop, not the only one for the toLower :X
What the heck, a regular expression for that!!

Re: Breaking Broken

2007-10-31 10:15 • by Tom (unregistered)
Please. I'll take someone's well-crafted RegEx over a nest of Ifs any day.

Who writes this crap?

CAPTCHA: dubya. Well that explains it.

Re: Breaking Broken

2007-10-31 10:17 • by Welbog
I like how easy it is to predict what the comments are going to be about based on the content of the article. I don't even have to read them anymore.

Anyway, on topic, who the heck graduates with a CS degree without knowing regexps? It's like graduating without understanding big-O notation...

Re: Breaking Broken

2007-10-31 10:20 • by bstorer
159369 in reply to 159364
Zecc:
bstorer:
tomanyregex:
The ifs would actually perform faster and use less memory than the regex. And the if approach could have been simplified to evaluate one character at a time. But that doesn't matter. Sure, regex makes you look really smart, but that doesn't mean it's better.


Three Contains and twelve Replace faster than a very simple regex? If that's true, then the designer of VB's regex engine has some explaining to do.

No, no, no. XSLT is *much* faster.

Okay, you're going to hell.

Re: Breaking Broken

2007-10-31 10:23 • by Salty (unregistered)
159370 in reply to 159353
Pyro:
why bother fixing? using VB is the real WTF anyway :)


That opinion is so 1990's.

Re: Breaking Broken

2007-10-31 10:26 • by Gamen (unregistered)
159371 in reply to 159365
HeavyWave:
Am I the only one who would just use toLower() ? O_o


Why not? The code does it 3 times.

As for a regex that would match most <br> tags, /<br[^>]*>/i
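
A quick Python check of that pattern against some of the tags posted in this thread (Python's `re` is a swap-in for whatever engine the original code would use). `GAMEN` is the pattern as posted; `STRICTER` is a variation, not something from the thread, that adds a word boundary after "br" so that a hypothetical tag such as <bravo> is left alone. Both assume no attribute value contains a literal ">".

```python
import re

GAMEN = re.compile(r"<br[^>]*>", re.IGNORECASE)       # pattern as posted
STRICTER = re.compile(r"<br\b[^>]*>", re.IGNORECASE)  # variation: boundary after "br"

SAMPLES = [
    "<br>",
    "<BR />",
    '<br style="clear:both;" />',
    "<br\neak\nhere />",  # [^>] also matches newlines
    "<bravo>",            # hypothetical non-br tag that starts with "br"
]

if __name__ == "__main__":
    for tag in SAMPLES:
        print(repr(tag), bool(GAMEN.fullmatch(tag)), bool(STRICTER.fullmatch(tag)))
```

The posted pattern accepts all five samples, including <bravo>; the variation rejects only that last one.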

Better regexp ...

2007-10-31 10:26 • by Geoff (unregistered)
159372 in reply to 159338

<\s*br\s*/\s*>

Re: Breaking Broken

2007-10-31 10:28 • by SuperousOxide
159373 in reply to 159365
HeavyWave:
Am I the only one who would just use toLower() ? O_o


toLower would be bad; you'd lose the case of everything else in the input.

"This is<br>A TesT"

Should convert to

"This is
A TesT"

not
"this is
a test"

Re: Breaking Broken

2007-10-31 10:29 • by Pidgeot
159374 in reply to 159365
Pez:
bstorer:
strTagLess = Replace(strTagLess, "<bR>", vbCrLf)


If anybody used <bR> around me, I'd shoot 'em.


If anyone used anything other than <br /> around me, I'd shoot them...

Captcha: atari - Old Skool!


XHTML is mostly useless in this day and age. Unless there's a *specific* need for something in XHTML (hint: there rarely is), it makes more sense to stick with HTML and use <br>.

HeavyWave:
Am I the only one who would just use toLower() ? O_o


nEUrOO:
Nop, not the only one for the toLower :X
What the heck, a regular expression for that!!


Both of you seem to have missed a vital fact: toLower'ing everything only works for LOCATING, which the original example shows he's well aware of. If you're going to replace, which the example does, you'll end up with a string entirely in lower case - and that's a pretty bad thing.
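
The locate-versus-replace distinction can be sketched in Python (a swap-in for the thread's VB; `replace_ignore_case` is a made-up helper name). The idea is to search in a lowercased copy but cut and splice the original string, so only the tag changes. It assumes lowercasing does not change the string's length, which holds for ASCII input but not for every Unicode character.

```python
def replace_ignore_case(text: str, needle: str, replacement: str) -> str:
    """Find `needle` case-insensitively, but keep the case of everything else."""
    if not needle:
        raise ValueError("needle must not be empty")
    # Assumption: lower() preserves length, so indices line up with `text`.
    lowered, target = text.lower(), needle.lower()
    pieces, pos = [], 0
    while True:
        hit = lowered.find(target, pos)
        if hit == -1:
            pieces.append(text[pos:])  # untouched tail of the original
            return "".join(pieces)
        pieces.append(text[pos:hit])   # original text before the match
        pieces.append(replacement)
        pos = hit + len(target)


if __name__ == "__main__":
    source = "This is<BR>A TesT"
    # Lowercasing everything before replacing destroys the rest of the text:
    print(repr(source.lower().replace("<br>", "\n")))       # 'this is\na test'
    # Lowercasing only to locate keeps it intact:
    print(repr(replace_ignore_case(source, "<br>", "\n")))  # 'This is\nA TesT'
```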

Re: Breaking Broken

2007-10-31 10:34 • by AC (unregistered)
159375 in reply to 159338
... and more: http://xkcd.com/208/

captcha: gotcha

Re: Breaking Broken

2007-10-31 10:36 • by AC (unregistered)
159376 in reply to 159375
AC:
... and more: http://xkcd.com/208/

captcha: gotcha
Meh, that was in reply to
Tukaro:
You know, I'm fairly certain that Regular Expressions could cook breakfast for me if I could figure out the right sequence.

Re: Breaking Broken

2007-10-31 10:38 • by AC (unregistered)
Jake Vinson (TFA):
<br />, <Br />, <bR />, etc. are all valid HTML
Are we sure about that?

Re: Breaking Broken

2007-10-31 10:43 • by Doug#1
159379 in reply to 159343
This is actually true. I never learned them in college. You could probably teach a whole class on them. Or at least there's several classes' worth of material.

I still am not sure how to use the lookbehind (or lookahead) feature. Never needed to get that complicated.

Regex is so important and makes lives sooo much easier. You'd think they'd get more of the spotlight.

Re: Breaking Broken

2007-10-31 10:48 • by Anders (unregistered)
159380 in reply to 159366
Presumably the code needed to preserve case, though. So toLower (or, looking at the code, LCase) wouldn't be an option.

Re: Breaking Broken

2007-10-31 10:49 • by ahnfelt
159381 in reply to 159360
Pez:
bstorer:
strTagLess = Replace(strTagLess, "<bR>", vbCrLf)


If anybody used <bR> around me, I'd shoot 'em.


If anyone used anything other than <br /> around me, I'd shoot them...

Captcha: atari - Old Skool!


If anyone used anything other than <Br style="color:#FF00FF" id="waltz"/> around me, I'd shoot them.

Re: Breaking Broken

2007-10-31 10:50 • by Anders (unregistered)
159382 in reply to 159380
Anders:
Presumably the code needed to preserve case, though. So toLower (or, looking at the code, LCase) wouldn't be an option.


...teach me to reply before refreshing. :( Also I forgot to quote who I was replying to. I'm not good at this 'internet' thing.

Re: Breaking Broken

2007-10-31 10:52 • by Michael (unregistered)
159383 in reply to 159370
Salty:
Pyro:
why bother fixing? using VB is the real WTF anyway :)


That opinion is so 1990's.

So is VB.

Re: Breaking Broken

2007-10-31 10:54 • by Troche
159384 in reply to 159356
Christian:
cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.


They're almost certainly taught in 99.999% of courses on compilers. Aren't compiler courses still taught in a majority of CS programs?

One of the things I have run into is that, though it may be taught in CS programs, it wasn't taught in my program. My degree is in CIS (Computer Information Systems); throwing that "information systems" in there evidently gave them license to just skip huge sections of my education.

As an example, my asp.net professor asked our class if we had ever had any asp experience before. When the entire class responded in the negative, he assured us that .net was just like asp and then proceeded to never teach anything about either .net or asp. The entire class was devoted to learning what an n-tier architecture was, and to labs that couldn't be done because of the school's security policies.

WTF-U alum class of 2006

Re: Breaking Broken

2007-10-31 10:56 • by FredSaw
159385 in reply to 159348
tomanyregex:
Sure, regex makes you look really smart, but that doesn't mean it's better.
Yes it does. Looks are everything.

Re: Breaking Broken

2007-10-31 11:03 • by Jon (unregistered)
159388 in reply to 159368
Welbog:
Anyway, on topic, who the heck graduates with a CS degree without knowing regexps? It's like graduating without understanding big-O notation...


We never covered RegEx in my CS courses, but covered big-O like crazy. While covering big-O and algorithm analysis has been extremely influential in my programming, I use RegEx much more frequently. (Well, maybe not, I guess everything we do in programming is about algorithm analysis, must be more of a reflex now.) All the RegEx I know I learned on-the-job. I wish we covered RegEx, but at this point I don't think I know any less about it than if we did. Unless it would be reflexive if we did cover it... (still have to look up some RegEx syntax)

Re: Breaking Broken

2007-10-31 11:07 • by Anon Fred (unregistered)
159391 in reply to 159345
bstorer:
What CS program did you have that didn't include regex? We had to learn the language theory and design our own regex engine.

I know they're a bit of an outlier, but you can get a CS degree from MIT without taking the compilers class where they go over the theory of regexp's.

For that matter, you can also get through without learning C or Ruby or PHP or Perl or Python or JavaScript.

(Most students learned about them on their own time. It's the ones that didn't that you really need to watch out for.)