The Daily WTF: Curious Perversions in Information Technology

2007-10-31

They weren't covered by my school.

xtremezone · 2007-10-31

I went to a community college so naturally we didn't cover any advanced topics. :)

2007-10-31

Can't believe this quote hasn't appeared yet:

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." --Jamie Zawinski

rsynnott · 2007-10-31

We didn't cover them as such; a description of a very simple regular expression language turned up in a maths exam, once, under formal languages, but that was it.

Mind you, my course didn't really address 'practical' stuff; it let people figure that out if they felt like it. I must say, I think there's a lot to be said for this approach; if I had had lectures on how to make web applications I would probably have gone mad.

Phlip · 2007-10-31

is never valid... the / is invalid in HTML, the capitals invalid in XHTML... but that probably won't stop most browsers from accepting it.

And I vote for s/<br\b[^>]*>/\r\n/gsi;... This'll work in any valid HTML/XHTML... but it won't be the same as the usual tag-soup parsers, and will choke on
which is invalid HTML but many browsers will accept.

If VB doesn't like \b then s/<br([^A-Za-z0-9-._:>][^>]*)?>/\r\n/gsi; is a working, but less readable, replacement... with the same caveats as above.

Also, why are people putting <\s*br in their expressions? You can't put spaces in there... it's illegal, and neither IE nor Fx will accept it.
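
For anyone who wants to try the substitution above outside Perl, here is a minimal Python sketch. It assumes `re.IGNORECASE` and `re.DOTALL` are fair stand-ins for the `i` and `s` flags; the name `br_to_crlf` and the sample string are made up for illustration.

```python
import re

# Python spelling of the Perl substitution s/<br\b[^>]*>/\r\n/gsi
# (re.sub replaces every match, which covers the /g flag).
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE | re.DOTALL)


def br_to_crlf(html: str) -> str:
    """Replace every <br ...> tag with a CRLF pair."""
    return BR_TAG.sub("\r\n", html)


# Hypothetical sample: three tag spellings, plus one with a space after '<'.
sample = 'one<br>two<BR/>three<br class="x" />four< br>five'
print(repr(br_to_crlf(sample)))
```

The `< br>` spelling is left untouched, which fits the point about spaces after the `<`.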

2007-10-31

Pyro:
why bother fixing? using VB is the real WTF anyway :)

Can we implement a system in these threads to vote posts down?

2007-10-31

is never valid... the / is invalid in HTML, the capitals invalid in XHTML... but that probably won't stop most browsers from accepting it.

It is valid HTML, but not equivalent to
. Nevertheless, it's (incorrectly) treated as
by most (all?) visual user agents. Actually,
is equivalent to
> This is due to the support for the null end-tags minimization. In short, <ELEMENT/stuff/ is a shortcut for <ELEMENT>stuff</ELEMENT> when the OMITTAG option is specified in the SGML declaration of SGML applications such as HTML.

Hello
/
is equivalent to
> which is equivalent to
>

Actually, XML inherits the /> syntax from SGML. The SGML declaration of XML specifies a NESTC (net-enabling start-tag close) delimiter equal to / and a NET delimiter equal to >.

So, in XML, instead of using the syntax <element/some data/, one must use the syntax <element/some data>. Moreover, XML adds a constraint (violating the SGML specification, but that is beside the point here): there must be no data at all between the NESTC and the NET, so this form can only be used for empty elements. That way, an SGML parser fed with the SGML declaration of XML is able to parse an XML document.

2007-10-31

savar:
the Winner is...:
/<\s*br([^\w>][^>]*)?>/i
And with quoted > allowed:

/<\sbr(^\w>'")?>/i

Captcha: ewww... Exactly what this last regexp looks like.

Which character class matches annoying captcha posts??

I wrote a greasemonkey script to get rid of the line that captcha statements are on: http://userscripts.org/scripts/show/7631

I would welcome suggestions, as I'm not a regex master, but it seems to do the job quite well so far.

2007-10-31

Phlip:
And I vote for s/<br\b[^>]*>/\r\n/gsi;... This'll work in any valid HTML/XHTML... but it won't be the same as the usual tag-soup parsers, and will choke on
which is invalid HTML but many browsers will accept.

BR supports the core attributes: id, class, style and title. id is restricted to name characters, but the style attribute may contain the > character in a valid, conforming HTML document, as in:

You can probably imagine more with the title attribute.

It's also possible (in valid, conforming, but unsupported code) to write <br
which should be interpreted as

because of the unclosed tags feature of HTML (not to be confused with the end-tag omission feature).

Phlip:
Also, why are people putting <\s*br in their expressions? You can't put spaces in there... it's illegal, and neither IE nor Fx will accept it.

It isn't illegal, but it must be interpreted as < followed by spaces and the two br letters. Yeah is never equivalent to

chrismcb · 2007-10-31

Sam:
tomanyregex:
The ifs would actually perform faster and use less memory than the regex.

sweavo:
How about "well, first you have to compile the regexp"
How about getting the code right first and optimizing later? The regex is more flexible, more readable, and less prone to failure than the original code.

I gotta say, given the example the RegEx is way simpler than the gaggle of Replaces... But uhmmm doesn't it fail for 2/3's of what the Replace statements work on?

KozMoz · 2007-10-31

F*** there are some dickheads around!!

2007-10-31

Am I the only one who would just use toLower() ? O_o Only if you don't want upper case letters in the text output.

2007-10-31

Pez:
bstorer:
strTagLess = Replace(strTagLess, "
", vbCrLf)

If anybody used
around me, I'd shoot 'em.

If anyone used anything other than
around me, I'd shoot them...

Captcha: atari - Old Skool!

Assuming that this is not XHTML, then a stray
with or without a
is also acceptable.

Using all caps is also fairly reasonable, and I know some basic books on HTML recommended that (I think that this was to make life easier if using an editor without highlighting, although I'm not sure)

Quinnum · 2007-10-31

bstorer:
cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs. This looks more like a task one would assign to a junior developer, and provide some guidance to them, or a suggestion of what to use. Now, if you were to tell me this was done by a senior engineer, or provided some history of grandiose accomplishments from the perpetrator of this submission, perhaps I would be more awestruck by its "wtf"-osity. As it stands, it just appears to be an individual unaware of one of the many tools available to a developer.
What CS program did you have that didn't include regex? We had to learn the language theory and design our own regex engine.

Mine certainly didn't - although granted it was back in the day before world+dog decided to jump on the programming bandwagon.

There was no .NET, Java only just came out and there were none of these fancy fad languages that seemed to proliferate in the last decade. Heck, even html was new, where H1, H2 and H3 were the height of page formatting.

I did do the Compiler elective and the Finite Automata one as well, so it was a heck of a lot of theory - but not really delving into any specific implementations like regex.

You young-uns have it easy these days </old man's rant>

2007-10-31

Answer8879:
BR supports the core attributes: id, class, style and title. id is restricted to name characters, but the style attribute may contain the > character in a valid, conforming HTML document, as in:

You can probably imagine more with the title attribute.

isn't valid... what you're after is
.

[edit] My mistake... turns out only less-than signs are verboten in attributes, not greater-than signs.

Damn, that makes this a lot more unnecessarily complicated.

K_Logic · 2007-11-01

Just goes to show how easy things are in PHP...

<? $str = "string where tags show be replaced, maybe read from a file"; $replace_arr = array(" ","
","
"); for($x=0;$x<=$replace_arr;$x++){ $str = str_replace($replace_arr[$x],"\n",$str); } echo $str; ?>

that is bound to get them out.

2007-11-01

Anon Fred:
I hope no one intends to use their own HTML tag, like <brisket>, because that would match most of the regex's that people have posted so far.
Or if Netscape comes out with the <brown> tag, you're all dead.

You're on to something there.... not to mention the well-known East Asian tags <brack> and <brue>
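
The `<brisket>` problem is easy to check. A quick Python sketch, assuming `re.IGNORECASE` for the case-insensitive flag: the "loose" pattern is the `<\s*?br\s*?[^>]*?>` suggestion from this thread, the "strict" one is Phlip's `\b` version. The variable names are just for illustration.

```python
import re

# Loose: anything that starts with "<br" and ends with ">" is accepted.
loose = re.compile(r"<\s*?br\s*?[^>]*?>", re.IGNORECASE)
# Strict: \b demands a word boundary right after "br".
strict = re.compile(r"<br\b[^>]*>", re.IGNORECASE)

for tag in ("<br>", "<BR />", "<brisket>", "<brown>"):
    print(tag, bool(loose.search(tag)), bool(strict.search(tag)))
```

Only the loose pattern swallows `<brisket>` and `<brown>`; the word boundary is what keeps the strict one honest.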

2007-11-01

Then I will start using it to mess with your mind!

2007-11-01

Assuming that this is not XHTML, then a stray
with or without a
is also acceptable.

is invalid HTML. Moreover, it doesn't behave consistently among browsers. Opera 9 and IE 6 interpret

as

, while FF 1.5 interpret it as
.

Using all caps is also fairly reasonable, and I know some basic books on HTML recommended that (I think that this was to make life easier if using an editor without highlighting, although I'm not sure)

The W3C recommendation uses all caps for element names. Yes, this style is fairly reasonable.

2007-11-01

cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.

Trade school graduates have got to stop trying to pass themselves off as college graduates. Being a Gamma-minus machine minder is nothing to be ashamed of. You have no idea, the troubles which Alphas and Betas need to deal with.

KenW · 2007-11-01

cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs. This looks more like a task one would assign to a junior developer, and provide some guidance to them, or a suggestion of what to use.

Some people just try to be morons with every post...

Sure, maybe the original developer didn't learn about regular expressions in Comp Sci. They did, however, learn about LCase(), didn't they? So they at least could have come up with one better way of doing what they did.

Or can you just not figure that out yourself?

2007-11-01

Answer8879:

Phlip:
And I vote for s/<br\b[^>]*>/\r\n/gsi;... This'll work in any valid HTML/XHTML... but it won't be the same as the usual tag-soup parsers, and will choke on
which is invalid HTML but many browsers will accept.

BR supports the core attributes: id, class, style and title. id is restricted to name characters, but the style attribute may contain the > character in a valid, conforming HTML document, as in:

You can probably imagine more with the title attribute.

It's also possible (in valid, conforming, but unsupported code) to write <br
which should be interpreted as

because of the unclosed tags feature of HTML (not to be confused with the end-tag omission feature).

Phlip:
Also, why are people putting <\s*br in their expressions? You can't put spaces in there... it's illegal, and neither IE nor Fx will accept it.
It isn't illegal, but it must be interpreted as < followed by spaces and the two br letters. Yeah is never equivalent to

Let's see how my attempt holds up

from the rest of the comments, this should cover all the bases for valid x?html and might even still be readable to somebody other than me.

Random832 · 2007-11-01

UTU:
XIU:
Well I think "<br\s*/?>" would probably do it on most sites.
Geoff:
<\s*br\s*/\s*>
And we can just hope that the regexp engine isn't running in greedy mode by default :)

How about:
<\s*?br\s*?[^>]*?>

Yeah. Because b, /, and > are totally whitespace characters. His only problem was failing to take into account the possibility that there might be something other than /, or nothing at all (except whitespace), between br and > - neither of which had anything to do with greedy mode.

Hint: Even with greedy mode, the * operator won’t eat anything that’s not matched by what it’s attached to. All your question marks are unnecessary, and they introduce something that has to be changed for different regex flavors (in vim, you use {-} for a “non-greedy star” - posix basic regexes don't support it at all.)
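
A small Python check of the greedy point, as a sketch: Geoff's pattern is taken from the quote above, the lazy variant just adds question marks to it, and the test string is invented.

```python
import re

greedy = re.compile(r"<\s*br\s*/\s*>")    # Geoff's pattern as quoted
lazy = re.compile(r"<\s*?br\s*?/\s*?>")   # same pattern with lazy stars

# Invented sample: two self-closed tags, one bare tag, one with an attribute.
text = "a<br/>b< br / >c<br>d<br clear=all>e"

print(greedy.findall(text))
print(lazy.findall(text))
```

Both lists come out identical, because `\s*` can only ever consume whitespace. Neither pattern catches `<br>` or `<br clear=all>`, which is the actual gap: the `/` is mandatory and nothing else is allowed before the `>`.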

Random832 · 2007-11-01

Ancient_Hacker:
The real WTF is that to replace
in all the correct places, you need a fairly complete HTML parser, not a string replace or a regexp will do.
The
could be in a comment, some embedded code, or inside a or
 block.

Neither nor stop the parsing of other html tags within; if you're replacing tags you probably aren't going to display comments or execute scripts (the only reason i can think of is converting html to a text file)

2007-11-01

In the real world, a lot of programming is done by people without computer science degrees. And why not? For most business application programming, domain knowledge is as important as anything taught in CS classes.

2007-11-01

savar:
dkf:
cBradley:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.
You mean there's a lot of places affiliated with WTF-U's programme? What are they teaching instead, underwater basket-weaving?

At academic institutions, they focus on theories and knowledge.

At vocational institutions, they focus on practice and experience.

Most real-life institutions offer some mix of academic and vocational study.

Fair enough, I can understand it not being in Code Monkeying 101, but why would they claim to be teaching CompSci without touching on regular expressions? That is the Real WTF.

2007-11-01

This thread is so very funny. You all blathering about schools/degrees.

Many programmers do not have MISs or CSs. I have a business degree, am I certified MCDBA, and write VB6 and C#. I know about regexes and eschew there use. They are difficult to write for any type of complicated task and worse to maintain.

There are two concepts in coding that often are orthogonal, efficiency and maintainability. What is more important? In my world, it is maintainability.

Flame away all you CS degree holders...

2007-11-01

worthlessFred:

There are two concepts in coding that often are orthogonal, efficiency and maintainability. What is more important? In my world, it is maintainability.

Flame away all you CS degree holders...

I defy you to show me any instance where I have to choose between writing efficient code and writing maintainable code. If you are in the habit of sacrificing one for the other, then you should not be coding; you are not a programmer, you are a monkey with a keyboard. "I have a business degree, am I certified MCDBA..." I don't know, are you? It doesn't matter, an MCDBA is no substitute for a brain.

2007-11-01

Sam:
As awesome as regular expressions are, they aren't taught in most Comp Sci programs.

If a college educational curriculum doesn't cover regular expressions and other finite state machines, it's not a CS curriculum.

UVM will teach you finite state machines. Regex is more dubious, since I didn't see that in any of the required or optional classes I took. I'd kinda like to learn Regex, but I honestly think it looks like it's pretty limited in usefulness. I think a better use of time might be more comprehensive coverage of design patterns.

2007-11-01

Well said.

2007-11-01

I guess where you obtained your degree from determines the size of your ePenis. I have never stepped foot in a postsecondary classroom and probably never will. It took me 13 years of experience to get where I am now, but I am completely happy with my ridiculous salary and real world education.

...and all the CS grads that work for me usually bring the bad habits of their professors along with them.

2007-11-01

Phlip:
My mistake... turns out only less-than signs are verboten in attributes, not greater-than signs.

Both less-than and greater-than signs are allowed in attributes in HTML.

However, ampersand is interpreted as the start of an entity reference, so that ampersands must be encoded as &amp;

Nozz · 2007-11-02

FredSaw:
tomanyregex:
Sure regex make you look really smart, but that doesn't mean it's better
Yes it does. Looks are everything.

Very true. Many clients would be happier with buggy, dodgy software that looks like a dream than software that executes flawlessly but looks like crap. It's all about perception.

2007-11-02

worthlessFred:
I have a business degree, am I certified MCDBA, and write VB6 and C#. I know about regexes and eschew there use.

Too bad they didn't teach you the difference between there and their in business school. Last time I checked, by the way, the requirements for an MCDBA included only a limited amount of programming skill, if any.

worthlessFred:
There are two concepts in coding that often are orthogonal, efficiency and maintainability. What is more important? In my world, it is maintainability.

If you find these two items orthogonal, then you might want to take a long, hard look at yourself as a programmer. A good programmer will construct code that is elegant: efficient, maintainable, easy to understand, and easy to debug. The initial code was none of the above. It was not efficient, hard to extend, unclear exactly what it is trying to accomplish, and hard to make sure that all of the relevant cases were considered.

Honestly, if you have trouble understanding the regex that was supplied, then get a new job. I don't care whether or not you have a MIS, CS, or business school degree. None of them, including a MCDBA, makes you a good programmer.

(In case anyone else doesn't know, MCDBA is short for, "I have a very small penis.")

2007-11-02

Answer8879:
Both less-than and greater-than signs are allowed in attributes in HTML.
However, ampersand is interpreted as the start of an entity reference, so that ampersands must be encoded as &amp;

It's quite a WTF that it's taken so long before someone noticed it. Many people talk down about HTML, mostly because it's not hard to understand. That makes it even more sad that so few people are capable of writing valid HTML 4.01 code. And why so few people can come up with "better" solutions in this topic.

I'm not even gonna try too hard, but if you know the code is valid HTML 4.01, even /<br(\s[^>]*|/)?>/i should cover most of the possible valid situations. But even then; what to do with
within comments? Or within a javascript string within a <script> block?

Some WTFs are actually no WTF if you know that certain situations won't occur. Don't overdo it. If you did so, you would really have to parse the entire HTML document, and then serialize it back to HTML. Certainly a simple regular expression based replacement would cover most circumstances. And for all other circumstances: fix the tool/person that supplied the HTML code in the first place.
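
To see what the pattern above does and doesn't cover, here is a rough Python sketch. `re.IGNORECASE` stands in for the `/i` flag, and the function name and inputs are invented for illustration.

```python
import re

# The pattern from the post above: bare <br>, <br/>, or <br + whitespace + anything.
BR_TAG = re.compile(r"<br(\s[^>]*|/)?>", re.IGNORECASE)


def replace_br(html: str) -> str:
    """Replace each match of the pattern with CRLF."""
    return BR_TAG.sub("\r\n", html)


# Common spellings are replaced; <brisket> is left alone.
print(repr(replace_br("a<br>b<BR/>c<br />d<brisket>e")))
# A tag inside a comment is replaced too (the caveat raised in the post).
print(repr(replace_br("<!-- <br> -->")))
# A '>' inside an attribute value ends the match early and leaves debris behind.
print(repr(replace_br('<br title="a>b">x')))
```

The last two cases are the ones a plain regex cannot fix without knowing more about the document, which is the argument for cleaning up the input first or using a real parser.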

2007-11-02

Cheatah:
~<br(\s[^>]*|/)?>~i

Damnit, I do it every time. No / delimiters when using regexes and HTML.

2007-11-02

Ha! Try this:

I think it even does validate (with warnings).

Lesson for you kids: Regular expressions can't reliably parse HTML (at least not those which are humanly comprehensible).

Use DOM and getElementsByTagName() (or avoid crappy "web-oriented" environments that don't have methods for processing HTML/XML).
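
For comparison, a parser-based version might look something like the sketch below. It is not a DOM and does not use getElementsByTagName(); it swaps in Python's standard-library `html.parser` as an event-driven stand-in. The class name `BrToText`, the function `html_to_text`, and the choice to drop script and style content are all assumptions made for the example.

```python
from html.parser import HTMLParser


class BrToText(HTMLParser):
    """Collect text, turning <br> into CRLF and skipping script/style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <BR> and <br> look the same here.
        if tag == "br":
            self.parts.append("\r\n")
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is dropped; comments never reach here.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = BrToText()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.parts)


print(repr(html_to_text('a<BR title="x>y">b<!-- <br> -->c<script>var s = "<br>";</script>d')))
```

The quoted `>` in the attribute, the tag inside the comment, and the tag inside the script string are all handled by the parser rather than by the pattern.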

2007-11-02

kl:
Ha! Try this:

Well my regexp (see earlier) will match the
correctly

It does get confused by the comments - but assuming that the only useful purpose for replacing <br...> with CRLF is to convert to plain text, the routine should strip out comments/scripts/styles/etc first anyway, so the comments wouldn't be a problem.

Yes, parsing HTML properly needs a proper parser - but converting HTML to text can be done reasonably well using a set of regexps.
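
A sketch of that "set of regexps" approach in Python, for what it's worth. The four patterns and the name `strip_to_text` are assumptions written for this example, not the regexp referred to above, and real-world tag soup will find holes in them.

```python
import re

# Step 1: drop comments. Step 2: drop script/style blocks with their contents.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Step 3: turn <br ...> into CRLF. Step 4: strip whatever tags are left.
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def strip_to_text(html: str) -> str:
    html = COMMENT.sub("", html)
    html = SCRIPT_STYLE.sub("", html)
    html = BR_TAG.sub("\r\n", html)
    return ANY_TAG.sub("", html)


print(repr(strip_to_text('a<!-- <br> -->b<script>x="<br>"</script>c<BR>d<p>e</p>')))
```

Order matters: comments and scripts have to go before the br replacement, otherwise the tags hidden inside them turn into stray line breaks.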

2007-11-02

rumpelstiltskin:
worthlessFred:

There are two concepts in coding that often are orthogonal, efficiency and maintainability. What is more important? In my world, it is maintainability.

Flame away all you CS degree holders...

I defy you to show me any instance where I have to choose between writing efficient code and writing maintainable code. If you are in the habit of sacrificing one for the other, then you should not be coding; you are not a programmer, you are a monkey with a keyboard. "I have a business degree, am I certified MCDBA..." I don't know, are you? It doesn't matter, an MCDBA is no substitute for a brain.

hahahahahahahah, I love it, I work with CS grads with equal years of experience as me and I am paid more. I have to laugh at all you CS grads that can't even write the most simple SQL.

2007-11-02

Sam:
If you find these two items orthogonal, then you might want to take a long, hard look at yourself as a programmer. A good programmer will construct code that is elegant: efficient, maintainable, easy to understand, and easy to debug. The initial code was none of the above. It was not efficient, hard to extend, unclear exactly what it is trying to accomplish, and hard to make sure that all of the relevant cases were considered.

Let's go down your list...

1. It was not efficient: it was efficient to author (not having to crack open the regex help file to start typing the line noise), or are you speaking of execution efficiency? What if this only ran once a day?
2. Hard to extend: within the context of adding other
permutations, it would be easy to add a couple more lines. How to extend the regex adds more complexity to a simple problem.
3. Unclear what exactly it was trying to accomplish: huh? it was replacing various instances of
with vbCrLF. The code comment says 'Replace
with vbCrLf. The code lists all of the ways
could be spelled to get the vbCrLf, with no surprises.
4. Hard to make sure that all of the relevant cases were considered: Bingo. But it does list the cases it does consider.

Heck, I'm not even against the use of a regex to simplify that replacement. But to make a wtf out of it... If this snippet is considered a serious wtf, the programmers these days are getting to be really good!

seejay · 2007-11-02

worthlessFred:
hahahahahahahah, I love it, I work with CS grads with equal years of experience as me and I am paid more. I have to laugh at all you CS grads that can't even write the most simple SQL.

Hey look! Broad stroke brush in the works! What's your next act? Women can't be programmers? All Blacks are criminals? French are surrender monkeys? C'mon... you can do better!

-- Seejay

2007-11-02

I'm sure I could come up with a few more for your entertainment. I really don't appreciate being called a monkey in front of a keyboard because I don't have a CS degree.

I believe regexes suck, not because of the problems I have using them, but in supporting other people's code, often CS degree holders. Regexes are often the cause of my rant on maintainability vs efficiency. Take that for what you will.

Here are some for your entertainment:

How do you tell if a CS degree holder is an introvert or an extrovert?

Whether they look at their shoes or yours when they talk to you.

How do you keep a CS degree holder in the shower all day?

Give them a shampoo bottle that says rinse, lather, repeat.

2007-11-02

2. Hard to extend: within the context of adding other
permutations, it would be easy to add a couple more lines. How to extend the regex adds more complexity to a simple problem.

How do you easily add "<br" followed by an arbitrary number of spaces, tabs and CRLF and ">" ? Oh, and of course, you must do it for all lower/upper case of br.

With the same notion of "extensible", the following program is an extensible multiplicator:

int multiply(int x, int y)
{
    if (x == 1 && y == 1) return 1;
    else if (x == 2 && y == 1) return 2;
    else if (x == 1 && y == 2) return 2;
    else if (x == 2 && y == 2) return 4;
    /* easily extensible, just add new conditions! */
}

real_aardvark · 2007-11-02

worthlessFred:
I'm sure I could come up with a few more for your entertainment. I really don't appreciate being called a monkey in front of a keyboard because I don't have a CS degree.
I believe regexes suck, not because of the problems I have using them, but in supporting other people's code, often CS degree holders. Regexes are often the cause of my rant on maintainability vs efficiency. Take that for what you will.

Here are some for your entertainment:

How do you tell if a CS degree holder is an introvert or an extrovert?

Whether they look at their shoes or yours when they talk to you.

How do you keep a CS degree holder in the shower all day?

Give them a shampoo bottle that says rinse, lather, repeat.

Hang on a minute: I think I have an answer to support you (even though I've got a CS degree-ish...)

real_aardvark · 2007-11-02

seejay:
worthlessFred:
hahahahahahahah, I love it, I work with CS grads with equal years of experience as me and I am paid more. I have to laugh at all you CS grads that can't even write the most simple SQL.

Hey look! Broad stroke brush in the works! What's your next act? Women can't be programmers? All Blacks are criminals? French are surrender monkeys? C'mon... you can do better!

-- Seejay

Well, almost next. Tune in next post!

That would be "Cheese-eating Surrender Monkeys," Seejay, and I'm ashamed at you not remembering that. Particularly because it's hysterically funny, and I like French cheese. But mostly only in France. Unless you can get Raclette your way, which I definitely recommend. Oh, and that thing with the ash in the middle and the morning cheese on top and the evening cheese below. Or is that Raclette? I forget. I certainly wouldn't recommend anything calling itself Camembert or Brie in the US, because it's either rancid or a lie.

Anyway, isn't it up to the monkeys to surrender?

real_aardvark · 2007-11-02

rumpelstiltskin:
worthlessFred:

There are two concepts in coding that often are orthogonal, efficiency and maintainability. What is more important? In my world, it is maintainability.

Flame away all you CS degree holders...

I defy you to show me any instance where I have to choose between writing efficient code and writing maintainable code. If you are in the habit of sacrificing one for the other, then you should not be coding; you are not a programmer, you are a monkey with a keyboard. "I have a business degree, am I certified MCDBA..." I don't know, are you? It doesn't matter, an MCDBA is no substitute for a brain.

Damn. In the mean spirit of these dark streets that we go down on this site, I was about to make a comment on your better half -- she who spins the gold.

Unfortunately, I noticed that worthlessFred is certified.

Now, I'm a politically-correct sort of guy. All kinds of creeds, colours, sexes, religions, and spoons are grist to my metrosexual mill.

What, precisely, might a "certified MCDBA" be? And why should the nation (any nation) care?

Unfortunately, I have to disagree with you. This isn't a choice. This is a hierarchy.

(1) Write documented and (repeatably) testable code. (2) Write maintainable code (3) Write efficient code.

(3) is generally regarded as an 80/20 rule. (2) is, interestingly enough, also regarded as an 80/20 rule.

To transition from (3) to (2), (1) almost certainly gives you a benefit better than 80/20.

Well, I don't have an MDBARGHCEX-ComeInPluto, and I think my brain is gently frying right now, but I'd be careful of too much "efficiency" at the expense of "maintainability" if I were you. If only because I do maintenance, I'm much bigger than you, and I know the dark alleys round where you live. (isn't Google Earth a wonderful tool?)

Believe me. We'll all be happier if it's maintainable, but not particularly efficient.

Mind you, if you could make it scalable and testable, then you've got my vote, and MCDBAs be damned.

Fajar Endra Nusa · 2007-11-03

I like to say that even almost in every edge of codes we have are done by driving it to a maintainable direction while keeping the efficiency, unless you're not a part of "we" I mentioned.

But then.. we can't be perfect all the time. Sometimes you must choose between costly pants made of the best material and cheaper pants made of an intermediate one.

Sure we got both of maintainable code and efficient code when we wrote this single line:

Regex.Replace(html, " ", vbCrLf, RegexOptions.IgnoreCase)

Although in some cases we faced the fork, hey which way I should to go? maintainable or efficient..? But still we can minimize the effect of sacrificing one of them.

etaoinbe · 2007-11-03

And the colleague got promoted because he wrote so many lines of code, how productive of him !

Fajar Endra Nusa · 2007-11-04

etaoinbe:
And the colleague got promoted because he wrote so many lines of code, how productive of him !

LOL =)) nice to have a colleague like him.

2007-11-04

You're likely right about it being a junior developer, but I also wouldn't be surprised if it were someone with years of experience. I've seen some people go to extremes to avoid learning regular expressions, or really anything new for that matter.

Breaking Broken
