Admin
Load it into an XmlDocument object and use XSLT. :)
Admin
There's nothing stopping you... except time. strcmp is a standard C function. It is there. If you have a decent C library, you can expect a good implementation. You don't have to write it yourself. Comparing to a dozen strings using strcmp is trivial to implement and hard to get wrong. Hashing is difficult. You have to write it yourself. Writing it using SSE instructions makes it ten times harder. Then you need to build a hash table. Handle collisions. You still have to compare strings. Possibly multiple times. You have to test it. Carefully, because the code isn't trivial. "It should be trivial" is not the same as "it is trivial".
And of course I said "a dozen strcmps". Not 70. On the other hand, to handle 70 compares, instead of hashing you can just switch on the first letter of the string, and then have 3 or 4 strcmps per letter. Still trivial to implement, and you try beating that with a hash table.
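For what it's worth, here's roughly what that first-letter dispatch looks like, written in Java rather than C since that's what the article's code uses, so equals() stands in for strcmp. The tag list and the names FirstLetterDispatch and isKnownTag are made up for illustration; treat it as a sketch, not anybody's production code.

```java
final class FirstLetterDispatch {
    // Hypothetical keyword list, picked only to illustrate the idea.
    static boolean isKnownTag(String s) {
        if (s.isEmpty()) {
            return false; // nothing to dispatch on
        }
        // Narrow the candidates by first character, then do a few plain compares.
        switch (s.charAt(0)) {
            case 'b':
                return s.equals("body") || s.equals("br") || s.equals("button");
            case 'h':
                return s.equals("head") || s.equals("hr") || s.equals("html");
            case 't':
                return s.equals("table") || s.equals("td") || s.equals("tr");
            default:
                return false;
        }
    }
}
```

Each branch is only a handful of comparisons, which is the whole point.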
Admin
While it's crude, unsophisticated overkill, the motivation is understandable. Uppercase elements are an abomination on the face of the web, and there is a separate circle of hell for developers who use them.
Admin
this is why websites should be 100% flash, that way they will parse faster. HTML is so old fashioned.
Admin
Should have called pageHTML.toLowestCase(), incrementally lowering the case with toLowerCase never gives the desired result.
And who is this Proro the Pedofiler guy?
Admin
Another approach is to have the outermost case switch on the length of the string you are looking up. The advantage of that is that the code inside doesn't have to worry about running off the end of the string. Then inside that you can have another level of switch statements on the first character.
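A rough Java sketch of the length-first idea, with an invented tag list and invented names (LengthDispatch, isKnownTag). Once the outer switch has pinned the length, the inner code can index characters directly without any bounds checks of its own.

```java
final class LengthDispatch {
    // Hypothetical keyword list, picked only to illustrate the idea.
    static boolean isKnownTag(String s) {
        switch (s.length()) {
            case 2:
                // Length is known to be 2, so charAt(0) and charAt(1) are both safe.
                switch (s.charAt(0)) {
                    case 'b': return s.charAt(1) == 'r';                        // br
                    case 'h': return s.charAt(1) == 'r';                        // hr
                    case 't': return s.charAt(1) == 'd' || s.charAt(1) == 'r';  // td, tr
                    default:  return false;
                }
            case 4:
                // Length is known to be 4, so only four-letter candidates remain.
                switch (s.charAt(0)) {
                    case 'b': return s.equals("body");
                    case 'h': return s.equals("head") || s.equals("html");
                    default:  return false;
                }
            default:
                return false; // no known tag has this length (the empty string lands here too)
        }
    }
}
```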
Admin
Seriously, have you stopped programming maybe twenty or thirty years ago?
All relevant modern programming languages / standard libraries come with hashtables included. It is trivial, just use what is already implemented and tested.
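In Java, for example, the whole thing is a few lines with java.util.HashSet. The tag list and the names TagSet and isKnownTag are placeholders for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TagSet {
    // Hypothetical keyword list; the library does the hashing and collision handling.
    private static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "body", "br", "button", "head", "hr", "html", "table", "td", "tr"));

    static boolean isKnownTag(String s) {
        return KNOWN.contains(s);
    }
}
```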
Admin
Don't several calls to toLowerCase() on an immutable object get optimised away anyway? I thought Java did a lot of background optimisation?
Admin
It would work in a purely functional language like Haskell, but Java Bytecode doesn't give such guarantees.
Imagine that toLowerCase() does some logging for debugging. Optimising it away would also optimise away the logging.
Admin
Must Must add Must add one Must add one word Must add one word at Must add one word at a Must add one word at a time Must add one word at a time to Must add one word at a time to handle Must add one word at a time to handle Strings Must add one word at a time to handle Strings efficiently...
Admin
Other problems: He used String concatenation in a loop instead of StringBuilder/StringBuffer. Also hard-coded the newline instead of reading it from a system property. (In Java it'd be System.getProperty("line.separator");)
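Something like this is what I mean, as a sketch. JoinLines and join are names I made up; System.lineSeparator() has been in the JDK since Java 7 and is a shorter way to get the platform separator than looking up the property by name.

```java
import java.util.List;

final class JoinLines {
    // Appends each line plus the platform line separator to a single buffer,
    // instead of creating a new String on every loop iteration.
    static String join(List<String> lines) {
        String newline = System.lineSeparator();
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(newline);
        }
        return sb.toString();
    }
}
```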
Admin
Word policing is kinda a sh*tty thing to do.
Admin
Word policing is kinda a sh*tty thing to do.
(Aside: why do they have a reply button if it doesn't quote the person you're replying to? Now I'm confused.)
Admin
Building the String was probably unnecessary altogether. He could have done his stupid parsing line-by-line.
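A minimal sketch of the line-by-line version, assuming the page arrives as some java.io.Reader. The per-line check here (counting lines that contain a given substring, ignoring case) is just a stand-in for whatever he was actually looking for, and LineScanner and countLinesContaining are invented names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Locale;

final class LineScanner {
    // Reads one line at a time, so the whole document never has to be
    // assembled into one giant String first.
    static int countLinesContaining(Reader source, String needle) {
        String lowerNeedle = needle.toLowerCase(Locale.ROOT);
        int hits = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Only the current line gets lowercased, not the entire page.
                if (line.toLowerCase(Locale.ROOT).contains(lowerNeedle)) {
                    hits++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }
}
```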
Admin
at the risk of stating the incredibly obvious ... strings are immutable. he wasn't lowercasing the document multiple times, he was calling a method that returned a new lowercased copy of the document each time (the value of pageHTML wasn't modified). granted, the point is the same -- calling toLowerCase() on the same huge string repeatedly is stupid.
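to make that concrete, a small sketch: lowercase once, keep the result in a local, and search that. pageHTML is the variable from the article; LowerOnce, findTags and the three tags searched for are invented for illustration.

```java
import java.util.Locale;

final class LowerOnce {
    // Returns the positions of three (hypothetical) tags in the lowercased copy.
    static int[] findTags(String pageHTML) {
        // One pass and one copy; pageHTML itself is left untouched.
        String lower = pageHTML.toLowerCase(Locale.ROOT);
        return new int[] {
            lower.indexOf("<table"),
            lower.indexOf("<tr"),
            lower.indexOf("<td")
        };
    }
}
```

one caveat: a few non-ASCII characters change length when lowercased, so positions found in the copy aren't guaranteed to line up with the original for every possible input.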
Admin
For a brute-force evaluation, the continue keyword works wonders.
CAPTCHA: venio - Veni veni venias, veni veni facias! (What does One Winged Angel have to do with HTML parsing? Nothing, the captcha just made me think of the only part that I can easily remember.)
Admin
Hyrca, hyrce!