Admin
Nice shot!
Admin
What happened to using a real text editor for programming tasks? emacs, vi, or even edlin, fer Pete's sake, all show you the actual text. Word processors are for writing letters and manuscripts (and even there, I'll take emacs and LaTeX any day -- better math layout).
Now, if you'll excuse me, I need to go out and tend to my herd of brontosaurs...
Admin
This "conversation" is an excellent example of why programmers rarely get to choose the vendors for their company.
Admin
I actually think I know who these people are.
I have at least one of these types of conversations every week with them.
Admin
The first two are .NET things, and as such irrelevant. Also, they could easily have used better formats, but Microsoft (the publisher of this "why XML rules" list) chose not to do so. The second is almost a canonical example of why XML sucks. The third is also a pretty good example of something that would have been better without XML. Perhaps using XML will pay back when XHTML becomes common, but then again, using XML for that is also a historical curiosity. The answers to the last two involve databases.
For some people, like me, speed and compactness are important. First, we don't understand why you'd want to give up these things when the only thing you get in return is the ability to say "It uses XML!" But what I really detest is that, of all conceivable hierarchical data formats, people standardized on something as horrid as XML. Of course, I know why it happened, but I still hate it, and I will never use it in any protocol or file format of my design.
Admin
Look, XML is not a bullet point, it's the lingua franca of the Internet. If you want someone to be able to access your data, XML is the best option available. Your sacrifice of speed and compactness is a false dichotomy, since most of the time there isn't an ideally fast, compact, binary solution available. Can you point to a competitor to XML for general-purpose data exchange? You can't just completely ignore a technology because you think there might have been a better alternative. That's just hubris.
If you tell people they have to learn a new binary format, and your competitor offers SOAP or POX, you lose. And barring it entirely? Mention that in all of your future interviews, and let me know how that goes. Have you told your current employer that you are an anti-XML extremist (not namecalling here, just by definition)?
Admin
For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner. It's a far better "platform agnostic" format, although it has some problems itself. (Mostly dealing with character encodings, something XML never really solved in any case.)
The real answer to anything you bring up is "there's a better solution, but it depends on the specifics." XML is like taking a hammer to everything, because it's big and heavy and it does sort of work.
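To make the verbosity point above concrete, here's a small sketch encoding the same (entirely hypothetical) record both ways with Python's standard library; the field names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for illustration.
record = {"id": 42, "name": "widget"}

# JSON: one call, compact output.
as_json = json.dumps(record)

# XML: build the tree element by element.
root = ET.Element("record")
for key, value in record.items():
    child = ET.SubElement(root, key)
    child.text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"id": 42, "name": "widget"}
print(as_xml)   # <record><id>42</id><name>widget</name></record>
```

For flat data like this the JSON form is both shorter and a one-liner to produce; whether that matters depends, as the comment says, on the specifics.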
Admin
Depends on the library / language... if this was .NET and you wanted to write XML to some kind of stream, it's simply a case of doing something like this,
or just using XmlSerializer,
which is far preferable to string manipulation... whatever floats ya boat really...
On the parsing side, I once got my hands on some vendor code written in C#, where the developer decided to parse very large XML files using a combination of regular expressions and Substring... <shudder>
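As a counterpoint to the regex-and-substring horror above, here's a minimal sketch (in Python rather than the vendor's C#, and with a made-up document) of what a compliant parser gives you for free: correct handling of nesting, entities, and either attribute quote style.

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the vendor's "very large xml files".
doc = "<orders><order id='1'/><order id='2'/></orders>"

# A compliant parser handles nesting, entities, and either quote style;
# a regex/substring approach breaks on all three.
ids = [order.get("id") for order in ET.fromstring(doc).iter("order")]
print(ids)  # ['1', '2']
```

For genuinely large files, a streaming API (e.g. `ET.iterparse`) avoids holding the whole tree in memory, which is presumably what tempted the vendor toward substring hacks in the first place.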
Admin
It is? The MSDN article states that Notepad uses IsTextUnicode(), which is just a punt. The documentation for IsTextUnicode() isn't very helpful either, but does point out that given its statistical methods "some ASCII strings can slip through", providing the somewhat pathological example "A\n\r^Z".
Bottom line, I still don't see a reasonable explanation of how a 4-word, 16-letter sentence, all in 7-bit ASCII, with no control characters, can fail to be recognized as such.
Admin
Sigh. Windows "programmers". (There's an appropriate use of doublequotes!)
Admin
Customers that insist a vendor break standards-based interfaces to work around their shortcomings do not represent profit in the longer term.
But the point was, WTF was that idiot doing raising any sort of defect with his vendor in the first place? Exactly what sort of answer was he expecting to such a stupid question?
Admin
Turns out the reason is that there are approximately one bazillion web-based "email name references", regexes, edit-code snips, etc., that incorrectly say this character is not legal.
Apparently someone made an error a decade or two ago and "all" that's left are hearsay references.
For the curious:
The characters permitted within an email name without quoting are letters, digits, and ! # $ % & ' * + - / = ? ^ _ ` { | } ~ (RFC 5322's "atext"), with dots allowed between runs of them.
(If you quote the name, you can basically use any printable character.)
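A rough sketch of that rule as a validator, assuming RFC 5322's dot-atom form for the unquoted local part (quoted local parts, which allow nearly any printable character, are deliberately out of scope here):

```python
import re

# Dot-atom local part per RFC 5322: runs of "atext" separated by single dots.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = re.compile(rf"^{ATEXT}+(?:\.{ATEXT}+)*$")

print(bool(DOT_ATOM.match("o'reilly")))      # True: the apostrophe is legal
print(bool(DOT_ATOM.match("first.last")))    # True
print(bool(DOT_ATOM.match(".leading.dot")))  # False: dots can't lead or trail
```

Any regex that rejects the apostrophe in `o'reilly` is one of the bazillion incorrect references the comment complains about.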
Admin
Nope, that's just another notepad translation error.
Admin
Minimal structure, no support for encodings, no standardization effort. Probably does not support all character data supported by XML.
Which is not standardized at all, making Sun's implementation the only reliable one.
JSON does not handle encodings at all and does not offer support for a concept similar to namespaces, making extensibility difficult (something which is essential for web services).
Again, the lack of encoding and extensibility support makes JSON a poor contender in this area.
Please elaborate on the stylesheet transformation technologies available for JSON.
Good luck processing the Japanese and Korean files and the ones containing incompatible extensions by three different vendors.
Speaking of error prone, could you please elaborate on the validation solutions available for JSON?
Right - it's so platform agnostic it does not even standardize any mapping from character to octet stream...
That, my friend, is complete and utter bullshit. I'll stop short of calling it a shameless lie. XML encodings are well-supported by every major implementation. If you had trouble with the i18n, the problem was probably in front of the keyboard.
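In support of the point that encodings are well-supported by every major implementation: a minimal example (using Python's stdlib parser; any conforming parser behaves the same) of a document whose XML declaration names a non-UTF-8 encoding, which the parser honors automatically.

```python
import xml.etree.ElementTree as ET

# A document declaring ISO-8859-1; the parser reads the declaration and
# decodes the byte stream accordingly -- no guessing involved.
raw = "<?xml version='1.0' encoding='iso-8859-1'?><msg>caf\xe9</msg>".encode("iso-8859-1")
msg = ET.fromstring(raw)
print(msg.text)  # café
```

This is exactly the machinery JSON lacks: there is no in-band way for a JSON document to declare its own encoding.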
Admin
JSON is a fairly decent lightweight interchange format, but I want named values and other fancy things for some of what I use XML for - named parameters beat positional notation when dealing with anything complex.
Admin
No it is not. The choice is logically arbitrary, and an accident of XML's inherited precedents.
The only possible reason to make a computer protocol human-readable is that it will be read by humans, not written by humans. And, as Terrence Parr argues so eloquently (and correctly), XML is a disastrous choice for a human-readable API.
Thank you. I was, until now, unaware that XML had achieved true world dominance. (Unless, by "industry," you refer to the "XML industry," which is too self-referential for me.) I think that, given the assumed universality of the protocol, it is perfectly reasonable to suggest that Anon might be significantly smarter than a substantial proportion thereof. That's the thing about universality.
I imagine that Anon, like me, would grit his teeth and use XML in many situations, because it's just there. Which doesn't make it "appropriate." And it is true: XML zealots do tend to fly off the handle at even mild criticism of their ugly, warty little baby.
Priceless.
On a slightly more reasoned note:
I hardly think that this is in the spirit of the alarmingly verbose XML. What's a few megabytes between friends? Can you say "ludicrous premature optimisation?"
Well, hitting on XML loonies is dull. Let's try:
Ipso facto, testers are not "just as necessary as developers." Without developers, there would be nothing to test. I think what you mean is something like "a truly competent tester is at least as valuable as a competent developer, and perhaps more so." The mindset isn't particularly important: it's the honey-pot instinct. All sorts of idiots flock to both development and QA jobs because, let's face it, they're comfortable. I used to think that you get more incompetent, dangerous idiots in testing than you do in development, although these days I'm not so sure. However, it remains a truism that a significant proportion of truly great testers will aspire to be developers, whereas the opposite is just so never going to happen. (Unit-testing excluded for the purposes of this rant.)
I do think, however, that you miss one further point.
Just as testers need developers as the essential feed to their work-flow, so developers need morons in turn.
Imagine a world with a tragically short supply of morons. How would developers spend their time? (Open source, I know.) Who would fund our opulent existence? If morons could do simple programming jobs, we'd all be in deep shit.
And this, I think, is the shining case for supporting XML in all its many, unnecessary usages.
Viva Los Morones!
Admin
Notepad actually DOES parsing and/or conversion, but I doubt this was such a case. An example of a string that will be parsed and causes automated insertion of text by notepad is the string ".LOG" on the beginning of the first line of a file.
Admin
Sometimes you have to step back from your personal ideals and remind yourself exactly why it is you're in business and what it is you're trying to accomplish.
Addendum (2007-11-29 18:09): Sorry, forgot to include this: since double-quotes work just fine as delimiters, I'd like to hear how it is that using them in place of single-quotes would "break standards based interfaces".
Admin
I can't quite bring myself to agree with you though... As far as debugging goes, it's a lot easier to find a missing xml tag than a missing brace, and a lot easier for me to explain to non-tech folks how to update an XML file than JSON notation (believe me, I've tried both approaches), so... would JSON work? Yes. Is it a better tool in all cases? An emphatic 'No'.
Admin
If you have Chinese language support installed, you'll see that in the encoding that Notepad guessed wrongly, the file contains a string of perfectly valid Chinese characters.
Admin
Given that single quotes are as valid as double quotes for XML attribute values, does anyone else get the impression that Rick's XML "parsing" consisted of simple string searches?
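The premise is easy to demonstrate: per the XML 1.0 spec, attribute values may be delimited by either quote character, and any conforming parser treats the two documents below identically (the `settings` element here is made up for illustration).

```python
import xml.etree.ElementTree as ET

# Same document, two legal quoting styles; a conforming parser sees no
# difference. Simple string searches for ="..." would miss the first one.
single = ET.fromstring("<settings value='42'/>")
double = ET.fromstring('<settings value="42"/>')
print(single.get("value") == double.get("value"))  # True
```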
Admin
The funny thing is that there are translation issues in some versions of notepad...
The Windows NT version of Notepad, installed by default on Windows 2000 and Windows XP, has the ability to detect Unicode files even when they are missing a byte order mark. To do this, it utilizes a Windows API function called IsTextUnicode()[2]. This function is, however, imperfect, incorrectly identifying some all-lowercase ASCII text as UTF-16. As a result, Notepad interprets a file containing a phrase like "aaaa aaa aaa aaaaa" as a two-byte Unicode text file and attempts to display it as such. If a font with support for Chinese is installed, Chinese characters are displayed.
A few people misinterpreted this issue as an easter egg. Many phrases which fit the pattern (including "this app can break" and "bush hid the facts") appeared on the web as hoaxes. Experts correctly attributed it to the Unicode detection algorithm.
This issue has been resolved in the Windows Vista version of Notepad.
Admin
I gotta agree with Brianary, real_aardvark... you're kinda being a dink about the whole XML thing.
XML bugs me for its verbosity, but makes up for it in its widespread (native) acceptance and accessibility to the average person.
Being an anti-XML jihadist is no different from being an XML zealot. You believe in black and white where there are many more shades of grey.
Admin
Sounds a bit like my problem...
Been arguing for days with the webmaster. I'm trying to prove that https://ib.absa.co.za/ib/mb.do isn't valid XHTML.
No wonder it doesn't display on my mobile phone's Opera browser. Their response: Get a new phone.
Admin
How do you know it is 7bit ascii? HOW?
I give you 19 bytes (the spaces count too!): 74 68 69 73 20 6F 6E 65 20 63 61 6E 20 62 72 65 61 6B 00
QUICK! What does that say? What? Are you sure it's English? I didn't tell you what format it was in. Does this start with the two ASCII characters 74 and 68, or the Unicode character 6874?
Now, I don't know exactly how the algorithm works that makes it think this stream of characters is Unicode while a slightly different stream is ANSI. But I can see that it can happen. And without something explicitly saying what the format is, it will always be possible to misinterpret the data.
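The ambiguity is easy to reproduce: the byte sequence given above decodes cleanly under both assumptions, so nothing in the bytes themselves settles the question.

```python
# The same bytes, read under two different assumptions. Without an
# out-of-band declaration (or a BOM), both readings are internally valid.
data = bytes.fromhex("74 68 69 73 20 6F 6E 65 20 63 61 6E 20 62 72 65 61 6B")

as_ascii = data.decode("ascii")       # "this one can break"
as_utf16 = data.decode("utf-16-le")   # 9 code units, mostly CJK ideographs

print(as_ascii)
print(len(as_utf16))  # 9
```

Both decodes succeed without error, which is precisely why a heuristic like IsTextUnicode() can only ever guess.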
Admin
[quote user="FredSaw"]As I understand it, he wasn't expecting an answer; rather, he was expecting an XML document which consistently used double-quotes for delimiters. This would involve editing two characters in the originating document.[/quote]
This isn't the first time someone said this. HOW do you know it is as simple as editing two characters in a file? Looking at the data in the file, I am guessing this file is generated. Changing the file means it will be "broken" next time it is generated.
Admin
I would like to know from the "XML suxx" crowd whether they can provide any alternative that provides the feature set of XML (including namespaces and encodings), XML Schema, (E)XSLT, and XPath in an interoperable way, but I honestly don't expect too much of an answer. All I hear is that JSON or S-Expressions or binary (it's so easy to be vague!) are much better than XML, but upon closer examination they do not even offer the feature set of the XML standard itself, not to mention XSD, XSLT, XPath and related standards.
Because the XML haters have obvious problems coming up with any alternative that is
a) actually superior to XML in at least a majority of applications and b) available now
they will frequently resort to name-calling and other sorts of ad hominem attacks like this:
It takes a gullible fool to fall to the suggestion that braggadocio and baseless insults make a rational argument.
Admin
It shouldn't matter - it's still XML. It only needs to be consistent with the standard, that's what standards are for.
The chances are that if such a basic bug exists in his implementation then a myriad of other ones do. Using a compliant parser would have avoided them.
Consider why they are using XML. Most of the arguments for that will be based on making numerous systems (not just these two) compatible with the same message format. Broken parsers that introduce extra constraints completely undermine that.
It's got nowt to do with personal ideals. I have never seen using a standard correctly cost more than not doing so for either party. They are supposed to be communicating based on a standard, that standard being XML. That's what they agreed. Both delimiters are legal in the standard, and coping with them isn't optional. If you can't cope with everything in that standard that is mandatory, you have a broken interface. The cause is that Rick was ignorant of the standard. He seems to be making the usual basic error of thinking that XML is just something that sorta looks a bit like HTML.
Ironically, Terry's first response "that they were processing the file wrong" was correct, though probably not in the way he meant it.
Admin
It's not valid anything without a DOCTYPE.
Admin
I check if any of the bytes has its 8th bit set. If not, I'd assume it to be 7-bit.
No idea, I don't really read hex that well. However, all of the values (thanks for providing them in hex; determining the byte order from a pure binary representation would've been more difficult) are equal to or less than 116 in decimal, so it would appear that the bytes only use the bottom 7 bits.
No. I didn't try to figure out the actual words (if there are any).
All I figured out was that it only uses 7 bits. Whether that tells it to be ASCII or not, I don't know. The only control character appears to be the null character at the end of the line, so it would be plausible that this is indeed ASCII.
With two distinct "characters" 74 and 68. If it were UTF-16, every other byte would've been null. If it were UTF-8, it still would've been two distinct characters, since multibyte characters are constructed in a way that prevents two single-byte characters from being mistaken for one multibyte character. Of other encodings I don't know enough, but I'd still hazard a guess of 7-bit ASCII. Oh, and I don't really know how it could be Unicode character 6874; are the bytes of a multibyte character really read in that order?
Everything is always possible.
And now that I took my hexeditor out and looked at the provided string, it seems like it reads "this one can break". As said previously, null terminated.
Admin
It's because they are (or were, I have not checked in IE7) using their file extension mechanism, mapping their internal thing to MIME. Brilliant.
Admin
Um. Assuming by "unicode" you mean UTF-16, you're saying that UTF-16 inserts a null against every single character for the sheer fun of it? Just because it's nice to double the size of every file?
Those nulls are there to indicate that the characters are in the Latin range (range 00). If the characters are not Latin, such as Greek or Chinese, the range bytes will be of different values, and then you're back to not being able to easily tell whether it's UTF-16 or not.
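The point about the "range" bytes is easy to see directly: encode Latin text and Greek text as UTF-16-LE and compare the high bytes.

```python
# Encoding Latin-range text as UTF-16-LE interleaves a 0x00 high byte after
# each character; non-Latin text fills that byte with the character's block.
latin = "this app can break".encode("utf-16-le")
greek = "αβγ".encode("utf-16-le")

print(latin[:8])       # b't\x00h\x00i\x00s\x00'
print(greek.hex(" "))  # b1 03 b2 03 b3 03 -- high bytes are 0x03, not 0x00
```

So "every other byte is null" is only a reliable UTF-16 tell for text that happens to be in the Latin range, which is exactly the poster's point.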
Admin
I have a similar XML story.
We use a third-party vendor for certain kinds of data and I had requested that they send me a list of ids via HTTP request. They sent the ids as a comma separated list as the response:
1,2,3,4,5,6,...
I requested that they send it in XML instead. They changed the response to the following:
<xml> <ids attr="1,2,3,4,5,6,..."> </xml>
Gee, thanks!!
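For contrast, here's a sketch of what an XML response actually buys you over a CSV smuggled into one attribute: one element per id, consumable without any string splitting. (The `ids`/`id` element names are my own invention, not the vendor's.)

```python
import xml.etree.ElementTree as ET

# One element per id instead of a comma-joined blob in a single attribute.
ids = [1, 2, 3]
root = ET.Element("ids")
for i in ids:
    ET.SubElement(root, "id").text = str(i)

payload = ET.tostring(root, encoding="unicode")
print(payload)  # <ids><id>1</id><id>2</id><id>3</id></ids>

# The consumer's side: no split(","), no quoting worries.
parsed = [int(e.text) for e in ET.fromstring(payload).iter("id")]
print(parsed)  # [1, 2, 3]
```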
Admin
A possibility nobody seems to have considered is that the single quotes were curly quotes, which are not permitted by the XML standard (which does indeed permit either ASCII single quotes or ASCII double quotes). This might happen, for example, if someone inserted these quotes (and it is unusual, though perfectly legal, for them to be the only single quotes in the whole document) using a quote-replacing editor such as WordPad.
Admin
You punks, we never used to have any problems with double quotes and single quotes; we would just open a socket and send a byte stream across. captcha: ewww (someone get me out from under this desk)
Admin
The XML shown in the article is valid XML. Single quotes are in the spec. Please add this to the article.
Still, mixing single and double quotes is not considered best practice, and the vendor's answer is hilarious. So, it stays a WTF.
Admin
i love dumb people
Admin
Gwaaaaaaahhh... (Dilbert-style scream)
Admin
Ah, but the thing is: how often are those all needed?
Maybe some people need them. But, I've come across lots of XML, and the vast majority of it could be encoded in half the space, and parsed far more easily using something like ASN.1 or JSON or even INI file format. In fact, I can't think of anything I've seen which couldn't have been done better in an alternative way.
I'm not, in any way, saying I've seen all possible uses of XML, but I am saying that I believe it's vastly overused. ISTM that some people see a data interchange problem, and immediately use XML without even considering alternatives. That is what I don't like. To me XML should be the last resort because it's so verbose and complex to parse.
I suspect that if the LDAP standard was written today, it'd use XML rather than ASN.1 - as a result LDAP queries would be more than double the size they are now - with no benefit other than to be able to say it uses XML...
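A rough feel for the "more than double the size" claim, using a hypothetical flat directory entry (not an actual LDAP query) encoded both ways:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical directory-style entry; field names invented for illustration.
entry = {"cn": "jdoe", "mail": "jdoe@example.com", "uid": "1001"}

as_json = json.dumps(entry)
root = ET.Element("entry")
for k, v in entry.items():
    ET.SubElement(root, k).text = v
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), len(as_xml))  # the XML form is noticeably larger
```

The gap only widens once namespaces, declarations, and schema references join in, which is the commenter's overhead complaint in a nutshell.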
Admin
Seriously though, it would be more sensible to base the choice of encoding on the OS's locale settings, i.e. if someone has configured their OS to en_US, assume that text files are ASCII encoded unless something in the file directly contradicts this.
Admin
Endianness is about converting between bytes and numbers bigger than one byte.
A hex dump displays one byte at a time. There is no endianness involved.
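To put the distinction in code: the ordering question only arises at the moment you group bytes into a multi-byte number, as this sketch with the first two bytes from the dump above shows.

```python
import struct

# The same two bytes from the hex dump ('t', 'h'), read as a 16-bit
# integer under each byte order. The dump itself commits to neither.
pair = b"\x74\x68"

little = struct.unpack("<H", pair)[0]  # low byte first
big = struct.unpack(">H", pair)[0]     # high byte first
print(hex(little), hex(big))  # 0x6874 0x7468
```

This is why the earlier comment could ask whether the stream starts with ASCII 74 68 or Unicode character 6874: both readings group the same dumped bytes, just in different orders.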