Admin
If only there was an approach to software development that put the technical decisions in the hands of the teams that were doing the work.... Oh wait - there have been multiple ones for over 20 years!!!! And so many organizations claim to use one of them, but it is clearly in name only....
Admin
This sounds like it should have been a SQLite database.
Admin
Some file formats allow readable headers and put the binary data after the end-of-file character, PNG for instance. So the idea is not completely wrong.
Admin
XML and CDATA. Problem solved.
Admin
To clarify my previous post: I was being sarcastic...
Admin
Ever looked at the "HTML fragment" clipboard data format? It contains several size/offset fields in text. Thankfully, it allows padding them with zeroes.
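The zero-padding trick is one direct answer to the "writing the length changes the length" problem. A minimal sketch, assuming a hypothetical `HeaderLength:` field with a fixed 10-digit width. This is not the actual clipboard format, whose field names and widths are not reproduced here:

```python
FIELD = "HeaderLength: "
LENGTH_WIDTH = 10  # assumption: 10 decimal digits, plenty for any realistic header

def build_header(fields: dict) -> bytes:
    """Build a text header whose first line records the total header size in bytes."""
    body = "".join(f"{key}: {value}\n" for key, value in fields.items()).encode("ascii")
    prefix_size = len(FIELD) + LENGTH_WIDTH + 1  # +1 for the newline
    total = prefix_size + len(body)
    if total >= 10 ** LENGTH_WIDTH:
        raise ValueError("header too large for the fixed-width length field")
    # Zero-padding fixes the digit count, so writing the value never changes the size.
    prefix = f"{FIELD}{total:0{LENGTH_WIDTH}d}\n".encode("ascii")
    return prefix + body

def read_header_length(data: bytes) -> int:
    """Read the fixed-width length field back from the start of the header."""
    start = len(FIELD)
    return int(data[start:start + LENGTH_WIDTH].decode("ascii"))
```

For example, `build_header({"Version": "1"})` yields a 36-byte header whose first line is `HeaderLength: 0000000036`, and it stays fully `cat`-able.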
Admin
If SQLite is your answer, you're asking the wrong question
Admin
That's a relief. XML, the comms equivalent of Regexps ...
Although looking at this problem in the abstract, XML was my first thought. Enterprisey enough for the PHCTO, and would probably work with a suitable schema.
Admin
SQLite's really good at replacing custom binary formats and whacking great directory trees full of lots of small files. It doesn't do everything a server-based DB does… but it's a complete doddle to deploy.
Admin
Ok, I'm an XML horror newbie. What would be wrong with XML in this case?
Admin
Even with context, I have no idea whether a doddle is a good thing or a bad thing.
...To the inter--! uh.
To a different part of the internet!
Admin
doddle (noun, British informal): a very easy task
Example: "this printer's a doddle to set up and use"
Now I just need 1/4 of a red laser and 1/4 of a blue laser
Admin
I've spotted the terrible flaw in my argument. CDATA doesn't cat very well...
Of course, any CTO who insists on using cat for comparisons these days is the real WTF.
(Dogs don't work any better.)
Admin
There's this really cool way to stash binary data in text files. It's called base64. This CTO's prejudice in favor of text files was probably valid.
There's this really cool way to handle binary-data endianness. It's called network order (htonl(), ntohl()).
Everything in the file is text. Records are formatted as Tag: value, separated by CR/LF (just like HTTP headers, eh?). Binary data is converted to network order and base64 encoded to store it.
The real WTF? CTOs who don't know how to explain what they want in a convincing way. They have one job: explaining stuff. Many CTOs can't do that.
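A minimal sketch of that record format, assuming the binary values are unsigned 32-bit integers (the comment doesn't say) and using `struct`'s `!` (network order) format in place of `htonl()`:

```python
import base64
import struct

def encode_record(tag: str, values: list) -> bytes:
    """Encode one 'Tag: base64' line; integers are packed big-endian (network order)."""
    raw = struct.pack(f"!{len(values)}I", *values)
    return f"{tag}: {base64.b64encode(raw).decode('ascii')}\r\n".encode("ascii")

def decode_record(line: bytes):
    """Split the line at the first ': ', then undo base64 and network order."""
    tag, _, b64 = line.decode("ascii").rstrip("\r\n").partition(": ")
    raw = base64.b64decode(b64)
    return tag, list(struct.unpack(f"!{len(raw) // 4}I", raw))
```

`encode_record("Prices", [1, 2])` produces `Prices: AAAAAQAAAAI=` followed by CR/LF, which reads the same on any host regardless of its native byte order.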
Admin
For one thing, XML doesn't like raw binary data. Now, sure, you can base64 everything, but that's another level of unreadable.
Admin
"You can base64 everything, but that's another level of unreadable." And then you can encrypt it with ROT13 just to make it secure too. /snark
Admin
Tim C's team is TRWTF. No, the cat-fanatic CTO doesn't help, but the above paragraph shows that the developers are in a whole different class of WTF.
The header length should, indeed, be stored in a way that doesn't change the length of the header, but that's not a hard problem. Either pack it with leading spaces or zeroes or something, or don't actually store the length as such. Take a lesson from HTTP, and represent the header as a sequence of lines of text, ended with a blank line. Duh.
For the endianness, you pick one representation and impose it on the idiot users. If they get it wrong, it's their problem. Or you discuss it carefully with the users and agree how it should be represented. And, of course, just in case the users are PDP-11 holdouts (that happens, even now) or other forms of idiocy, make sure to allow for flavours of middle-endian...
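The HTTP-style suggestion (a header of `Key: value` lines ended by a blank line, followed by the raw payload) removes the need for a length field altogether. A minimal sketch with hypothetical header names and no escaping; it assumes keys never contain `: ` and that nothing in the header contains a line break:

```python
import io

def write_file(out, headers: dict, payload: bytes) -> None:
    """Write 'Key: value' lines, a blank line, then the binary payload."""
    for key, value in headers.items():
        out.write(f"{key}: {value}\r\n".encode("ascii"))
    out.write(b"\r\n")  # the blank line ends the header; no length field needed
    out.write(payload)

def read_file(inp):
    """Read header lines up to the first blank line; everything after is payload."""
    headers = {}
    while True:
        line = inp.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line (or EOF) terminates the header
            break
        key, _, value = line.decode("ascii").rstrip("\r\n").partition(": ")
        headers[key] = value
    return headers, inp.read()

buf = io.BytesIO()
write_file(buf, {"Version": "1", "Encoding": "raw"}, b"\x00\x01\xff")
buf.seek(0)
headers, payload = read_file(buf)
```

`head` on such a file shows the readable header, and the payload may contain any bytes because the reader stops scanning for line breaks once the header ends.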
Admin
And the problem with normalizing the incoming data and storing it in a consistent format is...
Admin
"I am good at dealing with people, can't you understand that?! What the hell is wrong with you people?!"
Admin
I see you are reinventing MIME. Carry on.
Admin
I'm confused - I thought the clear way to go these days for this type of problem is to create a manifest file in a standardized format (like XML or JSON), put the binary data either in separate files or inline, and then put everything, including the original files, in a container format like zip? I think I'm missing something here %-)
Admin
Exactly what I was thinking. And if you need it to be cat-able, make that container a .tar file.
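A minimal sketch of the manifest-plus-tar idea using Python's standard `tarfile` and `json` modules. The member names `manifest.json` and `data.bin` are hypothetical. The archive is left uncompressed so that `cat` shows the JSON manifest as plain text between the tar header blocks:

```python
import io
import json
import tarfile

def add_bytes(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    """Add an in-memory byte string to the archive as a regular file."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

def build_archive(manifest: dict, blobs: dict) -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # "w" = uncompressed tar
        add_bytes(tar, "manifest.json", json.dumps(manifest, indent=2).encode("utf-8"))
        for name, data in blobs.items():
            add_bytes(tar, name, data)
    return buf.getvalue()

def read_manifest(archive: bytes) -> dict:
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return json.load(tar.extractfile("manifest.json"))
```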
Admin
That’s going to be a problem for ANY readable scheme, which the CTO is insisting on, so why would XML be worse than any other option, given that limitation? (Not that I’m particularly pleased with the idea of XML, but if the CTO insists on a readable header that already breaks a lot of things. For that matter, the “header length might change the length of the header” problem ceases to be an issue if you use a format like XML where you just read until you reach the marker for the end of the section, which would resolve the problem by eliminating the need to write out the length in the first place.)
On another note, though: if you’re going to support different endian-ness in the file format, rather than requiring all numbers to be recorded one way, then surely you need to write out more than just “1”, in case the individual bytes within an int have different endian-ness, which IIRC was the case on some obscure systems and you can’t be sure that some random client isn’t still using one. You should make them write 19088743, which is 0x01234567, so you can track each byte — even if they’re stored backwards you can count the number of bits in each byte to see which order they come in. (Or, for a 64-bit integer, 81985529216486895, which is 0x0123456789ABCDEF.)
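A sentinel with four distinct bytes lets a reader identify a byte permutation just by looking at where each byte landed. A minimal sketch, assuming a 4-byte sentinel at a known offset; the layout table covers only big-endian, little-endian and PDP-11-style middle-endian, and it does not attempt the bit-reversed case mentioned above:

```python
SENTINEL = 0x01234567  # 19088743; all four bytes differ, so any byte permutation is visible

# On-disk byte layouts of the sentinel under the orders we choose to accept
LAYOUTS = {
    bytes([0x01, 0x23, 0x45, 0x67]): "big",
    bytes([0x67, 0x45, 0x23, 0x01]): "little",
    bytes([0x23, 0x01, 0x67, 0x45]): "pdp",  # 16-bit words high-first, each word little-endian
}

def detect_byte_order(first4: bytes) -> str:
    """Return the byte order implied by the sentinel, or raise ValueError."""
    try:
        return LAYOUTS[bytes(first4)]
    except KeyError:
        raise ValueError(f"unrecognised sentinel bytes: {bytes(first4).hex()}") from None
```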
Admin
Wouldn't the big-endian form of the 0x00000001 read as 0x01000000, not 0x10000000?
Admin
I'm wondering whether it would have been easier to find a friendly sysadmin to set up the CTO with replacement versions of cat and head that check the file type - if it's their new format then it displays some randomly-generated data that looks almost, but not quite, like XML. If it's anything else, it silently invokes the real head/cat.
Probably only needs to check the first file argument. I'm willing to bet good money that the CTO never actually uses cat to catenate files.
If that's too hard, then create a data file that includes a binary sequence that logs out (or crashes) a variety of terminal types, and name it 1_BiggestSpendingClient_TestFile.xmimelite, then make sure everyone except the CTO knows not to use that file. Surely that would be less work than reengineering the whole file format, and maybe the CTO will learn something, if only not to make such a big show about knowing about cat.
Admin
Are you using a Cedrus Stimtracker too? If yes, I have some tips for you (not sarcasm).
Otherwise, AFAIK - no, Big-Endian is not supposed to shift bits besides reversing the order. At least, all the devices I worked with just reversed (with one exception, see above, and it turned out the cable was wrong).
Admin
"Having the header length be an integer and not text also meant that recording the length wouldn't impact the length."
Errr, no. ("But 64k should be big enough for everyone"?)
Also, re the Base64 crowd here, nooooo, what you clearly want is a variant of Base85 with a permutation of the characters so it's incompatible with your competitor's tools.
Admin
Colin is right - endian-ness (at least with regards to memory and file storage) reverses the byte order, not the bit order. 0x00000001 stored in a file on a little-endian system is written as "0x01 0x00 0x00 0x00". There is no bit-shifting involved. The bytes are simply stored/written with the least significant byte at the lowest address. I believe that sometimes some comms systems transmit byte data in reversed bit order, but that's apparently not what we're talking about here.
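A quick way to see the byte (not bit) reversal, using Python's `struct` module with explicit `<` (little-endian) and `>` (big-endian) formats:

```python
import struct

value = 0x00000001
little = struct.pack("<I", value)  # least significant byte at the lowest address
big = struct.pack(">I", value)     # most significant byte first ("network order")

print(little.hex(" "))  # 01 00 00 00
print(big.hex(" "))     # 00 00 00 01

# Misreading the big-endian bytes as little-endian gives 0x01000000, not 0x10000000
misread = struct.unpack("<I", big)[0]
print(hex(misread))     # 0x1000000
```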
Admin
Is there anything that you can’t do with JSON and all binary data base64 encoded? Or base85 for added fun?
Admin
Big-endian vs. little-endian: ages ago when I wrote binary files, we used something similar to UTF-8, which was compact and independent of byte order. JSON obviously means you send all numbers in decimal, so there's no problem with endianness either.
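The comment doesn't describe its UTF-8-like encoding; one well-known scheme in the same spirit is the unsigned LEB128 varint (the same base-128 varint Protocol Buffers uses), where each byte carries 7 data bits and the high bit marks "more bytes follow". Because it is defined byte by byte, the host's byte order never enters into it. A minimal sketch:

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: low 7-bit groups first, high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more groups follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Small values stay small: `1` encodes as a single byte and `300` as the two bytes `ac 02`.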
Admin
A CTO that knows commandline tools? I'd take it every day. All CTOs I've seen know no tool beyond PPT and possibly Excel.
Admin
Nonsense, it's easy to represent binary data in XML: <binaryData> <bit>1</bit> <bit>1</bit> <bit>0</bit> <bit>1</bit> <bit>0</bit> ... </binaryData>
Admin
You're right. Endianness reverses bytes, not nibbles.
Admin
Tim presented a file format that contradicted a prior agreement, without having given notice, and the CTO checked him. I'm afraid the advantage is with the CTO.
The idiot users pay the bills, call the tune, and don't want to be bothered with unnecessary minutiae. "Aggregate the messages in a file so we can split them out again." "OK, which byte order should we conv--" "AGGREGATE. THE. MESSAGES."
Admin
"JSON obviously means you send all numbers in decimal"
which can be bad if the number is floating-point binary and you don't want to lose bits converting to/from decimal. I hacked up a protocol once that had to send such numbers in JSON. I formatted them as strings of hexadecimal.
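The comment doesn't say which hexadecimal form was used; one lossless option in Python is C99-style hex float text via `float.hex()` and `float.fromhex()`, carried as JSON strings. A minimal sketch (dumping the raw IEEE 754 bit pattern via `struct` would be another choice):

```python
import json

def encode_floats(values) -> str:
    # float.hex() is exact: 0.1 becomes '0x1.999999999999ap-4'
    return json.dumps([float(v).hex() for v in values])

def decode_floats(text: str) -> list:
    return [float.fromhex(s) for s in json.loads(text)]

print(encode_floats([0.1]))  # ["0x1.999999999999ap-4"]
```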
Admin
Huh? Converting floating-point numbers between decimal and binary floating-point is a solved problem. It requires a lot of attention to detail, but your C Standard library should have code handling it, and your JSON parser should use that code. Now if you go above 64 bit double, that gets tricky.
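In CPython, `repr()` of a float is the shortest decimal string that parses back to the same double, and the `json` module formats floats with it, so ordinary finite doubles survive a JSON round trip exactly. A small check under that assumption (NaN and infinities are not valid JSON numbers, so they are left out):

```python
import json
import random

rng = random.Random(0)  # fixed seed so the sample set is reproducible
samples = [0.1, 1 / 3, 5e-324, 1.7976931348623157e308]
samples += [rng.uniform(-1e6, 1e6) for _ in range(1000)]

for x in samples:
    text = json.dumps(x)
    assert json.loads(text) == x, (x, text)

print(json.dumps(0.1))  # 0.1
```

This only shows that Python's encoder and parser agree with each other; a parser on the other end still has to round correctly.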
Admin
In our company, we say: if you can't store it in a protocol buffer, it doesn't exist.
Admin
I actually used a NUXI system once. It was a 68000 CPU with an LSI-11 backplane running a Unix clone (Regulus), inside of a VT-100 terminal shell. Any data written out through SCSI was in NUXI order. I just wish I had kept a binary of the original non-Apple MACSBUG that it used as a boot rom.
Oh the number of WTFs in that description. Did I mention that it was used for a government contract?
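The "NUXI" name comes from what happens to the string "UNIX" when the two bytes inside each 16-bit word are swapped. A minimal sketch of that swap; it assumes an even-length buffer and makes no claim about how that particular SCSI path actually behaved:

```python
def swap_bytes_in_words(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word."""
    if len(data) % 2:
        raise ValueError("length must be even")
    out = bytearray(data)
    # Even positions take the odd bytes and vice versa (slices are taken from the original)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)

print(swap_bytes_in_words(b"UNIX"))  # b'NUXI'
```

Applying the swap twice restores the original bytes, so the same function converts in either direction.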
Admin
No. You should use something more like
Not sure if I should have a separate name space for the bits or not.
Admin
Someone should tell the cat guy about strings
Admin
I'm the author. We needed to store millions or billions of tiny messages, e.g. 80 bytes each, in each file. The domain was stock exchange trading engines.
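For millions of ~80-byte messages, a common alternative to per-message text headers is a length prefix in one fixed byte order. A minimal sketch, assuming a 4-byte big-endian length before each message (about 5% overhead on an 80-byte payload); this is not a description of the format the team actually used:

```python
import struct

HEADER = struct.Struct(">I")  # assumption: 4-byte big-endian length before each message

def pack_messages(messages) -> bytes:
    """Concatenate messages, each preceded by its length."""
    return b"".join(HEADER.pack(len(m)) + m for m in messages)

def unpack_messages(data: bytes):
    """Yield the messages back out; raise ValueError on a truncated final message."""
    pos = 0
    while pos < len(data):
        (length,) = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        if pos + length > len(data):
            raise ValueError("truncated message")
        yield data[pos:pos + length]
        pos += length
```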
Admin
"The cat guy" LOL
Admin
https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html