- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Obligatory frostiness
Admin
I think I puked a little in my mouth
Admin
Blobs are useless when dealing with large number of rows and variable length. Just sayin'.
Admin
This reminds me of seeing this taken one step further with the process that recovered the "content" operating as a series of single record queries retrieving the data 1 record at a time.
Admin
So... how'd he fix it?
Admin
If you're going to convert binary to text, Base64 would have been a more efficient option to choose.
Admin
It's really sad & frustrating how many "DBAs" won't allow blob columns in their databases.
Admin
Deffo a proper WTF for once. OMFG!
Admin
The extra 2 chars are so you can null terminate the string. You can never be too sure the first null will be noticed (I mean it's a null after all) so you do it twice.
Typical 'belt-and-braces' code.
Admin
No, the extra two characters are for the quotes around the string. Then you can avoid looking for the NUL key on your keyboard.
Admin
Came here to say that this is not a WTF. Clearly the author never had to work with a rudimentary DB with no true DBA available. This is exactly the kinf of no fuss coding that get shit done. I would hire the author of this code over any other guy who is looking down this code as "genius" code. Sometime shit need to be done guys.
Admin
Really? We use variable length blobs in Postgres (for real binary data, things that should go in columns go in columns) and it's not a problem. PSQL uses something called TOAST to keep the blobs 'outside' the rows - basically a handle to the blob - so it doesn't wedge your efficient row handling. It seems to work pretty well. Where does it become useless?
Admin
Explain why that is and your trolling might get up to T0pCod3r standards.
Admin
No comment about Zip and Hexing it?
Zip is obviously there to reduce the data size. But handling binary data is "difficult", so it will be hexed. So - the size will be doubled. The Zip will make more then 50%, so there is at least a data reduction, but is it worth?
Admin
Its really frustrating how often developers want to serialize out some giant object and stuff it into a database.
That said pregenerated reports where the underlying data for them is already there are a good case for blobs; they also probably small enough that it would not be a problem to keep them in the db especially if they expire and can be deleted or only the most recent per something else is stored. Really DBAs and Developers need to get better at talking to each other.
Admin
I suspect a tyrannical DBA lurking somewhere in the background.
Admin
I really want to see the response to XXXX's statement (it's currently held for moderation).
The reason I don't like binary data fields in a database is that it encourages developers, usually offshored, to do stupid things. Like creating an account table with two columns (one for the key and one for all of the other fields mashed together), completely defeating the purpose of a relational database.
Admin
Of course the converse of that is if you really need to store binary data and can't, then it gets stored as variable length base64-encoded ASCII in the database - which is also an abomination. A no binary policy may make sense by default (especially with offshoring, ugh), but hopefully you can allow it for people who make the case. F
Admin
When all you have is a "stupid hammer" everything looks like a varchar(255).
Admin
Storing blobs in tables has a performance issue with larger sizes and buffered retrieval. Anything larger than a couple KB is better stored on disk and streamed as needed.
Admin
Pretty sure the extra two characters are in fact for the [ and ] around each chunk.
leus wrote:
Well, now we know to ignore anything else you say on this topic. (You may be right for some databases; it's not true for Oracle at least.)
Admin
Fair, but not really relevant here where it's a set of report data rather than some opaque binary object. It looks like they're trying to set up a generic table to hold the results of multiple reports; presumably they all have different formats so you can't easily do it in a single relational table (unless you go the route of having a (string) value column and a format identifier column for each field, and then interpret one according to the other at report time).
Depends on your RDBMS and your ETL tools (and the format of your blobs). Particularly if it's an XML blob there are often good ways to use it either directly from the database or in the ETL.
This looks like the case in the article; they're only keeping the most recent set of report data for a given report ID. So a blob is a reasonable approach here. It's certainly better than zipping and hexing the data and then chopping it up into 248-character pieces.
And on that subject: even if this was the best approach you could think of, why on earth would you use such a small chunk size?
Admin
One of the offshored applications that I have to babysit a few more weeks has 2722 such columns. You know, for stuff like US state names (the longest, Rhode Island, is 52), 9-digit SSNs, 10-character user IDs, booleans (0, 1, yes, no, true, false), and dates (spelled out, long form), file paths (fun when somebody uses the full 260). Lots of warnings about indexes exceeding 900B and it's no wonder that 10Y/200GB in the old system turned into 11Y/900GB in the new system. So I could almost make an argument for banning varchar right alongside varbinary at this point.
Admin
Not sure about this situation, but I've seen chunking being used in lieu of proper streaming support for large files. Case in point: https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593
Admin
It's a WTF and so is your comment. 1) Zip outputs binary. 2) VARCHAR is not for binary data, it only accepts alphanum.
See the problem?
Admin
@Remy and @notroot (and the other posters so far) For shame! Y'all missed the call to
RawToHex
before the chunking loop - each chunk is only124 bytes of the original compressed XML. I'd put money (OK, probably only a eurocent or two) on that function being a simple "one byte of binary in, two ASCII hex digit characters out" transformation.But either way, it's impossible to query on the content of the reports, which might or might not be a real problem.