• Frist (unregistered)

    Obligatory frostiness

  • Supersonic Tumbleweed (unregistered)

    I think I puked a little in my mouth

  • leus (unregistered)

    Blobs are useless when dealing with large number of rows and variable length. Just sayin'.

  • Kanitatlan (unregistered)

    This reminds me of seeing this taken one step further with the process that recovered the "content" operating as a series of single record queries retrieving the data 1 record at a time.

  • Zach (unregistered)

    So... how'd he fix it?

  • Brian Boorman (google)

    If you're going to convert binary to text, Base64 would have been a more efficient option to choose.

  • XXXXX (unregistered)

    It's really sad & frustrating how many "DBAs" won't allow blob columns in their databases.

  • Donald Knuth (unregistered)

    Deffo a proper WTF for once. OMFG!

  • Regurgitated rubbish (unregistered)

    The extra 2 chars are so you can null terminate the string. You can never be too sure the first null will be noticed (I mean it's a null after all) so you do it twice.

    Typical 'belt-and-braces' code.

  • Steve (unregistered) in reply to Regurgitated rubbish

    No, the extra two characters are for the quotes around the string. Then you can avoid looking for the NUL key on your keyboard.

  • Phiu-x (unregistered) in reply to Regurgitated rubbish

    Came here to say that this is not a WTF. Clearly the author never had to work with a rudimentary DB with no true DBA available. This is exactly the kinf of no fuss coding that get shit done. I would hire the author of this code over any other guy who is looking down this code as "genius" code. Sometime shit need to be done guys.

  • sizer99 (google) in reply to leus

    Really? We use variable length blobs in Postgres (for real binary data, things that should go in columns go in columns) and it's not a problem. PSQL uses something called TOAST to keep the blobs 'outside' the rows - basically a handle to the blob - so it doesn't wedge your efficient row handling. It seems to work pretty well. Where does it become useless?

  • (nodebb) in reply to leus

    Explain why that is and your trolling might get up to T0pCod3r standards.

  • SG (unregistered)

    No comment about Zip and Hexing it?

    Zip is obviously there to reduce the data size. But handling binary data is "difficult", so it will be hexed. So - the size will be doubled. The Zip will make more then 50%, so there is at least a data reduction, but is it worth?

  • Geoff (unregistered) in reply to XXXXX

    Its really frustrating how often developers want to serialize out some giant object and stuff it into a database.

    1. First off it defeats the real use of the RDBMS. Nobody can query that data now. It can't be ETLed easily into a data warehouse if anyone wants to use the information in the blobs for anything else its a project
    2. It often means an big increase in database size and its associated problems. Seriously 9 times in 10 it would be better to generate a guid or hash; store that into the database engine and write your blobs out to a filesystem into a file named with the hash (in hash buckets so you don't have 10 million files in one directory).

    That said pregenerated reports where the underlying data for them is already there are a good case for blobs; they also probably small enough that it would not be a problem to keep them in the db especially if they expire and can be deleted or only the most recent per something else is stored. Really DBAs and Developers need to get better at talking to each other.

  • Jinks (unregistered)

    I suspect a tyrannical DBA lurking somewhere in the background.

  • (nodebb)

    I really want to see the response to XXXX's statement (it's currently held for moderation).

    The reason I don't like binary data fields in a database is that it encourages developers, usually offshored, to do stupid things. Like creating an account table with two columns (one for the key and one for all of the other fields mashed together), completely defeating the purpose of a relational database.

  • sizer99 (google) in reply to Zenith

    Of course the converse of that is if you really need to store binary data and can't, then it gets stored as variable length base64-encoded ASCII in the database - which is also an abomination. A no binary policy may make sense by default (especially with offshoring, ugh), but hopefully you can allow it for people who make the case. F

  • tango_uniform (unregistered)

    When all you have is a "stupid hammer" everything looks like a varchar(255).

  • (nodebb)

    Storing blobs in tables has a performance issue with larger sizes and buffered retrieval. Anything larger than a couple KB is better stored on disk and streamed as needed.

  • (nodebb) in reply to Steve

    Pretty sure the extra two characters are in fact for the [ and ] around each chunk.

    leus wrote:

    Blobs are useless when dealing with large number of rows and variable length. Just sayin'.

    Well, now we know to ignore anything else you say on this topic. (You may be right for some databases; it's not true for Oracle at least.)

  • (nodebb) in reply to Geoff

    Its really frustrating how often developers want to serialize out some giant object and stuff it into a database.

    Fair, but not really relevant here where it's a set of report data rather than some opaque binary object. It looks like they're trying to set up a generic table to hold the results of multiple reports; presumably they all have different formats so you can't easily do it in a single relational table (unless you go the route of having a (string) value column and a format identifier column for each field, and then interpret one according to the other at report time).

    First off it defeats the real use of the RDBMS. Nobody can query that data now. It can't be ETLed easily into a data warehouse if anyone wants to use the information in the blobs for anything else its a project

    Depends on your RDBMS and your ETL tools (and the format of your blobs). Particularly if it's an XML blob there are often good ways to use it either directly from the database or in the ETL.

    That said pregenerated reports where the underlying data for them is already there are a good case for blobs; they also probably small enough that it would not be a problem to keep them in the db especially if they expire and can be deleted or only the most recent per something else is stored.

    This looks like the case in the article; they're only keeping the most recent set of report data for a given report ID. So a blob is a reasonable approach here. It's certainly better than zipping and hexing the data and then chopping it up into 248-character pieces.

    And on that subject: even if this was the best approach you could think of, why on earth would you use such a small chunk size?

  • (nodebb) in reply to tango_uniform

    One of the offshored applications that I have to babysit a few more weeks has 2722 such columns. You know, for stuff like US state names (the longest, Rhode Island, is 52), 9-digit SSNs, 10-character user IDs, booleans (0, 1, yes, no, true, false), and dates (spelled out, long form), file paths (fun when somebody uses the full 260). Lots of warnings about indexes exceeding 900B and it's no wonder that 10Y/200GB in the old system turned into 11Y/900GB in the new system. So I could almost make an argument for banning varchar right alongside varbinary at this point.

  • Landis (unregistered)

    Not sure about this situation, but I've seen chunking being used in lieu of proper streaming support for large files. Case in point: https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593

  • notroot (unregistered) in reply to Phiu-x

    It's a WTF and so is your comment. 1) Zip outputs binary. 2) VARCHAR is not for binary data, it only accepts alphanum.

    See the problem?

  • (nodebb)

    @Remy and @notroot (and the other posters so far) For shame! Y'all missed the call to RawToHex before the chunking loop - each chunk is only124 bytes of the original compressed XML. I'd put money (OK, probably only a eurocent or two) on that function being a simple "one byte of binary in, two ASCII hex digit characters out" transformation.

    But either way, it's impossible to query on the content of the reports, which might or might not be a real problem.

Leave a comment on “Chunks of Genius”

Log In or post as a guest

Replying to comment #:

« Return to Article