• (nodebb)

    I suspect the developer responsible didn't understand how to split the extension off.

    Possible, but clearly said developer didn't bother trying to find out either.

  • (nodebb)

    Come on, Remy, everyone knows that the more times you hash something, the more random it becomes.

  • (nodebb)

    The proper way in PHP to generate a UUID is actually by using the Random\Randomizer class. Then you just have to set a few bits to make it an RFC 4122-conformant v4 UUID.
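The "set a few bits" step can be sketched as follows. This is Python rather than PHP, with `secrets.token_bytes` standing in for whatever random source is used; the function name `uuid4_from_random_bytes` is made up for illustration.

```python
import secrets


def uuid4_from_random_bytes() -> str:
    """Build a version 4 UUID by hand from 16 random bytes."""
    raw = bytearray(secrets.token_bytes(16))
    # Version field: the high nibble of byte 6 must be 0100 (version 4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # Variant field: the top two bits of byte 8 must be 10 (RFC 4122 variant).
    raw[8] = (raw[8] & 0x3F) | 0x80
    h = raw.hex()
    # Render in the usual 8-4-4-4-12 grouping.
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"
```

That leaves 122 random bits; the other 6 are fixed by the version and variant fields.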

  • Qlbuttiq (unregistered)

    Why not allow S, C, or K? I can think of a pretty good number of words that couldn't accidentally show up in one of these "tokens" if those letters weren't used...

  • Deeseearr (unregistered) in reply to Qlbuttiq

    I'm sure that having a token read "FUQQ" or "QOQQ" would be perfectly safe and not offend anybody.

  • (nodebb)

    The real WTF is that PHP's uniqid function doesn't actually guarantee uniqueness... But I guess it's par for the course for PHP.

  • RB (unregistered)

    "$extra is going to be the actual filename, which if it were me, I'd append the unique fields so the name remains sortable, not prepend them- "

    errr - sorry, but they "are" appending the filename....

    wtf?

  • (nodebb) in reply to MaxiTB

    Of course, UUIDs are only probabilistically "unique"... If you're very, very unlucky, you'll end up with a collision (admittedly, winning the lottery is likelier... until Finagle's Law steps in).
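To put a rough number on "very, very unlucky", here is a sketch of the usual birthday-bound approximation in Python. It assumes the 122 random bits of a v4 UUID really are uniformly random, which a weak or misused generator would not guarantee; `collision_probability` is an illustrative name.

```python
import math

RANDOM_BITS = 122  # a v4 UUID has 128 bits, 6 of them fixed by version and variant


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one collision among n random IDs.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    exponent = -n * (n - 1) / (2 * 2**bits)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)
```

Under that assumption, even a billion IDs give a probability below 1e-18, and it takes on the order of 2**61 IDs before the chance gets near 40%.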

  • (author) in reply to RB

    Right, but they should be prepending the filename. Or, more to the point- splitting the filename and the extension, and inserting the hashes into the middle. E.g., foo.txt becomes foo.QQF523ABC43.txt.
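A minimal sketch of that split-and-insert idea, in Python rather than the article's PHP. `insert_token` is a hypothetical helper, and `os.path.splitext` only treats the text after the last dot as the extension.

```python
import os.path


def insert_token(filename: str, token: str) -> str:
    """Place the token between the name and its final extension."""
    stem, ext = os.path.splitext(filename)  # "foo.txt" -> ("foo", ".txt")
    return f"{stem}.{token}{ext}"


print(insert_token("foo.txt", "QQF523ABC43"))
```

A name with no extension simply gets the token appended after a dot.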

  • LZ79LRU (unregistered) in reply to Medinoc

    I actually ran into a GUID collision in a production DB earlier this year. Brought the whole system down too. Well, not so much crashing as in seemingly unexplainable but very wrong behavior regarding monetary transactions of significant sizes. That one was not fun to track down. I mean, it's just not something you'd ever think about until it happens.

  • (nodebb) in reply to Remy Porter

    I'd have gone with saving the file with just a (rendered) UUID as the name, keeping the mapping from that to the "real" name in a database. Like that, nobody can send a file in and have it arrive somewhere that they have any control over at all (unless the server-side code subsequently decides to allow it).

    And I'd have also used a multi-layer directory structure Just In Case™ because you really don't want to put thousands of files in the same directory on most filesystems deployed out there.
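One way this could look, as a Python sketch: the stored name is only the UUID, the first hex digits pick the subdirectories, and a plain dict stands in for the database mapping. The `uploads` root, the two-level fan-out, and both function names are assumptions made for the example.

```python
import uuid
from pathlib import PurePosixPath


def storage_path(file_id: uuid.UUID, root: str = "uploads") -> PurePosixPath:
    """Shard by the first two byte pairs: uploads/ab/cd/abcd...."""
    h = file_id.hex
    return PurePosixPath(root, h[0:2], h[2:4], h)


def register_upload(original_name: str, names: dict[str, str]) -> PurePosixPath:
    """Pick a server-side path; the client's name only ever goes into the mapping."""
    file_id = uuid.uuid4()
    names[file_id.hex] = original_name  # stand-in for a database row
    return storage_path(file_id)
```

Two levels of 256 directories each keep any single directory small even with millions of files, and nothing from the client-supplied name ever reaches the filesystem path.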

  • (author) in reply to dkf

    That'd be my general leaning, too, though I do see a rationale for including the input filename (but also risks, especially if someone is sending you a carefully malformed filename).

  • (nodebb) in reply to dkf

    because you really don't want to put thousands of files in the same directory on most filesystems deployed out there.

    I was involved with a remediation where around 125000 files were dumped into a single directory on a Windows server. It was seriously time-consuming.

    Please folks, no matter how much you decide to screw up filenames, don't dump all those files in one directory.

  • Adam (unregistered) in reply to Remy Porter

    I don't think appending is as straightforward as you think. My first instinct follows the code: the ID should be prepended to the filename.

    Consider, what do you do if a file ends in .tar.gz? Both of those extensions are significant to the file type. So maybe you decide to put the ID before the first dot in the name. But now consider, there are also a lot of files that use dots outside of the extension! So now "myapp-1.0.zip" would become "myapp-1.iasdf823.0.zip". You just can't win if you try to insert an ID between a file name and extension. Maybe hardcode some known extension values, but that complexity is just not worth it. Just prepend it.
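The cases above can be tried out with a short Python sketch. The three helpers are hypothetical names for the three strategies being compared, and `iasdf823` is just the token from the example.

```python
import os.path


def insert_before_last_dot(name: str, token: str) -> str:
    """Treat only the final suffix as the extension."""
    stem, ext = os.path.splitext(name)
    return f"{stem}.{token}{ext}"


def insert_before_first_dot(name: str, token: str) -> str:
    """Treat everything after the first dot as the extension."""
    stem, dot, rest = name.partition(".")
    return f"{stem}.{token}{dot}{rest}"


def prepend(name: str, token: str) -> str:
    """Leave the original name untouched and put the token in front."""
    return f"{token}.{name}"


for name in ("backup.tar.gz", "myapp-1.0.zip"):
    print(insert_before_last_dot(name, "iasdf823"),
          insert_before_first_dot(name, "iasdf823"),
          prepend(name, "iasdf823"))
```

Splitting at the last dot breaks up `.tar.gz`, splitting at the first dot breaks up the version number, and only prepending leaves both names intact.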

  • (nodebb)

    Erm... in maketoken(), the $scratch variable is only modified inside the "if" statement-- as written, this code will only append the letter "Q" based on random number magic and will not append any other characters.

  • RB (unregistered) in reply to Remy Porter

    Yes, obviously.... my wtf was the use of the opposing terms in the article...

  • Nuitari (unregistered) in reply to dkf

    Also, tempnam() is a thing. Generates a file name guaranteed to be unique.
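For comparison, the nearest Python analogue is `tempfile.mkstemp`, which creates the file itself rather than only proposing a name, so no other process can claim the same name in between. A small sketch; the `upload-` prefix and the function name are assumptions for the example.

```python
import os
import tempfile


def reserve_upload_name(directory: str, suffix: str = "") -> str:
    """Create an empty, uniquely named file in `directory` and return its path."""
    fd, path = tempfile.mkstemp(prefix="upload-", suffix=suffix, dir=directory)
    os.close(fd)  # the file now exists; reopen it later to write the upload
    return path
```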

  • Conradus (unregistered)

    QQ moar, noob!

  • nz (unregistered) in reply to Remy Porter

    Come on, split archive.2021.tar.lz4 to filename and extension

  • Officer Johnny Holzkopf (unregistered)

    At least their UUID approach was more than just "use a timestamp" - imagine the fun that one hour of DST "rewinding" would create (cf. "falsehoods programmers believe about time").

  • Klimax (unregistered) in reply to Bim Zively

    Meh. On Windows (Server) you can work fairly easily with a directory containing a few million files… (the file system will hate you)

  • arweba (unregistered)

    Since the 'Q' token generation function would be executed before the time hashing is computed, it seems like the selection of the 3 letters is meaningless except to attempt to introduce some random time into the later computed time segment of the filename.

    It's all pointless though, since, well, UUIDs and all (as pointed out already).

Leave a comment on “UniQQue Naming”
