The Daily WTF: Curious Perversions in Information Technology

2020-09-29 Reply Admin

for (int i = 0; i < 99; i++) {
    try {
        activeGrids.add(generateID());
        break;
    } catch (Exception frist) {
        // keep trying
    }
}

2020-09-29 Reply Admin

"SHA256 will take inputs like 0, nice long, complex strings like "5feceb66ffc86f38d952786c6d696c79c2dbc239dd4e91b46729d73a27fb57e9". "

Want to take another run up at that sentence?

2020-09-29 Reply Admin

Ever since I first encountered it, I've had philosophical problems with the design that requires you to generate a random number when you want to put a device onto a screen. Yes I understand it's a convenient technique so as to "ensure" that every element on the screen is unique -- but using a rando has always seemed to me to be using the wrong solution to solve the wrong problem. Speaking purely ivory-towerly philosophically. Surely there ought to have been a better way to design our machines so that this should not be necessary.

2020-09-29 Reply Admin

I concur. Ideally you'd want to make sure the ID is well and truly unique. And random numbers aren't. They always have the potential for overlap. And encoding them one way or the other does not change that, even if you randomly change the encoding. I'd personally prefer to have, say, a counter that counts all the controls and creates a new sequential ID number. An automatic number you might call it. Now where did I hear that before...

Jaime · 2020-09-29 Reply Admin

Ummm... Birthday Paradox anyone? The actual paradox part of the Birthday Paradox is that collisions are a lot more likely than a typical person would expect. It's super easy to keep a counter and use that for ID generation. The only time you need to get fancier than that is for distributed systems.

Hashing the random number adds nothing useful. I can't imagine what the author thought they were accomplishing.

2020-09-29 Reply Admin

What's wrong with a running counter of items on the page?

generateId() { return "control-" + (nextID++); }

It may not be unique across the entire world but it will definitely be unique within the context of the page, which is all that matters.

A classic example of someone over-engineering and screwing it up when a simple solution would do.
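A minimal sketch of the running-counter idea, written in Python for illustration. The class name `ControlIdGenerator` and the one-instance-per-page scoping are assumptions, not part of the library under discussion.

```python
import itertools


class ControlIdGenerator:
    """Hands out sequential IDs that are unique within one instance (one page)."""

    def __init__(self, prefix="control-"):
        self._prefix = prefix
        self._counter = itertools.count()  # yields 0, 1, 2, ...

    def generate_id(self):
        # Each call consumes the next counter value, so no ID repeats
        # for the lifetime of this generator instance.
        return f"{self._prefix}{next(self._counter)}"


page_ids = ControlIdGenerator()
ids = [page_ids.generate_id() for _ in range(5)]
print(ids)
```

Uniqueness here holds only per generator instance, which matches the "within the context of the page" scope described above.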

2020-09-29 Reply Admin

Well ... Whether you need them for a screen or some other purpose ultimately you're trying to generate a set of values (to be used as keys) that are guaranteed unique. And you may need several such sets that don't need uniqueness between sets, but do within each set. And you need to give them to the caller incrementally, perhaps quickly and perhaps spread over a very very long time interval. Which may need to be unique across a count of around 10 or around 10 billion.

Practical choices are 1) consult a universal oracle who issues them uniquely off perhaps a counter or by pre-permuting = shuffling a pre-computed fixed roster of values; 2) pull a (quasi-)random (post-swizzled however) out of a much larger sample space to reduce collisions to rare enough and suffer the occasional dupe, or 3) do #2 but retry on collision before returning the value so dupes are hidden from the caller.
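Option 3 can be sketched roughly as follows, in Python. The function name `issue_unique`, the `max_attempts` cap, and the 2**32 sample space are illustrative assumptions; the caller has to keep the set of already issued values.

```python
import random


def issue_unique(issued, space_size, max_attempts=1000, rng=random):
    """Draw from [0, space_size) and retry until an unused value turns up."""
    for _ in range(max_attempts):
        candidate = rng.randrange(space_size)
        if candidate not in issued:
            issued.add(candidate)  # remember it so later calls avoid it
            return candidate
    # Give up instead of looping forever when the space is (nearly) full.
    raise RuntimeError("no free value found; sample space may be exhausted")


issued = set()
values = [issue_unique(issued, space_size=2**32) for _ in range(1000)]
print(len(set(values)))
```

The retry hides duplicates from the caller, at the cost of needing access to everything issued so far.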

Any time you introduce an oracle you get threading and contention problems, plus of course the rogue dev who won't use the same oracle. And the scope of the oracle needs to be larger than the scope of use. IOW, if 10 machines need to share a unique space, then the oracle must always be accessible to all of them.

GUID/UUID is an example of evading the hassles of #1 by using #2, and the OS or language usually has a robust enough implementation that it amounts to a universal good-enough oracle despite being locally generated.
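As one illustration of a locally generated identifier, Python's standard `uuid` module produces random (version 4) UUIDs without consulting any shared oracle. This is only a sketch of option 2, not a claim about what the library in the article should use.

```python
import uuid

# Two independently generated random UUIDs; there is no coordination between
# the calls, uniqueness rests on the 122 random bits in each value.
a = uuid.uuid4()
b = uuid.uuid4()
print(a)
print(b)
```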

Of course the risks of confusing uniqueness with randomness are well documented: https://devblogs.microsoft.com/oldnewthing/20120523-00/?p=7553

Though we have plenty of WTFs based on roll your own GUID-lookalikes. And retry on collision + too-small sample space = intermittent severe slowdowns or hard crashes. And a need for the dupe checker to have access to the list of values already issued previously. Which usually pushes that problem out to the end-dev, where it's ignored unwittingly or otherwise.
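The slowdown from retry on collision in a too-small space can be estimated with a little arithmetic: if `already_issued` of `space_size` values are taken, each draw succeeds with probability (space_size - already_issued) / space_size, so the expected number of draws is the reciprocal. The sketch below assumes uniform draws; the figure 99 is borrowed from the range discussed in this thread.

```python
def expected_draws(space_size, already_issued):
    """Expected number of uniform random draws until an unused value is hit."""
    free = space_size - already_issued
    if free <= 0:
        # Nothing left to hand out: a naive retry loop would never terminate.
        raise ValueError("sample space exhausted")
    return space_size / free


for used in (0, 90, 98):
    print(used, expected_draws(99, used))
```

With an empty space one draw is enough, but with 98 of 99 values taken the loop needs 99 draws on average, and with all 99 taken it never finishes.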

In short, there's thorns on each solution; just pick which ones you prefer stuck in your hand.

2020-09-29 Reply Admin

Add-on. Was responding to Prime Mover as the last post then everybody else chimed in shorter & simpler.

Sorry to go all long-winded pedantic.

2020-09-29 Reply Admin

I prefer the good old:

item = new item()
item.id = items.count
items.add(item)

Works like magic.

Ext3h · 2020-09-29 Reply Admin

"In short, there's thorns on each solution; just pick which ones you prefer stuck in your hand."

As usual, UIDs are only an issue if you haven't understood the problem. The ideal solution for a UID usually involves taking not just one source of randomness, but rather many (not so random) system sources. Randomness is only introduced if you need to prevent guessing, or if you can't guarantee that at least one of the sources changes between two generated IDs.

E.g. seeding with a timestamp, a local atomic counter, a random seed for a local random number generator and the MAC address of a local interface (which should be globally unique, exceptions in the form of non-conforming implementations aside) provides enough data to generate a GUID which remains unique even under adverse conditions.

You still need that random number in there, in case the counter is reset, and a loss of system time occurs, in which case a UID conflict on the node could occur.
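A rough Python sketch of combining those sources. The field layout and the name `make_uid` are made up for illustration; the counter here is process-local and a real implementation would have to make it thread-safe. Python's built-in `uuid.uuid1()` takes a similar approach (timestamp, clock sequence, node).

```python
import itertools
import secrets
import time
import uuid

_counter = itertools.count()  # local counter; resets when the process restarts
# 48-bit hardware address, or a random 48-bit number if none can be determined
_node = uuid.getnode()


def make_uid():
    """Combine node address, timestamp, local counter and a random salt."""
    timestamp_ns = time.time_ns()
    sequence = next(_counter)
    salt = secrets.randbits(32)  # covers a counter reset plus a clock rollback
    return f"{_node:012x}-{timestamp_ns:016x}-{sequence:08x}-{salt:08x}"


first = make_uid()
second = make_uid()
print(first)
print(second)
```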

2020-09-29 Reply Admin

"It's super easy to keep a counter and use that for ID generation. The only time you need to get fancier than that is for distributed systems.
Hashing the random number adds nothing useful. I can't imagine what the author thought they were accomplishing."

It doesn't really say over what scope uniqueness was needed, so maybe 99 was thought reasonable if it was only per page per client session at the infancy of the project, before all the bells, whistles and dancing bears got chucked in. If that's it, it's a common WTF, though I never understand why people want to be pointlessly frugal with that sort of thing. Still, I'm sure that sort of range limit was clearly stated in the comprehensive documentation ......

The hashing? A dumbass attempt at obfuscating the ID for some reason? Maybe some other component the IDs will be chucked at expects a GUID and this was the dumbass hack?

But the real frustration is being expected to use open source/extensible stuff and then finding the organisation doesn't have the faith in its developers to let proper changes be made when needed. Mind you, this sounds like a screw up that doesn't need a fork ... put the right comment on the right forum thread and someone will fix it, hopefully without introducing too many other bugs.

2020-09-29 Reply Admin

And this is what GUIDs are for.

Jaime · 2020-09-29 Reply Admin

"It doesn't really say over what scope uniqueness was needed, so maybe 99 was thought reasonable if it was only per page per client session at the infancy of the project, before all the bells, whistles and dancing bears got chucked in."

I ran some numbers through an online Birthday Paradox calculator. Using a range of 1 through 99, a page with five controls has a ten percent chance of duplicate IDs. There's no way this was ever anything but flaky.
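The figure can be reproduced without an online calculator. This Python sketch assumes each ID is drawn uniformly and independently from 99 possible values; the function name is illustrative.

```python
def collision_probability(space_size, draws):
    """Exact chance that at least two of `draws` uniform picks coincide."""
    p_all_unique = 1.0
    for i in range(draws):
        # The (i+1)-th pick must avoid the i values already taken.
        p_all_unique *= (space_size - i) / space_size
    return 1.0 - p_all_unique


p = collision_probability(99, 5)
print(f"{p:.2%}")
```

For five controls the result comes out a little under ten percent, in line with the number quoted above.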

2020-09-29 Reply Admin

Yes, I should clarify that by saying I agree it was just wrong to use a small range pseudo-range number, and if you want to I can't see why not just use a full long, though I don't know the collision rate for that. But then if you want that, just use a G/UUID, there's usually enough support not to have to put much effort into it. But, as you point out, a simple incrementer would have covered it, hashed or not.
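For a rough idea of the collision rate of "a full long", the usual birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) can be evaluated directly. The sketch assumes a long means 64 uniformly random bits; the one million IDs are an arbitrary example figure.

```python
import math


def approx_collision_probability(space_size, draws):
    """Birthday approximation: 1 - exp(-n(n-1) / (2N))."""
    exponent = draws * (draws - 1) / (2 * space_size)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


p_long = approx_collision_probability(2**64, 1_000_000)
print(f"{p_long:.2e}")
```

Under those assumptions, a million random 64-bit IDs collide with a probability of a few in a hundred million.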

When I wrote what I did I was wondering if it had started out as a simple 0-99 counter and then someone "clever" had frigged with it.

People seem to either fear GUIDs and do dumb things like this, or are overly in love with them and try to replace every single damn integer with one.

2020-09-29 Reply Admin

damn, pseudo-RANDOM.

2020-09-29 Reply Admin

"But the real frustration is being expected to use open source/extensible stuff and then finding the organisation doesn't have the faith in its developers to let proper changes be made when needed."

The majority of companies like the "free" part of it; contributing anything back to the project costs time and resources.

2020-09-29 Reply Admin

Why not simply append/prepend a timestamp including the thousandths? I've done this many times for other reasons like foiling browser caching.
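A small Python sketch of the timestamp approach, with a hypothetical `timestamp_id` helper. The clock value can be passed in, which makes the behaviour within a single millisecond easy to see.

```python
import time


def timestamp_id(prefix, now_ms=None):
    """Append a millisecond-resolution timestamp to the prefix."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # current time in milliseconds
    return f"{prefix}-{now_ms}"


# Two IDs requested within the same millisecond (clock value fixed for the demo)
first = timestamp_id("grid", now_ms=1601337600123)
second = timestamp_id("grid", now_ms=1601337600123)
print(first, second)
```

Both calls return the same string, so for several controls created in one tight loop the timestamp would probably need a counter or similar tiebreaker alongside it.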

2020-09-29 Reply Admin

I think, more generally, it's to do with Lawyers. My experience with Microsoft, for example, is that they are quite open to using open source (and as a two-way street), so long as you pass it through the legal department.

Which, obviously, makes most forms of the GPL ... tricky. And, obviously, GPL3 is expressly designed to make this tricky. On the rare occasions where I feel moved to Open The Kimono, I prefer the Do What The Fuck You Like license, which is sadly underused in the community.

I'm guessing that a lot of SMEs (my own included) don't have the requisite IP expertise and licensing expertise to make this a simple tick-the-box exercise. Which is a shame. Obviously, in the OP case (and assuming the rest of the code is worth more than its weight in cow dung, which is questionable given the approach to unique ids), it would be nice to be able to do a Git push of a couple of lines or so, and solve the problem for everybody and for ever.

2020-09-29 Reply Admin

If the source library is open source and has a way to push changes back upstream, then why doesn't the developer just make the change at home on his own time (assuming it's trivial), perhaps just slightly different than what they did at work, and issue the pull request?

Granted I might not suggest that for complex changes, but for a one-liner that will make your work life better I don't see the downside. You don't even need to tell anyone at work - just let the change roll in when you update the library next.

Ext3h · 2020-09-30 Reply Admin

"If the source library is open source and has a way to push changes back upstream, then why doesn't the developer just make the change at home on his own time (assuming it's trivial), perhaps just slightly different than what they did at work, and issue the pull request?"

Legal minefield, from the corporate side. Usually your work contract states that everything you do for work becomes the exclusive intellectual property of your employer. Even though you did patch a GPL library, it's your employer who holds the rights to these contributions and has to release them to the public explicitly.

Just doing it "slightly different" doesn't change a thing either, as the exclusive rights are usually not limited to what you do during work hours, but also cover everything related to that domain that you do during off-hours.

R3D3 · 2020-09-30 Reply Admin

"then why doesn't the developer just make the change at home on his own time (assuming it's trivial)"

For a start, contributing to a project is never ever trivial, even if the change itself is.

Unless I'm allowed to do this on paid time, I'd write a bug report and make a workaround, maybe with a comment about the reported bug.

If even reporting the bug isn't allowed on paid time, then I'd probably be too frustrated with work to even care.

2020-10-02 Reply Admin

bro wtf you're so pretentious

2020-10-02 Reply Admin

I should point out that this "everything you do, including out-of-work hours" is a bit contentious, and limited (AFAIK) to the US. I'm not even sure it's been tested in the courts of the USA.

As of now, it doesn't look like a legal imperative. But (like so much of the law in any country you care to name), it is definitely an impediment. The easy way out? Your "legal department," which in my case is the boss of a ten person software house, is happy to use quite ginormous amounts of FOSS -- defined generally, and as long as it comes with an Apache license or a Berkeley license or similar. But "giving back" is a legal minefield.

Look at it this way. If you use a geometry library (say, matrix manipulation and vector arithmetic) with an open source license, absolutely nobody whatsoever is going to care. It's invisible. You're not going to get into some stupid public argument about twenty lines of pilfered Java, like Oracle and Google did.

But, if you send a well-intentioned Git push back upstream -- you've just taken ownership, legally and publicly. Unsurprisingly, many managers, many programmers, and most all people working for SMEs, are not willing to do that unless they pass it through a hugely qualified and specialised legal department that may or may not exist.

(It almost certainly does not, for an SME.)

I have no solution for this. We are in a crappy position, software-quality wise. I'd like it to be better. But I doubt it will.

Taking Your Chances