Admin
He never mentioned changing the workflow, he asked how to do it a better way. Read his question again, but this time don't make assumptions about what you think he might be implying, go with the actual text.
Admin
You read it again.
He states: "like, to create a link to the new object in several tables".
That means he's using it in the DB, probably to insert records into child tables (why before the parent record is written is a mystery).
A WTF wrapped in a WTF.
Admin
Unless I'm mistaken, the '+ 1' makes it impossible for a 0 to appear in any of the randomly generated IDs. There are 10^6 numbers with 6 digits but only 9^6 numbers with 6 digits are used as IDs in this system. That's just a little over half (531,441 of 1,000,000).
Of course, this is like noticing that a car crash victim needs a manicure.
Admin
Especially since he is using 16 digits, so the actual numbers are 10^16 and 9^16, so he'd be using "only" 1,853,020,188,851,841 of 10,000,000,000,000,000 possible IDs :-)
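To check the arithmetic in the two comments above, here is a small sketch. It assumes, as the '+ 1' remark implies, that every digit is drawn from 1..9 so that a 0 never appears; the function names are made up for illustration.

```python
def ids_without_zero(digits: int) -> int:
    """Count the IDs of the given length whose digits are all in 1..9."""
    return 9 ** digits


def all_ids(digits: int) -> int:
    """Count every digit string of the given length (digits in 0..9)."""
    return 10 ** digits


for digits in (6, 16):
    used = ids_without_zero(digits)
    total = all_ids(digits)
    # Print the usable count, the full count, and the usable fraction.
    print(f"{digits} digits: {used:,} of {total:,} ({used / total:.1%})")
```

The usable fraction shrinks as the ID gets longer, because each extra digit multiplies it by 9/10.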
Admin
It's really not that mysterious. Let's say you are performing a "Unit of Work" -- you are inserting several Parent records which each can have zero or more Child records (of one or several different types, where the children may be parents of children). You need to insert them all or none at all -- the set of parents and all the parents' children. So you construct the object graph in memory. If the parent is using an IDENTITY column as the PK and you have included the parent's PK column in the child's PK, then you can't create a simple batch of SQL statements to write the object graph to the database. You have to insert the parent, get the identity value back, update all the children of that parent in memory with the actual identity value assigned, and then write the children. Or your SQL batch has to use @variables and becomes much more complicated, especially if you are trying to stay database agnostic. If you are using a UUID/GUID or other surrogate key value that you know before going to the database, creating the batch to persist the object graph is extremely simple.
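A minimal sketch of the pre-assigned key idea described above, using Python's sqlite3 and uuid modules as stand-ins. The parent and child tables, their columns, and the helper names build_batch and persist are all hypothetical; nothing here comes from the system in the article.

```python
import sqlite3
import uuid


def build_batch(parents):
    """Flatten an in-memory graph into (sql, params) pairs, keys assigned up front."""
    batch = []
    for parent in parents:
        parent_id = str(uuid.uuid4())  # known before any round trip to the database
        batch.append(("INSERT INTO parent (id, name) VALUES (?, ?)",
                      (parent_id, parent["name"])))
        for child_name in parent["children"]:
            batch.append(("INSERT INTO child (id, parent_id, name) VALUES (?, ?, ?)",
                          (str(uuid.uuid4()), parent_id, child_name)))
    return batch


def persist(conn, batch):
    """Write the whole batch in one transaction: all rows or none."""
    with conn:  # commits on success, rolls back if any statement raises
        for sql, params in batch:
            conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE child (id TEXT PRIMARY KEY, "
             "parent_id TEXT NOT NULL REFERENCES parent(id), name TEXT NOT NULL)")

graph = [
    {"name": "order-1", "children": ["line-a", "line-b"]},
    {"name": "order-2", "children": ["line-c"]},
]
persist(conn, build_batch(graph))
parent_count = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

Because every key exists before the first INSERT, the batch is a flat list of statements with no need to read an identity value back between them.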
Admin
Serializing on a table like this would cause a big bottleneck if this is a high traffic OLTP system. The only way to get around it is to use "autoincrement" in MS-SQL and Sybase or sequences in Oracle. Those features are designed to dole out sequence values in a much more scalable fashion than using a table as you have described. Yours is a solution, but not a very scalable one...
Admin
Only if your database grabbed a read lock on the table when it did so. (In which case, get a better DB.) It's less likely to collide, but still not safe.
Admin
I don't condone the way that the original coder did this. I'd rather see the unique key being generated in a stored procedure/function on the database. This way, the webserver isn't constantly running code as well as the database.
However, if the goal is to generate keys that are difficult to guess then they're really running into problems. If the key-generation code can't avoid collisions, then an attacker will be able to do the same thing. Regardless of the algorithm/implementation used, they needed to increase the length of the key values.
The only time that I've had to generate difficult-to-guess, but unique, keys involved using several calls to a random function, seeded with different values (date+time, process-id, number of records in some random table that fluctuates a lot), which reduces the chances of collisions.
However (I haven't tried this), if I wanted to generate a set of unique keys that no ex-employee could guess (apart from the ones that were generated already), I'd probably create a table to hold unique keys and generate a set of values up-front. This way, even if generation slows down, so long as there are unused values in that table, the code that needs the unique value will not see a delay. The processes to generate keys would then run in the background. If an employee leaves, or the data is compromised, all unused keys could be cleaned out and regenerated (using a different algorithm, if necessary).
The method for generating these unique values is left as an exercise etc. The only criteria are that it's very difficult for someone to see a handful of values and generate the next one on another computer, that they have to be generated faster than they are used, and that they must be distributed in such a way that they are difficult to guess.
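One way the pre-generated pool could look, as a rough sketch only. It keeps the pool in memory instead of in a database table, and it swaps in Python's secrets module for the "several calls to a random function" approach; the KeyPool class and its methods are invented for illustration.

```python
import secrets
from collections import deque


class KeyPool:
    """In-memory stand-in for a table of pre-generated, hard-to-guess keys."""

    def __init__(self, size, nbytes=16):
        self._nbytes = nbytes
        self._issued = set()    # keys already handed out
        self._unused = deque()  # keys generated ahead of time
        self.refill(size)

    def refill(self, size):
        """Top the pool up to `size` unused keys (the background job)."""
        while len(self._unused) < size:
            key = secrets.token_hex(self._nbytes)  # OS-level CSPRNG
            if key not in self._issued and key not in self._unused:
                self._unused.append(key)

    def take(self):
        """Hand out one key without generating anything on the hot path."""
        key = self._unused.popleft()
        self._issued.add(key)
        return key

    def purge_unused(self, size):
        """Discard all unused keys and regenerate, e.g. after a compromise."""
        self._unused.clear()
        self.refill(size)


pool = KeyPool(size=5)
first_key = pool.take()
pool.purge_unused(size=5)
```

The uniqueness check against issued keys is what a UNIQUE constraint would do in the table version.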
Admin
Do you mean *parameters* ??? Yeah, it's stupid and complicated to use those when executing database commands. You have to make sure you assign them all values to avoid errors, and you have to use datatypes and everything !! Much better to concatenate strings together at the client and execute them directly.
If the "ID" is random or sequential or meaningless, then there is no reason why the client needs to know it at any point before inserting data into the database. If you feel it is "complicated" to call one stored proc to get the "ID" so you can simply pass that ID into subsequent stored procedures, then you shouldn't be doing any database programming whatsoever.
When you say you need to "update the ID for all the children in memory" you're joking, right? Why the heck would you manually copy some parentID to all objects in your hierarchy in memory before writing them to the database? You could just, uh, simply use a reference to the parent object's ID, or use a simple integer variable to store it, and so on.
Admin
This problem might have occurred before and have been dealt with in the same fashion...
Of course, one shouldn't waste time to fix something completely when it can be solved temporarily so much more easily.
Admin
Right. That's probably why they ran out of IDs . They only had 1.8 quadrillion possibilities. ^o)
Just in case you were serious, here's the original post:
Admin
But only if you update first, then select the new number. Selects don't usually lock tables.
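A sketch of the "update first, then select" ordering, assuming the scheme under discussion is a one-row counter table. The table name id_counter and the function next_id are hypothetical, and sqlite3 is only a stand-in: locking behaviour differs between database engines.

```python
import sqlite3


def next_id(conn):
    """Increment the counter, then read it back, inside one transaction."""
    with conn:  # the UPDATE opens the transaction and takes the write lock
        conn.execute("UPDATE id_counter SET value = value + 1")
        # The SELECT runs inside the same transaction, so it sees our own update.
        (value,) = conn.execute("SELECT value FROM id_counter").fetchone()
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_counter (value INTEGER NOT NULL)")
conn.execute("INSERT INTO id_counter (value) VALUES (0)")
conn.commit()

ids = [next_id(conn) for _ in range(3)]
```

Doing the SELECT first and the UPDATE second would let two sessions read the same value before either one writes.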
Admin
Jeff, you are a complete idiot. I'm not going to waste time arguing. Re-read my posts. Read up on the subjects addressed, and then get back to me.
Admin
This comment is so amusing. I can see exactly how sophisticated the work you have done is. You have no clue and don't have a clue that you have no clue. All of your commentary on this thread has been caustic and belittling. Believe it or not, you don't know everything and the way you do things isn't the only way or even a good way for many situations.
Yeah, I've never returned the scope_identity from a stored procedure. Thanks genius. Don't you know that 1+1 = 2. What a loser! You should try sleeping in caves and killing wild animals with tree limbs. It's the latest craze.
Admin
Not sure where you're going with this, John...
When I design a db, I use non-null FKs in the child tables. If I understand what you said above, the child FKs would need to be nullable so you could insert the child attributes before you know the FK to the parent (because that hasn't been written yet). Sounds like a recipe to kill your RI.
If I was inserting into multiple tables, I'd use a transaction anyway, so inserting the parent record first is no big deal; if a child insert fails, I roll back.
All of this is done in the db; there is no "object graph in memory". I want to let the db manage the RI; I don't want my app to have to worry about it.
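The parent-first approach described above might look like this in outline. It is a sketch with sqlite3 standing in for the real database; the parent and child tables and the insert_with_children helper are assumptions made for illustration.

```python
import sqlite3


def insert_with_children(conn, parent_name, child_names):
    """Insert the parent, then its children, in one transaction."""
    with conn:  # rolls everything back if any insert fails
        cur = conn.execute("INSERT INTO parent (name) VALUES (?)", (parent_name,))
        parent_id = cur.lastrowid  # key generated by the database
        conn.executemany(
            "INSERT INTO child (parent_id, name) VALUES (?, ?)",
            [(parent_id, name) for name in child_names],
        )
    return parent_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER NOT NULL REFERENCES parent(id), name TEXT NOT NULL)")

insert_with_children(conn, "first", ["a", "b"])

try:
    # The second child violates NOT NULL, so the parent insert is rolled back too.
    insert_with_children(conn, "second", ["c", None])
    failed = False
except sqlite3.IntegrityError:
    failed = True

parent_rows = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
child_rows = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

The foreign key stays non-null throughout, because no child row is written until its parent's key is known.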
Admin
Here's a thought...
I assume you were using a language like C, with code like the following:
delta=rand() % 5 - 3;
Perhaps, since RAND_MAX (32767) % 5 == 2, there was a bias in your mapping of (0..RAND_MAX) onto (-2..+2), and that explains your problem?
Granted, it could take a bit for this to manifest...
In any case, if you would post your source code I would bet dollars to donuts that the problem was your code, not rand()'s inherent non-randomness that did you in. =)
-dave-
Admin
Err i meant:
delta=rand() % 5 - 2;
mistype...
-dave-
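The size of the bias dave describes can be counted exactly. This sketch assumes RAND_MAX is 32767, as in his comment, and that rand() returns each value in 0..RAND_MAX equally often; it enumerates every possible output instead of sampling.

```python
from collections import Counter

RAND_MAX = 32767  # value quoted in the comment; it varies by C library

# Apply the corrected mapping (rand() % 5 - 2) to every possible rand() output.
counts = Counter(r % 5 - 2 for r in range(RAND_MAX + 1))

for delta in sorted(counts):
    print(f"delta {delta:+d}: {counts[delta]} of {RAND_MAX + 1} outputs")

# Net drift per call if every output is equally likely.
mean_delta = sum(delta * n for delta, n in counts.items()) / (RAND_MAX + 1)
```

There are 32768 possible outputs and 32768 % 5 == 3, so the deltas -2, -1 and 0 each get one more output than +1 and +2. The drift is real but tiny, which fits the remark that it could take a bit to manifest.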
Admin
JohnO -- I know that clearly I am a moron and most likely won't understand, but can you give me one of your "sophisticated" examples? Or one of those many situations in which "my way" is not the best way? Or, if it is easier, you can keep calling me names I guess.
Admin
On the third page there is a post from Vector, where he explains how he generates a random ID for a session using a technique similar to the one explained in the WTF. I thought that kanet77 was talking about him. Thanks anyways.
Admin
Database agnostic?!? That's limiting yourself for future changes. What happens if, sometime in the future, someone wants to change from a database to flatfiles? Or, XML? Or, cuneiform tablets? Because, you know, in large and complex information systems like the one you describe, that happens all the time.
Admin
No problem. Just make sure that six abstraction layers separate the application from the database. Then, if upper management recognizes the business value of having all data in XML files, just roll a die to decide which abstraction layer should be adapted.
Seriously, there are only two reasons to stay DB-agnostic: code reuse (maybe the next client wants the same application on a different DB system) and employee reuse (since some staff members are not trained in xxx, don't use exclusive features of xxx).
Admin
Jeff, you are pretty annoying, so how 'bout you just sit quietly for a while?
John, let me take a stab at this ('cause I've been thinking through a problem like this for a while now):
You have a large OO system where each object is well encapsulated and knows how to persist itself (a system of these instances is the "object graph in memory"). This works great except where the object is composed within a parent object, and needs its parent's PK to persist itself. This object system is abstracted from the physical DB schema, so there's not a nice 1:1 object-table correspondence. What's an efficient & elegant way to persist this system?
I guess you could start at the root object, persist it & get its PK, then traverse the graph doing the same kind of thing. Sounds brute force to me.
I think John was saying that if we created the IDs (putative PKs) for the objects --algorithmically-- when they were instantiated we would be able to persist the objects "better". Sounds like a neat idea.
John: is this anywhere in the ballpark?
Admin
I worked on a large shrink wrapped system that allowed the purchaser to use their backend of choice (e.g. SQL Server, Oracle, Sybase &c.). To avoid writing & maintaining multiple versions of database code, we endeavored to be as DB agnostic as possible.
And we only supported Linear B, not cuneiform.
Admin
OK. Sorry. Let me know when you guys have this scenario all laid out and when I can respond. To be less annoying, I will follow JohnO's posting style and completely ignore the actual topic and call people things like "clueless" and "idiots" and insult them personally, instead of addressing the actual content of their posts directly.
Admin
I'm afraid we were forced to use 8086 assembler. Of course, we didn't make great games, but I got to learn a lot about how computers work on the very inside, you know, registers, stack, macros, gotos,....
If I knew much more about maths, I'm sure I could see how my use of hundredths of seconds, subtractions, etc., was causing some sort of bias. I'm afraid that I just jumbled the numbers around until the bias got lost in the maze of calculations :)
This was the random macro, which used the system time
This is the movement macro. Notice that the output set was [-3,3], not [-2,2] like I had said (it was many years ago).
SS:[__P_RANDOM] is referring to a position inside the stack where we are storing the generated random. I wanted to make it different, so I used no variables. I just stored everything in the stack :)
I get the parameter DESPLA, and then I subtract several random numbers from it, so as to even out any bias in them.
__P_RANDOM_4, etc., were calculated in a different macro. At each iteration, I generated a random number using the system's hundredths of seconds (0-99), then I calculated and used 1 and 3, or 2 and 4, depending on whether the hundredths were greater or smaller than 45. Of course, if I were using 1 and 3, I made sure that at some point the values of 2 and 4 were used to further "obfuscate" the result. If I used 1 and 3, then 2 and 4 were not re-calculated; I just used the value from the last time they were calculated.
Notice that DESPLA is already one of __P_RANDOM_1, 2, 3 or 4, as calculated in the former iteration.
I applied a three-bit mask (00000111) so I got the lowest bits (the most random ones, according to my maths teacher), so I get a number between 0 and 7. Then I move the road borders like this:
It took me lots of time to mix the randoms so much that I no longer had the problem of "my program takes exactly a third of a second to iterate, so I'm getting the same seeds all the time, because I'm using the hundredths of seconds as seed". For example (33,66,99).
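The seed problem described above is easy to reproduce on paper. The sketch below is not the original assembler; it assumes a loop that takes exactly a third of a second and a clock read that truncates to whole hundredths, starting at an arbitrary reading of 33.

```python
from fractions import Fraction

PERIOD = Fraction(100, 3)  # one third of a second, in hundredths
START = 33                 # assumed clock reading at the first iteration

# Hundredths-of-a-second value seen at the start of each iteration.
seeds = [int(START + k * PERIOD) % 100 for k in range(12)]
distinct_seeds = sorted(set(seeds))

print(seeds)
print(distinct_seeds)
```

With a fixed loop period that divides the second evenly, the clock only ever shows three values, so the "random" seed repeats in a cycle of three.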
This code used a random to determine which randoms to use. I deleted some intermediate jumps that I had to do because macros would take so much space and JMP wouldn't jump more than xxx bytes of code. Those were the days.
Sorry, I didn't know how to explain it in a short way
Admin
This IS the real (REAL) WTF... People getting so terribly upset about stuff they read on a public forum (this is not _only_ directed at you Jeff, I just used your post to quote, it was at the end of the thread). The internet WTF ;)
Yeah, yeah, I know, I should F¤#%#¤#%CKING MINDING MY 0WN %¤#&%¤#&%¤ BUSINESS!!11!111
(I don't think Paula ever would get into these kind of 'arguments'...)
Admin
What, no one mentioned hotbits yet? It's about the only way to get truly random numbers. Although building custom radioactive hardware just to generate keys might be a teeny WTF in itself.
Admin
I prefer using my LavaRnd[1], thank you very much.
[1] Lava-Lamp based random number generator, http://www.lavarnd.org/
Admin
Ok. I'm glad we cleared that up. I'll quote more context next time. ;-)
Admin
a) yes, true, less than 100 people use it.
but i could add milliseconds on the end if need be.
b) i have constraints in my sybase ASE SQL db.
thanks.
Admin
Actually the AUTO_INCREMENT works in a WTF manner as well with Sybase; at least it was the case in previous versions. This was happening in version 11. You would think that after 11 versions this would have been fixed.
You keep adding records in the database and each number is one more than the previous until.... the database server crashes or is restarted or something. I am not exactly sure, but it adds 50,000 or so to the last known value and starts from there. You could run out of IDs very quickly if this Sybase fix keeps getting triggered.
Now that is a WTF moment.