- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Normalization anyone?
Admin
"That's not normal."
Admin
Just what is it that some people have against relational data? That's only partly rhetorical - seriously, does anyone know?
Admin
I poked into some of the underlying tables for MS Sharpoint Portal Server and found, to me, the most ostensibly bad database design. It had columns named {int01 . . .int32}, {varchar01...varchar32}, {text01...text32}. That was impressive (impressively bad), but this . . .this is AMAZING!
Admin
I have to maintain a system just like this. Serialised Java objects. My colleague who "designed" that monstrosity is no longer with the company and there's no budget for me to fix it, so this "database" is now almost 9GB in size.
Recently, the user asked me for a database dump so they could do their own data mining.
They got what they asked, even if it isn't something they can use.
If that system ever needs some real changes, I shall be unable to be found and all my mail will be forwarded to an anonymous PO Box on the Bahama's.
brillant captcha: paula
Admin
Easy. Relational data requires careful planning, education, patience and discipline. Most programmers have none of those traits, as we have learned over years from this very site.
Admin
nice very nice! I love it when they "don't use the product as they are suppose to". Maybe there should be warning labels for databases.
Do not put in Toaster!
Not for children under 13!
Do not use to store serialised objects, it's not my purpose!
Admin
There can be useful places to put some data in opaque blobs like this, especially some bits where the data is tree-structured. You sure as hell want to keep some bits exposed in the table, and have some API written to get at the contents.
It's pretty obvious that this isn't stuff you want to be using the database to find though. :)
Admin
Isn't Alex still part of the "I Hate Oracle" club? Shouldn't he be archenemies with Tom, who makes a living with Oracle, thus propogating its continued usage?
Admin
I suspect that one contributing factor to the lack of database intelligence comes from spreadsheets, where you can store pages of simple lists with formulas.
Scroll down or to the right and you've got a completely new set of information.<>
<><>So how hard can it be to implement this in a standard database? It's just one long list, right?
<>Relational databases, structural integrity, security, and normalization are things that a lot of people can pronounce, and because of that those people think they have an understanding of it. Oh well. I get paid a bunch to fix their mistakes.
Admin
crap i was going to do this.....
Admin
I used to do mostly 'flat' databases, though I had self-discovered normalization to some extent on the more complex tables. The biggest problem is that all these SQL references and books didn't even mention the concept of it. Accordingly, if anybody reading isn't exactly sure what the concept is, google for Database Normalization. It really makes a big difference in crazy database projects.
Admin
This WTF is kind of flat but I guess Analysts and Architects are always to be watched.
As I always say, "Sign me a paper for this."
They'll always try to make it your fault.
Admin
There would be lot less database/Oracle-WTFs if people would start reading the (quite extensive) user guide and sites provided by Tom, Jonathan and others. But, we all now: every developer is a class by itself.
l.
Admin
For those times when you need to persist a serialized Java object, the filesystem works really well. Why people keep dumping object graphs into the database is beyond me.
Admin
Well, it's Oracle's own fault. They turned their database into a file system, so why should we be suprised when people start using it like that?
Admin
I think it's the opposite of a lot of other WTFs. They try to find a better way of doing it when there isn't.
Admin
Well after all Oracle is the "Object Oriented Database"...
So who is suprised by that?
And XML fields... Hmmmm Last HUGE Project I was NOT working on had those (I was complaining and fussing about them non stop....) They started to store XML fragments in Varchar fields... Important Data... Data that needs to be querried regulary... Data that should be indexed... Data that was processed on ONE SQL Server before the migration... Is now hosted on a system of 10+ Servers ;)
But after a while those fields became so HUGE they had to switch from varchar to text ;) Damn... The project had so many WTFs that I could almost replicate the whole site with those :)
If anyone from my old company is reading this: Hello guys... I am really enjoying my new Job - And I get a chance to produce my own WTFs now :)
P.s. "Dumme Belgier"
Admin
Hahah... I had to deal with a system nearly exactly like this. We developed a system that was sold and used by a bunch of customers, and one of our competitors did much the same thing. Eventually, one of our competitors customers saw our reporting mechanisms, liked them, and asked us to graft them onto our competitors system. We asked for an extreme fee, of course, and surprisingly got it.
Since they used an Oracle database and the data collection was largely the same, I figured it would be no big deal, just a remapping of the various column names and maybe some minor coding to deal with the format differences.
So I went over to their site and spent a couple hours looking around at how their data collection worked, to verify that the data would be more or less compatible. Eventually I got around to looking at the database. What I found was shocking.
One table.
Two columns.
One column was called "row" and was an auto-increment number column.
The other column was called "data" and was a blob.
And the data appeared to be a hex dump.
After some heavy drinking, I came back and wrote a small little program to read their data. Every 5 minutes or so, it would check their table for any new rows, pull them out, parse them, and insert them into our own existing database structure.
It worked well, as long as you kept the service running. The beauty was that since it parsed the data into our own tables, I didn't have to change our reporting one bit. It just worked.
Okay, so the solution was a WTF, but I didn't have their source code to fix the problem, and would have had to charge more for doing so anyway. As it stood, I solved the problem in 3 days, and 1 of those was spent drinking, to try to dull the pain. They thought I was a miracle worker. :)
Admin
I think it stems from the major DBMS vendors... They really seem to dislike that relational model stuff.
Admin
The problem that some people have with relational data, is that it conflicts with the way data are handled in the client(Client is here: The software that uses the database, it might very well, be a java servlet server) . If you for example are making a dating site, and you have a UserAccount class, which contains references to many other objecs such as MyImages,MyFriends,MyMailBox,MyGuestBook and so on,
you have an large object graph. This mean that when you need to load the user object, you alse need to reference to a lot of other tables, containing the GuestBok data, the ImageData, the Mails and so on.. The result is a major, ugly join which the java client, then have to read, and then greate the object graph again. (And things get even more ugly if you got sub objects with subobjects. Example
Class UserAccount {
Vector<MessageGroup>myMessages;
}
class MessageGroup {
String title; // Other data here.
Vector<SingleMessage>allMessagesInThisGroup;
}
class SingleMessage {
String title,Message; // And so on.
}
Now to load an UserAccount from the database, including all that users mail, you must do a join between the user, the messagegroup and the SingleMessage table. Parsing the result of the join on the client is diffucult, and require quite some cpu time, becasue the reply from the database contains redundent data. (And having to writing all that code that transform data to/from a object graph sucks.)
There are however sereval solutions: There exists projects such as hibernate which contains code to automatic transform data to/from object to database. (Not perfect, and it sometimes cause to much overhead, and it somehow limit which queries you can ask about your data)
The best solution would be to use a object orientered database, but nobody have really made a good useable general purpose language neutral oodb, so it's not always a solution.
Admin
I almost did do this for a small project I was working on. Not in the complete wtf way as described aboved, only in a semi wtf way. Luckily I caught myself before I finalized any designs and started coding.
I wasnt going to use a serialized object for the normal data, only to keep track up modifcations to certain objects when they are changed. A half assed version history for the data if you will. I figured that history tracking wasnt an important part of the project so I could get away with it. But I realized that every time some one updated a single data item in the object it would store of full copy of the object. It would have been way too much data being stored. I decided to spend the extra time to implement a way just to keep track of the items that were changed instead of the whole object.
It made me realize that spending that little bit if extra time even on the little things make a real difference in the end product.
Admin
Just about every beginning database book has a section on normalization. I have yet to see one that doesn't.
Admin
The filesystem don't really work that well for dumping of java objects. The problem is that you can't* append objects to a file, after you have closed it. Depending on what exactly you need to do, and what systems you need to work in, this can be either "No issue at all" or it can really suck. (Think: The combination of windows(You can't read anything from this file, while I have it open for writing) and solaris(Why would anyone ever want more then 256 open files)).
*Atleast I never found out how do it(You can write the objects, but you can't read them back again), but if you know how to do it, I would like to know :}
Admin
I once heard an amusing anecdote about a physics student taking an exam. One question had him stumped, and in a panic he wrote down some really random stuff, hoping for partial credit. When he got the test back, the professor had written a comment: "You're not even close enough to be wrong."
This question to AskTom is kind of like that. The guy is not even close enough to having the design right to ask the question about Crystal Reports, etc.
Admin
Task for 1st year CS students:
Find a way to represent a tree structure in a relational database.
Ain't no rocket science. So there's no reason to put trees into a blob.
Images and the like -- ok. But nothing else.
Admin
Reporting. The last thing on everybody's list. "We want it to do blah blah blah in the first rev. In the second rev, we want reports."
It's the last thing, because nobody knows what they want the report on, and it's always assumed to be easy (because of shitty tools like crystal reports, etc). And of course, this assumption depends entirely on the design.
OR mapping can be tricky. The tools and frameworks designed to help you with it often make it more tricky. Serialization is fast for development and well understood by developers and if your object's data members don't lend themselves to sorting or synchronous updates you may as well just stick them somewhere as bytes. The file system is just as good as the database -- they're the same thing these days.
My solution here? Datamarting. Collect just the important information about the processing and state of your objects as they're persisted solely for the purposes of reporting. It won't allow easy open ended queries on everything, but that's a bad idea anyway, and if the clients need that level of specificity you've either got to design for it or write the query tool for them.
Personally, I love databases. But I've drunk the Prevayler kool-aid and found it strangely alluring. You do not always need a database.
Admin
Actually, the best solution is to only load data when needed, write efficient SQL, used stored procedures (and/or a data access library to handle all database interfacing), have a good database design, follow best practices, and write good code. Every "problem" with relational databases that you described in your example is the result of not doing things in an optimal or correct way. Why load up all data into objects before they are needed? Why not load them from the database when they are actually *requested*. Just because a User object might have lots of related content, there is no reason to query it all in from the database until you actual need it. And if you do need it all, and you do pull in all related data from the database and create objects from it, what's wrong with that? I fail to see what the issue is. Too much code to write?
What you describe is what occurs when someone uses a relational database as a substitue for an XML file or a text file for persisting data, and doesn't properly utilize the database engine at their disposal.
The benefit of stroing your data properly into related tables instead of a massive objects stuffed into generic "object" columns far outweights the "hassle" of writing code to do the transformations. .... not only that, it is very easy to do with simple code generation tools. The key is to let those tools generate the boring stuff like assigning columns to properties, but never let those tools write your SQL, especially when joins are involved.
Admin
I use views and stored procedures for this. Tto shield all the sql query complication from the code that is used to actually display the information. that way you are only returning the important info to the client, therfore preventing redundant data or extra processing.
Admin
That's what lazy-loading ORMs are for.
That's what lazy-loading ORMs are for.
Admin
Here is a reason why someone would want to keep serialised Java objects in a database: Consider a product table where you have, besides all those normal columns like name, description, price, size, weight etc, a blob column containing a Java applet that somehow displays the product; depending on the kind of product, this might be an applet displaying an interactive 3D model for one product, while the applet for another product shows a short video clip.
Admin
Why not serialize the objects to XML then store it as text? At least C# is capable of it.
;-)
Admin
I interviewed at a company that did something like this.
A headhunter told me that a local company was looking for guy woh knew Linux and was comfortable with Perl, and sent me on an interview. The guys I talked to there wrote software which, if I remember correctly, created highly specialized "catalog" style web sites for vertical markets. Think Amazon, but instead of books and DVDs and whatnot, it was all machined parts for manufacturing, equipment, et cetera.
The trick was, as they explained in my interview, that pretty much all the data was stored as native Perl variables (hashes, et cetera) which had been serialized and written to the database. I assume that the DB would look a little like the one described here. Even back then - this was at least 5 years ago, probably more like 7 - I kept thinking, "I'm not so sure this is a good idea." So I declined a second interview.
Oh, and a follow-up WTF - when I told the headhunter that I wasn't interested in the position, he acted shocked and attempted to guilt-trip me into going back for a second interview and signing on with them. "Well, if you really don't want to, fine. You know, I'll call them, and be the asshole, I don't mind."
After that, I let the head hunter know that I wasn't really interested in working with him either.
Admin
Careful planning? Patience? Hey, in relational databases you can just insert and remove columns with only one SQL statement! It's scary. Using relational data is for the careless and impatient and undisciplined.
Using blobs, on the other hand, calls for careful planning (just imagine the costs of adding a new field), patience (if you want to look objects up by anything except their ids) and discipline (none of your silly ad hoc column additions and removals relational databases are tempting you to)
Admin
Actually there's one valid excuse for putting serialised objects (or any other kind of unstructured data) into databases, namely when they are not manipulated by your application.
Admin
This WTF reminds me of a Java library some guy told me about: Prevayler. This guy defended it as a panacea, stating "forget SQL: you can access your objects directly".
People who can't understand relational databases are likely to reinvent it. Poorly
Admin
Aye mate, do you have any clue what you're talking about?
I'll leave the solution to you -- it's too embarassingly simple to write it down here.
Admin
All of you DBAs out there who work with developers who actually understand that a "database" is really an application that manages user concurrency, statement and transaction consistency, and data quality should actually thank them. I certainly do!
Admin
Maybe they were chinese. Most DBs don't allow unicode table names, and if most of the developers don't grok english, then you have the choice of doing pinyin names or table1, table2, table3.
Admin
You're right in a way. You've only forgotten about one crucial resource: time. Writing, testing and especially debugging stored procedures manually takes a lot of time. Usually it takes more time than what you have.
You then might choose to develop a toolkit for helping you doing all the above. But then you risk building a potential monster, a horrible mesh of scripts, triggers and helper stored procs totally incomprehensible to anyone but you.
Or you might try using a "standard" O-R mapping tool, like Hibernate, sacrificing some of the performance to shorten development time.
Or you might choose something in between, depending on the actual task and deadline and speed requirements.
But we're overcomplicating things. You simply have to use the right tools for the right job. All you need is common sense. Sadly, that's what many programmers lack entirely.
Admin
Some people, when confronted with a database problem, think "I know, I'll use lazy-loading ORM." Now they have two problems.
Admin
Yes... But at least I wont be asked to generate a REPORT from those applets, nor is anyone interested whats on line 287 (I didnt pick 42 by purpose... Its overused) And i would agree that you can stuff binary data into a binary field if it is in fact... binary...
Admin
Ain't no rocket science. So there's no reason to put trees into a blob.
Images and the like -- ok. But nothing else.
As with most absolute statements, this one is over-general. There are good reasons to put binary trees in blobs -- they will consume a lot less space in the database. However, if you plan on querying the tree from SQL, it's not a good match.
The thing that makes this stuff difficult is that you have to be smart about it. Most people are too lazy.
Admin
Admin
I forgot the quote on this one... And here it is
Admin
Admin
What kind of applets? I thought about things like Oracle Advanced Queuing where you just have to deliver the message payload to the recipients without having to know what's in it.
Admin
When are the times you don't want to "persist" a serialized Java object? Why would anyone go to the trouble of serializing an object, if he wasn't going to "persist" it?
And, of course the filesystem works really well... that's why databases use the filesystem to do their "persisting". You must mean something else, like, "ASCII files work really well", or something like that.
Admin
You're still thinking of the users who want to access their data from a different application, right?
Admin
That's awful, sorry. You're talking about one applet per sku with a potential of some 100M skus (for a large inventory). Even 1000 skus will stress this. A marginally reasonable approach would be to include categorization data in the sku row and link that to applets or whatever tech you use for custom previews.