The Daily WTF: Curious Perversions in Information Technology

WWWWolf · 2006-04-22 Reply Admin

Alex Papadimoulis:
The Inner-Platform Effect is a result of designing a system to be so customizable that it ends becoming a poor replica of the platform it was designed with.

Not exactly a new antipattern...

"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
- Greenspun's Tenth Rule

"...including Common Lisp."
- RTM's Corollary to Greenspun's Tenth Rule

biziclop · 2006-04-22 Reply Admin

ammoQ:
Even the Orable guys aggree with Alex on this one...

Do not use Generic Data Models

I see this antipattern all too often, and the worst thing is, uneducated bosses ask for it, dreaming of a system "so flexible that we never have to change the program, just the parameters"

Now, this is only partly true. Take a reporting system for example. It's sole use is to gather data from other systems and produce reports and graphs in various formats, and display them via various devices (send them in email or MMS)

You don't want to fiddle with the code every time a new report comes in. Obviously you want to use some dinamic sql to create a new report or to handle a new data source which in turn obviously requires someone who can construct those sql statements but one who doesn't have to know JAVA or the intricacies of the actual system.

That's cool and it's working. Now, the posted example is neither cool nor working. :)

WWWWolf · 2006-04-22 Reply Admin

Anonymous:
JBL:

Yop, 3 lines to implement a LISP(tm) interpreter in LISP(tm). Try that, (insert other language here)!

Well, let's be fair: those are three very very long lines of LISP code...

Actually, they're not that long...

http://lib.store.yahoo.net/lib/paulgraham/jmc.lisp

biziclop · 2006-04-22 Reply Admin

ammoQ:
Dustman:
ammoQ:
Even the Orable guys aggree with Alex on this one...

Do not use Generic Data Models

I see this antipattern all too often, and the worst thing is, uneducated bosses ask for it, dreaming of a system "so flexible that we never have to change the program, just the parameters"

We use something somewhat similar at our shop, although the pattern is only used for our client's products, not the entire database in general. It works fairly well, because of a number of things:
...

To be honest, in a very limited range I've used that pattern too, without problems. Some years ago, I made a database system for a mobile telecommunications company that kept info about available cellphone models. Each cellphone has a lot of "features", e.g. "Java enabled", "Color Display", "Real Tones", "MP3 player", ... Since the makers of the phones add new features often, I did not make a column for every feature, but a "feature" table with a many-to-many relationship to the phone table.

Funny, I had to write exactly the same program some years ago. :)

wizztick · 2006-04-22 Reply Admin

JBL:

Yop, 3 lines to implement a LISP(tm) interpreter in LISP(tm). Try that, (insert other language here)!

PHP example: eval($code);
bash scripting: eval "$code";
(insert other scripting languages here)!

ammoQ · 2006-04-22 Reply Admin

biziclop:
ammoQ:
Even the Orable guys aggree with Alex on this one...

Do not use Generic Data Models

I see this antipattern all too often, and the worst thing is, uneducated bosses ask for it, dreaming of a system "so flexible that we never have to change the program, just the parameters"

Now, this is only partly true. Take a reporting system for example. It's sole use is to gather data from other systems and produce reports and graphs in various formats, and display them via various devices (send them in email or MMS)

You don't want to fiddle with the code every time a new report comes in. Obviously you want to use some dinamic sql to create a new report or to handle a new data source which in turn obviously requires someone who can construct those sql statements but one who doesn't have to know JAVA or the intricacies of the actual system.

That's cool and it's working. Now, the posted example is neither cool nor working. :)

You miss the important difference between "generic data model" and "dynamic sql". There is nothing wrong about a system that lets the user create reports based on the existing database.

tufty · 2006-04-22 Reply Admin

ammoQ:
tufty:
ammoQ:
Since the makers of the phones add new features often, I did not make a column for every feature, but a "feature" table with a many-to-many relationship to the phone table.

Yeah, but that's not a generic data model, is it? It's a correctly (at least as far as you have described it) normalised data model. Well done. Carry on.

Simon

Or so it seems, until you try to query all cellphones with several features, like: "color display" and ("WAP" or "HTML") and "Java" and ("polyphone ringtones" or "real tones") and "dualband" and "organizer functions"; it's not an uncommon or unreasonable request, but the query is large, ugly and probably slow. Granted, there were only a few hundred phones in the database, so performance was not a problem.

Correctly normalised doesn't always mean "easy to query". But it does mean you don't have to add a new column and the corresponding code to handle it every time a new feature comes out.

Simon

masklinn · 2006-04-22 Reply Admin

wizztick:
JBL:

Yop, 3 lines to implement a LISP(tm) interpreter in LISP(tm). Try that, (insert other language here)!

PHP example: eval($code);
bash scripting: eval "$code";
(insert other scripting languages here)!

You kinda missed the point, which actually was to implement eval, or even an REPL, without using any built-in eval function. In Lisp, that can be done in a dozen of lines (and that's actually how the first Lisp interpreter was built, more or less)

Usual Eval functions in later languages makes supplementary calls to the whole interpreter, it doesn't interpret the code by itself.

Which is fine, but shows the simplicity of Lisp.

ammoQ · 2006-04-22 Reply Admin

tufty:
Correctly normalised doesn't always mean "easy to query". But it does mean you don't have to add a new column and the corresponding code to handle it every time a new feature comes out.
Simon

Adding a column is easy. That's what databases are made for. Changing the code is not necessary - just use dynamic SQL to make the appropriate query.

Cooper · 2006-04-22 Reply Admin

Alex Papadimoulis:
<snip>
They're more or less the converse of
<snip>
I call it the Inner-Platform Effect.
<snip>

Um - excuuse me. KEDS were the ones with the inner platform.

tufty · 2006-04-22 Reply Admin

ammoQ:
tufty:
Correctly normalised doesn't always mean "easy to query". But it does mean you don't have to add a new column and the corresponding code to handle it every time a new feature comes out.
Simon

Adding a column is easy. That's what databases are made for. Changing the code is not necessary - just use dynamic SQL to make the appropriate query.

<sigh>

Okay, let's go through this again as I understand it.

A phone has a number of features. The set of features a phone could have is not fixed, and indeed, could be described as "very flexible"; manufacturers come out with new features all the time. So, we have 2 options:

Firstly we could model the phone as one entity and the feature as another. Something like this:

<font size="2">create table handset (
id int not null unique primary key,
name varchar(255)
manufacturer_id int not null references manufacturer(id) on delete cascade
)

create table function (
id int not null unique primary key,
name varchar(255)
)

create table handset_function (
handset_id int references handset(id) on delete cascade,
function_id int references function(id) on delete cascade
)
</font>
Now, that allows us to add arbitrary functions and map them to arbitrary handsets, on the fly, without changing the database structure, but it's a bit of a bugger to do big arbitrary queries on.
Alternatively

<font style="font-family: Courier New;" size="2">create table handset (
id int not null unique primary key,
name varchar(255)
manufacturer_id int not null references manufacturer(id) on delete cascade,
has_wap smallint default null,
has_downloadable_ringtones smallint default null,
etc etc
)
</font>
Which is easily queried, but every time a new function comes out you need to change the structure of the table. Dunno who you've worked for in the past, but the DBAs where I worked would have had me hung, drawn and quartered for such a suggestion.

We might be able to improve on it a bit by having a bitfield encoding the various features, but you still run up against having to change the table when you run out of space in your bitfield. And it would foul up your queries, too.

Then there's the code. Sure, your query can be almost fully automated using dynamic SQL (although dynamically generated SQL is likely to be an awful lot less efficient than hand-crafted queries), but at some point you're going to have to ask your users what features a phone has, and they are going to want to query based on those features. And unless you want to be providing actual column names to the users, you want some sort of human-readable name to be displayed. Which means that yes, you are going to have to change your code, if only to map handset.predictive_texting to "Predictive Texting". Yes, there's ways around that too, but you start reaching the point where it's more complex and difficult than doing it right.

This is not to say that the denormalised approach would not be a reasonable one were the feature set fixed, just that as you described it, it would be more work than using normalised data and spending some time on your queries.

As it stands, you're probably going to want to use UNION, INTERSECT and EXCEPT.

Simon

ammoQ · 2006-04-22 Reply Admin

tufty:
<sigh>
Which is easily queried, but every time a new function comes out you need to change the structure of the table. Dunno who you've worked for in the past, but the DBAs where I worked would have had me hung, drawn and quartered for such a suggestion.

That's not a technical problem. It's a DBA problem. The only drawback of an alter table is (at least in Oracle) that stored procedures referencing that table will recompile.

We might be able to improve on it a bit by having a bitfield encoding the various features, but you still run up against having to change the table when you run out of space in your bitfield. And it would foul up your queries, too.

A bitfield is denormalized, IMO not a good idea.

Then there's the code. Sure, your query can be almost fully automated using dynamic SQL (although dynamically generated SQL is likely to be an awful lot less efficient than hand-crafted queries)

In that case, the dynamically generated query is very simple if we have one table, and a lot more complex if we have three tables. Since we don't know in advance which featureset users will ask for, it's dynamic SQL anyway.

But at some point you're going to have to ask your users what features a phone has, and they are going to want to query based on those features. And unless you want to be providing actual column names to the users, you want some sort of human-readable name to be displayed. Which means that yes, you are going to have to change your code, if only to map handset.predictive_texting to "Predictive Texting". Yes, there's ways around that too, but you start reaching the point where it's more complex and difficult than doing it right.

I see no problem if the program automatically adds the necessary column whenever a new feature comes; there are several ways to keep the human readable names; and of course the users will make their queries in a query-by-example fashion.

Or, to put it in other words, from the outside, the users see no difference between the one-table-model and the (generic) three-table-model, except the speed.
</sigh>

pengi · 2006-04-22 Reply Admin

That gives me an idea :)

I'm thinking about codeing a database wrapperclass which using another databaseclass as backend with this structure.

Then it's possible to nest this in several steps... :) a database in a database in a database in a database in mysql.

starts codeing

Free · 2006-04-22 Reply Admin

If you look at eCommerce systems like MS Commerce Server, then tend to use mapper-style solutions.

MSCS manages the products table, add and removing columns based on how you define your product properties. If you look at the tables it generates, there is one for common properties (id, mfg#) and one for properties that are culture (language) specific (description). A search system is also provided, but only one join is needed for a search.

It does have limitations, your database must support a lot of fields in the same db if you have a lot of properties.

tufty · 2006-04-22 Reply Admin

ammoQ:
tufty:
<sigh>
We might be able to improve on it a bit by having a bitfield encoding the various features, but you still run up against having to change the table when you run out of space in your bitfield. And it would foul up your queries, too.

A bitfield is denormalized, IMO not a good idea.
</sigh>

Might I humbly suggest you read up on normalised forms? http://en.wikipedia.org/wiki/Database_normalization would be a good place to start. As a hint, pulling a many-many relationship down to attributes on one of the tables is denormalised as well. Exactly as denormalised as storing those values in a bitfield. All that pulling those attributes into a bitfield would change is the representation within the table, not the normalisation of the database structure.

Sure, you might be able to make your queries faster by denormalising, but you can only sensibly denormalise if the relationship being denormalised is static on one end, and even then you lose the benefits of integrity constraints.

Fuck me. I can't believe you suggested dynamically adding columns to a running database as a sensible solution.

Simon

ammoQ · 2006-04-22 Reply Admin

tufty:
ammoQ:
tufty:
<sigh>
We might be able to improve on it a bit by having a bitfield encoding the various features, but you still run up against having to change the table when you run out of space in your bitfield. And it would foul up your queries, too.

A bitfield is denormalized, IMO not a good idea.
</sigh>

Might I humbly suggest you read up on normalised forms? http://en.wikipedia.org/wiki/Database_normalization would be a good place to start.

Good idea...

What is 1NF?

The domain of attribute must include only atomic (simple, indivisible) values.

Given that definition, a bitfield is not 1NF.

As a hint, pulling a many-many relationship down to attributes on one of the tables is denormalised as well. Exactly as denormalised as storing those values in a bitfield. All that pulling those attributes into a bitfield would change is the representation within the table, not the normalisation of the database structure.

I think we disagree wheter a "feature" is an entity or not. IMO, it's not. Each "feature" corresponds to a boolean column. Having a features table and a m:n relationship is a generic data model, by definition.

I can't believe you suggested dynamically adding columns to a running database as a sensible solution.

Why not? I understand that early databases made a copy of the whole table whenever the structure was changed. But that is not the case with modern database systems. It costs nothing to do that.

tufty · 2006-04-22 Reply Admin

ammoQ:
What is 1NF?

The domain of attribute must include only atomic (simple, indivisible) values.

Given that definition, a bitfield is not 1NF.

Did I ever claim it was? All I ever claimed was that it was no less normalised than pulling an n:m relationship down to redundant data in one or other of the tables.

Sensible database design generally starts with (at least) 3NF, which is where you stop having redundant data, and then gets denormalised from there for performance reasons.

I think we disagree wheter a "feature" is an entity or not. IMO, it's not. Each "feature" corresponds to a boolean column. Having a features table and a m:n relationship is a generic data model, by definition.

Depends how you state the problem, really. I would see it as following:

A handset can have an arbitrary number of features
A feature can be implemented in many handsets
The set of features is not static. This is the one that nails it for me, to be honest : a feature may exist without ever being implemented by any handset.

I don't see that there's any question as to whether it's an entity or not, the only question is whether it can be sensibly denormalised. I would argue not.

I understand that early databases made a copy of the whole table whenever the structure was changed. But that is not the case with modern database systems. It costs nothing to do that.

Whether or not you make a copy of the table, there is a cost to adding columns to a table. If you have a significant amount of data, that cost is significant. Like I said, any half-way decent DBA would go ballistic at the mere mention of such a concept as part of the workings of a system.

I'm actually of the opinion that I'm being trolled. If I am, excellent, because it's funny.

Simon

ammoQ · 2006-04-22 Reply Admin

tufty:

(1NF) Did I ever claim it was? All I ever claimed was that it was no less normalised than pulling an n:m relationship down to redundant data in one or other of the tables.

So which normal form is broken by the design I propose?

Sensible database design generally starts with (at least) 3NF, which is where you stop having redundant data, and then gets denormalised from there for performance reasons.

Aggreed.

Depends how you state the problem, really. I would see it as following:
A handset can have an arbitrary number of features
A feature can be implemented in many handsets
The set of features is not static. This is the one that nails it for me, to be honest : a feature may exist without ever being implemented by any handset.
I don't see that there's any question as to whether it's an entity or not, the only question is whether it can be sensibly denormalised. I would argue not.

IMO, the term "feature" is very close to the term "attribute", which indicates a generic data model. Some "features" are relatively concrete (like "Java" or "camera"), but some features are very abstract (like "handicapped accessible"). A feature, by itself, has nearly no attributes.

Whether or not you make a copy of the table, there is a cost to adding columns to a table. If you have a significant amount of data, that cost is significant.

In Oracle: no, it isn't. The cost of adding a column is nearly zero, no matter how many records the table has. In other database systems, that may be different. And well, new features are not added 1000 times a day. Maybe one feature every other month or so.

I'm actually of the opinion that I'm being trolled. If I am, excellent, because it's funny.

When I made that system, I used the three-table model just like you would. But looking back, considering all I have read and learned since, I think the one-table model is the "right thing" to do. It would definitely be the right thing if there were, say, 1000000 handsets to consider. Given a database, of a few hundered, 2000 at most, handsets, even XML would do.
But even if you think you are trolled, consider it an excecise in argueing.

tufty · 2006-04-22 Reply Admin

ammoQ:
tufty:

(1NF) Did I ever claim it was? All I ever claimed was that it was no less normalised than pulling an n:m relationship down to redundant data in one or other of the tables.

So which normal form is broken by the design I propose?

All of them :)

Anyway, I do see your arguments, and it's true that this is a bit of an edge case that can be seen one way or the other. Yes, features are close to attributes, they have very few attributes until you start getting more specific, at which point they stop being "features" as such. For example, you might have a feature of "colour screen", but in reality that breaks down to a set of attributes describing screen capabilities, and depending on application you could model that either as attributes or as a separate entity.

The problem that happens with denormalising is that you are locking yourself into a particular approach, and that changes to the data you've denormalised make all sorts of nasty knock-on effects. Believe me, I've been there. People have a horrible tendency to program to the schema, if that gets fluid all sorts of things tend to get b0rked.

As for cost of adding columns, there's always a non-zero cost, regardless of database. You have to link in extra storage for every row in the table being modified, set up the default value, etc. Not sure about Oracle, not used it for 7 or 8 years, but I sincerely doubt they have optimised it for schema modifications, which are supposed to be a fairly rare event. Anyway, as you say, it's not like new features come out hundreds of times a day, so feh.

But even if you think you are trolled, consider it an excecise in argueing.

And a fun one. Always nice to disagree with people who can be civil :)

Funnily enough, either side of the argument could be considered a WTF.
normalised : WTF? You're just ramming any old shit in that table, they should be attributes.
denormalised : WTF? You change your database schema every time a new phone feature comes out?

Simon

emurphy · 2006-04-22 Reply Admin

Anonymous:

Joel Splosky talks about this, programmers are often like cavemen because we tend to count "zero, one, many". Sometimes two or three is just the right number! This design is just someone chanting "many! many!" and refusing to commit to a practical, concrete, maintanable design.

Actually, I work with one accounting system that allows you to add an unlimited (1) number of user-defined data fields (2) to any (3a) table, and define (3b) how they get copied from one table to another during various procedures. However, the system will only store the data and let you use it in reports/queries, you can't directly muck around with the business logic without getting a programmer (4) involved.

(1) Shorthand for "limit so large that any client wishing to exceed it should really hire a programmer"
(2) Distinguished from standard data fields by a mandatory UDF_ prefix in internal storage
(3) For the new-style modules (currently something like 6, steadily being expanded over the next several years); the old-style modules only have them for a few of the most important tables, and copy amongst pairs of those tables iff both have an identically named UDF
(4) Of some stripe or other; they've recently bolted on provisions for VBScript and COM, to appease the users of same

2006-04-23 Reply Admin

Hey, don't dis lisp!

Look I wrote a GTK+ wrapper and UI design language much like lisp for my 3d modeler:

http://icculus.org/freyja/

Sure I'm the only one that can use the uber customizable UI language, but I only have to edit a few lines compared to dozens upon dozens of lines in C/C++. Also the beauty here is you can call specific C modules functions from mlisp. Ahhh... and I even ported it to C# for giggles. You may not love lisp, but it loves you. Also think of all the games that will be using haskell for next gen systems, and all the older games using lisp before that. =)

2006-04-23 Reply Admin

This anti-pattern is far more common than most people want to admit, the real name for it is "rules engine".

2006-04-23 Reply Admin

This is a serious question. Do people still code in WAP/WML? I remember that being the technology of choice back in what...1999? Is it still around?

2006-04-23 Reply Admin

Anonymous:
This is a serious question. Do people still code in WAP/WML? I remember that being the technology of choice back in what...1999? Is it still around?

Not really. Wireless web development is moving towards XHTML

2006-04-23 Reply Admin

Coo, this almost looks like the way in which the Remedy AR System organises it's database.

2006-04-23 Reply Admin

In a system I worked on a number of years ago, (done in foxpro using native tables) we had a large and very unstable system that the Boss decided needed to be more flexable so a few areas were set aside where clients could customize a form with drag and drop fileds including combos ect.
This then created a randomly named table with radnomly named fields. All fields and had to be able to be reported on through our custom report writer easliy by they client.
This was initially designed to be just a few fields on each page for doing simple surveys of customers. However some time down the track we discovered that we needed to extend the functionality to hadle two tables (foxpro has a limit of I think 256 fields per table or somthing similer) and one client had overflowed this table.
By the time I left the company there were some clients that were into their third table of custom fields. Makes me wonder if they were using any of the functionality that we had actually put into the main programme.

2006-04-23 Reply Admin

<QUOTE>

How, by forcing everybody to learn Lisp in order to use EMACS?

</QUOTE>

Forced to learn Lisp !

Lisp Rules. Every programmer should know lisp, if only so they can complain that every other language is jsut not expressive enough.

granted there is the slight problem that every half decent implementation is either not portabe, not standards complient, lacks library support for what you want to do, has ridiculs license fees (in some cases all four apply).

Should anyone ever solves the above 4 (minor) problems I wouldn't use anything else.

regards

Konrad.

PS: I know I got the CAPTCHA right, it seems that it will never accept a message on the first try.

b1xml2 · 2006-04-23 Reply Admin

Anonymous:
I'm currently working on a web application that is intended to function as an electronic form processor to capture proof-of-claim data. Form metadata is used to define form content, and workflow. So all of the captured form data is stored as key value pairs in a generic "data" table. This allows us to easily add or remove questions from a form without having to add a new column to a data table. It also allows us to store all of the data in single table, rather than requiring a new table for each form, which is problematic because admin users can add a new form at any time. However captured data is the *only* portion of the data model that is generic. The relationships between forms and questions, questionTypes, etc. are all concrete and well-defined.

So I think it's wrong to say that generic data models aren't useful. They *are* useful. However you have to know when they are appropriate.

They *are* useful up to a point and when one wants to perform reasonably complex activities, they cause 2 things to occur:

It takes longer and longer to make substantial code ans structural changes.
It will be extremely costly when the system breaks down altogether and some third party is invited to come in, clean up the mess and implement a proper solution based on time-tested principles and design methodologies.

So, unless your application is for the most trivial of tasks, there will inevitably be a lot of WTF code in your code base to provide workarounds which will in turn require another set of workaround.

Put it simply, applications fail when the people behind them fail, fail to grasp what it takes to write cogent and successful applications.

Point to note: Having ALL your data in one table means you have not understood the concept of using a relational database, plus the associated performance benefits and the decoupling of the UI layer from the data layer. Having everything in ONE TABLE gives me more than enough evidence to indicate that the design and implementation you posit is one big WTF!

2006-04-23 Reply Admin

Anonymous:
<quote></quote>

How, by forcing everybody to learn Lisp in order to use EMACS?

Emacs Lisp: All the beauty of Lisp, without any of that static scoping and tail-recursion silliness!

Really, if you're going to use an editor to evangelize Lisp, at least make it a decent Lisp. Stallman was at MIT at the same time as Guy Steele and talked with him regularly. How did he get it so wrong?

2006-04-23 Reply Admin

Anonymous:
Not really. Wireless web development is moving towards XHTML

Phew, that's a relief. I had one job where I had to code in that ridiculous language and it was obvious that it was going nowhere.

2006-04-24 Reply Admin

Unequivocally yes. It is an excellent application of description logics.

I would recommend using Common Lisp with Loom (a popular knowledge representation system) to construct your Enterprithe Ruleth Engine.

2006-04-24 Reply Admin

The (car ...) and (cdr ...) operators are used for list operations -- in modern lisp programming style they are reserved for destructive operations.

However, common lisp possesses vectors, arrays and hash tables as well.

Though I suspect your post was facetious (and I certainly took it that away), I would like to point you toward the database options available on an "enterprise" common lisp platform:

http://www.franz.com/products/database_tools/

2006-04-24 Reply Admin

Anonymous:
Lisp Rules. Every programmer should know lisp, if only so they can complain that every other language is jsut not expressive enough.

Incidentally, that is all LISP programmers are good for. Infighting. bitching about how superior their language is, and claiming that everyone should use it. Hey, how old is it, 40 years? If it is so awesomely incredible, why isn't everyone using it by now. Why do you get superceded in popularity and industry adoption by just about every other programming paradigm invented since?

I suspect there has to be something wrong either with the language, or the people evangelizing it.

2006-04-24 Reply Admin

VGR:
As I mentioned just a few days ago, this anti-pattern already has a name: Second System Sydrome.

That's where after you've done a simple, direct version of something, you come up with dreams of creating a Grand Unification of Everything, a program that does everything by being totally generic at every level.

Not quite. The "second system effect" is the general problem of over-designing the second-time around to be overly general. The specific WTF here is using a relational database to implement a .. relational database.<o:p></o:p>

<o:p> </o:p>
As an off note, a then-colleague of mine was proposing and promoting something very similar to this around 10 years ago. When asked by other co-workers for my opinions on the idea, the best that I could say was that I had heard of it. <o:p>
</o:p>

Look at it this way: in this design, you are able to create "tables with "fields" having "types" at will using this design. But you can also do that by issuing "CREATE table" statements and the like at runtime. And it would be simpler, and faster, and would take advantage of all kinds of database optimizations. Not to mention that table joins are loads easier.<o:p></o:p>

ammoQ · 2006-04-24 Reply Admin

Anonymous:
Not to mention that table joins are loads easier.<o:p></o:p>

Joins? Hell, even a basic query like "select x,y,z from blablubb where a=1 and b=2 and (c=3 or d=4)" is a pain in the "database on top of a database".

2006-04-24 Reply Admin

xxxxxx

ParkinT · 2006-04-24 Reply Admin

Cooper:
Alex Papadimoulis:

<SNIP>

They're more or less the converse of

<SNIP>

I call it the Inner-Platform Effect.

<SNIP>

Um - excuuse me. KEDS were the ones with the inner platform.

WOW.

You must be really old ! [Y]

DZ-Jay · 2006-04-24 Reply Admin

VGR:
As I mentioned just a few days ago, this anti-pattern already has a name: Second System Sydrome.

That's where after you've done a simple, direct version of something, you come up with dreams of creating a Grand Unification of Everything, a program that does everything by being totally generic at every level.

Although this particular anti-pattern could be driven by the Second System Effect, it is not necessarily so. The Second System Effect, as theorized by Fred Brooks, does not necessarily involve a program that "does everything by being totally generic at every level", but a program that contains everything the designers, programmers, marketeers, managers, users, and all involved ever wanted to include in the project, but felt restricted in the first release. Its more akin to feature-creep than to generalization, by lacking in the pragmatic restraint that is usually practiced when an inexperienced group is creating something completely new for the first time.

You see the same thing with most movie sequels, where more money is spent in making everything bigger, louder, flashier, fancier, without regards to what made the original so successful without such grandious excesses.

-dZ.

2006-04-24 Reply Admin

FYI, I first heard of this page on the wikipedia page about antipatterns.

http://en.wikipedia.org/wiki/Antipattern

...and my world has never been the same.

2006-04-24 Reply Admin

Point to note: Having ALL your data in one table means you have not understood the concept of using a relational database, plus the associated performance benefits and the decoupling of the UI layer from the data layer. Having everything in ONE TABLE gives me more than enough evidence to indicate that the design and implementation you posit is one big WTF!

Did you read what I said? I said all of the captured form data is in one table. That's a very small subset of all the data in the system. All of the metadata for defining a form (and proof-of-claim) is 3NF spread across some 25-odd tables. If I'd tried to create table for each form with one column per question, it would have failed to meet several business requirements. It would have been a maintenance nightmare, in that every time a user updated a form layout (adding or deleting questions, changing flow), the system would have to have generated a DBA request which could take 5-10 business days to be completed.

What happens when the user decides to drop a question from a table and that column has data? Legally we can't just delete their data as it pertains to a proof-of-claim for an insurance company in liquidation. With my system, we can just remove the question from the form. Existing data will be retained, and life goes on. With a dynamic DDL system, you're hosed. You can't drop the column because it contains data that you have a legal obligation to retain. So you basically have to disallow removal of questions from forms, which means you don't meet one of the core business requirements that was specified at the outset.

The actual table for storing the data looks something like:

create table question_value
(
question_value_id numeric(10) primary key
, poc_id numeric(10) not null
, question_id numeric(10) not null
, receiver_value varchar2(1000)
, user_value varchar2(1000)
, reason varchar2(1000)
, instance numeric(10) not null
, constraint fk1_question_value foreign key (poc_id) references poc(poc_id)
, constraint fk2_question_value foreign key (question_id) references question(question_id)
);

I don't see a problem with this structure. Querying is simple. Want to find out how a user answered the question named 'claimant_type'?

select user_value
from question_value qv, question q
where
qv.question_id = q.question_id
and qv.poc_id = ?
and q.name = 'claimant_type'
;

Want to know how many of each claimant_type have been captured?

select distinct(user_value), count(*)
from question_value qv, question q
where
qv.question_id = q.question_id
and q.name = 'claimant_type'
group by user_value
;

Seriously I don't see a problem with storing answers made to questions in a single table with an 'value' field. It flows directly from the entity model that was created for the business rules. Questions can be added to and deleted from forms easily (IE without having to do a DBA-request), querying is fast, and it has the flexibility to handle arbitrary questions on each form, and it's transaction safe.

I don't see how you can claim it's a WTF

ammoQ · 2006-04-24 Reply Admin

Dave:

What happens when the user decides to drop a question from a table and that column has data? Legally we can't just delete their data as it pertains to a proof-of-claim for an insurance company in liquidation. With my system, we can just remove the question from the form. Existing data will be retained, and life goes on. With a dynamic DDL system, you're hosed. You can't drop the column because it contains data that you have a legal obligation to retain. So you basically have to disallow removal of questions from forms, which means you don't meet one of the core business requirements that was specified at the outset.

Sorry Dave, but this argument is pathetic. Nothing, absolutely nothing prevents a DDL system from keeping the column while hiding the question. One issue to consider with a DDL system is the limit of columns per table.

2006-04-24 Reply Admin

ammoQ:
Dave:

What happens when the user decides to drop a question from a table and that column has data? Legally we can't just delete their data as it pertains to a proof-of-claim for an insurance company in liquidation. With my system, we can just remove the question from the form. Existing data will be retained, and life goes on. With a dynamic DDL system, you're hosed. You can't drop the column because it contains data that you have a legal obligation to retain. So you basically have to disallow removal of questions from forms, which means you don't meet one of the core business requirements that was specified at the outset.

Sorry Dave, but this argument is pathetic. Nothing, absolutely nothing prevents a DDL system from keeping the column while hiding the question. One issue to consider with a DDL system is the limit of columns per table.

Dynamically defined data is sometimes the right way to do some things, particularly in cases like what Dave mentioned where the set of values captured changes a lot. It has serious drawbacks with regards to efficiency and ease of querying, but sometimes it is the right way.<o:p></o:p>

<o:p> </o:p>

However the original WTF is doing the *entire database* this way.<o:p></o:p>

2006-04-24 Reply Admin

Sorry Dave, but this argument is pathetic. Nothing, absolutely nothing prevents a DDL system from keeping the column while hiding the question. One issue to consider with a DDL system is the limit of columns per table.

Yes, you can simply hide the column but then, you have an even bigger problem is that your data model no longer accurately reflects the state of the system. One of the advantages of using DDL to create a table to represent a form over using a generic data model, is that there is a 1-1 correspondence between questions that appear on the form and columns in the table. When you break that correspondence by introducing columns that don't have questions on the form you lose this advantage.

Second, by constraining yourself to a 1-1 relationship between column and question, you disallow the scenarios where questions may appear multiple times on a form. For instance you might have a list of beneficiaries on a form: beneficiary first name, beneficiary last name. If you want to support multiple instances of this question on a single form you have to use some sort of name-mangling scheme, or even worse, use a generated name for the column and do a question->column mapping through a lookup table.

Once you start adding in all that crap to handle questions that were removed, multiple instances of questions, etc, you lose almost all of the advantages of having a DDL-based system, and indeed end up in a worse position from a maintenance/complexity standpoint. Because of the question->column mapping, you can't directly query the tables in any meaningful way, and you have to design a system for maintaining that metadata.

2006-04-24 Reply Admin

It has serious drawbacks with regards to efficiency and ease of querying, but sometimes it is the right way.

Can you give a few examples of these deficiencies?

Gene Wirchenko · 2006-04-24 Reply Admin

Anonymous:
First of all - the real WTF here is - you can't use Firefox to write messages in thedailywtf.com.

I do not use anything else but Firefox to post messages here.

Others have posted similarly.

Sincerely,

Gene Wirchenko

Grovesy · 2006-04-24 Reply Admin

Bit of a spin on First Normal Form there!<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>

C.J Date. An Introduction to Database Systems (Eighth Edition). Page 358<o:p></o:p>

A relvar is in 1NF and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute. <o:p></o:p>

This basically states that ever relvar is in 1NF. It is perfectly legal to define attributes to be of type ‘Xml Document’, vector, 3D Point, or even another relation, the Relvar will be in 1NF as long as the value in every tuple is legal for the attribute-type. (Which is one of the basis of the Null = Prohibited argument, a relvar with an attribute of type INTEGER will only contain legal INTEGER values in every tuple for that attribute. Null is not an Integer and thus any tuple contain a Null marker is not in 1NF) <o:p></o:p>

Anyway, whether defining an attribute to be of a type ‘Xml Document’ or a vector is good design is another thing. <o:p></o:p>

Bear in mind a character string, or even an Integer are still reducible!<o:p></o:p>

felix · 2006-04-24 Reply Admin

RevEng:
If a customer wants a CMS that's completely generic and customizable, give them SQL (or your DB of choice). If they want a program that can perform anything they ask of it, give them a compiler. You can't just slap a pretty GUI on something and expect it to be easier and just as powerful and you can't expect to write a large application without knowing anything about programming.

With knowledge comes power, and with power comes responsibility. That's why we learn all these damn languages in the first place. I take it as an insult (and complete lack of understanding) for any customer to ask for a program to replace the job of the programmer.

I couldn't agree more. Unfortunately, there's a big BUT. You see, the kind of customers who want their programs to do everything don't want to learn how to program. Actually, they don't want to learn anything. That's the whole point, those people aren't looking for a tool, but a secretary. Preferably one who can guess their thoughts :D

felix · 2006-04-24 Reply Admin

Anonymous:

By the time I left the company there were some clients that were into their third table of custom fields. Makes me wonder if they were using any of the functionality that we had actually put into the main programme.

Of course they weren't. See my previous post ("Functionality? What functionality? I just want the software to... what do you mean it already does? Oh, the manual... I haven't read it."). They never do :(

And by the way, what the heck were they doing with over 512 fields?

ammoQ · 2006-04-24 Reply Admin

Dave:
Sorry Dave, but this argument is pathetic. Nothing, absolutely nothing prevents a DDL system from keeping the column while hiding the question. One issue to consider with a DDL system is the limit of columns per table.

Yes, you can simply hide the column but then, you have an even bigger problem is that your data model no longer accurately reflects the state of the system. One of the advantages of using DDL to create a table to represent a form over using a generic data model, is that there is a 1-1 correspondence between questions that appear on the form and columns in the table. When you break that correspondence by introducing columns that don't have questions on the form you lose this advantage.

This "advantage" is so tiny that it isn't worth to mention it, IMO.

Second, by constraining yourself to a 1-1 relationship between column and question, you disallow the scenarios where questions may appear multiple times on a form. For instance you might have a list of beneficiaries on a form: beneficiary first name, beneficiary last name. If you want to support multiple instances of this question on a single form you have to use some sort of name-mangling scheme, or even worse, use a generated name for the column and do a question->column mapping through a lookup table.

Aggreed, 1:n relationships should be treated as such: with a second table. But, well, this could be done dynamically as well; especially if this second table has more than one non-key column.

Once you start adding in all that crap to handle questions that were removed, multiple instances of questions, etc, you lose almost all of the advantages of having a DDL-based system, and indeed end up in a worse position from a maintenance/complexity standpoint. Because of the question->column mapping, you can't directly query the tables in any meaningful way, and you have to design a system for maintaining that metadata.

As long as I do it correctly, I do not use the most important advantage of using the database as it should be used: Performance.

It has serious drawbacks with regards to efficiency and ease of querying, but sometimes it is the right way.
Can you give a few examples of these deficiencies?

Query: List owners of a house have more than one SUV car and more than 2 sons born between 1990 and 2000?
Database used as it should be:

select first_name, last_name, email from persons
 where home_type = 'HOUSE'
   and owns_home = 'Y'
   and 2<=(select count(*) from cars
             where owner_id=persons.id
               and car_type='SUV')
   and 2<(select count(*) from childs
             where parent_id=persons.id
               and gender='M'
               and birtdate between to_date('1990-01-01', 'yyyy-mm-dd')
                                and to_date('2000-12-31', 'yyyy-mm-dd'));

Now let's see how this looks like with the generic data modell...

select fnv.user_value as first_name, lnv.user_value as last_name, ev.user_value as email
  from question fnq, question_value fnv,
       question lnq, question_value lnv,
       question eq, question_value ev,
       question htq, question_value htv,
       question ohq, question_value ohv
   where fnv.question_id = fnq.question_id and fnq.name='first_name'
     and lnv.question_id = lnq.question_id and lnq.name='first_name'
     and ev.question_id = eq.question_id and eq.name='email'
     and htv.quesion_id=htq.question_id and htq.name='home_type'
     and ohv.qustion_id=ohq.question_id and ohq.name='owns_home'
     and fnv.poc_id = lnv.poc_id
     and fnv.poc_id = ev.poc_id
     and fnv.poc_id = htv.poc_id
     and fnv.poc_id = ohv.poc_id
     and htv.user_value='HOUSE'
     and ohv.user_value='Y'
     and 2<= (select count(*) from question cq, question_value cv
                where cv.question_id = cq.question_id and cq.name='car_type'
                  and cv.user_value = 'SUV'
                  and fnv.poc_id = cv.poc_id)
     and 2< (select count(*) from question cgq, question_value cgv,
                                  question cbq, question_value cbv 
               where cgv.question_id = cgq.question_id and cgq.name='child_gender'
                 and cbv.question_id = cbq.question_id and cbq.name='child_birthday'
                 and fnv.poc_id = cgv.poc_id
                 and cgv.poc_id = cbv.poc_id
                 and cgv.instance = cbv.instance
                 and cgv.value = 'M'
                 and cbv.value between '19900101' and '20001231');

Maurits · 2006-04-24 Reply Admin

ammoQ:

Query: List owners of a house have more than one SUV car and more than 2 sons born between 1990 and 2000?
Database used as it should be:

select first_name, last_name, email from persons
 where home_type = 'HOUSE'
   and owns_home = 'Y'
   and 2<=(select count(*) from cars
             where owner_id=persons.id
               and car_type='SUV')
   and 2<(select count(*) from childs
             where parent_id=persons.id
               and gender='M'
               and birtdate between to_date('1990-01-01', 'yyyy-mm-dd')
                                and to_date('2000-12-31', 'yyyy-mm-dd'));

Good example... but the join is perhaps a little much. Here's a join-less version:

select
*
from
persons
where
home_type = 'HOUSE' and
owns_home = 'Y' and
id in
(

select
cars.owner_id
from
cars
where
cars.car_type = 'SUV'
group by
cars.owner_id
having
count(*) >= 2
) and
id in
(

select
child.parent_id
from
persons as child
where
child.gender = 'M' and
child.birthdate between
to_date('1990-01-01', 'yyyy-mm-dd') and
to_date('2000-12-31', 'yyyy-mm-dd')
group by
child.parent_id
having
count(*) > 2
);

The Inner-Platform Effect

Leave a comment on “The Inner-Platform Effect”