- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Not exactly a new antipattern...
"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
- Greenspun's Tenth Rule
"...including Common Lisp."
- RTM's Corollary to Greenspun's Tenth Rule
Admin
Now, this is only partly true. Take a reporting system for example. It's sole use is to gather data from other systems and produce reports and graphs in various formats, and display them via various devices (send them in email or MMS)
You don't want to fiddle with the code every time a new report comes in. Obviously you want to use some dinamic sql to create a new report or to handle a new data source which in turn obviously requires someone who can construct those sql statements but one who doesn't have to know JAVA or the intricacies of the actual system.
That's cool and it's working. Now, the posted example is neither cool nor working. :)
Admin
Actually, they're not that long...
http://lib.store.yahoo.net/lib/paulgraham/jmc.lisp
Admin
Funny, I had to write exactly the same program some years ago. :)
Admin
PHP example: eval($code);
bash scripting: eval "$code";
(insert other scripting languages here)!
Admin
You miss the important difference between "generic data model" and "dynamic sql". There is nothing wrong about a system that lets the user create reports based on the existing database.
Admin
Correctly normalised doesn't always mean "easy to query". But it does mean you don't have to add a new column and the corresponding code to handle it every time a new feature comes out.
Simon
Admin
You kinda missed the point, which actually was to implement
eval
, or even an REPL, without using any built-in eval function. In Lisp, that can be done in a dozen of lines (and that's actually how the first Lisp interpreter was built, more or less)Usual Eval functions in later languages makes supplementary calls to the whole interpreter, it doesn't interpret the code by itself.
Which is fine, but shows the simplicity of Lisp.
Admin
Adding a column is easy. That's what databases are made for. Changing the code is not necessary - just use dynamic SQL to make the appropriate query.
Admin
Um - excuuse me. KEDS were the ones with the inner platform.
Admin
<sigh>
Okay, let's go through this again as I understand it.
A phone has a number of features. The set of features a phone could have is not fixed, and indeed, could be described as "very flexible"; manufacturers come out with new features all the time. So, we have 2 options:
Firstly we could model the phone as one entity and the feature as another. Something like this:
<font size="2">create table handset (
id int not null unique primary key,
name varchar(255)
manufacturer_id int not null references manufacturer(id) on delete cascade
)
create table function (
id int not null unique primary key,
name varchar(255)
)
create table handset_function (
handset_id int references handset(id) on delete cascade,
function_id int references function(id) on delete cascade
)
</font>
Now, that allows us to add arbitrary functions and map them to arbitrary handsets, on the fly, without changing the database structure, but it's a bit of a bugger to do big arbitrary queries on.
Alternatively
<font style="font-family: Courier New;" size="2">create table handset (
id int not null unique primary key,
name varchar(255)
manufacturer_id int not null references manufacturer(id) on delete cascade,
has_wap smallint default null,
has_downloadable_ringtones smallint default null,
etc etc
)
</font>
Which is easily queried, but every time a new function comes out you need to change the structure of the table. Dunno who you've worked for in the past, but the DBAs where I worked would have had me hung, drawn and quartered for such a suggestion.
We might be able to improve on it a bit by having a bitfield encoding the various features, but you still run up against having to change the table when you run out of space in your bitfield. And it would foul up your queries, too.
Then there's the code. Sure, your query can be almost fully automated using dynamic SQL (although dynamically generated SQL is likely to be an awful lot less efficient than hand-crafted queries), but at some point you're going to have to ask your users what features a phone has, and they are going to want to query based on those features. And unless you want to be providing actual column names to the users, you want some sort of human-readable name to be displayed. Which means that yes, you are going to have to change your code, if only to map handset.predictive_texting to "Predictive Texting". Yes, there's ways around that too, but you start reaching the point where it's more complex and difficult than doing it right.
This is not to say that the denormalised approach would not be a reasonable one were the feature set fixed, just that as you described it, it would be more work than using normalised data and spending some time on your queries.
As it stands, you're probably going to want to use UNION, INTERSECT and EXCEPT.
Simon
Admin
That's not a technical problem. It's a DBA problem. The only drawback of an alter table is (at least in Oracle) that stored procedures referencing that table will recompile.
A bitfield is denormalized, IMO not a good idea.
In that case, the dynamically generated query is very simple if we have one table, and a lot more complex if we have three tables. Since we don't know in advance which featureset users will ask for, it's dynamic SQL anyway.
I see no problem if the program automatically adds the necessary column whenever a new feature comes; there are several ways to keep the human readable names; and of course the users will make their queries in a query-by-example fashion.
Or, to put it in other words, from the outside, the users see no difference between the one-table-model and the (generic) three-table-model, except the speed.
</sigh>
Admin
That gives me an idea :)
I'm thinking about codeing a database wrapperclass which using another databaseclass as backend with this structure.
Then it's possible to nest this in several steps... :) a database in a database in a database in a database in mysql.
starts codeing
Admin
If you look at eCommerce systems like MS Commerce Server, then tend to use mapper-style solutions.
MSCS manages the products table, add and removing columns based on how you define your product properties. If you look at the tables it generates, there is one for common properties (id, mfg#) and one for properties that are culture (language) specific (description). A search system is also provided, but only one join is needed for a search.
It does have limitations, your database must support a lot of fields in the same db if you have a lot of properties.
Admin
Might I humbly suggest you read up on normalised forms? http://en.wikipedia.org/wiki/Database_normalization would be a good place to start. As a hint, pulling a many-many relationship down to attributes on one of the tables is denormalised as well. Exactly as denormalised as storing those values in a bitfield. All that pulling those attributes into a bitfield would change is the representation within the table, not the normalisation of the database structure.
Sure, you might be able to make your queries faster by denormalising, but you can only sensibly denormalise if the relationship being denormalised is static on one end, and even then you lose the benefits of integrity constraints.
Fuck me. I can't believe you suggested dynamically adding columns to a running database as a sensible solution.
Simon
Admin
Good idea...
What is 1NF?
Given that definition, a bitfield is not 1NF.
I think we disagree wheter a "feature" is an entity or not. IMO, it's not. Each "feature" corresponds to a boolean column. Having a features table and a m:n relationship is a generic data model, by definition.
Why not? I understand that early databases made a copy of the whole table whenever the structure was changed. But that is not the case with modern database systems. It costs nothing to do that.
Admin
Sensible database design generally starts with (at least) 3NF, which is where you stop having redundant data, and then gets denormalised from there for performance reasons.
Depends how you state the problem, really. I would see it as following:
- A handset can have an arbitrary number of features
- A feature can be implemented in many handsets
- The set of features is not static. This is the one that nails it for me, to be honest : a feature may exist without ever being implemented by any handset.
I don't see that there's any question as to whether it's an entity or not, the only question is whether it can be sensibly denormalised. I would argue not.Whether or not you make a copy of the table, there is a cost to adding columns to a table. If you have a significant amount of data, that cost is significant. Like I said, any half-way decent DBA would go ballistic at the mere mention of such a concept as part of the workings of a system.
I'm actually of the opinion that I'm being trolled. If I am, excellent, because it's funny.
Simon
Admin
So which normal form is broken by the design I propose?
Aggreed.
IMO, the term "feature" is very close to the term "attribute", which indicates a generic data model. Some "features" are relatively concrete (like "Java" or "camera"), but some features are very abstract (like "handicapped accessible"). A feature, by itself, has nearly no attributes.
In Oracle: no, it isn't. The cost of adding a column is nearly zero, no matter how many records the table has. In other database systems, that may be different. And well, new features are not added 1000 times a day. Maybe one feature every other month or so.
When I made that system, I used the three-table model just like you would. But looking back, considering all I have read and learned since, I think the one-table model is the "right thing" to do. It would definitely be the right thing if there were, say, 1000000 handsets to consider. Given a database, of a few hundered, 2000 at most, handsets, even XML would do.
But even if you think you are trolled, consider it an excecise in argueing.
Admin
All of them :)
Anyway, I do see your arguments, and it's true that this is a bit of an edge case that can be seen one way or the other. Yes, features are close to attributes, they have very few attributes until you start getting more specific, at which point they stop being "features" as such. For example, you might have a feature of "colour screen", but in reality that breaks down to a set of attributes describing screen capabilities, and depending on application you could model that either as attributes or as a separate entity.
The problem that happens with denormalising is that you are locking yourself into a particular approach, and that changes to the data you've denormalised make all sorts of nasty knock-on effects. Believe me, I've been there. People have a horrible tendency to program to the schema, if that gets fluid all sorts of things tend to get b0rked.
As for cost of adding columns, there's always a non-zero cost, regardless of database. You have to link in extra storage for every row in the table being modified, set up the default value, etc. Not sure about Oracle, not used it for 7 or 8 years, but I sincerely doubt they have optimised it for schema modifications, which are supposed to be a fairly rare event. Anyway, as you say, it's not like new features come out hundreds of times a day, so feh.
And a fun one. Always nice to disagree with people who can be civil :)
Funnily enough, either side of the argument could be considered a WTF.
normalised : WTF? You're just ramming any old shit in that table, they should be attributes.
denormalised : WTF? You change your database schema every time a new phone feature comes out?
Simon
Admin
Actually, I work with one accounting system that allows you to add an unlimited (1) number of user-defined data fields (2) to any (3a) table, and define (3b) how they get copied from one table to another during various procedures. However, the system will only store the data and let you use it in reports/queries, you can't directly muck around with the business logic without getting a programmer (4) involved.
(1) Shorthand for "limit so large that any client wishing to exceed it should really hire a programmer"
(2) Distinguished from standard data fields by a mandatory UDF_ prefix in internal storage
(3) For the new-style modules (currently something like 6, steadily being expanded over the next several years); the old-style modules only have them for a few of the most important tables, and copy amongst pairs of those tables iff both have an identically named UDF
(4) Of some stripe or other; they've recently bolted on provisions for VBScript and COM, to appease the users of same
Admin
Hey, don't dis lisp!
Look I wrote a GTK+ wrapper and UI design language much like lisp for my 3d modeler:
http://icculus.org/freyja/
Sure I'm the only one that can use the uber customizable UI language, but I only have to edit a few lines compared to dozens upon dozens of lines in C/C++. Also the beauty here is you can call specific C modules functions from mlisp. Ahhh... and I even ported it to C# for giggles. You may not love lisp, but it loves you. Also think of all the games that will be using haskell for next gen systems, and all the older games using lisp before that. =)
Admin
This anti-pattern is far more common than most people want to admit, the real name for it is "rules engine".
Admin
This is a serious question. Do people still code in WAP/WML? I remember that being the technology of choice back in what...1999? Is it still around?
Admin
Not really. Wireless web development is moving towards XHTML
Admin
Coo, this almost looks like the way in which the Remedy AR System organises it's database.
Admin
In a system I worked on a number of years ago, (done in foxpro using native tables) we had a large and very unstable system that the Boss decided needed to be more flexable so a few areas were set aside where clients could customize a form with drag and drop fileds including combos ect.
This then created a randomly named table with radnomly named fields. All fields and had to be able to be reported on through our custom report writer easliy by they client.
This was initially designed to be just a few fields on each page for doing simple surveys of customers. However some time down the track we discovered that we needed to extend the functionality to hadle two tables (foxpro has a limit of I think 256 fields per table or somthing similer) and one client had overflowed this table.
By the time I left the company there were some clients that were into their third table of custom fields. Makes me wonder if they were using any of the functionality that we had actually put into the main programme.
Admin
Admin
They *are* useful up to a point and when one wants to perform reasonably complex activities, they cause 2 things to occur:
- It takes longer and longer to make substantial code ans structural changes.
- It will be extremely costly when the system breaks down altogether and some third party is invited to come in, clean up the mess and implement a proper solution based on time-tested principles and design methodologies.
So, unless your application is for the most trivial of tasks, there will inevitably be a lot of WTF code in your code base to provide workarounds which will in turn require another set of workaround.Put it simply, applications fail when the people behind them fail, fail to grasp what it takes to write cogent and successful applications.
Point to note: Having ALL your data in one table means you have not understood the concept of using a relational database, plus the associated performance benefits and the decoupling of the UI layer from the data layer. Having everything in ONE TABLE gives me more than enough evidence to indicate that the design and implementation you posit is one big WTF!
Admin
Emacs Lisp: All the beauty of Lisp, without any of that static scoping and tail-recursion silliness!
Really, if you're going to use an editor to evangelize Lisp, at least make it a decent Lisp. Stallman was at MIT at the same time as Guy Steele and talked with him regularly. How did he get it so wrong?
Admin
Phew, that's a relief. I had one job where I had to code in that ridiculous language and it was obvious that it was going nowhere.
Admin
Unequivocally yes. It is an excellent application of description logics.
I would recommend using Common Lisp with Loom (a popular knowledge representation system) to construct your Enterprithe Ruleth Engine.
Admin
The (car ...) and (cdr ...) operators are used for list operations -- in modern lisp programming style they are reserved for destructive operations.
However, common lisp possesses vectors, arrays and hash tables as well.
Though I suspect your post was facetious (and I certainly took it that away), I would like to point you toward the database options available on an "enterprise" common lisp platform:
http://www.franz.com/products/database_tools/
Admin
Incidentally, that is all LISP programmers are good for. Infighting. bitching about how superior their language is, and claiming that everyone should use it. Hey, how old is it, 40 years? If it is so awesomely incredible, why isn't everyone using it by now. Why do you get superceded in popularity and industry adoption by just about every other programming paradigm invented since?
I suspect there has to be something wrong either with the language, or the people evangelizing it.
Admin
Not quite. The "second system effect" is the general problem of over-designing the second-time around to be overly general. The specific WTF here is using a relational database to implement a .. relational database.<o:p></o:p>
<o:p> </o:p>
As an off note, a then-colleague of mine was proposing and promoting something very similar to this around 10 years ago. When asked by other co-workers for my opinions on the idea, the best that I could say was that I had heard of it. <o:p>
</o:p>
Look at it this way: in this design, you are able to create "tables with "fields" having "types" at will using this design. But you can also do that by issuing "CREATE table" statements and the like at runtime. And it would be simpler, and faster, and would take advantage of all kinds of database optimizations. Not to mention that table joins are loads easier.<o:p></o:p>
Admin
Joins? Hell, even a basic query like "select x,y,z from blablubb where a=1 and b=2 and (c=3 or d=4)" is a pain in the "database on top of a database".
Admin
xxxxxx
Admin
WOW.
You must be really old ! [Y]
Admin
Although this particular anti-pattern could be driven by the Second System Effect, it is not necessarily so. The Second System Effect, as theorized by Fred Brooks, does not necessarily involve a program that "does everything by being totally generic at every level", but a program that contains everything the designers, programmers, marketeers, managers, users, and all involved ever wanted to include in the project, but felt restricted in the first release. Its more akin to feature-creep than to generalization, by lacking in the pragmatic restraint that is usually practiced when an inexperienced group is creating something completely new for the first time.
You see the same thing with most movie sequels, where more money is spent in making everything bigger, louder, flashier, fancier, without regards to what made the original so successful without such grandious excesses.
-dZ.
Admin
FYI, I first heard of this page on the wikipedia page about antipatterns.
http://en.wikipedia.org/wiki/Antipattern
...and my world has never been the same.
Admin
Did you read what I said? I said all of the captured form data is in one table. That's a very small subset of all the data in the system. All of the metadata for defining a form (and proof-of-claim) is 3NF spread across some 25-odd tables. If I'd tried to create table for each form with one column per question, it would have failed to meet several business requirements. It would have been a maintenance nightmare, in that every time a user updated a form layout (adding or deleting questions, changing flow), the system would have to have generated a DBA request which could take 5-10 business days to be completed.
What happens when the user decides to drop a question from a table and that column has data? Legally we can't just delete their data as it pertains to a proof-of-claim for an insurance company in liquidation. With my system, we can just remove the question from the form. Existing data will be retained, and life goes on. With a dynamic DDL system, you're hosed. You can't drop the column because it contains data that you have a legal obligation to retain. So you basically have to disallow removal of questions from forms, which means you don't meet one of the core business requirements that was specified at the outset.
The actual table for storing the data looks something like:
create table question_value
(
question_value_id numeric(10) primary key
, poc_id numeric(10) not null
, question_id numeric(10) not null
, receiver_value varchar2(1000)
, user_value varchar2(1000)
, reason varchar2(1000)
, instance numeric(10) not null
, constraint fk1_question_value foreign key (poc_id) references poc(poc_id)
, constraint fk2_question_value foreign key (question_id) references question(question_id)
);
I don't see a problem with this structure. Querying is simple. Want to find out how a user answered the question named 'claimant_type'?
select user_value
from question_value qv, question q
where
qv.question_id = q.question_id
and qv.poc_id = ?
and q.name = 'claimant_type'
;
Want to know how many of each claimant_type have been captured?
select distinct(user_value), count(*)
from question_value qv, question q
where
qv.question_id = q.question_id
and q.name = 'claimant_type'
group by user_value
;
Seriously I don't see a problem with storing answers made to questions in a single table with an 'value' field. It flows directly from the entity model that was created for the business rules. Questions can be added to and deleted from forms easily (IE without having to do a DBA-request), querying is fast, and it has the flexibility to handle arbitrary questions on each form, and it's transaction safe.
I don't see how you can claim it's a WTF
Admin
Sorry Dave, but this argument is pathetic. Nothing, absolutely nothing prevents a DDL system from keeping the column while hiding the question. One issue to consider with a DDL system is the limit of columns per table.
Admin
Dynamically defined data is sometimes the right way to do some things, particularly in cases like what Dave mentioned where the set of values captured changes a lot. It has serious drawbacks with regards to efficiency and ease of querying, but sometimes it is the right way.<o:p></o:p>
<o:p> </o:p>
However the original WTF is doing the *entire database* this way.<o:p></o:p>
Admin
Yes, you can simply hide the column but then, you have an even bigger problem is that your data model no longer accurately reflects the state of the system. One of the advantages of using DDL to create a table to represent a form over using a generic data model, is that there is a 1-1 correspondence between questions that appear on the form and columns in the table. When you break that correspondence by introducing columns that don't have questions on the form you lose this advantage.
Second, by constraining yourself to a 1-1 relationship between column and question, you disallow the scenarios where questions may appear multiple times on a form. For instance you might have a list of beneficiaries on a form: beneficiary first name, beneficiary last name. If you want to support multiple instances of this question on a single form you have to use some sort of name-mangling scheme, or even worse, use a generated name for the column and do a question->column mapping through a lookup table.
Once you start adding in all that crap to handle questions that were removed, multiple instances of questions, etc, you lose almost all of the advantages of having a DDL-based system, and indeed end up in a worse position from a maintenance/complexity standpoint. Because of the question->column mapping, you can't directly query the tables in any meaningful way, and you have to design a system for maintaining that metadata.
Admin
Can you give a few examples of these deficiencies?
Admin
I do not use anything else but Firefox to post messages here.
Others have posted similarly.
Sincerely,
Gene Wirchenko
Admin
Bit of a spin on First Normal Form there!<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
C.J Date. An Introduction to Database Systems (Eighth Edition). Page 358<o:p></o:p>
A relvar is in 1NF and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute. <o:p></o:p>
This basically states that ever relvar is in 1NF. It is perfectly legal to define attributes to be of type ‘Xml Document’, vector, 3D Point, or even another relation, the Relvar will be in 1NF as long as the value in every tuple is legal for the attribute-type. (Which is one of the basis of the Null = Prohibited argument, a relvar with an attribute of type INTEGER will only contain legal INTEGER values in every tuple for that attribute. Null is not an Integer and thus any tuple contain a Null marker is not in 1NF) <o:p></o:p>
Anyway, whether defining an attribute to be of a type ‘Xml Document’ or a vector is good design is another thing. <o:p></o:p>
Bear in mind a character string, or even an Integer are still reducible!<o:p></o:p>
Admin
I couldn't agree more. Unfortunately, there's a big BUT. You see, the kind of customers who want their programs to do everything don't want to learn how to program. Actually, they don't want to learn anything. That's the whole point, those people aren't looking for a tool, but a secretary. Preferably one who can guess their thoughts :D
Admin
Of course they weren't. See my previous post ("Functionality? What functionality? I just want the software to... what do you mean it already does? Oh, the manual... I haven't read it."). They never do :(
And by the way, what the heck were they doing with over 512 fields?
Admin
This "advantage" is so tiny that it isn't worth to mention it, IMO.
Aggreed, 1:n relationships should be treated as such: with a second table. But, well, this could be done dynamically as well; especially if this second table has more than one non-key column.
As long as I do it correctly, I do not use the most important advantage of using the database as it should be used: Performance.
Query: List owners of a house have more than one SUV car and more than 2 sons born between 1990 and 2000?
Database used as it should be:
Now let's see how this looks like with the generic data modell...
Admin
Good example... but the join is perhaps a little much. Here's a join-less version:
select
*
from
persons
where
home_type = 'HOUSE' and
owns_home = 'Y' and
id in
(
select
cars.owner_id
from
cars
where
cars.car_type = 'SUV'
group by
cars.owner_id
having
count(*) >= 2
) and
id in
(
select
child.parent_id
from
persons as child
where
child.gender = 'M' and
child.birthdate between
to_date('1990-01-01', 'yyyy-mm-dd') and
to_date('2000-12-31', 'yyyy-mm-dd')
group by
child.parent_id
having
count(*) > 2
);