The Daily WTF: Curious Perversions in Information Technology

Maurits · 2006-04-24 Reply Admin

It had tabs when I posted, I swear :( :( :(

The real WTF is (etc., etc.)

ammoQ · 2006-04-25 Reply Admin

Maurits:

Good example... but the join is perhaps a little much.  Here's a join-less version:

My version didn't use joins, either; just another type of subquery. And I think my version is faster ;-)

2006-04-25 Reply Admin

This reminds me of the Microsoft Installer (MSI), which prides itself that it is using "relational database technology", but which is basically a large bunch of enormous database tables trying to store programs (whereby the order of the program is determined by a column with line numbers). And text. And pictures. And dialogs with layout. And generic logic describing interdependencies between subsets of the to-be-installed package. Aaaargh! The full horror will become clear if you download the SDK (link is on the Wikipedia page) and look inside the .MSI files with the supplied "Orca" program (a database editor).

2006-04-25 Reply Admin

Anonymous:
(link is on the Wikipedia page)

Of course the link disappeared even though the dialog for this reply shows it again in the URL field. It is http://en.wikipedia.org/wiki/Microsoft_Installer .

2006-04-25 Reply Admin

That sounds scarily like a Medical General Practice Management software company based in Hamilton that I worked for for a while....

Maurits · 2006-04-25 Reply Admin

ammoQ:
Maurits:
Good example... but the join is perhaps a little much.  Here's a join-less version:
My version didn't use joins, either; just another type of subquery. And I think my version is faster ;-)

s/join/correlated subquery/

2006-04-26 Reply Admin

ammoQ:

This "advantage" is so tiny that it isn't worth to mention it, IMO.

I disagree. Documentation is critically important for maintenance, and the more closely your application conforms to programmer expectations, and the fewer hoops you force them to jump through, the better. Having columns in a table that are unused by the application is confusing.

As long as I do it correctly, I do not use the most important advantage of using the database as it should be used: Performance.

Actually, in this environment, you lose performance. The 5-10 business day turn-around on DBA requests will absolutely *kill* performance, and since the application has no privileges to modify or create tables, you'll have to generate DBA requests.

Query: List owners of a house have more than one SUV car and more than 2 sons born between 1990 and 2000?

I'll concede that for this type of querying, the model is sub-optimal. However the application does not need to support this type of querying because all we do is collect the data, and then send it to an insurance company. The insurance company can store the data however it is appropriate for their needs.

Now let's see how this looks like with the generic data modell...

select fnv.user_value as first_name, lnv.user_value as last_name, ev.user_value as email
  from question fnq, question_value fnv,
       question lnq, question_value lnv,
       question eq, question_value ev,
       question htq, question_value htv,
       question ohq, question_value ohv
   where fnv.question_id = fnq.question_id and fnq.name='first_name'
     and lnv.question_id = lnq.question_id and lnq.name='first_name'
     and ev.question_id = eq.question_id and eq.name='email'
     and htv.quesion_id=htq.question_id and htq.name='home_type'
     and ohv.qustion_id=ohq.question_id and ohq.name='owns_home'
     and fnv.poc_id = lnv.poc_id
     and fnv.poc_id = ev.poc_id
     and fnv.poc_id = htv.poc_id
     and fnv.poc_id = ohv.poc_id
     and htv.user_value='HOUSE'
     and ohv.user_value='Y'
     and 2<= (select count(*) from question cq, question_value cv
                where cv.question_id = cq.question_id and cq.name='car_type'
                  and cv.user_value = 'SUV'
                  and fnv.poc_id = cv.poc_id)
     and 2< (select count(*) from question cgq, question_value cgv,
                                  question cbq, question_value cbv 
               where cgv.question_id = cgq.question_id and cgq.name='child_gender'
                 and cbv.question_id = cbq.question_id and cbq.name='child_birthday'
                 and fnv.poc_id = cgv.poc_id
                 and cgv.poc_id = cbv.poc_id
                 and cgv.instance = cbv.instance
                 and cgv.value = 'M'
                 and cbv.value between '19900101' and '20001231');

Just a few comments. I think part of the issue is that you're trying to do everything in a single query. Many times it's faster to use multiple queries rather than trying to create the Uber-Query of Doom. All of your joins to the question table on question_value.question_id = question.question_id can be eliminated by pre-selecting question_id s for the questions involved in your query. Plus, given the existence of a global temporary table (kind of a misnomer, in that the table is permanent, but the data is automatically deleted when your transaction ends) that can store numeric ids, you can Insert-Select the poc_ids from your various subqueries and then use UNION, MINUS and INTERSECT as necessary to extract only those poc_ids that meet your criteria. Then you can extract the specific question_value belonging to those poc_ids that you are interested in.

One of the systems I maintain is a validation program for insurance company filings. The validation rule selection query was at one point a massive 1000 line query with around 64 embedded sub-queries. For a typical filing, the query would take 20-30 seconds to run. For a massive filing, it could take 2-3 minutes to run. By splitting out the sub-queries and performing them individually, I reduced the query run-time to less than a second for typical filings, and 10-20 seconds for massive filings. In short, the DBMS optimizer is not perfect, and sometimes you're better off with multiple simpler queries, than with one complex query. That's the approach I would take to query this generic data model. Happily though, it's not one of my business requirements, so I can punt on the issue.

Dave

ammoQ · 2006-04-26 Reply Admin

Dave:

Documentation is critically important for maintenance, and the more closely your application conforms to programmer expectations, and the fewer hoops you force them to jump through, the better. Having columns in a table that are unused by the application is confusing.

These columns are not unused. They contain valuable data from former versions of the form. Having a good documentation on the data model is not harder with a reasonable data model, the opposite is true. With a generic data model, you need documentation for the data model and documentation for the second level data model.

Actually, in this environment, you lose performance. The 5-10 business day turn-around on DBA requests will absolutely *kill* performance, and since the application has no privileges to modify or create tables, you'll have to generate DBA requests.

That's a DBA problem. In an environment that does the politics of 60's hosts, a relational database system is wasted.
The "alter table", done on Oracle, would take aprox. 0.05 secs. (The time does not depend on the number of rows in Oracle. "alter table add (...) is faster than select count(*) on the same (large) table. )

I'll concede that for this type of querying, the model is sub-optimal. However the application does not need to support this type of querying because all we do is collect the data, and then send it to an insurance company. The insurance company can store the data however it is appropriate for their needs.

In that case, you don't need a database. XML will do. CSV will do. Plaintext will do. And, bonus points, you can get rid of those DBAs.

Just a few comments. I think part of the issue is that you're trying to do everything in a single query. Many times it's faster to use multiple queries rather than trying to create the Uber-Query of Doom. All of your joins to the question table on question_value.question_id = question.question_id can be eliminated by pre-selecting question_id s for the questions involved in your query.

This is true, but even without those joins, it's still a large query. Look, I've put the join and the filter on the same line; I've also put the question and question_value tables on one line. For that reason, the LOC stay the same even with preselected question_ids! The speed will also be the same, unless the database system is fundamentally broken.

Plus, given the existence of a global temporary table (kind of a misnomer, in that the table is permanent, but the data is automatically deleted when your transaction ends) that can store numeric ids, you can Insert-Select the poc_ids from your various subqueries and then use UNION, MINUS and INTERSECT as necessary to extract only those poc_ids that meet your criteria. Then you can extract the specific question_value belonging to those poc_ids that you are interested in.

Please, let the database do it's job. Don't think you can outsmart the people making Oracle, SQL Server etc. You can only be faster that way if the data model is bad. If the query is slower than it should be, check the indexes. Refresh the statistics. The "I can do better joins and subqueries than Oracle" approach is the direct way to the front page of this web site.

One of the systems I maintain is a validation program for insurance company filings. The validation rule selection query was at one point a massive 1000 line query with around 64 embedded sub-queries. For a typical filing, the query would take 20-30 seconds to run. For a massive filing, it could take 2-3 minutes to run. By splitting out the sub-queries and performing them individually, I reduced the query run-time to less than a second for typical filings, and 10-20 seconds for massive filings. In short, the DBMS optimizer is not perfect, and sometimes you're better off with multiple simpler queries, than with one complex query. That's the approach I would take to query this generic data model. Happily though, it's not one of my business requirements, so I can punt on the issue.

Well, in our case, the query is such a juggernaut only because the generic database modell is a bad idea. I know some cases where splitting the query makes it faster, mainly because the cohesion of the big query is weak. You 64-subquery-example is likely such a case.

2006-05-01 Reply Admin

What really annoys me whenever someone is talking about "patterns" is that he is implying that someone "discovered" that pattern. This is nonsense, he is just the one to market that pattern. The "pattern" will very well have been employed before, and very often the perception of a pattern isn't needed, just the concept of avoiding redundancy.

E.g. having an inner platform that is just a copy of the outer platform is redundancy and bad. However, an inner platform can still be a good thing when it simplifies handling, such as using an sql admin gui instead of plain sql, or a lua scripting engine inside a C++ program.

Then, as another reply pointed out, the real catch Marios code is that instead of using a DB structure that was more flexible, he used one where it was necessary to modify the DB structure with time

masklinn · 2006-05-02 Reply Admin

Anonymous:
What really annoys me whenever someone is talking about "patterns" is that he is implying that someone "discovered" that pattern. This is nonsense, he is just the one to market that pattern. The "pattern" will very well have been employed before, and very often the perception of a pattern isn't needed, just the concept of avoiding redundancy.

"Discovering" a pattern means that you recognized it as such, not that you created it or anything, this is why you discover patterns and you don't invent them.

A pattern that's never been used before is not a pattern, discovering a pattern is the formalization of a useful and existing solution to a common and existing problem.

Gene Wirchenko · 2006-05-02 Reply Admin

masklinn:
Anonymous:
What really annoys me whenever someone is talking about "patterns" is that he is implying that someone "discovered" that pattern. This is nonsense, he is just the one to market that pattern. The "pattern" will very well have been employed before, and very often the perception of a pattern isn't needed, just the concept of avoiding redundancy.

"Discovering" a pattern means that you recognized it as such, not that you created it or anything, this is why you discover patterns and you don't invent them.

A pattern that's never been used before is not a pattern, discovering a pattern is the formalization of a useful and existing solution to a common and existing problem.

How can "a useful and existing solution" be "never been used before"?

Sincerely,

Gene Wirchenko

2006-05-02 Reply Admin

ammoQ:
Dave:

Documentation is critically important for maintenance, and the more closely your application conforms to programmer expectations, and the fewer hoops you force them to jump through, the better. Having columns in a table that are unused by the application is confusing.

These columns are not unused. They contain valuable data from former versions of the form. Having a good documentation on the data model is not harder with a reasonable data model, the opposite is true. With a generic data model, you need documentation for the data model and documentation for the second level data model.

Here's a hint from someone who blew it trying to deduce code from a DB schema: don't do it. It was a huge mistake on my part. Figure out first what the code should be doing, then check and see if the database matches your expectation, NOT the other way around. Driving your maintenance with "well, it's in the database, we should be doing something with it," is pointless and silly.

I have a folder in my filing cabinet marked "Taxi pay stubs - 1999". I can't loose this file until next year, in case I should be audited for that year. But I certainly don't submit "Taxi income - $0" on my income tax forms since then - it's just not relevant any more. No auditor would ever require that should do so, either.

ammoQ · 2006-05-02 Reply Admin

Dustman:

Here's a hint from someone who blew it trying to deduce code from a DB schema: don't do it. It was a huge mistake on my part. Figure out first what the code should be doing, then check and see if the database matches your expectation, NOT the other way around. Driving your maintenance with "well, it's in the database, we should be doing something with it," is pointless and silly.

I have a folder in my filing cabinet marked "Taxi pay stubs - 1999". I can't loose this file until next year, in case I should be audited for that year. But I certainly don't submit "Taxi income - $0" on my income tax forms since then - it's just not relevant any more. No auditor would ever require that should do so, either.

I'm sorry, but I doubt you've read the whole thread, so your comment IMO makes no sense in the context of the given discussion.

random_garbage · 2006-05-06 Reply Admin

Gene Wirchenko:
How can "a useful and existing solution" be "never been used before"?

I suspect that that's the point of the statement

masklinn:
A pattern that's never been used before is not a pattern...

In summary, what both of you are noting is:

a block of code only becomes a "pattern" when it's a "useful and existing solution" to a problem, and
a pattern is "discovered" by the person who notes that people are using the pattern to good effect, and that it can therefore be helpful to others (the community at large) to formalize the pattern.

Similar comments can be made regarding anti-patterns by negating the terms "useful" and "good".

2006-11-06 Reply Admin

Funny...I think that's called PeopleSoft! ; )

Regards,

Big

2007-02-09 Reply Admin

Stallman innocent? The man who wanted to write a text editor and somehow ended up with a huge lisp interpreter instead? If that isn't the inner platfor effect I don't know what is.

2007-02-10 Reply Admin

I hardly noticed the data structure because of the gaping chasm where the bind variables ought to be.

2007-02-25 Reply Admin

But without the $ you'd think he was suggesting people are re-implimenting multiple sclerosis.

andreyvul · 2008-01-09 Reply Admin

The Inert-Platform Effect is more like it.

2009-10-19 Reply Admin

Ok, I am not a db guy but isnt this a case where you would use data warehousing?

You create the db correctly (fully normalized) and then you create a view that patches together everything into a big phones_and_options view... create a warehoused table of that view (whenever there is a phone option [column] change) -- and run your queries against that concrete table.

So, whenever there is a option (column) addition, you modify the view sql (possibly programatically).

2010-08-11 Reply Admin

Checkitowt, implementation of an inner-platform effect...

class InnerPlatformEffectDB { private static appendListTo(StringBuilder sb, string[] list) { for ( int i = 0; i < what.length(); i++ ) { sb.append(what[i]); sb.append(", "); } } SQLconn sql = null; InnerPlatformEffectDB(string dbName) { sql = new SQLconn(dbName) }

ResultSet ipe(params string[] command) {
    StringBuilder sb = new StringBuilder()
    appendListTo(sb, command)
    return sql.exec(sb.toString())
}
static string SELECT(params string[] what) {
   StringBuilder sb = new StringBuilder()
   sb.append("Select")
   appendListTo(sb, what)
   return sb.toString()
}
//snip: FROM, WHERE, DROP, etc.

}

...While not actually expending any time reinventing the wheel on top of the wheel, only different; brilliant.

TRWTF is that I used java :) HaHaOnlySerious

2010-11-30 Reply Admin

Nick:
The company for which I previously worked based their entire business on a system that they believed was 'code free' and 'modifiable by the clients' for 'immediate onsite customization'. The crazed CEO boasted that this concept of 'application configuration' was revolutionary and that his product would some day rival those released by the likes M$. The base product attempted to utilize a database modeler in which the user would construct relational DB schemas via a primative GUI, and then design interfaces to the DB model using a screen integrator (much like VB). The modeler was also supposedly able to generate the corresponding SQL statements. Long story short, I knew it was time to quit when they actually had to take an entire computer onsite with them and hand it to the client because they were unable to make a stable build anywhere else except this one machine.

Sounds a lot like Intuit QuickBase. I worked at a company that used a QuickBase "application" designed in exactly the manner you describe, modulo QuickBase is Web-based.

My job? To write a program to save them from having to manually modify each and every record by hand (taking a whole minute per record), because the page integrator had no way to express "when a user scans this number into this field, update the following columns in this table with these predefined values".

2011-07-12 Reply Admin

VGR:
As I mentioned just a few days ago, this anti-pattern already has a name: Second System Sydrome.
That's where after you've done a simple, direct version of something, you come up with dreams of creating a Grand Unification of Everything, a program that does everything by being totally generic at every level.

An example of this at a smaller scale is a vararg function or method that takes generic "instructions" for its parameters:

void *doeverything(String command1, void *parameters1, String command2, void *parameters2, ...);

It's a very easy trap to fall into. If you aren't aware of the phenomenon, it's hard to realize you're in the trap at all, even though you are surrounded by clues, such as the fact that it seems to make development harder instead of easier.

Exposing it to the user is even worse; few users care about infinitely configurable software. (Ever see anyone actually bother to configure their menu bar in Word?)

Thats obviously the wrong way, try this:

public void Main(){
  XmlDocument state=new XmlDocument();
  state.LoadFromFile("state.xml");
  Main2(state);
}

// the ultimate singleton, sole function in program, recursive
public void Main2(XmlDocument SystemState){
  //huge switch statement
}

All your globals are right there when you need them, along with your whole data layer

I actually worked on a program that was like that- somewhat. One giant cpp file that was a swiss army knife of functions evolved over years with no code cleanup at all. "It slices, it dices..." was actually one of the comments in there.

They used it for whatever they wanted to do by cloning it and modifying it. Took me a week to go through it and break it down to a meaningful set of codebase files -- of course that did not include refactoring out any repetitious code.

2013-10-15 Reply Admin

Everyone wants a system that doesn't require programming skill. The problem is coding is hard. You can shift the complexity of it from one place to another, but you cannot get rid of it.

2018-05-11 Reply Admin

I think platform-wrapper or platform-proxy is a better name for this anti-pattern

2018-07-05 Reply Admin

gee, that sounds like wordpress;-}

The Inner-Platform Effect

Leave a comment on “The Inner-Platform Effect”