Database Changes Done Right

  • Alex Papadimoulis 2012-02-28 09:47
    Decided to cut this from the article... the comments seemed like a good place to paste it.

    Yes to NOSQL?
    While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You are not, nor will not have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems.

    NOSQL databases are not a database evolution by any means. In fact, they're a throwback to many of the systems designed in the 1960s and 1970s, where a megabyte of memory cost more than a programmer's annual salary. As hardware costs became cheaper (i.e. a megabyte only cost a week's salary), databases evolved into what we know today: the relational model.

    Play with them on your hobby projects, but they don't belong in your work projects. Remember, programming is supposed to be boring.
  • Ohlmann 2012-02-28 10:46
    The comment about NoSQL is simply wrong. NoSQL databases are not a replacement for relational databases to improve performance; that's a common fallacy.

    They are just another useful tool for a specific purpose. Exactly like SQL, but for other needs.

    Redis, for example, is useful in specific cases: if you never need to look up data by anything other than its id, have concurrency problems, and/or have an awful lot of writes, Redis can be a useful tool.

    (note that "NoSQL" is itself a misleading phrase. It's a buzzword tacked onto very different technologies with very different uses)

    And above all, keep a cool head and use the tool you need. Refusing to use NoSQL is exactly as stupid as using only NoSQL. Do you stop using your car because you can take a plane? It's the same with NoSQL. Use it when you need it, and don't lean on fallacies like "I am not Facebook, so this can't be useful".
  • übercoder 2012-02-28 10:48
    We use (something like) www.deltasql.org for versioning of our DB schemas between stable / testing / development branches.
  • me 2012-02-28 10:56
    Maybe I'm having a late-term case of post-Monday stupidity, but what the hell does SAP have to do/mean in the context of "being replaced" - do you mean the German company or something else?
  • Matthew Brubaker 2012-02-28 11:05
    For those interested in doing versioning of their database in the manner Alex specified near the end of the article, the FOSS tool RoundhousE (https://github.com/chucknorris/roundhouse) is the tool we use where I work and have been exceedingly happy with.
  • emurphy 2012-02-28 11:11
    me:
    Maybe I'm having a late-term case of post-Monday stupidity, but what the hell does SAP have to do/mean in the context of "being replaced" - do you mean the German company or something else?


    Their ERP software, presumably. IIRC it has a reputation of "there are lots of people out there who will customize the crap out of this thing if you have the budget for it".
  • 3rd Ferguson 2012-02-28 11:13
    Perhaps the most important component of any change management system is organizational discipline. This is particularly true of databases, where tools to automate and audit change are immature at best. The organization must be disciplined enough to document its way around the lack of tools.

    My organization (a large paint company up the road from Inedo) maintains a code repository chock full of DDL code, the latest checked-in version of which is always implemented in Dev, QA and Prod EXCEPT when there is an ACTIVE change request. The DDL is simply "CREATE OR REPLACE" for procedures/packages and "ALTER TABLE" when table structures change. (Yes, we know that makes it painful to recover to a given place in the chain of table mods, but in 2+ years of being in maintenance mode that has never happened outside of Dev. Let's be realistic about whether we really need to solve that problem.)

    The change request system is sufficiently advanced to coordinate all the steps required to modify the DDL layer in the database along with any executable code, scheduled jobs, configuration, etc. Both the business people and the several technical teams involved in most changes can see the change request as it winds its way along, being notified by email whenever it advances between environments or teams. Whoever the latest person is in the chain sees whatever has already happened, and we make sure to pester each other when we're not satisfied with the quality of the entries at each step.

    It's a metric buttload of paperwork, but the process is auditable and repeatable, and that meets the ultimate business need of knowing what's in Production and what to do when it breaks at 3:00AM.

    Alex is right about the resistance to change leading to uncontrolled change (and vice versa). Think of earthquake activity on a fault line, where the pent-up need to shift leads to sudden mass change and often disaster. We avoid that by having a workable, if labor-intensive, process for managing lots of small changes. The learning curve on the process is steeper than anyone would like, but it's really not that bad after the first few changes and it beats the crap out of the alternative that's been so aptly laid out in various WTFs over the years.
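
    A minimal sketch of what one of those checked-in DDL files might look like under this convention (object and column names invented, Oracle syntax assumed):

    -- Re-runnable: CREATE OR REPLACE simply overwrites the previous version of the procedure
    CREATE OR REPLACE PROCEDURE set_order_status (
        p_order_id IN NUMBER,
        p_status   IN VARCHAR2
    ) AS
    BEGIN
        UPDATE orders
           SET status = p_status
         WHERE order_id = p_order_id;
    END;
    /

    -- Not re-runnable: structural changes are plain ALTER TABLE statements tied to a change request
    ALTER TABLE orders ADD (status VARCHAR2(20) DEFAULT 'NEW');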

    /CAPTCHA: nibh is where Mrs. Frisby escaped from
  • boog 2012-02-28 11:32
    The Article:
    ...most developers’ reluctance to master this method leads towards a tendency to simultaneously resist change and change things uncontrollably.
    ...or to abandon databases altogether and seek out alternate solutions on the pretense that they are either "easier to understand" or "the future" of databases.
  • Birfhum 2012-02-28 11:32
    I am glad to be using Event Sourcing and CQRS; database changes aren't an issue.
  • Myth 2012-02-28 11:34
    3rd Ferguson:
    /CAPTCHA: nibh is where Mrs. Frisby escaped from


    That's NIMH, not NIBH.
  • Alex 2012-02-28 11:34
    I think it's incredibly limiting to say migrations have to be SQL scripts.

    We have a system of migrations with full access to all our application's services, the full object model and the EntityManager (we're running on Hibernate).

    Sometimes, these migrations just run an SQL script (like a simple UPDATE to set a new boolean field to false for every row), but often a new field will need to be calculated somehow, and then it's nice to have your actual application logic do the calculations.

    Also, if your whole application is using JPA QL, dropping down to SQL in migrations will be bug prone.
  • ObiWayneKenobi 2012-02-28 11:38
    The WRONG way to do this is to mandate all database access be done via stored procedures, and then have to run some five dozen sprocs to "update" a database. And store each of these scripts in SVN and never remove them, resulting in a not-quite-migration history of all the changes you've done to your database.
  • Simple 2012-02-28 11:46
    I delivered to a customer the truly ultimate, general purpose, totally flexible platform. I didn't even install an O/S on the computer I supplied.
  • Jon 2012-02-28 11:47
    +1 For logs. Without them, it's Russian Roulette (just a matter of time).
  • Mike S 2012-02-28 13:12
    Minor Point:
    We've used EAV several times when we didn't know what data we'd be capturing, and what we would capture would change regularly. It's a nice general solution to that problem.
    The only real alternatives I've seen are loads o' data tables, or the single table with 99+ string columns.
  • Kemp 2012-02-28 13:28
    Alembic. That is all.
  • Matthew 2012-02-28 13:30
    It made me feel all warm and fuzzy to know I have worked in two different places that did this right. Between source control software, dev/test/prod (along with a dedicated QA team) we never had a problem escape into production. Submitting every db change as a script may have been a PITA, but it worked.
  • savar 2012-02-28 13:47
    I haven't replied here in some time, but this article touches on a topic that has been sensitive at my workplace for a few years. In that time we have had 4 different migration systems, including (1) no system at all, (2) a perl hackery thing I wrote, (3) the official migration system packaged with our ORM, and finally (4) a new custom migration system written in the same language as the rest of our project that was thoroughly discussed and planned before beginning the implementation.

    I'm not sure our solution would work everywhere, but so far it's solved all of the big problems we've had with each of our previous systems. And it largely matches what Alex describes.

    Mike S:
    Minor Point:
    We've used EAV several times when we didn't know what data we'd be capturing, and what we would capture would change regularly. It's a nice general solution to that problem.
    The only real alternatives I've seen is loads 'o data tables, or the single table with 99+ string columns.


    Anyway, I wanted to reply to this in particular. The need for the EAV pattern isn't new or uncommon. Lots of designers face a similar problem with "we don't know what we'll need to store". The problem with putting EAV in an RDBMS is that you lose all of the useful features of the RDBMS such as types, constraints, foreign keys, etc. You also take on some of the negative aspects of RDBMS (such as the change management quagmire mentioned in the foregoing article).

    At that point, the only real reason to put the EAV in the RDBMS is because you can't think of any better place to put it. That's usually not a good rationale for designing serious applications.

    Depending on your needs, it may make more sense to serialize the EAV to a file (e.g. INI format or a native binary format), serialize to volatile storage (e.g. memcached), or write to a document-based storage system <please don't hit me>. Or you may decide that you're not going to store this EAV at all, and instead you'll normalize the information and tell the client that future changes will not be trivial but it's worth it because the present system will be more reliable, faster, and cheaper to operate.

    I don't see anything wrong with having multiple storage systems for a single application. For data that fits the relational model well, store it in an RDBMS. For data that fits some other model better, store it somewhere else.
  • Teotwawki 2012-02-28 13:52
    EAV. First time I encountered this outside of school was Magento. As much as it makes a lot of sense theoretically, EAV can be very painful.

    EAV has its place: college classrooms.
  • Jacob 2012-02-28 13:55
    Would anyone care to comment on how well Alex's approach here matches the Ruby on Rails approach (migration scripts written in a database-agnostic DSL, each of which is meant to be reversible; a log in each data environment defining which scripts have been run against that environment)?
  • Anketam 2012-02-28 14:04
    savar:
    ...
    <please don't hit me>
    ...
    For the lols and giggles!
    *Punches savar in the arm*
  • Teotwawki 2012-02-28 14:11
    I just knew there would be someone trying to ruin tdwtf by putting informative and well thought out comments on tdwtf!

    In particular: "I don't see anything wrong with having multiple storage systems for a single application"

    Microsoft's Great Plains software sort of does this. It essentially uses EAV on some tables while using a more traditional approach on others. However, the table names and data definitions being stored in files rather than having useful table names and column names in the database drives me nuts now and then.

    Magento has moved a long way from its beginnings. Nearly everything was in an EAV format in early versions, but now some data has transitioned from EAV to a more traditional column-oriented style.
  • Jay 2012-02-28 14:19
    Simple:
    I delivered to a customer the truly ultimate, general purpose, totally flexible platform. I didn't even install an O/S on the computer I supplied.


    I gave my customers the most flexible system yet invented: A pad of paper and a box of crayons. They can store any data they want, including text and graphics.
  • geoffrey, MCP, PMP 2012-02-28 14:20
    Ohlmann:
    The comment about NoSQL is simply wrong. NoSQL databases are not a replacement for relational databases to improve performance; that's a common fallacy.

    They are just another useful tool for a specific purpose. Exactly like SQL, but for other needs.

    Redis, for example, is useful in specific cases: if you never need to look up data by anything other than its id, have concurrency problems, and/or have an awful lot of writes, Redis can be a useful tool.

    (note that "NoSQL" is itself a misleading phrase. It's a buzzword tacked onto very different technologies with very different uses)

    And above all, keep a cool head and use the tool you need. Refusing to use NoSQL is exactly as stupid as using only NoSQL. Do you stop using your car because you can take a plane? It's the same with NoSQL. Use it when you need it, and don't lean on fallacies like "I am not Facebook, so this can't be useful".


    NoSQL is a fad invented by programmers who think they have outsmarted giant enterprises when it comes to data storage and retrieval. Nothing more.
  • Jay 2012-02-28 14:32
    Side note: It is much harder to fix a bad database than a bad program.

    If you discover that one module is poorly written, you can often fix it or even throw it away and rewrite it with minimal impact on other modules. Sometimes there are dependencies, of course, but you can do a lot of clean-up without changing external interfaces. And most of the time -- not always, but most of the time -- the number of other modules affected by changes to a given module is manageable.

    But if you discover that a database table is poorly designed, fixing it very often means changing every module that accesses that table, which could be lots. Oh, you can add an index, and sometimes you can change a field length or that sort of minor thing. But if you realize that the employee table really needs to be broken into an employee table AND a benefits table, or that the shipping-info table must connect to order ITEMS and not to orders, etc, you could face many, many coding changes all over the place to support it. Plus, you'll have to write a data conversion program to reformat the data from the old schema to the new.

    DB design changes are almost always painful. Adding stuff? Not so bad. Changing existing stuff? Painful.
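
    For illustration, the employee/benefits split described above typically means both new DDL and a one-time data conversion, roughly along these lines (table and column names are made up):

    -- New table for the attributes being split out of employee
    CREATE TABLE benefits (
        employee_id INT NOT NULL PRIMARY KEY REFERENCES employee (employee_id),
        plan_code   VARCHAR(20),
        start_date  DATE
    );

    -- One-time conversion of the existing data from the old shape
    INSERT INTO benefits (employee_id, plan_code, start_date)
    SELECT employee_id, plan_code, benefits_start_date
      FROM employee;

    -- Only once every module that reads these columns has been updated
    ALTER TABLE employee DROP COLUMN plan_code;
    ALTER TABLE employee DROP COLUMN benefits_start_date;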
  • SRP 2012-02-28 14:53
    A good measure of application code quality is the ease with which the code can be arbitrarily changed without breaking the build.

    Are you suggesting that the same applies to the database? As soon as you are unprepared to change the schema, the entire database is likely broken and is in need of an overhaul?
  • HappyEngineer 2012-02-28 15:02
    The solution we came up with turned 4-hour DB deployments that often failed into half-hour deployments that always succeed. We used to have the horrible system where we had one SQL script to deploy and another SQL script to roll back (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With TeamCity constantly deploying and rolling back, we quickly find out if the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.
  • Alex Papadimoulis 2012-02-28 15:10
    Jacob:
    Would anyone care to comment on how well Alex's approach here matches the Ruby on Rails approach (migration scripts written in a database-agnostic DSL, each of which is meant to be reversible; a log in each data environment defining which scripts have been run against that environment)?


    I almost added this as an anti-pattern, but ran out of room. So, here goes…

    Anti-Pattern: DB Migrate
    Ruby on Rails developers have championed an “innovative” approach called “DB Migrate” to handle database changes. Essentially, you define structures in a database-agnostic, domain-specific language and then apply those to any supported database platform. I put innovative in quotes because this is simply a reinvention of a bad idea.

    Database management systems were designed to solve the problem of tightly-coupled application code and application data. They learned long ago that data not only outlives code, but it works a lot differently and, therefore, needs to be treated differently.

    DB Migrate is simply a tightly coupled convention/syntax of the underlying RoR platform and, worse, it unnecessarily abstracts an abstraction. SQL does not need to be “wrapped” with a less expressive language.

    The “reversibility” of the migrations is security theater. The idea of a “simple rollback” does lead to a warm and fuzzy feeling, but no one ever tests rollback scripts. Especially against production … where it actually matters. Because if they actually knew how to test stuff, then they wouldn’t need to worry about rollbacks in the first place. So, you’re left with a script that simply makes things worse.

    Does it make development faster? Of course it does. Then again, so does writing your code directly in production.
  • Jeff 2012-02-28 15:17
    "While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You are not, nor will not have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems."

    What if my main reason for using NoSQL is NOT the expectation of dealing with Facebook-sized problems?

    Say I want to manage a small or medium database, but avoid the development overhead from creating and altering table structures.

    If I add a new field to my data model, then in a relational database I'll need to:

    1. Change the code
    2. Create a database change script
    3. Test the database change script
    4. Ensure that the change script runs exactly once
    5. Keep backups of the data in case the change script fails, despite testing
    6. Avoid altering the table during working hours since it involves a full table rebuild
    7. etc etc etc

    While in a NoSQL database like MongoDB I'll need to:

    1. Change the code

    ...and it just works, right away.

    If I also want to index the new field for optimum performance, that's another small step, but the signal-to-noise ratio of "what I want to do" and "how much code or queries do I need to write to accomplish it" is still lightyears ahead of RDBMS.

    Is NoSQL still not for me?
  • FuBar 2012-02-28 15:17
    Jay:
    Side note: It is much harder to fix a bad database than a bad program.
    +1 this. Start with your entities and attributes, people! Validate it with the business! Remember, the data belongs to the organization, not to the application. Repeat: The data belongs to the organization, not to the application. My organization has data that started out on recipe cards in the 1930's. That data was computerized on a mainframe in the 1960's, migrated to a Vax in the 1980's, migrated to MS SQL and .Net in 2011, and will undoubtedly be on something else 20 years from now. This is possible because the first people to computerize the data thought carefully about how to structure it - they kept future generations in mind. The data belongs to the organization, not to the application.
  • frits 2012-02-28 15:21
    Jay:
    Side note: It is much harder to fix a bad database than a bad program.

    If you discover that one module is poorly written, you can often fix it or even throw it away and rewrite it with minimal impact on other modules. Sometimes there are dependencies, of course, but you can do a lot of clean-up without changing external interfaces. And most of the time -- not always, but most of the time -- the number of other modules affected by changes to a given module is manageable.

    But if you discover that a database table is poorly designed, fixing it very often means changing every module that accesses that table, which could be lots. Oh, you can add an index, and sometimes you can change a field length or that sort of minor thing. But if you realize that the employee table really needs to be broken into an employee table AND a benefits table, or that the shipping-info table must connect to order ITEMS and not to orders, etc, you could face many, many coding changes all over the place to support it. Plus, you'll have to write a data conversion program to reformat the data from the old schema to the new.

    DB design changes are almost always painful. Adding stuff? Not so bad. Changing existing stuff? Painful.
    Ever heard of a DAL?
  • OneMist8k 2012-02-28 15:32
    NoSQL versus SQL?

    Martin Fowler recently put together a few excellent slides on the subject. Read and enjoy.

  • Jonathan 2012-02-28 15:42
    So tell us how you really feel about Notes, Alex...
  • Boris 2012-02-28 15:57
    Re: "Re-executable Change Scripts",
    I don't get it. If your intention is to add a column to a table, you do exactly what's in the example. If the column already exists in (some versions of) the database but is defined differently, you know about it and therefore add an ALTER COLUMN statement to cover it. In fact, it is crucial that a script can be run multiple times while making sure that no changes are made if the script is executed again.
    People accidentally do that all the time.
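
    A sketch of the kind of guarded, re-runnable script being described (SQL Server syntax and invented names; other DBMSs have equivalent catalog views or an IF NOT EXISTS clause):

    -- Safe to run any number of times: the column is only added if it is missing
    IF NOT EXISTS (SELECT 1
                   FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE TABLE_NAME = 'Customer'
                     AND COLUMN_NAME = 'LoyaltyTier')
    BEGIN
        ALTER TABLE Customer ADD LoyaltyTier VARCHAR(20) NULL;
    END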
  • Matt 2012-02-28 16:27
    That's only an anti-pattern because you're using the tool in a foolish way. Most of us manage to have a single deploy script that is fully tested. It's called deploying against a restore of a recent backup.

    I also find it utterly hilarious that you can create an article called 'Database Changes Done Right' without using the word 'transaction' once.
  • PiisAWheeL 2012-02-28 16:47
    OneMist8k:
    NoSQL versus SQL?

    Martin Fowler recently put together
    [removed to make akismet happy.]
    a few excellent slides[/url] on the subject. Read and enjoy.


    Here's my favorite reading on the subject:
    http://howfuckedismydatabase.com/nosql/

    And akismet can piss off!
  • Gunslinger 2012-02-28 16:48
    If you designed the database correctly the first time, this wouldn't be an issue.
  • Zunesis, In the Flesh! (Your mom's!) 2012-02-28 16:55
    Zylon:
    Looks like somebody has an itchy comment-deleting finger today.
    Looks like somebody was so satisfied with the self-fellation that was today's article that he couldn't allow a hint of dissent.
  • DiskJunky 2012-02-28 17:06
    We use a very similar system in my workplace, with some minor changes that suit our deployment a little better. E.g., we have a simple DB versioning system where a table called ControlRecord contains Major, Minor and Release columns. Our scripts are batched into:
    SQL\
    SQL\Original
    SQL\Patches\v1.0.10
    SQL\Patches\v1.2.00
    SQL\Patches\...

    Where each folder contains a "001) DB Version Check.sql" script that ensures that the scripts in the folder are only ever run against the correct version of the database. Our batch file is a little more explicit in that it parses the results of "dir *.sql /b/a-d/on" to ensure that the scripts are run in order (some file systems don't return files sorted by name automatically, e.g. FAT32 - why such an FS would be used in production is a different question, but you get the idea).
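
    A rough sketch of what such a version-check script could contain (SQL Server syntax assumed; the version numbers are illustrative, and the batch is assumed to be run with osql/sqlcmd -b so an error aborts the rest of the run):

    -- 001) DB Version Check.sql - refuse to run this folder's patches against the wrong version
    IF NOT EXISTS (SELECT 1
                   FROM ControlRecord
                   WHERE Major = 1 AND Minor = 0 AND Release = 10)
    BEGIN
        RAISERROR ('Database is not at version 1.0.10 - aborting patch run.', 16, 1);
    END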

    On a side note, you may want to change your scripts to use "sqlcmd" instead of "osql" as "osql" is being deprecated...
  • Mike S 2012-02-28 17:07
    EAV Redux:
    EAV's useful for situations where you have to store structured data, where users are specifying the structure at run time, and the structure's different for different "classes" of data.
    In the systems that I've seen and worked on, everything else gets stored in 3rd normal form.
    It's got plenty of downsides (performance is poor, and reporting's a pain), but it's great for what it does.
    For reporting, we have a tool that converts EAV to 3rd normal format. It spits out over 100 tables into a separate reporting database.
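
    For anyone curious, the core of that EAV-to-3NF conversion is usually a pivot of the attribute rows into columns, one query per "class" of entity, something like this sketch (all names invented):

    -- One attribute-per-row becomes one column-per-attribute for reporting
    SELECT e.entity_id,
           MAX(CASE WHEN a.attribute_name = 'first_name' THEN a.attribute_value END) AS first_name,
           MAX(CASE WHEN a.attribute_name = 'last_name'  THEN a.attribute_value END) AS last_name,
           MAX(CASE WHEN a.attribute_name = 'birth_date' THEN a.attribute_value END) AS birth_date
      FROM entity e
      JOIN entity_attribute a ON a.entity_id = e.entity_id
     GROUP BY e.entity_id;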
  • AN AMAZING CODER 2012-02-28 17:10
    Alex Papadimoulis:
    Decided to cut this from the article... the comments seemed like a good place to paste it.

    Yes to NOSQL?
    While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You do not, and never will, have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems.


    Either you're not using NoSql as the buzzword it has become, or I can say with 99.999% certainty that you make claims off the cuff instead of doing due diligence.


    This is like the fools that think Macs suck because they're not PCs and vice versa. Neither is correct, both are dumb.
  • Dave-Sir 2012-02-28 17:21
    Myth:
    3rd Ferguson:
    /CAPTCHA: nibh is where Mrs. Frisby escaped from


    That's NIMH, not NIBH.
    Depends on whether or not you have a cold...

    \CAPTCHA: validus is the residue left over from validation.
  • Peter 2012-02-28 17:34
    I like the article and it sounds like it is a formula for success, but your comments on all of NoSQL and migration frameworks are...unfortunate.

    It's almost like you don't want alternative approaches to work, like you're rooting against NoSQL and rooting against migration frameworks such as RoundhousE (which uses native SQL scripts, not an abstraction).
  • C 2012-02-28 17:48
    Alex Papadimoulis:

    Anti-Pattern: DB Migrate
    Ruby on Rails developers have championed an “innovative” approach called “DB Migrate” to handle database changes. Essentially, you define structures in a database-agnostic, domain-specific language and then apply those to any supported database platform. I put innovative in quotes because this is simply a reinvention of a bad idea.

    Database management systems were designed to solve the problem of tightly-coupled application code and application data. They learned long ago that data not only outlives code, but it works a lot differently and, therefore, needs to be treated differently.

    DB Migrate is simply a tightly coupled convention/syntax of the underlying RoR platform and, worse, it unnecessarily abstracts an abstraction. SQL does not need to be “wrapped” with a less expressive language.

    The “reversibility” of the migrations is security theater. The idea of a “simple rollback” does lead to a warm and fuzzy feeling, but no one ever tests rollback scripts. Especially against production … where it actually matters. Because if they actually knew how to test stuff, then they wouldn’t need to worry about rollbacks in the first place. So, you’re left with a script that simply makes things worse.

    Does it make development faster? Of course it does. Then again, so does writing your code directly in production.


    If you ever use a call to db:migrate:rollback in production, you've done something very, very wrong and should be restoring from backup anyway. If you've not tested your changes against a dev or staging environment, again, you've done something very, very wrong.

    On the subject of NoSQL databases, in the case of the document database (mongodb, amazon simpledb), it's quite useful for quickly prototyping and developing. Might not be the best to go enterprise with, but quite fine for prototype stages.
  • Michael 2012-02-28 17:54
    Jonathan:
    So tell us how you really feel about Notes, Alex...


    I was thinking of that the entire time I read this article.

    CAPTCHA: esse. don't stab me, esse.
  • jinx 2012-02-28 17:54
    without reading all the comments, check out Liquibase. Not perfect, but does all Alex mentioned.

    captcha: f you for posting damn captchas
  • Norman Diamond 2012-02-28 18:33
    Alex Papadimoulis:
    2.(Optional) Code Review – this is optional, of course, but it's always nice to have a second set of eyes

    (Optional) Checking for any kind of bugs in any kind of development, anywhere.
    (Optional) A second set of eyes.
    (Optional)
    NO IT IS NOT.

    (Optional) Producing front page material.
    (Optional) The owner of this site suggesting that the avoidance of front page material is Optional.
  • Rich 2012-02-28 18:39
    (Optional) Code Review – this is optional, of course, but it’s always nice to have a second set of eyes


    It's true that the other eyes can help, but a lot of the time the value comes from you having to explain things. If it's just in your head, it can be a somewhat featureless blob of thought. You think "hey, I'll get that when I get there."

    But to explain it to someone, you need to turn the featureless blob into something much more concrete. Those 'think of later' things need to be thought of, those loose ends need to be tied.

    I keep a bookmark for the Cardboard Cutout Dog around.
  • myName 2012-02-28 20:27
    Yes, I'll just run a SQL script against my UniVerse database, that'll work.
  • Vincent 2012-02-28 21:15
    I used South with Django, which is more or less the same functionality as Ruby on Rails migrations (just comment if you think it's not that similar). I'd say it's really great during development, but while rollback scripts can be handy, I'm not sure they're really useful or safe on a production database. Such simple automated migrations are useful when the database is really bound to only one application, so let's not compare apples to oranges; on a centralized enterprise database with all kinds of applications using it, that would be nonsense.

    Then on NoSQL databases: you still have a schema that may change, and that means you may need data migration scripts. For instance, a blog post may have comments directly embedded in the blog post record; but that doesn't work well if you have thousands of records embedded in a single parent record. So you may need to create another table with comments, and just keep IDs in the blog post table. It's easy to create new columns, embed subrecords, or create indexes on the fly... but using a NoSQL database does not mean you can avoid data migrations.
  • Jeff Kemp 2012-02-28 21:17
    Don't know much about other DBMS's but with Oracle change control is fairly straightforward.

    For example, I just write the ALTER statements I need to add or drop columns. The scripts are perfectly safe to rerun, because if you try to add a column that has already been added, or drop a column that was already dropped, you just get an error message and no change is done.

    Also, it's easy enough to build the rollback scripts - e.g. if you drop a column, you first copy the data for the column (along with the PK) to a backup table, then drop the column. To rollback, just re-add the column, update the rows from the backup data, then reapply any constraints.
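
    A sketch of the drop-with-rollback recipe just described (Oracle syntax, invented names):

    -- Change script: stash the column's data, then drop it
    CREATE TABLE emp_phone_backup AS
        SELECT employee_id, phone_number FROM employee;
    ALTER TABLE employee DROP COLUMN phone_number;

    -- Rollback script: re-add the column and restore the data from the backup table
    ALTER TABLE employee ADD (phone_number VARCHAR2(30));
    UPDATE employee e
       SET phone_number = (SELECT b.phone_number
                             FROM emp_phone_backup b
                            WHERE b.employee_id = e.employee_id);
    DROP TABLE emp_phone_backup;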

    When they're ready for testing, the change scripts are run in the test environment, testing is done, then the scripts are rolled back, more testing is done (to make sure the rollback did its job), then the scripts are run again.

    This is standard practice in pretty much all the places I've worked. The exception was one client who had a clued-up DBA who knew how to use the Oracle flashback features to do the rollbacks - which made testing the scripts a whole lot easier. It didn't matter if a developer made a huge mess of their database change scripts - the DBA could easily undo their mess.
  • Spike 2012-02-28 21:54
    You could use a tool like LiquiBase to store your scripts in source control with your application, and manage which scripts need to be run in what order when deploying. All of the above rules still apply of course.
  • commoveo 2012-02-28 22:17
    And, just how the fuck did I solicit this fucking speech!?

    Shouldn't his site be featuring IT anecdotes?
  • Therac-25 2012-02-29 01:01
    Mike S:
    EAV Redux:
    EAV's useful for situations where you have to store structured data, where users are specifying the structure at run time, and the structure's different for different "classes" of data.
    In the systems that I've seen and worked on, everything else gets stored in 3rd normal form.
    It's got plenty of downsides (performance is poor, and reporting's a pain), but it's great for what it does.
    For reporting, we have a tool that converts EAV to 3rd normal format. It spits out over 100 tables into a separate reporting database.


    ROFL. You just reminded me of the system I'm working with.

    How about EAV *IN* 3rd normal form, eh?

    (Table names have been adjusted to protect the oh so very guilty).


    Table "entity"
    Column | Type |
    ----------------+---------+
    entity_id | integer |
    entity_type_id | integer |

    Table "entity_type"
    Column | Type |
    ------------------+------------------------+
    entity_type_id | integer |
    entity_type_code | character varying(128) |

    Table "entity_attribute_type"
    Column | Type |
    -------------------------------+------------------------+
    entity_attribute_type_id | integer |
    entity_type_id | integer |
    entity_attribute_name | character varying(128) |
    entity_attribute_data_type_id | integer |

    Table "entity_attribute"
    Column | Type |
    --------------------------+---------+
    entity_id | integer |
    entity_attribute_type_id | integer |
    entity_attribute_value | text |



    Joins, joins everywhere....
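
    To make the pain concrete: fetching one named attribute for one entity out of the schema above takes something like the following, and the "value" still comes back as untyped text (the attribute name and id are invented):

    SELECT ea.entity_attribute_value
      FROM entity e
      JOIN entity_attribute_type eat
        ON eat.entity_type_id = e.entity_type_id
       AND eat.entity_attribute_name = 'serial_number'
      JOIN entity_attribute ea
        ON ea.entity_id = e.entity_id
       AND ea.entity_attribute_type_id = eat.entity_attribute_type_id
     WHERE e.entity_id = 42;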
  • tradie 2012-02-29 01:15
    User customized packages, both big and small, use customizable columns, rather than DDL to alter the database.

    As far as I remember, you do it that way because your code is designed to display and calculate data from the database, not meta-data from the database.

    I think that putting the customer customizations in with the customer data seemed like the logical way to go, and certainly made updates and maintenance easier.

  • Benjol 2012-02-29 01:17
    What about maintaining database creation scripts? Do you just have the original schema creation script and then run all your change scripts on it, or do you maintain scripts for creating 'new' databases from scratch?
  • box 2012-02-29 02:15
    Honestly the best part of systems like this was the point mentioned last.

    Having the ability to spin up a DB that matches production anywhere, any time, on demand turbo-charges productivity. Not needing hellish shared development environments or to schedule time on the 'uat' environment for user testing is a freedom that I will never give up ever again. Once you have this ability you can really take your continuous integration testing to the next level too.
  • TheCPUWizard 2012-02-29 02:35
    While EAV type architectures *may* be "anti-patterns" in many cases, there are definitely valid use cases for them. My company created DynaSearch back in the 1980's, which has become over the past 25 years a very sophisticated system that is based on the EAV pattern.

    For "irregular data" (different items have different "fields") this pattern can be extremely performant, and compact.

    On one system (a generalized taxonomy system) we have a running instance with over 200G "cells" representing over 50TB of information. About 5 years ago another firm approached our client, and recommended going to a "normalized database". Their proposal required over 5000 tables, was estimated to take 3 times as much storage, and in prototype tests, could not match the performance, even though their test set contained less than 1% of the total data.

    About 3 years ago, we integrated a NOSQL database for handling "supporting data". This effort has proven to also be quite successful, allowing us to have full access to all of the information which was previously stored in other types of repositories.

    It is all about using the right tool for the right job.
  • Kasper 2012-02-29 03:15
    Most of us manage to have a single deploy script that is fully tested. It's called deploying against a restore of a recent backup.
    You didn't specify which of the environments you would restore the backup of, and where you would restore it to. If you have decided to only keep backups of production and not of the other environments, then it becomes obvious.

    For the testing, you would restore a recent backup of the production database, and obviously you don't restore that on top of production, you restore it into a separate environment which is only used to restore backup copies of production. So that is not one of the environments already mentioned in the article.

    Testing database changes this way is a good idea even if you also test it in other ways.

    Periodically restoring backups of production into a separate environment is a good idea even if you don't use the environment you restored to for testing new code every time you have performed such a restore. The reason it is a good idea is that it gives you more confidence in the proper operation of your backup and restore procedures.

    There are, of course, a few caveats to this approach.
    - The production system may have much more data than any of the other environments. An environment that you only use for testing against a restore of production data may need more hardware than you'd usually want to allocate for that purpose.
    - It might be that not everybody working on the code is permitted to access the production data. Testing against a restore of production data requires most of the access control that would also be used to guard production data on the live system.
    - When testing with production data, separation of the systems becomes even more important. Leaking data from the earlier test environments into the real world is rarely a disaster. If the test system were, for example, sending out emails or accidentally tried to perform changes against production, the damage would be limited, most likely none. However, if something similar happened on a test environment that used a copy of production data, it could be bad.
  • LISP Programmer 2012-02-29 03:22
    Jeff Kemp:
    Don't know much about other DBMS's but with Oracle change control is fairly straightforward.

    For example, I just write the ALTER statements I need to add or drop columns. The scripts are perfectly safe to rerun, because if you try to add a column that has already been added, or drop a column that was already dropped, you just get an error message and no change is done.

    Also, it's easy enough to build the rollback scripts - e.g. if you drop a column, you first copy the data for the column (along with the PK) to a backup table, then drop the column. To rollback, just re-add the column, update the rows from the backup data, then reapply any constraints.

    When they're ready for testing, the change scripts are run in the test environment, testing is done, then the scripts are rolled back, more testing is done (to make sure the rollback did its job), then the scripts are run again.

    This is standard practice in pretty much all the places I've worked. The exception was one client who had a clued-up DBA who knew how to use the Oracle flashback features to do the rollbacks - which made testing the scripts a whole lot easier. It didn't matter if a developer made a huge mess of their database change scripts - the DBA could easily undo their mess.


    While that's the approach I use as well, just allowing the script to error on re-runs sucks a bit too, because now your script output is full of error messages for things that aren't really errors. I've been bitten a few times where I've missed a genuine bug in my script because I told myself "oh that's OK, I expected all those errors" (except the one that mattered, of course).

    Flashback works nicely (for those that don't know, database flashback is logically equivalent to a "revert to revision X" command to a source control system). Often it's quicker than doing a restore (or an export/import) because the alternatives involve replacing everything from scratch from an export.

    It's also a good way to get your test environment back to a known state for re-tests.

    Personally, what I'd like to do (but don't - our DBAs don't cooperate) is see a combination of flashback (to revert the DB to its "production" version) along with full runs of the migration script.
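
    One way to keep re-runs quiet without hiding genuine failures is to trap only the specific error you expect; a sketch in Oracle PL/SQL (ORA-01430 is "column being added already exists in table"):

    DECLARE
        column_exists EXCEPTION;
        PRAGMA EXCEPTION_INIT(column_exists, -1430);
    BEGIN
        EXECUTE IMMEDIATE 'ALTER TABLE employee ADD (phone_number VARCHAR2(30))';
    EXCEPTION
        WHEN column_exists THEN
            NULL;  -- already applied on an earlier run; any other error still fails loudly
    END;
    /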
  • dkf 2012-02-29 04:32
    Therac-25:
    Joins, joins everywhere....
    My God, it's full of JOINs!

    It reminds me of a database I encountered a few years ago that was being used to support an RDF query processor (EAV, but where the values were really all more entities, as were the attributes themselves; crazy system!) A “database expert” colleague of mine had set it all up, and it was taking over 20 minutes to do a relatively simple query. Turns out, he'd explicitly decided to not have any indexes set up “because they take time to rebuild when the data changes”. Adding indexes got the query into the so-fast-we-don't-measure domain. The “expert” doesn't work for us any more (though not because of this incident, to be fair).
  • L. 2012-02-29 05:31
    emurphy:
    me:
    Maybe I'm having a late-term case of post-Monday stupidity, but what the hell does SAP have to do/mean in the context of "being replaced" - do you mean the German company or something else?


    Their ERP software, presumably. IIRC it has a reputation of "there are lots of people out there who will customize the crap out of this thing if you have the budget for it".


    And for me it has a reputation of "this piece of crap is unusable until you customize the crap out of it ... and even then it sucks badly".

    It's more corporate umbrella warfare than software anyway.
  • L. 2012-02-29 05:33
    Alex:
    I think it's incredibly limiting to say migrations have to be SQL scripts.

    We have a system of migrations with full access to all our application's services, the full object model and the EntityManager (we're running on Hibernate).

    Sometimes, these migrations just run an SQL script (like a simple UPDATE to set a new boolean field to false for every row), but often a new field will need to be calculated somehow, and then it's nice to have your actual application logic do the calculations.

    Also, if your whole application is using JPA QL, dropping down to SQL in migrations will be bug prone.


    And what if what you call your application logic is located in the wrong place, and that's why you see that issue?

    Try and give a real example, because not using SQL means losing transactional integrity (which you do not have in your java program), and that's a heavy price.
  • L. 2012-02-29 05:40
    savar:
    I haven't replied here in some time, but this article touches on a sensitive topic at my workplace for a few years. In that time we have had 4 different migration systems, including (1) no system at all (2) a perl hackery thing I wrote (3) the official migration system packaged with our ORM and finally (4) a new custom migration system written in the same language as the rest of our project that was thoroughly discussed and planned before beginning the implementation.

    I'm not sure our solution would work everywhere, but so far it's solved all of the big problems we've had with each of our previous systems. And it largely matches what Alex describes.

    Mike S:
    Minor Point:
    We've used EAV several times when we didn't know what data we'd be capturing, and what we would capture would change regularly. It's a nice general solution to that problem.
    The only real alternatives I've seen is loads 'o data tables, or the single table with 99+ string columns.


    Anyway, I wanted to reply to this in particular. The need for the EAV pattern isn't new or uncommon. Lots of designers face a similar problem with "we don't know what we'll need to store". The problem with putting EAV in an RDBMS is that you lose all of the useful features of the RDBMS such as types, constraints, foreign keys, etc. You also take on some of the negative aspects of RDBMS (such as the change management quagmire mentioned in the foregoing article).

    At that point, the only real reason to put the EAV in the RDBMS is because you can't think of any better place to put it. That's usually not a good rationale for designing serious applications.

    Depending on your needs, it may make more sense to serialize the EAV to a file (e.g. INI format or a native binary format), serialize to volatile storage (e.g. memcached), or write to a document-based storage system <please don't hit me>. Or you may decide that you're not going to store this EAV at all, and instead you'll normalize the information and tell the client that future changes will not be trivial but it's worth it because the present system will be more reliable, faster, and cheaper to operate.

    I don't see anything wrong with having multiple storage systems for a single application. For data that fits the relational model well, store it an RDBMS. For data that fits some other model better, store it somewhere else.


    There exists no data that fits another model better.

    No, seriously, the only reason people consider EAV is because they don't know how to handle the unexpected. EAV is but one of the ways of doing it, and most of all EAV kills the information aspect of having a row in a table: it goes back to raw data and needs to be processed for it to make any sense (varying attributes of the same objects are only related by identical strings in some column .. wtf).

    While EAV may provide a very low quality solution, it almost always is a bad answer.

    Wrote a bit about that a while back, explaining how adding history to a database through EAV was a very very bad idea.

    For those who can't guess ... millions of rows with three columns, no information, only data, no way to efficiently query about a point in time, etc.
  • L. 2012-02-29 05:45
    Jay:
    Side note: It is much harder to fix a bad database than a bad program.

    If you discover that one module is poorly written, you can often fix it or even throw it away and rewrite it with minimal impact on other modules. Sometimes there are dependencies, of course, but you can do a lot of clean-up without changing external interfaces. And most of the time -- not always, but most of the time -- the number of other modules affected by changes to a given module is manageable.

    But if you discover that a database table is poorly designed, fixing it very often means changing every module that accesses that table, which could be lots. Oh, you can add an index, and sometimes you can change a field length or that sort of minor thing. But if you realize that the employee table really needs to be broken into an employee table AND a benefits table, or that the shipping-info table must connect to order ITEMS and not to orders, etc, you could face many, many coding changes all over the place to support it. Plus, you'll have to write a data conversion program to reformat the data from the old schema to the new.

    DB design changes are almost always painful. Adding stuff? Not so bad. Changing existing stuff? Painful.


    And you are obviously not a DBA and don't know what should be done in that case?

    1. add the new tables
    2. move the data to the new tables
    3. delete the old table
    4. create a materialized view that represents the old table
    5. migrate most of the accessing code as you go
    6. dematerialize the view
    7. near the end, add some logging to know which methods still reference that old design and get rid of it
    8. get rid of the view when you're 100% sure it's over

    Of course picking the WRONG database hurts, so just avoid MySQL, toySQL and other w/e noob implementations and stick to real RDBMS's like PostgreSQL, Oracle, MSSQL and a few others.
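
    Step 4 in the list above is what keeps not-yet-migrated code alive; a sketch with invented names (a plain view shown here for brevity, where L. suggests starting with a materialized one):

    -- A view with the old table's name presents the old shape on top of the new tables
    CREATE VIEW employee AS
        SELECT c.employee_id, c.name, c.hire_date, b.plan_code, b.start_date
          FROM employee_core c
          JOIN employee_benefits b USING (employee_id);

    -- Step 8: dropped once logging shows nothing references the old shape any more
    DROP VIEW employee;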
  • L. 2012-02-29 05:48
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With TeamCity constantly deploying and rolling back, we quickly find out if the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf
  • L. 2012-02-29 05:51
    Jeff:
    "While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You are not, nor will not have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems."

    What if my main reason for using NoSQL is NOT the expectation of dealing with Facebook-sized problems?

    Say I want to manage a small or medium database, but avoid the development overhead from creating and altering table structures.

    If I add a new field to my data model, then in a relational database I'll need to:

    1. Change the code
    2. Create a database change script
    3. Test the database change script
    4. Ensure that the change script runs exactly once
    5. Keep backups of the data in case the change script fails, despite testing
    6. Avoid altering the table during working hours since it involves a full table rebuild
    7. etc etc etc

    While in a NoSQL database like MongoDB I'll need to:

    1. Change the code

    ...and it just works, right away.

    If I also want to index the new field for optimum performance, that's another small step, but the signal-to-noise ratio of "what I want to do" and "how much code or queries do I need to write to accomplish it" is still lightyears ahead of RDBMS.

    Is NoSQL still not for me?


    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).
  • L. 2012-02-29 05:56
    Mike S:
    EAV Redux:
    EAV's useful for situations where you have to store structured data, where users are specifying the structure at run time, and the structure's different for different "classes" of data.
    In the systems that I've seen and worked on, everything else gets stored in 3rd normal form.
    It's got plenty of downsides (performance is poor, and reporting's a pain), but it's great for what it does.
    For reporting, we have a tool that converts EAV to 3rd normal format. It spits out over 100 tables into a separate reporting database.


    What about the FACT that you can specify the structure of a table at run time?
    Seriously .. there's no good reason to use EAV when you could simply have them create a new table through your tool on the fly.
  • L. 2012-02-29 06:01
    box:
    Honestly the best part of systems like this was the point mentioned last.

    Having the ability to spin up a DB that matches production anywhere, any time, on demand turbo-charges productivity. Not needing hellish shared development environments or to schedule time on the 'uat' environment for user testing is a freedom that I will never give up ever again. Once you have this ability you can really take your continuous integration testing to the next level too.


    <yourdbms>dump
    <yourdbms>restore

    Who does not have that ability ?
  • L. 2012-02-29 06:04
    TheCPUWizard:
    While EAV type architectures *may* be "anti-patterns" in many cases, there are definitely valid use cases for them. My company created DynaSearch back in the 1980's, which has become over the past 25 years a very sophisticated system that is based on the EAV pattern.

    For "irregular data" (different items have different "fields") this pattern can be extremely performant, and compact.

    On one system (a generalized taxonomy system) we have a running instance with over 200G "cells" representing over 50TB of information. About 5 years ago another firm approached our client, and recommended going to a "normalized database". Their proposal required over 5000 tables, was estimated to take 3 times as much storage, and in prototype tests, could not match the performance, even though their test set contained less than 1% of the total data.

    About 3 years ago, we integrated a NOSQL database for handling "supporting data". This effort has proven to also be quite successful, allowing us to have full access to all of the information which was previously stored in other types of repositories.

    It is all about using the right tool for the right job.


    Most people's comments on NoSQL show that it's more about using the tool right than using the right tool.

    But w/e floats your boat.

    I'm pretty sure I could make your thing fly on a DBMS anyway and the fact that you had a bad proposition from one person doesn't say much about the technology.

    The one sure thing is, the simpler the technology, the more likely they'll use the tool right.
  • Laird Nelson 2012-02-29 06:19
    Another vote for Liquibase. Each change is a "changeset", each changeset is checksummed; changesets have rollback actions and cannot be modified.
  • Matt 2012-02-29 07:04
    I agree with pretty much everything you say. I have always considered it best practice to have a system capable of doing a test restore of production backups, and to perform that restore frequently to ensure confidence in the backups being taken. That system can usually provide a good arena for performing that sort of deployment test, as long as you use all the relevant scrubbing procedures before running any tests. For a schema-only test it's possible to simply restore the structure of the database through various methods, but these tend to vary wildly between DBMSs...
  • david 2012-02-29 07:08
    Oracle executes a commit immediately before and immediately after any DDL. You can have 20 inserts and then an unrelated ALTER TABLE; the inserts get committed before the ALTER TABLE is run, whether it succeeds or not.

    I don't know whether this is the best system (last time I looked, DB2 did not commit DDL automatically), but it is how Oracle works.

    Any migrate script has to take that into account.
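
    In other words, a hypothetical script like the following leaves the inserts committed even if the ALTER fails and you then issue a ROLLBACK:

    insert into employee (id, name) values (1, 'Alice');
    insert into employee (id, name) values (2, 'Bob');
    -- Oracle commits implicitly here, before running the DDL...
    alter table audit_log add processed_flag char(1);
    -- ...and again after it, so this ROLLBACK undoes nothing above
    rollback;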
  • Helpfulcabbage 2012-02-29 08:30
    Is "Entity-Attribute-Value" the same as the so-called "narrow table design," that Brad Fitzpatrick used at LiveJournal? I read about that recently but could not find much of anything on it, but the description I read sounds exactly like this. So it leads me to be confused. Alex says it's terrible, but Brad likes it. So confused!
  • Jeff 2012-02-29 08:54
    L.:
    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).

    Fair enough, I'm certainly not a DBA.
    I've worked with MySQL and SQLite databases for about 7 years, but my focus is on the application code, so I'm probably not an SQL expert either (though I may be bold enough to claim not to be a complete noob).

    But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?

    The greatest concern for me at the moment is its limited support for transactions. Although, like Alex said, I'm not going to be dealing with Facebook-scale problems :)
    For a small or middle-sized application, I'd say the overhead from maintaining the database structure and its changes is a bigger problem.

    Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet run into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom.)
  • Neil 2012-02-29 09:18
    Since when was FOR guaranteed to sort file names rather than respecting whatever order the underlying file system happens to use?
    Matt:
    I also find it utterly hilarious that you can create an article called 'Database Changes Done Right' without using the word 'transaction' once.
    The real WTF is a so-called database that can't do DDL in transactions.

    CAPTCHA: abigo - one of three friends with bunged-up noses.
  • Therac-25 2012-02-29 09:40
    Neil:
    Since when was FOR guaranteed to sort file names rather respecting whatever the underlying file system's order happens to be?


    Technically it would be the glob operator (*.sql) that is doing the sorting, not the FOR statement.

    I don't know how the windows shells work, but common Unix shells (I checked bash and tcsh) are guaranteed to glob alphabetically.
  • L. 2012-02-29 10:21
    Jeff:
    L.:
    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).

    Fair enough, I'm certainly not a DBA.
    I've worked with MySQL and SQLite databases for about 7 years, but my focus is on the application code, so I'm probably not an SQL expert either (though I may be bold enough to claim not to be a complete noob).

    But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?

    The greatest concern for me at the moment is its limited support for transactions. Although like Alex said, I'm not going to be dealing with Facebook scale problems :)
    For a small or middle sized application, I'd say the overhead from maintaining database structure and it's changes is a bigger problem.

    Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet ran into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom)


    Well, if you worked with MySQL for 7 years, I don't think I can help you.

    It took me all of one week to realize how far MySQL was behind the other RDBMSs, and how full of crap it was, as well as unreliable and slow and ...

    I have no idea what that MongoDB is, but the positive point that was mentioned about it is NOT a positive point; it just states how easy the tool is to use.

    MySQL is easier to use than PostgreSQL, yet it is a thousand times worse.

    The usual trend is: easier to use, more limited inside, slower inside, etc.

    If you're fine with MongoDB, it means your whole approach to information and knowledge management needs rethinking; you use the database for so little that it makes no difference what DB you use (quite often the case with MySQL users), and thus the simplicity of MongoDB goes your direction.

    However, the correct way to solve your "blabla database structure changes problem" is to put much more thought in how you manage and handle data, and how to best represent it in a relational model in order to both greatly increase reliability and efficiency + maintainability.

    SQLite is AFAIK even worse than MySQL in terms of feature support (the things MySQL half-supports, like triggers and all that), and MySQL itself lacks ACID support (which InnoDB does NOT handle, so no MySQL that I know of does).

    You seem to realize that keeping your data/information safe is important; well, you're damn right, and that's why you should take a look at the most advanced, reliable and optimized information handling systems: RDBMSs.

    So two things:
    1. that schema change problem does not exist, it's merely an effect of the way you model data / application
    2. try PostgreSQL, it might be a bit harder than MySQL but at least you'll get something in return
  • L. 2012-02-29 10:28
    Jeff:
    L.:
    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).

    Fair enough, I'm certainly not a DBA.
    I've worked with MySQL and SQLite databases for about 7 years, but my focus is on the application code, so I'm probably not an SQL expert either (though I may be bold enough to claim not to be a complete noob).

    But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?

    The greatest concern for me at the moment is its limited support for transactions. Although like Alex said, I'm not going to be dealing with Facebook scale problems :)
    For a small or middle sized application, I'd say the overhead from maintaining database structure and it's changes is a bigger problem.

    Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet ran into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom)


    http://stackoverflow.com/questions/5244437/pros-and-cons-of-mongodb

    See, I didn't need to know what it was to tell you it was a piece of crap.

    Obviously: no joins (WTF, you're going to join inside the application????)


    No transactions?? lol, seriously.

    Obviously a major lack of consistency and coherency control, etc.; simply not what you get from a DBMS, which is precisely what you NEED when you're handling information.

    Using an RDBMS will make your dev work much shorter, and can help you maintain data quality and consistency, etc., if you're handling anything remotely standard.
    While using MongoDB will give you a much simpler, much faster, much less reliable, very scalable solution for "worthless" information, like tweets or likes or ...

    So yes, the simple-minded answer of "if you don't need the scaling, don't use it" was the correct one.
  • eMBee 2012-02-29 10:58
    At one company we had a single change script.

    Any new database changes were appended to that script. This script was maintained in revision control and always run from the start to build a developer database from scratch.

    On the production side, the maintainers tracked up to what point the changes in the script had been run, and just applied any new changes after that point.

    Greetings, eMBee.
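
    A sketch of what that single script looked like in spirit (the tables here are invented for illustration):

    -- change 0001: initial schema
    create table customer (
        customer_id int primary key,
        name        varchar(100) not null
    );

    -- change 0002: add email
    alter table customer add email varchar(255);

    -- change 0003: orders
    create table orders (
        order_id    int primary key,
        customer_id int not null references customer (customer_id)
    );

    Developers ran the whole file from the top against an empty database; production only applied the changes added since the last point already run.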
  • Brenda Zeitlin 2012-02-29 11:25
    These are great methods to deal with a real life problem.
    As a matter of fact, we think the problem is that common, that we have developed a solution for it.
    And we even offer it for FREE for small teams.

    It is called dbMaestro, and it offers a complete database change management solution
    that covers all of the aspects of your database from development to deployment
    with dbMaestro www.dbMaestro.com you can

    Control- Control and manage all changes being implemented in the database- no out of process updates or changes can occur to the DB
    o Database change repository covering all DB object types
    o An enforced check in/out process
    o Table structure, PL/SQL code, meta data (relevant table content – usually the data that influences application behavior)
    o Saving Object dependencies for future deployments
    o Enabling easy rollbacks
    o Interconnected to your existing SCM
    • Change Documentation
    o No code overrides (two developers start working on one package only to lose one's work at some point)
    o No undocumented changes are allowed

    Deploy – Eliminate risks threatening your deployment and decrease Your Deployment Costs by 95% with an integrated deployment engine
    o Create deployment scripts, instead of writing them manually – efficient, reliable, repeatable and documented and a huge time saver
    o Three way analysis/ impact analysis
    o Merge engine to automatically merge code from different teams doing development at the same time
    o Deploy changes based on business requirements (work items, tasks etc)

    Protect - Enhance your Database Security – control who is doing what in the database
    Track each database change back to the requirement and tasks that initiated it (+ integration with SCM for A to Z coverage)

    Comply- Improve Compliance to Regulatory Bodies
    o Roles and responsibilities enforcement – (important for audit compliance)
    o Instead of determining who can do what based on their access rights (can be very problematic if whole groups are using the same login credentials)
    o Make sure each person can do only what he should (linking with directory groups to determine project scopes of work)
    o Set roles: a DBA can change table structure while the developer can only change the procedures even if sharing the same login credentials


    Please have a look at our website (www.dbMaestro.com ) or contact me for more information I would be happy to set up a demo or a free download for any of the readers



    Brenda Zeitlin | dbMaestro Ltd.
    Email: Brendaz@dbMaestro.com Web: www.dbMaestro.com
    Toll Free: 1 8669318855
    CONTROL.DEPLOY.PROTECT.COMPLY
  • Fesh 2012-02-29 11:52
    So... my coworker brings up an interesting point on EAV: the only time you shouldn't avoid it is when the business's requirements are stupid enough to demand it.

    He worked in an environment where the business required the ability to create document templates on the fly. Part of this was that there would be peculiar fields for a given template. EAV was the best of the worst answers. Or at least that's what he says.
  • Alex Papadimoulis 2012-02-29 12:21
    Spike:
    You could use a tool like LiquiBase to store your scripts in source control with your application, and manage which scripts need to be run in what order when deploying. All of the above rules still apply of course.


    I haven't had an opportunity to adequately try out LiquiBase, so my opinion is based on a first impression. That being said...

    When I first saw it, it seemed that you wrote "change scripts" in a database-agnostic XML language like this:

    ..<create table="employees">
    ....<column type="char(5)" name="employee_id">
    ......<constraint primaryKey="true">

    I have a knee-jerk reaction to any less expressive abstractions of an abstraction, especially when XML is added. I fail to see any benefit to this.

    If you feel that the "it can be run against *any* database" is really a feature/benefit, then why not apply the same anti-pattern to application code?
  • Jeff 2012-02-29 12:24
    L.:
    However, the correct way to solve your "blabla database structure changes problem" is to put much more thought in how you manage and handle data, and how to best represent it in a relational model in order to both greatly increase reliability and efficiency + maintainability.

    /--/

    1. that schema change problem does not exist, it's merely an effect of the way you model data / application

    By this, do you mean that:

    A) data should be modeled so well in the first place that structure changes will never be needed

    or

    B) the development process should be set up so that structure changes could be applied more reliably (e.g. the stuff Alex talks about in the article)?

    Or some option C? :)
  • Therac-25 2012-02-29 12:27
    Fesh:
    So... my coworker brings up an interesting point on EAV: the only time you shouldn't avoid it is when the business's requirements are stupid enough to demand it.

    He worked in an environment where the business required the ability to create document templates on the fly. Part of this was that there would be peculiar fields for a given template. EAV was the best of the worst answers. Or at least that's what he says.


    That's fine, if it's modelling actual arbitrarily structured information from the user, then it's good. As long as it's only used *where you need to use it*.

    That's not the antipattern. The antipattern is deciding to use that kind of structure for regular data that could just as easily have been put into a normal table.

    The joins (and subselects) that result are terrifying.
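
    For example, assuming an EAV table shaped like attribute_value (entity_id, attribute, value), pulling back just three ordinary "columns" already needs one join per attribute (names illustrative):

    select e.entity_id,
           fn.value as first_name,
           ln.value as last_name,
           hd.value as hire_date
    from   entity e
           left join attribute_value fn
                  on fn.entity_id = e.entity_id and fn.attribute = 'first_name'
           left join attribute_value ln
                  on ln.entity_id = e.entity_id and ln.attribute = 'last_name'
           left join attribute_value hd
                  on hd.entity_id = e.entity_id and hd.attribute = 'hire_date'
    where  e.entity_type = 'employee';

    Multiply that by thirty or forty attributes, plus type casts on every value column, and you get the reports people complain about.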
  • Therac-25 2012-02-29 12:30
    Jeff:

    Or some option C? :)


    C) Work on a project that never grows, changes, or adds features so the thing you're modelling will never have to change.
  • C 2012-02-29 15:34
    L.:

    http://stackoverflow.com/questions/5244437/pros-and-cons-of-mongodb

    See . didn't need to know about what it was to tell you it was a piece of crap .


    Yay, internet nerd gets on his high horse about a product he knows nothing about, finds a single community post, written a year ago, about a system that the post itself says is fast-growing, and uses that to declare his assumption to be the correct one.

    That's some real deep research there, friend.

    Not evangelizing about Mongo; it's an OK product with some valid applications and some things that it's very much not suited to do, but c'mon, man. You're worse than Alex when he gets on his soapbox.
  • Aaron 2012-02-29 16:09
    This is one reason I really love Rails -- all database changes MUST go through migrations. And every migration has an "up" and "down" depending on whether you are migrating up to it, or rolling back down to it. All migrations are timestamped down to microseconds when they are created, and that timestamp is part of the filename; there is no question what order they were created.

    The application knows where it is in the migration process by keeping a record of the current migration version in the schema.

    Most migration files contain simple add/remove type instructions, but I've also integrated seed data into a migration simply so it would be tracked in version control when I deploy to production. Love it.
  • Consultuning 2012-02-29 16:34
    I think that you're right on the premises, but your solution is a bit too simplistic. I started to reply as a comment, but the comment grew so large that I thought it would be best to post it on my own blog. See http://consultuning.blogspot.com/2012/02/is-there-right-way-of-doing-database.html
  • HappyEngineer 2012-02-29 18:00
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back, we quickly find out if the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf



    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.

    Sometimes the errors are detected by the java code before they hit the DB.

    The java deployers are far more reliable because they are tested and reused and they validate the input given.

    Our method of deployment can be easily tested. Writing separate sql scripts for deploy and rollback was demonstrably more difficult to test (the constant deploy failures were proof of that).

    Our system allows much more control. The error messages in the logs are very clear about which things failed. A full tree is output that shows the order of the deployment and which things failed and which did not. It also includes timing information so we can see how long each part is taking.

    For a while I thought about writing an open source version for use by others. I never got around to it, but perhaps I will someday. The difference between sql scripts and our method is huge.

  • Joe Smith 2012-02-29 22:01
    [quote user=HappyEngineer]
    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.
    [/quote]

    That depends entirely on the database. Postgres has perfect DDL transactions, which are completely invisible to other transactions unless and until the commit line is successfully reached.

    Microsoft's SQL Server handles DDL transactions reasonably well, although not everything is supported, and there are a few cases where in-progress DDL transactions are visible to other transactions.

    I'm guessing you are talking about Oracle, which is objectively a rather terrible database. Sure it is scalable, but it lacks critical things like distinguishing between null and empty strings, and even minimal DDL transaction support.
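
    To make the Postgres point concrete, reusing the cannedRole example from above, the whole migration can sit inside one transaction (a sketch, with names following the earlier post):

    begin;
    alter table role add column canned_role boolean not null default false;
    update role set canned_role = true
     where company = '102' and name not like 'Employee%';
    -- if anything above failed, or we change our minds:
    rollback;   -- the column and the updates vanish together
    -- otherwise we would have issued: commit;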
  • kn 2012-03-01 01:25
    Nice article, but I don't see the point in separating out object scripts and change scripts.

    It seems to me that objects are quite closely tied to the structure of your data: if you add/remove/rename/alter a column on a table, then any triggers, procedures, and/or views that rely on that table will likely need to change as well. I'd think you'd want these changes to occur in tandem, and ideally inside a single transaction. Separating these components into separate scripts would seem to lose that, and while one certainly can rerun an object script, I don't really see any reason why you'd want or need to run one independently of a change script. I.e., you'd only be running an object script because you're changing the database, either just the object alone or because of an underlying change in the data structure. Distinguishing between these changes seems unnecessary.

    Plus, if one of the goals is to be able to spin up new database environments simply by running through the change scripts, it's quite possible -- if not likely -- that some of the scripts may rely on older object versions (triggers especially) to function properly. Moving the objects into separate scripts and keeping only the current version seems directly counter to being able to recreate a current (or earlier) version of the database.

    Also, why the need for the external log on top of the database's internal log? Especially if we're reloading the database and going back to the drawing board should a script ever fail, both logs will be identical on a functioning database, no?

    What am I missing?
  • Atle 2012-03-01 03:15
    So, it has come to this…

    Now that we know what the solutions are not, what is the solution when (parts of the) data structures are defined by users at runtime?

    Manipulating tables? What if you want to keep old data when a column goes obsolete?
  • Cbuttius 2012-03-01 06:25
    Ok my "soapbox" time now. I work on application level with databases a lot. There are multiple languages and lots of different database implementations yet they all seem to suffer from similar issues.

    Database activitiy is almost always done by constructing strings to pass to the database. This looks wrong to me in many ways.

    I actually think that the policy of only calling stored procedures from an application is quite a good one. However you should be able to do all of this with

    - A single call to the database to retrieve an object for the stored procedure, related to its structure
    - Attach this procedure to a connection. It is important we do it at this point as we may watn to re-use the object.
    - Bind variables to this object, one or more times.
    - execute the procedure, one or more times
    - process the results for each execution call or handle errors.

    ideally doing the latter through some async future.

    There could be more "structured" programming for other types of query too. There is for BCP but it's non-standard and only supports inserts (and reads), not merges.
  • not alex 2012-03-01 09:27
    Alex:
    I think it's incredibly limiting to say migrations have to be SQL scripts.

    We have a system of migrations with full access to all our application's services, the full object model and the EntityManager (we're running on Hibernate).

    Sometimes, these migrations just run an SQL script (like a simple UPDATE to set a new boolean field to false for every row), but often a new field will need to be calculated somehow, and then it's nice to have your actual application logic do the calculations.

    Also, if your whole application is using JPA QL, dropping down to SQL in migrations will be bug-prone.


    This is dangerous, because your migration scripts now depend on your application code. It will work fine initially, and then in changeset 25674 someone introduces a subtle change to the application logic, and then revision 26198 gets tagged and released, and then the migration script fails.

    Alex Papadimoulis:
    SQL does not need to be “wrapped” with a less expressive language.

    Actually, Python, Ruby, etc. are more expressive than SQL, as the latter is not Turing complete. SQLAlchemy (a Python ORM library), for example, has a migration library called Alembic that puts the full expressiveness of Python at your disposal (no DSL), and can optionally generate SQL scripts for offline migration.
  • Gary 2012-03-01 09:46
    "On fail, fix database" is a problem only if there are no tools that can instantaneously fix the database.

    It is a weakness in tools you are identifying, not in process.

    Notably, application code is tied to the data model, but that does not imply application code is tied to the database. It is possible to preserve, for the application, the notion that it is communicating with the data model it wants.

    You may want to have a look at our solution: http://chronicdb.com
  • sbdragoo 2012-03-01 11:02
    FuBar brings up a great point. A previous "database" I worked with, and ultimately converted to a digital format, was also kept on note cards. It was a water valve system for a municipality of about 140K. The system was designed so well, and the users were trained well and had such a high level of discipline, that only a few values were in place where lookup tables would ultimately be employed in the digital version. Once the digital version was created, the screens were designed to mimic the card input already in place, and the transition was the easiest I have ever seen. The key was the original design and the discipline of the users over the decades the system was in place... The original system was implemented in the 1920's and was converted to SQL Server / ESRI GIS in the early 2000's.
  • sbdragoo 2012-03-01 11:06
    Great article. I really appreciated the points regarding schema compare. Where I currently work, we had (yes, past tense) two developers in particular who swore that a certain compare tool was the only suitable method for DB deployment. This continued until they both departed, even though it meant comparing a testing environment, where multiple versions were being tested for release, against the production environment. Now that we have moved to a system of structured change scripts with no alterations to a script once created (except in the case of syntax error), our application / database interface issues have become near zero.
  • Michael Welcome 2012-03-01 13:23
    I mostly agree with what is here. I have had success with making schema changes in the following way: check for a "backup version" of the table(s) you are going to modify; if it does not exist, rename the current table to the backup. You may need to drop FKs and some indexes to get this all to work.

    Create new tables using the original table name. Move the data from the old tables into the new ones. Re-establish the indexes and FKs. If I don't get everything in the design right the first time, I can tweak the conversion script and rerun it until everything is correct.

    If you look at the change scripts that SSMS creates when you modify a table, you will get some idea of how to do this. They of course delete the old version of the table as soon as the script completes; I do not do that, and would clean the old table up in the next release.

    If something goes horribly wrong during an install, you can always undo this by deleting the tables you created and renaming the old tables back to the original name, then re-establishing the required indexes and FKs.

    I have been using this technique for many years now and have never had to do a backout or restore a DB from a backup due to a botched install. Of course, this will only work with databases of moderate size.
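
    A stripped-down sketch of the approach (T-SQL-ish; the Customer table and its columns are invented for illustration):

    if object_id('dbo.Customer_bak') is null
    begin
        -- first run: keep the old table around as the backup
        exec sp_rename 'dbo.Customer', 'Customer_bak';
    end
    else if object_id('dbo.Customer') is not null
    begin
        -- rerun after tweaking the script: discard the previous attempt
        drop table dbo.Customer;
    end

    -- create the new shape under the original name
    create table dbo.Customer (
        CustomerId int          not null primary key,
        Name       varchar(100) not null,
        Email      varchar(255) null    -- new column in this release
    );

    -- carry the data across, then re-create indexes and FKs as needed
    insert into dbo.Customer (CustomerId, Name)
    select CustomerId, Name
    from   dbo.Customer_bak;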
  • High Frequency Trading DBA 2012-03-01 16:24
    Can someone tell me the point of this article?

    Making changes to databases is not hard. Versioning, testing, and deploying these changes is not hard. I could've summed this whole article up in a few sentences. Making yourself out to be a database guru by comparing yourself to those who can't do simple change control on a database is pretty relative. Is this a re-publish from 10 years ago?

    However, doing database changes alongside a set of much more rapidly updating applications is the tricky part. Imagine one database supports many applications (100+) that have much faster iterations than the database. A team of 200+ developers and a dozen DBAs for instance.

    Unfortunately this article does nothing to address that. It doesn't address the relationship between application and database deployment. And not addressing this means it doesn't really hold any value at all. As an example, say I have a large high performance database that is a data store for 100+ applications. One of these large applications is business critical, and you have iterations in a matter of _minutes and hours_, not _weeks and months_.

    Your business dictates you need an application change out immediately. This change requires new/different data in the data store shared by many. The application changes take an hour to implement. Take it to the low level, and the database needs a schema change. One that would likely break existing applications. Using this process, what do we do now? And if it actually works, does time to market suffer along the way? What about transitioning from the old version to the new version? I.E. both versions running in parallel, along with a dozen more.

    I couldn't disagree with this article more. It's a great start for those that have absolutely no change control for databases on their small projects with slow iterations. But if you approach something like I'm describing you will very, very quickly outgrow this approach completely.
  • pbean 2012-03-01 16:57
    We're using Liquibase... it's not perfect, but does the job. Handles database updates and rollbacks pretty well. Why should we reinvent something that already exists? Its disadvantages (XML) do not outweigh its advantages.
  • Security Guy NC 2012-03-01 19:54
    Alex, you didn't address fully the issue of the data flowing back from Prod into the Test and Dev databases. I agree fully with your statements when the data is end-user generated.

    However, I work in an environment where data created by the company for public consumption has to be reviewed by the legal compliance team. Because many departments contribute to these changes the data is stored in an easily editable Content Management System. Because the data is sensitive, and putting wrong data out in Prod could shut down our company, it is too risky to make the company produced data changes directly in Prod. (All the CMSs I've seen are bad at holding back changes until a review has been made.)

    The data changes produced by the company are non-editable in Prod (the CMS server code is not even deployed to Prod) so we can keep a tighter lid on security. We do keep the CMS as close to Prod as we can and flow the changes in both directions.
  • Barf 4Eva 2012-03-01 21:36
    Well-written article, and all points that I agree with! Thank you for sharing.

    We have a lovely EAV-style table simply called the MASTER table. In it you will find all sorts of things... validations, connectivity info sometimes, mapping data, even full-blown tables hidden away, perhaps because people were too lazy (I'm guessing?) to put them into their own tables. We even have a column called "table". :D

    Anyways, it's a nice conversation piece to have around when introducing the "new guy" to the leviathan!
  • Barf 4Eva 2012-03-01 21:47
    Laird Nelson:
    Another vote for Liquibase. Each change is a "changeset", each changeset is checksummed; changesets have rollback actions and cannot be modified.


    This looks cool... Thanks for the link! Looks like it is worth reading more on this.
  • Anonymous 2012-03-02 04:43
    Is this article written in the SQL Server world? Some "missing features" there actually exist in other databases, like Oracle.

    "The only way to change a database is by running a SQL script against that database, and once you’ve done that, there’s no going back. You can run another SQL script to change the database again, but the only way to truly rollback changes is by restoring the entire database from back-up." - wrong. Oracle has "Flashback Transaction", in which every statement you run to make data changes auto-generates an "undosql" statement for you. Use the LogMiner to review and execute these statements to undo your changes. No need to restore the entire database.

    Oracle also has "Flashback Database" when you can roll back the entire database to a single point in time - no need to restore from a backup (unless the point in time is bigger than the flashback window).

    "once you’ve dropped that column, it’s gone forever; you could certainly restore your database from back-up, but that’s not exactly an undo, especially when you lose all the intermediate data" - wrong. Oracle has "flashback table" to restore a dropped column or table, meaning "instant undo" on the table.

    These features exist so that it is quick and easy for an experienced DBA to undo bad changes with the minimum of downtime. If you didn't hate Oracle so much then perhaps you'd learn what it has to offer and benefit from it, rather than moan about how you've got to do all this additional work instead.
  • Eric 2012-03-02 06:08
    BTW, the batch file we use for executing our scripts in order is as follows:

    @echo off

    REM cycle through all sql scripts in the current folder and execute them
    For /F "eol= tokens=* delims= " %%f in ('dir *.sql /a-d /on /b') do (
    echo Processing %%f...

    sqlcmd -S .\SQLEXPRESS2008 -E -b -i "%%f"
    if errorlevel 1 goto badend

    echo %%f completed
    echo.
    )

    goto goodend


    :badend
    echo.
    echo **** BAD SCRIPT FILE ENCOUNTERED ****
    goto end


    :goodend
    echo.
    echo Batch script update successful


    :end
    pause

    This script could be modified to accept the instance name as a parameter, but in practice we all have a default instance installed on our dev machines. The advantage of the script is that it stops executing if it encounters an error in any of the scripts, although the severity must be 10 or above to register on the program exit code. This isn't an issue if you're using TRY/CATCH with BEGIN TRAN/COMMIT/ROLLBACK. We use the following sample in all stored procs and update scripts:

    SET NOCOUNT ON
    BEGIN TRY
        BEGIN TRAN




        COMMIT TRAN
    END TRY
    BEGIN CATCH
        -- might need to rollback
        IF (@@TRANCOUNT = 1)
        BEGIN
            ROLLBACK TRAN
        END
        ELSE
        BEGIN
            -- Nested transaction. We can't rollback from here or else we will
            -- get the SQL Msg 266 about erroneous transaction counts after SP
            -- completion. Instead we Commit this block of work trusting that
            -- the caller will eventually Rollback at the outermost level when
            -- it sees my error code.
            COMMIT TRAN
        END
        DECLARE @ErrMsg nvarchar(MAX), @ErrSeverity int
        SELECT @ErrMsg = ERROR_MESSAGE(), @ErrSeverity = ERROR_SEVERITY()
        RAISERROR(@ErrMsg, @ErrSeverity, 1)
    END CATCH

    All this, along with a script that checks the db version (and throws an exception if it doesn't match), named "001) Check db version.sql", means that (in theory) it's not possible to run scripts for a particular patch against the wrong version of the database.
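
    The version-check script itself can be tiny; a sketch, assuming a single-row dbo.DatabaseVersion table (that table name is my invention, not something from the scripts above):

    SET NOCOUNT ON
    DECLARE @CurrentVersion int
    SELECT @CurrentVersion = VersionNumber FROM dbo.DatabaseVersion

    IF (@CurrentVersion <> 41)   -- the version this batch of patches expects
    BEGIN
        -- severity 16 is high enough for sqlcmd -b to set a failing exit code
        RAISERROR('Expected database version 41 but found %d - aborting.', 16, 1, @CurrentVersion)
    END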
  • Yum 2012-03-02 10:25
    Mmmmm... troll food!
  • RC 2012-03-03 02:55
    You've gotta accept that you're a tiny minority, not worthy of mention compared to the massive hordes of Oracle/DB2/Postgres/MySQL/etc. developers, admins, and users... It's sad but true.

    :fibcgf

    You aren't working at Petco are you?
    (Not a lot of other big U2 shops out there.)
  • Thomas James 2012-03-03 06:41
    I've found dbMaintain to be an awesome tool that replaces the batch file you talked about. It also enforces a few basic rules.

    http://www.dbmaintain.org/
  • Enigma 2012-03-03 09:02
    "Since these object scripts simply create objects, they can be run time and time again without any real consequences."

    If your database has, say, 100 procs, are you going to drop and recreate all of them any time any one of them changes? What if you have thousands of procs in a large environment with many servers (I do)? Deploying every proc even if it didn't change is not going to scale, since deployment would take hours. What if the procedure is replicated (then you won't be able to drop it)?

    Perhaps my environment is significantly different from yours, but mine is an environment with thousands of procs on dozens of servers and a deployment system that goes directly into production and does 40+ deployments per day (for a website). I use an SVN repository structure of ServerName\DatabaseName\ObjectType\ObjectName.sql and a powershell script for deployment. The script runs an SVN update to get the latest version, parses the output (if any files changed), and then deploys the changed objects only at the appropriate server and database. If it fails it sends the error via e-mail to the committer, and tries again several times (in case the failure was due to a timeout or something, i.e. caused by blocking). If it keeps failing and doesn't get fixed, eventually the DBAs get an e-mail asking us to look into it or roll it back (since it's vital that SVN is always in sync with production).

    Since you can't drop a replicated procedure (and we use lots of replication), a better method than "if exists, drop, then create" is "if not exists, create (as select 1), then alter". This will work in all cases. For functions it's slightly more annoying, as you have to be aware of whether you're writing a table-valued or a scalar-valued function when creating the dummy object.
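
    In T-SQL that pattern looks roughly like this for a procedure (the procedure and table are invented for illustration):

    if object_id('dbo.GetCustomer', 'P') is null
        -- CREATE PROCEDURE must be alone in its batch, hence the dynamic SQL stub
        exec ('create procedure dbo.GetCustomer as select 1');
    go

    alter procedure dbo.GetCustomer
        @CustomerId int
    as
        select CustomerId, Name, Email
        from   dbo.Customer
        where  CustomerId = @CustomerId;
    go

    Because the object is never dropped, replication stays happy and any permissions already granted on it survive the deployment.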

    Also I'm not sure if you use SQLCLR assemblies or SSIS, but I deploy those directly from source control too which takes a ton of tedious work off the DBA team's plate.

    I completely agree about tables though - they should be deployed manually always by a DBA. Unless you really really trust that nobody is ever going to commit an alter that adds a not-nullable column with a default to your largest and most important table (thereby bringing your application to its knees), you really do want a person doing a deployment for tables and being conscious of the impact on the application, the time of day, etc. To keep source control in sync though I set up a job that automatically scripts out changed table definitions and commits them to SVN, which is quite helpful.

  • Omnifarious 2012-03-04 13:37
    I'm not fond of databases, even though I agree with all the change control steps listed in this article.

    I'm not fond of them because they require a completely specialized set of tools to deal with, and the nature of those specialized tools is different for every single stupid database.

    I like SQLite for development work because copying the database file works. Maybe the file is a stupidly opaque blob without specialized tools, but at least copying it works. And the OS permissions operate as they should. And a whole host of other things work the way they're supposed to.

    But otherwise, databases are just incredibly stupid. They have their own authentication and permission system with their own obscure syntax (different from vendor to vendor of course) and you need to use specialized tools to even make a copy of the stupid things.
  • L. 2012-03-05 06:56
    Jeff:
    L.:
    However, the correct way to solve your "blabla database structure changes problem" is to put much more thought in how you manage and handle data, and how to best represent it in a relational model in order to both greatly increase reliability and efficiency + maintainability.

    /--/

    1. that schema change problem does not exist, it's merely an effect of the way you model data / application

    By this, do you mean that:

    A) data should be modeled so well in the first place that structure changes will never be needed

    or

    B) the development process should be set up so that structure changes could be applied more reliably (e.g. the stuff Alex talks about in the article)?

    Or some option C? :)


    A) You will never require a *major* structural change once your data is correctly modelled, i.e. a logical reflection of the concepts you are addressing.

    B) Your system shouldn't have any issues with new item types / new attributes / new whatever.

    It's not about the development process but about application architecture.

    If your app can't handle such tiny changes, it should be rewritten.
  • L. 2012-03-05 07:00
    HappyEngineer:
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back we quickly find out of the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf



    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.

    Sometimes the errors are detected by the java code before they hit the DB.

    The java deployers are far more reliable because they are tested and reused and they validate the input given.

    Our method of deployment can be easily tested. Writing separate sql scripts for deploy and rollback was demonstrably more difficult to test (the constant deploy failures were proof of that).

    Our system allows much more control. The error messages in the logs are very clear about which things failed. A full tree is output that shows the order of the deployment and which things failed and which did not. It also includes timing information so we can see how each part is taking.

    For a while I thought about writing an open source version for use by others. I never got around to it, but perhaps I will someday. The difference between sql scripts and our method is huge.



    Well, if your DBMS does not support wrapping a column drop inside a transaction, it's just wrong and that's it.

    If you say BEGIN / whatever / ROLLBACK and you're not back in the initial state, your DBMS should be trashed instantly for not doing its job.

    Furthermore, while your system is pretty nice (modelling those changes and associating rollback code is not bad), I really think it should rely more on your DBMS and less on hand-coded rollback plans (since in most cases a transaction rollback does that for you).
  • L. 2012-03-05 07:00
    [quote user="Joe Smith"][quote user=HappyEngineer]
    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.
    [/quote]

    That depends entirely on the database. Postgres has perfect DLL transactions, which are completely nvisible to other transactions unless and until the commit line is succesfully reached.

    Microsof'ts SQL server does reasonably well DDL transactions, although not everything is supported, and there are a few cases where in progress DDL transactions are visible to other transactions.

    I'm guessing you are talking about Oracle, which is objectively a rather terrible database. Sure it is scalable, but it lacks critical things like distinguishing between null and empty strings, and even minimal DDL transaction support.[/quote]

    Amen brother.
  • L. 2012-03-05 07:04
    Omnifarious:
    I'm not fond of databases, even though I agree with all the change control steps listed in this article.

    I'm not fond of them because they require a completely specialized set of tools to deal with, and the nature of the specialized tools is different very every single stupid database.

    I like sqlite for development work because copying the database file works. Maybe the file is a stupidly opaque blob without a specialized. But at least copying it works. And the OS permissions operate as they should. And a whole host of other things work the way they're supposed to.

    But otherwise, databases are just incredibly stupid. They have their own authentication and permission system with their own obscure syntax (different from vendor to vendor of course) and you need to use specialized tools to even make a copy of the stupid things.


    Well, obviously you're an ignoramus who shouldn't have anything to do with databases, and also a very successful troll.

    Let's all go back to the madness of file types and XLS; it's all so much better than actual query-able information in a logical format.

  • AYmous 2012-03-05 20:07
    geoffrey, MCP, PMP:

    NoSQL is a fad invented by programmers who think they have outsmarted giant enterprises when it comes to data storage and retrieval. Nothing more.


    Software history is all about programmers who outsmart giant enterprises. If that's your argument, I'll back NoSQL to the hilt...

    ... well, except the enterprises will probably deploy an army of lawyers, lobbyists, and steak-dinner-buying salespeople to ensure that their crappy solutions will sell...
  • Mike S. 2012-03-07 16:02
    We're dealing with a survey system. Hundreds of different surveys, thousands of questions, with a method for skipping questions. And the survey owners can change the question orders and add and remove questions from any part of the survey at any time, and the survey questions themselves are versioned (so if you change the question text, a new version of the question is created).

    So if I wanted to ditch EAV, I would need to write something to create new tables on the fly, insert columns at any position in the table on the fly, and reorder the columns. If a user comes along and deletes 30 previous questions and adds 25 more, then the table becomes full of dead space - the first N thousand patients don't have answers to the 25 new questions, the next N thousand patients don't have answers to the 30 that were removed.

    Or instead when the user ditches 25 questions and adds 30 more we could create a new table. Now instead of hundreds of survey tables we would have thousands, and the research analyst would still need to combine their data in some fashion.

    I'm not saying I have a great solution or even the optimal solution. But implementing a non-EAV solution looks incredibly complex.
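
    The closest thing to a middle ground I can sketch is the usual row-per-answer layout below (names illustrative), where questions are first-class rows rather than columns; whether that still counts as "EAV" is half the argument:

    create table survey (
        survey_id int primary key,
        title     varchar(200) not null
    );

    create table question (
        question_id int primary key,
        survey_id   int not null references survey (survey_id),
        version     int not null,
        position    int not null,
        text        varchar(2000) not null
    );

    create table response (
        response_id int primary key,
        survey_id   int not null references survey (survey_id),
        patient_id  int not null,
        taken_at    timestamp not null
    );

    create table answer (
        response_id int not null references response (response_id),
        question_id int not null references question (question_id),
        value       varchar(4000),
        primary key (response_id, question_id)
    );

    Removed questions simply stop getting answer rows, and added questions only appear for responses taken after the change, so there's no dead space either way.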
  • Johhny NoGood 2012-03-07 23:12
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back we quickly find out of the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf


    No, it wouldn't. Their tool appears to allow promotion / demotion, e.g. promote my app to v1.1, or demote it to v1.0 by executing the rollback clause.

    I might be wrong as well.

    I find the whole rollback of a database upgrade a little ridiculous. Unless your app is just a toy, the only way you can be certain is by restoring.

    If upgrading the application merely meant adding or renaming a column, that's one thing, but don't forget about the data migration scripts. If you drop a column, how do you effectively roll back?
  • L. 2012-03-12 05:13
    Mike S.:
    We're dealing with a survey system. Hundreds of different surveys, thousands of questions, with a method for skipping questions. And the survey owners can change the question orders and add and remove questions from any part of the survey at any time, and the survey questions themselves are versioned (so if you change the question text, a new version of the question is created).

    So if I wanted to ditch EAV, I would need to write something to create new tables on the fly, insert columns in any index in the table on the fly, and reorder the columns. If a user comes alone and deletes 30 previous questions and adds 25 more, then the table becomes full of dead space - the first N1000 patients don't have answers to the 25 new questions, the next N1000 patients don't have answers to the 30 that were removed.

    Or instead when the user ditches 25 questions and adds 30 more we could create a new table. Now instead of hundreds of survey tables we would have thousands, and the research analyst would still need to combine their data in some fashion.

    I'm not saying I have a great solution or even the optimal solution. But implementing a non-EAV solution looks incredibly complex.

    You, sir, have officially demonstrated how much of a moron you are. Congratulations.

    The very friggin idea of having a column for each question shows you should be shot on sight if seen programming.
    But whatever, just stop working in the field and I'll be happy.

    Damn dimwits seriously...
  • L. 2012-03-12 05:15
    Johhny NoGood:
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back we quickly find out of the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf


    No it wouldn't. Their tool appears to allow promotion / demotion. e.g. promote my app to v1.1 demote my app to v1.0 by executing the rollback clause.

    I might be wrong as well.

    I find the whole rollback of a database upgrade a little ridiculous. Unless your app is just a toy, the only way you can be certain is by restoring.

    If upgrading the application was merely adding or renaming a column, that's one thing, but don't forget about the data migration scripts. If you drop a column, how do you effectively roll back?


    The only reason you would roll back is if the intended action failed, and that's what you get from SQL transactions. The rest is for kids playing around with shiny toys: woo, see how I can switch versions faster than light... oops, broken again.

    And if you drop columns using their thingy, the data is just dead, that's it.
  • L. 2012-03-12 05:19
    Mike S.:
    We're dealing with a survey system. Hundreds of different surveys, thousands of questions, with a method for skipping questions. And the survey owners can change the question orders and add and remove questions from any part of the survey at any time, and the survey questions themselves are versioned (so if you change the question text, a new version of the question is created).

    So if I wanted to ditch EAV, I would need to write something to create new tables on the fly, insert columns at any position in the table on the fly, and reorder the columns. If a user comes along and deletes 30 previous questions and adds 25 more, then the table becomes full of dead space - the first N1000 patients don't have answers to the 25 new questions, the next N1000 patients don't have answers to the 30 that were removed.

    Or instead when the user ditches 25 questions and adds 30 more we could create a new table. Now instead of hundreds of survey tables we would have thousands, and the research analyst would still need to combine their data in some fashion.

    I'm not saying I have a great solution or even the optimal solution. But implementing a non-EAV solution looks incredibly complex.


    I had to say it twice... seriously, how retarded do you have to be to write that as "not a troll", which is what it looks like.

    HOW THE F*CK COULD YOUR BRAIN ACTUALLY GO THROUGH SEVERAL SECONDS OF THOUGHT WITHOUT NOTICING HOW DUMB YOUR IMPLEMENTATION WAS?????

    And don't say you didn't think when writing your n1k w/e bullcrap.

    And no, I'm not giving you the solution; that would prevent you from ever becoming smart enough to actually do your job.
  • retnuh 2012-03-13 18:08
    Any guidelines or tips for keeping a DVCS repo in sync and avoiding file-naming conflicts?

    We've got the following setup:

    - Live server
    - Dev server
    - 3x individual developer VMs with database
    - Mercurial

    I'd like to have a Live and a Dev SQL schema repo, such that anything pushed into the Live repo will be included in the live deployment build. My main concern is keeping Dev in sync between all developers. So far the best solution I can think of is a common network share that gets auto-committed and pushed every n minutes, so we avoid Dev 1 and Dev 2 creating change script 00135.sql at nearly the same time and both trying to push it. This really shouldn't happen all that often, as schema changes are few and far between.

    I feel like I'm missing something obvious here, so any suggestions?
  • Conor 2013-02-06 20:07
    I am another user and fan of liquibase:

    http://www.liquibase.org/

    The tool provides the framework and rigour one needs to do DB deltas properly, as expressed in your initial article, Alex.
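
    For anyone evaluating it, Liquibase also accepts SQL-formatted changelogs, so a change and its rollback sit together in one file. A minimal sketch (the changeset id and the statements are just illustrative):

    --liquibase formatted sql

    --changeset conor:add-cannedRole
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    --rollback alter table Role drop column cannedRole;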

    Kind regards,
    Conor
  • SJS 2013-03-08 04:51
    Yes, yes, yes. Exactly right.

    Some other things I've found useful:

    1. Separate the database structural changes into "additive" and "subtractive" changes. Additive changes add columns, tables, indices, etc., while subtractive changes drop columns or tables, discard data, etc. (There's a short sketch of this split after the list.)

    This can help you avoid a full reload of the database -- only subtractive changes require a restore to undo, and even that can be mitigated by a policy of not applying subtractive changes until well after nothing uses that data anymore.

    (I like to split out constraints and index-creation changes as well, but that's probably a personal quirk.)

    2. Enforce the rule that code should continue to work correctly no matter what additive changes have been applied. Reject out of hand any code that depends on the order of columns in a database table, and have a word with that developer.

    (It can be useful to randomly re-order the columns in a table in the testing/QA environments. Pathological applications will likely fail messily. This is good.)

    This lets you modify the database without breaking applications, and roll back applications without having to rebuild/restore your database. Subtractive changes can be applied well after the application is deployed, without worrying that the application will blow up if an unused column is removed.

    3. Track and deploy the database versions independently of the applications (the assumption is that multiple applications are using the database). Applications should be considered to run in the context of the database's published structure, rather than imposing changes on the database structure. Letting an application force a database change is a recipe for breaking other applications, or for evolving an unmaintainable "design".
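
    To make the additive/subtractive split in point 1 concrete, here is a rough sketch (table and column names are made up) of one logical change cut into two scripts that ship in different releases:

    -- release N, additive: the new column goes in, the old one stays for old code
    alter table Customer add phone_e164 VARCHAR2(20);
    update Customer set phone_e164 = phone;

    -- release N+1 or later, subtractive: only once nothing reads the old column
    alter table Customer drop column phone;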
  • Bob 2013-04-15 19:35
    Write-Once – change scripts cannot change throughout the release lifecycle; they are not changed, they are undone (ideally, through a restore) and discarded


    A tool external to the database should probably be used to ensure change scripts are deployed once and once only, to prevent erroneous SQL from ever reaching the database engine's parser.

    I work on a system where change versioning is performed by checking the database versioning table from the change script SQL. Imagine a scenario where the following changes occur within a change cycle (meaning the scripts cannot yet be discarded):
    1 - add column A
    2 - modify column A
    3 - delete column A
    4 - any subsequent change

    Changes 1-3 have previously been deployed, then we go to build with change 4. While the change scripts contain code that checks at execution time whether changes 1-3 have already been deployed, change 2 will still fail before execution, at parse time, because it references a non-existent column. The only option I've found in this scenario is to retrospectively modify the SQL for change 2 to compile the modify statement dynamically at runtime, bypassing the parser.
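
    A rough sketch of that workaround, Oracle-flavoured to match the other examples in this thread (the version table, change id, and the modified column are invented for illustration):

    -- change script 2, kept runnable even after column A has been dropped
    DECLARE
      already_run NUMBER;
    BEGIN
      select count(*) into already_run from schema_version where change_id = 2;
      if already_run = 0 then
        -- the ALTER is built as a string, so the parser never sees the missing column
        execute immediate 'alter table survey_response modify (a NUMBER(3))';
        insert into schema_version (change_id) values (2);
      end if;
    END;
    /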
  • Ian 2014-02-11 00:26
    No, he's rooting FOR his product. But in principle you're allowed to re-implement. And that might cost more, if it's inconsistent with your core business.