Admin
I don't know much about other DBMSs, but with Oracle, change control is fairly straightforward.
For example, I just write the ALTER statements I need to add or drop columns. The scripts are perfectly safe to rerun, because if you try to add a column that has already been added, or drop a column that was already dropped, you just get an error message and no change is made.
It's also easy enough to build the rollback scripts. For example, if you drop a column, you first copy the column's data (along with the PK) to a backup table, then drop the column. To roll back, just re-add the column, update the rows from the backup data, then reapply any constraints, as sketched below.
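A minimal sketch of that pattern, with hypothetical table and column names:

    -- Change script: back up the column (keyed by the PK), then drop it
    CREATE TABLE employees_nickname_bak AS
        SELECT employee_id, nickname FROM employees;
    ALTER TABLE employees DROP COLUMN nickname;

    -- Rollback script: re-add the column and restore the saved values
    ALTER TABLE employees ADD (nickname VARCHAR2(50));
    UPDATE employees e
       SET e.nickname = (SELECT b.nickname
                           FROM employees_nickname_bak b
                          WHERE b.employee_id = e.employee_id);
    -- ...then reapply any constraints that referenced the column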
When they're ready for testing, the change scripts are run in the test environment, testing is done, then the scripts are rolled back, more testing is done (to make sure the rollback did its job), then the scripts are run again.
This is standard practice in pretty much all the places I've worked. The exception was one client who had a clued-up DBA who knew how to use the Oracle flashback features to do the rollbacks - which made testing the scripts a whole lot easier. It didn't matter if a developer made a huge mess of their database change scripts - the DBA could easily undo their mess.
Admin
You could use a tool like LiquiBase to store your scripts in source control with your application, and manage which scripts need to be run in what order when deploying. All of the above rules still apply of course.
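Liquibase also accepts plain SQL changelogs ("formatted SQL"), so you don't have to give up hand-written DDL to get the bookkeeping. A minimal sketch, with hypothetical author and object names:

    --liquibase formatted sql

    --changeset jdoe:add-employee-nickname
    ALTER TABLE employees ADD nickname VARCHAR2(50);
    --rollback ALTER TABLE employees DROP COLUMN nickname;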
Admin
And, just how the fuck did I solicit this fucking speech!?
Shouldn't his site be featuring IT anecdotes?
Admin
ROFL. You just reminded me of the system I'm working with.
How about EAV in 3rd normal form, eh?
(Table names have been adjusted to protect the oh so very guilty).
Joins, joins everywhere....
Admin
User-customizable packages, both big and small, use customizable columns rather than DDL to alter the database.
As far as I remember, you do it that way because your code is designed to display and calculate against data from the database, not metadata about the database.
I think putting the customer customizations in with the customer data seemed like the logical way to go, and it certainly made updates and maintenance easier.
Admin
What about maintaining database creation scripts? Do you just have the original schema creation script and then run all your change scripts on it, or do you maintain scripts for creating 'new' databases from scratch?
Admin
Honestly, the best part of systems like this is the point mentioned last.
Having the ability to spin up a DB that matches production anywhere, any time, on demand turbocharges productivity. Not needing hellish shared development environments, or to schedule time on the 'UAT' environment for user testing, is a freedom I will never give up again. Once you have this ability, you can really take your continuous integration testing to the next level too.
Admin
While EAV-type architectures may be "anti-patterns" in many cases, there are definitely valid use cases for them. My company created DynaSearch back in the 1980s, and over the past 25 years it has become a very sophisticated system based on the EAV pattern.
For "irregular data" (different items have different "fields") this pattern can be extremely performant, and compact.
On one system (a generalized taxonomy system) we have a running instance with over 200G "cells" representing over 50TB of information. About 5 years ago another firm approached our client and recommended moving to a "normalized database". Their proposal required over 5,000 tables, was estimated to take 3 times as much storage, and in prototype tests could not match the performance, even though their test set contained less than 1% of the total data.
About 3 years ago, we integrated a NoSQL database for handling "supporting data". This effort has also proven quite successful, giving us full access to information that was previously stored in other types of repositories.
It is all about using the right tool for the right job.
Admin
For the testing, you would restore a recent backup of the production database. Obviously you don't restore that on top of production; you restore it into a separate environment that is used only for restoring backup copies of production, so it is not one of the environments already mentioned in the article.
Testing database changes this way is a good idea even if you also test it in other ways.
Periodically restoring backups of production into a separate environment is a good idea even if you don't use the environment you restored to for testing new code every time you have performed such a restore. The reason it is a good idea is that it gives you more confidence in the proper operation of your backup and restore procedures.
There are, of course, a few caveats to this approach.
Admin
While that's the approach I use as well, just allowing the script to error on re-runs sucks a bit too, because now your script output is full of error messages for things that aren't really errors. I've been bitten a few times where I've missed a genuine bug in my script because I told myself "oh that's OK, I expected all those errors" (except the one that mattered, of course).
Flashback works nicely (for those who don't know, database flashback is logically equivalent to a "revert to revision X" command in a source control system). It's often quicker than doing a restore or an export/import, because those alternatives involve rebuilding everything from scratch.
It's also a good way to get your test environment back to a known state for re-tests.
Personally, what I'd like to see (but don't, since our DBAs won't cooperate) is a combination of flashback (to revert the DB to its "production" version) with full runs of the migration script.
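For the curious, a minimal sketch of the flashback approach (hypothetical restore point name; FLASHBACK DATABASE runs with the database mounted, so the DBA does it):

    -- Before running the migration scripts:
    CREATE RESTORE POINT before_release GUARANTEE FLASHBACK DATABASE;

    -- To rewind after testing (database in MOUNT mode):
    FLASHBACK DATABASE TO RESTORE POINT before_release;
    ALTER DATABASE OPEN RESETLOGS;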
Admin
It reminds me of a database I encountered a few years ago that was being used to support an RDF query processor (EAV, but where the values were really all more entities, as were the attributes themselves; crazy system!) A “database expert” colleague of mine had set it all up, and it was taking over 20 minutes to do a relatively simple query. Turns out, he'd explicitly decided to not have any indexes set up “because they take time to rebuild when the data changes”. Adding indexes got the query into the so-fast-we-don't-measure domain. The “expert” doesn't work for us any more (though not because of this incident, to be fair).
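Something along these lines (hypothetical table and column names) is usually all it takes:

    -- Composite indexes so attribute/value lookups and per-entity scans
    -- don't degenerate into full table scans:
    CREATE INDEX idx_triple_attr_value ON triple (attribute, value);
    CREATE INDEX idx_triple_entity     ON triple (entity_id, attribute);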
Admin
And for me it has a reputation of "this piece of crap is unusable until you customize the crap out of it ... and even then it sucks badly".
It's more corporate umbrella warfare than software anyway.
Admin
And what if what you call your application logic is located in the wrong place, and that's why you see that issue?
Try to give a real example, because not using SQL means losing transactional integrity (which you do not have in your Java program), and that's a heavy price.
Admin
There exists no data that fits another model better.
No, seriously: the only reason people consider EAV is that they don't know how to handle the unexpected, and EAV is just one way of doing that. Worst of all, EAV kills the information aspect of having a row in a table; it degrades back into raw data that has to be processed before it makes any sense (varying attributes of the same object are related only by identical strings in some column... wtf).
While EAV may provide a workable low-quality solution, it is almost always a bad answer.
I wrote a bit about this a while back, explaining how adding history to a database through EAV was a very, very bad idea.
For those who can't guess: millions of rows with three columns, no information, only data, no way to efficiently query a point in time, etc.
Admin
And you are not a DBA and obviously don't know what should be done in that case?
Of course picking the WRONG database hurts, so just avoid MySQL, toySQL and other such noob implementations and stick to real RDBMSs like PostgreSQL, Oracle, MSSQL and a few others.
Admin
Would that have the same effect as, say, this?
    BEGIN
    <insert migration script>
    COMMIT
    -- wtf
Admin
There's a good reason it's called Mongo(lian)DB; it's the perfect fit for you.
Anyone who doesn't understand why an RDBMS rocks should just not talk about it.
You, sir, are not a DBA, are not an SQL expert, and never will be a NoSQL expert, since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).
Admin
What about the FACT that you can specify the structure of a table at run time? Seriously... there's no good reason to use EAV when you could simply have them create a new table through your tool on the fly.
Admin
<yourdbms>dump / <yourdbms>restore
Who does not have that ability?
Admin
Most people's comments on NoSQL show that it's more about using the tool right than using the right tool.
But whatever floats your boat.
I'm pretty sure I could make your thing fly on a DBMS anyway, and the fact that you got a bad proposal from one firm doesn't say much about the technology.
The one sure thing is: the simpler the technology, the more likely they'll use the tool right.
Admin
Another vote for Liquibase. Each change is a "changeset", each changeset is checksummed; changesets have rollback actions and cannot be modified.
Admin
I agree with pretty much everything you say. I have always considered it best practice to have a system capable of doing a test restore of production backups - and to perform that action frequently to ensure confidence in the backups being taken. That system can usually provide a good arena for performing that sort of deployment test - as long as you use all the relevant scrubbing procedures before running any tests. For a schema only test it's possible to simply restore the structure of the database through various methods, but these tend to vary wildly between DBMS...
Admin
Oracle executes a commit immediately before and immediately after any DDL. You can have 20 inserts and then an unrelated ALTER TABLE: the inserts get committed before the ALTER TABLE is run, whether it succeeds or not.
I don't know whether this is the best system (last time I looked, DB2 did not commit DDL automatically), but it is how Oracle works.
Any migration script has to take that into account.
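A minimal illustration of the hazard, with hypothetical tables:

    INSERT INTO employees (employee_id, name) VALUES ('E0001', 'Alice');

    -- The DDL below implicitly commits the INSERT above, even if the
    -- ALTER itself then fails:
    ALTER TABLE audit_log ADD (checked_by VARCHAR2(30));

    ROLLBACK;  -- too late: the INSERT is already committed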
Admin
Is "Entity-Attribute-Value" the same as the so-called "narrow table design," that Brad Fitzpatrick used at LiveJournal? I read about that recently but could not find much of anything on it, but the description I read sounds exactly like this. So it leads me to be confused. Alex says it's terrible, but Brad likes it. So confused!
Admin
But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?
The greatest concern for me at the moment is its limited support for transactions. Although, like Alex said, I'm not going to be dealing with Facebook-scale problems :) For a small or middle-sized application, I'd say the overhead of maintaining the database structure and its changes is a bigger problem.
Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet run into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom.)
Admin
Since when was FOR guaranteed to sort file names rather than respecting whatever the underlying file system's order happens to be?
The real WTF is a so-called database that can't do DDL in transactions.
CAPTCHA: abigo - one of three friends with bunged-up noses.
Admin
Technically it would be the glob operator (*.sql) that is doing the sorting, not the FOR statement.
I don't know how the Windows shells work, but common Unix shells (I checked bash and tcsh) are guaranteed to glob alphabetically.
Admin
Well, if you worked with MySQL for 7 years I don't think I can help you.
It took me all of one week to realize how far MySQL was behind the other RDBMSs, and how full of crap, unreliable and slow it was, and...
I have no idea what that MongoDB is, but the positive point that was mentioned about it is NOT a positive point; it just states how easy the tool is to use.
MySQL is easier to use than PostgreSQL, yet it is a thousand times worse.
The usual trend is: easier to use, more limited inside, slower inside, etc.
If you're fine with MongoDB, it means your whole approach to information and knowledge management needs rethinking; you use the database for so little that it makes no difference what DB you use (quite often the case with MySQL users), and thus the simplicity of MongoDB goes your direction.
However, the correct way to solve your "blabla database structure changes" problem is to put much more thought into how you manage and handle data, and how best to represent it in a relational model, in order to greatly increase reliability, efficiency and maintainability.
SQLite is, as far as I know, even worse than MySQL in terms of feature support (the features MySQL half-supports, like triggers and all that...), and MySQL itself lacks ACID support (which InnoDB does NOT handle, so no MySQL that I know of does).
You seem to realize that keeping your data/information safe is important. Well, you're damn right, and that's why you should take a look at the most advanced, reliable and optimized information-handling systems: RDBMSs.
So, two things:
Admin
http://stackoverflow.com/questions/5244437/pros-and-cons-of-mongodb
See, I didn't need to know what it was to tell you it was a piece of crap.
Obviously: no joins (wtf, you're going to join inside the application????).
No transactions?? lol, seriously.
Obviously a major lack of consistency and coherency control, etc. Simply not what you get from a DBMS, which is precisely what you NEED when you're handling information.
Using an RDBMS will make your dev work much shorter and can help you maintain data quality and consistency, etc., if you're handling anything remotely standard. Using MongoDB, meanwhile, will give you a much simpler, much faster, much less reliable, very scalable solution for "worthless" information, like tweets or likes or...
So yes, the simple-minded answer of "if you don't need the scaling, don't use it" was the correct one.
Admin
At one company we had a single change script.
Any new database changes were appended to that script. This script was maintained in revision control and always run from the start to build a developer database from scratch.
On the production side, the maintainers tracked up to what point the changes in the script had been run and just applied any new changes after that point.
greetings, eMBee.
Admin
So... my coworker brings up an interesting point on EAV: the only time you shouldn't avoid it is when the business's requirements are stupid and demand it.
He worked in an environment where the business required the ability to create document templates on the fly. Part of this was that a given template could have peculiar fields of its own. EAV was the best of the worst answers. Or at least that's what he says.
Admin
I haven't had an opportunity to adequately try out LiquiBase, so my opinion is based on a first impression. That being said...
When I first saw it, it seemed that you wrote "change scripts" in a database-agnostic XML language like this:
    <createTable tableName="employees">
      <column name="employee_id" type="char(5)">
        <constraints primaryKey="true"/>
      </column>
    </createTable>
I have a knee-jerk reaction to any less expressive abstractions of an abstraction, especially when XML is added. I fail to see any benefit to this.
If you feel that the "it can be run against any database" is really a feature/benefit, then why not apply the same anti-pattern to application code?
Admin
A) data should be modeled so well in the first place that structure changes will never be needed
or
B) the development process should be set up so that structure changes could be applied more reliably (e.g. the stuff Alex talks about in the article)?
Or some option C? :)
Admin
That's fine: if it's modelling actual arbitrarily structured information from the user, then it's good, as long as it's only used where you need it.
That's not the antipattern. The antipattern is deciding to use that kind of structure for regular data that could just as easily have been put into a normal table.
The joins (and subselects) that result are terrifying.
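To make that concrete: reassembling one logical row of perfectly regular data stored EAV-style looks something like this (hypothetical names):

    SELECT e.entity_id,
           fn.value AS first_name,
           ln.value AS last_name,
           hd.value AS hire_date
      FROM entity e
      LEFT JOIN entity_value fn
             ON fn.entity_id = e.entity_id AND fn.attribute = 'first_name'
      LEFT JOIN entity_value ln
             ON ln.entity_id = e.entity_id AND ln.attribute = 'last_name'
      LEFT JOIN entity_value hd
             ON hd.entity_id = e.entity_id AND hd.attribute = 'hire_date';
    -- one extra join (or subselect) per "column", versus a plain SELECT on a normal table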
Admin
C) Work on a project that never grows, changes, or adds features so the thing you're modelling will never have to change.
Admin
Yay, internet nerd gets on his high horse about a product he knows nothing about, finds a single community post written a year ago about a system that the post itself describes as fast-growing, and uses that to declare his assumption correct.
That's some real deep research there, friend.
I'm not evangelizing Mongo; it's an OK product with some valid applications and some things it's very much not suited to do. But c'mon, man. You're worse than Alex when he gets on his soapbox.
Admin
This is one reason I really love Rails -- all database changes MUST go through migrations. And every migration has an "up" and a "down", depending on whether you are migrating up to it or rolling back down from it. Migrations are timestamped when they are created, and that timestamp is part of the filename, so there is no question what order they were created in.
The application knows where it is in the migration process by keeping a record of the current migration version in the schema reference, as sketched below.
Most migration files contain simple add/remove-type instructions, but I've also put seed data into a migration simply so it would be tracked in version control when I deploy to production. Love it.
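Under the hood that bookkeeping is just a tiny table; a simplified sketch of the idea (not Rails' exact definition):

    -- One row per migration that has been applied; version is the filename timestamp
    CREATE TABLE schema_migrations (
        version VARCHAR(14) NOT NULL PRIMARY KEY
    );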
Admin
I think you're right about the premises, but your solution is a bit too simplistic. I started to reply as a comment, but the comment got so large that I thought it would be best to post it on my own blog. See http://consultuning.blogspot.com/2012/02/is-there-right-way-of-doing-database.html
Admin
No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.
Sometimes the errors are detected by the java code before they hit the DB.
The Java deployers are far more reliable because they are tested and reused, and they validate the input they're given.
Our method of deployment can be easily tested. Writing separate SQL scripts for deploy and rollback was demonstrably more difficult to test (the constant deploy failures were proof of that).
Our system allows much more control. The error messages in the logs are very clear about which things failed. A full tree is output that shows the order of the deployment and which things failed and which did not. It also includes timing information, so we can see how long each part is taking.
For a while I thought about writing an open source version for use by others. I never got around to it, but perhaps I will someday. The difference between sql scripts and our method is huge.
Admin
[quote user=HappyEngineer] No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back. [/quote]
That depends entirely on the database. Postgres has perfect DDL transactions, which are completely invisible to other transactions unless and until the commit line is successfully reached.
Microsoft's SQL Server handles DDL transactions reasonably well, although not everything is supported and there are a few cases where in-progress DDL changes are visible to other transactions.
I'm guessing you are talking about Oracle, which is objectively a rather terrible database. Sure, it's scalable, but it lacks critical things like distinguishing between null and empty strings, and even minimal DDL transaction support.
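Which means that in Postgres a migration really can be wrapped the way an earlier commenter suggested. A minimal sketch, with hypothetical columns:

    BEGIN;
    ALTER TABLE employees ADD COLUMN nickname varchar(50);
    UPDATE employees SET nickname = first_name WHERE nickname IS NULL;
    -- Any error before this point rolls back the DDL and the DML together:
    COMMIT;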
Admin
Nice article, but I don't see the point in separating out object scripts and change scripts.
It seems to me that objects are quite closely tied to the structure of your data: add/remove/rename/alter a column on a table, then any triggers, procedures, and/or views that rely on that table will likely need to change as well. I'd think you'd want these changes to occur in tandem, and ideally inside a single transaction. Separating these components into separate scripts would seem to lose that, and while one certainly can rerun an object script, I don't really see any reason why you'd want or need to run them independently of a change script. I.e., you'd only be running an object script because you're changing the database, either just the object alone or because of an underlying change in the data structure. Distinguishing between these changes seems unnecessary.
Plus, if one of the goals is to be able to spin up new database environments simply by running through the change scripts, it's quite possible -- if not likely -- that some of the scripts may rely on older object versions (triggers especially) to function properly. Moving the objects into separate scripts and keeping only the current version seems directly counter to being able to recreate a current (or earlier) version of the database.
Also, why the need for the external log on top of the database's internal log? Especially if we're reloading the database and going back to the drawing board should a script ever fail, both logs will be identical on a functioning database, no?
What am I missing?
Admin
So, it has come to this…
Now that we know what the solutions are not, what is the solution when (parts of the) data structures are defined by users at runtime?
Manipulating tables? What if you want to keep old data when a column goes obsolete?
Admin
Ok my "soapbox" time now. I work on application level with databases a lot. There are multiple languages and lots of different database implementations yet they all seem to suffer from similar issues.
Database activitiy is almost always done by constructing strings to pass to the database. This looks wrong to me in many ways.
I actually think that the policy of only calling stored procedures from an application is quite a good one. However you should be able to do all of this with
ideally doing the latter through some async future.
There could be more "structured" programming for other types of query too. There is for BCP but it's non-standard and only supports inserts (and reads), not merges.
Admin
This is dangerous, because your migration scripts now depend on your application code. It will work fine initially, and then in changeset 25674 someone introduces a subtle change to the application logic, and then revision 26198 gets tagged and released, and then the migration script fails.
Actually, Python, Ruby, etc. are more expressive than SQL, as the latter is not Turing-complete. SQLAlchemy (a Python ORM library), for example, has a migration library called Alembic that puts the full expressiveness of Python at your disposal (no DSL) and can optionally generate SQL scripts for offline migration.
Admin
FuBar brings up a great point. A previous "database" I worked with, and ultimately converted to a digital format, was also kept on note cards. It was a water valve system for a municipality of about 140K. The system was designed so well, and the users were trained so well and had such a high level of discipline, that only a few values were in place where lookup tables would ultimately be employed in the digital version. Once the digital version was created, the screens were designed to mimic the card input already in place, and the transition was the easiest I have ever seen. The key was the original design and the discipline of the users over the decades the system was in place. The original system was implemented in the 1920s and was converted to SQL Server / ESRI GIS in the early 2000s.
Admin
Great article. I really appreciated the points regarding schema compare. Where I currently work, we had (yes, past tense) two developers in particular who swore that a certain compare tool was the only suitable method for DB deployment, even though what they wanted to compare was a testing environment, with multiple versions under test for release, against the production environment. This continued until they both departed. Now that we have moved to a system of structured change scripts, with no alterations to a script once created (except in the case of a syntax error), our application/database interface issues have become near zero.
Admin
I mostly agree with what is here. I have had success making schema changes in the following way. Check for a "backup version" of the table(s) you are going to modify; if it does not exist, rename the current table to the backup name. You may need to drop FKs and some indexes to get this all to work.
Create new tables using the original table name. Move the data from the old tables into the new ones. Re-establish the indexes and FKs. If I don't get everything in the design right the first time, I can tweak the conversion script and rerun it until everything is correct.
If you look at the change scripts that SSMS creates when you modify a table, you will get some idea of how to do this. They of course delete the old version of the table as soon as the script completes; I do not do that, and instead clean the old table up in the next release.
If something goes horribly wrong during an install, you can always undo this by deleting the tables you created and renaming the old tables back to their original names, then re-establishing the required indexes and FKs.
I have been using this technique for many years now and have never had to do a backout or restore a DB from a backup due to a botched install. Of course, this will only work with a DB of moderate size.
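A stripped-down sketch of the same idea in SQL Server terms (hypothetical table; the FK and index handling is omitted for brevity):

    -- Keep the old table around as the backup version
    EXEC sp_rename 'dbo.Orders', 'Orders_backup';

    -- Recreate the table under the original name with the new design
    CREATE TABLE dbo.Orders (
        order_id    INT        NOT NULL PRIMARY KEY,
        customer_id INT        NOT NULL,
        placed_at   DATETIME2  NOT NULL
    );

    -- Copy the data across, then re-establish indexes and FKs
    INSERT INTO dbo.Orders (order_id, customer_id, placed_at)
    SELECT order_id, customer_id, placed_at
      FROM dbo.Orders_backup;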
Admin
Can someone tell me the point of this article?
Making changes to databases is not hard. Versioning, testing, and deploying these changes is not hard. I could have summed this whole article up in a few sentences. Making yourself out to be a database guru by comparing yourself to those who can't do simple change control on a database is a pretty low bar. Is this a re-publish from 10 years ago?
However, doing database changes alongside a set of much more rapidly updating applications is the tricky part. Imagine one database supports many applications (100+) that have much faster iterations than the database. A team of 200+ developers and a dozen DBAs for instance.
Unfortunately this article does nothing to address that. It doesn't address the relationship between application and database deployment. And not addressing this means it doesn't really hold any value at all. As an example, say I have a large high performance database that is a data store for 100+ applications. One of these large applications is business critical, and you have iterations in a matter of minutes and hours, not weeks and months.
Your business dictates that you need an application change out immediately. This change requires new/different data in the data store shared by many. The application changes take an hour to implement. Take it down to the low level, and the database needs a schema change, one that would likely break existing applications. Using this process, what do we do now? And if it actually works, does time to market suffer along the way? What about transitioning from the old version to the new version, i.e. both versions running in parallel, along with a dozen more?
I couldn't disagree with this article more. It's a great start for those that have absolutely no change control for databases on their small projects with slow iterations. But if you approach something like I'm describing you will very, very quickly outgrow this approach completely.
Admin
We're using Liquibase... it's not perfect, but does the job. Handles database updates and rollbacks pretty well. Why should we reinvent something that already exists? Its disadvantages (XML) do not outweigh its advantages.
Admin
Alex, you didn't fully address the issue of data flowing back from Prod into the Test and Dev databases. I agree fully with your statements when the data is end-user generated.
However, I work in an environment where data created by the company for public consumption has to be reviewed by the legal compliance team. Because many departments contribute to these changes, the data is stored in an easily editable Content Management System. Because the data is sensitive, and putting wrong data out in Prod could shut down our company, it is too risky to make the company-produced data changes directly in Prod. (All the CMSs I've seen are bad at holding back changes until a review has been made.)
The data changes produced by the company are non-editable in Prod (the CMS server code is not even deployed to Prod), so we can keep a tighter lid on security. We keep the CMS as close to Prod as we can and flow the changes in both directions.