Admin
I don't know much about other DBMSs, but with Oracle, change control is fairly straightforward.
For example, I just write the ALTER statements I need to add or drop columns. The scripts are perfectly safe to rerun, because if you try to add a column that has already been added, or drop a column that was already dropped, you just get an error message and no change is made.
It's also easy enough to build the rollback scripts. For example, if you drop a column, you first copy the column's data (along with the PK) to a backup table, then drop the column. To roll back, just re-add the column, update the rows from the backup data, then reapply any constraints, as sketched below.
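A minimal sketch of that pattern, with hypothetical table and column names:

    -- Change script: back up the column (keyed by the PK), then drop it
    CREATE TABLE employees_nickname_bak AS
        SELECT employee_id, nickname FROM employees;
    ALTER TABLE employees DROP COLUMN nickname;

    -- Rollback script: re-add the column and restore the saved values
    ALTER TABLE employees ADD (nickname VARCHAR2(50));
    UPDATE employees e
       SET e.nickname = (SELECT b.nickname
                           FROM employees_nickname_bak b
                          WHERE b.employee_id = e.employee_id);
    -- ...then reapply any constraints that referenced the column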
When they're ready for testing, the change scripts are run in the test environment, testing is done, then the scripts are rolled back, more testing is done (to make sure the rollback did its job), then the scripts are run again.
This is standard practice in pretty much all the places I've worked. The exception was one client who had a clued-up DBA who knew how to use the Oracle flashback features to do the rollbacks - which made testing the scripts a whole lot easier. It didn't matter if a developer made a huge mess of their database change scripts - the DBA could easily undo their mess.
Admin
You could use a tool like LiquiBase to store your scripts in source control with your application, and manage which scripts need to be run in what order when deploying. All of the above rules still apply of course.
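Liquibase also accepts plain SQL changelogs ("formatted SQL"), so you don't have to give up hand-written DDL to get the bookkeeping. A minimal sketch, with hypothetical author and object names:

    --liquibase formatted sql

    --changeset jdoe:add-employee-nickname
    ALTER TABLE employees ADD nickname VARCHAR2(50);
    --rollback ALTER TABLE employees DROP COLUMN nickname;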
Admin
And, just how the fuck did I solicit this fucking speech!?
Shouldn't his site be featuring IT anecdotes?
Admin
ROFL. You just reminded me of the system I'm working with.
How about EAV in 3rd normal form, eh?
(Table names have been adjusted to protect the oh so very guilty).
Joins, joins everywhere....
Admin
User-customizable packages, both big and small, use customizable columns rather than DDL to alter the database.
As far as I remember, you do it that way because your code is designed to display and calculate against data from the database, not metadata about the database.
I think putting the customer customizations in with the customer data seemed like the logical way to go, and it certainly made updates and maintenance easier.
Admin
What about maintaining database creation scripts? Do you just have the original schema creation script and then run all your change scripts on it, or do you maintain scripts for creating 'new' databases from scratch?
Admin
Honestly, the best part of systems like this is the point mentioned last.
Having the ability to spin up a DB that matches production anywhere, any time, on demand turbocharges productivity. Not needing hellish shared development environments, or to schedule time on the 'UAT' environment for user testing, is a freedom I will never give up again. Once you have this ability, you can really take your continuous integration testing to the next level too.
Admin
While EAV-type architectures may be "anti-patterns" in many cases, there are definitely valid use cases for them. My company created DynaSearch back in the 1980s, and over the past 25 years it has become a very sophisticated system based on the EAV pattern.
For "irregular data" (different items have different "fields") this pattern can be extremely performant, and compact.
On one system (a generalized taxonomy system) we have a running instance with over 200G "cells" representing over 50TB of information. About 5 years ago another firm approached our client and recommended moving to a "normalized database". Their proposal required over 5,000 tables, was estimated to take 3 times as much storage, and in prototype tests could not match the performance, even though their test set contained less than 1% of the total data.
About 3 years ago, we integrated a NoSQL database for handling "supporting data". This effort has also proven quite successful, giving us full access to information that was previously stored in other types of repositories.
It is all about using the right tool for the right job.
Admin
For the testing, you would restore a recent backup of the production database. Obviously you don't restore that on top of production; you restore it into a separate environment that is used only for restoring backup copies of production, so it is not one of the environments already mentioned in the article.
Testing database changes this way is a good idea even if you also test it in other ways.
Periodically restoring backups of production into a separate environment is a good idea even if you don't use the environment you restored to for testing new code every time you have performed such a restore. The reason it is a good idea is that it gives you more confidence in the proper operation of your backup and restore procedures.
There are, of course, a few caveats to this approach.
Admin
While that's the approach I use as well, just allowing the script to error on re-runs sucks a bit too, because now your script output is full of error messages for things that aren't really errors. I've been bitten a few times where I've missed a genuine bug in my script because I told myself "oh that's OK, I expected all those errors" (except the one that mattered, of course).
Flashback works nicely (for those who don't know, database flashback is logically equivalent to a "revert to revision X" command in a source control system). It's often quicker than doing a restore or an export/import, because those alternatives involve rebuilding everything from scratch.
It's also a good way to get your test environment back to a known state for re-tests.
Personally, what I'd like to see (but don't, since our DBAs won't cooperate) is a combination of flashback (to revert the DB to its "production" version) with full runs of the migration script.
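For the curious, a minimal sketch of the flashback approach (hypothetical restore point name; FLASHBACK DATABASE runs with the database mounted, so the DBA does it):

    -- Before running the migration scripts:
    CREATE RESTORE POINT before_release GUARANTEE FLASHBACK DATABASE;

    -- To rewind after testing (database in MOUNT mode):
    FLASHBACK DATABASE TO RESTORE POINT before_release;
    ALTER DATABASE OPEN RESETLOGS;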
Admin
It reminds me of a database I encountered a few years ago that was being used to support an RDF query processor (EAV, but where the values were really all more entities, as were the attributes themselves; crazy system!) A “database expert” colleague of mine had set it all up, and it was taking over 20 minutes to do a relatively simple query. Turns out, he'd explicitly decided to not have any indexes set up “because they take time to rebuild when the data changes”. Adding indexes got the query into the so-fast-we-don't-measure domain. The “expert” doesn't work for us any more (though not because of this incident, to be fair).
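Something along these lines (hypothetical table and column names) is usually all it takes:

    -- Composite indexes so attribute/value lookups and per-entity scans
    -- don't degenerate into full table scans:
    CREATE INDEX idx_triple_attr_value ON triple (attribute, value);
    CREATE INDEX idx_triple_entity     ON triple (entity_id, attribute);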
Admin
And for me it has a reputation of "this piece of crap is unusable until you customize the crap out of it ... and even then it sucks badly".
It's more corporate umbrella warfare than software anyway.
Admin
And what if what you call your application logic is located in the wrong place, and that's why you see that issue?
Try to give a real example, because not using SQL means losing transactional integrity (which you do not have in your Java program), and that's a heavy price.
Admin
There exists no data that fits another model better.
No, seriously: the only reason people consider EAV is that they don't know how to handle the unexpected, and EAV is just one way of doing that. Worst of all, EAV kills the information aspect of having a row in a table; it degrades back into raw data that has to be processed before it makes any sense (varying attributes of the same object are related only by identical strings in some column... wtf).
While EAV may provide a workable low-quality solution, it is almost always a bad answer.
I wrote a bit about this a while back, explaining how adding history to a database through EAV was a very, very bad idea.
For those who can't guess: millions of rows with three columns, no information, only data, no way to efficiently query a point in time, etc.
Admin
And you are not a DBA and obviously don't know what should be done in that case?
Of course picking the WRONG database hurts, so just avoid MySQL, toySQL and other such noob implementations and stick to real RDBMSs like PostgreSQL, Oracle, MSSQL and a few others.
Admin
Would that have the same effect as, say, this?
    BEGIN
    <insert migration script>
    COMMIT
    -- wtf
Admin
There's a good reason it's called Mongo(lian)DB; it's the perfect fit for you.
Anyone who doesn't understand why an RDBMS rocks should just not talk about it.
You, sir, are not a DBA, are not an SQL expert, and never will be a NoSQL expert, since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).
Admin
What about the FACT that you can specify the structure of a table at run time? Seriously... there's no good reason to use EAV when you could simply have them create a new table through your tool on the fly.
Admin
<yourdbms>dump / <yourdbms>restore
Who does not have that ability?
Admin
Most people's comments on NoSQL show that it's more about using the tool right than using the right tool.
But whatever floats your boat.
I'm pretty sure I could make your thing fly on a DBMS anyway, and the fact that you got a bad proposal from one firm doesn't say much about the technology.
The one sure thing is: the simpler the technology, the more likely they'll use the tool right.
Admin
Another vote for Liquibase. Each change is a "changeset", each changeset is checksummed; changesets have rollback actions and cannot be modified.
Admin
I agree with pretty much everything you say. I have always considered it best practice to have a system capable of doing a test restore of production backups - and to perform that action frequently to ensure confidence in the backups being taken. That system can usually provide a good arena for performing that sort of deployment test - as long as you use all the relevant scrubbing procedures before running any tests. For a schema only test it's possible to simply restore the structure of the database through various methods, but these tend to vary wildly between DBMS...
Admin
Oracle executes a commit immediately before and immediately after any DDL. You can have 20 inserts and then an unrelated ALTER TABLE: the inserts get committed before the ALTER TABLE is run, whether it succeeds or not.
I don't know whether this is the best system (last time I looked, DB2 did not commit DDL automatically), but it is how Oracle works.
Any migration script has to take that into account.
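A minimal illustration of the hazard, with hypothetical tables:

    INSERT INTO employees (employee_id, name) VALUES ('E0001', 'Alice');

    -- The DDL below implicitly commits the INSERT above, even if the
    -- ALTER itself then fails:
    ALTER TABLE audit_log ADD (checked_by VARCHAR2(30));

    ROLLBACK;  -- too late: the INSERT is already committed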
Admin
Is "Entity-Attribute-Value" the same as the so-called "narrow table design," that Brad Fitzpatrick used at LiveJournal? I read about that recently but could not find much of anything on it, but the description I read sounds exactly like this. So it leads me to be confused. Alex says it's terrible, but Brad likes it. So confused!
Admin
But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?
The greatest concern for me at the moment is its limited support for transactions. Although, like Alex said, I'm not going to be dealing with Facebook-scale problems :) For a small or middle-sized application, I'd say the overhead of maintaining the database structure and its changes is a bigger problem.
Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet run into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom.)
Admin
Since when was FOR guaranteed to sort file names rather than respecting whatever the underlying file system's order happens to be?
The real WTF is a so-called database that can't do DDL in transactions.
CAPTCHA: abigo - one of three friends with bunged-up noses.
Admin
Technically it would be the glob operator (*.sql) that is doing the sorting, not the FOR statement.
I don't know how the Windows shells work, but common Unix shells (I checked bash and tcsh) are guaranteed to glob alphabetically.
Admin
Well, if you worked with MySQL for 7 years I don't think I can help you.
It took me all of one week to realize how far MySQL was behind the other RDBMSs, and how full of crap, unreliable and slow it was, and...
I have no idea what that MongoDB is, but the positive point that was mentioned about it is NOT a positive point; it just states how easy the tool is to use.
MySQL is easier to use than PostgreSQL, yet it is a thousand times worse.
The usual trend is: easier to use, more limited inside, slower inside, etc.
If you're fine with MongoDB, it means your whole approach to information and knowledge management needs rethinking; you use the database for so little that it makes no difference what DB you use (quite often the case with MySQL users), and thus the simplicity of MongoDB goes your direction.
However, the correct way to solve your "blabla database structure changes" problem is to put much more thought into how you manage and handle data, and how best to represent it in a relational model, in order to greatly increase reliability, efficiency and maintainability.
SQLite is, as far as I know, even worse than MySQL in terms of feature support (the features MySQL half-supports, like triggers and all that...), and MySQL itself lacks ACID support (which InnoDB does NOT handle, so no MySQL that I know of does).
You seem to realize that keeping your data/information safe is important. Well, you're damn right, and that's why you should take a look at the most advanced, reliable and optimized information-handling systems: RDBMSs.
So, two things:
Admin
http://stackoverflow.com/questions/5244437/pros-and-cons-of-mongodb
See, I didn't need to know what it was to tell you it was a piece of crap.
Obviously: no joins (wtf, you're going to join inside the application????).
No transactions?? lol, seriously.
Obviously a major lack of consistency and coherency control, etc. Simply not what you get from a DBMS, which is precisely what you NEED when you're handling information.
Using an RDBMS will make your dev work much shorter and can help you maintain data quality and consistency, etc., if you're handling anything remotely standard. Using MongoDB, meanwhile, will give you a much simpler, much faster, much less reliable, very scalable solution for "worthless" information, like tweets or likes or...
So yes, the simple-minded answer of "if you don't need the scaling, don't use it" was the correct one.
Admin
At one company we had a single change script.
Any new database changes were appended to that script. This script was maintained in revision control and always run from the start to build a developer database from scratch.
On the production side, the maintainers tracked up to what point the changes in the script had been run and just applied any new changes after that point.
greetings, eMBee.
Admin
So... my coworker brings up an interesting point on EAV: the only time you shouldn't avoid it is when the business's requirements are stupid and demand it.
He worked in an environment where the business required the ability to create document templates on the fly. Part of this was that a given template could have peculiar fields of its own. EAV was the best of the worst answers. Or at least that's what he says.
Admin
I haven't had an opportunity to adequately try out LiquiBase, so my opinion is based on a first impression. That being said...
When I first saw it, it seemed that you wrote "change scripts" in a database-agnostic XML language like this:
    <createTable tableName="employees">
      <column name="employee_id" type="char(5)">
        <constraints primaryKey="true"/>
      </column>
    </createTable>
I have a knee-jerk reaction to any less expressive abstractions of an abstraction, especially when XML is added. I fail to see any benefit to this.
If you feel that the "it can be run against any database" is really a feature/benefit, then why not apply the same anti-pattern to application code?
Admin
A) data should be modeled so well in the first place that structure changes will never be needed
or
B) the development process should be set up so that structure changes could be applied more reliably (e.g. the stuff Alex talks about in the article)?
Or some option C? :)
Admin
That's fine: if it's modelling actual arbitrarily structured information from the user, then it's good, as long as it's only used where you need it.
That's not the antipattern. The antipattern is deciding to use that kind of structure for regular data that could just as easily have been put into a normal table.
The joins (and subselects) that result are terrifying.
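To make that concrete: reassembling one logical row of perfectly regular data stored EAV-style looks something like this (hypothetical names):

    SELECT e.entity_id,
           fn.value AS first_name,
           ln.value AS last_name,
           hd.value AS hire_date
      FROM entity e
      LEFT JOIN entity_value fn
             ON fn.entity_id = e.entity_id AND fn.attribute = 'first_name'
      LEFT JOIN entity_value ln
             ON ln.entity_id = e.entity_id AND ln.attribute = 'last_name'
      LEFT JOIN entity_value hd
             ON hd.entity_id = e.entity_id AND hd.attribute = 'hire_date';
    -- one extra join (or subselect) per "column", versus a plain SELECT on a normal table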
Admin
C) Work on a project that never grows, changes, or adds features so the thing you're modelling will never have to change.
Admin
Yay, internet nerd gets on his high horse about a product he knows nothing about, finds a single community post written a year ago about a system that the post itself describes as fast-growing, and uses that to declare his assumption correct.
That's some real deep research there, friend.
I'm not evangelizing Mongo; it's an OK product with some valid applications and some things it's very much not suited to do. But c'mon, man. You're worse than Alex when he gets on his soapbox.
Admin
This is one reason I really love Rails -- all database changes MUST go through migrations. And every migration has an "up" and a "down", depending on whether you are migrating up to it or rolling back down from it. Migrations are timestamped when they are created, and that timestamp is part of the filename, so there is no question what order they were created in.
The application knows where it is in the migration process by keeping a record of the current migration version in the schema reference, as sketched below.
Most migration files contain simple add/remove-type instructions, but I've also put seed data into a migration simply so it would be tracked in version control when I deploy to production. Love it.
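Under the hood that bookkeeping is just a tiny table; a simplified sketch of the idea (not Rails' exact definition):

    -- One row per migration that has been applied; version is the filename timestamp
    CREATE TABLE schema_migrations (
        version VARCHAR(14) NOT NULL PRIMARY KEY
    );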
Admin
I think you're right about the premises, but your solution is a bit too simplistic. I started to reply as a comment, but the comment got so large that I thought it would be best to post it on my own blog. See http://consultuning.blogspot.com/2012/02/is-there-right-way-of-doing-database.html
Admin
No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.
Sometimes the errors are detected by the java code before they hit the DB.
The Java deployers are far more reliable because they are tested and reused, and they validate the input they're given.
Our method of deployment can be easily tested. Writing separate SQL scripts for deploy and rollback was demonstrably more difficult to test (the constant deploy failures were proof of that).
Our system allows much more control. The error messages in the logs are very clear about which things failed. A full tree is output that shows the order of the deployment and which things failed and which did not. It also includes timing information, so we can see how long each part is taking.
For a while I thought about writing an open source version for use by others. I never got around to it, but perhaps I will someday. The difference between sql scripts and our method is huge.
Admin
[quote user=HappyEngineer] No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back. [/quote]
That depends entirely on the database. Postgres has perfect DDL transactions, which are completely invisible to other transactions unless and until the commit line is successfully reached.
Microsoft's SQL Server handles DDL transactions reasonably well, although not everything is supported and there are a few cases where in-progress DDL changes are visible to other transactions.
I'm guessing you are talking about Oracle, which is objectively a rather terrible database. Sure, it's scalable, but it lacks critical things like distinguishing between null and empty strings, and even minimal DDL transaction support.
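Which means that in Postgres a migration really can be wrapped the way an earlier commenter suggested. A minimal sketch, with hypothetical columns:

    BEGIN;
    ALTER TABLE employees ADD COLUMN nickname varchar(50);
    UPDATE employees SET nickname = first_name WHERE nickname IS NULL;
    -- Any error before this point rolls back the DDL and the DML together:
    COMMIT;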
Admin
Nice article, but I don't see the point in separating out object scripts and change scripts.
It seems to me that objects are quite closely tied to the structure of your data: add/remove/rename/alter a column on a table, then any triggers, procedures, and/or views that rely on that table will likely need to change as well. I'd think you'd want these changes to occur in tandem, and ideally inside a single transaction. Separating these components into separate scripts would seem to lose that, and while one certainly can rerun an object script, I don't really see any reason why you'd want or need to run them independently of a change script. I.e., you'd only be running an object script because you're changing the database, either just the object alone or because of an underlying change in the data structure. Distinguishing between these changes seems unnecessary.
Plus, if one of the goals is to be able to spin up new database environments simply by running through the change scripts, it's quite possible -- if not likely -- that some of the scripts may rely on older object versions (triggers especially) to function properly. Moving the objects into separate scripts and keeping only the current version seems directly counter to being able to recreate a current (or earlier) version of the database.
Also, why the need for the external log on top of the database's internal log? Especially if we're reloading the database and going back to the drawing board should a script ever fail, both logs will be identical on a functioning database, no?
What am I missing?
Admin
So, it has come to this…
Now that we know what the solutions are not, what is the solution when (parts of the) data structures are defined by users at runtime?
Manipulating tables? What if you want to keep old data when a column goes obsolete?
Admin
Ok my "soapbox" time now. I work on application level with databases a lot. There are multiple languages and lots of different database implementations yet they all seem to suffer from similar issues.
Database activitiy is almost always done by constructing strings to pass to the database. This looks wrong to me in many ways.
I actually think that the policy of only calling stored procedures from an application is quite a good one. However you should be able to do all of this with
ideally doing the latter through some async future.
There could be more "structured" programming for other types of query too. There is for BCP but it's non-standard and only supports inserts (and reads), not merges.
Admin
This is dangerous, because your migration scripts now depend on your application code. It will work fine initially, and then in changeset 25674 someone introduces a subtle change to the application logic, and then revision 26198 gets tagged and released, and then the migration script fails.
Actually, Python, Ruby, etc. are more expressive than SQL, as the latter is not Turing-complete. SQLAlchemy (a Python ORM library), for example, has a migration library called Alembic that puts the full expressiveness of Python at your disposal (no DSL) and can optionally generate SQL scripts for offline migration.
Admin
FuBar brings up a great point. A previous "database" I worked with, and ultimately converted to a digital format, was also kept on note cards. It was a water valve system for a municipality of about 140K. The system was designed so well, and the users were trained so well and had such a high level of discipline, that only a few values were in place where lookup tables would ultimately be employed in the digital version. Once the digital version was created, the screens were designed to mimic the card input already in place, and the transition was the easiest I have ever seen. The key was the original design and the discipline of the users over the decades the system was in place. The original system was implemented in the 1920s and was converted to SQL Server / ESRI GIS in the early 2000s.
Admin
Great article. I really appreciated the points regarding schema compare. Where I currently work, we had (yes, past tense) two developers in particular who swore that a certain compare tool was the only suitable method for DB deployment, even though what they wanted to compare was a testing environment, with multiple versions under test for release, against the production environment. This continued until they both departed. Now that we have moved to a system of structured change scripts, with no alterations to a script once created (except in the case of a syntax error), our application/database interface issues have become near zero.
Admin
I mostly agree with what is here. I have had success making schema changes in the following way. Check for a "backup version" of the table(s) you are going to modify; if it does not exist, rename the current table to the backup name. You may need to drop FKs and some indexes to get this all to work.
Create new tables using the original table name. Move the data from the old tables into the new ones. Re-establish the indexes and FKs. If I don't get everything in the design right the first time, I can tweak the conversion script and rerun it until everything is correct.
If you look at the change scripts that SSMS creates when you modify a table, you will get some idea of how to do this. They of course delete the old version of the table as soon as the script completes; I do not do that, and instead clean the old table up in the next release.
If something goes horribly wrong during an install, you can always undo this by deleting the tables you created and renaming the old tables back to their original names, then re-establishing the required indexes and FKs.
I have been using this technique for many years now and have never had to do a backout or restore a DB from a backup due to a botched install. Of course, this will only work with a DB of moderate size.
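A stripped-down sketch of the same idea in SQL Server terms (hypothetical table; the FK and index handling is omitted for brevity):

    -- Keep the old table around as the backup version
    EXEC sp_rename 'dbo.Orders', 'Orders_backup';

    -- Recreate the table under the original name with the new design
    CREATE TABLE dbo.Orders (
        order_id    INT        NOT NULL PRIMARY KEY,
        customer_id INT        NOT NULL,
        placed_at   DATETIME2  NOT NULL
    );

    -- Copy the data across, then re-establish indexes and FKs
    INSERT INTO dbo.Orders (order_id, customer_id, placed_at)
    SELECT order_id, customer_id, placed_at
      FROM dbo.Orders_backup;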
Admin
Can someone tell me the point of this article?
Making changes to databases is not hard. Versioning, testing, and deploying these changes is not hard. I could have summed this whole article up in a few sentences. Making yourself out to be a database guru by comparing yourself to those who can't do simple change control on a database is a pretty low bar. Is this a re-publish from 10 years ago?
However, doing database changes alongside a set of much more rapidly updating applications is the tricky part. Imagine one database supports many applications (100+) that have much faster iterations than the database. A team of 200+ developers and a dozen DBAs for instance.
Unfortunately this article does nothing to address that. It doesn't address the relationship between application and database deployment. And not addressing this means it doesn't really hold any value at all. As an example, say I have a large high performance database that is a data store for 100+ applications. One of these large applications is business critical, and you have iterations in a matter of minutes and hours, not weeks and months.
Your business dictates that you need an application change out immediately. This change requires new/different data in the data store shared by many. The application changes take an hour to implement. Take it down to the low level, and the database needs a schema change, one that would likely break existing applications. Using this process, what do we do now? And if it actually works, does time to market suffer along the way? What about transitioning from the old version to the new version, i.e. both versions running in parallel, along with a dozen more?
I couldn't disagree with this article more. It's a great start for those that have absolutely no change control for databases on their small projects with slow iterations. But if you approach something like I'm describing you will very, very quickly outgrow this approach completely.
Admin
We're using Liquibase... it's not perfect, but does the job. Handles database updates and rollbacks pretty well. Why should we reinvent something that already exists? Its disadvantages (XML) do not outweigh its advantages.
Admin
Alex, you didn't fully address the issue of data flowing back from Prod into the Test and Dev databases. I agree fully with your statements when the data is end-user generated.
However, I work in an environment where data created by the company for public consumption has to be reviewed by the legal compliance team. Because many departments contribute to these changes, the data is stored in an easily editable Content Management System. Because the data is sensitive, and putting wrong data out in Prod could shut down our company, it is too risky to make the company-produced data changes directly in Prod. (All the CMSs I've seen are bad at holding back changes until a review has been made.)
The data changes produced by the company are non-editable in Prod (the CMS server code is not even deployed to Prod), so we can keep a tighter lid on security. We keep the CMS as close to Prod as we can and flow the changes in both directions.