Database Changes Done Right

  • Alex Papadimoulis 2012-02-28 09:47
    Decided to cut this from the article... the comments seemed like a good place to paste it.

    Yes to NOSQL?
    While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You are not, nor will not have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems.

    NOSQL databases are not a database evolution by any means. In fact, they're a throwback to many of the systems designed in the 1960s and 1970s, where a megabyte of memory cost more than a programmer's annual salary. As hardware costs became cheaper (i.e. a megabyte only cost a week's salary), databases evolved into what we know today: the relational model.

    Play with them on your hobby projects, but they don't belong in your work projects. Remember, programming is supposed to be boring.
  • Ohlmann 2012-02-28 10:46
    The comment about NoSQL is simply wrong. NoSQL databases are not a replacement for relational databases to improve performance; that's a common fallacy.

    They are just another useful tool for a specific purpose. Exactly like SQL, but for other needs.

    Redis, for example, is useful in specific cases: if you never need to look up data by anything other than its id, have concurrency problems, and/or have an awful lot of writes, Redis can be a useful tool.

    (note that "NoSQL" is itself a misleading phrase. It's a buzzword tacked onto very different technologies with very different uses)

    And above all, keep a cool head and use the tool you need. Refusing to use NoSQL is exactly as stupid as using only NoSQL. Do you stop using your car because you can take a plane? It's the same with NoSQL. Use it when you need it, and don't lean on fallacies like "I am not Facebook, so this can't be useful".
  • übercoder 2012-02-28 10:48
    We use (something like) www.deltasql.org for versioning of our DB schemas between stable / testing / development branches.
  • me 2012-02-28 10:56
    Maybe I'm having a late-term case of post-Monday stupidity, but what the hell does SAP have to do/mean in the context of "being replaced" - do you mean the German company or something else?
  • Matthew Brubaker 2012-02-28 11:05
    For those interested in doing versioning of their database in the manner Alex specified near the end of the article, the FOSS tool RoundhousE (https://github.com/chucknorris/roundhouse) is the tool we use where I work and have been exceedingly happy with.
  • emurphy 2012-02-28 11:11
    me:
    Maybe I'm having a late-term case of post-Monday stupidity, but what the hell does SAP have to do/mean in the context of "being replaced" - do you mean the German company or something else?


    Their ERP software, presumably. IIRC it has a reputation of "there are lots of people out there who will customize the crap out of this thing if you have the budget for it".
  • 3rd Ferguson 2012-02-28 11:13
    Perhaps the most important component of any change management system is organizational discipline. This is particularly true of databases, where tools to automate and audit change are immature at best. The organization must be disciplined enough to document its way around the lack of tools.

    My organization (a large paint company up the road from Inedo) maintains a code repository chock full of DDL code, the latest checked-in version of which is always implemented in Dev, QA and Prod EXCEPT when there is an ACTIVE change request. The DDL is simply "CREATE OR REPLACE" for procedures/packages and "ALTER TABLE" when table structures change. (Yes, we know that makes it painful to recover to a given place in the chain of table mods, but in 2+ years of being in maintenance mode that has never happened outside of Dev. Let's be realistic about whether we really need to solve that problem.)

    The change request system is sufficiently advanced to coordinate all the steps required to modify the DDL layer in the database along with any executable code, scheduled jobs, configuration, etc. Both the business people and the several technical teams involved in most changes can see the change request as it winds its way along, being notified by email whenever it advances between environments or teams. Whoever the latest person is in the chain sees whatever has already happened, and we make sure to pester each other when we're not satisfied with the quality of the entries at each step.

    It's a metric buttload of paperwork, but the process is auditable and repeatable, and that meets the ultimate business need of knowing what's in Production and what to do when it breaks at 3:00AM.

    Alex is right about the resistance to change leading to uncontrolled change (and vice versa). Think of earthquake activity on a fault line, where the pent-up need to shift leads to sudden mass change and often disaster. We avoid that by having a workable, if labor-intensive, process for managing lots of small changes. The learning curve on the process is steeper than anyone would like, but it's really not that bad after the first few changes and it beats the crap out of the alternative that's been so aptly laid out in various WTFs over the years.
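
    A minimal sketch of what one of those checked-in DDL files might look like under this convention (object and column names invented, Oracle syntax assumed):

    -- Re-runnable: CREATE OR REPLACE simply overwrites the previous version of the procedure
    CREATE OR REPLACE PROCEDURE set_order_status (
        p_order_id IN NUMBER,
        p_status   IN VARCHAR2
    ) AS
    BEGIN
        UPDATE orders
           SET status = p_status
         WHERE order_id = p_order_id;
    END;
    /

    -- Not re-runnable: structural changes are plain ALTER TABLE statements tied to a change request
    ALTER TABLE orders ADD (status VARCHAR2(20) DEFAULT 'NEW');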

    /CAPTCHA: nibh is where Mrs. Frisby escaped from
  • boog 2012-02-28 11:32
    The Article:
    ...most developers’ reluctance to master this method leads towards a tendency to simultaneously resist change and change things uncontrollably.
    ...or to abandon databases altogether and seek out alternate solutions on the pretense that they are either "easier to understand" or "the future" of databases.
  • Birfhum 2012-02-28 11:32
    I am glad to be using Event Sourcing and CQRS; database changes aren't an issue.
  • Myth 2012-02-28 11:34
    3rd Ferguson:
    /CAPTCHA: nibh is where Mrs. Frisby escaped from


    That's NIMH, not NIBH.
  • Alex 2012-02-28 11:34
    I think it's incredibly limiting to say migrations have to be SQL scripts.

    We have a system of migrations with full access to all our application's services, the full object model and the EntityManager (we're running on Hibernate).

    Sometimes, these migrations just run an SQL script (like a simple UPDATE to set a new boolean field to false for every row), but often a new field will need to be calculated somehow, and then it's nice to have your actual application logic do the calculations.

    Also, if your whole application is using JPA QL, dropping down to SQL in migrations will be bug prone.
  • ObiWayneKenobi 2012-02-28 11:38
    The WRONG way to do this is to mandate all database access be done via stored procedures, and then have to run some five dozen sprocs to "update" a database. And store each of these scripts in SVN and never remove them, resulting in a not-quite-migration history of all the changes you've done to your database.
  • Simple 2012-02-28 11:46
    I delivered to a customer the truly ultimate, general purpose, totally flexible platform. I didn't even install an O/S on the computer I supplied.
  • Jon 2012-02-28 11:47
    +1 For logs. Without them, it's Russian Roulette (just a matter of time).
  • Mike S 2012-02-28 13:12
    Minor Point:
    We've used EAV several times when we didn't know what data we'd be capturing, and what we would capture would change regularly. It's a nice general solution to that problem.
    The only real alternatives I've seen are loads o' data tables, or the single table with 99+ string columns.
  • Kemp 2012-02-28 13:28
    Alembic. That is all.
  • Matthew 2012-02-28 13:30
    It made me feel all warm and fuzzy to know I have worked in two different places that did this right. Between source control software, dev/test/prod (along with a dedicated QA team) we never had a problem escape into production. Submitting every db change as a script may have been a PITA, but it worked.
  • savar 2012-02-28 13:47
    I haven't replied here in some time, but this article touches on a topic that has been sensitive at my workplace for a few years. In that time we have had 4 different migration systems, including (1) no system at all, (2) a perl hackery thing I wrote, (3) the official migration system packaged with our ORM, and finally (4) a new custom migration system written in the same language as the rest of our project that was thoroughly discussed and planned before beginning the implementation.

    I'm not sure our solution would work everywhere, but so far it's solved all of the big problems we've had with each of our previous systems. And it largely matches what Alex describes.

    Mike S:
    Minor Point:
    We've used EAV several times when we didn't know what data we'd be capturing, and what we would capture would change regularly. It's a nice general solution to that problem.
    The only real alternatives I've seen is loads 'o data tables, or the single table with 99+ string columns.


    Anyway, I wanted to reply to this in particular. The need for the EAV pattern isn't new or uncommon. Lots of designers face a similar problem with "we don't know what we'll need to store". The problem with putting EAV in an RDBMS is that you lose all of the useful features of the RDBMS such as types, constraints, foreign keys, etc. You also take on some of the negative aspects of RDBMS (such as the change management quagmire mentioned in the foregoing article).

    At that point, the only real reason to put the EAV in the RDBMS is because you can't think of any better place to put it. That's usually not a good rationale for designing serious applications.

    Depending on your needs, it may make more sense to serialize the EAV to a file (e.g. INI format or a native binary format), serialize to volatile storage (e.g. memcached), or write to a document-based storage system <please don't hit me>. Or you may decide that you're not going to store this EAV at all, and instead you'll normalize the information and tell the client that future changes will not be trivial but it's worth it because the present system will be more reliable, faster, and cheaper to operate.

    I don't see anything wrong with having multiple storage systems for a single application. For data that fits the relational model well, store it in an RDBMS. For data that fits some other model better, store it somewhere else.
  • Teotwawki 2012-02-28 13:52
    EAV. First time I encountered this outside of school was Magento. As much as it makes a lot of sense theoretically, EAV can be very painful.

    EAV has its place: college classrooms.
  • Jacob 2012-02-28 13:55
    Would anyone care to comment on how well Alex's approach here matches the Ruby on Rails approach (migration scripts written in a database-agnostic DSL, each of which is meant to be reversible; a log in each data environment defining which scripts have been run against that environment)?
  • Anketam 2012-02-28 14:04
    savar:
    ...
    <please don't hit me>
    ...
    For the lols and giggles!
    *Punches savar in the arm*
  • Teotwawki 2012-02-28 14:11
    I just knew there would be someone trying to ruin tdwtf by putting informative and well thought out comments on tdwtf!

    In particular: "I don't see anything wrong with having multiple storage systems for a single application"

    Microsoft's Great Plains software sort of does this. It essentially uses EAV on some tables while using a more traditional approach on others. However, the table names and data definitions being stored in files rather than having useful table names and column names in the database drives me nuts now and then.

    Magento has moved a long way from its beginnings. Nearly everything was in an EAV format in early versions, but now some data has transitioned from EAV to a more traditional column-oriented style.
  • Jay 2012-02-28 14:19
    Simple:
    I delivered to a customer the truly ultimate, general purpose, totally flexible platform. I didn't even install an O/S on the computer I supplied.


    I gave my customers the most flexible system yet invented: A pad of paper and a box of crayons. They can store any data they want, including text and graphics.
  • geoffrey, MCP, PMP 2012-02-28 14:20
    Ohlmann:
    The comment about NoSQL is simply wrong. NoSQL databases are not a replacement for relational databases to improve performance; that's a common fallacy.

    They are just another useful tool for a specific purpose. Exactly like SQL, but for other needs.

    Redis, for example, is useful in specific cases: if you never need to look up data by anything other than its id, have concurrency problems, and/or have an awful lot of writes, Redis can be a useful tool.

    (note that "NoSQL" is itself a misleading phrase. It's a buzzword tacked onto very different technologies with very different uses)

    And above all, keep a cool head and use the tool you need. Refusing to use NoSQL is exactly as stupid as using only NoSQL. Do you stop using your car because you can take a plane? It's the same with NoSQL. Use it when you need it, and don't lean on fallacies like "I am not Facebook, so this can't be useful".


    NoSQL is a fad invented by programmers who think they have outsmarted giant enterprises when it comes to data storage and retrieval. Nothing more.
  • Jay 2012-02-28 14:32
    Side note: It is much harder to fix a bad database than a bad program.

    If you discover that one module is poorly written, you can often fix it or even throw it away and rewrite it with minimal impact on other modules. Sometimes there are dependencies, of course, but you can do a lot of clean-up without changing external interfaces. And most of the time -- not always, but most of the time -- the number of other modules affected by changes to a given module is manageable.

    But if you discover that a database table is poorly designed, fixing it very often means changing every module that accesses that table, which could be lots. Oh, you can add an index, and sometimes you can change a field length or that sort of minor thing. But if you realize that the employee table really needs to be broken into an employee table AND a benefits table, or that the shipping-info table must connect to order ITEMS and not to orders, etc, you could face many, many coding changes all over the place to support it. Plus, you'll have to write a data conversion program to reformat the data from the old schema to the new.

    DB design changes are almost always painful. Adding stuff? Not so bad. Changing existing stuff? Painful.
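
    For illustration, the employee/benefits split described above typically means both new DDL and a one-time data conversion, roughly along these lines (table and column names are made up):

    -- New table for the attributes being split out of employee
    CREATE TABLE benefits (
        employee_id INT NOT NULL PRIMARY KEY REFERENCES employee (employee_id),
        plan_code   VARCHAR(20),
        start_date  DATE
    );

    -- One-time conversion of the existing data from the old shape
    INSERT INTO benefits (employee_id, plan_code, start_date)
    SELECT employee_id, plan_code, benefits_start_date
      FROM employee;

    -- Only once every module that reads these columns has been updated
    ALTER TABLE employee DROP COLUMN plan_code;
    ALTER TABLE employee DROP COLUMN benefits_start_date;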
  • SRP 2012-02-28 14:53
    A good measure of application code quality is the ease with which the code can be arbitrarily changed without breaking the build.

    Are you suggesting that the same applies to the database? As soon as you are unprepared to change the schema, the entire database is likely broken and is in need of an overhaul?
  • HappyEngineer 2012-02-28 15:02
    The solution we came up with turned 4-hour DB deployments that often failed into half-hour deployments that always succeed. We used to have the horrible system where we had one SQL script to deploy and another SQL script to roll back (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With TeamCity constantly deploying and rolling back, we quickly find out if the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.
  • Alex Papadimoulis 2012-02-28 15:10
    Jacob:
    Would anyone care to comment on how well Alex's approach here matches the Ruby on Rails approach (migration scripts written in a database-agnostic DSL, each of which is meant to be reversible; a log in each data environment defining which scripts have been run against that environment)?


    I almost added this as an anti-pattern, but ran out of room. So, here goes…

    Anti-Pattern: DB Migrate
    Ruby on Rails developers have championed an “innovative” approach called “DB Migrate” to handle database changes. Essentially, you define structures in a database-agnostic, domain-specific language and then apply those to any supported database platform. I put innovative in quotes because this is simply a reinvention of a bad idea.

    Database management systems were designed to solve the problem of tightly-coupled application code and application data. They learned long ago that data not only outlives code, but it works a lot differently and, therefore, needs to be treated differently.

    DB Migrate is simply a tightly coupled convention/syntax of the underlying RoR platform and, worse, it unnecessarily abstracts an abstraction. SQL does not need to be “wrapped” with a less expressive language.

    The “reversibility” of the migrations is security theater. The idea of a “simple rollback” does lead to a warm and fuzzy feeling, but no one ever tests rollback scripts. Especially against production … where it actually matters. Because if they actually knew how to test stuff, then they wouldn’t need to worry about rollbacks in the first place. So, you’re left with a script that simply makes things worse.

    Does it make development faster? Of course it does. Then again, so does writing your code directly in production.
  • Jeff 2012-02-28 15:17
    "While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You are not, nor will not have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems."

    What if my main reason for using NoSQL is NOT the expectation of dealing with Facebook-sized problems?

    Say I want to manage a small or medium database, but avoid the development overhead from creating and altering table structures.

    If I add a new field to my data model, then in a relational database I'll need to:

    1. Change the code
    2. Create a database change script
    3. Test the database change script
    4. Ensure that the change script runs exactly once
    5. Keep backups of the data in case the change script fails, despite testing
    6. Avoid altering the table during working hours since it involves a full table rebuild
    7. etc etc etc

    While in a NoSQL database like MongoDB I'll need to:

    1. Change the code

    ...and it just works, right away.

    If I also want to index the new field for optimum performance, that's another small step, but the signal-to-noise ratio of "what I want to do" and "how much code or queries do I need to write to accomplish it" is still lightyears ahead of RDBMS.

    Is NoSQL still not for me?
  • FuBar 2012-02-28 15:17
    Jay:
    Side note: It is much harder to fix a bad database than a bad program.
    +1 this. Start with your entities and attributes, people! Validate it with the business! Remember, the data belongs to the organization, not to the application. Repeat: The data belongs to the organization, not to the application. My organization has data that started out on recipe cards in the 1930's. That data was computerized on a mainframe in the 1960's, migrated to a Vax in the 1980's, migrated to MS SQL and .Net in 2011, and will undoubtedly be on something else 20 years from now. This is possible because the first people to computerize the data thought carefully about how to structure it - they kept future generations in mind. The data belongs to the organization, not to the application.
  • frits 2012-02-28 15:21
    Jay:
    Side note: It is much harder to fix a bad database than a bad program.

    If you discover that one module is poorly written, you can often fix it or even throw it away and rewrite it with minimal impact on other modules. Sometimes there are dependencies, of course, but you can do a lot of clean-up without changing external interfaces. And most of the time -- not always, but most of the time -- the number of other modules affected by changes to a given module is manageable.

    But if you discover that a database table is poorly designed, fixing it very often means changing every module that accesses that table, which could be lots. Oh, you can add an index, and sometimes you can change a field length or that sort of minor thing. But if you realize that the employee table really needs to be broken into an employee table AND a benefits table, or that the shipping-info table must connect to order ITEMS and not to orders, etc, you could face many, many coding changes all over the place to support it. Plus, you'll have to write a data conversion program to reformat the data from the old schema to the new.

    DB design changes are almost always painful. Adding stuff? Not so bad. Changing existing stuff? Painful.
    Ever heard of a DAL?
  • OneMist8k 2012-02-28 15:32
    NoSQL versus SQL?

    Martin Fowler recently put together a few excellent slides on the subject. Read and enjoy.

  • Jonathan 2012-02-28 15:42
    So tell us how you really feel about Notes, Alex...
  • Boris 2012-02-28 15:57
    Re: "Re-executable Change Scripts",
    I don't get it. If your intention is to add a column to a table, you do exactly what's in the example. If the column already exists in (some versions of) the database but is defined differently, you know about it and therefore add an ALTER COLUMN statement to cover it. In fact, it is crucial that a script can be run multiple times while making sure that no changes are made if the script is executed again.
    People accidentally do that all the time.
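
    A sketch of the kind of guarded, re-runnable script being described (SQL Server syntax and invented names; other DBMSs have equivalent catalog views or an IF NOT EXISTS clause):

    -- Safe to run any number of times: the column is only added if it is missing
    IF NOT EXISTS (SELECT 1
                   FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE TABLE_NAME = 'Customer'
                     AND COLUMN_NAME = 'LoyaltyTier')
    BEGIN
        ALTER TABLE Customer ADD LoyaltyTier VARCHAR(20) NULL;
    END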
  • Matt 2012-02-28 16:27
    That's only an anti-pattern because you're using the tool in a foolish way. Most of us manage to have a single deploy script that is fully tested. It's called deploying against a restore of a recent backup.

    I also find it utterly hilarious that you can create an article called 'Database Changes Done Right' without using the word 'transaction' once.
  • PiisAWheeL 2012-02-28 16:47
    OneMist8k:
    NoSQL versus SQL?

    Martin Fowler recently put together
    [removed to make akismet happy.]
    a few excellent slides[/url] on the subject. Read and enjoy.


    Here's my favorite reading on the subject:
    http://howfuckedismydatabase.com/nosql/

    And akismet can piss off!
  • Gunslinger 2012-02-28 16:48
    If you designed the database correctly the first time, this wouldn't be an issue.
  • Zunesis, In the Flesh! (Your mom's!) 2012-02-28 16:55
    Zylon:
    Looks like somebody has an itchy comment-deleting finger today.
    Looks like somebody was so satisfied with the self-fellation that was today's article that he couldn't allow a hint of dissent.
  • DiskJunky 2012-02-28 17:06
    We use a very similar system in my workplace, with some minor changes that suit our deployment a little better. E.g., we have a simple DB versioning system where a table called ControlRecord contains Major, Minor and Release columns. Our scripts are batched into:
    SQL\
    SQL\Original
    SQL\Patches\v1.0.10
    SQL\Patches\v1.2.00
    SQL\Patches\...

    Where each folder contains a "001) DB Version Check.sql" script that ensures that the scripts in the folder are only ever run against the correct version of the database. Our batch file is a little more explicit in that it parses the results of "dir *.sql /b/a-d/on" to ensure that the scripts are run in order (some file systems don't return files sorted by name automatically, e.g. FAT32 - why such an FS would be used in production is a different question, but you get the idea).
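
    A rough sketch of what such a version-check script could contain (SQL Server syntax assumed; the version numbers are illustrative, and the batch is assumed to be run with osql/sqlcmd -b so an error aborts the rest of the run):

    -- 001) DB Version Check.sql - refuse to run this folder's patches against the wrong version
    IF NOT EXISTS (SELECT 1
                   FROM ControlRecord
                   WHERE Major = 1 AND Minor = 0 AND Release = 10)
    BEGIN
        RAISERROR ('Database is not at version 1.0.10 - aborting patch run.', 16, 1);
    END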

    On a side note, you may want to change your scripts to use "sqlcmd" instead of "osql" as "osql" is being deprecated...
  • Mike S 2012-02-28 17:07
    EAV Redux:
    EAV's useful for situations where you have to store structured data, where users are specifying the structure at run time, and the structure's different for different "classes" of data.
    In the systems that I've seen and worked on, everything else gets stored in 3rd normal form.
    It's got plenty of downsides (performance is poor, and reporting's a pain), but it's great for what it does.
    For reporting, we have a tool that converts EAV to 3rd normal format. It spits out over 100 tables into a separate reporting database.
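
    For anyone curious, the core of that EAV-to-3NF conversion is usually a pivot of the attribute rows into columns, one query per "class" of entity, something like this sketch (all names invented):

    -- One attribute-per-row becomes one column-per-attribute for reporting
    SELECT e.entity_id,
           MAX(CASE WHEN a.attribute_name = 'first_name' THEN a.attribute_value END) AS first_name,
           MAX(CASE WHEN a.attribute_name = 'last_name'  THEN a.attribute_value END) AS last_name,
           MAX(CASE WHEN a.attribute_name = 'birth_date' THEN a.attribute_value END) AS birth_date
      FROM entity e
      JOIN entity_attribute a ON a.entity_id = e.entity_id
     GROUP BY e.entity_id;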
  • AN AMAZING CODER 2012-02-28 17:10
    Alex Papadimoulis:
    Decided to cut this from the article... the comments seemed like a good place to paste it.

    Yes to NOSQL?
    While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You do not, and never will, have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems.


    Either you're not using NoSql as the buzzword it has become, or I can say with 99.999% certainty that you make claims off the cuff instead of doing due diligence.


    This is like the fools that think Macs suck because they're not PCs and vice versa. Neither is correct, both are dumb.
  • Dave-Sir 2012-02-28 17:21
    Myth:
    3rd Ferguson:
    /CAPTCHA: nibh is where Mrs. Frisby escaped from


    That's NIMH, not NIBH.
    Depends on whether or not you have a cold...

    \CAPTCHA: validus is the residue left over from validation.
  • Peter 2012-02-28 17:34
    I like the article and it sounds like it is a formula for success, but your comments on all of NoSQL and migration frameworks are...unfortunate.

    It's almost like you don't want alternative approaches to work, like you're rooting against NoSQL and rooting against migration frameworks such as RoundhousE (which uses native SQL scripts, not an abstraction).
  • C 2012-02-28 17:48
    Alex Papadimoulis:

    Anti-Pattern: DB Migrate
    Ruby on Rails developers have championed an “innovative” approach called “DB Migrate” to handle database changes. Essentially, you define structures in a database-agnostic, domain-specific language and then apply those to any supported database platform. I put innovative in quotes because this is simply a reinvention of a bad idea.

    Database management systems were designed to solve the problem of tightly-coupled application code and application data. They learned long ago that data not only outlives code, but it works a lot differently and, therefore, needs to be treated differently.

    DB Migrate is simply a tightly coupled convention/syntax of the underlying RoR platform and, worse, it unnecessarily abstracts an abstraction. SQL does not need to be “wrapped” with a less expressive language.

    The “reversibility” of the migrations is security theater. The idea of a “simple rollback” does lead to a warm and fuzzy feeling, but no one ever tests rollback scripts. Especially against production … where it actually matters. Because if they actually knew how to test stuff, then they wouldn’t need to worry about rollbacks in the first place. So, you’re left with a script that simply makes things worse.

    Does it make development faster? Of course it does. Then again, so does writing your code directly in production.


    If you ever use a call to db:migrate:rollback in production, you've done something very, very wrong and should be restoring from backup anyway. If you've not tested your changes against a dev or staging environment, again, you've done something very, very wrong.

    On the subject of NoSQL databases, in the case of the document database (mongodb, amazon simpledb), it's quite useful for quickly prototyping and developing. Might not be the best to go enterprise with, but quite fine for prototype stages.
  • Michael 2012-02-28 17:54
    Jonathan:
    So tell us how you really feel about Notes, Alex...


    I was thinking of that the entire time I read this article.

    CAPTCHA: esse. don't stab me, esse.
  • jinx 2012-02-28 17:54
    without reading all the comments, check out Liquibase. Not perfect, but does all Alex mentioned.

    captcha: f you for posting damn captchas
  • Norman Diamond 2012-02-28 18:33
    Alex Papadimoulis:
    2.(Optional) Code Review – this is optional, of course, but it's always nice to have a second set of eyes

    (Optional) Checking for any kind of bugs in any kind of development, anywhere.
    (Optional) A second set of eyes.
    (Optional)
    NO IT IS NOT.

    (Optional) Producing front page material.
    (Optional) The owner of this site suggesting that the avoidance of front page material is Optional.
  • Rich 2012-02-28 18:39
    (Optional) Code Review – this is optional, of course, but it’s always nice to have a second set of eyes


    It's true that the other eyes can help, but a lot of the time the value comes from you having to explain things. If it's just in your head, it can be a somewhat featureless blob of thought. You think "hey, I'll get that when I get there."

    But to explain it to someone, you need to turn the featureless blob into something much more concrete. Those 'think of later' things need to be thought of, those loose ends need to be tied.

    I keep a bookmark for the Cardboard Cutout Dog around.
  • myName 2012-02-28 20:27
    Yes, I'll just run a SQL script against my UniVerse database, that'll work.
  • Vincent 2012-02-28 21:15
    I used South with Django, which is more or less the same functionality as Ruby on Rails migrations (just comment if you think it's not that similar). I'd say it's really great during development, but while rollback scripts can be handy, I'm not sure they're really useful or safe on a production database. Such simple automated migrations are useful when the database is really bound to only one application, so let's not compare apples to oranges; on a centralized enterprise database with all kinds of applications using it, that would be nonsense.

    Then on NoSQL databases: you still have a schema that may change, and that means you may need data migration scripts. For instance, a blog post may have comments directly embedded in the blog post record; but that doesn't work well if you have thousands of records embedded in a single parent record. So you may need to create another table with comments, and just keep IDs in the blog post table. It's easy to create new columns, embed subrecords, or create indexes on the fly... but using a NoSQL database does not mean you can avoid data migrations.
  • Jeff Kemp 2012-02-28 21:17
    Don't know much about other DBMS's but with Oracle change control is fairly straightforward.

    For example, I just write the ALTER statements I need to add or drop columns. The scripts are perfectly safe to rerun, because if you try to add a column that has already been added, or drop a column that was already dropped, you just get an error message and no change is done.

    Also, it's easy enough to build the rollback scripts - e.g. if you drop a column, you first copy the data for the column (along with the PK) to a backup table, then drop the column. To rollback, just re-add the column, update the rows from the backup data, then reapply any constraints.
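
    A sketch of the drop-with-rollback recipe just described (Oracle syntax, invented names):

    -- Change script: stash the column's data, then drop it
    CREATE TABLE emp_phone_backup AS
        SELECT employee_id, phone_number FROM employee;
    ALTER TABLE employee DROP COLUMN phone_number;

    -- Rollback script: re-add the column and restore the data from the backup table
    ALTER TABLE employee ADD (phone_number VARCHAR2(30));
    UPDATE employee e
       SET phone_number = (SELECT b.phone_number
                             FROM emp_phone_backup b
                            WHERE b.employee_id = e.employee_id);
    DROP TABLE emp_phone_backup;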

    When they're ready for testing, the change scripts are run in the test environment, testing is done, then the scripts are rolled back, more testing is done (to make sure the rollback did its job), then the scripts are run again.

    This is standard practice in pretty much all the places I've worked. The exception was one client who had a clued-up DBA who knew how to use the Oracle flashback features to do the rollbacks - which made testing the scripts a whole lot easier. It didn't matter if a developer made a huge mess of their database change scripts - the DBA could easily undo their mess.
  • Spike 2012-02-28 21:54
    You could use a tool like LiquiBase to store your scripts in source control with your application, and manage which scripts need to be run in what order when deploying. All of the above rules still apply of course.
  • commoveo 2012-02-28 22:17
    And, just how the fuck did I solicit this fucking speech!?

    Shouldn't his site be featuring IT anecdotes?
  • Therac-25 2012-02-29 01:01
    Mike S:
    EAV Redux:
    EAV's useful for situations where you have to store structured data, where users are specifying the structure at run time, and the structure's different for different "classes" of data.
    In the systems that I've seen and worked on, everything else gets stored in 3rd normal form.
    It's got plenty of downsides (performance is poor, and reporting's a pain), but it's great for what it does.
    For reporting, we have a tool that converts EAV to 3rd normal format. It spits out over 100 tables into a separate reporting database.


    ROFL. You just reminded me of the system I'm working with.

    How about EAV *IN* 3rd normal form, eh?

    (Table names have been adjusted to protect the oh so very guilty).


    Table "entity"
    Column | Type |
    ----------------+---------+
    entity_id | integer |
    entity_type_id | integer |

    Table "entity_type"
    Column | Type |
    ------------------+------------------------+
    entity_type_id | integer |
    entity_type_code | character varying(128) |

    Table "entity_attribute_type"
    Column | Type |
    -------------------------------+------------------------+
    entity_attribute_type_id | integer |
    entity_type_id | integer |
    entity_attribute_name | character varying(128) |
    entity_attribute_data_type_id | integer |

    Table "entity_attribute"
    Column | Type |
    --------------------------+---------+
    entity_id | integer |
    entity_attribute_type_id | integer |
    entity_attribute_value | text |



    Joins, joins everywhere....
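
    To make the pain concrete: fetching one named attribute for one entity out of the schema above takes something like the following, and the "value" still comes back as untyped text (the attribute name and id are invented):

    SELECT ea.entity_attribute_value
      FROM entity e
      JOIN entity_attribute_type eat
        ON eat.entity_type_id = e.entity_type_id
       AND eat.entity_attribute_name = 'serial_number'
      JOIN entity_attribute ea
        ON ea.entity_id = e.entity_id
       AND ea.entity_attribute_type_id = eat.entity_attribute_type_id
     WHERE e.entity_id = 42;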
  • tradie 2012-02-29 01:15
    User customized packages, both big and small, use customizable columns, rather than DDL to alter the database.

    As far as I remember, you do it that way because your code is designed to display and calculate data from the database, not meta-data from the database.

    I think that putting the customer customizations in with the customer data seemed like the logical way to go, and certainly made updates and maintenance easier.

  • Benjol 2012-02-29 01:17
    What about maintaining database creation scripts? Do you just have the original schema creation script and then run all your change scripts on it, or do you maintain scripts for creating 'new' databases from scratch?
  • box 2012-02-29 02:15
    Honestly the best part of systems like this was the point mentioned last.

    Having the ability to spin up a DB that matches production anywhere, any time, on demand turbo-charges productivity. Not needing hellish shared development environments or to schedule time on the 'uat' environment for user testing is a freedom that I will never give up ever again. Once you have this ability you can really take your continuous integration testing to the next level too.
  • TheCPUWizard 2012-02-29 02:35
    While EAV type architectures *may* be "anti-patterns" in many cases, there are definitely valid use cases for them. My company created DynaSearch back in the 1980's, which has become over the past 25 years a very sophisticated system that is based on the EAV pattern.

    For "irregular data" (different items have different "fields") this pattern can be extremely performant, and compact.

    On one system (a generalized taxonomy system) we have a running instance with over 200G "cells" representing over 50TB of information. About 5 years ago another firm approached our client, and recommended going to a "normalized database". Their proposal required over 5000 tables, was estimated to take 3 times as much storage, and in prototype tests, could not match the performance, even though their test set contained less than 1% of the total data.

    About 3 years ago, we integrated a NOSQL database for handling "supporting data". This effort has proven to also be quite successful, allowing us to have full access to all of the information which was previously stored in other types of repositories.

    It is all about using the right tool for the right job.
  • Kasper 2012-02-29 03:15
    Most of us manage to have a single deploy script that is fully tested. It's called deploying against a restore of a recent backup.
    You didn't specify which of the environments you would restore the backup of, and where you would restore it to. If you have decided to only keep backups of production and not of the other environments, then it becomes obvious.

    For the testing, you would restore a recent backup of the production database, and obviously you don't restore that on top of production, you restore it into a separate environment which is only used to restore backup copies of production. So that is not one of the environments already mentioned in the article.

    Testing database changes this way is a good idea even if you also test it in other ways.

    Periodically restoring backups of production into a separate environment is a good idea even if you don't use the environment you restored to for testing new code every time you have performed such a restore. The reason it is a good idea is that it gives you more confidence in the proper operation of your backup and restore procedures.

    There are, of course, a few caveats to this approach.
    - The production system may have much more data than any of the other environments. An environment that you only use for testing against a restore of production data may need more hardware than you'd usually want to allocate for that purpose.
    - It might be that not everybody working on the code is permitted to access the production data. Testing against a restore of production data requires most of the access control that would also be used to guard production data on the live system.
    - When testing with production data, separation of the systems becomes even more important. Leaking data from the earlier test environments into the real world is rarely a disaster. If the test system were, for example, sending out emails or accidentally tried to perform changes against production, the damage would be limited, most likely none. However, if something similar happened on a test environment that used a copy of production data, it could be bad.
  • LISP Programmer 2012-02-29 03:22
    Jeff Kemp:
    Don't know much about other DBMS's but with Oracle change control is fairly straightforward.

    For example, I just write the ALTER statements I need to add or drop columns. The scripts are perfectly safe to rerun, because if you try to add a column that has already been added, or drop a column that was already dropped, you just get an error message and no change is done.

    Also, it's easy enough to build the rollback scripts - e.g. if you drop a column, you first copy the data for the column (along with the PK) to a backup table, then drop the column. To rollback, just re-add the column, update the rows from the backup data, then reapply any constraints.

    When they're ready for testing, the change scripts are run in the test environment, testing is done, then the scripts are rolled back, more testing is done (to make sure the rollback did its job), then the scripts are run again.

    This is standard practice in pretty much all the places I've worked. The exception was one client who had a clued-up DBA who knew how to use the Oracle flashback features to do the rollbacks - which made testing the scripts a whole lot easier. It didn't matter if a developer made a huge mess of their database change scripts - the DBA could easily undo their mess.


    While that's the approach I use as well, just allowing the script to error on re-runs sucks a bit too, because now your script output is full of error messages for things that aren't really errors. I've been bitten a few times where I've missed a genuine bug in my script because I told myself "oh that's OK, I expected all those errors" (except the one that mattered, of course).

    Flashback works nicely (for those that don't know, database flashback is logically equivalent to a "revert to revision X" command to a source control system). Often it's quicker than doing a restore (or an export/import) because the alternatives involve replacing everything from scratch from an export.

    It's also a good way to get your test environment back to a known state for re-tests.

    Personally, what I'd like to do (but don't - our DBAs don't cooperate) is see a combination of flashback (to revert the DB to its "production" version) along with full runs of the migration script.
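
    One way to keep re-runs quiet without hiding genuine failures is to trap only the specific error you expect; a sketch in Oracle PL/SQL (ORA-01430 is "column being added already exists in table"):

    DECLARE
        column_exists EXCEPTION;
        PRAGMA EXCEPTION_INIT(column_exists, -1430);
    BEGIN
        EXECUTE IMMEDIATE 'ALTER TABLE employee ADD (phone_number VARCHAR2(30))';
    EXCEPTION
        WHEN column_exists THEN
            NULL;  -- already applied on an earlier run; any other error still fails loudly
    END;
    /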
  • dkf 2012-02-29 04:32
    Therac-25:
    Joins, joins everywhere....
    My God, it's full of JOINs!

    It reminds me of a database I encountered a few years ago that was being used to support an RDF query processor (EAV, but where the values were really all more entities, as were the attributes themselves; crazy system!) A “database expert” colleague of mine had set it all up, and it was taking over 20 minutes to do a relatively simple query. Turns out, he'd explicitly decided to not have any indexes set up “because they take time to rebuild when the data changes”. Adding indexes got the query into the so-fast-we-don't-measure domain. The “expert” doesn't work for us any more (though not because of this incident, to be fair).
  • L. 2012-02-29 05:31
    emurphy:
    me:
    Maybe I'm having a late-term case of post-Monday stupidity, but what the hell does SAP have to do/mean in the context of "being replaced" - do you mean the German company or something else?


    Their ERP software, presumably. IIRC it has a reputation of "there are lots of people out there who will customize the crap out of this thing if you have the budget for it".


    And for me it has a reputation of "this piece of crap is unusable until you customize the crap out of it ... and even then it sucks badly".

    It's more corporate umbrella warfare than software anyway.
  • L. 2012-02-29 05:33
    Alex:
    I think it's incredibly limiting to say migrations have to be SQL scripts.

    We have a system of migrations with full access to all our application's services, the full object model and the EntityManager (we're running on Hibernate).

    Sometimes, these migrations just run an SQL script (like a simple UPDATE to set a new boolean field to false for every row), but often a new field will need to be calculated somehow, and then it's nice to have your actual application logic do the calculations.

    Also, if your whole application is using JPA QL, dropping down to SQL in migrations will be bug prone.


    And what if what you call your application logic is located in the wrong place, and that's why you see that issue?

    Try and give a real example, because not using SQL means losing transactional integrity (which you do not have in your java program), and that's a heavy price.
  • L. 2012-02-29 05:40
    savar:
    I haven't replied here in some time, but this article touches on a sensitive topic at my workplace for a few years. In that time we have had 4 different migration systems, including (1) no system at all (2) a perl hackery thing I wrote (3) the official migration system packaged with our ORM and finally (4) a new custom migration system written in the same language as the rest of our project that was thoroughly discussed and planned before beginning the implementation.

    I'm not sure our solution would work everywhere, but so far it's solved all of the big problems we've had with each of our previous systems. And it largely matches what Alex describes.

    Mike S:
    Minor Point:
    We've used EAV several times when we didn't know what data we'd be capturing, and what we would capture would change regularly. It's a nice general solution to that problem.
    The only real alternatives I've seen is loads 'o data tables, or the single table with 99+ string columns.


    Anyway, I wanted to reply to this in particular. The need for the EAV pattern isn't new or uncommon. Lots of designers face a similar problem with "we don't know what we'll need to store". The problem with putting EAV in an RDBMS is that you lose all of the useful features of the RDBMS such as types, constraints, foreign keys, etc. You also take on some of the negative aspects of RDBMS (such as the change management quagmire mentioned in the foregoing article).

    At that point, the only real reason to put the EAV in the RDBMS is because you can't think of any better place to put it. That's usually not a good rationale for designing serious applications.

    Depending on your needs, it may make more sense to serialize the EAV to a file (e.g. INI format or a native binary format), serialize to volatile storage (e.g. memcached), or write to a document-based storage system <please don't hit me>. Or you may decide that you're not going to store this EAV at all, and instead you'll normalize the information and tell the client that future changes will not be trivial but it's worth it because the present system will be more reliable, faster, and cheaper to operate.

    I don't see anything wrong with having multiple storage systems for a single application. For data that fits the relational model well, store it an RDBMS. For data that fits some other model better, store it somewhere else.


    There exists no data that fits another model better.

    No, seriously, the only reason people consider EAV is because they don't know how to handle the unexpected. EAV is but one of the ways of doing it, and most of all EAV kills the information aspect of having a row in a table: it goes back to raw data and needs to be processed for it to make any sense (varying attributes of the same objects are only related by identical strings in some column .. wtf).

    While EAV may provide a very low quality solution, it almost always is a bad answer.

    Wrote a bit about that a while back, explaining how adding history to a database through EAV was a very very bad idea.

    For those who can't guess ... millions of rows with three columns, no information, only data, no way to efficiently query about a point in time, etc.
  • L. 2012-02-29 05:45
    Jay:
    Side note: It is much harder to fix a bad database than a bad program.

    If you discover that one module is poorly written, you can often fix it or even throw it away and rewrite it with minimal impact on other modules. Sometimes there are dependencies, of course, but you can do a lot of clean-up without changing external interfaces. And most of the time -- not always, but most of the time -- the number of other modules affected by changes to a given module is manageable.

    But if you discover that a database table is poorly designed, fixing it very often means changing every module that accesses that table, which could be lots. Oh, you can add an index, and sometimes you can change a field length or that sort of minor thing. But if you realize that the employee table really needs to be broken into an employee table AND a benefits table, or that the shipping-info table must connect to order ITEMS and not to orders, etc, you could face many, many coding changes all over the place to support it. Plus, you'll have to write a data conversion program to reformat the data from the old schema to the new.

    DB design changes are almost always painful. Adding stuff? Not so bad. Changing existing stuff? Painful.


    And you are obviously not a DBA and don't know what should be done in that case?

    1. add the new tables
    2. move the data to the new tables
    3. delete the old table
    4. create a materialized view that represents the old table
    5. migrate most of the accessing code as you go
    6. dematerialize the view
    7. near the end, add some logging to know which methods still reference that old design and get rid of it
    8. get rid of the view when you're 100% sure it's over

    Of course picking the WRONG database hurts, so just avoid MySQL, toySQL and other w/e noob implementations and stick to real RDBMS's like PostgreSQL, Oracle, MSSQL and a few others.
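
    Step 4 in the list above is what keeps not-yet-migrated code alive; a sketch with invented names (a plain view shown here for brevity, where L. suggests starting with a materialized one):

    -- A view with the old table's name presents the old shape on top of the new tables
    CREATE VIEW employee AS
        SELECT c.employee_id, c.name, c.hire_date, b.plan_code, b.start_date
          FROM employee_core c
          JOIN employee_benefits b USING (employee_id);

    -- Step 8: dropped once logging shows nothing references the old shape any more
    DROP VIEW employee;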
  • L. 2012-02-29 05:48
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With TeamCity constantly deploying and rolling back, we quickly find out if the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf
  • L. 2012-02-29 05:51
    Jeff:
    "While NOSQL databases are somewhat based around EAV principles, I can say with 99.999% certainty that NOSQL is not for you. You are not, nor will not have to deal with Facebook-sized problems, and if you ever do, you'll be in a great position to hire a highly-specialized engineer who actually knows how to solve massive scalability problems."

    What if my main reason for using NoSQL is NOT the expectation of dealing with Facebook-sized problems?

    Say I want to manage a small or medium database, but avoid the development overhead from creating and altering table structures.

    If I add a new field to my data model, then in a relational database I'll need to:

    1. Change the code
    2. Create a database change script
    3. Test the database change script
    4. Ensure that the change script runs exactly once
    5. Keep backups of the data in case the change script fails, despite testing
    6. Avoid altering the table during working hours since it involves a full table rebuild
    7. etc etc etc

    While in a NoSQL database like MongoDB I'll need to:

    1. Change the code

    ...and it just works, right away.

    If I also want to index the new field for optimum performance, that's another small step, but the signal-to-noise ratio of "what I want to do" and "how much code or queries do I need to write to accomplish it" is still lightyears ahead of RDBMS.

    Is NoSQL still not for me?


    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).
  • L. 2012-02-29 05:56
    Mike S:
    EAV Redux:
    EAV's useful for situations where you have to store structured data, where users are specifying the structure at run time, and the structure's different for different "classes" of data.
    In the systems that I've seen and worked on, everything else gets stored in 3rd normal form.
    It's got plenty of downsides (performance is poor, and reporting's a pain), but it's great for what it does.
    For reporting, we have a tool that converts EAV to 3rd normal format. It spits out over 100 tables into a separate reporting database.


    What about the FACT that you can specify the structure of a table at run time?
    Seriously .. there's no good reason to use EAV when you could simply have them create a new table through your tool on the fly.
  • L. 2012-02-29 06:01
    box:
    Honestly the best part of systems like this was the point mentioned last.

    Having the ability to spin up a DB that matches production anywhere, any time, on demand turbo-charges productivity. Not needing hellish shared development environments or to schedule time on the 'uat' environment for user testing is a freedom that I will never give up ever again. Once you have this ability you can really take your continuous integration testing to the next level too.


    <yourdbms>dump
    <yourdbms>restore

    Who does not have that ability ?
  • L. 2012-02-29 06:04
    TheCPUWizard:
    While EAV type architectures *may* be "anti-patterns" in many cases, there are definitely valid use cases for them. My company created DynaSearch back in the 1980's, which has become over the past 25 years a very sophisticated system that is based on the EAV pattern.

    For "irregular data" (different items have different "fields") this pattern can be extremely performant, and compact.

    On one system (a generalized taxonomy system) we have a running instance with over 200G "cells" representing over 50TB of information. About 5 years ago another firm approached our client, and recommended going to a "normalized database". Their proposal required over 5000 tables, was estimated to take 3 times as much storage, and in prototype tests, could not match the performance, even though their test set contained less than 1% of the total data.

    About 3 years ago, we integrated a NOSQL database for handling "supporting data". This effort has proven to also be quite successful, allowing us to have full access to all of the information which was previously stored in other types of repositories.

    It is all about using the right tool for the right job.


    Most people's comments on NoSQL show that it's more about using the tool right than using the right tool.

    But w/e floats your boat.

    I'm pretty sure I could make your thing fly on a DBMS anyway and the fact that you had a bad proposition from one person doesn't say much about the technology.

    The one sure thing is, the simpler the technology, the more likely they'll use the tool right.
  • Laird Nelson 2012-02-29 06:19
    Another vote for Liquibase. Each change is a "changeset", each changeset is checksummed; changesets have rollback actions and cannot be modified.
  • Matt 2012-02-29 07:04
    I agree with pretty much everything you say. I have always considered it best practice to have a system capable of doing a test restore of production backups, and to perform that restore frequently to ensure confidence in the backups being taken. That system can usually provide a good arena for performing that sort of deployment test, as long as you use all the relevant scrubbing procedures before running any tests. For a schema-only test it's possible to simply restore the structure of the database through various methods, but these tend to vary wildly between DBMSs...
  • david 2012-02-29 07:08
    Oracle executes a commit immediately before and immediately after any DDL. You can have 20 inserts and then an unrelated ALTER TABLE; the inserts get committed before the ALTER TABLE is run, whether it succeeds or not.

    I don't know whether this is the best system (last time I looked, DB2 did not commit DDL automatically), but it is how Oracle works.

    Any migrate script has to take that into account.
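
    In other words, a hypothetical script like the following leaves the inserts committed even if the ALTER fails and you then issue a ROLLBACK:

    insert into employee (id, name) values (1, 'Alice');
    insert into employee (id, name) values (2, 'Bob');
    -- Oracle commits implicitly here, before running the DDL...
    alter table audit_log add processed_flag char(1);
    -- ...and again after it, so this ROLLBACK undoes nothing above
    rollback;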
  • Helpfulcabbage 2012-02-29 08:30
    Is "Entity-Attribute-Value" the same as the so-called "narrow table design," that Brad Fitzpatrick used at LiveJournal? I read about that recently but could not find much of anything on it, but the description I read sounds exactly like this. So it leads me to be confused. Alex says it's terrible, but Brad likes it. So confused!
  • Jeff 2012-02-29 08:54
    L.:
    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).

    Fair enough, I'm certainly not a DBA.
    I've worked with MySQL and SQLite databases for about 7 years, but my focus is on the application code, so I'm probably not an SQL expert either (though I may be bold enough to claim not to be a complete noob).

    But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?

    The greatest concern for me at the moment is its limited support for transactions. Although, like Alex said, I'm not going to be dealing with Facebook-scale problems :)
    For a small or middle-sized application, I'd say the overhead from maintaining the database structure and its changes is a bigger problem.

    Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet run into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom.)
  • Neil 2012-02-29 09:18
    Since when was FOR guaranteed to sort file names rather than respecting whatever order the underlying file system happens to use?
    Matt:
    I also find it utterly hilarious that you can create an article called 'Database Changes Done Right' without using the word 'transaction' once.
    The real WTF is a so-called database that can't do DDL in transactions.

    CAPTCHA: abigo - one of three friends with bunged-up noses.
  • Therac-25 2012-02-29 09:40
    Neil:
    Since when was FOR guaranteed to sort file names rather respecting whatever the underlying file system's order happens to be?


    Technically it would be the glob operator (*.sql) that is doing the sorting, not the FOR statement.

    I don't know how the windows shells work, but common Unix shells (I checked bash and tcsh) are guaranteed to glob alphabetically.
  • L. 2012-02-29 10:21
    Jeff:
    L.:
    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).

    Fair enough, I'm certainly not a DBA.
    I've worked with MySQL and SQLite databases for about 7 years, but my focus is on the application code, so I'm probably not an SQL expert either (though I may be bold enough to claim not to be a complete noob).

    But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?

    The greatest concern for me at the moment is its limited support for transactions. Although like Alex said, I'm not going to be dealing with Facebook scale problems :)
    For a small or middle sized application, I'd say the overhead from maintaining database structure and it's changes is a bigger problem.

    Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet ran into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom)


    Well, if you worked with MySQL for 7 years, I don't think I can help you.

    It took me all of one week to realize how far MySQL was behind the other RDBMSs, and how full of crap it was, as well as unreliable and slow and ...

    I have no idea what that MongoDB is, but the positive point that was mentioned about it is NOT a positive point; it just states how easy the tool is to use.

    MySQL is easier to use than PostgreSQL, yet it is a thousand times worse.

    The usual trend is: easier to use, more limited inside, slower inside, etc.

    If you're fine with MongoDB, it means your whole approach to information and knowledge management needs rethinking; you use the database for so little that it makes no difference what DB you use (quite often the case with MySQL users), and thus the simplicity of MongoDB goes your direction.

    However, the correct way to solve your "blabla database structure changes problem" is to put much more thought in how you manage and handle data, and how to best represent it in a relational model in order to both greatly increase reliability and efficiency + maintainability.

    SQLite is AFAIK even worse than MySQL in terms of feature support (the things MySQL half-supports, like triggers and all that), and MySQL itself lacks ACID support (which InnoDB does NOT handle, so no MySQL that I know of does).

    You seem to realize that keeping your data/information safe is important; well, you're damn right, and that's why you should take a look at the most advanced, reliable and optimized information handling systems: RDBMSs.

    So two things:
    1. that schema change problem does not exist, it's merely an effect of the way you model data / application
    2. try PostgreSQL, it might be a bit harder than MySQL but at least you'll get something in return
  • L. 2012-02-29 10:28
    Jeff:
    L.:
    There's a good reason it's called Mongo(lian)DB, it's the perfect fit for you.

    Anyone who doesn't understand why a RDBMS rocks should just not talk about it.

    You sir, are not a DBA, are not an SQL expert and never will be a NoSQL expert since that does NOT exist (your very description of the tool shows how much of a tool for the brainless it is).

    Fair enough, I'm certainly not a DBA.
    I've worked with MySQL and SQLite databases for about 7 years, but my focus is on the application code, so I'm probably not an SQL expert either (though I may be bold enough to claim not to be a complete noob).

    But can you point out some examples of why exactly MongoDB is so bad, other than basically just saying that it's stupid?

    The greatest concern for me at the moment is its limited support for transactions. Although like Alex said, I'm not going to be dealing with Facebook scale problems :)
    For a small or middle sized application, I'd say the overhead from maintaining database structure and it's changes is a bigger problem.

    Anything else I should be worried about? (I'm currently developing an application that uses MongoDB and haven't yet ran into significant problems, but if you have some convincing arguments, now is the time to save the project from certain doom)


    http://stackoverflow.com/questions/5244437/pros-and-cons-of-mongodb

    See, I didn't need to know what it was to tell you it was a piece of crap.

    Obviously: no joins (WTF, you're going to join inside the application????)


    No transactions?? lol, seriously.

    Obviously a major lack of consistency and coherency control, etc.; simply not what you get from a DBMS, which is precisely what you NEED when you're handling information.

    Using an RDBMS will make your dev work much shorter, and can help you maintain data quality and consistency, etc., if you're handling anything remotely standard.
    While using MongoDB will give you a much simpler, much faster, much less reliable, very scalable solution for "worthless" information, like tweets or likes or ...

    So yes, the simple-minded answer of "if you don't need the scaling, don't use it" was the correct one.
  • eMBee 2012-02-29 10:58
    At one company we had a single change script.

    Any new database changes were appended to that script. This script was maintained in revision control and always run from the start to build a developer database from scratch.

    On the production side, the maintainers tracked up to what point the changes in the script had been run, and just applied any new changes after that point.

    Greetings, eMBee.
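
    A sketch of what that single script looked like in spirit (the tables here are invented for illustration):

    -- change 0001: initial schema
    create table customer (
        customer_id int primary key,
        name        varchar(100) not null
    );

    -- change 0002: add email
    alter table customer add email varchar(255);

    -- change 0003: orders
    create table orders (
        order_id    int primary key,
        customer_id int not null references customer (customer_id)
    );

    Developers ran the whole file from the top against an empty database; production only applied the changes added since the last point already run.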
  • Brenda Zeitlin 2012-02-29 11:25
    These are great methods to deal with a real life problem.
    As a matter of fact, we think the problem is that common, that we have developed a solution for it.
    And we even offer it for FREE for small teams.

    It is called dbMaestro, and it offers a complete database change management solution
    that covers all of the aspects of your database from development to deployment
    with dbMaestro www.dbMaestro.com you can

    Control- Control and manage all changes being implemented in the database- no out of process updates or changes can occur to the DB
    o Database change repository covering all DB object types
    o An enforced check in/out process
    o Table structure, PL/SQL code, meta data (relevant table content – usually the data that influences application behavior)
    o Saving Object dependencies for future deployments
    o Enabling easy rollbacks
    o Interconnected to your existing SCM
    • Change Documentation
    o No code overrides (two developers start working on one package only to lose one's work at some point)
    o No undocumented changes are allowed

    Deploy – Eliminate risks threatening your deployment and decrease Your Deployment Costs by 95% with an integrated deployment engine
    o Create deployment scripts, instead of writing them manually – efficient, reliable, repeatable and documented and a huge time saver
    o Three way analysis/ impact analysis
    o Merge engine to automatically merge code from different teams doing development at the same time
    o Deploy changes based on business requirements (work items, tasks etc)

    Protect - Enhance your Database Security – control who is doing what in the database
    Track each database change back to the requirement and tasks that initiated it (+ integration with SCM for A to Z coverage)

    Comply- Improve Compliance to Regulatory Bodies
    o Roles and responsibilities enforcement – (important for audit compliance)
    o Instead of determining who can do what based on their access rights (can be very problematic if whole groups are using the same login credentials)
    o Make sure each person can do only what he should (linking with directory groups to determine project scopes of work)
    o Set roles: a DBA can change table structure while the developer can only change the procedures even if sharing the same login credentials


    Please have a look at our website (www.dbMaestro.com ) or contact me for more information I would be happy to set up a demo or a free download for any of the readers



    Brenda Zeitlin | dbMaestro Ltd.
    Email: Brendaz@dbMaestro.com Web: www.dbMaestro.com
    Toll Free: 1 8669318855
    CONTROL.DEPLOY.PROTECT.COMPLY
  • Fesh 2012-02-29 11:52
    So... my coworker brings up an interesting point on EAV: the only time you shouldn't avoid it is when the business's requirements are stupid enough to demand it.

    He worked in an environment where the business required the ability to create document templates on the fly. Part of this was that there would be peculiar fields for a given template. EAV was the best of the worst answers. Or at least that's what he says.
  • Alex Papadimoulis 2012-02-29 12:21
    Spike:
    You could use a tool like LiquiBase to store your scripts in source control with your application, and manage which scripts need to be run in what order when deploying. All of the above rules still apply of course.


    I haven't had an opportunity to adequately try out LiquiBase, so my opinion is based on a first impression. That being said...

    When I first saw it, it seemed that you wrote "change scripts" in a database-agnostic XML language like this:

    ..<create table="employees">
    ....<column type="char(5)" name="employee_id">
    ......<constraint primaryKey="true">

    I have a knee-jerk reaction to any less expressive abstractions of an abstraction, especially when XML is added. I fail to see any benefit to this.

    If you feel that the "it can be run against *any* database" is really a feature/benefit, then why not apply the same anti-pattern to application code?
  • Jeff 2012-02-29 12:24
    L.:
    However, the correct way to solve your "blabla database structure changes problem" is to put much more thought in how you manage and handle data, and how to best represent it in a relational model in order to both greatly increase reliability and efficiency + maintainability.

    /--/

    1. that schema change problem does not exist, it's merely an effect of the way you model data / application

    By this, do you mean that:

    A) data should be modeled so well in the first place that structure changes will never be needed

    or

    B) the development process should be set up so that structure changes could be applied more reliably (e.g. the stuff Alex talks about in the article)?

    Or some option C? :)
  • Therac-25 2012-02-29 12:27
    Fesh:
    So... my coworker brings up an interesting point on EAV: the only time you shouldn't avoid it is when the business's requirements are stupid enough to demand it.

    He worked in an environment where the business required the ability to create document templates on the fly. Part of this was that there would be peculiar fields for a given template. EAV was the best of the worst answers. Or at least that's what he says.


    That's fine, if it's modelling actual arbitrarily structured information from the user, then it's good. As long as it's only used *where you need to use it*.

    That's not the antipattern. The antipattern is deciding to use that kind of structure for regular data that could just as easily have been put into a normal table.

    The joins (and subselects) that result are terrifying.
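
    For example, assuming an EAV table shaped like attribute_value (entity_id, attribute, value), pulling back just three ordinary "columns" already needs one join per attribute (names illustrative):

    select e.entity_id,
           fn.value as first_name,
           ln.value as last_name,
           hd.value as hire_date
    from   entity e
           left join attribute_value fn
                  on fn.entity_id = e.entity_id and fn.attribute = 'first_name'
           left join attribute_value ln
                  on ln.entity_id = e.entity_id and ln.attribute = 'last_name'
           left join attribute_value hd
                  on hd.entity_id = e.entity_id and hd.attribute = 'hire_date'
    where  e.entity_type = 'employee';

    Multiply that by thirty or forty attributes, plus type casts on every value column, and you get the reports people complain about.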
  • Therac-25 2012-02-29 12:30
    Jeff:

    Or some option C? :)


    C) Work on a project that never grows, changes, or adds features so the thing you're modelling will never have to change.
  • C 2012-02-29 15:34
    L.:

    http://stackoverflow.com/questions/5244437/pros-and-cons-of-mongodb

    See . didn't need to know about what it was to tell you it was a piece of crap .


    Yay, internet nerd gets on his high horse about a product he knows nothing about, finds a single community post, written a year ago, about a system that the post itself says is fast-growing, and uses that to declare his assumption to be the correct one.

    That's some real deep research there, friend.

    Not evangelizing about Mongo; it's an OK product with some valid applications and some things that it's very much not suited to do, but c'mon, man. You're worse than Alex when he gets on his soapbox.
  • Aaron 2012-02-29 16:09
    This is one reason I really love Rails -- all database changes MUST go through migrations. And every migration has an "up" and "down" depending on whether you are migrating up to it, or rolling back down to it. All migrations are timestamped down to microseconds when they are created, and that timestamp is part of the filename; there is no question what order they were created.

    The application knows where it is in the migration process by keeping a record of the current migration version in the schema.

    Most migration files contain simple add/remove type instructions, but I've also integrated seed data into a migration simply so it would be tracked in version control when I deploy to production. Love it.
  • Consultuning 2012-02-29 16:34
    I think that you're right on the premises, but your solution is a bit too simplistic. I started to reply as a comment, but the comment grew so large that I thought it would be best to post it on my own blog. See http://consultuning.blogspot.com/2012/02/is-there-right-way-of-doing-database.html
  • HappyEngineer 2012-02-29 18:00
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back, we quickly find out if the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf



    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.

    Sometimes the errors are detected by the java code before they hit the DB.

    The java deployers are far more reliable because they are tested and reused and they validate the input given.

    Our method of deployment can be easily tested. Writing separate sql scripts for deploy and rollback was demonstrably more difficult to test (the constant deploy failures were proof of that).

    Our system allows much more control. The error messages in the logs are very clear about which things failed. A full tree is output that shows the order of the deployment and which things failed and which did not. It also includes timing information so we can see how long each part is taking.

    For a while I thought about writing an open source version for use by others. I never got around to it, but perhaps I will someday. The difference between sql scripts and our method is huge.

  • Joe Smith 2012-02-29 22:01
    [quote user=HappyEngineer]
    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.
    [/quote]

    That depends entirely on the database. Postgres has perfect DDL transactions, which are completely invisible to other transactions unless and until the commit line is successfully reached.

    Microsoft's SQL Server handles DDL transactions reasonably well, although not everything is supported, and there are a few cases where in-progress DDL transactions are visible to other transactions.

    I'm guessing you are talking about Oracle, which is objectively a rather terrible database. Sure it is scalable, but it lacks critical things like distinguishing between null and empty strings, and even minimal DDL transaction support.
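
    To make the Postgres point concrete, reusing the cannedRole example from above, the whole migration can sit inside one transaction (a sketch, with names following the earlier post):

    begin;
    alter table role add column canned_role boolean not null default false;
    update role set canned_role = true
     where company = '102' and name not like 'Employee%';
    -- if anything above failed, or we change our minds:
    rollback;   -- the column and the updates vanish together
    -- otherwise we would have issued: commit;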
  • kn 2012-03-01 01:25
    Nice article, but I don't see the point in separating out object scripts and change scripts.

    It seems to me that objects are quite closely tied to the structure of your data: if you add/remove/rename/alter a column on a table, then any triggers, procedures, and/or views that rely on that table will likely need to change as well. I'd think you'd want these changes to occur in tandem, and ideally inside a single transaction. Separating these components into separate scripts would seem to lose that, and while one certainly can rerun an object script, I don't really see any reason why you'd want or need to run one independently of a change script. I.e., you'd only be running an object script because you're changing the database, either just the object alone or because of an underlying change in the data structure. Distinguishing between these changes seems unnecessary.

    Plus, if one of the goals is to be able to spin up new database environments simply by running through the change scripts, it's quite possible -- if not likely -- that some of the scripts may rely on older object versions (triggers especially) to function properly. Moving the objects into separate scripts and keeping only the current version seems directly counter to being able to recreate a current (or earlier) version of the database.

    Also, why the need for the external log on top of the database's internal log? Especially if we're reloading the database and going back to the drawing board should a script ever fail, both logs will be identical on a functioning database, no?

    What am I missing?
  • Atle 2012-03-01 03:15
    So, it has come to this…

    Now that we know what the solutions are not, what is the solution when (parts of the) data structures are defined by users at runtime?

    Manipulating tables? What if you want to keep old data when a column goes obsolete?
  • Cbuttius 2012-03-01 06:25
    Ok my "soapbox" time now. I work on application level with databases a lot. There are multiple languages and lots of different database implementations yet they all seem to suffer from similar issues.

    Database activitiy is almost always done by constructing strings to pass to the database. This looks wrong to me in many ways.

    I actually think that the policy of only calling stored procedures from an application is quite a good one. However you should be able to do all of this with

    - A single call to the database to retrieve an object for the stored procedure, related to its structure
    - Attach this procedure to a connection. It is important we do it at this point as we may watn to re-use the object.
    - Bind variables to this object, one or more times.
    - execute the procedure, one or more times
    - process the results for each execution call or handle errors.

    ideally doing the latter through some async future.

    There could be more "structured" programming for other types of query too. There is for BCP but it's non-standard and only supports inserts (and reads), not merges.
  • not alex 2012-03-01 09:27
    Alex:
    I think it's incredibly limiting to say migrations have to be SQL scripts.

    We have a system of migrations with full access to all our application's services, the full object model and the EntityManager (we're running on Hibernate).

    Sometimes, these migrations just run an SQL script (like a simple UPDATE to set a new boolean field to false for every row), but often a new field will need to be calculated somehow, and then it's nice to have your actual application logic do the calculations.

    Also, if your whole application is using JPA QL, dropping down to SQL in migrations will be bug-prone.


    This is dangerous, because your migration scripts now depend on your application code. It will work fine initially, and then in changeset 25674 someone introduces a subtle change to the application logic, and then revision 26198 gets tagged and released, and then the migration script fails.

    Alex Papadimoulis:
    SQL does not need to be “wrapped” with a less expressive language.

    Actually, Python, Ruby, etc. are more expressive than SQL, as the latter is not Turing complete. SQLAlchemy (a Python ORM library), for example, has a migration library called Alembic that puts the full expressiveness of Python at your disposal (no DSL), and can optionally generate SQL scripts for offline migration.
  • Gary 2012-03-01 09:46
    "On fail, fix database" is a problem only if there are no tools that can instantaneously fix the database.

    It is a weakness in tools you are identifying, not in process.

    Notably, application code is tied to the data model, but that does not imply application code is tied to the database. It is possible to preserve, for the application, the notion that it is communicating with the data model it wants.

    You may want to have a look at our solution: http://chronicdb.com
  • sbdragoo 2012-03-01 11:02
    FuBar brings up a great point. A previous "database" I worked with, and ultimately converted to a digital format, was also kept on note cards. It was a water valve system for a municipality of about 140K. The system was designed so well, and the users were trained well and had such a high level of discipline, that only a few values were in place where lookup tables would ultimately be employed in the digital version. Once the digital version was created, the screens were designed to mimic the card input already in place, and the transition was the easiest I have ever seen. The key was the original design and the discipline of the users over the decades the system was in place... The original system was implemented in the 1920's and was converted to SQL Server / ESRI GIS in the early 2000's.
  • sbdragoo 2012-03-01 11:06
    Great article. I really appreciated the points regarding schema compare. Where I currently work, we had (yes, past tense) two developers in particular who swore that a certain compare tool was the only suitable method for DB deployment. This continued until they both departed, even though it meant comparing a testing environment, where multiple versions were being tested for release, against the production environment. Now that we have moved to a system of structured change scripts with no alterations to a script once created (except in the case of syntax error), our application / database interface issues have become near zero.
  • Michael Welcome 2012-03-01 13:23
    I mostly agree with what is here. I have had success with making schema changes in the following way: check for a "backup version" of the table(s) you are going to modify; if it does not exist, rename the current table to the backup. You may need to drop FKs and some indexes to get this all to work.

    Create new tables using the original table name. Move the data from the old tables into the new ones. Re-establish the indexes and FKs. If I don't get everything in the design right the first time, I can tweak the conversion script and rerun it until everything is correct.

    If you look at the change scripts that SSMS creates when you modify a table, you will get some idea of how to do this. They of course delete the old version of the table as soon as the script completes; I do not do that, and would clean the old table up in the next release.

    If something goes horribly wrong during an install, you can always undo this by deleting the tables you created and renaming the old tables back to the original name, then re-establishing the required indexes and FKs.

    I have been using this technique for many years now and have never had to do a backout or restore a DB from a backup due to a botched install. Of course, this will only work with databases of moderate size.
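
    A stripped-down sketch of the approach (T-SQL-ish; the Customer table and its columns are invented for illustration):

    if object_id('dbo.Customer_bak') is null
    begin
        -- first run: keep the old table around as the backup
        exec sp_rename 'dbo.Customer', 'Customer_bak';
    end
    else if object_id('dbo.Customer') is not null
    begin
        -- rerun after tweaking the script: discard the previous attempt
        drop table dbo.Customer;
    end

    -- create the new shape under the original name
    create table dbo.Customer (
        CustomerId int          not null primary key,
        Name       varchar(100) not null,
        Email      varchar(255) null    -- new column in this release
    );

    -- carry the data across, then re-create indexes and FKs as needed
    insert into dbo.Customer (CustomerId, Name)
    select CustomerId, Name
    from   dbo.Customer_bak;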
  • High Frequency Trading DBA 2012-03-01 16:24
    Can someone tell me the point of this article?

    Making changes to databases is not hard. Versioning, testing, and deploying these changes is not hard. I could've summed this whole article up in a few sentences. Making yourself out to be a database guru by comparing yourself to those who can't do simple change control on a database is pretty relative. Is this a re-publish from 10 years ago?

    However, doing database changes alongside a set of much more rapidly updating applications is the tricky part. Imagine one database supports many applications (100+) that have much faster iterations than the database. A team of 200+ developers and a dozen DBAs for instance.

    Unfortunately this article does nothing to address that. It doesn't address the relationship between application and database deployment. And not addressing this means it doesn't really hold any value at all. As an example, say I have a large high performance database that is a data store for 100+ applications. One of these large applications is business critical, and you have iterations in a matter of _minutes and hours_, not _weeks and months_.

    Your business dictates you need an application change out immediately. This change requires new/different data in the data store shared by many. The application changes take an hour to implement. Take it to the low level, and the database needs a schema change. One that would likely break existing applications. Using this process, what do we do now? And if it actually works, does time to market suffer along the way? What about transitioning from the old version to the new version? I.E. both versions running in parallel, along with a dozen more.

    I couldn't disagree with this article more. It's a great start for those that have absolutely no change control for databases on their small projects with slow iterations. But if you approach something like I'm describing you will very, very quickly outgrow this approach completely.
  • pbean 2012-03-01 16:57
    We're using Liquibase... it's not perfect, but does the job. Handles database updates and rollbacks pretty well. Why should we reinvent something that already exists? Its disadvantages (XML) do not outweigh its advantages.
  • Security Guy NC 2012-03-01 19:54
    Alex, you didn't address fully the issue of the data flowing back from Prod into the Test and Dev databases. I agree fully with your statements when the data is end-user generated.

    However, I work in an environment where data created by the company for public consumption has to be reviewed by the legal compliance team. Because many departments contribute to these changes the data is stored in an easily editable Content Management System. Because the data is sensitive, and putting wrong data out in Prod could shut down our company, it is too risky to make the company produced data changes directly in Prod. (All the CMSs I've seen are bad at holding back changes until a review has been made.)

    The data changes produced by the company are non-editable in Prod (the CMS server code is not even deployed to Prod) so we can keep a tighter lid on security. We do keep the CMS as close to Prod as we can and flow the changes in both directions.
  • Barf 4Eva 2012-03-01 21:36
    Well-written article, and all points that I agree with! Thank you for sharing.

    We have a lovely EAV-style table simply called the MASTER table. In it you will find all sorts of things... validations, connectivity info sometimes, mapping data, even full-blown tables hidden away, perhaps because people were too lazy (I'm guessing?) to put them into their own tables. We even have a column called "table". :D

    Anyways, it's a nice conversation piece to have around when introducing the "new guy" to the leviathan!
  • Barf 4Eva 2012-03-01 21:47
    Laird Nelson:
    Another vote for Liquibase. Each change is a "changeset", each changeset is checksummed; changesets have rollback actions and cannot be modified.


    This looks cool... Thanks for the link! Looks like it is worth reading more on this.
  • Anonymous 2012-03-02 04:43
    Is this article written in the SQL Server world? Some "missing features" there actually exist in other databases, like Oracle.

    "The only way to change a database is by running a SQL script against that database, and once you’ve done that, there’s no going back. You can run another SQL script to change the database again, but the only way to truly rollback changes is by restoring the entire database from back-up." - wrong. Oracle has "Flashback Transaction", in which every statement you run to make data changes auto-generates an "undosql" statement for you. Use the LogMiner to review and execute these statements to undo your changes. No need to restore the entire database.

    Oracle also has "Flashback Database" when you can roll back the entire database to a single point in time - no need to restore from a backup (unless the point in time is bigger than the flashback window).

    "once you’ve dropped that column, it’s gone forever; you could certainly restore your database from back-up, but that’s not exactly an undo, especially when you lose all the intermediate data" - wrong. Oracle has "flashback table" to restore a dropped column or table, meaning "instant undo" on the table.

    These features exist so that it is quick and easy for an experienced DBA to undo bad changes with the minimum of downtime. If you didn't hate Oracle so much then perhaps you'd learn what it has to offer and benefit from it, rather than moan about how you've got to do all this additional work instead.
  • Eric 2012-03-02 06:08
    BTW, the batch file we use for executing our scripts in order is as follows:

    @echo off

    REM cycle through all sql scripts in the current folder and execute them
    For /F "eol= tokens=* delims= " %%f in ('dir *.sql /a-d /on /b') do (
    echo Processing %%f...

    sqlcmd -S .\SQLEXPRESS2008 -E -b -i "%%f"
    if errorlevel 1 goto badend

    echo %%f completed
    echo.
    )

    goto goodend


    :badend
    echo.
    echo **** BAD SCRIPT FILE ENCOUNTERED ****
    goto end


    :goodend
    echo.
    echo Batch script update successful


    :end
    pause

    This script could be modified to accept the instance name as a parameter, but in practice we all have a default instance installed on our dev machines. The advantage of the script is that it stops executing if it encounters an error in any of the scripts, although the severity must be 10 or above to register on the program exit code. This isn't an issue if you're using TRY/CATCH with BEGIN TRAN/COMMIT/ROLLBACK. We use the following sample in all stored procs and update scripts:

    SET NOCOUNT ON
    BEGIN TRY
        BEGIN TRAN




        COMMIT TRAN
    END TRY
    BEGIN CATCH
        -- might need to rollback
        IF (@@TRANCOUNT = 1)
        BEGIN
            ROLLBACK TRAN
        END
        ELSE
        BEGIN
            -- Nested transaction. We can't rollback from here or else we will
            -- get the SQL Msg 266 about erroneous transaction counts after SP
            -- completion. Instead we Commit this block of work trusting that
            -- the caller will eventually Rollback at the outermost level when
            -- it sees my error code.
            COMMIT TRAN
        END
        DECLARE @ErrMsg nvarchar(MAX), @ErrSeverity int
        SELECT @ErrMsg = ERROR_MESSAGE(), @ErrSeverity = ERROR_SEVERITY()
        RAISERROR(@ErrMsg, @ErrSeverity, 1)
    END CATCH

    All this, along with a script that checks the db version (and throws an exception if it doesn't match), named "001) Check db version.sql", means that (in theory) it's not possible to run scripts for a particular patch against the wrong version of the database.
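
    The version-check script itself can be tiny; a sketch, assuming a single-row dbo.DatabaseVersion table (that table name is my invention, not something from the scripts above):

    SET NOCOUNT ON
    DECLARE @CurrentVersion int
    SELECT @CurrentVersion = VersionNumber FROM dbo.DatabaseVersion

    IF (@CurrentVersion <> 41)   -- the version this batch of patches expects
    BEGIN
        -- severity 16 is high enough for sqlcmd -b to set a failing exit code
        RAISERROR('Expected database version 41 but found %d - aborting.', 16, 1, @CurrentVersion)
    END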
  • Yum 2012-03-02 10:25
    Mmmmm... troll food!
  • RC 2012-03-03 02:55
    You've gotta accept that you're a tiny minority, not worthy of mention compared to the massive hordes of Oracle/DB2/Postgres/MySQL/etc. developers, admins, and users... It's sad but true.

    :fibcgf

    You aren't working at Petco are you?
    (Not a lot of other big U2 shops out there.)
  • Thomas James 2012-03-03 06:41
    I've found dbMaintain to be an awesome tool that replaces the batch file you talked about. It also enforces a few basic rules.

    http://www.dbmaintain.org/
  • Enigma 2012-03-03 09:02
    "Since these object scripts simply create objects, they can be run time and time again without any real consequences."

    If your database has, say, 100 procs, are you going to drop and recreate all of them any time any one of them changes? What if you have thousands of procs in a large environment with many servers (I do)? Deploying every proc even if it didn't change is not going to scale, since deployment would take hours. What if the procedure is replicated (then you won't be able to drop it)?

    Perhaps my environment is significantly different from yours, but mine is an environment with thousands of procs on dozens of servers and a deployment system that goes directly into production and does 40+ deployments per day (for a website). I use an SVN repository structure of ServerName\DatabaseName\ObjectType\ObjectName.sql and a powershell script for deployment. The script runs an SVN update to get the latest version, parses the output (if any files changed), and then deploys the changed objects only at the appropriate server and database. If it fails it sends the error via e-mail to the committer, and tries again several times (in case the failure was due to a timeout or something, i.e. caused by blocking). If it keeps failing and doesn't get fixed, eventually the DBAs get an e-mail asking us to look into it or roll it back (since it's vital that SVN is always in sync with production).

    Since you can't drop a replicated procedure (and we use lots of replication), a better method than "if exists, drop, then create" is "if not exists, create (as select 1), then alter". This will work in all cases. For functions it's slightly more annoying, as you have to be aware of whether you're writing a table-valued or a scalar-valued function when creating the dummy object.
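
    In T-SQL that pattern looks roughly like this for a procedure (the procedure and table are invented for illustration):

    if object_id('dbo.GetCustomer', 'P') is null
        -- CREATE PROCEDURE must be alone in its batch, hence the dynamic SQL stub
        exec ('create procedure dbo.GetCustomer as select 1');
    go

    alter procedure dbo.GetCustomer
        @CustomerId int
    as
        select CustomerId, Name, Email
        from   dbo.Customer
        where  CustomerId = @CustomerId;
    go

    Because the object is never dropped, replication stays happy and any permissions already granted on it survive the deployment.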

    Also I'm not sure if you use SQLCLR assemblies or SSIS, but I deploy those directly from source control too which takes a ton of tedious work off the DBA team's plate.

    I completely agree about tables though - they should be deployed manually always by a DBA. Unless you really really trust that nobody is ever going to commit an alter that adds a not-nullable column with a default to your largest and most important table (thereby bringing your application to its knees), you really do want a person doing a deployment for tables and being conscious of the impact on the application, the time of day, etc. To keep source control in sync though I set up a job that automatically scripts out changed table definitions and commits them to SVN, which is quite helpful.

  • Omnifarious 2012-03-04 13:37
    I'm not fond of databases, even though I agree with all the change control steps listed in this article.

    I'm not fond of them because they require a completely specialized set of tools to deal with, and the nature of those specialized tools is different for every single stupid database.

    I like SQLite for development work because copying the database file works. Maybe the file is a stupidly opaque blob without specialized tools, but at least copying it works. And the OS permissions operate as they should. And a whole host of other things work the way they're supposed to.

    But otherwise, databases are just incredibly stupid. They have their own authentication and permission system with their own obscure syntax (different from vendor to vendor of course) and you need to use specialized tools to even make a copy of the stupid things.
  • L. 2012-03-05 06:56
    Jeff:
    L.:
    However, the correct way to solve your "blabla database structure changes problem" is to put much more thought in how you manage and handle data, and how to best represent it in a relational model in order to both greatly increase reliability and efficiency + maintainability.

    /--/

    1. that schema change problem does not exist, it's merely an effect of the way you model data / application

    By this, do you mean that:

    A) data should be modeled so well in the first place that structure changes will never be needed

    or

    B) the development process should be set up so that structure changes could be applied more reliably (e.g. the stuff Alex talks about in the article)?

    Or some option C? :)


    A) You will never require a *major* structural change once your data is correctly modelled, i.e. a logical reflection of the concepts you are addressing.

    B) Your system shouldn't have any issues with new item types / new attributes / new whatever.

    It's not about the development process but about application architecture.

    If your app can't handle such tiny changes, it should be rewritten.
  • L. 2012-03-05 07:00
    HappyEngineer:
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back we quickly find out of the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf



    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.

    Sometimes the errors are detected by the java code before they hit the DB.

    The java deployers are far more reliable because they are tested and reused and they validate the input given.

    Our method of deployment can be easily tested. Writing separate sql scripts for deploy and rollback was demonstrably more difficult to test (the constant deploy failures were proof of that).

    Our system allows much more control. The error messages in the logs are very clear about which things failed. A full tree is output that shows the order of the deployment and which things failed and which did not. It also includes timing information so we can see how each part is taking.

    For a while I thought about writing an open source version for use by others. I never got around to it, but perhaps I will someday. The difference between sql scripts and our method is huge.



    Well, if your DBMS does not support wrapping a column drop inside a transaction, it's just wrong and that's it.

    If you say BEGIN / whatever / ROLLBACK and you're not back in the initial state, your DBMS should be trashed instantly for not doing its job.

    Furthermore, while your system is pretty nice (modelling those changes and associating rollback code is not bad), I really think it should rely more on your DBMS and less on hand-coded rollback plans (since in most cases a transaction rollback does that for you).
  • L. 2012-03-05 07:00
    [quote user="Joe Smith"][quote user=HappyEngineer]
    No, it would not. DDL doesn't take place as part of a transaction, so rollback wouldn't give you your column back.
    [/quote]

    That depends entirely on the database. Postgres has perfect DLL transactions, which are completely nvisible to other transactions unless and until the commit line is succesfully reached.

    Microsof'ts SQL server does reasonably well DDL transactions, although not everything is supported, and there are a few cases where in progress DDL transactions are visible to other transactions.

    I'm guessing you are talking about Oracle, which is objectively a rather terrible database. Sure it is scalable, but it lacks critical things like distinguishing between null and empty strings, and even minimal DDL transaction support.[/quote]

    Amen brother.
  • L. 2012-03-05 07:04
    Omnifarious:
    I'm not fond of databases, even though I agree with all the change control steps listed in this article.

    I'm not fond of them because they require a completely specialized set of tools to deal with, and the nature of the specialized tools is different very every single stupid database.

    I like sqlite for development work because copying the database file works. Maybe the file is a stupidly opaque blob without a specialized. But at least copying it works. And the OS permissions operate as they should. And a whole host of other things work the way they're supposed to.

    But otherwise, databases are just incredibly stupid. They have their own authentication and permission system with their own obscure syntax (different from vendor to vendor of course) and you need to use specialized tools to even make a copy of the stupid things.


    Well, obviously you're an ignoramus who shouldn't have anything to do with databases, and also a very successful troll.

    Let's all go back to the madness of file types and XLS; it's all so much better than actual query-able information in a logical format.

  • AYmous 2012-03-05 20:07
    geoffrey, MCP, PMP:

    NoSQL is a fad invented by programmers who think they have outsmarted giant enterprises when it comes to data storage and retrieval. Nothing more.


    Software history is all about programmers who outsmart giant enterprises. If that's your argument, I'll back NoSQL to the hilt...

    ... well, except the enterprises will probably deploy an army of lawyers, lobbyists, and steak-dinner-buying salespeople to ensure that their crappy solutions will sell...
  • Mike S. 2012-03-07 16:02
    We're dealing with a survey system. Hundreds of different surveys, thousands of questions, with a method for skipping questions. And the survey owners can change the question orders and add and remove questions from any part of the survey at any time, and the survey questions themselves are versioned (so if you change the question text, a new version of the question is created).

    So if I wanted to ditch EAV, I would need to write something to create new tables on the fly, insert columns at any position in the table on the fly, and reorder the columns. If a user comes along and deletes 30 previous questions and adds 25 more, then the table becomes full of dead space - the first N thousand patients don't have answers to the 25 new questions, the next N thousand patients don't have answers to the 30 that were removed.

    Or instead when the user ditches 25 questions and adds 30 more we could create a new table. Now instead of hundreds of survey tables we would have thousands, and the research analyst would still need to combine their data in some fashion.

    I'm not saying I have a great solution or even the optimal solution. But implementing a non-EAV solution looks incredibly complex.
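
    The closest thing to a middle ground I can sketch is the usual row-per-answer layout below (names illustrative), where questions are first-class rows rather than columns; whether that still counts as "EAV" is half the argument:

    create table survey (
        survey_id int primary key,
        title     varchar(200) not null
    );

    create table question (
        question_id int primary key,
        survey_id   int not null references survey (survey_id),
        version     int not null,
        position    int not null,
        text        varchar(2000) not null
    );

    create table response (
        response_id int primary key,
        survey_id   int not null references survey (survey_id),
        patient_id  int not null,
        taken_at    timestamp not null
    );

    create table answer (
        response_id int not null references response (response_id),
        question_id int not null references question (question_id),
        value       varchar(4000),
        primary key (response_id, question_id)
    );

    Removed questions simply stop getting answer rows, and added questions only appear for responses taken after the change, so there's no dead space either way.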
  • Johhny NoGood 2012-03-07 23:12
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back we quickly find out of the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf


    No, it wouldn't. Their tool appears to allow promotion / demotion, e.g. promote my app to v1.1, or demote it to v1.0 by executing the rollback clause.

    I might be wrong as well.

    I find the whole rollback of a database upgrade a little ridiculous. Unless your app is just a toy, the only way you can be certain is by restoring.

    If upgrading the application merely meant adding or renaming a column, that's one thing, but don't forget about the data migration scripts. If you drop a column, how do you effectively roll back?
  • L. 2012-03-12 05:13
    Mike S.:
    We're dealing with a survey system. Hundreds of different surveys, thousands of questions, with a method for skipping questions. And the survey owners can change the question orders and add and remove questions from any part of the survey at any time, and the survey questions themselves are versioned (so if you change the question text, a new version of the question is created).

    So if I wanted to ditch EAV, I would need to write something to create new tables on the fly, insert columns in any index in the table on the fly, and reorder the columns. If a user comes alone and deletes 30 previous questions and adds 25 more, then the table becomes full of dead space - the first N1000 patients don't have answers to the 25 new questions, the next N1000 patients don't have answers to the 30 that were removed.

    Or instead when the user ditches 25 questions and adds 30 more we could create a new table. Now instead of hundreds of survey tables we would have thousands, and the research analyst would still need to combine their data in some fashion.

    I'm not saying I have a great solution or even the optimal solution. But implementing a non-EAV solution looks incredibly complex.

    You, sir, have officially demonstrated how much of a moron you are. Congratulations.

    The very friggin idea of having a column for each question shows you should be shot on sight if seen programming.
    But whatever, just stop working in the field and I'll be happy.

    Damn dimwits seriously...
  • L. 2012-03-12 05:15
    Johhny NoGood:
    L.:
    HappyEngineer:
    The solution we came up with turned 4 hour DB deployments that often failed into half hour deployments that always succeed. We used to have the horrible system where we had one sql script to deploy and another sql script to rollback (per DB). It failed all the time. People failed to update the rollback scripts consistently and they were rarely tested.

    We now have xml deploy scripts that have very short bits of sql and the corresponding bit of rollback sql. For example:

    <sql db="cps_cpsdata">
    <terseDesc>Add and set cannedRole field on Role.</terseDesc>
    <deploy>
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    update Role set cannedRole=1 WHERE company='102' and name not like 'Employee%';
    </deploy>
    <rollback>
    alter table Role drop column cannedRole;
    </rollback>
    </sql>

    We then have teamcity constantly running the deploy scripts against a test DB every time anything changes.

    The point is that each <sql> element is in charge of deploying and rolling back. If you drop a column then you need to save the data in another table first so that the rollback can add it back.

    With teamcity constantly deploying and rolling back we quickly find out of the script properly rolls things back.

    For more complex operations we have actual dedicated deployer types:

    <editPrivilege isNew="true" key="View Android Package" name="View Android Package"/>

    The "editPrivilege" type corresponds to java code that knows how to deploy and roll back privilege objects perfectly every time.

    The <sql> elements can also run against any DB, so deployments that require changes to several DB will all run at the same time and stay in sync.

    The final step is the deploy phase. The deployer runs these scripts and then, if a failure occurs, it automatically rolls back. When the deployer is finished the DB is either in the new updated state or it's back in the original state.

    DB deploy used to be a nightmare, but in the 2 years of using this system we have NEVER had a failed DB deployment.


    Would that have the same effect as, say, this ?

    BEGIN
    <insert migration script>
    COMMIT

    -- wtf


    No it wouldn't. Their tool appears to allow promotion / demotion. e.g. promote my app to v1.1 demote my app to v1.0 by executing the rollback clause.

    I might be wrong as well.

    I find the whole rollback of a database upgrade a little ridiculous. Unless your app is just a toy, the only way you can be certain is by restoring.

    If upgrading the application was merely adding or renaming a column, that's one thing, but don't forget about the data migration scripts. If you drop a column, how do you effectively roll back?


    The only reason you would roll back is if the intended action failed, and that's what you get from SQL transactions. The rest is for kids playing around with shiny toys: woo, see how I can switch versions faster than light... oops, broken again.

    And if you drop columns using their thingy, the data is just dead, that's it.
  • L. 2012-03-12 05:19
    Mike S.:
    We're dealing with a survey system. Hundreds of different surveys, thousands of questions, with a method for skipping questions. And the survey owners can change the question orders and add and remove questions from any part of the survey at any time, and the survey questions themselves are versioned (so if you change the question text, a new version of the question is created).

    So if I wanted to ditch EAV, I would need to write something to create new tables on the fly, insert columns at any position in the table on the fly, and reorder the columns. If a user comes along and deletes 30 previous questions and adds 25 more, then the table becomes full of dead space - the first N1000 patients don't have answers to the 25 new questions, the next N1000 patients don't have answers to the 30 that were removed.

    Or instead when the user ditches 25 questions and adds 30 more we could create a new table. Now instead of hundreds of survey tables we would have thousands, and the research analyst would still need to combine their data in some fashion.

    I'm not saying I have a great solution or even the optimal solution. But implementing a non-EAV solution looks incredibly complex.


    I had to say it twice... seriously, how retarded do you have to be to write that as "not a troll", which is what it looks like.

    HOW THE F*CK COULD YOUR BRAIN ACTUALLY GO THROUGH SEVERAL SECONDS OF THOUGHT WITHOUT NOTICING HOW DUMB YOUR IMPLEMENTATION WAS?????

    And don't say you didn't think when writing your n1k w/e bullcrap.

    And no, I'm not giving you the solution; that would prevent you from ever becoming smart enough to actually do your job.
  • retnuh 2012-03-13 18:08
    Any guidelines or tips for keeping a DVCS repo in sync and avoiding file-naming conflicts?

    We've got the following setup:

    - Live server
    - Dev server
    - 3x individual developer VMs with database
    - Mercurial

    I'd like to have a Live and a Dev SQL schema repo, such that anything pushed into the Live repo will be included in the live deployment build. My main concern is keeping Dev in sync between all developers. So far the best solution I can think of is a common network share that gets auto-committed and pushed every n minutes, so we avoid Dev 1 and Dev 2 creating change script 00135.sql at nearly the same time and both trying to push it. This really shouldn't happen all that often, as schema changes are few and far between.

    I feel like I'm missing something obvious here, so any suggestions?
  • Conor 2013-02-06 20:07
    I am another user and fan of liquibase:

    http://www.liquibase.org/

    The tool provides the framework and rigour one needs to do DB deltas properly, as expressed in your initial article, Alex.
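
    For anyone evaluating it, Liquibase also accepts SQL-formatted changelogs, so a change and its rollback sit together in one file. A minimal sketch (the changeset id and the statements are just illustrative):

    --liquibase formatted sql

    --changeset conor:add-cannedRole
    alter table Role add cannedRole NUMBER(1) DEFAULT(0);
    --rollback alter table Role drop column cannedRole;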

    Kind regards,
    Conor
  • SJS 2013-03-08 04:51
    Yes, yes, yes. Exactly right.

    Some other things I've found useful:

    1. Separate the database structural changes into "additive" and "subtractive" changes. Additive changes add columns, tables, indices, etc., while subtractive changes drop columns or tables, discard data, etc. (There's a short sketch of this split after the list.)

    This can help you avoid a full reload of the database -- only subtractive changes require a restore to undo, and even that can be mitigated by a policy of not applying subtractive changes until well after nothing uses that data anymore.

    (I like to split out constraints and index-creation changes as well, but that's probably a personal quirk.)

    2. Enforce the rule that code should continue to work correctly no matter what additive changes have been applied. Reject out of hand any code that depends on the order of columns in a database table, and have a word with that developer.

    (It can be useful to randomly re-order the columns in a table in the testing/QA environments. Pathological applications will likely fail messily. This is good.)

    This lets you modify the database without breaking applications, and roll back applications without having to rebuild/restore your database. Subtractive changes can be applied well after the application is deployed, without worrying that the application will blow up if an unused column is removed.

    3. Track and deploy the database versions independently of the applications (the assumption is that multiple applications are using the database). Applications should be considered to run in the context of the database's published structure, rather than imposing changes on the database structure. Letting an application force a database change is a recipe for breaking other applications, or for evolving an unmaintainable "design".
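
    To make the additive/subtractive split in point 1 concrete, here is a rough sketch (table and column names are made up) of one logical change cut into two scripts that ship in different releases:

    -- release N, additive: the new column goes in, the old one stays for old code
    alter table Customer add phone_e164 VARCHAR2(20);
    update Customer set phone_e164 = phone;

    -- release N+1 or later, subtractive: only once nothing reads the old column
    alter table Customer drop column phone;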
  • Bob 2013-04-15 19:35
    Write-Once – change scripts cannot change throughout the release lifecycle; they are not changed, they are undone (ideally, through a restore) and discarded


    A tool external to the database should probably be used to ensure change scripts are deployed once and once only, to prevent erroneous SQL from ever reaching the database engine's parser.

    I work on a system where change versioning is performed by checking the database versioning table from the change script SQL. Imagine a scenario where the following changes occur within a change cycle (meaning the scripts cannot yet be discarded):
    1 - add column A
    2 - modify column A
    3 - delete column A
    4 - any subsequent change

    Changes 1-3 have previously been deployed, then we go to build with change 4. While the change scripts contain code that checks at execution time whether changes 1-3 have already been deployed, change 2 will still fail before execution, at parse time, because it references a non-existent column. The only option I've found in this scenario is to retrospectively modify the SQL for change 2 to compile the modify statement dynamically at runtime, bypassing the parser.
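
    A rough sketch of that workaround, Oracle-flavoured to match the other examples in this thread (the version table, change id, and the modified column are invented for illustration):

    -- change script 2, kept runnable even after column A has been dropped
    DECLARE
      already_run NUMBER;
    BEGIN
      select count(*) into already_run from schema_version where change_id = 2;
      if already_run = 0 then
        -- the ALTER is built as a string, so the parser never sees the missing column
        execute immediate 'alter table survey_response modify (a NUMBER(3))';
        insert into schema_version (change_id) values (2);
      end if;
    END;
    /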
  • Ian 2014-02-11 00:26
    No, he's rooting FOR his product. But in principle you're allowed to re-implement. And that might cost more, if it's inconsistent with your core business.