• Chris (unregistered)

    Nobody is going to read this on the 3rd or 4th page but here goes.

    I agree that data integrity is the responsibility of both the RDBMS and the app.  

    As far as trusting the RDBMS, I couldn't tell you how many times I get a developer with some problem and he wants to blame the RDBMS.  The people who wrote the code for the RDBMS are human and make mistakes and blunders just like the folks in your IT shop.  However, I bet that by the time the RDBMS is released, the code has been tested more than most of my developers have tested ANYTHING in their lives combined. 

  • Adam (unregistered) in reply to XH
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)


    Wow, this guy has got to be on the MySQL development team.  "Foreign keys are bad and complicate the application unnecessarily."  "The application, not the database, is responsible for data quality and should prevent you from storing nonsensical dates like November 31st, 2006."

    As for turning of foreign keys and other constraints in the production database:  Can you say, with 100% certainty, that you have tested every possible test case, run every scenario, and thought of every possible contingency that may appear in a production environment.  No, you cannot.  Ever.  Database constraints, in comparison to any reasonably sized application, are quite a simple system for maintaining data quality, consistency, and coherency.  An application is not directly suited to this sort of work.  The economics of specialization require you to have each actor perform the work that it is best able to perform in comparison to all other actors.  Ergo, data consistency issues should be handled primarily by the database.  

    Additionally, the "once and only once" mandate of computer science requires that this work be done once and only once.  Now, given that several organizations use the database as an "intergration layer" (Hohpe & Woolf.  EIP.  Addison Wesley Professional) between applications, this work needs to be performed once across the entire organization.  Thus, the database is a natural choice.  Additionally, what if your team is rock-star, but the team across the hallway sucks?  Would you want to share the same pool if you knew they were pissing in it?  But if you could prevent them from pissing in the pool without modifying your app, would you?  Yes.

    Actually, keeping the other members of my team from pissing in the pool is enough to make it worth it. I have yet to work on anything big enough to require a DBA, but I do have a reputation for finding and verbally abusing anyone who makes me question the validity of data in the database. 

  • Jon (unregistered) in reply to Chris

    Data Integrety is the sole responsiblity of the database.  Data Validation is mostly the job of the programmer.  Having a well formed highly normalized database makes it Harder to shoot yourself in the foot.  Your App will get a bunch of exceptions and you will have to handle them with validation... That makes testing easier.  With a highly normalized database your program becomes more extensible and maintainable.  With a highly normalized database you can ensure that you wont have any orphaned records.. or absolute crap data... you will easily be able to move it into a datawarehouse without much need for scrubbing the data.  You wont have to worry about  loosing data because your structure allowed you to corrupt your database (lets say 2 rows being excatly the same. 

    Making sure a user enters a properly formated phone number into a Phone number field... Job of the App... Making sure the correct phone number is always returned/updated when asked for job of the database. 

     Good Database design ensures you can trust your data... share your data with other applications with confidence, allows flexibilty for the developer, increases performance of the App... increases dev time a fair amount.... but when you get use to it... can save you and your team weeks/months of time in testing, maintenance, re-doing this work for another applicaition because yours doesn't take care of it and the list goes on. 

    My programs know nothing about the database structure it talks too and my database structure knows nothing about the programs. Decoupled things are good.

     Jon

     

  • (cs) in reply to Jon
    Anonymous:

    Data Integrety is the sole responsiblity of the database.  Data Validation is mostly the job of the programmer.  Having a well formed highly normalized database makes it Harder to shoot yourself in the foot.  Your App will get a bunch of exceptions and you will have to handle them with validation... That makes testing easier.  With a highly normalized database your program becomes more extensible and maintainable.  With a highly normalized database you can ensure that you wont have any orphaned records.. or absolute crap data... you will easily be able to move it into a datawarehouse without much need for scrubbing the data.  You wont have to worry about  loosing data because your structure allowed you to corrupt your database (lets say 2 rows being excatly the same. 

    Making sure a user enters a properly formated phone number into a Phone number field... Job of the App... Making sure the correct phone number is always returned/updated when asked for job of the database. 

     Good Database design ensures you can trust your data... share your data with other applications with confidence, allows flexibilty for the developer, increases performance of the App... increases dev time a fair amount.... but when you get use to it... can save you and your team weeks/months of time in testing, maintenance, re-doing this work for another applicaition because yours doesn't take care of it and the list goes on. 

    My programs know nothing about the database structure it talks too and my database structure knows nothing about the programs. Decoupled things are good.

     Jon 

     Here here....! So say we all!

  • UTU (unregistered) in reply to Hit

    Anonymous:
    On one hand, you have the DBA, who thinks the DB should drive the entire application, and all the developer does is write an application shell that calls the stored procedureds he has graciously provided.

    Well now, that makes no sense. DBA is not supposed to be a developer (though many of them do actually start that way) ... DBA does not provide you with stored procedures; DBA provides you with a database that performs well. 

    Anonymous:
    The DB is law.

    With this, I can totally relate to. I'll quote one of my favourite authors (who just happened to be one of the persons filling in for Alex just a few weeks ago) ... The data is what is important, applications come and go!

    Anonymous:
    On the other hand, you have the developer.  He wants to be able to write code to have an application work.  The DB is a storage dump to persist his data.  Nothing more.  The Application is law.

     
    DROP DEVELOPER CASCADE APPLICATION(S);
    CREATE NEW DEVELOPER;
    EVALUATE DEVELOPER;
    GRANT DEVELOPER PRIVILEGES;)

     

  • UTU (unregistered) in reply to duck1123

    duck1123:
    It catches the error a step early, (just as the JS catches the zipcode-with-letters scenario)

    Umm... perhaps you don't know it, but ZIP-codes can (and often*do*) have letters in them. Oh... I see, you're only coding something to be used in some limited area where they don't. Well, good luck to you. Hope you never expand outside the area you originally targeted for :)


     

  • UTU (unregistered) in reply to XH

    Anonymous:
    this guy has got to be on the MySQL development team.

    Oh please... You can use MySQL well; that just requires you know it well enough to do so (honestly, MyISAM tables suck; you do have better engine choices)

    And this goes for *any* database I know of. You can use them badly, or you can use them right (and actually, nearly any way in between) ... you just need to freakin' know enough of them.

     

    The problem with MySQL is that it's a bit too easy to begin with. You can start without understanding anything about what you're doing. With any system that allows this to happen, I've noticed it to be a bad thing (however, I try to make my own applications in such a way that they require no previous understanding of what is about to happen - perhaps I should think about changing jobs or something). 

  • UTU (unregistered) in reply to Jon

    Anonymous:
    Making sure a user enters a properly formated phone number into a Phone number field... Job of the App... Making sure the correct phone number is always returned/updated when asked for job of the database.

     I think you're a bit off here... lemme explain...


    The only point the database needs to take care of, is to not accept an invalid value. That way, it will never return an invalid value (as an invalid value is never accepted into the database).  

     

    All the app needs to do here is make it clear to the user that the only accepted sort value for "phone number" is like *** (whatever). After that, if the app attempts to insert an invalid value into the database, the database tells the app that this is not doable and makes a rollback (if this failed, anything we attempted since last commit is most likely not commitable). When the app receives the message that the database failed because it didn't receive a correctly formatted "phone number" it can tell the user that it failed because it didn't receive a correctly formatted "phone number". After this the user needs to modify the format into one that suits the database; before it does, the data modification/insertion will not succeed and data integrity throughout the system is safe.

    I fail to see what is so difficult with all this for so many people? 

  • Wayne (unregistered)

    Oh, man.  Been there.  At one job I told this guy who had built a complex database with no relationships that I'd straighten it out, but he had to create a primary key on all the tables.  So he goes and adds an identity column on each one.

    On another job, I walk in and the main DB has a table, said table has millions of rows, said table (a) has no date/time data types, they're all in varchars, so you have to do a convert any time you want to query, and (b) all of the chars and varchars are nchars and nvarchars!  It was such a huge table that fixing it was almost impossible due to space considerations, not to mention that it was a 24/7 operation.

    The developers reason?  He uploaded the tables from an Ingres II/SCO system via Visual Studio into SQL Server and these were the data types that were produced.

     

  • (cs) in reply to UTU
    Anonymous:

    Anonymous:
    Making sure a user enters a properly formated phone number into a Phone number field... Job of the App... Making sure the correct phone number is always returned/updated when asked for job of the database.

     I think you're a bit off here... lemme explain...


    The only point the database needs to take care of, is to not accept an invalid value. That way, it will never return an invalid value (as an invalid value is never accepted into the database).  

     

    All the app needs to do here is make it clear to the user that the only accepted sort value for "phone number" is like *** (whatever). After that, if the app attempts to insert an invalid value into the database, the database tells the app that this is not doable and makes a rollback (if this failed, anything we attempted since last commit is most likely not commitable). When the app receives the message that the database failed because it didn't receive a correctly formatted "phone number" it can tell the user that it failed because it didn't receive a correctly formatted "phone number". After this the user needs to modify the format into one that suits the database; before it does, the data modification/insertion will not succeed and data integrity throughout the system is safe.

    I fail to see what is so difficult with all this for so many people? 

     

    Why waste a round trip to the DB when the client can validate the number itself?

     

    First, the client side language generally has a richer set of constructs to validate input, and may even have a larger context to do so in, ie. there may be more information available at input time about the data than what will be stored in the DB, so the client can often do checks that are not possible in the DB.

     Secondly, assuming we are talking about real DB servers here, and not local Access and MSDE style DB's, there is quite a hit in the roundtrip, especially when dealing with the multi-tiered environments common with web applications.  In your scenario, a client mistypes a phone number, say for instance, he hits q instead of 1 since the keys are next to each other.  That faulty phone number now gets transmitted over his low bandwidth consumer line, across the internet, hits the web server, is transmitted to the buisiness application server, is put into a business object and played around with through N business rules, and finally committed to a transaction with the database, at which time the database rejects it, rolls back the entire transaction, forcing the business layer to undo all it's data hashing, causing the webserver to serve up a special rejection, to be transmitted back across the internet to the user's browser which must render a new page describing the issue..... It would be SOOOO much simpler to have a little line of JScript catch that Q as invalid before causing all that hassle.  The DB can validate it too, just to be safe, but by the time the error propagates to the DB, it has already caused alot of extraneous work to be done.

     

  • Philippe (unregistered)

    It remember me at my old job, where possible size over 8060 was not a trouble, since "it would not happen in real world data"... Yeah sure, but the day it will happen, what will we do?

  • UTU (unregistered) in reply to procyon112

    procyon112:
    Why waste a round trip to the DB when the client can validate the number itself?

    I'm not saying that the client can't make the validation. Why not; if that saves the client the roundtrip, it's all for the good. However...

    procyon112:
    The DB can validate it too, just to be safe, but by the time the error propagates to the DB, it has already caused alot of extraneous work to be done.

     My view is that the DB is the place that should *always* validate data also; that's one of the things it's designed to handle - enforcing data integrity. As said, this does not mean that the client should not validate data itself. But as said, it's the data that is important ... client applications come and go but the data needs to stay intact. Designing a database for a single application is one of the shortsightest decisions I see done again and again. Most of the applications don't survive more than a couple of years, but if the database design is good enough, this is no problem.

    procyon112:
    It would be SOOOO much simpler to have a little line of JScript catch that Q as invalid before causing all that hassle.

    On this, I do agree. However, as I said, the database *must* do the validation also. If the client (for any given reason) fails to validate the value, the invalid value must be catched at least in the database (if not before). It's just a failsafe for the data integrity... I personally try to design my databases "well" - usually saves me a lot of work if things go wrong in the client end. Also it usually stops quite many wtf's from ever happening in the first place :) On a sidenote, I hate most the clients who want things done "quick and dirty", it just doesn't fit me :(

    The CAPTCHA appears to be "stfu" ... I believe that's a personal message for me :D

  • Justin (unregistered) in reply to RayMarron

    Skinner, principally, is either a) Being ironical, or b) being stupid

     captcha: genius: Surely not

  • PleasePullPlummer (unregistered)

    Personally, I'm a little disappointed in the efforts of the developer on this one.  Sure Varchar(8000) is easy, but to really show off Frank should have used images instead.  at least then he could have gotten around the whole 8060 problem.  sheesh!

  • Randyd (unregistered) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

     

    i hope this is a troll - you've got it completely backwards.

  • Axel (unregistered) in reply to Volmarias

    Hm, turned of AdblockPlus for the Non-WTF Jobs, eh?

  • (cs) in reply to JustThat
    JustThat:

    ((2000-52) / 2000) * 100 = 97.4 % reduction, right.

    However,  2000/52 = 38.46 times or 3846% faster.

    So, both are right...just stated differently.

    No they aren't...

    97.4% reduction: correct

    3846% faster: correct

    3800% decrease in query time: WRONG

    It's quite simple, really. If you have an x% reduction of an amount y, what does that leave you?

    (100 - x)/100 * y

    or, if you prefer,

    (1 - x%) * y

    This means that a 3800% reduction is equivalent to multiplication with -37. In which case the query would now finish 74 seconds before it began (will begin?). If that were true, J.T. would be a hot candidate for next year's Nobel Prize in Physics.

    JustThat:

    Ain't math FUN?!?!?

    Only if you know how to use it.

  • Mike (unregistered) in reply to GoatCheez

    Actually it wasn't a 3800% decrease anyway.  It was a 97.4% decrease.  If it had been reversed to its former state, that would have been a 3800% INcrease.

    Sorry to nitpick.  It's my job...

  • XH (unregistered) in reply to UTU

    procyon112:
    It would be SOOOO much simpler to have a little line of JScript catch that Q as invalid before causing all that hassle.

    You're forgetting one of the first principles of web development.  Never trust the client.  If the code isn't running on the server, one cannot be sure it will ever get run.
  • webdev101 (unregistered) in reply to Arcan

    ---Web Developers don't understand being a DBA, they're not meant to, they're there to develop "internets"... being a DBA helps in this.

    I am what you call a common old web developer and I can never even dream of doing something like Mr. Frank here. Mr. Frank proves that not everybody has the gift of commonsense.

    Captcha: Web Developers are human too.;)

  • webdev101 (unregistered) in reply to Rob Baillie

    --why should the method of 'doing it in dev' be any different to the method of doing it in production?

     

    <sarcasm>

     So when the time comes to deploy the app and the boss is asking for the reason for delay the whole thing can be mounted on DBA.

    Boss: Why is my app not deployed?

    Developer: I already submited the code but the DBA is not uploading my DB changes to production and keeps on giving hard time to us....

    DBA: But sir....

    Boss: That's not true. I know the app is working in DEV.

    DBA: But sir....

    Boss: You are costing us $XXXXXXXXXXXX. You are no good. Our Devs know more about data then you............

    </sarcasm>

  • (cs) in reply to deathkrush

    deathkrush:
    J.T. should have used ISKINDANULL, ISMAYBENULL or at least ISVERYNULL

     

     accepted Forum standard:

     enum eMyBoolean {
        TRUE,
        MOSTLY_TRUE,
        COULD_BE_TRUE,
        SOMETIMES_IT_IS_TRUE,
        NOPE_IT_IS_FALSE,
        I_WAS_WRONG_IT_IS_REALLY_TRUE,
        FALSE,
        FILE_NOT_FOUND,
        FILE_FOUND_AND_SO_I_DELETED_IT
    };

  • (cs)

    I'm sorry, but both the DBA and dev here sound pretty snotty and annoying.

    The dev - yeah, obviously he didn't know what he was doing.

    But the DBA, while he may have just been doing his job, didn't really put the query into context.


    It reminded me of DBAs (which I was eventually to become one for a brief period) at a Fortune 50 company right out of school.

    They had this (pretty understandable) rule that you couldn't have "SELECT *" in SQL queries on production servers.

    Well, if you know anything about DBMSes you know that specifying the fields you want to pull back can help query performance just slightly.

    The thing that was foolish, was they would very easily let you submit apps to production that repeatedly made recursive calls to the database (much more inefficently than doing a single optimized SQL call), just because they couldn't run a parser to check for this kind of thing.  (whereas grep'ing for 'SELECT *' is easy)

    The optimization thing on a 2sec query also made me puke a bit.  Is the query running once per day, per week, per month, per minute?

    If someone told me to re-optimize a 2sec query that runs once a month, down to a .2sec query, I would litterally mushroom punch him in the face.

    I get paid $60 / hour.  How much is 1.8sec worth of cpu time per day / week / month?

    But sure, if the queryr had to run fairly often (more than once an hour), optimize it all you want.

     

Leave a comment on “Manager of the Data Dump”

Log In or post as a guest

Replying to comment #:

« Return to Article