• (cs)

    Frank rocks! Can you get me a copy of his resume?

    On second thought, can you get me the names and phone numbers of the 2 Battlestar Galactica babes I see in the upper right-hand ad in this thread? That may actually be more useful...

  • (cs) in reply to Hit

    Anonymous:
    This thread perfectly illustrates a common conflict that can happen in the business world today, the DBA vs the developer.

    You have the two extremes, of course.  On one hand, you have the DBA, who thinks the DB should drive the entire application, and all the developer does is write an application shell that calls the stored procedureds he has graciously provided.  The DB is law.

    On the other hand, you have the developer.  He wants to be able to write code to have an application work.  The DB is a storage dump to persist his data.  Nothing more.  The Application is law.

    Both of these approaches are just plain wrong.

    You have to work from the middle.  In order for this to happen, you need DBAs who understand the developer's needs (such as enough flexibility to provide adequate search functionality, for example) and developers who understand the DBA's needs (that is, don't write SELECT N + 1 queries that trash performance).  Without this mediation, you have a huge bottleneck that can be very difficult to overcome.

    All too often, you end up with one of the extremes.  In this WTF's case, you have the developer who could give a crap about silly things like data integrity, indexes, or those "key" things.

    I'm sure there could be another set of WTFs out there for the other, that is, the all-too-controlling DBA who forces stored procudures, etc on to the development crew and the process involves jumping through 500 burning hoops to get any kind of work done.


    Well said!

    I try very hard not to be a database dictator but I do insist on some things. By and large, I leave the developers to their own designs unless I think something so <font face="arial,sans-serif" size="-1"><font>egregious (like a complete lack of RDI or using a sysadmin-level account for a web application's access to a database).</font></font>

    Typically, my "rules" are more along the lines of security and stability practices than hard and fast methodology laws (i.e. you must only update tables with stored procedures or no BLOBs allowed!) 

  • Captcha: Zork. (unregistered) in reply to 3117 tr0llz0r find3r dud4

    So, and I the ONLY PHP programmer that plans the DB in advance before writting any code?

     

  • jefffurry (unregistered) in reply to biziclop
    biziclop:
    I used to work at a company where the DBA not only enforced rules but could also be invited for a coffee and a chat about how to solve a DB-related problem.

    As a result, he got well-designed data structures, optimized  queries, and in turn I've got caffeine poisoning because a good DBA can consume enormous quantities of coffee. :)

     I'm fortunate enough to work with a DBA like that. Well, except for the caffine part, and the need to record what he says and play it back at half speed so you can decipher what he's saying, cause he talks so fast. :-)

     
    Regarding today's WTF subject, I think the DBA was being too hard on that poor developer. All he needed was a little hand holding (along with wrist wrenching and arm twisting) so he could learn the idiocy of his ways....

     
    I type "initech", so therefore I am not a robot.
     

  • (cs) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Whereas data integrity on the app level creates no onverhead at all?

    It resembles the (mis)management axiom: "Internal employees' work as free."

  • Scott Neumann (unregistered) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

     
    Wow.  Paula Bean?  Is that you, dear?

  • Zlodo (unregistered) in reply to anonymous
    Anonymous:

    Seems to me like things are working just fine.  And, as long as the DBA keeps being the janitor, things will continue to work fine. 

    DBAs crack me up.  They're like, if a garbage man really thought he was a sanitation engineer, and ran around saying, "If I didn't have to pick up after all these people, I could get some engineering done."

     

    The DBA for the company I work for puts it roughly this way: "my job is to keep your data safe, keep the DB running fast, and make sure that any maintenance operation on it can be done as efficiently as possible: So when you do stuff like storing an array of keys to another table in a blob, you're preventing me to do my job properly."

    He actually works with the devteam and has commit right acces to the subversion repositery. When the devs need changes in the database, they have to go through im, and he implements the change in the database creation/update scripts and commit them himself.

    This is a good thing since most C++ programmers don't want to hear about SQL if they can help it, or if they do, they often don't want to learn how it works and why it's better than just mindlessly serializing stuff in a binary file (or a blob).

     

    I'm currently replacing, on some custom asset storage server we have, the way assets metadatas are stored,The way it's done now is nightmarish: some operations are coslty as hell, got fixed by slapping a secondary index on the side which isn't always properly kept in sync for some reason, the code to search through this search using some kind of very rogod set of criterias that can be independently enabled and is an awful unreadable mess of nested loops. Some common search operations are either very slow, or unreliable if they use the not-always-in-sync index.
     

    I did suggest to the guy who implemented this to use sqlite to store this stuff. As soon as I uttered the three letters sql, the suggestion was instantly dismissed. Constructing sql statements to communicate with a data storage library probably sounded pretty obscene to this guy who was a terminal premature-optimizer and loved using memcpy and strcpy into fixed size arrays all over the place.

    I inherited the maintenance of that thing (along with all the complains regarding the performance of some operations).

    When I'm done moving this over to sql, not only the code will be simpler and safer (atomic transactions), but it will be fast, because the data will be stored in a properly structured way. And then we'll be able to exploit it any way we want.

    I hate people eschewing sql and proceeding to implement crappy custom solutions that are nowhere as efficient, safe and correct (given that sql is the best tool for the problem at hand, of course). Reinventing a crappy solution to a problem that has been solved 40 fucking years ago is really stupid and wasteful.

  • (cs)

    Alex Papadimoulis:
    CREATE TABLE [DtxSurveys] (
    [Status] varchar (8000) NULL,
    [Resp] varchar (8000) NULL,
    [Last] varchar (8000) NULL,
    ... snip ...
    [Supervisor_Name] varchar (8000) NULL,
    [Supervisor_CUID] varchar (8000) NULL,
    [Supervisor_UID] varchar (8000) NULL,
    [File_Data_Date] varchar (8000) NULL)

    I got a little nauseous when I saw that.  How does anybody think that's a good idea, especially when there are certain columns that you know for a fact only need one bit or one char to perform their function?

    As a developer, I know the value of efficiency, not wasting resources, and using only what you absolutely need to get the job done as quickly and with the least amount of wasted space/time/resources possible.  I wouldn't think that would be too foreign of a concept for Frank to grasp, even if he had no real DBA experience at all, but...I guess that's why this site has enough material for daily postings, eh?

    Of course, that's not even touching his proposed lack of any keys or indexes.

  • (cs)

    One of my coworkers had the pleasure to work on the company's Workflow Engine. It was entirely written in PL/SQL with xml parsing, invoking services on webservers and a nice web-based admin interface using htp/htf packages.

    Other notable features included several hours of downtime each night(!) during daily incremental backup and the occasional "File not found" or "Disk full" Oracle errors.

    So my coworker decided to export and analyse the DDL of the developmeduction schema. We watched with awe and horror as the hideous, njon-euclidean "structure" unfolded on the sreen: box after box in neat little rows and not a single line between them.

    206 tables and not a single foreign key.

    So it goes.

  • (cs) in reply to biziclop
    biziclop:

    One of my coworkers had the pleasure to work on the company's Workflow Engine. It was entirely written in PL/SQL with xml parsing, invoking services on webservers and a nice web-based admin interface using htp/htf packages.

    Other notable features included several hours of downtime each night(!) during daily incremental backup and the occasional "File not found" or "Disk full" Oracle errors.

    So my coworker decided to export and analyse the DDL of the developmeduction schema. We watched with awe and horror as the hideous, njon-euclidean "structure" unfolded on the sreen: box after box in neat little rows and not a single line between them.

    206 tables and not a single foreign key.

    So it goes.

     Yeah, when I run across something like that I usually run screaming. I actually got the excuse from one developer that "that was the way the customer wanted it.". <sarcasm> Like that's a good reason. </sarcasm>


     

     "Yessir! Our database has no PK, no FK and no RDI!"

    "And what do your customers say about your database?"

    "Lots of FLWs!"

    "Oh, Four Letter Words"
     

  • (cs) in reply to JustThat

    Let me throw another minor WTF on top of this one.

    Imagine, if you will, a web app where the database login name and password is stored in clear-text in a web server config file.

    Now imagine that the account is a sysadmin account in the database server.

    Now imagine the DBA weping uncontrollably.

  • (cs) in reply to Disgruntled DBA
    Disgruntled DBA:

    Drill Sergeant:  And are you allowed to have varchar(8000) fields in your table definitions, Private Pyle?

    Private Pyle: Sir, no Sir!

    Drill Sergeant: Don't f***ing call me 'sir,' I maintain databases for a living!

    Private Pyle: Yes, Drill Sergeant! 

  • Jon (unregistered)

    At my company the developers are the DBAs and we are held to high standards in both code design and database design.  I didn't even realize there were programmers out there who didn't do both... I should get paid more...

  • rmg66 (unregistered) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

     You, obviously, have never worked in a database that was larger than 50 megabytes.

  • rmg66 (unregistered) in reply to Unklegwar
    Anonymous:
    Anonymous:

    Seems to me like things are working just fine.  And, as long as the DBA keeps being the janitor, things will continue to work fine. 

    DBAs crack me up.  They're like, if a garbage man really thought he was a sanitation engineer, and ran around saying, "If I didn't have to pick up after all these people, I could get some engineering done."

      

     

    Please use your name, so I can pre-screen any future employers for your presence.

     

    Right On!

    DB Administrators and DB Architects Rock!

    WE ROCK!!!

  • (cs) in reply to Jon

    Anonymous:
    At my company the developers are the DBAs and we are held to high standards in both code design and database design.  I didn't even realize there were programmers out there who didn't do both... I should get paid more...

    You also do DB maintenance and stuff? We also try to be as skilled in DB-related things as we can, but a DBA knows a lot more about his system than the developers. And while our DBAs know a thing or two about JDBC, they're not supposed to know it inside out, while I am.

  • (cs) in reply to JustThat
    JustThat:

    I agree with what you said above but I was thinking more of the line between data integrity rules and pure business rules that may be difficult or impossible to implement within the database. For example, something like a ticketing system where a request needs to be signed of on by a manager before work can proceed. The database really isn't set up to do anything with that information. Sure, it can be stored as a value in the database but will the database know what it means? Not likely. It would take the front-end to look at the value in the database and ensure it meets the business rules before proceeding.

    In general, I believe rules dealing with data integrity should always be as you stated: a partnership between the app and the database with the database acting as the gatekeeper and having the final say (i.e. the app can't override the database). Rules dealing with business process are sometimes best put in an application which uses data known to be "clean" because it is in a well-designed database with proper RDI.

     Does that make sense? Or am I talking out my Frank?
     

     

    A friend of mine, taking a Project Management class (that apparently teaches RDBMS theory for some reason) at a state university asked me if it was "ok" for a table to only have one field in it (a primary key).  I told him that in general, no (I'm familiar with "utility/lookup" tables for things like state codes) and asked what it was for.  He said his assignment is to make a university course registration / advising system, and he wants the table so he can store a list of valid semesters up to two years in advance.  I said he probably should have a semester table so he can store things like the effective start and end dates and the tuition fees for each semester, but he said he had no need to since this system was ONLY for course registration/advising.

     I said in that case, don't bother making the semester table; just do the check in your application.  He then goes off about all the stuff he learned about "referential integrity" and that he was going to set a foreign key on the semester field of the courses table so as not to break this "referential integrity".  I told him I agree that a check must be done, but the check can and SHOULD be done in the application since it's completely business logic: Semesters are titled like 2006.1, 2006.2, 2007.1, etc, and courses are only allowed to be added for semesters within two years down the road (<--- business logic!).  Every time I told him why it should be done in the application (since you NEED to tell the user "This is not a valid semester" instead of "MS SQL ERROR NO. 1305 FOREIGN KEY CONSTRAINT") and how trivial it would be to do so, he told me "but we must ensure referential integrity of the data!".

     So finally I asked why his professor told them to do this logic in the database instead of the application, and he said "because he hasn't taught us anything about how to do computer programming."

    Now the DBA needs to manually update the database EXACTLY twice per year, or else the university administrators will be completely locked out from adding new courses.  Guess what happens if the DBA decides to take a little vacation?

     Oh, and the real WTF:  The class was 100% theory.  They never used any ACTUAL RDBMS.  The design of the database was to be done in MS Visio using ER notation.  Of course my friend gets an A for following the prof's teachings.  Good stuff they're being taught in school.

  • Adrian (unregistered) in reply to D. Skinner

    Dude I missed the /sarcasm tag.  So I am going to bite.

    try getting an app layer, a etl layer and eai layer all to honour the same data integrity constaints without PK and FK's is just a fucked up idea.
     

  • (cs) in reply to jmounce
    jmounce:

    On second thought, can you get me the names and phone numbers of the 2 Battlestar Galactica babes I see in the upper right-hand ad in this thread? That may actually be more useful...

    I believe they are Grace Park and Tricia Helfer.  Phone numbers... You're on your own. 

  • Arcan (unregistered) in reply to Captcha: Zork.

    No... I myself design my database structures in advance, and even hack out a few queries before I start. The issue isn't only PHP programmers though, .NET flunkies do it all the time.

    Web Developers don't understand being a DBA, they're not meant to, they're there to develop "internets"... being a DBA helps in this.

     
    This doesn't detract from the utter stupidity of what Frank starts out with. Good to see that there are smarter DBA's out there than the one's I have worked with.
     

  • Jon (unregistered) in reply to biziclop

    Well... nothing less than 3rd normal form goes into production unless its archiving... Usually we use autonumber ints as primary keys.  Sometimes we do softdeletes ...  We never hit a table from code, always a view or stored proc.  Always try to run queries in bulk updates... very rarely use cursors or loops... some of our queries get really insane.  We use DTS... fiddle with transaction log settings... growth settings etc.  use indexes when we need to... Am I qualified to be a full DBA probably not, but I'd probably be the DBA's best friend in another company.  One of our more notable successes of recent past was getting a query from running in 2 hrs down to less than 10 minutes.  This was a Cursor oriented StoredProc that was redone as in a bulk update.  Also our datawarehouse performs really well and is very maintainable and extensible. We do all our own User defined functions and TSQL stuff too. 

  • John Hensley (unregistered)

    Where are the ProductRequest columns in that query coming from?


  • Jon (unregistered) in reply to Jon

      Always try to run queries in bulk updates....umm... queries.. in bulk updates...umm.... damnit you know what I meant!!!

  • (cs)

    Can I report the article itself as a troll?  I mean come on.... <shudder>

  • A Different Anonymous (unregistered) in reply to Pap

    Sounds like his real problem was putting two pieces of information, the year and the semester, in the same field. Also, DBMS error numbers can be mapped to useful error messages. Granted, having an actual DBMS at his disposal would have helped...

  • Olddog (unregistered) in reply to XH

    The dev is probably a PHP-tard, so no cast is necessary.  I find this sort of thinking to be very common across the PHP developer-base.

    I'm a PHP-tard - and I admit it. I use it often. That means in my case that I'm also the DBA-of-sorts on my projects.  I'm fortunate to have peers who make a living as DBA's on an enterprise level. To be able to tap them on the shoulder an ask for advice is a great resource. I've found that when asked, they'll eagerly provide feedback on DB structure, as well as other aspects I would have never considered. There's no measure on what value they provide, other than to say, when I follow their advice... everything works. I have found that the best time to 'tap' them is before, not after the project is underway.

    As to which concern should claim ownership of DB integrity... I think it falls squarely on the shoulders of both the DB administrator( for design and governance ) and the Developer ( for implementation and execution ). The time wasted by finger-pointing between DBA and Developer is easily avoided ahead of time.

  • Lownewulf (unregistered) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.

    Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't (hint: it often doesn't). Constraints limit data integrity - for many applications, you already know you will not fail a constraint check, and for foreign key cascading behavior, can provide such behavior yourself more efficiently since it doesn't have to be generic. Why, then, add constraints that should never be hit? Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees.

     

  • Idiot Magnet (unregistered)

    An additional WTF is that you think 52 is a "3800% reduction" from 2000.  It's a 97.4% reduction.
     

  • (cs) in reply to Idiot Magnet
    Anonymous:

    An additional WTF is that you think 52 is a "3800% reduction" from 2000.  It's a 97.4% reduction.
     

    ((2000-52) / 2000) * 100 = 97.4 % reduction, right.

    However,  2000/52 = 38.46 times or 3846% faster.

    So, both are right...just stated differently.

    Ain't math FUN?!?!? 


  • Pierre (unregistered) in reply to D. Skinner
    D Skinner:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    CAPTCHA: wtf.

  • SocNuy (unregistered) in reply to Arcan
    Anonymous:

    Web Developers don't understand being a DBA, they're not meant to, they're there to develop "internets"... being a DBA helps in this.

    Don't know about other web developers, but I build web applications and believe myself to be a better good SQL 'coder.'  I don't know a damn thing about database administration, that's why the best DBA in the company is one of my best friends.  I trust him to tell me what to do when it comes to schema, etc.  I do an ORM; he does the physical implementation.  I do the SPs, etc.  He does the indexes and statistics.  Works great. It's teamwork.

     


    CAPTCHA : 1337 

  • (cs) in reply to Lownewulf
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.

    Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't (hint: it often doesn't). Constraints limit data integrity - for many applications, you already know you will not fail a constraint check, and for foreign key cascading behavior, can provide such behavior yourself more efficiently since it doesn't have to be generic. Why, then, add constraints that should never be hit? Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees.

     

    Although technically correct, you are so wrong you don't know that you are wrong.

    Your examples of increased overhead apply to algorithms within an application.  Once you have brought in an RDBMS, you are now dealing with a data archiving server designed to protect the integrity of the data.  If you want a place to just store some application data I might interest you an a million other technologies for storing data that are much, much, much more efficient.   Since you are not looking for efficiency, but rather, the safe, *relational* storage of data, then everything should be keyed, at the very least, for data integrity purposes.  In addition, for algorithmic efficiency, any combination of collumns that might be queried should be indexed, as index size is trivially cheap in comparison with the size of your data... I might be persuaded in edge cases against indexes, eg. you have a small lookup table, but keys should never be ommitted.  Personally, I would go so far as to say every valid constraint the DBA can conceive of should be applied... but ommitting keys in an RDBMS is a cardinal sin.  You cannot anticipate, with an RDBMS, that the only access to data will be from your specific app and that no other apps will have access.  You cannot anticipate that the app will always run, bug free, and do no harm to the integrity of the database.  You cannot anticipate that some human will not ever massage data.  You cannot anticipate that future application versions will be bug free with respect to the data even if this version is.

     

    Go use some lightweight distributed data broker if that's all you want, rather than a full-fledged, neccissarily inefficient RDBMS.
     

  • (cs)
    Anonymous:
    <quote>SELECT SUM(ProductRequest.ProductFaceValue) FROM Requests WHERE (ProductRequest.ProductID = Product_ID) AND (ProductRequest_createdByUserUID = @uid) GROUP BY ProductRequest_productName</quote>

    (Sorry of being stupid if i'm missing something here) -- But this query just ain't working in my head. What's the table/alias/prefix ProductRequests for when the query is only from a table called Requests? ProductRequest_createdByUserUID is blowing my head, or then I have totally misunderstool ansi sql-92. Or maybe this was some weird NVARCHAR2 sh*$ from which I just usually turn my head away...

    Don't misunderstand me, I totally understand the DBAs trying to work with queries taking over 100ms to execute. Just the story wasn't too convincing.

     

    <sux transcription>

    I goofed when I sent this in to Alex. Should read like this (note the _ instead of . in the column names):

    SELECT Product_ID, Product_name, 
    case WHEN 
          (SELECT SUM(ProductRequest_ProductFaceValue) 
          FROM Requests WHERE (ProductRequest_ProductID = Product_ID) 
          AND (ProductRequest_createdByUserUID = @uid) 
          GROUP BY ProductRequest_productName) is null 
    then 0 
    ELSE 
          (SELECT SUM(ProductRequest_ProductFaceValue) 
          FROM Requests WHERE (ProductRequest_ProductID = Product_ID) 
          AND (ProductRequest_createdByUserUID = @uid) 
          GROUP BY ProductRequest_productName) 
    end 
    as TotalDollarAmount FROM Requests WHERE (product_ID IN (@pid))
     I rewrote this to: 
    SELECT Product_ID, Product_name, ISNULL ((SELECT SUM (ProductRequest_ProductFaceValue) FROM Requests WHERE (ProductRequest_ProductID = Product_ID) AND (ProductRequest_createdByUserUID = @uid)),0) as TotalDollarAmount FROM Requests WHERE (product_ID IN (@pid))
    ...which runs in 52 milliseconds mostly because it only has to evaluate the subquery once 

     

  • (cs) in reply to Lownewulf
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.

    Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't (hint: it often doesn't). Constraints limit data integrity - for many applications, you already know you will not fail a constraint check, and for foreign key cascading behavior, can provide such behavior yourself more efficiently since it doesn't have to be generic. Why, then, add constraints that should never be hit? Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees.

     

    Ok, I'm confused. Can you explain how constraints limit data integrity? 

    "Why add constraints that should never be hit?"  you ask. Ok, I'll ask this back: Why wear a seatbelt if it should never have to be used?

     Wait, let me guess...your answer is going to be along the lines of "A well-written application will prevent bad data from ever reaching the database". And you're right. But what about a year down the road? Is that app still well-written? Can you guarantee that there is no way data that doesn't fit the application-based data model can be entered? Can you ensure that data will never be bulk inserted into the database from a source external to the application? In the real world, you can't do any of that.

    As for indexes not being worth the space or insert/delete time...in some cases you are right (for OLTP databases you may eschew indexes for the sake of insert speed) but for DSS systems you always want indexes as they are mostly to support reporting. Because most applications are really a mix of OLTP and DSS you need to weigh the performance changes an index will engender. Space usually isn't an issue with storage being so cheaply available now.

    You also said, "Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees." You're wrong here. I work with apps that require that level of reliability every day. People get paid on the apps I support. Generally, I like to have RI in the databases I administer to protect the data from developers who think they are god and never make mistakes.
     

     

  • (cs) in reply to Gsquared
    Anonymous:

    I believe you mean "decreased the run time to 52 milliseconds", instead of "increased".

    As a DBA, I see a lot of junk like your Frank examples.  A LOT.

    Best example I can think of off the top of my head is a query that went something like this:

    select OrderID, DateOrdered, OrderQty
    from Orders
    where dateordered between '1/1/2006' and '31/1/2006'
    union all
    select OrderID, DateOrdered, OrderQty
    from Orders
    where dateordered between '2/1/2006' and '2/31/2006'
    ...
    where dateordered between '6/1/2006' and '6/31/2006'

    All done as Unions, 1 per month (and yes, the original query request included February 31st; it was obviously a copy-and-past job).  The reason?  "We want it to be in order by month."

    I, for some reason, felt that this might work better:

    select OrderID, DateOrdered, OrderQty
    from Orders
    where dateordered between '1/1/2006' and '6/30/2006'
    order by dateordered

     BWAHAHAHAHAHAHAAAA! that's awesome!

    the best thing is the reason...Depending on how the columns are indexed, it may not even sort by date with the union statement. (i.e. if DateOrdered is not indexed, but OrderID is, odds are the compiler will sort by OrderID if no order by is specified)

  • Sizer (unregistered) in reply to Lownewulf
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.

    Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't <snip>

     Yes, yes, we know all this. 'Data integrity is the reponsibility of the app, not of the database,' may be freely mocked, and not blindly either. And 'keys are almost always bad,' is just freaking retarded. I've seen what happens when you follow this philosophy - your database turns into a seething mass of WTF. All it takes is /one/ little lapse in one update and suddenly all your assumptions are shot (like 'there will never be two users with the same unique internal id') and things break left and right. 'OH WELL THAT SHOULD NOT HAVE HAPPENED.' Yeah, well, in theory I shouldn't have to run lint or run unit tests either because everyone should be writing perfect code.

  • (cs) in reply to Unklegwar
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    I remember a recent boss who had that "the app will maintain data integrity" philosophy. Needless to say, the data was a giant turdball. Too bad all the major database players have these key things that kill performance. What were they thinking? why didn't they consult someone like you before they went and did all of that?

     

    I've always held by enforcing integrity on both sides..the app would enforce the most basic things (i.e. numbers in number fields, dates in date fields etc) then let the DBMS enforce any other rules (i.e. date must be between the start and end date of a related row in another table etc..) it's fairly efficient, and good enough to not let bad data in.

  • Jeff (unregistered) in reply to Harry

    I almost worked there.  Upwards of a TB of data and the DBAs took the PK-FK links off the tables so the daily data loads would go faster.  We (the developers) finally harassed them enough to put them back in.  They still turn off the enforcement of the referential integrity during data loads, but at least the keys are there.  They may or may not be turning it back on after the load is done.  It's been a while and I suspect the data is so dirty they can't posssibly have clean data anymroe.  At least we kept the indices around.....


  • MurdocJ (unregistered) in reply to Lownewulf
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.

    Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't (hint: it often doesn't). Constraints limit data integrity - for many applications, you already know you will not fail a constraint check, and for foreign key cascading behavior, can provide such behavior yourself more efficiently since it doesn't have to be generic. Why, then, add constraints that should never be hit? Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees.

    I'm going to go ahead and mock the original statement.  I do know something about databases and engineering.  Anyone who thinks that keys are a bad idea, or that data integrity isn't the responsibility of the database, needs to enter the real world.  Yes, having an index slows down inserts. There are VERY few cases where having at least one  index doesn't vastly repay the insert cost.  About the only case I can imagine would be a table used as a log file, and even in that case unless you crunch the entire file you may well get benefit from a key.

    As far as data integrity, relying on the applications to never make a mistake is simply dumb.  One of the ways that you develop reliable apps is by having cross checks like database constraints. 

    I'm really really hoping that both you and the original post are actually jokes, and that you aren't serious. 

  • (cs) in reply to A Different Anonymous
    Anonymous:
    Sounds like his real problem was putting two pieces of information, the year and the semester, in the same field. Also, DBMS error numbers can be mapped to useful error messages. Granted, having an actual DBMS at his disposal would have helped...


    Using two fields may be more normalized, but in this case you just end up with two FK constraints and get the same problem.  Now if one FK fails, and you get error number 1234, what does that tell the user?  "Invalid year or semester"? I've never tried to relay a RDBMS error message to the end user before so maybe I just don't know.  What if your query had other (meaningful) FK constraints, like the instructor ID of a course having to match against the PK in the instructors table?  The error code wouldn't be enough to tell you what caused the insert query to fail.  You'll have to parse the error text manually and turn it into a useful error message, because I'll be damned if you're going to just echo out the complete error message to the user.

    I could completely see using the FK constraint if it was a static utility table, but relying on the DBA to go in there and update it bi-annually?  To me, what it comes down to is this:  Instead of relying on the app to do the data integrity enforcing, you THINK you're relying on the database to do it, but in fact you're relying on the
    DBA to keep the data in check by having him log in twice annually and do updates to the PK.

    Also, since you're not even doing the check in the app, nothing is going to stop a user from adding a course from a previous semester since the FK constraint will PASS.  So much for integrity.
  • JNK5 (unregistered) in reply to Captcha: Zork.

    I wonder what the DBA think of Rails. On the dev side, we don't have to write SQL so i don't know how efficient the queries may be. But it forces the use of PK and FK even on those that don't know what it is.

    But you basically NEED to write your models relationships down before writing any code. Then you NEED to define the relationship in your models, and you would be very lazy not to write thos few words that magically performs app level data validation.

    Well, why don't we have more RoR WTF ?

  • Rob Baillie (unregistered) in reply to JNK5

    So the WTF here is the means of communication between the DBA and the developers right?

    Why should a .NET developer have any real knowledge of the needs of a DBA, and why should a .NET developer live to the rules of a DBA when as far as he's concerned it's just a bunch of hoops and no real use to him?

    Instead of having a throw it over the wall attitude to this kind of stuff, there should be dev DBAs in the dev team pairing with people, explaining these things face to face, fostering good relationships between the different parts of the teams and ensuring quality before any code review stage.

    Plus, why should the method of 'doing it in dev' be any different to the method of doing it in production?  No, I don't mean that the DBA should import the CSV in an adhoc manner, I mean that the only way the developer should be able to get a table into his dev structure is by running the script that he's going to hand over the the DBA.  And it should run through the same database build strucutre that every script hits the production database through. The DBA does install the scripts by running a consistent single command containing a version control tag, the script extracting the relevant install scripts from VC, checking they're valid to run, the
    database is in the right state to run them, and they've not already been ran on this database before running them?

    If not, then the whole structure is big pile of WTF!

     
    Captcha: You've been tango'd

  • Benjamin Smith (unregistered) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

     

    And that's why you make jack shit for income. If you don't see the value in having a database enforce various relationships in data, you don't see the value in building a system that resists the harm caused by some stupid bug.


    With this attitude, I wouldn't hire you,  and I pay my developers fairly well. So, you probably make < $30,000 per year.

  • (cs)

    Ever tried mysql? You can use TEXT fields, so you don't need to specify a maximum length...... and then you can use them as the primary key :)

    CREATE TABLE 'DtxSurveys' (
    'Status' TEXT NULL,
    'Resp' TEXT NULL,
    'Last' TEXT NULL,
    ... snip ...
    'Supervisor_Name' TEXT NULL,
    'Supervisor_CUID' TEXT NULL,
    'Supervisor_UID' TEXT NULL,
    'File_Data_Date' TEXT NULL),
    PRIMARY KEY  (Supervisor_UID(255))
    }

    Of course, even the mysql devs had enough insight to see how this would be a bad idea when a TEXT field gets to store several dozens pages worth of text, and they limited the key up to the first 255 characters, and also they force you to specify the limit.

  • DOA (unregistered) in reply to Drum D.

    I live in PHP-land as well and while it does tend to make you lazy (declare my variables? why? typecasting? what's that?) if you ever try to build anything big you better believe you'll try to get the database to run as efficiently as possible. Script languages like PHP tend to be problematic in the performance sector when they are asked to do some serious work and one way out is to use the database to do as much of the work as possible. That pretty much cures you of the huge varchar field sickness

  • (cs)

    This reminds me of a perfectly normalized table we have in our db... 

    CREATE TABLE [dbo].[HIERARCHIES](
        [Id] [int] NOT NULL,
        [Code] [nvarchar](128) NULL,
        [Description] [nvarchar](1024) NULL,
        [SomeBoolean] [bit] NULL,
        [Lvl1] [nvarchar](128) NULL,
        [Lvl2] [nvarchar](128) NULL,
        [Lvl3] [nvarchar](128) NULL,
        [Lvl4] [nvarchar](128) NULL,
        [Lvl5] [nvarchar](128) NULL,
        [Lvl6] [nvarchar](128) NULL,
        [Lvl7] [nvarchar](128) NULL,
        [Lvl8] [nvarchar](128) NULL,
        [Lvl9] [nvarchar](128) NULL,
        [Lvl10] [nvarchar](128) NULL,
    )

  • Jurgen (unregistered)

    I can see that a char(8000) is wasting resources, I don't see how a varchar(8000) does?

  • (cs) in reply to Lownewulf
    Anonymous:
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

    Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.

    Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't (hint: it often doesn't). Constraints limit data integrity - for many applications, you already know you will not fail a constraint check, and for foreign key cascading behavior, can provide such behavior yourself more efficiently since it doesn't have to be generic. Why, then, add constraints that should never be hit? Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees.

     

     I'm not mocking it, I just simply disagree with it.

    Your argument is a sensible one, however you assume that data will never be manipulated by anything else but your application. More often than not, this is a bad assumption. Even if eventually no-one else will touch your data until the end of times, just imagine the costs if they do.

    Why add constraints that should never be hit? I think you're asking the wrong question there. (Why pay insurance if your house never burns down?) I'd reverse the question: whereas you might save a few microsecs by not enforcing data integrity, is it worth the risk?

  • (cs) in reply to D. Skinner

    Trusting the app to maintain database integrity is like trusting Javascript to check the user's inputs to a form. Sure, it may do the trick if you can guarantee that the user will only be using a browser with scripting turned on, and the script runs properly, and they don't have any user scripts messing with your variables, and the user doesn't just form his own bogus request to communicate with the server directly.

     Having JS flag when the user is typing letters into their zip code is fine, but you have to check it again once it hits the server. The same thing applies to the relationship between the DB and the app. Having the app check and enforce the buisness rules is good. It catches the error a step early, (just as the JS catches the zipcode-with-letters scenario) but the DB has to treat the data coming from the app as potentially unreliable. Sure, you don't need to enforce all the business rules as DB constraints, but you should at the very least make sure that nothing gets stored that will actually break the integrity of the DB
     

    Quality is Job # M 1

  • devmem (unregistered) in reply to D. Skinner
    Anonymous:

    Since keys create overhead (sometimes a lot), they should not be mandated or even favored by any standards.  Frankly, I'm of the philosophy that keys are almost always bad.  Data integrity is the responsibility of the app, not of the database.  (Indexes, of course, are usually necessary for large tables, but even they should be used sparingly.)

     

     It's always funny to meet some hardcore crack devs. "Uh, Index? PK? We don't need them, it's only a small table!!11"

    Yep... until it's been 3 month in production... and people start to complain about bad performance...

    I'm really happy that I've started in the database department of a big bank before I went  more into development, and best thing is today, as a freelancer, I can do everything from DBA work, specialized DB development and the best: kick stupid developers asses who complain about a dba.

Leave a comment on “Manager of the Data Dump”

Log In or post as a guest

Replying to comment #:

« Return to Article