- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Frank rocks! Can you get me a copy of his resume?
On second thought, can you get me the names and phone numbers of the 2 Battlestar Galactica babes I see in the upper right-hand ad in this thread? That may actually be more useful...
Admin
Well said!
I try very hard not to be a database dictator but I do insist on some things. By and large, I leave the developers to their own designs unless I think something so <font face="arial,sans-serif" size="-1"><font>egregious (like a complete lack of RDI or using a sysadmin-level account for a web application's access to a database).</font></font>
Typically, my "rules" are more along the lines of security and stability practices than hard and fast methodology laws (i.e. you must only update tables with stored procedures or no BLOBs allowed!)
Admin
So, and I the ONLY PHP programmer that plans the DB in advance before writting any code?
Admin
I'm fortunate enough to work with a DBA like that. Well, except for the caffine part, and the need to record what he says and play it back at half speed so you can decipher what he's saying, cause he talks so fast. :-)
Regarding today's WTF subject, I think the DBA was being too hard on that poor developer. All he needed was a little hand holding (along with wrist wrenching and arm twisting) so he could learn the idiocy of his ways....
I type "initech", so therefore I am not a robot.
Admin
Whereas data integrity on the app level creates no onverhead at all?
It resembles the (mis)management axiom: "Internal employees' work as free."
Admin
Wow. Paula Bean? Is that you, dear?
Admin
The DBA for the company I work for puts it roughly this way: "my job is to keep your data safe, keep the DB running fast, and make sure that any maintenance operation on it can be done as efficiently as possible: So when you do stuff like storing an array of keys to another table in a blob, you're preventing me to do my job properly."
He actually works with the devteam and has commit right acces to the subversion repositery. When the devs need changes in the database, they have to go through im, and he implements the change in the database creation/update scripts and commit them himself.
This is a good thing since most C++ programmers don't want to hear about SQL if they can help it, or if they do, they often don't want to learn how it works and why it's better than just mindlessly serializing stuff in a binary file (or a blob).
I'm currently replacing, on some custom asset storage server we have, the way assets metadatas are stored,The way it's done now is nightmarish: some operations are coslty as hell, got fixed by slapping a secondary index on the side which isn't always properly kept in sync for some reason, the code to search through this search using some kind of very rogod set of criterias that can be independently enabled and is an awful unreadable mess of nested loops. Some common search operations are either very slow, or unreliable if they use the not-always-in-sync index.
I did suggest to the guy who implemented this to use sqlite to store this stuff. As soon as I uttered the three letters sql, the suggestion was instantly dismissed. Constructing sql statements to communicate with a data storage library probably sounded pretty obscene to this guy who was a terminal premature-optimizer and loved using memcpy and strcpy into fixed size arrays all over the place.
I inherited the maintenance of that thing (along with all the complains regarding the performance of some operations).
When I'm done moving this over to sql, not only the code will be simpler and safer (atomic transactions), but it will be fast, because the data will be stored in a properly structured way. And then we'll be able to exploit it any way we want.
I hate people eschewing sql and proceeding to implement crappy custom solutions that are nowhere as efficient, safe and correct (given that sql is the best tool for the problem at hand, of course). Reinventing a crappy solution to a problem that has been solved 40 fucking years ago is really stupid and wasteful.
Admin
I got a little nauseous when I saw that. How does anybody think that's a good idea, especially when there are certain columns that you know for a fact only need one bit or one char to perform their function?
As a developer, I know the value of efficiency, not wasting resources, and using only what you absolutely need to get the job done as quickly and with the least amount of wasted space/time/resources possible. I wouldn't think that would be too foreign of a concept for Frank to grasp, even if he had no real DBA experience at all, but...I guess that's why this site has enough material for daily postings, eh?
Of course, that's not even touching his proposed lack of any keys or indexes.
Admin
One of my coworkers had the pleasure to work on the company's Workflow Engine. It was entirely written in PL/SQL with xml parsing, invoking services on webservers and a nice web-based admin interface using htp/htf packages.
Other notable features included several hours of downtime each night(!) during daily incremental backup and the occasional "File not found" or "Disk full" Oracle errors.
So my coworker decided to export and analyse the DDL of the developmeduction schema. We watched with awe and horror as the hideous, njon-euclidean "structure" unfolded on the sreen: box after box in neat little rows and not a single line between them.
206 tables and not a single foreign key.
So it goes.
Admin
Yeah, when I run across something like that I usually run screaming. I actually got the excuse from one developer that "that was the way the customer wanted it.". <sarcasm> Like that's a good reason. </sarcasm>
"Yessir! Our database has no PK, no FK and no RDI!"
"And what do your customers say about your database?"
"Lots of FLWs!"
"Oh, Four Letter Words"
Admin
Let me throw another minor WTF on top of this one.
Imagine, if you will, a web app where the database login name and password is stored in clear-text in a web server config file.
Now imagine that the account is a sysadmin account in the database server.
Now imagine the DBA weping uncontrollably.
Admin
Drill Sergeant: Don't f***ing call me 'sir,' I maintain databases for a living!
Private Pyle: Yes, Drill Sergeant!
Admin
At my company the developers are the DBAs and we are held to high standards in both code design and database design. I didn't even realize there were programmers out there who didn't do both... I should get paid more...
Admin
You, obviously, have never worked in a database that was larger than 50 megabytes.
Admin
Right On!
DB Administrators and DB Architects Rock!
WE ROCK!!!
Admin
You also do DB maintenance and stuff? We also try to be as skilled in DB-related things as we can, but a DBA knows a lot more about his system than the developers. And while our DBAs know a thing or two about JDBC, they're not supposed to know it inside out, while I am.
Admin
A friend of mine, taking a Project Management class (that apparently teaches RDBMS theory for some reason) at a state university asked me if it was "ok" for a table to only have one field in it (a primary key). I told him that in general, no (I'm familiar with "utility/lookup" tables for things like state codes) and asked what it was for. He said his assignment is to make a university course registration / advising system, and he wants the table so he can store a list of valid semesters up to two years in advance. I said he probably should have a semester table so he can store things like the effective start and end dates and the tuition fees for each semester, but he said he had no need to since this system was ONLY for course registration/advising.
I said in that case, don't bother making the semester table; just do the check in your application. He then goes off about all the stuff he learned about "referential integrity" and that he was going to set a foreign key on the semester field of the courses table so as not to break this "referential integrity". I told him I agree that a check must be done, but the check can and SHOULD be done in the application since it's completely business logic: Semesters are titled like 2006.1, 2006.2, 2007.1, etc, and courses are only allowed to be added for semesters within two years down the road (<--- business logic!). Every time I told him why it should be done in the application (since you NEED to tell the user "This is not a valid semester" instead of "MS SQL ERROR NO. 1305 FOREIGN KEY CONSTRAINT") and how trivial it would be to do so, he told me "but we must ensure referential integrity of the data!".
So finally I asked why his professor told them to do this logic in the database instead of the application, and he said "because he hasn't taught us anything about how to do computer programming."
Now the DBA needs to manually update the database EXACTLY twice per year, or else the university administrators will be completely locked out from adding new courses. Guess what happens if the DBA decides to take a little vacation?
Oh, and the real WTF: The class was 100% theory. They never used any ACTUAL RDBMS. The design of the database was to be done in MS Visio using ER notation. Of course my friend gets an A for following the prof's teachings. Good stuff they're being taught in school.
Admin
Dude I missed the /sarcasm tag. So I am going to bite.
try getting an app layer, a etl layer and eai layer all to honour the same data integrity constaints without PK and FK's is just a fucked up idea.
Admin
I believe they are Grace Park and Tricia Helfer. Phone numbers... You're on your own.
Admin
No... I myself design my database structures in advance, and even hack out a few queries before I start. The issue isn't only PHP programmers though, .NET flunkies do it all the time.
Web Developers don't understand being a DBA, they're not meant to, they're there to develop "internets"... being a DBA helps in this.
This doesn't detract from the utter stupidity of what Frank starts out with. Good to see that there are smarter DBA's out there than the one's I have worked with.
Admin
Well... nothing less than 3rd normal form goes into production unless its archiving... Usually we use autonumber ints as primary keys. Sometimes we do softdeletes ... We never hit a table from code, always a view or stored proc. Always try to run queries in bulk updates... very rarely use cursors or loops... some of our queries get really insane. We use DTS... fiddle with transaction log settings... growth settings etc. use indexes when we need to... Am I qualified to be a full DBA probably not, but I'd probably be the DBA's best friend in another company. One of our more notable successes of recent past was getting a query from running in 2 hrs down to less than 10 minutes. This was a Cursor oriented StoredProc that was redone as in a bulk update. Also our datawarehouse performs really well and is very maintainable and extensible. We do all our own User defined functions and TSQL stuff too.
Admin
Where are the ProductRequest columns in that query coming from?
Admin
Always try to run queries in bulk updates....umm... queries.. in bulk updates...umm.... damnit you know what I meant!!!
Admin
Can I report the article itself as a troll? I mean come on.... <shudder>
Admin
Sounds like his real problem was putting two pieces of information, the year and the semester, in the same field. Also, DBMS error numbers can be mapped to useful error messages. Granted, having an actual DBMS at his disposal would have helped...
Admin
I'm a PHP-tard - and I admit it. I use it often. That means in my case that I'm also the DBA-of-sorts on my projects. I'm fortunate to have peers who make a living as DBA's on an enterprise level. To be able to tap them on the shoulder an ask for advice is a great resource. I've found that when asked, they'll eagerly provide feedback on DB structure, as well as other aspects I would have never considered. There's no measure on what value they provide, other than to say, when I follow their advice... everything works. I have found that the best time to 'tap' them is before, not after the project is underway.
As to which concern should claim ownership of DB integrity... I think it falls squarely on the shoulders of both the DB administrator( for design and governance ) and the Developer ( for implementation and execution ). The time wasted by finger-pointing between DBA and Developer is easily avoided ahead of time.
Admin
Those who are blindly mocking this statement need to learn a thing or two about engineering, and about database theory.
Keys imply both an index and a constraint. Indexes generally make queries faster at the expense of insert/delete time and space. It is an important engineering tradeoff to determine exactly where this tradeoff makes sense and where it doesn't (hint: it often doesn't). Constraints limit data integrity - for many applications, you already know you will not fail a constraint check, and for foreign key cascading behavior, can provide such behavior yourself more efficiently since it doesn't have to be generic. Why, then, add constraints that should never be hit? Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees.
Admin
An additional WTF is that you think 52 is a "3800% reduction" from 2000. It's a 97.4% reduction.
Admin
((2000-52) / 2000) * 100 = 97.4 % reduction, right.
However, 2000/52 = 38.46 times or 3846% faster.
So, both are right...just stated differently.
Ain't math FUN?!?!?
Admin
CAPTCHA: wtf.
Admin
Don't know about other web developers, but I build web applications and believe myself to be a better good SQL 'coder.' I don't know a damn thing about database administration, that's why the best DBA in the company is one of my best friends. I trust him to tell me what to do when it comes to schema, etc. I do an ORM; he does the physical implementation. I do the SPs, etc. He does the indexes and statistics. Works great. It's teamwork.
CAPTCHA : 1337
Admin
Although technically correct, you are so wrong you don't know that you are wrong.
Your examples of increased overhead apply to algorithms within an application. Once you have brought in an RDBMS, you are now dealing with a data archiving server designed to protect the integrity of the data. If you want a place to just store some application data I might interest you an a million other technologies for storing data that are much, much, much more efficient. Since you are not looking for efficiency, but rather, the safe, *relational* storage of data, then everything should be keyed, at the very least, for data integrity purposes. In addition, for algorithmic efficiency, any combination of collumns that might be queried should be indexed, as index size is trivially cheap in comparison with the size of your data... I might be persuaded in edge cases against indexes, eg. you have a small lookup table, but keys should never be ommitted. Personally, I would go so far as to say every valid constraint the DBA can conceive of should be applied... but ommitting keys in an RDBMS is a cardinal sin. You cannot anticipate, with an RDBMS, that the only access to data will be from your specific app and that no other apps will have access. You cannot anticipate that the app will always run, bug free, and do no harm to the integrity of the database. You cannot anticipate that some human will not ever massage data. You cannot anticipate that future application versions will be bug free with respect to the data even if this version is.
Go use some lightweight distributed data broker if that's all you want, rather than a full-fledged, neccissarily inefficient RDBMS.
Admin
<sux transcription>
I goofed when I sent this in to Alex. Should read like this (note the _ instead of . in the column names):
...which runs in 52 milliseconds mostly because it only has to evaluate the subquery onceAdmin
Ok, I'm confused. Can you explain how constraints limit data integrity?
"Why add constraints that should never be hit?" you ask. Ok, I'll ask this back: Why wear a seatbelt if it should never have to be used?
Wait, let me guess...your answer is going to be along the lines of "A well-written application will prevent bad data from ever reaching the database". And you're right. But what about a year down the road? Is that app still well-written? Can you guarantee that there is no way data that doesn't fit the application-based data model can be entered? Can you ensure that data will never be bulk inserted into the database from a source external to the application? In the real world, you can't do any of that.
As for indexes not being worth the space or insert/delete time...in some cases you are right (for OLTP databases you may eschew indexes for the sake of insert speed) but for DSS systems you always want indexes as they are mostly to support reporting. Because most applications are really a mix of OLTP and DSS you need to weigh the performance changes an index will engender. Space usually isn't an issue with storage being so cheaply available now.
You also said, "Yes, some apps have such high requirements for reliability that protecting your data from even the application is a requirement, but I bet most of you aren't working with apps that require that level of reliability guarantees." You're wrong here. I work with apps that require that level of reliability every day. People get paid on the apps I support. Generally, I like to have RI in the databases I administer to protect the data from developers who think they are god and never make mistakes.
Admin
BWAHAHAHAHAHAHAAAA! that's awesome!
the best thing is the reason...Depending on how the columns are indexed, it may not even sort by date with the union statement. (i.e. if DateOrdered is not indexed, but OrderID is, odds are the compiler will sort by OrderID if no order by is specified)
Admin
Yes, yes, we know all this. 'Data integrity is the reponsibility of the app, not of the database,' may be freely mocked, and not blindly either. And 'keys are almost always bad,' is just freaking retarded. I've seen what happens when you follow this philosophy - your database turns into a seething mass of WTF. All it takes is /one/ little lapse in one update and suddenly all your assumptions are shot (like 'there will never be two users with the same unique internal id') and things break left and right. 'OH WELL THAT SHOULD NOT HAVE HAPPENED.' Yeah, well, in theory I shouldn't have to run lint or run unit tests either because everyone should be writing perfect code.
Admin
I've always held by enforcing integrity on both sides..the app would enforce the most basic things (i.e. numbers in number fields, dates in date fields etc) then let the DBMS enforce any other rules (i.e. date must be between the start and end date of a related row in another table etc..) it's fairly efficient, and good enough to not let bad data in.
Admin
I almost worked there. Upwards of a TB of data and the DBAs took the PK-FK links off the tables so the daily data loads would go faster. We (the developers) finally harassed them enough to put them back in. They still turn off the enforcement of the referential integrity during data loads, but at least the keys are there. They may or may not be turning it back on after the load is done. It's been a while and I suspect the data is so dirty they can't posssibly have clean data anymroe. At least we kept the indices around.....
Admin
I'm going to go ahead and mock the original statement. I do know something about databases and engineering. Anyone who thinks that keys are a bad idea, or that data integrity isn't the responsibility of the database, needs to enter the real world. Yes, having an index slows down inserts. There are VERY few cases where having at least one index doesn't vastly repay the insert cost. About the only case I can imagine would be a table used as a log file, and even in that case unless you crunch the entire file you may well get benefit from a key.
As far as data integrity, relying on the applications to never make a mistake is simply dumb. One of the ways that you develop reliable apps is by having cross checks like database constraints.
I'm really really hoping that both you and the original post are actually jokes, and that you aren't serious.
Admin
Using two fields may be more normalized, but in this case you just end up with two FK constraints and get the same problem. Now if one FK fails, and you get error number 1234, what does that tell the user? "Invalid year or semester"? I've never tried to relay a RDBMS error message to the end user before so maybe I just don't know. What if your query had other (meaningful) FK constraints, like the instructor ID of a course having to match against the PK in the instructors table? The error code wouldn't be enough to tell you what caused the insert query to fail. You'll have to parse the error text manually and turn it into a useful error message, because I'll be damned if you're going to just echo out the complete error message to the user.
I could completely see using the FK constraint if it was a static utility table, but relying on the DBA to go in there and update it bi-annually? To me, what it comes down to is this: Instead of relying on the app to do the data integrity enforcing, you THINK you're relying on the database to do it, but in fact you're relying on the
DBA to keep the data in check by having him log in twice annually and do updates to the PK.
Also, since you're not even doing the check in the app, nothing is going to stop a user from adding a course from a previous semester since the FK constraint will PASS. So much for integrity.
Admin
I wonder what the DBA think of Rails. On the dev side, we don't have to write SQL so i don't know how efficient the queries may be. But it forces the use of PK and FK even on those that don't know what it is.
But you basically NEED to write your models relationships down before writing any code. Then you NEED to define the relationship in your models, and you would be very lazy not to write thos few words that magically performs app level data validation.
Well, why don't we have more RoR WTF ?
Admin
So the WTF here is the means of communication between the DBA and the developers right?
Why should a .NET developer have any real knowledge of the needs of a DBA, and why should a .NET developer live to the rules of a DBA when as far as he's concerned it's just a bunch of hoops and no real use to him?
Instead of having a throw it over the wall attitude to this kind of stuff, there should be dev DBAs in the dev team pairing with people, explaining these things face to face, fostering good relationships between the different parts of the teams and ensuring quality before any code review stage.
Plus, why should the method of 'doing it in dev' be any different to the method of doing it in production? No, I don't mean that the DBA should import the CSV in an adhoc manner, I mean that the only way the developer should be able to get a table into his dev structure is by running the script that he's going to hand over the the DBA. And it should run through the same database build strucutre that every script hits the production database through. The DBA does install the scripts by running a consistent single command containing a version control tag, the script extracting the relevant install scripts from VC, checking they're valid to run, the
database is in the right state to run them, and they've not already been ran on this database before running them?
If not, then the whole structure is big pile of WTF!
Captcha: You've been tango'd
Admin
And that's why you make jack shit for income. If you don't see the value in having a database enforce various relationships in data, you don't see the value in building a system that resists the harm caused by some stupid bug.
With this attitude, I wouldn't hire you, and I pay my developers fairly well. So, you probably make < $30,000 per year.
Admin
Ever tried mysql? You can use TEXT fields, so you don't need to specify a maximum length...... and then you can use them as the primary key :)
Of course, even the mysql devs had enough insight to see how this would be a bad idea when a TEXT field gets to store several dozens pages worth of text, and they limited the key up to the first 255 characters, and also they force you to specify the limit.
Admin
I live in PHP-land as well and while it does tend to make you lazy (declare my variables? why? typecasting? what's that?) if you ever try to build anything big you better believe you'll try to get the database to run as efficiently as possible. Script languages like PHP tend to be problematic in the performance sector when they are asked to do some serious work and one way out is to use the database to do as much of the work as possible. That pretty much cures you of the huge varchar field sickness
Admin
This reminds me of a perfectly normalized table we have in our db...
CREATE TABLE [dbo].[HIERARCHIES](
[Id] [int] NOT NULL,
[Code] [nvarchar](128) NULL,
[Description] [nvarchar](1024) NULL,
[SomeBoolean] [bit] NULL,
[Lvl1] [nvarchar](128) NULL,
[Lvl2] [nvarchar](128) NULL,
[Lvl3] [nvarchar](128) NULL,
[Lvl4] [nvarchar](128) NULL,
[Lvl5] [nvarchar](128) NULL,
[Lvl6] [nvarchar](128) NULL,
[Lvl7] [nvarchar](128) NULL,
[Lvl8] [nvarchar](128) NULL,
[Lvl9] [nvarchar](128) NULL,
[Lvl10] [nvarchar](128) NULL,
)
Admin
I can see that a char(8000) is wasting resources, I don't see how a varchar(8000) does?
Admin
I'm not mocking it, I just simply disagree with it.
Your argument is a sensible one, however you assume that data will never be manipulated by anything else but your application. More often than not, this is a bad assumption. Even if eventually no-one else will touch your data until the end of times, just imagine the costs if they do.
Why add constraints that should never be hit? I think you're asking the wrong question there. (Why pay insurance if your house never burns down?) I'd reverse the question: whereas you might save a few microsecs by not enforcing data integrity, is it worth the risk?
Admin
Trusting the app to maintain database integrity is like trusting Javascript to check the user's inputs to a form. Sure, it may do the trick if you can guarantee that the user will only be using a browser with scripting turned on, and the script runs properly, and they don't have any user scripts messing with your variables, and the user doesn't just form his own bogus request to communicate with the server directly.
Having JS flag when the user is typing letters into their zip code is fine, but you have to check it again once it hits the server. The same thing applies to the relationship between the DB and the app. Having the app check and enforce the buisness rules is good. It catches the error a step early, (just as the JS catches the zipcode-with-letters scenario) but the DB has to treat the data coming from the app as potentially unreliable. Sure, you don't need to enforce all the business rules as DB constraints, but you should at the very least make sure that nothing gets stored that will actually break the integrity of the DB
Quality is Job #
M1Admin
It's always funny to meet some hardcore crack devs. "Uh, Index? PK? We don't need them, it's only a small table!!11"
Yep... until it's been 3 month in production... and people start to complain about bad performance...
I'm really happy that I've started in the database department of a big bank before I went more into development, and best thing is today, as a freelancer, I can do everything from DBA work, specialized DB development and the best: kick stupid developers asses who complain about a dba.