- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Ah, the "whyyyyyyyy?" debug message. The calling card of a dev working overtime to debug a program and getting nowhere.
Admin
Admin
Hmm, something strange with this story. Data stored in a VarChar(255) in MySQL only use the storage used by the text, not the whole field. If you wrote 2 letters in the field, only 3 bytes of would be used..
Admin
Yea but no but yea but no...
That is true when it comes to storage. When it comes to schema, however, MySQL has to assume that there might be a row where all the varchars are completely filled, and that row must still be saveable (as it is a valid row according to the schema). So a VARCHAR(255) is 255 characters long as far as ALTER TABLE is concerned.
Admin
But wouldn't it complain if you tried to make too many of those columns, because it could overflow the row length?
...aaaand Hanzo'd
Admin
I came here specifically to find out if this was the case (as it seemed likely) and got an answer with amazing speed.
Admin
That and running a front end application like a website with DDL privileges.....
Admin
One of the systems that we integrate with uses this idea completely. And approximately 200 columns was the limit, if memory serves me right: I wrote a script to keep adding columns via the API until an error happened. Luckily, there was a choice between "text", "number" and "datetime" - I could squeeze a few more of the number/datetime fields in even when "full" of text fields. (Note: not "TEXT" but VARCHAR(255) for the "text" type; TEXT (BLOB) columns only contribute 4 (?) bytes to the 64KB limit as they are stored outside the row)
Admin
It's also the case if you set the character set to utf8 it will assume 3 bytes per character, so VARCHAR(255) contributes 767 bytes to towards the 64KB limit. (2 byte overhead)
My default VARCHAR is 85 so as it only use a 1 byte overhead and to fit within 255 bytes :smile: Of course we want to use utf8mb4 now, so we can store 💩 astral Unicode planes, so that would reduce that idea to 63...
Admin
You could always just add a foreign key to another table and fill it with more juicy VARCHAR(255) data. It's not that hard, people.
Admin
Discodevs would like a word with you...
Admin
And, what happened when a form field was removed? I'm going to guess: nothing. So in that table there are probably more columns for no data.
Admin
Probably going to get smacked....but IF the there was really a need to dynamically add the columns (possibly with some later validation/standardization to contain the beast), then this might have been a good use-case for EAV.
Admin
Everyone knows that varchar fields containing integers only contain about 90% integers.
Admin
Particularly when a load of people doing asset recording (for instance) are given an Excel spreadsheet to fill in, and then management decides after the event that it all needs to be put into a database. I have wanted to kill the designers of Excel more often than I care to think.
Admin
Why not use rows instead of columns. It's better solution i think. One table with column description: create table ColumnDescription Id, Name, DataType And one table with data: ColumnData Id, ColumnDescriptionId, ValueInt, ValueDouble, ValueString, ValueDate You could add as many value columns you like.
Admin
Admin
It would be backwards compatible, and would never need to alter a table again.
table Tables Id, name
table Columns Id, name, tableId
table Rows Id, tableId
table Cells Id, columnId, rowId, value (as varchar255)
Admin
That's similar to what I'd do. It's pretty much the only way to do it properly, given the specified requirements. The only issue with doing it that way is that searches across multiple 'columns' are trickier, but from what we've read, I don't think that's much of an issue.
It's certainly a lot better than adding columns to a table automatically and thus having to give the application DDL permissions.
Admin
Because there are better tools to do this? Like NoSQL document databases?
Admin
But that assumes the the Mico Managers will let you introduce new technology to the company.
Admin
Technically, there are four ways to tackle that problem of ever growing field lists: a) Entity-Attribute-Value schemas, as described by other people here. Problem: Complex Querys are hard to write and usually not performant
b) Storing a complex data type (e.g. an XML or JSON string) in a large column. Problem: Not 1st normal form. Querying such a table ususally means full table scans, unless your database supports indexing XML
c) Preparing the table with several generic spare columns, which will be put to use later. Problem: need to decide on a generic data type (most likely varchar) and a fixed number of such columns
d) Programmatically using ALTER TABLE like it has been done in the story. Problems: Depending on the database system, several side effects can happen.
Admin
(replaces deleted post having seen ammoQ's) Actually entity-attribute-value schemas can be fairly easily designed to avoid redundant cells, have simple indexing and be maintainable by having a management form to control what attributes are permitted. Also, it's possible to specify the data type of an attribute and a sequence number so that forms can be dynamically generated.
The queries need not be very complex, but the server side needs to be a reasonably high performance language.
Admin
@kupfernigk As far as data entry or primary key access are concerned, indexing and performance are usually not the problem. Trouble happens when you need more than that. Let's assume you have such a schema to keep all details about your customers. Now marketing comes and says: "We need a list of a female clients aged 30 to 40, who have a dog but no car." If this was a single table, you would write something like
select * from clients where gender='F' and dayofbirth between add_months(sysdate,-1240) and add_months(sysdate,-1230) and hasdog='Y' and hascar='N';"
In a EAV schema, you need several joins to get the same result, and there is no way to create an index on a combination of "gender", "hasdog", "hascar" and "dayofbirth" to speed up this query.
Admin
You might be surprised to find out that SQL injection is still possible with PDO and prepared statements, although you have to set the encoding incorrectly and PDO has to construct the prepared statement client side instead of on the SQL server (which is apparently does by default...)
Admin
Yes. Yes it does.
Long live php!
Admin
Actually, it's more likely assuming the maximum size is 765 bytes if the default UTF-8 collation is used. Every character could take three bytes.
Admin
DOB is a permanent attribute which lives with the primary entity. (Gender nowadays is a little more fluid, but for the moment we'll pretend the likes of Sophie Wilson don't exist.)
Problems: this is the master entity table. Apart from the horror of running ALTER TABLE on a live production database, the idea of backups which routinely have a different schema is surely a WTF (as distinct from backups where there is a change process for the schema.) Think about what ALTER TABLE's adding columns is doing to the disc image of the table, and weep. As with all patterns, YMMV for specific cases and the devil is in the (implementation) details. It is perhaps worth pointing out that nothing can truly exert negative pressure.Typically we are likely to be searching for a small number of transient attributes related to entities with a number of permanent attributes. If this data set is going to be used for more than a single query, and we hope it is to justify the effort in the design, it makes sense to create a new table on the fly to handle it, perhaps offloading it to a different database instance. A combination of queries with joins and merged queries can then do the job. The temporary/transient table can then be indexed for best performance based on the job it needs to do (e.g. produce mailing lists by region or mailmerge based on characteristics.) The question is whether we want to have a maintainable expandable schema upfront with no nasty surprises, and have to work a little harder at the application end, or use a naive schema and risk having the whole thing fall over - along with questions about the desirability of having periodically to update records in the master table to modify transient attributes.
Admin
It's called the EAV pattern, and it creates a high negative-pressure against spherical structures. Also it would not solve the problem.
The real problem is actually harder to discern from the story in detail, but I think we learn enough from this line:
This reminds me of a
funlong story with no ending. I was working a poorly-scoped, rickety piece of machinery for processing multi-gigabyte MySQL tables whose purpose changed on a near-weekly basis. It required a 4 hour cycle of testing to guarantee the satisfaction of myself and the QA. (That's right, 4 hours to guarantee satisfaction, try and make an innuendo out of that.)In this particular upgrade, one of the multi-gigbyte tables was getting an extra
576 fields. The meanings of the new fields had changed somewhat during development, so the names of the fields were barely holding on to their correct meanings. This wasn't a huge deal operationally, and could be modified later without too much trouble.However, a manager (head of operations, no less) specifically requested a field name change at late afternoon on the day before his chosen release date, just after we can completed a 4-hour testing cycle.
I would have been quite confident and happy to make the change, had the manager in question been somebody other than a useless blame-gaming cunt bent making his own stupid whims into somebody else's problem I would have been less resistant about it.
Admin
Wait. So they want sales people to collect 200 different pieces of information from a customer? Over the phone?
Admin
MeanwhIle, "the Intranet website was going to be taken down in a few months" means it will be up for years.
Admin
I won't begin to describe the horrors of dynamically provisioning a new column on a production database because the whim of a sales wonk decides it must be so.
I'd rather focus on retroactive "We're going to change the data type of this column to make it work the way we want it and not think about the access tools." How in the hell are you ensuring that the data previously available doesn't get truncated or corrupted?
Admin
Sounds like what they've been threatening to do with Trac (which was in place before I joined over 7 years ago) where I work. For documentation and ticket tracking. They managed with the latter. In a fashion...
First they tried with SharePoint (how that was ever going to work with ticketing I've no idea.)
Then someone discovered
PawFootprints.We now use the latter in conjunction with JIRA for ticketing (why two? Because they haven't discovered a third they could probably use :sarcasm:. Yes - if a ticket is raised on Footprints, a sibling ticket must be raised - manually no less - on JIRA. For reasons.)
And still we have important departmental reference material only1 on Trac. Which has moved servers a couple of times in the past but for some reason still lives on....
Admin
"Taken down in a few of months" is a WTF in waiting. Please report back in "a few months" and let us know the current state of affairs.
Admin
I doubt if they ask for two hundred items. It was probably for more ad-hoc needs like: "Wow, you actually have a ferret named Wally? Okay...let me note that in our database here..."
Other thoughts...
That was his very first warning right there. Should have expected...well, you know. :rolleyes: :facepalm:
Finally, everyone has heard of bloatware. I hereby dub this strategy, "Bloat-tables."
Admin
Someone should tell them that they've really fked up if they're making PDO vulnerable to SQL injection.
Admin
I don't know that product. (I know Sharepoint and JIRA too well; they're better than many of the alternatives. :anguished:)
Admin
Admin
Was answered by:
Translation: Allocation of space. 'nuff said.
Now, as for the article:
After reading:
I knew right there that this was a horror story in the making. As such, I have
correctedreconstructed a critical paragraph to more accurately reflect what must have happened:FTFY
Admin
So the hard part was getting it to keep running while he was still there?
Admin
The hard part was finding a way to automate the handling of the requests in order to buy him enough time away from the torture to find the new job. Don't want the VP getting suspicious.
Admin
I once worked for a company that wanted to take on a consultant to work on a database schema. Alarm bells started ringing at interview when one guy announced that he had worked on a project where one table had over 300 columns, as if this was a tribute to his abilities at schema design. They got louder when it turned out he was proud of his system for labelling columns so that they would alpha sort into creation order, but had no idea of ensuring that the fixed width keys were the first columns, to speed up index creation. The worst of it was the director who was also in on the interview seemed to be impressed by this stuff. I did wonder if the consultant was bullshitting and had just developed an interview technique that used metrics that senior managers would think they understood. Of course, at the other end of the scale was the guy who wanted to produce a schema where every table consisted, basically, of key-value pairs "for speed and efficiency". I think that counts as table bloat, whereas the multicolumn approach is just column bloat. Two different approaches to selling heftier servers.
Admin
And if you wouldn't have mentioned it, nobody would have thought of it.
But now that you've said it, pray tell us how do you make people wilfully blind?
Admin
The canonical method involves [spoiler]goatse.cx[/spoiler]…
Admin
Never underestimate how badly something can suck.
Admin
You have clearly never worked in the kind of environment where, the moment a manager mentions someone else's foul up, you know somebody is trying to ensure you get the blame for it.
However - allowing someone to change their date of birth while regarding gender as being fixed, as per your previous comment, is a bit of a fail. If DOB can be changed to deal with initial entry error, why not gender? Especially as gender can be, and has been, wrongly recorded on birth certificates.
Anyway, I seem to have well and truly got up your nose. Join my "do not reply to" club.
Admin
Both name and gender can change over time, though how easily depends on your legislature. I do not think that applies to your date of birth.
Admin
Marriage is an easy way to change your name.
Admin
Round here gender is way easier than name, apart from the last-name-on-marriage route. As far as I've heard you practically need a royal decree to change your name otherwise.
Admin
In the UK, to change your legal name, you just order a form, fill it in, and send it back. Didn't see any pricing on the website though, but then I didn't look.