• (disco)

    Ah, the "whyyyyyyyy?" debug message. The calling card of a dev working overtime to debug a program and getting nowhere.

  • (disco)
    connecting via PDO to a MySQL database
    SQL injection vulnerabilities
    Someone should tell them that PDO has prepared statements...
    to turn columns holding integer values into actual Integers
    I suspect that at least *some* of the values won’t be “actual Integers”.
  • (disco)

    Hmm, something strange with this story. Data stored in a VarChar(255) in MySQL only use the storage used by the text, not the whole field. If you wrote 2 letters in the field, only 3 bytes of would be used..

  • (disco) in reply to Oyvin_Hanssen

    Yea but no but yea but no...

    Data stored in a VarChar(255) in MySQL only use the storage used by the text, not the whole field. If you wrote 2 letters in the field, only 3 bytes of would be used.

    That is true when it comes to storage. When it comes to schema, however, MySQL has to assume that there might be a row where all the varchars are completely filled, and that row must still be saveable (as it is a valid row according to the schema). So a VARCHAR(255) is 255 characters long as far as ALTER TABLE is concerned.

  • (disco) in reply to Oyvin_Hanssen
    Oyvin_Hanssen:
    Hmm, something strange with this story. Data stored in a VarChar(255) in MySQL only use the storage used by the text, not the whole field. If you wrote 2 letters in the field, only 3 bytes of would be used..

    But wouldn't it complain if you tried to make too many of those columns, because it could overflow the row length?

    ...aaaand Hanzo'd

  • (disco) in reply to adhominis
    adhominis:
    MySQL has to assume that there might be a row where all the varchars are completely filled, and that row must still be saveable (as it is a valid row according to the schema). So a VARCHAR(255) is 255 characters long as far as ALTER TABLE is concerned.

    I came here specifically to find out if this was the case (as it seemed likely) and got an answer with amazing speed.

  • (disco) in reply to VinDuv

    That and running a front end application like a website with DDL privileges.....

  • (disco)

    One of the systems that we integrate with uses this idea completely. And approximately 200 columns was the limit, if memory serves me right: I wrote a script to keep adding columns via the API until an error happened. Luckily, there was a choice between "text", "number" and "datetime" - I could squeeze a few more of the number/datetime fields in even when "full" of text fields. (Note: not "TEXT" but VARCHAR(255) for the "text" type; TEXT (BLOB) columns only contribute 4 (?) bytes to the 64KB limit as they are stored outside the row)

  • (disco) in reply to kupfernigk
    kupfernigk:
    I came here specifically to find out if this was the case (as it seemed likely) and got an answer with amazing speed.

    It's also the case if you set the character set to utf8 it will assume 3 bytes per character, so VARCHAR(255) contributes 767 bytes to towards the 64KB limit. (2 byte overhead)

    My default VARCHAR is 85 so as it only use a 1 byte overhead and to fit within 255 bytes :smile: Of course we want to use utf8mb4 now, so we can store 💩 astral Unicode planes, so that would reduce that idea to 63...

  • (disco)

    You could always just add a foreign key to another table and fill it with more juicy VARCHAR(255) data. It's not that hard, people.

  • (disco) in reply to CreatedToDislikeThis
    CreatedToDislikeThis:
    foreign index
    CreatedToDislikeThis:
    It's not that hard, people.

    Discodevs would like a word with you...

  • (disco)

    And, what happened when a form field was removed? I'm going to guess: nothing. So in that table there are probably more columns for no data.

  • (disco)

    Probably going to get smacked....but IF the there was really a need to dynamically add the columns (possibly with some later validation/standardization to contain the beast), then this might have been a good use-case for EAV.

  • (disco)

    Everyone knows that varchar fields containing integers only contain about 90% integers.

  • (disco) in reply to eViLegion
    eViLegion:
    Everyone knows that varchar fields containing integers only contain about 90% integers.

    Particularly when a load of people doing asset recording (for instance) are given an Excel spreadsheet to fill in, and then management decides after the event that it all needs to be put into a database. I have wanted to kill the designers of Excel more often than I care to think.

  • (disco)

    Why not use rows instead of columns. It's better solution i think. One table with column description: create table ColumnDescription Id, Name, DataType And one table with data: ColumnData Id, ColumnDescriptionId, ValueInt, ValueDouble, ValueString, ValueDate You could add as many value columns you like.

  • (disco) in reply to Ventsislav_Mladenov
    Ventsislav_Mladenov:
    Why not use rows instead of columns. It's better solution i think.
    Really.
  • (disco) in reply to ChrisH

    It would be backwards compatible, and would never need to alter a table again.

    table Tables Id, name

    table Columns Id, name, tableId

    table Rows Id, tableId

    table Cells Id, columnId, rowId, value (as varchar255)

  • (disco) in reply to Ventsislav_Mladenov

    That's similar to what I'd do. It's pretty much the only way to do it properly, given the specified requirements. The only issue with doing it that way is that searches across multiple 'columns' are trickier, but from what we've read, I don't think that's much of an issue.

    It's certainly a lot better than adding columns to a table automatically and thus having to give the application DDL permissions.

  • (disco) in reply to Ventsislav_Mladenov
    TheCPUWizard:
    Probably going to get smacked....but IF the there was really a need to dynamically add the columns (possibly with some later validation/standardization to contain the beast), then this might have been a good use-case for EAV.
    Ventsislav_Mladenov:
    Why not use rows instead of columns. It's better solution i think.

    Because there are better tools to do this? Like NoSQL document databases?

  • (disco) in reply to Eldelshell

    But that assumes the the Mico Managers will let you introduce new technology to the company.

  • (disco)

    Technically, there are four ways to tackle that problem of ever growing field lists: a) Entity-Attribute-Value schemas, as described by other people here. Problem: Complex Querys are hard to write and usually not performant

    b) Storing a complex data type (e.g. an XML or JSON string) in a large column. Problem: Not 1st normal form. Querying such a table ususally means full table scans, unless your database supports indexing XML

    c) Preparing the table with several generic spare columns, which will be put to use later. Problem: need to decide on a generic data type (most likely varchar) and a fixed number of such columns

    d) Programmatically using ALTER TABLE like it has been done in the story. Problems: Depending on the database system, several side effects can happen.

  • (disco) in reply to ammoQ
    ammoQ:
    a) Entity-Attribute-Value schemas, as described by other people here. Problem: Complex Querys are hard to write and usually not performant

    (replaces deleted post having seen ammoQ's) Actually entity-attribute-value schemas can be fairly easily designed to avoid redundant cells, have simple indexing and be maintainable by having a management form to control what attributes are permitted. Also, it's possible to specify the data type of an attribute and a sequence number so that forms can be dynamically generated.

    The queries need not be very complex, but the server side needs to be a reasonably high performance language.

  • (disco) in reply to kupfernigk

    @kupfernigk As far as data entry or primary key access are concerned, indexing and performance are usually not the problem. Trouble happens when you need more than that. Let's assume you have such a schema to keep all details about your customers. Now marketing comes and says: "We need a list of a female clients aged 30 to 40, who have a dog but no car." If this was a single table, you would write something like

    select * from clients where gender='F' and dayofbirth between add_months(sysdate,-1240) and add_months(sysdate,-1230) and hasdog='Y' and hascar='N';"

    In a EAV schema, you need several joins to get the same result, and there is no way to create an index on a combination of "gender", "hasdog", "hascar" and "dayofbirth" to speed up this query.

  • (disco) in reply to VinDuv

    You might be surprised to find out that SQL injection is still possible with PDO and prepared statements, although you have to set the encoding incorrectly and PDO has to construct the prepared statement client side instead of on the SQL server (which is apparently does by default...)

  • (disco) in reply to razerwolf
    razerwolf:
    and PDO has to construct the prepared statement client side instead of on the SQL server (which is apparently does by default...)

    Yes. Yes it does.

    Long live php!

  • (disco) in reply to adhominis

    Actually, it's more likely assuming the maximum size is 765 bytes if the default UTF-8 collation is used. Every character could take three bytes.

  • (disco) in reply to ammoQ
    ammoQ:
    If this was a single table, you would write something like

    select * from clients where gender='F' and dayofbirth between add_months(sysdate,-1240) and add_months(sysdate,-1230) and hasdog='Y' and hascar='N';"

    In a EAV schema, you need several joins to get the same result, and there is no way to create an index on a combination of "gender", "hasdog", "hascar" and "dayofbirth" to speed up this query.

    DOB is a permanent attribute which lives with the primary entity. (Gender nowadays is a little more fluid, but for the moment we'll pretend the likes of Sophie Wilson don't exist.)
    Typically we are likely to be searching for a small number of transient attributes related to entities with a number of permanent attributes. If this data set is going to be used for more than a single query, and we hope it is to justify the effort in the design, it makes sense to create a new table on the fly to handle it, perhaps offloading it to a different database instance. A combination of queries with joins and merged queries can then do the job. The temporary/transient table can then be indexed for best performance based on the job it needs to do (e.g. produce mailing lists by region or mailmerge based on characteristics.) The question is whether we want to have a maintainable expandable schema upfront with no nasty surprises, and have to work a little harder at the application end, or use a naive schema and risk having the whole thing fall over - along with questions about the desirability of having periodically to update records in the master table to modify transient attributes.

    ammoQ:
    d) Programmatically using ALTER TABLE like it has been done in the story. Problems: Depending on the database system, several side effects can happen
    Problems: this is the master entity table. Apart from the horror of running ALTER TABLE on a live production database, the idea of backups which routinely have a different schema is surely a WTF (as distinct from backups where there is a change process for the schema.) Think about what ALTER TABLE's adding columns is doing to the disc image of the table, and weep.
    Shoreline:
    It's called the EAV pattern, and it creates a high negative-pressure against spherical structures. Also it would not solve the problem.
    As with all patterns, YMMV for specific cases and the devil is in the (implementation) details. It is perhaps worth pointing out that nothing can truly exert negative pressure.
  • (disco) in reply to Ventsislav_Mladenov
    Ventsislav_Mladenov:
    Why not use rows instead of columns.

    It's called the EAV pattern, and it creates a high negative-pressure against spherical structures. Also it would not solve the problem.

    The real problem is actually harder to discern from the story in detail, but I think we learn enough from this line:

    It turns out that Gary was being micro-managed by the VP of Operations.

    This reminds me of a fun long story with no ending. I was working a poorly-scoped, rickety piece of machinery for processing multi-gigabyte MySQL tables whose purpose changed on a near-weekly basis. It required a 4 hour cycle of testing to guarantee the satisfaction of myself and the QA. (That's right, 4 hours to guarantee satisfaction, try and make an innuendo out of that.)

    In this particular upgrade, one of the multi-gigbyte tables was getting an extra 5 7 6 fields. The meanings of the new fields had changed somewhat during development, so the names of the fields were barely holding on to their correct meanings. This wasn't a huge deal operationally, and could be modified later without too much trouble.

    However, a manager (head of operations, no less) specifically requested a field name change at late afternoon on the day before his chosen release date, just after we can completed a 4-hour testing cycle.

    I would have been quite confident and happy to make the change, had the manager in question been somebody other than a useless blame-gaming cunt bent making his own stupid whims into somebody else's problem I would have been less resistant about it.

  • (disco)

    Wait. So they want sales people to collect 200 different pieces of information from a customer? Over the phone?

  • (disco)

    MeanwhIle, "the Intranet website was going to be taken down in a few months" means it will be up for years.

  • (disco)

    I won't begin to describe the horrors of dynamically provisioning a new column on a production database because the whim of a sales wonk decides it must be so.

    I'd rather focus on retroactive "We're going to change the data type of this column to make it work the way we want it and not think about the access tools." How in the hell are you ensuring that the data previously available doesn't get truncated or corrupted?

  • (disco) in reply to RevCurtisP
    RevCurtisP:
    MeanwhIle, "the Intranet website was going to be taken down in a few months" means it will be up for years.

    Sounds like what they've been threatening to do with Trac (which was in place before I joined over 7 years ago) where I work. For documentation and ticket tracking. They managed with the latter. In a fashion...

    First they tried with SharePoint (how that was ever going to work with ticketing I've no idea.)

    Then someone discovered PawFootprints.

    We now use the latter in conjunction with JIRA for ticketing (why two? Because they haven't discovered a third they could probably use :sarcasm:. Yes - if a ticket is raised on Footprints, a sibling ticket must be raised - manually no less - on JIRA. For reasons.)

    And still we have important departmental reference material only1 on Trac. Which has moved servers a couple of times in the past but for some reason still lives on....


    1. "Only" as in 'easily accessible' and 'everyone knows where it is.' Both of which fail to apply to Sharepoint, JIRA and Footprints.
  • (disco) in reply to RevCurtisP

    "Taken down in a few of months" is a WTF in waiting. Please report back in "a few months" and let us know the current state of affairs.

  • (disco) in reply to Slapout
    Slapout:
    Wait. So they want sales people to collect 200 different pieces of information from a customer? Over the phone?

    I doubt if they ask for two hundred items. It was probably for more ad-hoc needs like: "Wow, you actually have a ferret named Wally? Okay...let me note that in our database here..."

    Other thoughts...

    [X ...] was of a quality that was not unusual for a corporate PHP website.

    That was his very first warning right there. Should have expected...well, you know. :rolleyes: :facepalm:

    Finally, everyone has heard of bloatware. I hereby dub this strategy, "Bloat-tables."

  • (disco) in reply to VinDuv
    VinDuv:
    connecting via PDO to a MySQL database
    SQL injection vulnerabilities
    Someone should tell them that PDO has prepared statements...

    Someone should tell them that they've really fked up if they're making PDO vulnerable to SQL injection.

  • (disco) in reply to PJH
    PJH:
    Foorprints

    I don't know that product. (I know Sharepoint and JIRA too well; they're better than many of the alternatives. :anguished:)

  • (disco) in reply to xaade
    xaade:
    It would be backwards compatible, and would never need to alter a table again.

    table TablesId,name

    table ColumnsId,name,tableId

    table RowsId,tableId

    table CellsId,columnId,rowId,value (as varchar255)

    Only make sure the inner platform tables are wooden.
    John_Imrie:
    But that assumes the the Mico Managers will let you introduce new technology to the company
    Even a micromanager has to go to the restroom every now and then...
    kupfernigk:
    Gender nowadays is a little more fluid,
    According to [the Personality Forge][1], name and gender are fixed once for all, whereas you can change your birthdate at will.
  • (disco) in reply to Oyvin_Hanssen
    Oyvin_Hanssen:
    Data stored in a VarChar(255) in MySQL only use the storage used by the text, not the whole field. If you wrote 2 letters in the field, only 3 bytes of would be used..

    Was answered by:

    adhominis:
    That is true when it comes to storage. When it comes to schema, however, MySQL has to assume that there might be a row where all the varchars are completely filled, and that row must still be saveable (as it is a valid row according to the schema). So a VARCHAR(255) is 255 characters long as far as ALTER TABLE is concerned.

    Translation: Allocation of space. 'nuff said.


    Now, as for the article:

    After reading:

    It was written in PHP.... It had been mostly written by several programmers who had left the company a few months prior, and was of a quality that was not unusual for a corporate PHP website.

    I knew right there that this was a horror story in the making. As such, I have corrected reconstructed a critical paragraph to more accurately reflect what must have happened:

    Gary was being tortured via micro-managedment by the VP of Operations. This led to took the form of an endless series of requests to add a new field to the form. Gary got tired of lost his mind from doing the same work over and over, and took it upon himself to resolved to take his revenge by devising a deviously clever way to automate the problem which would crash the system some time after he had obtained employment elsewhere. To this end, any time a new field was added to the form (usually by someone SSH'ing onto the remote server and just editing the necessary PHP files directly in production), the page would later on detect that the database did not have a matching column for it, and would create an alter table statement to add the new column to the database. Automatically. On the fly. With no human knowledge of, or intervention thereof. Live. In Production.

    FTFY

  • (disco) in reply to redwizard
    redwizard:
    which would crash the system some time after he had obtained employment elsewhere

    So the hard part was getting it to keep running while he was still there?

  • (disco) in reply to Onyx

    The hard part was finding a way to automate the handling of the requests in order to buy him enough time away from the torture to find the new job. Don't want the VP getting suspicious.

  • (disco) in reply to CoyneTheDup
    CoyneTheDup:
    Finally, everyone has heard of bloatware. I hereby dub this strategy, "Bloat-tables."

    I once worked for a company that wanted to take on a consultant to work on a database schema. Alarm bells started ringing at interview when one guy announced that he had worked on a project where one table had over 300 columns, as if this was a tribute to his abilities at schema design. They got louder when it turned out he was proud of his system for labelling columns so that they would alpha sort into creation order, but had no idea of ensuring that the fixed width keys were the first columns, to speed up index creation. The worst of it was the director who was also in on the interview seemed to be impressed by this stuff. I did wonder if the consultant was bullshitting and had just developed an interview technique that used metrics that senior managers would think they understood. Of course, at the other end of the scale was the guy who wanted to produce a schema where every table consisted, basically, of key-value pairs "for speed and efficiency". I think that counts as table bloat, whereas the multicolumn approach is just column bloat. Two different approaches to selling heftier servers.

  • (disco)
    kupfernigk:
    I am not responsible for other people's wilful blindness to reality.
    Nobody said you were.

    And if you wouldn't have mentioned it, nobody would have thought of it.

    But now that you've said it, pray tell us how do you make people wilfully blind?

  • (disco) in reply to PWolff
    PWolff:
    But now that you've said it, pray tell us how do you make people wilfully blind?

    The canonical method involves [spoiler]goatse.cx[/spoiler]…

  • (disco) in reply to kupfernigk
    kupfernigk:
    It is perhaps worth pointing out that nothing can truly exert negative pressure.

    Never underestimate how badly something can suck.

  • (disco) in reply to PWolff
    PWolff:
    Nobody said you were.

    You have clearly never worked in the kind of environment where, the moment a manager mentions someone else's foul up, you know somebody is trying to ensure you get the blame for it.

    However - allowing someone to change their date of birth while regarding gender as being fixed, as per your previous comment, is a bit of a fail. If DOB can be changed to deal with initial entry error, why not gender? Especially as gender can be, and has been, wrongly recorded on birth certificates.

    Anyway, I seem to have well and truly got up your nose. Join my "do not reply to" club.

  • (disco) in reply to kupfernigk

    Both name and gender can change over time, though how easily depends on your legislature. I do not think that applies to your date of birth.

  • (disco) in reply to PleegWat
    PleegWat:
    how easily depends on your legislature.

    Marriage is an easy way to change your name.

  • (disco) in reply to aliceif

    Round here gender is way easier than name, apart from the last-name-on-marriage route. As far as I've heard you practically need a royal decree to change your name otherwise.

  • (disco) in reply to PleegWat

    In the UK, to change your legal name, you just order a form, fill it in, and send it back. Didn't see any pricing on the website though, but then I didn't look.

Leave a comment on “Dynamic Tables”

Log In or post as a guest

Replying to comment #:

« Return to Article