Admin
And don't forget, 73.5% of statistics are made up on the spot.
Admin
This hits close to home. I have just inherited my running club's website, which has been accepting thousands of run starts for nearly a decade. The site is built in Perl and it stores the data in vertical pipe delimited CSV files. The format is barely valid, and there's some extraneous data that makes no sense. I'm surprised that this site even works. But it's not a huge WTF, it's just old technology that was done by an amateur. There are probably thousands of these sites out there that work the same way.
Admin
You win the Internet for today. I bow down to your greatness.
Admin
Sex is an act. Gender is a state. Preferential Identity is a choice.
These are three different things.
If I have to explain sex to you, too bad; I won't. Gender is not just "Male" or "Female" even though that is how we see it. Gender is determined by chromosomes, normally XX and XY; there is no YY, as you have to get an X from the mother. There are three other chromosomal genders: XXY, XYY, and XXX. I will leave out the details of those here, but let it be said that in the human animal there are five different genders. As for Preferential Identity, I am sure there can be far more options.
Admin
You're right. They're overcomplicating it by trying to understand the data because it's relatable to your own experience. Treat each person as, say, a grenade, and just put it into a decent format WITHOUT TRYING TO FIGURE OUT WHY. "Why" leads to madness. Then once the data's been put into the right system, offer to streamline or refine the data...
... at a small additional consulting charge.
Otherwise you're going OOS for no $$$ and that makes everyone sad.
Admin
Then we also need DNA_GENDER_NOT_FOUND
Admin
You forgot XO. You can't go wagging biology around a programming forum thinking you're the only one who studied outside the box. Nevertheless, since the vast majority of the population is either XY or XX, and the remaining few have discernible equipment that lets them pick a bathroom, male or female is pretty easy to answer in binary. (Although I wouldn't, because there's no need to save the space. Use text unless it gets math done on it.)
Preferential Identity has no business being in a database unless it's a members registry at a sex club.
Admin
What bugs me is that I'll say, "we may not make this deadline" because that's precisely my attitude. Then I get, "oh, no, that deadline is non-negotiable." Usually the client just made up a completely arbitrary deadline, and our managers accept it without any input from the engineers.
But it's only "non-negotiable" because they didn't even try and they want us to pull all the kinds of panicky shit you're talking about. The worst part is they're panicking at the very first meeting, especially when I say, "so since we've agreed to this, what's 'our' plan to make this happen?" And they still have the goddamned gall to try to dress up their panic with the tough-guy leader routine.
And the really irritating part is that no one, but NO ONE actually gives a flying fuck if you actually make these deadlines. Just make steady progress and write good code and they're happier than shit.
Admin
Someone's gotta make beans for them to count. How else are they gonna have jobs?
Admin
That's right! Like these simplistic idiots who try to classify a programming variable as being numeric OR string. What if I write:
So now is x an int or a string, huh? Everything doesn't necessarily fit into your narrow-minded little paradigm, does it?
Well, okay, I guess that would be a programming error.
Kind of like XXY is a chromosomal abnormality and not really another gender. Personally, I have a deformed kidney. That's not a new organ. It's just a kidney that is deformed.
I may "prefer" to call a dog a fish, but that doesn't make it a fish. It's still a mammal, no matter what I prefer to call it. Even if I glue a fin to its back, that won't make it a fish.
Admin
That was my initial reaction, but it really depends on how it is used.
From what we are actually told, there are a few silly tables, but not necessarily a very difficult problem.
Admin
Yeah, I forgot about XO, and in most cases the additional genders are not readily apparent outside of a test, so most of this is a non-starter anyway. And nowhere did I state that the Preferential Identity should be saved; I actually implied it shouldn't be, as there could be far too many options there.
Though it is good to see someone else that hasn't limited themselves just to the digital world.
Admin
Data coming from someone who finally realized that a couple of spreadsheets are no longer cutting it. Congratulations.
Either import it exactly as given or do the work and make a real database out of it.
Geez, you should see the things I have to put up with.
Admin
I prefer zhe/zhim/zher.
http://en.wikipedia.org/wiki/Gender-specific_and_gender-neutral_pronouns#Invented_pronouns
Admin
Hmm, this story doesn't seem all that far out.
Data delivered in Excel. Is that how they actually maintain it, or did somebody dump it to Excel? But assuming that this is the best we're going to get, that's not really that big a problem. Export it to CSV, then import the CSV into your database. I've done that many times.
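The "export it to CSV, then import the CSV" route is easy to script. Below is a minimal sketch using Python's standard `csv` and `sqlite3` modules, assuming the spreadsheet has already been saved as CSV with a header row. The function name, the staging table name, and the sample columns are hypothetical. Every column lands as TEXT in a staging table, so the cleanup described further down can happen after the raw data is safely loaded.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, source):
    """Load a CSV export (header row first) into a new all-TEXT staging table.

    `table` and the header names are assumed trustworthy, since they are
    interpolated as quoted SQL identifiers rather than bound as parameters.
    Returns the number of data rows inserted.
    """
    reader = csv.reader(source)
    header = next(reader)
    width = len(header)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" * width)
    # Skip blank lines; pad short rows and trim long ones to the header width.
    rows = [(row + [""] * width)[:width] for row in reader if row]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample; a real run would pass open("export.csv", newline="").
    sample = io.StringIO("child_id,name,gender\n1,Anna,girl\n2,Ben,boy\n")
    print(import_csv(conn, "staging_children", sample))
```

Loading everything as text first is a deliberate choice: a malformed cell can't abort the import, and typing the columns becomes a separate, reviewable step.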
Data has two primary keys. Well, no it doesn't. It has one primary key and one alternate key. This isn't ideal, but it's no great obstacle. It's not at all uncommon for data to have multiple unique identifiers. This is routine when some of those identifiers come from external sources that you do not control. For example, we use our customer account number as the primary key, but the customer's Social Security number also uniquely identifies a customer. So? If foreign keys sometimes use one and sometimes the other, we should pick one and translate all the others during data conversion. An extra step, but not a big deal.
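The "pick one key and translate the others" step might look like the sketch below. It assumes you can build a lookup from each alternate identifier to its primary key; the function name and the sample identifiers are hypothetical. References that match neither key set are set aside for manual review rather than guessed at.

```python
def translate_keys(records, key_field, primary_keys, alt_to_primary):
    """Rewrite `key_field` in each record so it always holds a primary key.

    primary_keys: set of valid primary keys.
    alt_to_primary: dict mapping alternate identifiers to primary keys.
    Assumes the two identifier sets never overlap. Returns (fixed, unresolved);
    input records are not mutated.
    """
    fixed, unresolved = [], []
    for rec in records:
        ref = rec[key_field]
        if ref in primary_keys:
            fixed.append(rec)  # already uses the chosen key
        elif ref in alt_to_primary:
            fixed.append({**rec, key_field: alt_to_primary[ref]})  # translated copy
        else:
            unresolved.append(rec)  # neither key: send to the error report
    return fixed, unresolved
```

For example, with hypothetical primary key `"A100"` whose alternate identifier is `"X-7731"`, an order referencing `"X-7731"` comes back referencing `"A100"`.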
Data is unnormalized. Namely, "boy/girl", "he/she", "him/her", etc., are repeated in every record. Presumably if we refer to a certain person as "he" we will not also refer to that same person as "her". A single field to identify gender would seem to be sufficient. I'd guess that the he/she, him/her, etc. values are inserted into contract documents, letters, and maybe other displays. Okay, fine: we pick one of these fields to keep in the child table, move all the other, dependent data into another table, and normalize. We should also change text like "boy" or "girl" into a consistent code so we don't have to deal with mistypes like "gril". The odds are that when there is dependent data like this, there are inconsistencies; there are probably some records that have both "boy" and "she". So when we do the normalization we kick out an error report and have someone clean those up by hand. Or maybe we can just pick one of the fields as authoritative in cases of inconsistency.
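Collapsing the redundant fields into one code, while kicking out an error report for inconsistencies and typos, could be sketched like this. The field names (`gender`, `heshe`, `himher`) and the value-to-code table are hypothetical; the table would grow as new spellings show up in the report.

```python
# Hypothetical mapping from free-text values seen in the export to one code.
GENDER_HINTS = {
    "boy": "M", "he": "M", "him": "M", "his": "M",
    "girl": "F", "she": "F", "her": "F", "hers": "F",
}


def derive_gender(record, fields=("gender", "heshe", "himher")):
    """Return (code, problem) for one record.

    code is "M" or "F" when every non-empty field is recognized and they all
    agree; otherwise code is None and problem explains why, for the
    hand-cleanup report.
    """
    codes = set()
    unknown = []
    for field in fields:
        raw = (record.get(field) or "").strip().lower()
        if not raw:
            continue  # empty cells neither help nor hurt
        code = GENDER_HINTS.get(raw)
        if code is None:
            unknown.append(f"{field}={raw!r}")  # e.g. a typo like "gril"
        else:
            codes.add(code)
    if unknown:
        return None, "unrecognized: " + ", ".join(unknown)
    if len(codes) != 1:
        return None, f"conflicting or missing: {sorted(codes)}"
    return codes.pop(), None
```

A record with `"boy"` and `"she"` comes back as `(None, "conflicting or missing: ['F', 'M']")`, which is exactly the kind of row someone should look at by hand. Choosing one field as authoritative instead would just mean returning that field's code when the set has more than one member.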
Nursery1/2/3. Unnormalized data again, but surely that one's easy. We create a separate nursery table and populate the data correctly.
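Moving the repeated Nursery1/2/3 columns into their own table amounts to unpivoting them into one row per child per nursery. A minimal sketch, assuming each record is a dict keyed by column name; `child_id` and the slot number are hypothetical choices for the new table's columns.

```python
def split_nurseries(children, id_field="child_id",
                    columns=("Nursery1", "Nursery2", "Nursery3")):
    """Turn repeated NurseryN columns into (child_id, slot, nursery) rows.

    Empty cells are skipped, so a child with one nursery yields one row.
    The slot number preserves which original column the value came from.
    """
    rows = []
    for child in children:
        for slot, col in enumerate(columns, start=1):
            value = (child.get(col) or "").strip()
            if value:
                rows.append((child[id_field], slot, value))
    return rows
```

The resulting tuples can be inserted straight into a child-nursery link table; if the nursery names themselves need cleanup, they can be deduplicated into a lookup table in a second pass.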
The only real problem I see is the long text field for the age. It always baffles me when people do this sort of thing, as it's surely more work for a human to type "the child will be six years old next august" than to just type "5". And given that kind of text field, I'm sure we can expect all sorts of variety. Some ages will be in months and others in years. Some will be in digits and others spelled out. An age like 18 months may be written "18 months", "1 1/2 years", "a year and a half", "1.5 years", etc. Plus, how do we know on what date this was entered? If it says the child is 3 years old, was that yesterday, last year, five years ago, thirty years ago...? Anyway, my first thought would be to try to pick out a number, look for "years" or "months", and ignore everything else, then print a big list with the original text and what we pulled from it, and have someone go over it manually. Then convert it to a birth date. I don't know what you could do that would be better. Well, given that with no timestamp on the data entry the information may be useless, maybe we just throw it out and make them enter birthdates, but I doubt we'd get away with demanding that.
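The "pick out a number, look for years or months, print a list for review" approach can be sketched with a regular expression. This is deliberately naive: it recognizes digits, decimals, and a few spelled-out numbers, and returns `None` for anything it can't read (such as "1 1/2 years" or "a year and a half") so those go to the human reviewer. It also reads "will be six years old next august" as 72 months, which is why the review pass matters. Because the data has no entry timestamp, `entered_on` is an assumption you must supply; all function names here are hypothetical.

```python
import re
from datetime import date

# Spelled-out numbers we bother to recognize; anything else goes to review.
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
         "twelve": 12}

# A number not glued to a preceding word, '/', or '.', then a unit.
PATTERN = re.compile(
    r"(?<![\w/.])(\d+(?:\.\d+)?|" + "|".join(WORDS) + r")"
    r"[\s-]*(year|yr|month|mo)s?\b",
    re.IGNORECASE,
)


def age_in_months(text):
    """Return the first 'N years' or 'N months' found in text, in months, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    raw, unit = m.group(1).lower(), m.group(2).lower()
    n = WORDS[raw] if raw in WORDS else float(raw)
    return round(n * 12 if unit.startswith("y") else n)


def review_rows(texts):
    """Pair each original text with what was extracted, for a human to check."""
    return [(t, age_in_months(t)) for t in texts]


def estimated_birth_month(text, entered_on):
    """Rough birth month, assuming the age was true on `entered_on` (a guess)."""
    months = age_in_months(text)
    if months is None:
        return None
    total = entered_on.year * 12 + (entered_on.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


if __name__ == "__main__":
    for original, parsed in review_rows(["18 months", "1 1/2 years", "3-year-old"]):
        print(f"{original!r:>20} -> {parsed}")
```

With an assumed entry date of 15 June 2012, "18 months" yields a birth month of December 2010; a different guess at the entry date shifts every result accordingly.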
Admin
I'm in the "don't see the big deal" camp. Sure, data conversions can be a nightmare, but I don't see that in the samples provided.
Why worry about the "age" column? If it doesn't fit the data model of the target system, then just drop that column. Easy. And if the customer insists that all of the data has to be imported - well, hopefully the target system includes a notes/comment/memo column.
Admin
Yes, it is. XY, XX, XXY, XYY, and several other possibilities.
It IS in the DNA, and there are more than two choices.
Admin
The real wtf is the encoded HTML on this site's RSS feeds.
Admin
I wouldn't blame the salesman too harshly. It is doubtful that he has taken the time to study the contents of the database. The fact that it's not as clean as it might have been is just one of those things. Suck it up, lazybones drama queen, and get on with it. And if you've got fields that contain incomprehensible data, like the age field, you flag them up, return to the customer for clarification, and explain that this might not be so easy to convert into a clean implementation.
Admin
You've never heard of the term "level of effort", have you?
Admin
I was a little annoyed (read: I almost cared) that we fed the original "Right or wrong" troll with all those replies. I forgot that, with this group especially, the troll was feeding us.
Admin
Ignoring any possibility of this "Daan" person having a proper name in some language, could an article writer please apply this same anonymization methodology to someone named "Bob"?
Thanks.
Admin
Import the data into Google Refine, fix most of the mess in there, and then export it again. That assumes "Not yet", "Not Made", and similar multiple values for the same thing are the norm rather than the exception. Otherwise just drop the redundant data and stick to the column that you think is most accurate.
Admin
And 10 out of 7 are obviously incorrect.
Admin
I would. It's the salesman's job to get input from engineering on how long the migration would take. That includes looking at the data. Not just making up some date you think will make the customer happy with no clue of how difficult the work is.
Admin
Don't be a hater. Hate is like acid.
Admin
Wrong. Use a dictionary next time.
Sex is chromosomes, male/female. Gender is how you see yourself. Male, female, neuter, other... Preferential identity is a synonym for gender.
Notice that we never use gender for animals; we use sex. Because they can't tell us what they'd like to be known as, we can only determine what they are genetically.
Of course, if you're not in the USA, your nation's variant of English may differ...
Admin
Shouldn't it be de-programmer? un-programmer? Someone who, instead of creating functional software, destroys it?
...so I guess the opposite of 'programmer' is 'user'...
Admin
Next to "sisgrammer" in the table.
Admin
Somewhere close to "Software Evangelist."
Admin
This is wrong.
While sex may be an act, it is also a state.
sex (n) 1: either the male or female division of a species, especially as differentiated with reference to the reproductive functions.
Gender is a social construct. A gender represents the sum total of a society's expectations for a particular sex.
Admin
Management bullshit. Say what you mean, if you actually understand what you mean and aren't just jacking off on buzzwords.
Admin
Please show some sensitivity. I once logged in with the username "Bob" and let me assure you, it's no laughing matter.
Admin
"Captain Vimes says we don't have sex when on duty." -- Lance Corporal Angua
Admin
QJo wrote: "The real WTF would be if you've been assigned a ridiculously short time to do it."
You don't run out of time. Time is infinite. You are finite, Zathras is finite, This... is wrong database.
Admin
That's what a salesperson needs to find out in lieu of understanding that it's five billion records in a non-standard comma-delimited format, so no, you can't f$%&in' have it by next Wednesday.
Admin
Regarding the gender, himher, heshe, ... columns, in the 1910 US Census, one of my grandfather's sisters, Cora, was listed as "Daughter" in the relation to the head of household column and as a male in the sex column.
Admin
"Send Daan to sing contract with my company"
I just visualized a great Bollywood-style musical number about a man singing a contract with a company.
Admin
That last motherlode piece looks to me like an attempt to start a localization feature, to make it easy to adapt to foreign languages and their wording.
And I myself was a victim of such shameless salesmen lying to customers; in our case the customer was none other than the federal government, so we had better deliver. I hate them, I hate them, I hate them.
Admin
And the problem?
(is this site worth visiting any more???)
Admin
Problem is, if they don't over-promise, someone else wins the contract, and neither the salesman nor the engineers get paid. Nope, like it or not, lying salesmen are a fact of life... you just have to work around them... and try to sneak in some payback where you can (e.g., social club paintball tournaments).