- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Anyone up for a Inmon vs Kimball rumble?
Admin
This is just silly talk I realize but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly if they insist on using a table. Ie you install you OS/SQL with a specified time format and as part of installation any stupid date table it wants gets built once as a system level table? Also I'm not convinced that even with good caching that it would be faster to have a table than just to compute the required value on the fly especially if you can push that off onto the client so you don't tie up your server figuring things like that out.
Admin
So TRWTF is leveraging a data warehouse dimension table to implement an email filter rule for an issue tracking system?
Admin
Could someone please denormalize this sentence?
Admin
Absolutely. The only WTF here is that the submitter is working on a data warehouse but doesn't understand data warehousing query optimization. Yes, you can use a date function, but the query optimizer has a much more difficult time optimizing queries against values it can only determine at runtime. Best just to compute the values in advance and maintain the table.
The project manager was correct, although I'm sure he was just passing it along from a DBA who actually knew what they were talking about.
Admin
Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?
Okay, here goes: That table looks perfectly denormal to me.
Admin
Error: Hour Not Found
Of course
Admin
The only WTF is the need for that type of table.
Admin
You've obviously never written any MDX or even know what it is, so I'm just going to keep laughing at your ignorant comment that doesn't make any sense.
JC is spot on.
Admin
The third rule of data warehousing is to forget everything you know about relational databases. The first and second rules are to not talk about it.
Admin
So, here's the situation. First, the server is setup and designed specifically to be tied up figuring out things like that. Pushing things to the client sounds like a cool plan, until you need to push a billion pieces of data to the client to "compute the required value on the fly" to have the client then turn around and send back a billion computed rows of data so the server can then continue finding the appropriately related data.
Also, it is definitely faster to compute the date settings once at data load in, and then not have to have them recomputed every time 1 of 100 different users wants to run 1 of 100 different reports.
The big take away is this: Data warehousing systems are specifically designed to read and compute massive amounts of data, probably beyond what you're thinking because data is typically pulled from multiple sources and it is important to have historical data for trending.
Admin
Yep, I don't think that there's anything abnormal or particularly inappropriate about this.
The query optimiser often cannot work out how many of the dates in a table are Sunday without reference to a column storing day_of_week, and front-end query tools often do not have good syntax for working out these attributes on the fly.
Admin
Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND
Admin
I believe they think "For Dummies" is actually a requirement.
"Any idiot can use a computer. Many do."
Admin
Admin
I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.
Can anyone explain why such a tablet is bad in a data warehouse context?
Admin
From a security standpoint I'm disappointed that this article includes actual customer data like that. Where is the obfuscation?
Admin
Perhaps the submitter would like to, because everyone else is asking the same thing!
Admin
I guess the real WTF here is a boss who knows what he is doing... because the submitter clearly doesn't understand the purpose of this.
In fact I've written something much like this. Sure there are Oracle/SQLServer functions that can compute the last business day of the month, but then there are the one-off situations: The day the storm knocked out power to the entire state on what would have been the last business day of the month, and now all our reports need to treat that day as a holiday. The day the accounting system crashed. When business unit A has a different holiday schedule than B. So what did I do:
(a) write a bunch of functions to adjust the date for these one-off issues. (b) create a view with functions in the columns (c) realize performance sucks when you try to join against this (d) materialize the view... and its a table now.
The other alternative (which I rejected) was to duplicate the last reported data with a new effective date. Problem is that the duplicated data can confuse other reports that need want to see unique entries.
The big table of dates actually works pretty well:
select calendar_table.reported_date, data_table.x from data_table, calendar_table where calendar_type='Adjusted_Business' and country='US' and business_unit='Boston' and data_table.effective_date=calendar_type.effective_date
And the query will duplicate the entries you needed duplicated, while others can still access the raw (uncorrected and unduplicated data). Now everyone is happy and I only wasted a few megabytes on desks.
Admin
Actually, the original request to create a time/date table which has all sorts of fields is completely accepted as best practice for data warehouses, and for good reason. Check out the works of Kimball and 'date dimensions' for info. You want to be able to drop a field called 'quarter' onto a report and have it aggregate facts to that level. You might not be working with a tool that allows or expects you to write the sql query to calculate the function- and indeed it might be quite slow in performance terms to have the query do all the work when it can just read it out. That's why denormalisation is good in a data warehouse, and that's why you do lots of extra work in the ETL when you can- to make the warehouse easy to understand and use, and perform fast when you use it.
The table you include is a little bit over the top though!
Admin
That's funny because I actually own the book "Data Warehousing for Dummies". i got it when I was learning Cognos many years ago.
Admin
If someone has to explain why a tablet is a poor choice compared to a good SMP server, then all is lost.
Admin
Feels like we are still 60-70 comments short of reaching a consensus that this is not a WTF. This conversation has become eerily similar to "Blame Peter".
Admin
I think I understand what you are saying. If they didn't denormalise so much they wouldn't have enough data to earn the title warehouses?
Admin
I also laughed at this. I work as BI consultant. In my work we always have calendar-table. One of my customer has 1 million customers, think about their fact-table. 'Push calculations to client side', huh.
Captcha: saepius, almost intelligent human
Admin
But, but, with such a table anything can happen!
Admin
I think daylight saving changes are the biggest WTF of all times. It creates so much confusion that cannot be properly represented. For example, a baby can be born at 2:30 AM in some country on a certain date, and die at 2:10 AM on the exact same date - and how do you record that properly, assuming that there are a ton of automated processes depending on those two datetime values?
So what if the sun comes out one hour earlier/later (relative to the local clock), compared to with daylight saving enforced? Adjust business/school hours related to that rather, for fuck sake. If I could go back in time to murder the motherfucker who first thought of this, I would have.
I'm tempted to make the same statement about timezones. So what if the sun comes out at, e.g. 2 AM in some country over there and goes down at 4 PM? We have those kind of oddities that we have to deal with anyway (e.g. close to north pole). Those guys up north still count 60 minuts for an hour. Using that timezone/daylight-savings crap of the logic, their hours should contain less or more minutes than 60, am I fucking right?
Admin
Does your aptitude for bedroom acrobatics depend upon whether DST is in effect?
Admin
Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.
Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.
Admin
I see what you did there.
Admin
As ugly as it looks in a Normalized Data Store, that is classical Dimensional Data Store. Nothing bad about it if it is used in a Data Warehouse environment.
Admin
It's twice per year locally, but for a party handling this for multiple countries, it's that many times times 2. And that goes for the daylight-savings thingy.
About timezones: I think it would be nice if same datetime values relates to the same moment everywhere.
Am I fucking right?
Admin
TRWTF is column YYYYMMDD of type varchar(11).
Admin
Now if you want it NORMALIZED... well, that is beyond my ability.
Admin
ie 12492-11-23 for November 23rd, 12492 A.D.
Its good right up until 99999 A.D. Then we have to worry about Y100K.
Admin
Admin
What is "NextFactDayLocal"? If that means the next day that the local factory will be running, that table column makes sense. If it means the next day that the programmer will know his facts, you can initialize it all to NULL.
Admin
What you would then do is write a dedicated function to return whether a particular date is a business day in a particular country. This will of course probably involve some sort of table somewhere (unless you can find an online resource to interrogate) and will probably require regular (probably annual) maintenance when you learn what the holidays are going to be for the regions you need to service.
But then if you're storing the date data as defined in this WTF, you're probably going to have to do this anyway, and because of the unwieldiness of the table this will be a more difficult job.
Admin
Boring! Our BI date table has (I kid you not) 184 columns... I think about three of those are ours, the rest were all standard with the BI platform.
Admin
Admin
Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
Admin
Seems like a typical date dimension table to me. I don't see the wtf factor of this one.
Admin
Admin
Admin
Admin
Admin
Date/Time dimension tables are very common in data warehouses and are recommended as a best practice for a large DW. 11,000 rows in a dimension table in a DW is peanuts compared to the rest of the database. Denormalization is the key to warehouse performance. I don't think this is very WTF-worthy.
Admin
Not a WTF. Real WTF is developer doing BI-related work without understanding BI-fundamentals.
Think about it this way; the business user wants to create a report. His fact table has a granularity of thousands of transactions per day. He wants to be able to see the numbers of transactions summed up to the yearly total, but with the ability to 1: drill down to any level, so for instance he'll see something like this: 2012
+1st. Quarter
-2nd. Quarter
-+April -+May --June --+Week 22 --+Week 23 --+Week 24 ----Monday ----Tuesday ----Wednesday ----Thursday ----Friday ----Saturday ----Sunday --+Week 25 --+Week 26 +3rd. Quarter +4th. Quarter
And with the sums correctly calculated on each level shown. Imagine how poorly an SQL that would have to calculate the actual quarter, month, week for EACH date you have in your dataset, possibly millions (or even billions) of rows large.
The reason why we build data warehouses using denormalized models is due to mainly two factors:
Admin
Sorry, but this is a very common practice in DW applications and is actually recommended by many designs such as the popular Kimball methodologies.
Sure, the database can do all the time functions for you, however this is harder for users to understand. Typically, a reporting tool like Business Objects allows them to drag and drop dimensions into the report to sort, pivot, etc. Having a selection of precalculated date and time fields is MUCH easier than building functions in the reports. Even the space of having a table with one row per minute for 30 years is nothing compared to the size of most fact tables.
Not a FAIL at all....
Admin
+1 TRWTF is the poster.