Comment On The Overloaded Time Table

A few years ago, while on a SQL Server data warehousing project, my project manager sent me a request - create a table with all kinds of information based on a given date. You know – given a date, be able to figure out day of the year, day of the week, month, last business day of the week, and so on, making sure the table stores enough dates to keep it maintenance free. Oh, and one more thing, it’s going to be linked off of the already complex star schema that we were developing for. Why? For reporting purposes of course. [expand full text]
« PrevPage 1 | Page 2 | Page 3Next »

Re: The Overloaded Time Table

2012-11-15 11:12 • by Zapp Brannigan (unregistered)
Anyone up for a Inmon vs Kimball rumble?

Re: The Overloaded Time Table

2012-11-15 11:15 • by Mike (unregistered)
394992 in reply to 394942
This is just silly talk I realize but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly if they insist on using a table. Ie you install you OS/SQL with a specified time format and as part of installation any stupid date table it wants gets built once as a system level table? Also I'm not convinced that even with good caching that it would be faster to have a table than just to compute the required value on the fly especially if you can push that off onto the client so you don't tie up your server figuring things like that out.

Re: The Overloaded Time Table

2012-11-15 11:27 • by C-Derb (unregistered)
So TRWTF is leveraging a data warehouse dimension table to implement an email filter rule for an issue tracking system?

Re: The Overloaded Time Table

2012-11-15 11:30 • by C-Derb (unregistered)
Could someone please denormalize this sentence?

The Overloaded Time Table:
Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.

Re: The Overloaded Time Table

2012-11-15 11:36 • by Greg (unregistered)
394996 in reply to 394937
Absolutely. The only WTF here is that the submitter is working on a data warehouse but doesn't understand data warehousing query optimization. Yes, you can use a date function, but the query optimizer has a much more difficult time optimizing queries against values it can only determine at runtime. Best just to compute the values in advance and maintain the table.

The project manager was correct, although I'm sure he was just passing it along from a DBA who actually knew what they were talking about.

Re: The Overloaded Time Table

2012-11-15 11:38 • by Paul Neumann (unregistered)
Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

Okay, here goes: That table looks perfectly denormal to me.

Re: The Overloaded Time Table

2012-11-15 11:57 • by cellocgw
394999 in reply to 394932
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


Error: Hour Not Found

Of course

Re: The Overloaded Time Table

2012-11-15 11:58 • by Chris (unregistered)
The only WTF is the need for that type of table.

Re: The Overloaded Time Table

2012-11-15 12:10 • by Cam (unregistered)
395001 in reply to 394945
You've obviously never written any MDX or even know what it is, so I'm just going to keep laughing at your ignorant comment that doesn't make any sense.

JC is spot on.

Re: The Overloaded Time Table

2012-11-15 12:21 • by Zapp Brannigan (unregistered)
The third rule of data warehousing is to forget everything you know about relational databases. The first and second rules are to not talk about it.

Re: The Overloaded Time Table

2012-11-15 12:53 • by Alan (unregistered)
395004 in reply to 394992
Mike:
This is just silly talk I realize but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly if they insist on using a table. Ie you install you OS/SQL with a specified time format and as part of installation any stupid date table it wants gets built once as a system level table? Also I'm not convinced that even with good caching that it would be faster to have a table than just to compute the required value on the fly especially if you can push that off onto the client so you don't tie up your server figuring things like that out.


So, here's the situation. First, the server is setup and designed specifically to be tied up figuring out things like that. Pushing things to the client sounds like a cool plan, until you need to push a billion pieces of data to the client to "compute the required value on the fly" to have the client then turn around and send back a billion computed rows of data so the server can then continue finding the appropriately related data.

Also, it is definitely faster to compute the date settings once at data load in, and then not have to have them recomputed every time 1 of 100 different users wants to run 1 of 100 different reports.

The big take away is this: Data warehousing systems are specifically designed to read and compute massive amounts of data, probably beyond what you're thinking because data is typically pulled from multiple sources and it is important to have historical data for trending.

Re: The Overloaded Time Table

2012-11-15 13:10 • by David Aldridge (unregistered)
395005 in reply to 394937
Yep, I don't think that there's anything abnormal or particularly inappropriate about this.

The query optimiser often cannot work out how many of the dates in a table are Sunday without reference to a column storing day_of_week, and front-end query tools often do not have good syntax for working out these attributes on the fly.

Re: The Overloaded Time Table

2012-11-15 13:28 • by Coyne
395006 in reply to 394955
snoofle:
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
HOUR_NOT_FOUND?


Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND

Re: The Overloaded Time Table

2012-11-15 13:47 • by OH GOD IT BURNS! (unregistered)
395007 in reply to 394975
Ralph:
... The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.


I believe they think "For Dummies" is actually a requirement.

"Any idiot can use a computer. Many do."

Re: The Overloaded Time Table

2012-11-15 13:58 • by starsky51 (unregistered)
395008 in reply to 394968
David:
At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

They're called Business Users and without them, there'd be no market for data warehousing. God bless em!

Re: The Overloaded Time Table

2012-11-15 13:58 • by Dimse (unregistered)
395009 in reply to 394942
I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

Can anyone explain why such a tablet is bad in a data warehouse context?

Re: The Overloaded Time Table

2012-11-15 14:01 • by BlueBearr (unregistered)
From a security standpoint I'm disappointed that this article includes actual customer data like that. Where is the obfuscation?

Re: The Overloaded Time Table

2012-11-15 14:03 • by Herp (unregistered)
395011 in reply to 395009
Dimse:
I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

Can anyone explain why such a tablet is bad in a data warehouse context?


Perhaps the submitter would like to, because everyone else is asking the same thing!

Re: The Overloaded Time Table

2012-11-15 14:31 • by David (unregistered)
I guess the real WTF here is a boss who knows what he is doing... because the submitter clearly doesn't understand the purpose of this.

In fact I've written something much like this. Sure there are Oracle/SQLServer functions that can compute the last business day of the month, but then there are the one-off situations: The day the storm knocked out power to the entire state on what would have been the last business day of the month, and now all our reports need to treat that day as a holiday. The day the accounting system crashed. When business unit A has a different holiday schedule than B. So what did I do:

(a) write a bunch of functions to adjust the date for these one-off issues.
(b) create a view with functions in the columns
(c) realize performance sucks when you try to join against this
(d) materialize the view... and its a table now.


The other alternative (which I rejected) was to duplicate the last reported data with a new effective date. Problem is that the duplicated data can confuse other reports that need want to see unique entries.

The big table of dates actually works pretty well:

select calendar_table.reported_date, data_table.x from data_table, calendar_table where calendar_type='Adjusted_Business' and country='US' and business_unit='Boston' and data_table.effective_date=calendar_type.effective_date

And the query will duplicate the entries you needed duplicated, while others can still access the raw (uncorrected and unduplicated data). Now everyone is happy and I only wasted a few megabytes on desks.

Re: The Overloaded Time Table

2012-11-15 14:49 • by RB (unregistered)
Actually, the original request to create a time/date table which has all sorts of fields is completely accepted as best practice for data warehouses, and for good reason. Check out the works of Kimball and 'date dimensions' for info. You want to be able to drop a field called 'quarter' onto a report and have it aggregate facts to that level. You might not be working with a tool that allows or expects you to write the sql query to calculate the function- and indeed it might be quite slow in performance terms to have the query do all the work when it can just read it out. That's why denormalisation is good in a data warehouse, and that's why you do lots of extra work in the ETL when you can- to make the warehouse easy to understand and use, and perform fast when you use it.


The table you include is a little bit over the top though!

Re: The Overloaded Time Table

2012-11-15 15:07 • by Cognos Dummy (unregistered)
395020 in reply to 394975
That's funny because I actually own the book "Data Warehousing for Dummies". i got it when I was learning Cognos many years ago.

Re: The Overloaded Time Table

2012-11-15 15:27 • by TheCPUWizard
395021 in reply to 395009
Dimse:

Can anyone explain why such a tablet is bad in a data warehouse context?


If someone has to explain why a tablet is a poor choice compared to a good SMP server, then all is lost.

Re: The Overloaded Time Table

2012-11-15 15:48 • by C-Derb (unregistered)
Feels like we are still 60-70 comments short of reaching a consensus that this is not a WTF. This conversation has become eerily similar to "Blame Peter".

Re: The Overloaded Time Table

2012-11-15 15:50 • by nelson_no_nose (unregistered)
395023 in reply to 394949
Anonymoose:
letatio:
JC:
One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

And for anyone who has written MDX, thats a VERY good thing.


In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you before hand. I pity the fool who hires you.


No, what I tink he really meant to say is that tables like this are pretty normal for the backing of a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.


I think I understand what you are saying. If they didn't denormalise so much they wouldn't have enough data to earn the title warehouses?

Re: The Overloaded Time Table

2012-11-15 15:53 • by DeliriousHippie (unregistered)
395024 in reply to 395004
I also laughed at this. I work as BI consultant. In my work we always have calendar-table. One of my customer has 1 million customers, think about their fact-table. 'Push calculations to client side', huh.

Captcha: saepius, almost intelligent human

Re: The Overloaded Time Table

2012-11-15 16:22 • by TheRider
But, but, with such a table *anything* can happen!

Re: The Overloaded Time Table

2012-11-15 16:30 • by erat (unregistered)
395026 in reply to 394932
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


I think daylight saving changes are the biggest WTF of all times. It creates so much confusion that cannot be properly represented. For example, a baby can be born at 2:30 AM in some country on a certain date, and die at 2:10 AM on the exact same date - and how do you record that properly, assuming that there are a ton of automated processes depending on those two datetime values?

So what if the sun comes out one hour earlier/later (relative to the local clock), compared to with daylight saving enforced? Adjust business/school hours related to that rather, for fuck sake. If I could go back in time to murder the motherfucker who first thought of this, I would have.

I'm tempted to make the same statement about timezones. So what if the sun comes out at, e.g. 2 AM in some country over there and goes down at 4 PM? We have those kind of oddities that we have to deal with anyway (e.g. close to north pole). Those guys up north still count 60 minuts for an hour. Using that timezone/daylight-savings crap of the logic, their hours should contain less or more minutes than 60, am I fucking right?

Re: The Overloaded Time Table

2012-11-15 16:46 • by dkallen (unregistered)
395027 in reply to 395026
erat:
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


<snip>

am I fucking right?


Does your aptitude for bedroom acrobatics depend upon whether DST is in effect?

Re: The Overloaded Time Table

2012-11-15 17:10 • by Dann of Thursday (unregistered)
395029 in reply to 395026
erat:
am I fucking right?


Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.

Re: The Overloaded Time Table

2012-11-15 17:19 • by urbalt (unregistered)
395030 in reply to 395026
erat:
I think daylight saving changes are the biggest WTF of all times.


I see what you did there.

Re: The Overloaded Time Table

2012-11-15 18:02 • by Gekko (unregistered)
As ugly as it looks in a Normalized Data Store, that is classical Dimensional Data Store. Nothing bad about it if it is used in a Data Warehouse environment.

Re: The Overloaded Time Table

2012-11-15 19:03 • by caecus (unregistered)
395032 in reply to 395029
Dann of Thursday:
erat:
am I fucking right?


Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.


It's twice per year locally, but for a party handling this for multiple countries, it's that many times times 2. And that goes for the daylight-savings thingy.

About timezones: I think it would be nice if same datetime values relates to the same moment everywhere.

Am I fucking right?

Re: The Overloaded Time Table

2012-11-15 19:20 • by danbruc (unregistered)
TRWTF is column YYYYMMDD of type varchar(11).

Re: The Overloaded Time Table

2012-11-15 20:58 • by PiisAWheeL
395038 in reply to 394995
C-Derb:
Could someone please denormalize this sentence?

The Overloaded Time Table:
Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
It seems to be pretty denormalized to me but here:
Terje red explained the her's fears grepping on top of the source behind references and with to this table he as think the knowledge the knowledge may reason find may very well kill him, and i certainly can't don't blame him.

Now if you want it NORMALIZED... well, that is beyond my ability.

Re: The Overloaded Time Table

2012-11-15 21:02 • by PiisAWheeL
395039 in reply to 395033
danbruc:
TRWTF is column YYYYMMDD of type varchar(11).
That allows for up to 2 delimiters and a 5 digit Year.

ie
12492-11-23 for November 23rd, 12492 A.D.

Its good right up until 99999 A.D. Then we have to worry about Y100K.

Re: The Overloaded Time Table

2012-11-15 21:03 • by Denorman Diamond (unregistered)
395040 in reply to 394997
Paul Neumann:
Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

Okay, here goes: That table looks perfectly denormal to me.
We were storing all duplications of those comments in a data warehouse but disk space overflowed.

Re: The Overloaded Time Table

2012-11-16 01:22 • by AndyCanfield
What is "NextFactDayLocal"? If that means the next day that the local factory will be running, that table column makes sense. If it means the next day that the programmer will know his facts, you can initialize it all to NULL.

Re: The Overloaded Time Table

2012-11-16 01:59 • by Matt Westwood
395043 in reply to 394951
Mike S:
Actually, one part of that sane. It's useful to store business days in the database because computing holidays can sometimes be iffy. (Especially if you're a multinational, and have to know if an arbitrary date is a business day in different countries.)


What you would then do is write a dedicated function to return whether a particular date is a business day in a particular country. This will of course probably involve some sort of table somewhere (unless you can find an online resource to interrogate) and will probably require regular (probably annual) maintenance when you learn what the holidays are going to be for the regions you need to service.

But then if you're storing the date data as defined in this WTF, you're probably going to have to do this anyway, and because of the unwieldiness of the table this will be a more difficult job.

Re: The Overloaded Time Table

2012-11-16 05:02 • by Scarlet Manuka
Boring! Our BI date table has (I kid you not) 184 columns... I think about three of those are ours, the rest were all standard with the BI platform.

Re: The Overloaded Time Table

2012-11-16 07:32 • by faoileag (unregistered)
395060 in reply to 395039
PiisAWheeL:
danbruc:
TRWTF is column YYYYMMDD of type varchar(11).
That allows for up to 2 delimiters and a 5 digit Year.

Silly me for thinking YYYYMMDD might be a dead giveaway as for what format the dates in this column are supposed to be...

Re: The Overloaded Time Table

2012-11-16 08:12 • by sqlblindman (unregistered)
395066 in reply to 394991
Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.

Re: The Overloaded Time Table

2012-11-16 08:37 • by Cc (unregistered)
Seems like a typical date dimension table to me. I don't see the wtf factor of this one.

Re: The Overloaded Time Table

2012-11-16 10:27 • by chris (unregistered)
395080 in reply to 395006
Coyne:
snoofle:
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
HOUR_NOT_FOUND?


Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND

It's surely simplest to store the extra hour from fall in the empty space in spring.

Re: The Overloaded Time Table

2012-11-16 10:34 • by Paul Neumann (unregistered)
395082 in reply to 395066
sqlblindman:
Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
INail problem = NailFactory.get(TNail.type);
is perfectly standard nonsense for a depressingly large number as well, apparently.

Re: The Overloaded Time Table

2012-11-16 13:42 • by IanAndNeilsDad (unregistered)
395089 in reply to 395082
Paul Neumann:
sqlblindman:
Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
INail problem = NailFactory.get(TNail.type);
is perfectly standard nonsense for a depressingly large number as well, apparently.

Re: The Overloaded Time Table

2012-11-16 13:58 • by hunter9000
Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him

Hiring an editor who speaks English would do this site a world of good.

Re: The Overloaded Time Table

2012-11-16 14:57 • by Jason (unregistered)
Date/Time dimension tables are very common in data warehouses and are recommended as a best practice for a large DW. 11,000 rows in a dimension table in a DW is peanuts compared to the rest of the database. Denormalization is the key to warehouse performance. I don't think this is very WTF-worthy.

Re: The Overloaded Time Table

2012-11-16 19:14 • by Kim (unregistered)
Not a WTF. Real WTF is developer doing BI-related work without understanding BI-fundamentals.

Think about it this way; the business user wants to create a report. His fact table has a granularity of thousands of transactions per day. He wants to be able to see the numbers of transactions summed up to the yearly total, but with the ability to 1: drill down to any level, so for instance he'll see something like this:
2012
+1st. Quarter
-2nd. Quarter
-+April
-+May
--June
--+Week 22
--+Week 23
--+Week 24
----Monday
----Tuesday
----Wednesday
----Thursday
----Friday
----Saturday
----Sunday
--+Week 25
--+Week 26
+3rd. Quarter
+4th. Quarter

And with the sums correctly calculated on each level shown. Imagine how poorly an SQL that would have to calculate the actual quarter, month, week for EACH date you have in your dataset, possibly millions (or even billions) of rows large.

The reason why we build data warehouses using denormalized models is due to mainly two factors:
1. Speed of query - it's much faster to query a denormalized structure that only considers data for read optimization.
2. Ease of use for businesses, it's a much easier concept to grasp when you can pick from easy to understand groups of related hierarchical structures called dimensions, where you can actually ask the question : "how many sales did we have every month per region and per store, compared year-over-year with last years results" without really knowing SQL, cause "how many sales" = fact, region and store = organisational hierarchy, month and year-over-year= time hierarchy.

Re: The Overloaded Time Table

2012-11-16 21:52 • by Aaron (unregistered)
Sorry, but this is a very common practice in DW applications and is actually recommended by many designs such as the popular Kimball methodologies.

Sure, the database can do all the time functions for you, however this is harder for users to understand. Typically, a reporting tool like Business Objects allows them to drag and drop dimensions into the report to sort, pivot, etc. Having a selection of precalculated date and time fields is MUCH easier than building functions in the reports. Even the space of having a table with one row per minute for 30 years is nothing compared to the size of most fact tables.

Not a FAIL at all....

Re: The Overloaded Time Table

2012-11-17 01:39 • by Nick S. (unregistered)
+1 TRWTF is the poster.
« PrevPage 1 | Page 2 | Page 3Next »

Add Comment