Comment On The Overloaded Time Table

A few years ago, while on a SQL Server data warehousing project, my project manager sent me a request - create a table with all kinds of information based on a given date. You know – given a date, be able to figure out day of the year, day of the week, month, last business day of the week, and so on, making sure the table stores enough dates to keep it maintenance free. Oh, and one more thing, it’s going to be linked off of the already complex star schema that we were developing for. Why? For reporting purposes of course. [expand full text]
« PrevPage 1 | Page 2 | Page 3Next »

Re: The Overloaded Time Table

2012-11-15 08:04 • by Anony Mouse (unregistered)
I would post the frist comment, but I don't have time to dig through the APIs and write the interface to do it.

Re: The Overloaded Time Table

2012-11-15 08:13 • by JM (unregistered)
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?

Re: The Overloaded Time Table

2012-11-15 08:13 • by Danielle (unregistered)
Why use sql date functions? They might add a new day to the week! This way, the application will continue to work ;-) LOL!

Re: The Overloaded Time Table

2012-11-15 08:14 • by Danielle (unregistered)
394934 in reply to 394932
Maybe they have to change it every time daylight savings changes! That generates work! (It's good right? :P )

Re: The Overloaded Time Table

2012-11-15 08:14 • by Hegel (unregistered)
Only 2 fields in hours table? Now that's a real wtf, the following is absolute minimum.

describe hours;
+---------------+---------------------+-----+
| Field | Type | Null | Key |
+---------------+---------------------+-----+
| hour_id | guid | NO | PRI |
| next_hour_ref | guid | NO | |
| prev_hour_ref | guid | NO | |
| name | varchar(255) | NO | |
| value_12h | tinyint(1) | NO | |
| value_24h | tinyint(1) | NO | |
| is_night_hour | bool | NO | |
+---------------+---------------------+-----+

Re: The Overloaded Time Table

2012-11-15 08:15 • by Geoff (unregistered)
Seems like there might be aware out. What if drops the table and creates a view by the same name, the view will just be a select from a table valued function that generates the dates?

Still ugly but might swap expensive disk I/O for comparatively cheap CPU.

Re: The Overloaded Time Table

2012-11-15 08:15 • by JC (unregistered)
One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

And for anyone who has written MDX, thats a VERY good thing.

Re: The Overloaded Time Table

2012-11-15 08:17 • by pif (unregistered)
Wonderful, nearly perfect!

You know how Monday is dreaded because it's the first day of the working week. Suppose we vote to swap Mondays and Tuesdays starting from tomorrow: that table will do the work after a SET, while the rest of the world will have to wait for the next version of their RDBMS.

But something is still missing: the global chronological index! What if we decide that 27-Oct-2002 came before 3-Feb-1998?

CAPTCHA: praesent: indeed, talking about time: what comes after the paest and before the fauture.

Re: The Overloaded Time Table

2012-11-15 08:20 • by Anon (unregistered)
That's pretty standard for datawarehouse tables... *everything* is pre-calculated, you don't do anything on the fly... plus this table may be small, but the ones with which you join this one may not...

Re: The Overloaded Time Table

2012-11-15 08:24 • by Nagesh
I completely believe all the tables are very much required for smooth operation of application's report features.

Re: The Overloaded Time Table

2012-11-15 08:26 • by PM (unregistered)
I know this kind of table very well.
Most of the time i encounter it, it was created while creating a OLAP-Cube with Microsoft SQL-Server.
The Wizard for creating cubes often notes, that there is no Time-Dimension Table and asks if the user wants to create this table. You can even check which of all those fields you want.
It seems to me, this table was created exactly as such.

See this screenshot from the Wizard:

Re: The Overloaded Time Table

2012-11-15 08:29 • by foo (unregistered)
394944 in reply to 394936
Geoff:
Seems like there might be aware out. What if drops the table and creates a view by the same name, the view will just be a select from a table valued function that generates the dates?

Still ugly but might swap expensive disk I/O for comparatively cheap CPU.
Or you just buy 3MB more RAM (~10000 rows, ~300 bytes per row) and swap expensive work (including testing) for a few cents of hardware. :)

Re: The Overloaded Time Table

2012-11-15 08:41 • by letatio (unregistered)
394945 in reply to 394937
JC:
One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

And for anyone who has written MDX, thats a VERY good thing.


This is only true if either you are terrified of joins (because you don't understand them) or you are normalising into the extreme.

I believe the former applies to your case. Generally speaking, join equals denormalise (on demand). When selecting, a denormalised database gives the illusions of being easy to query. However, these queries probably don't consider all of the integrity problems denormalised storage is vulnerable to. Problems you may have to check for, unless someone extremely adept has written the code for anything that modifies data. Writing queries to take into account such possible anomalies may not only be hard, but may be impossible.

In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you before hand. I pity the fool who hires you.

Re: The Overloaded Time Table

2012-11-15 08:50 • by DWDBA (unregistered)
To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.

Re: The Overloaded Time Table

2012-11-15 08:55 • by Anonymoose (unregistered)
394949 in reply to 394945
letatio:
JC:
One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

And for anyone who has written MDX, thats a VERY good thing.


In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you before hand. I pity the fool who hires you.


No, what I tink he really meant to say is that tables like this are pretty normal for the backing of a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.

Re: The Overloaded Time Table

2012-11-15 09:01 • by Mike S (unregistered)
Actually, one part of that sane. It's useful to store business days in the database because computing holidays can sometimes be iffy. (Especially if you're a multinational, and have to know if an arbitrary date is a business day in different countries.)

Re: The Overloaded Time Table

2012-11-15 09:07 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
Once there was a programmer who needed some date calculations. So he decided to use a table.

Then he had two problems.

Re: The Overloaded Time Table

2012-11-15 09:11 • by Steve H (unregistered)
Not a WTF at all.

This is best practice for a star schema DW.

Re: The Overloaded Time Table

2012-11-15 09:13 • by DonaldK (unregistered)
Performing a joined query with a SIMPLE "date" table can indeed simplify the exercise tremendously, especially when you're interesting in a range of records being returned but the data is stored with from/to date values... it's the difference between a very simple query and a reasonably painful stored procedure or even application code...

Re: The Overloaded Time Table

2012-11-15 09:15 • by snoofle
394955 in reply to 394932
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
HOUR_NOT_FOUND?

Re: The Overloaded Time Table

2012-11-15 09:16 • by mott555
This is standard for data warehouses. It only looks like a WTF to people used to working with normalized databases for production application use. Data warehouses are a completely different topic.

Re: The Overloaded Time Table

2012-11-15 09:17 • by DonaldK (unregistered)
394957 in reply to 394944
foo:
Or you just buy 3MB more RAM


I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...

Re: The Overloaded Time Table

2012-11-15 09:21 • by snoofle
394958 in reply to 394957
DonaldK:
foo:
Or you just buy 3MB more RAM


I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...
True; these days, the paperwork to requisition it costs more than the memory itself.

Re: The Overloaded Time Table

2012-11-15 09:23 • by Matt (unregistered)
This is a perfectly normal thing to do in a data warehouse.
Once you've got a few million rows in a fact table, filtering and grouping by DATENAME() functions will get really slow, whereas looking up the values in a date dimension will perform just like the lookups on any other dimension table.
It also allows you to capture all the other date-related information which there isn't a function for - public holidays and the like.

Re: The Overloaded Time Table

2012-11-15 09:26 • by TGV
Perfectly sane, except for DST problems.

Re: The Overloaded Time Table

2012-11-15 09:28 • by Anonymouse (unregistered)
The real WTF is that this is on TDWTF.

Date dimension tables are entirely normal and basically a mandatory standard in the data warehousing world. Actually, the problem with this particular table is that it's missing certain important things, such as IsWorkDay, IsHoliday and DST changes. The jury is still out on whether the primary key should be a natural (date) or a surrogate (int) in date dimension tables, as that's one of the major IT holy wars.

Re: The Overloaded Time Table

2012-11-15 09:30 • by Alan (unregistered)
TRWTF is a data warehouse "developer" who doesn't know that pre-calculating all these values makes OLAP queries much quicker, which is the point of a data warehouse.

Re: The Overloaded Time Table

2012-11-15 09:32 • by ogre (unregistered)
For those people that don't understand datawarehousing this table seems redundant but it is not.
It will greatly reduce query times.

For those saying that you should not be scared of joins, please google "Snowflake" and you will see it is not a good thing when working with datawarehouses.

Re: The Overloaded Time Table

2012-11-15 09:34 • by David (unregistered)
OK, the votes are now in and it looks like the whole of Florida has commented that "this is perfectly normal for data warehousing", twice. So TRWTF is Florida, then?

Re: The Overloaded Time Table

2012-11-15 09:34 • by @Deprecated
394966 in reply to 394935
Hegel:
Only 2 fields in hours table? Now that's a real wtf, the following is absolute minimum.

describe hours;

+---------------+---------------------+-----+
| Field | Type | Null | Key |
+---------------+---------------------+-----+
| hour_id | guid | NO | PRI |
| next_hour_ref | guid | NO | |
| prev_hour_ref | guid | NO | |
| name | varchar(255) | NO | |
| value_12h | tinyint(1) | NO | |
| value_24h | tinyint(1) | NO | |
| is_night_hour | bool | NO | |
+---------------+---------------------+-----+


You should make all your columns VARCHAR.
That way you can handle times like value_24h='16am'.

Re: The Overloaded Time Table

2012-11-15 09:35 • by Cbuttius
I see a time table, but don't see any reference of which train to catch.

Re: The Overloaded Time Table

2012-11-15 09:35 • by David (unregistered)
394968 in reply to 394948
DWDBA:
reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data.


At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

Theres a lesson to be learned there. If only someone smart could work out what it is.

Re: The Overloaded Time Table

2012-11-15 09:36 • by lanmind
If you're not using a date table, and you think doing the calculations on the fly is better approach, then you're doing it wrong. This applies to normal database development every bit as much as data warehousing.

Re: The Overloaded Time Table

2012-11-15 09:45 • by Foo Bar (unregistered)
394970 in reply to 394942
A Time Table inside an OLAP Cube. Is that a nature's harmonic simultaneous 4-day time cube?

Re: The Overloaded Time Table

2012-11-15 09:49 • by DM (unregistered)
394972 in reply to 394937
This isn't a WTF...

Re: The Overloaded Time Table

2012-11-15 10:04 • by 3rd Ferguson (unregistered)
My current implementation has several time dimension tables at different levels. So there's a day table, a month table, a quarter table and a year table. The tables get narrower as their scope gets wider. We maintain SQL whose sole purpose is to populate all the columns in all the tables, so if a column is ever added we know what all the values in all the tables will be.

Sure, the day table has 10k or 15k rows, but it mostly gets used during ETL, where most large input tables get rolled to the month level (at their most granular). Users who are crafting reports using the off-the-shelf GUI tool normally only use the quarter and year tables and performance is great for that.

I think these guys went a little off on their columns, but I can't say much because my day table has both the old-school Excel equivalent integer date and the new-school.

Re: The Overloaded Time Table

2012-11-15 10:09 • by Ralph (unregistered)
394975 in reply to 394968
David:
At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

Theres a lesson to be learned there. If only someone smart could work out what it is.
The smart people already have done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.

Re: The Overloaded Time Table

2012-11-15 10:12 • by galgorah
394976 in reply to 394948
DWDBA:
To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.


Your absolutely right. A Date dimensional table is very common and much much faster for large datawarehouse queries. Datawarehousing is very different from OLTP systems and standard rules of good vs bad change a lot.

Also in general using function calls within your sql code is a performance hit especially if its in the where clause, because it causes that function to be run for every row in the table. It's even worse when you nest them in a complex expression because it confuses the query optimizer and you'll often get a bad plan as a result. At least if the function call is within the list of columns to output, you only run it for those rows you are actually returning.

Re: The Overloaded Time Table

2012-11-15 10:12 • by darkmattar
Someone has obviously never worked on a good Star-Schema Data Warehouse before? A few of those columns are sketchy, but the majority are extremely useful, especially when dealing with a FISCAL calendar rather than a Julian one. Good luck doing SQL Server date functions to calculate the next month's start when months vary from 28 to 42 days in length with no tie to a regular calendar in any way.

Also, the dimension tables for all the time / date related information is to allow Cognos(the real WTF?) or other Business Reporting systems to print User Strings for date/time/hour descriptions instead of having to use intelligent keys and some other hard-wired translation.

Addendum (2012-11-15 10:21):
I must say though, I may disagree with the prev_prior_etc columns though. Especially if those are a poor attempt at a slowly changing dimension. Using columns to mimic a fixed size slowly changing dimension is a terrible hack of a solution 99% of the time.

Addendum (2012-11-15 10:27):
Or is that prev_prior to connect to 2 years back?
They really went to the extreme with the Year over Year over Year over Year reporting columns!

Re: The Overloaded Time Table

2012-11-15 10:12 • by RichP
394978 in reply to 394935
Hegel:
Only 2 fields in hours table? Now that's a real wtf, the following is absolute minimum.

describe hours;
+---------------+---------------------+-----+
| Field | Type | Null | Key |
+---------------+---------------------+-----+
| hour_id | guid | NO | PRI |
| next_hour_ref | guid | NO | |
| prev_hour_ref | guid | NO | |
| name | varchar(255) | NO | |
| value_12h | tinyint(1) | NO | |
| value_24h | tinyint(1) | NO | |
| is_night_hour | bool | NO | |
+---------------+---------------------+-----+


Last field should be named "is_midnight_hour" so one can do:

' Check in by wpickett
' Code reviewed by scropper
do {
wait()
} until { is_midnight_hour }

me->love("begin to shine")

Re: The Overloaded Time Table

2012-11-15 10:26 • by Hegel (unregistered)
It seems obvious, that there is an equivalent need for a truth table with 3 rows in it.

Re: The Overloaded Time Table

2012-11-15 10:40 • by JC (unregistered)
394980 in reply to 394945
letatio:
JC:
One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

And for anyone who has written MDX, thats a VERY good thing.


This is only true if either you are terrified of joins (because you don't understand them) or you are normalising into the extreme.

I believe the former applies to your case. Generally speaking, join equals denormalise (on demand). When selecting, a denormalised database gives the illusions of being easy to query. However, these queries probably don't consider all of the integrity problems denormalised storage is vulnerable to. Problems you may have to check for, unless someone extremely adept has written the code for anything that modifies data. Writing queries to take into account such possible anomalies may not only be hard, but may be impossible.

In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you before hand. I pity the fool who hires you.


You clearly have no experience of DataWarehousing whatsoever. This has nothing to do with "Joins". Troll harder next time.

Re: The Overloaded Time Table

2012-11-15 10:42 • by TGV
394982 in reply to 394980
JC:
Troll harder next time.
I LOL'ed.

Re: The Overloaded Time Table

2012-11-15 10:44 • by Michael (unregistered)
This is the first time that i have to comment on a Daily WTF post because i cannot agree with it at all. I'm working in a BI department of a large company and there are several reasons to use this kind of tables.

For the hours of day table one thing you might want to have is a report like that shows the revenue for each hour (or each day) - the problem is that if you have no revenue for an hour all built-in functions (like to_char(date,'hh24') will just ignore it and it will be missing from the resultset. These tables solve the problem.

select hour, sum(revenue)
from tblHours left join tblRevenue on to_char(date,'hh24') = hour
group by hour

Other aspects that are much easier using these tables:
* Custom Fiscal Year that cannot be represented using standard functions (starting 1st of Sept this year but 2nd of Sept next year)
* Custom Months following a 4-4-5 calendar
* Restricting to specific weekdays and not bothering with language settings (eg - first day of week can be Sunday or Monday - weekday can be Sunday or Sonntag - week counting can start with the first full week or the week with the first monday in a year - all of them depending on language settings of the client)
* Changing the rules without being forced to change all the functions in the reports

There are many more of these ...

greetings

Michael

Re: The Overloaded Time Table

2012-11-15 10:49 • by foo2 (unregistered)
394984 in reply to 394948
DWDBA:
To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.

Yeah, this. I've done similar tables for the same reasons, speed and there's only so many times you can tell people how to get various bits out of a date.

"Here, add this and pick what you need." That said, the table here is a bit over the top in its WTF-ness.

(And a simple weekday/weekend flag doesn't allow for holidays etc.)

Re: The Overloaded Time Table

2012-11-15 10:53 • by Anon (unregistered)
394985 in reply to 394939
Exactly. The OP sounds like someone who understands relational databases for OLTP -- in which case this would be WTF code -- but doesn't quite grok a star schema for a data warehouse. The fact that _he_ could write date functions in code is missing two points: 1. The table can be used to populate e.g. dropdown lists in reports, for users. 2. With appropriate indexing the report query may well run faster.

Re: The Overloaded Time Table

2012-11-15 10:57 • by db2
Having a time/date dimension table in a data warehouse is, as stated, perfectly normal.

This one does appear to be going a bit overboard with columns, though. A lot of those are probably better handled by MDX functions like ParallelPeriod (which introduces its own special brand of leap-year WTFs).

Re: The Overloaded Time Table

2012-11-15 10:59 • by Michael (unregistered)
I once worked on an enterprisey database where I found a table with multiple fields of one was "DayOfWeek" which for some strange reason had values 1 .. 31. A colleague of mine explained:"Yes, erm, we mislabeled that field."

Re: The Overloaded Time Table

2012-11-15 11:01 • by zzalpha (unregistered)
Yeah, sorry buddy, but you clearly don't have experience in data warehousing. This is *precisely* how a date dimension should be designed. Using SQL server date functions to filter large volumes of fact data is utter madness. And that's ignoring the fact that things like the definitions for quarters, week numbers, and so forth, is variable and quite complicated to calculate... there's no way I'd want to be doing that at the database level at query time.

Please, if you don't understand this stuff, go read a book or two by Kimball on data warehousing. Because this is *fundamental* to the way warehouses are designed.

Re: The Overloaded Time Table

2012-11-15 11:09 • by Steve The Cynic
394990 in reply to 394975
Ralph:
The smart people already have done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.

HEY! There's nothing wrong with Guitar Exercises for Dummies! Well, except that it doesn't tell you much about Data Warehousing.

Of course, I'm trying to learn how to play guitar, so I don't have that problem, and I know that the real purpose of USB ports is connecting Rocksmith cables.
« PrevPage 1 | Page 2 | Page 3Next »

Add Comment