• Anony Mouse (unregistered)

    I would post the frist comment, but I don't have time to dig through the APIs and write the interface to do it.

  • JM (unregistered)

    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?

  • Danielle (unregistered)

    Why use sql date functions? They might add a new day to the week! This way, the application will continue to work ;-) LOL!

  • Danielle (unregistered) in reply to JM

    Maybe they have to change it every time daylight savings changes! That generates work! (It's good right? :P )

  • Hegel (unregistered)

    Only 2 fields in the hours table? Now that's a real WTF; the following is the absolute minimum.

    describe hours;

    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+

  • Geoff (unregistered)

    Seems like there might be a way out. What if one drops the table and creates a view with the same name? The view would just be a select from a table-valued function that generates the dates.

    Still ugly but might swap expensive disk I/O for comparatively cheap CPU.
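
    A rough sketch of that idea in T-SQL, assuming SQL Server; all object names here are illustrative, not from the original schema:

    -- Inline table-valued function that generates @NumDays consecutive
    -- dates (plus a few derived columns) from a tally built off the
    -- system catalog, so nothing is stored on disk.
    CREATE FUNCTION dbo.GenerateDates (@StartDate date, @NumDays int)
    RETURNS TABLE
    AS
    RETURN
        SELECT d.TheDate,
               DATEPART(year,    d.TheDate) AS TheYear,
               DATEPART(quarter, d.TheDate) AS TheQuarter,
               DATEPART(month,   d.TheDate) AS TheMonth,
               DATENAME(weekday, d.TheDate) AS DayName
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
              FROM sys.all_objects a CROSS JOIN sys.all_objects b) AS t
        CROSS APPLY (SELECT DATEADD(day, t.n, @StartDate) AS TheDate) AS d
        WHERE t.n < @NumDays;
    GO

    -- Recreate the old table name as a view so existing queries keep working.
    CREATE VIEW dbo.DateDimension AS
    SELECT TheDate, TheYear, TheQuarter, TheMonth, DayName
    FROM dbo.GenerateDates('2000-01-01', 11323);  -- 2000-01-01 through 2030-12-31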

  • JC (unregistered)

    One thing I've learnt is that the uglier and more denormalised a data warehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.

  • pif (unregistered)

    Wonderful, nearly perfect!

    You know how Monday is dreaded because it's the first day of the working week. Suppose we vote to swap Mondays and Tuesdays starting from tomorrow: that table will do the work after a single UPDATE ... SET, while the rest of the world will have to wait for the next version of their RDBMS.

    But something is still missing: the global chronological index! What if we decide that 27-Oct-2002 came before 3-Feb-1998?

    CAPTCHA: praesent: indeed, talking about time: what comes after the paest and before the fauture.

  • Anon (unregistered)

    That's pretty standard for data warehouse tables... everything is pre-calculated; you don't do anything on the fly... plus this table may be small, but the ones you join it with may not be...

  • (cs)

    I completely believe all the tables are very much required for the smooth operation of the application's report features.

  • PM (unregistered)

    I know this kind of table very well. Most of the time I encounter it, it was created while building an OLAP cube with Microsoft SQL Server. The cube-creation wizard often notices that there is no time-dimension table and asks whether the user wants to create one. You can even tick which of all those fields you want. It seems to me this table was created exactly that way.

    See this screenshot from the Wizard: [image]

  • foo (unregistered) in reply to Geoff
    Geoff:
    Seems like there might be a way out. What if one drops the table and creates a view with the same name? The view would just be a select from a table-valued function that generates the dates.

    Still ugly but might swap expensive disk I/O for comparatively cheap CPU.

    Or you just buy 3MB more RAM (~10000 rows, ~300 bytes per row) and swap expensive work (including testing) for a few cents of hardware. :)

  • letatio (unregistered) in reply to JC
    JC:
    One thing I've learnt is that the uglier and more denormalised a data warehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.

    This is only true if either you are terrified of joins (because you don't understand them) or you are normalising to the extreme.

    I believe the former applies in your case. Generally speaking, a join equals denormalisation (on demand). When selecting, a denormalised database gives the illusion of being easy to query. However, those queries probably don't account for all of the integrity problems denormalised storage is vulnerable to; problems you may have to check for unless someone extremely adept has written the code for everything that modifies data. Writing queries that take such possible anomalies into account may not only be hard, it may be impossible.

    In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by a join with ease. You need joins to be done for you beforehand. I pity the fool who hires you.

  • DWDBA (unregistered)

    To all those who are saying this is unnecessary and WTF-worthy, let me tell you this is perfectly standard in the DW world. While this table is missing some columns that I would personally add (e.g. weekday vs. weekend), it is still very useful for data warehouses, for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at the data. I haven't seen a lot of reporting tools generate good SQL or use functions properly, so this is a huge help.

    2. It actually makes querying a lot faster as well, by avoiding on-the-fly calculations, which can be very expensive when run against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow the query down by a non-trivial amount, besides causing CPU issues.
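
    To illustrate point 2, here is a hedged sketch of the same report written both ways; FactSales, DimDate and DateKey are illustrative names, not from the article:

    -- Slow: DATENAME() runs for every fact row, and grouping on a
    -- computed expression gives the optimizer little to work with.
    SELECT DATENAME(weekday, f.SaleDate) AS DayName, SUM(f.Amount) AS Revenue
    FROM FactSales f
    GROUP BY DATENAME(weekday, f.SaleDate);

    -- Faster: the work was done once at load time; now it's a plain
    -- join on an integer key against a dimension that fits in memory.
    SELECT d.DayName, SUM(f.Amount) AS Revenue
    FROM FactSales f
    JOIN DimDate d ON f.DateKey = d.DateKey
    GROUP BY d.DayName;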

  • Anonymoose (unregistered) in reply to letatio
    letatio:
    JC:
    One thing I've learnt is that the uglier and more denormalised a data warehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.

    In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by a join with ease. You need joins to be done for you beforehand. I pity the fool who hires you.

    No, what I think he really meant to say is that tables like this are pretty normal as the backing of a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.

  • Mike S (unregistered)

    Actually, one part of that is sane. It's useful to store business days in the database, because computing holidays can sometimes be iffy. (Especially if you're a multinational and have to know whether an arbitrary date is a business day in different countries.)

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered)

    Once there was a programmer who needed some date calculations. So he decided to use a table.

    Then he had two problems.

  • Steve H (unregistered)

    Not a WTF at all.

    This is best practice for a star schema DW.

  • DonaldK (unregistered)

    Performing a joined query with a SIMPLE "date" table can indeed simplify the exercise tremendously, especially when you're interested in a range of records being returned but the data is stored with from/to date values... it's the difference between a very simple query and a reasonably painful stored procedure, or even application code...
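
    For instance (a minimal sketch; Bookings, CheckIn, CheckOut and DimDate are made-up names), expanding from/to ranges into one row per day is a single join:

    -- One output row per booking per day it covers.
    SELECT d.TheDate, b.RoomId
    FROM Bookings b
    JOIN DimDate d
      ON d.TheDate BETWEEN b.CheckIn AND b.CheckOut;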

  • (cs) in reply to JM
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?
  • (cs)

    This is standard for data warehouses. It only looks like a WTF to people used to working with normalized databases for production application use. Data warehouses are a completely different topic.

  • DonaldK (unregistered) in reply to foo
    foo:
    Or you just buy 3MB more RAM

    I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...

  • (cs) in reply to DonaldK
    DonaldK:
    foo:
    Or you just buy 3MB more RAM

    I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...

    True; these days, the paperwork to requisition it costs more than the memory itself.

  • Matt (unregistered)

    This is a perfectly normal thing to do in a data warehouse. Once you've got a few million rows in a fact table, filtering and grouping by DATENAME() functions will get really slow, whereas looking up the values in a date dimension will perform just like the lookups on any other dimension table. It also allows you to capture all the other date-related information for which there isn't a built-in function - public holidays and the like.
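
    The holiday case in particular has no built-in function at all; a hedged sketch (IsHoliday and the other names are illustrative) of how the dimension turns it into a one-line filter:

    -- Mark the holiday once, at load time...
    UPDATE DimDate SET IsHoliday = 1 WHERE TheDate = '2012-12-25';

    -- ...then any report can filter on it like any other attribute.
    SELECT SUM(f.Amount) AS HolidayRevenue
    FROM FactSales f
    JOIN DimDate d ON f.DateKey = d.DateKey
    WHERE d.IsHoliday = 1;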

  • (cs)

    Perfectly sane, except for DST problems.

  • Anonymouse (unregistered)

    The real WTF is that this is on TDWTF.

    Date dimension tables are entirely normal and basically a mandatory standard in the data warehousing world. Actually, the problem with this particular table is that it's missing certain important things, such as IsWorkDay, IsHoliday and DST changes. The jury is still out on whether the primary key should be a natural (date) or a surrogate (int) in date dimension tables, as that's one of the major IT holy wars.
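
    For reference, the two sides of that holy war look roughly like this (a sketch with illustrative names and a deliberately trimmed column list):

    -- Surrogate key style: an int such as 20121115 (yyyymmdd).
    CREATE TABLE DimDate_Surrogate (
        DateKey int  NOT NULL PRIMARY KEY,  -- e.g. 20121115
        TheDate date NOT NULL,
        DayName varchar(9) NOT NULL
    );

    -- Natural key style: the date itself is the key.
    CREATE TABLE DimDate_Natural (
        TheDate date NOT NULL PRIMARY KEY,
        DayName varchar(9) NOT NULL
    );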

  • Alan (unregistered)

    TRWTF is a data warehouse "developer" who doesn't know that pre-calculating all these values makes OLAP queries much quicker, which is the point of a data warehouse.

  • ogre (unregistered)

    For those who don't understand data warehousing, this table seems redundant, but it is not. It will greatly reduce query times.

    For those saying you should not be scared of joins: please google "snowflake schema" and you will see it is not a good thing when working with data warehouses.

  • David (unregistered)

    OK, the votes are now in and it looks like the whole of Florida has commented that "this is perfectly normal for data warehousing", twice. So TRWTF is Florida, then?

  • (cs) in reply to Hegel
    Hegel:
    Only 2 fields in the hours table? Now that's a real WTF; the following is the absolute minimum.

    describe hours;

    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+

    You should make all your columns VARCHAR. That way you can handle times like value_24h='16am'.

  • (cs)

    I see a time table, but I don't see any reference to which train to catch.

  • David (unregistered) in reply to DWDBA
    DWDBA:
    reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at the data.

    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    There's a lesson to be learned there. If only someone smart could work out what it is.

  • (cs)

    If you're not using a date table and you think doing the calculations on the fly is a better approach, then you're doing it wrong. This applies to normal database development every bit as much as to data warehousing.

  • Foo Bar (unregistered) in reply to PM

    A time table inside an OLAP cube. Is that nature's harmonic simultaneous 4-day Time Cube?

  • DM (unregistered) in reply to JC

    This isn't a WTF...

  • 3rd Ferguson (unregistered)

    My current implementation has several time dimension tables at different levels. So there's a day table, a month table, a quarter table and a year table. The tables get narrower as their scope gets wider. We maintain SQL whose sole purpose is to populate all the columns in all the tables, so if a column is ever added we know what all the values in all the tables will be.

    Sure, the day table has 10k or 15k rows, but it mostly gets used during ETL, where most large input tables get rolled to the month level (at their most granular). Users who are crafting reports using the off-the-shelf GUI tool normally only use the quarter and year tables and performance is great for that.

    I think these guys went a little overboard with their columns, but I can't say much, because my day table has both the old-school Excel-equivalent integer date and the new-school one.

  • Ralph (unregistered) in reply to David
    David:
    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    There's a lesson to be learned there. If only someone smart could work out what it is.

    The smart people have already done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.

  • (cs) in reply to DWDBA
    DWDBA:
    To all those who are saying this is unnecessary and WTF-worthy, let me tell you this is perfectly standard in the DW world. While this table is missing some columns that I would personally add (e.g. weekday vs. weekend), it is still very useful for data warehouses, for 2 reasons:
    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at the data. I haven't seen a lot of reporting tools generate good SQL or use functions properly, so this is a huge help.

    2. It actually makes querying a lot faster as well, by avoiding on-the-fly calculations, which can be very expensive when run against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow the query down by a non-trivial amount, besides causing CPU issues.

    You're absolutely right. A date dimension table is very common and much, much faster for large data warehouse queries. Data warehousing is very different from OLTP systems, and the standard rules of good vs. bad change a lot.

    Also, in general, using function calls within your SQL code is a performance hit, especially if it's in the WHERE clause, because it causes that function to be run for every row in the table. It's even worse when you nest them in a complex expression, because it confuses the query optimizer and you'll often get a bad plan as a result. At least if the function call is within the list of columns to output, you only run it for the rows you actually return.
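
    That WHERE-clause point is the classic sargability problem; a hedged sketch (FactSales and SaleDate are illustrative):

    -- Non-sargable: YEAR() must run for every row, blocking index seeks.
    SELECT COUNT(*) FROM FactSales WHERE YEAR(SaleDate) = 2012;

    -- Sargable rewrite: a plain range the optimizer can seek on.
    SELECT COUNT(*) FROM FactSales
    WHERE SaleDate >= '2012-01-01' AND SaleDate < '2013-01-01';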

  • (cs)

    Someone has obviously never worked on a good star-schema data warehouse before. A few of those columns are sketchy, but the majority are extremely useful, especially when dealing with a FISCAL calendar rather than a standard one. Good luck using SQL Server date functions to calculate the next month's start when months vary from 28 to 42 days in length with no tie to the regular calendar in any way.

    Also, the dimension tables for all the time/date-related information are there to allow Cognos (the real WTF?) or other business reporting systems to print user-facing strings for date/time/hour descriptions instead of having to use intelligent keys and some other hard-wired translation.

    Addendum (2012-11-15 10:21): I must say, though, that I disagree with the prev_prior_etc columns. Especially if those are a poor attempt at a slowly changing dimension. Using columns to mimic a fixed-size slowly changing dimension is a terrible hack of a solution 99% of the time.

    Addendum (2012-11-15 10:27): Or is that prev_prior to connect to 2 years back? They really went to the extreme with the year-over-year-over-year-over-year reporting columns!

  • (cs) in reply to Hegel
    Hegel:
    Only 2 fields in the hours table? Now that's a real WTF; the following is the absolute minimum.

    describe hours;

    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+

    Last field should be named "is_midnight_hour" so one can do:

    ' Check in by wpickett
    ' Code reviewed by scropper
    do { wait() } until { is_midnight_hour }

    me->love("begin to shine")

  • Hegel (unregistered)

    It seems obvious that there is an equivalent need for a truth table with 3 rows in it.

  • JC (unregistered) in reply to letatio
    letatio:
    JC:
    One thing I've learnt is that the uglier and more denormalised a data warehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.

    This is only true if either you are terrified of joins (because you don't understand them) or you are normalising to the extreme.

    I believe the former applies in your case. Generally speaking, a join equals denormalisation (on demand). When selecting, a denormalised database gives the illusion of being easy to query. However, those queries probably don't account for all of the integrity problems denormalised storage is vulnerable to; problems you may have to check for unless someone extremely adept has written the code for everything that modifies data. Writing queries that take such possible anomalies into account may not only be hard, it may be impossible.

    In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by a join with ease. You need joins to be done for you beforehand. I pity the fool who hires you.

    You clearly have no experience of data warehousing whatsoever. This has nothing to do with "joins". Troll harder next time.

  • (cs) in reply to JC
    JC:
    Troll harder next time.
    I LOL'ed.
  • Michael (unregistered)

    This is the first time that I have felt I had to comment on a Daily WTF post, because I cannot agree with it at all. I work in the BI department of a large company, and there are several reasons to use this kind of table.

    For the hours-of-day table: one thing you might want is a report that shows the revenue for each hour (or each day). The problem is that if you have no revenue for an hour, grouping by a built-in function (like to_char(date,'hh24')) will simply skip it, and it will be missing from the result set. These tables solve the problem.

    select h.hour, sum(r.revenue)
    from tblHours h
    left join tblRevenue r
      on to_char(r.sale_date, 'hh24') = h.hour  -- sale_date stands in for the revenue table's date column
    group by h.hour

    Other aspects that are much easier using these tables:

    • Custom Fiscal Year that cannot be represented using standard functions (starting 1st of Sept this year but 2nd of Sept next year)
    • Custom Months following a 4-4-5 calendar
    • Restricting to specific weekdays without having to bother with language settings (e.g. the first day of the week can be Sunday or Monday; a weekday can be Sunday or Sonntag; week counting can start with the first full week or with the week containing the first Monday of the year - all depending on the language settings of the client)
    • Changing the rules without being forced to change all the functions in the reports
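
    For example, once the 4-4-5 periods are precomputed as columns, a fiscal report is no harder than a calendar one (table and column names below are illustrative):

    select d.FiscalYear, d.FiscalPeriod, sum(f.revenue)
    from tblRevenue f
    join tblDates d on d.date_key = f.date_key
    group by d.FiscalYear, d.FiscalPeriod;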

    There are many more of these ...

    greetings

    Michael

  • foo2 (unregistered) in reply to DWDBA
    DWDBA:
    To all those who are saying this is unnecessary and WTF-worthy, let me tell you this is perfectly standard in the DW world. While this table is missing some columns that I would personally add (e.g. weekday vs. weekend), it is still very useful for data warehouses, for 2 reasons:
    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at the data. I haven't seen a lot of reporting tools generate good SQL or use functions properly, so this is a huge help.

    2. It actually makes querying a lot faster as well, by avoiding on-the-fly calculations, which can be very expensive when run against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow the query down by a non-trivial amount, besides causing CPU issues.

    Yeah, this. I've built similar tables for the same reasons: speed, and the fact that there's only so many times you can tell people how to get various bits out of a date.

    "Here, add this and pick what you need." That said, the table here is a bit over the top in its WTF-ness.

    (And a simple weekday/weekend flag doesn't allow for holidays etc.)

  • Anon (unregistered) in reply to Anon

    Exactly. The OP sounds like someone who understands relational databases for OLTP -- in which case this would be WTF code -- but doesn't quite grok a star schema for a data warehouse. The fact that he could write the date functions in code misses two points: 1. the table can be used to populate e.g. dropdown lists in reports, for users; 2. with appropriate indexing, the report query may well run faster.

  • (cs)

    Having a time/date dimension table in a data warehouse is, as stated, perfectly normal.

    This one does appear to be going a bit overboard with columns, though. A lot of those are probably better handled by MDX functions like ParallelPeriod (which introduces its own special brand of leap-year WTFs).

  • Michael (unregistered)

    I once worked on an enterprisey database where I found a table with multiple fields, one of which was "DayOfWeek", which for some strange reason had values 1..31. A colleague of mine explained: "Yes, erm, we mislabeled that field."

  • zzalpha (unregistered)

    Yeah, sorry buddy, but you clearly don't have experience in data warehousing. This is precisely how a date dimension should be designed. Using SQL Server date functions to filter large volumes of fact data is utter madness. And that's ignoring the fact that things like the definitions of quarters, week numbers, and so forth are variable and quite complicated to calculate... there's no way I'd want to be doing that at the database level at query time.

    Please, if you don't understand this stuff, go read a book or two by Kimball on data warehousing. Because this is fundamental to the way warehouses are designed.

  • (cs) in reply to Ralph
    Ralph:
    The smart people have already done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.
    HEY! There's nothing wrong with Guitar Exercises for Dummies! Well, except that it doesn't tell you much about Data Warehousing.

    Of course, I'm trying to learn how to play guitar, so I don't have that problem, and I know that the real purpose of USB ports is connecting Rocksmith cables.
