The Overloaded Time Table

  • Anony Mouse 2012-11-15 08:04
    I would post the frist comment, but I don't have time to dig through the APIs and write the interface to do it.
  • JM 2012-11-15 08:13
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
  • Danielle 2012-11-15 08:13
    Why use sql date functions? They might add a new day to the week! This way, the application will continue to work ;-) LOL!
  • Danielle 2012-11-15 08:14
    Maybe they have to change it every time daylight savings changes! That generates work! (It's good right? :P )
  • Hegel 2012-11-15 08:14
    Only 2 fields in the hours table? Now that's a real wtf; the following is the absolute minimum.

    describe hours;
    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+
  • Geoff 2012-11-15 08:15
    Seems like there might be a way out. What if one drops the table and creates a view by the same name? The view would just be a select from a table-valued function that generates the dates.

    Still ugly but might swap expensive disk I/O for comparatively cheap CPU.
  • JC 2012-11-15 08:15
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.
  • pif 2012-11-15 08:17
    Wonderful, nearly perfect!

    You know how Monday is dreaded because it's the first day of the working week. Suppose we vote to swap Mondays and Tuesdays starting from tomorrow: that table will do the work after a SET, while the rest of the world will have to wait for the next version of their RDBMS.

    But something is still missing: the global chronological index! What if we decide that 27-Oct-2002 came before 3-Feb-1998?

    CAPTCHA: praesent: indeed, talking about time: what comes after the paest and before the fauture.
  • Anon 2012-11-15 08:20
    That's pretty standard for datawarehouse tables... *everything* is pre-calculated, you don't do anything on the fly... plus this table may be small, but the ones with which you join this one may not...
  • Nagesh 2012-11-15 08:24
    I completely believe all the tables are very much required for smooth operation of application's report features.
  • PM 2012-11-15 08:26
    I know this kind of table very well.
    Most of the time I encounter it, it was created while building an OLAP cube with Microsoft SQL Server.
    The wizard for creating cubes often notes that there is no time-dimension table and asks if the user wants to create one. You can even check which of all those fields you want.
    It seems to me this table was created exactly that way.

    See this screenshot from the Wizard:

  • foo 2012-11-15 08:29
    Geoff:
    Seems like there might be a way out. What if one drops the table and creates a view by the same name? The view would just be a select from a table-valued function that generates the dates.

    Still ugly but might swap expensive disk I/O for comparatively cheap CPU.
    Or you just buy 3MB more RAM (~10000 rows, ~300 bytes per row) and swap expensive work (including testing) for a few cents of hardware. :)
  • letatio 2012-11-15 08:41
    JC:
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.


    This is only true if either you are terrified of joins (because you don't understand them) or you are normalising to the extreme.

    I believe the former applies in your case. Generally speaking, a join is denormalisation on demand. When selecting, a denormalised database gives the illusion of being easy to query. However, these queries probably don't consider all of the integrity problems denormalised storage is vulnerable to; problems you may have to check for, unless someone extremely adept has written the code for everything that modifies data. Writing queries that take such anomalies into account may not only be hard, it may be impossible.

    In summary, what you mean to say is that you suck at joins and are unable to easily visualise the intermediate dataset created by joins. You need joins to be done for you beforehand. I pity the fool who hires you.
  • DWDBA 2012-11-15 08:50
    To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

    2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.
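    A minimal sketch of the dimension-join pattern described in point 2, in Python with SQLite; the schema, table names, and data here are invented for illustration, not taken from the original system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny date dimension: every attribute pre-computed once, at load time.
cur.execute("""CREATE TABLE dim_date (
    date_key   TEXT PRIMARY KEY,   -- 'YYYY-MM-DD'
    year       INTEGER,
    quarter    INTEGER,
    is_weekend INTEGER)""")
cur.execute("CREATE TABLE fact_sales (date_key TEXT, amount REAL)")

cur.executemany("INSERT INTO dim_date VALUES (?,?,?,?)", [
    ("2012-11-15", 2012, 4, 0),   # a Thursday
    ("2012-11-17", 2012, 4, 1),   # a Saturday
])
cur.executemany("INSERT INTO fact_sales VALUES (?,?)", [
    ("2012-11-15", 100.0), ("2012-11-15", 50.0), ("2012-11-17", 25.0),
])

# The report query calls no date functions at all: it joins the small
# dimension (which easily stays in memory) and groups on its columns.
rows = cur.execute("""
    SELECT d.quarter, d.is_weekend, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.quarter, d.is_weekend
    ORDER BY d.is_weekend""").fetchall()
print(rows)   # [(4, 0, 150.0), (4, 1, 25.0)]
```

    The per-row cost in the fact table is a key lookup rather than a function evaluation, which is the speed-up being claimed above.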
  • Anonymoose 2012-11-15 08:55
    letatio:
    JC:
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.


    In summary, what you mean to say is that you suck at joins and are unable to easily visualise the intermediate dataset created by joins. You need joins to be done for you beforehand. I pity the fool who hires you.


    No, what I think he really meant to say is that tables like this are pretty normal as the backing for a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.
  • Mike S 2012-11-15 09:01
    Actually, one part of that is sane. It's useful to store business days in the database, because computing holidays can sometimes be iffy. (Especially if you're a multinational and have to know whether an arbitrary date is a business day in different countries.)
  • ¯\(°_o)/¯ I DUNNO LOL 2012-11-15 09:07
    Once there was a programmer who needed some date calculations. So he decided to use a table.

    Then he had two problems.
  • Steve H 2012-11-15 09:11
    Not a WTF at all.

    This is best practice for a star schema DW.
  • DonaldK 2012-11-15 09:13
    Performing a joined query with a SIMPLE "date" table can indeed simplify the exercise tremendously, especially when you're interested in a range of records being returned but the data is stored with from/to date values... it's the difference between a very simple query and a reasonably painful stored procedure or even application code...
  • snoofle 2012-11-15 09:15
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?
  • mott555 2012-11-15 09:16
    This is standard for data warehouses. It only looks like a WTF to people used to working with normalized databases for production application use. Data warehouses are a completely different topic.
  • DonaldK 2012-11-15 09:17
    foo:
    Or you just buy 3MB more RAM


    I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...
  • snoofle 2012-11-15 09:21
    DonaldK:
    foo:
    Or you just buy 3MB more RAM


    I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...
    True; these days, the paperwork to requisition it costs more than the memory itself.
  • Matt 2012-11-15 09:23
    This is a perfectly normal thing to do in a data warehouse.
    Once you've got a few million rows in a fact table, filtering and grouping by DATENAME() functions will get really slow, whereas looking up the values in a date dimension will perform just like the lookups on any other dimension table.
    It also allows you to capture all the other date-related information which there isn't a function for - public holidays and the like.
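    That last point is the one no built-in date function can cover. A hypothetical sketch of generating such a dimension, holidays included (the holiday set and column names are made up):

```python
from datetime import date, timedelta

# Hypothetical holiday list; in practice this comes from a business calendar.
holidays = {date(2012, 12, 25), date(2013, 1, 1)}

def date_dimension_rows(start, end):
    """Yield one pre-computed dimension row per day in [start, end]."""
    d = start
    while d <= end:
        yield {
            "date_key": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "day_name": d.strftime("%A"),
            "is_weekend": d.weekday() >= 5,
            "is_holiday": d in holidays,  # no DATENAME() equivalent exists
        }
        d += timedelta(days=1)

rows = list(date_dimension_rows(date(2012, 12, 24), date(2012, 12, 26)))
print(rows[1]["day_name"], rows[1]["is_holiday"])
```

    Everything awkward (fiscal quarters, holidays, week numbering) is computed once here, at ETL time, instead of in every report query.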
  • TGV 2012-11-15 09:26
    Perfectly sane, except for DST problems.
  • Anonymouse 2012-11-15 09:28
    The real WTF is that this is on TDWTF.

    Date dimension tables are entirely normal and basically a mandatory standard in the data warehousing world. Actually, the problem with this particular table is that it's missing certain important things, such as IsWorkDay, IsHoliday and DST changes. The jury is still out on whether the primary key should be a natural (date) or a surrogate (int) in date dimension tables, as that's one of the major IT holy wars.
  • Alan 2012-11-15 09:30
    TRWTF is a data warehouse "developer" who doesn't know that pre-calculating all these values makes OLAP queries much quicker, which is the point of a data warehouse.
  • ogre 2012-11-15 09:32
    For those people who don't understand data warehousing, this table seems redundant, but it is not.
    It will greatly reduce query times.

    For those saying that you should not be scared of joins, please google "Snowflake" and you will see it is not a good thing when working with datawarehouses.
  • David 2012-11-15 09:34
    OK, the votes are now in and it looks like the whole of Florida has commented that "this is perfectly normal for data warehousing", twice. So TRWTF is Florida, then?
  • @Deprecated 2012-11-15 09:34
    Hegel:
    Only 2 fields in the hours table? Now that's a real wtf; the following is the absolute minimum.

    describe hours;

    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+


    You should make all your columns VARCHAR.
    That way you can handle times like value_24h='16am'.
  • Cbuttius 2012-11-15 09:35
    I see a time table, but don't see any reference of which train to catch.
  • David 2012-11-15 09:35
    DWDBA:
    reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data.


    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    There's a lesson to be learned there. If only someone smart could work out what it is.
  • lanmind 2012-11-15 09:36
    If you're not using a date table, and you think doing the calculations on the fly is better approach, then you're doing it wrong. This applies to normal database development every bit as much as data warehousing.
  • Foo Bar 2012-11-15 09:45
    A Time Table inside an OLAP Cube. Is that a nature's harmonic simultaneous 4-day time cube?
  • DM 2012-11-15 09:49
    This isn't a WTF...
  • 3rd Ferguson 2012-11-15 10:04
    My current implementation has several time dimension tables at different levels. So there's a day table, a month table, a quarter table and a year table. The tables get narrower as their scope gets wider. We maintain SQL whose sole purpose is to populate all the columns in all the tables, so if a column is ever added we know what all the values in all the tables will be.

    Sure, the day table has 10k or 15k rows, but it mostly gets used during ETL, where most large input tables get rolled to the month level (at their most granular). Users who are crafting reports using the off-the-shelf GUI tool normally only use the quarter and year tables and performance is great for that.

    I think these guys went a little off on their columns, but I can't say much, because my day table has both the old-school Excel-equivalent integer date and the new-school one.
  • Ralph 2012-11-15 10:09
    David:
    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    There's a lesson to be learned there. If only someone smart could work out what it is.
    The smart people already have done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.
  • galgorah 2012-11-15 10:12
    DWDBA:
    To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

    2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.


    You're absolutely right. A date dimension table is very common and much, much faster for large data warehouse queries. Data warehousing is very different from OLTP systems, and the standard rules of good vs. bad change a lot.

    Also, in general, using function calls within your SQL code is a performance hit, especially in the where clause, because it causes that function to be run for every row in the table. It's even worse when you nest them in a complex expression, because it confuses the query optimizer and you'll often get a bad plan as a result. At least if the function call is within the list of columns to output, you only run it for the rows you are actually returning.
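    galgorah's point about functions in the where clause shows up directly in a query plan. A small SQLite illustration (table and index names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (event_date TEXT, amount REAL)")
cur.execute("CREATE INDEX idx_fact_date ON fact(event_date)")

# Wrapping the indexed column in a function forces a full table scan...
plan_fn = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM fact "
    "WHERE strftime('%Y', event_date) = '2012'").fetchall()

# ...while the equivalent bare range predicate can use the index.
plan_range = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM fact "
    "WHERE event_date BETWEEN '2012-01-01' AND '2012-12-31'").fetchall()

print(plan_fn[0][3])     # a SCAN of fact
print(plan_range[0][3])  # a SEARCH using idx_fact_date
```

    The same principle holds in SQL Server or Oracle: a predicate on a bare column is sargable, a predicate on a function of the column usually is not.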

  • darkmattar 2012-11-15 10:12
    Someone has obviously never worked on a good star-schema data warehouse before. A few of those columns are sketchy, but the majority are extremely useful, especially when dealing with a FISCAL calendar rather than a Julian one. Good luck using SQL Server date functions to calculate the next month's start when months vary from 28 to 42 days in length, with no tie to a regular calendar in any way.

    Also, the dimension tables for all the time/date-related information exist to allow Cognos (the real WTF?) or other business reporting systems to print user strings for date/time/hour descriptions, instead of having to use intelligent keys and some other hard-wired translation.

    Addendum (2012-11-15 10:21):
    I must say, though, that I disagree with the prev_prior_etc columns. Especially if those are a poor attempt at a slowly changing dimension. Using columns to mimic a fixed-size slowly changing dimension is a terrible hack of a solution 99% of the time.

    Addendum (2012-11-15 10:27):
    Or is that prev_prior to connect to 2 years back?
    They really went to the extreme with the Year over Year over Year over Year reporting columns!
  • RichP 2012-11-15 10:12
    Hegel:
    Only 2 fields in the hours table? Now that's a real wtf; the following is the absolute minimum.

    describe hours;
    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+


    Last field should be named "is_midnight_hour" so one can do:

    ' Check in by wpickett
    ' Code reviewed by scropper
    do {
    wait()
    } until { is_midnight_hour }

    me->love("begin to shine")

  • Hegel 2012-11-15 10:26
    It seems obvious that there is an equivalent need for a truth table with 3 rows in it.
  • JC 2012-11-15 10:40
    letatio:
    JC:
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.


    This is only true if either you are terrified of joins (because you don't understand them) or you are normalising to the extreme.

    I believe the former applies in your case. Generally speaking, a join is denormalisation on demand. When selecting, a denormalised database gives the illusion of being easy to query. However, these queries probably don't consider all of the integrity problems denormalised storage is vulnerable to; problems you may have to check for, unless someone extremely adept has written the code for everything that modifies data. Writing queries that take such anomalies into account may not only be hard, it may be impossible.

    In summary, what you mean to say is that you suck at joins and are unable to easily visualise the intermediate dataset created by joins. You need joins to be done for you beforehand. I pity the fool who hires you.


    You clearly have no experience of DataWarehousing whatsoever. This has nothing to do with "Joins". Troll harder next time.
  • TGV 2012-11-15 10:42
    JC:
    Troll harder next time.
    I LOL'ed.
  • Michael 2012-11-15 10:44
    This is the first time that I have to comment on a Daily WTF post, because I cannot agree with it at all. I work in the BI department of a large company, and there are several reasons to use this kind of table.

    For the hours-of-day table, one thing you might want is a report that shows the revenue for each hour (or each day). The problem is that if you have no revenue for an hour, all built-in functions (like to_char(date,'hh24')) will just ignore it, and it will be missing from the result set. These tables solve the problem.

    select hour, sum(revenue)
    from tblHours left join tblRevenue on to_char(date,'hh24') = hour
    group by hour

    Other aspects that are much easier using these tables:
    * Custom Fiscal Year that cannot be represented using standard functions (starting 1st of Sept this year but 2nd of Sept next year)
    * Custom Months following a 4-4-5 calendar
    * Restricting to specific weekdays and not bothering with language settings (eg - first day of week can be Sunday or Monday - weekday can be Sunday or Sonntag - week counting can start with the first full week or the week with the first monday in a year - all of them depending on language settings of the client)
    * Changing the rules without being forced to change all the functions in the reports

    There are many more of these ...

    greetings

    Michael
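    Michael's missing-hours point is easy to demonstrate. A sketch in Python with SQLite, keeping his table names but swapping his Oracle to_char for SQLite's strftime (the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tblHours (hour INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO tblHours VALUES (?)", [(h,) for h in range(24)])
cur.execute("CREATE TABLE tblRevenue (ts TEXT, revenue REAL)")
cur.executemany("INSERT INTO tblRevenue VALUES (?,?)",
                [("2012-11-15 08:30", 10.0), ("2012-11-15 08:45", 5.0)])

# Grouping only the fact table: 23 of the 24 hours silently disappear.
direct = cur.execute("""SELECT CAST(strftime('%H', ts) AS INTEGER), SUM(revenue)
                        FROM tblRevenue GROUP BY 1""").fetchall()

# LEFT JOIN from the hours table: every hour appears, empty ones as NULL.
joined = cur.execute("""SELECT h.hour, SUM(r.revenue)
                        FROM tblHours h
                        LEFT JOIN tblRevenue r
                          ON CAST(strftime('%H', r.ts) AS INTEGER) = h.hour
                        GROUP BY h.hour ORDER BY h.hour""").fetchall()

print(len(direct), len(joined))   # 1 24
```

    The first query returns one row; the second returns all 24 hours, which is what a per-hour revenue report actually needs.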
  • foo2 2012-11-15 10:49
    DWDBA:
    To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

    2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.

    Yeah, this. I've done similar tables for the same reasons: speed, and there's only so many times you can tell people how to get various bits out of a date.

    "Here, add this and pick what you need." That said, the table here is a bit over the top in its WTF-ness.

    (And a simple weekday/weekend flag doesn't allow for holidays etc.)
  • Anon 2012-11-15 10:53
    Exactly. The OP sounds like someone who understands relational databases for OLTP -- in which case this would be WTF code -- but doesn't quite grok a star schema for a data warehouse. The fact that _he_ could write date functions in code misses two points: 1. The table can be used to populate e.g. dropdown lists in reports, for users. 2. With appropriate indexing, the report query may well run faster.
  • db2 2012-11-15 10:57
    Having a time/date dimension table in a data warehouse is, as stated, perfectly normal.

    This one does appear to be going a bit overboard with columns, though. A lot of those are probably better handled by MDX functions like ParallelPeriod (which introduces its own special brand of leap-year WTFs).
  • Michael 2012-11-15 10:59
    I once worked on an enterprisey database where I found a table with multiple fields, one of which was "DayOfWeek", which for some strange reason had values 1..31. A colleague of mine explained: "Yes, erm, we mislabeled that field."
  • zzalpha 2012-11-15 11:01
    Yeah, sorry buddy, but you clearly don't have experience in data warehousing. This is *precisely* how a date dimension should be designed. Using SQL Server date functions to filter large volumes of fact data is utter madness. And that's ignoring the fact that things like the definitions of quarters, week numbers, and so forth are variable and quite complicated to calculate... there's no way I'd want to be doing that at the database level at query time.

    Please, if you don't understand this stuff, go read a book or two by Kimball on data warehousing. Because this is *fundamental* to the way warehouses are designed.
  • Steve The Cynic 2012-11-15 11:09
    Ralph:
    The smart people already have done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.

    HEY! There's nothing wrong with Guitar Exercises for Dummies! Well, except that it doesn't tell you much about Data Warehousing.

    Of course, I'm trying to learn how to play guitar, so I don't have that problem, and I know that the real purpose of USB ports is connecting Rocksmith cables.
  • Zapp Brannigan 2012-11-15 11:12
    Anyone up for an Inmon vs. Kimball rumble?
  • Mike 2012-11-15 11:15
    This is just silly talk, I realize, but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly, if they insist on using a table? I.e., you install your OS/SQL with a specified time format, and as part of installation any stupid date table it wants gets built once as a system-level table. Also, I'm not convinced that even with good caching it would be faster to have a table than just to compute the required value on the fly, especially if you can push that off onto the client so you don't tie up your server figuring things like that out.
  • C-Derb 2012-11-15 11:27
    So TRWTF is leveraging a data warehouse dimension table to implement an email filter rule for an issue tracking system?
  • C-Derb 2012-11-15 11:30
    Could someone please denormalize this sentence?

    The Overloaded Time Table:
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
  • Greg 2012-11-15 11:36
    Absolutely. The only WTF here is that the submitter is working on a data warehouse but doesn't understand data warehousing query optimization. Yes, you can use a date function, but the query optimizer has a much more difficult time optimizing queries against values it can only determine at runtime. Best just to compute the values in advance and maintain the table.

    The project manager was correct, although I'm sure he was just passing it along from a DBA who actually knew what they were talking about.
  • Paul Neumann 2012-11-15 11:38
    Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

    Okay, here goes: That table looks perfectly denormal to me.
  • cellocgw 2012-11-15 11:57
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    Error: Hour Not Found

    Of course
  • Chris 2012-11-15 11:58
    The only WTF is the need for that type of table.
  • Cam 2012-11-15 12:10
    You've obviously never written any MDX and don't even know what it is, so I'm just going to keep laughing at your ignorant comment that doesn't make any sense.

    JC is spot on.
  • Zapp Brannigan 2012-11-15 12:21
    The third rule of data warehousing is to forget everything you know about relational databases. The first and second rules are to not talk about it.
  • Alan 2012-11-15 12:53
    Mike:
    This is just silly talk, I realize, but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly, if they insist on using a table? I.e., you install your OS/SQL with a specified time format, and as part of installation any stupid date table it wants gets built once as a system-level table. Also, I'm not convinced that even with good caching it would be faster to have a table than just to compute the required value on the fly, especially if you can push that off onto the client so you don't tie up your server figuring things like that out.


    So, here's the situation. First, the server is set up and designed specifically to be tied up figuring out things like that. Pushing things to the client sounds like a cool plan, until you need to push a billion pieces of data to the client to "compute the required value on the fly", only to have the client turn around and send back a billion computed rows so the server can continue finding the appropriately related data.

    Also, it is definitely faster to compute the date settings once at data load, and then not have to recompute them every time 1 of 100 different users wants to run 1 of 100 different reports.

    The big takeaway is this: data warehousing systems are specifically designed to read and compute massive amounts of data, probably beyond what you're thinking, because data is typically pulled from multiple sources and it is important to have historical data for trending.
  • David Aldridge 2012-11-15 13:10
    Yep, I don't think that there's anything abnormal or particularly inappropriate about this.

    The query optimiser often cannot work out how many of the dates in a table are Sunday without reference to a column storing day_of_week, and front-end query tools often do not have good syntax for working out these attributes on the fly.
  • Coyne 2012-11-15 13:28
    snoofle:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?


    Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND
  • OH GOD IT BURNS! 2012-11-15 13:47
    Ralph:
    ... The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.


    I believe they think "For Dummies" is actually a requirement.

    "Any idiot can use a computer. Many do."
  • starsky51 2012-11-15 13:58
    David:
    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    They're called Business Users and without them, there'd be no market for data warehousing. God bless em!
  • Dimse 2012-11-15 13:58
    I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

    Can anyone explain why such a tablet is bad in a data warehouse context?
  • BlueBearr 2012-11-15 14:01
    From a security standpoint I'm disappointed that this article includes actual customer data like that. Where is the obfuscation?
  • Herp 2012-11-15 14:03
    Dimse:
    I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

    Can anyone explain why such a tablet is bad in a data warehouse context?


    Perhaps the submitter would like to, because everyone else is asking the same thing!
  • David 2012-11-15 14:31
    I guess the real WTF here is a boss who knows what he is doing... because the submitter clearly doesn't understand the purpose of this.

    In fact I've written something much like this. Sure there are Oracle/SQLServer functions that can compute the last business day of the month, but then there are the one-off situations: The day the storm knocked out power to the entire state on what would have been the last business day of the month, and now all our reports need to treat that day as a holiday. The day the accounting system crashed. When business unit A has a different holiday schedule than B. So what did I do:

    (a) write a bunch of functions to adjust the date for these one-off issues.
    (b) create a view with functions in the columns
    (c) realize performance sucks when you try to join against this
    (d) materialize the view... and it's a table now.


    The other alternative (which I rejected) was to duplicate the last reported data with a new effective date. The problem is that the duplicated data can confuse other reports that want to see unique entries.

    The big table of dates actually works pretty well:

    select calendar_table.reported_date, data_table.x from data_table, calendar_table where calendar_type='Adjusted_Business' and country='US' and business_unit='Boston' and data_table.effective_date=calendar_table.effective_date

    And the query will duplicate the entries you needed duplicated, while others can still access the raw (uncorrected and unduplicated) data. Now everyone is happy, and I only wasted a few megabytes on disk.
  • RB 2012-11-15 14:49
    Actually, the original request to create a time/date table which has all sorts of fields is completely accepted as best practice for data warehouses, and for good reason. Check out the works of Kimball and 'date dimensions' for info. You want to be able to drop a field called 'quarter' onto a report and have it aggregate facts to that level. You might not be working with a tool that allows or expects you to write the sql query to calculate the function- and indeed it might be quite slow in performance terms to have the query do all the work when it can just read it out. That's why denormalisation is good in a data warehouse, and that's why you do lots of extra work in the ETL when you can- to make the warehouse easy to understand and use, and perform fast when you use it.


    The table you include is a little bit over the top though!
  • Cognos Dummy 2012-11-15 15:07
    That's funny, because I actually own the book "Data Warehousing for Dummies". I got it when I was learning Cognos many years ago.
  • TheCPUWizard 2012-11-15 15:27
    Dimse:

    Can anyone explain why such a tablet is bad in a data warehouse context?


    If someone has to explain why a tablet is a poor choice compared to a good SMP server, then all is lost.
  • C-Derb 2012-11-15 15:48
    Feels like we are still 60-70 comments short of reaching a consensus that this is not a WTF. This conversation has become eerily similar to "Blame Peter".
  • nelson_no_nose 2012-11-15 15:50
    Anonymoose:
    letatio:
    JC:
    One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, thats a VERY good thing.


    In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you beforehand. I pity the fool who hires you.


    No, what I think he really meant to say is that tables like this are pretty normal for the backing of a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.


    I think I understand what you are saying. If they didn't denormalise so much they wouldn't have enough data to earn the title 'warehouse'?
  • DeliriousHippie 2012-11-15 15:53
    I also laughed at this. I work as a BI consultant. In my work we always have a calendar table. One of my customers has 1 million customers; think about their fact table. 'Push calculations to client side', huh.

    Captcha: saepius, almost intelligent human
  • TheRider 2012-11-15 16:22
    But, but, with such a table *anything* can happen!
  • erat 2012-11-15 16:30
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    I think daylight saving changes are the biggest WTF of all times. It creates so much confusion that cannot be properly represented. For example, a baby can be born at 2:30 AM in some country on a certain date, and die at 2:10 AM on the exact same date - and how do you record that properly, assuming that there are a ton of automated processes depending on those two datetime values?

    So what if the sun comes out one hour earlier/later (relative to the local clock) compared to having daylight saving enforced? Adjust business/school hours accordingly instead, for fuck's sake. If I could go back in time to murder the motherfucker who first thought of this, I would have.

    I'm tempted to make the same statement about timezones. So what if the sun comes out at, e.g., 2 AM in some country over there and goes down at 4 PM? We have those kinds of oddities to deal with anyway (e.g. close to the north pole). Those guys up north still count 60 minutes to an hour. By the logic of that timezone/daylight-saving crap, their hours should contain more or fewer minutes than 60, am I fucking right?
  • dkallen 2012-11-15 16:46
    erat:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    <snip>

    am I fucking right?


    Does your aptitude for bedroom acrobatics depend upon whether DST is in effect?
  • Dann of Thursday 2012-11-15 17:10
    erat:
    am I fucking right?


    Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

    Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.
  • urbalt 2012-11-15 17:19
    erat:
    I think daylight saving changes are the biggest WTF of all times.


    I see what you did there.
  • Gekko 2012-11-15 18:02
    As ugly as it looks in a Normalized Data Store, that is classical Dimensional Data Store. Nothing bad about it if it is used in a Data Warehouse environment.
  • caecus 2012-11-15 19:03
    Dann of Thursday:
    erat:
    am I fucking right?


    Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

    Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.


    It's twice per year locally, but for a party handling this for multiple countries, it's twice per year times that many countries. And that's just the daylight-saving thingy.

    About timezones: I think it would be nice if the same datetime value referred to the same moment everywhere.

    Am I fucking right?
  • danbruc 2012-11-15 19:20
    TRWTF is column YYYYMMDD of type varchar(11).
  • PiisAWheeL 2012-11-15 20:58
    C-Derb:
    Could someone please denormalize this sentence?

    The Overloaded Time Table:
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
    It seems to be pretty denormalized to me but here:
    Terje red explained the her's fears grepping on top of the source behind references and with to this table he as think the knowledge the knowledge may reason find may very well kill him, and i certainly can't don't blame him.

    Now if you want it NORMALIZED... well, that is beyond my ability.
  • PiisAWheeL 2012-11-15 21:02
    danbruc:
    TRWTF is column YYYYMMDD of type varchar(11).
    That allows for up to 2 delimiters and a 5 digit Year.

    ie
    12492-11-23 for November 23rd, 12492 A.D.

    It's good right up until 99999 A.D. Then we have to worry about Y100K.
  • Denorman Diamond 2012-11-15 21:03
    Paul Neumann:
    Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

    Okay, here goes: That table looks perfectly denormal to me.
    We were storing all duplications of those comments in a data warehouse but disk space overflowed.
  • AndyCanfield 2012-11-16 01:22
    What is "NextFactDayLocal"? If that means the next day that the local factory will be running, that table column makes sense. If it means the next day that the programmer will know his facts, you can initialize it all to NULL.
  • Matt Westwood 2012-11-16 01:59
    Mike S:
    Actually, one part of that is sane. It's useful to store business days in the database because computing holidays can sometimes be iffy. (Especially if you're a multinational, and have to know if an arbitrary date is a business day in different countries.)


    What you would then do is write a dedicated function to return whether a particular date is a business day in a particular country. This will of course probably involve some sort of table somewhere (unless you can find an online resource to interrogate) and will probably require regular (probably annual) maintenance when you learn what the holidays are going to be for the regions you need to service.

    But then if you're storing the date data as defined in this WTF, you're probably going to have to do this anyway, and because of the unwieldiness of the table this will be a more difficult job.
  • Scarlet Manuka 2012-11-16 05:02
    Boring! Our BI date table has (I kid you not) 184 columns... I think about three of those are ours, the rest were all standard with the BI platform.
  • faoileag 2012-11-16 07:32
    PiisAWheeL:
    danbruc:
    TRWTF is column YYYYMMDD of type varchar(11).
    That allows for up to 2 delimiters and a 5 digit Year.

    Silly me for thinking YYYYMMDD might be a dead giveaway as to what format the dates in this column are supposed to be...
  • sqlblindman 2012-11-16 08:12
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
  • Cc 2012-11-16 08:37
    Seems like a typical date dimension table to me. I don't see the wtf factor of this one.
  • chris 2012-11-16 10:27
    Coyne:
    snoofle:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?


    Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND

    It's surely simplest to store the extra hour from fall in the empty space in spring.
  • Paul Neumann 2012-11-16 10:34
    sqlblindman:
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
    First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
    INail problem = NailFactory.get(TNail.type);
    is perfectly standard nonsense for a depressingly large number as well, apparently.
  • IanAndNeilsDad 2012-11-16 13:42
    Paul Neumann:
    sqlblindman:
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
    First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
    INail problem = NailFactory.get(TNail.type);
    is perfectly standard nonsense for a depressingly large number as well, apparently.
  • hunter9000 2012-11-16 13:58
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him

    Hiring an editor who speaks English would do this site a world of good.
  • Jason 2012-11-16 14:57
    Date/Time dimension tables are very common in data warehouses and are recommended as a best practice for a large DW. 11,000 rows in a dimension table in a DW is peanuts compared to the rest of the database. Denormalization is the key to warehouse performance. I don't think this is very WTF-worthy.
  • Kim 2012-11-16 19:14
    Not a WTF. The real WTF is a developer doing BI-related work without understanding BI fundamentals.

    Think about it this way: the business user wants to create a report. His fact table has a granularity of thousands of transactions per day. He wants to be able to see the number of transactions summed up to the yearly total, but with the ability to drill down to any level, so for instance he'll see something like this:
    2012
    +1st. Quarter
    -2nd. Quarter
    -+April
    -+May
    --June
    --+Week 22
    --+Week 23
    --+Week 24
    ----Monday
    ----Tuesday
    ----Wednesday
    ----Thursday
    ----Friday
    ----Saturday
    ----Sunday
    --+Week 25
    --+Week 26
    +3rd. Quarter
    +4th. Quarter

    And with the sums correctly calculated on each level shown. Imagine how poorly an SQL query would perform if it had to calculate the actual quarter, month and week for EACH date you have in your dataset, possibly millions (or even billions) of rows large.

    The reason why we build data warehouses using denormalized models is mainly due to two factors:
    1. Speed of query: it's much faster to query a denormalized structure that is designed purely for read optimization.
    2. Ease of use for businesses: it's a much easier concept to grasp when you can pick from easy-to-understand groups of related hierarchical structures called dimensions, where you can actually ask the question "how many sales did we have every month per region and per store, compared year-over-year with last year's results" without really knowing SQL, because "how many sales" = fact, region and store = organisational hierarchy, and month and year-over-year = time hierarchy.
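    A minimal, runnable sketch of that idea (SQLite via Python; the dim_date/fact_sales names and contents are invented): the quarterly rollup becomes a plain GROUP BY over precomputed columns, with no date math at query time.

```python
import sqlite3

# Hypothetical schema: a tiny date dimension joined to a fact table,
# so the rollup query never has to compute quarter/month from raw dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20120405
    year INTEGER, quarter INTEGER, month INTEGER, week INTEGER, day_name TEXT
);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_date VALUES
    (20120405, 2012, 2, 4, 14, 'Thursday'),
    (20120601, 2012, 2, 6, 22, 'Friday'),
    (20121001, 2012, 4, 10, 40, 'Monday');
INSERT INTO fact_sales VALUES (20120405, 100.0), (20120601, 50.0), (20121001, 25.0);
""")

# Each drill-down level is just another GROUP BY over precomputed columns.
rows = conn.execute("""
    SELECT d.year, d.quarter, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()
print(rows)  # [(2012, 2, 150.0), (2012, 4, 25.0)]
```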
  • Aaron 2012-11-16 21:52
    Sorry, but this is a very common practice in DW applications and is actually recommended by many designs such as the popular Kimball methodologies.

    Sure, the database can do all the time functions for you; however, this is harder for users to understand. Typically, a reporting tool like Business Objects allows them to drag and drop dimensions into the report to sort, pivot, etc. Having a selection of precalculated date and time fields is MUCH easier than building functions in the reports. Even the space cost of a table with one row per minute for 30 years is nothing compared to the size of most fact tables.

    Not a FAIL at all....
  • Nick S. 2012-11-17 01:39
    +1 TRWTF is the poster.
  • billr 2012-11-17 09:14
    Hear, hear <g>. The number of IT folks who forget that those lusers pay IT salaries for the sole purpose of getting their work done more efficiently is depressing. If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.

    For folks insisting on using normalized structures instead of stars, imagine you need to query by day-of-month from a table containing billions of rows of timestamped events, a not uncommon requirement, and compute aggregates on the rows. Sure, you can select the computed day of month from the timestamp, if you have 20 or 30 minutes to wait. (Please don't suggest creating an index on a timestamp.)

    Or you could create an indexed view of the table, with the supporting structures needed, essentially recreating the structure of a data warehouse but with a much more inefficient design. Other benefits of this approach include saddling OLTP transactions with the overhead of updating these structures in real time, and enabling analytic and reporting tasks to interfere with OLTP so your LOB applications can slow down unpredictably.

    Or, you can learn how to use data warehouse techniques, present the structure the business needs in a data warehouse, achieve sub- or unit-second query responses, and decouple analytic processing from the ongoing transaction workload.

    Hmmm, I just can't make up my mind. Anyone have any Kool-Aid?
  • qbolec 2012-11-17 13:11
    I think that those who suggest using functions instead of such a table are missing one important thing: a table is a relation, not necessarily a function.
    You can use a relation in many different ways, while functions are usually one-way.
    Consider a function getDayOfTheWeek(date) and compare it to the relation between all dates and the 7 days of the week. Can you use the function to actually find all fridays?

    Are the following two queries equally fast:
    SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

    SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day_of_week=5

    No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing the constants hidden behind the O(1) notation. It is about the fact that the first query has to scan all rows, while the second one can first use the index on dates to find only those dates which are fridays, and then look up only those days in `somewhere`.

    This example is probably not so good, as fetching 1 row in every 7 is not that much faster than a full scan in reality. But with getMonth, the rows which you are looking for form a nice slice on the disk.

    The above examples are purely fictional and I am not sure whether in these cases a join really is faster than using a function. The point is, I believe there are situations in which it is faster, e.g. if the engine stores data column-wise.
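    A rough, runnable illustration of the point (SQLite via Python, with made-up table contents): the same fridays come back either way, but only the join against the precomputed day-of-week column gives the planner an index to work with.

```python
import sqlite3

# Sketch of the relation-vs-function idea: a precomputed, indexed day-of-week
# column can be searched via its index, while a function call in the WHERE
# clause must be evaluated for every row. All names here are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE somewhere (day TEXT, whatever INTEGER);
CREATE TABLE dates (day TEXT PRIMARY KEY, dow INTEGER);
CREATE INDEX idx_dates_dow ON dates(dow);
INSERT INTO somewhere VALUES ('2012-11-16', 1), ('2012-11-17', 2), ('2012-11-23', 3);
INSERT INTO dates VALUES
    ('2012-11-16', 5), ('2012-11-17', 6), ('2012-11-23', 5);  -- 5 = Friday
""")

# Function in WHERE: strftime runs for every row of `somewhere`.
fn_rows = conn.execute(
    "SELECT whatever FROM somewhere WHERE strftime('%w', day) = '5'"
).fetchall()

# Join against the date table: the index on dates(dow) finds fridays directly.
join_rows = conn.execute("""
    SELECT s.whatever FROM somewhere s JOIN dates d ON s.day = d.day
    WHERE d.dow = 5
""").fetchall()

print(sorted(fn_rows), sorted(join_rows))  # both print as [(1,), (3,)]
```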
  • Meep 2012-11-17 21:34
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    Nothing, since there is no "daylight savings". It's "saving," singular.
  • Meep 2012-11-17 21:47
    qbolec:
    I think that those who suggest using functions instead of such a table are missing one important thing: a table is a relation, not necessarily a function.
    You can use a relation in many different ways, while functions are usually one-way.
    Consider a function getDayOfTheWeek(date) and compare it to the relation between all dates and the 7 days of the week. Can you use the function to actually find all fridays?

    Are the following two queries equally fast:
    SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

    SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day_of_week=5

    No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing the constants hidden behind the O(1) notation. It is about the fact that the first query has to scan all rows, while the second one can first use the index on dates to find only those dates which are fridays, and then look up only those days in `somewhere`.


    Very astute observation, and that's why a proper relational database should implement functions as virtual tables. You would, of course, still have to implement the inverses, but that's usually not too hard. SQL, alas, isn't that clever.

    The above examples are purely fictional and I am not sure whether in these cases a join really is faster than using a function. The point is, I believe there are situations in which it is faster, e.g. if the engine stores data column-wise.


    For a column-oriented system, you're looking for gains when doing scans of huge runs of data; it's a very different mentality than with typical transactional systems.
  • David 2012-11-18 17:31
    billr:
    If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.


    Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.

    billr:
    Hmmm, I just can't make up my mind. Anyone have any snake oil?


    FTFY.
  • billr 2012-11-19 16:13
    David:
    billr:
    If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.


    Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.


    Oh I think you've driven a car, watched TV or played a video game? Fairly complex, those items. I tend to think that IT (app design, actually) may be the only place where people can say that something is hard to use because it does complicated things, and people still let them get away with it.
  • David, who is abject before billr 2012-11-20 00:37
    billr:
    Oh I think you've driven a car...


    I have. It took months of lessons before I could pass my test. And I still wouldn't know how to rebuild a gearbox if it broke.

    If these things "just work" for you, you were evidently born with an intuitive knowledge of things that are even now almost beyond my ken. Truly I am an unworthy interlocutor.
  • Blorb 2012-11-20 11:36
    I agree. This sort of stuff makes life easier in data warehousing (though I don't grok the point of all the seq fields).

    Hour of Day and Day of Week dimensions? Very useful for aggregate fact tables and such. Month dimensions too.
  • DWalker59 2012-11-20 17:24
    Even a table of numbers is extremely useful.

    By Adam Machanic, who is a SQL guru:
    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/you-require-a-numbers-table.aspx

    The links from that page show cool uses of a numbers table. Date tables, as described in the original "wtf" here, have similar uses to a numbers table.
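    For anyone who hasn't seen the trick, here's a tiny runnable sketch (SQLite via Python) of generating a numbers table with a recursive CTE and turning it into a run of dates:

```python
import sqlite3

# The numbers-table trick: generate integers 0..6 with a recursive CTE,
# then derive a run of calendar dates from them.
conn = sqlite3.connect(":memory:")
dates = conn.execute("""
    WITH RECURSIVE numbers(n) AS (
        SELECT 0
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 6
    )
    SELECT date('2012-11-15', '+' || n || ' days') FROM numbers
""").fetchall()
print(dates[0], dates[-1])  # ('2012-11-15',) ('2012-11-21',)
```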
  • Neil 2012-11-21 10:36
    Hegel:
    It seems obvious that there is an equivalent need for a truth table with 3 rows in it.
    Well, the boring answer would be

    TRUE
    FALSE
    NULL

    But perhaps you were thinking of an alternative third row?
  • Jimmy 2012-11-21 19:26
    To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed to OLTP.
  • anonymous 2012-11-28 02:18
    Nothing wrong with having a Fact date table in a data warehouse.
    I tend to create the primary key as an int rather than a datetime, which makes it much easier to use.
    i.e. 20121128 is the pkid of the table.

    Having control of the Fact date makes reporting and populating a data warehouse much easier.
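    A quick sketch of that key scheme in Python (the function names are mine, not the poster's):

```python
from datetime import date

# Encode a calendar date as an integer YYYYMMDD surrogate key, and back.
def date_to_key(d: date) -> int:
    """date(2012, 11, 28) -> 20121128"""
    return d.year * 10000 + d.month * 100 + d.day

def key_to_date(key: int) -> date:
    """20121128 -> date(2012, 11, 28)"""
    return date(key // 10000, (key // 100) % 100, key % 100)

print(date_to_key(date(2012, 11, 28)))  # 20121128
```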
  • Uncle Al 2012-11-28 16:44
    Jimmy:
    To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed to OLTP.


    Finally, a new TDWTF meme to replace "this is standard practice in embedded systems"!