• Zapp Brannigan (unregistered)

    Anyone up for a Inmon vs Kimball rumble?

  • Mike (unregistered) in reply to PM

    This is just silly talk I realize but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly if they insist on using a table. Ie you install you OS/SQL with a specified time format and as part of installation any stupid date table it wants gets built once as a system level table? Also I'm not convinced that even with good caching that it would be faster to have a table than just to compute the required value on the fly especially if you can push that off onto the client so you don't tie up your server figuring things like that out.

  • C-Derb (unregistered)

    So TRWTF is leveraging a data warehouse dimension table to implement an email filter rule for an issue tracking system?

  • C-Derb (unregistered)

    Could someone please denormalize this sentence?

    The Overloaded Time Table:
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
  • Greg (unregistered) in reply to JC

    Absolutely. The only WTF here is that the submitter is working on a data warehouse but doesn't understand data warehousing query optimization. Yes, you can use a date function, but the query optimizer has a much more difficult time optimizing queries against values it can only determine at runtime. Best just to compute the values in advance and maintain the table.

    The project manager was correct, although I'm sure he was just passing it along from a DBA who actually knew what they were talking about.

  • Paul Neumann (unregistered)

    Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

    Okay, here goes: That table looks perfectly denormal to me.

  • (cs) in reply to JM
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?

    Error: Hour Not Found

    Of course

  • Chris (unregistered)

    The only WTF is the need for that type of table.

  • Cam (unregistered) in reply to letatio

    You've obviously never written any MDX or even know what it is, so I'm just going to keep laughing at your ignorant comment that doesn't make any sense.

    JC is spot on.

  • Zapp Brannigan (unregistered)

    The third rule of data warehousing is to forget everything you know about relational databases. The first and second rules are to not talk about it.

  • Alan (unregistered) in reply to Mike
    Mike:
    This is just silly talk I realize but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly if they insist on using a table. Ie you install you OS/SQL with a specified time format and as part of installation any stupid date table it wants gets built once as a system level table? Also I'm not convinced that even with good caching that it would be faster to have a table than just to compute the required value on the fly especially if you can push that off onto the client so you don't tie up your server figuring things like that out.

    So, here's the situation. First, the server is setup and designed specifically to be tied up figuring out things like that. Pushing things to the client sounds like a cool plan, until you need to push a billion pieces of data to the client to "compute the required value on the fly" to have the client then turn around and send back a billion computed rows of data so the server can then continue finding the appropriately related data.

    Also, it is definitely faster to compute the date settings once at data load in, and then not have to have them recomputed every time 1 of 100 different users wants to run 1 of 100 different reports.

    The big take away is this: Data warehousing systems are specifically designed to read and compute massive amounts of data, probably beyond what you're thinking because data is typically pulled from multiple sources and it is important to have historical data for trending.

  • David Aldridge (unregistered) in reply to JC

    Yep, I don't think that there's anything abnormal or particularly inappropriate about this.

    The query optimiser often cannot work out how many of the dates in a table are Sunday without reference to a column storing day_of_week, and front-end query tools often do not have good syntax for working out these attributes on the fly.

  • (cs) in reply to snoofle
    snoofle:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?

    Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND

  • OH GOD IT BURNS! (unregistered) in reply to Ralph
    Ralph:
    ... The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.

    I believe they think "For Dummies" is actually a requirement.

    "Any idiot can use a computer. Many do."

  • starsky51 (unregistered) in reply to David
    David:
    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.
    They're called Business Users and without them, there'd be no market for data warehousing. God bless em!
  • Dimse (unregistered) in reply to PM

    I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

    Can anyone explain why such a tablet is bad in a data warehouse context?

  • BlueBearr (unregistered)

    From a security standpoint I'm disappointed that this article includes actual customer data like that. Where is the obfuscation?

  • Herp (unregistered) in reply to Dimse
    Dimse:
    I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

    Can anyone explain why such a tablet is bad in a data warehouse context?

    Perhaps the submitter would like to, because everyone else is asking the same thing!

  • David (unregistered)

    I guess the real WTF here is a boss who knows what he is doing... because the submitter clearly doesn't understand the purpose of this.

    In fact I've written something much like this. Sure there are Oracle/SQLServer functions that can compute the last business day of the month, but then there are the one-off situations: The day the storm knocked out power to the entire state on what would have been the last business day of the month, and now all our reports need to treat that day as a holiday. The day the accounting system crashed. When business unit A has a different holiday schedule than B. So what did I do:

    (a) write a bunch of functions to adjust the date for these one-off issues. (b) create a view with functions in the columns (c) realize performance sucks when you try to join against this (d) materialize the view... and its a table now.

    The other alternative (which I rejected) was to duplicate the last reported data with a new effective date. Problem is that the duplicated data can confuse other reports that need want to see unique entries.

    The big table of dates actually works pretty well:

    select calendar_table.reported_date, data_table.x from data_table, calendar_table where calendar_type='Adjusted_Business' and country='US' and business_unit='Boston' and data_table.effective_date=calendar_type.effective_date

    And the query will duplicate the entries you needed duplicated, while others can still access the raw (uncorrected and unduplicated data). Now everyone is happy and I only wasted a few megabytes on desks.

  • RB (unregistered)

    Actually, the original request to create a time/date table which has all sorts of fields is completely accepted as best practice for data warehouses, and for good reason. Check out the works of Kimball and 'date dimensions' for info. You want to be able to drop a field called 'quarter' onto a report and have it aggregate facts to that level. You might not be working with a tool that allows or expects you to write the sql query to calculate the function- and indeed it might be quite slow in performance terms to have the query do all the work when it can just read it out. That's why denormalisation is good in a data warehouse, and that's why you do lots of extra work in the ETL when you can- to make the warehouse easy to understand and use, and perform fast when you use it.

    The table you include is a little bit over the top though!

  • Cognos Dummy (unregistered) in reply to Ralph

    That's funny because I actually own the book "Data Warehousing for Dummies". i got it when I was learning Cognos many years ago.

  • (cs) in reply to Dimse
    Dimse:
    Can anyone explain why such a tablet is bad in a data warehouse context?

    If someone has to explain why a tablet is a poor choice compared to a good SMP server, then all is lost.

  • C-Derb (unregistered)

    Feels like we are still 60-70 comments short of reaching a consensus that this is not a WTF. This conversation has become eerily similar to "Blame Peter".

  • nelson_no_nose (unregistered) in reply to Anonymoose
    Anonymoose:
    letatio:
    JC:
    One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, thats a VERY good thing.

    In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you before hand. I pity the fool who hires you.

    No, what I tink he really meant to say is that tables like this are pretty normal for the backing of a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.

    I think I understand what you are saying. If they didn't denormalise so much they wouldn't have enough data to earn the title warehouses?

  • DeliriousHippie (unregistered) in reply to Alan

    I also laughed at this. I work as BI consultant. In my work we always have calendar-table. One of my customer has 1 million customers, think about their fact-table. 'Push calculations to client side', huh.

    Captcha: saepius, almost intelligent human

  • (cs)

    But, but, with such a table anything can happen!

  • erat (unregistered) in reply to JM
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?

    I think daylight saving changes are the biggest WTF of all times. It creates so much confusion that cannot be properly represented. For example, a baby can be born at 2:30 AM in some country on a certain date, and die at 2:10 AM on the exact same date - and how do you record that properly, assuming that there are a ton of automated processes depending on those two datetime values?

    So what if the sun comes out one hour earlier/later (relative to the local clock), compared to with daylight saving enforced? Adjust business/school hours related to that rather, for fuck sake. If I could go back in time to murder the motherfucker who first thought of this, I would have.

    I'm tempted to make the same statement about timezones. So what if the sun comes out at, e.g. 2 AM in some country over there and goes down at 4 PM? We have those kind of oddities that we have to deal with anyway (e.g. close to north pole). Those guys up north still count 60 minuts for an hour. Using that timezone/daylight-savings crap of the logic, their hours should contain less or more minutes than 60, am I fucking right?

  • dkallen (unregistered) in reply to erat
    erat:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    <snip>

    am I fucking right?

    Does your aptitude for bedroom acrobatics depend upon whether DST is in effect?

  • Dann of Thursday (unregistered) in reply to erat
    erat:
    am I fucking right?

    Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

    Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.

  • urbalt (unregistered) in reply to erat
    erat:
    I think daylight saving changes are the biggest WTF of all times.

    I see what you did there.

  • Gekko (unregistered)

    As ugly as it looks in a Normalized Data Store, that is classical Dimensional Data Store. Nothing bad about it if it is used in a Data Warehouse environment.

  • caecus (unregistered) in reply to Dann of Thursday
    Dann of Thursday:
    erat:
    am I fucking right?

    Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

    Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.

    It's twice per year locally, but for a party handling this for multiple countries, it's that many times times 2. And that goes for the daylight-savings thingy.

    About timezones: I think it would be nice if same datetime values relates to the same moment everywhere.

    Am I fucking right?

  • danbruc (unregistered)

    TRWTF is column YYYYMMDD of type varchar(11).

  • (cs) in reply to C-Derb
    C-Derb:
    Could someone please denormalize this sentence?
    The Overloaded Time Table:
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
    It seems to be pretty denormalized to me but here: Terje red explained the her's fears grepping on top of the source behind references and with to this table he as think the knowledge the knowledge may reason find may very well kill him, and i certainly can't don't blame him.

    Now if you want it NORMALIZED... well, that is beyond my ability.

  • (cs) in reply to danbruc
    danbruc:
    TRWTF is column YYYYMMDD of type varchar(11).
    That allows for up to 2 delimiters and a 5 digit Year.

    ie 12492-11-23 for November 23rd, 12492 A.D.

    Its good right up until 99999 A.D. Then we have to worry about Y100K.

  • Denorman Diamond (unregistered) in reply to Paul Neumann
    Paul Neumann:
    Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

    Okay, here goes: That table looks perfectly denormal to me.

    We were storing all duplications of those comments in a data warehouse but disk space overflowed.

  • (cs)

    What is "NextFactDayLocal"? If that means the next day that the local factory will be running, that table column makes sense. If it means the next day that the programmer will know his facts, you can initialize it all to NULL.

  • (cs) in reply to Mike S
    Mike S:
    Actually, one part of that sane. It's useful to store business days in the database because computing holidays can sometimes be iffy. (Especially if you're a multinational, and have to know if an arbitrary date is a business day in different countries.)

    What you would then do is write a dedicated function to return whether a particular date is a business day in a particular country. This will of course probably involve some sort of table somewhere (unless you can find an online resource to interrogate) and will probably require regular (probably annual) maintenance when you learn what the holidays are going to be for the regions you need to service.

    But then if you're storing the date data as defined in this WTF, you're probably going to have to do this anyway, and because of the unwieldiness of the table this will be a more difficult job.

  • (cs)

    Boring! Our BI date table has (I kid you not) 184 columns... I think about three of those are ours, the rest were all standard with the BI platform.

  • faoileag (unregistered) in reply to PiisAWheeL
    PiisAWheeL:
    danbruc:
    TRWTF is column YYYYMMDD of type varchar(11).
    That allows for up to 2 delimiters and a 5 digit Year.
    Silly me for thinking YYYYMMDD might be a dead giveaway as for what format the dates in this column are supposed to be...
  • sqlblindman (unregistered) in reply to Zapp Brannigan

    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.

  • Cc (unregistered)

    Seems like a typical date dimension table to me. I don't see the wtf factor of this one.

  • chris (unregistered) in reply to Coyne
    Coyne:
    snoofle:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?

    Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND

    It's surely simplest to store the extra hour from fall in the empty space in spring.

  • Paul Neumann (unregistered) in reply to sqlblindman
    sqlblindman:
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
    First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
    INail problem = NailFactory.get(TNail.type);
    is perfectly standard nonsense for a depressingly large number as well, apparently.
  • IanAndNeilsDad (unregistered) in reply to Paul Neumann
    Paul Neumann:
    sqlblindman:
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
    First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
    INail problem = NailFactory.get(TNail.type);
    is perfectly standard nonsense for a depressingly large number as well, apparently.
  • (cs)
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him
    Hiring an editor who speaks English would do this site a world of good.
  • Jason (unregistered)

    Date/Time dimension tables are very common in data warehouses and are recommended as a best practice for a large DW. 11,000 rows in a dimension table in a DW is peanuts compared to the rest of the database. Denormalization is the key to warehouse performance. I don't think this is very WTF-worthy.

  • Kim (unregistered)

    Not a WTF. Real WTF is developer doing BI-related work without understanding BI-fundamentals.

    Think about it this way; the business user wants to create a report. His fact table has a granularity of thousands of transactions per day. He wants to be able to see the numbers of transactions summed up to the yearly total, but with the ability to 1: drill down to any level, so for instance he'll see something like this: 2012
    +1st. Quarter
    -2nd. Quarter
    -+April -+May --June --+Week 22 --+Week 23 --+Week 24 ----Monday ----Tuesday ----Wednesday ----Thursday ----Friday ----Saturday ----Sunday --+Week 25 --+Week 26 +3rd. Quarter +4th. Quarter

    And with the sums correctly calculated on each level shown. Imagine how poorly an SQL that would have to calculate the actual quarter, month, week for EACH date you have in your dataset, possibly millions (or even billions) of rows large.

    The reason why we build data warehouses using denormalized models is due to mainly two factors:

    1. Speed of query - it's much faster to query a denormalized structure that only considers data for read optimization.
    2. Ease of use for businesses, it's a much easier concept to grasp when you can pick from easy to understand groups of related hierarchical structures called dimensions, where you can actually ask the question : "how many sales did we have every month per region and per store, compared year-over-year with last years results" without really knowing SQL, cause "how many sales" = fact, region and store = organisational hierarchy, month and year-over-year= time hierarchy.
  • Aaron (unregistered)

    Sorry, but this is a very common practice in DW applications and is actually recommended by many designs such as the popular Kimball methodologies.

    Sure, the database can do all the time functions for you, however this is harder for users to understand. Typically, a reporting tool like Business Objects allows them to drag and drop dimensions into the report to sort, pivot, etc. Having a selection of precalculated date and time fields is MUCH easier than building functions in the reports. Even the space of having a table with one row per minute for 30 years is nothing compared to the size of most fact tables.

    Not a FAIL at all....

  • Nick S. (unregistered)

    +1 TRWTF is the poster.

Leave a comment on “The Overloaded Time Table ”

Log In or post as a guest

Replying to comment #:

« Return to Article