The Overloaded Time Table

  • Anony Mouse 2012-11-15 08:04
    I would post the frist comment, but I don't have time to dig through the APIs and write the interface to do it.
  • JM 2012-11-15 08:13
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
  • Danielle 2012-11-15 08:13
    Why use sql date functions? They might add a new day to the week! This way, the application will continue to work ;-) LOL!
  • Danielle 2012-11-15 08:14
    Maybe they have to change it every time daylight savings changes! That generates work! (It's good right? :P )
  • Hegel 2012-11-15 08:14
    Only 2 fields in the hours table? Now that's a real wtf; the following is the absolute minimum.

    describe hours;
    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+
  • Geoff 2012-11-15 08:15
    Seems like there might be a way out. What if one drops the table and creates a view by the same name? The view would just be a select from a table-valued function that generates the dates.

    Still ugly but might swap expensive disk I/O for comparatively cheap CPU.
  • JC 2012-11-15 08:15
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.
  • pif 2012-11-15 08:17
    Wonderful, nearly perfect!

    You know how Monday is dreaded because it's the first day of the working week. Suppose we vote to swap Mondays and Tuesdays starting from tomorrow: that table will do the work after a SET, while the rest of the world will have to wait for the next version of their RDBMS.

    But something is still missing: the global chronological index! What if we decide that 27-Oct-2002 came before 3-Feb-1998?

    CAPTCHA: praesent: indeed, talking about time: what comes after the paest and before the fauture.
  • Anon 2012-11-15 08:20
    That's pretty standard for datawarehouse tables... *everything* is pre-calculated, you don't do anything on the fly... plus this table may be small, but the ones with which you join this one may not...
  • Nagesh 2012-11-15 08:24
    I completely believe all the tables are very much required for smooth operation of application's report features.
  • PM 2012-11-15 08:26
    I know this kind of table very well.
    Most of the time I encounter it, it was created while building an OLAP cube with Microsoft SQL Server.
    The wizard for creating cubes often notes that there is no time-dimension table and asks if the user wants to create one. You can even check which of all those fields you want.
    It seems to me this table was created exactly that way.

    See this screenshot from the Wizard:

  • foo 2012-11-15 08:29
    Geoff:
    Seems like there might be a way out. What if one drops the table and creates a view by the same name? The view would just be a select from a table-valued function that generates the dates.

    Still ugly but might swap expensive disk I/O for comparatively cheap CPU.
    Or you just buy 3MB more RAM (~10000 rows, ~300 bytes per row) and swap expensive work (including testing) for a few cents of hardware. :)
  • letatio 2012-11-15 08:41
    JC:
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.


    This is only true if either you are terrified of joins (because you don't understand them) or you are normalising to the extreme.

    I believe the former applies in your case. Generally speaking, a join is denormalisation on demand. When selecting, a denormalised database gives the illusion of being easy to query. However, these queries probably don't consider all of the integrity problems denormalised storage is vulnerable to; problems you may have to check for, unless someone extremely adept has written the code for everything that modifies data. Writing queries that take such anomalies into account may not only be hard, it may be impossible.

    In summary, what you mean to say is that you suck at joins and are unable to easily visualise the intermediate dataset created by joins. You need joins to be done for you beforehand. I pity the fool who hires you.
  • DWDBA 2012-11-15 08:50
    To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

    2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.
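    A minimal sketch of the dimension-join pattern described in point 2, in Python with SQLite; the schema, table names, and data here are invented for illustration, not taken from the original system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny date dimension: every attribute pre-computed once, at load time.
cur.execute("""CREATE TABLE dim_date (
    date_key   TEXT PRIMARY KEY,   -- 'YYYY-MM-DD'
    year       INTEGER,
    quarter    INTEGER,
    is_weekend INTEGER)""")
cur.execute("CREATE TABLE fact_sales (date_key TEXT, amount REAL)")

cur.executemany("INSERT INTO dim_date VALUES (?,?,?,?)", [
    ("2012-11-15", 2012, 4, 0),   # a Thursday
    ("2012-11-17", 2012, 4, 1),   # a Saturday
])
cur.executemany("INSERT INTO fact_sales VALUES (?,?)", [
    ("2012-11-15", 100.0), ("2012-11-15", 50.0), ("2012-11-17", 25.0),
])

# The report query calls no date functions at all: it joins the small
# dimension (which easily stays in memory) and groups on its columns.
rows = cur.execute("""
    SELECT d.quarter, d.is_weekend, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.quarter, d.is_weekend
    ORDER BY d.is_weekend""").fetchall()
print(rows)   # [(4, 0, 150.0), (4, 1, 25.0)]
```

    The per-row cost in the fact table is a key lookup rather than a function evaluation, which is the speed-up being claimed above.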
  • Anonymoose 2012-11-15 08:55
    letatio:
    JC:
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.


    In summary, what you mean to say is that you suck at joins and are unable to easily visualise the intermediate dataset created by joins. You need joins to be done for you beforehand. I pity the fool who hires you.


    No, what I think he really meant to say is that tables like this are pretty normal as the backing for a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.
  • Mike S 2012-11-15 09:01
    Actually, one part of that is sane. It's useful to store business days in the database, because computing holidays can sometimes be iffy. (Especially if you're a multinational and have to know whether an arbitrary date is a business day in different countries.)
  • ¯\(°_o)/¯ I DUNNO LOL 2012-11-15 09:07
    Once there was a programmer who needed some date calculations. So he decided to use a table.

    Then he had two problems.
  • Steve H 2012-11-15 09:11
    Not a WTF at all.

    This is best practice for a star schema DW.
  • DonaldK 2012-11-15 09:13
    Performing a joined query with a SIMPLE "date" table can indeed simplify the exercise tremendously, especially when you're interested in a range of records being returned but the data is stored with from/to date values... it's the difference between a very simple query and a reasonably painful stored procedure or even application code...
  • snoofle 2012-11-15 09:15
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?
  • mott555 2012-11-15 09:16
    This is standard for data warehouses. It only looks like a WTF to people used to working with normalized databases for production application use. Data warehouses are a completely different topic.
  • DonaldK 2012-11-15 09:17
    foo:
    Or you just buy 3MB more RAM


    I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...
  • snoofle 2012-11-15 09:21
    DonaldK:
    foo:
    Or you just buy 3MB more RAM


    I remember those days... when the cost of 4MB of RAM was roughly equal to 3 hours of contracting wages...
    True; these days, the paperwork to requisition it costs more than the memory itself.
  • Matt 2012-11-15 09:23
    This is a perfectly normal thing to do in a data warehouse.
    Once you've got a few million rows in a fact table, filtering and grouping by DATENAME() functions will get really slow, whereas looking up the values in a date dimension will perform just like the lookups on any other dimension table.
    It also allows you to capture all the other date-related information which there isn't a function for - public holidays and the like.
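    That last point is the one no built-in date function can cover. A hypothetical sketch of generating such a dimension, holidays included (the holiday set and column names are made up):

```python
from datetime import date, timedelta

# Hypothetical holiday list; in practice this comes from a business calendar.
holidays = {date(2012, 12, 25), date(2013, 1, 1)}

def date_dimension_rows(start, end):
    """Yield one pre-computed dimension row per day in [start, end]."""
    d = start
    while d <= end:
        yield {
            "date_key": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "day_name": d.strftime("%A"),
            "is_weekend": d.weekday() >= 5,
            "is_holiday": d in holidays,  # no DATENAME() equivalent exists
        }
        d += timedelta(days=1)

rows = list(date_dimension_rows(date(2012, 12, 24), date(2012, 12, 26)))
print(rows[1]["day_name"], rows[1]["is_holiday"])
```

    Everything awkward (fiscal quarters, holidays, week numbering) is computed once here, at ETL time, instead of in every report query.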
  • TGV 2012-11-15 09:26
    Perfectly sane, except for DST problems.
  • Anonymouse 2012-11-15 09:28
    The real WTF is that this is on TDWTF.

    Date dimension tables are entirely normal and basically a mandatory standard in the data warehousing world. Actually, the problem with this particular table is that it's missing certain important things, such as IsWorkDay, IsHoliday and DST changes. The jury is still out on whether the primary key should be a natural (date) or a surrogate (int) in date dimension tables, as that's one of the major IT holy wars.
  • Alan 2012-11-15 09:30
    TRWTF is a data warehouse "developer" who doesn't know that pre-calculating all these values makes OLAP queries much quicker, which is the point of a data warehouse.
  • ogre 2012-11-15 09:32
    For those people who don't understand data warehousing, this table seems redundant, but it is not.
    It will greatly reduce query times.

    For those saying that you should not be scared of joins, please google "Snowflake" and you will see it is not a good thing when working with datawarehouses.
  • David 2012-11-15 09:34
    OK, the votes are now in and it looks like the whole of Florida has commented that "this is perfectly normal for data warehousing", twice. So TRWTF is Florida, then?
  • @Deprecated 2012-11-15 09:34
    Hegel:
    Only 2 fields in the hours table? Now that's a real wtf; the following is the absolute minimum.

    describe hours;

    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+


    You should make all your columns VARCHAR.
    That way you can handle times like value_24h='16am'.
  • Cbuttius 2012-11-15 09:35
    I see a time table, but don't see any reference of which train to catch.
  • David 2012-11-15 09:35
    DWDBA:
    reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data.


    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    There's a lesson to be learned there. If only someone smart could work out what it is.
  • lanmind 2012-11-15 09:36
    If you're not using a date table, and you think doing the calculations on the fly is better approach, then you're doing it wrong. This applies to normal database development every bit as much as data warehousing.
  • Foo Bar 2012-11-15 09:45
    A Time Table inside an OLAP Cube. Is that a nature's harmonic simultaneous 4-day time cube?
  • DM 2012-11-15 09:49
    This isn't a WTF...
  • 3rd Ferguson 2012-11-15 10:04
    My current implementation has several time dimension tables at different levels. So there's a day table, a month table, a quarter table and a year table. The tables get narrower as their scope gets wider. We maintain SQL whose sole purpose is to populate all the columns in all the tables, so if a column is ever added we know what all the values in all the tables will be.

    Sure, the day table has 10k or 15k rows, but it mostly gets used during ETL, where most large input tables get rolled to the month level (at their most granular). Users who are crafting reports using the off-the-shelf GUI tool normally only use the quarter and year tables and performance is great for that.

    I think these guys went a little off on their columns, but I can't say much, because my day table has both the old-school Excel-equivalent integer date and the new-school one.
  • Ralph 2012-11-15 10:09
    David:
    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    There's a lesson to be learned there. If only someone smart could work out what it is.
    The smart people already have done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.
  • galgorah 2012-11-15 10:12
    DWDBA:
    To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

    2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.


    You're absolutely right. A date dimension table is very common and much, much faster for large data warehouse queries. Data warehousing is very different from OLTP systems, and the standard rules of good vs. bad change a lot.

    Also, in general, using function calls within your SQL code is a performance hit, especially in the where clause, because it causes that function to be run for every row in the table. It's even worse when you nest them in a complex expression, because it confuses the query optimizer and you'll often get a bad plan as a result. At least if the function call is within the list of columns to output, you only run it for the rows you are actually returning.
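    galgorah's point about functions in the where clause shows up directly in a query plan. A small SQLite illustration (table and index names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (event_date TEXT, amount REAL)")
cur.execute("CREATE INDEX idx_fact_date ON fact(event_date)")

# Wrapping the indexed column in a function forces a full table scan...
plan_fn = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM fact "
    "WHERE strftime('%Y', event_date) = '2012'").fetchall()

# ...while the equivalent bare range predicate can use the index.
plan_range = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM fact "
    "WHERE event_date BETWEEN '2012-01-01' AND '2012-12-31'").fetchall()

print(plan_fn[0][3])     # a SCAN of fact
print(plan_range[0][3])  # a SEARCH using idx_fact_date
```

    The same principle holds in SQL Server or Oracle: a predicate on a bare column is sargable, a predicate on a function of the column usually is not.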

  • darkmattar 2012-11-15 10:12
    Someone has obviously never worked on a good star-schema data warehouse before. A few of those columns are sketchy, but the majority are extremely useful, especially when dealing with a FISCAL calendar rather than a Julian one. Good luck using SQL Server date functions to calculate the next month's start when months vary from 28 to 42 days in length, with no tie to a regular calendar in any way.

    Also, the dimension tables for all the time/date-related information exist to allow Cognos (the real WTF?) or other business reporting systems to print user strings for date/time/hour descriptions, instead of having to use intelligent keys and some other hard-wired translation.

    Addendum (2012-11-15 10:21):
    I must say, though, that I disagree with the prev_prior_etc columns. Especially if those are a poor attempt at a slowly changing dimension. Using columns to mimic a fixed-size slowly changing dimension is a terrible hack of a solution 99% of the time.

    Addendum (2012-11-15 10:27):
    Or is that prev_prior to connect to 2 years back?
    They really went to the extreme with the Year over Year over Year over Year reporting columns!
  • RichP 2012-11-15 10:12
    Hegel:
    Only 2 fields in the hours table? Now that's a real wtf; the following is the absolute minimum.

    describe hours;
    +---------------+--------------+------+-----+
    | Field         | Type         | Null | Key |
    +---------------+--------------+------+-----+
    | hour_id       | guid         | NO   | PRI |
    | next_hour_ref | guid         | NO   |     |
    | prev_hour_ref | guid         | NO   |     |
    | name          | varchar(255) | NO   |     |
    | value_12h     | tinyint(1)   | NO   |     |
    | value_24h     | tinyint(1)   | NO   |     |
    | is_night_hour | bool         | NO   |     |
    +---------------+--------------+------+-----+


    Last field should be named "is_midnight_hour" so one can do:

    ' Check in by wpickett
    ' Code reviewed by scropper
    do {
    wait()
    } until { is_midnight_hour }

    me->love("begin to shine")

  • Hegel 2012-11-15 10:26
    It seems obvious that there is an equivalent need for a truth table with 3 rows in it.
  • JC 2012-11-15 10:40
    letatio:
    JC:
    One thing I've learnt is that the uglier and more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, that's a VERY good thing.


    This is only true if either you are terrified of joins (because you don't understand them) or you are normalising to the extreme.

    I believe the former applies in your case. Generally speaking, a join is denormalisation on demand. When selecting, a denormalised database gives the illusion of being easy to query. However, these queries probably don't consider all of the integrity problems denormalised storage is vulnerable to; problems you may have to check for, unless someone extremely adept has written the code for everything that modifies data. Writing queries that take such anomalies into account may not only be hard, it may be impossible.

    In summary, what you mean to say is that you suck at joins and are unable to easily visualise the intermediate dataset created by joins. You need joins to be done for you beforehand. I pity the fool who hires you.


    You clearly have no experience of DataWarehousing whatsoever. This has nothing to do with "Joins". Troll harder next time.
  • TGV 2012-11-15 10:42
    JC:
    Troll harder next time.
    I LOL'ed.
  • Michael 2012-11-15 10:44
    This is the first time that I have to comment on a Daily WTF post, because I cannot agree with it at all. I work in the BI department of a large company, and there are several reasons to use this kind of table.

    For the hours-of-day table, one thing you might want is a report that shows the revenue for each hour (or each day). The problem is that if you have no revenue for an hour, all built-in functions (like to_char(date,'hh24')) will just ignore it, and it will be missing from the result set. These tables solve the problem.

    select hour, sum(revenue)
    from tblHours left join tblRevenue on to_char(date,'hh24') = hour
    group by hour

    Other aspects that are much easier using these tables:
    * Custom Fiscal Year that cannot be represented using standard functions (starting 1st of Sept this year but 2nd of Sept next year)
    * Custom Months following a 4-4-5 calendar
    * Restricting to specific weekdays and not bothering with language settings (eg - first day of week can be Sunday or Monday - weekday can be Sunday or Sonntag - week counting can start with the first full week or the week with the first monday in a year - all of them depending on language settings of the client)
    * Changing the rules without being forced to change all the functions in the reports

    There are many more of these ...

    greetings

    Michael
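    Michael's missing-hours point is easy to demonstrate. A sketch in Python with SQLite, keeping his table names but swapping his Oracle to_char for SQLite's strftime (the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tblHours (hour INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO tblHours VALUES (?)", [(h,) for h in range(24)])
cur.execute("CREATE TABLE tblRevenue (ts TEXT, revenue REAL)")
cur.executemany("INSERT INTO tblRevenue VALUES (?,?)",
                [("2012-11-15 08:30", 10.0), ("2012-11-15 08:45", 5.0)])

# Grouping only the fact table: 23 of the 24 hours silently disappear.
direct = cur.execute("""SELECT CAST(strftime('%H', ts) AS INTEGER), SUM(revenue)
                        FROM tblRevenue GROUP BY 1""").fetchall()

# LEFT JOIN from the hours table: every hour appears, empty ones as NULL.
joined = cur.execute("""SELECT h.hour, SUM(r.revenue)
                        FROM tblHours h
                        LEFT JOIN tblRevenue r
                          ON CAST(strftime('%H', r.ts) AS INTEGER) = h.hour
                        GROUP BY h.hour ORDER BY h.hour""").fetchall()

print(len(direct), len(joined))   # 1 24
```

    The first query returns one row; the second returns all 24 hours, which is what a per-hour revenue report actually needs.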
  • foo2 2012-11-15 10:49
    DWDBA:
    To all those who are saying this is unnecessary and wtf-worthy, let me tell you this is perfectly standard in a DW world. While this table is missing some columns that I would personally add in (eg. weekday vs weekend), it is still very useful for data warehouses for 2 reasons:

    1. reporting tools can pick this up as a standard dimension very easily and quickly, making it intuitive for the (non-tech) user to get at data. I haven't seen a lot of reporting tools generate good sql or use functions properly, so this is a huge help

    2. It actually makes querying a lot faster as well by avoiding on the fly calculations which can be very tedious if running against a large (think terabyte-size) fact table. This dimension table is small enough to fit in memory, so disk access is not an issue. However, if you're doing on-the-fly calculations or using functions against every row in a large fact table (quite the norm for data warehouses), it will slow down the query by a non-significant amount, besides causing cpu issues.

    Yeah, this. I've done similar tables for the same reasons: speed, and there's only so many times you can tell people how to get various bits out of a date.

    "Here, add this and pick what you need." That said, the table here is a bit over the top in its WTF-ness.

    (And a simple weekday/weekend flag doesn't allow for holidays etc.)
  • Anon 2012-11-15 10:53
    Exactly. The OP sounds like someone who understands relational databases for OLTP -- in which case this would be WTF code -- but doesn't quite grok a star schema for a data warehouse. The fact that _he_ could write date functions in code misses two points: 1. The table can be used to populate e.g. dropdown lists in reports, for users. 2. With appropriate indexing, the report query may well run faster.
  • db2 2012-11-15 10:57
    Having a time/date dimension table in a data warehouse is, as stated, perfectly normal.

    This one does appear to be going a bit overboard with columns, though. A lot of those are probably better handled by MDX functions like ParallelPeriod (which introduces its own special brand of leap-year WTFs).
  • Michael 2012-11-15 10:59
    I once worked on an enterprisey database where I found a table with multiple fields, one of which was "DayOfWeek", which for some strange reason had values 1..31. A colleague of mine explained: "Yes, erm, we mislabeled that field."
  • zzalpha 2012-11-15 11:01
    Yeah, sorry buddy, but you clearly don't have experience in data warehousing. This is *precisely* how a date dimension should be designed. Using SQL Server date functions to filter large volumes of fact data is utter madness. And that's ignoring the fact that things like the definitions of quarters, week numbers, and so forth are variable and quite complicated to calculate... there's no way I'd want to be doing that at the database level at query time.

    Please, if you don't understand this stuff, go read a book or two by Kimball on data warehousing. Because this is *fundamental* to the way warehouses are designed.
  • Steve The Cynic 2012-11-15 11:09
    Ralph:
    The smart people already have done as you suggest. The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.

    HEY! There's nothing wrong with Guitar Exercises for Dummies! Well, except that it doesn't tell you much about Data Warehousing.

    Of course, I'm trying to learn how to play guitar, so I don't have that problem, and I know that the real purpose of USB ports is connecting Rocksmith cables.
  • Zapp Brannigan 2012-11-15 11:12
    Anyone up for an Inmon vs. Kimball rumble?
  • Mike 2012-11-15 11:15
    This is just silly talk, I realize, but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly, if they insist on using a table? I.e., you install your OS/SQL with a specified time format, and as part of installation any stupid date table it wants gets built once as a system-level table. Also, I'm not convinced that even with good caching it would be faster to have a table than just to compute the required value on the fly, especially if you can push that off onto the client so you don't tie up your server figuring things like that out.
  • C-Derb 2012-11-15 11:27
    So TRWTF is leveraging a data warehouse dimension table to implement an email filter rule for an issue tracking system?
  • C-Derb 2012-11-15 11:30
    Could someone please denormalize this sentence?

    The Overloaded Time Table:
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
  • Greg 2012-11-15 11:36
    Absolutely. The only WTF here is that the submitter is working on a data warehouse but doesn't understand data warehousing query optimization. Yes, you can use a date function, but the query optimizer has a much more difficult time optimizing queries against values it can only determine at runtime. Best just to compute the values in advance and maintain the table.

    The project manager was correct, although I'm sure he was just passing it along from a DBA who actually knew what they were talking about.
  • Paul Neumann 2012-11-15 11:38
    Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

    Okay, here goes: That table looks perfectly denormal to me.
  • cellocgw 2012-11-15 11:57
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    Error: Hour Not Found

    Of course
  • Chris 2012-11-15 11:58
    The only WTF is the need for that type of table.
  • Cam 2012-11-15 12:10
    You've obviously never written any MDX and don't even know what it is, so I'm just going to keep laughing at your ignorant comment that doesn't make any sense.

    JC is spot on.
  • Zapp Brannigan 2012-11-15 12:21
    The third rule of data warehousing is to forget everything you know about relational databases. The first and second rules are to not talk about it.
  • Alan 2012-11-15 12:53
    Mike:
    This is just silly talk, I realize, but wouldn't it make sense for them to have the date table built into the OLAP/SQL engine directly, if they insist on using a table? I.e., you install your OS/SQL with a specified time format, and as part of installation any stupid date table it wants gets built once as a system-level table. Also, I'm not convinced that even with good caching it would be faster to have a table than just to compute the required value on the fly, especially if you can push that off onto the client so you don't tie up your server figuring things like that out.


    So, here's the situation. First, the server is set up and designed specifically to be tied up figuring out things like that. Pushing things to the client sounds like a cool plan, until you need to push a billion pieces of data to the client to "compute the required value on the fly", only to have the client turn around and send back a billion computed rows so the server can continue finding the appropriately related data.

    Also, it is definitely faster to compute the date settings once at data load, and then not have to recompute them every time 1 of 100 different users wants to run 1 of 100 different reports.

    The big takeaway is this: data warehousing systems are specifically designed to read and compute massive amounts of data, probably beyond what you're thinking, because data is typically pulled from multiple sources and it is important to have historical data for trending.
  • David Aldridge 2012-11-15 13:10
    Yep, I don't think that there's anything abnormal or particularly inappropriate about this.

    The query optimiser often cannot work out how many of the dates in a table are Sunday without reference to a column storing day_of_week, and front-end query tools often do not have good syntax for working out these attributes on the fly.
  • Coyne 2012-11-15 13:28
    snoofle:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?


    Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND
  • OH GOD IT BURNS! 2012-11-15 13:47
    Ralph:
    ... The problem is, we're vastly outnumbered by people who buy the "for dummies" books with pride.


    I believe they think "For Dummies" is actually a requirement.

    "Any idiot can use a computer. Many do."
  • starsky51 2012-11-15 13:58
    David:
    At least 90% of WTFs seem to arise from attempts to make technology "easier" for people who have no aptitude, inclination or intention of learning the first thing about it.

    They're called Business Users and without them, there'd be no market for data warehousing. God bless em!
  • Dimse 2012-11-15 13:58
    I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

    Can anyone explain why such a tablet is bad in a data warehouse context?
  • BlueBearr 2012-11-15 14:01
    From a security standpoint I'm disappointed that this article includes actual customer data like that. Where is the obfuscation?
  • Herp 2012-11-15 14:03
    Dimse:
    I study computer science, and we had a course on data warehouses, where we created timetables like this using that wizard.

    Can anyone explain why such a tablet is bad in a data warehouse context?


    Perhaps the submitter would like to, because everyone else is asking the same thing!
  • David 2012-11-15 14:31
    I guess the real WTF here is a boss who knows what he is doing... because the submitter clearly doesn't understand the purpose of this.

    In fact I've written something much like this. Sure there are Oracle/SQLServer functions that can compute the last business day of the month, but then there are the one-off situations: The day the storm knocked out power to the entire state on what would have been the last business day of the month, and now all our reports need to treat that day as a holiday. The day the accounting system crashed. When business unit A has a different holiday schedule than B. So what did I do:

    (a) write a bunch of functions to adjust the date for these one-off issues.
    (b) create a view with functions in the columns
    (c) realize performance sucks when you try to join against this
    (d) materialize the view... and it's a table now.


    The other alternative (which I rejected) was to duplicate the last reported data with a new effective date. The problem is that the duplicated data can confuse other reports that want to see unique entries.

    The big table of dates actually works pretty well:

    select calendar_table.reported_date, data_table.x from data_table, calendar_table where calendar_type='Adjusted_Business' and country='US' and business_unit='Boston' and data_table.effective_date=calendar_table.effective_date

    And the query will duplicate the entries you needed duplicated, while others can still access the raw (uncorrected and unduplicated) data. Now everyone is happy, and I only wasted a few megabytes on disk.
  • RB 2012-11-15 14:49
    Actually, the original request to create a time/date table which has all sorts of fields is completely accepted as best practice for data warehouses, and for good reason. Check out the works of Kimball and 'date dimensions' for info. You want to be able to drop a field called 'quarter' onto a report and have it aggregate facts to that level. You might not be working with a tool that allows or expects you to write the sql query to calculate the function- and indeed it might be quite slow in performance terms to have the query do all the work when it can just read it out. That's why denormalisation is good in a data warehouse, and that's why you do lots of extra work in the ETL when you can- to make the warehouse easy to understand and use, and perform fast when you use it.


    The table you include is a little bit over the top though!
  • Cognos Dummy 2012-11-15 15:07
    That's funny, because I actually own the book "Data Warehousing for Dummies". I got it when I was learning Cognos many years ago.
  • TheCPUWizard 2012-11-15 15:27
    Dimse:

    Can anyone explain why such a tablet is bad in a data warehouse context?


    If someone has to explain why a tablet is a poor choice compared to a good SMP server, then all is lost.
  • C-Derb 2012-11-15 15:48
    Feels like we are still 60-70 comments short of reaching a consensus that this is not a WTF. This conversation has become eerily similar to "Blame Peter".
  • nelson_no_nose 2012-11-15 15:50
    Anonymoose:
    letatio:
    JC:
    One thing ive learnt is that the uglier, more denormalised a DataWarehouse backing table is, the easier the query is to write against the DW.

    And for anyone who has written MDX, thats a VERY good thing.


    In summary, what you mean to say is that you suck at joins and are unable to mentally visualise the intermediate dataset created by joins with ease. You need joins to be done for you beforehand. I pity the fool who hires you.


    No, what I think he really meant to say is that tables like this are pretty normal for the backing of a time dimension in a data warehouse. Seems like he understands data warehousing much better than you do.


    I think I understand what you are saying. If they didn't denormalise so much they wouldn't have enough data to earn the title 'warehouse'?
  • DeliriousHippie 2012-11-15 15:53
    I also laughed at this. I work as a BI consultant. In my work we always have a calendar table. One of my customers has 1 million customers; think about their fact table. 'Push calculations to client side', huh.

    Captcha: saepius, almost intelligent human
  • TheRider 2012-11-15 16:22
    But, but, with such a table *anything* can happen!
  • erat 2012-11-15 16:30
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    I think daylight saving changes are the biggest WTF of all times. It creates so much confusion that cannot be properly represented. For example, a baby can be born at 2:30 AM in some country on a certain date, and die at 2:10 AM on the exact same date - and how do you record that properly, assuming that there are a ton of automated processes depending on those two datetime values?

    So what if the sun comes out one hour earlier/later (relative to the local clock) compared to having daylight saving enforced? Adjust business/school hours accordingly instead, for fuck's sake. If I could go back in time to murder the motherfucker who first thought of this, I would have.

    I'm tempted to make the same statement about timezones. So what if the sun comes out at, e.g., 2 AM in some country over there and goes down at 4 PM? We have those kinds of oddities to deal with anyway (e.g. close to the north pole). Those guys up north still count 60 minutes to an hour. By the logic of that timezone/daylight-saving crap, their hours should contain more or fewer minutes than 60, am I fucking right?
  • dkallen 2012-11-15 16:46
    erat:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    <snip>

    am I fucking right?


    Does your aptitude for bedroom acrobatics depend upon whether DST is in effect?
  • Dann of Thursday 2012-11-15 17:10
    erat:
    am I fucking right?


    Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

    Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.
  • urbalt 2012-11-15 17:19
    erat:
    I think daylight saving changes are the biggest WTF of all times.


    I see what you did there.
  • Gekko 2012-11-15 18:02
    As ugly as it looks in a Normalized Data Store, that is classical Dimensional Data Store. Nothing bad about it if it is used in a Data Warehouse environment.
  • caecus 2012-11-15 19:03
    Dann of Thursday:
    erat:
    am I fucking right?


    Nah, you're pretty angry about what amounts to a very minor inconvenience twice a year.

    Maybe try decaf? Or move to a state that doesn't do DST, like Arizona or Hawaii.


    It's twice per year locally, but for a party handling this for multiple countries, it's twice per year times that many countries. And that's just the daylight-saving thingy.

    About timezones: I think it would be nice if the same datetime value referred to the same moment everywhere.

    Am I fucking right?
  • danbruc 2012-11-15 19:20
    TRWTF is column YYYYMMDD of type varchar(11).
  • PiisAWheeL 2012-11-15 20:58
    C-Derb:
    Could someone please denormalize this sentence?

    The Overloaded Time Table:
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him, and I certainly don't blame him.
    It seems to be pretty denormalized to me but here:
    Terje red explained the her's fears grepping on top of the source behind references and with to this table he as think the knowledge the knowledge may reason find may very well kill him, and i certainly can't don't blame him.

    Now if you want it NORMALIZED... well, that is beyond my ability.
  • PiisAWheeL 2012-11-15 21:02
    danbruc:
    TRWTF is column YYYYMMDD of type varchar(11).
    That allows for up to 2 delimiters and a 5 digit Year.

    ie
    12492-11-23 for November 23rd, 12492 A.D.

    It's good right up until 99999 A.D. Then we have to worry about Y100K.
  • Denorman Diamond 2012-11-15 21:03
    Paul Neumann:
    Truly, nobody has made a comment like "That table looks perfectly denormal to me" yet?

    Okay, here goes: That table looks perfectly denormal to me.
    We were storing all duplications of those comments in a data warehouse but disk space overflowed.
  • AndyCanfield 2012-11-16 01:22
    What is "NextFactDayLocal"? If that means the next day that the local factory will be running, that table column makes sense. If it means the next day that the programmer will know his facts, you can initialize it all to NULL.
  • Matt Westwood 2012-11-16 01:59
    Mike S:
    Actually, one part of that is sane. It's useful to store business days in the database because computing holidays can sometimes be iffy. (Especially if you're a multinational, and have to know if an arbitrary date is a business day in different countries.)


    What you would then do is write a dedicated function to return whether a particular date is a business day in a particular country. This will of course probably involve some sort of table somewhere (unless you can find an online resource to interrogate) and will probably require regular (probably annual) maintenance when you learn what the holidays are going to be for the regions you need to service.

    But then if you're storing the date data as defined in this WTF, you're probably going to have to do this anyway, and because of the unwieldiness of the table this will be a more difficult job.
  • Scarlet Manuka 2012-11-16 05:02
    Boring! Our BI date table has (I kid you not) 184 columns... I think about three of those are ours, the rest were all standard with the BI platform.
  • faoileag 2012-11-16 07:32
    PiisAWheeL:
    danbruc:
    TRWTF is column YYYYMMDD of type varchar(11).
    That allows for up to 2 delimiters and a 5 digit Year.

    Silly me for thinking YYYYMMDD might be a dead giveaway as to what format the dates in this column are supposed to be...
  • sqlblindman 2012-11-16 08:12
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
  • Cc 2012-11-16 08:37
    Seems like a typical date dimension table to me. I don't see the wtf factor of this one.
  • chris 2012-11-16 10:27
    Coyne:
    snoofle:
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?
    HOUR_NOT_FOUND?


    Oh, sure, for spring. For fall, it's: DUPLICATE_HOUR_FOUND

    It's surely simplest to store the extra hour from fall in the empty space in spring.
  • Paul Neumann 2012-11-16 10:34
    sqlblindman:
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
    First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
    INail problem = NailFactory.get(TNail.type);
    is perfectly standard nonsense for a depressingly large number as well, apparently.
  • IanAndNeilsDad 2012-11-16 13:42
    Paul Neumann:
    sqlblindman:
    Sure. This is perfectly standard nonsense for folks who have drunk the Kimball Kool-aid. A depressingly large number, apparently.
    First and foremost Kimball Kool-aid is tasty. Furthermore, seeing every problem as
    INail problem = NailFactory.get(TNail.type);
    is perfectly standard nonsense for a depressingly large number as well, apparently.
  • hunter9000 2012-11-16 13:58
    Terje explained that he's fears grepping the source for references to this table as the knowledge may reason may very well kill him

    Hiring an editor who speaks English would do this site a world of good.
  • Jason 2012-11-16 14:57
    Date/Time dimension tables are very common in data warehouses and are recommended as a best practice for a large DW. 11,000 rows in a dimension table in a DW is peanuts compared to the rest of the database. Denormalization is the key to warehouse performance. I don't think this is very WTF-worthy.
  • Kim 2012-11-16 19:14
    Not a WTF. The real WTF is a developer doing BI-related work without understanding BI fundamentals.

    Think about it this way: the business user wants to create a report. His fact table has a granularity of thousands of transactions per day. He wants to be able to see the number of transactions summed up to the yearly total, but with the ability to drill down to any level, so for instance he'll see something like this:
    2012
    +1st. Quarter
    -2nd. Quarter
    -+April
    -+May
    --June
    --+Week 22
    --+Week 23
    --+Week 24
    ----Monday
    ----Tuesday
    ----Wednesday
    ----Thursday
    ----Friday
    ----Saturday
    ----Sunday
    --+Week 25
    --+Week 26
    +3rd. Quarter
    +4th. Quarter

    And with the sums correctly calculated on each level shown. Imagine how poorly an SQL query would perform if it had to calculate the actual quarter, month and week for EACH date you have in your dataset, possibly millions (or even billions) of rows large.

    The reason why we build data warehouses using denormalized models is mainly due to two factors:
    1. Speed of query: it's much faster to query a denormalized structure that is designed purely for read optimization.
    2. Ease of use for businesses: it's a much easier concept to grasp when you can pick from easy-to-understand groups of related hierarchical structures called dimensions, where you can actually ask the question "how many sales did we have every month per region and per store, compared year-over-year with last year's results" without really knowing SQL, because "how many sales" = fact, region and store = organisational hierarchy, and month and year-over-year = time hierarchy.
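    A minimal, runnable sketch of that idea (SQLite via Python; the dim_date/fact_sales names and contents are invented): the quarterly rollup becomes a plain GROUP BY over precomputed columns, with no date math at query time.

```python
import sqlite3

# Hypothetical schema: a tiny date dimension joined to a fact table,
# so the rollup query never has to compute quarter/month from raw dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20120405
    year INTEGER, quarter INTEGER, month INTEGER, week INTEGER, day_name TEXT
);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_date VALUES
    (20120405, 2012, 2, 4, 14, 'Thursday'),
    (20120601, 2012, 2, 6, 22, 'Friday'),
    (20121001, 2012, 4, 10, 40, 'Monday');
INSERT INTO fact_sales VALUES (20120405, 100.0), (20120601, 50.0), (20121001, 25.0);
""")

# Each drill-down level is just another GROUP BY over precomputed columns.
rows = conn.execute("""
    SELECT d.year, d.quarter, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()
print(rows)  # [(2012, 2, 150.0), (2012, 4, 25.0)]
```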
  • Aaron 2012-11-16 21:52
    Sorry, but this is a very common practice in DW applications and is actually recommended by many designs such as the popular Kimball methodologies.

    Sure, the database can do all the time functions for you; however, this is harder for users to understand. Typically, a reporting tool like Business Objects allows them to drag and drop dimensions into the report to sort, pivot, etc. Having a selection of precalculated date and time fields is MUCH easier than building functions in the reports. Even the space cost of a table with one row per minute for 30 years is nothing compared to the size of most fact tables.

    Not a FAIL at all....
  • Nick S. 2012-11-17 01:39
    +1 TRWTF is the poster.
  • billr 2012-11-17 09:14
    Hear, hear <g>. The number of IT folks who forget that those lusers pay IT salaries for the sole purpose of getting their work done more efficiently is depressing. If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.

    For folks insisting on using normalized structures instead of stars, imagine you need to query by day-of-month from a table containing billions of rows of timestamped events, a not uncommon requirement, and compute aggregates on the rows. Sure, you can select the computed day of month from the timestamp, if you have 20 or 30 minutes to wait. (Please don't suggest creating an index on a timestamp.)

    Or you could create an indexed view of the table, with the supporting structures needed, essentially recreating the structure of a data warehouse but with a much more inefficient design. Other benefits of this approach include saddling OLTP transactions with the overhead of updating these structures in real time, and enabling analytic and reporting tasks to interfere with OLTP so your LOB applications can slow down unpredictably.

    Or, you can learn how to use data warehouse techniques, present the structure the business needs in a data warehouse, achieve sub- or unit-second query responses, and decouple analytic processing from the ongoing transaction workload.

    Hmmm, I just can't make up my mind. Anyone have any Kool-Aid?
  • qbolec 2012-11-17 13:11
    I think that those who suggest using functions instead of such a table are missing one important thing: a table is a relation, not necessarily a function.
    You can use a relation in many different ways, while functions are usually one-way.
    Consider a function getDayOfTheWeek(date) and compare it to the relation between all dates and the 7 days of the week. Can you use the function to actually find all fridays?

    Are the following two queries equally fast:
    SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

    SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day_of_week=5

    No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing the constants hidden behind the O(1) notation. It is about the fact that the first query has to scan all rows, while the second one can first use the index on dates to find only those dates which are fridays, and then look up only those days in `somewhere`.

    This example is probably not so good, as fetching 1 row in every 7 is not that much faster than a full scan in reality. But with getMonth, the rows which you are looking for form a nice slice on the disk.

    The above examples are purely fictional and I am not sure whether in these cases a join really is faster than using a function. The point is, I believe there are situations in which it is faster, e.g. if the engine stores data column-wise.
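    A rough, runnable illustration of the point (SQLite via Python, with made-up table contents): the same fridays come back either way, but only the join against the precomputed day-of-week column gives the planner an index to work with.

```python
import sqlite3

# Sketch of the relation-vs-function idea: a precomputed, indexed day-of-week
# column can be searched via its index, while a function call in the WHERE
# clause must be evaluated for every row. All names here are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE somewhere (day TEXT, whatever INTEGER);
CREATE TABLE dates (day TEXT PRIMARY KEY, dow INTEGER);
CREATE INDEX idx_dates_dow ON dates(dow);
INSERT INTO somewhere VALUES ('2012-11-16', 1), ('2012-11-17', 2), ('2012-11-23', 3);
INSERT INTO dates VALUES
    ('2012-11-16', 5), ('2012-11-17', 6), ('2012-11-23', 5);  -- 5 = Friday
""")

# Function in WHERE: strftime runs for every row of `somewhere`.
fn_rows = conn.execute(
    "SELECT whatever FROM somewhere WHERE strftime('%w', day) = '5'"
).fetchall()

# Join against the date table: the index on dates(dow) finds fridays directly.
join_rows = conn.execute("""
    SELECT s.whatever FROM somewhere s JOIN dates d ON s.day = d.day
    WHERE d.dow = 5
""").fetchall()

print(sorted(fn_rows), sorted(join_rows))  # both print as [(1,), (3,)]
```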
  • Meep 2012-11-17 21:34
    JM:
    With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


    Nothing, since there is no "daylight savings". It's "saving," singular.
  • Meep 2012-11-17 21:47
    qbolec:
    I think that those who suggest using functions instead of such a table are missing one important thing: a table is a relation, not necessarily a function.
    You can use a relation in many different ways, while functions are usually one-way.
    Consider a function getDayOfTheWeek(date) and compare it to the relation between all dates and the 7 days of the week. Can you use the function to actually find all fridays?

    Are the following two queries equally fast:
    SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

    SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day_of_week=5

    No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing the constants hidden behind the O(1) notation. It is about the fact that the first query has to scan all rows, while the second one can first use the index on dates to find only those dates which are fridays, and then look up only those days in `somewhere`.


    Very astute observation, and that's why a proper relational database should implement functions as virtual tables. You would, of course, still have to implement the inverses, but that's usually not too hard. SQL, alas, isn't that clever.

    The above examples are purely fictional and I am not sure whether in these cases a join really is faster than using a function. The point is, I believe there are situations in which it is faster, e.g. if the engine stores data column-wise.


    For a column-oriented system, you're looking for gains when doing scans of huge runs of data; it's a very different mentality than with typical transactional systems.
  • David 2012-11-18 17:31
    billr:
    If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.


    Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.

    billr:
    Hmmm, I just can't make up my mind. Anyone have any snake oil?


    FTFY.
  • billr 2012-11-19 16:13
    David:
    billr:
    If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.


    Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.


    Oh I think you've driven a car, watched TV or played a video game? Fairly complex, those items. I tend to think that IT (app design, actually) may be the only place where people can say that something is hard to use because it does complicated things, and people still let them get away with it.
  • David, who is abject before billr 2012-11-20 00:37
    billr:
    Oh I think you've driven a car...


    I have. It took months of lessons before I could pass my test. And I still wouldn't know how to rebuild a gearbox if it broke.

    If these things "just work" for you, you were evidently born with an intuitive knowledge of things that are even now almost beyond my ken. Truly I am an unworthy interlocutor.
  • Blorb 2012-11-20 11:36
    I agree. This sort of stuff makes life easier in data warehousing (though I don't grok the point of all the seq fields).

    Hour of Day and Day of Week dimensions? Very useful for aggregate fact tables and such. Month dimensions too.
  • DWalker59 2012-11-20 17:24
    Even a table of numbers is extremely useful.

    By Adam Machanic, who is a SQL guru:
    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/you-require-a-numbers-table.aspx

    The links from that page show cool uses of a numbers table. Date tables, as described in the original "wtf" here, have similar uses to a numbers table.
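    For anyone who hasn't seen the trick, here's a tiny runnable sketch (SQLite via Python) of generating a numbers table with a recursive CTE and turning it into a run of dates:

```python
import sqlite3

# The numbers-table trick: generate integers 0..6 with a recursive CTE,
# then derive a run of calendar dates from them.
conn = sqlite3.connect(":memory:")
dates = conn.execute("""
    WITH RECURSIVE numbers(n) AS (
        SELECT 0
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 6
    )
    SELECT date('2012-11-15', '+' || n || ' days') FROM numbers
""").fetchall()
print(dates[0], dates[-1])  # ('2012-11-15',) ('2012-11-21',)
```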
  • Neil 2012-11-21 10:36
    Hegel:
    It seems obvious that there is an equivalent need for a truth table with 3 rows in it.
    Well, the boring answer would be

    TRUE
    FALSE
    NULL

    But perhaps you were thinking of an alternative third row?
  • Jimmy 2012-11-21 19:26
    To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed to OLTP.
  • anonymous 2012-11-28 02:18
    Nothing wrong with having a Fact date table in a data warehouse.
    I tend to create the primary key as an int rather than a datetime, which makes it much easier to use.
    i.e. 20121128 is the pkid of the table.

    Having control of the Fact date makes reporting and populating a data warehouse much easier.
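    A quick sketch of that key scheme in Python (the function names are mine, not the poster's):

```python
from datetime import date

# Encode a calendar date as an integer YYYYMMDD surrogate key, and back.
def date_to_key(d: date) -> int:
    """date(2012, 11, 28) -> 20121128"""
    return d.year * 10000 + d.month * 100 + d.day

def key_to_date(key: int) -> date:
    """20121128 -> date(2012, 11, 28)"""
    return date(key // 10000, (key // 100) % 100, key % 100)

print(date_to_key(date(2012, 11, 28)))  # 20121128
```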
  • Uncle Al 2012-11-28 16:44
    Jimmy:
    To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed to OLTP.


    Finally, a new TDWTF meme to replace "this is standard practice in embedded systems"!