The Daily WTF: Curious Perversions in Information Technology

2012-11-17 Reply Admin

Hear, hear <g>. The number of IT folks who forget that those lusers pay IT salaries for the sole purpose of getting their work done more efficiently is depressing. If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.

For folks insisting on using normalized structures instead of stars, imagine you need to query by day-of-month from a table containing billions of rows of timestamped events, a not uncommon requirement, and compute aggregates on the rows. Sure, you can select the computed day of month from the timestamp, if you have 20 or 30 minutes to wait. (Please don't suggest creating an index on a timestamp.)

Or you could create an indexed view of the table, with the supporting structures needed, essentially recreating the structure of a data warehouse but with a much more inefficient design. Other benefits of this approach include saddling OLTP transactions with the overhead of updating these structures in real time, and enabling analytic and reporting tasks to interfere with OLTP so your LOB applications can slow down unpredictably.

Or, you can learn how to use data warehouse techniques, present the structure the business needs in a data warehouse, achieve sub- or unit-second query responses, and decouple analytic processing from the ongoing transaction workload.

Hmmm, I just can't make up my mind. Anyone have any Kool-Aid?

2012-11-17 Reply Admin

I think that those who suggest using functions instead of such table, are missing one important thing : a table is a relation, not necessarly a function. You can use relation in many different ways, while functions are usually one-way. Consider a function getDayOfTheWeek(date) and comapre it to the relation between all dates and 7 days of week. Can you use the function to actually find all fridays?

Are the following two queries equally fast: SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day=5

No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing constants behind the O(1) notation. It is about the fact, that the first query has to scan all rows, while the other one can first use the index on dates to find only these dates which are fridays, and then looup only those days in somewhere.

This example is probably not so good as scanning every 1 of 7 rows is not quite fast in reality. But function getMonth -- now the rows which you are looking for form a nice slice on the disk.

Above examples are purely fictional and I am not sure if in these cases join is really faster from using a function. The point is I believe there are situations in which this is faster, if the engine stores data column-wise etc.

2012-11-17 Reply Admin

JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?

Nothing, since there is no "daylight savings". It's "saving," singular.

2012-11-17 Reply Admin

qbolec:
I think that those who suggest using functions instead of such table, are missing one important thing : a table is a relation, not necessarly a function. You can use relation in many different ways, while functions are usually one-way. Consider a function getDayOfTheWeek(date) and comapre it to the relation between all dates and 7 days of week. Can you use the function to actually find all fridays?
Are the following two queries equally fast: SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day=5

No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing constants behind the O(1) notation. It is about the fact, that the first query has to scan all rows, while the other one can first use the index on dates to find only these dates which are fridays, and then looup only those days in somewhere.

Very astute observation, and that's why a proper relational database should implement functions as virtual tables. You would, of course, still have to implement the inverses, but that's usually not too hard. SQL, alas, isn't that clever.

Above examples are purely fictional and I am not sure if in these cases join is really faster from using a function. The point is I believe there are situations in which this is faster, if the engine stores data column-wise etc.

For a column-oriented system, you're looking for gains when doing scans of huge runs of data; it's a very different mentality than with typical transactional systems.

2012-11-18 Reply Admin

billr:
If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.

Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.

billr:
Hmmm, I just can't make up my mind. Anyone have any snake oil?

FTFY.

2012-11-19 Reply Admin

David:
billr:
If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.

Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.

Oh I think you've driven a car, watched TV or played a video game? Fairly complex, those items. I tend to think that IT (app design, actually) may be the only place where people can say that something is hard to use because it does complicated things, and people still let them get away with it.

2012-11-20 Reply Admin

billr:
Oh I think you've driven a car...

I have. It took months of lessons before I could pass my test. And I still wouldnt know how to rebuild a gearbox if it broke.

If these things "just work" for you, you were evidently born with an intuitive knowledge of things that are even now almost beyond my ken. Truly I am an unworthy interlocutor.

2012-11-20 Reply Admin

I agree. This sort of stuff makes life easier in data warehousing (though I don't grok the point of all the seq fields).

Hour of Day and Day of Week dimensions? Very useful for aggregate fact tables and such. Month dimensions too.

DWalker59 · 2012-11-20 Reply Admin

Even a table of numbers is extremely useful.

By Adam Machanic, who is a SQL guru: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/you-require-a-numbers-table.aspx

The links from that page show cool uses of a numbers table. Date tables, as decribed in the original "wtf" here, have similar uses to a numbers table.

2012-11-21 Reply Admin

Hegel:
It seems obvious, that there is an equivalent need for a truth table with 3 rows in it.

Well, the boring answer would be

TRUE FALSE NULL

But perhaps you were thinking of an alternative third row?

2012-11-21 Reply Admin

To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed OLTP.

2012-11-28 Reply Admin

Nothing wrong with having a Fact date table in a data warehouse. I tend to create the primary key as an int rather than datetime which makes it much easier to use. i.e. 20121128 is the pkid of the table.

Having control of the Fact date makes reporting and populating a data warehouse much easier.

2012-11-28 Reply Admin

Jimmy:
To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed OLTP.

Finally, a new TDWTF meme to replace "this is standard practice in embedded systems"!

The Overloaded Time Table

Leave a comment on “The Overloaded Time Table ”