Comment On The Overloaded Time Table

A few years ago, while on a SQL Server data warehousing project, my project manager sent me a request - create a table with all kinds of information based on a given date. You know – given a date, be able to figure out day of the year, day of the week, month, last business day of the week, and so on, making sure the table stores enough dates to keep it maintenance free. Oh, and one more thing, it’s going to be linked off of the already complex star schema that we were developing for. Why? For reporting purposes of course. [expand full text]
« PrevPage 1 | Page 2 | Page 3Next »

Re: The Overloaded Time Table

2012-11-17 09:14 • by billr (unregistered)
395106 in reply to 395008
Hear, hear <g>. The number of IT folks who forget that those lusers pay IT salaries for the sole purpose of getting their work done more efficiently is depressing. If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.

For folks insisting on using normalized structures instead of stars, imagine you need to query by day-of-month from a table containing billions of rows of timestamped events, a not uncommon requirement, and compute aggregates on the rows. Sure, you can select the computed day of month from the timestamp, if you have 20 or 30 minutes to wait. (Please don't suggest creating an index on a timestamp.)

Or you could create an indexed view of the table, with the supporting structures needed, essentially recreating the structure of a data warehouse but with a much more inefficient design. Other benefits of this approach include saddling OLTP transactions with the overhead of updating these structures in real time, and enabling analytic and reporting tasks to interfere with OLTP so your LOB applications can slow down unpredictably.

Or, you can learn how to use data warehouse techniques, present the structure the business needs in a data warehouse, achieve sub- or unit-second query responses, and decouple analytic processing from the ongoing transaction workload.

Hmmm, I just can't make up my mind. Anyone have any Kool-Aid?


Re: The Overloaded Time Table

2012-11-17 13:11 • by qbolec (unregistered)
I think that those who suggest using functions instead of such table, are missing one important thing : a table is a relation, not necessarly a function.
You can use relation in many different ways, while functions are usually one-way.
Consider a function getDayOfTheWeek(date) and comapre it to the relation between all dates and 7 days of week. Can you use the function to actually find all fridays?

Are the following two queries equally fast:
SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day=5

No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing constants behind the O(1) notation. It is about the fact, that the first query has to scan all rows, while the other one can first use the index on dates to find only these dates which are fridays, and then looup only those days in `somewhere`.

This example is probably not so good as scanning every 1 of 7 rows is not quite fast in reality. But function getMonth -- now the rows which you are looking for form a nice slice on the disk.

Above examples are purely fictional and I am not sure if in these cases join is really faster from using a function. The point is I believe there are situations in which this is faster, if the engine stores data column-wise etc.

Re: The Overloaded Time Table

2012-11-17 21:34 • by Meep (unregistered)
395115 in reply to 394932
JM:
With only 24 rows in the "HourOfDay" column, what do they do for daylight savings?


Nothing, since there is no "daylight savings". It's "saving," singular.

Re: The Overloaded Time Table

2012-11-17 21:47 • by Meep (unregistered)
395116 in reply to 395111
qbolec:
I think that those who suggest using functions instead of such table, are missing one important thing : a table is a relation, not necessarly a function.
You can use relation in many different ways, while functions are usually one-way.
Consider a function getDayOfTheWeek(date) and comapre it to the relation between all dates and 7 days of week. Can you use the function to actually find all fridays?

Are the following two queries equally fast:
SELECT whatever FROM somewhere WHERE getDayOfTheWeek(day)=5

SELECT whatever FROM somewhere, dates WHERE somewhere.day=dates.day AND dates.day=5

No. And I do not mean that getDayOfTheWeek is slow or something. It's not about comparing constants behind the O(1) notation. It is about the fact, that the first query has to scan all rows, while the other one can first use the index on dates to find only these dates which are fridays, and then looup only those days in `somewhere`.


Very astute observation, and that's why a proper relational database should implement functions as virtual tables. You would, of course, still have to implement the inverses, but that's usually not too hard. SQL, alas, isn't that clever.

Above examples are purely fictional and I am not sure if in these cases join is really faster from using a function. The point is I believe there are situations in which this is faster, if the engine stores data column-wise etc.


For a column-oriented system, you're looking for gains when doing scans of huge runs of data; it's a very different mentality than with typical transactional systems.

Re: The Overloaded Time Table

2012-11-18 17:31 • by David (unregistered)
395127 in reply to 395106
billr:
If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.


Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.

billr:
Hmmm, I just can't make up my mind. Anyone have any snake oil?


FTFY.

Re: The Overloaded Time Table

2012-11-19 16:13 • by billr (unregistered)
395196 in reply to 395127
David:
billr:
If the user has to dedicate large blocks of time to learn about technology for its own sake, we failed.


Rubbish! IT is the only professional discipline where people expect extremely complex systems to "just work" without any effort whatsoever on their part. Sometimes this is fair, but often it is not. If technology has failed at all, it has failed to manage user expectations so that users have some idea whether their demands are reasonable, or even (in some extreme cases) possible.


Oh I think you've driven a car, watched TV or played a video game? Fairly complex, those items. I tend to think that IT (app design, actually) may be the only place where people can say that something is hard to use because it does complicated things, and people still let them get away with it.

Re: The Overloaded Time Table

2012-11-20 00:37 • by David, who is abject before billr (unregistered)
395207 in reply to 395196
billr:
Oh I think you've driven a car...


I have. It took months of lessons before I could pass my test. And I still wouldnt know how to rebuild a gearbox if it broke.

If these things "just work" for you, you were evidently born with an intuitive knowledge of things that are even now almost beyond my ken. Truly I am an unworthy interlocutor.

Re: The Overloaded Time Table

2012-11-20 11:36 • by Blorb (unregistered)
395260 in reply to 394937
I agree. This sort of stuff makes life easier in data warehousing (though I don't grok the point of all the seq fields).

Hour of Day and Day of Week dimensions? Very useful for aggregate fact tables and such. Month dimensions too.

Re: The Overloaded Time Table

2012-11-20 17:24 • by DWalker59
Even a table of numbers is extremely useful.

By Adam Machanic, who is a SQL guru:
http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/you-require-a-numbers-table.aspx

The links from that page show cool uses of a numbers table. Date tables, as decribed in the original "wtf" here, have similar uses to a numbers table.

Re: The Overloaded Time Table

2012-11-21 10:36 • by Neil (unregistered)
395375 in reply to 394979
Hegel:
It seems obvious, that there is an equivalent need for a truth table with 3 rows in it.
Well, the boring answer would be

TRUE
FALSE
NULL

But perhaps you were thinking of an alternative third row?

Re: The Overloaded Time Table

2012-11-21 19:26 • by Jimmy (unregistered)
To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed OLTP.

Re: The Overloaded Time Table

2012-11-28 02:18 • by anonymous (unregistered)
395659 in reply to 394942
Nothing wrong with having a Fact date table in a data warehouse.
I tend to create the primary key as an int rather than datetime which makes it much easier to use.
i.e. 20121128 is the pkid of the table.

Having control of the Fact date makes reporting and populating a data warehouse much easier.

Re: The Overloaded Time Table

2012-11-28 16:44 • by Uncle Al (unregistered)
395747 in reply to 395406
Jimmy:
To add to the previous posters, this is not a WTF. This is standard practice for data warehousing, as opposed OLTP.


Finally, a new TDWTF meme to replace "this is standard practice in embedded systems"!
« PrevPage 1 | Page 2 | Page 3Next »

Add Comment