Using Time Dimension

Submitted by nvovn on Wed, 03/22/2017 - 15:34

Performance issues regarding date related data are not that uncommon. Consider a date value:

dateToBeConsidered = 28 Feb 2017

Now I am interested in the following questions: Is dateToBeConsidered a german bank holiday, on a weekend, a Sunday?

In all these cases DWH artists have found a solution called time dimension.

Intermezzo: DWH people like to store data in a star shaped manner, where the facts table is the centre of the star and the dimension tables are the beams or satellites of the star.

The time dimension now has got a primary key which should be the date itself. It holds entries for every single relevant day, thus e.g. for all days between 1900 and 2100 (around 73,000 entries). If each day now has an boolean attribute called germanBankHoliday and this entry is once set correctly, we inevitably can answer question 1 by 

dateToBeConsidered = Date('28/02/2017');
isGermanBankHoliday = timeDimension[dateToBeConsidered]=>germanBankHoliday;

Every other question can be prepared and answered in the same manner. For other purposes you may store:

  • the name of the day
  • the name month
  • the day of the year
  • the german (!) week number (very weird to calculate)
  • the day of the week
  • the day of the month
  • leap year
  • etc.

Especially for reporting purposes it is much faster to calculate and store this once and reuse this simple table than to repeatedly calculate the results.

And for ad-hoc reporting you could even allow to let business build their reports to their liking including meaningful columns ("weekend") instead of complicated lookup table constructions.

 

Tags