So….fairly recently whilst working on a regular clients data they expressed an interest in displaying times for use in their dimensional database. We are all too familiar with using Date tables in a dimensional manner but times are much less used, at least in my experience. It is with this in mind, and with an hour to kill due to your average ‘Microsoft 71% progress bar’ that I decided to blog exactly how I went about doing this as I was pleased with the results and the simplicity of the solution. Apologies, this solution is using an older version of SSIS but this methodology can be tailored for the ETL tool of your choice. First lets lake a look at the issue, we are all familiar with a date table where for each day of the year between x & y we represent each day with a single individual record. The table below shows how that would work starting from the 1st January 1900 (I’m nothing if not thorough!) Thus instead of using a date time column in our fact table to (for instance) detail a ‘Sail Date’ we can instead use the SID value to refer to a record within the DimCalendars table. The advantage of this is both speed and the fact that we can dice the date within this calendar table in many different ways (Week, Period,Month, Year, FinancialPeriods etc)
Thus we could also do the same with a DimCalendarTimesTable that would expose every second of the day from 00:00:00 to 23:59:59 thus allowing us to report on every single time that could potentially be encountered within a dataset. In our instance this data will look something like this starting from the top of the day:-
As you can see TimeToTheSecond reports every single second within the day (in order which can come in useful when dealing with push down in SAP Data Services… bygones). Time to the Minute only changes with every whole minute so that at 00:01:00.000000 is the value used for every minute from 00:01:00.000000 to 00:01:59.000000. This allows you to report on all events at say 13:53 without worrying about the seconds value. Taking this one stage further we also have a TimeToTheHourfield which only changes once every hour allowing us to report on every event in the 10:00pm window without caring what the second or minute values were. In addition we have two extra fields , SID being the unique identifier and TimeString being a special field that we join to in order to properly resolve the Foreign Key values in the first instance; the primary use of this field is that it allows us to properly resolve NULL date values.
Below are displayed the tail of our DimCalendarTimes table along with that special null record. Note how the Time to the minute for all except the nulls are set to 23:59:00 and the TimeToTheHour is 23:00:00.
You can of course dice this in many other ways (shift patterns for instance) as you can see, but this is what we required. So how was this achieved and how complex… Well the answer is really not that complex at all. In fact it took just two Sql Statements:- The first to put in all of those times:-
INSERT INTO DimCalendarTimes (TimeToTheSecond,TimeToTheMinute,TImeToTheHour) SELECT CAST(DATEADD(second,Data.SecondsAfterMidNight,0) AS TIME) As UniqueSeconds, CAST(DATEADD(minute,(Data.SecondsAfterMidNight/60),0) AS TIME) As UniqueMinutes, CAST(DATEADD(hour,(Data.SecondsAfterMidNight/3600),0) AS TIME) As UniqueHours FROM (SELECT top (86400) (ROW_NUMBER() OVER (ORDER BY t1.id)-1) AS SecondsAfterMidNight FROM sys.sysobjects AS t1 cross join sys.sysobjects AS t2 ) AS Data
And the second to add that NULL value record
INSERT INTO DimCalendarTimes (TimeToTheSecond,TimeToTheMinute,TImeToTheHour) SELECT NULL As UniqueSeconds, NULL As UniqueMinutes, NULL As UniqueHours
Lets dissect that first SQL statement a little, we start off with the simple INSERT statement that determines which fields we are inserting:-
INSERT INTO DimCalendarTimes (TimeToTheSecond,TimeToTheMinute,TimeToTheHour)
Nice and simple, we then need to generate these values which we do by generating one second for each in the day, there being 86,400 seconds in a day. The 86400 seconds are generated as is usual in tally tables by cross going two large tables together (in SQL Server we generally use the sys objects tables) and generating a unique row number for each record.
FROM ( SELECT top (86400) (ROW_NUMBER() OVER (ORDER BY t1.id)-1) AS SecondsAfterMidNight FROM sys.sysobjects AS t1 cross join sys.sysobjects AS t2 ) AS Data
This sql statement will simply yield 86400 rows numbered from 0 to 86,399. We then simply need to generate the time values using the following simple pieces of date addition:-
SELECT CAST(DATEADD(second,Data.SecondsAfterMidNight,0) AS TIME) As UniqueSeconds, CAST(DATEADD(minute,(Data.SecondsAfterMidNight/60),0) AS TIME) As UniqueMinutes, CAST(DATEADD(hour,(Data.SecondsAfterMidNight/3600),0) AS TIME) As UniqueHours
They first column simply adds the row count value (or second) to the seed value of 0 and casts the resultant value to a TIME object. This will generate a true time starting at 00:00:00 right through to 23:59:59. The second and third columns do exactly the same except that they will divide the result by 60 and 3600 respectively to give the time to the minute or to the hour. And that really is the crux of it. We of course also generate our null value record as below to allow us to resolve data rows with no time data (dimensional databases should ALWAYS resolve to a foreign key for reasons of performance.)
Resolving the times in our staging data into the foreign key is thus as easy as the following pseudo SQL. This will deal with both valid times and also with NULL times:-
UPDATE S SET S.DimTimeFK= D.SID FROM (SELECT CAST(S.TimeValue AS nvarchar(30)) AS CastTime,* FROM StagingData AS S ) AS S INNER JOIN (select SID,CAST(TimeString AS TIME) AS TimeString FROM DimCalendarTimes ) AS D ON COALESCE(LTRIM(RTRIM(UPPER(S.CastTime))),'--') = COALESCE(LTRIM(RTRIM(UPPER(D.TimeString))),'--')
Perfectly simple. I hope that this escape into the dimensions of time has been both useful and informative. For me, it has certainly proved a distraction… I hear that 76% is the new 71% and besides, I feel sure that ‘Windows moon on a stick’ will be worth the tediom.