Time…. Its constantly against us in every working day and it matters not how productive you manage to be somebody will always rain on your parade with ‘just one more job’ to eat away at your life blood. So here’s the deal… I’ll tell you how to save a little bit of time during your day but…. Shhhhhhhhh, keep quiet about! Maybe you could use the time you will doubtless save to start planning your takeover of the world, lets face it you could hardly make more of a hash of it than the current crop of lunatics are, could you?. Agreed? Good, now without wasting more time lets improve the world….
When it comes to Data Services one can spend an awful lot of time rerunning the same data imports over and over again and the cost added up over time can be considerable. Sure we can use the ‘Bypass’ functionality but as anyone with more than a passing familiarity to this can testify it can be more hassle than its worth; and of course once you start moving jobs between repositories everything goes further south than The Duke Boys on a day trip to the Alamo. In a chance conversation with my colleague Angus Menter (if he hasn’t lived there, it probably doesn’t exist) the frustrations of a particular user were relayed to me and I was asked how I coped with this. The answer is in fact relatively simple once you understand but there is a little bit of setting up.
So lets state the problem, I have a small but complex job that moves data from a line of business database to a data warehouse for reporting purposes. This is rather simplistically accomplished in 3 workflows.
- Destage Data (Move data from source database to destination database for reasons of performance)
- Create Dimension Table
- Create Fact Tables (using data created in step)
Now imagine that i had one massive view to move over in Stage 1, which I then needed to split into over 500 dimension tables in step 2 before creating one fact table referencing all of these dimension tables in step 3. Imagine now that Stage 1 takes 15 minutes to upload… It doesn’t take a genius to see that when it comes to designing and testing and running stage 2 that initial 15 minutes download is going to get very old very fast, as indeed will I. Ideally we would probably only want to run this initial stage once a day during the development process, if that. This we would accomplish by changing our main workflow to use ‘Conditional Workflows’ overarching our 3 main workflows.
We start by removing the standard workflows from our main canvas (don’t worry, you won’t lose them) and replacing them with three conditional workflows named in a similar manner to the deleted workflows as shown in the following image.
We then open up each conditional workflow in turn and using the local object library’s ‘Workflow’ tab we drag our previous normal workflow (named the same) into the ‘IF’ window of the conditional workflow. What this will mean is that if the logical statement (which we will write later) equates to true then we SHOULD run this workflow otherwise we will run whatever is in the ‘ELSE’ window, in this instance…. Nothing at all. Your 3 conditional workflows should look a little like the following 3 images.
Great, so thats really coming along but of course we need to manage the logic regarding whether or not these jobs run. How do we do this? Well, there are a number of ways that it could possibly be achieved but I think for me the easiest and most flexible way is to use ‘Substitution Parameter Configurations’. For this of you not familiar with these it is essentially a set of parameters that are assigned at run time and fed through to the executing job. The job may then query these values to find out ‘how it should run’. Lets start off by defining our parameters, we want three one for each workflow so that we can control whether or not each runs. We thus open the Substitution Parameter Configurations dialog from the tools menu as shown below.
We will then be furnished with a dialog like the following where we can define our three parameters, I’m calling them:-
And I think that their usage is fairly obvious by their names. We set them all to have the value YES indicating that as things stand they SHOULD run. We then press OK and dismiss the dialog. We then return to each of the Conditional workflows in turn and set the ‘IF expression using the ellipsis button surrounded with a green square toward the top right of the following dialog. We then flip to the variables tab, expand the Substitution Parameters item and drag our ‘$$DestageData’ parameter into the formula editor before finishing the formula as shown below:-
Note how both the parameter name and the value MUST be enclosed in single quotes as they are text values and not numeric. We then do this for each of the three conditional workflows as shown below:-
Great strides! We’re nearly there. In fact, lets just test that what we have created works as intended. If you save and run your job you should see that it does indeed function as it did previously but of course we are STILL running all of the jobs.
We could leave it at this point if you wished, and omit the running of a stage simply by opening and changing the Parameter Configurations which is perfectly acceptable. However, we will go a little further and with a little bit more set up eliminate the manual editing entirely. We start by once again loading up the Substitution Parameter Configuration dialog and renaming our Configuration to something a bit more obvious, in this instance ‘Run All’ and as you can see all of the stages are set to run.
We then create an exact duplicate of this configuration using the indicated button which we rename to RunAllExceptDestage. Yep as you’ve probably guessed we then change the ‘$$DestageData’ value to ‘NO’ indicating that it should NOT run.
Brilliant, so lets run it right? Hold on Tiger… we’re not there yet.
We then need to utilise the System Configurations as shown below:-
These are amazingly useful things that allow you to create configurations with associated substitution parameters and data stores, you could thus work against a live database, or a test one or run in a different manner entirely. In our instance we don’t have different data stores and so we are only really interested in changing the substitution parameter configurations that are used. We thus create two System Configurations and entitle them rather imaginatively ‘RunAll’ and ‘RunAllExceptDestage’. Each configuration then uses different substitution parameters meaning that one will execute the full process, whilst the second will omit the ‘Destaging’ operation.
Nice and simple, we save those changes and now upon executing our job we get a new dropdown to pick from:-
From this we select ‘RunAllExceptDestage’. Once executed you should see that we no longer run the Destage operation and instead jump straight to the Dimensions job instead.
Amazing eh? Enjoy your executions……