How to keep your SSIS package ‘DRY’

desert-1So, you all know the deal… When it comes to coding there is one rule that all half decent programmers aspire to; Nope, it’s not ‘Never toss a dwarf’ which whilst being reasonably good advice is probably not quite relevant to normal weekly challenges at the coal face of code. It is this, Keep your code DRY

And we do, we really do. We all go to huge lengths using interfaces and the like to ensure that the same operations can be leveraged against differing, but similar kinds of objects. However when it comes to using SSIS for ETL work it always seems that all of these principles go flying out of the door, at least it seems to for me. SSIS is not best suited for DRY, I have lost count of the number of times I have defined workflows like this:-

Capture

So, my challenge was to import ‘n’ identically formatted Excel files (with different data) into the same database table after adding a few snippets of data. In the screenshot above I have only shown you two of the excel files to import, the client wanted 12 today, but up to 30 at some point in the future. Hmmmmm, so we discussed the matter further and it then became obvious that sometimes, just sometimes, they would only provide 1 file or maybe even none. It quickly became apparent that my clumsy transformation as defined above would not have worked for all of the following reasons:-

  1. Defining 30 import files would be lots of work, and they would then need to be merged together.
  2. Defining 30 files and then NONE of them, or just one of them resolving would have made the transformation fall over at run time with metadata issues.
  3. What if the client suddenly decided that they want 1000 files. I am NOT going to manually set all of those up
  4. I would also have to define 1 connection for each file…..
  5. There’s no flexibility. if they needed to support 30 files one week and ten the following as things stand we would have two different packages (or routines within the package)
  6. Most importantly, it violates DRY principles. Thus if I ever needed the change the way SQL was selecting data from the files I would need to make it to 30 transformations, or more even!

So, I had a drawing board to get back to…. Now as a seasoned SSIS user I must admit that I have never looked to deeply into some of the control flow components, topmost amongst these is the ForEachLoop Container

Capture

With a little bit of research I realised that this was exactly what I needed so without further ado I decided to implement a solution for my challenge using this component.I started at the start of course by firstly defining a new Excel connection (and ONLY one!) I set it to a file that represents the data that would be loaded at run time for each file loaded, remember that all of my import files are identical in format.

excelconnection

I would also need a couple of variables to allow me to store the Directory I am interrogating and to store the name of the file currently being processed in the Foreach Loop. These were defined like so:-

filepathvariable

filenamevariable

Now I could start to play with my Foreach Loop, I dragged a new component onto my Control Flow surface and then added a script component (more of that later) and my own workflow. The result was this little transformation process:-

IterateComponent

I then configured the Foreach Loop thus using the ‘Collections’ option :-

iterationcollectionconfiguration

As you can see I have created ForEach File enumerator which determines exactly what sort of behaviour this component will perform, Note how I have then created an Expression (using the expressions Editor Dialog) which allowed me to specify ‘Directory’ as the Expression to set and the value that it is being set to is the value stored in my ‘currentFilepath’ Variable. I could of course have just hard-coded the value by just using the Browse dialog further down the screen. In my instance I only want to use ‘xls’ file, or office 2003 format Excel files and so using the ‘Files’ field I can specify only to use *.xls files. Finally, I want all of the file paths returned to me to be Fully Qualified Path names and so that it what I have selected.

Then I needed to configure just what I should do with the value returned. In my instance I wanted to read the value (the fully qualified file name) into a variable for later use. We set that up using the ‘Variable Mappings’ as below scriptvariablereturnmappingsAs you can see we are setting the output result (with files index 0 is the ONLY option) into our ‘fileName ‘ variable. Once this has been done we have one job left to do, We need to programmatically change the connection string of our  ‘Current Financial’ connection to point to the file we are currently processing. This is what we use the script component for as mentioned earlier. So firstly we need to tell the script component what user variables it has access to:-

scriptvariablesFinally, we can now get to the scripting which is simplicity itself:-

scriptcodeAs you can see all we really do is obtain references to the Connection object and the ‘fileName’ variable (both by name). We then create a valid Connection string using our new file name and finally we set this new connection string into the ConnectionString property of the Connection object.

That’s it, we’re done and now running the code we can see that it doesn’t matter whether there are zero or a million files within the directory it will process them (in no particular order) and add them to our staging table ready for full adoption onto the system. I’m pleased with this solution, its DRY’d my transformations up nicely and given me the flexibility I would not have been able to achieve any other way.

Building Blocks – Objective C

abcYou can only work for so long in the world of Objective C before you become exposed to the concept of ‘Blocks’. A very powerful tool in the arsenal of any developer with which they can drastically increase the amount of code reuse that a developer can leverage in their program. Code reuse is of course a great thing as tried and trusted code can be relied upon to perform correctly and will need the minimum amount of testing whereas duplicated code tends to, as a former colleague of mine used to remark,  lead to the worst kind of inheritance; that is :-  ‘clipboard inheritance’ responsible for many a brainteasing bug. I have also seen it reported that Blocks effectively give you ‘lambda’ in Objective C which as a heavyweight user of LINQ in my preferred language C# I must admit makes me feel a little easier about taking on more Objective C work. They are however a little bit tricky to get your head around when you first come to dealing with them.

So on with building our blocks. In this example we have a function which will take an array of projects and will return them grouped by a particular key ( that the user may change) to allow them to be displayed in a MasterView. Note that I am not for the benefits of this example interested in the graphical implementation side of this only the supporting code that makes this possible.

So Let us start with the main ‘Hub’ function which deals with most of the work we are going to throw at our function.

groupbyhub

OK so line by line as you can see we have defined a function called ‘groupByHub’ which returns an NSMutableArray object. This in turn will contain NSMutableArrays containing projects grouped by the selected category. More on that later… The function takes two arguments:-

1. source being all of the projects our application can see.

2. groupFunc being the ‘block’ or piece of code that we are going to call in order to perform the grouping.

+(NSMutableArray*)groupByHub:(NSMutableArray *)source groupFunc:(groupByValue)groupFunc{

Ok So on to the following line we then create a NSMutableDictionary which is great for sorting our data out (although not so great for using in the UI layer I believe).

NSMutableDictionary *groupedData=[[NSMutableDictionary alloc] init];

The next line fairly simply iterates over all of the projects within our collection of projects.

for(Project *item in source){

We then use our ‘Block’ to obtain the value of the category pertaining to this particular project. We will examine the blocks themselves in a little while  

NSString *groupName = groupFunc(item);

The next few lines then deal with adding the project item to the relevant array within the dictionary. We see if an array for the current Group Name has been defined. If not we will create a new one and add it to the dictionary otherwise we will use the one already created. We will then add the ‘project’ item to the array we found/created. The Code for that looks like this:-

NSMutableArray *arr= [groupedData objectForKey:groupName];
if(arr == nil){
     arr = [[NSMutableArray alloc]init]; 
     [groupedData setObject:arr forKey:groupName];
}
[arr addObject:item];

We then iterate over all of the elements added to the dictionary and add them to the array that we wish to return. 

NSMutableArray *array = [[NSMutableArray alloc]init];
for(NSMutableArray *ar in groupedData.objectEnumerator){
     [array addObject:ar];
}
return array;

OK. SO that’s the meat and veg of our main function, it doesn’t however describe how the blocks are implemented so without further ado here is the main .h file:-

grouped

So defined below is the type defintion for our block, a very simple block by most standards but that is the point of examples! The name of the block is groupByValue, tt returns a string (the category value) and accepts a project item (which it will interrogate to ascertain the return value): 

typedef NSString* (^groupByValue)(Project *);

Note that we also define three other functions which will all in turn call the main groupByHub function passing across the ‘block ‘ to use to return data grouped in the appropriate manner. So now to the implementation:-

groupimp

We only really need to examine the code for our GroupByCustomerName function to see what is going on here. We start off by by defining the implementation of our block. The type name ‘groupByValue’ matches our typedef from the .h file as does the signature. The only real difference is that is this is an implementation we use a variable to refer to the block later in code. In this instance we use the variable name ‘gp’. The code within the block is extremely simple, given a project that is passed in we return the value of this group, in this instance ‘Customer Name’ but in other instances we also  return ‘Project Manager’ and ‘Account Manager’.
groupByValue gp=^(Project *item){
return item.CustomerName;
};

All that is then left is to call our main ‘groupByHub’ function passing across the data source and of course the block that we will use to perform the grouping

return [self groupByHub:source groupFunc:gp];

And Voila, There You go, Blocks… As easy as A,B and C.

As an aside, whilst I have tons of experience in C# I am a relatively inexperienced in the world of Objective C. Whilst I know that I will get there if there are any experienced Objective C coders who see omissions, mistakes or just better methodologies I would welcome your constructive criticism.