One of the nice features of SAP Data Integrator that DTS just cannot compete with is how it deals with datastores at run time. For an overview lets think about how the process will normally work…
An ETL developer will typically start out working on a development database, honing and coding to a particular database schema until he feels that it is ready for testing by the QA department. The (probably incomplete) database and ETL package will then be passed over the to the QA department for testing and the developer will likely continue coding and working against his development database.
For good reason the QA team will want to test integrations and data voracity against their own separate database, this is because they will want to test uninterrupted by the developer who will likely break functionality and data on an hourly basis like a heavily caffeinated gorilla with a keyboard. In addition the QA team will want to be able to keep track and debug why an issue raises its head which as we all know would be impossible if the QA and Development teams all used the same database.
The above development cycle is an iterative process, eventually the QA team will batter the development team with their own bugs and together as a team they will arrive at an ETL package which is good enough to release to the client. Again as you would expect the client would not wish to utilise the QA or development database’s and instead would expect their own clean and sacrosanct data which the developers should be kept away from at all costs.
So as you can see the ETL package should be able to deal with multiple legitimate data stores (for both destination and source data) and this is where the Microsoft offering breaks down. SSIS/DTS just cannot cope with this, I have a live and functioning DTS installation that makes use of Live, Test and Development databases. This is achieved by the invoker specifying ‘modes’ which then relate to different database configurations and locations; In my instance I actually store all but the destination data store in the destination database itself which is actually not great for promotion management at all. The point is, Im struggling to do it in a loosely coupled way without having huge unreadable parameter lists which I really didn’t want in this instance.
I recently came to do exactly the same with an SAP data services project, I did exactly as I normally would with DTS… ignored this part of the process totally and just defined my data stores in the normal manner hard coding them to my development database. Later I then came to think about the implementation and promotion strategy, rather than just rushing in and doing as I would normally do I decided to investigate the best methodologies that SAP had to offer; I was pleasantly surprised….
First lets take a look at the normal configuration of a datastore:-
Nice and simple, I have obfuscated the Server Name, Database Name and User Name but as you can see nothing controversial there. There is however towards the bottom of the screen a nice ‘Advanced’ button which hides a multitude…If we select this button you should see a dialog a little like this:-
As you can see this reveals a few extra settings many of which you’ll never use but more interestingly there is a button labelled ‘Edit’ next to a ‘Configurations’ label. It is this simple button that hides away Data Services ability to deal with multiple databases . If I open my configuration screen you can see that in each of the separate columns I have defined the following Data Store Configurations to deal with our development life cycle.
Lets take a quick look at what exactly you can do on this dialog, if we start with the blue arrow this button allows you to create a brand new configuration, the green arrow however is a much more useful item as this allows you to duplicate the currently selected configuration and the red arrow allows you to completely remove the currently selected configuration.
If we now take a look at the data itself, looking at the green rectangle you can see the name of the ‘Test Configuration'(sadly no spaces are allowed!) which may be edited by simply clicking in the header and typing. Below this, in the blue rectangular area, are the configuration details for the data store itself and as you can see the properties that you may edit are quite extensive; I have only really changed the database name as all my databases are on the same server.
Ok, this looks good but how do you go about running this. Well as far as schedules go we will cover this in a different blog post at a later date but as regards running it in normal execution mode. If I just pretend to run the package you can see that there is nowhere to specify which database to use, it would ordinarily sit in the indicated area :-
Lets investigate how to make this configuration option visible. The way SAP have designed this is good in that it again loosely coupled, you don’t specify data stores to use you instead specify a configuration profile to use which in turn houses all of the data configurations as well as the substitution parameter configuration itself. So how do we get to this? As you can see from the screen shot below if you look at the Tools menu there is a ‘System Configuration’ item. Lets press that and unleash the awesome….
A little drumroll and then a slightly underwhelming dialog like the following is displayed:-
I’m sure you can see where this is going though, There is a list detailing the different data sources within my project (as well as the substitution configuration). Its a very similar dialog to the Data Store Configuration screens in that if you press the indicated toolbar item you will create a new configuration profile, you may then select from your data store and substitution parameter configurations to create a configuration that will run in a certain manner and against a set selection of data stores. Lets configure a ‘Test’ System Configuration for the QA department.
And there you have it, a configuration called Test (actually TestConfiguration as I renamed it later) which uses the default Substitution Parameters configuration and points to the TestConfigurations for each of the 3 data stores which seems to make sense.
If we then save all of our changes and then attempt to run our transformations again we should now see a change on our dialog that allows for the selection of a ‘System Configuration’, we have only one defined but you can have as many as you wish. This is pretty awesome actually, much less unwieldy than the DTS options….