SAP Data Services – Importing Image Data

Quite recently I was working on an SAP Data Services project which involved importing a great deal of data about fruits, including photographs of the different varieties. There was a large amount of work to do and I must admit that whenever I thought about the photographic element of the import I developed a kind of coder's twitch; I would hit Google, note the total lack of suggestions as to how this could be achieved, and then push the task very deliberately to the bottom of my pile. This is not normally how I work. I generally tend to do the hardest things first, make stunning progress, and then realise how much I had underestimated the complexities of the 'simpler tasks'! But this was image data, which is probably my least favourite thing; I like data… Eventually there came a point when I had no pile left to push this job under, and so I had to apply myself to the job in hand.

Let's start by stating the problem, which, let's face it, could have been much worse. We were given a spreadsheet containing many different varieties of fruit along with an awful lot of data, most of which is, to be frank, completely irrelevant to this blog. The document also contained a relative path for each variety of fruit, describing the location of a JPG file which was the picture itself. My task was to get all of this data into a database for the reporting services to report against. So let's take a simplistic look at the data:-

As you can see we have a Variety column which describes the name of the fruit variety, along with a Product column which describes the type of fruit. Row 1 is thus a variety of 'Mango' called 'Amelie', whilst Row 2 is a variety of 'Passion Fruit' called 'Edulis'. For each of these varieties we also have a 'Photo' column which in this instance stores a relative path (relative to the location of the document itself). The base path of the document is in this instance:-


Helpfully the first row we look at in our earlier screen capture has a null value, but the next row contains a valid relative path of

Exotic photos\Edulis.JPG

Using this relative path and the known location of the document itself we can thus build a path that details the location of the image relating to the Edulis Passion Fruit:-

K:\ImportFiles\Shared\Photos\Exotic photos\Edulis.JPG
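For illustration, the path assembly can be sketched in a few lines of Python. This is only a sketch of the logic (the real work happens inside Data Services, and the function name here is mine), but it shows the base-path-plus-relative-path idea and the null handling:

```python
import ntpath  # Windows-style path handling, works on any platform

BASE_PATH = r"K:\ImportFiles\Shared\Photos"

def build_photo_path(relative_photo):
    """Combine the known base path with a row's relative Photo path.
    Returns None when the Photo cell is empty (no image to load)."""
    if not relative_photo or not relative_photo.strip():
        return None
    return ntpath.join(BASE_PATH, relative_photo.strip())

print(build_photo_path("Exotic photos\\Edulis.JPG"))
# K:\ImportFiles\Shared\Photos\Exotic photos\Edulis.JPG
print(build_photo_path(None))
# None
```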

Nice and simple. So we have the path; now how do we go about translating that path into BLOB data representing the actual image itself? This is where I struggled to find information. I scanned the function library looking for a 'LoadImage' method or similar before hitting the internet itself; after turning over some pretty scabby stones I found a forum post that was pretty illegible, though it did contain a morsel of gold dust. It turns out there is no specific load function within the SAP library, which I had already ascertained. There was, however, a fairly simple but equally obtuse method for loading image data, which involves writing the path name to a text file using a certain format and then opening and importing that data using a standard workflow. Let's investigate just how that works. We start by processing our spreadsheet data into a format that can be written to our text file; the transformation I wrote looks like this:-

As you can see when you examine this first query, I am merely removing any leading/trailing blanks and ensuring that null values are replaced with an empty string. Nothing tricky there, nor even necessarily relevant to you.

The next operation is where the real work is done.

For the purposes of this demonstration we are only really interested in the 'Photo' field, where as you can see we are building a string that looks a lot like this:-

<<K:\ImportFiles\Shared\Photos\Exotic photos\Edulis.JPG>>

The appended angle brackets will later tell SAP Data Services to treat the data contained within them as the path of an image that should be loaded. You needn't worry yourself with the replace function that I have had to use; this is peculiar to my data alone. In reality your code will look more like this:-

'<<K:\ImportFiles\Shared\Photos\' || Query.PHOTO || '>>'

This data can then be written out to a staging file. My file is formatted like this; note how the Photo field we are really interested in has been defined as a blob.
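To make the shape of that staging file concrete, here is a hedged Python sketch that writes delimited rows with the << >> markers. The column names, pipe delimiter and file name are assumptions of mine for illustration, not anything mandated by Data Services; the blob treatment itself comes from the staging file format definition, not from this writer:

```python
import csv

# Hypothetical rows as they might come out of the first query:
rows = [
    {"VARIETY": "Amelie", "PRODUCT": "Mango", "PHOTO": ""},
    {"VARIETY": "Edulis", "PRODUCT": "Passion Fruit",
     "PHOTO": "Exotic photos\\Edulis.JPG"},
]

BASE_PATH = "K:\\ImportFiles\\Shared\\Photos\\"

def photo_marker(relative_photo):
    """Wrap the full image path in << >> so Data Services loads the
    file contents into the blob field rather than the path text."""
    if not relative_photo:
        return ""  # no image for this variety
    return "<<" + BASE_PATH + relative_photo + ">>"

with open("staging.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|")
    for row in rows:
        writer.writerow([row["VARIETY"], row["PRODUCT"],
                         photo_marker(row["PHOTO"])])
```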

We then need to import the data, so we create a new data flow looking much like this one, using our new staging file type as the source.

The query component simply maps the Photo field across unchanged; in my instance I wanted to remove some blanks in another field, which is why the query is even necessary. The data is then simply written into a blob field in the target database.

And that my friends, is mission accomplished…. This blog will self destruct in 10 seconds.


Lost in Time? Reach For The Time Tables


So… fairly recently, whilst working on a regular client's data, they expressed an interest in displaying times for use in their dimensional database. We are all too familiar with using date tables in a dimensional manner, but times are much less used, at least in my experience. It is with this in mind, and with an hour to kill thanks to your average 'Microsoft 71% progress bar', that I decided to blog exactly how I went about doing this, as I was pleased with the results and the simplicity of the solution. Apologies, this solution uses an older version of SSIS, but the methodology can be tailored for the ETL tool of your choice. First let's take a look at the issue. We are all familiar with a date table where each day of the year between x & y is represented by a single individual record. The table below shows how that would work, starting from the 1st January 1900 (I'm nothing if not thorough!). Thus instead of using a datetime column in our fact table to (for instance) detail a 'Sail Date' we can instead use the SID value to refer to a record within the DimCalendars table. The advantage of this is both speed and the fact that we can dice the date within this calendar table in many different ways (Week, Period, Month, Year, Financial Periods etc.).


Thus we could also do the same with a DimCalendarTimes table that would expose every second of the day from 00:00:00 to 23:59:59, allowing us to report on every single time that could potentially be encountered within a dataset. In our instance this data will look something like this, starting from the top of the day:-



As you can see, TimeToTheSecond reports every single second within the day (in order, which can come in useful when dealing with push down in SAP Data Services… bygones). TimeToTheMinute only changes with every whole minute, so that 00:01:00.000000 is the value used for every second from 00:01:00.000000 to 00:01:59.000000. This allows you to report on all events at, say, 13:53 without worrying about the seconds value. Taking this one stage further we also have a TimeToTheHour field which only changes once every hour, allowing us to report on every event in the 10:00pm window without caring what the second or minute values were. In addition we have two extra fields: SID, being the unique identifier, and TimeString, a special field that we join to in order to properly resolve the foreign key values in the first instance; the primary use of this field is that it allows us to properly resolve NULL date values.


Below are displayed the tail of our DimCalendarTimes table along with that special null record. Note how the TimeToTheMinute for all except the nulls is set to 23:59:00 and the TimeToTheHour is 23:00:00.


You can of course dice this in many other ways (shift patterns for instance), but this is what we required. So how was this achieved, and how complex was it? Well, the answer is really not that complex at all. In fact it took just two SQL statements, the first to put in all of those times:-

INSERT INTO DimCalendarTimes (TimeToTheSecond, TimeToTheMinute, TimeToTheHour)
SELECT
    CAST(DATEADD(second, Data.SecondsAfterMidNight, 0) AS TIME) AS UniqueSeconds,
    CAST(DATEADD(minute, Data.SecondsAfterMidNight / 60, 0) AS TIME) AS UniqueMinutes,
    CAST(DATEADD(hour, Data.SecondsAfterMidNight / 3600, 0) AS TIME) AS UniqueHours
FROM (SELECT TOP (86400)
          ROW_NUMBER() OVER (ORDER BY t1.id) - 1 AS SecondsAfterMidNight
      FROM sys.sysobjects AS t1 CROSS JOIN sys.sysobjects AS t2
     ) AS Data

And the second to add that NULL value record

INSERT INTO DimCalendarTimes (TimeToTheSecond, TimeToTheMinute, TimeToTheHour)
SELECT NULL AS UniqueSeconds,
       NULL AS UniqueMinutes,
       NULL AS UniqueHours

Let's dissect that first SQL statement a little. We start off with the simple INSERT statement that determines which fields we are inserting into:-

INSERT INTO DimCalendarTimes (TimeToTheSecond, TimeToTheMinute, TimeToTheHour)

Nice and simple. We then need to generate the values themselves, which we do by generating one row for each second in the day, there being 86,400 seconds in a day. The 86,400 rows are generated, as is usual with tally tables, by cross joining two large tables together (in SQL Server we generally use the sys.sysobjects table) and generating a unique row number for each record:-

(SELECT TOP (86400)
        ROW_NUMBER() OVER (ORDER BY t1.id) - 1 AS SecondsAfterMidNight
   FROM sys.sysobjects AS t1 cross join sys.sysobjects AS t2
) AS Data

This SQL statement will simply yield 86,400 rows numbered from 0 to 86,399. We then simply need to generate the time values using the following simple pieces of date addition:-

 CAST(DATEADD(second,Data.SecondsAfterMidNight,0) AS TIME) 
     As UniqueSeconds, 
 CAST(DATEADD(minute,(Data.SecondsAfterMidNight/60),0) AS TIME) 
     As UniqueMinutes, 
 CAST(DATEADD(hour,(Data.SecondsAfterMidNight/3600),0) AS TIME) 
     As UniqueHours

The first column simply adds the row number value (the number of seconds) to the seed value of 0 and casts the resultant value to a TIME object. This will generate a true time starting at 00:00:00 right through to 23:59:59. The second and third columns do exactly the same except that they divide the seconds value by 60 and 3,600 respectively to give the time to the minute or to the hour. And that really is the crux of it. We of course also generate our null value record, as shown earlier, to allow us to resolve data rows with no time data (dimensional databases should ALWAYS resolve to a foreign key for reasons of performance).
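The truncation arithmetic is easy to check outside SQL Server. This small Python sketch mirrors the three columns using the same integer division (the helper names are mine):

```python
def hms(total_seconds):
    """Format a seconds-after-midnight count as hh:mm:ss."""
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def times_for(secs):
    """Mirror the three columns: the raw second, the containing
    minute (secs // 60) and the containing hour (secs // 3600)."""
    return hms(secs), hms((secs // 60) * 60), hms((secs // 3600) * 3600)

print(times_for(3725))   # ('01:02:05', '01:02:00', '01:00:00')
print(times_for(86399))  # ('23:59:59', '23:59:00', '23:00:00')
```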

Resolving the times in our staging data into the foreign key is thus as easy as the following pseudo SQL, which will deal with both valid times and NULL times:-

SELECT D.SID, S.*
FROM (SELECT CAST(S.TimeValue AS nvarchar(30)) AS CastTime, *
      FROM StagingData AS S
) AS S
INNER JOIN (SELECT SID, CAST(TimeString AS TIME) AS TimeString
            FROM DimCalendarTimes
) AS D
       ON COALESCE(LTRIM(RTRIM(UPPER(S.CastTime))), '--')
        = COALESCE(LTRIM(RTRIM(UPPER(D.TimeString))), '--')
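The COALESCE trick — folding NULLs onto a '--' sentinel so they still join to the special null record — can be illustrated with a small Python sketch. The SID values here are invented for the example:

```python
# DimCalendarTimes rows keyed by TimeString; None models the NULL record.
dim_times = {"13:53:00": 50180, "23:59:59": 86400, None: -1}

def key_for(time_value):
    """Mirror COALESCE(LTRIM(RTRIM(UPPER(x))), '--'): normalise the
    value and fold NULLs onto the '--' sentinel so they join too."""
    return "--" if time_value is None else time_value.strip().upper()

# Build the lookup side of the join once.
lookup = {key_for(k): sid for k, sid in dim_times.items()}

def resolve_sid(staging_time):
    """Resolve a staging time (possibly None) to its dimension SID."""
    return lookup[key_for(staging_time)]

print(resolve_sid(" 13:53:00 "))  # 50180
print(resolve_sid(None))          # -1
```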

Perfectly simple. I hope that this escape into the dimensions of time has been both useful and informative. For me, it has certainly proved a distraction… I hear that 76% is the new 71% and besides, I feel sure that 'Windows moon on a stick' will be worth the tedium.


Scheduling the Awesome.


Welcome to the final part of my primer on creating a production package using SAP Data Services. In part 1 we looked at just how you would configure your data service repositories to suit team development, in part 2 we dived in a little deeper and created our central repositories and in part 3 we looked at how you would interact with these repositories to create ‘Releases’ of software ready for testing and for the production environment.

All being well you should currently be at the stage where you have created a QA central repository where your candidate release code will be held. You should also have a local production repository from which the production code can be executed (remember, central repositories cannot be used to execute code). The previous blog should then have walked you through the process of synchronising your local repository with the candidate release ETL code ready for execution.

So what now? Well, you could of course wake up really, really early every morning, go to work and run this package before your accounts department log onto the system, ensuring that their reporting data is up to date before business opens. You may laugh… Alternatively you may do what the majority of normal people would do and set up a scheduled service to run the ETL on a given schedule. Let's do that for the purposes of this blog; maybe I'll cover the other approach in a later one 😉

We start by logging onto the machine that contains the SAP CMC; generally, but not always, this machine also contains a program called 'Data Service Manager', like so:-


Once you click on this the following webpage should be invoked, you should log on using the appropriate login for your installation:-


At this point things might start looking a little familiar as we see for the second time at least the following screen…


Again we want to select the administrator option to interact with the service settings for the Data Services application. Upon invoking this you should be presented with a tree view down the left hand side containing a Batch item; this contains all of the batch instructions for each configured LOCAL repository. In this instance we want to set up a schedule against our production repository, where our candidate release code is located. We thus open the tree like so and select the 'Production' repository.


To the right hand side of this tree view the following detail page should be displayed, showing all of the latest package executions that have been invoked (yours may well be blank):-


We need to set up a batch job, so we select the 'Batch Job Configuration' tab as shown, which will display a list of all of the ETL packages that are available to run. In this instance we have only the one, 'bjSalesIntegration'. In order to set up a schedule we then select the 'Add Schedule' item as indicated.


Nice and simple so far, and indeed it remains so. We merely choose a name for the schedule and set it to 'Active' so that it runs once saved. When it comes to selecting a scheduler I tend to choose the Data Services scheduler; no reasoning behind that, just why not! Now we come to the scheduling itself, which is relatively comprehensive in that you can choose:-

  1. Days of the week to perform the ETL on.
  2. Days of the month to perform the ETL on.
  3. Execute once a day at a given time.
  4. Execute from a given time of day for the duration specified.

In this instance we are running the package daily at 8:30.

In addition we can also specify to use a given ‘System Configuration’ which we have detailed in a previous blog.


Once you press Apply, the scheduled job should appear like so on the 'Repository Schedules' tab, meaning that the scheduled batch job has been configured and may or may not be currently active.


And voilà, you're all set and ready to go. If you now look at the top batch job status for my configuration you will see that the last execution was successful (with warnings, but I expect that), started at approximately 8:30pm as requested and executed for about six and a half minutes.


You have now scheduled the awesome, the world is yours…. Get up late, that's an order!

Dogs, Guard Dogs & Toblerones



Tris in a pensive moment.

As is usually the case, this blog post started life through a totally off topic conversation about whether stereotypes are always negative. My fiancée is currently taking a language degree and part of the course relates to languages & culture. When first challenged with this question you DO think that stereotypes are negative; then you start to delve just a little deeper and you realise they are not always. Moreover, they are actually an essential part of life as well as being an innate component of the learning process. We are born pretty much a blank slate and slowly, over time, our brain fills with experiences, observations and nuggets of information; it is these that inform our world view. One of the most amazing things about the human being is our adaptability; it is innate within us and allows us to hit the ground running in any of the different cultural situations we find ourselves in. There are some medical conditions that from a very early age prevent us from assimilating within a social group, but in the main we are all able to cope with whatever the crucible of life throws at us. It is my belief that the blank slate approach is critical: if we had some level of pre-programming that allowed us to 'fit in' culturally, I believe that our adaptability would be severely compromised, culture of course being constantly evolving. It is my belief that pre-programming would hamper our learning model to a huge degree.

When I was about 2 or 3 years old I was of course impossibly cute, but even then I had a liking for chocolate; the only difference was that I had a metabolism that could offset my interest. The following short vignette is one of my earliest memories, and in the past it has always perfectly summed up my second memorable experience of disappointment. I was in the bath (cue sniggers from the back row…) and left to my own devices, I seem to recall (don't judge… it was the 70's, kids bounced really well back then). I remember that by the side of the bath towards the tap end was a triangular package, cream coloured and looking almost familiar to my dopamine receptors. I remember taking the package and opening it, expecting to find a Toblerone, instead finding some sort of hair care product (I think it was Clairol, but I am not sure some 40-odd years later!). As I have already said, until fairly recently I have always associated this story with disappointment. I have now come to realise that there was also a critical part of the learning process in play here: 'generalisation', a form of stereotyping.

At that tender point in my life I had not got all of the tools that I needed to make a reliable, informed guess as to what this object was, but that didn't stop 3 year old me from trying, did it? I had not, at this age, acquired the necessary linguistic skills to read the label; sure, I may have been able to read the letters, but I'm guessing that THE THING with a Toblerone to a 3 year old is that it is triangular… That alone, at this early stage, informed my choice. I also had no parent with me at that point in time to ask whether the sacred object was chocolate and, more to the point… could I have some. Sure, there was probably a picture of a woman with nice hair on the wrapper, but then again in our society, especially back then, packaging would often display attractive men/women to help sell any product. The context of the bathroom also never came into my mind, though I may have thought

"Chocolate?    In the bath?"

But then, as now, I can see no downside to that! The most important and informing thing was THAT triangular package. With hindsight my brain was performing a fairly fundamental skill inherent in our ability to learn: it was generalising, using 'the available' data to make an assumption about the world it found itself in… preferably a chocolate rich one. That it was not was a learning experience; in retrospect I would have learnt a few things:-

  1. Not all triangular packets are Toblerone.
  2. Toblerones are probably not found in the bathroom, unless you take them there.
  3. Toblerone wrappers probably don’t have women on them.

As with much that happened earlier in life, I have no idea what happened after this point, but it was certainly not an educational dead end. My generalisations with regard to confectionery would have been honed still further, and over time, as I encountered distinctive wrappers in strange places, I would have paid more heed to the context. The experience would have further informed my world.


The Dogtor

This behaviour is not of course limited to humans; the animal world operates on exactly the same principles, as anyone with a dog would testify. Yes, they can be as happy as anything, but their experience informs their world view. We have a beautiful chocolate labrador called Tris, slightly obsessed with water, walks and dragging towels around… when there is no food at hand, of course. If I don my trapper hat, and am not competing with <insert any food here>, she will quite correctly, in general, draw the conclusion that walking is in her immediate future. That is her world view, informed by all the experience that living has taught her. Similarly, a couple of the older girls are keen, as girls of a certain age are, to experiment with how they wear their hair. On a few occasions, upon doing so and emerging from the black holes of their teenage universes, Tris will bark at them, confused by these strange people. For a split second she does not 'know' who they are. Her informed world does not contain teenage girls with mad hair, it just doesn't, and for a split second she can't read the faces either. Maybe they are pulling a slightly different face, and maybe, of course, being a dog she just does not have quite the same complex 'face software' that we have.

In the same vein, my Dad once had a big German Shepherd called General who was not a family dog at all. At around about the same time he also had a Jensen Interceptor which, if you have never heard one, has a very distinctive engine tone. This was sleepy Shropshire in the 1970's and 7.2 Chrysler V8s were in short supply. There were in fact at that time a grand total of two Jensens in Shrewsbury, my Dad's being one and my 'Uncle Jack's' being the other. General thus became used to the engine tone of the car, and as you could hear it from quite a distance above the general noise of the area, he would know long before you arrived that you were coming. At some point General's services were no longer required and so he was sold on to a local(ish) scrapyard. That's where this story could have ended, and the punchline for my story would never have existed… However, cars being cars, we use them. Some unspecified time later my Dad had occasion to be in the area of this scrapyard late one night driving the Jensen; loving his shepherds as he does, he drove slowly past the scrapyard looking to see if he could spot General. The sound of the car spooked all of the dogs on guard that night, with the result that they were all giving it some with the woof woof noise… all except one, that is, who just sat there with his head cocked to one side listening to the engine tone. It was of course General. He had generalised, and correctly so on this occasion, that the engine tone heralded the return of his old master. With the available data that HE had, he saw no threat; the other dogs obviously did not have this generalisation built into their experience, and so without further data THEY could only surmise that a threat was imminent and unknown. Remember, this was a scrapyard, and so they must have heard cars of all sorts coming and going all the time; one engine tone probably sounded like every other… and, more importantly, the engine tones were probably not attached to any important memories.

Generalisation is of course a thing in software too, albeit slightly different and much more focused. We use interfaces to describe and encapsulate a common problem and to generalise behaviours; programmers then implement shared behaviours using abstractions, and finally a concrete class defines the specific behaviours of an entity. The core of the code will neither know nor care exactly how these concrete classes work, but it can generalise and rely on them adhering to a certain behavioural pattern defined by the base interfaces. Let's say for instance we have an IEmployee interface which our core system exposes; this interface could define one method called CalculateWageSlip which the system would call once a month in order to populate a WagesDue database table. External processes could then query this table in order to generate the necessary banking records to effect salary payments. In essence this sounds simple: the wages preparation system iterates over all of the Employee objects and instructs them to prepare their due salary. The system, though, has generalised away the real complexities…

  1. ProductionEmployees may be paid according to how many hours they worked during the week and the rates at which those hours were chargeable.
  2. ExecutiveEmployees may be paid a salary regardless of performance or actual hours worked.
  3. SalesEmployees may be paid (in part) on results within a timeframe and according to complex commission structures which may vary according to length of service, sales levels and responsibilities.

In the above instance we have three different 'concrete' employee types who are 'generally' all paid in the same way (by calling CalculateWageSlip). The logic within these three different classes would be very different though, and each object type encapsulates its own rules. The point is that the system has been able to generalise (through design) that employees are paid. I guess that Milton from Office Space would attract his own rules… If you don't know who Milton is, you're probably not a techie.
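As a rough sketch of the same idea — the class names come from the text, the pay figures and method names are invented — here is the IEmployee pattern rendered with Python's abc module:

```python
from abc import ABC, abstractmethod

class Employee(ABC):
    """Plays the role of the IEmployee interface from the text."""
    @abstractmethod
    def calculate_wage_slip(self) -> float: ...

class ProductionEmployee(Employee):
    def __init__(self, hours, rate):
        self.hours, self.rate = hours, rate
    def calculate_wage_slip(self):
        return self.hours * self.rate  # paid by hours worked

class ExecutiveEmployee(Employee):
    def __init__(self, annual_salary):
        self.annual_salary = annual_salary
    def calculate_wage_slip(self):
        return self.annual_salary / 12  # flat monthly salary

class SalesEmployee(Employee):
    def __init__(self, base, sales, commission_rate):
        self.base, self.sales, self.commission_rate = base, sales, commission_rate
    def calculate_wage_slip(self):
        return self.base + self.sales * self.commission_rate  # base + commission

# The wages run generalises: it neither knows nor cares which kind
# of employee it is paying.
payroll = [ProductionEmployee(160, 15.0),
           ExecutiveEmployee(96000),
           SalesEmployee(1000, 20000, 0.05)]
wages_due = [e.calculate_wage_slip() for e in payroll]
print(wages_due)  # [2400.0, 8000.0, 2000.0]
```

The wages loop at the bottom is the whole point: one call site, three very different pay rules.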


SAP Data Integrator: Repositories in the Release Cycle


In the previous two blogs we looked at first, how to structure your repositories for the release cycle and secondly how to create your central repositories so that they work seamlessly with your local repositories.

If you take a look at the diagram below you'll see exactly how the repositories sit together in the full release cycle. We start towards the left with the development users working on code, synchronising their changes through the Central Development Repository and refactoring until the team are ready to make a release as a whole. The release can then only be made via the central repository: the QA user(s) download the latest version of the ETL package to their local QA repositories. From here the package will be tested in isolation from the development team. Once the two teams have, between them, arrived at a candidate release that is ready to go into production, the QA team will upload the tested candidate; they will do this from the release source residing within their local QA repository into the Central QA Repository. From this point forward the ops team will have at their disposal the latest code, which they may roll into production by downloading it into the local production repository(s). The central repository that they connect to will not see any of the intermediate releases, and so they cannot mistakenly grab a partial implementation. They may then set up schedules against this code base to automate the process of running live data captures.


This blog will demonstrate how to use the repositories to move code from one ecosystem (development) into another (QA) and make the resultant tested code available for the production ecosystem. We start with the development team having made a candidate release available to the QA department. As a member of the QA team I would then open my local repository in Data Services.


As this is the first release of the ETL package in question, if we look underneath our batch jobs you should see that we do not currently have the job within our repository. We therefore need to download the source from the central repository. In this instance we have not yet associated our central repository with our Data Services local repository, and so we must do this before we go any further; we thus select the 'Central Repositories…' menu item from the Tools menu.


This displays the list of currently assigned central repositories, currently none. We thus press the Add button in order to add a new central repository as indicated below.


This invokes the following dialog box which, upon logging in, will detail the central repositories that are available to you. In this instance I want to obtain the latest development release code, which will of course be located within the 'Dev_Central_Repo'; we thus select the repo as indicated and press 'OK'.



Once control returns to the 'Central Repositories' window, all being well your development repository should now be within the list. We now need to 'Activate' this repository, which, permissions allowing, will allow us to interact with it. You may have only one active central repository at a time.


Once the dialog is dismissed, in order to see the new active central repository we will need to show the 'Central Object Library' window, available from the 'Tools' menu.


Upon showing and docking the ‘Central Object Library’ your screen should look a little like this, note that we can see the batch job that has been built for testing in this instance called bjSalesIntegration.


Central repositories cannot be worked on or executed at all, and so what we need to do is pull the source down into our local repository. This is achieved by right clicking on the appropriate job and selecting 'Get Latest Version' (or 'Get By Label' where you have been asked to use a specific release); a sub menu will then be provided, from which you will need to select 'Objects and Dependants'. This will download the job you have requested along with ALL dependant objects, data sources, etc.


Once the download has finished processing you should be able to switch back to your local repository and see that the job you elected to download has been copied into it.


Upon opening the job you should see the workflow as defined by the developers. If you then elect to run the transformation you should notice (as with the following dialog) that we do not appear to have any system configurations available, which is odd… There are two reasons for this:-

  1. Substitution parameters are not exported via the central repository synchronisations.
  2. System configurations are also not exported via the central repository.

Please review these (up and coming) blog posts for details of how to export Substitution Parameters and System Configurations.


Once you have system configurations available to you (after reading the aforementioned blog posts) we need to configure them; thus, select the 'System Configurations' menu item available from the Tools menu.



You should then see the following dialog displayed. This details the configured data sources and substitution parameters that can be configured, though as you can see, in this instance there are no configurations selected. In order to create one you need to press the indicated toolbar item.


I am going to create a 'Test' configuration. This will detail the data sources and substitution parameters that are to be utilised whenever I elect to run this 'Test' configuration. As you can see I have elected to utilise the 'Default' substitution parameter configuration, and for each of the three data sources I wish to use the 'TestConfiguration', which will likely point at a test version of the specified database.


Once these details have been saved, upon electing to run the job again the following dialog will be displayed. Note how we NOW have a system configuration dropdown from which we can select the configuration that we wish to use when executing the ETL package. This in turn determines exactly how the package is run: against which data sources, and into which destination the results are to be persisted.


Once the QA department are totally happy with the release made by the development team they will likely want to make it available to the production team. Looking at the flow diagram at the start of this article you can see that the QA team will thus want to move the job into the QA repository so that the production team can subsequently download the job into the live execution repository and run it in production. The QA team thus need to deploy the source into the QA central repository which, as we can see, they do not currently have assigned as a central repository. They thus need to press the 'Add' button below.


This will prompt them for their log on credentials and bring up a list of the repositories that they can see. In this case they wish to connect to the QA Central Repository, so they select this one and press OK.


Once you have added the repository you should be returned to the Central Repositories dialog; you will then need to select the QA Central Repository and press ‘Activate’. Please note that ONLY one central repository can be active at a time (life’s tough enough without added complications, right?)


Once you reopen the Central Repository window you should see that we are now looking at the QA_Central_Repo, and as you can see there are currently no batch jobs contained within it. This is exactly what we expect to see.


Switching back to our local test user’s repository, we can then right-click on the job and select ‘Add to Central Repository’. This gives us a few options; in this instance we want to add the object specified AND all of its dependent objects (that would be all of the underlying data flows, data sources and so on).


As is typical with many source control providers, you are then asked for comments to help identify the changes that are new to this release. It is GOOD practice to write something sensible here, not just ‘Changes for next version’, which 12 months down the line will mean nothing at all!


And that’s it: your source is now available for the Production staff to download in exactly the same way that the QA team downloaded it in this blog. There is of course a lot more to source control systems than just adding and getting files, but this should be enough to get you started in the world of ETL software release cycles.

SAP Data Integrator: Configure Repositories for the Development Cycle – Part 2.


In part one of this blog we looked at why we might want to configure separate repositories to control the processes for releasing an SAP Data Services job. My feeling is that it is better by far to have two central repositories: one for Development and one for QA. With this in mind I’m going to walk through, from start to finish, creating our QA repository. We are using SQL Anywhere for our repositories in this example; keep in mind that a repository is in fact just a database with a specific schema. So let’s start by creating a database: right-click in the Sybase Central server window and elect to ‘Create Database…’
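As an aside, if you prefer scripting to clicking through the wizard, the same database can be created with SQL Anywhere’s `dbinit` command-line utility. The sketch below is a minimal, hedged example: it assumes `dbinit` (and its `-dba` switch for setting the initial user) is on the PATH, and the file path and credentials are hypothetical placeholders, not values from the walkthrough.

```python
# Sketch: scripting the repository database creation instead of using
# the Sybase Central wizard. Assumes SQL Anywhere's `dbinit` utility;
# the path and credentials are hypothetical placeholders.
import subprocess


def dbinit_command(db_file: str, user: str, password: str) -> list:
    """Build the dbinit command line for a new repository database."""
    return ["dbinit", "-dba", f"{user},{password}", db_file]


if __name__ == "__main__":
    cmd = dbinit_command(r"C:\data\QA_Central_Repo.db", "DBA", "secret")
    print(" ".join(cmd))
    # Uncomment on a machine with SQL Anywhere installed:
    # subprocess.run(cmd, check=True)
```

Either route ends in the same place: a plain, empty database file that the Repository Manager will later turn into a repository.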


This will invoke the database wizard. There are FAR too many screens to go through in this blog; in the main I elected to go with the default settings, with the exception of the following screens. In my instance I needed to create the database on a particular server.


Note that I am creating the QA_Central_Repo.db file, a nice simple name that describes exactly what we are looking at.


I then specified a user name and password for the new database.


Once we have been through all of the screens and pressed the Finish button, we should see our new database there in Sybase Central, ready for use.


We then need to set up a 64-bit ODBC connection to allow the various SAP Data Services components to see our database. So, after bringing up the 64-bit ODBC manager, we press the Add button:-


In this instance we select SQL Anywhere 16, as this is our database of choice.


Next we need to configure the ODBC name, so we’ll use something nice and logical: QA_Central_Repo, the same as our database name.


And now we configure the ODBC connection to point to our database; in this instance the database is running on the same server.
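For reference, a 64-bit ODBC DSN like the one created in these dialogs is ultimately just a handful of registry entries. The fragment below is a hypothetical sketch only: the driver path, value names and database file location are assumptions based on a typical SQL Anywhere installation, not taken from a live machine, so treat it as an illustration of where the dialog’s settings end up rather than something to import verbatim.

```reg
Windows Registry Editor Version 5.00

; Hypothetical sketch of the 64-bit DSN; driver path and value names
; are assumptions, not copied from a real installation.
[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\QA_Central_Repo]
"Driver"="C:\\Program Files\\SQL Anywhere 16\\Bin64\\dbodbc16.dll"
"ServerName"="QA_Central_Repo"
"DatabaseFile"="C:\\data\\QA_Central_Repo.db"

[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\ODBC Data Sources]
"QA_Central_Repo"="SQL Anywhere 16"
```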


Before we move on from this step, it’s always good practice to test that the ODBC connection is configured correctly by testing the connection, like thus. If it is configured correctly you should get a message box a little like this.
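The same connectivity check can also be done from a script, which is handy if you are setting up several repositories. This is a minimal sketch assuming the third-party `pyodbc` package and the DSN created above; the credentials are hypothetical placeholders.

```python
# Sketch: checking the ODBC connection from a script rather than the
# ODBC dialog. Assumes the pyodbc package and the DSN created above;
# credentials are hypothetical.
def odbc_connection_string(dsn: str, uid: str, pwd: str) -> str:
    """Build a DSN-based ODBC connection string."""
    return f"DSN={dsn};UID={uid};PWD={pwd}"


if __name__ == "__main__":
    conn_str = odbc_connection_string("QA_Central_Repo", "DBA", "secret")
    print(conn_str)
    try:
        import pyodbc  # third-party: pip install pyodbc
        conn = pyodbc.connect(conn_str, timeout=5)
        print("Connection successful")
        conn.close()
    except Exception as exc:  # driver missing, DSN absent, bad credentials...
        print("Connection failed:", exc)
```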



Right, that’s it for the ODBC connection. Things are now getting a little more exciting than dull old databases and ODBC connections. You should have a program on your machine called the SAP Data Services Repository Manager. This application allows you to create new repositories from empty databases and to upgrade existing repositories to the latest version. If we then open this program…


A very uninspiring dialog like the one below is displayed. What we want to do is create a ‘Central’ repository from our QA_Central_Repo database, so we configure the screen as below. Two important points to note:-

  1. We must ensure that the tick box ‘Enable Security’ is ticked as this is what gives us a ‘Secure Repository’.
  2. We are using the ODBC DSN to connect to the repository.


Once you are happy that you have entered the correct information, press the Create button and the Repository Manager will work a little magic, creating the required schema to turn this database into an SAP Data Services repository.


That was nice and simple; I bet you think we’re all ready to go now… Well, no. We now need to link the repository into the Data Services infrastructure, which we do via the SAP Central Management Console, most often referred to as the CMC. We load the CMC and then select the Data Services item from the dropdown list. This displays the infrastructure associated with the Data Services functionality. Upon selecting the ‘Repositories’ folder on the left, you can see a list of the available repositories to the right of the screen. What we need to do is add our repository to this list, which we achieve by use of the button indicated by the green arrow.


This will display the following form to allow you to adopt our new repository into the infrastructure. As you can see, we specify the name of the repository and all of the relevant connection details, including an ODBC DSN. Then, before we save the details, we test to make sure that they have been entered correctly and that the repository can be contacted.


As you can see below, once we press the ‘Save’ button our new repository is added to the list of those available. Note that it is a ‘Secure Central’ repository, which is exactly what we intended.


Hand on heart, I promise you we are nearly there, but we do have one more job to do before we can initialise the awesome. If you try to connect your local repository to this ‘Central Repository’ now, upon activating it you will get a message saying that the repository cannot be activated due to user permissions (I’m paraphrasing… don’t shoot me). What we therefore need to do is associate the relevant users with the repository, which we do by logging into the ‘Data Services Management Console’, commonly known as the ‘DSM’. The URL is enclosed in the green rectangle below.


Once you have logged in using your SAP administrator logon, you should see the following page. We need to access the ‘Administrator’ functions as indicated below, so we click on this:-


You should see a screen like this. If you open the ‘Central Repositories’ node you will see all of the available central repositories. In this instance we want to select the QA_Central_Repo as indicated.


We then select the ‘Users and Groups’ sub-node, and within the Groups node I need to add a group (you cannot have a user without a group). SAP enforces group policies, which is a good thing.


Rather simplistically, because I am a simple beast, I create an administrators group as below and press the ‘Apply’ button. Don’t worry about there being no users for a second…


I can then select the ‘Users’ tab to the right of the ‘Groups’ tab, and again I press ‘Add’.


You should now be able to select a user from the list of SAP users and assign groups to them. We need to make sure that in this case they are a member of the ‘administrator’ group (or whatever you called it) that we just created.


And there we have it, all done except for one tiny job, which arises because we are using SQL Anywhere: we need to ensure that the repository database is mounted when the database service starts. We thus add a line pointing to our repository to the service’s parameters and save the changes.
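To make the idea concrete, the SQL Anywhere service is ultimately started with a command line that lists the database files to mount. The fragment below is a hypothetical sketch only: the server name and file paths are placeholders, and the exact parameters on your installation may differ. The point is simply that the new repository’s .db file gets appended to the list of databases the service opens at startup.

```
; Hypothetical service parameters line (service properties in Sybase
; Central); server name and paths are placeholders.
-n SQLAnywhereServer "C:\data\Local_Repo.db" "C:\data\QA_Central_Repo.db"
```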


Next we stop the ‘Server Intelligence Agent’ in the Central Configuration Manager (which has hooks into the database service).


We can then stop and restart our SQL Anywhere service, and finally restart the ‘Server Intelligence Agent’.



Well done for bearing with us through this process; you should now have a properly functioning SAP Data Services central repository. Obviously you need to do this for EACH central repository you want to set up, so… good luck. In part 3 of this blog we show how these repositories can be used in the development and release cycle.


SAP Data Integrator: Configure Repositories for the Development Cycle – Part 1.

As I discussed in a related blog recently, it is important to provide dedicated repositories within SAP Data Integrator to allow for a frictionless coexistence between Development, QA and Production. It’s a paradigm shared between ETL and ‘proper’ development, and after over twenty years in the software industry I don’t know a single positive story that came out of not having a firmly laid down, separated and well understood release and promotion policy. Typically with simple small web applications there will be three separate environments:-

The development environment generally consists of one central source control repository for the code, many client developers (each potentially with their own decentralised repository of code changes) and then either one central or many satellite application databases. Added to this, there may be many coding sprints occurring at once. You can see how, left to its own devices, this will never be a frictionless process with that many complexities involved.

The QA environment tends to be more orderly. There may be one or many test environments (where we have multiple sprints). There will be no source control with regard to the code itself, though if there are automated tests these may well live in a source code repository. There is likely to be only one database and one web installation for each branch under testing. Much cleaner, I’m sure you’ll agree.

And then we get to the production environment, of which there will be only one, or a few, installations. The data we would expect to be kept clean and free of developers and testers. There will be no source control at this level; just backups of the executable and database environments.

To orchestrate and stage releases and promotions between these three quite separate environments we would typically use continuous integration mechanisms to migrate databases, web environments and the build process when required.

I’ve had a little think about what challenges there are and the benefits to be had for managing the workflows within the SAP Data Services environment. I have also read best practice documents and the one thing we can all agree on is that nobody can agree on anything. With that in mind I think it is probably best to take things on a case by case basis.

Let’s start by looking at the possible configurations. It is worth pointing out from the outset that central repositories cannot execute jobs (I mean… at all), and this is a good thing.

What is not really in doubt is that you should have one standard repository per interactive user; that is to say, each developer should certainly have a repository in which to develop and test code. It then follows that, since central repositories cannot execute transformations, each member of the QA team should also have a repository in which they can execute jobs for the test cycle. Finally, these jobs will migrate to the production environment, and so we should have a production repository which can take care of running the live business processes.

We then need to start thinking about how code migrates from the development cycle to the test cycle and thence on to production. The obvious answer is of course the ‘Central Repository’, which is a cross between a standard repository and a source control system. How many central repositories to have is the hotly contested point in all of this:

  1. Single Repository Model
  2. Two Repository Model
  3. Three Repository Model

Let me start out by saying that I just cannot see the point of a three repository model. The development repository would contain all of the source, with releases marked using a label. QA would then download the required code into their local repository and test to their little hearts’ content; being QA, they will not make changes to the code themselves, so it follows that the only reason they would ever upload to the QA central repository is when a job is cleared for promotion to production. Thus the QA central repository would contain all of the candidate releases.

All good so far. The Production user would then download the latest candidate release to the production local repository ready for execution. What then would be the point of uploading this codebase once more to a third, production central repository? It’s both a dead end and in fact a complete duplicate of the QA repository.

For me the best option is the two repository system, as it utilises a true promotion methodology. Production users cannot see development code and so cannot accidentally deploy it. In addition, changes made by development cannot suddenly appear in what was thought to be production code, and everything that ‘appears’ in the QA repository has been rigorously tested. The picture below details how a two repository deployment model would work.


In part 2 we will look at how to create and configure a central repository; I’m going to take it for granted that you have already set up local repositories.