Why Do You Have To Be So Sensitive?


For this latest blog I have my soon-to-be ex-colleague (Splitter!) Luke ‘Maximo Park’ Johnson to thank. He asked me a question with regard to SAP Data Services which has always bugged me but that I had never managed to resolve. A quick Google search on my part confirmed my previous line on the issue and I left it at that. Some time later he sent me a link containing a nugget that had thus far eluded me. So I investigated, and his nugget became a fully fledged gold bar. I therefore decided to blog this so that others may benefit from this knowledge…. and so I won’t forget!

So, what exactly was this problem? The question he asked me was ‘Why does Data Services create all of its tables within the data store with an upper case name?’ This has been my biggest gripe by far (apart from the god-awful SQL that gets generated!) because, being everyday human beings, we use shortcuts for everything, consciously and subconsciously. Let’s step back a little to step forward…. When I create a database table dealing with Product Derivations I have a few ways I could name it:-

Product Derivations

Product_Derivations

ProductDerivations

It’s very much a matter of taste of course, but as I have an abundance of taste I would always use the last of these options! Why? Well, option 1 with its embedded space means that when it comes to using SQL (an environment I live in extensively) you would have to use [] or “” around every table name, which is just hideous and more, more, more work:-

SELECT * FROM “Product Derivations”

Or…

SELECT * FROM [Product Derivations]

Eugh… So option 2 is definitely less work as regards these extra characters, but I left Visual Basic behind a long time ago. To be perfectly honest it’s just amateurish and doesn’t really aid readability; all those silly little underscores just serve to break the message up:-

SELECT * FROM Product_Derivations

No, I would use the final camel case option every time; it says exactly what it needs to, no more and no less. Data Services, however, gets in the way of this as out of the box (and seemingly until today) it creates all Template Tables fully upper case, which is just a bit of a car crash… Wait, I hear you say, surely it doesn’t matter? But actually, it really, really does… It all goes back to those shortcuts that we subconsciously take. I have read a LOT of words in my lifetime, many millions I suspect, and I know for a fact that not one of us reads every single letter in a word before arriving at ‘the word’ unless we are just learning that word or training our brains like children do. In general terms we just don’t work like that; we shortcut using context and we also shortcut using word shapes: the word ‘balloon’ looks a lot different to the word ‘battlestation’ and we see that shape pretty much instantly. When I am therefore looking through a list of potentially hundreds of database tables, all beginning with P, my eyes are scanning for all of those that start with a shape like ‘Product’, and once that shape has been recognised the rest of ‘the word’ will also be matched, meaning that matching a word like ‘ProductDerivative’ is a far easier job than it otherwise would have been.

ProductDerivative

What happens though if we take the case sensitivity out of the equation?

productderivative

I’m sure you’ll all agree it’s much harder to read…

And if we go to the opposite extreme and do what Data Services does and upper case everything, things go from bad to shocking! It is practically unreadable at speed, as we just don’t really work in upper case.

PRODUCTDERIVATIVE

So, the question is how do I make Data Services behave and give me the tables EXACTLY as I define them? Well, the answer is very simple: we can make one configuration change and normal service is resumed. We start by locating the DSConfig.txt file, which lives within the Data Services conf folder. On my machine this was within this directory:-

C:\Program Files (x86)\SAPBusinessObjects\DataServices\conf

We then open this file up and look for the [AL_Engine] section heading, which will look like this:-

[AL_Engine]

AlEnginePathName =…..

You then need to add a new setting called ODBCAlwaysCaseSensitive and give it a value of ‘TRUE’.

Your file should now look a little like this:-

[AL_Engine]

AlEnginePathName =…..

ODBCAlwaysCaseSensitive = TRUE

And that’s it as far as changes are concerned. We just save the file before stopping and restarting the SAP Data Services Windows service from the Windows Services Manager. Lo and behold, you’re all done. What you should now find is that when you run your data transformations, any template tables you have defined will (on closer examination within SQL Server) have been created using the casing you specified at design time. I don’t know about you but for me this will now be the first thing I change in any new Data Services installation….
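If you want to prove the point to yourself, a quick sanity check against the target database does the trick. The snippet below is just a sketch that assumes your template tables land in SQL Server (as mine did) and reuse the ‘Product’ prefix from the examples above:-

-- List recently created tables so you can eyeball their casing (SQL Server)
SELECT name, create_date
FROM sys.tables
WHERE name LIKE 'Product%'
ORDER BY create_date DESC;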

 


There Is Nothing Like A Dame…


So there I was on a Friday (I’m lying, it was a Thursday actually), minding my own business and trying to implement a like clause in SAP Data Services, thinking… it really shouldn’t be this hard! But it was. It turns out that the double pipe characters (||), which are the usual concatenation operator within Data Services, aren’t valid in a join expression. This was slightly problematical as I needed to query with a like as well as with exact matches. Faced with my like clause problems I could see the day quickly slipping away from me. The choices were as follows:-

  1. Move all of the logic for my transform out to a Stored Procedure. I have no doubt that this would work but it would have resulted in much reworking, as my data transformations were reasonably complex and there were many of them. In addition it always feels like a cheat, as you end up losing the data lineage which, to be honest, had already gone to hell in a handcart in this instance!
  2. Not use a like, not give a fig, and just give up and run away to the South of France for the rest of my life. In all honesty, I'm struggling to find a downside here...
  3. Add a new derived column which performed the concatenation and created a new field called FuzzyData. I could then use the phrase OR FieldToCompare LIKE FuzzyData. Theoretically this could work, but I could see this extra query step potentially preventing my pushdowns from working and, with the size of this data set, that would be completely unworkable.

Now if you’re anything like me you stopped reading at point 2, made for the airport and were enjoying a cheeky red before embarkation; actually, scratch that, I think that a red is probably way too classy and haughty to be considered cheeky. But I digress…. If you are also like me I would guess that Mr Lloyds also probably had you forcibly ejected from the aeroplane, United Airlines style; something about better repaying those airline-ticket-sized dents in ‘his’ money.

So let’s state the problem: we have a set of spreadsheet data returned from a client pertaining to products that they have ordered which are defective in some way; this data has been moved into a staging table within our data warehouse to allow for performant querying. There is of course much data within this table, but the main identification field is the barcode, which is always in the format ‘nnnnnnnn’, n being a numeric digit of the barcode as exposed to the client.

Spreadsheet Barcodes

07987621
00234562
09812344

Also within our data warehouse is a DimProducts dimension table which has loaded the full inventory from our own LOB database. The barcodes within this data are generally in exactly the same format, although there are a few that have 12 characters instead of the more usual 8 (format ‘nnnnnnnn-nnn’). The client however will only ever see the (first) 8 characters of any barcode, including these anomalies.

Dimension Barcodes

07987621
00234562
09812344-908

The difficulty is that we wish to map the data contained within the ‘spreadsheet’ to the data within the ‘LOB database’ so that every defect complaint received can be associated with the product within our inventory. We can then contact manufacturers etc. Remember that in the main there will be a direct match between these two sources; it will only be ‘edge cases’ where the ‘spreadsheet’ data has to match on the first 8 characters of our longer barcodes. As you can see from the sample barcodes above, if we were to perform the join below we would only match the first two records:-

 'Spreadsheet.Barcode = Database.Barcode'

The final barcode would not be resolved as 09812344 is not the same as 09812344-908. In order to match against this we would need a join like this:-

'Database.Barcode LIKE Spreadsheet.Barcode || '%''

The percentage sign is of course required because SQL Server needs a wildcard character before a LIKE statement will perform a fuzzy match; otherwise it is essentially just an equality test. Nice and simple so far, but just try and use that expression inside an SAP data flow join. Not very helpful errors ensue and the afternoon disappears in much the same way that my salary does… alarmingly fast.
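If you want to see the difference on a scratchpad first, the little snippet below uses the sample barcodes from above (note that SQL Server concatenates with ‘+’, whereas Data Services uses ‘||’); without the wildcard the longer barcode is simply never matched:-

-- Without the trailing '%' the LIKE is effectively just an equality test
SELECT
CASE WHEN '09812344-908' LIKE '09812344' THEN 'match' ELSE 'no match' END AS WithoutWildcard, -- no match
CASE WHEN '09812344-908' LIKE '09812344' + '%' THEN 'match' ELSE 'no match' END AS WithWildcard; -- match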

As I alluded to earlier in this blog post, the easiest way to address this is to split your processing into two joined queries. In the first we select only the fields we are interested in from our spreadsheet staging table and derive a new field entitled FuzzyBarcode. This field uses the following formula for its text:-

Spreadsheet.Barcode || '%'

This removes the need for monkeying around with pipe characters in the joins at a later stage. Next, in the second query we bring in the data from the first query and join it to the data within our products table using the barcode join shown below:-

'Dimension.Barcode LIKE Spreadsheet.FuzzyBarcode'

This data can then be output into your fact table, and as you should see, something magical happens. Not only does your data resolve in the manner that you wished, the execution of this workflow (so long as you have not done anything else too complex) gets entirely pushed down to the database, leaving you with a statement similar to this pseudo SQL:-

INSERT INTO FactComplaints (DimProductFK,RejectionReason)
SELECT D.Sid,S.RejectionReason 
FROM Spreadsheet AS S INNER JOIN DimProducts AS D
ON D.Barcode LIKE S.FuzzyBarcode

Perfect, a like clause within SAP BODS and everything gets pushed down to the database for maximum efficiency.

Selectively Executing Jobs with SAP Data Services


Time…. It’s constantly against us in every working day, and it matters not how productive you manage to be, somebody will always rain on your parade with ‘just one more job’ to eat away at your life blood. So here’s the deal… I’ll tell you how to save a little bit of time during your day but…. Shhhhhhhhh, keep quiet about it! Maybe you could use the time you will doubtless save to start planning your takeover of the world; let’s face it, you could hardly make more of a hash of it than the current crop of lunatics are, could you? Agreed? Good, now without wasting more time let’s improve the world….

When it comes to Data Services one can spend an awful lot of time rerunning the same data imports over and over again, and the cost added up over time can be considerable. Sure, we can use the ‘Bypass’ functionality, but as anyone with more than a passing familiarity with it can testify, it can be more hassle than it’s worth; and of course once you start moving jobs between repositories everything goes further south than The Duke Boys on a day trip to the Alamo. In a chance conversation with my colleague Angus Menter (if he hasn’t lived there, it probably doesn’t exist) the frustrations of a particular user were relayed to me and I was asked how I coped with this. The answer is in fact relatively simple once you understand it, but there is a little bit of setting up to do.

So let’s state the problem: I have a small but complex job that moves data from a line of business database to a data warehouse for reporting purposes. This is rather simplistically accomplished in 3 workflows.

  1. Destage Data (Move data from source database to destination database for reasons of performance)
  2. Create Dimension Table
  3. Create Fact Tables (using the data created in step 2)

2017-07-04_1240

Now imagine that I had one massive view to move over in Stage 1, which I then needed to split into over 500 dimension tables in step 2, before creating one fact table referencing all of these dimension tables in step 3. Imagine now that Stage 1 takes 15 minutes to load… It doesn’t take a genius to see that when it comes to designing, testing and running stage 2, that initial 15 minute load is going to get very old very fast, as indeed will I. Ideally we would probably only want to run this initial stage once a day during the development process, if that. This we would accomplish by changing our main workflow to use ‘Conditional Workflows’ overarching our 3 main workflows.

We start by removing the standard workflows from our main canvas (don’t worry, you won’t lose them) and replacing them with three conditional workflows named in a similar manner to the deleted workflows as shown in the following image.

2017-07-04_1243

We then open up each conditional workflow in turn and, using the local object library’s ‘Workflow’ tab, we drag our previous normal workflow (named the same) into the ‘IF’ window of the conditional workflow. What this will mean is that if the logical statement (which we will write later) evaluates to true then we SHOULD run this workflow; otherwise we will run whatever is in the ‘ELSE’ window, in this instance…. nothing at all. Your 3 conditional workflows should look a little like the following 3 images.

[Slideshow: the three conditional workflows, each with its original workflow dropped into the ‘IF’ window]

Great, so that’s really coming along, but of course we need to manage the logic regarding whether or not these jobs run. How do we do this? Well, there are a number of ways that it could possibly be achieved, but I think for me the easiest and most flexible way is to use ‘Substitution Parameter Configurations’. For those of you not familiar with these, it is essentially a set of parameters that are assigned at run time and fed through to the executing job. The job may then query these values to find out ‘how it should run’. Let’s start off by defining our parameters; we want three, one for each workflow, so that we can control whether or not each runs. We thus open the Substitution Parameter Configurations dialog from the Tools menu as shown below.

2017-07-04_1248

We will then be furnished with a dialog like the following where we can define our three parameters. I’m calling them:-

  1. $$DestageData
  2. $$LoadDimensions
  3. $$LoadFactsData

2017-07-04_1251

And I think that their usage is fairly obvious from their names. We set them all to have the value YES, indicating that as things stand they SHOULD run. We then press OK and dismiss the dialog. We then return to each of the conditional workflows in turn and set the ‘IF’ expression using the ellipsis button (surrounded with a green square toward the top right of the following dialog). We then flip to the variables tab, expand the Substitution Parameters item and drag our ‘$$DestageData’ parameter into the formula editor before finishing the formula as shown below:-

2017-07-04_1521

Note how both the parameter name and the value MUST be enclosed in single quotes as they are text values and not numeric.  We then do this for each of the three conditional workflows as shown below:-

[Slideshow: the ‘IF’ expressions for each of the three conditional workflows]
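To spell the formula out in text (treat this as a sketch, as the exact syntax Designer inserts for the dragged parameter may vary slightly between versions), the finished expression for the first conditional ends up as something like:-

'$$DestageData' = 'YES'

with the other two conditionals comparing ‘$$LoadDimensions’ and ‘$$LoadFactsData’ in exactly the same way.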

Great strides! We’re nearly there. In fact, let’s just test that what we have created works as intended. If you save and run your job you should see that it does indeed function as it did previously, but of course we are STILL running all of the jobs.

We could leave it at this point if we wished, and omit the running of a stage simply by opening and changing the Parameter Configurations, which is perfectly acceptable. However, we will go a little further and, with a little bit more set up, eliminate the manual editing entirely. We start by once again loading up the Substitution Parameter Configuration dialog and renaming our configuration to something a bit more obvious, in this instance ‘RunAll’; as you can see, all of the stages are set to run.

2017-07-04_1304

We then create an exact duplicate of this configuration using the indicated button, which we rename to ‘RunAllExceptDestage’. Yep, as you’ve probably guessed, we then change the ‘$$DestageData’ value to ‘NO’, indicating that it should NOT run.

2017-07-04_1306

Brilliant, so let’s run it, right? Hold on, Tiger… we’re not there yet.

We then need to utilise the System Configurations as shown below:-

2017-07-04_1307

These are amazingly useful things that allow you to create configurations with associated substitution parameters and data stores; you could thus work against a live database, or a test one, or run in a different manner entirely. In our instance we don’t have different data stores and so we are only really interested in changing the substitution parameter configuration that is used. We thus create two System Configurations and entitle them, rather imaginatively, ‘RunAll’ and ‘RunAllExceptDestage’. Each configuration then uses a different substitution parameter configuration, meaning that one will execute the full process whilst the second will omit the ‘Destaging’ operation.

[Slideshow: the two System Configurations and their associated substitution parameter configurations]

Nice and simple. We save those changes and now, upon executing our job, we get a new dropdown to pick from:-

2017-07-04_1414

From this we select ‘RunAllExceptDestage’. Once executed, you should see that we no longer run the Destage operation and instead jump straight to the Dimensions job.

2017-07-04_1416

Amazing eh? Enjoy your executions……