After spending several hours yesterday in the Transformer Stage of IBM’s DataStage 8.1, I was incredibly frustrated with the date functions. No matter what I did, I couldn’t seem to get the StringToDate function to work on my input file. The data was coming in a format of YYYYMMDD, but when I tried to use StringToDate([FieldName],”%yy%mm%dd”) it kept complaining of the data being in an invalid format. I was tasked with modifying three jobs in an hour and a half. It’s not a difficult task when you find something that works, but DataStage 8.1 seems to behave in a different manner than DataStage 7.5 and the old way was not allowing the job to run.

I went for a walk, picked my brain, and finally came up with a solution. Instead of using any date related functions in the stage variable portion of the transformer stage, I’d treat the data like a string. Since the data could be either a valid date (20091011) or 0, I used an if then else statement to handle the 0. If the record had a date of 0, it was set to 9999-12-31, otherwise I took the various pieces of the input string and concatenated the -’s in the date. The code for each staging variable date ended up being:

IF Len(FieldName) <> 8
THEN ‘9999-12-31′
ELSE FieldName[1,4]:’-':FieldName[5,2]:’-':FieldName[7,2]

Using DataStage’s concatenation character (:), I was able to pick apart the input string and convert it into a format of YYYY-MM-DD. At this point the data is still in string format. The len() function checks to see what the length of the input data is. Since we are dealing with data in YYYYMMDD format, any data having a length not matching 8 characters can be rejected as invalid. In the output link column, I then used the following code:

IF NOT(IsValid(”Date”,StageVariable))
THEN ‘9999-12-31′
ELSE StageVariable

This code seems to work perfectly. Say that ABCD1122 is passed. Going through the staging variable code would result in ABCD-11-22 which is an invalid date. The NOT(IsValid()) function takes care of that. Since ABCD-11-22 is not a valid date, it is converted to the default (9999-12-31) and loaded.

As you can see from the above method, we have a way to easily take the date and convert it to a proper format (YYYY-MM-DD). This can be done with any uniform input format. If the input is not uniform, it will be much more difficult if not impossible to validate the date. It is my very strong belief that you should ALWAYS be able to expect the same format for each row in the input file. If you have a record with a date in format YYYY-MM-DD, YYYYMMDD and YYMMDD, you need to go back to the developer of the input file and asked that it be cleaned up. There is no reason to pass a variety of different formats in the same column and doing so is only going to result in disaster.

Having said that, implementing this technique is rather easy. First, consult with the source file developer or take a look at the file yourself. Determine the possible values for the input and identify which format it is in. Next, validate the data and construct the input into the ideal format (YYYY-MM-DD). The IsValid function for dates ONLY accepts YYYY-MM-DD, therefore this is critical. Finally, implement the validation function in the output link. This ensures that you trap any garbage that might have been passed along.