Incremental Update using Pentaho Kettle

Incremental update (SCD Type 1, Type 2 or Type 3) is the most corner stone developing ETL scripts for populating a Data mart.

Pentaho Kettle is a marvelous tool for data migration in very easy steps. I have been using Kettle for multiple Business Intelligence projects for data migration purpose and I found it very easy to learn and produce the desired out put with in no time. Kettle provides a very useful step Dimension Lookup-Update for incremental update your dimensions.

Dimension Lookup-Update step is used to implement SCD Types: Type I (update) and Type II (insert).
Not only can you use Dimension Lookup-Update to update a dimension table, it may also be used to look up values in a dimension.



Let say your source table is as follows

CustomerCode ChangedDate Address
Record 1 A 21/11/2008 Location A
Record 2 A 25/05/2009 Location B
Record 3 A 11/03/2009 Location C
Record 4 B 14/05/2009 Location X
Record 5 B 01/07/2009 Location Y

And this is how you would like to output that data to the dimension table

CustomerCode StartDate EndDate Address
Record 1 A 21/11/2008 25/05/2009 Location A
Record 2 A 25/05/2009 11/03/2009 Location B
Record 3 A 11/03/2009 NULL Location C
Record 4 B 14/05/2009 01/07/2009 Location X
Record 5 B 01/07/2009 NULL Location Y

We need to write a transformation that does the following

Table Input 1: Select MAX(StartDate) from dim_table

Table Input 2: Select * from source_table where EndDate > ?

Dimension Lookup / Update
Key: CustomerCode
Fields: Address
Technical Field Name: SK
Stream Date Field: ChangedDate

You would then end up with a table:
SK CustomerCode Version StartDate EndDate Address
1 A 1 21/11/2008 25/05/2009 Location A
2 A 2 25/05/2009 11/03/2009 Location B
3 B 1 14/05/2009 01/07/2009 Location X
4 A 3 11/03/2009 NULL Location C
5 B 2 01/07/2009 NULL Location Y

Dimension Lookup / Update will do a query for each row on:

Where stream.CustomerCode = dim_table.CustomerCode and dim_table.StartDate < stream.ChangedDate and dim_table.EndDate > stream.ChangedDate

You will likely get better performance if you sort the source data on ChangedDate (ascending) before pushing to the Dim Lookup / Update.

10 Responses to “Incremental Update using Pentaho Kettle”

  1. Good stuff. Nice to read some well written posts. A long way between them.

  2. Henriette Kight Says:

    “I feel a little sorry for those poor 8GB iPhone 3GS units that will be on display next to the iPhone 4 in Apple Stores around the world.” But I know many people wants to have iPhone even the first addition so this offer will be good for them ..

  3. thank you very helpfull

  4. nice examples for Incremental updates

  5. What will happen to open source now that Oracle has acquired java ?

  6. What i don’t understood is actually how you are not actually much more well-liked than you might be right now. You’re very intelligent. You realize thus significantly relating to this subject, made me personally consider it from so many varied angles. Its like women and men aren’t fascinated unless it’s one thing to accomplish with Lady gaga! Your own stuffs outstanding. Always maintain it up!

  7. Hello just wanted to give you a quick heads up. The words in your post seem to be running off the screen in Chrome. I’m not sure if this is a format issue or something to do with web browser compatibility but I thought I’d post to let you know. The layout look great though! Hope you get the issue resolved soon. Cheers

  8. Yesterday, while I was at work, my cousin stole my iphone and tested to see if it can survive a 25 foot drop, just so she can be a youtube sensation. My iPad is now broken and she has 83 views. I know this is completely off topic but I had to share it with someone!

  9. Im not sure what everyone else thinks but I agree with the author. Nice blog by the way.

  10. I just couldnt leave your web site prior to saying that I actually enjoyed the excellent quality information you provide to your visitors, May be back often to check up on brand new stuff you post!

Leave a Reply

Spam Protection by WP-SpamFree

Einfobuzz Site Map