Sharing Knlowledge With The World…

Month: July 2013

Dynamic Column Ordering From Source File In Informatica

source_struc
Hello Readers,
                      Today we will be discussing a scenario which commonly occurs during ETL development. Lets say we have a source file which contains  Employee information as described in the following given text file.
source_info
SCENARIO:- The given source file will be available to you on daily basis, but the order of the given columns varies daily  such that the ordering of the columns doesn’t remain constant . Today we have ename in first position ,the next day it may appear in the last position. Hence you cannot implement the logic straight away as we know that you must know your column ordering in informatica to implement the further logic.
So, to overcome this current situation we have developed a mapping which will provide you a sure shot solution and you can re modify this mapping as per your business needs.
COMPLETE JOB DESC:-

overall_job_desc

SOLUTION:-

Take the number of mapping variable as per number of columns in your source file. In my case i have three columns so i have defined three variables of Integer data type named as $$var1,$$var2,$$var3 set initial value to 0 while defining them.
Actual Implementation will be done in Expression Transformation. 
So carefully look into the given snapshot for expression transformation.
Now in expression transformation create four expr_variables as defined they were used for setting up the values of mapping variables. var_count is used for setting up the values first time for all the mapping variables.
now for var_ename variable use the code given below.
The code given above will check for the first time whether count is one(ie the header row) if yes then it checks if the column name ename for the  first row and if  contains data as ‘ename’ then setvariable($$var1,1) similarly it checks all columns for data ‘ename’ and sets the any of $var which satisfies the condition. 
Same code is repeated all the times for all column names but only the difference is the checking condition. Like deptno it will check whether the columns contains ‘deptno’ and if yes then it sets setvariable($$var1,2)and so on.
*remember to pull the data from the first row of the file don’t ignore the first like otherwise it will not work however you can skip your first row after you recognized your columns using filter in the next component.

Now Finally create three output ports which will be used to forward your corrected data to your next transformation. So again look into the code given below.
Similarly, use similar condition in all your output ports like for ename use $$var1 any of the variable holding your ename data will be satisfied here similarly you can define for rest of your ports. Further on condition will change just a bit instead of $$var1=1 use have to use $$var1=2 or 3 whatever you have specified before.
O/P:-



Thank u for reading…


Continue Reading

Date Dimension Using Talend

first
Hello Readers,
                      In our previous post we discussed what is the use of Date Time Dimension in Warehousing environment and shown a demo displaying a Date-Time Dimension using Informatica.

So, in this post we will demonstrate the same implementation using Talend Open Studio.

COMPONENTS TAKEN:-

1)Oracle Db(tOracleConnection)
2)tMap(for data manipulation)
3)tConvertType(to change the data type)
4)tLogRow(Displaying Result)

COMPLETE JOB DESCRIPTION:-

job_info

Now,
         In tOracleInput component define two columns as a schema both must be of data type and in sql editor write the query given below.

“select sysdate+level,trunc(sysdate+level,’q’) from dual connect by level < 366”

first column will return you data serially from the current date and second column will give you first date for every quarter.

Most of the implementation is completed using existing  components.
To compute monthOfQuarter–
var monthOfQuarter=Mathematical.MOD(TalendDate.getPartOfDate(“MONTH”,row1.ora_date)+1,3)  
then
(Var.monthOfQuarter==0.0)?3.0:Var.monthOfQuarter 

To pull out Weekday in character format use–
TalendDate.formatDate(“EEEE”,row1.ora_date)

To calculate days from first date of quarter–
TalendDate.diffDate(row1.ora_date,Var.quaterFirstDate,”DD”)+1    


So, this is how you can implement a basic date-time dimension. Beyond this you can alter this current implementation as per your requirement.

Thanks for giving your precious time….kindly leave a comment  in case of any queries ..

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik

Continue Reading

Date Dimension Using Informatica

Complete
Hello Readers,
                       Today we will be discussing one of the most important topic which is widely used in warehouse implementation that is Time/Date Dimension using Informatica.
It is  also widely acceptable as a Confirmed Dimension (A confirmed Dimension is a table whose attributes are widely referenced in multiple tables, as it can serve multiple fact tables).
Date-Time Dimension is used to analyze/report the data in terms of date and time up to its granular level.
Now, in our implementation we have taken Oracle as a Database and Informatica as a tool to achieve the basic functionality.

COMPLETE JOB SNAPSHOT


Job_Desc


We have taken source as s_date_source this source may not present in your database, we have used this source to pull data from oracle using a sequel query.
QUERY:-
SELECT sysdate+level,trunc(sysdate+level,’q’) FROM dual CONNECT  by level <366;
In our case we are loading our date-time dimension from the current date to the next complete year.
*trunc(sysdate+level,’q’) is the date passed to the function which will return the first date of every quarter, which further can be used to evaluate further columns.
Now to implement real logic we need to pull out one expression transformation from the palette.
monthOfQuater:-IIF(MOD(GET_DATE_PART(ora_date,’MM’),3)=0,3,MOD(GET_DATE_PART(ora_date,’MM’),3))
date_type:
IIF(To_char(ora_date,’DAY’)=’Saturday’) OR TO_Char(ora_date,’DAY’)=’Sunday’, ‘Weekend’, ‘Weekday’)
Try To use most of the basic function given by Informatica to achieve your functionality, some of the important ones are
GET_DATE_PART()
DATE_DIFF()
TO_CHAR()
TO_INT()
IIF()
O/P Rows 
out_date_iInteger,dayofyear,monthofYear,monthInChar,dayOfmonth,weekOfMonth,weekDayInChar,quaterOfYear,monthOfQuater,out_quater_first_date,dayOfQuater,day_type

1072013,181,7,Jul,1,1,Monday,3,1,07/01/2013,1,Weekday
2072013,182,7,Jul,2,1,Tuesday,3,1,07/01,2013,2,Weekday
31022013,276,10,oct,3,1,Thursday,4,1,10/01/2013,Weekday

This is one of the way to implement the basic Date-Time Dimension and you can further use various function given by the tool to achieve more functionality as per your requirements.
Continue Reading

Late Arriving Dimension in Talend

Image1
Hello readers,
                     In one of our previous posts we had already covered the concept and implementation of late arriving dimension using informatica.
So, directly we will show you how to achieve this functionality using Talend DI tool.

COMPONENTS REQUIRED

  • tOracleInput
  • tOracleOutput
  • tLogRow
  • tJavaFlex
  • tMap
Given below is the complete job description:-
JOB DESCRIPTION:-
  1. Create a first job that will bring the max surrogate key (using sequel query) from the production dimension table and save in one of context variable using tJavaFlex Component. In main code section of tJavaFlex write context.prod_skey=row4.prod_skey;
  2. Create a second job that runs after successful completion of first job and here bring the tOracleInput_1 component that will bring all the source data that resides into your staging area.
  3. Pull two more oracle input sources that will act as a lookup one for prod_dim table and other one is for cust_dim table.
  4. tMap Job Desc as described in the below figure
Use Left Outer Joiner Model at the left hand side in both the lookup i.e. for customer and product.
THIS JOB IS CREATED TO IMPLEMENT LATE ARRIVING DIMENSION CONCEPT FOR PROD_DIM ONLY.

5. The Top right corner output in tMap is used to populate your normal fact table but make sure to check whether your prod_dim skey must not be null.
6. The second insert is used  again to update your fact table but with the record that contains no surrogate key as late arriving dimension concept here you just increment your context variable and populate using ++context.prod_skey.
7. And Finally Insert a record in the product dimension for that just add one more output in tMap with the name like insert_prod with the same check constraint and use the incremented context variable.
8.In my job i have used tLogRow for logging purposes in real scenario you have to use tOracleOutPut component to be effective for DB.
9. For prod_dim insert output in tOracleOutput remember to use update or insert because if any changes will come then it will directly effect to your database as SCD1.
In case of any queries please do comment…..
 
Continue Reading
PageLines