Sharing Knowledge With The World…

Month: April 2013

ROWNUM Analytical Function in Talend

[Image: job overview]
Hi Readers,

Analytical functions are quite useful in SQL and help us avoid extra coding. We encountered a scenario where we had to implement a function similar to the ROW_NUMBER() OVER (PARTITION BY ...) analytical functionality in Talend.

We'll give an overview of the Talend job we created in order to achieve this functionality:

Prerequisites:
Create a context variable and initialize it to 0.

Scenario: Implement the ROW_NUMBER() OVER (PARTITION BY) analytical functionality in Talend

The image below gives an overview of the job; the components to focus on are the tMemorizeRows and tJavaRow components.

 The input file is as shown in the snapshot below:

We need the output to look something like this:

The tMemorizeRows component is configured as shown in the snapshot below:

This configuration specifies that we memorize the ID column, and the row count specifies that we remember up to 2 rows flowing into the tMemorizeRows component.

The most important part of the job is configuring the tJavaRow component, wherein we specify the logic for implementing this analytical function.

The Java code compares the previous and current rows using the indexed arrays exposed by tMemorizeRows, as shown in the snapshot; a sketch of the logic is given below.
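For reference, here is a minimal sketch of what the tJavaRow code can look like. The column name (id), component name (tMemorizeRows_1), context variable (context.rownum) and output column (rownum) are assumptions for illustration; tMemorizeRows exposes one array per memorized column, with index 0 holding the current row and index 1 the previous one, and the input is assumed to be sorted on id (declared as a nullable Integer or String so the null check compiles):

  // restart the counter when the partitioning value changes, otherwise keep counting
  if (id_tMemorizeRows_1[1] == null
      || !id_tMemorizeRows_1[0].equals(id_tMemorizeRows_1[1])) {
      context.rownum = 1;
  } else {
      context.rownum = context.rownum + 1;
  }
  output_row.id = input_row.id;
  output_row.rownum = context.rownum;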

Thank you for taking interest..

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik


Informatica – Merge multiple files into a single file


Hi Readers,

Today we will learn how to merge multiple input files into one target file. We will do this using only a single source definition (the indirect method).

Scenario: Merge multiple input files into a single target, capturing each file's name in an additional column.

The snapshot below shows the overall mapping.

[Image: overall mapping]

1) SOURCE SETTING:-
Open the Source Analyzer, select the source that has the same structure as all your input files, then go to its properties and check "Add Currently Processed Flat File Name Port".

[Image: source settings]

2) Create your workflow, and in the session properties set the source filetype to Indirect. This option lets the Integration Service read a number of files with the same schema structure.

[Image: session settings]

3) In the source file name property, specify the list file that holds the names of the files the Integration Service will read.

In my case the file name is filelist.csv; an example of its contents is shown below.
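A hypothetical filelist.csv simply lists one source file per line (the names below are made up for illustration; use full paths, or set the source file directory in the session):

  emp_jan.csv
  emp_feb.csv
  emp_mar.csv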

Now, when you run this job it will write each file name along with its directory path. If you want to strip the directory path and keep only the file name, go back to the Designer and add an Expression transformation.

[Image: expression transformation]

[Image: filename column expression]
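As a rough sketch (assuming the default port name CurrentlyProcessedFileName and a Windows-style '\' path separator), an output port expression such as the following keeps only the part after the last separator:

  SUBSTR(CurrentlyProcessedFileName, INSTR(CurrentlyProcessedFileName, '\', -1) + 1)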

That's all. Now when you run the job it will write just the file names (with their extensions).

/*** For the indirect load to work, the files listed in the file list must have the same structure as defined in the mapping ***/


Adding Header and Footer in Informatica


Create a CSV target file that contains a header independent of the schema defined in the Designer, and a footer that contains the number of records passed, concatenated with the date.

Create two CSV targets:

  1. Contains your normal source-to-target business logic.
  2. Contains the aggregated row (number of rows concatenated with SYSDATE).

PROCEDURE

DESIGNER—->>>

Aggregator—->>>

Exp_transform—->>>

[Image: expression transformation]
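As a rough sketch of this second pipeline (the port names EMPNO, REC_COUNT and FOOTER_TEXT are made up), the Aggregator outputs a single row with the record count, and the Expression builds the footer text:

  REC_COUNT (Aggregator output port):    COUNT(EMPNO)
  FOOTER_TEXT (Expression output port):  TO_CHAR(REC_COUNT) || ' records loaded on ' || TO_CHAR(SYSDATE, 'DD-MON-YYYY')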
IN WORKFLOW
Go to the Header Options and select "Use Header Command Output".
echo is the command used on a DOS prompt to print text, so write your desired header as an echo command.
If your requirement does not need a footer, or only needs fixed manual text, you can write it directly in the footer command; the second (aggregate) mapping is then not required.
If you only require the date, write echo %date% in the footer command. Example commands are shown below.
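For example (the column names are made up; echo %date% assumes the Integration Service runs on a Windows host):

  Header Command:  echo EMPNO,ENAME,SAL,DEPTNO
  Footer Command:  echo Load completed on %date%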

Now both files are created as per the Designer mapping rules.
Finally, go to Mapping –>> Components –>> Post-Session Success Command.

Create a non-reusable command, add one command, and write:
copy $PMTargetFileDir\Ankit\t_empfile.csv+$PMTargetFileDir\Ankit\t_num_rec_date.csv $PMTargetFileDir\Ankit\final_rec.csv

Your final output is a single file with the header and footer attached.


SCD-2 Implementations in Talend

[Image: staging table creation job]
CREATING STAGING TABLE

1) From tOracleInput_3, fetch the max surrogate key present in your dimension table using the query
"SELECT max(user_profile_sur_key) FROM ANKIT_KANSAL.user_profile"
(USER_PROFILE is our dimension table.)

2) Create a context variable snumber of Integer type with a default value of 0, and in the next step assign the max surrogate key to this variable using a tJavaFlex component, as sketched below.
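A minimal sketch of the tJavaFlex main code, assuming the incoming connection is named row2 and the query returns a single nullable column MAX_KEY (both names are assumptions), with snumber being the Integer context variable created above:

  // keep the current max surrogate key so new keys can be generated from it
  context.snumber = (row2.MAX_KEY == null) ? 0 : row2.MAX_KEY.intValue();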

 

4) tOracleInput_4 takes all your data from the source Oracle system. KEEP ALL THE SOURCE COLUMNS IN VARCHAR2 FORMAT ONLY.

5) Using a tMap component, calculate your dates; you can also separate GOOD and BAD records depending upon the business requirements,
AS FOLLOWS:-




In the expression part, perform these operations, for example checking whether tweet_id is numeric or not (a sketch follows):
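As an illustration (the flow name row1 is an assumption, and tweet_id is a String because the source columns were kept as VARCHAR2), a tMap expression like the following evaluates to true only when tweet_id consists of digits, and can be used to route rows to the GOOD or BAD output:

  row1.tweet_id != null && row1.tweet_id.matches("\\d+")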

6) Finally, load the data by running the job.

NOW SCD TYPE 2 IMPLEMENTATION

In the above figure:
tOracleInput_1 contains the staging data.
tOracleInput_2 contains the data from the dimension table; a lookup against the dimension table must therefore be performed to check whether the record already exists.

Create the relationship using an equi-join.

*If the data types do not match (the staging table may hold that column as a string), you can still achieve the join by converting the data type within tMap itself, e.g. Integer.parseInt(row1.USER_ID) at the join level.
CATCHING FRESH INSERTS:-
On the right-hand side you can achieve this functionality by setting "Catch lookup inner join reject" to true.
IF A ROW ALREADY EXISTS, INSERT A NEW RECORD WITH THE FRESH VALUES AND UPDATE THE OLD RECORD WITH AN END DATE

TO UPDATE THE EXISTING RECORD, ASSIGN THE END DATE AS THE CURRENT DATE

Thank you for taking interest…

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik

Numeric to Date Conversion in Informatica

Hello Readers,

Numeric to date conversion is one of the most common scenarios an ETL developer comes across!

The input file that I have considered in this case contains the following values:

 id,ename,hiredate

101,Nayan,080202
102,Naik,080404
103,kumar,080103
104,google,080204
105,oracle,080105
106,yahoo,080103

The hiredate comes in as a numeric value where the first two digits represent the year, the next two the month, and the last two the day.

We would like the hiredate column in the target to be in this format: DD-MON-YY.

1) Let's start with an overview of the mapping created. We'll discuss each component in the mapping step by step.

2) After dragging the source and target into your mapping area, there isn't much to do really! Add an Expression transformation and connect the hiredate port to it, as shown in the snapshot above.

3) Configure the expression transformation:
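A sketch of the conversion expression, assuming the hiredate port reaches the transformation as a string such as '080202' (if it arrives as a number, first restore the leading zero with LPAD(TO_CHAR(hiredate), 6, '0')):

  O_HIREDATE (date/time output port): TO_DATE(hiredate, 'YYMMDD')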

4) Connect your transformation to the target. Now browse to the properties tab of the target and change the date format as per your requirements.

And there you'll have your desired output!

Thanks for reading!!
