April 26, 2013
Analytical functions are quite useful in SQL and help us avoid extra coding. We encountered a scenario where we had to implement functionality similar to the ROW_NUMBER() OVER (PARTITION BY) analytical function in Talend.
Scenario: Implement the ROW_NUMBER() OVER (PARTITION BY) analytical functionality in Talend.
We'll give an overview of the Talend job we created in order to attain this functionality:
Create a context variable and initialize it to 0.
The image below gives an overview of the job; the components to focus on are the tMemorizeRows and tJavaRow components.
The input file is as shown in the snapshot below:
We need output to look something like this:
The tMemorizeRows is configured as shown in the snapshot below:
This configuration specifies that we memorize the ID column, and the row count of 2 specifies that we remember up to the last 2 rows flowing into the tMemorizeRows component.
The most important part of the job is configuring the tJavaRow component, wherein we specify the logic for implementing this analytical function. The Java code compares the previous and current rows using the tMemorizeRows index, as shown in the snapshot.
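As a minimal standalone sketch of that comparison logic (illustrative only — inside the actual tJavaRow, the previous ID would come from the tMemorizeRows array and the counter from the context variable initialized to 0):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionRowNumber {
    // Rows are assumed to arrive sorted by the partition column (ID).
    // The counter is incremented while the current ID matches the
    // previously memorized one, and reset to 1 when a new ID starts.
    public static List<Integer> rowNumbers(List<String> ids) {
        List<Integer> result = new ArrayList<>();
        String previousId = null; // what tMemorizeRows would hold
        int counter = 0;          // the context variable
        for (String id : ids) {
            counter = id.equals(previousId) ? counter + 1 : 1;
            result.add(counter);
            previousId = id;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rowNumbers(Arrays.asList("A", "A", "B", "B", "B", "C")));
        // [1, 2, 1, 2, 3, 1]
    }
}
```

Note that, just like the analytical function in SQL, this only works if the incoming flow is sorted on the partition column first.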
Thank you for taking an interest.
April 15, 2013
Today we will learn how to merge multiple input sources into one target file. We will do this using only a single source definition (the indirect method).
Scenario: Merge multiple input files into a single target, adding the source file name as an additional column.
The snapshot below shows the overall mapping.
[Image: full job description]
1) Source setting: Open the Source Analyzer, select the source that has the same structure as all of your input files, then go to its properties and check Add Currently Processed Flat File Name Port.
2) Create your workflow, and in the session properties set the source file type to INDIRECT. This option lets you read any number of files that share the same schema structure.
3) For the input file name, select the file that holds the list of file names the Integration Service should read.
Now, when you run this session, it will print each file name with its full directory path. If you want to remove the directory path and print just the file names, go back to the Designer and add an Expression transformation.
[Image: filename (column) expression]
That’s all. Now when you run this session, it will print the file names with their extensions.
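The Expression transformation here is just string manipulation: keep everything after the last path separator. As a hedged illustration, the same logic in Java (in Informatica itself you would build this with string functions such as SUBSTR/INSTR on the filename port — the exact expression depends on your port names):

```java
public class StripPath {
    // Returns only the file name portion of a full path,
    // handling both Unix (/) and Windows (\) separators.
    public static String fileNameOnly(String fullPath) {
        int idx = Math.max(fullPath.lastIndexOf('/'), fullPath.lastIndexOf('\\'));
        return fullPath.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileNameOnly("/data/input/emp1.csv"));   // emp1.csv
        System.out.println(fileNameOnly("C:\\files\\emp2.csv"));    // emp2.csv
    }
}
```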
/*** For an indirect load to work, the files listed in the file list must have the same structure as defined in the mapping ***/
April 10, 2013
Creating a CSV file as a target which contains a header independent of the schema defined in the Designer, and a footer containing the number of records passed, concatenated with the date.
Create two CSV targets:
- One contains your normal source-to-target business logic.
- One contains a single aggregated row (the number of rows concatenated with SYSDATE).
Now both files are created as per the Designer mapping rules.
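The footer row in the second target is simply the record count concatenated with the run date. A hedged sketch of that content in Java (the "|" delimiter and the dd-MM-yyyy format are assumptions; in the mapping itself this would come from an Aggregator with COUNT and SYSDATE):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FooterRow {
    // Builds the footer content: record count concatenated with the date,
    // mirroring what the aggregated-row target would produce with SYSDATE.
    public static String footer(long recordCount, LocalDate date) {
        return recordCount + "|" + date.format(DateTimeFormatter.ofPattern("dd-MM-yyyy"));
    }

    public static void main(String[] args) {
        System.out.println(footer(150, LocalDate.of(2013, 4, 10))); // 150|10-04-2013
    }
}
```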
Finally, go to Mapping –>> Components –>> Post-Session Success Command, create a non-reusable command, add one command, and write:
copy $PMTargetFileDir\Ankit\t_empfile.csv+$PMTargetFileDir\Ankit\t_num_rec_date.csv $PMTargetFileDir\Ankit\final_rec.csv
Your final output is a single file with the header and footer attached.
April 10, 2013
1) From tOracleInput_3, bring in the maximum surrogate key currently present in your dimension table using the query:
“SELECT max(user_profile_sur_key) FROM ANKIT_KANSAL.user_profile”
(user_profile is our dimension table.)
4) tOracleInput_4 takes all your data from the source Oracle system. Keep all the source columns as VARCHAR2 only.
5) Using the tMap component, calculate your dates; you can also identify your GOOD and BAD records depending upon the business requirements.
[Image: in the value part, perform these operations]
[Image: checking whether tweet_id is numeric or not]
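A minimal sketch of the kind of numeric check that separates GOOD from BAD records (a hedged illustration — the actual tMap expression in the job may differ; here a simple digits-only regex is assumed):

```java
public class NumericCheck {
    // Returns true only when the value consists entirely of digits,
    // which is how a tweet_id candidate could be validated in a tMap filter.
    public static boolean isNumeric(String value) {
        return value != null && value.matches("\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("12345")); // true
        System.out.println(isNumeric("12a45")); // false
    }
}
```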
6) Finally, load the data by running the job.
Now, the SCD Type 2 implementation:
In the figure above,
tOracleInput_1 contains the staging data.
tOracleInput_2 contains the data from the dimension table; a lookup against the dimension table must be performed to check whether the record already exists.
[Image: creating the relationship using an equi join]
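A hedged sketch of the SCD Type 2 decision made after that equi-join lookup (the dimension is modeled here as a simple key-to-attribute map for illustration; the real job would compare full rows and manage effective/end dates):

```java
import java.util.HashMap;
import java.util.Map;

public class Scd2Lookup {
    // If the natural key is absent from the dimension, insert a new row;
    // if present but the tracked attribute changed, expire the old version
    // and insert a new one; otherwise do nothing.
    public static String decide(Map<String, String> dimension, String key, String value) {
        if (!dimension.containsKey(key)) return "INSERT_NEW";
        if (!dimension.get(key).equals(value)) return "EXPIRE_AND_INSERT";
        return "NO_CHANGE";
    }

    public static void main(String[] args) {
        Map<String, String> dim = new HashMap<>();
        dim.put("u1", "delhi");
        System.out.println(decide(dim, "u1", "delhi"));  // NO_CHANGE
        System.out.println(decide(dim, "u1", "mumbai")); // EXPIRE_AND_INSERT
        System.out.println(decide(dim, "u2", "pune"));   // INSERT_NEW
    }
}
```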
Thank you for taking an interest.
April 1, 2013
Numeric-to-date conversion: this is one of the most common scenarios an ETL developer comes across!
The input file I have considered in this case contains the following values:
The hiredate comes in as an integer value, where the first two digits represent the year and the next two the month, followed by the day.
We would like the hiredate column in DD-MON-YY format in the target:
1) Let's start with an overview of the mapping created. We'll discuss each component in the mapping step by step.
2) After dragging the source and target into your mapping area, there isn't much to do, really! Add an Expression transformation and connect hiredate to it, as shown in the snapshot above.
3)Configuring the expression transformation:
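A hedged sketch of the conversion logic in Java: an integer such as 950512 is read as YYMMDD and reformatted as DD-MON-YY. (In the Informatica Expression transformation this would typically be a TO_DATE(TO_CHAR(...)) expression on the hiredate port; the exact expression and port names are assumptions.)

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class HiredateConversion {
    // Parses a YYMMDD integer (zero-padded to 6 digits) into a date,
    // then renders it in the target's DD-MON-YY format.
    public static String toDdMonYy(int hiredate) {
        LocalDate date = LocalDate.parse(String.format("%06d", hiredate),
                DateTimeFormatter.ofPattern("yyMMdd"));
        return date.format(DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH))
                   .toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(toDdMonYy(950512)); // 12-MAY-95
    }
}
```

Note that a two-digit year is ambiguous about the century; the target only displays two digits, so the output format is unaffected, but any century logic would need to match your requirements.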
4) Connect your transformation to the target. Now browse to the target's properties tab and change the date format as per your requirements.
And there, you'll have your desired output!
Thanks for reading!