Sharing Knlowledge With The World…

Month: June 2013

Load all records from source to target except last N using Talend

blogger-image-529581592
SCENARIO
Today we will discuss a situation in which we have to dump all source records to target  except the  last 5 records.
SOLUTION
We need a couple of components to achieve this requirement as given below in snapshot.

SEQUENCE CREATION

By using above class and method you can generate a sequence in Talend.

LOGIC

Pull all the rows from source and further in tmap create a sequence using Numeric.sequence method further sort all the rows using tSortRow in descending order key based upon that sequence generated in tMap, now create another sequence using  tmap. Resulting your last rows from source that are currently at top because of sort method will assigned a sequence from 1 and so on. 
Finally, use a tFilter and restrict all rows that you want to limit in my case its 5
Just use filter and give condition as

The number you select in tFilter the process will leave those rows to reach to the target.

Thank you, for reading!!!!

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik

Continue Reading

How to use a shared DB connection in Talend

parent_job
Hi Readers,
                  Today we will discuss how to use or register a shared DB connection, across your jobs. In real time scenarios it’s not recommended to create connections again and again so instead of creating new connections you can use your old registered connections through out your sub jobs.
For Demonstration purpose we have created three jobs in which we are using a single connection through out our all jobs.
JOBS DESC:-

  1. shared_conn_demo–>> This job holds our two child jobs (parent,child)
  2. parent_db_conns–>>The primary job  is the one in which connection(shared) is registered for our sub jobs.
  3. child_job_conn–>>This is the third job which will be using the connection created in parent job.
STEPS
1) parent_db_conns:- Create a connection as you create normally and then at the last component property check use or register a shared DB connection and put a name you want to give for shared connection.
NOTE- The name must be in “”


We cannot share our own db_conn values
just fill all the enteries required to create a connection here.
2) child_job_conn:- Again check the shared DB component tab and write”shared_conn” or name which you had given in the property box to utilize the connection.

and leave all the field blanks.

2) shared_conn_demo:-Now you just have to connect both the jobs and then run them.
Give the trigger as OnSubjobOk


     
That’s it now you can use or share your connections through out your all jobs and save connections.

Thank you for reading…

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik
Continue Reading

R programming Introduction

R-programming has taken the analytics domain by storm . A very handy and extremely powerful technique , we would be exploring R-programming in the coming few posts.

This post would give a fair intoduction to R-programming .

1)To get started with R, download and install R for your desktop from r-project.org .R is not available for Android or iOS.

2)I would suggest you install RStudio a free IDE which has numerous features.

3)Setting your work directory
Use the setwd() function to change or set your working directory
eg. setwd(“~/directory_name”) , we use only forward slashes in case of Windows as well as Unix.

4)Instaling and using packages
We can take advantage of various add-ons packages which have been developed . To install this package , we use the following command,
install.packages(“package_name”)

to see all installed packeges , use the following command,
installed.packages() 

Now to use this installed package , load it with the following command
library(“package_name”)

In the coming posts we will cover some basic commands , along with a scenario depicting data-analysis.

Continue Reading

Late Arriving Dimension in Informatica

pic1_overiew
Hello readers,
Today we will the covering how to handle late arriving dimension concept followed by an example of the same.We would be assuming that you have a basic understanding of the datawarehousing concepts along with understanding of the basic understanding of concept of late arriving dimensions.

Late arriving dimensions issues are commonly encountered in a datawarehousing environment.

As explained in our previous post Loading fact table , the staging table in consideration is firstly joined with all its dimension table so as to load the fact table with respective SKEYS of the dimensions and the measures received from the transactional table.
But it in real time scenarios , it may happen that data record in the transactional table does not have an entry in the dimension table at that point of time.I would suggest you read through this link
Late arriving dimension   to get a fair idea of the late arriving concept.

The snapshot below shows the overall mapping to resolve the late arrviving mapping

In this case we consider a type1 Insert/Update SCD dimension table and use a reusable sequence to resolve this issue:

Here is the snapshop of the over all mapping:

These are the overall steps followed:

 1)We have 2 dimension tables namely the product dimension and the store dimension ,which are used for fact table loading (sales fact table).

2) We simulated a condition where in the  late arriving dimension issue arises in the product dimension while loading into the sales fact table.

3)Add the source table as the sales oltp table .

4)Add two look up transformation for the product and the store dimension.

5)We have  added a sequencer named product_dimension_seq , this transformation is reusable sequence transformation that i have used while loading into the product dimension table , this helps me to come up with an SKEY(surrogate key) in case of an late arriving dimension.

6)In the expression dimension

As you can observe in the expression port, we have an out_prod_SKEY where in the following condition is defined “IIF(ISNULL(product_skey),product_seq_NEXTVAL,product_skey)
where in case the SKEY is NULL we use the sequencer’s value to generate an SKEY.

6)Now we have to targets in the flow:
        i)We use a filter transformation to find the product_SKEY is null and load it into the dimension  table with the value from the sequence generator , since the product dimension is an Insert else Update, whenever the dimension arrives the values will be updated.
        ii)Now since we have an SKEY for the missing or late dimension , we now load the data into the sales fact table.
 This will solve the late dimension issue . Thanks for reading !!!!!!
In case of any queries please do comment…..

Continue Reading

Data Masking Using Talend

Capture1
Hi Readers,
                 Today we will discuss What is Data Masking, Why it is important and do we achieve it using Talend Open Studio.
Data Masking–>>Data Masking is a process of encrypting a data field  to protect data that is classified as personal identifiable data, personal sensitive data or commercially sensitive data, however the data must remain usable for the purposes of undertaking valid test cycles. It must also look real and appear consistent. production systems generally consists of intensively sensitive data and you can not take chances to make it available for others easily without some changes and for better performance of applications it’s also required to have similar kind of data for testing purposes.
If we are talking to current social networking boom if any of user related information become available without any security measures than it may lead to various disastrous results. So, to overcome these problems some encrypting strategies are required which will fulfill the needs.
Now,
Implementing Data Masking Using Talend
Overall Job Descriptive Image:-
JOB DESCRIPTION–>>
A source– file is taken which contains all valid data but which you do not want to give directly after known risks involved.
Replacement– File contains the data which is to be used to replace against the source matching data.
Take a tReplaceList component from the palette, and it properties define as:-

Lookup search column is the column which contains the same data present in your source file column and corresponding Lookup replacement column contains the data which is used to map/replace with the matches found against the source column. 

In column option part you have to check the box on which you want to search performed and simultaneously replacement should be done.

e.g.
Your source column contains:-
Text(/*Col_name*/)
“Hello this is a text which is to be replaced”

replacement file contains
Text|Replacement
Hello|HO
text|msg
is|tis
replaced|wow
which|haha

output generated
HO this tis a msg haha tis to be wow

and finally using tSendMail component the transformed file is sent to the Destination with all the security measures applied.
Thank you for taking interest.
Author : Ankit Kansal & Nayan Naik


Continue Reading
PageLines