Sharing Knlowledge With The World…

Category: Netezza

Talend Netezza Interaction

NET_IBM_RGB

Hello Readers,

Today we will discuss about how  Talend – Netezza interactwith each other , along with a simple demonstration that will use the available Netezza components.

you will find all Netezza related components in Palette available.

STEPS : –

1) Installing NZJDBC,jar :-  Initially when you drag any of Netezza component from the Palette , and if your library does not contain required .jar to support those components then it will throw an error as shown below.

This means your TOS do not have supported .jar

Talend Netezza Interaction

nzjdbc-error-mark

AUTOMATIC SETUP

Immediately a popup will come which will automatically install the required drivers. However if it fails to install the required .jar then you have to manually place that jar file.

MANUAL SETUP

(a)  Download nzjdbc.jar from the internet.

(b) Copy downloaded .jar file into this location  (C:\TOS\TOS_DI-Win32-r101800-V5.3.0\lib\java)

(c) Save your Current Job and Re-Start TOS (Talend Open Studio).

 

 EXAMPLE 

Now you are set to go ,just drag and drop your required components from the palette and connect them as per the business requirement.

This below job will show a simple Talend Netezaa Interaction and will create an excel file from a Netezza Database Table.

 

Talend Netezza Interaction

Talend Netezza Interaction

 

A) Drag tNetezzaConnection from the palette and fill  all required entries like   Host , DB-Name and  Password

B) Now to Read data from a table drag tNetezzaInput and specify attributes such as table Name and write your SQL like

(select emp_id , ename from emp limit 100)    *It will only fetch 100 rows from the employee table

C) At final to populate the row/data to some output( Excel file ) drop this component  to the designer area.

D) Connect All components as shown in figure.

E) Save and Run The Job.

 

Talend Netezza Interaction Successful …

 

 

Check out our other posts on Talend Interview Questions

 Author : – Ankit Kansal & Nayan Naik

 

 

Continue Reading

How to use a shared DB connection in Talend

parent_job
Hi Readers,
                  Today we will discuss how to use or register a shared DB connection, across your jobs. In real time scenarios it’s not recommended to create connections again and again so instead of creating new connections you can use your old registered connections through out your sub jobs.
For Demonstration purpose we have created three jobs in which we are using a single connection through out our all jobs.
JOBS DESC:-

  1. shared_conn_demo–>> This job holds our two child jobs (parent,child)
  2. parent_db_conns–>>The primary job  is the one in which connection(shared) is registered for our sub jobs.
  3. child_job_conn–>>This is the third job which will be using the connection created in parent job.
STEPS
1) parent_db_conns:- Create a connection as you create normally and then at the last component property check use or register a shared DB connection and put a name you want to give for shared connection.
NOTE- The name must be in “”


We cannot share our own db_conn values
just fill all the enteries required to create a connection here.
2) child_job_conn:- Again check the shared DB component tab and write”shared_conn” or name which you had given in the property box to utilize the connection.

and leave all the field blanks.

2) shared_conn_demo:-Now you just have to connect both the jobs and then run them.
Give the trigger as OnSubjobOk


     
That’s it now you can use or share your connections through out your all jobs and save connections.

Thank you for reading…

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik
Continue Reading

Late Arriving Dimension in Informatica

pic1_overiew
Hello readers,
Today we will the covering how to handle late arriving dimension concept followed by an example of the same.We would be assuming that you have a basic understanding of the datawarehousing concepts along with understanding of the basic understanding of concept of late arriving dimensions.

Late arriving dimensions issues are commonly encountered in a datawarehousing environment.

As explained in our previous post Loading fact table , the staging table in consideration is firstly joined with all its dimension table so as to load the fact table with respective SKEYS of the dimensions and the measures received from the transactional table.
But it in real time scenarios , it may happen that data record in the transactional table does not have an entry in the dimension table at that point of time.I would suggest you read through this link
Late arriving dimension   to get a fair idea of the late arriving concept.

The snapshot below shows the overall mapping to resolve the late arrviving mapping

In this case we consider a type1 Insert/Update SCD dimension table and use a reusable sequence to resolve this issue:

Here is the snapshop of the over all mapping:

These are the overall steps followed:

 1)We have 2 dimension tables namely the product dimension and the store dimension ,which are used for fact table loading (sales fact table).

2) We simulated a condition where in the  late arriving dimension issue arises in the product dimension while loading into the sales fact table.

3)Add the source table as the sales oltp table .

4)Add two look up transformation for the product and the store dimension.

5)We have  added a sequencer named product_dimension_seq , this transformation is reusable sequence transformation that i have used while loading into the product dimension table , this helps me to come up with an SKEY(surrogate key) in case of an late arriving dimension.

6)In the expression dimension

As you can observe in the expression port, we have an out_prod_SKEY where in the following condition is defined “IIF(ISNULL(product_skey),product_seq_NEXTVAL,product_skey)
where in case the SKEY is NULL we use the sequencer’s value to generate an SKEY.

6)Now we have to targets in the flow:
        i)We use a filter transformation to find the product_SKEY is null and load it into the dimension  table with the value from the sequence generator , since the product dimension is an Insert else Update, whenever the dimension arrives the values will be updated.
        ii)Now since we have an SKEY for the missing or late dimension , we now load the data into the sales fact table.
 This will solve the late dimension issue . Thanks for reading !!!!!!
In case of any queries please do comment…..

Continue Reading

Loading Fact Table Using Talend

Capture4
Hi Readers,
                   In this post we will discuss how to load a fact table in a data warehouse using your dimension table and the data staged in a staging table. We will just show you what is the procedure of loading and further complexities depends upon your business requirements.
In our example we took THREE dimension tables.
  1. DIM_BOOK
  2. DIM_CUSTOMER
  3. DIM_TIME

All of the dimension tables are SCD-type implemented other than TIME DIMENSION.

STEPS:—->>>
1) create an connection first as i am using oracle as a database that’s why i used tOracleConnection.
2) After Successful connection run the further part.
From Dimension tables you have to fetch the data using Query Editor and place a condition where
END_DATE is null, if you have implemented SCD type-2 This will bring all those recordswho have currently validated state.
Now Fetch the OLTP data stored currently in Staging Table, and put all connections in tMap.
REMEMBER—-your staging table that has to be loaded must be first linked then only it will work fine and all other dimensions connected will work as a LOOKUP.



Now in tMap join all the dimensions with your source data using the keys and fetch down the SURROGATE_KEY and put all those skeys in the fact table. In my join condition i have used inner join as a join method.
In tmap component i have use some calculations to find out what are the percentage in discount and total value for the order . It may be anything depend upon your requirements.

note:- Just keep in mind data type conversion you have to keep in mind other wise it’ll give you trouble a lot in my case i just converted my data types in staging itself.

Thank you for taking interest….

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik
Continue Reading

SCD-2 Implementations in Talend

staging_creation
CREATING STAGING TABLE

1)From tOracleInput_3 bring the max surrogate key present in your dimension table using query
“SELECT max(user_profile_sur_key) FROM ANKIT_KANSAL.user_profile”
USER PROFILE IS OUR DIMENSION TABKE

2)Create a context variable snumber of integer type and assign a default value 0 and in the next step assign the max surrogate key to the variable using tJavaFlex component.

 

4) tOracleInput4 takes all your data from the source oracle system KEEP ALL THE SOURCE AS A VARCHAR2 FORMAT ONLY.

5)using tMap component calculate your date and also you can find out your GOOD and BAD records depending upon the business requiremetns.
AS FOLLOWS:-




In the value part perform these operations

checking tweet_id is numeric or not

 6) Finally load the data using running the job.

NOW SCD TYPE 2 IMPLEMENTATION

In the above figure
tOracleInput_1 contains the staging data.
tOracleInput_2 contains the data from the dimension table thus lookup from the dimension table must be performed to check weather the record exists or not.

creating the relationship using equi join

*If both data structure does not matches because staging may consist the data in string format for that column then you can achieve this join by converting the data type with in tmap only    Integer.parseInt(row1.USER_ID) at the join level only.
CATCHING FRESH INSERTS:-
At the right hand side you can achieve this functionality by enabling catch lookup inner join reject to true. 
IF A ROW ALREADY EXISTS THEN INSERTING A NEW RECORD WITH A FRESH VALUE AND UPDATING THE OLD RECORD WITH END DATE

TO UPDATE THE EXISTING RECORD WITH ASSIGNING THE END DATE WITH THE CURRENT DATE

Thank you for taking interest…

Talend Interview Questions

Author : Ankit Kansal & Nayan Naik
Continue Reading

SCD-2 Implementations in Informatica

Complete_img

Hello reader,today we ll try and see how we can attain SCD-2(Slowly changing dimensions) implementation in informatica.

(Assumption :
1.assumed the source and target definitions are defined
2.assumed you have a basic understanding of SCD implementations.
3.possess the basic understanding of adding connections to the components.
4.possess basic understanding of working of all components used in the mapping.
)

Please have a look at the overall mapping.We will discuss each of them part by part in detail.

Lets explore this in detail:

1)Drag drop the Source onto your mapping.
2)Drag drop all the transformations required as shown in image above.

3)The Dynamic lookup is created on the target table on which SCD-2 is implemented ,in our case the table name is emp_SCD2, we have applied a filter to get records which are active.we have added a condition for matching employee_id.
5)Next we pass the o/p to the router transformation:we have created two groups in the router transformation namely :
i)Inser_New_Rec: which checks if the record is a new entry.
ii)Update_Insrt_grp: group for updating the existing record ‘s end date and inserting a new record with      end date as NULL.
We have applied SCD on the deptno of the employee, hence in the Update_Insrt_grp we are tracing if the deptno of the source and target are different.Change your conditions ,depending on the columns on which you want to implement SCD.
Now the further remaining mapping i have extended the components so that you can make out how the data flows in the subsequent steps.
6)A sequence generator is used to create SKEY for new inserts into the target.
7)Now the output from Insert_New_Rec are given directly to the target output table as shown in the image above.(emp_scd2)
8)two update sttratergies are implemented for the Update_Insrt_grp coming from the router. One update stratergy is used to update the record in the target with the end date we have defined using the dt column.And the second update stratergy is required to insert a new record for the changed records with end_dt as NULL.Connect the targets as shown in the image.
9)Connect to the target.
And there, we are set for implementing SCD-2 ,now create a workflow for the mapping.
Thanks for reading !!!

Continue Reading
PageLines