Sharing Knowledge With The World…

Author: nayan

Informatica Tutorial – The Definitive Guide


Informatica is one of the most important (and popular) tools in the data integration industry. It is actually a collection of several client tools: you need to master the Mapping Designer, the Workflow Manager and good old Workflow Monitor if you want to master Informatica.

Informatica is one of the best ETL tools out there, and we have the perfect Informatica tutorial guide with the best resources from around the web.


Informatica PowerCenter Performance Tuning Tips


Here are a few points to get you started with Informatica PowerCenter performance tuning. Some of these tips are quite general in nature, so please consult your project team before implementing them in your projects.

 

1) Optimize the input query if the source is relational (e.g. an Oracle table):

  1. Reduce the number of rows queried by using WHERE conditions instead of Filter/Router transformations later in the mapping. Since you start with fewer rows, your mapping will run faster.
  2. Make sure appropriate indexes are defined on the necessary columns and analyzed, especially the columns in the WHERE clause of your input query.
  3. Eliminate columns that you do not need for your transformations.
  4. Use optimizer hints as necessary.
  5. Use a sort order based on need.

Note: For #1, results will vary based on how the table columns are indexed and queried, etc.
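For illustration, here is a minimal source-qualifier override sketch against a hypothetical ORDERS table that applies tips #1 to #5 above. The table, columns, index name and hint are assumptions for the example; always check the explain plan before committing to a hint.

  SELECT /*+ INDEX(o idx_orders_status) */  -- #4: hint, only if the plan warrants it
         o.order_id,                        -- #3: select only the columns you need
         o.customer_id,
         o.order_total
  FROM   orders o
  WHERE  o.status = 'OPEN'                  -- #1/#2: filter early on an indexed column
  ORDER  BY o.customer_id                   -- #5: sort in the database if downstream transformations need it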

 

2) Use the Filter transformation as close to the Source Qualifier transformation as possible.

3) Use sorted input data for Aggregator or Joiner transformations as necessary.

4) Eliminate unused columns and redundant code in all the necessary transformations.

5) Use local variables in expressions as necessary, so the same logic is not recalculated in multiple ports.

6) Reduce the amount of data cached in the Aggregator.

7) Use parameterized input queries and file names for flexibility.

8) Change memory-related settings at the workflow/session level as necessary.

9) When using multiple condition columns in a Joiner or Lookup transformation, make sure to use a numeric column as the first condition.

10) Use a persistent cache in Lookup transformations where possible.

11) Go through the session logs closely to find any issues and tune accordingly.

12) Use override queries in Lookup transformations to reduce the amount of data cached.
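For illustration, a minimal lookup override sketch against a hypothetical CUSTOMERS table: select only the condition and return ports the lookup actually uses, and filter out rows that can never match, so less data is cached.

  SELECT customer_id, customer_name
  FROM   customers
  WHERE  active_flag = 'Y'   -- cache only the subset of rows the lookup can ever return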

13) Keep data types and sizes consistent throughout the mapping as much as possible.

14) For target loads, use bulk load whenever possible.

15) For large-volume target loads, use SQL*Loader with the DIRECT and UNRECOVERABLE options.
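A minimal SQL*Loader sketch, assuming a target table ORDERS and a comma-delimited file orders.dat (both hypothetical). UNRECOVERABLE skips redo logging and applies only to direct-path loads, so make sure your backup strategy covers the loaded data.

  -- orders.ctl
  UNRECOVERABLE
  LOAD DATA
  INFILE 'orders.dat'
  APPEND
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  (order_id, customer_id, order_total)

Invoked with the direct-path option: sqlldr userid=scott/tiger control=orders.ctl direct=true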

16) Use partitioning options as and when possible. This is true for both Informatica and Oracle; for Oracle, a rule of thumb is around 10M rows per partition.

17) Make sure there are no indexes on the tables targeted by any pre-/post-session DELETE SQL used in the mappings/workflows.

18) Use datatype conversions wherever they help.

e.g.: 1) Use ZIPCODE as an integer instead of a character type; this speeds up Lookup transformation comparisons.

 2) Use port-to-port datatype conversions to improve session performance.

19) Use operators instead of functions

e.g.: for concatenation, use "||" instead of CONCAT.
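A quick sketch in the Informatica expression language (FIRST_NAME and LAST_NAME are hypothetical ports):

  FIRST_NAME || ' ' || LAST_NAME                  -- operator form, evaluated faster
  CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)      -- equivalent nested function calls, slower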

20) Reduce the amount of data written to the session logs for each transformation by setting the tracing level to Terse, as necessary.

21) Use reusable transformations and mapplets wherever possible.

22) In Joiner transformations, use the source with the smaller number of rows as the master.

23) Perform joins in the database rather than using a Joiner transformation wherever possible.

 



Informatica Best Practices for Cleaner Development


 

Don’t you just hate it when you can’t find that one mapping out of the thousand-odd mappings present in your repository?

A best practice is a method or technique that has consistently shown results superior to those achieved by other means, and that is used as a benchmark. In addition, a “best” practice can evolve to become better as improvements are discovered. Following these Informatica best-practice guidelines allows better repository management, which makes your life easier. Incorporate them whenever you create Informatica objects:

Mapping Designer

  • There should be a placeholder transformation (an Expression) immediately after the source and one before the target.
  • Active transformations that reduce the number of records should be used as early as possible.
  • Connect only the ports that are required by subsequent transformations and targets.
  • If a join must be used in the mapping, choose the driving/master table carefully.
  • For generic logic used across mappings, create a mapplet and reuse it.

 

Transformation Developer

  • Replace complex filter expressions with a (Y/N) flag computed upstream; the Filter transformation takes less time to evaluate the flag than the full logic (see the sketch after this list).
  • Persistent caches should be used in lookups if the lookup data is not expected to change often.
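A minimal sketch of the flag pattern in the Informatica expression language (the condition and port names are hypothetical):

  -- In an upstream Expression transformation, output port o_include_flag:
  IIF(STATUS = 'A' AND AMOUNT > 0 AND REGION != 'TEST', 'Y', 'N')

  -- The Filter transformation condition then reduces to:
  o_include_flag = 'Y'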

Naming conventions: name Informatica transformations with a lowercase three-letter prefix indicating the transformation type, e.g. lkp_<name of the lookup> for a Lookup, rtr_<name of router> for a Router transformation, etc.

 

Workflow Manager

  • Naming convention for sessions, worklets and workflows: s_<name of the session>, wlt_<name of the worklet>, wkf_<name of the workflow>.
  • Sessions should be created as reusable so they can be used in multiple workflows.
  • When loading tables in full-load mode, the truncate target table option should be checked.
  • The session property “Commit Interval” (default value: 10,000) should be increased for volumes of more than 1 million records.
  • Pre-session command scripts should be used for disabling constraints, building temporary tables, moving files, etc. Post-session scripts should be used for rebuilding indexes and dropping temporary tables.

 

Performance Optimization Best Practices

We often come across situations where the Data Transformation Manager (DTM) takes more time to read from a source or to write to a target. The following standards/guidelines can improve overall performance.

  • Join source tables in a single Source Qualifier when they reside in the same schema
  • Make use of the Source Qualifier “Filter” property if the source type is relational
  • Use flags as integers, as integer comparison is faster than string comparison
  • Use the table with the smaller number of records as the master table for joins
  • While reading from flat files, define the appropriate data types instead of reading everything as strings and converting later
  • Connect only the ports that subsequent transformations require; check whether the remaining ports can be removed

 

  • Suppress the generated ORDER BY by placing “--” at the end of the override query in Lookup transformations (see the sketch after this list)
  • Minimize the number of Update Strategy transformations
  • Group by simple columns in transformations like the Aggregator and Source Qualifier
  • Use a Router transformation in place of multiple Filter transformations
  • Turn off verbose logging when moving mappings to UAT/production environments
  • For large volumes of data, drop indexes before loading and recreate them after the load
  • For large volumes of records, use bulk load and increase the commit interval to a higher value
  • Set “Commit on Target” in the sessions
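A minimal sketch of the ORDER BY suppression trick, against a hypothetical CUSTOMERS lookup. PowerCenter appends its own ORDER BY over all lookup ports to the override at run time; writing your own ORDER BY on just the condition column and ending the query with “--” comments the generated clause out.

  SELECT customer_name, customer_id
  FROM   customers
  ORDER BY customer_id --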

 

These are a few things a beginner should know when starting to develop in Informatica. These Informatica best-practice guidelines are a must for efficient repository management and overall project management and tracking.


Clover ETL


We are living in exciting times in the field of data integration: tools like Talend and Pentaho are giving heavyweight tools such as Informatica, Ab Initio and DataStage a run for their money. The features offered by these tools, though not as mature as those of the bigger heavyweights, are exhaustive nonetheless (especially the features offered by Talend). Today we are discussing another such tool, CloverETL.

Clover Overview

Clover Data Integration is relatively new compared to its competitors Talend and Pentaho. Like Talend, Clover uses the Eclipse framework for its visual editor and the JRE to run transformations; the difference is that Clover is a metadata-driven tool and does not need code generation to run jobs.

CloverETL has a smaller palette but offers quite complex functionality; the components are easier to choose from, with well-defined behavior. Features like parallelism, however, are supported only in the enterprise version, which seems to be a trend among these “open-source data integration” tools.

Features:

  • Pass parameters to graphs through a file.
  • Visual debugging and data monitoring at any point in the graph.
  • Easy switching between graphs.
  • Shared connections between data structures.

 

Input and Output Components:

(Screenshot: CloverETL input and output components.)

Transformations Provided:

(Screenshot: CloverETL transformation components.)

Types of Joins Provided:

(Screenshot: CloverETL joiner components.)

If you would like to learn more about this tool, visit the CloverETL website.

Conclusion

CloverETL shows a lot of promise and is a very powerful tool in its own right. Since this article provides just an overview of the tool, you can expect future articles with a detailed Talend vs. CloverETL analysis. Given the economic slowdown, where bootstrapping is a necessity, CloverETL, along with other tools like Talend and Pentaho, is a reliable, powerful option to consider!


Cloud Computing Pros and Cons


 

Cloud computing has emerged as one of the most promising technologies, and organizations are considering upgrading their infrastructure to support it. We have put together a list of cloud computing pros and cons that a company should be aware of.

Cloud Computing Pros

Cost Effective

Cloud computing is highly cost-efficient in terms of maintenance. Traditional desktop software incurs extra cost as multi-user licences are added, which increases the cost to the company, while cloud technology is comparatively much cheaper than maintaining your own hardware. There are many one-time-payment, pay-as-you-go and other scalable options available.

Unlimited Storage (Almost!)

You need not worry about running out of storage space; the cloud service provider takes care of all storage concerns. You just need to request additional space.

Backup Recovery

Backup and high availability are perhaps the greatest advantages of computing in the cloud. Since all data lives in the cloud, it is much easier to maintain backups than when the data sits on a single physical device.

Data accessibility

Since data is always available, it can be accessed from any part of the world (as long as you have an internet connection!). Time-zone differences are taken care of as well.

Environmental Benefits

Cloud computing drastically reduces an organization’s carbon emissions; studies show that total carbon emissions can drop by up to 35%.

Cloud Computing Cons:

Security

This is perhaps the biggest threat when it comes to cloud computing. All the enterprise data is in the hands of a third-party vendor, which could pose a major risk if the vendor is not reliable.

Compatibility

While moving to the cloud, organizations may need to replace their existing IT infrastructure to make it cloud-compatible. This could prove to be a costly affair.

Standardization

It could happen that the cloud service vendor does not follow proper standards for cloud computing. IEEE has set standards in cloud computing for optimum quality, so to avoid standardization pitfalls, the company needs to investigate whether the provider uses standard technology.

Conclusions

Yes, cloud computing does have certain disadvantages, but the pros outweigh the cons, and these risks can be mitigated if the company takes steps to manage them. The benefits cloud computing offers are immense; cloud computing and BI are definitely the way of the future.

 


Business Intelligence Benefits – Top 10 Reasons your business should go for BI


Business intelligence is on the verge of huge change. For a long time it was limited to larger organizations, but with the advent of open-source technologies, BI has become much more accessible. In today’s post we discuss some key business intelligence benefits, to raise awareness of this wonderful concept and explain why you should leverage BI capabilities in your enterprise solutions.

1) Stay ahead of competition

Arm your team with compelling BI reporting and analysis so they can stay in front of the pack and exploit patterns and new opportunities as they arrive. Visually intelligent dashboards give your team access to the sales, financial and operational parameters of your product and organization, as well as your competitors’.

2) Foresee and react to patterns

Examine company and competitor information to set competitive objectives, and track achievements against key performance indicators for the business. Analyze trends and mitigate business risks. Apply hypothetical variables to your data to foresee possible outcomes; historical data feeds the creation of a predictive model.

3) Empower decision making with on-demand data

Business intelligence lets your whole team, from field reps to top-level administration, access dynamic data, empowering them with business perspective by offering deep insights into the data and enabling better decision making.

4) Visual dashboards and custom-made reports at your fingertips.

Visually stunning and detailed information can be delivered to the right people via reporting dashboards with custom reports, with the ability to drill down and dissect the information to gain insights. Furthermore, BI tools let each user customize reports to fit their needs through simple drag-and-drop functionality.

5) Reduce wastage, manage resources efficiently

BI helps get rid of misaligned data systems and eliminates resource wastage. It cuts the time and resources spent correcting data errors and manually compiling spreadsheet-based reports, so your people can add maximum value to the organization by focusing on business growth.

6) Track and monitor customer behavior

Business intelligence helps you identify what customers are buying and the trends behind their purchases. You can monitor customer behavior and use data insights to drive loyalty and maximize profit by identifying up-sell and cross-sell opportunities. BI also helps identify the conversions that turn a potential lead into a customer.


7) Mine your data, discover your true manufacturing costs

Business intelligence software can create automated alerts to notify managers of incidents that require action. This brings more efficiency to managing procurement, inventory and delivery, which reduces cost and mitigates risk.

8) Discovering New Questions and Their Answers

Business intelligence is data-driven in nature: trending patterns help us identify and ask new questions, leading to data-driven discovery. It can also work the other way round, where you already have questions that can be answered using the historical data in the warehouse.

9) Mobile device accessibility

BI integrates easily with various mobile devices such as iPads, laptops and phones, so reps and directors can access reports on site, in the field or in the office.

10) Efficient Data Management

BI enables the integration of disparate enterprise-wide data sources into a single unified data warehouse, ensuring consistency and accuracy. The resulting data is cleansed, quality data, which makes data management very easy.

These points just scratch the surface when it comes to business intelligence benefits; if you feel strongly about other benefits you want to point out, please feel free to comment below and let us know.

Check out this video, which outlines BI concepts in a very interesting way.


Simple Informatica Scenarios – Part 1


We have decided to put together some common scenarios in Informatica. This will be an ongoing post to which we keep adding common issues faced by Informatica developers, introducing more complex scenarios as we go. Please feel free to discuss these scenarios in the comments section below.

1) How to concatenate row data through Informatica?

Source:

Ename EmpNo
stev 100
methew 100
john 101
tom 101

Target:

Ename EmpNo
Stev methew 100
John tom 101

Ans:

Using a dynamic lookup on the target table:

If the record doesn’t exist, insert it into the target. If it already exists, get the corresponding Ename value from the lookup, concatenate it with the current Ename value in an Expression transformation, and then update the target Ename column using an Update Strategy.

Using the variable-port approach:

Sort the data in the Source Qualifier by the EmpNo column, then use an Expression transformation with variable ports to carry the previous row’s values. Use a Router to insert the record the first time an EmpNo is seen; if it has already been inserted, update the target Ename with the concatenation of the previous and current name values (a sketch of the variable ports follows).
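A minimal sketch of the variable ports in that Expression transformation, in the Informatica expression language (port names are hypothetical). Variable ports are evaluated top to bottom, and a port referenced before it is reassigned still holds the previous row’s value, so v_prev_empno must be assigned last:

  EMPNO         (input)
  ENAME         (input)
  v_is_first    (variable) = IIF(EMPNO = v_prev_empno, 0, 1)
  v_concat      (variable) = IIF(v_is_first = 1, ENAME, v_concat || ' ' || ENAME)
  v_prev_empno  (variable) = EMPNO
  o_concat_name (output)   = v_concat
  o_is_first    (output)   = v_is_first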

2) How to send unique (distinct) records to one target and duplicates to another target?

Source:

Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102

Output:

Target_1:

Ename EmpNo
Stev 100
John 101
Mathew 102

Target_2:

Ename EmpNo
Stev 100

Ans:

Using a dynamic lookup on the target table:

If the record doesn’t exist, insert it into Target_1. If it already exists, route it to Target_2 using a Router.

Using the variable-port approach:

Sort the data in the Source Qualifier by the EmpNo column, then use an Expression transformation with variable ports to carry the previous row’s values (the same pattern as in scenario 1). Use a Router to route the data: the first occurrence of an EmpNo goes to Target_1; rows already seen go to Target_2.

………………………………………………………………………………………..

Below is the solution for processing multiple flat files into a table using Informatica.

3) How to process multiple flat files with the same structure into a single target table through Informatica?

We can process all the flat files through one mapping and one session using a list file.

First we need to create the list file, using a unix script, for all the flat files; the extension of the list file is .LST.

The list file contains only the flat file names.

At the session level we then set:

the source file directory to the list file path,

the source file name to the list file name,

and the source filetype to Indirect (see the sketch below).
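A minimal sketch, with hypothetical paths and file names. The unix script builds the list file, whose lines are just the data file names:

  # build the list file from all matching flat files
  ls /data/input/sales_*.dat > /data/input/sales_files.LST

Contents of sales_files.LST:

  /data/input/sales_jan.dat
  /data/input/sales_feb.dat

Session properties:

  Source filetype:       Indirect
  Source file directory: /data/input/
  Source filename:       sales_files.LST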

……………………………………………………………………………………………………..

This is also one of the advanced features in Informatica 8.

4) How to populate the file name to the target while loading multiple files using the list file concept?

In Informatica 8.6, after importing the source file definition in the Source Analyzer, select the “Add Currently Processed Flat File Name” option in the Properties tab of the source definition. This adds a new column, currently processed file name; map this column to the target to populate the file name.

……………………………………………………………………………………………..

5) How to load unique or distinct records from a flat file to a target?

Ans: Using an Aggregator, we can load unique records by setting group-by on all columns; this is one approach.

Another method is to use a Sorter after the Source Qualifier with the Distinct option checked.

………………………………………………………………………………………………..

6) How to load the first record and the last record from a file into a target using Informatica?

Solution:

Step 1.

Create one mapping variable, e.g. $$Record_Count, and one Sequence Generator transformation with the reset option, then use a Filter transformation.

In the Filter transformation, put a condition like the one below (NEXTVAL is the Sequence Generator output port):

NEXTVAL = 1 OR NEXTVAL = $$Record_Count

Step 2. Use a unix script to create/update a parameter file with the file’s record count (wc -l); this parameter file supplies the value for the mapping variable $$Record_Count.

Below is the order of the tasks:

Workflow → Command task → Main session

Command task: executes the unix script (a sketch follows).
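A minimal sketch of the command-task script and the resulting parameter file, assuming a session s_m_first_last under workflow wkf_first_last in folder MyFolder, and an input file employees.dat (all names hypothetical):

  # count the records and (re)write the parameter file
  CNT=$(wc -l < /data/input/employees.dat)
  {
    echo '[MyFolder.WF:wkf_first_last.ST:s_m_first_last]'
    echo "\$\$Record_Count=${CNT}"
  } > /data/param/first_last.prm

Resulting parameter file for a four-row input:

  [MyFolder.WF:wkf_first_last.ST:s_m_first_last]
  $$Record_Count=4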

 

7) How to add a lengthy query in the Source Qualifier if the query length exceeds 32K characters?

If you try to use a very long query in the SQ override, you will hit a character limit of around 32K characters.
This is mainly due to the limitation of saving the query as metadata in the underlying repository database.
The issue can be solved by writing the query as a parameter value in the parameter file.
Since the query is then fetched dynamically at run time, the limit is no longer an issue.

Note: ensure that the query is written on one single line in the parameter file (see the sketch below).
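A minimal parameter-file sketch, assuming a mapping parameter $$Source_Query that is referenced in the SQ override (the folder, workflow, session and table names are hypothetical); the whole query sits on one line:

  [MyFolder.WF:wkf_load.ST:s_m_load]
  $$Source_Query=SELECT o.order_id, o.customer_id, o.order_total FROM orders o WHERE o.status = 'OPEN'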

 

 
