
Informatica PowerMart / PowerCenter Basics
Education Services
PC6B-20030512

Extract, Transform, and Load

Operational Systems (RDBMS, Mainframe, Other)
  Transaction level data
  Optimized for Transaction Response Time
  Current
  Normalized or De-Normalized data

ETL (Extract, Transform, Load)
  Aggregate Data
  Cleanse Data
  Consolidate Data
  Apply Business Rules
  De-normalize

Decision Support (Data Warehouse)
  Aggregated data
  Historical

Informatica Corporation, 2003. All rights reserved.

Course Objectives
At the end of this course you will:
  Understand how to use all major PowerCenter components
  Be able to perform basic Repository administration tasks
  Be able to build basic ETL Mappings and Mapplets
  Be able to create, run and monitor Workflows
  Understand available options for loading target data
  Be able to troubleshoot most problems

PowerCenter Architecture
[Diagram labels: Sources, Server, Targets, Heterogeneous Targets, Repository Server, Repository Agent, Repository; client tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor, Rep Server Administrative Console; connection types: native, TCP/IP]
Not Shown: Client ODBC Connections for Source and Target metadata

PowerCenter Components
PowerCenter Repository
PowerCenter Repository Server
PowerCenter Client
  Designer
  Repository Manager
  Repository Server Administration Console
  Workflow Manager
  Workflow Monitor
PowerCenter Server
External Components
  Sources
  Targets

Repository Topics
By the end of this section you will be familiar with:
  The purpose of the Repository Server and Agent
  The Repository Server Administration Console GUI interface
  The Repository Manager GUI interface
  Repository maintenance operations
  Security and privileges
  Object sharing, searching and locking
  Metadata Extensions

Repository Server
Each Repository has an independent architecture for the management of the physical Repository tables
Components: one Repository Server, and a Repository Agent for each Repository
Client overhead for Repository management is greatly reduced by the Repository Server
[Diagram labels: Server, Repository Server, Repository Agent, Repository Manager, Repository Server Administration Console, Repository]

Repository Server Features
Manages connections to the Repository from client applications
Can manage multiple Repositories on different machines on a network
Uses one Repository Agent process, for each Repository it manages, to insert, update and fetch objects from the Repository database tables
Maintains object consistency by controlling object locking
The Repository Server runs on the same system running the Repository Agent

Repository Server Administration Console
Use the Repository Server Administration Console to administer Repository Servers and Repositories through the Repository Server. The following tasks can be performed:
  Add, edit and remove Repository Configurations
  Export and import Repository Configurations
  Create a Repository
  Promote a local Repository to a Global Repository
  Copy a Repository
  Delete a Repository from the database
  Back up and restore a Repository
  Start, stop, enable and disable a Repository
  View Repository connections and locks
  Close Repository connections
  Upgrade a Repository
[Screenshot labels: Console Tree, Information Nodes, HTML View, Hypertext Links to Repository Maintenance Tasks]

Repository Management
Perform all Repository maintenance tasks through the Repository Server from the Repository Server Administration Console
Create the Repository Configuration
Select the Repository Configuration and perform maintenance tasks:
  Create, Delete, Backup, Copy from, Disable, Export Connection, Make Global, Notify Users, Propagate, Register, Restore, Un-Register, Upgrade

Repository Manager
Use the Repository Manager to navigate through multiple folders and repositories. Perform the following tasks:
  Manage the Repository
    Launch the Repository Server Administration Console for this purpose
  Implement Repository Security
    Managing Users and User Groups
  Perform folder functions
    Create, Edit, Copy and Delete folders
  View Metadata
    Analyze Source, Target, Mapping and Shortcut dependencies

Users, Groups and Repository Privileges
Steps:
  Create groups
  Create users
  Assign users to groups
  Assign privileges to groups
  Assign additional privileges to users (optional)

Repository Manager Interface
[Screenshot labels: Navigator Window, Main Window, Dependency Window, Output Window]

Managing Privileges
Check box assignment of privileges

Folder Permissions
Assign one user as the folder owner for first tier permissions
Select one of the owner's groups for second tier permissions
All users and groups in the Repository will be assigned the third tier permissions
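The three tiers can be pictured much like owner/group/other permissions on a file system. The sketch below is an illustration only: the `Folder` class, the permission names and the first-match resolution order are assumptions made for this example, not PowerCenter's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical stand-in for a Repository folder and its three permission tiers."""
    owner: str
    owner_group: str
    owner_perms: set = field(default_factory=set)       # first tier
    group_perms: set = field(default_factory=set)       # second tier
    repository_perms: set = field(default_factory=set)  # third tier


def effective_permissions(folder, user, user_groups):
    """Return the permissions of the first tier that matches the user (assumed order)."""
    if user == folder.owner:
        return folder.owner_perms
    if folder.owner_group in user_groups:
        return folder.group_perms
    return folder.repository_perms


# Illustrative folder: all names and permission labels here are made up.
sales = Folder(
    owner="alice",
    owner_group="developers",
    owner_perms={"read", "write", "execute"},
    group_perms={"read", "write"},
    repository_perms={"read"},
)
print(effective_permissions(sales, "bob", {"developers"}))
```

A user who is neither the owner nor a member of the owner's group falls through to the third tier.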

Object Searching
(Menu: Analyze | Search)
Keyword search
  Limited to keywords previously defined in the Repository (via Warehouse Designer)
Search all
  Filter and search objects

Object Locking
Object Locks preserve Repository integrity
Use the Edit menu for Viewing Locks and Unlocking Objects

Object Sharing
Reuse existing objects
Enforces consistency
Decreases development time
Share objects by using copies and shortcuts

COPY
  Copy object to another folder
  Changes to original object not captured
  Duplicates space
  Copy from shared or unshared folder
SHORTCUT
  Link to an object in another folder
  Dynamically reflects changes to original object
  Preserves space
  Created from a shared folder

Required security settings for sharing objects:
  Repository Privilege: Use Designer
  Originating Folder Permission: Read
  Destination Folder Permissions: Read/Write
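The difference between a copy and a shortcut is essentially the difference between a duplicated value and a reference. The following is a rough Python analogy, not a model of how the Repository stores objects; the `shared_folder` dictionary and the `share_object` helper are hypothetical.

```python
import copy


def share_object(shared_folder, name):
    """Return a (copy, shortcut) pair for an object held in a shared folder."""
    original = shared_folder[name]
    copied = copy.deepcopy(original)  # COPY: independent duplicate, uses extra space
    shortcut = original               # SHORTCUT: a link to the one original object
    return copied, shortcut


shared_folder = {"CUSTOMERS": {"columns": ["ID", "NAME"]}}
copied, shortcut = share_object(shared_folder, "CUSTOMERS")

# Change the original definition after it has been shared.
shared_folder["CUSTOMERS"]["columns"].append("EMAIL")

print(copied["columns"])    # the copy did not capture the change
print(shortcut["columns"])  # the shortcut reflects the change
```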

Adding Metadata Extensions
Allows developers and partners to extend the metadata stored in the Repository
Accommodates the following metadata types:
  Vendor-defined - Third-party application vendor-created metadata lists
    For example, applications such as Ariba or PowerConnect for Siebel can add information such as contacts, version, etc.
  User-defined - PowerCenter/PowerMart users can define and create their own metadata
    Must have Administrator Repository or Super User Repository privileges

Sample Metadata Extensions
Sample User Defined Metadata, e.g. - contact information, business user
Reusable Metadata Extensions can also be created in the Repository Manager

Design Process
1. Create Source definition(s)
2. Create Target definition(s)
3. Create a Mapping
4. Create a Session Task
5. Create a Workflow from Task components
6. Run the Workflow
7. Monitor the Workflow and verify the results

Source Object Definitions
By the end of this section you will:
  Be familiar with the Designer GUI interface
  Be familiar with Source Types
  Be able to create Source Definitions
  Understand Source Definition properties
  Be able to use the Data Preview option

Source Analyzer
[Screenshot labels: Designer Tools, Analyzer Window, Navigation Window]

Methods of Analyzing Sources
  Import from Database
  Import from File
  Import from COBOL File
  Import from XML file
  Create manually
[Diagram labels: Source Analyzer, Repository; Relational, XML file, Flat file, COBOL file]

Analyzing Relational Sources
[Diagram labels: Source Analyzer; Relational Source (Table, View, Synonym); ODBC; DEF; Repository Server, TCP/IP, Repository Agent, native, Repository]

Analyzing Relational Sources
Editing Source Definition Properties

XML Source Analysis
[Diagram labels: Source Analyzer; .DTD File on Mapped Drive, NFS Mounting or Local Directory; DEF, DATA; Repository Server, TCP/IP, Repository Agent, native, Repository]
In addition to the DTD file, an XML Schema or XML file can be used as a Source Definition

Analyzing Flat File Sources
[Diagram labels: Source Analyzer; Flat File (Fixed Width or Delimited) on Mapped Drive, NFS Mount or Local Directory; DEF; Repository Server, TCP/IP, Repository Agent, native, Repository]

Flat File Wizard


Three-step wizard
Columns can be renamed within wizard
Text, Numeric and Datetime datatypes are supported
Wizard guesses datatype
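How the wizard actually guesses a datatype is not described here. As a rough illustration of the idea, the sketch below samples column values and picks the narrowest of the three supported datatypes that fits every value. The function name, the date formats tried and the fallback to Text are all assumptions for the example.

```python
from datetime import datetime

# Assumed date layouts for the illustration; a real file may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def guess_datatype(samples):
    """Guess Text, Numeric or Datetime from a column's sample values."""
    values = [s.strip() for s in samples if s.strip()]
    if not values:
        return "Text"
    if all(_is_number(v) for v in values):
        return "Numeric"
    if all(_is_date(v) for v in values):
        return "Datetime"
    return "Text"


print(guess_datatype(["1", "2.5", " 30 "]))
print(guess_datatype(["2003-05-12", "2003-06-01"]))
```

Because a guess based on samples can be wrong, it is worth reviewing the proposed datatypes before finishing the wizard.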

Analyzing VSAM Sources


[Diagram labels: Source Analyzer; .CBL File on Mapped Drive, NFS Mounting or Local Directory; DEF, DATA; Repository Server, TCP/IP, Repository Agent, native, Repository]
Supported Numeric Storage Options: COMP, COMP-3, COMP-6
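COMP-3 is the COBOL packed-decimal format: each byte carries two decimal digits, and the final half-byte carries the sign. The sketch below decodes such a field in Python to show what the storage option means. It assumes the common sign conventions (0xC positive, 0xD negative, 0xF unsigned), and the `unpack_comp3` name and `scale` parameter are hypothetical, not part of PowerCenter.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field; scale = implied decimal places."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high half-byte
        nibbles.append(byte & 0x0F)  # low half-byte
    sign = nibbles.pop()             # last half-byte holds the sign
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(n) for n in nibbles) or "0")
    if sign == 0x0D:
        value = -value
    elif sign not in (0x0C, 0x0F):
        raise ValueError("invalid packed-decimal sign")
    return Decimal(value).scaleb(-scale)


# Bytes 0x12 0x34 0x5C hold the digits 12345 with a positive sign.
print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))
```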

VSAM Source Properties

Target Object Definitions
By the end of this section you will:
  Be familiar with Target Definition types
  Know the supported methods of creating Target Definitions
  Understand individual Target Definition properties

Creating Target Definitions
Methods of creating Target Definitions
  Import from Database
  Import from an XML file
  Manual Creation
  Automatic Creation

Automatic Target Creation
Drag-and-drop a Source Definition into the Warehouse Designer Workspace

Import Definition from Database
Can reverse engineer existing object definitions from a database system catalog or data dictionary
[Diagram labels: Warehouse Designer; Database (Table, View, Synonym); ODBC; DEF; Repository Server, TCP/IP, Repository Agent, native, Repository]

Target Definition Properties

Manual Target Creation
1. Create empty definition
2. Add desired columns
3. Finished target definition
ALT-F can also be used to create a new column

Target Definition Properties

Transformation Concepts
By the end of this section you will be familiar with:
  Transformation types and views
  Transformation calculation error treatment
  Null data treatment
  Informatica data types
  Expression transformation
  Expression Editor
  Informatica Functions
  Expression validation

Creating Physical Tables
[Diagram: LOGICAL Repository target table definitions (DEF) -> Execute SQL via Designer -> PHYSICAL Target database tables]

Creating Physical Tables


Create tables that do not already exist in target database
  Connect - connect to the target database
  Generate SQL file - create DDL in a script file
  Edit SQL file - modify DDL script as needed
  Execute SQL file - create physical tables in target database
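The "Generate SQL file" step amounts to turning a logical target definition into DDL. A minimal sketch of that idea follows, assuming a target definition expressed as hypothetical `(name, datatype, nullable)` tuples; the DDL that the Designer really generates depends on the target database type and is not reproduced here.

```python
def generate_ddl(table, columns):
    """Build a CREATE TABLE statement from (name, datatype, nullable) tuples."""
    lines = []
    for name, datatype, nullable in columns:
        null_clause = "" if nullable else " NOT NULL"
        lines.append(f"    {name} {datatype}{null_clause}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


# Illustrative target definition; the table and column names are made up.
ddl = generate_ddl(
    "T_CUSTOMER",
    [("CUSTOMER_ID", "INTEGER", False), ("NAME", "VARCHAR(50)", True)],
)
print(ddl)
```

Writing the statement to a script file first, as the steps describe, leaves room to edit the DDL before it is executed against the target database.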

Use Preview Data to verify the results (right mouse click on object)

Transformation Types
Informatica PowerCenter provides 23 objects for data transformation
  Aggregator: performs aggregate calculations
  Application Source Qualifier: reads Application object sources, such as ERP
  Custom: calls a procedure in a shared library or DLL
  Expression: performs row-level calculations
  External Procedure (TX): calls compiled code for each row
  Filter: drops rows conditionally
  Joiner: joins heterogeneous sources
  Lookup: looks up values and passes them to other objects
  Normalizer: reorganizes records from VSAM, Relational and Flat File
  Rank: limits records to the top or bottom of a range
  Input: defines mapplet input rows. Available in Mapplet Designer
  Output: defines mapplet output rows. Available in Mapplet Designer
  Router: splits rows conditionally
  Sequence Generator: generates unique ID values
  Sorter: sorts data
  Source Qualifier: reads data from Flat File and Relational Sources
  Stored Procedure: calls a database stored procedure
  Transaction Control: defines Commit and Rollback transactions
  Union: merges data from different databases
  Update Strategy: tags rows for insert, update, delete, reject
  XML Generator: reads data from one or more input ports and outputs XML through a single output port
  XML Parser: reads XML from one or more input ports and outputs data through a single output port
  XML Source Qualifier: reads XML data
Transformation Views
A transformation has three views:
  Iconized - shows the transformation in relation to the rest of the mapping
  Normal - shows the flow of data through the transformation
  Edit - shows transformation ports and properties; allows editing

Edit Mode
Allows users with folder write permissions to change or create transformation ports and properties
  Define port level handling
  Enter comments
  Make reusable
  Define transformation level properties
  Switch between transformations

Expression Transformation
Perform calculations using non-aggregate functions (row level)
Passive Transformation
Connected
Ports
  Mixed
  Variables allowed
Create expression in an output or variable port
Usage
  Perform majority of data manipulation
[Screenshot callout: Click here to invoke the Expression Editor]

Expression Editor
An expression formula is a calculation or conditional statement
Used in Expression, Aggregator, Rank, Filter, Router, Update Strategy
Performs calculation based on ports, functions, operators, variables, literals, constants and return values from other transformations

Informatica Functions
Conversion Functions
  Used to convert datatypes
  TO_CHAR (numeric), TO_DATE, TO_DECIMAL, TO_FLOAT, TO_INTEGER, TO_NUMBER
Date Functions
  Used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date
  ADD_TO_DATE, DATE_COMPARE, DATE_DIFF, GET_DATE_PART, LAST_DAY, ROUND (date), SET_DATE_PART, TO_CHAR (date), TRUNC (date)
  To pass a string to a date function, first use the TO_DATE function to convert it to a date/time datatype
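The rule about converting strings before calling a date function can be illustrated outside the Expression Editor. The Python helpers below are loose analogues written for this example, not the Informatica functions themselves; their names, signatures and the default format string are assumptions.

```python
import calendar
from datetime import datetime


def to_date(text, fmt="%m/%d/%Y"):
    """Analogue of converting a string to a date/time value first."""
    return datetime.strptime(text, fmt)


def last_day(value):
    """Analogue of a last-day-of-month date function."""
    days_in_month = calendar.monthrange(value.year, value.month)[1]
    return value.replace(day=days_in_month)


def date_diff_days(later, earlier):
    """Analogue of a date difference measured in whole days."""
    return (later - earlier).days


order_date = to_date("02/10/2004")  # the string becomes a date/time value
print(last_day(order_date))
print(date_diff_days(to_date("03/01/2004"), order_date))
```

Passing the raw string `"02/10/2004"` to `last_day` would fail, which mirrors the reason for the conversion step.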

Informatica Functions - Samples
Character Functions
  Used to manipulate character data
  ASCII, CHR, CHRCODE, CONCAT, INITCAP, INSTR, LENGTH, LOWER, LPAD, LTRIM, RPAD, RTRIM, SUBSTR, UPPER, REPLACESTR, REPLACECHR
  CHRCODE returns the numeric value (ASCII or Unicode) of the first character of the string passed to this function
  CONCAT: for backwards compatibility only - use || instead

Informatica Functions
Numerical Functions
  Used to perform mathematical operations on numeric data
  ABS, CEIL, CUME, EXP, FLOOR, LN, LOG, MOD, MOVINGAVG, MOVINGSUM, POWER, ROUND, SIGN, SQRT, TRUNC
Scientific Functions
  Used to calculate geometric values of numeric data
  COS, COSH, SIN, SINH, TAN, TANH

Informatica Functions
Special Functions
  Used to handle specific conditions within a session; search for certain values; test conditional statements
  ERROR, ABORT, DECODE, IIF
  IIF(Condition,True,False)
Test Functions
  Used to test if a lookup result is null
  Used to validate data
  ISNULL, IS_DATE, IS_NUMBER, IS_SPACES
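For readers who think in a general-purpose language, IIF behaves like a conditional expression and DECODE like a search through value/result pairs with an optional default. The Python functions below are an analogy written for this example; they are not the transformation language, and unlike a real conditional they evaluate both branches before choosing one.

```python
def iif(condition, true_value, false_value=None):
    """Analogue of IIF(Condition, True, False)."""
    return true_value if condition else false_value


def decode(value, *pairs_and_default):
    """Analogue of DECODE: search/result pairs plus an optional trailing default."""
    args = list(pairs_and_default)
    default = args.pop() if len(args) % 2 == 1 else None
    for search, result in zip(args[0::2], args[1::2]):
        if value == search:
            return result
    return default


print(iif(5 > 3, "big", "small"))
print(decode(2, 1, "one", 2, "two", "other"))
```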

Variable Ports
Use to simplify complex expressions
  e.g. - create and store a depreciation formula to be referenced more than once
Use in another variable port or an output port expression
Local to the transformation (a variable port cannot also be an input or output port)
Available in the Expression, Aggregator and Rank transformations
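The depreciation example can be sketched as a function that processes one row: the intermediate value plays the role of a variable port, computed once and then reused by two output ports. The straight-line formula, the port names and the `expression_row` function are assumptions chosen for the illustration.

```python
def expression_row(cost, salvage, life_years, years_owned):
    """Process one row the way an Expression transformation might."""
    # Variable port: local to the transformation, never passed downstream.
    v_annual_depreciation = (cost - salvage) / life_years

    # Output ports: both reuse the variable instead of repeating the formula.
    out_accumulated = v_annual_depreciation * years_owned
    out_book_value = cost - out_accumulated
    return {
        "ACCUMULATED_DEPRECIATION": out_accumulated,
        "BOOK_VALUE": out_book_value,
    }


print(expression_row(cost=10000, salvage=1000, life_years=5, years_owned=2))
```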

Encoding Functions
  Used to encode string values
  SOUNDEX, METAPHONE

Expression Validation
The Validate or OK button in the Expression Editor will:
  Parse the current expression
    Remote port searching (resolves references to ports in other transformations)
  Parse transformation attributes
    e.g. - filter condition, lookup condition, SQL Query
  Parse default values
  Check spelling, correct number of arguments in functions, other syntactical errors

Informatica Data Types
NATIVE DATATYPES
  Specific to the source and target database types
  Display in source and target tables within Mapping Designer
TRANSFORMATION DATATYPES
  PowerMart / PowerCenter internal datatypes based on ANSI SQL-92
  Display in transformations within Mapping Designer
[Diagram: Native -> Transformation -> Native]
Transformation datatypes allow mix and match of source and target database types
When connecting ports, native and transformation datatypes must be compatible (or must be explicitly converted)

Datatype Conversions
          Integer  Decimal  Double  Char  Date  Raw
Integer      X        X       X      X
Decimal      X        X       X      X
Double       X        X       X      X
Char         X        X       X      X     X
Date                                 X     X
Raw                                              X

Mapping Designer
[Screenshot labels: Transformation Toolbar, Mapping List, Iconized Mapping]

All numeric data can be converted to all other numeric datatypes, e.g. - integer, double, and decimal
All numeric data can be converted to string, and vice versa
Date can be converted only to date and string, and vice versa
Raw (binary) can only be linked to raw
Other conversions not listed above are not supported
These conversions are implicit; no function is necessary
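The conversion rules listed for Datatype Conversions can be written down as a lookup table. The sketch below encodes them as stated (numeric to numeric and string, string to everything except raw, date to date and string, raw to raw only); the `CONVERSIONS` table and the `can_link` function are hypothetical names for the illustration.

```python
# Supported implicit conversions: source datatype -> allowed target datatypes.
CONVERSIONS = {
    "Integer": {"Integer", "Decimal", "Double", "Char"},
    "Decimal": {"Integer", "Decimal", "Double", "Char"},
    "Double":  {"Integer", "Decimal", "Double", "Char"},
    "Char":    {"Integer", "Decimal", "Double", "Char", "Date"},
    "Date":    {"Char", "Date"},
    "Raw":     {"Raw"},
}


def can_link(source_type, target_type):
    """Return True if ports of these datatypes could be linked implicitly."""
    return target_type in CONVERSIONS[source_type]


print(can_link("Integer", "Char"))
print(can_link("Date", "Integer"))
```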

Mappings
By the end of this section you will be familiar with:
  Mapping components
  Source Qualifier transformation
  Mapping validation
  Data flow rules
  System Variables
  Mapping Parameters and Variables

Pre-SQL and Post-SQL Rules


Can use any command that is valid for the database type; no nested comments
Can use Mapping Parameters and Variables in SQL executed against the source
Use a semi-colon (;) to separate multiple statements
Informatica Server ignores semi-colons within single quotes, double quotes or within /* ...*/
To use a semi-colon outside of quotes or comments, escape it with a back slash (\)
Workflow Manager does not validate the SQL
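The semi-colon rules above describe a small parsing problem. The sketch below splits a Pre-SQL or Post-SQL string into statements under those rules: semi-colons inside single quotes, double quotes or /* ... */ comments are kept, and a backslash-escaped semi-colon is kept as a literal. It is an illustration of the rules as stated, not the Informatica Server's parser, and `split_statements` is a hypothetical name.

```python
def split_statements(sql):
    """Split SQL text on semi-colons that are not quoted, commented or escaped."""
    statements, current = [], []
    quote = None          # the open quote character, if any
    in_comment = False    # inside a /* ... */ comment
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_comment:
            current.append(ch)
            if sql.startswith("*/", i):
                current.append("/")
                in_comment = False
                i += 2
                continue
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif sql.startswith("/*", i):
            current.append("/*")
            in_comment = True
            i += 2
            continue
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "\\" and sql.startswith(";", i + 1):
            current.append(";")   # escaped semi-colon stays in the statement
            i += 2
            continue
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    return [s for s in statements if s]


print(split_statements("DELETE FROM t WHERE c = 'a;b'; /* x; y */ UPDATE t SET c = 1"))
```

Because the Workflow Manager does not validate the SQL, a check like this only confirms how the text would be divided, not that each statement is valid for the database.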


Data Flow Rules
Each Source Qualifier starts a single data stream (a dataflow)
Transformations can send rows to more than one transformation (split one data flow into multiple pipelines)
Two or more data flows can meet together -- if (and only if) they originate from a common active transformation
  Cannot add an active transformation into the mix
[Diagram labels: ALLOWED, DISALLOWED, Passive, Active, T]
Example holds true with Normalizer in lieu of Source Qualifier. Exceptions are: Mapplet Input and Joiner transformations

Mapping Validation
Mappings must:
  Be valid for a Session to run
  Be end-to-end complete and contain valid expressions
  Pass all data flow rules
Mappings are always validated when saved; can be validated without being saved
Output Window will always display reason for invalidity

Connection Validation
Examples of invalid connections in a Mapping:
  Connecting ports with incompatible datatypes
  Connecting output ports to a Source
  Connecting a Source to anything but a Source Qualifier or Normalizer transformation
  Connecting an output port to an output port or an input port to another input port
  Connecting more than one active transformation to another transformation (invalid dataflow)

Workflows
By the end of this section, you will be familiar with:
  The Workflow Manager GUI interface
  Workflow Schedules
  Setting up Server Connections
    Relational, FTP and External Loader
  Creating and configuring Workflows
  Workflow properties
  Workflow components
  Workflow Tasks

Workflow Manager Interface
[Screenshot labels: Task Tool Bar, Workflow Designer Tools, Navigator Window, Workspace, Output Window, Status Bar]

Workflow Structure
A Workflow is a set of instructions for the Informatica Server to perform data transformation and load
Combines the logic of Session Tasks, other types of Tasks and Worklets
The simplest Workflow is composed of a Start Task, a Link and one other Task
[Diagram labels: Start Task, Link, Session Task]

Workflow Manager Tools
Workflow Designer
  Maps the execution order and dependencies of Sessions, Tasks and Worklets, for the Informatica Server
Task Developer
  Create Session, Shell Command and Email tasks
  Tasks created in the Task Developer are reusable
Worklet Designer
  Creates objects that represent a set of tasks
  Worklet objects are reusable

Workflow Scheduler Objects
Setup reusable schedules to associate with multiple Workflows
  Used in Workflows and Session Tasks

Server Connections
Configure Server data access connections
  Used in Session Tasks
Configure:
  1. Relational
  2. MQ Series
  3. FTP
  4. Custom
  5. External Loader

Relational Connections (Native)
Create a relational (database) connection
  Instructions to the Server to locate relational tables
  Used in Session Tasks

Relational Connection Properties
Define native relational (database) connection
  User Name/Password
  Database connectivity information
  Rollback Segment assignment (optional)
  Optional Environment SQL (executed with each use of database connection)

Create an External Loader connection
  Instructions to the Server to invoke database bulk loaders
  Used in Session Tasks

Session Task
Server instructions to run the logic of ONE specific Mapping
  e.g. - source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
Becomes a component of a Workflow (or Worklet)
If configured in the Task Developer, the Session Task is reusable (optional)

Task Developer
Create basic Reusable building blocks to use in any Workflow
Reusable Tasks
  Session: Set of instructions to execute Mapping logic
  Command: Specify OS shell / script command(s) to run during the Workflow
  Email: Send email at any point in the Workflow

Command Task
Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
Becomes a component of a Workflow (or Worklet)
If configured in the Task Developer, the Command Task is reusable (optional)
Commands can also be referenced in a Session through the Session Components tab as Pre- or Post-Session commands

Command Task

Additional Workflow Components
Two additional components are Worklets and Links
Worklets are objects that contain a series of Tasks
Links are required to connect objects in a Workflow

Developing Workflows
Create a new Workflow in the Workflow Designer
  Customize Workflow name
  Select a Server

Workflow Properties
Customize Workflow Properties
  Workflow log displays
  Select a Workflow Schedule (optional)
    May be reusable or non-reusable

Workflow Properties
Create a User-defined Event which can later be used with the Raise Event Task
Define Workflow Variables that can be used in later Task objects (example: Decision Task)

Workflow Designer - Links
Required to connect Workflow Tasks
Can be used to create branches in a Workflow
All links are executed -- unless a link condition is used which makes a link false
[Diagram labels: Link 1, Link 2, Link 3]
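The link rule (every link is followed unless its condition evaluates to false) can be sketched as a small graph walk. This is a simplified illustration, not the Informatica Server's scheduling logic: the `run_workflow` function, the task names and the `context` dictionary standing in for workflow variables are all hypothetical.

```python
def run_workflow(links, start="Start", context=None):
    """Walk the links from the Start Task; skip links whose condition is false."""
    context = context or {}
    executed, queue = [], [start]
    while queue:
        task = queue.pop(0)
        if task in executed:
            continue
        executed.append(task)
        for source, target, condition in links:
            # A link with no condition is always followed.
            if source == task and (condition is None or condition(context)):
                queue.append(target)
    return executed


# Illustrative workflow: branch on the outcome of a session (names are made up).
links = [
    ("Start", "s_load_customers", None),
    ("s_load_customers", "cmd_archive", lambda c: c["status"] == "SUCCEEDED"),
    ("s_load_customers", "email_failure", lambda c: c["status"] == "FAILED"),
]
print(run_workflow(links, context={"status": "SUCCEEDED"}))
```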

Building Workflow Components
Add Sessions and other Tasks to the Workflow
Connect all Workflow components with Links
Save the Workflow
Start the Workflow
Sessions in a Workflow can be independently executed
[Screenshot labels: Save, Start Workflow]

Session Tasks
After this section, you will be familiar with:
  How to create and configure Session Tasks
  Session Task properties
  Transformation property overrides
  Reusable vs. non-reusable Sessions
  Session partitions

Session Task
Created to execute the logic of a mapping (one mapping only)
Session Tasks can be created in the Task Developer (reusable) or Workflow Designer (Workflow-specific)
Steps to create a Session Task
  Select the Session button from the Task Toolbar, or
  Select menu Tasks | Create
[Screenshot label: Session Task Bar Icon]

Session Task - General

Session Task - Properties

Session Task Config Object
