
Informatica PowerCenter Design Specifications
<Month Year>
for <Client name> <Project Name>

Notice

This is a controlled document. Unauthorised access, copying, replication or
usage for a purpose other than that for which it is intended is prohibited.
All trademarks that appear in this document have been used for
identification purposes only and belong to their respective companies.


DOCUMENT REVISION LIST

Client:

Project:

Document Name:

Release Notice Reference (for release):

Rev. No. | Revision Date | Revision Description | Page No. | Prev Page No. | Action Taken | Addenda/New Page | Release Notice Reference


Table of Contents

1 Introduction
  1.1 Objective and Scope
    1.1.1 In Scope
    1.1.2 Out of Scope
  1.2 Reference
  1.3 Structure of the document
  1.4 Acronyms
2 ETL Architecture
  2.1 Logical Architecture
    2.1.1 Source System
    2.1.2 Staging Area
    2.1.3 ODS
    2.1.4 EDW
    2.1.5 DM
  2.2 Physical Architecture
  2.3 Source and Target System Details
    2.3.1 Source System 1
    2.3.2 Target System 1
    2.3.3 Volumetrics
  2.4 Schedule and Frequency
    2.4.1 Execution Order
    2.4.2 Frequency
    2.4.3 Scheduler Information
  2.5 Historical Data Loading
  2.6 ETL Control Schema Design
    2.6.1 Data Validation Rules
    2.6.2 Control Schema Tables
    2.6.3 Reconciliation
    2.6.4 Exception Handling
3 Release Management
  3.1 Configuration Management
  3.2 Security
  3.3 Deployment Options
4 Reusable Components
  4.1 Transformations
  4.2 Mapplets
  4.3 User Defined Functions
  4.4 Parameters and Variables
    4.4.1 Parameter File Format
5 Testing Strategy
  5.1 Testing Environments
  5.2 Unit Testing
    5.2.1 Approach
    5.2.2 Deliverables
  5.3 System Integration Testing
    5.3.1 Approach
    5.3.2 Deliverables
Appendix A: Traceability to SRS
Appendix B: File Formats
Appendix C: Standards and Best Practices
Appendix D: Mapping Inventory


1 Introduction
1.1 Objective and Scope
<<Client name>> is deploying xxx product as <<an ETL Solution that’s part of a
BI initiative/a Data Migration Platform/a Data Consolidation Platform/a Data
Quality Platform/a Tool Migration Platform>>.

This document covers detailed design of the ETL layer that extracts data from
<sources> and loads transformed data to <target>.

1.1.1 In Scope
• Source to target mapping flow
• Data Cleansing Rules
• Data Validation and Reconciliation Rules
• Execution Order and Scheduling
• Security Model
• Release Management
• Testing Strategy

1.1.2 Out of Scope


• xxx

1.2 Reference
• Product Best Practices
• Product Lessons Learnt
• Project SRS document
• Product Manuals
• <Any other reference material>

1.3 Structure of the document


• xxx

1.4 Acronyms


2 ETL Architecture
This chapter details the different components that make up the ETL environment.
The logical and physical deployment architectures, execution order (scheduling),
connectivity, and details of the source and target systems are elaborated in the
following sub-sections.

2.1 Logical Architecture


This section elaborates the process flow from the source system to
the target. For example, in a typical BI project the flow is:

Source System -> Staging Area -> ODS -> EDW -> DM

2.1.1 Source System


Holds transaction details to be extracted

2.1.2 Staging Area


Holds incremental data for a load cycle

2.1.3 ODS
Holds clean and standardized transactional data

2.1.4 EDW
Holds facts and dimensions

2.1.5 DM
Holds subject area specific summaries/snapshots

The table below summarizes ETL details for the various transformation phases. The
details filled in are sample information:
ETL Stage | Source | Extraction Strategy | Extract Mechanism | Target Type | Load Mechanism | Data Retention Period | Frequency of Loads
Source to Staging | 1. SAP 2. Oracle | 1. Incremental 2. Incremental | 1. Push 2. Pull | Oracle | Direct | 1 load cycle | Daily
Staging to ODS | Oracle | Full extracts | Pull | Oracle | Direct | 1 fiscal month | Daily
ODS to EDW | Oracle | Changed Data | Pull | Teradata | Indirect | History (no deletes) | Daily

Explanation of contents to be filled in each column:

Extraction Strategy:


• Full Extracts - All the data from the source is extracted for transformation and
loading
• Incremental Extracts - Date stamps or flags available on the source tables are
utilized for this. Usually, data inserted/updated/deleted in a load cycle is
fetched and passed on for further processing. A control schema can also be
utilized for identifying the records to be fetched from the source system for a
run cycle (see the sketch after this list)
• Changed Data - All the data that has changed in the source system is fetched.
Products like PowerExchange can be utilized for capturing changed data.
Otherwise, a complete compare of the source tables with the target tables can
be done to identify changed data. Date stamps or flags, as mentioned under
Incremental Extracts, can also be utilized
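
As an illustration, an incremental extract driven by a control-schema watermark
might look as follows. This is a minimal sketch: the table and column names
(ETL_LOAD_CONTROL, LAST_EXTRACT_TS, SRC_ORDERS, etc.) are hypothetical
placeholders, not part of this design.

    -- Fetch only rows changed since the last successful extract for this stage.
    -- Assumes ETL_LOAD_CONTROL keeps one watermark row per ETL stage.
    SELECT src.*
    FROM   src_orders src
    WHERE  src.last_update_ts >
           (SELECT ctl.last_extract_ts
            FROM   etl_load_control ctl
            WHERE  ctl.etl_stage = 'SRC_TO_STG');

    -- After a successful run, advance the watermark for the next cycle.
    UPDATE etl_load_control
    SET    last_extract_ts = SYSTIMESTAMP
    WHERE  etl_stage = 'SRC_TO_STG';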

Extract Mechanism:
• Push Mechanism - The source system provides data to the ETL engine in file
format. These could be flat files or normalized files that are FTPed to a server
machine, e.g. COBOL files pushed via JCLs
• Pull Mechanism - The ETL engine extracts data from the source system via ODBC
or native connectivity. Extraction via adapters for mainframes and
applications (such as SAP, PeopleSoft, etc.) is also a Pull mechanism

Load Mechanism:
• Direct - Data is loaded directly to the target tables after transformation via
native or ODBC connectivity
• Indirect - Data is written to flat files or intermediate structures after
transformation. These are passed to loader utilities to be committed on the
target system (a sample is sketched below)
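
For instance, an indirect load into an Oracle target is commonly handed off to
SQL*Loader. The control file below is a hedged sketch; the file, table and
column names are hypothetical.

    -- Minimal SQL*Loader control file for a pipe-delimited staging extract.
    LOAD DATA
    INFILE 'stg_customer.dat'
    APPEND
    INTO TABLE stg_customer
    FIELDS TERMINATED BY '|'
    (customer_id,
     customer_name,
     created_dt DATE "YYYYMMDD")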

2.2 Physical Architecture


This section elaborates on the hardware/software environment. Details on
licensing options are included in this section.
Examples:
• High Availability
• Grid
• Pushdown Optimization
• Team Based Development
• Data Quality (IDQ/IDE)
• Metadata Reporter

The hardware/software environment for the Development, Test and Production
systems is tabulated below:

Environment | Development | System Integration Test | Pre-production | Production
Domain Name | | | |
Node Name | | | |
Operating System | | | |
No. of CPUs | | | |
32-bit/64-bit environment | | | |
RAM (in GB) | | | |
Services running on the node | | | |
Backup Node | | | |
Repository Database | | | |

2.3 Source and Target System Details
This section provides information on the target tables loaded from one or
more source tables.

Any custom tables created from an ETL perspective for Error Handling,
Reconciliation and Reference data are also to be included in this section. All
custom tables should ideally map only to the target system. These tables are
termed Control tables and provide metadata on data errors and the ETL load
process.

Intermediate systems like Staging, ODS, etc. are to be documented as
source/target systems depending on the design.

2.3.1 Source System 1

Source system Name
Source system Type
Connectivity
Source system version
OS type and version

The following table can be used as a reference for detailing Source System type
and Connectivity columns

Source System Type | Connectivity
Relational (RDBMS) | Product-specific drivers, native connectivity
Files (Flat, VSAM, XML) | Connectivity to the server on which files will be placed
Proprietary Systems (ERP, CRM) | Product-specific adapters (e.g. PowerCenter Connect)
Queues (Tibco, WebMethods, etc.) | Product-specific adapters (e.g. PowerCenter Connect)
Pull from Mainframe system | Product-specific adapters (e.g. PowerExchange)
CDC | Product-specific log miner (e.g. PowerExchange)

The following table lists the source tables from which data will be extracted.

S No. | Table Name | Type | Indexes | Description

• Type of table: Transaction, Lookup, Master, etc.
• Indexes: Unique, Referential
• Description: Business information about the data held in the source table,
from an extraction perspective


2.3.2 Target System 1

Target system Name
Target system Type
Connectivity
Target system version
OS type and version

The following table lists the target tables to which data will be loaded as part of
the ETL program:

S No. | Table Name | Type | Indexes | SCD Type | Description

• Type of table: Dimension/Fact/Aggregate/Snapshot
• Indexes: Primary, Unique, Referential
• SCD Type: For dimension tables, this will have values like SCD1, SCD2 or SCD3
• Description: This should include information on what sort of data is held in
the table

2.3.3 Volumetrics

Sources

S No | Load Type | Source Table/File | Row Size (KB) | Number of Rows | Estimated Size (MB/GB)
| Initial Load | | | |
| Regular/Scheduled Load | | | |

Dimensions

S No | Load Type | Target Table | Number of Columns | Row Size (KB) | SCD Type | Number of History Rows | Number of New Rows | Number of Updated Rows | Estimated Volume (MB/GB)
| Initial | | | | | | | |
| Regular/Scheduled | | | | | | | |

Facts

S No | Load Type | Target Table | Number of Columns | Row Size (KB) | Number of History Rows | Number of New Rows | Number of Updated Rows | Estimated Volume (MB/GB)
| Initial | | | | | | |
| Regular/Scheduled | | | | | | |

2.4 Schedule and Frequency
This section details the data flow from source to target system based on:
• Execution order of the various ETL routines
• Dependencies between ETL routines

2.4.1 Execution Order


This section details the process flow, in terms of ETL components, from the
source to the target system. A sample flow is depicted in the following figure.

The following sample figure depicts the sub-processes executed as part of ETL_Process1.

The table below gives the full list of individual ETL processes and the sub-process hierarchy:

Process Level 1 (Batch) | Process Level 2 | ETL Process

For PowerCenter, this could mean Workflow/Worklet/Session.


2.4.2 Frequency
Two aspects are documented in this sub-section:
• Load Window
• Load Frequency

2.4.3 Scheduler Information


Details on the scheduler that will be utilized for invoking PowerCenter workflows are
provided here. This could be the PowerCenter Scheduler or an external scheduler like
Control-M, cron, etc. (an example invocation is sketched below).
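
As an illustration only, an external scheduler such as cron typically starts a
workflow through the pmcmd command-line utility. The service, domain, folder,
credential and workflow names below are hypothetical placeholders.

    # Hypothetical cron entry: start the daily load workflow at 01:30.
    30 1 * * * pmcmd startworkflow -sv INT_SVC_DEV -d DOM_DEV -u etl_user -p etl_pwd -f FLD_DAILY wf_DAILY_LOAD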

2.5 Historical Data Loading


Historical data loading is usually done in the following scenarios:
1. Initial loads, when the target system is empty and the requirement calls for
some history in the target
2. New entities are added to the target system

These are usually one-time loads. A separate set of PowerCenter mappings can be
created for one-time loading, or the mappings created for the ongoing load process
can be utilized. The strategy to be followed for loading historical data goes in
this sub-section.

2.6 ETL Control Schema Design


This section elaborates the high-level business rules for cleansing source data while
it is transformed. The process could be embedded in the ETL routines or implemented
external to PowerCenter using products like IDQ, Trillium, First Logic, etc. All the
common data validation rules are mentioned in this sub-section. These could be:
1. Functional - Validation checks done on source data for the respective
functionality (e.g. the billing address for a customer should always be in xxx format)
2. Technical or Operational - Unique constraints, duplicate data, null values,
default values, etc.

The validation checks could result in dropping the entire dataset or in partial
removal of source data. In either scenario, information about errors is reported
either by utilizing the PowerCenter metadata repository or by creating a custom schema.

Reference: ETL Error Handling and Reconciliation Strategy.doc

2.6.1 Data Validation Rules
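By way of illustration, a technical validation such as a duplicate-key check can be
expressed as a simple query. The table and column names below are hypothetical
placeholders.

    -- Flag natural-key duplicates in staging before data reaches the ODS.
    SELECT customer_id,
           COUNT(*) AS duplicate_count
    FROM   stg_customer
    GROUP  BY customer_id
    HAVING COUNT(*) > 1;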

2.6.2 Control Schema Tables
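As a minimal sketch of what a control-schema table might hold (the actual design is
project-specific), a load-audit table could look like this. All names and columns
are illustrative, not prescribed.

    -- Illustrative load-audit control table (Oracle syntax).
    CREATE TABLE etl_load_audit (
        load_id        NUMBER        NOT NULL,   -- surrogate key for the run
        etl_stage      VARCHAR2(30)  NOT NULL,   -- e.g. SRC_TO_STG, STG_TO_ODS
        start_ts       TIMESTAMP,
        end_ts         TIMESTAMP,
        rows_read      NUMBER,
        rows_loaded    NUMBER,
        rows_rejected  NUMBER,
        status         VARCHAR2(10),             -- e.g. RUNNING/SUCCESS/FAILED
        CONSTRAINT pk_etl_load_audit PRIMARY KEY (load_id)
    );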

2.6.3 Reconciliation
Target data should always trace back to extracted plus rejected data. A strategy for
handling this traceability is documented in this sub-section (a sample count check is
sketched below).
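
For example, a count-based reconciliation against the hypothetical audit table
sketched in 2.6.2 might be:

    -- Rows read should equal rows loaded plus rows rejected;
    -- any row returned indicates a reconciliation breach.
    SELECT load_id, rows_read, rows_loaded, rows_rejected
    FROM   etl_load_audit
    WHERE  rows_read <> rows_loaded + rows_rejected;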

2.6.4 Exception Handling


All constraints and validations that stop the ETL process from completing
successfully are mentioned in this sub-section.


3 Release Management

3.1 Configuration Management
Versioning of PowerCenter components can be managed from within the product
utilizing Team Based Development. The objects can also be exported as XML and
held in version management tools like Visual SourceSafe, PVCS, etc.

In many scenarios, external version control tools are used alongside the Team
Based Development option. The sole reason for this is that PowerCenter cannot
manage versioning of external components like scripts, documents, etc.

The folder system within PowerCenter and the external configuration management
tools is defined in this sub-section. Sample information is depicted in the
screenshot.

3.2 Security
This sub-section mentions:
1. Groups and users created in the PowerCenter environment for performing
development, deployment and housekeeping activities
2. User privileges on source and target systems
3. Constraints - Typically, for a target system, DELETE or TRUNCATE options from
within ETL routines are not enabled

3.3 Deployment Options


Details on how the code will be moved from one environment to another. The different
possibilities are:
1. Using Deployment Groups
2. Using XML import/export facilities (an example is sketched below)
3. Drag/drop from one environment to another - this is not a best practice, though
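
As an illustration of option 2, objects can be exported to and imported from XML
with the pmrep command-line utility. The repository, domain, folder, object and
file names below are hypothetical, and exact flags vary by PowerCenter version,
so the Command Reference should be consulted.

    # Hypothetical export of a single mapping to XML.
    pmrep connect -r REP_DEV -d DOM_DEV -n etl_user -x etl_pwd
    pmrep objectexport -n m_LOAD_CUSTOMER -o mapping -f FLD_DAILY -u m_LOAD_CUSTOMER.xml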


4 Reusable Components
All common business rules should be outlined in this section. The details could be as
granular as in the Detail Design document or could be kept at a very high level.
Considerable effort should be invested in identifying reusable components before the
Build phase to avoid duplication of effort and components, enabling faster build-test
cycles with better performance. This also reduces maintenance overheads.

4.1 Transformations
These could be tabulated as in the following table.

Note: This is a sample for Informatica PowerCenter


Sr. No | Transformation Name (as in mapping) | Transformation Type | Is Shared? (Y/N) | Shared Folder Name | Transformation Properties | Value | Comments
1. | LKP_xxx | Lookup | | | Lookup Type | Connected/Unconnected |
 | | | | | Lookup Cache enabled? | Y/N |
 | | | | | Lookup Cache persistent? | Y/N |
 | | | | | Policy on Multiple Match | First Value/Last Value/Any Value |
 | | | | | Connection Information | $Source/$Target |
2. | EXP_xxx | Expression | | | | |
3. | UNION_xxx | Union | Y | | | |

4.2 Mapplets

[Figure: sample process flow for an error-handling mapplet, comprising
MPI_ERRORS_PROCESS_LOG, EXP_SET_ERROR_FLAG, EXP_FIELD_CONCATENATION,
FIL_NO_ERRORS, LKP_MESSAGES and MPO_ERRORS_PROCESS_LOG]

Sample VISIO templates can be built for any reusable mapplet components, as
depicted in the process flow above.

Transformations within the mapplet can be elaborated as in 4.1


4.3 User Defined Functions


In PowerCenter 8.x, user-defined functions (UDFs) can be created for standard
conversions like string to date, etc. (an illustrative expression follows).
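
As a hypothetical sketch, a UDF wrapping a string-to-date conversion could be
built around the expression language's standard TO_DATE function; the UDF and
port names are placeholders.

    -- Body of a hypothetical UDF udf_StrToDate(in_date_str):
    -- converts 'YYYYMMDD' strings to dates using the built-in TO_DATE.
    TO_DATE(in_date_str, 'YYYYMMDD')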

4.4 Parameters and Variables

S.No | Name | Type | Data Type | Prec | Scale | Aggregation | Example
1. | $$P_CUTOFF_DATE | Parameter | String | 8 | | | 20060910

4.4.1 Parameter File Format
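A minimal sketch of a PowerCenter parameter file is shown below. The folder,
workflow, session and connection names are hypothetical placeholders; the
[folder.WF:workflow.ST:session] heading scopes the values to one session.

    [FLD_DAILY.WF:wf_DAILY_LOAD.ST:s_m_LOAD_CUSTOMER]
    $$P_CUTOFF_DATE=20060910
    $DBConnection_SRC=conn_oracle_src
    $DBConnection_TGT=conn_oracle_tgt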


5 Testing Strategy
The primary purpose of testing is to verify that the system is developed according to the
design and specifications provided. In addition, it will ensure that the solution operates as
intended.

There will be three levels of testing performed for each delivery.

Testing Level | Description
Unit | Testing of each individual function. This usually includes testing individual mappings, any scripts and any other external programs. Ideally, the developers will test all error conditions and logic branches within the code.
System | Testing performed to review the system as a whole. This may include, but is not limited to, data integrity and reliability of the system.
Incremental Regression | At the end of each delivery, incremental regression will be performed to review the impact of changes made in the current delivery on prior deliveries. After the delivery of all phases, end-to-end testing from source to target will be performed to validate all the components created from Phase 1.

5.1 Testing Environments


Provide the environment details in this section. For each environment, include:
1. Hardware/Software Platform details
2. Access Control
3. Data policy

Various testing environments are depicted in the following diagram.

[Diagram: Development/System Testing on the Development Box, SOLT on the
Testing Box, and Production]

The development PowerCenter environment will be reused for Unit Testing as well as
System Testing. A backup of the production source database will be used as the source
for both Unit and System Testing. Two separate target instances will be created for
Unit Testing and System Testing.

The testing environment (SOLT) should be a production-like environment. This is
particularly true if significant production-readiness and/or performance testing is
to be performed.


5.2 Unit Testing


In Unit Testing, the emphasis is on validating that all transformation rules are
applied correctly. The look and feel of components and naming conventions are also
verified. Stand-alone components are tested as part of Unit Testing. The PowerCenter
Debugger will be used to ensure all business rules are applied and data is
transformed correctly.

5.2.1 Approach
1. Prepare
a. Unit Test Plans
b. Ensure data available in source is sufficient to execute all test cases
2. Review of Test Plans and test results internally
3. Perform Unit Testing
a. Perform Code Walkthrough to ensure coding standards and practices are
applied
b. Document Unit test results corresponding to Unit test cases
4. Close any defects raised during Unit Testing

5.2.2 Deliverables
1. Unit Test Plans
2. Unit Test Results
3. Issue tracker
4. Unit tested components

5.3 System Integration Testing


The purpose of System Testing is:
• Data Validation - For sample data, source data will be validated against target
data to ensure application of the business rules defined in the requirements
• Data Reconciliation - Checksums and record counts will be compared
• Functional Dependencies

5.3.1 Approach
1. Prepare
a. Test Plans
b. Prepare Test Data – from existing production data
c. Operations Guide - for conducting verification testing
d. Test scripts
2. Review of Test plans by internal and client team
3. Setup System Test environment
a. Movement of PowerCenter components from local folders to Project specific
folders in Development environment
b. Setting up a new instance of Target schema
c. Replication of source data from other environments to test environment
4. Perform System Testing
a. Documentation of system test results corresponding to test cases
5. Incremental Regression Testing
a. System test cases will be reused
b. Regression test results corresponding to test cases will be documented
6. Close any defects raised during System Testing

5.3.2 Deliverables

1. System Test Plans
2. System Test Scripts
3. System Test Results
4. Incremental Regression Test results
5. Operations Guide
6. Status reports and issue tracker


Appendix A: Traceability to SRS


The following table gives the traceability of the design document to the SRS:

No | Section in the SRS Document | Section in This Document | Task Profile | Responsibility
1. | | | |
2. | | | |


Appendix B: File Formats


Appendix C: Standards and Best Practices


Appendix D: Mapping Inventory

