Asias Largest Global Software & Services Company Confidential

Ab Initio
Introduction
Agenda
What is Data Warehousing?
Why Data Warehousing?
ETL process
Various ETL tools
Introduction to Ab Initio
Why Ab Initio
How Unix is involved with Ab Initio
GDE window
EME Repository
Sandboxes: User and Standard Sandbox
Ab Initio Components
Creation of simple graphs
Data Warehousing and ETL Process
Data Warehouse
A Data Warehouse is a collection of logical Data Marts, each of which is designed for a particular line of business, e.g. Sales or Marketing (designed to facilitate data analysis and reporting).
Why Data Warehousing?
ETL process
Data is first stored temporarily in a Staging Table/Area and is called Staging Data, i.e. data queued for processing.
The processing tool reads the staged data, performs qualitative processing, filtering, and cleansing (as required for OLAP, i.e. reporting/analysis), and finally loads it into the Data Warehouse.
All these data flows (both inward and outward) and data processing activities (Extraction from the source system, Transformation of data by cleansing/filtering, Loading into the Data Warehouse) are performed using an ETL tool, e.g. Ab Initio, Informatica, etc.
This entire process is called the ETL process.
Extract, Transform, Load (ETL) functionalities
Extract:
The first phase of an ETL process is to extract the data from the source systems. Each separate system may use a different data organization/format. Common data source formats are relational databases and flat files, but other source formats exist. Extraction converts the data into records and columns.
Transform:
The transform phase applies a series of rules or functions to the extracted data.
Examples:
Derive a new calculated value (e.g. sale_amount = qty * unit_price)
Summarize multiple rows of data (e.g. total sales for each region)
Load:
o The load phase loads the data into the data warehouse. Depending on the requirements of the organization, this process ranges widely.
o Simple: overwrite old data with new.
o More complex systems: maintain a history and audit trail of all changes to the data.
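The three phases above can be sketched in a few lines of Python. This is only an illustration of the idea, not Ab Initio code: the CSV text, the column names other than qty, unit_price and sale_amount, and the dictionary standing in for the warehouse are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file source; a real source could equally be a database table.
SOURCE_CSV = """region,qty,unit_price
East,2,10.00
West,1,5.50
East,3,4.00
"""

def extract(text):
    """Extract: convert raw source text into records (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: derive sale_amount = qty * unit_price, then total it per region."""
    totals = defaultdict(float)
    for rec in records:
        sale_amount = int(rec["qty"]) * float(rec["unit_price"])
        totals[rec["region"]] += sale_amount
    return dict(totals)

def load(warehouse, totals):
    """Load: the simple strategy, overwriting old data with new."""
    warehouse.clear()
    warehouse.update(totals)
    return warehouse

warehouse = {"East": 0.0}  # stale data left by an earlier run
load(warehouse, transform(extract(SOURCE_CSV)))
print(warehouse)
```

A history-keeping load would append dated rows instead of clearing the target, which is the "more complex" case mentioned above.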
Various Popular ETL Tools
Tool Name                                   Company Name
Informatica                                 Informatica Corporation
DT/Studio                                   Embarcadero Technologies
DataStage                                   IBM
Ab Initio                                   Ab Initio Software Corporation
Talend                                      Talend Corporation
Pentaho                                     Pentaho Corporation
Data Junction                               Pervasive Software
Oracle Warehouse Builder                    Oracle Corporation
Microsoft SQL Server Integration Services   Microsoft
Introduction: Ab Initio
Introduction to Ab Initio
Data processing tool from Ab Initio Software Corporation (http://www.abinitio.com)
"Ab initio" is Latin for "from the beginning"
Designed to support the largest and most complex business applications
Graphical, intuitive, and fits the way your business works
Focus:
Moving Data:
Move small and large volumes of data in an efficient manner.
Deal with the complexity associated with business data.
High Performance
Scalable Solutions
Better productivity
Usage:
Data Warehousing
Batch Processing
Data Movement
Data Transformation
Product Constituents
[Diagram labels: Co>Operating System (Co>Ops), Graphical Development Environment (GDE), EME, DB, Conduct>IT, CF; connection methods shown: SSH, REXEC, TELNET, DCOM]
Product           Functionality
GDE               User interface for creating Graphs and Plans in Ab Initio
Data Profiler     Ab Initio tool for data profiling
Co>Ops            Server component for running deployed Ab Initio programs
EME               Ab Initio technical repository; part of the Co>Ops install
Database          Ab Initio server database components
Conduct>IT        Ab Initio server component for running Ab Initio Plans
Continuous Flow   Ab Initio server components for running CF programs
All server components are installed by default.
AB_HOME refers to the installation location of Ab Initio.
Various connectors and plugins are installed in AB_HOME/Connectors and AB_HOME/plugins.
All binaries and library files are available in AB_HOME/bin and AB_HOME/lib respectively.
Ab Initio Product Architecture
[Diagram: layered architecture, listed from top to bottom]
User Applications
Development Environments (GDE, Shell), alongside the Ab Initio EME
Component Library, 3rd Party Components, User-defined Components
The Ab Initio Co>Operating System
Native Operating System (Unix, Windows, OS/390)
Product Architecture
Unix Shell Script or NT Batch File
Supplies parameter values to underlying programs through arguments and environment variables
Controls the flow of data through pipes
Usually generated using the GDE
Host Machine 1 and Host Machine 2 (each contains)
Operating System (Unix, Windows NT)
Co>Operating System
User Programs
Ab Initio Built-in Component Programs (Partitions, Transforms, etc.)
GDE
Ability to graphically design batch programs comprising Ab Initio components, connected by pipes
Ability to test run the graphical design and monitor its progress
Ability to generate a shell script or batch file from the graphical design
Co>Operating System and GDE
Co>Operating System
Layered on top of the operating system.
Unites a network of computing resources (CPUs, storage disks, programs, datasets) into a data-processing system with scalable performance.
GDE
Can talk to the Co>Operating System using several protocols such as Telnet, Rexec and FTP.
It is the GUI for building applications in Ab Initio.
The Graph Model
Graph
Is the logical modular unit of an application.
Consists of several components that form the building blocks of an Ab Initio application.
Start Script (Host Setup): local to the graph
End Script: local to the graph
Component
Is a program that does a specific type of job and can be controlled by its parameter settings, e.g. Join, Reformat, etc.
Component Organizer
Groups all components under different categories.
Setup Command
Ab Initio Host (AIH) file
Builds up the environment to run an Ab Initio application.
Parts of a typical graph
Files
Formats
Components
Flows
Layouts
Building with mp job
Building with mp run
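The Graph Model: Naming the pieces
[Figure: a sample graph. The dataset Customers feeds the component Score, which feeds the component Select; the out port of Select feeds the dataset Good Customers and its deselect port feeds the dataset Other Customers. Callouts name the Datasets, Components, and Flows; L1 marks the layout.]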
The Graph Model: A closer look
[Figure: a sample graph with callouts for Ports, Record format metadata, Expression metadata, and Layout.]
Runtime Environment
A graph, after development, is deployed to the backend server as a Unix shell script or Windows NT batch file.
This becomes the executable to run at the backend with the help of the Co>Operating System.
The execution can be done from the GDE itself or manually from the backend.
The Ab Initio runtime environment is different from the development environment.
Unix and Ab Initio
Unix serves as the backend for Ab Initio.
All graphs/jobs in Ab Initio are accessible through Unix (the backend).
PuTTY connectivity
Environment Quick Overview:
$AI_RUN, $AI_BIN - run directory, .ksh scripts
$AI_PLAN, $AI_SERIAL_<LOG/ERROR/SUMMARY>
$AI_DML - record format files
$AI_XFR - transform files
$AI_MP - graphs
$AI_DB - database config files
$AI_SERIAL - serial source data, other serial data
$AI_MFS - Ab Initio multifile directory; in training it will also contain partition directories (more about this later!)
$AI_LOG - a location to place logging files, etc.
Sandboxes and EME
Sandboxes are work areas used to develop, test or run code associated with a given project. Only one version of the code can be held within the sandbox at any time.
The EME Datastore contains all versions of the code that have been checked into it.
[Figure: Check-in moves code from a sandbox into the EME Datastore; Check-out moves code from the EME Datastore into a sandbox.]
Ab Initio Environment
Ab Initio Environment: Job run
How a job runs
The execution of an Ab Initio graph is a job.
To run a job, you invoke a shell script that the GDE generates from a graph.
The script process initiates job processes that control the execution of the programs represented by the graph.
Graph -> mp/graph1.mp; Shell script -> run/graph1.ksh
Ab Initio Environment: Job run
You can invoke the script in two ways:
From the GDE
From a command line
To invoke the script from the GDE, click the Run button or choose Run > Start from the GDE menu bar.
To invoke the script from the command line:
For a bin script: ksh scriptname.ksh in the bin path.
To run a graph from the backend: $AI_RUN Graphname.ksh parameters (if needed) in the run path.
Creation of a Graph
Components - Overview
Component Organizer
A sample graph
A sample Korn shell script
Dataset Component Properties
Double-click on a component to bring up its Properties page
Viewing Port Properties
Click on the Ports tab to view the port properties
DMLs and XFRs
DML
Ab Initio stores metadata in the form of record formats.
Metadata can be embedded within a component or can be stored
external to the graph in a file with a .dml extension.
XFR
Data can be transformed with the help of transform functions.
Transform functions can be embedded within a component or can be
stored external to the graph in a file with a .xfr extension.
Data Manipulation Language or DML
DML Syntax
Record types begin with record and end with end
Fields are declared: data_type(len) field_name;
Field names consist of letters (a-z, A-Z), digits (0-9) and underscores (_) and are case sensitive
Keywords/reserved words include record, end, date.
Some of the data types available
String
Decimal
Integer
Storing data in binary form
Date and Datetime
EBCDIC and ASCII records
Null in Ab Initio: non-existence of column values.
In Text view (special symbol as delimiter)
Record format: In Graphical form (grid view)
DML format created for a data
0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black
Editing Types in GDE
DML creation
Callouts: Field name, Field type, Field length
More Record Format Editing
View Attributes.
Field Type drop-down
Length can be delimiter string
Date format goes here
Auto DML creation in Table component
DML creation: Use file option in dataset
Transform Functions: XFRs
User-defined functions producing one or more outputs from one or more inputs
Associated with transform components
Rules that compute an expression from input values and local variables and assign the result to output objects
Syntax
Functions:
output-records :: function-name (input-records) =
begin
assignments
end;
Assignments:
Direct mapping without any transformation: out.* :: in.*
Input file settings
Input Data: Record View
Input file View: Backend
Output Settings (Propagating from input)
Lookup File
Serial or Multifiles
Held in main memory
Searching and retrieval is key-based and faster than for files stored on disk
Associates key values with corresponding data values to index records and retrieve them
Lookup parameters
Key
Record Format
Basic Components
Filter by Expression
Reformat
Redefine Format
Sort
Join
Replicate
Dedup
Rollup
Filter by Expression
Reads a record from the input port
Evaluates the select_expr
If the result is true, the record is written to the out port
If the result is false, the record is written to the deselect port
[Flowchart: input port -> "expr true?" -> Yes: out port; No: deselect port]
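The routing described above can be mimicked in Python as follows. This is a sketch of the behaviour only; the function name, the sample records, and the score field are hypothetical and not part of Ab Initio.

```python
def filter_by_expression(records, select_expr):
    """Route each record to 'out' when select_expr is true, else to 'deselect'."""
    out, deselect = [], []
    for rec in records:
        if select_expr(rec):
            out.append(rec)
        else:
            deselect.append(rec)
    return out, deselect

# Hypothetical scored customers, echoing the Good/Other Customers sample graph.
customers = [
    {"name": "John", "score": 72},
    {"name": "Sam", "score": 41},
    {"name": "Sue", "score": 90},
]
good, other = filter_by_expression(customers, lambda rec: rec["score"] >= 50)
print([rec["name"] for rec in good], [rec["name"] for rec in other])
```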
Diagnostic Ports
REJECT
Input records that caused error
ERROR
Associated error message
LOG
Logging records
Filter by Expression
Filtered output
Reformat
1. Reads a record from the input port
2. The record is passed as an argument to the transform function (xfr)
3. Records are written to the out ports if the function returns a success status
4. Records are written to the reject port if the function returns a failure status
Parameters of the Reformat component
Count
Transform (xfr) function
Reject threshold
  Abort
  Never abort
  Use limit and ramp
    Limit: number of errors to tolerate
    Ramp: scale of errors to tolerate per input
Reformat reject threshold
A drop-down menu specifying the number of errors to tolerate.
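One common reading of the "limit and ramp" setting is that a component aborts once the number of rejected records exceeds limit + ramp * (records processed so far). The sketch below assumes that formula; it is not taken from this deck, and the function name is hypothetical.

```python
def should_abort(rejects, processed, limit, ramp):
    """Assumed rule: abort when rejects exceed limit + ramp * records processed."""
    return rejects > limit + ramp * processed

# With ramp = 0 the tolerance is a fixed count of 5 rejects.
print(should_abort(rejects=6, processed=10, limit=5, ramp=0.0))
# With ramp = 0.5 the tolerance grows with the input: 5 + 0.5 * 10 = 10 rejects.
print(should_abort(rejects=6, processed=10, limit=5, ramp=0.5))
```

Under this reading, limit sets an absolute allowance and ramp sets a per-record rate, so a long run can tolerate proportionally more rejects than a short one.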
Transform functionality in Reformat
Reformatted output
Sort
Sort Component
Reads records from the input port, sorts them by key, and writes the result to the output port
Parameters
Key
Maxcore
Keys
A key identifies a field or set of fields used to organize a dataset
Single field: employee_number
Multiple fields or composite key: (last_name; first_name)
Modifiers: employee_number descending
Maxcore: maximum memory usage in bytes
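The key forms listed above (single field, composite key, descending modifier) correspond to the following Python sorts. The employee records are made up for the illustration; only the field names come from the slide.

```python
employees = [
    {"employee_number": 3, "last_name": "Smith", "first_name": "John"},
    {"employee_number": 1, "last_name": "Spade", "first_name": "Sam"},
    {"employee_number": 2, "last_name": "Smith", "first_name": "Anna"},
]

# Composite key (last_name; first_name): ties on last_name are broken by first_name.
by_name = sorted(employees, key=lambda rec: (rec["last_name"], rec["first_name"]))

# Single-field key with the "descending" modifier.
by_number_desc = sorted(employees, key=lambda rec: rec["employee_number"], reverse=True)

print([rec["employee_number"] for rec in by_name])
print([rec["employee_number"] for rec in by_number_desc])
```

Python sorts entirely in memory; the Maxcore parameter has no counterpart in this sketch.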
Sort Functionality
Sorted output
Join
1. Reads records from multiple input ports
2. Operates on records with matching keys using a multi-input transform function
3. Writes the result to the output port
PORTS: in, out, unused, reject (optional), error (optional), log (optional)
PARAMETERS: count, key, override key, transform, limit, ramp
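A minimal Python sketch of a two-input inner join follows, assuming records are dictionaries. The function name, the sample data, and the choice to return unmatched first-input records as an "unused" list are illustrative assumptions, not the component's actual implementation.

```python
def inner_join(in0, in1, key, transform):
    """Join two record lists on `key`; unmatched in0 records go to 'unused'."""
    index = {}
    for rec in in1:  # index the second input by key value
        index.setdefault(rec[key], []).append(rec)
    out, unused0 = [], []
    for rec in in0:
        matches = index.get(rec[key])
        if matches:
            out.extend(transform(rec, match) for match in matches)
        else:
            unused0.append(rec)
    return out, unused0

customers = [{"id": 1, "name": "John"}, {"id": 2, "name": "Sam"}, {"id": 3, "name": "Sue"}]
visits = [{"id": 1, "dt": "2003/05/17"}, {"id": 3, "dt": "2004/01/02"}]

joined, unused = inner_join(
    customers, visits, "id",
    lambda c, v: {"id": c["id"], "name": c["name"], "dt": v["dt"]},
)
print(joined)
print(unused)
```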
Join Parameters
Joined output
Rollup
Rollup evaluates a group of input records that have the same key, and then
generates records that either summarize each group or select certain information
from each group.
Parameters:
check-sort, sorted input, limit, ramp, logging, log_group, log_input, log_intermediate, log_output, grouped-input, error_group, key, key-method, major-key, log_reject, max-core
Rollup - functionality
Rollup Output
Built-in Functions for Rollup
The following aggregation functions are predefined and are only
available in the rollup component:
avg, max, min, count, first, product, last, sum
Multi-stage Transform
initialize, iterate, finalize, use of variables
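The initialize / iterate / finalize stages can be illustrated with a small Python rollup. This is a sketch of the pattern only: the stage function signatures, the region and amount fields, and the temporary-variable dictionary are assumptions made for the example.

```python
from itertools import groupby

def initialize():
    """Set up the temporary variables for a new key group."""
    return {"total": 0, "count": 0}

def iterate(temp, rec):
    """Fold one input record into the temporary variables."""
    temp["total"] += rec["amount"]
    temp["count"] += 1
    return temp

def finalize(temp, key_value):
    """Produce the summary record for a finished group."""
    return {"region": key_value, "sum_amount": temp["total"], "count": temp["count"]}

def rollup(records, key):
    get_key = lambda rec: rec[key]
    out = []
    # groupby needs its input grouped by key, so sort first.
    for key_value, group in groupby(sorted(records, key=get_key), key=get_key):
        temp = initialize()
        for rec in group:
            temp = iterate(temp, rec)
        out.append(finalize(temp, key_value))
    return out

sales = [
    {"region": "East", "amount": 20},
    {"region": "West", "amount": 5},
    {"region": "East", "amount": 12},
]
summary = rollup(sales, "region")
print(summary)
```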
Rollup Wizard
Note the use of an aggregation function in the expression
Simple and Complex Components
In these components the record format metadata typically changes (goes through a transformation) from input to output
In these components the record format metadata does not change from input to output
Priority Assignment
The priority is the order of evaluation of rules in a transform function.
An example
A join component may have a transform function with prioritized rules as
out.ssn :1: in1.ssn;
out.ssn :2: in2.ssn;
out.ssn :3: "999999999";
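The prioritized rules above behave like a "first usable value wins" chain. The Python sketch below assumes that a rule is skipped when its source value is missing (modelled here as None); the function name and the dictionary records are hypothetical.

```python
def assign_ssn(in1, in2):
    """Try each rule in priority order and return the first value that is present."""
    candidates = (in1.get("ssn"), in2.get("ssn"), "999999999")
    for value in candidates:
        if value is not None:
            return value

print(assign_ssn({"ssn": "111111111"}, {"ssn": "222222222"}))
print(assign_ssn({"ssn": None}, {"ssn": "222222222"}))
print(assign_ssn({}, {}))
```

The constant in the lowest-priority rule guarantees that the output field is always assigned.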
Priority Assignment (contd.)
Using lookup instead of Join
Using Last-Visits as a lookup file
Using a lookup file in a Transform Function
Output record format:
record
  decimal(4) id;
  string(8) city;
  decimal(3) amount;
  date("YYYY/MM/DD") dt;
end
Input 0 record format:
record
  decimal(4) id;
  string(6) name;
  string(8) city;
  decimal(3) amount;
end
Transform function:
out :: lookup_info(in) =
begin
  out.id :: in.id;
  out.city :: in.city;
  out.amount :: in.amount;
  out.dt :1: lookup("Last-Visits", in.id).dt;
  out.dt :2: "1900/01/01";
end;
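In Python terms, the transform above amounts to a keyed in-memory lookup with a default. The dictionary contents below are invented for the illustration; only the field names and the 1900/01/01 fallback come from the slide.

```python
# Hypothetical in-memory lookup keyed by id, standing in for the Last-Visits lookup file.
last_visits = {1001: {"dt": "2003/05/17"}}

def lookup_info(rec):
    """Copy id, city and amount; take dt from the lookup, else use the default date."""
    match = last_visits.get(rec["id"])
    return {
        "id": rec["id"],
        "city": rec["city"],
        "amount": rec["amount"],
        "dt": match["dt"] if match is not None else "1900/01/01",
    }

known = lookup_info({"id": 1001, "name": "John", "city": "Boston", "amount": 120})
unknown = lookup_info({"id": 1002, "name": "Sam", "city": "Austin", "amount": 75})
print(known)
print(unknown)
```

Note that the name field of the input is dropped, matching the output record format.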
The GDE Debugger
The GDE has a built-in debugger capability
To enable the Debugger: Debugger > Enable Debugger
The Debugger Toolbar: Enable Debugger, Add Watcher File, Isolate Components, Remove All Watchers
Multistage Transform
Data transformation in multiple stages following several sets of rules
Each set of rules forms one transform function
Information is passed across stages by temporary variables
Stages include initialization, iteration, finalization and more
A few multistage components are aggregate, rollup, scan
Aggregate/Rollup/Scan
Generates summary records for groups of input records
Database Components
* Join with DB
* Truncate Table
Deletes all the rows in a specified DB table
* Run SQL
Executes SQL statements in a DB
Built-in Functions
Ab Initio built-in functions are DML expressions that
can manipulate strings, dates, and numbers
access system properties
Function categories
Date functions: now(), today(), date_to_int(), ...
Inquiry and error functions: is_defined(), is_valid(), force_error(), ...
Lookup functions: lookup(), lookup_local(), ...
Math functions: ceiling(), floor(), ...
Miscellaneous functions: decimal_round(), hash_value(), ...
String functions: string_substring(), is_blank(), ...
Components (contd.)
Name                 Description
Normalize            Generates multiple data records from each input data record. Separates a data record with a vector field into several individual records, each containing one element of the vector.
Denormalize Sorted   Consolidates groups of related data records into a single output record with a vector field for each group. Requires grouped input.
Validate Records     Separates valid data records from invalid data records.
Check Order          Tests whether data records are sorted according to a key specifier.
Compare Records      Compares data records from two flows one by one.
Generate Records     Generates a specified number of data records with fields of specified lengths and types.
Gather Logs          Collects the output from the log ports of components for analysis of a graph after execution.
Sample               Selects a specified number of data records at random from one or multiple input flows.
Parallelism in Ab Initio
Parallelism is the mechanism by which some or all constituents of an application (datasets and processing modules) are replicated into a number of partitions, each spawning a process.
This lets Ab Initio process very large volumes (millions) of records while making optimum use of the available hardware.
The power of Ab Initio lies in the fact that it can process data in a parallel runtime environment.
Types of Parallelism
Component Parallelism
Pipeline Parallelism
Data Parallelism
Component Parallelism
Component parallelism is achieved when different instances of the same component run on separate data sets. Component parallelism scales with the number of branches of a graph: the more branches a graph has, the greater the component parallelism. If a graph has only one branch, component parallelism cannot occur.
Pipeline Parallelism
Pipeline parallelism occurs when several connected program components on the same branch of a graph execute simultaneously, so that successive processing stages of the graph run concurrently.
Data Parallelism
Data parallelism occurs when data is divided into segments or partitions and multiple instances of program components run simultaneously on each partition.
[Figure labels: Expanded View, Linear View]
Data Parallelism
Multifiles
A global view of a set of ordinary files called partitions, usually located on different disks or systems
Ab Initio provides shell-level utilities called m_ commands for handling multifiles (copy, delete, move, etc.)
Multifiles reside in multidirectories
Each is represented using URL notation with mfile as the protocol part:
mfile://pluto.us.com/usr/ed/mfs1/new.dat
A Multifile
A file spanning partitions on the same or different hosts
mfile://host1/u/jo/mfs/mydir/myfile.dat
Control Partition: //host1/u1/jo/mfs/mydir/myfile.dat
Data Partition on Host1: //host1/vol4/pA/mydir/myfile.dat
Data Partition on Host2: //host2/vol3/pB/mydir/myfile.dat
Data Partition on Host3: //host3/vol7/pC/mydir/myfile.dat
Data Partitioning Components
Data can be partitioned using
Partition by Round-robin
Partition by Key
Broadcast
Partition by Expression
Partition by Range
Partition by Percentage
Partition by Load Balance
Round-robin Partition
Writes records to each partition evenly
Block-size records go into one partition before moving on to the next.
[Figure: a stream of records (A through G) dealt in turn to Partition 0, Partition 1, and Partition 2]
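The round-robin rule, including the block-size behaviour, can be written in a few lines of Python. The function name and parameter names are hypothetical; the records are just letters, as in the figure.

```python
def partition_round_robin(records, n_partitions, block_size=1):
    """Send block_size consecutive records to one partition, then move to the next."""
    partitions = [[] for _ in range(n_partitions)]
    for position, rec in enumerate(records):
        target = (position // block_size) % n_partitions
        partitions[target].append(rec)
    return partitions

records = list("ABCDEFG")
one_at_a_time = partition_round_robin(records, 3)
in_blocks_of_two = partition_round_robin(records, 3, block_size=2)
print(one_at_a_time)
print(in_blocks_of_two)
```

Because the target depends only on a record's position, the partitions stay balanced regardless of the data values.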
A Data Parallel Application: The Global View
Partitioning by Key
[Figure: a stream of records (A through G) distributed across Partition 0, Partition 1, and Partition 2 so that equal key values land in the same partition]
A hash code computed from the key determines which partition a record is written to, meaning that records with the same key value go to the same partition.
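The hashing idea can be sketched in Python as below. The use of CRC32 as the hash function is an assumption chosen because it is stable across runs; the deck does not say which hash Ab Initio uses, and the function and field names are hypothetical.

```python
import zlib

def partition_by_key(records, key, n_partitions):
    """Place each record in the partition chosen by a hash of its key value."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        # CRC32 is an illustrative stand-in for the real hash function.
        hash_code = zlib.crc32(str(rec[key]).encode("utf-8"))
        partitions[hash_code % n_partitions].append(rec)
    return partitions

records = [{"k": letter} for letter in "BCDFCDBGBDFDEEAAAA"]
parts = partition_by_key(records, "k", 3)

# Every key value appears in exactly one partition.
homes = {}
for number, part in enumerate(parts):
    for rec in part:
        homes.setdefault(rec["k"], set()).add(number)
print(homes)
```

Unlike round-robin, this scheme can produce uneven partitions when some key values are much more frequent than others.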
Departitioning Components
Gather
Reads data records from the flows connected to the input port
Combines the records arbitrarily and writes to the output
Concatenate
Concatenate appends multiple flow partitions of data records one
after another
Merge
Combines data records from multiple flow partitions that have been
sorted on a key
Maintains the sort order
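The difference between Concatenate and Merge can be seen with Python's standard library, using three already-sorted partitions. This is an analogy only; Gather is not shown because its output order is arbitrary.

```python
import heapq
from itertools import chain

# Three flow partitions, each already sorted on the key.
p0 = [1, 4, 7]
p1 = [2, 5, 8]
p2 = [3, 6, 9]

# Concatenate: append the partitions one after another.
concatenated = list(chain(p0, p1, p2))

# Merge: interleave sorted partitions so the overall sort order is maintained.
merged = list(heapq.merge(p0, p1, p2))

print(concatenated)
print(merged)
```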
Factors: Phases & Checkpoints
Phasing:
Breaking an application into phases limits the contention for
Main memory.
Processor(s).
Breaking an application into phases costs
Disk space.
Checkpoint - Purpose:
Provides the same functionality as a phase
Additional: provides restart capability
How does it work?
At job start, output datasets are copied to temporary files (in .WORK-serial or .WORK-parallel directories)
At checkpoint completion, intermediate datasets and job state are stored in temporary files
Recovery information is stored in the host and vnode directories under AB_WORK_DIR, which is defined in the Ab Initio environment
AB_WORK_DIR
Directory dedicated to Co>Ops
Should have enough free space; cannot be NFS or NAS mounted
Holds internal log files (used in recovery of Ab Initio graphs)
Used when components are connected via named pipes
Sub-directories of AB_WORK_DIR
host    Holds control node recovery files
vnode   Holds processing node recovery files
data    Holds files for layouts
cache   Holds cache files needed by remote components
Important logging information is kept in the host and vnode directories
Usually does not hold data files.
For components with host layouts or database layouts, data is written to the data subdirectory
If AB_WORK_DIR fills up, Ab Initio jobs cannot be recovered.
Performance: Debugging Log file
A sample log file ..
Performance: Debugging Log file
Reading the Log: CPU
CPU time: total processing for component
Status: [Running : Finished]
Skew: among CPU times of each partition
Vertex: component
Reading the Log: DATA
Data bytes: # processed
Records: # processed
Status: [unopened : opened : closed]
Skew: among data bytes in partitions
Flow: link between components
Data tracking info is displayed on flows in GDE
Vertex: component
Port: of component
Interpreting the log
Compute data bytes/sec through a component, in each partition
Look for serialization: effective CPU = (CPU time) / (elapsed time)
Compare open vs. closed partitions: the graph is serialized when some partitions remain open long after others have closed (data skew)
Deadlock: no change in record counts over a couple of intervals
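Two of the checks above are simple arithmetic and can be sketched in Python. The function names, the sample numbers, and the choice of two unchanged intervals as the deadlock signal are assumptions made for illustration; real values would come from the job's log.

```python
def effective_cpu(cpu_seconds, elapsed_seconds):
    """Effective CPU = (CPU time) / (elapsed time); low values hint at serialization."""
    return cpu_seconds / elapsed_seconds

def looks_deadlocked(record_counts, intervals=2):
    """True when the record count has not changed over the last `intervals` samples."""
    recent = record_counts[-(intervals + 1):]
    return len(recent) == intervals + 1 and len(set(recent)) == 1

print(effective_cpu(cpu_seconds=30, elapsed_seconds=60))
print(looks_deadlocked([10, 50, 50, 50]))
print(looks_deadlocked([10, 20, 50, 50]))
```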
Performance: In a nutshell
Avoid Sorts, as they consume more memory.
Avoid components like Join with DB (which hit the database for each and every record).
Use Lookups.
Use in-memory Join/Rollup.
Assign the driving port of Join correctly.
Filter out unneeded data before processing.
Use phasing.
THANK YOU
