Ascential Products
Ascential DataStage
Ascential DataStage EE (3)
Ascential DataStage EE MVS
Ascential DataStage TX
Query statistics
ETL statistics
Source Information
Target Information
Source to Target mapping Information
Local containers. These are created within a job and are only accessible by that job. A local container is edited in a tabbed page of the job's Diagram window.
Shared containers. These are created separately and are stored in the Repository in the same way that jobs are. There are two types of shared container: server shared containers (for server jobs) and parallel shared containers (for parallel jobs).
What is a function? (Job Control / Examples of Transform Functions)
Functions take arguments and return a value.
DataStage BASIC functions: these functions can be used in a job control routine, which is defined as part of a job's properties and allows other jobs to be run and controlled from the first job. Some of the functions can also be used for getting status information on the current job; these are useful in active stage expressions and before- and after-stage subroutines. (A minimal sketch of such a routine follows the function list below.)
To do this ... use this function:
Specify the job you want to control - DSAttachJob
Set parameters for the job you want to control - DSSetParam
Set limits for the job you want to control - DSSetJobLimit
Request that the job is run - DSRunJob
Wait for a called job to finish - DSWaitForJob
DSGetLinkMetaData
DSGetProjectInfo
DSGetIPCStageProps
DSGetJobMetaBag
DSGetStagesOfType
DSGetStageTypes
DSGetLinkInfo
DSGetParamInfo
DSGetLogEntry
DSGetLogSummary
DSLogEvent
DSStopJob
DSDetachJob
DSLogFatal - Log a fatal error message in a job's log file and abort the job.
DSLogInfo
DSGetJobInfo
DSGetStageInfo
DSGetStageLinks
DSGetNewestLogId
DSLogToController
DSPrepareJob
DSSendMail
DSTransformError
DSTranslateCode
DSCheckRoutine
DSLogWarn
DSMakeJobReport
DSMakeMsg
DSWaitForFile
DSExecute
DSSetUserStatus
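A minimal job-control sketch using the functions above. The job name "LoadCustomers", the parameter "TargetDB" and its value are hypothetical, and error handling is reduced to a single status check:

* Attach to a job, set a parameter and a warning limit, run it and wait for it to finish.
hJob = DSAttachJob("LoadCustomers", DSJ.ERRFATAL)
ErrCode = DSSetParam(hJob, "TargetDB", "DWH_PROD")
ErrCode = DSSetJobLimit(hJob, DSJ.LIMITWARN, 50)   ;* abort the called job after 50 warnings
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)                       ;* block until the called job finishes
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Then
   Call DSLogFatal("LoadCustomers failed", "JobControl")
End
ErrCode = DSDetachJob(hJob)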
What are Routines?
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following programming components are classified as routines:
Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX (OLE) functions, Web Service routines.
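As an illustration, the body of a simple transform function with one argument (Arg1) could look like the sketch below; the purpose (cleaning a name field) is hypothetical, and the result is returned by assigning it to Ans:

* Return the argument trimmed of surrounding spaces and converted to upper case.
Ans = UPCASE(Trim(Arg1))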
Dimension Modelling types along with their significance
Data Modelling is broadly classified into 2 types:
A) E-R Diagrams (Entity-Relationship).
B) Dimensional Modelling.
Dimensional modelling is again subdivided into 2 types:
A) Star Schema - simple and much faster; denormalized form.
B) Snowflake Schema - complex with more granularity; more normalized form.
What is the flow of loading data into fact & dimension tables?
Fact table - a table with a collection of foreign keys corresponding to the primary keys in the dimension tables; it consists of fields with numeric values (measures).
Dimension table - a table with a unique primary key.
Load - data should first be loaded into the dimension tables; based on the primary key values in the dimension tables, the data is then loaded into the fact table.
What is the Hashed File stage and what is it used for?
The Hashed File stage stores data in hashed files, which are the usual way of holding reference (lookup) data in server jobs; they are very fast when looking up key-value pairs (see the hashed files question further below).
What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Use the crontab utility together with the dsjob command-line utility (or a shell script that calls it), passing the proper parameters.
Did you work in a UNIX environment?
Yes - this is one of the most important requirements.
How would you call an external Java function that is not supported by DataStage?
Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. Alternatively, we can use the command line to invoke the Java function, write the return values from the Java program (if any) to a file, and use that file as a source in a DataStage job.
How will you determine the sequence of jobs to load into the data warehouse?
First we execute the jobs that load the data into the dimension tables, then the fact tables, and then the aggregator tables (if any).
This raises another question: why do we have to load the dimension tables first and then the fact tables? Because as we load the dimension tables the (primary) keys are generated, and these keys are the foreign keys in the fact tables.
Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the DB, or does it do some kind of DELETE logic?
There is no TRUNCATE on ODBC stages; 'Clear the table' issues a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different in permissions (TRUNCATE requires you to have ALTER TABLE permissions, whereas DELETE doesn't).
How do you rename all of the jobs to support your new file-naming conventions?
Create an Excel spreadsheet with new and old names. Export the whole project as a .dsx file. Write a Perl program which does a simple rename of the strings, looking them up in the Excel file. Then import the new .dsx file, preferably into a new project for testing, and recompile all jobs. Be cautious that the names of the jobs also have to be changed in your job control jobs or sequencer jobs, so you have to make the necessary changes to those sequencers.
When should we use an ODS?
DWHs are typically read-only and batch-updated on a schedule, whereas ODSs are maintained closer to real time and trickle-fed constantly.
Les Barbusinski: Without getting into specifics, here are some differences you may want to explore with each vendor:
Does the tool use a relational or a proprietary database to store its metadata and scripts? If proprietary, why?
What add-ons are available for extracting data from industry-standard ERP, Accounting,
and CRM packages?
Can the tool's metadata be integrated with third-party data modeling and/or business intelligence tools? If so, how and with which ones?
How well does each tool handle complex transformations, and how much external
scripting is required?
Almost any ETL tool will look like any other on the surface. The trick is to find out which one will work best in your environment. The best way I've found to make this determination is to ascertain how successful each vendor's clients have been using their product - especially clients who closely resemble your shop in terms of size, industry, in-house skill sets, platforms, source systems, data volumes and transformation complexity.
A. Under Windows: use the 'Wait For File Activity' stage in a sequencer and then run the job. You can schedule the sequencer around the time the file is expected to arrive.
B. Under UNIX: poll for the file; once the file has arrived, start the job or sequencer that depends on the file.
What are Sequencers?
Sequencers are job control programs that execute other jobs with preset job parameters.
How did you handle an 'Aborted' sequencer?
In almost all cases we have to delete the data it inserted from the DB manually, fix the job, and then run the job again.
What is the difference between the Filter stage and the Switch stage?
There are two main differences, and probably some minor ones as well. The two main differences are as follows:
1) The Filter stage can send one input row to more than one output link; the Switch stage cannot (the C switch construct has an implicit break in every case).
2) The Switch stage is limited to 128 output links; the Filter stage can have a theoretically unlimited number of output links. (Note: this is not a challenge!)
How can I achieve constraint-based loading using DataStage 7.5? My target tables have interdependencies, i.e. primary key / foreign key constraints. I want my primary key tables to be loaded first and then my foreign key tables, and the primary key tables should also be committed before the foreign key tables are executed. How can I go about it?
1) Create a job sequencer to load your tables in sequential mode. In the sequencer, call all the primary-key-table loading jobs first, followed by the foreign-key-table jobs; trigger the foreign-key-table load jobs only when the primary-key load jobs have run successfully (i.e. use an OK trigger).
2) To improve the performance of the job, you can disable all the constraints on the tables and load them. Once loading is done, check the integrity of the data, raise whatever does not meet it as exceptional data, and cleanse it.
This is only a suggestion; normally, when loading with constraints enabled, performance goes down drastically.
Business advantages:
Technological advantages:
by a job sequencer
How to invoke a DataStage shell command?
DataStage shell (TCL) commands can be invoked from the DataStage Administrator (Projects tab -> Command button) or from a telnet client connected to the DataStage server. FILE.STAT and ANALYZE.FILE are examples of such commands.
DataStage user roles:
DataStage Developer - a user with full access to all areas of a DataStage project.
DataStage Operator - has privileges to run and manage deployed DataStage jobs.
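From within a BASIC routine, the DSExecute subroutine listed earlier can also run such engine (UV/TCL) or operating-system commands. A minimal sketch; the LIST DS_JOBS command used here is only an illustration (it lists the jobs in the current project):

* Run an engine (TCL) command from a BASIC routine and log its output.
* "UV" selects the engine shell; "SH" (UNIX) or "DOS" (Windows) would run an OS command instead.
Command = "LIST DS_JOBS"
Call DSExecute("UV", Command, Output, SystemReturnCode)
Call DSLogInfo("Command output: " : Output, "ShellDemo")
If SystemReturnCode <> 0 Then
   Call DSLogWarn("Command failed, return code " : SystemReturnCode, "ShellDemo")
End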
Is it possible to run two versions of DataStage on the same PC?
Yes, even though different versions of DataStage use different system DLL libraries.
To dynamically switch between DataStage versions, install and run the DataStage Multi-Client Manager. That application can unregister and register the system libraries used by DataStage.
Error in Link Collector - Stage does not support in-process active-to-active inputs or outputs
To get rid of the error, go to Job Properties -> Performance and select 'Enable row buffer'. Then select 'Inter process', which will let the Link Collector run correctly.
A buffer size of 128 KB should be fine; however, it is a good idea to increase the timeout.
The final warning text - the red 'fatal' message, which is included in the sequence abort message.
Design of a DataStage server job with an Oracle PL/SQL procedure call:
- a source ODBC stage which fetches one record from the database and maps it to one column - for example: select sysdate from dual;
- a Transformer which passes that record through; if required, add the PL/SQL procedure parameters as columns on the right-hand side of the transformer's mapping;
- a Stored Procedure (STP) stage as the destination: fill in the connection parameters, type in the procedure name and select Transform as the procedure type. In the Input tab select 'execute procedure for each row' (it will be run once).
Datastage routine which reads the first line from a text file
Note: work_dir and file1 are parameters passed to the routine.

* open the file
OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
   CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
END ELSE
   CALL DSLogInfo("Unable to open file", "JobControl")
   ABORT
END

* read the first record; warn if the file is empty
READSEQ FILE1.RECORD FROM H.FILE1 ELSE
   Call DSLogWarn("******************** File is empty", "JobControl")
END

firstline = Trim(FILE1.RECORD[1,32], " ", "A")  ;* keep at most the first 32 characters
Call DSLogInfo("******************** Record read: " : firstline, "JobControl")
CLOSESEQ H.FILE1
How to test a DataStage routine or transform?
To test a DataStage routine or transform, go to the DataStage Manager, navigate to Routines, select the routine you want to test and open it. First compile it and then click 'Test...', which will open a new window. Enter test parameters in the left-hand column and click 'Run all' to see the results.
DataStage will remember all the test arguments during future tests.
When should hashed files be used? What are the benefits of using them?
Hashed files are the best way to store data for lookups. They're very fast when looking up key-value pairs.
Hashed files are especially useful if they store dictionary-type information (customer details, countries, exchange rates). Stored this way, the data can be shared across the project and accessed from different jobs.
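Because server-job hashed files are ordinary UniVerse files, they can also be read directly from a BASIC routine. A minimal sketch, assuming a hashed file named CustomerLookup is visible in the project account (the file name and key are hypothetical):

* Look up a single key in a hashed file from a BASIC routine.
OPEN "CustomerLookup" TO H.LOOKUP THEN
   READ Rec FROM H.LOOKUP, "CUST001" THEN
      Call DSLogInfo("Found: " : Rec<1>, "LookupDemo")   ;* first field of the record
   END ELSE
      Call DSLogInfo("Key not found", "LookupDemo")
   END
   CLOSE H.LOOKUP
END ELSE
   Call DSLogWarn("Could not open hashed file", "LookupDemo")
END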
How to construct a container and deconstruct it or switch between local and shared?
To construct a container, go to DataStage Designer, select the stages that should be included in the container and from the main menu select Edit -> Construct Container, then choose between local and shared.
A local container will only be visible in the current job, while a shared one can be re-used. Shared containers can be viewed and edited in DataStage Manager under the 'Routines' menu.
Local DataStage containers can be converted at any time to shared containers in DataStage Designer by right-clicking on the container and selecting 'Convert to Shared'. In the same way a container can be converted back to local.
Clear the table then insert rows - deletes the contents of the table (DELETE statement) and
adds new rows (INSERT).
Truncate the table then insert rows - deletes the contents of the table (TRUNCATE statement)
and adds new rows (INSERT).
Insert rows without clearing - only adds new rows (INSERT statement).
Delete existing rows only - deletes matched rows (issues only the DELETE statement).
Replace existing rows completely - deletes the existing rows (DELETE statement), then adds
new rows (INSERT).
Update existing rows or insert new rows - updates existing data rows (UPDATE) or adds new rows (INSERT). An UPDATE is issued first and if it succeeds the INSERT is omitted.
Insert new rows or update existing rows - adds new rows (INSERT) or updates existing rows (UPDATE). An INSERT is issued first and if it succeeds the UPDATE is omitted.
ICONV and OCONV functions are quite often used to handle data in Datastage.
ICONV converts a string to an internal storage format and OCONV converts an expression to an
output format.
Syntax:
Iconv(string, conversion code)
Oconv(expression, conversion code)
Some useful iconv and oconv examples:
Iconv("10/14/06", "D2/") = 14167
Oconv(14167, "D-E") = "14-10-2006"
Oconv(14167, "D DMY[,A,]") = "14 OCTOBER 2006"
Oconv(12003005, "MD2$,") = "$120,030.05"
The following expression formats a number and rounds it to 2 decimal places:
Oconv(L01.TURNOVER_VALUE*100,"MD2")
Iconv and Oconv can be combined in one expression to reformat a date easily:
Oconv(Iconv("10/14/06", "D2/"),"D-E") = "14-10-2006"
ERROR 81021 Calling subroutine DSR_RECORD ACTION=2
The DataStage system help gives the following error description:
SYS.HELP. 081021
MESSAGE.. dsrpc: Error writing to Pipe.
The problem appears when a job sequence is used and it contains many stages (usually more than 10), and very often when a network connection is slow.
Basically, the cause of the problem is a failure in the communication between the DataStage client and the server.
The solution to the issue is:
Do not log in to DataStage Designer using the 'Omit' option on the login screen. Type in the username and password explicitly and the job should compile successfully.
If the above does not help, execute the DS.REINDEX ALL command from the DataStage shell.
How to check DataStage internal error descriptions
To check the description of an error number, go to the DataStage shell (from the Administrator or a telnet session to the server machine) and invoke the following command:
SELECT * FROM SYS.MESSAGE WHERE @ID='081021'; - where in this case the number 081021 is the error number.
The command will produce a brief error description which probably will not be helpful in resolving the issue, but it can be a good starting point for further analysis.
The problem appears when a project is moved from one environment to another (for example when deploying a project from a development environment to production).
The solution to the issue is:
Rebuild the repository index by executing the DS.REINDEX ALL command from the DataStage shell.
Datastage Designer hangs when editing job activity properties
The problem appears when running DataStage Designer under Windows XP after installing patches or Service Pack 2 for Windows. After opening a job sequence and navigating to the job activity properties window, the application freezes and the only way to close it is from the Windows Task Manager.
The solution to the problem is very simple: just download and install the XP SP2 patch for the DataStage client.
It can be found on the IBM client support site (login required):
https://www.ascential.com/eservice/public/welcome.do
Go to the software updates section and select an appropriate patch from the 'Recommended DataStage patches' section.
Sometimes users face problems when trying to log in (for example when the license doesn't cover IBM Active Support); in that case it may be necessary to contact IBM support, which can be reached at WDISupport@us.ibm.com.
1. Parallel processing
Datastage jobs are highly scalable due to the implementation of parallel processing. The EE
architecture is process-based (rather than thread processing), platform independent and uses the
processing node concept. Datastage EE is able to execute jobs on multiple CPUs (nodes) in
parallel and is fully scalable, which means that a properly designed job can run across resources
within a single machine or take advantage of parallel platforms like a cluster, GRID, or MPP
architecture (massively parallel processing).
o Parallel jobs are executable Datastage programs, managed and controlled by the Datastage Server runtime environment.
o Parallel jobs have a built-in mechanism for pipelining, partitioning and parallelism. In most cases no manual intervention is needed to implement these techniques optimally.
o Parallel jobs are a lot faster in ETL tasks such as sorting, filtering and aggregating.
2. Datastage EE jobs are compiled into OSH (Orchestrate Shell script language). OSH executes operators - instances of executable C++ classes, pre-built components representing stages used in Datastage jobs. Server jobs are compiled into BASIC, which is an interpreted pseudo-code. This is why parallel jobs run faster, even if processed on one CPU.
3. Datastage Enterprise Edition adds functionality to the traditional server stages, for instance record- and column-level format properties.
4. Datastage EE also brings completely new stages implementing the parallel concept, for example:
o Enterprise database connectors for Oracle, Teradata & DB2
o Development and debug stages - Peek, Column Generator, Row Generator, Head, Tail, Sample ...
o Data Set, File Set, Complex Flat File, Lookup File Set ...
o Join, Merge, Funnel, Copy, Modify, Remove Duplicates ...
5. When processing large data volumes, Datastage EE jobs are the right choice; however, when dealing with a smaller data environment, Server jobs might be easier to develop, understand and manage. When a company has both Server and Enterprise licenses, both types of jobs can be used.
6. Sequence jobs are the same in Datastage EE and Server editions.
Star Schema vs. Snowflake Schema:
Star Schema - denormalized data structure, no parent table, simple DB structure.
Snowflake Schema - normalized data structure, complicated DB structure.
CHMOD command?
Permissions
u - User who owns the file.
g - Group that owns the file.
o - Other.
a - All.
r - Read the file.
w - Write or edit the file.
x - Execute or run the file as a program.
Numeric permissions:
chmod permissions can also be specified using numeric values:
400 read by owner
040 read by group
004 read by anybody (other)
200 write by owner
020 write by group
002 write by anybody
100 execute by owner
010 execute by group
001 execute by anybody
What is the difference between IBM WebSphere DataStage 7.5 (Enterprise Edition) & standard Ascential DataStage 7.5?
IBM Information Server, also known as DS 8, has more features like QualityStage & MetaStage. It maintains its repository in DB2, unlike the file-based repository in 7.5. It also has a stage specifically for SCD type 1 & 2.
I think there is no version called 'standard Ascential DataStage 7.5'; I know only the advanced edition of DataStage, i.e. WebSphere DataStage and QualityStage, which was released by IBM itself and given the version 8.0.1. In this version there are only 3 client tools (Administrator, Designer, Director); the Manager has been removed and its functions (importing and exporting) are included in the Designer. They have also added some extra stages like the SCD stage, by using which we can implement SCD type 1 and type 2 directly, and there are some other advanced stages as well.
They have included QualityStage, which is used for data validation, something very important for a DWH. There are so many things available in QualityStage that we can think of it as a separate tool for a DWH.
What are the errors you experienced with DataStage?
In DataStage there are warnings and fatal errors that appear in the job log file.
***********************************************************
In Server we don't have an option to process the data on multiple nodes as we do in Parallel. In Parallel we have the advantage of processing the data with pipelining and partitioning, whereas we don't have any such concept in Server jobs.
There are a lot of differences in using the same stages in Server and Parallel. For example, in Parallel a Sequential File (or any other file) stage can have either an input link or an output link, but in Server it can have both (and more than one of each).
********************************************************************
Server jobs compile and run within the DataStage server, whereas parallel jobs compile and run within the DataStage UNIX server.
In Server jobs a stage must extract all the rows from the source before the next stage is activated and the rows are passed on towards the target level or DWH; this is time-consuming.
In parallel jobs there are two types of parallelism:
1. pipeline parallelism
2. partition parallelism
1. With pipeline parallelism, some rows can be extracted from the source while, at the same time, the next stage is already active and passing rows on towards the target level or DWH; only one node is maintained between source and target.
Used to change data types: if the source contains a varchar and the target contains an integer, then we have to use the Modify stage and change the type according to the requirement. We can also make some modifications to the length.
The Modify stage is used for the purpose of data type changes.
What is the difference between the Sequential File stage & the Data Set stage? When do you use them?
a) The Sequential File stage is used for sequential file formats, while a Data Set can hold data in any (random) format.
b) Parallel jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set. The Data Set stage allows you to store the data being operated on in a persistent form, which can then be used by other WebSphere DataStage jobs. Data sets are operating system files, each referred to by a control file, which by convention has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs. You can also manage data sets independently of a job using the Data Set Management utility, available from the WebSphere DataStage Designer or Director.
In a data set, data is stored in an internal (non-readable) format, i.e. we can view the data through the 'view data' facility available in DataStage, but it can't be viewed in Linux or from the back-end system. Sequential file data can be viewed anywhere. Extraction of data from a data set is much faster than from a sequential file.
How can we improve the performance of a job while handling huge amounts of data?
a) Minimize the Transformer stages. If the reference table has a huge amount of data then you can use a Join stage; if the reference table has a small amount of data then you can use a Lookup.
b) This requires job-level tuning or server-level tuning.
Job-level tuning:
use Join for huge amounts of data rather than Lookup;
use the Modify stage rather than a Transformer for simple transformations;
sort the data before a Remove Duplicates stage.
Server-level tuning:
this can only be done with adequate knowledge of the server-level parameters which can improve server execution performance.
How will you determine the sequence of jobs to load into the data warehouse?
First we execute the jobs that load the data into the dimension tables, then the fact tables, then the aggregator tables (if any).
The sequence of the jobs can also be determined by working out the parent-child relationships in the target tables to be loaded: a parent table always needs to be loaded before its child tables.
a) In the Designer palette under Development/Debug we can find the Head & Tail stages. By using these we can do ......
b) By using the Lookup stage and the Change Capture stage we can implement SCDs.
We have 3 types of SCDs:
type 1: maintains the current values only;
type 2: maintains both current and historical values;
type 3: maintains the current and partial historical values.
b) The data is extracted from the different source systems. After extraction the data is transferred to the staging layer for cleansing purposes (cleansing means LTRIM/RTRIM etc.). The data comes periodically to the staging layer. An ODS is used to store the recent data. An ODS and the staging area are the two kinds of layer between the source system and the target system. After
d)The following rules apply to the names that you can give DataStage jobs:
Job names can be any length.
They must begin with an alphabetic character.
They can contain alphanumeric characters and underscores.
Job category names can be any length and consist of any characters, including spaces
e) 1. System variables are built-in functions that can be called in a Transformer stage.
2. Containers are groups of stages and links; they are of 2 types, local containers and shared containers.
3. Using IPC, managing the array and transaction size; project tunables can be set through the Administrator.
4. Values that would be required during the job run.
5. Routines are what call the jobs or any actions to be performed using DS; Transforms are the manipulation of data during the load.
6.
7. Using a HASH FILE.
8. Using the target Oracle stages, depending on the update action.
9. Using a row id or sequence-generated numbers.
What are the difficulties faced in using DataStage? Or what are the constraints in using DataStage?
a) 1) What if the number of lookups is large?
2) What will happen if, while loading the data, the job aborts for some reason?
b) 1. I feel the most difficult part is understanding the DataStage Director job log error messages; they are not given in a properly readable form.
2. We don't have as many date functions available as in Informatica or traditional relational databases.
3. DataStage is a unique product in terms of function names; e.g., most databases or ETL tools use UPPER for converting from lower case to upper case, while DataStage uses UCASE. DataStage is peculiar when compared to other ETL tools.
Other than that, I don't see any issues with DataStage.
c) * The issue that I faced with DataStage is that it was very difficult to find the errors from the error code, since the error table did not specify the reason for the error. And as a fresher I did not know what the error codes stand for :)
* Another issue is that the help in DataStage was not of much use, since it was not specific and was too general.
* I do not know about other tools since this is the only tool that I have used until now. But it was simple to use, so I liked using it in spite of the above issues.
Have you ever been involved in updating DataStage versions such as DS 5.X? If so, tell us some of the steps you have taken.
A) Yes. The following are some of the steps I have taken in doing so:
1) Definitely take a backup of the whole project(s) by exporting each project as a .dsx file.
2) Make sure that you use the same parent folder for the new version as well, so that your old jobs which use hard-coded file paths keep working.
3) After installing the new version, import the old project(s); you will have to compile them all again. You can use the Compile All tool for this.
4) Make sure that all your DB DSNs are created with the same names as the old ones. This step is for moving DS from one machine to another.
5) In case you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a tool on the DS CD that can do this for you.
6) Do not stop the 6.0 server before the upgrade; the version 7.0 install process collects project information during the upgrade. There is NO rework (recompilation of existing jobs/routines) needed after the upgrade.
c) Then how about pipeline and partition parallelism - are they also 2 types of parallel processing?
d) 3 types of parallelism:
data parallelism
pipeline parallelism
round robin
e) There are two types of parallel processing: 1) SMP - Symmetrical Multi-Processing, 2) MPP - Massively Parallel Processing.
f) Parallel processing is of two types:
1) pipeline parallel processing
c) 1. Define the job parameter at the job level or project level.
2. Use the parameter as the file name in the stage (source, target or lookup).
3. Supply the file name at run time.
c) For every job in DataStage, a phantom is generated for the job as well as for every active stage which contributes to the job. These phantoms write logs regarding the stage/job. If any abnormality occurs, an error message is written, and these errors are called phantom errors. These logs are stored in the &PH& folder.
Phantoms can be killed through the DataStage Administrator or at server level.
What is the use of environment variables?
Environment variables are predefined variables that we can use while creating DataStage jobs. We can set them at project level or job level; once we set a variable it is available in the project.
b) Environment variables are the ones that set up the environment. Once you set these variables in DataStage you can use them in any job as a parameter. An example:
if you want to connect to a database you need a user id, password and schema. These are constant throughout the project, so they will be created as environment variables, and you use them wherever you want with #Var#.
By doing this, if there is any change in the password or schema there is no need to worry about all the jobs; change it at the level of the environment variable and that will take care of all the jobs.
Disadvantages of a staging area
a) I think the disadvantage of a staging area is disk space, as we have to dump data into a local area. As far as I know, there is no other disadvantage of a staging area.
b) Yes, it is a kind of disadvantage of the staging area that it takes more space in the database, and it may not be cost-effective for the client.
How can we remove duplicates using the Sort stage?
a) Set the 'Allow Duplicates' option to false.
b) In Java (outside DataStage), a TreeSet keeps its elements sorted and drops duplicates; assuming names is an existing String[] and java.util is imported:
TreeSet<String> set = new TreeSet<String>(Arrays.asList(names));
for (String name : set)
    System.out.println(name);
This is enough for sorting and removing duplicate elements (using Java 5 in this example).
What is the difference between RELEASE THE JOB and KILL THE JOB?
Releasing the job means releasing the job from any dependencies and running it.
Killing the job means killing the job that is currently running or scheduled to run.
Can you convert a snowflake schema into a star schema?
a) Yes, we can convert one by attaching one hierarchy to the lowest level of another hierarchy.
b) No, it is not possible.
What is a repository?
The repository resides in a specified database. It holds all the metadata, raw data and the respective mapping information.
The repository is content which holds all the metadata (information).
What is fact loading, and how do you do it?
a) Firstly you have to run the hashed-file jobs, secondly the dimension jobs and lastly the fact jobs.
b) Once we have loaded our dimensions, then as per the business requirements we identify the facts (columns or measures on which the business is measured) and load them into the fact tables.
What is the alternative way in which we can do job control?
a) Job control is possible through scripting; the control depends on the requirements and the needs of the job.
b) Job control can be done using:
DataStage job sequencers
DataStage custom routines
Scripting
Scheduling tools like Autosys
Where can we use the Link Partitioner, Link Collector & Inter Process (IPC) stages - in Server jobs or in Parallel jobs? And is SMP Parallel or Server?
You can use the Link Partitioner and Link Collector stages in server jobs to speed up processing. Suppose you have a source, a target and a Transformer in between that does some processing, applying functions etc. You can speed it up by using a Link Partitioner to split the data from the source into different links, apply the business logic, and then collect the data back using a Link Collector and pump it into the output.