edu/10664164/Data_Stage_Interview_Questions
http://www.redbooks.ibm.com/redbooks/pdfs/sg247830.pdf
What is the main difference between the Data Set and File Set stages?
A data set is an internal format of DataStage. The main points to consider about a data set before using it are:
1) It stores data in binary, in DataStage's internal format, so reading from and writing to a data set takes less time than any other source/target.
2) It preserves the partitioning scheme, so you don't have to partition the data again.
3) You cannot view the data without DataStage.
Now, about the file set:
1) It stores data in a format similar to a sequential file.
2) The only advantage of using a file set over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but in the order defined by the partitioning scheme.
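As a rough illustration of how a file set can keep its partitioning while staying human-readable, here is a minimal Python sketch. It assumes a file set can be modeled as one plain-text (CSV) file per partition plus a small descriptor listing those files. The helper names (`hash_partition`, `write_fileset`, `read_fileset`), the file naming and the CSV layout are all hypothetical; DataStage's own descriptor format is not reproduced here.

```python
import csv
import os
import tempfile
import zlib


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition using a stable hash of its key column."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() on str.
        idx = zlib.crc32(str(row[key]).encode()) % num_partitions
        parts[idx].append(row)
    return parts


def write_fileset(directory, name, partitions, fieldnames):
    """Write one readable CSV file per partition plus a descriptor listing them."""
    data_files = []
    for i, part in enumerate(partitions):
        path = os.path.join(directory, f"{name}.part{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)
        data_files.append(path)
    descriptor = os.path.join(directory, f"{name}.fs")
    with open(descriptor, "w") as f:
        f.write("\n".join(data_files))
    return descriptor


def read_fileset(descriptor):
    """Read the partitions back in descriptor order, keeping them separate."""
    with open(descriptor) as f:
        data_files = f.read().splitlines()
    partitions = []
    for path in data_files:
        with open(path, newline="") as f:
            partitions.append(list(csv.DictReader(f)))
    return partitions


if __name__ == "__main__":
    rows = [{"id": str(i), "name": f"n{i}"} for i in range(10)]
    with tempfile.TemporaryDirectory() as d:
        parts = hash_partition(rows, "id", 3)
        fs = write_fileset(d, "customers", parts, ["id", "name"])
        restored = read_fileset(fs)
        print([len(p) for p in restored])
```

Because each partition is written to its own file, reading the descriptor returns the same partitions with no repartitioning step. Each file is still ordinary text, which matches the point that the data is viewable in partition order.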
What is the difference between the Join, Lookup and Merge stages? How do they react if duplicate records come in on the input links?
Join Stage:
1.) It has n input links (one primary link and the remaining secondary links), one output link and no reject link.
2.) It supports 4 join operations: inner join, left outer join, right outer join and full outer join.
3.) Join uses less memory, hence performance is high in the Join stage.
4.) The default partitioning technique here is hash partitioning.
5.) A prerequisite for the join is that the data must be sorted on the join keys before the join operation is performed.
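Both the low memory use and the sorted-input requirement follow from a sort-merge join. With both inputs sorted on the key, the inputs can be walked in step, buffering only the rows for the current key value. Below is a minimal Python sketch of that technique for an inner join. It assumes both inputs are lists of dicts already sorted ascending on a hypothetical column named by `key`. It is an illustration of the general technique, not DataStage's implementation. In this sketch, duplicate keys on both sides produce every left/right pairing.

```python
from itertools import groupby
from operator import itemgetter


def sort_merge_inner_join(left, right, key):
    """Inner-join two lists of dicts that are both sorted ascending on `key`.

    Only the rows sharing the current key value are buffered at any time,
    which is why a sort-merge join needs little memory.
    """
    get = itemgetter(key)
    left_groups = groupby(left, key=get)
    right_groups = groupby(right, key=get)
    result = []
    lk, lrows = next(left_groups, (None, None))
    rk, rrows = next(right_groups, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            # Left key has no partner on the right: advance the left input.
            lk, lrows = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no partner on the left: advance the right input.
            rk, rrows = next(right_groups, (None, None))
        else:
            lbuf, rbuf = list(lrows), list(rrows)
            # Duplicate keys on both sides yield every left/right pairing.
            for lrow in lbuf:
                for rrow in rbuf:
                    result.append({**lrow, **rrow})
            lk, lrows = next(left_groups, (None, None))
            rk, rrows = next(right_groups, (None, None))
    return result
```

Unsorted input would make the `lk < rk` / `lk > rk` comparisons skip matching rows. This is why sorting on the join key is a prerequisite rather than an optimization.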
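Lookup Stage:
1.) It has n input links (one primary link and n-1 reference links), one output link and 1 reject link.
2.) It can perform only 2 join operations: inner join and left outer join.
3.) Lookup uses more memory because the reference data is held in memory, hence performance drops as the reference data grows.
4.) The default partitioning technique here (on the reference links) is Entire.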
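By contrast, a lookup builds an in-memory index of the reference data and probes it once per primary row. This is why it needs memory proportional to the reference input but no sorting. The following is a minimal Python sketch, assuming a single reference link held in a dict. The `on_failure` parameter and its values are hypothetical, loosely modeled on the idea of choosing what happens to unmatched rows: pass them through (left outer join), drop them (inner join) or send them to a reject link. For illustration only, this sketch keeps the first reference row when a key is duplicated.

```python
def hash_lookup(primary, reference, key, on_failure="continue"):
    """Look up each primary row in an in-memory index built from `reference`.

    on_failure (hypothetical parameter):
      "continue" -> unmatched primary rows pass through (left outer join)
      "drop"     -> unmatched primary rows are discarded (inner join)
      "reject"   -> unmatched primary rows go to a separate reject list
    """
    index = {}
    for row in reference:
        # The whole reference input is held in memory; keep the first row per key.
        index.setdefault(row[key], row)
    output, rejects = [], []
    for row in primary:
        match = index.get(row[key])
        if match is not None:
            output.append({**row, **match})
        elif on_failure == "continue":
            output.append(dict(row))
        elif on_failure == "reject":
            rejects.append(row)
        elif on_failure != "drop":
            raise ValueError(f"unknown on_failure mode: {on_failure}")
    return output, rejects
```

The primary rows can arrive in any order. Only the reference side has to fit in memory, which matches the trade-off described above.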
Merge Stage:
1.) It has n input links (one master link and n-1 update links), one output link and up to n-1 reject links (one per update link).
2.) It can also perform 2 join operations: inner join and left outer join.
3.) Hash partitioning is used by default.
4.) It uses very little memory, hence performance is high.
5.) Sorted data on the master and update links is mandatory.
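The master/update model, with one reject list per update link, can be sketched in the same style. This minimal Python sketch assumes each input is a list of dicts with unique keys within each input. The `keep_unmatched_masters` flag is hypothetical and stands in for the choice between left outer join (keep) and inner join (drop). In this sketch a master row counts as matched if any update row matches it. For brevity it indexes the update rows in dicts instead of walking the sorted inputs in step. It therefore shows the link semantics, not the low-memory behaviour.

```python
def merge_stage(master, updates, key, keep_unmatched_masters=True):
    """Combine a master input with several update inputs on `key`.

    Returns (output, rejects) where rejects[i] holds the rows of updates[i]
    that matched no master row. Keys are assumed unique within each input.
    """
    update_index = [{row[key]: row for row in upd} for upd in updates]
    master_keys = {row[key] for row in master}
    output = []
    for m in master:
        merged = dict(m)
        matched = False
        for idx in update_index:
            u = idx.get(m[key])
            if u is not None:
                merged.update(u)
                matched = True
        # True behaves like a left outer join, False like an inner join.
        if matched or keep_unmatched_masters:
            output.append(merged)
    # One reject list per update link: update rows with no master row.
    rejects = [[row for row in upd if row[key] not in master_keys] for upd in updates]
    return output, rejects
```

With n-1 update inputs the function returns n-1 reject lists. This mirrors the link counts listed above.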
How many reject links can I give in the Merge stage?
As many as there are update links: one reject link per update link (n-1 for n input links).
In the Join stage, if one input has col1, col2, col3 and the other has col4, col5, col6, how do you join them and perform a left outer join?
Constraint - Conditions that are either true or false and that specify the flow of data along a link.
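In practice, a link condition of this kind acts as a per-link filter: each row is tested against every output link's condition and written to the links whose condition is true. This is a minimal Python sketch, assuming conditions are plain Python predicates. The link names and the `fallback` link, which collects rows that no condition accepted, are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def route_rows(rows: List[Row], constraints: Dict[str, Callable[[Row], bool]],
               fallback: str = "otherwise") -> Dict[str, List[Row]]:
    """Send each row down every link whose condition evaluates to True.

    Rows that satisfy no condition go to the hypothetical `fallback` link.
    """
    links: Dict[str, List[Row]] = {name: [] for name in constraints}
    links[fallback] = []
    for row in rows:
        passed = False
        for name, condition in constraints.items():
            if condition(row):
                links[name].append(row)
                passed = True
        if not passed:
            links[fallback].append(row)
    return links
```

Conditions are evaluated independently, so a row can flow down more than one link when several conditions are true.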
6) Containers: Usage and Types?
A container is a collection of stages used for reusability.
There are 2 types of containers: a) Local container: job-specific. b) Shared container: can be used in any job within a project.
There are two types of shared containers:
1. Server shared container. Used in server jobs (can also be used in parallel jobs).
2. Parallel shared container. Used in parallel jobs. You can also include server shared containers in parallel jobs as a way of incorporating server job functionality into a parallel stage (for example, you could use one to make a server plug-in stage available to a parallel job).
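In programming terms, a container is close to a reusable function that wraps a fixed chain of stages. A local container resembles a helper defined inside one job, and a shared container resembles one imported from a project-wide module. The following is a minimal Python analogy. The stage, container and job names are hypothetical, and this is not how DataStage represents containers internally.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, object]
Stage = Callable[[List[Row]], List[Row]]


def make_container(*stages: Stage) -> Stage:
    """Bundle several stages into one unit that runs them in order."""
    def container(rows: List[Row]) -> List[Row]:
        # Feed the output of each stage into the next one.
        return reduce(lambda data, stage: stage(data), stages, rows)
    return container


def trim_names(rows: List[Row]) -> List[Row]:
    """Hypothetical stage: strip surrounding whitespace from the name column."""
    return [{**r, "name": str(r["name"]).strip()} for r in rows]


def drop_empty_names(rows: List[Row]) -> List[Row]:
    """Hypothetical stage: discard rows whose name is empty."""
    return [r for r in rows if r["name"]]


# Defined once, like a shared container, and reused by two different jobs.
clean_customers = make_container(trim_names, drop_empty_names)


def job_a(rows: List[Row]) -> List[Row]:
    return clean_customers(rows)


def job_b(rows: List[Row]) -> List[str]:
    return [str(r["name"]).upper() for r in clean_customers(rows)]
```

Changing `clean_customers` in one place changes every job that uses it. This is the reusability benefit containers are meant to give.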
7) Where does DataStage store its repository?
Mostly in SQL Server and Oracle.
8) What is a surrogate key?
9) What are routines?
10) What are job parameters?
11) What is the DataStage architecture?
12) What is the Orabulk (Oracle bulk load) stage?
This stage is used to bulk load the Oracle target table.
13)