1) What is a fact table?

A) A fact table holds the main data. It is the central table in a star schema.
A fact table typically has two types of columns: those that contain facts and
those that are foreign keys to dimension tables.
2) What is a dimension table?
A) Dimension tables are master tables that describe the transactions held in
the fact table. In a star schema they are typically denormalized (they are
normalized only in a snowflake schema). They have simple primary keys.
3) What are conformed dimensions?
A) A conformed dimension is a dimension that can be used across multiple data
marts.
4) What are slowly changing dimensions?
A) The definition of a slowly changing dimension is in its name: it is a
dimension that changes slowly over time. A customer dimension table
represents customers. When a customer is created, the normal assumption is
that the record is independent of time. But what if the customer's address or
name changes?

This change can be implemented in three ways:

Type I: Replace the old record with a new record holding the updated data. We
thereby lose the history. A data warehouse is responsible for tracking history
effectively, and this is where a Type I implementation fails.

Type II: Create an additional dimension table record with the new value. This
way we keep the history. We can determine which record is current by adding a
current-record flag or a timestamp to the dimension row.

Type III: In this type of implementation we create a new field in the dimension
table that stores the old value of the attribute. When an attribute of the
dimension changes, we push the updated value to the current field and the old
value to the old field.
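
The three types above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under assumptions: the table customer_dim, its columns, and the sample cities are hypothetical, and a real warehouse would do this in its own database and ETL tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT,     -- natural (business) key
        city         TEXT,     -- current value of the tracked attribute
        prev_city    TEXT,     -- Type III: previous value
        valid_from   TEXT,     -- Type II: date this row became valid
        is_current   INTEGER   -- Type II: current-record flag
    )""")
conn.execute(
    "INSERT INTO customer_dim (customer_id, city, valid_from, is_current) "
    "VALUES ('C1', 'Pune', '2020-01-01', 1)")

def scd_type1(customer_id, new_city):
    """Type I: overwrite in place, so the old value is lost."""
    conn.execute(
        "UPDATE customer_dim SET city = ? "
        "WHERE customer_id = ? AND is_current = 1", (new_city, customer_id))

def scd_type2(customer_id, new_city, change_date):
    """Type II: expire the current row and insert a new current row."""
    conn.execute(
        "UPDATE customer_dim SET is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1", (customer_id,))
    conn.execute(
        "INSERT INTO customer_dim (customer_id, city, valid_from, is_current) "
        "VALUES (?, ?, ?, 1)", (customer_id, new_city, change_date))

def scd_type3(customer_id, new_city):
    """Type III: keep the old value in a separate column, then overwrite."""
    conn.execute(
        "UPDATE customer_dim SET prev_city = city, city = ? "
        "WHERE customer_id = ? AND is_current = 1", (new_city, customer_id))

scd_type1("C1", "Mumbai")               # 'Pune' is gone for good
scd_type2("C1", "Delhi", "2021-06-01")  # a second row is added
scd_type3("C1", "Chennai")              # 'Delhi' moves to prev_city

rows = conn.execute(
    "SELECT customer_key, city, prev_city, is_current "
    "FROM customer_dim ORDER BY customer_key").fetchall()
```

After these calls the table holds two rows for the same customer: the expired row and the current one, which also carries its previous city.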

5) What is a star schema?

A) The star schema is the simplest style of data warehouse schema. It
consists of a few fact tables (possibly only one) referencing any number
of dimension tables.
E.g., a Sales fact table with Date, Store, and Product as dimensions.
6) What is a snowflake schema?
A) A snowflake schema is a logical arrangement of tables in a database such that
the entity-relationship diagram resembles a snowflake in shape. Closely related to
the star schema, the snowflake schema is represented by centralized fact
tables connected to multiple dimensions, which are themselves normalized into
multiple related tables.

E.g., a Sales fact table with Date, Store, and Product as dimensions, where a
dimension such as Product is split further into related tables.
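
A minimal sketch of the Sales example as a star schema, written in Python with the standard sqlite3 module. All table names, column names, and figures are hypothetical; the point is only that the fact table holds foreign keys plus a measure, and a report joins it to the dimensions it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per date, store, or product.
    CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);

    -- Fact table: foreign keys to each dimension plus the measure.
    CREATE TABLE sales_fact (
        date_key    INTEGER REFERENCES date_dim(date_key),
        store_key   INTEGER REFERENCES store_dim(store_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        amount      REAL
    );

    INSERT INTO date_dim    VALUES (1, 2023), (2, 2024);
    INSERT INTO store_dim   VALUES (1, 'North'), (2, 'South');
    INSERT INTO product_dim VALUES (1, 'Pen'), (2, 'Book');
    INSERT INTO sales_fact  VALUES (1, 1, 1, 10.0), (1, 2, 2, 20.0),
                                   (2, 1, 2, 30.0), (2, 1, 1, 5.0);
""")

# Total sales per year and store: join the fact table to two dimensions.
sales_by_year_store = conn.execute("""
    SELECT d.calendar_year, s.store_name, SUM(f.amount)
    FROM sales_fact f
    JOIN date_dim  d ON d.date_key  = f.date_key
    JOIN store_dim s ON s.store_key = f.store_key
    GROUP BY d.calendar_year, s.store_name
    ORDER BY d.calendar_year, s.store_name
""").fetchall()
```

In a snowflake variant, product_dim would carry a key to a further table (for example a product category table), adding one more join to the query.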


7) What are views?
A) Views
A view takes the output of a query and makes it appear like a virtual table. You
can use a view in most places where a table can be used.
All operations performed on a view affect data in the base table and so are
subject to the integrity constraints and triggers of the base table.
A view can be used to simplify SQL statements for the user or to isolate an
application from any future change to the base table definition. A view can also
be used to improve security by restricting access to a predetermined set of rows
or columns.
In addition to operating on base tables, one view can be based on another, and
a view can join a view with a table or combine them using GROUP BY or UNION.
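
As a small illustration, the following Python sketch uses the standard sqlite3 module to create a view that exposes only some rows and columns of a base table. The table emp, the view it_staff, and the data are hypothetical. Note that SQLite views are read-only, so the sketch changes the base table and reads through the view; databases such as Oracle also allow updates through suitable views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO emp VALUES (1, 'Asha', 'HR', 500.0),
                           (2, 'Ravi', 'IT', 900.0),
                           (3, 'Mina', 'IT', 700.0);

    -- The view hides the salary column and shows only IT rows.
    CREATE VIEW it_staff AS
        SELECT emp_id, name FROM emp WHERE dept = 'IT';
""")

before = conn.execute("SELECT name FROM it_staff ORDER BY emp_id").fetchall()

# The view stores no data of its own: a change to the base table
# is visible the next time the view is queried.
conn.execute("INSERT INTO emp VALUES (4, 'Joel', 'IT', 650.0)")
after = conn.execute("SELECT name FROM it_staff ORDER BY emp_id").fetchall()
```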

8) What is a materialized view?

A) Materialized Views
Materialized views are schema objects that can be used to summarize, replicate,
and distribute data, e.g. to construct a data warehouse.
A materialized view provides indirect access to table data by storing the results of
a query in a separate schema object, unlike an ordinary view, which does not take
up any storage space or contain any data.
You can define a materialized view on a base table, partitioned table, or view, and
you can define indexes on a materialized view.
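
The idea of stored query results can be imitated in Python with the standard sqlite3 module. SQLite has no materialized views, so this sketch simulates one with an ordinary table that is rebuilt on demand; the names sales, sales_summary_mv, and refresh_sales_summary are hypothetical, and a real Oracle materialized view is created and refreshed by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 10.0), ('East', 15.0), ('West', 7.0);
""")

def refresh_sales_summary():
    """Rebuild the stored summary (a stand-in for a complete refresh)."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_summary_mv;
        CREATE TABLE sales_summary_mv AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
        -- Because the results are stored, they can be indexed.
        CREATE INDEX sales_summary_mv_idx ON sales_summary_mv(region);
    """)

def read_summary():
    return conn.execute(
        "SELECT region, total FROM sales_summary_mv ORDER BY region").fetchall()

refresh_sales_summary()
conn.execute("INSERT INTO sales VALUES ('West', 3.0)")
stale = read_summary()   # stored results do not see the new row yet
refresh_sales_summary()
fresh = read_summary()   # after a refresh the new row is included
```

The difference from an ordinary view is visible in the stale result: the summary occupies storage and only changes when it is refreshed.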

9) What is a surrogate key?

A) A surrogate key is a system-generated artificial primary key value. It is used to
uniquely identify a record in a dimension table.
Or
When creating a dimension table in a DWH, we generally create the table with a
system-generated key to uniquely identify each row in the dimension. This key is also
known as a surrogate key. The surrogate key is used as the primary key in the
dimension table. The surrogate key is also placed in the fact table, and a
foreign key is defined between the two tables. When you ultimately join the
data, it joins just like any other join within the database.
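
A minimal sketch of this pattern in Python with the standard sqlite3 module. The tables product_dim and sales_fact, the helper load_sale, and the product codes are hypothetical; the database generates the surrogate key, and the fact row stores that key rather than the business code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        product_code TEXT UNIQUE,                        -- natural key from the source
        product_name TEXT
    );
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        quantity    INTEGER
    );
""")

def load_sale(product_code, product_name, quantity):
    """Look up (or create) the dimension row, then insert the fact row."""
    row = conn.execute(
        "SELECT product_key FROM product_dim WHERE product_code = ?",
        (product_code,)).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO product_dim (product_code, product_name) VALUES (?, ?)",
            (product_code, product_name))
        key = cur.lastrowid          # the system-generated key
    else:
        key = row[0]
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (key, quantity))
    return key

k1 = load_sale("P-100", "Pen", 3)
k2 = load_sale("P-200", "Book", 1)
k3 = load_sale("P-100", "Pen", 5)    # reuses the existing surrogate key

# Reporting joins fact to dimension on the surrogate key like any other join.
totals = conn.execute("""
    SELECT p.product_code, SUM(f.quantity)
    FROM sales_fact f JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.product_code ORDER BY p.product_code
""").fetchall()
```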

10) What are active stages and passive stages?

Ans) Stages that perform transformation are known as active stages,
e.g. Transformer, Sort, Filter, Surrogate Key Generator, etc.
Stages that extract or load records are known as passive stages,
e.g. Sequential File, Data Set, Dynamic RDBMS, etc.

11) What are the differences between a sequential file and a hashed file?

Ans) 1. A hashed file can be used as a lookup, but a sequential file cannot.
2. A hashed file works on a hashing algorithm.
3. Performance is better with a hashed file when it is used as a reference link (for lookup).
4. In server jobs, duplicates can be eliminated by selecting the key column of a hashed file.
5. At least one key must be specified for a hashed file.
6. There is a limit of 2GB in a sequential file.
7. A hashed file can be held in DS memory (buffer), but a sequential file cannot. Duplicates are
removed in a hashed file, i.e. a hashed file holds no duplicate keys.
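
The lookup and duplicate behaviour in the list above can be pictured with a short Python analogy. A Python dict stands in for the hashed file and a plain list for the sequential file; this is not how DataStage implements either, and the customer rows are hypothetical.

```python
# Rows as they might arrive from a sequential (flat) file:
# no key is enforced, so the same customer can appear twice.
sequential_rows = [
    ("C1", "Pune"),
    ("C2", "Delhi"),
    ("C1", "Mumbai"),   # same key again
]

def sequential_lookup(rows, key):
    """Scan row by row and return the first match (slow for large files)."""
    for row_key, value in rows:
        if row_key == key:
            return value
    return None

def build_hashed(rows):
    """Write rows into a keyed store: a later row with the same key
    overwrites the earlier one, so no duplicate keys remain."""
    hashed = {}
    for row_key, value in rows:
        hashed[row_key] = value
    return hashed

hashed_file = build_hashed(sequential_rows)

first_match = sequential_lookup(sequential_rows, "C1")   # full scan
keyed_match = hashed_file["C1"]                          # direct access by key
```

The keyed store answers a lookup without scanning every row, which is the reason a hashed file suits reference links, and writing by key is what removes the duplicates.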

12) How do you populate time dimensions?

Ans) An Oracle stored procedure is commonly used to populate the time dimension.

Example:

CREATE OR REPLACE PROCEDURE TIMEDIMBUILD
  (p_start_date IN DATE,
   p_end_date   IN DATE)
AS
  v_full_date       DATE;
  v_day_of_month    NUMBER;
  v_day_of_year     NUMBER;
  v_day_full_name   VARCHAR2(30);
  v_week_number     NUMBER;
  v_month_full_name VARCHAR2(10);
  v_month_number    NUMBER;
  v_calendar_year   NUMBER;
  v_quarter         NUMBER;
  v_key             NUMBER;
BEGIN
  -- Rebuild the dimension from scratch on every run.
  DELETE FROM TimeDim;
  v_full_date := p_start_date;
  v_key := 1;
  -- One row per calendar day; p_end_date itself is not included.
  WHILE v_full_date < p_end_date LOOP
    -- Derive every attribute from the current loop date, not the start date.
    v_day_of_month    := TO_NUMBER(TO_CHAR(v_full_date, 'DD'));
    v_day_of_year     := TO_NUMBER(TO_CHAR(v_full_date, 'DDD'));
    v_day_full_name   := UPPER(TO_CHAR(v_full_date, 'DAY'));
    v_week_number     := TO_NUMBER(TO_CHAR(v_full_date, 'WW'));
    v_month_full_name := UPPER(TO_CHAR(v_full_date, 'MONTH'));
    v_month_number    := TO_NUMBER(TO_CHAR(v_full_date, 'MM'));
    v_calendar_year   := TO_NUMBER(TO_CHAR(v_full_date, 'YYYY'));
    v_quarter         := TO_NUMBER(TO_CHAR(v_full_date, 'Q'));
    INSERT INTO TimeDim
      (TimeKey, FullDateCode, DayOfMonth, DayOfYear,
       DayFullName, WeekNumber, MonthFullName,
       MonthNumber, Quarter, CalendarYear)
    VALUES
      (v_key, v_full_date, v_day_of_month, v_day_of_year,
       v_day_full_name, v_week_number, v_month_full_name,
       v_month_number, v_quarter, v_calendar_year);
    -- Move to the next day and the next surrogate key.
    v_full_date := v_full_date + 1;
    v_key := v_key + 1;
  END LOOP;
END;

13) Can we use a target hashed file as a lookup?

14) What are the stages in a job sequencer?

Ans) You can create a job sequence in the Designer client. When you choose to create a new job,
as you would for a server or parallel (PX) job in DataStage, the window that opens also offers
the option to create a job sequence.

A job sequence is used to run multiple jobs, server or parallel, based on their dependencies.

15) a) How do we pass parameters?

Ans) There is an icon in the toolbar that opens Job Parameters, or you can press Ctrl+J to open
the Job Parameters dialog box. There, give each parameter a name and a corresponding default
value. You are then prompted for the value when you run the job, so it is not always necessary
to open the job to change a parameter value. When the job runs from a script, it is enough to
give the parameter value on the command line of the script.
Otherwise you would have to change the value in the job, compile it, and then run it from the
script. Parameters therefore make jobs easier for users to handle.

b) How do we pass parameters in a sequence?

Ans) Yes, you can pass parameters. There is an option called "User Variables"
in a job sequence. Here you can declare your parameters. These are available
throughout the job sequence.

16) What are log tables?

17) What is job control?

Ans) Controlling DataStage jobs through other DataStage jobs. Example: consider
two jobs, XXX and YYY. Job YYY can be executed from job XXX by using
DataStage macros in routines.
To execute one job from another, the following steps need to be followed in
routines:
1. Attach the job using the DSAttachJob function.
2. Run the other job using the DSRunJob function.
3. Stop the job using the DSStopJob function.

18) What are hierarchies? Explain.

Ans) a) A hierarchy defines relationships among a set of attributes that are
grouped by levels in the dimension of a cube model.
b) Each dimension in a data warehouse may have one or more hierarchies
applied to it. For the "Date" dimension, there are several possible hierarchies:
"Day > Month > Year", "Day > Week > Year", "Day > Month > Quarter > Year",
etc.

[Figure: example of a balanced hierarchy]

[Figure: example of an unbalanced hierarchy]


19) What are initial loading and incremental loading?

Ans) Initial loading (or full load) is the load of the entire data set, which takes place the
very first time.
Incremental loading: the delta, i.e. the difference between target and source
data, is loaded at regular intervals. The timestamp of the previous delta load has to be
maintained.
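
The role of the stored timestamp can be sketched in plain Python. Everything here is hypothetical (the row layout, the updated_at column, the state dictionary that holds the watermark); it only illustrates picking up rows changed since the previous load.

```python
# Source rows carry a last-modified timestamp (ISO strings sort correctly).
source_rows = [
    {"id": 1, "name": "Asha", "updated_at": "2024-01-01T10:00:00"},
    {"id": 2, "name": "Ravi", "updated_at": "2024-01-02T09:30:00"},
]
target = {}                      # stand-in for the warehouse table
state = {"last_load_ts": ""}     # empty watermark: nothing loaded yet

def incremental_load():
    """Load only rows changed since the previous run, then move the watermark."""
    watermark = state["last_load_ts"]
    delta = [r for r in source_rows if r["updated_at"] > watermark]
    for r in delta:
        target[r["id"]] = r["name"]          # insert or update
    if delta:
        state["last_load_ts"] = max(r["updated_at"] for r in delta)
    return len(delta)

first = incremental_load()       # with an empty watermark this is the full load

# Later: one new row arrives and one existing row changes in the source.
source_rows.append({"id": 3, "name": "Mina", "updated_at": "2024-01-03T08:00:00"})
source_rows[0] = {"id": 1, "name": "Asha K", "updated_at": "2024-01-03T12:00:00"}

second = incremental_load()      # only the delta is processed
third = incremental_load()       # nothing changed since the last run
```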

20) How do you develop SCD Type 2 in your project? (Very important)

Ans) See http://etl-tools.info/en/datastage-tutorial-L008_scd-implementation-datastage.htm

21) What is the difference between UNION and UNION ALL?

Ans) UNION returns only distinct rows, while UNION ALL returns all rows, including
duplicates.
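
A quick way to see the difference is to run both operators on the same data. The sketch below uses Python's standard sqlite3 module with two hypothetical single-column tables, a and b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (val INTEGER);
    CREATE TABLE b (val INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicates across (and within) both inputs.
union_rows = conn.execute(
    "SELECT val FROM a UNION SELECT val FROM b ORDER BY val").fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT val FROM a UNION ALL SELECT val FROM b ORDER BY val").fetchall()
```

Because UNION has to detect duplicates, UNION ALL is usually the cheaper choice when duplicates are impossible or acceptable.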

22) What is the MINUS operator?

Ans) The SQL MINUS operator works on two table expressions. The result set takes the records from the
first table expression and then subtracts the ones that also appear in the second table expression.
Records that appear in the second table expression but not in the first are ignored.
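
The same behaviour can be tried in Python with the standard sqlite3 module. One assumption to note: SQLite spells this operator EXCEPT rather than Oracle's MINUS, so the sketch uses EXCEPT; the tables first_t and second_t are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_t  (val INTEGER);
    CREATE TABLE second_t (val INTEGER);
    INSERT INTO first_t  VALUES (1), (2), (3);
    INSERT INTO second_t VALUES (2), (4);
""")

# Rows of first_t that do not appear in second_t.
# The value 4 exists only in second_t, so it plays no part in the result.
minus_rows = conn.execute(
    "SELECT val FROM first_t EXCEPT SELECT val FROM second_t ORDER BY val"
).fetchall()
```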

23) What is an audit table?

Ans) An audit table acts as a log. Every job should have an audit
table.

24) There is a larger hashed file and a smaller Oracle table, and you are
looking up from a Transformer. Which is faster?

25) How do you implement SCD in a project? (Very important)

Ans) See http://etl-tools.info/en/datastage-tutorial-L008_scd-implementation-datastage.htm

26) What are derivations in a Transformer?

Ans) A derivation is an expression used to modify input column values or to derive new
values from them.
Or
A derivation is where you apply the business rule.
The execution order is: stage variables, constraints, derivations.
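
The execution order can be pictured with a plain Python function that processes one row at a time. This is an analogy only, not DataStage code; the column names, the stage variable sv_total, and the threshold of 100 are hypothetical.

```python
def transform(rows):
    """Mimic a Transformer: stage variable, then constraint, then derivations."""
    output = []
    for row in rows:
        # 1. Stage variable: evaluated first, reusable in later expressions.
        sv_total = row["qty"] * row["unit_price"]

        # 2. Constraint: decides whether the row goes to the output link.
        if not sv_total > 100:
            continue

        # 3. Derivations: expressions that produce each output column.
        output.append({
            "ORDER_ID": row["order_id"],
            "CUSTOMER": row["customer"].strip().upper(),
            "TOTAL": sv_total,
        })
    return output

result = transform([
    {"order_id": 1, "customer": " acme ", "qty": 5, "unit_price": 30},
    {"order_id": 2, "customer": "zen", "qty": 2, "unit_price": 10},
])
```

Only the first order passes the constraint, and its output columns are computed after the stage variable is known.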

27) How do you use a surrogate key in reporting?

Ans) See the answer to question 9.

28) What is keen?

29) What is the Pivot stage and how does it work?

Ans) The Pivot stage supports only horizontal pivoting (columns into rows).
It does not support vertical pivoting (rows into columns).

Example: The source table below has two columns holding the quarterly sales of a product, but
the business requirement is that the target should contain a single column for quarterly sales.
We can achieve this with the Pivot stage, i.e. by horizontal pivoting.

Source Table

ProdID  Q1_Sales  Q2_Sales
1010    123450    234550

Target Table

ProdID  Quarter_Sales  Quarter
1010    123450         Q1
1010    234550         Q2
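
The horizontal pivot shown in the tables above can be sketched in plain Python. The function horizontal_pivot and its pivot_columns mapping are hypothetical helpers written for this illustration, not part of DataStage.

```python
def horizontal_pivot(rows, pivot_columns):
    """Turn each listed source column into its own output row."""
    out = []
    for row in rows:
        for label, column in pivot_columns.items():
            out.append({
                "ProdID": row["ProdID"],
                "Quarter_Sales": row[column],
                "Quarter": label,
            })
    return out

# One source row with two sales columns...
source = [{"ProdID": 1010, "Q1_Sales": 123450, "Q2_Sales": 234550}]

# ...becomes two target rows with a single sales column.
target = horizontal_pivot(source, {"Q1": "Q1_Sales", "Q2": "Q2_Sales"})
```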

30) How do constraints work in a Transformer?

31) How do you declare a constraint in DataStage?

32) Where is the data stored in DataStage?

33) Can we use a sequential file as a lookup?
Ans) No.

34) Why can't we use a sequential file as a lookup?

35) What is a factless fact table?

36) When do we use connected and unconnected lookups?

37) Which cache supports connected and unconnected lookups?

38) What are the types of triggers?

39) What are informational dimensions?

40) What are measures?

41) How do you clear source files?

42) How do you find a link if it is not found?

43) What are the differences between a Transformer and a routine?

44) How do we secure our project?

45) How do you handle errors? Explain handlers.

46) How do we know how many rows were rejected?

47) What is the UniVerse stage?

48) What are UAT, unit, integration, and system testing?

49) What are local, development, pre-production, and production servers?

50) What are the types of dimensions, and what is a generated dimension?

51) What is the size of the data mart?

52) How frequently did you receive source data, and in which format?

53) What is a composite key?

54) What are SCD and SGT?

55) How do you import sources and targets? What are the types of sources
and targets?

56) What are the differences between the following stages?


a. Merge and Join
b. Copy and Transformer
c. ODBC and OCI
d. Lookup and Join
e. Change Capture and Difference
f. Hashed and Sequential file

57) How do you decide whether to use a Join stage or a Lookup stage?

58) What is a partition key? Which key is used for round-robin partitioning?

59) What are the Change Capture stage and the Change Apply stage?

60) How many streams can a Transformer have?

61) What is a routine? What are before and after subroutines?

62) What is a config file?

63) What is a node?

64) What is the IPC stage? How does it increase performance?

65) What is a sequential buffer?

66) How do you schedule a job to run on the last day of every month?

67) What is the status view? Why clear it? What happens
internally if we clear it?

68) What are operators in parallel jobs?

69) What are parameters and parameter files?

70) What is the execution flow of constraints, derivations, and variables
in the Transformer stage?

71) Can we use a hashed file to remove duplicate rows?

72) If the 1st and 8th records are duplicates, which one will be skipped? Can we
configure this?

73) How do you import and export DS jobs? What is the file extension?

74) What is the difference between a routine and a transform?

75) I have 10 key columns. In this situation a lookup is
necessary, but which type of lookup should be used: ODBC or hashed
file? Why?

76) When do we write routines?

77) How many shared containers can be created?

78) How do you move a hashed file from one location to another?

79) How do you create a static hashed file?

80) What are system variables, and which types of system variables are used in your
project?

81) Which DataStage functions are used in your project?

82) What are conformed, degenerate, and junk dimensions?

83) What are conformed facts?

84) What are the different types of facts? Give examples.

85) What are environment variables and global variables?

86) Why would you use an Oracle sequence to generate surrogate keys
rather than DataStage routines?

87) What is the biggest table in your schema or project, and what is its
size?

88) How do you improve the performance of hashed files?

89) I have 2 tables: one contains 100 records and the other
contains 1000 records. Which is the master table? Why?

90) I have a job that loads data from one flat file into DataStage.
There are 10 lakh records; after loading 9 lakh records the job aborts. How
do you load the remaining records?

91) What data does your project contain?

92) You said your sources are flat files and Oracle OCI. Why do
you need ODBC in your project rather than the Oracle OCI stage?

93) Can you use a sequential file as the source for a hashed file? Have you
done it? What error will it give?

94) Why does a hashed file improve performance?

95) Have you used the Sort stage in your job?

Ans) The Sort stage is a parallel stage; we need to use server jobs
only (double check).

96) Can the Aggregator and Transformer stages be used for sorting data?
How?

97) If I have 2 sources to an Aggregator stage and Oracle as the target, I can
sort data in the Aggregator. But if I don't want to use the Aggregator to sort
the data, how do I do it?

98) Why do we use surrogate keys in a DWH? How does a surrogate key
increase performance? Where is it stored? How do we handle
surrogate keys in the project? Where do we mostly use surrogate
keys?

99) How many input links can a Transformer stage have?

100) Can you give more than one source to a Transformer? (If you say
"No", the interviewer will ask what error you get when you try to do this.)

101) If a company maintaining SCD Type 1 now decides to
maintain Type 2, for example in the customer table,
what changes are needed in the customer table?
(Do you have to change the structure of the table, as you would under
Type 3? Or are no changes needed? How do you implement this?)

102) How many dimensions are there in your project, and what are they?

103) What are the facts in your fact table?

104) How do you get the system date in Oracle?

105) What is the DUAL table in Oracle?

106) What is the use of UNION in Oracle? If I write the query select * from
EMP UNION select * from dept, will it execute successfully?

107) select * from EMP group by dept; will this query execute?
If not, what is the error?

108) What are XML files? How do you read data from XML files, and which
stage is used?

109) How do you catch bad rows from the OCI stage?

110) How do you use SQL*Loader or the OCI stage?

111) How do you populate source files?

112) How do you pass a file name as a parameter to a job?

113) How do you pass parameters if the job runs at night?

114) What happens if a job fails at night?

115) What is SQL tuning, and how do you do it?

116) How do we call external functions or subroutines from DataStage?

117) How do you do an Oracle 4-way inner join if there are 4 Oracle input files?

118) What is the difference between Oracle 8i and 9i?

119) What is QualityStage?

120) What is MetaStage?

121) How can we join an Oracle source and a sequential file?

122) How can we implement a lookup in DS server jobs?

123) What third-party tools are used with DS?

124) What are job parameters?

125) How can we create containers?

126) What are system variables?

127) How do you fix the error "OCI has fetched truncated data"?

128) How do you create batches from the DS command prompt?

129) How do you eliminate duplicate rows?

130) Suppose there are a million records: did you use OCI? If not, which
stage would you prefer?

131) What is the order of execution done internally in the
Transformer, with the stage editor having input links on the left-hand side
and output links on the right-hand side?

132) I want to process 3 files in sequence, one by one. How can I do
that? The files should be fetched automatically while processing.

133) What are the steps to create a flat-file job?

134) Is there any tool by Ascential to pull metadata from various
sources?

135) What are table definition changes? What impact will they have on
our jobs?

136) How do you use the debugger?

137) How do you schedule a DS job through a Unix script?

138) Explain AutoSys.

139) How do you create and use a hashed file?

140) If a table definition has been changed in Manager, will the change
automatically propagate to the job?

141) Can we use a sequential file as a reference file?

Ans) No.

142) What is the difference between a hashed file and a sequential file?

143) What are the different options for viewing table definitions?

144) What is the Normalizer transformation?

145) What are bridge tables?

146) What are the types of indexes?

147) What are the types of partitioning?

148) How do you handle unknown values for the primary key of a
dimension?

149) What is ETL architecture?

150) How many types of loading techniques are available?

151) How can we call a stored procedure in DataStage?

152) What is the difference between the DRS stage and ODBC? Which one is best
for performance?

153) What is the difference between inter-process and in-process?
Which one is best?

154) What is CRC32? In which situations do we use CRC32?

155) What are Row Splitter and Row Merger? Can they be used
separately?

156) If one user has locked a resource, how do you release that particular job?

157) What is the difference between Clear Log files and Clear Stage files?

158) What are the differences between static and dynamic hashed files?

159) How do we set environment variables in DS?

160) How do you release a job?

161) How do you set up auto-purge in DS?

162) What is the difference between the top-down and bottom-up approaches?

163) What is cleansing?

164) What are generated dimensions?

165) What are SCD and SGT? What is the difference between them? Explain
SGT in your project.

166) Which cache supports connected and unconnected lookups?