Development and Debugging Stages:

Head Stage:
1. The Head stage is a Development/Debug stage.
2. It can have a single input link and a single output link.
3. The Head stage selects the first N rows from each partition of an input data set and
copies the selected rows to an output data set. You determine which rows are copied by
setting properties that specify:
The number of rows to copy
The partition from which the rows are copied
The location of the rows to copy
The number of rows to skip before the copying operation begins
4. This stage is helpful in testing and debugging applications with large data sets. For
example, the Partition property lets you see data from a single partition to determine
whether the data is being partitioned as you intended. The Skip property lets you access a
specific portion of a data set.
The stage editor has three pages:
Stage Page. This is always present and is used to specify general information about
the stage.
Input Page. This is where you specify the details about the single input set from
which you are selecting records.
Output Page. This is where you specify details about the processed data being output
from the stage.
Head stage properties:
1. Rows
All Rows [After Skip] = True/False
When set to True, copies all rows to the output following any requested skip positioning.
When set to False, the Number of Rows [Per Partition] property appears and only that many
rows are copied from each partition.
Number of Rows [Per Partition] = 10
Number of rows to copy from each partition. Shown only when All Rows [After Skip] = False.
Period [Per Partition] = N
Copies every Nth row per partition, starting with the first.
Skip [Per Partition] = 1
Number of rows to skip at the start of every partition.

2. Partitions
All Partitions = True/False
When set to True, copies rows from all partitions. When set to False, copies from specific
partition numbers, which must be specified.
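The way the Rows and Partitions properties combine can be sketched in plain Python. This is only an illustration of the behaviour described above, not DataStage code: the function names, the dictionary-of-lists representation of partitions, and the order in which Skip, Period and Number of Rows are applied are all assumptions made for the sketch.

```python
from typing import Dict, List, Optional, Sequence


def head_partition(rows: Sequence, n_rows: int = 10, all_rows: bool = False,
                   skip: int = 0, period: int = 1) -> List:
    """Emulate the Head stage on one partition (assumed semantics)."""
    if period < 1:
        raise ValueError("period must be 1 or greater")
    # Skip rows at the start of the partition, then keep every period-th row,
    # starting with the first row that remains.
    selected = list(rows[skip::period])
    # All Rows [After Skip] = True ignores the row-count limit.
    return selected if all_rows else selected[:n_rows]


def head_stage(partitions: Dict[int, Sequence],
               partition_numbers: Optional[List[int]] = None,
               **props) -> Dict[int, List]:
    """partition_numbers=None plays the role of All Partitions = True."""
    wanted = list(partitions) if partition_numbers is None else partition_numbers
    return {p: head_partition(partitions[p], **props) for p in wanted}


# Hypothetical data: two partitions holding employee ids.
data = {0: list(range(100, 112)), 1: list(range(200, 212))}

first_five = head_stage(data, n_rows=5)            # All Rows = False, 5 rows
everything = head_stage(data, all_rows=True)       # All Rows = True
one_part = head_stage(data, partition_numbers=[1], n_rows=3, skip=1, period=2)
```

Restricting the call to one partition number, as in the last line, corresponds to using the Partition property to check how the data was partitioned.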

Example Job for Head Stage:


Input SeqEmpData:

Output seqEmpdata:

JOB:

InputseqEmpData Properties:

Case-1:
Head stage Properties:
All Rows = False
Number of Rows = 5
All Partitions = True

Head Stage Output columns:

Target Output_Sequentialdata:

Example Job for Head Stage:


Input SeqEmpData:

OutputSequential data:

Job:

Input SeqEmpData:

Head stage Properties


Case-2:
Head stage Properties:
All Rows = True
All Partitions = True

Head stage output columns:

Target OutputSequentialData:

Target Output_Sequential_data:

Tail Stage:
1. The Tail stage is a Development/Debug stage.
2. It can have a single input link and a single output link.
3. The Tail stage selects the last N records from each partition of an input data set and
copies the selected records to an output data set. You determine which records are copied
by setting properties that specify:
The number of records to copy
The partition from which the records are copied

4. This stage is helpful in testing and debugging applications with large data sets. For
example, the Partition property lets you see data from a single partition to determine
whether the data is being partitioned as you intended.
The stage editor has three pages:
Stage Page. This is always present and is used to specify general information about
the stage.
Input Page. This is where you specify the details about the single input set from
which you are selecting records.
Output Page. This is where you specify details about the processed data being output
from the stage
Tail stage properties:
1. Rows
Number of Rows [Per Partition] = 10
Number of rows to copy from input to output per partition. The default is 10; change the
value if you need more or fewer rows.

2. Partitions
All Partitions = True/False
When set to True, copies rows from all partitions. When set to False, copies from
specific partition numbers, which must be specified.
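The Tail stage behaviour described above can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not DataStage code: the function name and the dictionary-of-lists model of partitions are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def tail_stage(partitions: Dict[int, Sequence], n_rows: int = 10,
               partition_numbers: Optional[List[int]] = None) -> Dict[int, List]:
    """Keep the last n_rows of each selected partition (assumed semantics)."""
    # partition_numbers=None plays the role of All Partitions = True.
    wanted = list(partitions) if partition_numbers is None else partition_numbers
    result = {}
    for p in wanted:
        rows = partitions[p]
        # Guard n_rows == 0, because rows[-0:] would return every row.
        result[p] = list(rows[-n_rows:]) if n_rows > 0 else []
    return result


# Hypothetical data: partition 0 has 20 rows, partition 1 has only 5.
tail_data = {0: list(range(1, 21)), 1: list(range(21, 26))}

default_tail = tail_stage(tail_data)                        # last 10 per partition
last_three = tail_stage(tail_data, n_rows=3, partition_numbers=[0])
```

A partition with fewer rows than the requested number simply contributes all of its rows.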

Example Job for Tail Stage:


Input SeqEmpData:

JOB:

Input seqEmpdata Properties:

Tailstage Properties:

Output Columns:

Target Output_Sequentialdata:

Sample Stage:
1. The Sample stage is a Development/Debug stage.
2. It can have a single input link and any number of output links when operating in
Percent mode, and a single input link and a single output link when operating in Period mode.
3. The Sample stage samples an input data set. It operates in two modes. In Percent mode,
it extracts rows, selecting them by means of a random number generator, and writes a
given percentage of these to each output data set. You specify the number of output data
sets, the percentage written to each, and a seed value to start the random number
generator. You can reproduce a given distribution by repeating the same number of
outputs, the same percentages, and the same seed value.
4. In Period mode, it extracts every Nth row from each partition, where N is the period,
which you supply. In this case all rows are output to a single data set, so the stage
used in this mode can have only a single output link.
5. For both modes you can specify the maximum number of rows that you want to sample
from each partition.
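Percent mode can be pictured with the following Python sketch. It is not the stage's actual algorithm: the function name is hypothetical, Python's own random number generator stands in for the one used by DataStage, the sketch assumes each row goes to at most one output, and the per-partition row limit is not modelled. It does show why repeating the same outputs, percentages and seed reproduces the same distribution.

```python
import random
from typing import List, Sequence


def sample_percent(rows: Sequence, percents: Sequence[float], seed: int) -> List[List]:
    """Route each row to at most one output according to the given percentages."""
    if sum(percents) > 100:
        raise ValueError("percentages must not add up to more than 100")
    rng = random.Random(seed)          # the same seed gives the same sequence of draws
    outputs: List[List] = [[] for _ in percents]
    for row in rows:
        draw = rng.random() * 100      # uniform value in [0, 100)
        upper = 0.0
        for i, pct in enumerate(percents):
            upper += pct
            if draw < upper:           # the draw falls in the band of output i
                outputs[i].append(row)
                break
        # Rows whose draw lies above every band are not written anywhere.
    return outputs


# Hypothetical input: 1000 row ids split 50/30/20 over three outputs.
rows = list(range(1000))
first_run = sample_percent(rows, [50, 30, 20], seed=42)
second_run = sample_percent(rows, [50, 30, 20], seed=42)
partial = sample_percent(rows, [10, 10], seed=7)
```

Because the percentages are applied through random draws, each output receives approximately, not exactly, its share of the rows.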
The stage editor has three pages:
Stage Page. This is always present and is used to specify general information about
the stage.
Input Page. This is where you specify details about the data set being Sampled.
Output Page. This is where you specify details about the Sampled data being output
from the stage.
Example Job for Sample Stage:
Note: The Sample stage can operate in two modes: Period mode and Percent mode.
Input data:

JOB:

Input seqfile properties:

Sample stage properties:

Output Columns:

Target Seqfile properties:

Output data:

Note: The output contains only 3 records because Period [Per Partition] is set to 3, so
the stage takes every 3rd record from the input file data.
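The effect of the Period setting in the note above can be sketched in Python. The input used in the job is not shown here, so the nine-row partition below is hypothetical, as are the function name and the assumption that selection starts at the 3rd row (rows 3, 6, 9); the real stage may start counting from a different offset.

```python
from typing import Dict, List, Optional, Sequence


def sample_period(partitions: Dict[int, Sequence], period: int,
                  max_rows: Optional[int] = None) -> Dict[int, List]:
    """Take every period-th row of each partition (assumed to start at row `period`)."""
    if period < 1:
        raise ValueError("period must be 1 or greater")
    # max_rows models the optional per-partition limit; None means no limit.
    return {p: list(rows[period - 1::period][:max_rows])
            for p, rows in partitions.items()}


# Hypothetical single-partition input of nine rows.
nine_rows = {0: ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9"]}

every_third = sample_period(nine_rows, period=3)
limited = sample_period(nine_rows, period=3, max_rows=2)
```

With nine input rows and a period of 3 the sketch returns three rows, which matches the count described in the note.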

Example Job for Sample Stage:

Operating in Percent mode:
Input seqfile data:

Job:

Input sequential file stage properties:

Sample stage properties:

Output columns for Output1:

Output Columns for Output2:

Output Columns for Output3:

Sample stage link ordering:

Output1 seqfile stage properties:

Output data for outdat1 seqfile:

Output2 Seqfile stage properties:

Output2 seqfile data:

Output3 Seqfile stage properties:

Output3 Seqfile data:

Peek Stage:
1. The Peek stage is a Development/Debug stage.
2. It can have a single input link and any number of output links.
3. The Peek stage lets you print record column values either to the job log or to a separate
output link as the stage copies records from its input data set to one or more output data
sets.
4. Like the Head, Tail, and Sample stages, the Peek stage can be helpful for
monitoring the progress of your application or diagnosing a bug in it.
The stage editor has three pages:
Stage Page. This is always present and is used to specify general information about
the stage.
Input Page. This is where you specify the details about the single input set from
which you are selecting records.
Output Page. This is where you specify details about the processed data being output
from the stage.

Peek Stage Properties:


1. Rows
All Records [After Skip] = True/False
When set to True, prints all rows following any requested skip positioning. When set to
False, the Number of Records [Per Partition] property appears (10 in this example) and
only that many records are printed from each partition.
2. Columns
Peek All Input Columns = True/False
When set to True, prints all column values. When set to False, prints specific column
values, which must be specified.
3. Partitions
All Partitions = True/False
When set to True, prints rows from all partitions. When set to False, prints from specific
partition numbers, which must be specified.
4. Options
Peek Records Output Mode = Job log/Output
Job log prints the output to the job log; Output writes it to the second output link of the stage.
Show Column Names = True/False
Set to True to print the column name, followed by a colon, followed by the value; otherwise
only the value is printed, followed by a space.
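The Columns, Partitions and Show Column Names options can be illustrated with a small Python sketch that builds the lines a Peek-like step would print. It is not DataStage code: the function names, the dictionary records and the EMPNO/ENAME columns are hypothetical, and the single space used to separate name:value pairs is an assumption.

```python
from typing import Dict, List, Optional, Sequence


def peek_format(record: Dict[str, object], columns: Optional[List[str]] = None,
                show_names: bool = True) -> str:
    """Format one record the way the Show Column Names option describes."""
    # columns=None plays the role of Peek All Input Columns = True.
    names = list(record) if columns is None else columns
    if show_names:
        # Column name, a colon, then the value (pairs separated by a space: assumed).
        return " ".join(f"{name}:{record[name]}" for name in names)
    # Only the value, followed by a space.
    return "".join(f"{record[name]} " for name in names)


def peek_lines(partitions: Dict[int, Sequence[Dict[str, object]]],
               n_records: int = 10, all_records: bool = False,
               partition_numbers: Optional[List[int]] = None,
               **fmt) -> List[str]:
    """Collect the lines to print for the selected partitions."""
    wanted = list(partitions) if partition_numbers is None else partition_numbers
    lines = []
    for p in wanted:
        rows = partitions[p] if all_records else partitions[p][:n_records]
        lines.extend(peek_format(r, **fmt) for r in rows)
    return lines


# Hypothetical records spread over two partitions.
emp = {0: [{"EMPNO": 7369, "ENAME": "SMITH"}, {"EMPNO": 7499, "ENAME": "ALLEN"}],
       1: [{"EMPNO": 7521, "ENAME": "WARD"}]}

with_names = peek_lines(emp, n_records=1)
values_only = peek_lines(emp, all_records=True, partition_numbers=[0],
                         columns=["ENAME"], show_names=False)
```

In a real job these lines would go to the job log or to an output link, depending on the Peek Records Output Mode setting.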

Example Job for Peek Stage:


Inputdata:

JOB:

Input sequential file properties:

Peek stage properties:

Peek stage Output columns:

Output seqfile properties:

Output seq file data:

Example Job for Peek Stage:

Inputdata:

Option Peek Records Output Mode = Job log:
Job:

Input seqfile properties:

Input seqfile data:

peek stage properties:

Here the Peek Records Output Mode option is set to Job log, so the data can be seen only in the job log.
Procedure for viewing the data in the log:
Go to Tools and open Run Director, then click View Log. It shows a screen like the following:

In the screen above, click the eighth row from the bottom to show the log details.

Example Job for Peek Stage


Inputdata:

JOB:

Input seqfile data:

Input seqfile properties:

Peek stage properties:

Peek output1 columns:

peek output1 mappings:

Peek output2 columns:

peekoutput2 mappings:

peekoutput3 columns:

Peekoutput3 mappings:

peekout1 properties:

peekoutput1 data:

Peekoutput2 properties:

Peekoutput2 data:

Peekoutput2 properties:

Peekoutput3 data:

Peekoutput3 properties:
