A list of the most frequently asked DataStage interview questions and answers is given
below.
DataStage is an integrated set of tools for designing, developing, compiling, running, and
managing applications. It can extract data from one or more data sources, perform
multi-part transformations on the data, and load the resulting data into one or more
target files or databases.
o Projects
o Jobs
o Stages
o Servers
o Client Components
Quality stage Notes: Bhaskar Reddy
o DataStage provides partitioning and parallel processing techniques that allow
DataStage jobs to process enormous volumes of data much faster.
o It has enterprise-level networking.
o It is the data integration component of IBM InfoSphere Information Server.
o It is a GUI-based tool.
o In DataStage, we drag and drop DataStage objects to design a job, and the design can
then be converted to DataStage code.
o DataStage is used to perform the various ETL (extract, transform, load) operations.
o It provides connectivity with different sources and multiple targets at the same time.
o InfoSphere
o DataStage Server 9.1.2 or above
o Microsoft Visual Studio .NET 2010 Express Edition C++
o Oracle client (full client, not an instant client) if connecting to an Oracle database
o DB2 client if connecting to a DB2 database
1. File= /home/myFile1.txt
2. File= /home/myFile2.txt
3. File= /home/myFile3.txt
4. Read Method= Specific file(s)
information to your key business goals such as big data and analytics, data warehouse
modernization, and master data management.
o IBM InfoSphere can connect with multiple source systems as well as write to various
target systems. It acts as a single platform for data integration.
o It is based on centralized layers. All the modules of the suite can share the baseline
architecture of the suite.
o It has additional layers for the unified repository, for integrated metadata
services, and for a shared parallel engine.
o It has tools for analysis, monitoring, cleansing, transforming and delivering data.
o It has massively parallel processing capabilities that provide high-speed processing.
Jobinfo: returns the job information (job status, job run time, end time, etc.).
Stageinfo: returns the stage information (stage name, stage type, input rows, etc.).
Report: displays a report that contains the generated time, start time, elapsed
time, status, etc.
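These three items correspond to the `-jobinfo`, `-stageinfo`, and `-report` options of the `dsjob` command-line utility. The sketch below assembles the matching command lines from Python. The helper names, the project name `dstage1`, and the job name `LoadCustomers` are hypothetical, and the argument order (project, job, then stage or report type) is an assumption that you should check against your installation's `dsjob` reference before relying on it.

```python
import subprocess
from typing import List

# Hypothetical helpers: only the dsjob options themselves come from the CLI;
# the function names and argument order are assumptions for illustration.
def jobinfo_cmd(project: str, job: str) -> List[str]:
    """Command for job status, run time, end time, etc."""
    return ["dsjob", "-jobinfo", project, job]

def stageinfo_cmd(project: str, job: str, stage: str) -> List[str]:
    """Command for stage name, stage type, input rows, etc."""
    return ["dsjob", "-stageinfo", project, job, stage]

def report_cmd(project: str, job: str, report_type: str = "BASIC") -> List[str]:
    """Command for a job report (generated time, start time, elapsed time, status)."""
    if report_type not in ("BASIC", "DETAIL", "XML"):
        raise ValueError(f"unsupported report type: {report_type}")
    return ["dsjob", "-report", project, job, report_type]

def run(argv: List[str]) -> str:
    """Run a dsjob command and return its standard output (needs dsjob on PATH)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

# Example (hypothetical names): print(run(jobinfo_cmd("dstage1", "LoadCustomers")))
```

Keeping command construction apart from execution means you can inspect or log the exact command before anything runs on the server.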
IBM InfoSphere DataStage and QualityStage jobs can access data from enterprise
applications and data sources such as:
o Relational databases
o Mainframe databases
o Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM)
databases
o Online Analytical Processing (OLAP) or performance management databases
o Business and analytic applications
The Streams connector allows integration between InfoSphere Streams and InfoSphere DataStage.
The InfoSphere Streams connector is used to send data from a DataStage job to a Streams job
and vice versa.
InfoSphere Streams can perform near-real-time analytic processing (RTAP) in parallel with
loading data into a data warehouse. Alternatively, the InfoSphere Streams job performs
RTAP first and then forwards the data to InfoSphere DataStage, which transforms, enriches,
and stores it for archival purposes.
Example: If myexample1.time contains the time 22:30:00, then the following two
function calls are equivalent and both return the integer value 22.
1. HoursFromTime(myexample1.time)
2. HoursFromTime("22:30:00")
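The Python function below imitates what `HoursFromTime` does in the example above, so the equivalence of the two calls is easy to see. It is a hypothetical analogue, not the DataStage implementation. It assumes the input is either a `datetime.time` value or an `"HH:MM:SS"` string.

```python
from datetime import time

def hours_from_time(value) -> int:
    """Hypothetical Python analogue of DataStage HoursFromTime().

    Accepts a datetime.time or an "HH:MM:SS" string and returns the hour.
    """
    if isinstance(value, str):
        value = time.fromisoformat(value)  # parse "22:30:00" into a time object
    return value.hour

myexample1_time = time(22, 30, 0)
print(hours_from_time(myexample1_time))  # 22
print(hours_from_time("22:30:00"))       # 22
```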
Data Encryption: In DataStage, data encryption needs to be done before the data reaches the
DataStage Server. Informatica allows "Data [...] Transformation" inside Designer as a
separate [...]
22) If you want to use the same piece of code in different jobs,
how will you achieve it?
DataStage provides a feature called shared containers, which allows the same piece of code
to be shared across different jobs. The containers are shared for reusability. A shared
container consists of a reusable job element made up of stages and links. A shared
container can be called in different DataStage jobs.
o Link sort
o Standalone Sort stage
Link sort is used unless an option that only the Sort stage offers is needed. Most often,
the standalone Sort stage is used to specify the Sort Key Mode for partial sorts.
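A partial sort is what you get when the data is already sorted on one key and only needs sorting on a second key within each group. This is the case the Sort stage's Sort Key Mode is used for, assuming the "Don't Sort (Previously Sorted)" setting on the first key. The Python sketch below shows the idea. The function name and the `dept`/`salary` columns are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def partial_sort(rows, presorted_key, sort_key):
    """Sketch of a partial sort (hypothetical helper).

    Rows are assumed to be already sorted on presorted_key, so that key is
    not re-sorted; only sort_key is sorted within each group of equal values.
    """
    result = []
    for _, group in groupby(rows, key=itemgetter(presorted_key)):
        result.extend(sorted(group, key=itemgetter(sort_key)))
    return result

rows = [
    {"dept": "A", "salary": 300},
    {"dept": "A", "salary": 100},
    {"dept": "B", "salary": 250},
    {"dept": "B", "salary": 50},
]
for row in partial_sort(rows, "dept", "salary"):
    print(row)
```

Each group is sorted separately and the groups are never compared with one another. That is why a partial sort costs less than a full re-sort when the input arrives already ordered on the first key.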
A routine is a set of tasks defined in the DataStage Manager. It is run via the Transformer
stage.
o Parallel routines
o Mainframe routines
o Server routines
This error occurs when a longer string is assigned to a shorter string destination. It can
also occur when one or more range boundaries in a RANGE_N function are string literals
longer than the test value.
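The first cause above comes down to a simple length check. The sketch below shows that check as a hypothetical Python guard that rejects the value instead of silently truncating it. The function name and error message are illustrative only and do not reproduce any database's actual error text.

```python
def assign_string(value: str, max_length: int) -> str:
    """Hypothetical guard: refuse to put a string into a shorter destination.

    Mirrors the failure described above, where a longer string is assigned
    to a column that can hold only max_length characters.
    """
    if len(value) > max_length:
        raise ValueError(
            f"string of length {len(value)} does not fit a "
            f"destination of length {max_length}"
        )
    return value

print(assign_string("Reddy", 10))  # fits, returned unchanged
```

Checking the length up front, or widening the target column, avoids the error without losing data.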