TCS Public

BFS 2.1

Prasanna Desigan Kesavan
prasannadesigan.k@tcs.com
ETL Testing Simplified

Version 1.1
20/01/2012

ETL Testing Simplified


Internal Use 2

Document History

Revision History


Version  Date of Change  Owner of Changes          Description of Change
1.0      13 Jan 2012     Prasanna Desigan Kesavan  Created the document
1.1      20 Jan 2012     Prasanna Desigan Kesavan  Added Revision History, TOC, Page Numbers


Table of Contents
1. Introduction
2. Types of ETL Data movement
a) Source to Target through Direct Pull
b) Source to Target through Lookup
c) Source to Target straight move and Source to Target through Lookup
d) Source to Target through Direct Pull and Source to Target through Derivation
e) Source to Target through Direct Pull, through Lookup and through Derivation
3. Different stages of ETL testing
a) Record Count Verification
b) Data Completeness Verification
c) Data Integrity Verification
i. Data Integrity verification for Direct Pull fields
ii. Data Integrity verification for Lookup fields
iii. Data Integrity verification for Derived fields
d) Data Quality Verification
e) Delta Load Verification
i. Truncate and Reload
ii. Delta Load



1. Introduction
ETL stands for Extract, Transform, Load: data is extracted from a source system, transformed as per
the requirements of the target system, and loaded into the target system. ETL mainly applies to data
movement between tables and databases. The main purpose of ETL testing is to verify that this data
movement has been done correctly.
2. Types of ETL Data movement
a) Source to Target through Direct Pull
Data available in the Source system is copied as is into the Target system without any transformation.
The field names might be abbreviated or expanded, but the values stay intact.
For example, all records in the TCS.EMPLOYEE table are moved into the TCS_STG.EMP_STG table, where
EMPLOYEE is a table in the TCS schema and EMP_STG is a table in the TCS_STG schema. TCS.EMPLOYEE has
the fields EMPLOYEE_NUMBER, EMPLOYEE_STATUS, EMPLOYEE_NAME and EMPLOYEE_JOINING_DATE, while
TCS_STG.EMP_STG has the corresponding fields EMP_ID, EMP_STA, EMP_NM and EMP_JOIN_DT.
Source TCS.EMPLOYEE
EMPLOYEE_NUMBER EMPLOYEE_STATUS EMPLOYEE_NAME EMPLOYEE_JOINING_DATE
267523 Active Prasanna 19-09-1993

SELECT EMPLOYEE_NUMBER as EMP_ID,
EMPLOYEE_STATUS as EMP_STA,
EMPLOYEE_NAME as EMP_NM,
EMPLOYEE_JOINING_DATE as EMP_JOIN_DT
FROM TCS.EMPLOYEE

Target TCS_STG.EMP_STG
EMP_ID EMP_STA EMP_NM EMP_JOIN_DT
267523 Active Prasanna 19-09-1993
b) Source to Target through Lookup
Data available in the Source system is used to pick up values from a Lookup system, and the picked-up
values are moved directly into the Target system without any transformation. The Source data itself
will not be present in the Target system; only the corresponding data from the Lookup system will be.
The field names might be abbreviated or expanded, but the values stay intact.
For example, all ID information in the TCS.EMPLOYEE_DETAIL table is moved as name information into the
TCS_STG.EMP_DTL_STG table, where EMPLOYEE_DETAIL is a table in the TCS schema and EMP_DTL_STG is a
table in the TCS_STG schema. TCS.EMPLOYEE_DETAIL has the fields EMPLOYEE_ID, DEPARTMENT_ID and
BRANCH_ID, while TCS_STG.EMP_DTL_STG has the fields EMP_NM, DEPT_NM and BNCH_NM respectively. The
names of the Employee, Department and Branch are picked up from the TCS.EMPLOYEE, TCS.DEPARTMENT and
TCS.BRANCH Lookup tables respectively, using the Employee, Department and Branch IDs in
TCS.EMPLOYEE_DETAIL.
Source TCS.EMPLOYEE_DETAIL
EMPLOYEE_ID DEPARTMENT_ID BRANCH_ID
267523 2 5
Lookup TCS.EMPLOYEE
EMPLOYEE_ID EMPLOYEE_NAME
267523 Prasanna
Lookup TCS.DEPARTMENT
DEPARTMENT_ID DEPARTMENT_NAME
2 Testing
Lookup TCS.BRANCH
BRANCH_ID BRANCH_NAME
5 Chennai

SELECT EMPLOYEE_NAME as EMP_NM,
DEPARTMENT_NAME as DEPT_NM,
BRANCH_NAME as BNCH_NM
FROM TCS.EMPLOYEE_DETAIL A,
TCS.EMPLOYEE B,
TCS.DEPARTMENT C,
TCS.BRANCH D
WHERE A.EMPLOYEE_ID = B.EMPLOYEE_ID
AND A.DEPARTMENT_ID = C.DEPARTMENT_ID
AND A.BRANCH_ID = D.BRANCH_ID

Target TCS_STG.EMP_DTL_STG
EMP_NM DEPT_NM BNCH_NM
Prasanna Testing Chennai
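
The lookup movement above can be tried out end to end on an in-memory database. The following is a
minimal sketch in Python using the standard sqlite3 module. SQLite is only a stand-in for the real
database, so the tables are created without the TCS and TCS_STG schema prefixes, and the helper names
load_emp_dtl_stg and build_lookup_demo are assumptions made for illustration.

```python
import sqlite3


def load_emp_dtl_stg(conn):
    """Resolve the IDs in EMPLOYEE_DETAIL to names and load them into EMP_DTL_STG."""
    conn.execute(
        """
        INSERT INTO EMP_DTL_STG (EMP_NM, DEPT_NM, BNCH_NM)
        SELECT B.EMPLOYEE_NAME, C.DEPARTMENT_NAME, D.BRANCH_NAME
        FROM EMPLOYEE_DETAIL A
        JOIN EMPLOYEE B ON A.EMPLOYEE_ID = B.EMPLOYEE_ID
        JOIN DEPARTMENT C ON A.DEPARTMENT_ID = C.DEPARTMENT_ID
        JOIN BRANCH D ON A.BRANCH_ID = D.BRANCH_ID
        """
    )


def build_lookup_demo():
    """Create the source, lookup and (empty) target tables with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMPLOYEE_DETAIL (EMPLOYEE_ID INTEGER, DEPARTMENT_ID INTEGER, BRANCH_ID INTEGER);
        CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER, EMPLOYEE_NAME TEXT);
        CREATE TABLE DEPARTMENT (DEPARTMENT_ID INTEGER, DEPARTMENT_NAME TEXT);
        CREATE TABLE BRANCH (BRANCH_ID INTEGER, BRANCH_NAME TEXT);
        CREATE TABLE EMP_DTL_STG (EMP_NM TEXT, DEPT_NM TEXT, BNCH_NM TEXT);
        INSERT INTO EMPLOYEE_DETAIL VALUES (267523, 2, 5);
        INSERT INTO EMPLOYEE VALUES (267523, 'Prasanna');
        INSERT INTO DEPARTMENT VALUES (2, 'Testing');
        INSERT INTO BRANCH VALUES (5, 'Chennai');
        """
    )
    return conn
```

Calling build_lookup_demo() and then load_emp_dtl_stg(conn) leaves only the looked-up names in the
target table; none of the source IDs are carried over.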
c) Source to Target straight move and Source to Target through Lookup
Some of the data in the Source system is copied as is into the Target system without any
transformation, while the remaining data is used to pick up values from a Lookup system, and the
picked-up values are moved directly into the Target system without any transformation. Both the data
available in the Source system and the corresponding data from the Lookup system will be present in
the Target system. The field names might be abbreviated or expanded, but the values stay intact.
Source TCS.EMPLOYEE_DETAIL
EMPLOYEE_ID DEPARTMENT_NAME BRANCH_NAME
267523 Testing Chennai
Lookup TCS.EMPLOYEE
EMPLOYEE_ID EMPLOYEE_NAME
267523 Prasanna

Target TCS_STG.EMP_DTL_STG
EMP_ID EMP_NM DEPT_NM BNCH_NM
267523 Prasanna Testing Chennai


SELECT A.EMPLOYEE_ID as EMP_ID,
B.EMPLOYEE_NAME as EMP_NM,
A.DEPARTMENT_NAME as DEPT_NM,
A.BRANCH_NAME as BNCH_NM
FROM TCS.EMPLOYEE_DETAIL A,
TCS.EMPLOYEE B
WHERE A.EMPLOYEE_ID = B.EMPLOYEE_ID
d) Source to Target through Direct Pull and Source to Target through Derivation
All or some of the data in the Source system is copied as is into the Target system without any
transformation, while some data is used to derive values that are not available in the Source system,
and the derived values are moved into the Target system. Both the data available in the Source system
and the derived data will be present in the Target system. The field names might be abbreviated or
expanded.
For example, all records in the TCS.EMPLOYEE table are moved into the TCS_STG.EMP_STG table, where
EMPLOYEE is a table in the TCS schema and EMP_STG is a table in the TCS_STG schema. TCS.EMPLOYEE has
the fields EMPLOYEE_NUMBER, EMPLOYEE_STATUS, EMPLOYEE_NAME and EMPLOYEE_JOINING_DATE, while
TCS_STG.EMP_STG has the fields EMP_ID, EMP_STA, EMP_NM, EMP_JOIN_DT and EMP_GEMS.
EMP_GEMS in TCS_STG.EMP_STG is derived from EMPLOYEE_STATUS and EMPLOYEE_JOINING_DATE in the
TCS.EMPLOYEE table (loaded as EMP_STA and EMP_JOIN_DT) using the following business logic. When
Employee Status is Terminated, EMP_GEMS = 0. When Employee Status is Active and the Employee has more
than 3 but not more than 5 years of experience, EMP_GEMS = 1000. When Employee Status is Active and
the Employee has more than 5 years of experience, EMP_GEMS = 2000.
Source TCS.EMPLOYEE
EMPLOYEE_NUMBER EMPLOYEE_STATUS EMPLOYEE_NAME EMPLOYEE_JOINING_DATE
267523 Active Prasanna 19-09-2003
267524 Active Vivek 19-09-2007
267525 Terminated Jaikumar 19-09-1993

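SELECT EMPLOYEE_NUMBER as EMP_ID,
EMPLOYEE_STATUS as EMP_STA,
EMPLOYEE_NAME as EMP_NM,
EMPLOYEE_JOINING_DATE as EMP_JOIN_DT,
-- Cut-off dates are 3 and 5 years before an assumed run date of 20-JUL-2011
CASE
WHEN EMPLOYEE_STATUS = 'Terminated'
THEN 0
-- The 5-year rule is tested first; otherwise the 3-year rule would also match these rows
WHEN EMPLOYEE_STATUS = 'Active' AND EMPLOYEE_JOINING_DATE < TO_DATE('20-JUL-2006', 'DD-MON-YYYY')
THEN 2000
WHEN EMPLOYEE_STATUS = 'Active' AND EMPLOYEE_JOINING_DATE < TO_DATE('20-JUL-2008', 'DD-MON-YYYY')
THEN 1000
END as EMP_GEMS
FROM TCS.EMPLOYEE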
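
The derivation rules above can be checked on the sample rows with a small script. This is a minimal
sketch in Python with the standard sqlite3 module, not the actual ETL mapping: the table is created
without the TCS schema prefix, dates are stored as ISO text (YYYY-MM-DD) so that they compare
correctly, the run date of 20-07-2011 is an assumption inferred from the cut-off dates in the query,
and the names GEMS_QUERY, SAMPLE_EMPLOYEES and derive_gems are illustrative. A CASE expression returns
the first branch that matches, so the 5-year rule is placed before the 3-year rule.

```python
import sqlite3

GEMS_QUERY = """
SELECT EMPLOYEE_NUMBER AS EMP_ID,
       CASE
         WHEN EMPLOYEE_STATUS = 'Terminated' THEN 0
         -- 5-year rule first: CASE stops at the first matching branch
         WHEN EMPLOYEE_STATUS = 'Active'
              AND EMPLOYEE_JOINING_DATE < date(:run_date, '-5 years') THEN 2000
         WHEN EMPLOYEE_STATUS = 'Active'
              AND EMPLOYEE_JOINING_DATE < date(:run_date, '-3 years') THEN 1000
       END AS EMP_GEMS
FROM EMPLOYEE
ORDER BY EMPLOYEE_NUMBER
"""

# Sample rows from the Source table, with dates rewritten as ISO text.
SAMPLE_EMPLOYEES = [
    (267523, "Active", "Prasanna", "2003-09-19"),
    (267524, "Active", "Vivek", "2007-09-19"),
    (267525, "Terminated", "Jaikumar", "1993-09-19"),
]


def derive_gems(rows, run_date="2011-07-20"):
    """Load rows into an in-memory EMPLOYEE table and return (EMP_ID, EMP_GEMS) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMPLOYEE (EMPLOYEE_NUMBER INTEGER, EMPLOYEE_STATUS TEXT, "
        "EMPLOYEE_NAME TEXT, EMPLOYEE_JOINING_DATE TEXT)"
    )
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(GEMS_QUERY, {"run_date": run_date}).fetchall()
    conn.close()
    return result
```

With these rules, an Active employee who joined on 19-09-2003 falls under the 5-year rule, while one
who joined on 19-09-2007 falls under the 3-year rule only.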

e) Source to Target through Direct Pull, through Lookup and through Derivation
All or some of the data in the Source system is copied as is into the Target system without any
transformation, while some data is used to pick up values from a Lookup system, and the picked-up
values are moved directly into the Target system without any transformation. In addition, some data is
used to derive values that are not available in the Source system, and the derived values are moved
into the Target system. The data available in the Source system, the corresponding data from the
Lookup system and the derived data will all be present in the Target system. The field names might be
abbreviated or expanded.
Source TCS.EMPLOYEE
EMPLOYEE_NUMBER EMPLOYEE_STATUS BRANCH_NAME EMPLOYEE_JOINING_DATE
267523 Active Chennai 19-09-2003
267524 Active Hyderabad 19-09-2007
267525 Terminated Bangalore 19-09-1993
Lookup TCS.EMP_DTL
EMPLOYEE_NUMBER EMPLOYEE_NAME
267523 Prasanna
267524 Vivek
267525 Jaikumar

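SELECT A.EMPLOYEE_NUMBER as EMP_ID,
A.EMPLOYEE_STATUS as EMP_STA,
B.EMPLOYEE_NAME as EMP_NM,
A.EMPLOYEE_JOINING_DATE as EMP_JOIN_DT,
-- Cut-off dates are 3 and 5 years before an assumed run date of 20-JUL-2011
CASE
WHEN A.EMPLOYEE_STATUS = 'Terminated'
THEN 0
-- The 5-year rule is tested first; otherwise the 3-year rule would also match these rows
WHEN A.EMPLOYEE_STATUS = 'Active' AND A.EMPLOYEE_JOINING_DATE < TO_DATE('20-JUL-2006', 'DD-MON-YYYY')
THEN 2000
WHEN A.EMPLOYEE_STATUS = 'Active' AND A.EMPLOYEE_JOINING_DATE < TO_DATE('20-JUL-2008', 'DD-MON-YYYY')
THEN 1000
END as EMP_GEMS
FROM TCS.EMPLOYEE A,
TCS.EMP_DTL B
WHERE A.EMPLOYEE_NUMBER = B.EMPLOYEE_NUMBER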

Target TCS_STG.EMP_STG
EMP_ID EMP_STA EMP_NM EMP_JOIN_DT EMP_GEMS
267523 Active Prasanna 19-09-2003 2000
267524 Active Vivek 19-09-2007 1000
267525 Terminated Jaikumar 19-09-1993 0

3. Different stages of ETL testing
a) Record Count Verification
In any type of ETL data movement, Record Count Verification means comparing the number of records in
the Source system with the number of records in the Target system. The Source system will mostly be a
single table from which all or the key information is loaded into the Target table.
For example, data from the TCS.EMPLOYEE_DETAIL, TCS.EMPLOYEE, TCS.DEPARTMENT and TCS.BRANCH tables is
loaded into TCS_STG.EMP_DTL_STG. The Source system here is the TCS.EMPLOYEE_DETAIL table, because any
Employee who has a record in this table will have a record in the TCS_STG.EMP_DTL_STG table,
irrespective of whether the employee is present in the TCS.EMPLOYEE, TCS.DEPARTMENT and TCS.BRANCH
tables.
Here, we verify whether the number of records (employees) in TCS.EMPLOYEE_DETAIL is the same as the
number of records in the TCS_STG.EMP_DTL_STG table.
This will be 1 scenario as well as 1 test case.
SELECT 'Source' as SYSTEM_NAME, COUNT(*) as RECORD_COUNT
FROM TCS.EMPLOYEE_DETAIL A
UNION ALL
SELECT 'Target' as SYSTEM_NAME, COUNT(*) as RECORD_COUNT
FROM TCS_STG.EMP_DTL_STG B
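
The count comparison can also be automated so that the test passes or fails without a person reading
the two numbers. The following is a minimal sketch in Python with the standard sqlite3 module; the
tables are created without schema prefixes, and the helper names record_counts and build_count_demo
are assumptions made for illustration.

```python
import sqlite3


def record_counts(conn, source_table, target_table):
    """Return (source_count, target_count) for the two named tables."""
    # Table names are placed directly in the SQL text, so pass trusted names only.
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src, tgt


def build_count_demo():
    """Create a one-row source table and a one-row target table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMPLOYEE_DETAIL (EMPLOYEE_ID INTEGER, DEPARTMENT_ID INTEGER, BRANCH_ID INTEGER);
        CREATE TABLE EMP_DTL_STG (EMP_NM TEXT, DEPT_NM TEXT, BNCH_NM TEXT);
        INSERT INTO EMPLOYEE_DETAIL VALUES (267523, 2, 5);
        INSERT INTO EMP_DTL_STG VALUES ('Prasanna', 'Testing', 'Chennai');
        """
    )
    return conn
```

The test case passes when both numbers returned by record_counts are equal. Equal counts alone do not
prove that the same records were loaded, which is why the completeness check follows.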
b) Data Completeness Verification
In all types of ETL data movement except Source to Target through Lookup, Data Completeness
Verification means comparing the key information in the Source system with the key information in the
Target system. The Source system will mostly be a single table from which all or the key information
is loaded into the Target table.
For example, EMP_ID, EMP_STA, EMP_NM and EMP_JOIN_DT, along with the derived EMP_GEMS, are loaded from
the TCS.EMPLOYEE table into the TCS_STG.EMP_STG table.
Here, we verify whether all EMPLOYEE_NUMBERs in the TCS.EMPLOYEE table are present in the
TCS_STG.EMP_STG table as EMP_IDs and that no additional EMP_IDs are present in TCS_STG.EMP_STG.
Target TCS_STG.EMP_STG
EMP_ID EMP_STA EMP_NM EMP_BNCH_NM EMP_JOIN_DT EMP_GEMS
267523 Active Prasanna Chennai 19-09-2003 2000
267524 Active Vivek Hyderabad 19-09-2007 1000
267525 Terminated Jaikumar Bangalore 19-09-1993 0

SELECT EMPLOYEE_NUMBER
FROM TCS.EMPLOYEE A
MINUS
SELECT EMP_ID
FROM TCS_STG.EMP_STG B

SELECT EMP_ID
FROM TCS_STG.EMP_STG B
MINUS
SELECT EMPLOYEE_NUMBER
FROM TCS.EMPLOYEE A
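
The two MINUS queries can be run as a pair, one for keys missing from the Target and one for
additional keys in the Target. The following is a minimal sketch in Python with the standard sqlite3
module. SQLite spells the MINUS operator as EXCEPT; the unqualified table names, the stray key 999999
and the helper names key_differences and build_completeness_demo are assumptions made for
illustration.

```python
import sqlite3


def key_differences(conn):
    """Return (missing_in_target, extra_in_target) as sorted lists of keys."""
    missing = conn.execute(
        "SELECT EMPLOYEE_NUMBER FROM EMPLOYEE EXCEPT SELECT EMP_ID FROM EMP_STG"
    ).fetchall()
    extra = conn.execute(
        "SELECT EMP_ID FROM EMP_STG EXCEPT SELECT EMPLOYEE_NUMBER FROM EMPLOYEE"
    ).fetchall()
    return sorted(r[0] for r in missing), sorted(r[0] for r in extra)


def build_completeness_demo():
    """Create a source and a target that disagree on one key in each direction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMPLOYEE (EMPLOYEE_NUMBER INTEGER);
        CREATE TABLE EMP_STG (EMP_ID INTEGER);
        INSERT INTO EMPLOYEE VALUES (267523), (267524), (267525);
        -- 267525 was not loaded; 999999 is a hypothetical stray key.
        INSERT INTO EMP_STG VALUES (267523), (267524), (999999);
        """
    )
    return conn
```

Both lists must be empty for the two completeness test cases to pass.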
In the Source to Target through Lookup type of ETL data movement, Data Completeness Verification means
comparing the key information in the Source system, resolved through the Lookup system, with the
picked-up information in the Target system. The Source system will be a combination of one or more
tables from which the key information is picked up and loaded into the Target table.
For example, EMP_NM, DEPT_NM and BNCH_NM are picked up from the TCS.EMPLOYEE, TCS.DEPARTMENT and
TCS.BRANCH Lookup tables using the Employee, Department and Branch IDs in TCS.EMPLOYEE_DETAIL and
loaded into the TCS_STG.EMP_DTL_STG table.
Here, we verify whether the names of all EMPLOYEE_NUMBERs in the TCS.EMPLOYEE_DETAIL table are present
as EMP_NMs in the TCS_STG.EMP_DTL_STG table and that no additional EMP_NMs are present in the
TCS_STG.EMP_DTL_STG table.
This will be 1 scenario with 2 test cases under it: one verifies that all key information in the
Source system is present in the Target system, and the other verifies that no additional key
information is present in the Target system.
SELECT B.EMPLOYEE_NAME
FROM TCS.EMPLOYEE_DETAIL A,
TCS.EMPLOYEE B
WHERE A.EMPLOYEE_NUMBER = B.EMPLOYEE_NUMBER
MINUS
SELECT EMP_NM
FROM TCS_STG.EMP_DTL_STG C

SELECT EMP_NM
FROM TCS_STG.EMP_DTL_STG C
MINUS
SELECT B.EMPLOYEE_NAME
FROM TCS.EMPLOYEE_DETAIL A,
TCS.EMPLOYEE B
WHERE A.EMPLOYEE_NUMBER = B.EMPLOYEE_NUMBER

c) Data Integrity Verification
Data Integrity Verification means comparing the values of non-key fields between the Target system
and the Source/Lookup system.
For example, EMP_ID, EMP_STA, EMP_JOIN_DT, EMP_NM, DEPT_NM, BNCH_NM and EMP_GEMS are present in the
TCS_STG.EMP_DTL_STG table. Here:
- EMP_ID, EMP_STA and EMP_JOIN_DT are pulled directly from TCS.EMPLOYEE_DETAIL
- EMP_NM, DEPT_NM and BNCH_NM are picked up from the TCS.EMPLOYEE, TCS.DEPARTMENT and TCS.BRANCH
Lookup tables using the Employee, Department and Branch IDs in TCS.EMPLOYEE_DETAIL
- EMP_GEMS is derived from EMP_STA and EMP_JOIN_DT using the business logic described earlier for
TCS_STG.EMP_STG
The types of Data Integrity verification are as follows:
i. Data Integrity verification for Direct Pull fields
Here, we verify whether EMP_STA and EMP_JOIN_DT match between the source table TCS.EMPLOYEE_DETAIL
and the target table TCS_STG.EMP_DTL_STG for each EMP_ID (Employee).
This will be 1 scenario as well as 1 test case. When there is more than one source system, that is,
when a combination of fields from different source systems makes up one unique combination in the
target system, this will still be one scenario, while the number of test cases will equal the number
of source systems.
SELECT EMP_ID,
EMP_STA,
EMP_JOIN_DT
FROM TCS_STG.EMP_DTL_STG A
MINUS
SELECT EMPLOYEE_NUMBER as EMP_ID,
EMPLOYEE_STATUS as EMP_STA,
EMPLOYEE_JOINING_DATE as EMP_JOIN_DT
FROM TCS.EMPLOYEE_DETAIL B
ii. Data Integrity verification for Lookup fields
Here, we will verify whether
- EMP_NM in the target table TCS_STG.EMP_DTL_STG matches EMPLOYEE_NAME in the Lookup table
TCS.EMPLOYEE for each EMP_ID (Employee)
- DEPT_NM in the target table TCS_STG.EMP_DTL_STG matches DEPARTMENT_NAME in the Lookup table
TCS.DEPARTMENT for each DEPT_ID (Department)
- BNCH_NM in the target table TCS_STG.EMP_DTL_STG matches BRANCH_NAME in the Lookup table
TCS.BRANCH for each BNCH_ID (Branch)


This will be 1 scenario, while the number of test cases will equal the number of lookup systems, that
is, 3 in our example.
SELECT EMP_ID,
EMP_NM
FROM TCS_STG.EMP_DTL_STG A
MINUS
SELECT A.EMPLOYEE_NUMBER as EMP_ID,
B.EMPLOYEE_NAME as EMP_NM
FROM TCS.EMPLOYEE_DETAIL A,
TCS.EMPLOYEE B
WHERE A.EMPLOYEE_NUMBER = B.EMPLOYEE_NUMBER

SELECT DEPT_ID,
DEPT_NM
FROM TCS_STG.EMP_DTL_STG A
MINUS
SELECT A.DEPARTMENT_ID as DEPT_ID,
C.DEPARTMENT_NAME as DEPT_NM
FROM TCS.EMPLOYEE_DETAIL A,
TCS.DEPARTMENT C
WHERE A.DEPARTMENT_ID = C.DEPARTMENT_ID

SELECT BNCH_ID,
BNCH_NM
FROM TCS_STG.EMP_DTL_STG A
MINUS
SELECT A.BRANCH_ID as BNCH_ID,
D.BRANCH_NAME as BNCH_NM
FROM TCS.EMPLOYEE_DETAIL A,
TCS.BRANCH D
WHERE A.BRANCH_ID = D.BRANCH_ID
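
The lookup checks above all follow one pattern: take the key and the looked-up value from the Target
and subtract the same pair rebuilt from the Source and Lookup tables. Any row that is left over is a
defect. The following is a minimal sketch of the employee name check in Python with the standard
sqlite3 module. SQLite spells MINUS as EXCEPT; the unqualified table names, the misspelt name 'Vivke'
and the helper names name_mismatches and build_integrity_demo are assumptions made for illustration.

```python
import sqlite3

LOOKUP_MISMATCH_SQL = """
SELECT EMP_ID, EMP_NM
FROM EMP_DTL_STG
EXCEPT
SELECT A.EMPLOYEE_NUMBER, B.EMPLOYEE_NAME
FROM EMPLOYEE_DETAIL A
JOIN EMPLOYEE B ON A.EMPLOYEE_NUMBER = B.EMPLOYEE_NUMBER
"""


def name_mismatches(conn):
    """Return target (EMP_ID, EMP_NM) pairs that the lookup does not reproduce."""
    return sorted(conn.execute(LOOKUP_MISMATCH_SQL).fetchall())


def build_integrity_demo():
    """Create source, lookup and target tables with one wrong name in the target."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMPLOYEE_DETAIL (EMPLOYEE_NUMBER INTEGER);
        CREATE TABLE EMPLOYEE (EMPLOYEE_NUMBER INTEGER, EMPLOYEE_NAME TEXT);
        CREATE TABLE EMP_DTL_STG (EMP_ID INTEGER, EMP_NM TEXT);
        INSERT INTO EMPLOYEE_DETAIL VALUES (267523), (267524);
        INSERT INTO EMPLOYEE VALUES (267523, 'Prasanna'), (267524, 'Vivek');
        -- The second target row carries a deliberately wrong name.
        INSERT INTO EMP_DTL_STG VALUES (267523, 'Prasanna'), (267524, 'Vivke');
        """
    )
    return conn
```

The department and branch checks would be written the same way, each against its own lookup table.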
iii. Data Integrity verification for Derived fields
Here, we will verify whether
- When EMP_STA = 'T' in the TCS_STG.EMP_DTL_STG table, EMP_GEMS = 0
- When EMP_STA = 'A' and SYSDATE is more than 3 years but not more than 5 years after EMP_JOIN_DT in
the TCS_STG.EMP_DTL_STG table, EMP_GEMS = 1000
- When EMP_STA = 'A' and SYSDATE > (EMP_JOIN_DT + 5 Years) in the TCS_STG.EMP_DTL_STG table,
EMP_GEMS = 2000
There will be 1 scenario for each derived field, while the number of test cases will equal the number
of possible values of that derived field, that is, 3 in our example.
SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
WHERE EMP_STA = 'T'
AND EMP_GEMS <> 0

SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
WHERE EMP_STA = 'A'
AND EMP_JOIN_DT < TO_DATE('20-JUL-2008', 'DD-MON-YYYY')
AND EMP_JOIN_DT >= TO_DATE('20-JUL-2006', 'DD-MON-YYYY')
AND EMP_GEMS <> 1000

SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
WHERE EMP_STA = 'A'
AND EMP_JOIN_DT < TO_DATE('20-JUL-2006', 'DD-MON-YYYY')
AND EMP_GEMS <> 2000
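
An alternative to one query per value is to recompute the expected EMP_GEMS for every row and list
the rows where the loaded value differs. The following is a minimal sketch in Python with the
standard sqlite3 module, not the author's test script: the table is created without a schema prefix,
dates are stored as ISO text (YYYY-MM-DD), the run date is passed in rather than read from SYSDATE,
and the names GEMS_MISMATCH_SQL, gems_mismatches and build_derived_demo are illustrative. SQLite's
IS NOT comparison also reports rows where the loaded or expected value is NULL.

```python
import sqlite3

GEMS_MISMATCH_SQL = """
SELECT EMP_ID, EMP_GEMS, EXPECTED_GEMS
FROM (
    SELECT EMP_ID, EMP_GEMS,
           CASE
             WHEN EMP_STA = 'T' THEN 0
             -- 5-year rule first: CASE stops at the first matching branch
             WHEN EMP_STA = 'A' AND EMP_JOIN_DT < date(:run_date, '-5 years') THEN 2000
             WHEN EMP_STA = 'A' AND EMP_JOIN_DT < date(:run_date, '-3 years') THEN 1000
           END AS EXPECTED_GEMS
    FROM EMP_DTL_STG
)
WHERE EMP_GEMS IS NOT EXPECTED_GEMS
ORDER BY EMP_ID
"""


def gems_mismatches(conn, run_date):
    """Return (EMP_ID, loaded EMP_GEMS, expected EMP_GEMS) for every wrong row."""
    return conn.execute(GEMS_MISMATCH_SQL, {"run_date": run_date}).fetchall()


def build_derived_demo():
    """Create a target table in which the second row has a wrong EMP_GEMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMP_DTL_STG (EMP_ID INTEGER, EMP_STA TEXT, EMP_JOIN_DT TEXT, EMP_GEMS INTEGER)"
    )
    conn.executemany(
        "INSERT INTO EMP_DTL_STG VALUES (?, ?, ?, ?)",
        [
            (267523, "A", "2003-09-19", 2000),
            (267524, "A", "2007-09-19", 2000),  # deliberately wrong; the rules give 1000
            (267525, "T", "1993-09-19", 0),
        ],
    )
    return conn
```

An empty result means every derived value agrees with the business logic for the given run date.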
d) Data Quality Verification
Data Quality Verification means verifying that the key field in the Target system has unique values
and that certain other fields in the Target system hold only the specified values.
For example, EMP_ID, EMP_STA, EMP_JOIN_DT, EMP_NM, DEPT_NM, BNCH_NM and EMP_GEMS are present in the
TCS_STG.EMP_DTL_STG table.
- EMP_ID must have unique values, as there cannot be more than one record for an employee
SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
GROUP BY EMP_ID
HAVING COUNT(*) > 1
- BNCH_NM must have a single value for each employee, as one employee cannot work in more than one
branch
SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
GROUP BY EMP_ID
HAVING COUNT(DISTINCT BNCH_NM) > 1
- EMP_NM cannot have NULL values, as every employee must have a name
SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
WHERE EMP_NM IS NULL
- EMP_STA cannot have a value other than T for Terminated and A for Active
SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
WHERE EMP_STA NOT IN ('T', 'A')
- EMP_JOIN_DT cannot be later than SYSDATE
SELECT EMP_ID
FROM TCS_STG.EMP_DTL_STG A
WHERE EMP_JOIN_DT > SYSDATE
There will be 1 scenario for the fields that need to have unique values and 1 scenario for the fields
that may hold only specific values. The number of test cases will equal the number of fields checked,
that is, 5 in our example.
Such defects might not have been verified in the Source system and might have been moved as is into
the Target system. They can arise when the Target system gets appended with records from the source
system during every load instead of the existing records being replaced or updated. They can also
arise from refresh or truncation issues. Hence Data Quality verification is done in addition to Data
Integrity verification.
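
The five quality checks can be kept together and run in one pass, with each check returning the
EMP_IDs that break its rule. The following is a minimal sketch in Python with the standard sqlite3
module; the table is created without a schema prefix, dates are stored as ISO text (YYYY-MM-DD), the
comparison date is passed in instead of SYSDATE, the defective rows are made up for the demonstration,
and the names QUALITY_CHECKS, run_quality_checks and build_quality_demo are illustrative.

```python
import sqlite3

# Each query returns the EMP_IDs that violate one rule.
QUALITY_CHECKS = {
    "duplicate_emp_id":
        "SELECT EMP_ID FROM EMP_DTL_STG GROUP BY EMP_ID HAVING COUNT(*) > 1",
    "multiple_branches":
        "SELECT EMP_ID FROM EMP_DTL_STG GROUP BY EMP_ID HAVING COUNT(DISTINCT BNCH_NM) > 1",
    "missing_name":
        "SELECT EMP_ID FROM EMP_DTL_STG WHERE EMP_NM IS NULL",
    "invalid_status":
        "SELECT EMP_ID FROM EMP_DTL_STG WHERE EMP_STA NOT IN ('T', 'A')",
    "future_join_date":
        "SELECT EMP_ID FROM EMP_DTL_STG WHERE EMP_JOIN_DT > :as_of",
}


def run_quality_checks(conn, as_of):
    """Run every check and return {check name: sorted list of offending EMP_IDs}."""
    results = {}
    for name, sql in QUALITY_CHECKS.items():
        params = {"as_of": as_of} if ":as_of" in sql else {}
        results[name] = sorted(row[0] for row in conn.execute(sql, params))
    return results


def build_quality_demo():
    """Create a target table with one made-up defect per rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMP_DTL_STG (EMP_ID INTEGER, EMP_NM TEXT, EMP_STA TEXT, "
        "EMP_JOIN_DT TEXT, BNCH_NM TEXT)"
    )
    conn.executemany(
        "INSERT INTO EMP_DTL_STG VALUES (?, ?, ?, ?, ?)",
        [
            (267523, "Prasanna", "A", "2003-09-19", "Chennai"),
            (267523, "Prasanna", "A", "2003-09-19", "Hyderabad"),  # duplicate ID, second branch
            (267524, None, "X", "2007-09-19", "Hyderabad"),        # no name, invalid status
            (267525, "Jaikumar", "T", "2999-01-01", "Bangalore"),  # joining date in the future
        ],
    )
    return conn
```

A check passes when its list is empty. Note that NOT IN does not flag a NULL status; that case would
need a separate IS NULL check if EMP_STA is mandatory.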
e) Delta Load Verification
In any warehouse, tables are loaded using one of two methods.
i. Truncate and Reload
The Target table being loaded is completely truncated and fresh data from the source tables is
loaded again.

ii. Delta Load
The source tables of the target table being loaded are scanned for changed records. If changes are
identified in any records in the source tables, only those records are updated in the Target table.
Similarly, any new records identified in the source tables are added to the Target table, and records
deleted from the source tables are removed from the Target table.
Delta testing is applicable only to Delta Load tables. In this type of testing, a change in a record
of the Source table is created artificially using UPDATE/INSERT/DELETE SQL commands. Then the workflow
that loads the corresponding Target table is executed. If the changes are reflected in the Target
table at the end of the load, the test is successful. If not, the reasons have to be analyzed and
fixed.
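
The delta test described above can be rehearsed on an in-memory database. The following is a minimal
sketch in Python with the standard sqlite3 module. The function run_delta_load is only a hypothetical
stand-in for the real ETL workflow, and the unqualified table names, the new employee 267526 and the
other helper names are assumptions made for illustration.

```python
import sqlite3


def run_delta_load(conn):
    """Hypothetical stand-in for the ETL workflow under test."""
    # Remove target rows whose key no longer exists in the source.
    conn.execute(
        "DELETE FROM EMP_STG WHERE EMP_ID NOT IN (SELECT EMPLOYEE_NUMBER FROM EMPLOYEE)"
    )
    # Add new rows and overwrite changed ones (EMP_ID is the primary key).
    conn.execute(
        "INSERT OR REPLACE INTO EMP_STG (EMP_ID, EMP_STA) "
        "SELECT EMPLOYEE_NUMBER, EMPLOYEE_STATUS FROM EMPLOYEE"
    )


def apply_source_changes(conn):
    """Create one artificial UPDATE, INSERT and DELETE in the source table."""
    conn.execute("UPDATE EMPLOYEE SET EMPLOYEE_STATUS = 'Terminated' WHERE EMPLOYEE_NUMBER = 267524")
    conn.execute("INSERT INTO EMPLOYEE VALUES (267526, 'Active')")
    conn.execute("DELETE FROM EMPLOYEE WHERE EMPLOYEE_NUMBER = 267525")


def target_matches_source(conn):
    """True when source and target hold exactly the same (key, status) pairs."""
    missing = conn.execute(
        "SELECT EMPLOYEE_NUMBER, EMPLOYEE_STATUS FROM EMPLOYEE "
        "EXCEPT SELECT EMP_ID, EMP_STA FROM EMP_STG"
    ).fetchall()
    extra = conn.execute(
        "SELECT EMP_ID, EMP_STA FROM EMP_STG "
        "EXCEPT SELECT EMPLOYEE_NUMBER, EMPLOYEE_STATUS FROM EMPLOYEE"
    ).fetchall()
    return not missing and not extra


def build_delta_demo():
    """Create the source table with three employees and an empty target table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMPLOYEE (EMPLOYEE_NUMBER INTEGER PRIMARY KEY, EMPLOYEE_STATUS TEXT);
        CREATE TABLE EMP_STG (EMP_ID INTEGER PRIMARY KEY, EMP_STA TEXT);
        INSERT INTO EMPLOYEE VALUES (267523, 'Active'), (267524, 'Active'), (267525, 'Terminated');
        """
    )
    return conn
```

A test run loads the target once, calls apply_source_changes, confirms that the tables now differ,
runs the load again and then expects target_matches_source to return True.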