
Business Intelligence and Tools Unit 7

Sikkim Manipal University Page No. 154


Unit 7 Data Transformation and Loading
Structure:
7.1 Introduction
Objectives
7.2 Overview of Transformation
7.2.1 Selection and Splitting/Joining
7.2.2 Summing Up
7.2.3 Conversion
7.2.4 Enrichment
Self Assessment Question(s) (SAQs)
7.3 Major Transformation Types
7.3.1 Format Revisions
7.3.2 Decoding of Fields
7.3.3 Calculated and Derived values
7.3.4 Splitting of Single Fields
7.3.5 Merging of Information
7.3.6 Summing Up
7.3.7 Character Set Conversion
7.3.8 Conversion of Units of Measurements
7.3.9 Key Restructuring
7.3.10 Deduplication
Self Assessment Question(s) (SAQs)
7.4 Data Integration and Consolidation
7.4.1 Identification of an Entity
7.4.2 Existence of Multiple Sources
Self Assessment Question(s) (SAQs)
7.5 Implementation of Transformation
7.5.1 Manual Methods
7.5.2 Transformation Tools
Self Assessment Question(s) (SAQs)
7.6 Transformation for Dimension Attributes
7.6.1 Type 1 Changes - Correction of Errors
7.6.2 Type 2 Changes - History Preservation
7.6.3 Type 3 Changes - Soft Revisions
7.7 Data Loading
7.7.1 Types of Load
7.7.2 Modes of applying the Data
7.7.3 Data Refresh versus Update
Self Assessment Question(s) (SAQs)
7.8 Summary
7.9 Terminal Questions (TQs)
7.10 Multiple Choice Questions (MCQs)
7.11 Answers to SAQs, TQs, and MCQs
7.11.1 Answers to Self Assessment Questions (SAQs)
7.11.2 Answers to Terminal Questions (TQs)
7.11.3 Answers to Multiple Choice Questions (MCQs)
7.1 Introduction
In the previous Unit, we discussed several data extraction techniques.
The extracted data, however, is raw and cannot be loaded directly into a
data warehouse. The underlying purpose of a data warehouse is to provide
useful information for strategic decision-making, and the data in the
operational source systems cannot fulfill this purpose as it stands. The
transformation and loading functions therefore play a key role in preparing
data that can assist the senior managers of an organization in making
strategic decisions.
Objectives:
The objectives of the Unit are to make you understand:
The basic tasks in the transformation function
Several types of transformation function
Data integration and consolidation
The implementation of transformation function
Techniques and processes involved in data loading
7.2 Overview of Transformation
You need to perform various types of transformation tasks before moving
the extracted data from the source systems into the data warehouse. Because
the data comes from various source systems, it must be transformed to
conform to common standards, and you also need to ensure that the
combined data does not violate the business rules.
Irrespective of the complexity of the source systems, and regardless of the
extent of your data warehouse, some of the basic tasks performed in the
data transformation function are as follows:
7.2.1 Selection and Splitting/Joining
This is the basic task that is done at the beginning of the entire data
transformation process. Using this task, you may select either whole records
or parts of several records from the source systems. Usually, the selection
task forms a part of the extraction function itself. However, the structure
of the source may not be amenable to selecting only the necessary parts
during extraction; in that case you may have to extract the whole record and
use the selection task as a part of the transformation function.
The splitting/joining task covers the types of data manipulation you need to
perform on the selected records of the source systems. You can either split
the selected parts further or join parts selected from many source systems.
Of the two, joining is used more often in the data warehouse environment.
7.2.2 Summing Up
This task is used in case you find that it is not required to keep data at the
lowest level of detail in your data warehouse. For instance, for a grocery
chain, sales data at the lowest level of detail for every transaction at the
checkout may not be required. Storing sales by product by store by day may
be adequate. Therefore, the data transformation function includes
summarization of sales by product, by store, and by day.
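The grocery-chain summarization can be sketched as follows. This is only a minimal illustration: the transaction layout and the sample values are hypothetical and do not come from any particular source system.

```python
from collections import defaultdict

# Hypothetical checkout transactions: (date, store, product, amount).
transactions = [
    ("2006-10-09", "S01", "milk", 2.50),
    ("2006-10-09", "S01", "milk", 5.00),
    ("2006-10-09", "S02", "milk", 2.50),
    ("2006-10-10", "S01", "bread", 1.75),
]

def summarize_sales(rows):
    """Roll transactions up to one total per (date, store, product)."""
    totals = defaultdict(float)
    for day, store, product, amount in rows:
        totals[(day, store, product)] += amount
    return dict(totals)

daily_sales = summarize_sales(transactions)
```

The four transactions collapse into three summary rows, one per product, store, and day.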
7.2.3 Conversion
This task includes a large variety of rudimentary conversions of single fields.
This task is done for two reasons:
to standardize the data among the data extractions from disparate
source systems
to make the fields usable and understandable to the users
7.2.4 Enrichment
This task involves the rearrangement and simplification of individual fields to
make them useful for the data warehouse environment. You can use one or
more fields from the same input record to create a better view of the data for
the data warehouse. The principle is extended when one or more fields
originate from multiple records, resulting in a single field for the data
warehouse.
Self Assessment Question(s) (SAQs)
For Section 7.2
1. What is the role of the transformation function in building a warehouse?
2. What are the basic tasks of transformation?
7.3 Major Transformation Types
By undertaking a combination of the basic tasks discussed above, you can
do the following transformation functions:
7.3.1 Format Revisions
Format revisions include changes to the data types and lengths of individual
fields. For instance, product package types in your source systems may be
indicated by codes and names in which the fields are numeric and text data
types. Also, the lengths of package types might vary from one source
system to another. Therefore, you can standardize and change the data
type to text in order to provide values meaningful to the users using format
revisions.
7.3.2 Decoding of Fields
This type of transformation arises when you deal with multiple source
systems, where the same data items are bound to be described by a plethora
of field values. For instance, two products manufactured by an organization
might be coded as 1 and 2 in one source system and as A and B in another
system. In such situations, you need to decode the source codes and
translate them into one standard set of values before loading the data into
a data warehouse; otherwise the conflicting codes would distort the data
analysis.
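A decoding step of this kind is often just a lookup table per source system. The sketch below assumes two hypothetical sources (system_a, system_b) and hypothetical standard codes (PROD_X, PROD_Y); none of these names come from the text.

```python
# Hypothetical per-source code tables mapping to assumed standard codes.
DECODE_TABLES = {
    "system_a": {"1": "PROD_X", "2": "PROD_Y"},
    "system_b": {"A": "PROD_X", "B": "PROD_Y"},
}

def decode_product(source, raw_code):
    """Map a source-specific product code to the warehouse standard code."""
    try:
        return DECODE_TABLES[source][raw_code]
    except KeyError:
        # Unknown codes are rejected rather than loaded with a guess.
        raise ValueError(f"no decoding rule for {raw_code!r} from {source!r}")
```

With these tables, code 1 from the first system and code A from the second both resolve to the same standard value.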
7.3.3 Calculated and Derived values
You can maintain both calculated and derived types of data values in a
typical data warehouse. For instance, after extracting sales volume, sales
value, and operating cost estimates from the sales system, you can keep
profit margin (calculated as the difference between total sales and total
cost) as a calculated value along with the sales and cost amounts.
Similarly, you may use average daily balances and operating
ratios as derived fields.
7.3.4 Splitting of Single Fields
You need to split larger single fields for improved understanding and
better analysis. For instance, traditional legacy systems store the
name and address of a customer in one large text field. Similarly, some
systems store city, state, and zip code data together in a single field.
These components need to be stored individually, both to improve operational
performance by indexing on individual components and to perform analysis
by individual components such as city, state, and zip code.
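As an illustration, a combined city/state/zip field can be split with a regular expression. The sketch assumes US-style input of the form "City, ST 12345" (optionally with a four-digit zip extension); real legacy data is usually messier, so unmatched values are returned as None for manual review.

```python
import re

# Assumed input shape: "City, ST 12345" or "City, ST 12345-6789".
CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>[^,]+?)\s*,\s*(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)

def split_city_state_zip(text):
    """Return city, state, and zip as separate fields, or None if no match."""
    match = CITY_STATE_ZIP.match(text)
    if match is None:
        return None  # leave the record for manual review
    parts = match.groupdict()
    parts["state"] = parts["state"].upper()
    return parts
```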
7.3.5 Merging of Information
This type of transformation deals with merging of information available in
various source systems into a single entity. For instance, the product code
and description may come from one data source, while the relevant package
types and the cost data may come from several other source systems. Here,
merging of information denotes combining the product code, description,
package types, and cost into a single entity.
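A minimal sketch of this merge, assuming each source extract has already been keyed by product code. The three dictionaries and their contents are hypothetical.

```python
# Hypothetical extracts from three source systems, keyed by product code.
product_master = {"P100": {"description": "Green tea 250 g"}}
packaging = {"P100": {"package_type": "Box"}}
costing = {"P100": {"unit_cost": 3.20}}

def merge_product(code, *sources):
    """Combine the attributes found for one product code into one record."""
    merged = {"product_code": code}
    for source in sources:
        merged.update(source.get(code, {}))
    return merged

product_record = merge_product("P100", product_master, packaging, costing)
```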
7.3.6 Summing Up
In this type of transformation, the summaries are created and then loaded in
the data warehouse instead of loading the most granular level of data. For
instance, a credit card company need not store every single
transaction on each credit card in the data warehouse to analyze sales
patterns. Instead, the data can be summarized to the extent possible and
the summary data stored in place of the most granular data.
7.3.7 Character Set Conversion
In this type of data transformation, the character sets are converted into an
agreed standard character set for textual data in the data warehouse. For
instance, the source data will be in EBCDIC (Extended Binary Coded
Decimal Interchange Code) characters if you have mainframe legacy
systems as source systems. So you need to convert from the mainframe
EBCDIC format to the ASCII (American Standard Code for Information
Interchange) format if a PC-based architecture is the choice for your data
warehouse.
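In Python, such a conversion can be sketched with the built-in codecs. The code page cp037 used here is one common EBCDIC variant and is an assumption; the correct code page depends on the actual mainframe source.

```python
def ebcdic_to_ascii(raw, codec="cp037"):
    """Decode EBCDIC bytes and re-encode them as ASCII bytes.

    The cp037 default is an assumption. Characters with no ASCII
    equivalent raise UnicodeEncodeError instead of being silently dropped.
    """
    text = raw.decode(codec)
    return text.encode("ascii")

# The EBCDIC (cp037) bytes for the word HELLO.
sample = bytes.fromhex("c8c5d3d3d6")
converted = ebcdic_to_ascii(sample)
```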
7.3.8 Conversion of Units of Measurements
Use of standard unit of measurement is one of the prerequisites in building a
data warehouse. If your company has overseas operations, you may have
to convert the metrics accordingly so that the numbers may all be in one
standard unit of measurement.
Date/time representation needs the same kind of standardization. For example,
the date of October 9, 2006 is written as 10/09/2006 in the U.S. format and
as 09/10/2006 in the British format. This can be standardized by writing it as
09 Oct 2006.
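The date example can be sketched with the standard datetime module. The labels "us" and "uk" are hypothetical tags for the source format, and the month abbreviation assumes the default English (C) locale.

```python
from datetime import datetime

# Hypothetical labels for the two source conventions described above.
SOURCE_FORMATS = {"us": "%m/%d/%Y", "uk": "%d/%m/%Y"}

def standardize_date(text, source_format):
    """Parse a date in the given source convention and emit 'DD Mon YYYY'."""
    parsed = datetime.strptime(text, SOURCE_FORMATS[source_format])
    return parsed.strftime("%d %b %Y")
```

The source convention has to be known in advance: a string such as 10/09/2006 is valid in both formats, so it cannot be detected from the value alone.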
7.3.9 Key Restructuring
You have to come up with keys for the fact and dimension tables of the data
warehouse, and these are based on the keys in the extracted records. So you
look at the primary keys of the extracted records while extracting data from
the input sources. Suppose the product code in an organization is
structured to have an inherent meaning (for example, the first letter gives
the location code, the second letter gives the machine code, and so on) and
you use this product code as the primary key. If part of the code identifies
the warehouse where the product is stored and the product is moved to
another warehouse, then the warehouse part of the product key has to be
changed. Therefore, avoid keys with built-in
meanings while choosing keys for your data warehouse database tables,
and transform such keys into generic keys (that are generated by the
system itself).
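One way to picture a system-generated key is a lookup that hands out sequential integers per production key. The class name SurrogateKeyMap and the sample product codes are hypothetical; a real warehouse would typically persist this mapping in a table.

```python
import itertools

class SurrogateKeyMap:
    """Assigns a meaningless integer key to each production key."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._keys = {}

    def key_for(self, source, production_key):
        """Return the existing surrogate key, or generate a new one."""
        lookup = (source, production_key)
        if lookup not in self._keys:
            self._keys[lookup] = next(self._next)
        return self._keys[lookup]

key_map = SurrogateKeyMap()
first = key_map.key_for("orders", "AB-1001")
again = key_map.key_for("orders", "AB-1001")
other = key_map.key_for("orders", "CB-1001")
```

Because the surrogate key carries no meaning, a later change to the structured product code does not force a change to the warehouse key.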
7.3.10 Deduplication
Some companies may maintain several records for a single customer, and
these additional records result in duplicates. Therefore, it is
suggested to keep a single record for each customer and link all the
duplicates in the source systems to this single record in your data
warehouse. This process is called deduplication.
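The linking of duplicates to a single record can be sketched as below. Matching on a normalized name plus postcode is a deliberately simple assumption for illustration; the field names and sample records are hypothetical, and practical matching rules are usually fuzzier.

```python
def normalize(name, postcode):
    """Build a crude match key: lower-cased name, postcode without spaces."""
    return (" ".join(name.lower().split()), postcode.replace(" ", ""))

def deduplicate(records):
    """Return (customers, links); links maps each source id to one customer."""
    customers = {}
    links = {}
    for rec in records:
        match_key = normalize(rec["name"], rec["postcode"])
        if match_key not in customers:
            customers[match_key] = {
                "customer_key": len(customers) + 1,
                "name": rec["name"].strip(),
            }
        links[rec["source_id"]] = customers[match_key]["customer_key"]
    return list(customers.values()), links

# Hypothetical records from three source systems.
records = [
    {"source_id": "ORD-17", "name": "Asha  Rai", "postcode": "737101"},
    {"source_id": "SVC-03", "name": "asha rai", "postcode": "737 101"},
    {"source_id": "MKT-88", "name": "Bina Lama", "postcode": "737102"},
]
customers, links = deduplicate(records)
```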
Self Assessment Question(s) (SAQs)
For Section 7.3
1. Discuss the major types of transformation that are in practice and give
an example for each of these types.
7.4 Data Integration and Consolidation
In general, most of the data that the warehouse gets is the data extracted
from a combination of legacy mainframe systems, old minicomputer
applications, and some client/server systems. These source systems do
not conform to the same set of business rules, and they often follow
different naming conventions and varied standards for data representation.
The process of data integration and consolidation therefore plays a vital role.
Here, the data integration includes combining of all relevant operational data
into coherent data structures so as to make them ready for loading into data
warehouse. It standardizes the names and data representations and
resolves the discrepancies. Some of the challenges involved in the data
integration and consolidation process are as follows.
7.4.1 Identification of an Entity
Suppose there are three legacy applications that are in use in your
organization; one is the order entry system, second is customer service
support system, and the third is the marketing system. Each of these
systems might have its own customer file to support the system. Even
though most of the customers are common to all three files, the same
customer may have a different unique identification number in each of
these files.
As you need to keep a single record for each customer in a data warehouse,
you need to get the transactions of each customer from various source
systems and then match them up to load into the data warehouse. This is an
entity identification problem in which you do not know which of the customer
records relate to the same customer. This problem is prevalent where
multiple sources exist for the same entities and the other entities that are
prone to this type of problem include vendors, suppliers, employees, and
various products manufactured by a company.
In the case of the three customer files, you have to design complex
algorithms to match records from all three files and form groups of matching
records. This is a difficult exercise. If the matching criterion is too
tight, then some records might escape the groups. Similarly, a particular
group may include records of more than one customer if the matching
criterion is too loose. You might also have to involve your users or the
respective stakeholders to understand the transactions accurately. Some
companies approach this problem in two phases. In the first phase, all
records, irrespective of whether they are duplicates or not, are assigned
unique identifiers; in the second phase, the duplicates are reconciled
periodically, either through automatic algorithms or manually.
7.4.2 Existence of Multiple Sources
Another major challenge in the area of data integration and consolidation
results from a single data element having more than one source. For
instance, cost values are calculated and updated at specific intervals in the
standard costing application. Similarly, your order processing application
also carries the unit costs for all products. Thus there are two sources
available to obtain the unit cost of a product and so there could be a slight
variation in their values. Which of these systems needs to be considered to
store the unit cost in the data warehouse becomes an important question.
One easy way of handling this situation is to prioritize the two sources, or
you may select the source on the basis of the last update date.
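Both selection rules can be sketched in a few lines. The source names, the priority order, and the sample cost values are hypothetical.

```python
from datetime import date

# Hypothetical priority order: earlier entries win.
SOURCE_PRIORITY = ["standard_costing", "order_processing"]

def pick_by_priority(candidates):
    """Choose the value from the highest-priority source."""
    return min(candidates, key=lambda c: SOURCE_PRIORITY.index(c["source"]))

def pick_by_last_update(candidates):
    """Choose the value from the most recently updated source."""
    return max(candidates, key=lambda c: c["updated"])

candidates = [
    {"source": "order_processing", "unit_cost": 4.10, "updated": date(2006, 10, 9)},
    {"source": "standard_costing", "unit_cost": 4.00, "updated": date(2006, 9, 30)},
]
```

In this sample the two rules disagree, which is why the choice of rule needs to be settled per data element before loading.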
Self Assessment Question(s) (SAQs)
For Section 7.4
1. What is data integration and consolidation?
2. Discuss the major challenges involved in the process of data integration
and consolidation.
7.5 Implementation of Transformation
The implementation of data transformation is a complex exercise. You may
have to go beyond the manual methods, that is, the usual methods of writing
conversion programs when deploying operational systems. You need to
consider several other factors to decide which methods to adopt.
If you are considering automating the data transformation functions,
you have to identify, configure, and install the tools, train the team on
these tools, and integrate them into the data warehouse environment. In
practice, a combination of both methods proves to be effective. The issues
you may face in using manual methods and transformation tools are discussed
below.
7.5.1 Manual Methods
These are the traditional methods that were in common practice until
recently. They are adequate in the case of smaller data warehouses. These
methods include manually coded programs and scripts that are mainly
executed in the data staging area. These methods call for elaborate
coding and testing, and only programmers and analysts who possess
specialized knowledge in this area can produce the programs and
scripts.
Although the initial cost may be reasonable, ongoing maintenance may
escalate the cost while implementing these methods. Moreover these
methods are always prone to errors. Another disadvantage of these
methods is about the creation of metadata. Even if the in-house programs
record the metadata initially, the metadata needs to be updated every time
the changes occur in the transformation rules.
7.5.2 Transformation Tools
The difficulties involved in using the manual methods can be eliminated
using the sophisticated and comprehensive set of transformation tools that
are now available. Use of these automated tools certainly improves
efficiency and accuracy. If the inputs provided into the tools are accurate,
then the rest of the work is performed efficiently by the tool. So you have to
carefully specify the required parameters, the data definitions and the rules
to the transformation tool.
Also, the transformation tools enable the recording of metadata. When you
specify the transformation parameters and rules, these values are stored as
metadata by the tool and this metadata becomes a part of the overall
metadata component of the data warehouse. When changes occur to
business rules or data definitions, you just have to enter the changes into
the tool and the metadata for the transformations gets adjusted
automatically. However, relying on transformation tools alone, without any
manual methods, is not practical either.
Self Assessment Question(s) (SAQs)
For Section 7.5
1. Elaborate on the types of methods in practice for implementing the
transformation function when building a data warehouse.
7.6 Transformation for Dimension Attributes
Now we consider the updating of the dimension tables. The dimension
tables are more stable in nature and so they are less volatile compared to
the fact tables. The fact tables change through an increase in the number of
rows, but the dimension tables change through the changes to the
attributes. For instance, consider a product dimension table. Every year,
rows are added as new models become available. But what about the
attributes within the dimension table? You might face a situation
where the product dimension table changes because a particular
product was moved into a different product category. The corresponding
values must then be changed in the product dimension table. Though most of the
dimensions are generally constant over a period of time, they may change
slowly.
The usual changes in the dimension tables can be classified into three types
as provided below (Refer Fig 7.1).
7.6.1 Type 1 Changes - Correction of Errors
Type 1 changes are applied to the data warehouse without any need to
preserve history as these changes usually relate to the corrections of errors
in the source systems. For instance, if the name of a customer is misspelled
in the source system, the erroneous name needs to
be discarded and there is no need to preserve the old name. In other words,
the old values in Type 1 changes have little significance and
need not be preserved in the data warehouse.
The method of applying the Type 1 changes is to overwrite the attribute
value in the dimension table row with new value. Also, it will not affect the
key of this dimension table. This method is easy to implement as the old
value need not be preserved and no other changes are made in the
dimension table row.
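A Type 1 change amounts to an in-place overwrite. The sketch below uses a hypothetical customer dimension held as a dictionary keyed by surrogate key; the names and values are illustrative only.

```python
# Hypothetical customer dimension keyed by surrogate key.
customer_dim = {101: {"customer_id": "C042", "name": "Jhon Smith"}}

def apply_type1(dimension, surrogate_key, attribute, corrected_value):
    """Overwrite the attribute in place; the old value is not kept."""
    dimension[surrogate_key][attribute] = corrected_value

apply_type1(customer_dim, 101, "name", "John Smith")
```

The row count and the key stay the same; only the attribute value changes.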
7.6.2 Type 2 Changes - History Preservation
Suppose there is a change in the marital status of a customer and one of
the essential requirements of your data warehouse is to track the orders
according to marital status. If the customer married on 9th December
2001, all his orders before the marriage date need to be counted under
single, and those after the marriage date under married. So there is a need
to preserve the history in the data warehouse.
The Type 2 changes are related to true changes in source systems and this
change leads to partitioning of the history in the data warehouse. So every
change for the same attribute is to be preserved in case of Type 2 changes.
To apply the Type 2 changes, you can add a new dimension table row with
new value of the changed attribute. The key of the original row is not
affected and there are no changes in the original row in the dimension table.
An effective date field may be included, and the new row is inserted with a
new surrogate key.
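The marital status example can be sketched as follows. The table layout, the customer identifier, and the initial effective date are hypothetical; only the marriage date comes from the example above.

```python
from datetime import date

# Hypothetical customer dimension; customer_key is the surrogate key.
customer_dim = [
    {"customer_key": 1, "customer_id": "C042", "marital_status": "single",
     "effective_date": date(1995, 1, 1)},
]

def apply_type2(rows, customer_id, attribute, new_value, effective_date):
    """Add a new row for the changed attribute; earlier rows stay untouched."""
    history = [r for r in rows if r["customer_id"] == customer_id]
    latest = max(history, key=lambda r: r["effective_date"])
    new_row = dict(latest)
    new_row[attribute] = new_value
    new_row["effective_date"] = effective_date
    new_row["customer_key"] = max(r["customer_key"] for r in rows) + 1
    rows.append(new_row)
    return new_row["customer_key"]

def key_on(rows, customer_id, order_date):
    """Surrogate key of the row that was in effect on order_date."""
    candidates = [r for r in rows
                  if r["customer_id"] == customer_id
                  and r["effective_date"] <= order_date]
    return max(candidates, key=lambda r: r["effective_date"])["customer_key"]

apply_type2(customer_dim, "C042", "marital_status", "married", date(2001, 12, 9))
```

Orders dated before the marriage resolve to the original row, and later orders resolve to the new row, which is how the history gets partitioned.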
7.6.3 Type 3 Changes - Soft Revisions
Type 3 changes are tentative or soft revisions. Unlike the Type 2 changes,
the orders need to be maintained in the old and new groups after an
effective date. For instance, you moved a salesperson from Territory A to
Territory B to analyze his ability in both the territories. Therefore his orders
need to be captured in both the territories after an effective date. So there is
a need to keep track of history with old and new values of the changed
attribute.
To apply Type 3 changes, you have to add an "old" field in the dimension
table for the affected attribute. Then you push down the existing value of
the attribute from the "current" field to the "old" field and keep the new
value of the attribute in the "current" field. You may also add a "current
effective date" field for the attribute. Here, the key of the row is not
affected and no new dimension row is needed. The existing queries will
seamlessly switch to the current value, and any queries that need to use the
old value must be revised accordingly.
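The push-down of the current value into an "old" field can be sketched as below. The field naming scheme (an old_ prefix and an _effective_date suffix), the salesperson record, and the effective date are all hypothetical.

```python
from datetime import date

# Hypothetical salesperson dimension row.
salesperson = {"salesperson_key": 7, "name": "R. Sharma",
               "territory": "Territory A"}

def apply_type3(row, attribute, new_value, effective_date):
    """Keep the previous value in an 'old' field and store the new one."""
    row["old_" + attribute] = row[attribute]
    row[attribute] = new_value
    row[attribute + "_effective_date"] = effective_date

apply_type3(salesperson, "territory", "Territory B", date(2006, 10, 9))
```

The surrogate key is unchanged and no row is added, so only one prior value can be kept per attribute with this approach.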
In order to apply all these changes correctly, you have to transform the
incoming changes and prepare the changes to the data for loading into the
data warehouse.
7.7 Data Loading
After the creation of load images, the next set of activities is to take the
prepared data, apply it to the data warehouse, and store it in the data
warehouse database. Here, the data warehouse will be offline during the
loads. As the process of loading is a time-consuming activity, it is preferred
to divide the whole load process into smaller chunks and populate a few
files at a time. This enables you to run the smaller loads in parallel. Also,
you can keep some parts of the warehouse up and running while loading the
other parts.
7.7.1 Types of Load
There are three types of application of data to the data warehouse:
Initial Load: populates all the data warehouse tables for the first time.
Incremental Load: applies ongoing changes as necessary in a periodic
manner.
Full Refresh: completely erases the contents of one or more tables and
reloads them with fresh data (an initial load is a refresh of all the
tables).
[Fig. 7.1: Transformation for Dimensional Attributes. The figure shows
source system data changes for dimensions going through data
transformation, data cleansing, consolidation and integration, and
determination of the type of dimension change. Type 1 and Type 3 changes
convert the production key to the existing surrogate key, Type 2 changes
convert it to a new surrogate key, and a load image is then created.]
For instance, consider product data. During the initial load, you extract the
data for all products from the source systems, integrate and transform it,
and create load images to load the data into the product dimension table.
During an incremental load, you collect the changes to the product data in
the source systems since the previous extract, run the changes through the
integration and transformation process, and create output records to be
applied to the product dimension table. A full refresh is similar to the initial
load wherein the fresh data is reloaded.
7.7.2 Modes of applying the Data
To apply the data to the warehouse, you may adopt any of the following
modes:
Load: The load process wipes out the existing data and applies the data
from the incoming file to the target table. If the table is empty before
loading, the load process simply applies the data from the incoming file.
Append: The append process unconditionally adds the incoming data,
preserving the existing data in the target table. When an incoming
record is a duplicate of an existing record, you can define the process,
either to allow or reject the incoming record.
Destructive Merge: When you apply the incoming data to the target
data, the destructive merge process updates the target record if the primary
key of an incoming record matches the key of an existing record.
If the incoming record is a new record, it is simply added to the target
table.
Constructive Merge: If the primary key of an incoming record matches
the key of an existing record, the process leaves the existing record, adds
the incoming record, and marks it as superseding the old record.
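The four modes can be contrasted with a small in-memory sketch. The target table is modeled as a list of dictionaries with a "key" field, and the "version" field used to mark a superseding record is an assumption made for this illustration; real load utilities implement these modes in their own ways.

```python
def apply_load(target, incoming):
    """Load: wipe the target and apply the incoming data."""
    target.clear()
    target.extend(dict(r) for r in incoming)

def apply_append(target, incoming, allow_duplicates=False):
    """Append: keep existing rows; duplicates are allowed or rejected."""
    for r in incoming:
        if allow_duplicates or r not in target:
            target.append(dict(r))

def apply_destructive_merge(target, incoming):
    """Destructive merge: update on key match, otherwise add."""
    by_key = {r["key"]: r for r in target}
    for r in incoming:
        if r["key"] in by_key:
            by_key[r["key"]].update(r)
        else:
            new_row = dict(r)
            target.append(new_row)
            by_key[new_row["key"]] = new_row

def apply_constructive_merge(target, incoming):
    """Constructive merge: keep the old row, add the new one with a
    higher version number so that it supersedes the old row."""
    for r in incoming:
        versions = [old.get("version", 1) for old in target
                    if old["key"] == r["key"]]
        target.append({**r, "version": max(versions, default=0) + 1})
```

Applied to the same incoming record, a destructive merge leaves one row per key, while a constructive merge leaves both rows and relies on the version marker to tell them apart.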
Some of the important points you need to understand with regard to data
loading are as follows:
The straightforward method of applying the data is to write special load
programs, and the number of load programs can be large depending on
the size of the warehouse.
It is difficult to estimate the running times of the loads (especially the
initial load or a complete refresh). So you can do test loads to estimate
the running times and verify the correctness.
When you are running a load, do not expect every record in the source
load image file to be successfully applied to the data warehouse. You
need to provide procedures to handle the load images that do not load.
You can save the effort of moving the load images to the data
warehouse server if the data staging area and the data warehouse
database are on the same server.
You need to consider an appropriate option (web, FTP, and database
links) if you have to transport the load images to the data warehouse
server.
7.7.3 Data Refresh versus Update
There are two methods to maintain the data warehouse and keep it up-to-
date after the initial load. They are:
Update
Refresh
Update is the application of incremental changes from the data sources, and
refresh is a complete reload of data at specified intervals. The refresh
option involves the periodic replacement of complete data warehouse
tables, and refresh jobs take a long time to run. To use the update option,
you need to devise an appropriate strategy to extract the changes from
each data source. Then you have to determine the best strategy to apply
the changes to the data warehouse.
Technically, refresh is a much simpler option than update. But you may
have to keep the data warehouse down for an unacceptably long time if you run
refresh jobs every day. Therefore, you need to draw a clear line between the
two methods to identify the right choice. The cost of refresh is constant
irrespective of the number of changes in the source systems. If the number
of changes increases, the time and effort to do a full refresh remains the
same. In contrast to this, the cost of update varies with the number of
records to be updated. In general, the cost of loading per record tends to be
the same if you choose either a refresh or update.
Self Assessment Question(s) (SAQs)
For Section 7.7.1
1. What are the types of application of data in building a data warehouse?
For Section 7.7.2
1. List out various modes of applying the data in building a data warehouse?
For Section 7.7.3
1. What are the methods available to maintain the data warehouse and
keep it up-to-date after the initial load? Discuss the key differences
between the two methods.
7.8 Summary
The transformation of the data is to be done as per the standards, as the
data comes from varied source systems. Some of the basic tasks performed
in the data transformation function are selection and splitting/joining,
summarization, conversion, and enrichment. By doing a combination of
these basic tasks, one can perform the following transformation functions:
format revisions, decoding of fields, calculated and derived values,
splitting of single fields, merging of information, summarization,
character set conversion, conversion of units of measurements, key
restructuring, and deduplication.
The process of data integration and consolidation deals with combining of all
relevant operational data into coherent data structures so as to make them
ready for loading into data warehouse. This process standardizes the
names and data representations and resolves the discrepancies. But
identification of an entity and existence of multiple sources are some of the
challenges you may have to face during this process. Then you have to
implement the data transformation, which is a complex
exercise. An appropriate mix of manual methods and transformation tools
needs to be used to complete the process.
There are three types of application of data to the data warehouse:
initial load, incremental load, and full refresh. Initial load involves
populating the data warehouse tables for the first time. Incremental load
involves applying ongoing changes as necessary in a periodic manner, and full
refresh involves completely erasing the contents of one or more tables and
reloading them with fresh data. To apply the data to the warehouse, you may
adopt any of the following modes: Load, Append, Destructive Merge, and
Constructive Merge. Update and Refresh are the two methods in
practice to maintain the data warehouse and keep it up-to-date after the
initial load.
7.9 Terminal Questions (TQs)
1. Explain the significance of the transformation and loading function in
building a data warehouse.
2. How can you classify the changes in the dimension tables? Discuss
each of these changes by taking an example.
3. Analyze the important issues you need to look into while loading the
data.
7.10 Multiple Choice Questions (MCQs)
1. Which of the following is not a basic task of data transformation?
a. Enrichment
b. Empowerment
c. Summarization
d. Conversion
2. Which of the following tasks of data transformation deals with the
rearrangement and simplification of individual fields of data to make
them more useful for the data warehouse environment?
a. Enrichment
b. Empowerment
c. Summarization
d. Conversion
3. The data transformation function is important in the building of a data
warehouse. The reason is ______.
a. the extracted data cannot be applied to the data warehouse as it
might not be in a usable format
b. the operational data is extracted from several legacy systems and
so the quality of the data needs to be enriched and improved
before loading it into a data warehouse
c. Both (a) and (b)
d. None of the above
4. If none of your users ever need data at the lowest granularity for
analysis or querying, the type of transformation task to be considered
is __________.
a. Enrichment
b. Joining/Combining
c. Consolidation
d. Summarization
5. Which of the following processes suggests keeping a single record for
one customer and linking all the duplicates in the source systems to
that single record?
a. Reduplication
b. Deduplication
c. Unduplication
d. Enduplication
6. Key restructuring involves __________.
a. Conversion of all the important keys into the primary keys
b. Deleting the duplicate records from the database tables and
maintaining them in a specific source system
c. Transformation of keys with built-in meaning into general keys that
are generated by the system
d. Assigning some of the keys to specific data warehouses
7. Which of the following is not a type of transformation?
a. Data integration and consolidation
b. Format revisions
c. Key restructuring
d. Character set conversion
8. Which of the following processes involves combining all relevant
operational data into coherent data structures so as to make them
ready for loading into a data warehouse?
a. Deduplication
b. Multiple-extraction
c. Key restructuring
d. Data integration and consolidation
9. Which of the following is an important challenge in the area of data
integration and consolidation?
a. Identification of an Entity
b. Existence of Multiple Sources
c. Both (a) and (b)
d. None of the above.
10. Which of the following methods can be used for the implementation of
the transformation function in building a data warehouse?
a. Manual methods
b. Transformation tools
c. Both (a) and (b)
d. None of the above.
11. Which of the following is not a type of application of data to the data
warehouse?
a. Initial load
b. Incremental load
c. Full refresh
d. Initial refresh
12. Which of the following is not a mode of applying the data in building a
data warehouse?
a. Append
b. Refresh
c. Load
d. Destructive/Constructive Merge
13. Type 3 changes in the transformation for dimension attributes deal
with __________.
a. Hard changes
b. Soft revisions
c. History preservation
d. Correction of errors
14. The transformation function is assumed to end with the ________.
a. Creation of a load image
b. Identifying the Type 1, Type 2 and Type 3 changes
c. Completion of the data integration and consolidation process
d. None of the above
15. Which of the following is not a method to keep the data warehouse up-
to-date after the initial load?
a. Reload
b. Update
c. Refresh
d. None of the above
7.11 Answers to SAQs, TQs, and MCQs
7.11.1 Answers to Self Assessment Questions (SAQs)
Section 7.2
1. Extracted data is raw and cannot be loaded directly into a data
warehouse. The transformation function ensures that the combined data
does not violate the business rules, and it standardizes the data so
that it can help managers make strategic decisions.
2. The basic tasks in transformation are: selection, splitting/joining,
summarization, conversion, and enrichment. You can discuss them as
described in Section 7.2.
Section 7.3
1. The major types of transformation are format revisions, decoding of
fields, calculated and derived values, splitting of single fields, merging of
information, summarization, character set conversion, conversion of
units of measurement, key restructuring, and deduplication. You can
describe these types of transformation as discussed in Section 7.3.
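As an illustration of a few of the transformation types listed above, the
following is a minimal Python sketch, not a definitive implementation. The
function names, the code values, the field names (source_id,
linked_source_ids) and the choice of pounds and kilograms are all
assumptions made for this example.

```python
def decode_field(code):
    """Decoding of fields: replace a cryptic source code with a readable value.

    The code-to-label mapping below is hypothetical.
    """
    mapping = {"1": "Male", "2": "Female", "M": "Male", "F": "Female"}
    return mapping.get(code.strip().upper(), "Unknown")


def split_name(full_name):
    """Splitting of single fields: break one name field into two parts."""
    parts = full_name.split()
    if not parts:
        return {"first_name": "", "last_name": ""}
    return {"first_name": parts[0], "last_name": " ".join(parts[1:])}


def pounds_to_kilograms(pounds):
    """Conversion of units of measurement (1 lb = 0.45359237 kg)."""
    return round(pounds * 0.45359237, 3)


def deduplicate(records, match_key):
    """Deduplication: keep one record per match key and link every
    duplicate source record to it through a list of source ids."""
    survivors = {}
    for record in records:
        survivor = survivors.setdefault(
            record[match_key], {**record, "linked_source_ids": []}
        )
        survivor["linked_source_ids"].append(record["source_id"])
    return list(survivors.values())


customers = [
    {"source_id": "A1", "name": "ACME"},
    {"source_id": "B7", "name": "ACME"},
]
single = deduplicate(customers, "name")
```

Here the two source records for the same customer collapse into one
surviving record that remembers both source ids. Real matching rules are
usually fuzzier than the exact match on name assumed in this sketch.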
Section 7.4
1. Data integration and consolidation involves combining all relevant
operational data into coherent data structures so that they are ready
to load into a data warehouse. The process standardizes names and
data representations and resolves discrepancies.
2. Identification of an entity and existence of multiple sources are the
important challenges in implementing the data integration and
consolidation process. You can describe these challenges as detailed in
Section 7.4.
Section 7.5
1. Manual methods and the use of transformation tools are the important
methods of implementing the transformation function to build a data
warehouse. These methods are discussed in Section 7.5.
Section 7.7.1
1. There are three types of application of data to the data warehouse. They
are initial load, incremental load, and full refresh. These types are
discussed in Section 7.7.1.
Section 7.7.2
1. The modes of applying the data in building a data warehouse include
load, append, destructive merge and constructive merge. These modes
of applying the data are discussed in Section 7.7.2.
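The four modes named above can be compared side by side with a small
Python sketch. It is only an illustration under stated assumptions: a
table is modelled as a list of dictionaries, the primary key column is
called "id", the append mode rejects incoming duplicates (a load utility
could equally allow them), and the supersedes_old flag is a hypothetical
way of marking a superseding record.

```python
def load(table, incoming):
    """Load: wipe out any existing rows and apply the incoming data."""
    return [dict(row) for row in incoming]


def append(table, incoming, key="id"):
    """Append: keep existing rows and add the incoming ones.

    In this sketch an incoming row whose key already exists is rejected.
    """
    existing_keys = {row[key] for row in table}
    added = [dict(row) for row in incoming if row[key] not in existing_keys]
    return [dict(row) for row in table] + added


def destructive_merge(table, incoming, key="id"):
    """Destructive merge: a matching key overwrites the existing row;
    a new key is simply added."""
    merged = {row[key]: dict(row) for row in table}
    for row in incoming:
        merged[row[key]] = dict(row)
    return list(merged.values())


def constructive_merge(table, incoming, key="id"):
    """Constructive merge: a matching key leaves the existing row in place;
    the incoming row is added and marked as superseding the old one."""
    existing_keys = {row[key] for row in table}
    result = [dict(row) for row in table]
    for row in incoming:
        result.append({**row, "supersedes_old": row[key] in existing_keys})
    return result


table = [{"id": 1, "value": "old"}]
incoming = [{"id": 1, "value": "new"}, {"id": 2, "value": "x"}]
```

With the sample data, load and destructive merge both end with the new
value for id 1, append keeps the old value, and constructive merge keeps
both versions of id 1.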
Section 7.7.3
1. Update and refresh are the two methods available to maintain the data
warehouse and keep it up-to-date after the initial load. These methods
are discussed in Section 7.7.3.
7.11.2 Answers to Terminal Questions (TQs)
1. The data transformation function encompasses data conversion,
cleansing, consolidation and integration. The data loading function
covers the initial load, regular periodic incremental loads, and full
refreshes from time to time. After the data has been extracted from
diverse source systems, the transformation and loading functions play a
critical role in preparing the strategic data that helps managers make
appropriate decisions.
2. The changes to the dimension tables can be classified into three types:
Type 1 changes, Type 2 changes, and Type 3 changes. Type 1 changes
deal with correction of errors. Type 2 changes deal with history
preservation and Type 3 changes deal with tentative or soft revisions. You
can discuss the implementation of these changes to the dimension
tables by considering an example.
3. You can discuss some of the following points: writing special load
programs, estimating the running times of the loads through test loads,
handling the load images that do not get loaded, and choosing an
appropriate option (web, FTP, or database links) for transporting the
load images to the data warehouse server.
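For answer 2 above, the three types of dimension change can be sketched
in Python as follows. This is a minimal illustration, not the only way to
implement them. It assumes a dimension table held as a list of
dictionaries, with hypothetical column names surrogate_key, natural_key
and effective_date, and a hypothetical "old_" prefix for the column that
holds the previous value in a Type 3 change.

```python
def apply_type1(rows, natural_key, attribute, corrected_value):
    """Type 1 (correction of errors): overwrite the value in place.
    No history of the old value is kept."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attribute] = corrected_value
    return rows


def apply_type2(rows, natural_key, attribute, new_value, effective_date):
    """Type 2 (history preservation): add a new row with a new surrogate
    key and leave the existing rows untouched.

    Raises ValueError if no row exists for the natural key.
    """
    latest = max(
        (r for r in rows if r["natural_key"] == natural_key),
        key=lambda r: r["surrogate_key"],
    )
    new_row = {
        **latest,
        attribute: new_value,
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        "effective_date": effective_date,
    }
    rows.append(new_row)
    return rows


def apply_type3(rows, natural_key, attribute, new_value):
    """Type 3 (tentative or soft revision): keep the previous value in an
    'old_' column next to the current value, in the same row."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row["old_" + attribute] = row[attribute]
            row[attribute] = new_value
    return rows


customers = [{"surrogate_key": 1, "natural_key": "C100", "name": "Jon Smith",
              "state": "NY", "effective_date": "2000-01-01"}]
apply_type1(customers, "C100", "name", "John Smith")          # fix a misspelling
apply_type2(customers, "C100", "state", "CA", "2001-06-01")   # customer moved

sales_reps = [{"surrogate_key": 1, "natural_key": "S1", "territory": "East"}]
apply_type3(sales_reps, "S1", "territory", "West")            # tentative reassignment
```

After these calls the customer dimension holds two rows for C100, one for
each state, while the sales representative keeps a single row that shows
both the old and the current territory.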
7.11.3 Answers to Multiple Choice Questions (MCQs)
1. Ans: b
2. Ans: a
3. Ans: c
4. Ans: d
5. Ans: b
6. Ans: c
7. Ans: a
8. Ans: d
9. Ans: c
10. Ans: c
11. Ans: d
12. Ans: b
13. Ans: b
14. Ans: a
15. Ans: a
