Unit 7 Data Transformation and Loading Structure: 7.1 Introduction Objectives 7.2 Overview of Transformation 7.2.1 Selection and Splitting/J oining 7.2.2 Summing Up 7.2.3 Conversion 7.2.4 Enrichment Self Assessment Question(s) (SAQs) 7.3 Major Transformation Types 7.3.1 Format Revisions 7.3.2 Decoding of Fields 7.3.3 Calculated and Derived values 7.3.4 Splitting of Single Fields 7.3.5 Merging of Information 7.3.6 Summing Up 7.3.7 Character Set Conversion 7.3.8 Conversion of Units of Measurements 7.3.9 Key Restructuring 7.3.10 Reduplication Self Assessment Question(s) (SAQs) 7.4 Data Integration and Consolidation 7.4.1 Identification of an Entity 7.4.2 Existence of Multiple Sources Self Assessment Question(s) (SAQs) 7.5 Implementation of Transformation 7.5.1 Manual Methods 7.5.2 Transformation Tools Self Assessment Question(s) (SAQs) Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 155 7.6 Transformation for Dimension Attributes 7.6.1 Type 1 Changes - Correction of Errors 7.6.2 Type 2 Changes - History Preservation 7.6.3 Type 3 Changes - Soft Revisions 7.7 Data Loading 7.7.1 Types of Load 7.7.2 Modes of applying the Data 7.7.3 Data Refresh versus Update Self Assessment Question(s) (SAQs) 7.8 Summary 7.9 Terminal Questions (TQs) 7.10 Multiple Choice Questions (MCQs) 7.11 Answers to SAQs, TQs, and MCQs 7.11.1 Answers to Self Assessment Questions (SAQs) 7.11.2 Answers to Terminal Questions (TQs) 7.11.3 Answers to Multiple Choice Questions (MCQs) 7.1 Introduction In the previous Unit, we have discussed several data extraction techniques. But the extracted data is raw data and it cannot be directly loaded into a data warehouse. To have useful information for strategic decision-making is an underlying principle of the data warehouse and the data in the operational source systems cannot fulfill this purpose. So the transformation and loading functions play a key role in the preparation of the data that can assist the senior managers of an organization in making the strategic decisions. Objectives: The objectives of the Unit are to make you understand: The basic tasks in the transformation function Several types of transformation function Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 156 Data integration and consolidation The implementation of transformation function Techniques and processes involved in data loading 7.2 Overview of Transformation You need to perform various types of transformation tasks before moving the extracted data from the source systems into the data warehouse. The transformation of the data is to be done as per the standards as the data comes from various source systems and you also need to ensure that the combined data does not violate the business rules. Irrespective of the complexity of the source systems, and regardless of the extent of your data warehouse, some of the basic tasks performed in the data transformation function are as follows: 7.2.1 Selection and Splitting/Joining This is the basic task that is done at the beginning of the entire data transformation process. Using this task, you may select either whole records or parts of several records from the source systems. Usually, the selection task forms a part of the extraction function itself. But the composition of the source structure may not be amenable to selection of the necessary parts while extracting the data and you may have to extract the whole record and sue the selection task as a part of the transformation function. The splitting/joining task includes the type of data manipulation you need to perform on selected records of the source systems. You can either split the selected parts further or join the parts selected from many source systems. But the joining task is quite often used in the data warehouse environment. 7.2.2 Summing Up This task is used in case you find that it is not required to keep data at the lowest level of detail in your data warehouse. For instance, for a grocery Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 157 chain, sales data at the lowest level of detail for every transaction at the checkout may not be required. Storing sales by product by store by day may be adequate. Therefore, the data transformation function includes summarization of daily product and by store. 7.2.3 Conversion This task includes a large variety of rudimentary conversions of single fields. This task is done for two reasons: to standardize the data among the data extractions from disparate source systems to make the fields usable and understandable to the users 7.2.4 Enrichment This task involves the rearrangement and simplification of individual fields to make them useful for the data warehouse environment. You can use one or more fields from the same input record to create a better view of the data for the data warehouse. The principle is extended when one or more fields originate from multiple records, resulting in a single field for the data warehouse. Self Assessment Question(s) (SAQs) For Section 7.2 1. What is the role of the transformation function in building a warehouse? 2. What are the basic tasks of transformation? 7.3 Major Transformation Types By undertaking a combination of the basic tasks discussed above, you can do the following transformation functions: 7.3.1 Format Revisions Format revisions include changes to the data types and lengths of individual fields. For instance, product package types in your source systems may be indicated by codes and names in which the fields are numeric and text data Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 158 types. Also, the lengths of package types might vary from one source system to another. Therefore, you can standardize and change the data type to text in order to provide values meaningful to the users using format revisions. 7.3.2 Decoding of Fields This type of transformation deals with multiple source systems and you are bound to have same data items described by a plethora of field values. For instance, the coding for two products manufactured by an organization might have been done as 1 and 2 in one source system and is done as A and B in another system. In such situations, you need to decode the codes and standardize the code before loading the data into a data warehouse; otherwise there would be a conflict in the data analysis. 7.3.3 Calculated and Derived values You can maintain both calculated and derived types of data values in a typical data warehouse. For instance, you can keep profit margin (this can be calculated as the difference between the total sales and total cost) as a calculated value along with sales and cost amounts after extracting the data from the sales system viz., sales volume, sales value, operating cost estimates. Similarly, you may use average daily balances and operating ratios as derived fields. 7.3.4 Splitting of Single Fields You need to split the larger single files for improved understanding and making better analysis. For instance, the traditional legacy systems store name and address of customers in a large text files. Similarly, some systems store city, state, and zip code data together in a single file. But these components need to be stored individually to improve the operation performance by indexing on individual components and to perform analysis by using individual components such as city, state, and zip code. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 159 7.3.5 Merging of Information This type of transformation deals with merging of information available in various source systems into a single entity. For instance, the product code and description may come from one data source and the relevant package types, the cost data may come from several other source systems. Here, merging of information denotes combining the product code, description, package types, and cost into a single entity. 7.3.6 Summing Up In this type of transformation, the summaries are created and then loaded in the data warehouse instead of loading the most granular level of data. For instance, a credit card company need not store each and every single transaction on each credit card in the data warehouse to analyze sales patterns. Instead, the data can be summarized to the extent possible and store the summary data instead of the most granular data. 7.3.7 Character Set Conversion In this type of data transformation, the character sets are converted into an agreed standard character set for textual data in the data warehouse. For instance, the source data will be in EBCDIC (Extended Binary Coded Decimal Interchange Code) characters if you have mainframe legacy systems as source systems. So you need to convert from mainframe EBCDIC format to the ASCII (American Standard Code for Information Interchange), format if PC-based architecture is the choice of your data warehouse. 7.3.8 Conversion of Units of Measurements Use of standard unit of measurement is one of the prerequisites in building a data warehouse. If your company has overseas operations, you may have to convert the metrics accordingly so that the numbers may all be in one standard unit of measurement. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 160 Here, the date/time conversion is an important measurement. For example, the date of October 9, 2006 is written as 10/09/2006 in the U.S format and as 09/10/2006 in the British format. This can be standardized by writing it as 09 Oct 2006. 7.3.9 Key Restructuring You have to come up with keys for the fact and dimension tables for a data warehouse to be built based on the keys in the extracted records. So you look at the primary keys of the extracted records while extracting data from the input sources. For instance, the product code in an organization is structured to have an inherent meaning (like first letter describes the location code, second letter describes the machine code, etc.) and you use this product code as the primary key and move the data into another warehouse. Then the warehouse part of the product key will have to be changed before moving the data. Therefore, avoid the keys with built-in meanings while choosing keys for your data warehouse database tables and transform such keys into generic keys (that are generated by the system itself). 7.3.10 Reduplication Some companies may maintain several records for a single customer and so duplicates are the result of the additional records. Therefore, it is suggested to keep a single record for one customer and link all the duplicates in the source systems to this single record in your data warehouse. This process is called reduplication. Self Assessment Question(s) (SAQs) For Section 7.3 1. Discuss the major types of transformation that are in practice and give an example for each of these types. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 161 7.4 Data Integration and Consolidation In general, most of the data that the warehouse gets is the data extracted from a combination of legacy mainframe systems, old minicomputer applications, and some client/server systems. But these source systems do not conform to the same set of business rules. Thus they may often follow different naming conventions and varied standards for data representation. Thus the process of data integration and consolidation plays a vital role. Here, the data integration includes combining of all relevant operational data into coherent data structures so as to make them ready for loading into data warehouse. It standardizes the names and data representations and resolves the discrepancies. Some of the challenges involved in the data integration and consolidation process are as follows. 7.4.1 Identification of an Entity Suppose there are three legacy applications that are in use in your organization; one is the order entry system, second is customer service support system, and the third is the marketing system. Each of these systems might have their own customer file to support the system. Even most of the customers will be common to all these three files, the same customer on each of these files have a different unique identification number. As you need to keep a single record for each customer in a data warehouse, you need to get the transactions of each customer from various source systems and then match them up to load into the data warehouse. This is an entity identification problem in which you do not know which of the customer records relate to the same customer. This problem is prevalent where multiple sources exist for the same entities and the other entities that are prone to this type of problem include vendors, suppliers, employees, and various products manufactured by a company. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 162 In case of three customer files, you have to design complex algorithms to match records from all the three files and groups of matching records. But this is a difficult exercise. If the matching criterion is too tight, then some records might escape the groups. Similarly, a particular group may include records of more than one customer if the matching criterion designed is too loose. Also, you might have to involve your users or the respective stakeholders to understand the transaction accurately. Some of the companies attempt this problem in two phases. In the first phase, the entire records, irrespective whether they are duplicates or not, are assigned unique identifiers and in the second phase, the duplicates are reconciled periodically ether through automatic algorithms or manually. 7.4.2 Existence of Multiple Sources Another major challenge in the area of data integration and consolidation results from a single data element having more than one source. For instance, cost values are calculated and updated at specific intervals in the standard costing application. Similarly, your order processing application also carries the unit costs for all products. Thus there are two sources available to obtain the unit cost of a product and so there could be a slight variation in their values. Which of these systems needs to be considered to store the unit cost in the data warehouse becomes an important question. One easy way of handling this situation is to prioritize the two sources, or you may select the source on the basis of the last update date. Self Assessment Question(s) (SAQs) For Section 7.4 1. What is data integration and consolidation? 2. Discuss the major challenges involved in the process of data integration and consolidation? Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 163 7.5 Implementation of Transformation The implementation of data transformation is a complex exercise. You may have to go beyond the manual methods, usual methods of writing conversion programs while deploying the operational systems. You need to consider several other factors to decide the methods to be adopted. Suppose you are considering automating the data transformation functions, you have to identify, configure and install the tools, train the team on these tools, and integrate them into the data warehouse environment. But a combination of both methods proves to be effective. The issues you may face in using manual methods and transformation tools are discussed below. 7.5.1 Manual Methods These are the traditional methods that are in practice in the recent past. These methods are adequate in case of smaller data warehouses. These methods include manually coded programs and scripts that are mainly executed in the data staging area. Since these methods call for elaborate coding and testing and programmers and analysts who posses the specialized knowledge in this area only can produce the programs and scripts. Although the initial cost may be reasonable, ongoing maintenance may escalate the cost while implementing these methods. Moreover these methods are always prone to errors. Another disadvantage of these methods is about the creation of metadata. Even if the in-house programs record the metadata initially, the metadata needs to be updated every time the changes occur in the transformation rules. 7.5.2 Transformation Tools The difficulties involved in using the manual methods can be eliminated using the sophisticated and comprehensive set of transformation tools that Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 164 are now available. Use of these automated tools certainly improves efficiency and accuracy. If the inputs provided into the tools are accurate, then the rest of the work is performed efficiently by the tool. So you have to carefully specify the required parameters, the data definitions and the rules to the transformation tool. Also, the transformation tools enable the recording of metadata. When you specify the transformation parameters and rules, these values are stored as metadata by the tool and this metadata becomes a part of the overall metadata component of the data warehouse. When changes occur to business rules or data definitions, you just have to enter the changes into the tool and the metadata for the transformations get adjusted automatically. But relying on the transformation tools alone without using the manual methods is also not practically possible. Self Assessment Question(s) (SAQs) For Section 7.5 1. Elaborate the types of methods in practice while implementing the transformation function of building a data warehouse? 7.6 Transformation for Dimension Attributes Now we consider the updating of the dimension tables. The dimension tables are more stable in nature and so they are less volatile compared to the fact tables. The fact tables change through an increase in the number of rows, but the dimension tables change through the changes to the attributes. For instance, we consider a product dimension table. Every year, rows are added as new models become available. But what about the attributes that are within the dimension table. You might face a situation where there is a change in the product dimension table because a particular product was moved into a different product category. So the corresponding values must be changed in the product dimension table. Though most of the Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 165 dimensions are generally constant over a period of time, they may change slowly. The usual changes in the dimension tables can be classified into three types as provided below (Refer Fig 7.1). 7.6.1 Type 1 Changes Correction of Errors Type 1 changes are applied to the data warehouse without any need to preserve history as these changes usually relate to the corrections of errors in the source systems. For instance if there is a spelling mistake in the name of a customer in the source system, as this name is erroneous, it needs to be discarded and also there is no need to preserve the old name. Therefore it is understood that the Type 1 changes have little significance and the old values need not be preserved in the data warehouse. The method of applying the Type 1 changes is to overwrite the attribute value in the dimension table row with new value. Also, it will not affect the key of this dimension table. This method is easy to implement as the old value need not be preserved and no other changes are made in the dimension table row. 7.6.2 Type 2 Changes History Preservation Suppose there is a change in the marital status of a customer and one of the essential requirements of your data warehouse is to track the orders according to the marital status. If the customer is married on 9 th December 2001, all his orders before the marriage date needs to be under single and they need to be under married after the marriage date. So there is a need to preserve the history in the data warehouse. The Type 2 changes are related to true changes in source systems and this change leads to partitioning of the history in the data warehouse. So every change for the same attribute is to be preserved in case of Type 2 changes. To apply the Type 2 changes, you can add a new dimension table row with Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 166 new value of the changed attribute. The key of the original row is not affected and there are no changes in the original row in the dimension table. An effective date file may be included and the new row is inserted with a new surrogate key. 7.6.3 Type 3 Changes Soft Revisions Type 3 changes are tentative or soft revisions. Unlike the Type 2 changes, the orders need to be maintained in the old and new groups after an effective date. For instance, you moved a salesperson from Territory A to Territory B to analyze his ability in both the territories. Therefore his orders need to be captured in both the territories after an effective date. So there is a need to keep track of history with old and new values of the changed attribute. To apply for Type 3 changes, you have to add an old file in the dimension table for the affected attribute. Then you push down the existing value of the attribute from the current field to the old field and keep the new value of the attribute in the current field. Also, you may add a current effective date field for the attribute. Here, the key of the row is not affected and no new dimension is needed. The existing queries will seamlessly be switched to the current value and any queries that need to use the old value is to be revised accordingly. In order to apply all these changes correctly, you have to transform the incoming changes and prepare the changes to the data for loading into the data warehouse. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 167 Error! 7.7 Data Loading After the creation of load images, the next set of activities is to take the prepared data, apply it to the data warehouse, and store it in the data warehouse database. Here, the data warehouse will be offline during the loads. As the process of loading is a time-consuming activity, it is preferred to divide the whole load process into smaller chunks and populate a few files at a time. This enables you to run the smaller loads in parallel. Also, you can keep some parts of the warehouse up and running while loading the other parts. 7.7.1 Types of Load There are three types of application of data to the data warehouse: Initial Load that involves populating all the data warehouse tables for the first time Create Load Image (include effective date) Source System data changes for dimensions Perform data transformation functions Perform data cleansing functions Determine type of dimension change Consolidate and integrate data Convert production key to existing surrogate key Convert production key to new surrogate key Convert production key to existing surrogate key Create Load Image Create Load Image Fig. 7.1: Transformation for Dimensional Attributes Type 1 Type 2 Type 3 Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 168 Increment Load that involves applying ongoing changes as necessary in a periodic manner Full refresh that involves complete erasing of the contents of one or more tables and reloading with fresh data (initial load is afresh of all the tables) For instance, consider a product data. During the initial load, you extract the data for all products from the source systems, integrate and transform it, and create load images to load the data into the product dimension table. During an incremental load, you collect the changes to the product data in the source systems since the previous extract, run the changes through the integration and transformation process, and create output records to be applied to the product dimension table. A full refresh is similar to the initial load wherein the fresh data is reloaded. 7.7.2 Modes of applying the Data To apply the data to the warehouse, you may adopt any of the following modes: Load: The load process wipes out the existing data and applies the data from the incoming file to the target table. If the table is empty before loading, the load process simply applies the data from the incoming file. Append: The append process unconditionally adds the incoming data, preserving the existing data in the target table. When an incoming record is a duplicate of an existing record, you can define the process, either to allow or reject the incoming record. Destructive Merge: When you apply the incoming data to the target data, the destructive merge process updates target record, if the primary key of an incoming record matches with the key of an existing record. The incoming record simply gets added to the target table, if the incoming record is a new record. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 169 Constructive Merge: If the primary key of an incoming record matches with the key of an existing record, it leaves the existing record, adds the incoming record and marks it as superceding the old record. Some of the important points you need to understand with regard to data loading are as follows: The straight forward method of applying the data is writing special load programs and the number of load programs can be large depending on the size of the warehouse. It is difficult to estimate the running times of the loads (especially the initial load or a complete refresh). So you can do test loads to estimate the running times and verify the correctness. When you are running a load, do not expect every record in the source load image file to be successfully applied to the data warehouse. You need to provide procedures to handle the load images that do not load. You can save the effort of moving the load images to the data warehouse server if the data staging area and the data warehouse database are on the same server. You need to consider an appropriate option (web, FTP, and database links) if you have to transport the load images to the data warehouse server. 7.7.3 Data Refresh versus Update There are two methods to maintain the data warehouse and keep it up-to- date after the initial load. They are: Update Refresh Update is an application of incremental changes in the data sources and refresh is a complete reload of data at specified intervals. The refresh option involves the periodic replacement of complete data warehouse tables. But the refresh jobs take a long time to run. But you need to devise an appropriate strategy to extract the changes from each data source to use Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 170 the update option. Then you have to determine the best strategy to apply the changes to the data warehouse. Technically, refresh is a much simpler option than update. But you may have to keep the data warehouse down for unacceptably long time if you run refresh jobs every day. Therefore, you need to draw a clear line between the two methods to identify the right choice. The cost of refresh is constant irrespective of the number of changes in the source systems. If the number of changes increases, the time and effort to do a full refresh remains the same. In contrast to this, the cost of update varies with the number of records to be updated. In general, the cost of loading per record tends to be the same if you choose either a refresh or update. Self Assessment Question(s) (SAQs) For Section 7.7.1 1. What are the types of application of data in building a data warehouse? For Section 7.7.2 1. List out various modes of applying the data in building a data warehouse? For Section 7.7.3 1. What are the methods available to maintain the data warehouse and keep it up-to-date after the initial load? Discuss the key differences between the two methods. 7.8 Summary The transformation of the data is to be done as per the standards, as the data comes from varied source systems. Some of the basic tasks performed in the data transformation function are selection and splitting/joining, summarization, and conversion enrichment. By doing a combination of these basic tasks, one can perform the following transformation functions: format revisions, decoding of files, calculated and derived values, splitting of Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 171 single file, merging of information, summarization, character set conversion, conversion of units of measurements, key restructuring, and deduplication. The process of data integration and consolidation deals with combining of all relevant operational data into coherent data structures so as to make them ready for loading into data warehouse. This process standardizes the names and data representations and resolves the discrepancies. But identification of an entity and existence of multiple sources are some of the challenges you may have to face during this process. Then you have to perform the implementation of data transformation which is a complex exercise. An appropriate mix of manual methods and transformation tools need to be used to complete the process. There are three types of application of data to the data warehouse. They are Initial Load, increment Load and full refresh. Initial load involves populating the data warehouse tables for the first time. Increment load involves in applying ongoing changes as necessary in a periodic manner and full refresh involves complete erasing of the contents of one or more tables and reloading with fresh data. To apply the data to the warehouse, you may adopt any of the following modes; Load, Append, Destructive Merge, and Constructive Merge. Update and Refresh are the two methods that are in practice to maintain the data warehouse and keep it up-to-date after the initial load. 7.9 Terminal Questions (TQs) 1. Explain the significance of the transformation and loading function in building a data warehouse? 2. How can you classify the changes in the dimension tables? Discuss each of these changes by taking an example. 3. Analyze the important issues you need to look into while loading the data? Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 172 7.10 Multiple Choice Questions (MCQs) 1. Which of the following is not a basic task of data transformation? a. Enrichment b. Empowerment c. Summarization d. Conversion 2. Which of the following tasks of data transformation deals with the rearrangement and simplification of individual file of data to make file ? them more useful for the data warehouse environment? a. Enrichment b. Empowerment c. Summarization d. Conversion 3. The data transformation function is important in the building of a data warehouse. The reason is ______. a. the extracted data cannot be applied to the data warehouse as it might not be in a usable format b. the operational data is extracted from several legacy systems and so the quality of the data needs to be enriched and improved before loading it into a data warehouse c. Both (a) and (b) d. None of the above 4. If none of your users ever need data at the lowest granularity for analysis or querying, the type of transformation task to be considered is __________. a. Enrichment b. J oining/Combining c. Consolidation d. Summarization Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 173 5. Which of the following process suggests keeping a single record for one customer and linking all the duplicates in the source systems to that single record? a. Reduplication b. Deduplication c. Unduplication d. Enduplication 6. Key restructuring involves __________. a. Conversion of all the important keys into the primary keys b. Deleting the duplicate records from the database tables and maintaining them in a specific source system c. Transformation of keys with built-in meaning into general keys that are generated by the system d. Assigning of the some of the keys to the specific data warehouses 7. Which of the following is not a type of transformation? a. Data integration and consolidation b. Format revisions c. Key restructuring d. Character set conversion 8. Which of the following process involves in combining all relevant operational data into coherent data structures so as to make them ready for loading into data warehouse? a. Deduplication b. Multiple-extraction c. Key restructuring d. Data integration and consolidation Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 174 9. Which of the following is an important challenge in the area of data integration and consolidation? a. Identification of an Entity b. Existence of Multiple Sources c. Both (a) and (b) d. None of the above. 10. Which of the following methods can be used for the implementation of the transformation function in building a data warehouse? a. Manual methods b. Transformation tools c. Both (a) and (b) d. None of the above. 11. Which of the following is not a type of application of data to the data warehouse? a. Initial load b. Increment load c. Full refresh d. Initial refresh 12. Which of the following is not a mode of applying the data in building a data warehouse? a. Append b. Refresh c. Load d. Destructive/Constructive Merge 13. Type 3 changes in the transformation for dimension attributes deal with __________. a. Hard changes b. Soft revisions c. History preservation d. Correction of errors Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 175 14. The transformation function is assumed to end with the ________. a. Creation of a load image b. Identifying the Type 1, Type 2 and Type 3 changes c. Completion of the data integration and consolidation process d. None of the above 15. Which of the following is not a method to keep the data warehouse up- to-date after the initial load? a. Reload b. Update c. Refresh d. None of the above 7.11 Answers to SAQs, TQs, and MCQs 7.11.1 Answers to Self Assessment Questions (SAQs) Section 7.2 1. The extracted data is raw data and it cannot be directly loaded into a data warehouse. So the transformation function ensures that the combined data does not violate the business rules. It standardizes the data to make the data assist the managers to make strategic decisions. 2. The basic tasks in transformation are: selection, splitting/joining, summarization, conversion, and enrichment. You can discuss them as provided in the Section 7.2. Section 7.3 1. The major types of transformation are format revisions, decoding of fields, calculated and derived values, splitting of single fields, merging of information, summarization, character set conversion, conversion of units of measurements, key restructuring, and deduplication. You can describe these types of transformation as discussed in the Section 7.3. Section 7.4 1. The data integration and consolidation includes combining all relevant operational data into coherent data structures so as to make them ready Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 176 to load into a data warehouse. The process standardizes the names, data representations and resolves the discrepancies. 2. Identification of an entity and existence of multiple sources are the important challenges in implementing the data integration and consolidation process. You can describe these challenges as detailed in the Section 7.4. Section 7.5 1. Manual methods and use of transformation tools are the important methods in implementing the transformation function to build a warehouse. These methods are discussed in the Section 7.5. Section: 7.7.1 1. There are three types of application of data to the data warehouse. They are initial load, incremental load, and full refresh. These methods are discussed in the Section 7.7.1. Section: 7.7.2 1. The modes of applying the data in building a data warehouse include load, append, destructive merge and constructive merge. These modes of applying the data are discussed in the Section 7.7.2. Section: 7.7.3 1. Update and refresh are the two methods available to maintain the data warehouse and keep it up-to-date after the initial load. These methods are discussed in the Section 7.7.3. 7.11.2 Answers to Terminal Questions (TQs) 1. The data transformation function encompasses data conversion, cleansing, consolidation and integration and the data loading function relates to the initial load, regular periodic incremental loads, and full refreshes from time to time. After extracting the data from diverse source systems, the transformation and loading functions thus play a critical role in preparing the strategic data to assist the managers make appropriate decisions. Business Intelligence and Tools Unit 7 Sikkim Manipal University Page No. 177 2. The changes to the dimension tables can be classified into three types: Type 1 changes, Type 2 changes, and Type 3 changes. Type 1 changes deal with correction of errors. Type 2 changes deal with history prevention and Type 3 changes deal with tentative or soft revisions. You can discuss the implementation of these changes to the transformation tables by considering an example. 3. You can discuss some of the following points; to write special load programs, to estimate the running times of the loads though test loads, to handle the load images that do not get loaded, to consider an appropriate option (web, FTP, and database links) for transporting the load images to the data warehouse server. 7.11.3 Answers to Multiple Choice Questions (MCQs) 1. Ans: b 2. Ans: a 3. Ans: c 4. Ans: d 5. Ans: b 6. Ans: c 7. Ans: a 8. Ans: d 9. Ans: c 10. Ans: c 11. Ans: d 12. Ans: b 13. Ans: b 14. Ans: a 15. Ans: a
Knight's Microsoft Business Intelligence 24-Hour Trainer: Leveraging Microsoft SQL Server Integration, Analysis, and Reporting Services with Excel and SharePoint