
Investigation Stage: The Investigation stage is used to investigate data. When data passes into this stage, free-form data is divided into individual tokens, which are then analyzed. The Investigation stage can perform two types of investigation: 1) Character Investigation 2) Word Investigation

1) Character Investigation: Character investigation examines the characters of the data. When we perform a character investigation, the stage checks each and every character in the string. There are two types of character investigation: a) Character Discrete investigation (single column) b) Character Concatenate investigation (multiple columns)

2) Word Investigation: In a word investigation, the stage examines entire words. Word investigation is normally performed on multi-domain fields (such as an address).
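The difference between the two character investigations can be illustrated with a small frequency count. This is only a simplified sketch in Python, not QualityStage code: the function names, the sample rows, and the column names are all hypothetical. The discrete version counts each column on its own, while the concatenate version counts the combination of values across the chosen columns.

```python
from collections import Counter

def discrete_frequencies(rows, columns):
    """Character discrete (single): count values separately for each column."""
    return {col: Counter(row[col] for row in rows) for col in columns}

def concatenate_frequencies(rows, columns):
    """Character concatenate (multi): count value combinations across columns."""
    return Counter(tuple(row[col] for col in columns) for row in rows)

# Hypothetical sample data, used only for illustration.
rows = [
    {"state": "NY", "gender": "M"},
    {"state": "NY", "gender": "F"},
    {"state": "CA", "gender": "M"},
    {"state": "NY", "gender": "M"},
]

per_column = discrete_frequencies(rows, ["state", "gender"])
combined = concatenate_frequencies(rows, ["state", "gender"])
```

Here `per_column["state"]` reports how often each state occurs on its own, while `combined` reports how often each (state, gender) pair occurs together.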

Objectives of the investigation process: a) Find values that do not match the metadata labels, where the column name says one thing and the data is something else (for example, the driver's license number). b) Find values that overlap adjacent fields and therefore require a re-alignment of the fields. The investigation also discovers additional tokens such as name prefixes, name suffixes, and street unit types, and it verifies the usefulness of the data.

There are three masks available in a character investigation: 1) Mask C: used to inspect the actual value in the column. It shows the data character by character. 2) Mask T: used to inspect the type of data in a character position. 3) Mask X: used to exclude the character from the frequency count.
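The effect of the three masks can be sketched as follows. This is an illustrative Python sketch, not the QualityStage implementation: the function `apply_mask` is hypothetical, and the choice of `n` for a numeric character and `a` for an alphabetic character under mask T is an assumption of this sketch. Other characters are passed through unchanged.

```python
from collections import Counter

def apply_mask(value: str, mask: str) -> str:
    """Apply one mask character (C, T or X) to each character position."""
    out = []
    # zip stops at the shorter string, so unmasked positions are ignored.
    for ch, m in zip(value, mask):
        if m == "C":        # keep the actual character
            out.append(ch)
        elif m == "T":      # replace the character with its type
            if ch.isdigit():
                out.append("n")   # assumed symbol for numeric
            elif ch.isalpha():
                out.append("a")   # assumed symbol for alphabetic
            else:
                out.append(ch)    # punctuation and blanks kept as-is
        elif m == "X":      # exclude the position from the result
            continue
        else:
            raise ValueError(f"unknown mask character: {m!r}")
    return "".join(out)

# Hypothetical phone values; the third one has a letter where a digit belongs.
phones = ["555-1234", "555-9876", "55A-0000"]
pattern_counts = Counter(apply_mask(p, "TTTTTTTT") for p in phones)
```

With an all-T mask, the frequency count groups values by shape, so the odd value stands out as its own pattern. A mask such as `CCCXXXXX` would instead count only the first three characters as they are.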

The Investigation stage supports one input and multiple outputs.

Standardize Stage: The Standardize stage is one of the stages in QualityStage and is used to bring the data into a standardized format. After the Investigation stage, we move the data into the Standardize stage. In the Standardize stage we apply three types of standardization rules: a) Domain preprocessor rules b) Domain specific rules c) Validation rules

Domain Preprocessor Rules: These rule sets do not perform standardization themselves. They parse the columns in each record and move each token into the appropriate domain-specific column set, such as name, address, or area. Ex: Rules -> USPREP -> OK

A typical example of input, where name and address data are mixed across the fields:
Name 1: John Doe
Name 2: 123 Clay Field, Brisbane
Address 1: c/o Smith James
Address 2: West End 4000
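The idea behind a domain preprocessor, deciding the domain from the content of a field and not from its column name, can be sketched with a toy classifier. This is only a hedged illustration in Python: the function `guess_domain` and its two pattern checks are hypothetical and far simpler than a real rule set such as USPREP.

```python
import re

def guess_domain(field: str) -> str:
    """Guess NAME, ADDRESS or AREA from the content of a field (toy rules)."""
    text = field.strip().lower()
    # Assumption: a trailing 4 or 5 digit number looks like a postcode.
    if re.search(r"\b\d{4,5}$", text):
        return "AREA"
    # Assumption: a leading number followed by text looks like a street address.
    if re.match(r"^\d+\s", text):
        return "ADDRESS"
    # Everything else is treated as name data in this sketch.
    return "NAME"

# The sample record from the text, keyed by its original column names.
record = {
    "Name 1": "John doe",
    "Name 2": "123 clay field, brisbane",
    "Address 1": "c/o smith james",
    "Address 2": "West end 4000",
}

domains = {column: guess_domain(value) for column, value in record.items()}
```

In this sketch the content of "Name 2" is assigned to the address domain and the content of "Address 1" to the name domain, which is the kind of re-routing the preprocessor performs before the domain specific rules run.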

Domain Specific Rules: These rule sets work at the level of an individual domain and check whether the data in that domain is valid or invalid. Domain specific rule sets are mostly applied to three domains: the name domain, the address domain, and the area domain.

Validation Rules: The validation rules are used to standardize common business data, including dates, names, email IDs, phone numbers, social security numbers, credit card numbers, etc. They validate the data and report any errors.
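The "validate and report the error" behaviour can be sketched for one kind of data, an email ID. This is a minimal Python illustration under stated assumptions, not a QualityStage rule set: the function `validate_email`, the simplified pattern, and the error text are all hypothetical, and lowercasing the value is just one possible standardization choice.

```python
import re

# Simplified pattern: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_email(value: str) -> dict:
    """Return the standardized value, a validity flag, and an error message."""
    cleaned = value.strip()
    if EMAIL_RE.match(cleaned):
        # Valid: report the standardized (trimmed, lowercased) value.
        return {"value": cleaned.lower(), "valid": True, "error": None}
    # Invalid: keep the original value and report why it was rejected.
    return {"value": value, "valid": False, "error": "invalid email format"}

good = validate_email("  John.Doe@Example.com ")
bad = validate_email("john.doe")
```

The same shape (standardized value, validity flag, error message) could be reused for phone numbers or dates by swapping in a different check.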
