
This document describes the data migration process on a Big Data platform.

Key features:

1. Challenges in data migration using a Big Data implementation
2. Lessons learnt to overcome data migration issues using Big Data
3. Automating data migration using a Big Data implementation

Data Migration Process Using a Big Data Platform


Big data migration refers to transferring large volumes of data between computer systems, storage types, and formats. It applies when one or more legacy systems are switched to a new system and historical data is migrated into the new one. The success of Customer Relationship Management (CRM), Business Intelligence (BI), and Enterprise Resource Planning (ERP) projects depends heavily on the quality of data that emerges from the data migration process.

Challenges in data migration using a Big Data implementation:

1. Uncertainty of the Data Management Landscape – There are many competing technologies, and within each technical area there are numerous rivals. Our first challenge is making the best choices while not introducing additional unknowns and risk to big data adoption.
2. The Big Data Talent Gap – The excitement around big data applications seems to
imply that there is a broad community of experts available to help in
implementation. However, this is not yet the case, and the talent gap poses our
second challenge.
3. Getting Data into the Big Data Platform – The scale and variety of data to be
absorbed into a big data environment can overwhelm the unprepared data
practitioner, making data accessibility and integration our third challenge.
4. Synchronization Across the Data Sources – As more data sets from diverse
sources are incorporated into an analytical platform, the potential for time lags
to impact data currency and consistency becomes our fourth challenge.
5. Getting Useful Information out of the Big Data Platform – Lastly, using big data
for different purposes ranging from storage augmentation to enabling high-
performance analytics is impeded if the information cannot be adequately
provisioned back within the other components of the enterprise information
architecture, making big data syndication our fifth challenge.

Challenge | Impact | Risk
--------- | ------ | ----
Uncertainty of the market landscape | Difficulty in choosing technology components | Committing to a failing product or failing vendor; vendor lock-in
Big data talent gap | Steep learning curve; extended time for design, development, and implementation | Delayed time to value
Big data loading | Increased cycle time for analytical platform data population | Inability to actualize the program due to unmanageable data
Synchronization | Data that is inconsistent or out of date | Flawed decisions based on flawed data
Big data accessibility | Increased complexity in syndicating data to end-user discovery tools | Inability to appropriately satisfy the growing community of data consumers

Lessons learnt to overcome data migration issues using Big Data

1. Accessing data stored in a variety of standard configurations (including XML, JSON, and BSON objects).

2. Relying on standard relational data access methods (such as ODBC/JDBC).

3. Enabling canonical means of virtualizing data access for consumers.

4. Employing the push-down capabilities of a wide variety of data management systems (ranging from conventional RDBMS data stores to newer NoSQL approaches) to optimize data access.

5. Rapidly applying data transformations as data sets are migrated from sources to the big data target platforms.
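Lessons 1 and 3 above amount to parsing heterogeneous source formats into one canonical shape before migration. A minimal sketch in plain Python, using only the standard library; the record fields (`id`, `name`) and the two-format scope are illustrative assumptions, not part of the original document:

```python
import json
import xml.etree.ElementTree as ET

def to_canonical(payload, fmt):
    """Parse a raw payload into a canonical dict, regardless of source format."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "xml":
        # Flat XML record: each child element becomes a key/value pair.
        root = ET.fromstring(payload)
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported format: {fmt}")

json_src = '{"id": "42", "name": "Alice"}'
xml_src = "<record><id>42</id><name>Alice</name></record>"

# Both sources yield the same canonical record.
assert to_canonical(json_src, "json") == to_canonical(xml_src, "xml")
```

In practice the same canonical layer would sit in front of ODBC/JDBC result sets and BSON documents as well; this sketch only shows the pattern.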

Automating data migration using a Big Data implementation

To automate data migration using Big Data, we need to follow the steps below.


Step 1

Data assessment - pre-migration: We need hands-on information about the migration scope, validation strategy, and work plan.

Step 2

Mapping document - pre-migration: We need complete details about the source data, control metrics, and dashboard.

Step 3

Data extraction and validation - during migration: This is achieved by extracting valid data without anomalies, performing a mock migration, and loading.
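Step 3's "extraction of valid data without anomalies" can be sketched as a record filter that separates clean rows from flagged ones before any load. The required field names (`id`, `amount`) are hypothetical examples, not from the original document:

```python
def extract_valid(records, required_fields=("id", "amount")):
    """Split extracted records into clean rows and anomalies before loading."""
    valid, anomalies = [], []
    for rec in records:
        # A record is anomalous if any required field is missing or empty.
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            anomalies.append({"record": rec, "missing": missing})
        else:
            valid.append(rec)
    return valid, anomalies

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
good, bad = extract_valid(rows)
assert len(good) == 1 and bad[0]["missing"] == ["amount"]
```

The anomaly list feeds the validation reports of Step 4; only `good` would go into the mock migration and load.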

Step 4

Migration validation - during migration: Perform data validation, prepare validation reports, and review the reports.

Step 5

Reconciliation process - post-migration: Prepare a complete migration report and a target system report.

Step 6

Data quality - final stage: Factors to consider are profiling, cleansing, standardization, monitoring, enrichment, and matching.
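Two of the Step 6 factors, profiling and cleansing/standardization, can be sketched in a few lines; the phone-number field and the digits-only rule are illustrative assumptions covering just these two factors, not the whole data-quality stage:

```python
def profile(records, field):
    """Profiling: count nulls and distinct values for one field."""
    values = [r.get(field) for r in records]
    return {"nulls": values.count(None), "distinct": len(set(values))}

def standardize_phone(raw):
    """Cleansing/standardization example: reduce a phone number to digits only."""
    return "".join(ch for ch in raw if ch.isdigit()) if raw else None

rows = [{"phone": "(555) 123-4567"}, {"phone": None}]
assert profile(rows, "phone")["nulls"] == 1
assert standardize_phone("(555) 123-4567") == "5551234567"
```

Monitoring, enrichment, and matching would typically run as further passes over the same profiled record set.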

Example: HDFS scenario architecture for Automation

Data processing may be of three types:

1. Batch
2. Real Time
3. Interactive

Step 1: Data Staging Validation

1. Data from various sources like RDBMS, weblogs, etc. should be validated to make sure that the correct data is pulled into the system.
2. Compare the source data with the data pushed into the Hadoop system to make sure they match.
3. Verify that the right data is extracted and loaded into the correct HDFS location.
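The staging checks above (row counts and content comparison between source and the staged copy) can be sketched with an order-independent checksum; the rows shown are illustrative stand-ins for data pulled from an RDBMS and read back from HDFS:

```python
import hashlib

def checksum(rows):
    """Order-independent checksum: XOR the SHA-256 of each serialized row."""
    digest = 0
    for row in rows:
        digest ^= int(hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest(), 16)
    return digest

source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
staged = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}]  # same rows, different load order

assert len(source) == len(staged)            # row counts match
assert checksum(source) == checksum(staged)  # content matches regardless of order
```

XOR-folding keeps the comparison streaming-friendly, so either side can be computed in a single pass without sorting the full data set.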

Step 2: "Map Reduce" Validation

1. The MapReduce process works correctly
2. Data aggregation or segregation rules are implemented on the data
3. Key-value pairs are generated
4. The data is validated after the MapReduce process
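A minimal pure-Python map/reduce illustrates what these checks validate: key-value pair generation and an aggregation rule applied per key. The records, the `region` key, and the sum-per-region rule are assumed examples, not a real job from the document:

```python
from itertools import groupby

def mapper(record):
    """Map phase: emit (key, value) pairs -- here, amount keyed by region."""
    yield record["region"], record["amount"]

def reducer(key, values):
    """Reduce phase: the aggregation rule under test is a per-key sum."""
    return key, sum(values)

records = [{"region": "EU", "amount": 5}, {"region": "EU", "amount": 7},
           {"region": "US", "amount": 3}]

# Shuffle/sort stage: group the emitted pairs by key (groupby needs sorted input).
pairs = sorted(kv for rec in records for kv in mapper(rec))
result = dict(reducer(k, [v for _, v in grp])
              for k, grp in groupby(pairs, key=lambda kv: kv[0]))

assert result == {"EU": 12, "US": 3}  # validated against independently computed totals
```

On a real cluster the same assertion would compare the job's output directory against totals computed directly from the staged input.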

Step 3: Output Validation Phase

1. Check that the transformation rules are correctly applied
2. Check data integrity and successful data load into the target system
3. Check that there is no data corruption by comparing the target data with the HDFS file system data
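The corruption check in item 3 can be sketched as a keyed diff between the HDFS data and the target system's rows; the `id` key and the row values are illustrative assumptions:

```python
def diff_rows(hdfs_rows, target_rows, key="id"):
    """Compare staged HDFS data with the loaded target, keyed by a unique id."""
    hdfs = {r[key]: r for r in hdfs_rows}
    target = {r[key]: r for r in target_rows}
    # Rows present in HDFS but never loaded into the target.
    missing = sorted(hdfs.keys() - target.keys())
    # Rows loaded but whose contents no longer match the staged copy.
    corrupted = sorted(k for k in hdfs.keys() & target.keys()
                       if hdfs[k] != target[k])
    return missing, corrupted

hdfs_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
target_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "X"}]

missing, corrupted = diff_rows(hdfs_rows, target_rows)
assert missing == [] and corrupted == [2]
```

An empty `missing` list confirms the load completed, and an empty `corrupted` list confirms no data corruption, which together cover items 2 and 3 of this phase.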

