
The complete Data Migration Methodology

The Phases of Data Migration


The ideal data migration project should be broken down into phases, which mirror the
overall development phases. These phases consist of:
Pre-Strategy
Strategy
Pre-Analysis
Analysis
Pre-Design
Design
Pre-Test
Test
Implementation
Revise
Maintenance
Pre-Strategy:
The Pre-Strategy phase is the clearest part of the migration project planning process.
In this phase, the focus of the overall project is determined. It is important to note that
data migration projects do not happen independently. Rather, they are spawned from
other development efforts such as implementation of OLTPs and/or OLAPs. This is
where the first fundamental mistake generally occurs. The project is clearly focused
on determining the requirements that the new system must satisfy, and pays little or
no attention to the data migration(s) that must occur.
The focus of the Pre-Strategy phase is to determine the scope of the migration. In other
words, answering the question: what are we trying to migrate? This is the time to identify the
number of legacy systems requiring migration and a count of their data structures. Interfaces
are another critical factor that should be identified at this juncture. Interfaces are no different
from other data sources, except they may receive data from the new system, as well as
supplying data.
At this point, we are not only identifying the number of data files, but also the number of
different systems from which data will be migrated. In the case of OLAP systems, multiple
data sources are a common occurrence. Data sources are not limited to actual data
processing systems. Word processing documents, spreadsheets, desktop RDBMS packages,
and raw text files are just a few examples of data sources you can expect to uncover in the
early phases of data migration.
The Complete Data Migration Methodology - Strategy
The Strategy phase of data migration should be scheduled to occur concurrently with the
Strategy phase of the core project (i.e. OLTP, OLAP). Unfortunately, in most cases, the same
people are expected to perform both analyses. This can be done, but these people need a
clearly defined set of tasks for which they are responsible.
The focus of the Strategy Phase is to determine whether or not the objectives documented in
the Pre-Strategy phase are achievable. This involves examining the actual data you plan to
migrate. Remember, at this point in the project, you have no idea if the data is even of high
enough quality to consider migrating. In order to get a better sense, it is helpful to obtain
reports that can provide row counts, column counts and other statistics pertaining to your
source data. This kind of information gives you a rough idea of just how much data there is to
migrate. You may find that the overall cost of migration is prohibitive relative to the quantity of
data that needs to be moved.
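The row-count and column-count reports described above can be sketched in a few lines of code. The CSV extract and field names below are invented for illustration; a real profiling pass would run against the actual legacy files.

```python
import csv
import io

# Hypothetical legacy extract; a real profiling pass would open the
# actual source files instead of this inline sample.
LEGACY_EXTRACT = """cust_id,name,balance
1001,Acme Ltd,250.00
1002,Widget Co,
1003,Example Inc,75.50
"""

def profile_source(text):
    """Return the rough volume statistics used to size a migration."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    empty_cells = sum(1 for row in rows for cell in row if not cell.strip())
    return {
        "columns": len(header),
        "rows": len(rows),
        "empty_cells": empty_cells,  # a first, crude data-quality signal
    }

stats = profile_source(LEGACY_EXTRACT)
print(stats)
```

Even this crude pass surfaces the kind of fact (an empty balance cell) that feeds the go/no-go cost decision discussed above.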

If this occurs, the most common solution is to migrate the source data to the new platform into
data structures that are constructed identically to that of the source system. Doing so allows
you to shut down the old system and bring up the new one with confidence and without losing
historical data. The cost of developing a set of reports to access the historical data on the new
platform tends to be far cheaper than the cost of migration in nearly every scenario.
The milestone of the Strategy phase is the Data Migration Strategy Document, which outlines
the intentions of the overall migration effort. In this document, we outline the reasons for our
conclusions about whether data migration is worthwhile. The data quality research performed
in this phase is still at a very high level, and in no way suggests that the team has gained a
thorough understanding of the specific data cleansing issues they will face later in the project.

The Complete Data Migration Methodology - Analysis


The Analysis phase of data migration also happens in parallel with the Analysis phase of the
core project. This is because each data element identified as a candidate for migration
inevitably results in a change to the data model.
The Analysis phase is not intended to thoroughly identify the transformation rules by which
historical data will be massaged into the new system; rather, it involves making a checklist of
the legacy data elements that we know must be migrated. This list of data elements will have
three sources:

Legacy data analysis as described above in the Pre-Analysis phase

Legacy report audits: A complete set of legacy reports from the old OLTP system is
reviewed. A comprehensive list of field usages should be compiled from these
reports.
User feedback sessions: These sessions should be scheduled for topics such as
data model audits, reviews of interview notes and the results of the legacy data
analysis and legacy report audits.

The Complete Data Migration Methodology - Design
The Design phase is where the bulk of the actual
mapping of legacy data elements to columns takes
place. The physical data structures have been frozen,
offering an ideal starting point for migration testing.
Note that data migration is iterative; it does not
happen in a single sitting.
The mapping portion of a data migration project can
be expected to span the Design phase through
Implementation. The most important resources for
validating the migration are the users of the new
system. Unfortunately, they will be unable to grasp
the comprehensiveness of the migration until they
view the data through the new applications. We have
concluded from experience that developing the new
reports prior to new forms allows for more thorough
validation of the migration earlier on in the project
lifespan. For instance, if some sort of calculation was
performed incorrectly by a migration script, reports
will reflect this. A form typically displays a single
master record at a time, whereas reports display
several records per page, making them a better
means of displaying the results of migration testing.
It is necessary to perform data mapping to the
physical data model. With the physical data structures
in place, you can begin the mapping process.
Mapping is generally conducted by a team of at least
three people per core business area (i.e. Purchasing,
Inventory, Accounts Receivable, etc.). Of these three
people, the first should be a business analyst,
generally an end user possessing intimate knowledge
of the historical data to be migrated. The second
team member is usually a systems analyst with
knowledge of both the source and target systems.
The third person is a programmer/analyst that
performs data research and develops migration
routines based upon the mappings defined by the
business analyst and the systems analyst,
cooperatively.
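The mapping work this team produces can be captured in a simple, machine-readable form. The legacy field names, target columns, and transformations below are hypothetical examples of what the business and systems analysts might agree on.

```python
# Hypothetical mapping agreed by the business and systems analysts;
# the programmer/analyst turns it into executable migration routines.
FIELD_MAP = {
    "CUST_NM":  ("customer_name", str.strip),
    "CR_LIMIT": ("credit_limit",  lambda v: round(float(v), 2)),
    "ST_CD":    ("status_code",   str.upper),
}

def map_record(legacy_row):
    """Apply the agreed transformations to one legacy record."""
    return {target: transform(legacy_row[source])
            for source, (target, transform) in FIELD_MAP.items()}

legacy = {"CUST_NM": "  Acme Ltd ", "CR_LIMIT": "1500", "ST_CD": "a"}
print(map_record(legacy))
```

Keeping the mapping as data rather than burying it in scripts makes the iterative revisions discussed above much cheaper to absorb.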

The Complete Data Migration Methodology - Pre-Test/Test/Implementation


The Pre-Test phase deals with two core subject
areas: logical errors and physical errors. Physical
errors are typically syntactical in nature and can be
easily identified and resolved. Physical errors have
nothing to do with the quality of the mapping effort.
Rather, this level of testing deals with semantics of
the scripting language used in the transformation
effort.
The Test phase is where we identify and resolve
logical errors. The first step is to execute the
mapping. Even if it completes successfully, we
must still ask questions such as:

How many records did we expect this script
to create?
Did the correct number of records get
created? If not, why?
Has the data been loaded into the correct
fields?
Has the data been formatted correctly?
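The count and format questions above can be automated as a small reconciliation script. The sample rows and the date-format rule are illustrative assumptions; real checks would query the source and target databases.

```python
# Sample rows as (key, date) pairs; real checks would query the
# source and target systems rather than use inline data.
source_rows = [("1001", "2019-01-31"), ("1002", "2019-02-28")]
target_rows = [("1001", "2019-01-31"), ("1002", "2019-02-28")]

def reconcile(source, target):
    """Answer the count and format questions for one migrated table."""
    issues = []
    if len(source) != len(target):
        issues.append(f"row count mismatch: {len(source)} vs {len(target)}")
    for key, date_str in target:
        # Crude format check: dates must look like YYYY-MM-DD.
        if len(date_str) != 10 or date_str[4] != "-" or date_str[7] != "-":
            issues.append(f"bad date format for {key}: {date_str}")
    return issues

print(reconcile(source_rows, target_rows))  # an empty list means no issues found
```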
The truest test of data mapping is providing the
populated target data structures to the users that
assisted in the analysis and design of the core
system. Invariably, the users will begin to identify
scores of other historical data elements to be
migrated that were not apparent to them during the
Analysis/Design phases.
The data migration Testing phases must be
reached as soon as possible to ensure that they
occur prior to the Design and Build phases of
the core project.
Otherwise, months of development effort can be lost
as each additional migration requirement slowly but
surely wreaks havoc on the data model, which, in
turn, requires substantive modifications to the
applications. The measures taken in the Test phases
are executed as early as the Design phase. Testing
is just as iterative as the migration project itself, in
that every enhancement must pass the test plan.

The Complete Data Migration Methodology - Revise


The Revise phase is really a superset of the last four phases (Pre-Test, Test, Implement and
Maintenance), which are iterative and do not take place independently. This is the point in the
process where cleanup is managed. All of the data model modifications, transformation rule
adjustments, and script modifications are essentially combined to form the Revise phase.
At this point, the question must be asked: are both the logical and physical data models being
maintained? If so, you have now doubled the administrative workload for the keeper of the
data models. In many projects, the intention is to maintain continuity between the logical and
physical designs. However, because the overwhelming volume of work tends to exceed its
perceived value, the logical model is inevitably abandoned, resulting in inconsistent system
documentation.
CASE tools can be used to maintain the link between the logical and physical models, though
it is necessary for several reports to be developed in house. For example, you will want
reports that indicate discrepancies between entities/tables and attributes/columns. These
reports will indicate whether there is a mismatch between the number of entities versus
tables and/or attributes versus columns, identify naming convention violations, and seek out
data definition discrepancies. Choose a CASE tool that provides an API to the metadata,
because it will definitely be needed.
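One of the in-house discrepancy reports suggested above might look like the following sketch, which compares a logical model's entities and attributes against a physical model's tables and columns. The model contents are invented; a real version would read them through the CASE tool's metadata API.

```python
# Invented model contents; a real report would pull these from the
# CASE tool's metadata API.
logical = {"Customer": {"name", "credit_limit"}, "Order": {"order_date"}}
physical = {"CUSTOMER": {"NAME", "CREDIT_LIMIT"}, "ORDER": {"ORDER_DATE", "BATCH_ID"}}

def discrepancies(logical, physical):
    """List mismatches between entities/attributes and tables/columns."""
    report = []
    norm = {t.lower(): {c.lower() for c in cols} for t, cols in physical.items()}
    for entity, attrs in logical.items():
        table = norm.get(entity.lower())
        if table is None:
            report.append(f"entity {entity} has no table")
            continue
        attrs_lc = {a.lower() for a in attrs}
        for extra in sorted(table - attrs_lc):
            report.append(f"column {extra} in {entity} has no attribute")
        for missing in sorted(attrs_lc - table):
            report.append(f"attribute {missing} in {entity} has no column")
    return report

print(discrepancies(logical, physical))  # flags the unmodelled BATCH_ID column
```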

The Spiral Model:


The spiral model, also known as the spiral
lifecycle model, is a systems development
method (SDM) used in information technology
(IT). This model of development combines the
features of the prototyping model and the
waterfall model. The spiral model is intended
for large, expensive, and complicated projects.
The steps in the spiral model can be
generalized as follows:

The new system requirements are
defined in as much detail as possible.
This usually involves interviewing a
number of users representing all the
external or internal users and other
aspects of the existing system.
A preliminary design is created for the
new system.
A first prototype of the new system is
constructed from the preliminary
design. This is usually a scaled-down
system, and represents an
approximation of the characteristics of
the final product.
A second prototype is evolved by a
fourfold procedure: (1) evaluating the
first prototype in terms of its strengths,
weaknesses, and risks; (2) defining the
requirements of the second prototype;
(3) planning and designing the second
prototype; (4) constructing and testing
the second prototype.
The existing prototype is evaluated in
the same manner as was the previous
prototype, and, if necessary, another
prototype is developed from it
according to the fourfold procedure
outlined above.
The preceding steps are iterated until
the customer is satisfied that the
refined prototype represents the final
product desired.
The final system is constructed, based
on the refined prototype.
The final system is thoroughly
evaluated and tested. Routine
maintenance is carried out on a
continuing basis to prevent large-scale
failures and to minimize downtime.

The spiral model is used most often in large
projects. For smaller projects, the concept of
agile software development is becoming a
viable alternative. The US military has
adopted the spiral model for its Future
Combat Systems program.
Development Team Testing Strategies in Agile:
Agile development teams generally follow a whole team strategy where people with testing
skills are effectively embedded into the development team and the team is responsible for the
majority of the testing. This strategy works well for the majority of situations, but when your
environment is more complex you'll find that you also need an independent test team working
in parallel with the development team and potentially performing end-of-lifecycle testing as
well. Regardless of the situation, agile development teams will adopt practices such as
continuous integration, which enables them to do continuous regression testing, either with a
test-driven development (TDD) or test-immediately-after approach.
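The test-first rhythm mentioned above can be illustrated with Python's standard unittest module. The discount function is a made-up example; in TDD the tests would be written first and fail until the function exists.

```python
import unittest

# The function under test is a made-up example; in TDD the test case
# below is written first and fails until this code is implemented.
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (1 - percent / 100), 2)

class DiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

# A continuous integration build would run the whole suite on every
# commit, which is what makes the regression testing continuous.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```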

Identification and authentication (I&A)


Identification and authentication (I&A) is the process of verifying that an identity is bound to
the entity that makes an assertion or claim of identity. The I&A process assumes that there
was an initial validation of the identity, commonly called identity proofing. Various methods of
identity proofing are available ranging from in person validation using government issued
identification to anonymous methods that allow the claimant to remain anonymous, but known
to the system if they return. The method used for identity proofing and validation should
provide an assurance level commensurate with the intended use of the identity within the
system. Subsequently, the entity asserts an identity together with an authenticator as a
means for validation. The only requirement for the identifier is that it must be unique within
its security domain.
Authenticators are commonly based on at least one of the following four factors:
Something you know, such as a password or a personal identification number (PIN). This
assumes that only the owner of the account knows the password or PIN needed to access the
account.
Something you have, such as a smart card or security token. This assumes that only the
owner of the account has the necessary smart card or token needed to unlock the account.
Something you are, such as fingerprint, voice, retina, or iris characteristics.
Where you are, for example inside or outside a company firewall, or proximity of login location
to a personal GPS device.
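A toy sketch of combining two of these factors (something you know plus something you have) follows. The stored PIN, hash scheme, and token identifier are invented; production systems use salted password hashing and time-based or challenge-response tokens.

```python
import hashlib

# Invented stored credentials; real systems use salted password
# hashing (e.g. bcrypt) and hardware or time-based tokens.
STORED_PIN_HASH = hashlib.sha256(b"4921").hexdigest()
REGISTERED_TOKEN = "TOKEN-7F3A"

def authenticate(pin, token_id):
    knows = hashlib.sha256(pin.encode()).hexdigest() == STORED_PIN_HASH  # something you know
    has = token_id == REGISTERED_TOKEN                                   # something you have
    return knows and has  # both factors must pass

print(authenticate("4921", "TOKEN-7F3A"))  # both factors correct
print(authenticate("4921", "TOKEN-0000"))  # wrong token: denied
```

Requiring both factors means a stolen PIN alone, or a stolen token alone, is not enough to pass.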
Computer security:
In computer security, access control includes authentication, authorization and audit. It also
includes measures such as physical devices, including biometric scans and metal locks,
hidden paths, digital signatures, encryption, social barriers, and monitoring by humans and
automated systems.
In any access control model, the entities that can perform actions in the system are called
subjects, and the entities representing resources to which access may need to be controlled
are called objects. Subjects and objects should both be considered as software entities and
as human users. Although some systems equate subjects with user IDs, so that all processes
started by a user by default have the same authority, this level of control is not fine-grained
enough to satisfy the Principle of least privilege, and arguably is responsible for the
prevalence of malware in such systems.
In some models, for example the object-capability model, any software entity can potentially
act as both a subject and object.

Access control models used by current systems tend to fall into one of two classes: those
based on capabilities and those based on access control lists (ACLs). In a capability-based
model, holding an unforgeable reference or capability to an object provides access to the
object (roughly analogous to how possession of your house key grants you access to your
house); access is conveyed to another party by transmitting such a capability over a secure
channel. In an ACL-based model, a subject's access to an object depends on whether its
identity is on a list associated with the object (roughly analogous to how a bouncer at a
private party would check your ID to see if your name is on the guest list); access is conveyed
by editing the list.
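The two models can be contrasted in a few lines. The subjects, objects, and capability tokens below are invented examples.

```python
# ACL model: each object carries a list of (subject, rights) entries,
# like the bouncer's guest list. All names here are invented.
acl = {"payroll.db": {"alice": {"read", "write"}, "bob": {"read"}}}

def acl_allows(subject, obj, right):
    return right in acl.get(obj, {}).get(subject, set())

# Capability model: access follows from holding an unforgeable
# reference; here a token maps straight to an (object, rights) pair,
# like possession of a house key.
capabilities = {"cap-991": ("payroll.db", {"read"})}

def cap_allows(cap_token, right):
    obj, rights = capabilities.get(cap_token, (None, set()))
    return right in rights

print(acl_allows("bob", "payroll.db", "write"))  # False: not on the list
print(cap_allows("cap-991", "read"))             # True: holder has the capability
```

Note the asymmetry the analogy suggests: the ACL check needs to know who you are, while the capability check only needs to know what you hold.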

Security testing is a process to determine that an information system protects data and
maintains functionality as intended.
The six basic security concepts that need to be covered by security testing are:
1. Confidentiality.
2. Integrity.
3. Authentication.
4. Authorization.
5. Availability.
6. Non-repudiation.

Agile Requirements Strategies:


This is an overview of agile approaches to requirements elicitation and management. This is
important because your approach to requirements goes hand-in-hand with your approach to
validating those requirements; therefore, to understand how disciplined agile teams approach
testing and quality, you first need to understand how agile teams approach requirements.
The best practices of Agile Modeling (AM) address agile strategies for modeling and
documentation, and in the case of TDD and executable specifications arguably stray into
testing. This section is organized into the following topics:

Active Stakeholder Participation.
Functional Requirements Management.
Initial Requirements Envisioning.
Iteration Modeling.
Just in Time (JIT) Model Storming.
Non-Functional Requirements Management.
Who is Doing This?
The Implications for Testing.

How Agile Is Different from Other Testing Activities:


Traditional testing professionals who are making the move to agile development may find the
following aspects of agile development to be very different than what they are used to:

Greater collaboration: Agile developers work closely together, favoring direct
communication over passing documentation back and forth to each other. They
recognize that documentation is the least effective manner of communication
between people.

Shorter work cycle: The time between specifying a requirement in detail and
validating that requirement is now on the order of minutes, not months or years, due
to the adoption of test-driven development (TDD) approaches, greater collaboration,
and less of a reliance on temporary documentation.

Agilists embrace change: Agile developers choose to treat requirements like a
prioritized stack which is allowed to change throughout the lifecycle. A changed
requirement is a competitive advantage if you're able to implement it.

Greater flexibility is required of testers: Gone are the days of the development team
handing off a "complete specification" which the testers can test against. The
requirements evolve throughout the project. Ideally, acceptance-level "story tests" are
written before the production code which fulfills them, implying that the tests become
detailed requirement specifications.

Greater discipline is required of IT: It's very easy to say that you're going to work
closely with your stakeholders, respect their decisions, produce potentially shippable
software on a regular basis, and write a single test before writing just enough
production code to fulfill that test (and so on) but a lot harder to actually do them.

Greater accountability is required of stakeholders: One of the implications of
adopting the practices of active stakeholder participation, prioritized requirements,
and producing working software on a regular basis is that stakeholders are now
accountable for the decisions that they make.

The Systems Development Life Cycle (SDLC) is a process used by a systems
analyst to develop an information system, including requirements, validation, training, and
user (stakeholder) ownership. Any SDLC should result in a high quality system that meets or
exceeds customer expectations, reaches completion within time and cost estimates, works
effectively and efficiently in the current and planned Information Technology infrastructure,
and is inexpensive to maintain and cost-effective to enhance.
Computer systems are complex and often (especially with the recent rise of Service-Oriented
Architecture) link multiple traditional systems potentially supplied by different software
vendors. To manage this level of complexity, a number of SDLC models have been created:
"waterfall"; "fountain"; "spiral"; "build and fix"; "rapid prototyping"; "incremental"; and
"synchronize and stabilize".
SDLC models can be described along a spectrum of agile to iterative to sequential. Agile
methodologies, such as XP and Scrum, focus on light-weight processes which allow for rapid
changes along the development cycle. Iterative methodologies, such as Rational Unified
Process and Dynamic Systems Development Method, focus on limited project scopes and
expanding or improving products by multiple iterations. Sequential or big-design-upfront
(BDUF) models, such as Waterfall, focus on complete and correct planning to guide large
projects and risks to successful and predictable results. Other models, such as Anamorphic
Development, tend to focus on a form of development that is guided by project scope and
adaptive iterations of feature development.
Product lifecycle management (PLM) Benefits:

Reduced time to market.
Improved product quality.
Reduced prototyping costs.
More accurate and timely Request For Quote generation.
Ability to quickly identify potential sales opportunities and revenue contributions.
Savings through the re-use of original data.
A framework for product optimization.
Reduced waste.
Savings through the complete integration of engineering workflows.
Ability to provide Contract Manufacturers with access to a centralized product record.

Product lifecycle management (PLM) is the process of managing the entire lifecycle of a
product from its conception, through design and manufacture, to service and disposal. PLM
integrates people, data, processes and business systems and provides a product information
backbone for companies and their extended enterprise.
'Product lifecycle management' (PLM) should be distinguished from 'Product life cycle
management (marketing)' (PLCM). PLM describes the engineering aspect of a product, from
managing descriptions and properties of a product through its development and useful life;
whereas, PLCM refers to the commercial management of life of a product in the business
market with respect to costs and sales measures.
Product lifecycle management is one of the four cornerstones of a corporation's information
technology structure.

All companies need to manage communications and information with their customers
(CRM-Customer Relationship Management).
Their suppliers (SCM-Supply Chain Management).
Their resources within the enterprise (ERP-Enterprise Resource Planning) and
Their planning (SDLC-Systems Development Life Cycle).

In addition, manufacturing engineering companies must also develop, describe, manage and
communicate information about their products.

Application lifecycle management (ALM) is the marriage of business management to
software engineering, made possible by tools that facilitate and integrate requirements
management, architecture, coding, testing, tracking, and release management.

Benefits:
Increases productivity, as the team shares best practices for development and deployment,
and developers need to focus only on current business requirements

Improves quality, so the final application meets the needs and expectations of users

Breaks boundaries through collaboration and smooth information flow

Accelerates development through simplified integration

Cuts maintenance time by synchronizing application and design

Maximizes investments in skills, processes, and technologies

Increases flexibility by reducing the time it takes to build and adapt applications that support
new business initiatives.

Dynamic program analysis:


Dynamic program analysis is the analysis of computer software that is performed by
executing programs built from that software system on a real or virtual processor. For
dynamic program analysis to be effective, the target program must be executed with sufficient
test inputs to produce interesting behavior. Use of software testing techniques such as code
coverage helps ensure that an adequate slice of the program's set of possible behaviors has
been observed. Also, care must be taken to minimize the effect that instrumentation has on
the execution (including temporal properties) of the target program.
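The core idea behind line-coverage instrumentation can be sketched with Python's sys.settrace hook, which records which lines of a target function actually executed. The classify function is a made-up target.

```python
import sys

# A sketch of dynamic analysis via instrumentation: record which lines
# of a target function actually run, the idea behind line coverage.
executed = set()  # line offsets (relative to the def line) that ran

def tracer(frame, event, arg):
    if event == "line" and frame.f_code.co_name == "classify":
        executed.add(frame.f_lineno - frame.f_code.co_firstlineno)
    return tracer

def classify(n):
    if n < 0:
        return "negative"      # never observed without a negative input
    return "non-negative"

sys.settrace(tracer)
classify(5)                    # exercises only one branch
sys.settrace(None)
print("relative lines executed:", sorted(executed))
```

Running classify with only a non-negative input leaves the "negative" branch out of the executed set, which is exactly the gap that sufficient test inputs and coverage tooling are meant to expose.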

The most common usage scenarios of the Reverse Semantic Traceability (RST) method are:

Validating UML models: quality engineers restore a textual description of a domain;
original and restored descriptions are compared.

Validating model changes for a new requirement: given an original and changed
versions of a model, quality engineers restore the textual description of the
requirement, original and restored descriptions are compared.

Validating a bug fix: given an original and modified Source code, quality engineers
restore a textual description of the bug that was fixed, original and restored
descriptions are compared.

Integrating a new software engineer into a team: a new team member gets an
assignment to do Reverse Semantic Traceability for the key artifacts from the current
projects.

A sample testing cycle


Although variations exist between organizations, there is a typical cycle for testing. The
sample below is common among organizations employing the Waterfall development model.
Requirements analysis: Testing should begin in the requirements phase of the
software development life cycle. During the design phase, testers work with
developers in determining what aspects of a design are testable and with what
parameters those tests work.
Test planning: Test strategy, test plan, test bed creation. Since many activities will be
carried out during testing, a plan is needed.
Test development: Test procedures, test scenarios, test cases, test datasets, test
scripts to use in testing software.
Test execution: Testers execute the software based on the plans and test documents
then report any errors found to the development team.
Test reporting: Once testing is completed, testers generate metrics and make final
reports on their test effort and whether or not the software tested is ready for release.
Test result analysis: Or Defect Analysis, is done by the development team usually
along with the client, in order to decide what defects should be treated, fixed, rejected
(i.e. found software working properly) or deferred to be dealt with later.
Defect Retesting: Once a defect has been dealt with by the development team, it is
retested by the testing team. AKA Resolution testing.
Regression testing: It is common to have a small test program built of a subset of
tests, for each integration of new, modified, or fixed software, in order to ensure that
the latest delivery has not ruined anything, and that the software product as a whole
is still working correctly.
Test Closure: Once the test meets the exit criteria, the activities such as capturing
the key outputs, lessons learned, results, logs, documents related to the project are
archived and used as a reference for future projects.

Input combinations and preconditions in Testing:


A very fundamental problem with software testing is that testing under all combinations of
inputs and preconditions (initial state) is not feasible, even with a simple product.
This means that the number of defects in a software product can be very large and defects
that occur infrequently are difficult to find in testing. More significantly, non-functional
dimensions of quality (how it is supposed to be versus what it is supposed to do), such as
usability, scalability, performance, compatibility, and reliability, can be highly subjective;
something that constitutes sufficient value to one person may be intolerable to another.
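A quick calculation shows why exhaustive input testing is infeasible even for a tiny form. The field domains below are illustrative assumptions.

```python
# Illustrative field domains for a small, hypothetical input form.
field_values = {
    "country": 195,
    "age": 120,
    "status": 4,
    "comment_length": 1001,
}

total = 1
for n in field_values.values():
    total *= n  # one test per combination of inputs

print(f"{total:,} input combinations")  # over 93 million for just four fields
```

Since domains multiply, adding even one more modest field pushes the total into the billions, which is why testers sample inputs (e.g. equivalence classes, pairwise combinations) rather than enumerate them.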

Requirements Management Plan


A requirements management plan is a component of the project management plan. Generally,
the purpose of RM is to ensure customer, developer and tester have a common
understanding of what the requirements for an undertaking are. Several subordinate goals
must be met for this to take place: in particular, requirements must be of good quality and
change must be controlled. The plan documents how these goals will be achieved.
Depending on your project standards, a variety of sections might be included in your RM plan.
Some examples are:
Introduction to RM and document overview.
Document scope.
Issues affecting implementation of the plan, such as training on the RM tool.
Applicable documents, such as policies and standards.
Terms and definitions used in the plan - if you use the term requirement to include
several requirement categories, define it here.
Methods and tools that will be used for the RM process (or the requirements for
selecting a tool if one is not selected)
The RM process, including any diagrams of the process.
Authorities and responsibilities of participants.
Strategy for achieving requirement quality, including traceability and change control.
Appendices usually contain a discussion of quality factors, as well as references, any
forms to be used in the process, and additional details not included in the main body
of the plan, such as report examples.

Different Types of Testing:


Basis Path Testing:
A white box test case design technique that uses
the algorithmic flow of the program to design
tests.
Baseline:
The point at which some deliverable produced
during the software engineering process is put
under formal change control.
Binary Portability Testing:
Testing an executable application for portability
across system platforms and environments,
usually for conformance to an ABI specification.

Breadth Testing:
A test suite that exercises the full functionality of
a product but does not test features in detail.
