
Test Data: What You Need and How to Get It

by Linda Hayes

In most test cases, the data is key to the result. The key is not just entering or verifying data values, but knowing what state the data is supposed to be in so you can predict expected results. Getting control of the test data is fundamental for any test effort, because a basic tenet of software testing is that you must know both the input conditions of the data and the required output results to perform a valid test. If you lack either of these, it's not a test; it's an experiment, because you don't know what will happen. This predictability is important for manual testing, but for automated testing it's essential.

For many systems you can't even get started until you have enough test data to make testing meaningful, and if you need thousands or millions of data records, you've got a whole new problem. In an extreme case, testing an airline fare pricing application required tens of thousands of setup transactions to create the cities, flights, passengers and fares needed to exercise all of the requirements. The actual test itself took less time than the data setup.

There are four basic strategies for assembling a test data environment: production sampling, starting from scratch, seeding data, or generating it. Let's look at each strategy and consider the advantages and disadvantages of each.

Approach            Advantages              Disadvantages
Sample production   Represents reality      Complex to extract sample
                    Large random volume     Fractional coverage
                    Already exists          Refresh maintenance
                                            Unavailable for new applications
Start from scratch  Complete control        Requires automation
                    Easily recreated        Not all files can be created
                    Targeted                Lacks reality, randomness
Seed data           Expands on production   Effort to define, add
                    Targeted                Maintenance
Generate data       Large defined volume    Lacks reality, randomness
                    Multiple file types     Referential integrity
                    Automated               Locating specific data

Sample Production
If your application is already in production and you need data for regression testing, the most common test data acquisition technique is to take it from production. This approach seems both logical and practical. Production represents reality, in that it contains the actual situations the software must deal with. It also offers both depth and breadth, while ostensibly saving the time required to create new data.

But there are drawbacks to note. First, the test platform seldom replicates production capacity, so a subset must be extracted. Acquiring this subset is not as easy as taking every Nth record or some flat percentage of the data. The complex interrelationships between files mean that the subset must be internally cohesive. For example, the selected transactions must reference valid selected master accounts, and the totals must coincide with balances and histories. Simply identifying these relationships and tracing through all of the files to ensure that the subset makes sense can be a major undertaking in and of itself.
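To make the cohesion problem concrete, here is a minimal Python sketch under simplifying assumptions. It uses just two hypothetical files held in memory: account masters, each carrying a balance, and transactions that reference an account by ID. The extraction samples the master file first and then follows the relationship to pull every child transaction, rather than taking every Nth transaction. A separate check confirms that each transaction references a selected account and that each balance still matches its history. The names (Account, Transaction, extract_subset, check_cohesion) are illustrative only. A real extract would span many more files and run against the database itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: int
    balance: int  # hypothetical master record; amount in cents


@dataclass(frozen=True)
class Transaction:
    txn_id: int
    account_id: int  # reference to the owning Account
    amount: int      # in cents


def extract_subset(accounts, transactions, every_nth):
    """Sample the master file, then follow its relationships to the child file."""
    # Step 1: sample parent (master) records, not child records.
    chosen = accounts[::every_nth]
    chosen_ids = {a.account_id for a in chosen}
    # Step 2: keep every transaction belonging to a chosen account, so no
    # selected transaction points at an account that was left behind.
    txns = [t for t in transactions if t.account_id in chosen_ids]
    return chosen, txns


def check_cohesion(accounts, transactions):
    """Return a list of problems; an empty list means the subset is cohesive."""
    problems = []
    ids = {a.account_id for a in accounts}
    totals = {a.account_id: 0 for a in accounts}
    for t in transactions:
        if t.account_id not in ids:
            problems.append(f"transaction {t.txn_id} references missing account {t.account_id}")
        else:
            totals[t.account_id] += t.amount
    for a in accounts:
        if totals[a.account_id] != a.balance:
            problems.append(f"account {a.account_id}: balance {a.balance} != history {totals[a.account_id]}")
    return problems


if __name__ == "__main__":
    accounts = [Account(1, 150), Account(2, -20), Account(3, 75), Account(4, 0)]
    transactions = [
        Transaction(10, 1, 100), Transaction(11, 1, 50),
        Transaction(12, 2, -20), Transaction(13, 3, 75),
    ]
    subset_accounts, subset_txns = extract_subset(accounts, transactions, every_nth=2)
    print([a.account_id for a in subset_accounts])      # [1, 3]
    print(check_cohesion(subset_accounts, subset_txns))  # []
    # Naively sampling the child file instead breaks cohesion:
    for problem in check_cohesion(subset_accounts, transactions[::2]):
        print(problem)
```

In the usage example, sampling every other transaction directly fails both checks. One transaction points at an account that was not selected, and the selected balances no longer agree with their histories.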

June 2003 http://www.testinginstitute.com Journal of Software Testing Professionals 1


Further, it is difficult to know how large a sample is necessary to achieve coverage of all critical states and combinations. It is also hard to know whether the complete set even exists in production at any given point. A snapshot may reflect only half, or even less, of all possible data types, as one financial services firm learned. Even after exercising production data for five different customers, its code coverage tool showed only fractional results. Obviously missing from production are invalid data and transaction types that exercise exception and error handling functionality. Less obvious might be conditions that are time- or event-specific. So even complete production data does not represent complete code coverage. Another drawback of this approach is that the tests themselves and the extracted data must be constantly modified to work together. The data may even require scrubbing to remove sensitive information such as Social Security numbers or names.

Going back to our basic tenet, we must know the input conditions for a valid test: in this case, the data contents. Each fresh extraction starts everything over. Suppose, for example, that a payroll tax test requires an employee whose year-to-date earnings will cross over the FICA limit on the next paycheck. The person performing the test must either find such an employee in the subset, modify one, or add one. If the test is automated, it too must be modified for the new employee number and related information. Searching for an employee who meets all the conditions you are interested in is like searching for a needle in a haystack.

Thus, the time savings are illusory because there is limited repeatability: all effort to establish the proper test conditions is lost every time the extract is refreshed.

And finally, this approach obviously cannot be employed for new systems under development, since no production data is available.

Starting from Scratch
The other extreme is to start from scratch, in effect reconstructing the test data each time. This approach has the benefit of complete control: the content is always known and can be enhanced or extended over time, preserving prior efforts. Internal cohesion is assured because the software itself creates and maintains the interrelationships. Changes to file structures or record layouts are also automatically incorporated.

But reconstructing test data is not free from hazards. The most obvious is that, without automation, it's highly impractical for large-scale applications. Less obvious is the fact that some files cannot be created through online interaction. They are system-generated, only through interfaces or processing cycles. Thus, it may not be possible to start from a truly clean slate.

A compelling argument also might be made that data created in a vacuum, so to speak, lacks the expanse of production. Unique or unusual situations that often arise in the real world may not be contemplated by test designers. Granted, this technique allows for steady and constant expansion of the test data as necessary circumstances are discovered. But it lacks the randomness and reality that make production so appealing.

Seeding Data
Seeding test data is a combination of using production files and creating new data with specific conditions. This approach provides a dose of reality tempered by a measure of control.

This is the strategy adopted by a major mutual fund to enable test automation. Without predictable, repeatable data there was no practical means of reusing automated tests across releases. Much of the data could be created through the online interface, including funds, customers and accounts, but other data had to be extracted from production. Testing statements and tax reports, for example, required historical transactions that could not be generated except by multiple executions of test cycles. So the alternative of acquiring the data from production and performing the necessary maintenance on the tests proved to be less time-consuming. Once the data was assembled, it was archived for reuse.

It's still not easy. You must still surmount the cohesion challenge, ensuring that the subset you acquire makes sense. You must also have an efficient means of creating the additional data needed for test conditions. Furthermore, you must treat the resulting data as the valuable asset that it is, instituting procedures for archiving it safely so that it can be restored and reused.

Although reuse is a popular and sensible concept, it brings its own issues. Many if not most applications are time-sensitive, and for them reusing the same data over and over is not viable unless you can roll the data dates forward or the system date back. For example, an employee who is 64 one month may turn 65 the next, resulting in different tax consequences for pension payouts. Luckily, in many cases, such test capabilities were left behind by a Y2K project.

Furthermore, modifications to file structures and record layouts demand data conversions. This may be seen as an advantage, since, hopefully, the conversions are tested against the test bed before they are performed against production.

Generating Data
Automatically generated test data can be used to create databases containing defined cases and conditions. It can also supply enough information to approximate real-world conditions for testing capacity and performance. Suppose you need to ensure that your database design can support millions of customers or billions of transactions and still deliver acceptable response times. Generation may then be the only practical means of creating these volumes.

Fortunately, test data generators are becoming more powerful and more widely available.
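The Generating Data paragraph describes a mix of defined cases and enough bulk to approximate real-world volume. That mix can be sketched with nothing more than Python's standard library. This is a minimal illustration under stated assumptions, not how any commercial generator works. The two-table customer/account schema, the field names, and the planted cases (accounts exactly 90 and 91 days past due) are all hypothetical. Child rows are created only beneath an existing parent, so referential integrity holds by construction. A fixed seed makes the random portion repeatable. The planted cases are returned by name, so a test knows exactly which record carries each condition instead of searching the bulk data for one.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Account:
    account_id: int
    customer_id: int    # foreign key to Customer (parent/child link)
    balance: int        # hypothetical amount owed, in cents
    days_past_due: int


def generate(n_customers, accounts_per_customer, seed=2003):
    """Generate random bulk volume plus a few planted, known test cases.

    Hypothetical schema for illustration; a fixed seed makes the random
    portion repeatable from run to run.
    """
    rng = random.Random(seed)
    customers, accounts = [], []
    next_account = 1
    for cid in range(1, n_customers + 1):
        customers.append(Customer(cid, f"Customer {cid}"))
        # Child rows are only ever created under an existing parent,
        # so every account's customer_id is valid by construction.
        for _ in range(rng.randint(1, accounts_per_customer)):
            accounts.append(Account(next_account, cid,
                                    balance=rng.randint(0, 500_000),
                                    days_past_due=rng.choice([0, 0, 0, 30, 60, 120])))
            next_account += 1
    # Defined cases: planted with known IDs (boundary values around a
    # hypothetical 90-day rule) so a test can go straight to them.
    planted = {
        "over_90_days": Account(next_account, 1, balance=25_000, days_past_due=91),
        "exactly_90_days": Account(next_account + 1, 1, balance=25_000, days_past_due=90),
    }
    accounts.extend(planted.values())
    return customers, accounts, planted


if __name__ == "__main__":
    customers, accounts, planted = generate(n_customers=1000, accounts_per_customer=3)
    print(len(customers), "customers,", len(accounts), "accounts")
    print("past-due test account:", planted["over_90_days"].account_id)
```

The random portion supplies volume. The planted records supply the predictability that a functional test needs.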



Test data generation products offer powerful features that can expedite the tiresome job of populating files and databases with enough data to support complex test scenarios. A few examples include Datatect from Banner Software (www.datatect.com), Datamacs from Computer Associates International Inc. (www.cai.com), D-Generate from Synthetic Minds (www.syntheticminds.com) and TurboData from Canam Software (www.turbodata.com).

But what do these tools do? How well do they work? And what do they test?

As you might imagine, test data generators begin with a description of the file or database that is to be created. In most cases, the tools can read the database tables directly to determine the fields and their type, length, and format. The user can then add the rules, relationships, and constraints that govern the generation of valid data.

Standard "profiles" are also offered, which can automatically produce billions of names, addresses, cities, states, zip codes, Social Security numbers, test dates, and other common data values such as random values, ranges, and type mixes. Most products also offer user-customizable data types. These can be used for generating unique SIC (Standard Industrial Classification) business codes, e-mail addresses, and other data types.

A more critical feature, and one more difficult to implement, is support for parent/child and other relationships in complex databases. For example, a parent record, such as a customer account master, must be linked with multiple child records, such as different accounts and transactions. This type of functionality is essential for relational database environments where referential integrity is key.

Some users have found it easier to use a test data generator to create data that is then read by an automated test tool and entered into the application. This is an interesting combination of data generation and seeding. The synergy between test data generation and test automation tools is natural, and in some cases test data generation capability is being embedded in test execution products.

The goal of using a test data generation product, of course, is data, and tons of it. Thousands or even millions (some claim billions) of records containing variations of the described data can be generated. They can be output to the database or file format of choice. Most major databases (Oracle, Sybase, Informix, SQL Server, and Microsoft Access) are supported, as well as formats such as XML or flat files.

But, like most things, it's not always quite that easy. These days, databases can contain more than just data, such as stored procedures or derived foreign keys that link other tables or databases. In these cases, it is not feasible to generate data directly into the tables. Too often, maintaining database integrity is a project unto itself.

And, of course, in the end volume is its own challenge. More is not necessarily better. Too much data will take too long to generate, will require too much storage, and may create even more issues than not enough.

Functional testing is a different animal. Suppose you are testing a business rule, such as whether an account whose balance owed is more than 90 days past due will permit additional credit purchases to be posted. You must then know precisely which account number contains the condition and which transactions will be entered against it. It may be easy to generate huge volumes of accounts whose balances are all over the map in terms of their amount and due dates. It is not as simple to know exactly which accounts satisfy which business rules.

In this context, the same issues exist as they do for simply sampling production data. You may be comfortable that your data sample is representative of the types of conditions that must be tested. Knowing which accounts meet which requirements is another matter altogether. Testing complex business rules may require knowing the exact state of several variables spread over multiple databases, tables, and/or files. Finding that precise combination may be like looking for a needle in a haystack.

Can you benefit from a test data generation tool? Probably. And, if you can, how do you select the best one for your needs? The most important step is to carefully research the types of data and files that your application uses, especially the way the database is structured. Not only will this help you evaluate the various products, it will add to your education about how your software and its data are designed. And that information is useful whether you generate data or not!

Summary
As you can see, acquiring and managing test data is not easily done no matter which route you choose, and the time and effort required will be significant in any event. It is important to realize that your test data strategy is integral to the integrity of your overall test process. Without a thoughtful and consistent approach, it will be an expense instead of an investment.

