
Black box testing

Black box testing takes an external perspective of the test object to derive test
cases. These tests can be functional or non-functional, though usually functional. The test
designer selects valid and invalid input and determines the correct output. There is no
knowledge of the test object's internal structure.

This method of test design is applicable to all levels of software testing: unit,
integration, functional testing, system and acceptance. The higher the level, and hence the
bigger and more complex the box, the more one is forced to use black box testing to
simplify. While this method can uncover unimplemented parts of the specification, one
cannot be sure that all existent paths are tested.

Contents
• 1 Test design techniques
• 2 User input validation

• 3 Hardware

Test design techniques


Typical black box test design techniques include:

• Equivalence partitioning
• Boundary value analysis
• Decision table testing
• Pairwise testing
• State transition tables
• Use case testing
• Cross-functional testing

User input validation


User input must be validated to conform to expected values. For example, if the
software program is requesting input on the price of an item, and is expecting a value
such as 3.99, the software must check to make sure all invalid cases are handled. A user
could enter the price as "-1" and achieve results contrary to the design of the program.
Other examples of entries that could be entered and cause a failure in the software
include: "1.20.35", "Abc", "0.000001", and "999999999". These are possible test
scenarios that should be entered for each point of user input.
Other domains, such as text input, need to restrict the length of the characters that
can be entered. If a program allocates 30 characters of memory space for a name, and the
user enters 50 characters, a buffer overflow condition can occur.

Typically when invalid user input occurs, the program will either correct it
automatically, or display a message to the user that their input needs to be corrected
before proceeding.
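
As a rough illustration of the checks described above, the following C sketch (the function names and limits are illustrative assumptions, not taken from any particular product) validates a price entered as text, rejecting negative values, non-numeric input such as "Abc", and malformed numbers such as "1.20.35", while fgets bounds the input length so an over-long entry cannot overflow the buffer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Illustrative limits; a real program would take these from its specification. */
#define MAX_INPUT 32
#define MAX_PRICE 999999.99

/* Returns 1 and stores the price if the text is a valid price, 0 otherwise. */
static int parse_price(const char *text, double *price)
{
    char *end = NULL;
    errno = 0;
    double value = strtod(text, &end);

    if (end == text)                  return 0;  /* nothing numeric, e.g. "Abc"        */
    if (*end != '\0' && *end != '\n') return 0;  /* trailing junk, e.g. "1.20.35"      */
    if (errno == ERANGE)              return 0;  /* out of range of a double           */
    if (value < 0.0)                  return 0;  /* negative prices rejected, e.g. "-1" */
    if (value > MAX_PRICE)            return 0;  /* unreasonably large, e.g. "999999999" */

    *price = value;
    return 1;
}

int main(void)
{
    char buffer[MAX_INPUT];   /* fgets never writes past this buffer */
    double price;

    printf("Enter price: ");
    if (fgets(buffer, sizeof buffer, stdin) == NULL)
        return 1;

    if (parse_price(buffer, &price))
        printf("Accepted price: %.2f\n", price);
    else
        printf("Invalid price, please re-enter.\n");
    return 0;
}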

Hardware
Functional testing of devices such as power supplies, amplifiers, and many other simple-function electrical devices is common in the electronics industry. Automated functional testing of specified characteristics is used for production testing and as part of design validation.

Code coverage


Code coverage is a measure used in software testing. It describes the degree to
which the source code of a program has been tested. It is a form of testing that looks at
the code directly and as such comes under the heading of white box testing.

Code coverage techniques were amongst the first techniques invented for
systematic software testing. The first published reference was by Miller and Maloney in
Communications of the ACM in 1963.

There are a number of different ways of measuring code coverage, the main ones being:

• Statement Coverage - Has each line of the source code been executed and tested?
• Condition Coverage - Has each evaluation point (such as a true/false decision)
been executed and tested?
• Path Coverage - Has every possible route through a given part of the code been
executed and tested?
• Entry/Exit Coverage - Has every possible call and return of the function been
executed and tested?

Safety critical applications are often required to demonstrate that testing achieves
100% of some form of code coverage.

Some of the coverage criteria above are connected; for instance, path coverage
implies condition, statement and entry/exit coverage. Statement coverage does not imply
condition coverage, as the code (in the C programming language) shows:
void foo(int bar)
{
printf("This is ");
if (bar < 0)
{
printf("not ");
}
printf("a positive integer.\n");
return;
}

If the function "foo" were called with variable "bar = -1", statement coverage would be
achieved. Condition coverage, however, would not.
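
A test driver that calls foo with both a negative and a non-negative argument exercises both outcomes of the bar < 0 decision. The small driver below is illustrative; it simply reuses the function above to show how condition coverage would be reached.

#include <stdio.h>

void foo(int bar)
{
    printf("This is ");
    if (bar < 0)
    {
        printf("not ");
    }
    printf("a positive integer.\n");
}

int main(void)
{
    foo(-1);  /* decision evaluates to true                         */
    foo(1);   /* decision evaluates to false: both outcomes covered */
    return 0;
}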

Full path coverage, of the type described above, is usually impractical or impossible. Any module with a succession of n decisions in it can have up to 2^n paths
within it; loop constructs can result in an infinite number of paths. Many paths may also
be infeasible, in that there is no input to the program under test that can cause that
particular path to be executed. However, a general-purpose algorithm for identifying
infeasible paths has been proven to be impossible (such an algorithm could be used to
solve the halting problem). Techniques for practical path coverage testing instead attempt
to identify classes of code paths that differ only in the number of loop executions, and to
achieve "basis path" coverage the tester must cover all the path classes.

Usually the source code is instrumented and run through a series of tests. The
resulting output is then analysed to see what areas of code have not been exercised, and
the tests are updated to include these areas as necessary. Combined with other code
coverage methods the aim is to develop a rigorous yet manageable set of regression tests.
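
In practice coverage tools such as gcov (driven by gcc's --coverage option) insert the probes automatically, but the idea can be sketched by hand: each probe records that its statement was reached, and the report shows which probes never fired. The probe layout below is a minimal illustration, not the output format of any real tool.

#include <stdio.h>

/* One flag per instrumented statement; a real tool generates these probes. */
static int hit[4];

void classify(int x)
{
    hit[0] = 1;                 /* probe: function entered       */
    if (x < 0) {
        hit[1] = 1;             /* probe: negative branch        */
        printf("negative\n");
    } else {
        hit[2] = 1;             /* probe: non-negative branch    */
        printf("non-negative\n");
    }
    hit[3] = 1;                 /* probe: function exit          */
}

int main(void)
{
    classify(5);                /* run the "test suite"          */

    int covered = 0;
    for (int i = 0; i < 4; i++)
        covered += hit[i];
    printf("statement coverage: %d of 4 probes hit (%.0f%%)\n",
           covered, 100.0 * covered / 4);
    return 0;
}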

Code coverage is ultimately expressed as a percentage, as in "We have tested 67% of the code." The meaning of this depends on what form(s) of code coverage have been used, as 67% path coverage is more comprehensive than 67% statement coverage. The value of code coverage as a measure of test quality is debated.

Defect tracking

In engineering, defect tracking is the process of finding defects in a product (by inspection, testing, or recording feedback from customers) and making new versions of the product that fix the defects. Defect tracking is important in software engineering, as complex software systems typically have tens, hundreds or thousands of defects. Managing, evaluating and prioritizing these defects is a difficult task; defect tracking systems are computer database systems that store defects and help people manage them.

IBM Rational ClearQuest is an industry leading defect tracking tool.

VMS & Bugzilla are bug tracking tools.


Software release life cycle
A software release is the distribution, whether public or private, of an initial or
new and upgraded version of a computer software product. Each time a software program
or system is changed, the programmers and company doing the work decide on how to
distribute the program or system, or changes to that program or system. Software patches
are one method of distributing the changes, as are downloads and compact discs.
Software release stages
The software release life cycle is composed of different stages that describe the
stability of a piece of software and the amount of development it requires before final
release. Each major version of a product usually goes through a stage when new features
are added, or the alpha stage; a stage when it is being actively debugged, or the beta
stage; and finally a stage when all important bugs have been removed, or the stable stage.
Intermediate stages may also be recognized. The stages may be formally announced and
regulated by the project's developers, but sometimes the terms are used informally to
describe the state of a product. Conventionally, code names are often used by many
companies for versions prior to the release of the product, though the actual product and
features are rarely secret.

Contents
• 1 Software release stages
o 1.1 Pre-alpha
o 1.2 Alpha
o 1.3 Beta
 1.3.1 Origin of 'alpha' and 'beta'
o 1.4 Release candidate
o 1.5 Gold/general availability release
o 1.6 RTM / RTW
 1.6.1 Box copy
o 1.7 Stable/unstable
• 2 See also

• 3 External links

Software release stages


Pre-alpha

Sometimes a build known as pre-alpha is issued before the release of an alpha or beta. In contrast to alpha and beta versions, the pre-alpha is usually not "feature complete". When the term is used, it refers to all activities performed during the software project prior to software testing. These activities can include requirements analysis, software design, software development and unit testing.
Alpha
The alpha version of a product still awaits full testing of all its functionality but
satisfies all the software requirements. As the first major stage in the release lifecycle, it
is named after alpha, the first letter in the Greek alphabet.

The alpha build of the software is the build delivered to the software testers, that is, persons other than the programmers, but usually internal to the organization or community that develops the software. In a rush to market, more and more companies are engaging external customers or value-chain partners in their alpha testing phase. This allows more extensive usability testing during the alpha phase.

In the first phase of testing, developers generally test the software using white box
techniques. Additional validation is then performed using black box or grey box
techniques, by another dedicated testing team, sometimes concurrently. Moving to black
box testing inside the organization is known as alpha release.

Beta
A beta version is the first version released outside the organization or community
that develops the software, for the purpose of evaluation or real-world black/grey-box
testing. The process of delivering a beta version to the users is called beta release.

The users of a beta version are called beta testers. They are usually customers or
prospective customers of the organization that develops the software. They receive the
software for free or for a reduced price, but act as free testers.

Beta versions test the supportability of the product, the go-to-market messaging
(while recruiting Beta customers), the manufacturability of the product, and the overall
channel flow or channel reach.

Beta version software is likely to be useful for internal demonstrations and previews to select customers, but unstable and not yet ready for release. Some developers
refer to this stage as a preview, a prototype, a technical preview (TP) or as an early
access. As the second major stage in the release lifecycle, following the alpha stage, it is
named after the Greek letter beta, the second letter in the Greek alphabet.

Often this stage begins when the developers announce a feature freeze on the
product, indicating that no more feature requirements will be accepted for this version of
the product. Only software issues (bugs) and unimplemented features will be addressed.

Beta versions stand at an intermediate step in the full development cycle. Developers release either a closed beta or an open beta; closed beta versions are released
to a select group of individuals for a user test, while open betas are released to a larger community group, usually the general public. The testers report any bugs that they find and sometimes minor features they would like to see in the final version.
An example of a major public beta test was when Microsoft started releasing
regular Windows Vista Community Technology Previews (CTP) to beta testers starting in
January 2005. The first of these was build 5219. Subsequent CTPs introduced most of the
planned features, as well as a number of changes to the user interface, based in large part
on feedback from beta testers. Windows Vista was deemed feature complete with the
release of build 5308 CTP, released on February 22, 2006, and much of the remainder of
work between that build and the final release of the product focused on stability,
performance, application and driver compatibility, and documentation.

When a beta becomes available to the general public it is often widely used by the
technologically savvy and those familiar with previous versions as though it were the
finished product. Usually developers of freeware or open-source betas release them to the
general public while proprietary betas go to a relatively small group of testers. Recipients
of highly proprietary betas may have to sign a non-disclosure agreement. A release is
called feature complete when the product team agrees that functional requirements of the
system are met and no new features will be put into the release, but significant software
bugs may still exist. Companies with a formal software process will tend to enter the beta
period with a list of known bugs that must be fixed to exit the beta period, and some
companies make this list available to customers and testers.

As the Internet has allowed for rapid and inexpensive distribution of software,
companies have begun to take a more flexible approach to use of the word "beta".
Netscape Communications was infamous for releasing alpha level versions of its
Netscape web browser as public beta releases. In February 2005, ZDNet published an
article about the recent phenomenon of a beta version often staying for years and being used as if it were at production level [1]. It noted that Gmail and Google News, for
example, had been in beta for a long period of time and were not expected to drop the
beta status despite the fact that they were widely used; however, Google News did leave
beta in January 2006. This technique may also allow a developer to delay offering full
support and/or responsibility for remaining issues. In the context of Web 2.0, people even
talk of perpetual betas to signify that some software is meant to stay in beta state.

Origin of 'alpha' and 'beta'


The term beta test applied to software comes from an early IBM hardware
product test convention dating back to punched card tabulating and sorting machines.
Hardware first went through an alpha test for preliminary functionality and small scale
manufacturing feasibility. Then came a beta test to verify that it actually correctly
performed the functions it was supposed to and could be manufactured at scales
necessary for the market, and then a c test to verify safety. With the advent of
programmable computers and the first shareable software programs, IBM used the same
terminology for testing software. Beta tests were conducted by people or groups other
than the developers. As other companies began developing software for their own use,
and for distribution to others, the terminology stuck and now is part of our common
vocabulary.
Release candidate
The term release candidate refers to a version with potential to be a final
product, ready to release unless fatal bugs emerge. In this stage, the product features all
designed functionalities and no known showstopper class bugs. At this phase the product
is usually code complete.

Microsoft Corporation often uses the term release candidate. During the 1990s,
Apple Computer used the term "golden master" for its release candidates, and the final
golden master was the general availability release. Other terms include gamma (and
occasionally also delta, and perhaps even more Greek letters) for versions that are
substantially complete, but still under test, and omega for final testing of versions that are
believed to be bug-free, and may go into production at any time. (Gamma, delta, and
omega are, respectively, the third, fourth, and last letters of the Greek alphabet.) Some
users disparagingly refer to release candidates and even final "point oh" releases as
"gamma test" software, suggesting that the developer has chosen to use its customers to
test software that is not truly ready for general release. Often, beta testers, if privately
selected, will be billed for using the release candidate as though it were a finished
product.

A release is called code complete when the development team agrees that no
entirely new source code will be added to this release. There may still be source code
changes to fix defects. There may still be changes to documentation and data files, and to
the code for test cases or utilities. New code may be added in a future release.

Gold/general availability release

The gold or general availability release version is the final version of a particular product. It is typically almost identical to the final release candidate, with only
last-minute bugs fixed. A gold release is considered to be very stable and relatively bug-
free with a quality suitable for wide distribution and use by end users. In commercial
software releases, this version may also be signed (used to allow end-users to verify that
code has not been modified since the release). The expression that a software product
"has gone gold" means that the code has been completed and "is being mass-produced
and will be for sale soon." Other terms for the version include gold master, gold release,
or gold build.

The term gold anecdotally refers to the use of "gold master disc" which was
commonly used to send the final version to manufacturers who use it to create the mass-
produced retail copies. It may in this context be a hold-over from music production. In
some cases, however, the master disc is still actually made of gold, for both aesthetic
appeal and resistance to corrosion.
RTM / RTW

Microsoft and others use the term "release to manufacturing" (RTM) to refer to
this version (for commercial products, like Windows XP, as in, "Build 2600 is the
Windows XP RTM release"), and "release to Web" (RTW) for freely downloadable
products. Typically, RTM occurs at least one or two weeks before GA because the RTM version must be burnt to disc, boxed, and so on.

Box copy

A box copy is the final product, printed on a disc that is included in the actual
release, complete with disc graphic art. This term is used mostly by reviewers to
differentiate from gold master discs. A box copy does not necessarily come enclosed in
the actual boxed product - it refers to the disc itself.

Stable/unstable

In open source programming, version numbers or the terms stable and unstable
commonly distinguish the stage of development. The term stable refers to a version of
software that is substantially identical to a version that has been through enough real-
world testing to reasonably assume there are no showstopper problems, or at least that
any problems are known and documented. On the other hand, the term unstable does not
necessarily mean that there are problems - rather, that enhancements or changes have
been made to the software that have not undergone rigorous testing and that more
changes are expected to be imminent. Users of such software are advised to use the stable version if it meets their needs, and to use the unstable version only if interest in the new functionality outweighs the risk that something might simply not work right.

In the Linux kernel, version numbers take the form of three numbers separated by periods. Prior to the 2.6.x series, an even second number was used to represent a
stable release and an odd second number used to represent an unstable release. As of the
2.6.x series, the even or odd status of the second number no longer holds any
significance. The practice of using even and odd numbers to indicate the stability of a
release has been used by many other open and closed source projects.

Dynamic program analysis


Dynamic code analysis is the analysis of computer software that is performed by executing programs built from that software on a real or virtual processor (analysis performed without executing programs is known as static code analysis). Such tools may require loading of special libraries or even recompilation of program code.
Examples
• Valgrind runs programs on a virtual processor and can detect memory errors (e.g. misuse of malloc and free) and race conditions in multithreaded programs.
• Dmalloc, a library for checking memory allocation and leaks. Software must be recompiled, and all files must include the special C header file dmalloc.h.
• VB Watch injects dynamic analysis code into Visual Basic programs to monitor
their performance, call stack, execution trace, instantiated objects, variables and
code coverage.
Dynamic analysis uses test data sets to execute software in order to observe its behaviour and produce test coverage reports. This assessment of the running code helps ensure consistent levels of high-quality testing and correct use of capture/playback tools.
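
A minimal example of the kind of defect such tools report at run time follows: the program leaks one allocation and reads one byte past the end of a heap buffer. Running it under a tool like Valgrind (or building it against a checking allocator such as Dmalloc) would typically flag both problems, whereas a compiler alone usually does not. The program itself is illustrative only.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char *name = malloc(8);             /* never freed: a memory leak          */
    strcpy(name, "testing");            /* 7 chars + '\0' exactly fills it     */

    char last = name[8];                /* reads one byte past the allocation  */
    printf("byte after the buffer: %d\n", (int)last);

    return 0;                           /* the leak and the invalid read are   */
                                        /* reported by a dynamic analysis tool */
}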

Exploratory testing
Exploratory testing is an approach in software testing with simultaneous
learning, test design and test execution. While the software is being tested, the tester
learns things that together with experience and creativity generates new good tests to run.

Contents
• 1 History
• 2 Description
• 3 Benefits and drawbacks

• 4 Usage

History
Exploratory testing has been performed for a long time, and has similarities to ad
hoc testing. In the early 1990s, ad hoc was too often synonymous with sloppy and
careless work. As a result, a group of test methodologists (now calling themselves the
Context-Driven School) began using the term "exploratory" seeking to emphasize the
dominant thought process involved in unscripted testing, and to begin to develop the
practice into a teachable discipline. This new terminology was first published by Cem
Kaner in his book Testing Computer Software. Exploratory testing can be as disciplined
as any other intellectual activity.
Description
Exploratory testing seeks to find out how the software actually works, and to ask
questions about how it will handle difficult and easy cases. The testing is dependent on the tester's skill in inventing test cases and finding defects. The more the tester knows
about the product and different test methods, the better the testing will be.

To further explain, a comparison can be made with its antithesis, scripted testing, which basically means that test cases are designed in advance, including steps to reproduce and expected results. These tests are later performed by a tester who compares the actual result with the expected.

When performing exploratory testing, there are no exact expected results; it is the
tester that decides what will be verified, critically investigating the correctness of the
result.

In reality, testing almost always is a combination of exploratory and scripted testing, but with a tendency towards either one, depending on context.

The documentation of exploratory testing ranges from documenting all tests performed to just documenting the bugs. During pair testing, two persons create test cases
together; one performs them, and the other documents. Session-based testing is a method
specifically designed to make exploratory testing auditable and measurable on a wider
scale.

Benefits and drawbacks


The main advantages of exploratory testing are that less preparation is needed, important bugs are found quickly, and the activity is more intellectually stimulating than scripted testing.

Disadvantages are that the tests cannot be reviewed in advance (and thereby used to prevent errors in code and test cases), and that it can be difficult to show exactly which tests have been run. When exploratory tests are repeated, they will not be performed in exactly the same manner, which can be an advantage if it is important to find new errors, or a disadvantage if it is more important to know that specific things are functional.

Usage
Exploratory testing is especially suitable if requirements and specifications are incomplete, or if there is a lack of time. The method can also be used to verify that previous
testing has found the most important defects. It is common to perform a combination of
exploratory and scripted testing where the choice is based on risk.

An example of exploratory testing in practice is Microsoft's verification of Windows compatibility.

Formal verification
In the context of hardware and software systems, formal verification is the act of
proving or disproving the correctness of intended algorithms underlying a system with
respect to a certain formal specification or property, using formal methods of
mathematics.

Contents
• 1 Explanation
• 2 Usage
• 3 Approaches to formal verification
• 4 Validation and Verification

• 5 Program verification

Explanation
Software testing alone cannot prove that a system does not contain any defects.
Neither can it prove that it does have a certain property. Only the process of formal
verification can prove that a system does not have a certain defect or does have a certain
property. It is impossible to prove or test that a system has "no defect" since it is
impossible to formally specify what "no defect" means. All that can be done is prove that
a system does not have any of the defects that can be thought of, and has all of the
properties that together make it functional and useful.

Usage
Formal verification can be used for example for systems such as cryptographic
protocols, combinatorial circuits, digital circuits with internal memory, and software
expressed as source code.

The verification of these systems is done by providing a formal proof on an abstract mathematical model of the system, the correspondence between the mathematical model and the nature of the system being otherwise known by construction.
Examples of mathematical objects often used to model systems are: finite state machines,
labelled transition systems, Petri nets, timed automata, hybrid automata, process algebra,
formal semantics of programming languages such as operational semantics, denotational
semantics, axiomatic semantics and Hoare logic.
Approaches to formal verification
There are roughly two approaches to formal verification.

The first approach is model checking, which consists of a systematically exhaustive exploration of the mathematical model (this is only possible for a model that
is finite). Usually this consists of exploring all states and transitions in the model, by
using smart and domain-specific abstraction techniques to consider whole groups of
states in a single operation and reduce computing time. Implementation techniques
include state space enumeration, symbolic state space enumeration, abstract
interpretation, symbolic simulation, abstraction refinement.

The second approach is logical inference. It consists of using a formal version of mathematical reasoning about the system, usually using theorem proving software such
as the HOL theorem prover or Isabelle theorem prover. This is usually only partially
automated and is driven by the user's understanding of the system to validate.

The properties to be verified are often described in temporal logics, such as linear
temporal logic (LTL) or computational tree logic (CTL).

Validation and Verification


Verification is one aspect of testing a product's fitness for purpose. Validation is the
complementary aspect. Often one refers to the overall checking process as V & V.

• Validation: "Are we building the right product?", i.e., does the product do what
the user really requires?
• Verification: "Are we building the product right?", i.e., does the product conform
to the specifications?

The verification process consists of static and dynamic parts. E.g., for a software
product one can inspect the source code (static) and run against specific test cases
(dynamic). Validation usually can only be done dynamically, i.e., the product is tested by
putting it through typical usages and atypical usages ("Can we break it?"). See also
Verification and Validation

Program verification
Program verification is the process of formally proving that a computer program
does exactly what is stated in the program specification it was written to realize. This is a
type of formal verification which is specifically aimed at verifying the code itself, not an
abstract model of the program.
For functional programming languages, some programs can be verified by equational
reasoning, usually together with induction. Code in an imperative language could be
proved correct by use of Hoare logic.
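
As a small illustration of the style of reasoning involved (not a mechanised proof), the function below is annotated with a precondition and postcondition in the spirit of Hoare logic. The assert calls only check the conditions at run time for one input; formal verification would prove them for every possible input, using the stated loop invariant.

#include <assert.h>
#include <stdio.h>

/* {P: n >= 0}  sum(n)  {Q: result == n*(n+1)/2} */
static int sum(int n)
{
    assert(n >= 0);                    /* precondition P  */

    int total = 0;
    for (int i = 1; i <= n; i++) {
        /* loop invariant: total == (i-1)*i/2, provable by induction on i */
        total += i;
    }

    assert(total == n * (n + 1) / 2);  /* postcondition Q */
    return total;
}

int main(void)
{
    printf("sum(10) = %d\n", sum(10));
    return 0;
}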

Fuzz testing
Fuzz testing or fuzzing is a software testing technique that provides random data
("fuzz") to the inputs of a program. If the program fails (for example, by crashing, or by
failing built-in code assertions), the defects can be noted.

The great advantage of fuzz testing is that the test design is extremely simple, and
free of preconceptions about system behavior. Fuzz testing was developed at the
University of Wisconsin-Madison in 1989 by Professor Barton Miller and the students in
his graduate Advanced Operating Systems class.

Contents
• 1 Uses
• 2 Fuzz testing methods
o 2.1 Advantages and disadvantages
o 2.2 Event-driven fuzz
o 2.3 Character-driven fuzz

o 2.4 Database fuzz

Uses
Fuzz testing is often used in large software development projects that perform
black box testing. These usually have a budget to develop test tools, and fuzz testing is
one of the techniques which offers a high benefit to cost ratio.

Fuzz testing is also used as a gross measurement of a large software system's quality. The advantage here is that the cost of generating the tests is relatively low. For
example, third party testers have used fuzz testing to evaluate the relative merits of
different operating systems and application programs.

Fuzz testing is thought to enhance software security and software safety because it
often finds odd oversights and defects which human testers would fail to find, and even
careful human test designers would fail to create tests for.

However, fuzz testing is not a substitute for exhaustive testing or formal methods:
it can only provide a random sample of the system's behavior, and in many cases passing
a fuzz test may only demonstrate that a piece of software handles exceptions without
crashing, rather than behaving correctly. Thus, fuzz testing can only be regarded as a bug-
finding tool rather than an assurance of quality.

Fuzz testing methods


As a practical matter, developers need to reproduce errors in order to fix them. For
this reason, almost all fuzz testing makes a record of the data it manufactures, usually
before applying it to the software, so that if the computer fails dramatically, the test data
is preserved. If the fuzz stream is generated by a pseudo-random number generator, it may be easier to store the seed value in order to reproduce the fuzz attempt.
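
A toy character-driven fuzzer in this spirit might look like the sketch below: it records the seed before generating the random input so a failing run can be replayed, then feeds the bytes to the function under test. Here parse_record is a hypothetical stand-in for the real target; any crash or assertion failure it triggered would be a defect worth recording.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical function under test; a real fuzzer would call into the
 * actual parser or input routine of the program being tested. */
static int parse_record(const char *data, size_t len)
{
    return (len > 0 && data[0] == '#') ? 1 : 0;
}

int main(void)
{
    unsigned int seed = (unsigned int)time(NULL);
    srand(seed);
    printf("fuzz seed: %u (keep this to reproduce a failure)\n", seed);

    char input[64];
    for (int run = 0; run < 1000; run++) {
        size_t len = (size_t)(rand() % (int)sizeof input);
        for (size_t i = 0; i < len; i++)
            input[i] = (char)(rand() % 256);   /* purely random bytes */

        parse_record(input, len);              /* a crash here is a finding */
    }
    printf("1000 fuzz inputs survived without a crash\n");
    return 0;
}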

Modern software has several different types of inputs:

• Event driven inputs are usually from a graphical user interface, or possibly from a
mechanism in an embedded system.
• Character driven inputs are from files, or data streams such as sockets.
• Database inputs are from tabular data, such as relational databases.
• Inherited program state such as environment variables

There are at least two different forms of fuzz testing:

• Valid fuzz attempts to assure that the random input is reasonable, or conforms to
actual production data.
• Simple fuzz usually uses a pseudo random number generator to provide input.
• A combined approach uses valid test data with some proportion of totally random
input injected.

By using all of these techniques in combination, fuzz-generated randomness can test the un-designed behavior surrounding a wider range of designed system states.

Fuzz testing may use tools to simulate all of these domains.

Advantages and disadvantages

The main problem with fuzzing to find program faults is that it generally only
finds very simple faults. The problem itself is exponential and every fuzzer takes
shortcuts to find something interesting in a timeframe that a human cares about. A
primitive fuzzer may have poor code coverage; for example, if the input includes a
checksum which is not properly updated to match other random changes, only the
checksum validation code will be verified. Code coverage tools are often used to estimate
how "well" a fuzzer works, but these are only guidelines to fuzzer quality. Every fuzzer
can be expected to find a different set of bugs.

On the other hand, bugs found using fuzz testing are frequently severe,
exploitable bugs that could be used by a real attacker. This has become even more true as
fuzz testing has become more widely known, as the same techniques and tools are now
used by attackers to exploit deployed software. This is a major advantage over binary or
source auditing, or even fuzzing's close cousin, fault injection, which often relies on
artificial fault conditions that are difficult or impossible to exploit.

Event-driven fuzz

Normally this is provided as a queue of data structures. The queue is filled with data structures that have random values.

The most common problem with an event-driven program is that it will often
simply use the data in the queue, without even crude validation. To succeed in a fuzz-
tested environment, software must validate all fields of every queue entry, decode every
possible binary value, and then ignore impossible requests.

One of the more interesting issues with real-time event handling is that if error
reporting is too verbose, simply providing error status can cause resource problems or a
crash. Robust error detection systems will report only the most significant, or most recent
error over a period of time.

Character-driven fuzz

Normally this is provided as a stream of random data. The classic source in UNIX
is the random data generator.

One common problem with a character driven program is a buffer overrun, when
the character data exceeds the available buffer space. This problem tends to recur in every
instance in which a string or number is parsed from the data stream and placed in a
limited-size area.

Another is that decode tables or logic may be incomplete, not handling every
possible binary value.

Database fuzz

The standard database scheme is usually filled with fuzz that is random data of
random sizes. Some IT shops use software tools to migrate and manipulate such
databases. Often the same schema descriptions can be used to automatically generate fuzz
databases.

Database fuzz is controversial, because input and comparison constraints reduce the invalid data in a database. However, often the database is more tolerant of odd data
than its client software, and a general-purpose interface is available to users. Since major
customer and enterprise management software is starting to be open-source, database-
based security attacks are becoming more credible.
A common problem with fuzz databases is buffer overflow. A common data dictionary, with some form of automated enforcement, is quite helpful and entirely
possible. To enforce this, normally all the database clients need to be recompiled and
retested at the same time. Another common problem is that database clients may not
understand the binary possibilities of the database field type, or, legacy software might
have been ported to a new database system with different possible binary values. A
normal, inexpensive solution is to have each program validate database inputs in the same
fashion as user inputs. The normal way to achieve this is to periodically "clean"
production databases with automated verifiers.

Integration testing


Integration testing (sometimes called Integration and Testing, abbreviated I&T)
is the phase of software testing in which individual software modules are combined and
tested as a group. It follows unit testing and precedes system testing.

Integration testing takes as its input modules that have been unit tested, groups
them in larger aggregates, applies tests defined in an integration test plan to those
aggregates, and delivers as its output the integrated system ready for system testing.

Contents
• 1 Purpose

• 2 Limitations

Purpose
The purpose of integration testing is to verify functional, performance and
reliability requirements placed on major design items. These "design items", i.e.
assemblages (or groups of units), are exercised through their interfaces using black box
testing, success and error cases being simulated via appropriate parameter and data
inputs. Simulated usage of shared data areas and inter-process communication is tested
and individual subsystems are exercised through their input interface. Test cases are
constructed to test that all components within assemblages interact correctly, for example
across procedure calls or process activations, and this is done after testing individual
modules, i.e. unit testing.

The overall idea is a "building block" approach, in which verified assemblages are
added to a verified base which is then used to support the integration testing of further
assemblages.
The different types of integration testing are big bang, top-down, bottom-up, and backbone.

Big Bang: In this approach, all or most of the developed modules are coupled
together to form a complete software system or major part of the system and then used
for integration testing. The Big Bang method is very effective for saving time in the
integration testing process. However, if the test cases and their results are not recorded
properly, the entire integration process will be more complicated and may prevent the
testing team from achieving the goal of integration testing.

Bottom Up: All the bottom or low level modules, procedures or functions are
integrated and then tested. After the integration testing of lower level integrated modules,
the next level of modules will be formed and can be used for integration testing. This
approach is helpful only when all or most of the modules of the same development level
are ready. This method also helps to determine the levels of software developed and
makes it easier to report testing progress in the form of a percentage.
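
A small sketch of the bottom-up idea: two already unit-tested low-level routines are combined and driven through a temporary test driver before any higher-level module exists. The module and function names are invented for illustration.

#include <assert.h>
#include <stdio.h>

/* Low-level module 1: unit tested in isolation earlier. */
static double net_price(double gross, double discount)
{
    return gross * (1.0 - discount);
}

/* Low-level module 2: unit tested in isolation earlier. */
static double add_tax(double price, double rate)
{
    return price * (1.0 + rate);
}

/* Integration test driver: exercises the two modules together through
 * their interfaces, standing in for the not-yet-written billing module. */
int main(void)
{
    double total = add_tax(net_price(100.0, 0.10), 0.20);
    assert(total > 107.9 && total < 108.1);   /* 100 * 0.9 * 1.2 = 108 */
    printf("bottom-up integration test passed: total = %.2f\n", total);
    return 0;
}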

Limitations
Any conditions not stated in specified integration tests, outside of the confirmation of the execution of design items, will generally not be tested. Integration tests cannot include system-wide (end-to-end) change testing.
Test case
In software engineering, the most common definition of a test case is a set of
conditions or variables under which a tester will determine if a requirement or use case
upon an application is partially or fully satisfied. It may take many test cases to determine
that a requirement is fully satisfied. In order to fully test that all the requirements of an
application are met, there must be at least one test case for each requirement unless a
requirement has sub requirements. In that situation, each sub requirement must have at
least one test case. This is frequently done using a Traceability matrix. Some
methodologies, like RUP, recommend creating at least two test cases for each
requirement. One of them should perform positive testing of the requirement and the other should perform negative testing. Written test cases should include a description of the
functionality to be tested, and the preparation required to ensure that the test can be
conducted.

If the application is created without formal requirements, then test cases can be
written based on the accepted normal operation of programs of a similar class. In some
schools of testing, test cases are not written at all but the activities and results are
reported after the tests have been run.

What characterizes a formal, written test case is that there is a known input and an
expected output, which is worked out before the test is executed. The known input should
test a precondition and the expected output should test a postcondition.

Under special circumstances, there could be a need to run the test, produce results, and then have a team of experts evaluate whether the results can be considered a pass. This often happens when determining the performance numbers of a new product. The first test is taken as the base line for subsequent test / product release cycles.

Written test cases are usually collected into Test suites.

A variation of test cases is most commonly used in acceptance testing. Acceptance testing is done by a group of end-users or clients of the system to ensure the developed system meets their requirements. User acceptance testing is usually differentiated by the inclusion of happy path or positive test cases.

Structure of test case


Formal, written test cases consist of three main parts with subsections:

• Information contains general information about the test case.
o Identifier is a unique identifier of the test case for further references, for example, when describing a found defect.
o Test case owner/creator is the name of the tester or test designer who created the test or is responsible for its development.
o Version of the current test case definition.
o Name of the test case should be a human-oriented title which allows the reader to quickly understand the test case's purpose and scope.
o Identifier of the requirement which is covered by the test case. This may also be an identifier of a use case or a functional specification item.
o Purpose contains a short description of the purpose of the test and the functionality it checks.
o Dependencies
• Test case activity
o Testing environment/configuration contains information about the configuration of hardware or software which must be met while executing the test case.
o Initialization describes actions which must be performed before test case execution is started. For example, we should open some file.
o Finalization describes actions to be done after the test case is performed. For example, if the test case crashes the database, the tester should restore it before other test cases are performed.
o Actions: the steps to be carried out to complete the test.
o Input data description
• Results
o Expected results contains a description of what the tester should see after all test steps have been completed.
o Actual results contains a brief description of what the tester saw after the test steps were completed. This is often replaced with a Pass/Fail. Quite often, if a test case fails, a reference to the defect involved should be listed in this column.

Not all written tests require all of these sections. However, the bare bones of a test can be
reduced to three essential steps:

• Establish the preconditions
• Exercise the item under test
• Verify the postconditions

It is important to note that if the preconditions cannot be established, the item cannot
be tested according to its software requirements specification and the test must not
proceed.

Verifying the postcondition is equivalent to establishing that the actual results are as
expected.

Note that several tests may be needed to challenge the postcondition. For example, testing a user login routine would need at least one case with a known username-password pair and a second case with an unknown username-password pair.
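
Sketched in code, those two login cases might look like this; check_login and its return convention are assumptions made for the example, not part of any particular system.

#include <assert.h>
#include <string.h>
#include <stdio.h>

/* Hypothetical unit under test: returns 1 for a known username/password
 * pair and 0 otherwise. A real system would consult its user store. */
static int check_login(const char *user, const char *password)
{
    return strcmp(user, "alice") == 0 && strcmp(password, "secret") == 0;
}

int main(void)
{
    /* Test case 1: precondition = known pair, expected postcondition = accepted */
    assert(check_login("alice", "secret") == 1);

    /* Test case 2: precondition = unknown pair, expected postcondition = rejected */
    assert(check_login("mallory", "guess") == 0);

    printf("both login test cases passed\n");
    return 0;
}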

See also: System testing, Unit testing


Traceability matrix
In a software development process, a Traceability Matrix is a table that correlates any two baselined documents that require a many-to-many relationship, in order to determine the completeness of the relationship. It is often used to trace high-level requirements (sometimes known as marketing requirements) and detailed requirements of the software product to the matching parts of high-level design, detailed design, test plan, and test cases.

Common usage is to take the identifier for each of the items of one document and
place them in the left column. The identifiers for the other document are placed across the
top row. When an item in the left column is related to an item across the top, a mark is
placed in the intersecting cell. The number of relationships are added up for each row and
each column. This value indicates the mapping of the two items. Zero values indicate that
no relationship exists and that one must be made. Large values imply that the item is too
complex and should be simplified.

To ease the creation of traceability matrices, it is advisable to add the relationships to the source documents for both backward traceability and forward traceability. In other words, when an item is changed in one baselined document, it is easy to see what needs to be changed in the other.
Sample traceability matrix

[Sample traceability matrix: requirement identifiers (REQ1 UC 1.0 through REQ1 UC 3.2 and REQ1 TECH 1.1 through REQ1 TECH 1.3) run across the top; test case identifiers (1.1.1 through 5.6.2) run down the left, each with a count of the requirements it covers; an "x" marks every intersection where a test case exercises a requirement. The original table layout could not be reproduced here.]
Unit testing
In computer programming, unit testing is a procedure used to validate that individual units of source code are working properly. A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function, or procedure, while in object-oriented programming the smallest unit is always a class, which may be a base/super class, abstract class or derived/child class. Units are distinguished from modules in that modules are typically made up of units.

Ideally, each test case is independent from the others; mock objects and test
harnesses can be used to assist testing a module in isolation. Unit testing is typically done
by the developers and not by end-users.

Contents
• 1 Benefit
o 1.1 Facilitates change
o 1.2 Simplifies integration
o 1.3 Documentation
o 1.4 Separation of interface from implementation
• 2 Limitations of unit testing
• 3 Applications
o 3.1 Extreme Programming
o 3.2 Techniques

o 3.3 Unit testing frameworks

Benefit
The goal of unit testing is to isolate each part of the program and show that the
individual parts are correct. A unit test provides a strict, written contract that the piece of
code must satisfy. As a result, it affords several benefits.

Facilitates change

Unit testing allows the programmer to refactor code at a later date, and make sure
the module still works correctly (i.e. regression testing). The procedure is to write test
cases for all functions and methods so that whenever a change causes a fault, it can be
quickly identified and fixed.

Readily-available unit tests make it easy for the programmer to check whether a
piece of code is still working properly. Good unit test design produces test cases that
cover all paths through the unit with attention paid to loop conditions.
In continuous unit testing environments, through the inherent practice of sustained
maintenance, unit tests will continue to accurately reflect the intended use of the
executable and code in the face of any change. Depending upon established development
practices and unit test coverage, up-to-the-second accuracy can be maintained.

Simplifies integration

Unit testing helps to eliminate uncertainty in the units themselves and can be used
in a bottom-up testing style approach. By testing the parts of a program first and then
testing the sum of its parts, integration testing becomes much easier.

A heavily debated matter exists in assessing the need to perform manual integration testing. While an elaborate hierarchy of unit tests may seem to have achieved
integration testing, this presents a false sense of confidence since integration testing
evaluates many other objectives that can only be proven through the human factor. Some
argue that given a sufficient variety of test automation systems, integration testing by a
human test group is unnecessary. Realistically, the actual need will ultimately depend
upon the characteristics of the product being developed and its intended uses.
Additionally, the human or manual testing will greatly depend on the availability of
resources in the organization.

Documentation

Unit testing provides a sort of "living document". Clients and other developers
looking to learn how to use the module can look at the unit tests to determine how to use
the module to fit their needs and gain a basic understanding of the API.

Unit test cases embody characteristics that are critical to the success of the unit.
These characteristics can indicate appropriate/inappropriate use of a unit as well as
negative behaviors that are to be trapped by the unit. A unit test case, in and of itself,
documents these critical characteristics, although many software development
environments do not rely solely upon code to document the product in development.

On the other hand, ordinary narrative documentation is more susceptible to drifting from the implementation of the program and will thus become outdated (e.g. design changes, feature creep, relaxed practices to keep documents up to date).

Separation of interface from implementation

Because some classes may have references to other classes, testing a class can
frequently spill over into testing another class. A common example of this is classes that
depend on a database: in order to test the class, the tester often writes code that interacts
with the database. This is a mistake, because a unit test should never go outside of its own
class boundary. As a result, the software developer abstracts an interface around the
database connection, and then implements that interface with their own mock object. By
abstracting this necessary attachment from the code (temporarily reducing the net
effective coupling), the independent unit can be more thoroughly tested than may have
been previously achieved. This results in a higher quality unit that is also more
maintainable.
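
In C the same idea can be sketched with a function pointer standing in for the database interface: the production code receives the real lookup routine, while the unit test passes a mock that never touches a database. All names here are illustrative, not drawn from any real system.

#include <assert.h>
#include <string.h>
#include <stdio.h>

/* The abstracted interface: something that maps a customer id to a name. */
typedef const char *(*lookup_fn)(int customer_id);

/* Unit under test: builds a greeting using whatever lookup it is given. */
static void greeting(int customer_id, lookup_fn lookup, char *out, size_t out_len)
{
    snprintf(out, out_len, "Hello, %s!", lookup(customer_id));
}

/* Mock implementation used only by the unit test: no database involved. */
static const char *mock_lookup(int customer_id)
{
    (void)customer_id;
    return "Test User";
}

int main(void)
{
    char msg[64];
    greeting(42, mock_lookup, msg, sizeof msg);
    assert(strcmp(msg, "Hello, Test User!") == 0);
    printf("unit test with mock passed: %s\n", msg);
    return 0;
}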

Limitations of unit testing


Unit testing will not catch every error in the program. By definition, it only tests
the functionality of the units themselves. Therefore, it will not catch integration errors,
performance problems or any other system-wide issues. In addition, it may not be easy to
anticipate all special cases of input the program unit under study may receive in reality.
Unit testing is only effective if it is used in conjunction with other software testing
activities.

It is unrealistic to test all possible input combinations for any non-trivial piece of
software. Like all forms of software testing, unit tests can only show the presence of errors; they cannot show the absence of errors.

To obtain the intended benefits from unit testing, a rigorous sense of discipline is needed throughout the software development process. It is essential to keep careful records, not only of the tests that have been performed, but also of all changes that have been made to the source code of this or any other unit in the software. Use of a version control system is essential; if a later version of the unit fails a particular test that it had previously passed, the version control software can provide a list of the source code changes (if any) that have been applied to the unit since that time.

Applications
Extreme Programming

The cornerstone of Extreme Programming (XP) is the unit test. XP relies on an automated unit testing framework. This automated unit testing framework can be either third party, e.g. xUnit, or created within the development group.

Extreme Programming uses the creation of unit tests for test-driven development.
The developer writes a unit test that exposes either a software requirement or a defect.
This test will fail because either the requirement isn't implemented yet, or because
it intentionally exposes a defect in the existing code. Then, the developer writes the
simplest code to make the test, along with other tests, pass.

All classes in the system are unit tested. Developers release unit testing code to
the code repository in conjunction with the code it tests. XP's thorough unit testing allows
the benefits mentioned above, such as simpler and more confident code development and
refactoring, simplified code integration, accurate documentation, and more modular
designs. These unit tests are also constantly run as a form of regression test.
Techniques

Unit testing is commonly automated, but may still be performed manually. The
IEEE[1] does not favor one over the other. A manual approach to unit testing may employ
a step-by-step instructional document. Nevertheless, the objective in unit testing is to
isolate a unit and validate its correctness. Automation is efficient for achieving this, and
enables the many benefits listed in this article. Conversely, if not planned carefully, a
careless manual unit test case may execute as an integration test case that involves many
software components, and thus preclude the achievement of most if not all of the goals
established for unit testing.

Under the automated approach, to fully realize the effect of isolation, the unit or
code body subjected to the unit test is executed within a framework outside of its natural
environment, that is, outside of the product or calling context for which it was originally
created. Testing in an isolated manner has the benefit of revealing unnecessary
dependencies between the code being tested and other units or data spaces in the product.
These dependencies can then be eliminated.

Using an automation framework, the developer codes criteria into the test to
verify the correctness of the unit. During execution of the test cases, the framework logs
those that fail any criterion. Many frameworks will also automatically flag and report in a
summary these failed test cases. Depending upon the severity of a failure, the framework
may halt subsequent testing.

As a consequence, unit testing is traditionally a motivator for programmers to create decoupled and cohesive code bodies. This practice promotes healthy habits in
software development. Design patterns, unit testing, and refactoring often work together
so that the most ideal solution may emerge.

Unit testing frameworks

Unit testing frameworks, which help simplify the process of unit testing, have
been developed for a wide variety of languages. It is generally possible to perform unit testing without the support of a specific framework by writing client code that exercises the units under test and uses assertion, exception, or early-exit mechanisms to signal failure.
This approach is valuable in that there is a negligible barrier to the adoption of
unit testing. However, it is also limited in that many advanced features of a proper
framework are missing or must be hand-coded.
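
A minimal hand-rolled harness of that kind might look like the following; the CHECK macro and the function under test are invented for the example, and frameworks such as the xUnit family automate the bookkeeping that is done here by hand.

#include <stdio.h>

/* Function under test (illustrative). */
static int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

static int tests_run = 0, tests_failed = 0;

/* Tiny hand-rolled assertion: counts and reports failures instead of aborting. */
#define CHECK(cond)                                                \
    do {                                                           \
        tests_run++;                                               \
        if (!(cond)) {                                             \
            tests_failed++;                                        \
            printf("FAIL %s:%d: %s\n", __FILE__, __LINE__, #cond); \
        }                                                          \
    } while (0)

int main(void)
{
    CHECK(clamp(5, 0, 10) == 5);    /* value inside the range       */
    CHECK(clamp(-3, 0, 10) == 0);   /* value below the lower bound  */
    CHECK(clamp(99, 0, 10) == 10);  /* value above the upper bound  */

    printf("%d tests, %d failures\n", tests_run, tests_failed);
    return tests_failed ? 1 : 0;
}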

Equivalence partitioning
Equivalence partitioning is a software testing related technique with two goals:

1. To reduce the number of test cases to a necessary minimum.
2. To select the right test cases to cover all possible scenarios.

Although in rare cases equivalence partitioning is also applied to the outputs of a software component, it is typically applied to the inputs of a tested component. The equivalence partitions are usually derived from the specification of the component's behaviour. An input has certain ranges which are valid and other ranges which are invalid. This is best explained by the following example of a function which has the parameter "month" of a date. The valid range for the month is 1 to 12, standing for January to December. This valid range is called a partition. In this example there are two further partitions of invalid ranges. The first invalid partition would be <= 0 and the second invalid partition would be >= 13.

.... -2 -1 0   1 .............. 12   13 14 15 .....
--------------|-------------------|---------------------
 invalid partition 1   valid partition   invalid partition 2

Equivalence partitioning is not a stand-alone method for determining test cases. It has
to be supplemented by boundary value analysis. Having determined the partitions of
possible inputs, the method of boundary value analysis has to be applied to select the most
effective test cases from these partitions.
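
As a sketch of this idea for the month example (Java; isMonthValid is a hypothetical
validator used only for illustration, not taken from the text), one representative value is
chosen from each of the three partitions:

// EquivalencePartitionTest.java - one representative test value per partition
// for the "month" parameter (valid: 1..12, invalid: <= 0 and >= 13).
public class EquivalencePartitionTest {

    // Hypothetical validator standing in for the component under test.
    static boolean isMonthValid(int month) {
        return month >= 1 && month <= 12;
    }

    public static void main(String[] args) {
        // One value from the valid partition, one from each invalid partition.
        check(6, true);    // valid partition (1..12)
        check(-3, false);  // invalid partition 1 (<= 0)
        check(20, false);  // invalid partition 2 (>= 13)
    }

    static void check(int month, boolean expected) {
        boolean actual = isMonthValid(month);
        System.out.println("month=" + month + " valid=" + actual
                + (actual == expected ? " (as expected)" : " (UNEXPECTED)"));
    }
}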

Contents
• 1 The Theory
• 2 Black Box vs. White Box

• 3 Types of Equivalence Classes

The Theory
The testing theory related to equivalence partitioning says that only one test case
from each partition is needed to evaluate the behaviour of the program for the related
partition. In other words, it is sufficient to select one test case out of each partition to
check the behaviour of the program. Using more, or even all, test cases of a partition will
not find new faults in the program. The values within one partition are considered to be
"equivalent". Thus the number of test cases can be reduced considerably.

An additional effect of applying this technique is that you also find the so-called
"dirty" test cases. An inexperienced tester may be tempted to use as test cases the input
data 1 to 12 for the month and forget to select some from the invalid partitions. This
would lead to a huge number of unnecessary test cases on the one hand, and a lack of test
cases for the dirty ranges on the other.

Black Box vs. White Box


The tendency is to relate equivalence partitioning to so-called black box testing,
which strictly checks a software component at its interface, without consideration of the
internal structure of the software. On closer examination, however, there are cases where it
applies to white box testing as well. Imagine an interface to a component which has a valid
range between 1 and 12, as in the example above. Internally, however, the function may treat
the values 1 to 6 differently from the values 7 to 12. Depending on the input value, the
software will internally run through different paths to perform slightly different actions.
Regarding the input and output interfaces to the component this difference will not be
noticed; however, in your white-box testing you would like to make sure that both paths are
examined. To achieve this it is necessary to introduce additional equivalence partitions
which would not be needed for black-box testing. For this example these would be:

.... -2 -1 0   1 ..... 6   7 ..... 12   13 14 15 .....
--------------|---------|----------|---------------------
 invalid partition 1   P1        P2    invalid partition 2
                      valid partitions

To check for the expected results you would need to evaluate some internal
intermediate values rather than the output interface.
Types of Equivalence Classes
• Continuous classes run from one point to another, with no clear separations of
values. An example is a temperature range.

• Discrete classes have clear separation of values. Discrete classes are sets, or
enumerations.

• Boolean classes have only two values, such as true or false, on or off, yes or no.
An example is whether a checkbox is checked or unchecked.

Boundary value analysis


Boundary value analysis is a software testing design technique used to determine test
cases that cover off-by-one errors. The boundaries of software component input ranges are
areas where problems frequently occur.

Introduction
Testing experience has shown that the boundaries of input ranges to a software
component in particular are liable to defects. A programmer who has to implement, for
example, the range 1 to 12 at an input (standing for the months January to December in a
date) will have a line in the code checking for this range. This may look like:

if (month > 0 && month < 13)

But a common programming error is to check the wrong range, e.g. starting the
range at 0 by writing:

if (month >= 0 && month < 13)

For more complex range checks in a program this may be a problem which is not
so easily spotted as in the above simple example.

Applying boundary value analysis


To set up boundary value analysis test cases you first have to determine which
boundaries you have at the interface of a software component. This has to be done by
applying the equivalence partitioning technique. Boundary value analysis and
equivalence partitioning are inevitably linked together. For the example of the month in a
date you would have the following partitions:
... -2 -1 0 1 .............. 12 13 14 15 .....
--------------|-------------------|---------------------
invalid partition 1 valid partition invalid partition 2

Applying boundary value analysis, you now have to select a test case on each side
of the boundary between two partitions. In the above example this would be 0 and 1 for
the lower boundary, as well as 12 and 13 for the upper boundary. Each of these pairs
consists of a "clean" and a "dirty" test case. A "clean" test case should give you a valid
operation result of your program. A "dirty" test case should lead to a correct and specified
input error treatment such as the limiting of values, the usage of a substitute value, or, in
the case of a program with a user interface, a warning and a request to enter correct data.
Boundary value analysis can thus yield six test cases: n, n-1, and n+1 for the lower limit,
and n, n-1, and n+1 for the upper limit.
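
Continuing the month example, the six boundary values can be exercised as in the following
sketch (again with a hypothetical isMonthValid helper); 0 and 1 sit on either side of the
lower boundary, 12 and 13 on either side of the upper one, with 2 and 11 as the n+1 / n-1
values:

// BoundaryValueTest.java - test values around the partition boundaries
// for the "month" parameter (valid range 1..12).
public class BoundaryValueTest {

    // Hypothetical validator standing in for the component under test.
    static boolean isMonthValid(int month) {
        return month >= 1 && month <= 12;
    }

    public static void main(String[] args) {
        int[] values       = {0, 1, 2, 11, 12, 13};
        boolean[] expected = {false, true, true, true, true, false};

        for (int i = 0; i < values.length; i++) {
            boolean actual = isMonthValid(values[i]);
            System.out.println("month=" + values[i] + " -> " + actual
                    + (actual == expected[i] ? "" : "  (UNEXPECTED)"));
        }
    }
}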

A further set of boundaries has to be considered when you set up your test cases.
A solid testing strategy also has to consider the natural boundaries of the data types used
in the program. If you are working with signed values this is especially the range around
zero (-1, 0, +1). Similar to the typical range-check faults, programmers tend to have
weaknesses in their programs in this range. For example, this could be a division-by-zero
problem where a zero value may occur although the programmer always thought the range started
at 1. Or it could be a sign problem when a value turns out to be negative in some rare cases,
although the programmer always expected it to be positive. Even if this critical natural
boundary is clearly within an equivalence partition, it should lead to additional test cases
checking the range around zero. A further natural boundary is the lower and upper limit of
the data type itself. For example, an unsigned 8-bit value has the range 0 to 255. A good
test strategy would also check how the program reacts to an input of -1 and 0 as well as
255 and 256.
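
The natural limits of the data types themselves can be probed in the same spirit. The short
sketch below (Java, purely illustrative) shows the silent wrap-around at an integer's upper
limit and the failure mode around zero that such boundary tests are meant to expose:

// TypeBoundaryDemo.java - probing the natural limits of a data type.
public class TypeBoundaryDemo {
    public static void main(String[] args) {
        // An int overflows silently when incremented past its upper limit.
        int max = Integer.MAX_VALUE;                    // 2147483647
        System.out.println("max     = " + max);
        System.out.println("max + 1 = " + (max + 1));   // wraps to -2147483648

        // Checks around zero catch sign and division-by-zero issues.
        int divisor = 0;
        try {
            System.out.println(10 / divisor);
        } catch (ArithmeticException e) {
            System.out.println("division by zero: " + e.getMessage());
        }
    }
}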

The tendency is to relate boundary value analysis more to the so called black box
testing which is strictly checking a software component at its interfaces, without
consideration of internal structures of the software. But looking closer at the subject,
there are cases where it applies also to white box testing.

After determining the necessary test cases with equivalence partitioning and
subsequent boundary value analysis, it is necessary to define the combinations of the test
cases when there are multiple inputs to a software component.
Decision Table
Decision tables are a precise yet compact way to model complicated logic.
Decision tables, like if-then-else and switch-case statements, associate conditions with
actions to perform. But, unlike the control structures found in traditional programming
languages, decision tables can associate many independent conditions with several
actions in an elegant way.

Contents
• 1 Structure
• 2 Example

• 3 Software engineering benefits

Structure
Decision tables are typically divided into four quadrants, as shown below.

The four quadrants


Conditions Condition alternatives
Actions Action entries

Each decision corresponds to a variable, relation or predicate whose possible


values are listed among the condition alternatives. Each action is a procedure or operation
to perform, and the entries specify whether (or in what order) the action is to be
performed for the set of condition alternatives the entry corresponds to. Many decision
tables include in their condition alternatives the don't care symbol, a hyphen. Using don't
cares can simplify decision tables, especially when a given condition has little influence
on the actions to be performed. In some cases, entire conditions thought to be important
initially are found to be irrelevant when none of the conditions influence which actions
are performed.

Aside from the basic four quadrant structure, decision tables vary widely in the
way the condition alternatives and action entries are represented. Some decision tables
use simple true/false values to represent the alternatives to a condition (akin to if-then-
else), other tables may use numbered alternatives (akin to switch-case), and some tables
even use fuzzy logic or probabilistic representations for condition alternatives. In a
similar way, action entries can simply represent whether an action is to be performed
(check the actions to perform), or in more advanced decision tables, the sequencing of
actions to perform (number the actions to perform).
Example
The limited-entry decision table is the simplest to describe. The condition
alternatives are simple boolean values, and the action entries are check-marks,
representing which of the actions in a given column are to be performed.

A technical support company writes a decision table to diagnose printer problems


based upon symptoms described to them over the phone from their clients.

Rule                                    1  2  3  4  5  6  7  8
Conditions
  Printer does not print                Y  Y  Y  Y  N  N  N  N
  A red light is flashing               Y  Y  N  N  Y  Y  N  N
  Printer is unrecognized               Y  N  Y  N  Y  N  Y  N
Actions
  Check the power cable                 .  .  X  .  .  .  .  .
  Check the printer-computer cable      X  .  X  .  .  .  .  .
  Ensure printer software is installed  X  .  X  .  X  .  X  .
  Check/replace ink                     X  X  .  .  X  X  .  .
  Check for paper jam                   .  X  .  X  .  .  .  .

(A dot marks an empty action entry.)

Of course, this is just a simple example, but it demonstrates how decision tables can
scale to several conditions with many possibilities.
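
A limited-entry decision table of this kind maps directly onto a simple data structure. The
sketch below (Java; the class and its layout are illustrative, not part of the original
example) encodes each rule as the three condition values, in the order given above, and looks
up the actions to suggest:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// DecisionTableDemo.java - the printer-troubleshooting decision table encoded
// as a lookup from condition alternatives ("Y"/"N" per condition, in the order
// does-not-print, red-light-flashing, unrecognized) to the actions to perform.
public class DecisionTableDemo {

    static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("YYY", List.of("Check the printer-computer cable",
                                 "Ensure printer software is installed",
                                 "Check/replace ink"));
        RULES.put("YYN", List.of("Check/replace ink", "Check for paper jam"));
        RULES.put("YNY", List.of("Check the power cable",
                                 "Check the printer-computer cable",
                                 "Ensure printer software is installed"));
        RULES.put("YNN", List.of("Check for paper jam"));
        RULES.put("NYY", List.of("Ensure printer software is installed",
                                 "Check/replace ink"));
        RULES.put("NYN", List.of("Check/replace ink"));
        RULES.put("NNY", List.of("Ensure printer software is installed"));
        RULES.put("NNN", List.of());
    }

    public static void main(String[] args) {
        // Example: printer does not print, no red light, printer unrecognized.
        System.out.println(RULES.get("YNY"));
    }
}

Because every combination of condition values appears as a key, a missing rule is immediately
visible as a gap in the table, which mirrors the auditing benefit described below.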

Software engineering benefits


Decision tables make it easy to observe that all possible conditions are accounted
for. In the example above, every possible combination of the three conditions is given. In
decision tables, when conditions are omitted, it is obvious even at a glance that logic is
missing. Compare this to traditional control structures, where it is not easy to notice gaps
in program logic with a mere glance --- sometimes it is difficult to follow which
conditions correspond to which actions!

Just as decision tables make it easy to audit control logic, decision tables demand
that a programmer think of all possible conditions. With traditional control structures, it is
easy to forget about corner cases, especially when the else statement is optional. Since
logic is so important to programming, decision tables are an excellent tool for designing
control logic. In one incredible anecdote, after a failed 6 man-year attempt to describe
program logic for a file maintenance system using flow charts, four people solved the
problem using decision tables in just four weeks. Choosing the right tool for the problem
is fundamental.
System testing
System testing is testing conducted on a complete, integrated system to evaluate
the system's compliance with its specified requirements. System testing falls within the
scope of black box testing, and as such, should require no knowledge of the inner design
of the code or logic. [1]

As a rule, system testing takes, as its input, all of the "integrated" software
components that have successfully passed integration testing and also the software
system itself integrated with any applicable hardware system(s). The purpose of
integration testing is to detect any inconsistencies between the software units that are
integrated together (called assemblages) or between any of the assemblages and the
hardware. System testing is a more limited type of testing; it seeks to detect defects both
within the "inter-assemblages" and also within the system as a whole.

Contents
• 1 Testing the whole system
• 2 Types of system testing

Testing the whole system


System testing is performed on the entire system against the Functional
Requirement Specification(s) (FRS) and/or the System Requirement Specification (SRS).
Moreover, system testing is an investigatory testing phase, where the focus is to have an
almost destructive attitude and to test not only the design, but also the behaviour and even
the believed expectations of the customer. It is also intended to test up to and beyond the
bounds defined in the software/hardware requirements specification(s).

One could view System testing as the final destructive testing phase before user
acceptance testing.
Types of system testing
The following examples are different types of testing that should be considered
during System testing:

• User interface testing


• Usability testing
• Performance testing
• Compatibility testing
• Error handling testing
• Load testing
• Volume testing
• Stress testing
• User help testing
• Security testing
• Capacity testing
• Sanity testing
• Smoke testing
• Exploratory testing
• Ad hoc testing
• Regression testing
• Reliability testing
• Recovery testing
• Installation testing
• Maintenance testing
• Accessibility testing, including compliance with:
o Americans with Disabilities Act of 1990
o Section 508 Amendment to the Rehabilitation Act of 1973
o Web Accessibility Initiative (WAI) of the World Wide Web Consortium
(W3C)

Although different testing organizations may prescribe different tests as part of


System testing, this list serves as a general framework or foundation to begin with.
Usability testing
Usability testing is a means for measuring how well people can use some human-
made object (such as a web page, a computer interface, a document, or a device) for its
intended purpose, i.e. usability testing measures the usability of the object. Usability
testing focuses on a particular object or a small set of objects, whereas general human-
computer interaction studies attempt to formulate universal principles.

If usability testing uncovers difficulties, such as people having difficulty


understanding instructions, manipulating parts, or interpreting feedback, then developers
should improve the design and test it again. During usability testing, the aim is to observe
people using the product in as realistic a situation as possible, to discover errors and areas
of improvement. Designers commonly focus excessively on creating designs that look
"cool", compromising usability and functionality. This is often caused by pressure from
the people in charge, forcing designers to develop systems based on management
expectations instead of people's needs. A designer's primary function should extend beyond
appearance to making things work for the people who use them.

Simply gathering opinions on an object or document is market research, rather


than usability testing. Usability testing usually involves a controlled experiment to
determine how well people can use the product. [1]

Rather than showing users a rough draft and asking, "Do you understand this?",
usability testing involves watching people trying to use something for its intended
purpose. For example, when testing instructions for assembling a toy, the test subjects
should be given the instructions and a box of parts. Instruction phrasing, illustration
quality, and the toy's design all affect the assembly process.

Setting up a usability test involves carefully creating a scenario, or realistic


situation, wherein the person performs a list of tasks using the product being tested while
observers watch and take notes. Several other test instruments such as scripted
instructions, paper prototypes, and pre- and post-test questionnaires are also used to
gather feedback on the product being tested. For example, to test the attachment function
of an e-mail program, a scenario would describe a situation where a person needs to send
an e-mail attachment, and ask him or her to undertake this task. The aim is to observe
how people function in a realistic manner, so that developers can see problem areas, and
what people like. Techniques popularly used to gather data during a usability test include
think aloud protocol and eye tracking.

Hallway testing (or hallway usability testing) is a specific methodology of


software usability testing. Rather than using an in-house, trained group of testers, just five
to six random people, indicative of a cross-section of end users, are brought in to test the
software (be it an application, web site, etc.); the name of the technique refers to the fact
that the testers should be random people who pass by in the hallway. The theory, as
adopted from Jakob Nielsen's research, is that 95% of usability problems can be
discovered using this technique.

Contents
• 1 What to measure
• 2 See also

• 3 External links

What to measure
Usability testing generally involves measuring how well test subjects respond in four
areas: time, accuracy, recall, and emotional response. The results of the first test can be
treated as a baseline or control measurement; all subsequent tests can then be compared
to the baseline to indicate improvement.

• Time on Task -- How long does it take people to complete basic tasks? (For
example, find something to buy, create a new account, and order the item.)
• Accuracy -- How many mistakes did people make? (And were they fatal or
recoverable with the right information?)
• Recall -- How much does the person remember afterwards or after periods of non-
use?
• Emotional Response -- How does the person feel about the tasks completed?
(Confident? Stressed? Would the user recommend this system to a friend?)

In the early 1990s, Jakob Nielsen, at that time a researcher at Sun Microsystems,
popularized the concept of using numerous small usability tests -- typically with only five
test subjects each -- at various stages of the development process. His argument is that,
once it is found that two or three people are totally confused by the home page, little is
gained by watching more people suffer through the same flawed design. "Elaborate
usability tests are a waste of resources. The best results come from testing no more than 5
users and running as many small tests as you can afford." [2] Nielsen subsequently
published his research and coined the term heuristic evaluation.

The claim of "Five users is enough" was later described by a mathematical model
(Virzi, R.A., Refining the Test Phase of Usability Evaluation: How Many Subjects is
Enough? Human Factors, 1992. 34(4): p. 457-468.) which states for the proportion of
uncovered problems U

U = 1 − (1 − p)^n

where p is the probability of one subject identifying a specific problem and n is the number
of subjects (or test sessions). As n grows, the proportion of uncovered problems approaches
the total number of real existing problems asymptotically.
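
As a quick illustration of the model, the following snippet (Java; p = 0.31 is an assumed
value, in the range commonly quoted from Nielsen's data, used here purely for illustration)
computes the expected proportion of problems found for one to ten subjects:

// FiveUsersModel.java - expected proportion of usability problems found,
// U = 1 - (1 - p)^n, for a per-subject detection probability p.
public class FiveUsersModel {
    public static void main(String[] args) {
        double p = 0.31;   // assumed value, roughly the average reported in Nielsen's data
        for (int n = 1; n <= 10; n++) {
            double u = 1 - Math.pow(1 - p, n);
            System.out.printf("n=%2d  U=%.2f%n", n, u);
        }
        // With p = 0.31, five subjects already uncover roughly 84% of the problems,
        // which is the basis of the "five users is enough" claim.
    }
}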

In later research, Nielsen's claim has been questioned with both empirical
evidence [3] and more advanced mathematical models (Caulton, D.A., Relaxing the
homogeneity assumption in usability testing. Behaviour & Information Technology,
2001. 20(1): p. 1-7.). Two of the key challenges to this assertion are: (1) since usability is
related to the specific set of users, such a small sample size is unlikely to be
representative of the total population, so the data from such a small sample is more likely
to reflect the sample group than the population it may represent; and (2) many usability
problems encountered in testing are likely to prevent exposure of other usability
problems, making it impossible to predict the percentage of problems that can be
uncovered without knowing the relationship between existing problems. Most researchers
today agree that, although five users can generate a significant amount of data at any given
point in the development cycle, in many applications a sample size larger than five is
required to detect a satisfying number of usability problems.

Bruce Tognazzini advocates close-coupled testing: "Run a test subject through the
product, figure out what's wrong, change it, and repeat until everything works. Using this
technique, I've gone through seven design iterations in three-and-a-half days, testing in
the morning, changing the prototype at noon, testing in the afternoon, and making more
elaborate changes at night." [4] This testing can be useful in research situations.
Load testing
Load testing is the process of creating demand on a system or device and
measuring its response.

In mechanical systems, load testing refers to testing a system in order to certify it under
the appropriate regulations (for example LOLER in the UK, the Lifting Operations and Lifting
Equipment Regulations). Load testing is usually carried out to a load of 1.5 times the SWL
(Safe Working Load), and periodic recertification is required.

In software engineering it is a blanket term that is used in many different ways


across the professional software testing community.

Load testing generally refers to the practice of modeling the expected usage of a
software program by simulating multiple users accessing the program's services
concurrently. As such, this testing is most relevant for multi-user systems, often ones built
using a client/server model, such as web servers. However, other types of software
systems can be load-tested also. For example, a word processor or graphics editor can be
forced to read an extremely large document; or a financial package can be forced to
generate a report based on several years' worth of data. The most accurate load testing
occurs with actual, rather than theoretical, results.
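
A very small load-generation sketch of this kind might look as follows (Java; the target URL
and the user and request counts are placeholders, not taken from the text): several simulated
users issue requests concurrently and the response time of each request is recorded.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// LoadTestSketch.java - simulate several concurrent users hitting one endpoint.
// The URL and the numbers of users/requests are illustrative placeholders.
public class LoadTestSketch {

    public static void main(String[] args) throws InterruptedException {
        final int users = 20;              // simulated concurrent users
        final int requestsPerUser = 50;
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/example"))   // placeholder URL
                .build();

        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerUser; i++) {
                    long start = System.nanoTime();
                    try {
                        HttpResponse<Void> resp =
                                client.send(request, HttpResponse.BodyHandlers.discarding());
                        long millis = (System.nanoTime() - start) / 1_000_000;
                        System.out.println("status=" + resp.statusCode() + " time=" + millis + "ms");
                    } catch (Exception e) {
                        System.out.println("request failed: " + e.getMessage());
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }
}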

When the load placed on the system is raised beyond normal usage patterns, in
order to test the system's response at unusually high or peak loads, it is known as stress
testing. The load is usually so great that error conditions are the expected result, although
no clear boundary exists when an activity ceases to be a load test and becomes a stress
test.

There is little agreement on what the specific goals of load testing are. The term is
often used synonymously with performance testing, reliability testing, and volume
testing.

Volume testing
Volume Testing belongs to the group of non-functional tests, which are often
misunderstood and/or used interchangeably. Volume testing refers to testing a software
application for a certain data volume. This volume can in generic terms be the database
size or it could also be the size of an interface file that is the subject of volume testing.
For example, if you want to volume test your application with a specific database size,
you will expand your database to that size and then test the application's performance on
it. Another example could be when there is a requirement for your application to interact
with an interface file (could be any file such as .dat, .xml); this interaction could be
reading and/or writing on to/from the file. You will create a sample file of the size you
want and then test the application's functionality with that file to check performance.
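
As a sketch of the interface-file case (Java; the file name and record count are
placeholders), one can generate a file of the required volume and then time how long a
stand-in for the application's processing takes:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// VolumeTestSketch.java - create an interface file of a chosen size, then time
// how long a stand-in for the application's processing of it takes.
public class VolumeTestSketch {

    public static void main(String[] args) throws IOException {
        Path file = Path.of("volume-test.dat");   // placeholder file name
        long targetLines = 1_000_000;              // placeholder volume

        // Generate the test file at the required volume.
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (long i = 0; i < targetLines; i++) {
                out.write("record-" + i + ";some;sample;fields");
                out.newLine();
            }
        }

        // Time the processing step (here simply counting the lines).
        long start = System.currentTimeMillis();
        long lines;
        try (Stream<String> stream = Files.lines(file)) {
            lines = stream.count();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Processed " + lines + " lines in " + elapsed + " ms");
    }
}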
Stress testing
Stress testing is a form of testing that is used to determine the stability of a given
system or entity. It involves testing beyond normal operational capacity, often to a
breaking point, in order to observe the results. Stress testing may have a more specific
meaning in certain industries.

Contents
• 1 IT industry
• 2 Medicine

• 3 Financial sector

IT industry
In software testing, stress testing often refers to tests that put a greater emphasis
on robustness, availability, and error handling under a heavy load, than on what would be
considered correct behavior under normal circumstances. In particular, the goals of such
tests may be to ensure the software doesn't crash in conditions of insufficient
computational resources (such as memory or disk space), unusually high concurrency, or
denial of service attacks.

Examples:

• A web server may be stress tested using scripts, bots, and various denial of service
tools to observe the performance of a web site during peak loads.

Medicine
• A Cardiac stress test is used most commonly to detect marked imbalances in
blood flow to the heart muscle.

Financial sector
• Instead of doing financial projection on a "best estimate" basis, a company may
do stress testing where they look at how robust a financial instrument is in certain
crashes. They may test the instrument under, for example, the following stresses:
o What happens if the market crashes by more than x% this year?
o What happens if interest rates go up by at least y%?
o What if half the instruments in the portfolio terminate their contracts in the
5th year?
o What happens if oil prices rise by 200%?

This type of analysis has become increasingly widespread, and has been taken up
by various governmental bodies (such as the FSA in the UK) as a regulatory requirement
on certain financial institutions to ensure adequate capital allocation levels to cover
potential losses incurred during extreme, but plausible, events. This emphasis on
adequate, risk adjusted determination of capital has been further enhanced by
modifications to banking regulations such as Basel II. Stress testing models typically
allow not only the testing of individual stressors, but also combinations of different
events. There is also usually the ability to test the current exposure to a known historical
scenario (such as the Russian debt default in 1998 or the 9/11 terrorist attacks) to ensure
the liquidity of the institution.

Sanity testing
A sanity test or sanity check is a basic test to quickly evaluate the validity of a
claim or calculation. In mathematics, for example, when multiplying by three or nine,
verifying that the sum of the digits of the result is a multiple of 3 or 9 respectively is a
sanity test.

In computer science it is a very brief run-through of the functionality of a


computer program, system, calculation, or other analysis, to assure that the system or
methodology works as expected, often prior to a more exhaustive round of testing.

Sanity tests are sometimes mistakenly equated to smoke tests. Where a distinction
is made between sanity testing and smoke testing, it's usually in one of two directions.
Either sanity testing is a focused but limited form of regression testing – narrow and
deep, but cursory; or it's broad and shallow, like a smoke test, but concerned more with
the possibility of "insane behavior" such as slowing the entire system to a crawl, or
destroying the database, but is not as thorough as a true smoke test.

Generally, a smoke test is scripted (either using a written set of tests or an


automated test), whereas a sanity test is usually unscripted.

With the evolution of test methodologies, sanity tests are useful both for initial
environment validation and for future iterative increments. The process of sanity testing
begins with the execution of some online transactions and batch programs of the various
modules, to see whether the software runs without any hindrance or abnormal termination.
This practice can help identify most environment-related problems. A classic example of
this in programming is the hello world program. If a person has just set up a computer and
a compiler, a quick sanity test can be performed to see if the compiler actually works:
write a program that simply displays the words "hello world".

A sanity test can refer to various order of magnitude and other simple rule of thumb
devices applied to cross-check mathematical calculations. For example:

• If one were to attempt to square 738 and calculated 53,874, a quick sanity check
could show that this cannot be true. Consider that 500 < 738, yet 500^2 = 250,000 > 53,874.
Since squaring preserves inequality for positive numbers (see inequality), 738^2 must be
even larger, so the calculation was bad.
• In multiplication, 918 x 155 is not 142,135, since 918 is divisible by three but
142,135 is not (its digits add up to 16, which is not a multiple of three). A small code
sketch of this check is given after the list.
• When talking about quantities in physics, the power output of a car cannot be 700
kJ since that is a unit of energy, not power (energy per unit time). See dimensional
analysis.
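
A minimal sketch of the multiplication check above (Java): if one factor is divisible by
three but the claimed product is not, the product must be wrong.

// DivisibilitySanityCheck.java - a digit-sum sanity check for multiplication:
// if a factor is divisible by 3 but the claimed product is not, the result is wrong.
public class DivisibilitySanityCheck {

    static int digitSum(long n) {
        int sum = 0;
        for (long v = Math.abs(n); v > 0; v /= 10) {
            sum += v % 10;
        }
        return sum;
    }

    static boolean divisibleByThree(long n) {
        return digitSum(n) % 3 == 0;   // same as n % 3 == 0, done the "by hand" way
    }

    public static void main(String[] args) {
        long claimed = 142135;
        // 918 is divisible by 3, so 918 x 155 must be too; 142135 is not.
        boolean plausible = !divisibleByThree(918) || divisibleByThree(claimed);
        System.out.println("918 x 155 = " + claimed + " is "
                + (plausible ? "plausible" : "impossible"));
    }
}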

Smoke testing
Smoke testing is a term used in plumbing, woodwind repair, electronics, and
computer software development. It refers to the first test made after repairs or first
assembly to provide some assurance that the system under test will not catastrophically fail.
After a smoke test proves that the pipes will not leak, the keys seal properly, the circuit
will not burn, or the software will not crash outright, the assembly is ready for more
stressful testing.

• In computer programming and software testing, smoke testing is a preliminary to


further testing, which should reveal simple failures severe enough to reject a
prospective software release. In this case, the smoke is metaphorical.

Smoke testing in software development


Smoke testing is done by developers before the build is released or by testers
before accepting a build for further testing.
In software engineering, a smoke test generally consists of a collection of tests
that can be applied to a newly created or repaired computer program. Sometimes the tests
are performed by the automated system that builds the final software. In this sense a
smoke test is the process of validating code changes before the changes are checked into
the larger product's official source code collection. After code reviews, smoke
testing is the most cost-effective method for identifying and fixing defects in software;
some even believe that it is the most effective of all.

In software testing, a smoke test is a collection of written tests that are performed
on a system prior to being accepted for further testing. This is also known as a build
verification test. This is a "shallow and wide" approach to the application. The tester
"touches" all areas of the application without getting too deep, looking for answers to
basic questions like, "Can I launch the test item at all?", "Does it open to a window?",
"Do the buttons on the window do things?". There is no need to get down to field
validation or business flows. If you get a "No" answer to basic questions like these, then
the application is so badly broken that there is effectively nothing there to allow further
testing. These written tests can either be performed manually or using an automated tool.
When automated tools are used, the tests are often initiated by the same process that
generates the build itself.
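
A build verification suite of this "shallow and wide" kind can be very small. The sketch
below (Java; the Application class is a trivial stand-in included so the example is
self-contained, not a real API) asks only the basic questions listed above and rejects the
build on any "No":

// SmokeTest.java - a "shallow and wide" build verification sketch. A real suite
// would target the actual product instead of the stand-in Application below.
public class SmokeTest {

    // Stand-in for the application under test.
    static class Application {
        static Application launch() { return new Application(); }
        boolean openMainWindow()    { return true; }
    }

    public static void main(String[] args) {
        int failures = 0;

        // "Can I launch the test item at all?"
        Application app = Application.launch();
        if (app == null) {
            System.err.println("SMOKE FAIL: application did not launch");
            failures++;
        }

        // "Does it open to a window?"
        if (app != null && !app.openMainWindow()) {
            System.err.println("SMOKE FAIL: main window did not open");
            failures++;
        }

        // Any "No" answer means the build is too broken for further testing.
        System.exit(failures == 0 ? 0 : 1);
    }
}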

Exploratory testing
Exploratory testing is an approach in software testing with simultaneous
learning, test design and test execution. While the software is being tested, the tester
learns things that, together with experience and creativity, generate new good tests to run.

Contents
• 1 History
• 2 Description
• 3 Benefits and drawbacks

• 4 Usage

History
Exploratory testing has been performed for a long time, and has similarities to ad
hoc testing. In the early 1990s, ad hoc was too often synonymous with sloppy and
careless work. As a result, a group of test methodologists (now calling themselves the
Context-Driven School) began using the term "exploratory" seeking to emphasize the
dominant thought process involved in unscripted testing, and to begin to develop the
practice into a teachable discipline. This new terminology was first published by Cem
Kaner in his book Testing Computer Software. Exploratory testing can be as disciplined
as any other intellectual activity.

Description
Exploratory testing seeks to find out how the software actually works, and to ask
questions about how it will handle difficult and easy cases. The quality of the testing is
dependent on the tester's skill in inventing test cases and finding defects. The more the
tester knows about the product and different test methods, the better the testing will be.

To further explain, a comparison can be made with its antithesis, scripted testing,
which basically means that test cases are designed in advance, including steps to
reproduce them and the expected results. These tests are later performed by a tester who
compares the actual result with the expected one.

When performing exploratory testing, there are no exact expected results; it is the
tester that decides what will be verified, critically investigating the correctness of the
result.

In reality, testing almost always is a combination of exploratory and scripted


testing, but with a tendency towards either one, depending on context.

The documentation of exploratory testing ranges from documenting all tests


performed to just documenting the bugs. During pair testing, two persons create test cases
together; one performs them, and the other documents. Session-based testing is a method
specifically designed to make exploratory testing auditable and measurable on a wider
scale.

Benefits and drawbacks


The main advantages of exploratory testing are that less preparation is needed,
important bugs are found quickly, and the approach is more intellectually stimulating than
scripted testing.

Disadvantages are that the tests cannot be reviewed in advance (and thereby prevent
errors in code and test cases), and that it can be difficult to show exactly which tests have
been run.

When repeating exploratory tests, they will not be performed in the exact same
manner, which can be an advantage if it is important to find new errors, or a disadvantage
if it is more important to verify that specific things still work.

Usage
Exploratory testing is particularly suitable if requirements and specifications are
incomplete, or if there is a lack of time. The method can also be used to verify that previous
testing has found the most important defects. It is common to perform a combination of
exploratory and scripted testing where the choice is based on risk.

An example of exploratory testing in practice is Microsoft's verification of
Windows compatibility.

Regression testing
Regression testing is any type of software testing which seeks to uncover
regression bugs. Regression bugs occur whenever software functionality that previously
worked as desired stops working or no longer works in the same way that was previously
planned. Typically regression bugs occur as an unintended consequence of program
changes.

Common methods of regression testing include re-running previously run tests and
checking whether previously fixed faults have re-emerged.

Experience has shown that as software is developed, this kind of reemergence of


faults is quite common. Sometimes it occurs because a fix gets lost through poor revision
control practices (or simple human error in revision control), but just as often a fix for a
problem will be "fragile" - i.e. if some other change is made to the program, the fix no
longer works. Finally, it has often been the case that when some feature is redesigned, the
same mistakes will be made in the redesign that were made in the original
implementation of the feature.

Therefore, in most software development situations it is considered good practice


that when a bug is located and fixed, a test that exposes the bug is recorded and regularly
retested after subsequent changes to the program. Although this may be done through
manual testing procedures using programming techniques, it is often done using
automated testing tools. Such a 'test suite' contains software tools that allow the testing
environment to execute all the regression test cases automatically; some projects even set
up automated systems to automatically re-run all regression tests at specified intervals
and report any regressions. Common strategies are to run such a system after every
successful compile (for small projects), every night, or once a week.
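
A recorded test of this kind might look like the following sketch (JUnit 4 style; the
DateFormatter class and the defect it guards against are hypothetical, invented for
illustration). Once added to the suite, it is re-run after every subsequent change so the
fault cannot silently return:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// DateFormatterRegressionTest.java - a test recorded when a bug was fixed, kept
// in the suite so the fault is caught if it ever re-emerges. DateFormatter is a
// stand-in for the code under test.
public class DateFormatterRegressionTest {

    // Stand-in for the code under test.
    static class DateFormatter {
        static String monthName(int month) {
            String[] names = {"January", "February", "March", "April", "May", "June",
                    "July", "August", "September", "October", "November", "December"};
            if (month < 1 || month > 12) {
                throw new IllegalArgumentException("month out of range: " + month);
            }
            return names[month - 1];
        }
    }

    // The original (hypothetical) defect: month 12 was rejected by an off-by-one
    // range check. This test pins the fixed behaviour.
    @Test
    public void monthTwelveIsAccepted() {
        assertEquals("December", DateFormatter.monthName(12));
    }
}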

Regression testing is an integral part of the extreme programming software


development method. In this method, design documents are replaced by extensive,
repeatable, and automated testing of the entire software package at every stage in the
software development cycle.

Contents
• 1 Types of regression
• 2 Mitigating regression risk

• 3 Uses

Types of regression
• Local - changes introduce new bugs.
• Unmasked - changes unmask previously existing bugs.
• Remote - Changing one part breaks another part of the program. For example,
Module A writes to a database. Module B reads from the database. If changes to
what Module A writes to the database break Module B, it is remote regression.

There's another way to classify regression.

• New feature regression - changes to code that is new to release 1.1 break other
code that is new to release 1.1.

• Existing feature regression - changes to code that is new to release 1.1 break code
that existed in release 1.0.

Mitigating regression risk


• Complete test suite repetition
• Regression test automation (GUI, API, CLI)
• Partial test repetition based on traceability and analysis of technical and business
risks
• Customer or user testing
o Beta - early release to both potential and current customers
o Pilot - deploy to a subset of users
o Parallel - users use both old and new systems simultaneously
• Use larger releases. Testing new functions often covers existing functions. The
more new features in a release, the more "accidental" regression testing.
• Emergency patches - these patches are released immediately, and will be included
in future maintenance releases.

Uses
Regression testing can be used not only for testing the correctness of a program,
but it is also often used to track the quality of its output. For instance in the design of a
compiler, regression testing should track the code size, simulation time and compilation
time of the test suites.
Installation testing
Implementation testing, sometimes called installation testing, is typically
completed by the software testing engineer in conjunction with the configuration
manager. Implementation testing is usually defined as testing which takes place using the
compiled version of the code in the testing environment or pre-production environment,
which may or may not make it into production. This generally takes place outside of the
development environment, to limit code corruption from other future releases which may
reside on the development environment.

The ideal installation might simply appear to be running an install program,
sometimes called packaged software. Such packaged software typically uses a setup
program which acts as a multi-configuration wrapper, allowing the software to be
installed on a variety of machines and/or operating environments. Every possible
configuration should receive extensive testing before it can be used with confidence.

In distributed systems, particularly where software is to be released into an


already live target environment (such as an operational web site) installation (or Software
deployment as it is sometimes called) can involve database schema changes as well as the
installation of new software. Deployment plans in such circumstances may include back-
out procedures whose use is intended to roll the target environment back in the event that
the deployment is unsuccessful. Ideally, the deployment plan itself should be tested in an
environment that is a replica of the live environment. A factor that can increase the
organizational requirements of such an exercise is the need to synchronize the data in the
test deployment environment with that in the live environment with minimum disruption
to live operation. This type of implementation may include testing of the processes which
take place during the installation or upgrade of a multi-tier application. This type of
testing is commonly compared to a dress rehearsal or may even be called a "dry run".

Implementation testing covers full, partial, or upgrade install/uninstall processes.
