Black box testing takes an external perspective of the test object to derive test
cases. These tests can be functional or non-functional, though usually functional. The test
designer selects valid and invalid input and determines the correct output. There is no
knowledge of the test object's internal structure.
This method of test design is applicable to all levels of software testing: unit,
integration, system and acceptance. The higher the level, and hence the
bigger and more complex the box, the more one is forced to use black box testing to
simplify. While this method can uncover unimplemented parts of the specification, one
cannot be sure that all existent paths are tested.
Contents
• 1 Test design techniques
• 2 User input validation
• 3 Hardware
Test design techniques
• Equivalence partitioning
• Boundary value analysis
• Decision table testing
• Pairwise testing
• State transition tables
• Use case testing
• Cross-functional testing
User input validation
Typically, when invalid user input occurs, the program will either correct it
automatically, or display a message to the user that the input needs to be corrected
before proceeding.
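The reject-before-proceeding pattern can be sketched in C; the function name and the accepted range here are illustrative, not from any particular library:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 and stores the value if text is a valid integer in
 * [lo, hi]; returns 0 so the caller can re-prompt or correct it. */
int parse_bounded_int(const char *text, long lo, long hi, long *out)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0')   /* not (only) a number */
        return 0;
    if (v < lo || v > hi)              /* valid number, invalid range */
        return 0;
    *out = v;
    return 1;
}
```

A caller would loop, re-prompting the user until this returns 1.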
Hardware
Functional testing of devices such as power supplies, amplifiers, and other simple-
function electrical devices is common in the electronics industry. Automated functional
testing of specified characteristics is used for production testing and as part of design
validation.
Code coverage
Code coverage techniques were amongst the first techniques invented for
systematic software testing. The first published reference was by Miller and Maloney in
Communications of the ACM in 1963.
There are a number of different ways of measuring code coverage, the main ones being:
• Statement Coverage - Has each line of the source code been executed and tested?
• Condition Coverage - Has each evaluation point (such as a true/false decision)
been executed and tested?
• Path Coverage - Has every possible route through a given part of the code been
executed and tested?
• Entry/Exit Coverage - Has every possible call and return of the function been
executed and tested?
Safety critical applications are often required to demonstrate that testing achieves
100% of some form of code coverage.
Some of the coverage criteria above are connected; for instance, path coverage
implies condition, statement and entry/exit coverage. Statement coverage does not imply
condition coverage, as the code (in the C programming language) shows:
void foo(int bar)
{
    printf("This is ");
    if (bar < 0)
    {
        printf("not ");
    }
    printf("a positive integer.\n");
    return;
}
If the function "foo" were called with "bar = -1", statement coverage would be
achieved, since every statement executes. Condition coverage, however, would not,
because the decision "bar < 0" is evaluated only as true; calling "foo" with both
"bar = -1" and "bar = 1" would exercise both outcomes of the decision.
Usually the source code is instrumented and run through a series of tests. The
resulting output is then analysed to see what areas of code have not been exercised, and
the tests are updated to include these areas as necessary. Combined with other code
coverage methods the aim is to develop a rigorous yet manageable set of regression tests.
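The instrument-run-analyse loop can be illustrated with a hand-instrumented version of foo() above; real coverage tools insert counters like these automatically, and the block layout shown is only a sketch:

```c
#include <assert.h>
#include <stdio.h>

static unsigned long hits[3];   /* one counter per basic block of foo() */

void foo(int bar)
{
    hits[0]++;                  /* block 0: function entry */
    printf("This is ");
    if (bar < 0) {
        hits[1]++;              /* block 1: the negative branch */
        printf("not ");
    }
    hits[2]++;                  /* block 2: the common tail */
    printf("a positive integer.\n");
}

/* After the test run, any zero counter marks unexercised code. */
void coverage_report(void)
{
    for (int i = 0; i < 3; i++)
        printf("block %d: %lu hit(s)%s\n", i, hits[i],
               hits[i] == 0 ? "  <-- never exercised" : "");
}
```

Running only foo(-1) leaves no zero counters but still exercises only one outcome of the decision, which is exactly the statement-versus-condition gap described above.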
Software release stages
Contents
• 1 Software release stages
o 1.1 Pre-alpha
o 1.2 Alpha
o 1.3 Beta
1.3.1 Origin of 'alpha' and 'beta'
o 1.4 Release candidate
o 1.5 Gold/general availability release
o 1.6 RTM / RTW
1.6.1 Box copy
o 1.7 Stable/unstable
• 2 See also
• 3 External links
Alpha
The alpha build of the software is the build delivered to the software testers, that
is, people other than the programmers, but usually internal to the organization or
community that develops the software. In a rush to market, more and more companies are
engaging external customers or value-chain partners in their alpha testing phase, which
allows more extensive usability testing during the alpha phase.
In the first phase of testing, developers generally test the software using white box
techniques. Additional validation is then performed using black box or grey box
techniques, by another dedicated testing team, sometimes concurrently. Moving to black
box testing inside the organization is known as alpha release.
Beta
A beta version is the first version released outside the organization or community
that develops the software, for the purpose of evaluation or real-world black/grey-box
testing. The process of delivering a beta version to the users is called beta release.
The users of a beta version are called beta testers. They are usually customers or
prospective customers of the organization that develops the software. They receive the
software for free or at a reduced price, but act as free testers.
Beta versions test the supportability of the product, the go-to-market messaging
(while recruiting Beta customers), the manufacturability of the product, and the overall
channel flow or channel reach.
Often this stage begins when the developers announce a feature freeze on the
product, indicating that no more feature requirements will be accepted for this version of
the product. Only software issues, i.e. bugs and unimplemented features, will be addressed.
When a beta becomes available to the general public it is often widely used by the
technologically savvy and those familiar with previous versions as though it were the
finished product. Usually developers of freeware or open-source betas release them to the
general public while proprietary betas go to a relatively small group of testers. Recipients
of highly proprietary betas may have to sign a non-disclosure agreement. A release is
called feature complete when the product team agrees that functional requirements of the
system are met and no new features will be put into the release, but significant software
bugs may still exist. Companies with a formal software process will tend to enter the beta
period with a list of known bugs that must be fixed to exit the beta period, and some
companies make this list available to customers and testers.
As the Internet has allowed for rapid and inexpensive distribution of software,
companies have begun to take a more flexible approach to use of the word "beta".
Netscape Communications was infamous for releasing alpha level versions of its
Netscape web browser as public beta releases. In February 2005, ZDNet published an
article about the recent phenomenon of a beta version often staying for years and being
used as if it were in production-level [1]. It noted that Gmail and Google News, for
example, had been in beta for a long period of time and were not expected to drop the
beta status despite the fact that they were widely used; however, Google News did leave
beta in January 2006. This technique may also allow a developer to delay offering full
support and/or responsibility for remaining issues. In the context of Web 2.0, people even
talk of perpetual betas to signify that some software is meant to stay in beta state.
Release candidate
Microsoft Corporation often uses the term release candidate. During the 1990s,
Apple Computer used the term "golden master" for its release candidates, and the final
golden master was the general availability release. Other terms include gamma (and
occasionally also delta, and perhaps even more Greek letters) for versions that are
substantially complete, but still under test, and omega for final testing of versions that are
believed to be bug-free, and may go into production at any time. (Gamma, delta, and
omega are, respectively, the third, fourth, and last letters of the Greek alphabet.) Some
users disparagingly refer to release candidates and even final "point oh" releases as
"gamma test" software, suggesting that the developer has chosen to use its customers to
test software that is not truly ready for general release. Often, beta testers, if privately
selected, will be billed for using the release candidate as though it were a finished
product.
A release is called code complete when the development team agrees that no
entirely new source code will be added to this release. There may still be source code
changes to fix defects. There may still be changes to documentation and data files, and to
the code for test cases or utilities. New code may be added in a future release.
Gold/general availability release
The term gold anecdotally refers to the use of "gold master disc" which was
commonly used to send the final version to manufacturers who use it to create the mass-
produced retail copies. It may in this context be a hold-over from music production. In
some cases, however, the master disc is still actually made of gold, for both aesthetic
appeal and resistance to corrosion.
RTM / RTW
Microsoft and others use the term "release to manufacturing" (RTM) to refer to
this version (for commercial products, like Windows XP, as in, "Build 2600 is the
Windows XP RTM release"), and "release to Web" (RTW) for freely downloadable
products. Typically, RTM is at least one or two weeks before GA because the RTM
version must be burned to disc, boxed, and so on.
Box copy
A box copy is the final product, printed on a disc that is included in the actual
release, complete with disc graphic art. This term is used mostly by reviewers to
differentiate from gold master discs. A box copy does not necessarily come enclosed in
the actual boxed product - it refers to the disc itself.
Stable/unstable
In open source programming, version numbers or the terms stable and unstable
commonly distinguish the stage of development. The term stable refers to a version of
software that is substantially identical to a version that has been through enough real-
world testing to reasonably assume there are no showstopper problems, or at least that
any problems are known and documented. On the other hand, the term unstable does not
necessarily mean that there are problems - rather, that enhancements or changes have
been made to the software that have not undergone rigorous testing and that more
changes are expected to be imminent. Users of such software are advised to use the
stable version if it meets their needs, and to use the unstable version only if its new
functionality is of enough interest to outweigh the risk that something might simply not
work right.
In the Linux kernel, version numbers take the form of three numbers, separated by a
decimal point. Prior to the 2.6.x series, an even second number was used to represent a
stable release and an odd second number used to represent an unstable release. As of the
2.6.x series, the even or odd status of the second number no longer holds any
significance. The practice of using even and odd numbers to indicate the stability of a
release has been used by many other open and closed source projects.
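The pre-2.6 even/odd convention is easy to check mechanically; this helper is purely illustrative:

```c
#include <assert.h>
#include <stdio.h>

/* For pre-2.6 Linux kernel versions "a.b.c", an even second number
 * meant a stable series and an odd one a development (unstable)
 * series.  Returns 1 for stable, 0 for unstable, -1 if unparsable. */
int is_pre26_stable(const char *version)
{
    int major, minor, patch;
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;              /* not an a.b.c version string */
    (void)major; (void)patch;
    return minor % 2 == 0;      /* even second number = stable */
}
```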
Exploratory testing
Exploratory testing is an approach to software testing that involves simultaneous
learning, test design and test execution. While the software is being tested, the tester
learns things that, together with experience and creativity, generate good new tests to run.
Contents
• 1 History
• 2 Description
• 3 Benefits and drawbacks
• 4 Usage
History
Exploratory testing has been performed for a long time, and has similarities to ad
hoc testing. In the early 1990s, ad hoc was too often synonymous with sloppy and
careless work. As a result, a group of test methodologists (now calling themselves the
Context-Driven School) began using the term "exploratory" seeking to emphasize the
dominant thought process involved in unscripted testing, and to begin to develop the
practice into a teachable discipline. This new terminology was first published by Cem
Kaner in his book Testing Computer Software. Exploratory testing can be as disciplined
as any other intellectual activity.
Description
Exploratory testing seeks to find out how the software actually works, and to ask
questions about how it will handle difficult and easy cases. The quality of the testing is
dependent on the tester's skill at inventing test cases and finding defects. The more the
tester knows about the product and different test methods, the better the testing will be.
When performing exploratory testing, there are no exact expected results; it is the
tester that decides what will be verified, critically investigating the correctness of the
result.
Disadvantages are that the tests cannot be reviewed in advance (and thereby prevent
errors in code and test cases), and that it can be difficult to show exactly which tests have
been run. When exploratory tests are repeated, they will not be performed in exactly the
same manner, which can be an advantage if it is important to find new errors, or a
disadvantage if it is more important to know that specific things work.
Usage
Exploratory testing is particularly suitable when requirements and specifications are
incomplete, or when time is short. The method can also be used to verify that previous
testing has found the most important defects. It is common to perform a combination of
exploratory and scripted testing, where the choice between them is based on risk.
Formal verification
Contents
• 1 Explanation
• 2 Usage
• 3 Approaches to formal verification
• 4 Validation and Verification
• 5 Program verification
Explanation
Software testing alone cannot prove that a system does not contain any defects.
Neither can it prove that it does have a certain property. Only the process of formal
verification can prove that a system does not have a certain defect or does have a certain
property. It is impossible to prove or test that a system has "no defect" since it is
impossible to formally specify what "no defect" means. All that can be done is prove that
a system does not have any of the defects that can be thought of, and has all of the
properties that together make it functional and useful.
Usage
Formal verification can be used for example for systems such as cryptographic
protocols, combinatorial circuits, digital circuits with internal memory, and software
expressed as source code.
The properties to be verified are often described in temporal logics, such as linear
temporal logic (LTL) or computational tree logic (CTL).
Validation and Verification
• Validation: "Are we building the right product?", i.e., does the product do what
the user really requires?
• Verification: "Are we building the product right?", i.e., does the product conform
to the specifications?
The verification process consists of static and dynamic parts. E.g., for a software
product one can inspect the source code (static) and run against specific test cases
(dynamic). Validation usually can only be done dynamically, i.e., the product is tested by
putting it through typical usages and atypical usages ("Can we break it?"). See also
Verification and Validation
Program verification
Program verification is the process of formally proving that a computer program
does exactly what is stated in the program specification it was written to realize. This is a
type of formal verification which is specifically aimed at verifying the code itself, not an
abstract model of the program.
For functional programming languages, some programs can be verified by equational
reasoning, usually together with induction. Code in an imperative language could be
proved correct by use of Hoare logic.
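As an informal illustration (not a proof), Hoare-style annotations for a small imperative routine can be written as comments and spot-checked with runtime assertions; quotient() is a made-up example, and Hoare logic itself would establish the annotations for all inputs rather than one run:

```c
#include <assert.h>

/* Integer division by repeated subtraction.
 * Hoare triple:
 *   {a >= 0 && b > 0}  q = quotient(a, b)  {a == q*b + r && 0 <= r < b}
 * The loop invariant a == q*b + r is what a Hoare-logic proof would
 * carry through each iteration. */
int quotient(int a, int b)
{
    assert(a >= 0 && b > 0);                    /* precondition */
    int q = 0, r = a;
    while (r >= b) {                            /* invariant: a == q*b + r */
        r -= b;
        q++;
    }
    assert(a == q * b + r && 0 <= r && r < b);  /* postcondition */
    return q;
}
```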
Fuzz testing
Fuzz testing or fuzzing is a software testing technique that provides random data
("fuzz") to the inputs of a program. If the program fails (for example, by crashing, or by
failing built-in code assertions), the defects can be noted.
The great advantage of fuzz testing is that the test design is extremely simple, and
free of preconceptions about system behavior. Fuzz testing was developed at the
University of Wisconsin-Madison in 1989 by Professor Barton Miller and the students in
his graduate Advanced Operating Systems class.
Contents
• 1 Uses
• 2 Fuzz testing methods
o 2.1 Advantages and disadvantages
o 2.2 Event-driven fuzz
o 2.3 Character-driven fuzz
o 2.4 Database fuzz
Uses
Fuzz testing is often used in large software development projects that perform
black box testing. These usually have a budget to develop test tools, and fuzz testing is
one of the techniques which offers a high benefit to cost ratio.
Fuzz testing is thought to enhance software security and software safety because it
often finds odd oversights and defects which human testers would fail to find, and even
careful human test designers would fail to create tests for.
However, fuzz testing is not a substitute for exhaustive testing or formal methods:
it can only provide a random sample of the system's behavior, and in many cases passing
a fuzz test may only demonstrate that a piece of software handles exceptions without
crashing, rather than behaving correctly. Thus, fuzz testing can only be regarded as a bug-
finding tool rather than an assurance of quality.
Fuzz testing methods
• Event-driven inputs are usually from a graphical user interface, or possibly from a
mechanism in an embedded system.
• Character-driven inputs are from files, or data streams such as sockets.
• Database inputs are from tabular data, such as relational databases.
• Inherited program state, such as environment variables.
• Valid fuzz attempts to assure that the random input is reasonable, or conforms to
actual production data.
• Simple fuzz usually uses a pseudo random number generator to provide input.
• A combined approach uses valid test data with some proportion of totally random
input injected.
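A minimal "simple fuzz" driver along these lines might look as follows; parse_record() stands in for whatever code is under test and is not a real API:

```c
#include <assert.h>
#include <stdlib.h>

/* The (hypothetical) code under test: any crash or assertion failure
 * inside it during fuzzing is a finding worth investigating. */
extern int parse_record(const unsigned char *buf, size_t len);

/* Simple fuzz: pseudo-random bytes from a fixed seed, so that any
 * failure can be reproduced by re-running with the same seed. */
void fuzz(unsigned seed, int iterations)
{
    unsigned char buf[256];
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        size_t len = (size_t)(rand() % (int)sizeof buf);
        for (size_t j = 0; j < len; j++)
            buf[j] = (unsigned char)(rand() & 0xFF);
        parse_record(buf, len);
    }
}
```

Recording the seed alongside any failure is what makes a random test actionable rather than a one-off crash.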
Advantages and disadvantages
The main problem with fuzzing to find program faults is that it generally only
finds very simple faults. The problem itself is exponential and every fuzzer takes
shortcuts to find something interesting in a timeframe that a human cares about. A
primitive fuzzer may have poor code coverage; for example, if the input includes a
checksum which is not properly updated to match other random changes, only the
checksum validation code will be verified. Code coverage tools are often used to estimate
how "well" a fuzzer works, but these are only guidelines to fuzzer quality. Every fuzzer
can be expected to find a different set of bugs.
On the other hand, bugs found using fuzz testing are frequently severe,
exploitable bugs that could be used by a real attacker. This has become even more true as
fuzz testing has become more widely known, as the same techniques and tools are now
used by attackers to exploit deployed software. This is a major advantage over binary or
source auditing, or even fuzzing's close cousin, fault injection, which often relies on
artificial fault conditions that are difficult or impossible to exploit.
Event-driven fuzz
The most common problem with an event-driven program is that it will often
simply use the data in the queue, without even crude validation. To succeed in a fuzz-
tested environment, software must validate all fields of every queue entry, decode every
possible binary value, and then ignore impossible requests.
One of the more interesting issues with real-time event handling is that if error
reporting is too verbose, simply providing error status can cause resource problems or a
crash. Robust error detection systems will report only the most significant, or most recent
error over a period of time.
Character-driven fuzz
Normally this is provided as a stream of random data. The classic source in UNIX
is the random data generator.
One common problem with a character driven program is a buffer overrun, when
the character data exceeds the available buffer space. This problem tends to recur in every
instance in which a string or number is parsed from the data stream and placed in a
limited-size area.
Another is that decode tables or logic may be incomplete, not handling every
possible binary value.
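The recurring overrun described above, and the bounded alternative, can be sketched as follows (function names are illustrative):

```c
#include <assert.h>
#include <string.h>

/* Copying a parsed field into a fixed-size area.  The unsafe version
 * is the classic bug a character-driven fuzzer finds; the safe
 * version bounds the copy and always terminates the string. */
void copy_field_unsafe(char *dst, const char *src)
{
    strcpy(dst, src);               /* overruns dst if src is too long */
}

void copy_field_safe(char *dst, size_t dstlen, const char *src)
{
    strncpy(dst, src, dstlen - 1);  /* never writes past the buffer   */
    dst[dstlen - 1] = '\0';         /* strncpy may omit the terminator */
}
```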
Database fuzz
The standard database scheme is usually filled with fuzz that is random data of
random sizes. Some IT shops use software tools to migrate and manipulate such
databases. Often the same schema descriptions can be used to automatically generate fuzz
databases.
Integration testing
Integration testing takes as its input modules that have been unit tested, groups
them in larger aggregates, applies tests defined in an integration test plan to those
aggregates, and delivers as its output the integrated system ready for system testing.
Contents
• 1 Purpose
• 2 Limitations
Purpose
The purpose of integration testing is to verify functional, performance and
reliability requirements placed on major design items. These "design items", i.e.
assemblages (or groups of units), are exercised through their interfaces using black box
testing, success and error cases being simulated via appropriate parameter and data
inputs. Simulated usage of shared data areas and inter-process communication is tested
and individual subsystems are exercised through their input interface. Test cases are
constructed to test that all components within assemblages interact correctly, for example
across procedure calls or process activations, and this is done after testing individual
modules, i.e. unit testing.
The overall idea is a "building block" approach, in which verified assemblages are
added to a verified base which is then used to support the integration testing of further
assemblages.
The different types of integration testing are big bang, top-down, bottom-up, and
backbone.
Big Bang: In this approach, all or most of the developed modules are coupled
together to form a complete software system or major part of the system and then used
for integration testing. The Big Bang method is very effective for saving time in the
integration testing process. However, if the test cases and their results are not recorded
properly, the entire integration process will be more complicated and may prevent the
testing team from achieving the goal of integration testing.
Bottom Up: All the bottom or low level modules, procedures or functions are
integrated and then tested. After the integration testing of lower level integrated modules,
the next level of modules will be formed and can be used for integration testing. This
approach is helpful only when all or most of the modules of the same development level
are ready. This method also helps to determine the levels of software developed and
makes it easier to report testing progress in the form of a percentage.
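The bottom-up sequence can be sketched with throwaway test drivers standing in for the not-yet-integrated upper level; all names here are illustrative:

```c
#include <assert.h>

int low_level_add(int a, int b) { return a + b; }   /* lowest-level unit */

int mid_level_sum3(int a, int b, int c)             /* next level up */
{
    return low_level_add(low_level_add(a, b), c);
}

/* Step 1: a driver exercises the low-level module in isolation,
 * playing the role of the caller that does not exist yet. */
void drive_level0(void) { assert(low_level_add(2, 3) == 5); }

/* Step 2: once level 0 passes, the integrated pair is tested
 * together through the level-1 interface. */
void drive_level1(void) { assert(mid_level_sum3(1, 2, 3) == 6); }
```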
Limitations
Any conditions not stated in specified integration tests, outside of the confirmation of the
execution of design items, will generally not be tested. Integration tests cannot include
system-wide (end-to-end) change testing.
Test case
In software engineering, the most common definition of a test case is a set of
conditions or variables under which a tester will determine if a requirement or use case
upon an application is partially or fully satisfied. It may take many test cases to determine
that a requirement is fully satisfied. In order to fully test that all the requirements of an
application are met, there must be at least one test case for each requirement unless a
requirement has sub requirements. In that situation, each sub requirement must have at
least one test case. This is frequently done using a Traceability matrix. Some
methodologies, like RUP, recommend creating at least two test cases for each
requirement: one should perform positive testing of the requirement and the other
should perform negative testing. Written test cases should include a description of the
functionality to be tested, and the preparation required to ensure that the test can be
conducted.
If the application is created without formal requirements, then test cases can be
written based on the accepted normal operation of programs of a similar class. In some
schools of testing, test cases are not written at all but the activities and results are
reported after the tests have been run.
What characterizes a formal, written test case is that there is a known input and an
expected output, which is worked out before the test is executed. The known input should
test a precondition and the expected output should test a postcondition.
Under special circumstances, there could be a need to run the test, produce results,
and then a team of experts would evaluate if the results can be considered as a pass. This
happens often on new products' performance number determination. The first test is taken
as the base line for subsequent test / product release cycles.
Not all written tests require all of these sections. However, the bare bones of a test can be
reduced to three essential steps:
It is important to note that if the preconditions cannot be established, the item cannot
be tested according to its software requirements specification and the test must not
proceed.
Verifying the postcondition is equivalent to establishing that the actual results are as
expected.
Note that several tests may need to be run to challenge the postcondition. For
example, to test a user login routine would need at least a case of a known username-
password pair and a second case of an unknown username-password pair.
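For the login example, the two cases might be written as follows; login() and its single hard-coded account are hypothetical stand-ins for the routine under test:

```c
#include <assert.h>
#include <string.h>

/* Stand-in login routine with one valid account. */
int login(const char *user, const char *pass)
{
    return strcmp(user, "alice") == 0 && strcmp(pass, "secret") == 0;
}

/* Test case 1: known input = valid pair; expected output = accept. */
void test_login_known(void)   { assert(login("alice", "secret") == 1); }

/* Test case 2: known input = unknown pair; expected output = reject. */
void test_login_unknown(void) { assert(login("alice", "wrong") == 0); }
```

Both cases together, not either one alone, are needed to challenge the postcondition of the login routine.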
Common usage is to take the identifier for each of the items of one document and
place them in the left column. The identifiers for the other document are placed across the
top row. When an item in the left column is related to an item across the top, a mark is
placed in the intersecting cell. The number of relationships are added up for each row and
each column. This value indicates the mapping of the two items. Zero values indicate that
no relationship exists and that one must be made. Large values imply that the item is too
complex and should be simplified.
[Sample traceability matrix: the top rows list the requirement identifiers (UC 1.0, 1.2,
1.3, 2.1, 2.2, 2.3.1, 2.3.2, 2.3.3, 2.4, 3.1, 3.2 and TECH 1.1, 1.2, 1.3); the left column
lists the test case identifiers (1.1.1 through 5.6.2); an "x" marks each cell where a test
case covers a requirement, and the row and column totals give the number of
relationships for each item.]
Unit testing
In computer programming, unit testing is a procedure used to validate that
individual units of source code are working properly. A unit is the smallest testable part of
an application. In procedural programming a unit may be an individual program,
function, procedure, etc., while in object-oriented programming the smallest unit is a
class, which may be a base/super class, abstract class or derived/child class. Units are
distinguished from modules in that modules are typically made up of units.
Ideally, each test case is independent from the others; mock objects and test
harnesses can be used to assist testing a module in isolation. Unit testing is typically done
by the developers and not by end-users.
Contents
• 1 Benefit
o 1.1 Facilitates change
o 1.2 Simplifies integration
o 1.3 Documentation
o 1.4 Separation of interface from implementation
• 2 Limitations of unit testing
• 3 Applications
o 3.1 Extreme Programming
o 3.2 Techniques
Benefit
The goal of unit testing is to isolate each part of the program and show that the
individual parts are correct. A unit test provides a strict, written contract that the piece of
code must satisfy. As a result, it affords several benefits.
Facilitates change
Unit testing allows the programmer to refactor code at a later date, and make sure
the module still works correctly (i.e. regression testing). The procedure is to write test
cases for all functions and methods so that whenever a change causes a fault, it can be
quickly identified and fixed.
Readily-available unit tests make it easy for the programmer to check whether a
piece of code is still working properly. Good unit test design produces test cases that
cover all paths through the unit with attention paid to loop conditions.
In continuous unit testing environments, through the inherent practice of sustained
maintenance, unit tests will continue to accurately reflect the intended use of the
executable and code in the face of any change. Depending upon established development
practices and unit test coverage, up-to-the-second accuracy can be maintained.
Simplifies integration
Unit testing helps to eliminate uncertainty in the units themselves and can be used
in a bottom-up testing style approach. By testing the parts of a program first and then
testing the sum of its parts, integration testing becomes much easier.
Documentation
Unit testing provides a sort of "living document". Clients and other developers
looking to learn how to use the module can look at the unit tests to determine how to use
the module to fit their needs and gain a basic understanding of the API.
Unit test cases embody characteristics that are critical to the success of the unit.
These characteristics can indicate appropriate/inappropriate use of a unit as well as
negative behaviors that are to be trapped by the unit. A unit test case, in and of itself,
documents these critical characteristics, although many software development
environments do not rely solely upon code to document the product in development.
Because some classes may have references to other classes, testing a class can
frequently spill over into testing another class. A common example of this is classes that
depend on a database: in order to test the class, the tester often writes code that interacts
with the database. This is a mistake, because a unit test should never go outside of its own
class boundary. As a result, the software developer abstracts an interface around the
database connection, and then implements that interface with their own mock object. By
abstracting this necessary attachment from the code (temporarily reducing the net
effective coupling), the independent unit can be more thoroughly tested than may have
been previously achieved. This results in a higher quality unit that is also more
maintainable.
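In C, the abstracted interface can be a struct of function pointers, with the mock supplying canned answers so the unit never touches a real database. All names here are illustrative:

```c
#include <assert.h>

/* The abstracted interface around the database connection. */
struct user_store {
    int (*lookup_age)(void *ctx, const char *name);
    void *ctx;
};

/* The unit under test: it knows only the interface, not the database. */
int is_adult(const struct user_store *db, const char *name)
{
    return db->lookup_age(db->ctx, name) >= 18;
}

/* Mock implementation: returns a canned age from the test context,
 * so the unit test stays inside its own boundary. */
static int mock_lookup_age(void *ctx, const char *name)
{
    (void)name;
    return *(int *)ctx;
}
```

A production build would implement the same interface with real database calls; the unit under test is unchanged either way.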
It is unrealistic to test all possible input combinations for any non-trivial piece of
software. Like all forms of software testing, unit tests can only show the presence of
errors; they cannot show the absence of errors.
Applications
Extreme Programming
Extreme Programming uses the creation of unit tests for test-driven development.
The developer writes a unit test that exposes either a software requirement or a defect.
This test will fail because either the requirement isn't implemented yet, or because
it intentionally exposes a defect in the existing code. Then, the developer writes the
simplest code to make the test, along with other tests, pass.
All classes in the system are unit tested. Developers release unit testing code to
the code repository in conjunction with the code it tests. XP's thorough unit testing allows
the benefits mentioned above, such as simpler and more confident code development and
refactoring, simplified code integration, accurate documentation, and more modular
designs. These unit tests are also constantly run as a form of regression test.
Techniques
Unit testing is commonly automated, but may still be performed manually. The
IEEE[1] does not favor one over the other. A manual approach to unit testing may employ
a step-by-step instructional document. Nevertheless, the objective in unit testing is to
isolate a unit and validate its correctness. Automation is efficient for achieving this, and
enables the many benefits listed in this article. Conversely, if not planned carefully, a
careless manual unit test case may execute as an integration test case that involves many
software components, and thus preclude the achievement of most if not all of the goals
established for unit testing.
Under the automated approach, to fully realize the effect of isolation, the unit or
code body subjected to the unit test is executed within a framework outside of its natural
environment, that is, outside of the product or calling context for which it was originally
created. Testing in an isolated manner has the benefit of revealing unnecessary
dependencies between the code being tested and other units or data spaces in the product.
These dependencies can then be eliminated.
Using an automation framework, the developer codes criteria into the test to
verify the correctness of the unit. During execution of the test cases, the framework logs
those that fail any criterion. Many frameworks will also automatically flag and report in a
summary these failed test cases. Depending upon the severity of a failure, the framework
may halt subsequent testing.
Unit testing frameworks, which help simplify the process of unit testing, have
been developed for a wide variety of languages. It is generally possible to perform unit
testing without the support of specific framework by writing client code that exercises the
units under test and uses assertion, exception, or early exit mechanisms to signal failure.
This approach is valuable in that there is a negligible barrier to the adoption of
unit testing. However, it is also limited in that many advanced features of a proper
framework are missing or must be hand-coded.
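Such framework-less client code can be a plain script of checks that signals failure through assertions or an early exit (a sketch; reverse_words is a hypothetical unit under test):

```python
def reverse_words(text):
    # Hypothetical unit under test: reverses word order.
    return " ".join(reversed(text.split()))

def run_tests():
    cases = [
        ("hello world", "world hello"),
        ("one", "one"),
        ("", ""),
    ]
    failures = []
    for given, expected in cases:
        actual = reverse_words(given)
        if actual != expected:
            failures.append((given, actual, expected))
    if failures:
        # Early exit signals failure to the caller, e.g. a build script.
        raise SystemExit(f"{len(failures)} test(s) failed: {failures}")
    return "all tests passed"

print(run_tests())
```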
Black box testing
Equivalence partitioning
Equivalence partitioning is a software testing technique with the goal of reducing the number of test cases to a necessary minimum while still covering all possible input scenarios.
Contents
• 1 The Theory
• 2 Black Box vs. White Box
The Theory
The testing theory related to equivalence partitioning says that only one test case
from each partition is needed to evaluate the behaviour of the program for the related
partition. In other words, it is sufficient to select one test case out of each partition to
check the behaviour of the program; using more, or even all, test cases of a partition will
not find new faults in the program. The values within one partition are considered to be
"equivalent". Thus the number of test cases can be reduced considerably.
An additional effect of applying this technique is that you also find the so-called
"dirty" test cases. An inexperienced tester may be tempted to use as test cases the input
data 1 to 12 for the month and forget to select some from the invalid partitions. This
would lead to a huge number of unnecessary test cases on the one hand, and a lack of test
cases for the dirty ranges on the other hand.
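For the month example, the input splits into three partitions, and one representative value from each is enough (a sketch; is_valid_month stands in for the unit under test):

```python
def is_valid_month(month):
    # Unit under test: valid months are 1..12.
    return 1 <= month <= 12

# One representative per equivalence partition:
#   invalid low (<= 0), valid (1..12), invalid high (>= 13)
representatives = {"invalid_low": -3, "valid": 7, "invalid_high": 25}

assert is_valid_month(representatives["valid"]) is True
assert is_valid_month(representatives["invalid_low"]) is False
assert is_valid_month(representatives["invalid_high"]) is False
```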
In white box testing, by contrast, checking for the expected results may require
evaluating internal intermediate values rather than just the output interface.
Types of Equivalence Classes
• Continuous classes run from one point to another, with no clear separations of
values. An example is a temperature range.
• Discrete classes have clear separation of values. Discrete classes are sets, or
enumerations.
• Boolean classes have only two values, such as true/false, on/off, or yes/no. An
example is whether a checkbox is checked or unchecked.
Boundary value analysis
Introduction
Testing experience has shown that the boundaries of input ranges to a software
component are especially liable to defects. A programmer who has to implement, for
example, the range 1 to 12 for an input that stands for the months January to December
in a date will have a line in the code checking for this range. This may look like:
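A sketch of such a check in Python (the month parameter is illustrative):

```python
def accept_month(month):
    # Correct range check: the months January..December map to 1..12.
    return 1 <= month <= 12

print(accept_month(1), accept_month(12), accept_month(13))  # True True False
```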
But a common programming error is to check a wrong range, e.g. starting the
range at 0, by writing:
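The corresponding faulty check might look like this deliberately buggy sketch:

```python
def accept_month_buggy(month):
    # Defective range check: the range mistakenly starts at 0.
    return 0 <= month <= 12

# The defect shows only at the boundary: 0 is wrongly accepted.
print(accept_month_buggy(0))  # True, although 0 is not a valid month
```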
In a program with more complex range checks, such a problem may not be spotted
as easily as in the simple example above.
Applying boundary value analysis, you now have to select a test case on each side
of the boundary between two partitions. In the above example this would be 0 and 1 for
the lower boundary, as well as 12 and 13 for the upper boundary. Each of these pairs
consists of a "clean" and a "dirty" test case. A "clean" test case should give you a valid
operation result of your program. A "dirty" test case should lead to a correct and specified
input-error treatment, such as the limiting of values, the usage of a substitute value, or, in
the case of a program with a user interface, a warning and a request to enter correct data.
Boundary value analysis can thus yield six test cases: n-1, n, and n+1 for the lower limit,
and n-1, n, and n+1 for the upper limit.
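Those six values can be generated mechanically from the two limits (a sketch):

```python
def boundary_values(low, high):
    # For each limit n, take n-1, n and n+1; a set removes duplicates
    # when the range is very narrow.
    return sorted({low - 1, low, low + 1, high - 1, high, high + 1})

print(boundary_values(1, 12))  # [0, 1, 2, 11, 12, 13]
```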
A further set of boundaries has to be considered when you set up your test cases.
A solid testing strategy also has to consider the natural boundaries of the data types used
in the program. If you are working with signed values, this is especially the range around
zero (-1, 0, +1). Similar to the typical range-check faults, programmers tend to have
weaknesses in their programs in this range. For example, a division-by-zero problem may
arise where a zero value occurs although the programmer always assumed the range
started at 1, or a sign problem may appear when a value turns out to be negative in some
rare cases although it was always expected to be positive. Even if this critical natural
boundary lies clearly within an equivalence partition, it should lead to additional test cases
checking the range around zero. A further natural boundary is the lower and upper
limit of the data type itself. For example, an unsigned 8-bit value has the range 0 to 255. A
good test strategy would also check how the program reacts to inputs of -1 and 0 as well
as 255 and 256.
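Checking behaviour at a data type's natural limits might look like the following sketch (store_uint8 is a hypothetical unit under test):

```python
def store_uint8(value):
    # Hypothetical unit under test: accepts only values an unsigned
    # 8-bit type can hold.
    if not 0 <= value <= 255:
        raise ValueError("out of range for an unsigned 8-bit value")
    return value

# Clean and dirty cases at the type's natural boundaries.
for clean in (0, 255):
    assert store_uint8(clean) == clean
for dirty in (-1, 256):
    try:
        store_uint8(dirty)
    except ValueError:
        pass  # expected: the dirty value is rejected
    else:
        raise AssertionError(f"{dirty} was wrongly accepted")
print("boundary cases behave as specified")
```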
The tendency is to relate boundary value analysis more to so-called black box
testing, which strictly checks a software component at its interfaces, without
consideration of the internal structures of the software. But looking closer at the subject,
there are cases where it also applies to white box testing.
After determining the necessary test cases with equivalence partitioning and
subsequent boundary value analysis, it is necessary to define the combinations of the test
cases when there are multiple inputs to a software component.
Decision Table
Decision tables are a precise yet compact way to model complicated logic.
Decision tables, like if-then-else and switch-case statements, associate conditions with
actions to perform. But, unlike the control structures found in traditional programming
languages, decision tables can associate many independent conditions with several
actions in an elegant way.
Contents
• 1 Structure
• 2 Example
Structure
Decision tables are typically divided into four quadrants, as shown below:

    Conditions   Condition alternatives
    Actions      Action entries

Aside from the basic four quadrant structure, decision tables vary widely in the
way the condition alternatives and action entries are represented. Some decision tables
use simple true/false values to represent the alternatives to a condition (akin to if-then-
else), other tables may use numbered alternatives (akin to switch-case), and some tables
even use fuzzy logic or probabilistic representations for condition alternatives. In a
similar way, action entries can simply represent whether an action is to be performed
(check the actions to perform), or in more advanced decision tables, the sequencing of
actions to perform (number the actions to perform).
Example
The limited-entry decision table is the simplest to describe. The condition
alternatives are simple boolean values, and the action entries are check-marks,
representing which of the actions in a given column are to be performed.
Of course, this is just a simple example, but it demonstrates how decision tables can
scale to several conditions with many possibilities.
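A limited-entry decision table maps each combination of boolean condition alternatives to the actions to perform, which can be written down directly as a lookup table (an illustrative sketch with made-up printer-troubleshooting conditions):

```python
# Conditions: (printer_prints, red_light_flashing).
# Each column of the table becomes one entry mapping a combination
# of condition alternatives to the check-marked actions.
DECISION_TABLE = {
    (False, True):  ["check power cable", "check ink"],
    (False, False): ["check paper jam"],
    (True,  True):  ["check ink"],
    (True,  False): [],  # everything works: no action
}

def actions_for(printer_prints, red_light_flashing):
    return DECISION_TABLE[(printer_prints, red_light_flashing)]

print(actions_for(False, True))  # ['check power cable', 'check ink']
```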
Just as decision tables make it easy to audit control logic, decision tables demand
that a programmer think of all possible conditions. With traditional control structures, it is
easy to forget about corner cases, especially when the else statement is optional. Since
logic is so important to programming, decision tables are an excellent tool for designing
control logic. In one incredible anecdote, after a failed 6 man-year attempt to describe
program logic for a file maintenance system using flow charts, four people solved the
problem using decision tables in just four weeks. Choosing the right tool for the problem
is fundamental.
System testing
System testing is testing conducted on a complete, integrated system to evaluate
the system's compliance with its specified requirements. System testing falls within the
scope of black box testing, and as such, should require no knowledge of the inner design
of the code or logic. [1]
As a rule, system testing takes, as its input, all of the "integrated" software
components that have successfully passed integration testing and also the software
system itself integrated with any applicable hardware system(s). The purpose of
integration testing is to detect any inconsistencies between the software units that are
integrated together (called assemblages) or between any of the assemblages and the
hardware. System testing is a more limiting type of testing; it seeks to detect defects both
within the "inter-assemblages" and also within the system as a whole.
Contents
• 1 Testing the whole system
• 2 Types of system testing
One could view System testing as the final destructive testing phase before user
acceptance testing.
Types of system testing
The following examples are different types of testing that should be considered
during system testing:
Usability testing
Rather than showing users a rough draft and asking, "Do you understand this?",
usability testing involves watching people trying to use something for its intended
purpose. For example, when testing instructions for assembling a toy, the test subjects
should be given the instructions and a box of parts. Instruction phrasing, illustration
quality, and the toy's design all affect the assembly process.
Contents
• 1 What to measure
• 2 See also
• 3 External links
What to measure
Usability testing generally involves measuring how well test subjects respond in four
areas: time, accuracy, recall, and emotional response. The results of the first test can be
treated as a baseline or control measurement; all subsequent tests can then be compared
to the baseline to indicate improvement.
• Time on Task -- How long does it take people to complete basic tasks? (For
example, find something to buy, create a new account, and order the item.)
• Accuracy -- How many mistakes did people make? (And were they fatal or
recoverable with the right information?)
• Recall -- How much does the person remember afterwards or after periods of non-
use?
• Emotional Response -- How does the person feel about the tasks completed?
(Confident? Stressed? Would the user recommend this system to a friend?)
In the early 1990s, Jakob Nielsen, at that time a researcher at Sun Microsystems,
popularized the concept of using numerous small usability tests -- typically with only five
test subjects each -- at various stages of the development process. His argument is that,
once it is found that two or three people are totally confused by the home page, little is
gained by watching more people suffer through the same flawed design. "Elaborate
usability tests are a waste of resources. The best results come from testing no more than 5
users and running as many small tests as you can afford."[2] Nielsen subsequently
published his research and coined the term heuristic evaluation.
The claim of "Five users is enough" was later described by a mathematical model
(Virzi, R.A., Refining the Test Phase of Usability Evaluation: How Many Subjects is
Enough? Human Factors, 1992. 34(4): p. 457-468.) which states for the proportion of
uncovered problems U
U = 1 − (1 − p)^n
where p is the probability of one subject identifying a specific problem and n the number
of subjects (or test sessions). This model shows up as an asymptotic graph towards the
number of real existing problems (see figure below).
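The asymptotic behaviour is easy to see numerically; a value around p = 0.31 is often quoted in this literature, but here it is only an illustrative assumption:

```python
def proportion_found(p, n):
    # Virzi's model: U = 1 - (1 - p)**n, the expected proportion of
    # problems uncovered with n subjects, each finding a given
    # problem with probability p.
    return 1 - (1 - p) ** n

for n in (1, 3, 5, 10):
    print(n, round(proportion_found(0.31, n), 2))
```

Under this assumption, five subjects already uncover roughly 84% of the problems, and each further subject adds less.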
In later research, Nielsen's claim has been questioned with both empirical
evidence[3] and more advanced mathematical models (Caulton, D.A., Relaxing the
homogeneity assumption in usability testing. Behaviour & Information Technology,
2001. 20(1): p. 1-7.). Two of the key challenges to this assertion are: (1) since usability is
related to the specific set of users, such a small sample size is unlikely to be
representative of the total population so the data from such a small sample is more likely
to reflect the sample group than the population they may represent and (2) many usability
problems encountered in testing are likely to prevent exposure of other usability
problems, making it impossible to predict the percentage of problems that can be
uncovered without knowing the relationship between existing problems. Most researchers
today agree that, although 5 users can generate a significant amount of data at any given
point in the development cycle, in many applications a sample size larger than five is
required to detect a satisfactory number of usability problems.
Bruce Tognazzini advocates close-coupled testing: "Run a test subject through the
product, figure out what's wrong, change it, and repeat until everything works. Using this
technique, I've gone through seven design iterations in three-and-a-half days, testing in
the morning, changing the prototype at noon, testing in the afternoon, and making more
elaborate changes at night."[4] This testing can be useful in research situations.
Load testing
Load testing is the process of creating demand on a system or device and
measuring its response.
Load testing generally refers to the practice of modeling the expected usage of a
software program by simulating multiple users accessing the program's services
concurrently. As such, this testing is most relevant for multi-user systems, often built
using a client/server model, such as web servers. However, other types of software
systems can be load-tested also. For example, a word processor or graphics editor can be
forced to read an extremely large document; or a financial package can be forced to
generate a report based on several years' worth of data. The most accurate load testing
occurs with actual, rather than theoretical, results.
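Concurrent users can be simulated with a thread pool (a sketch; perform_request stands in for a real call into the system under test):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def perform_request(request_id):
    # Stand-in for a real client call; returns the observed latency.
    start = time.perf_counter()
    time.sleep(0.01)  # simulated service time
    return time.perf_counter() - start

def run_load(concurrent_users=20, requests_per_user=5):
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(perform_request, range(total)))
    return sum(latencies) / total, max(latencies)

average, worst = run_load()
print(f"average {average:.3f}s, worst {worst:.3f}s")
```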
When the load placed on the system is raised beyond normal usage patterns, in
order to test the system's response at unusually high or peak loads, it is known as stress
testing. The load is usually so great that error conditions are the expected result, although
no clear boundary exists when an activity ceases to be a load test and becomes a stress
test.
There is little agreement on what the specific goals of load testing are. The term is
often used synonymously with performance testing, reliability testing, and volume
testing.
Volume testing
Volume testing belongs to the group of non-functional tests, terms which are often
misunderstood and/or used interchangeably. Volume testing refers to testing a software
application for a certain data volume. This volume can in generic terms be the database
size or it could also be the size of an interface file that is the subject of volume testing.
For example, if you want to volume test your application with a specific database size,
you will explode your database to that size and then test the application's performance on
it. Another example could be when there is a requirement for your application to interact
with an interface file (could be any file such as .dat, .xml); this interaction could be
reading and/or writing on to/from the file. You will create a sample file of the size you
want and then test the application's functionality with that file to check performance.
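Building an interface file of a chosen size for such a test can be sketched as follows (the record content is arbitrary):

```python
import os
import tempfile

def make_volume_file(size_bytes, record=b"sample record\n"):
    # Write a throwaway interface file of at least the requested size,
    # to be fed to the application under test.
    with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as f:
        written = 0
        while written < size_bytes:
            f.write(record)
            written += len(record)
    return f.name

path = make_volume_file(1_000_000)  # roughly a 1 MB input file
print(os.path.getsize(path) >= 1_000_000)  # True
os.remove(path)
```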
Stress testing
Stress testing is a form of testing that is used to determine the stability of a given
system or entity. It involves testing beyond normal operational capacity, often to a
breaking point, in order to observe the results. Stress testing may have a more specific
meaning in certain industries.
Contents
• 1 IT industry
• 2 Medicine
• 3 Financial sector
IT industry
In software testing, stress testing often refers to tests that put a greater emphasis
on robustness, availability, and error handling under a heavy load, than on what would be
considered correct behavior under normal circumstances. In particular, the goals of such
tests may be to ensure the software doesn't crash in conditions of insufficient
computational resources (such as memory or disk space), unusually high concurrency, or
denial of service attacks.
Examples:
• A web server may be stress tested using scripts, bots, and various denial of service
tools to observe the performance of a web site during peak loads.
Medicine
• A Cardiac stress test is used most commonly to detect marked imbalances in
blood flow to the heart muscle.
Financial sector
• Instead of doing financial projection on a "best estimate" basis, a company may
do stress testing where they look at how robust a financial instrument is in certain
crashes. They may test the instrument under, for example, the following stresses:
o What happens if the market crashes by more than x% this year?
o What happens if interest rates go up by at least y%?
o What if half the instruments in the portfolio terminate their contracts in the
5th year?
o What happens if oil prices rise by 200%?
This type of analysis has become increasingly widespread, and has been taken up
by various governmental bodies (such as the FSA in the UK) as a regulatory requirement
on certain financial institutions to ensure adequate capital allocation levels to cover
potential losses incurred during extreme, but plausible, events. This emphasis on
adequate, risk adjusted determination of capital has been further enhanced by
modifications to banking regulations such as Basel II. Stress testing models typically
allow not only the testing of individual stressors, but also combinations of different
events. There is also usually the ability to test the current exposure to a known historical
scenario (such as the Russian debt default in 1998 or 9/11 terrorist attacks) to ensure the
liquidity of the institution.
Sanity testing
A sanity test or sanity check is a basic test to quickly evaluate the validity of a
claim or calculation. In mathematics, for example, when multiplying by three or nine,
verifying that the sum of the digits of the result is a multiple of 3 or 9 respectively is a
sanity test.
Sanity tests are sometimes mistakenly equated to smoke tests. Where a distinction
is made between sanity testing and smoke testing, it is usually in one of two directions.
Either sanity testing is a focused but limited form of regression testing, narrow and
deep but cursory; or it is broad and shallow, like a smoke test, but concerned more with
the possibility of "insane behavior", such as slowing the entire system to a crawl or
destroying the database, while not being as thorough as a true smoke test.
With the evolution of test methodologies, sanity tests are useful both for initial
environment validation and future interactive increments. The process of sanity testing
begins with the execution of some online transactions of various modules, batch
programs of various modules to see whether the software runs without any hindrance or
abnormal termination. This practice can help identify most of the environment related
problems. A classic example of this in programming is the hello world program. If a
person has just set up a computer and a compiler, a quick sanity test can be performed to
see if the compiler actually works: write a program that simply displays the words "hello
world".
A sanity test can refer to various order of magnitude and other simple rule of thumb
devices applied to cross-check mathematical calculations. For example:
• If one were to attempt to square 738 and calculated 53,874, a quick sanity check
could show that this cannot be true: 500 < 738, yet 500^2 = 250,000 > 53,874.
Since squaring preserves inequality for positive numbers (see inequality), 738^2
must be greater than 250,000, so the calculation was bad.
• In multiplication, 918 x 155 is not 142,135, since 918 is divisible by three but
142,135 is not (its digits add up to 16, which is not a multiple of three).
• When talking about quantities in physics, the power output of a car cannot be 700
kJ since that is a unit of energy, not power (energy per unit time). See dimensional
analysis.
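The divisibility check in the multiplication example can be automated with a digital-root comparison (a sketch):

```python
def digital_root(n):
    # Repeatedly sum the decimal digits until one digit remains.
    n = abs(n)
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

def product_sanity_check(a, b, claimed):
    # Necessary (not sufficient) condition for a * b == claimed:
    # the digital roots must agree ("casting out nines").
    return digital_root(digital_root(a) * digital_root(b)) == digital_root(claimed)

print(product_sanity_check(918, 155, 142135))  # False: the claim is rejected
print(product_sanity_check(918, 155, 142290))  # True: consistent with 918 x 155
```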
Smoke testing
Smoke testing is a term used in plumbing, woodwind repair, electronics, and
computer software development. It refers to the first test made after repairs or first
assembly to provide some assurance that the system under test will not catastrophically fail.
After a smoke test proves that the pipes will not leak, the keys seal properly, the circuit
will not burn, or the software will not crash outright, the assembly is ready for more
stressful testing.
In software testing, a smoke test is a collection of written tests that are performed
on a system prior to being accepted for further testing. This is also known as a build
verification test. This is a "shallow and wide" approach to the application. The tester
"touches" all areas of the application without getting too deep, looking for answers to
basic questions like, "Can I launch the test item at all?", "Does it open to a window?",
"Do the buttons on the window do things?". There is no need to get down to field
validation or business flows. If you get a "no" answer to basic questions like these, then
the application is so badly broken that there is effectively nothing there to allow further
testing. These written tests can be performed either manually or using an automated tool.
When automated tools are used, the tests are often initiated by the same process that
generates the build itself.
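A software smoke test often reduces to a few shallow, wide checks of the kind listed above (a sketch; the Application class is a hypothetical stand-in for the build under test):

```python
class Application:
    # Hypothetical stand-in for the freshly built application.
    def launch(self):
        self.windows = ["main"]
        return True

    def has_window(self, name):
        return name in self.windows

def smoke_test():
    app = Application()
    # Can we launch the test item at all? Does it open to a window?
    assert app.launch(), "application failed to launch"
    assert app.has_window("main"), "main window did not open"
    # No field validation, no business flows: shallow and wide only.
    return "build verification passed"

print(smoke_test())
```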
Exploratory testing
Exploratory testing is an approach in software testing with simultaneous
learning, test design and test execution. While the software is being tested, the tester
learns things that, together with experience and creativity, generate good new tests to run.
Contents
• 1 History
• 2 Description
• 3 Benefits and drawbacks
• 4 Usage
History
Exploratory testing has been performed for a long time, and has similarities to ad
hoc testing. In the early 1990s, "ad hoc" was too often synonymous with sloppy and
careless work. As a result, a group of test methodologists (now calling themselves the
Context-Driven School) began using the term "exploratory" seeking to emphasize the
dominant thought process involved in unscripted testing, and to begin to develop the
practice into a teachable discipline. This new terminology was first published by Cem
Kaner in his book Testing Computer Software. Exploratory testing can be as disciplined
as any other intellectual activity.
Description
Exploratory testing seeks to find out how the software actually works, and to ask
questions about how it will handle difficult and easy cases. The testing depends on
the tester's skill in inventing test cases and finding defects. The more the tester knows
about the product and different test methods, the better the testing will be.
When performing exploratory testing, there are no exact expected results; it is the
tester that decides what will be verified, critically investigating the correctness of the
result.
Disadvantages are that the tests can't be reviewed in advance (and thereby cannot
prevent errors in code and test cases), and that it can be difficult to show exactly which tests have
been run.
When repeating exploratory tests, they will not be performed in the exact same
manner, which can be an advantage if it is important to find new errors, or a disadvantage
if it is more important to know that specific things are functional.
Usage
Exploratory testing is particularly suitable if requirements and specifications are
incomplete, or if there is a lack of time. The method can also be used to verify that previous
testing has found the most important defects. It is common to perform a combination of
exploratory and scripted testing where the choice is based on risk.
Regression testing
Regression testing is any type of software testing which seeks to uncover
regression bugs. Regression bugs occur whenever software functionality that previously
worked as desired stops working or no longer works in the same way that was previously
planned. Typically regression bugs occur as an unintended consequence of program
changes.
Common methods of regression testing include re-running previously run tests and
checking whether previously fixed faults have re-emerged.
Contents
• 1 Types of regression
• 2 Mitigating regression risk
• 3 Uses
Types of regression
• Local - changes introduce new bugs.
• Unmasked - changes unmask previously existing bugs.
• Remote - Changing one part breaks another part of the program. For example,
Module A writes to a database. Module B reads from the database. If changes to
what Module A writes to the database break Module B, it is remote regression.
• New feature regression - changes to code that is new to release 1.1 break other
code that is new to release 1.1.
• Existing feature regression - changes to code that is new to release 1.1 break code
that existed in release 1.0.
Uses
Regression testing can be used not only for testing the correctness of a program,
but it is also often used to track the quality of its output. For instance in the design of a
compiler, regression testing should track the code size, simulation time and compilation
time of the test suites.
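Tracking output quality alongside correctness can be sketched by comparing measured metrics against a stored baseline (the metric names and numbers are illustrative):

```python
BASELINE = {"code_size_bytes": 10_240, "compile_seconds": 1.8}

def find_regressions(measured, tolerance=0.05):
    # Report every tracked metric that worsened by more than the tolerance.
    return {
        metric: measured[metric]
        for metric, base in BASELINE.items()
        if measured[metric] > base * (1 + tolerance)
    }

print(find_regressions({"code_size_bytes": 10_300, "compile_seconds": 2.4}))
# {'compile_seconds': 2.4}
```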
Installation testing
Implementation testing, sometimes called installation testing, is typically
completed by the software testing engineer in conjunction with the configuration
manager. It is usually defined as testing which takes place by deploying the compiled
version of the code into the testing or pre-production environment, from which it may or
may not progress into production. This generally takes place outside of the
development environment, to limit code corruption from other future releases which may
reside on the development environment.
While the ideal installation might simply appear to be a matter of running an
install program, such software is typically delivered as a package whose setup program
acts as a multi-configuration wrapper, which may allow the software to be installed on a
variety of machines and/or operating environments. Every possible configuration should
receive extensive testing before it can be used with confidence.