
Version 20140307-1.0.

Handout 19 (CCIP)/Handout 25 (UEI) - Sampling Methods


There are four main reasons for selecting a sample as an inspector:
- Sampling allows an inspector to allocate resources to command-emphasis areas
- Saves time compared to trying to inspect the entire population
- Saves money compared to trying to inspect the entire population
- Analysis of a sample is less cumbersome and more practical than an analysis of the
entire population

The sampling process begins by defining the frame. The frame is the listing of items that
make up the population. Examples are wing members, squadron training records, number of
primary aircraft authorized, etc.
Figure 1 below shows the two broad categories of sampling and some commonly used
sampling methods under each one. The difference between them is that in probability sampling,
every item has a 'chance' of being selected, and that chance can be quantified. This is not true for
non-probability sampling, where some items may have no chance of being selected or their
chance of selection cannot be quantified.

Figure 1 Sampling categories and methods

The advantages and disadvantages of non-probability and probability sampling are
summarized below.

BIAS
In the above chart on non-probability versus probability sampling, you will notice one of the
disadvantages of non-probability sampling is selection bias. So what is bias and why do you care
about it as an inspector?
Bias is a general statistical term meaning a systematic (not random) deviation from the
true value. A bias of a measurement or a sampling procedure may pose a more serious problem
for the inspector than random errors because it cannot be reduced by mere increase in sample
size and averaging the outcomes (Berenson, Levine, Krehbiel, 2009).
For example, suppose your wing commander wants to know the wing members' overall
opinion of medical services provided by the medical group. A random sample of 200 individuals
has been drawn from within the medical group. If medical group personnel's opinions about the
quality of medical services differ from those of the rest of the wing, then such a poll is biased. Even if we
increase the sample size of medical group personnel to 500, the systematic error is still the same.
The sampling strategy introduced bias, as it is quite possible that unit as well as professional pride
would create a deviation between medical group opinion and the opinion of the total base
population. The deviation is equal to the difference between the opinion of the medical group
and the opinion of the whole wing population.
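The effect described in this example can be illustrated with a small simulation. The sketch below is only an illustration: the group sizes (600 medical group members in a 5,000-member wing) and the satisfaction rates (90% and 60%) are hypothetical assumptions, not data from any real wing. It shows that enlarging a sample drawn only from the medical group does not move the estimate toward the wing-wide value.

```python
import random

rng = random.Random(19)  # fixed seed only so the sketch is repeatable

# Hypothetical wing: 1 = satisfied with medical services, 0 = not satisfied.
# All sizes and rates below are illustrative assumptions.
medical_group = [1] * 540 + [0] * 60        # 600 members, 90% satisfied
rest_of_wing = [1] * 2640 + [0] * 1760      # 4,400 members, 60% satisfied
wing = medical_group + rest_of_wing         # 5,000 members in total

true_rate = sum(wing) / len(wing)


def estimate(frame, n):
    """Satisfaction rate estimated from a simple random sample of n items."""
    sample = rng.sample(frame, n)
    return sum(sample) / n


biased_200 = estimate(medical_group, 200)   # frame limited to the medical group
biased_500 = estimate(medical_group, 500)   # larger sample, same flawed frame
unbiased_200 = estimate(wing, 200)          # frame covers the whole wing

print(f"true wing-wide rate:         {true_rate:.3f}")
print(f"medical group only, n = 200: {biased_200:.3f}")
print(f"medical group only, n = 500: {biased_500:.3f}")
print(f"whole wing, n = 200:         {unbiased_200:.3f}")
```

Under these assumptions, both medical-group samples land near 90% regardless of size, while the wing-wide sample lands near the true rate. The gap comes from the frame, not from the sample size.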
This is important to you as an inspector because sampling bias leads to a systematic
distortion of the estimate of the sampled probability distribution. In layman's terms, the answer
you got from the sample does not reflect the real answer for the population. This means you
could provide misleading information (good or bad) to your wing commander. This could result
in your wing commander addressing a problem that doesn't really exist or whose severity is minimal.
Or, your wing commander may not take any corrective action when in reality some action is
required. Also, referring back to Handout 2, Quality Standards for Inspection and Evaluation,
there are numerous statements saying that inspectors and inspection findings should be free from
bias or be unbiased.
Sometimes due to resource constraints, you will be forced to use non-probability
sampling techniques. In those cases, you must remember your sample will contain some sort of
selection bias, the resulting answers cannot be used for statistical inference, and you must clearly
communicate the limitations of the information to your wing commander.

Probability Sampling
Probability sampling involves the selection of a sample from a population, based on the
principle of randomization or chance. Probability sampling is more complex, more time-consuming
and usually more costly than non-probability sampling. However, because items from
the population are randomly selected and each item's probability of inclusion can be calculated,
reliable estimates can be produced along with estimates of the sampling error, and inferences can
be made about the population.
The following are some common probability sampling methods:
- simple random sampling
- systematic sampling
- stratified sampling
- cluster sampling
Simple random sampling
In simple random sampling, each member of a population has an equal chance of being included
in the sample. Also, each combination of members of the population has an equal chance of
composing the sample. Those two properties are what define simple random sampling. To select
a simple random sample, you need to list all of the items in the survey population.
Example 1: To draw a simple random sample of personnel from a wing, each member would
need to be numbered sequentially. If there were 5,000 members in the wing and if the sample
size were 360, then 360 distinct numbers between 1 and 5,000 would need to be randomly generated by a
computer. Each number would have the same chance of being generated by the computer (in order
to fulfill the simple random sampling requirement of an equal chance for every item). The 360 wing
members corresponding to the 360 computer-generated random numbers would make up the
sample.
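Example 1 can be sketched in a few lines of Python. This is a minimal illustration that assumes the wing members have already been numbered 1 through 5,000; the variable names and the fixed seed are illustrative choices, not part of the handout.

```python
import random

rng = random.Random(2014)  # fixed seed only so the sketch is repeatable

population_size = 5000  # wing members numbered 1 through 5,000
sample_size = 360

# random.sample draws without replacement, so no member is selected twice
# and every member has the same chance of inclusion.
selected = sorted(rng.sample(range(1, population_size + 1), sample_size))

print(selected[:10])  # first few selected member numbers
```

Drawing without replacement matters here: generating 360 independent random numbers could repeat a member and leave you with fewer than 360 people.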
Simple random sampling is the easiest method of sampling and it is the most commonly
used. Advantages of this technique are that it does not require any additional information on the
frame (such as geographic areas) other than the complete list of members of the survey
population along with information for contact. Also, since simple random sampling is a simple
method and the theory behind it is well established, standard formulas exist to determine the
sample size, the estimates and so on, and these formulas are easy to use.
On the other hand, this technique makes no use of auxiliary information present on the
frame (e.g., number of employees in each business) that could make the design of the sample
more efficient. And although it is easy to apply simple random sampling to small populations, it
can be expensive and unfeasible for large populations because all elements must be identified
and labeled prior to sampling.
Systematic sampling
Sometimes called interval sampling, systematic sampling means that there is a gap, or
interval, between each selected item in the sample. In order to select a systematic sample, you
need to follow these steps:
1. Number the items on your frame from 1 to N (where N is the total population size).
2. Determine the sampling interval (K) by dividing the number of items in the population by the
desired sample size. For example, to select a sample of 100 from a population of 400, you would
need a sampling interval of 400 / 100 = 4. Therefore, K = 4. You will need to select one item out
of every four items to end up with a total of 100 items in your sample.
3. Select a number between one and K at random. This number is called the random start and
would be the first number included in your sample. Using the example above, you would select a
number between 1 and 4 from a table of random numbers or a random number generator. If you
choose 3, the third item on your frame would be the first item included in your sample; if you
choose 2, your sample would start with the second item on your frame.
4. Select every Kth (in this case, every fourth) item after that first number. For example, the
sample might consist of the following items to make up a sample of 100: 3 (the random start), 7,
11, 15, 19...395, 399 (up to N, which is 400 in this case).
Using the example above, you can see that with a systematic sample approach there are only four
possible samples that can be selected, corresponding to the four possible random starts:
1, 5, 9, 13... 393, 397
2, 6, 10, 14... 394, 398
3, 7, 11, 15... 395, 399
4, 8, 12, 16... 396, 400
Each member of the population belongs to only one of the four samples and each sample
has the same chance of being selected. From that, we can see that each item has a one in four
chance of being selected in the sample. This is the same probability as if a simple random
sampling of 100 items was selected. The main difference is that with simple random sampling,
any combination of 100 items would have a chance of making up the sample, while with
systematic sampling, there are only four possible samples. How precise systematic sampling is
compared with simple random sampling therefore depends on the population's order on the
frame, which determines the possible samples for systematic sampling. If the population is randomly
distributed on the frame, then systematic sampling should yield results that are similar to simple
random sampling.
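The four possible samples described above can be enumerated directly. The sketch below uses the handout's values (N = 400, K = 4); the variable names are illustrative. It also checks that every item falls in exactly one of the four samples, which is why each item has a one in four chance of selection.

```python
N = 400  # population size
K = 4    # sampling interval

# One possible sample per random start (1 through K).
possible_samples = {start: list(range(start, N + 1, K)) for start in range(1, K + 1)}

for start, sample in possible_samples.items():
    print(start, sample[:4], "...", sample[-2:], f"({len(sample)} items)")

# Every item appears in exactly one of the K samples, so its chance of
# selection is 1 / K when the start is chosen at random.
all_items = sorted(item for sample in possible_samples.values() for item in sample)
assert all_items == list(range(1, N + 1))
```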
This method is often used in industry, where an item is selected for testing from a
production line to ensure that machines and equipment are of a standard quality. For example, a
tester in a manufacturing plant might perform a quality check on every 20th product in an
assembly line. The tester might choose a random start between the numbers 1 and 20. This will
determine the first product to be tested; every 20th product will be tested thereafter.
Example 2: Imagine you have to conduct a survey on base housing for your wing. Your wing has
an adult base housing population of 1,000 and you want to take a systematic sample of 200 adult
base residents. In order to do this, you must first determine what your sampling interval (K)
would be:
Total population (N) / sample size (n) = sampling interval (K)
N / n = K
1,000 / 200 = K
5 = K
To begin this systematic sample, all adult base residents would have to be assigned sequential
numbers. The starting point would be chosen by selecting a random number between 1 and 5. If
this number were 3, then the 3rd resident on the list would be selected along with every 5th
resident thereafter. The sample of residents would be those corresponding to resident numbers 3,
8, 13, 18, 23....
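The steps in Example 2 can be sketched as a short function. This is a minimal illustration, assuming the population size is an exact multiple of the sample size (as it is in the handout's examples); the function name and parameters are hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return the 1-based frame positions chosen by systematic sampling.

    Assumes population_size is an exact multiple of sample_size, as in the
    examples in this handout.
    """
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    interval = population_size // sample_size      # K = N / n
    if start is None:
        start = rng.randint(1, interval)           # random start between 1 and K
    if not 1 <= start <= interval:
        raise ValueError("start must be between 1 and K")
    return list(range(start, population_size + 1, interval))


residents = systematic_sample(1000, 200, start=3)
print(residents[:5], len(residents))  # [3, 8, 13, 18, 23] 200
```

Passing `start=3` reproduces the example; leaving `start` out lets the function pick the random start itself.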
The advantages of systematic sampling are that the sample selection could not be easier (you only
get one random number, the random start, and the rest of the sample automatically follows)
and that the sample is distributed evenly over the listed population. The biggest drawback of the
systematic sampling method is that if there is some cycle in the way the population is arranged
on a list and if that cycle coincides in some way with the sampling interval, the possible samples
may not be representative of the population.
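The drawback described above can be made concrete with a deliberately extreme, hypothetical frame: 1,000 housing units listed so that every 5th unit has a discrepancy, sampled with an interval of K = 5. The data pattern is an assumption chosen to make the cycle line up exactly with the interval.

```python
N = 1000   # units on the frame
K = 5      # sampling interval (N / n with n = 200)

# Hypothetical frame: every 5th unit on the list has a discrepancy (1),
# all other units do not (0). The pattern repeats with the same period as K.
frame = [1 if position % 5 == 0 else 0 for position in range(1, N + 1)]
true_rate = sum(frame) / N

rates = {}
for start in range(1, K + 1):
    sample = frame[start - 1::K]            # positions start, start + K, ...
    rates[start] = sum(sample) / len(sample)
    print(f"random start {start}: sample rate {rates[start]:.0%} (true rate {true_rate:.0%})")
```

With this frame, no random start gives the true 20% rate: four starts report 0% and one reports 100%. Real frames are rarely this regular, but the sketch shows why you should check how a list is ordered before choosing an interval.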
Stratified sampling
Using stratified sampling, the population is divided into homogeneous, mutually
exclusive groups called strata, and then independent samples are selected from each stratum.
Any of the sampling methods mentioned in this lesson can be used to sample within each
stratum. The sampling method can vary from one stratum to another. When simple random
sampling is used to select the sample within each stratum, the sample design is called stratified
simple random sampling. A population can be stratified by any variable that is available for all
items on the sampling frame prior to sampling (e.g., rank, age, sex, on base/off base,
Active/Guard/Reserve, income, etc.).

Figure 2 Example of Strata used in Stratified Sampling


Why do we need to create strata? There are many reasons, the main one being that it can
make the sampling strategy more efficient. It was mentioned earlier that you need a larger
sample to get a more accurate estimation of a characteristic that varies greatly from one item to
the other than for a characteristic that does not. For example, if every person in a population had
the same salary, then a sample of one individual would be enough to get a precise estimate of the
average salary.
This is the idea behind the efficiency gain obtained with stratification. If you create strata
within which items share similar characteristics (e.g., income) and are considerably different
from items in other strata (e.g., occupation, type of dwelling) then you would only need a small
sample from each stratum to get a precise estimate of total income for that stratum. Then you
could combine these estimates to get a precise estimate of total income for the whole population.
If you were to use a simple random sampling approach in the whole population without
stratification, the sample would need to be larger than the total of all stratum samples to get an
estimate of total income with the same level of precision.
Stratified sampling ensures an adequate sample size for sub-groups in the population of
interest. When a population is stratified, each stratum becomes an independent population and
you will need to decide the sample size for each stratum.
Example 3: Suppose you need to determine how many focus groups by rank strata you
need to get an understanding of the perceptions, attitudes and beliefs in your wing. In order to
select a stratified simple random sample, you need to follow these steps:
1. Determine the total wing population: 5,000 members.
2. Determine the subpopulations based on your rank strata.
a. Field grade officers - 400
b. Company grade officers - 600
c. Senior NCOs - 500
d. NCOs - 1,700
e. Airmen - 1,800
3. Determine the proportional percentage of each subpopulation.
a. Field grade officers - 400/5,000 = 8%
b. Company grade officers - 600/5,000 = 12%
c. Senior NCOs - 500/5,000 = 10%
d. NCOs - 1,700/5,000 = 34%
e. Airmen - 1,800/5,000 = 36%
4. Assuming your total sample size needed was 360 members, multiply the proportional
percentage for each rank stratum by the total sample size to determine the
subpopulation sample size, rounding to the nearest whole number.
a. Field grade officers - 360 * 8% = 28.8, rounded to 29
b. Company grade officers - 360 * 12% = 43.2, rounded to 43
c. Senior NCOs - 360 * 10% = 36
d. NCOs - 360 * 34% = 122.4, rounded to 122
e. Airmen - 360 * 36% = 129.6, rounded to 130
5. Select a simple random sample from each subpopulation based on the numbers in step 4.
This will give you a stratified simple random sample of 360 members.
6. Assuming you want to keep your focus groups to a maximum of 10 participants/group,
you would use the numbers in step 4 to determine how many focus groups you need for
each rank stratum, rounding up to the next whole group.
a. Field grade officers - 29/10 = 3 groups
b. Company grade officers - 43/10 = 5 groups
c. Senior NCOs - 36/10 = 4 groups
d. NCOs - 122/10 = 13 groups
e. Airmen - 130/10 = 13 groups
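The arithmetic in steps 3, 4 and 6 can be reproduced with a short script. This is a sketch using the figures from Example 3; the variable names are illustrative. Note that rounding each stratum separately happens to sum to exactly 360 here, which is not guaranteed for other populations.

```python
import math

strata = {
    "Field grade officers": 400,
    "Company grade officers": 600,
    "Senior NCOs": 500,
    "NCOs": 1700,
    "Airmen": 1800,
}
total_sample = 360
max_group_size = 10

population = sum(strata.values())  # 5,000

allocation = {}
groups = {}
for name, size in strata.items():
    share = size / population                                       # step 3
    allocation[name] = round(total_sample * size / population)      # step 4
    groups[name] = math.ceil(allocation[name] / max_group_size)     # step 6, rounded up
    print(f"{name:24s} {share:4.0%}  sample {allocation[name]:3d}  groups {groups[name]:2d}")

print("total sample:", sum(allocation.values()))
```

If the rounded allocations did not sum to the target, you would need to adjust one or more strata by hand.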

Stratification is most useful when the stratifying variables are
- simple to work with,
- easy to observe, and
- closely related to the topic of interest.

A word of caution on the next technique you are about to read. This technique is
only estimating the probability of missing a potentially important perception, belief or attitude. It
is not estimating the percent of a target population who hold a particular perception, belief
or attitude. It will tell you that you have issues, but it will not tell you how widespread the
issues are. It will discover issues, but not measure the issues. Once you have uncovered the set of
perceptions, beliefs or attitudes within an organization using this technique, you would then have
to perform additional data collection and analysis to determine how widespread or important the
individual issues are within the organization.
An alternative method for determining the number of focus groups is to randomly select
30 members from each rank stratum and then divide each group of 30 into 3 focus groups of 10.
Based on the rank strata in Example 3, you would create 15 focus groups of 10 people
(150 personnel total). Your focus groups would look like this:
a. Field grade officers - 3 x 10-person focus groups
b. Company grade officers - 3 x 10-person focus groups
c. Senior NCOs - 3 x 10-person focus groups
d. NCOs - 3 x 10-person focus groups
e. Airmen - 3 x 10-person focus groups

Choosing 30 from each group means there is less than a 5% chance that you have missed an
attitude, perception, or belief with an incidence rate of 10% within the population. Also, if you
felt gender might play a role in the issues identified, you would also need to create six
additional focus groups (3 groups of 10 men, and 3 groups of 10 women). The complete logic
behind this approach and its proper applications are contained in the article "Sample Size for
Qualitative Research" by Peter DePaulo in Attachment 1. A sample size calculator based on this
approach is in Attachment 2.
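One way to check the "less than a 5% chance" figure is the calculation below. It is a sketch that assumes each of the 30 people is selected at random and independently, so the chance that a given person does not hold a perception with a 10% incidence rate is 0.9; the function names are hypothetical and this is not the calculator in Attachment 2.

```python
import math


def miss_probability(incidence, n):
    """Chance that none of n randomly chosen people holds a perception
    held by a fraction `incidence` of the population."""
    return (1 - incidence) ** n


def minimum_sample_size(incidence, max_miss):
    """Smallest n for which the chance of missing the perception is at most max_miss."""
    return math.ceil(math.log(max_miss) / math.log(1 - incidence))


print(f"{miss_probability(0.10, 30):.3f}")   # chance of a miss with 30 people
print(f"{miss_probability(0.10, 10):.3f}")   # chance of a miss with only 10 people
print(minimum_sample_size(0.10, 0.05))       # people needed to stay at or under 5%
```

Under these assumptions, 30 people give a miss probability of roughly 4%, while a single group of 10 would miss such a perception about a third of the time.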
******************************************************************************
Cluster sampling
Sometimes it is too expensive to spread a sample across the population as a whole. Travel
costs can become expensive if interviewers have to survey people from one end of the country to
the other. To reduce costs, statisticians may choose a cluster sampling technique.
Cluster sampling divides the population into groups or clusters. A number of clusters are
selected randomly to represent the total population, and then all items within selected clusters are
included in the sample. If clusters are large, a probability-based sample taken from a single
cluster is all that is needed. No items from non-selected clusters are included in the sample; they
are represented by those from selected clusters. This differs from stratified sampling, where
some items are selected from each group.
Examples of clusters are squadrons (fighter, communication, comptroller, etc.), groups
(operations, maintenance, medical, logistics) and geographic areas such as housing, flight line,
north base, south base, etc. The selected clusters are used to represent the population.

Example 4: Suppose your wing commander wants to find out the general readiness of personal
mobility bags across the wing. It would be too costly and lengthy to inspect every personal
mobility bag in the wing. Instead, 10 squadrons are randomly selected from all over the wing.
These squadrons provide clusters of samples. Then every personal mobility bag in all 10 clusters
is inspected. In effect, the bags in these clusters represent all bags in the wing.
As mentioned, cost reduction is a reason for using cluster sampling. It creates 'pockets' of
sampled items instead of spreading the sample over the whole population. Another reason is that
sometimes a list of all items in the population (a requirement when conducting simple random
sample, systematic sample or sampling with probability proportional to size) is not available,
while a list of all clusters is either available or easy to create.
In most cases, the main drawback is a loss of efficiency when compared with simple
random sampling. It is usually better to survey a large number of small clusters instead of a small
number of large clusters. This is because neighboring items tend to be more alike, resulting in a
sample that does not represent the whole spectrum of opinions or situations present in the overall
population. In the previous example, the readiness of personal mobility bags in the same
squadron or group may be similar due to leadership emphasis within that cluster, deployment
taskings, etc.
Another drawback to cluster sampling is that you do not have total control over the final
sample size. Since not all squadrons have the same number of people and you must inspect every
bag in your sample, the final sample size may be larger or smaller than you expected or needed.
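Example 4 and the sample-size drawback just described can be sketched together. The squadron names and bag counts below are hypothetical; only the design (randomly select 10 squadrons, then inspect every bag in them) comes from the example.

```python
import random

rng = random.Random(4)  # fixed seed only so the sketch is repeatable

# Hypothetical frame of clusters: 30 squadrons with differing numbers of bags.
squadron_bags = {f"Squadron {i:02d}": 40 + 7 * (i % 9) for i in range(1, 31)}

# Stage 1: randomly select 10 squadrons (the clusters).
selected_squadrons = rng.sample(sorted(squadron_bags), 10)

# Stage 2: inspect every bag in each selected squadron.
bags_to_inspect = sum(squadron_bags[name] for name in selected_squadrons)

# The workload depends on which squadrons happen to be drawn.
smallest = sum(sorted(squadron_bags.values())[:10])
largest = sum(sorted(squadron_bags.values())[-10:])
print(f"bags to inspect this draw: {bags_to_inspect}")
print(f"possible range across draws: {smallest} to {largest}")
```

Only the list of squadrons is needed as a frame; no wing-wide list of individual bags is required. The range printed at the end shows how much the final sample size can vary from one draw to another.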

Non-probability Sampling
The difference between probability and non-probability sampling has to do with a basic
assumption about the nature of the population under study. In probability sampling, every item
has a chance of being selected. In non-probability sampling, there is an assumption that there is
an even distribution of characteristics within the population. This is what makes the
inspector believe that any sample would be representative and because of that, results will be
accurate. For probability sampling, randomization is a feature of the selection process, rather
than an assumption about the structure of the population.
In non-probability sampling, since elements are chosen arbitrarily, there is no way to
estimate the probability of any one element being included in the sample. Also, no assurance is
given that each item has a chance of being included, making it impossible either to estimate
sampling variability or to identify possible bias.
Reliability cannot be measured in non-probability sampling; the only way to address data
quality is to compare some of the survey results with available information about the population.
Still, there is no assurance that the estimates will meet an acceptable level of error. Statisticians
are reluctant to use these methods because there is no way to measure the precision of the
resulting sample.
Despite these drawbacks, non-probability sampling methods can be useful when
descriptive comments about the sample itself are desired. Secondly, they are quick, inexpensive
and convenient. There are also other circumstances, such as in applied social research, when it is
unfeasible or impractical to conduct probability sampling.
Most non-probability sampling methods require some effort and organization to complete, but others,
like convenience sampling, are done casually and do not need a formal plan of action. The most
common types are listed below:
- purposive sampling
- convenience or haphazard sampling
- volunteer sampling
- judgment sampling
- quota sampling

Purposive sampling
Purposive sampling involves taking a sample with a specific purpose or objective in
mind. For example, under current fiscal constraints, IG organizations do not have the manpower
and money to inspect all programs and associated items. This requires them to select a sample of
programs for inspection that provides the greatest return on inspection investment dollar. Some
factors that will drive this purposive sampling approach will be:
- Special Interest Items (SIIs)
- Command-interest areas or Command-emphasis areas
- Likelihood of program failure
- Impact on the mission or people if a program fails
- Negative indicators that there may be issues with a program

Based on these factors, the IG organization should strive to select a purposive sample with the
objective being to select a sample that contains only high risk programs where there are negative
indicators so that there is a maximum return on inspection dollars.
Convenience or haphazard sampling
Convenience sampling is sometimes referred to as haphazard or accidental sampling. It is
not normally representative of the target population because sample items are only selected if
they can be accessed easily and conveniently.
There are times when the average person uses convenience sampling. A food critic, for
example, may try several appetizers or entrees to judge the quality and variety of a menu. And
television reporters often seek so-called 'people-on-the-street' interviews to find out how people
view an issue. In both these examples, the sample is chosen haphazardly, without use of a specific
sampling method.
The obvious advantage is that the method is easy to use, but that advantage is greatly
offset by the presence of bias. Although useful applications of the technique are limited, it can
deliver accurate results when the population is homogeneous.
For example, a scientist could use this method to determine whether a lake is polluted.
Assuming that the lake water is well-mixed, any sample would yield similar information. A
scientist could safely draw water anywhere on the lake without fretting about whether or not the
sample is representative.
Examples of convenience sampling include:
- the mobility bags sitting in the first row of a mobility warehouse
- the first 100 military members to enter the mobility processing line
- the first 50 customers to go through the chow hall
- the first 10 travel vouchers processed that day
Volunteer sampling
As the term implies, this type of sampling occurs when people volunteer their services for
the study. In psychological experiments or pharmaceutical trials (drug testing), for example, it
would be difficult and unethical to enlist random participants from the general public. In these
instances, the sample is taken from a group of volunteers. Sometimes, the inspector offers
payment to entice respondents. In exchange, the volunteers accept the possibility of a lengthy,
demanding or sometimes unpleasant process.
Sampling voluntary participants as opposed to the general population may introduce
strong biases. Often in opinion polling, only the people who care strongly enough about the
subject one way or another tend to respond. The silent majority does not typically respond,
resulting in large selection bias. Television and radio media often use call-in polls to informally
query an audience on their views. Oftentimes, there is no limit imposed on the frequency or
number of calls one respondent can make. So, unfortunately, a person might be able to vote
repeatedly. It should also be noted that the people who contribute to these surveys might have
different views than those who do not.
Judgment sampling
This approach is used when you want a quick sample and you believe you are able to
select a sufficiently representative sample for your purposes. You will use your own judgment to
select what seems like an appropriate sample.
This method is highly liable to bias and error as the inspector makes inexpert
judgment and selection. You probably have to be experienced in research methods before you can
make a fair judgment about the right sample. Judgment sampling is often a last-resort method
that may be used when there is no time to do a proper study. In qualitative research, it is common
and can be appropriate as the inspector explores anthropological situations where the
discovery of meaning can benefit from an intuitive approach.
Quota sampling
This is one of the most common forms of non-probability sampling. Sampling is done
until a specific number of items (quotas) for various sub-populations have been selected. Since
there are no rules as to how these quotas are to be filled, quota sampling is really a means for
satisfying sample size objectives for certain sub-populations.
The quotas may be based on population proportions. For example, if there are 100 men
and 100 women in a population and a sample of 20 are to be drawn to participate in a cola taste
challenge, you may want to divide the sample evenly between the sexes: 10 men and 10
women. Quota sampling can be considered preferable to other forms of non-probability sampling
(e.g., judgment sampling) because it forces the inclusion of members of different subpopulations.
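The cola taste challenge can be sketched as follows. The arrival order is a hypothetical assumption; the point of the sketch is that quotas are filled by whoever shows up first, with no random selection.

```python
from collections import Counter

# Hypothetical arrival order at the taste-challenge booth: (person id, sex).
# People arrive in the repeating pattern M, M, F, F: 100 men and 100 women.
arrivals = [(i, "M" if i % 4 in (1, 2) else "F") for i in range(1, 201)]

quotas = {"M": 10, "F": 10}
filled = Counter()
selected = []

# Take people in the order they happen to arrive until each quota is full.
# No random selection is involved, which is the source of selection bias.
for person_id, sex in arrivals:
    if filled[sex] < quotas[sex]:
        selected.append(person_id)
        filled[sex] += 1
    if filled == Counter(quotas):
        break

print(dict(filled))
print(selected)
```

The quotas are met (10 men and 10 women), but with this arrival order only the first 20 people are chosen and the other 180 had no chance of being selected.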
Quota sampling is somewhat similar to stratified sampling in that similar items are
grouped together. However, it differs in how the items are selected. In probability sampling, the
items are selected randomly while in quota sampling it is usually left up to the interviewer to
decide who is sampled. This results in selection bias. Even so, quota sampling is often used by
market researchers (particularly for telephone surveys) instead of stratified sampling, because
it is relatively inexpensive and easy to administer and has the desirable property of satisfying
population proportions. However, it disguises potentially significant bias.
As with all other non-probability sampling methods, in order to make inferences
about the population, it is necessary to assume that persons and/or things selected are
similar to those not selected. Such strong assumptions are rarely valid.
The main difference between stratified sampling and quota sampling can be seen in a school
survey that calls for 15 Grade 10 students. Stratified
sampling would select the students using a probability sampling method such as simple random
sampling or systematic sampling. In quota sampling, no such technique is used. The 15 students
might be selected by choosing the first 15 Grade 10 students to enter school on a certain day, or
by choosing 15 students from the first two rows of a particular classroom. Keep in mind that
those students who arrive late or sit at the back of the class may hold different opinions from
those who arrived earlier or sat in the front.
The main argument against quota sampling is that it does not meet the basic requirement
of randomness. Some items may have no chance of selection or the chance of selection may be
unknown. Therefore, the sample may be biased. Quota sampling is generally less expensive than
random sampling. It is also easy to administer, especially considering the tasks of listing the
whole population, randomly selecting the sample and following-up on non-respondents can be
omitted from the procedure. Quota sampling is an effective sampling method when information
is urgently required and can be carried out independent of existing sampling frames. In many
cases where the population has no suitable frame, quota sampling may be the only appropriate
sampling method.

Attachment 1

Sample size for qualitative research


The risk of missing something important
Editor's note: Peter DePaulo is an independent marketing research consultant and focus group
moderator doing business as DePaulo Research Consulting, Montgomeryville, Pa.
In a qualitative research project, how large should the sample be? How many focus group
respondents, individual depth interviews (IDIs), or ethnographic observations are needed?
We do have some informal rules of thumb. For example, Maria Krieger (in her white paper, The
Single Group Caveat, Brain Tree Research & Consulting, 1991) advises that separate focus
groups are needed for major segments such as men, women, and age groups, and that two or
more groups are needed per segment because any one group may be idiosyncratic. Another
guideline is to continue doing groups or IDIs until we seem to have reached a saturation point
and are no longer hearing anything new.
Such rules are intuitive and reasonable, but they are not solidly grounded and do not really tell us
what an optimal qualitative sample size may be. The approach proposed here gives specific
answers based on a firm foundation.
First, the importance of sample size in qualitative research must be understood.
Size does matter, even for a qualitative sample
One might suppose that N (the number in the sample) simply is not very important in a
qualitative project. After all, the effect of increasing N, as we learned in statistics class, is to
reduce the sampling error (e.g., the +/- 3 percent variation in opinion polls with N = 1,000) in a
quantitative estimate. Qualitative research normally is inappropriate for estimating quantities. So,
we lack the old familiar reason for increasing sample size.
Nevertheless, in qualitative work, we do try to discover something. We may be seeking to
uncover: the reasons why consumers may or may not be satisfied with a product; the product
attributes that may be important to users; possible consumer perceptions of celebrity
spokespersons; the various problems that consumers may experience with our brand; or other
kinds of insights. (For lack of a better term, I will use the word perception to refer to a reason,
need, attribute, problem, or whatever the qualitative project is intended to uncover.) It would be
up to a subsequent quantitative study to estimate, with statistical precision, how important or
prevalent each perception actually is.
The key point is this: Our qualitative sample must be big enough to assure that we are likely to
hear most or all of the perceptions that might be important. Within a target market, different
customers may have diverse perceptions. Therefore, the smaller the sample size, the narrower the
range of perceptions we may hear. On the positive side, the larger the sample size, the less likely
it is that we would fail to discover a perception that we would have wanted to know. In other
words, our objective in designing qualitative research is to reduce the chances of discovery
failure, as opposed to reducing (quantitative) estimation error.
Discovery failure can be serious
What might go wrong if a qualitative project fails to uncover an actionable perception (or
attribute, opinion, need, experience, etc.)? Here are some possibilities:

- A source of dissatisfaction is not discovered - and not corrected. In highly competitive
  industries, even a small incidence of dissatisfaction could dent the bottom line.
- In the qualitative testing of an advertisement, a copy point that offends a small but vocal
  subgroup of the market is not discovered until a public-relations fiasco erupts.
- When qualitative procedures are used to pre-test a quantitative questionnaire, an
  undiscovered ambiguity in the wording of a question may mean that some of the subsequent
  quantitative respondents give invalid responses. Thus, qualitative discovery failure eventually
  can result in quantitative estimation error due to respondent miscomprehension.
Therefore, size does matter in a qualitative sample, though for a different reason than in a
quantitative sample. The following example shows how the risk of discovery failure may be easy
to overlook even when it is formidable.
Example of the risk being higher than expected
The managers of a medical clinic (name withheld) had heard favorable anecdotal feedback about
the clinic's quality, but wanted an independent evaluation through research. The budget
permitted only one focus group with 10 clinic patients. All 10 respondents clearly were satisfied
with the clinic, and group discussion did not reverse these views.
Did we miss anything as a result of interviewing only 10? Suppose, for example, that the clinic
had a moody staff member who, unbeknownst to management, was aggravating one in 10 clinic
patients. Also, suppose that management would have wanted to discover anything that affects the
satisfaction of at least 10 percent of customers. If there really was an unknown satisfaction problem
with a 10 percent incidence, then what was the chance that our sample of 10 happened to miss it?
That is, what is the probability that no member of the subgroup defined as "those who
experienced the staffer in a bad mood" happened to get into the sample?
At first thought, the answer might seem to be "not much chance of missing the problem." The
hypothetical incidence is one in 10, and we did indeed interview 10 patients. Actually, the
probability that our sample failed to include a patient aggravated by the moody staffer turns out
to be just over one in three (0.349 to be exact). This probability is simple to calculate: Consider
that the chance of any one customer selected at random not being a member of the 10 percent
(aggravated) subgroup is 0.9 (i.e., a nine in 10 chance). Next, consider that the chance of failing
to reach anyone from the 10 percent subgroup twice in a row (by selecting two customers at
random) is 0.9 × 0.9, or 0.9 to the second power, which equals 0.81. Now, it should be clear that
the chance of missing the subgroup 10 times in a row (i.e., when drawing a sample of 10) is 0.9
to the tenth power, which is about 0.35. Thus, there is a 35 percent chance that our sample of 10
would have missed patients who experienced the staffer in a bad mood. Put another way, just over
one in three random samples of 10 will miss an experience or characteristic with an incidence of
10 percent.
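The arithmetic in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name miss_probability is illustrative and does not come from the article, and it assumes simple random selection, as the example does.

```python
def miss_probability(incidence: float, sample_size: int) -> float:
    """Chance that a random sample contains nobody from a subgroup.

    incidence   -- share of the population in the subgroup (0.10 = 10 percent)
    sample_size -- number of people drawn at random
    """
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must not be negative")
    # Each draw misses the subgroup with probability (1 - incidence);
    # missing it on every draw multiplies that probability sample_size times.
    return (1.0 - incidence) ** sample_size


for n in (1, 2, 10):
    print(f"N = {n:2d}: {miss_probability(0.10, n):.3f}")
# N =  1: 0.900
# N =  2: 0.810
# N = 10: 0.349
```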
This seems counter-intuitively high, even to quant researchers to whom I have shown this
analysis. Perhaps people implicitly assume the fallacy that if something has an overall frequency
of one in N, then it is almost sure to appear in N chances.
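The fallacy can be checked numerically. The sketch below (the helper name miss_one_in_n is hypothetical) computes the chance of missing a one-in-N subgroup in N random draws. The limiting value 1/e is a standard mathematical fact rather than something stated in the article.

```python
import math


def miss_one_in_n(n: int) -> float:
    """Chance that n random draws all miss a subgroup with incidence 1/n."""
    return (1.0 - 1.0 / n) ** n


for n in (5, 10, 20, 100, 1000):
    print(f"one in {n:4d}, {n:4d} draws: miss chance = {miss_one_in_n(n):.3f}")

# As n grows, the miss chance approaches 1/e, roughly 0.368.
print(f"limit 1/e = {math.exp(-1):.3f}")
```

In other words, whatever the value of N, "N chances at a one-in-N event" leaves roughly a one-in-three chance of never seeing it.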
Basing the decision on calculated probabilities
So, how can we figure the sample size needed to reduce the risk as much as we want? I am
proposing two ways. One would be based on calculated probabilities like those in the probability
table (reproduced in Attachment 2), which was created by repeating the power calculations
described above for various incidences and sample sizes. The client and researcher would peruse
the table and select a sample size that is affordable yet reduces the risk of discovery failure to a
tolerable level.
For example, if the research team would want to discover a perception with an incidence as low
as 10 percent of the population, and if the team wanted to reduce the risk of missing that
subgroup to less than 5 percent, then a sample of N=30 would suffice, assuming random
selection. (To be exact, the risk shown in the table is .042, or 4.2 percent.) This is analogous to
having 95 percent confidence in being able to discover a perception with a 10 percent incidence.
Remember, however, that we are expressing the confidence in uncovering a qualitative insight,
as opposed to the usual quantitative notion of confidence in estimating a proportion or mean
plus or minus the measurement error.
If the team wants to be more conservative and reduce the risk of missing the one-in-10 subgroup
to less than 1 percent (i.e., 99 percent confidence), then a sample of nearly 50 would be needed.
This would reduce the risk to nearly 0.005 (see table).
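Instead of reading a sample size off the table, the same relationship can be solved for N directly. The sketch below is one possible way to do that, assuming random selection. The function name min_sample_size is illustrative, not from the article.

```python
import math


def min_sample_size(incidence: float, max_risk: float) -> int:
    """Smallest N for which (1 - incidence) ** N is at or below max_risk."""
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must be strictly between 0 and 1")
    if not 0.0 < max_risk < 1.0:
        raise ValueError("max_risk must be strictly between 0 and 1")
    # Solve (1 - incidence) ** N <= max_risk for N using logarithms.
    n = max(0, math.ceil(math.log(max_risk) / math.log(1.0 - incidence)))
    # Guard against floating-point rounding at the boundary.
    while (1.0 - incidence) ** n > max_risk:
        n += 1
    return n


print(min_sample_size(0.10, 0.05))  # 29
print(min_sample_size(0.10, 0.01))  # 44
```

The exact minimums (29 and 44) are slightly below the figures of 30 and "nearly 50" quoted in the text, which is consistent with a table that steps in tens.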


What about non-randomness?


Of course, the table assumes random sampling, and qualitative samples often are not randomly
drawn. Typically, focus groups are recruited from facility databases, which are not guaranteed to
be strictly representative of the local adult population, and factors such as refusals (also a
problem in quantitative surveys, by the way) further compromise the randomness of the sample.
Unfortunately, nothing can be done about subgroups that are impossible to reach, such as people
who, for whatever reason, never cooperate when recruiters call. Nevertheless, we can still sample
those subgroups who are less likely to be reached as long as the recruiter's call has some chance
of being received favorably, for example, people who are home only half as often as the average
target customer but will still answer the call and accept our invitation to participate. We can
compensate for their reduced likelihood of being contacted by thinking of their reachable
incidence as half of their actual incidence. Specifically, if we wanted to allocate enough budget
to reach a 10 percent subgroup even if it is twice as hard to reach, then we would suppose that
their reachable incidence is as low as 5 percent, and look at the 5 percent row in the table. If, for
instance, we wanted to be very conservative, we would recruit 100 respondents, resulting in less
than a 1 percent chance - .006, to be exact - of missing a 5 percent subgroup (or a 10 percent
subgroup that behaves like a 5 percent subgroup in likelihood of being reached).
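The "reachable incidence" adjustment can be sketched as follows. The function and parameter names are hypothetical, and treating reachability as a simple multiplier on incidence is the article's rule of thumb, not a formal sampling model.

```python
def miss_probability_hard_to_reach(
    actual_incidence: float, relative_reach: float, sample_size: int
) -> float:
    """Chance of missing a subgroup that is harder to reach than average.

    relative_reach -- how likely a subgroup member is to be reached compared
                      with the average target customer (0.5 = half as likely)
    """
    # Rule of thumb from the text: treat the subgroup as if its incidence
    # were scaled down by its relative likelihood of being reached.
    reachable_incidence = actual_incidence * relative_reach
    return (1.0 - reachable_incidence) ** sample_size


# A 10 percent subgroup that is twice as hard to reach, with 100 respondents:
risk = miss_probability_hard_to_reach(0.10, 0.5, 100)
print(f"{risk:.3f}")  # 0.006
```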
An approach based on actual qualitative findings
The other way of figuring an appropriate sample size would be to consider the findings of a pair
of actual qualitative studies reported by Abbie Griffin and John Hauser in an article, "The Voice
of the Customer" (Marketing Science, Winter 1993). These researchers looked at the number of
customer needs uncovered by various numbers of focus groups and in-depth interviews.
In one of the two studies, two-hour focus groups and one-hour in-depth interviews (IDIs) were
conducted with users of a complex piece of office equipment. In the other study, IDIs were
conducted with consumers of coolers, knapsacks, and other portable means of storing food. Both
studies looked at the number of needs (attributes, broadly defined) uncovered for each product
category. Using mathematical extrapolations, the authors hypothesized that 20-30 IDIs are
needed to uncover 90-95 percent of all customer needs for the product categories studied.
As with typical learning curves, there were diminishing returns in the sense that fewer new
(non-duplicate) needs were uncovered with each additional IDI. It seemed that few additional needs
would be uncovered after 30 IDIs. This is consistent with the probability table (shown earlier),
which shows that perceptions of all but the smallest market segments are likely to be found in
samples of 30 or fewer.
In the office equipment study, one two-hour focus group was no better than two one-hour IDIs,
implying that group synergies [did] not seem to be present in the focus groups. The study also
suggested that multiple analysts are needed to uncover the broadest range of needs.


These studies were conducted within the context of quality function deployment, where,
according to the authors, 200-400 customer needs are usually identified. It is not clear how the
results might generalize to other qualitative applications.
Nevertheless, if one were to base a sample-size decision on the Griffin and Hauser results, the
implication would be to conduct 20-30 IDIs and to arrange for multiple analysts to look for
insights in the data. Perhaps backroom observers could, to some extent, serve as additional
analysts by taking notes while watching the groups or interviews. The observers' notes might
contain some insights that the moderator overlooks, thus helping to minimize the chances of
missing something important.
N=30 as a starting point for planning
Neither the calculation of probabilities in the prior table nor the empirical rationale of Griffin and
Hauser is assured of being the last word on qualitative sample size. There might be other ways of
figuring the number of IDIs, groups, or ethnographic observations needed to avoid missing
something important.
Until the definitive answer is provided, perhaps an N of 30 respondents is a reasonable starting
point for deciding the qualitative sample size that can reveal the full range (or nearly the full
range) of potentially important customer perceptions. An N of 30 reduces the probability of
missing a perception with a 10 percent incidence to less than 5 percent (assuming random
sampling), and it is the upper end of the range found by Griffin and Hauser. If the budget is
limited, we might reduce the N below 30, but the client must understand the increased risks of
missing perceptions that may be worth knowing. If the stakes and budget are high enough, we
might go with a larger sample in order to ensure that smaller (or harder to reach) subgroups are
still likely to be represented.
If focus groups are desired, and we want to count each respondent separately toward the N we
choose (e.g., getting an N of 30 from three groups with 10 respondents in each), then it is
important for every respondent to have sufficient air time on the key issues. Using mini groups
instead of traditional-size groups could help achieve this objective. Also, it is critical for the
moderator to control dominators and bring out the shy people, lest the distinctive perceptions of
less-talkative customers be missed.
Across segments or within each one?
A complication arises when we are separately exploring different customer segments, such as
men versus women, different age groups, or consumers in different geographic regions. In the
case of gender and a desired N of 30, for example, do we need 30 in total (15 males plus 15
females) or do we really need to interview 60 people (30 males plus 30 females)? This is a
judgment call, which would depend on the researcher's belief in the extent to which customer
perceptions may vary from segment to segment. Of course, it may also depend on budget. To
play it safe, each segment should have its own N large enough so that appreciable subgroups
within the segment are likely to be represented in the sample.


What if we only want the typical or majority view?


For some purportedly qualitative studies, the stated or implied purpose may be to get a sense of
how customers feel overall about the issue under study. For example, the client may want to
know whether customers generally respond favorably to a new concept. In that case, it might
be argued that we need not be concerned about having a sample large enough to make certain
that we discover minority viewpoints, because the client is interested only in how most
customers react.
The problem with this agenda is that the qualitative research would have an implicit
quantitative purpose: to reveal the attribute or point of view held by more than 50 percent of the
population. If, indeed, we observe what most qualitative respondents say or do and then infer
that we have found the majority reaction, we are doing more than discovering that reaction:
We are implicitly estimating its incidence at more than 50 percent.
The approach I propose makes no such inferences. If we find that only one respondent in a
sample of 30 holds a particular view, we make no assumption that it represents a 10 percent
population incidence, although, as discussed later, it might be that high. The actual population
incidence is likely to be closer to 3.3 percent (1/30) than to 10 percent. Moreover, to keep the
study qualitative, we should not say that we have estimated the incidence at all. We only want to
ensure that if there is an attribute or opinion with an incidence as low as 10 percent, we are likely
to have at least one respondent to speak for it - and a sample of 30 will probably do the job.
If we do want to draw quantitative inferences from a qualitative procedure (and, normally, this is
ill advised), then this paper does not apply. Instead, the researchers should use the usual
calculations for setting a quantitative sample size at which the estimation error resulting from
random sampling variations would be acceptably low.
Keeping qualitative pure
Whenever I present this sample-size proposal, someone usually objects that I am somehow
quantifying qualitative. On the contrary, estimating the chances of missing a potentially
important perception is completely different from estimating the percent of a target population
who hold a particular perception. To put it another way, calculating the odds of missing a
perception with a hypothetical incidence does not quantify the incidences of those perceptions
that we actually do uncover.
Therefore, qualitative consultants should not be reluctant to talk about the probability of missing
something important. In so doing, they will not lose their identity as qualitative researchers, nor
will they need any high math. Moreover, by distinguishing between discovery failure and
estimation error, researchers can help their clients fully understand the difference between
qualitative and quantitative purposes. In short, the approach I propose is intended to ensure that
qualitative will accomplish what it does best - to discover (not measure) potentially important
insights.


Attachment 2
Sample Size Calculator
for
Probability of Missing a Subpopulation in a Focus Group

(Double click on the table to activate it)

Probability of Missing a Subpopulation (Factor) in a
Focus Group with Randomly Sampled Participants

             Number of Participants
Incidence    10      20      30      40      50      60
5%           0.5987  0.3585  0.2146  0.1285  0.0769  0.0461
10%          0.3487  0.1216  0.0424  0.0148  0.0052  0.0018
15%          0.1969  0.0388  0.0076  0.0015  0.0003  0.0001
20%          0.1074  0.0115  0.0012  0.0001  0.0000  0.0000

Notes: Probabilities greater than 5% are indicated by gray text (excluded by "rule of
thumb" in that missing a factor more than five times in 100 is undesirable). An
optimal outcome is indicated by red text; the probability of missing a factor with a
10% incidence rate in the population is .0424 when 30 participants are randomly
sampled from the population.
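The values in this table can be regenerated with a short script, which may be useful when other incidences or group sizes are of interest. This is a sketch only: the names INCIDENCES, SAMPLE_SIZES, and build_table are illustrative, and the calculation assumes randomly sampled participants, as the table title states.

```python
INCIDENCES = (0.05, 0.10, 0.15, 0.20)    # share of population in the subgroup
SAMPLE_SIZES = (10, 20, 30, 40, 50, 60)  # number of participants


def build_table(incidences, sample_sizes):
    """Map each incidence to its row of miss probabilities, rounded to 4 places."""
    return {
        p: [round((1.0 - p) ** n, 4) for n in sample_sizes]
        for p in incidences
    }


table = build_table(INCIDENCES, SAMPLE_SIZES)

# Print in the same layout as the attachment: one row per incidence.
print("Incidence  " + "  ".join(f"{n:>6d}" for n in SAMPLE_SIZES))
for p, row in table.items():
    print(f"{p:>9.0%}  " + "  ".join(f"{value:6.4f}" for value in row))
```

Editing the two tuples at the top plays the same role as editing the group sizes and incidence percentages in the spreadsheet version.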

Note 1: The numbers in Row 6 can be edited based on group sizes


Note 2: The percentages in column B can be edited to fine-tune acceptable