
Organizational Research Methods

http://orm.sagepub.com/

A Review and Analysis of the Policy-Capturing Methodology in Organizational Research: Guidelines for
Research and Practice
Ronald J. Karren and Melissa Woodard Barringer
Organizational Research Methods 2002 5: 337
DOI: 10.1177/109442802237115
The online version of this article can be found at:
http://orm.sagepub.com/content/5/4/337

Published by:
http://www.sagepublications.com

On behalf of:

The Research Methods Division of The Academy of Management

Additional services and information for Organizational Research Methods can be found at:
Email Alerts: http://orm.sagepub.com/cgi/alerts
Subscriptions: http://orm.sagepub.com/subscriptions
Reprints: http://www.sagepub.com/journalsReprints.nav
Permissions: http://www.sagepub.com/journalsPermissions.nav
Citations: http://orm.sagepub.com/content/5/4/337.refs.html

>> Version of Record - Oct 1, 2002

Downloaded from orm.sagepub.com at U.A.E University on August 11, 2014

Karren, Barringer / THE POLICY-CAPTURING METHODOLOGY

A Review and Analysis of the Policy-Capturing
Methodology in Organizational Research:
Guidelines for Research and Practice
RONALD J. KARREN
MELISSA WOODARD BARRINGER
University of Massachusetts, Amherst

Policy-capturing has been employed extensively in the past to examine how organizational decision makers use the information available to them when making
evaluative judgments. The purpose of this article is to provide researchers with
guidelines for enhancing the reliability and validity of their studies. More specifically, the authors identify issues researchers may want to consider when designing
such studies and offer suggestions for effectively addressing them. They draw on a
review of 37 articles from 5 major journals to identify best practice and discuss
the advantages and disadvantages of alternative approaches to resolving the various issues. The key issues are (a) the realism of the approach and its effect on both
internal and external validity, (b) the limits of the full factorial design, (c) the need
for orthogonal cues, (d) sample size and statistical power, and (e) the assessment
of reliability. The analysis also includes comparisons with conjoint analysis, a
similar methodology used in the marketing research literature.

Policy-capturing is a method employed by researchers to assess how decision makers use available information when making evaluative judgments (Zedeck, 1977). The
purpose of this methodology is to capture individual judges' decision-making policies, that is, how they "weight, combine, or integrate information" (Zedeck, 1977, p. 51). It
involves asking decision makers to judge a series of scenarios describing various levels
of the explanatory factors, or cues, and then regressing their responses on the cues. The
estimated coefficients indicate the relative importance of the various cues and define
the patterns or strategies for each decision maker. These results can also be used to
explore the variability or individual differences among the decision makers (see, e.g.,
Graves & Karren, 1992). This involves employing cluster analytic methods to group
individuals with similar cue weights or strategies. Thus, policy-capturing can be used
not only to identify the extent of individual differences in strategies but also to group or
cluster individuals with similar policies.
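The core analysis described above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the article: cue levels, policy weights, noise level, and sample sizes are all invented. Each simulated judge's ratings are regressed on the cue levels to recover that judge's weights, and the per-judge weights are then grouped with a crude 2-means clustering step that stands in for the cluster-analytic methods cited in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n_scenarios, n_cues, n_judges = 32, 4, 10
# Manipulated cue levels for each scenario (binary levels for simplicity).
X = rng.integers(0, 2, size=(n_scenarios, n_cues)).astype(float)

def capture_policy(X, ratings):
    """Regress one judge's ratings on the cues; return the cue weights."""
    X1 = np.column_stack([np.ones(len(X)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X1, ratings, rcond=None)
    return beta[1:]                                  # drop the intercept

# Simulate two distinct 'policies': even-numbered judges weight cue 0
# heavily, odd-numbered judges weight cue 3 heavily.
true_w = np.array([[3.0, 1.0, 0.5, 0.2],
                   [0.2, 0.5, 1.0, 3.0]])
weights = np.array([
    capture_policy(X, X @ true_w[j % 2] + rng.normal(0, 0.5, n_scenarios))
    for j in range(n_judges)
])

# Crude 2-means clustering on the recovered weights, grouping judges
# with similar policies.
centers = weights[:2].copy()
for _ in range(20):
    labels = ((weights[:, None, :] - centers) ** 2).sum(-1).argmin(1)
    centers = np.array([weights[labels == k].mean(0) for k in (0, 1)])

print("recovered weights, judge 0:", np.round(weights[0], 2))
print("cluster labels:", labels)
```

With well-separated policies and modest rating noise, the recovered weights track the simulated ones and the clustering separates the two judge types.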
The policy-capturing approach has been used to assess judgments in a variety of
content areas such as job search (Cable & Judge, 1994; Rynes & Lawler, 1983; Rynes,
Schwab, & Heneman, 1983), compensation (Sherer, Schwab, & Heneman, 1987),
Organizational Research Methods, Vol. 5 No. 4, October 2002 337-361
DOI: 10.1177/109442802237115
© 2002 Sage Publications

employee discipline (Klaas & Wheeler, 1990), job analysis (Sanchez & Levine, 1989),
sexual harassment (York, 1989), employment interviews (Dougherty, Ebert, &
Callender, 1986; Graves & Karren, 1992), contract arbitration (Olson, Dell'Omo, &
Jarley, 1992), and motivation (Zedeck, 1977). Most of these studies have employed an
experimental policy-capturing design. That is, a survey questionnaire is utilized to
elicit judges' likely responses to scenarios rather than field data being used to examine
actual responses. A thorough review of the latter approach is available from Roehling
(1993), and we therefore confine our analysis to experimental designs.
The popularity of the policy-capturing method stems from a number of advantages.
First, the method overcomes many of the limitations inherent in other, more direct
approaches to examining individuals' decision policies. For example, a simpler
method involves asking individuals to rate or rank the variables of interest in order of
importance (self-report attribute ratings). Concerns about the validity of this method
have been raised by studies finding that such stated decision policies differ from actual
(observed) decision policies (Hitt & Middlemist, 1979; Sherer et al., 1987; Stumpf &
London, 1981). The discrepancy may stem from individuals not being candid in their
responses because of a desire to be socially correct. Rating challenging work above
high pay, for example, may be the socially desirable response to questions about the
importance of job attributes. In fact, studies have shown that less socially desirable factors such as salary receive more weight and are seen as more important when they are
derived through policy-capturing (see Feldman & Arnold, 1978). Policy-capturing
purportedly weakens such social desirability effects by indirectly assessing the importance of explanatory variables and, for this reason, is considered preferable to the self-report attribute method (Arnold & Feldman, 1981; Judge & Bretz, 1992; Rynes et al.,
1983).1 Moreover, rating or ranking individual attributes requires more self-insight
than making overall judgments about multi-attribute scenarios. Finally, asking individuals to make overall judgments about multi-attribute scenarios is more similar to
actual decision problems, and hence more realistic, than is a self-report attribute
design (Rynes et al., 1983).
Another advantage of the methodology comes from the ability of the researcher to
experimentally manipulate cue values. By minimizing variable intercorrelations,
researchers avoid the problems of multicollinearity often found with field data and
enhance the capacity to assess the independent effects of cues (e.g., Feldman &
Arnold, 1978). Furthermore, policy-capturing is typically carried out at the individual
level. This means that a separate model is generated for each decision maker, although
aggregate analyses of groups of individuals can also be conducted. These separate,
individual analyses allow for a more in-depth assessment of differences between individuals.
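As an illustration of this point (our sketch, not a procedure from the article), a candidate design matrix can be screened for multicollinearity before data collection by checking pairwise cue correlations and variance inflation factors (VIFs). In a fully crossed design both diagnostics take their ideal values; field-like, correlated designs will not.

```python
import numpy as np

def design_diagnostics(X):
    """Return (max |r| over cue pairs, VIF per cue) for design matrix X."""
    X = np.asarray(X, dtype=float)
    r = np.corrcoef(X, rowvar=False)
    off = np.abs(r - np.eye(X.shape[1]))          # zero out the diagonal
    vifs = []
    for j in range(X.shape[1]):
        # Regress cue j on the remaining cues; VIF = 1 / (1 - R^2).
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        vifs.append(1 / (1 - r2))
    return off.max(), vifs

# A fully crossed 2 x 2 x 2 design: cues are exactly orthogonal,
# so the maximum cue intercorrelation is 0 and every VIF is 1.
crossed = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
max_r, vifs = design_diagnostics(crossed)
print(max_r, vifs)
```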
These advantages notwithstanding, there are several issues that researchers designing a policy-capturing study must address if they are to avoid questions about the validity of their results. A number of concerns have been raised about the effect of using
simulated decision contexts on the external validity of results. Care must therefore be
taken to create scenarios that include salient and realistically defined cues and to avoid
unlikely cue combinations. Validity may also be compromised by a failure to consider
respondent overload, statistical power, and reliability.
The purpose of this article is to provide researchers interested in using the policy-capturing approach with guidelines for enhancing the reliability and validity of their

studies. More specifically, we identify the issues researchers may want to consider
when designing a policy-capturing study and offer suggestions for effectively addressing these issues. The aim is to provide those unfamiliar with the methodology with
accessible and comprehensible suggestions for designing their own studies. Our analysis of the issues is informed by a review of the approaches taken by researchers who
have used the policy-capturing design in past research. We also explore how marketing
researchers using conjoint analysis, a methodology for assessing the relative importance of product attributes in consumer decision making, have addressed some of the
issues. Our review of all of these studies is aimed not at critiquing the research designs
but, rather, at developing an understanding of what can be considered best practice.
Where the best practice is unclear, that is, where researchers have used a number of different approaches to address a design issue, we discuss the advantages and disadvantages of each and offer recommendations for putting available techniques to the best use.
We distinguish our article from the Aiman-Smith, Scullen, and Barr study (2002
[this issue]) in two ways. First, Aiman-Smith et al. offer a tutorial that covers a broad
range of topics including study design, execution, analysis, interpretation, and reporting of policy-capturing studies. Our article covers a narrower range of topics but generally goes into more depth on some of the key issues regarding design and execution.
Second, our discussion of these issues incorporates our extensive review of prior studies using the policy-capturing methodology.
We begin our article with a brief review of studies that have used the policy-capturing approach. We have limited our selection of studies to those published over the past
25 years in five highly regarded management journals: the Journal of Applied Psychology, Personnel Psychology, the Journal of Management, the Academy of Management
Journal, and Organizational Behavior and Human Decision Processes. We found 37
studies that used policy-capturing as the primary methodology. These studies are summarized in Table 1. As Table 1 indicates, we found differences across these studies in
the types of decisions made, the samples used, the designs employed, the stimuli presented to the respondents, and the analyses employed by the researchers.
Job choice, ratings of job applicants, and performance evaluation decisions made
up approximately one half of the types of decisions studied. Other popular types were
compensation, disciplinary, and absence decisions. The remaining types of judgments
include promotion, sexual harassment, job task importance, firm acquisition, media
choice, organizational effectiveness, utility, transportation services, and changes in
work attitudes.
The samples for these studies were drawn from a number of different groups.
Undergraduate and graduate students were among the most used subjects in the studies. In most cases, they were used to simulate decisions that would typically involve
them (e.g., job choice). Other studies included decisions made by managers, employees, and specialists such as interviewers and recruiters, executives, and faculty members. Again, in most cases, the decision makers were those who typically had experience with the decisions.
We grouped the designs into four categories: full factorials (16), fractional factorials (4), studies without a factorial design and in which the intercorrelations were zero
or near zero (12), and studies in which the intercorrelations were significantly above or
below zero (4). One study did not report the size of the intercorrelations. Thus, for
most of the designs, the intercorrelation of the factors was either zero or near zero.


Table 1
Summary of Published Policy-Capturing Studies

Each entry lists: Study | Decision Context | Sample | Variables and Levels | Stimulus Presentation | Correlation of Variables.

Journal of Applied Psychology
Allen & Muchinsky (1984) | Transportation services | 47 undergraduates and 55 government employees | 4 variables (4 × 3 × 3 × 2) | 36 written transportation scenarios | 36 scenarios of 72 possible; intercorrelations < .25
Dougherty, Ebert, & Callender (1986) | Interviewers' ratings of job applicants | One organization with 3 interviewers | 8 variables | 120 videotapes of actual applicant interviews | .05 to .97; average was .52
Dunn, Mount, Barrick, & Ones (1995) | Managers' evaluations of job applicants | 84 managers from multiple organizations | 6 variables: 2 levels each | 39 written personality profiles of hypothetical job applicants | Less than .10
Feldman & Arnold (1978) | Job choice | 62 graduate students from 2 universities | 6 variables: 2 levels each | 64 written paragraph descriptions | Orthogonal: full factorial
Hitt & Barr (1989) | Evaluating job applicants and starting salaries | 68 line and staff managers from multiple organizations | 6 applicant attributes: 2 levels each | Information and videotapes of applicants | Orthogonal: fractional factorial, 16 of 64
Hollenbeck & Williams (1987) | Changes in work attributes | 88 department store salespersons from one organization | 6 facets/variables: 4 levels | Written scenarios with a subset of factors | Slight correlations, r < .20
Judge & Bretz (1992) | Job choice | 67 undergraduates from 2 universities | 7 variables: 2 levels each | Written scenarios | Orthogonal: full factorial
Orr, Sackett, & Mercer (1989) | Estimates of the dollar value of performance | 17 managers from 1 organization | 13 variables: multiple values | 50 written profiles of hypothetical programmers | Uncorrelated dimensions
Rynes & Lawler (1983) | Job choice | 10 undergraduates | 4 variables (4 × 3 × 2 × 3) | Written job descriptions | Orthogonal: full factorial
Sanchez & Levine (1989) | Overall importance of job tasks | 60 employees from 4 jobs in 2 cities | 6 task factors: multiple levels | Task inventories with descriptions of 6 task scales | Correlated (ranged from .02 to .91)
Viswesvaran & Barrick (1992) | 2 compensation decisions | 35 compensation specialists | (a) 5 factors (2 × 4 × 4 × 5 × 5); (b) 5 factors (3 × 3 × 5 × 3 × 5) | Hypothetical firm descriptions: (a) 42 of 800; (b) 40 of 675 | Near 0; less than .10
Zedeck & Cascio (1982) | Performance appraisal decisions | 130 undergraduates from 1 university | 5 performance factors: 3 levels each | Written descriptions of performance | Intercorrelations were approx. zero

Personnel Psychology
Cable & Judge (1994) | Job search decisions | 171 undergraduate students seeking jobs from 1 university | 5 factors: 2 levels each | Written descriptions of variables | Orthogonal: full factorial + 4 replicates
Graves & Karren (1992) | Interviewers' evaluations of job candidates | 29 corporate interviewers from 1 organization | 6 factors: 2 levels each | Written descriptions of variables | Orthogonal: full factorial
Klaas & Dell'Omo (1991) | Disciplinary decisions on substance abuse | 93 managers from 2 organizations | 6 factors: 2 levels each | Written case scenarios | Orthogonal: 32 of 64 possible, fractional design
Klaas & Wheeler (1990) | Disciplinary decisions | 19 human resource managers and 28 line managers | 6 factors: 2 levels each | Written descriptions of variables | Orthogonal: full factorial
Madden (1981) | Performance evaluations | 3 experiments with undergraduates: 58, 70, and 43 subjects | 2 factors: gender and performance level (2 × 3) | Written profiles | Not reported; profiles randomly selected
Sherer, Schwab, & Heneman (1987) | Salary raise decisions | 11 supervisory personnel | 5 factors (2 × 2 × 3 × 2 × 2) | Written profiles of hypothetical employees | Orthogonal: full factorial + full replicates
Zhou & Martocchio (2001) | Compensation decisions | 71 Chinese managers and 218 graduate-student alumni/managers | 4 factors (2 × 2 × 2 × 2) | Written profiles of employees | Orthogonal: full factorial

Organizational Behavior and Human Decision Processes
Brannick & Brannick (1989) | Performance ratings | 13 supervisors/10 faculty members | Both had 16 variables with 5 values | Written profiles | Intercorrelations < .2
Hobson, Mendel, & Gibson (1981) | Performance rating behavior of supervisor/subordinate | 20 faculty members and chair of university | 14 performance factors: 3 levels each | Written hypothetical profiles | Intercorrelations were set to zero
Martocchio & Judge (1994) | Decisions to be absent | 138 workers | 6 factors (2 × 2 × 2 × 2 × 2 × 3) | Written scenarios | Orthogonal: full factorial
Mazen (1990) | Evaluation of applicant profiles | 118 recruiters at 1 university | 9 factors: 9 values | Profiles presented on cards | Median correlation = .09
Rynes, Schwab, & Heneman (1983) | Job application decisions | 10 college seniors | 5 factors (5 × 3 × 2 × 2 × 2) | Written scenarios | Orthogonal: full factorial
Zedeck (1977) | Job choice | 91 undergraduates and 233 MBAs | 6 factors: 5 levels each | Written descriptions of hypothetical organizations | Intercorrelations were zero
Zedeck & Kafry (1977) | Overall performance evaluations | 67 nursing personnel from multiple organizations | 9 performance factors: 3 levels each | Written scenarios containing behaviors | Intercorrelations were zero

Academy of Management Journal
Arnold & Feldman (1981) | Job/organization choice | 86 graduate students | 6 factors: 2 levels each | Written scenarios | Orthogonal: full factorial
Hitt & Middlemist (1979) | Organizational effectiveness judgments | 50 managers | 25 possible factors: 5 levels | Simulated cases or work units | Intercorrelations between criteria, most below .4
Pablo (1994) | Integration design decisions | 56 executives | 5 factors: 2 levels each | Hypothetical scenarios | Orthogonal: fractional factorial design
Stahl & Zimmerer (1984) | Firm acquisition decisions | 42 executives | 6 factors: one-half fraction of 2 × 2 × 2 × 2 × 2 × 2 | Written acquisition decision-making exercise | Fractional factorial; zero correlations
Stumpf & London (1981) | Promotion decisions | 43 managers and 51 students | 5 factors (2 × 2 × 3 × 2 × 2) | Written hypothetical candidates | Orthogonal: full factorial
Webster & Trevino (1995) | Media choice | (a) 197 employees; (b) 70 employees | (a) 5 factors (5 × 3 × 2 × 2 × 2); (b) 3 factors (3 × 2 × 4) | Written scenarios | (a) No significant associations among factors; (b) orthogonal full factorial
York (1989) | Policies on sexual harassment | 15 Equal Employment Opportunity officers; 79 undergraduate and graduate students | 8 factors: most had 2 levels | Written profiles | 80 of 384; correlations were near zero

Journal of Management
Beatty, McCune, & Beatty (1988) | Compensation decisions | 41 Japanese and 63 U.S. managers from multiple organizations | 8 factors: multiple levels | Written descriptions of hypothetical employees | Average correlation = .22
Bretz & Judge (1994) | Job choice | 65 undergraduate and graduate students, 2 universities | 7 factors: 2 levels each | Written scenarios | Orthogonal: full factorial
Judge & Martocchio (1996) | Attributions on the cause of absence | 138 employees in a university | 4 factors (2 × 2 × 2 × 3) | Written scenarios | Orthogonal: full factorial
Martocchio & Judge (1995) | Absence disciplinary decisions | 57 employees or 19 triads (supervisor and subordinates) | 6 factors: 2 levels each | Written scenarios | Orthogonal: full factorial


Most of the stimuli presented were written scenarios or profiles. Only a few were
videotaped. Of those written scenarios, almost all were hypothetical situations.
The results of the vast majority of the studies included an assessment of the policies
of individual judges. Among studies that examined both linear and nonlinear effects,
the linear model contained most of the explained variance; thus, little was attributed to
nonlinear or interaction effects. About a third of these studies considered some form of
cluster analysis to group the relative weights. A large minority of the studies also used
between-subjects analyses specifically to test various hypotheses of the study. Some of
the studies included comparisons of the policy-capturing methodology with direct
assessments. The results of these comparisons indicated that the correspondence was
only moderate (e.g., Arnold & Feldman, 1981).
We next discuss the five key issues we have identified as important design considerations for researchers developing a policy-capturing study: (a) the realism of the
approach and its effect on both internal and external validity, (b) the limits of the full
factorial design, (c) the need for orthogonal cues, (d) sample size and statistical power,
and (e) the assessment of reliability. We review how past researchers have addressed
these issues and offer recommendations accordingly.

Issues in the Design of


Policy-Capturing Studies
Realism

A recurrent concern about policy-capturing has been with the realism of the decision problems presented to participants and, hence, the external validity of the results.
A realistic decision problem is one that is representative of the problems that occur naturally in the participants' environment, whereas an unrealistic problem is one that is
unlikely to occur. If the decision problems used in a policy-capturing study are not
realistic, then the results may be biased and cannot be generalized to nonexperimental
settings (Klaas & Wheeler, 1990; Lane, Murphy, & Marques, 1982; Olson et al., 1992;
York, 1989).
There are a number of inherent challenges to enhancing realism. First, in a policy-capturing design, individuals are asked to make judgments based on a limited amount
of information (typically four to eight variables, as shown in Table 1), whereas they are
likely to have more extensive information when making judgments of actual cases
(Olson et al., 1992). There is evidence, however, that owing to cognitive limitations,
individuals tend to base judgments on a relatively small number of criteria (Cooksey,
1996; Rossi & Anderson, 1982; Sanchez & Levine, 1989). Hence, if care is taken to
identify, and include in the study, the decision criteria that are likely to be most salient
to decision makers' judgments, then the realism of the decision problem, and the external validity of the study, can be enhanced. This objective may be difficult to achieve,
however, as it is virtually impossible to ascertain a priori the factors to which individuals attend in making judgments (Viswesvaran & Barrick, 1992; York, 1989). One
approach has been to survey or interview individuals involved in making the decisions
of interest. This may involve focus groups and/or interviews with individuals similar to
the study's sample (Graves & Karren, 1992; Rynes & Lawler, 1983; Viswesvaran &
Barrick, 1992). The advantage of such an approach is that the information is provided


by the decision makers themselves, who are arguably in the best position to know what
they consider when making decisions (Cooksey, 1996). A potential disadvantage, on
the other hand, is that the identification of important criteria is a subjective process,
and the information obtained may be incomplete due to lapsed or reconstructed memories (Cooksey, 1996). To minimize these problems, researchers are advised to use multiple, carefully selected decision makers and look for consistent themes (Cooksey,
1996). An alternative approach is to examine company or other related records to
ascertain important decision criteria. Hobson, Mendel, and Gibson (1981) examined
the participating organizations' evaluation policies to identify salient performance criteria. Similarly, York (1989) analyzed 123 related court cases to identify variables for a
study of judgments about sexual harassment complaints. This technique is more
objective than the focus group/interview approach; however, it is limited by the quality
and accuracy of the records (Cooksey, 1996). Still another approach is to review past
theoretical and empirical research (Allen & Muchinsky, 1984; Hollenbeck & Williams, 1987; Pablo, 1994). Where available, such information may provide more
objective support for criterion importance. This approach is likely to be of less use to
researchers exploring new topics. Our recommendation is that researchers obtain
information from all available sources (focus groups, company records, prior
research) and use criteria for which there is consistent support.
Unrealistic decision problems may also be created when a full factorial design is
used to set up decision problems (Klaas & Wheeler, 1990; Lane et al., 1982; York,
1989). This very common policy-capturing design involves creating hypothetical scenarios by completely crossing all of the variables. In this case, the variables are said to
be orthogonal and their independent effects may be accurately assessed. As seen in
Table 1, this full factorial approach has been used by many of the studies reviewed for
this article (see, e.g., Bretz & Judge, 1994; Graves & Karren, 1992; Klaas &
Dell'Omo, 1991; Rynes et al., 1983).
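A full factorial set of scenarios is simply the Cartesian product of the cue levels. The sketch below, with invented cues and levels not taken from any of the cited studies, builds such a set and confirms the orthogonality property that follows: every pair of levels of any two cues co-occurs equally often.

```python
from collections import Counter
from itertools import product

# Illustrative cues and levels (hypothetical, not from a published study).
cues = {
    "salary": ["$40,000", "$50,000"],
    "location": ["urban", "rural"],
    "promotion_chances": ["low", "average", "high"],
}

# Completely cross all variables: one scenario per combination of levels.
scenarios = [dict(zip(cues, combo)) for combo in product(*cues.values())]
print(len(scenarios))   # 2 * 2 * 3 = 12 fully crossed scenarios

# Orthogonality check: each (salary, location) pair of levels appears
# equally often across the scenario set.
pair_counts = Counter((s["salary"], s["location"]) for s in scenarios)
print(pair_counts)
```

Because every level combination appears exactly once, the cue columns of the resulting design matrix are uncorrelated by construction.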
A potential problem with this approach is that if variables are truly correlated in
the environment, then the decision makers may be presented with unrealistic cases
(York, 1989). Crossing employment status with retirement benefits, for example,
would create one scenario in which a contingent worker received pension benefits.
Two alternative approaches have been developed to address this concern. In one, hypothetical scenarios are created in such a way as to enhance realism and minimize variable intercorrelations. Klaas and Wheeler (1990), for example, used an orthogonal
design but selected factors that are not typically correlated in the real world. Klaas and Dell'Omo (1991) provided plausible explanations to study participants for the
seemingly implausible combinations created by orthogonal manipulations. Beatty,
McCune, and Beatty (1988) endeavored to replicate real-world correlations rather
than crossing all explanatory variables; intercorrelations between the eight variables
included in the study were on the average .22. Realism and validity are enhanced using
this approach; however, it is also possible that important criteria would be excluded
because of problems with multicollinearity. A second approach involves using actual,
rather than hypothetical, scenarios. Dougherty et al. (1986), for example, asked participants to rate job applicants after listening to actual interviews that had been tape-recorded. In another study, job incumbents rated the importance of tasks listed in their
(actual) job task inventories (Sanchez & Levine, 1989). Intercorrelations between the
explanatory variables in both of these studies were high. The decision problems are
thus realistic, but because the variables are not orthogonal, assessing their independent


effects becomes more difficult. Thus, researchers must in many cases make trade-offs
between realism, on one hand, and acceptable levels of variable intercorrelations, on
the other. In these instances, "compromises will be required between the ideally desirable and the practically realizable features of the design" (Cooksey, 1996, p. 94). As to how strongly researchers should emphasize realism over low intercorrelations, we note that of all of the studies reviewed for this article, only two
report correlations greater than .5 (Dougherty et al., 1986; Sanchez & Levine, 1989).
Our recommendation is to adopt, wherever possible, the approach taken by the preponderance of studies accepted for publication, that is, ensuring that variable
intercorrelations are 0 or near 0. Higher correlations have in many cases been acceptable and may be unavoidable in situations where the variables of interest are naturally
correlated in the environment. Limited evidence that raw-score regression weights are
similar under different intercorrelation levels (ranging from near 0 to .5) suggests that
0 correlations are not required to estimate variable importance (Lane et al., 1982).
Nevertheless, the impact of higher correlations (i.e., greater than .5) on estimates of
variable importance has not been examined, and our review suggests that levels near 0
or less than .20 are more likely to be favorably received.
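One way to build such an environment-matching design (a hedged sketch of the general idea, not the procedure Beatty et al. actually used) is to sample scenario profiles from a latent multivariate normal distribution with the target intercorrelation and then cut each dimension into discrete cue levels. Because dichotomization attenuates the latent correlation, the realized cue intercorrelations should be checked afterward and the latent value adjusted if needed. All numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

target_r = 0.22                      # desired average cue intercorrelation
n_cues, n_scenarios = 4, 80

# Compound-symmetric latent correlation matrix: 1s on the diagonal,
# target_r everywhere else.
cov = np.full((n_cues, n_cues), target_r) + (1 - target_r) * np.eye(n_cues)

latent = rng.multivariate_normal(np.zeros(n_cues), cov, size=n_scenarios)
X = (latent > 0).astype(int)         # dichotomize into two cue levels

# Realized average intercorrelation across the 6 cue pairs; expect it to
# fall somewhat below target_r because dichotomizing attenuates it.
r = np.corrcoef(X, rowvar=False)
avg_r = r[np.triu_indices(n_cues, k=1)].mean()
print(round(float(avg_r), 2))
```

In practice one would iterate, raising the latent correlation until the realized cue intercorrelations match the field values being replicated.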
The realism of decision problems can also be affected by the operationalization of
explanatory variables. If the manipulated values of the explanatory variables are not
representative of the values observed in the judges environment, the external validity
of the study could be limited (Judge & Bretz, 1992; Rynes et al., 1983). Rynes et al.
(1983) found that differences in the variance in treatment levels can generate different
importance weights. Specifically, the results suggested that the estimated effects of
pay (relative to other variables included in the study) are higher when the difference
between defined salary levels is wide than when it is narrow. Range effects were also
obtained in a study conducted by Highhouse, Luong, and Sarkar-Barney (1999),
although these authors reached the somewhat different conclusion that the direction of
these effects may vary across decision contexts. That is, the effects of attribute range
may depend on whether the decision maker is in the initial (prechoice screening) or
ultimate (final) stage of job choice. Both studies suggest, however, that judges'
responses to hypothetical scenarios are sensitive to attribute range. Hence, conclusions
about the effects of explanatory variables that are based on responses to hypothetical
scenarios cannot be generalized to actual decision contexts unless treatment levels are
realistically defined.
Ensuring the realism of defined treatment levels, as well as variable combinations,
typically involves obtaining input from individuals familiar with the decision context.
Allen and Muchinsky (1984), for example, asked Department of Transportation officials to review hypothetical scenarios describing proposed transportation services for
the physically handicapped. Staff reviews have also been sought to ensure the realism
of descriptions of hypothetical job candidates (Graves & Karren, 1992), employee disciplinary problems (Klaas & Wheeler, 1990), and substance abuse violations (Klaas &
Dell'Omo, 1991). Levels of variables examined in studies of students' job choices
have been defined based on data from college career offices (Bretz & Judge, 1994;
Cable & Judge, 1994; Judge & Bretz, 1992; Rynes et al., 1983). Field data were also
used to create hypothetical applicant profiles (Mazen, 1990) and acquisitions (Pablo,
1994). Finally, pretests using respondents drawn from the same population as the
study sample have helped researchers to identify and revise unrealistic scenarios
(Pablo, 1994; Webster & Trevino, 1995). These methods have the advantage of


enhancing the representativeness of scenarios; however, care should also be taken to ensure that sources are well informed and accurate (Cooksey, 1996). Asking recruiters
to identify the factors that affect job choice, for example, may be less valid than asking
the job applicants themselves.
Under some circumstances, defining treatment levels realistically may raise other
methodological issues. First, the distance between realistically defined levels may
vary across variables. For instance, the difference between core and contingent
employment arrangements could be viewed as substantially greater than the difference
between alternative health insurance plans (e.g., no premium versus a small or moderate premium). Prior research suggests that responses to variables with wider differentials between levels tend to be different from responses to those where the differentials
are relatively narrow (Highhouse et al., 1999; Rynes et al., 1983). In this case, concerns
may be raised about the creation of experimental conditions that set up certain variables to exhibit artificially high effects. Second, the difference between treatment levels may represent a gain for some variables (e.g., current versus higher pay) and a loss
for other variables (e.g., current versus lower benefits). Because research suggests that
responses to losses tend to be greater than responses to gains (Kahneman & Tversky,
1979), the concern again is that experimental conditions create artificially stronger
responses to some variables. In other words, by enhancing external validity (by
enhancing realism), researchers may be inadvertently compromising other forms of
validity. The literature provides little guidance on this issue. Although realistically defining treatment levels is clearly important, the extent to which
emphasis should also be placed on scaling variables consistently has not been
explored. We suggest researchers focus first on enhancing realism and second on scaling variables as consistently as possible. That is, realistic treatment levels are preferable even if it means that the resultant scales are inconsistent. If emphasizing consistency in scaling results in unrealistically defined variables, then the results are of
minimal value because they cannot be generalized to other settings. Moreover, even if
the case could be made that inconsistent scaling results in unfair comparisons of
variable effects, inferences can still be made about whether the effect of a variable is
significant or nonsignificant (Cooper & Richardson, 1986). Hence, in situations where
achieving realism and consistent scaling is not possible, we recommend using the
treatment levels that reflect real decision contexts.
To summarize, the use of hypothetical scenarios to examine judges' decisions can
compromise the external validity of a study unless care is taken to ensure that the variables included are salient to the judges and that variable levels and combinations are
representative of those observed in their environments. Realism can be enhanced by
using actual scenarios or, if creating scenarios, by involving knowledgeable individuals in the creation of hypothetical scenarios, replicating actual correlations between
explanatory variables, and/or selecting variables that are naturally orthogonal in the
environment. Care should also be taken, however, to minimize variable
intercorrelations and thereby enhance the ability to discern the relative importance of
each variable.
The Limits of a Full Factorial Design

As noted above, many of the policy-capturing studies reviewed in this article have
employed a full factorial design in which all variables are completely crossed and balanced. Such an approach allows the assessment of the independent effects of each variable on an individual respondent's decision (Graham & Cable, 2001; Webster &
Trevino, 1995). A full factorial design also allows the assessment of all main and
higher order effects (Graham & Cable, 2001).
Depending on the number of variables included in the study, however, employing a
full factorial design means that respondents could be asked to review an inordinately
large number of scenarios. Completely crossing seven variables with two levels each,
for example, generates 128 unique scenarios (2^7 = 128). Employers and/or individuals
are often reluctant to participate in such a time-consuming study, and procuring an
adequate sample under these circumstances can be difficult. Moreover, even among
those individuals willing to read this many scenarios, respondent overload, fatigue,
and stress can affect responses (Graham & Cable, 2001; Webster & Trevino, 1995;
York, 1989). Survey length has been found to be associated with significant differences in respondent stress and exhaustion (Graham & Cable, 2001). Consequently,
researchers may want to limit the number of scenarios presented to participants (Rossi &
Anderson, 1982; Webster & Trevino, 1995). This can be achieved by limiting the number of experimental variables, thereby minimizing the number of scenarios created
when the variables are completely crossed. Limiting the study to a small number of
variables, however, may require the researcher to exclude potentially important
explanatory variables (Graham & Cable, 2001). Hence, researchers employing a full
factorial design may find that they must either limit the scope of their study or risk
compromising the quality of the data (Graham & Cable, 2001).
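The combinatorics behind this trade-off are easy to verify: a full factorial set is simply the Cartesian product of all cue levels. A minimal sketch in Python (the cue names are hypothetical placeholders, not drawn from any of the studies reviewed):

```python
from itertools import product

# Seven hypothetical two-level cues, e.g., pay, benefits, location, ...
cues = {f"cue_{i}": ("low", "high") for i in range(1, 8)}

# A full factorial design completely crosses every cue with every other:
# the scenario set is the Cartesian product of the cue levels.
scenarios = [dict(zip(cues, combo)) for combo in product(*cues.values())]

print(len(scenarios))  # 2**7 = 128 unique scenarios to present to each judge
```

Adding an eighth two-level cue doubles the set to 256, which is why researchers must either drop cues or move to a confounded design.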
A common approach to this dilemma is to ask respondents to read a subset of the
full factorial set. This is known as a confounded factorial design, which encompasses
two popular designs: the incomplete block design and the fractional factorial design.
These designs allow the researcher to examine a broader set of variables while avoiding respondent overload (Graham & Cable, 2001). Cue sets can be created whereby
variables are orthogonal, thus allowing the assessment of each variable's independent
effects but not higher order (three-way or higher) interaction effects (Cochran & Cox,
1957). If such interactions are not theoretically important, and enough respondents are
available to evaluate all scenarios, then this type of design may be an appropriate
method for reducing respondent boredom and strain (Graham & Cable, 2001; Klaas &
Dell'Omo, 1991).
Cue sets for the incomplete block design and the fractional factorial design are created in similar fashion. Both involve systematically dividing the full factorial set into
blocks and presenting each respondent with one of the blocks (Cochran & Cox, 1957;
Graham & Cable, 2001; Webster & Trevino, 1995). Each block is composed of a
unique set of scenarios and is created by dividing the full set into halves, quarters,
eighths, and so on (Graham & Cable, 2001).
The major difference between the two types of confounded designs is that all of the
subsets are used (i.e., subgroups of participants each receive a different subset of scenarios) with the incomplete block design, whereas only one subset is used (i.e., all participants receive the same subset of scenarios) with the fractional factorial design.
Consequently, the number of participants required to conduct a study using the incomplete block design increases with the number of blocks used (Graham & Cable, 2001).
That is, at least two participants are required for a one-half (two-block) design, at least
four are needed for a one-quarter (four-block) design, and so on.
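One common way to construct such blocks for a two-level design (a textbook construction offered as a sketch, not the exact procedure of any study cited here) is to confound the blocks with the highest-order interaction: coding each cue as -1/+1, scenarios whose cue codes multiply to +1 form one block and the rest form the other. Main effects remain balanced and pairwise orthogonal within each block:

```python
import math
from itertools import product

# Five hypothetical two-level cues coded -1/+1 -> 32 full factorial scenarios.
full = list(product((-1, 1), repeat=5))

# Halve the set by confounding the blocks with the five-way interaction:
# the product of all cue codes is +1 in one block and -1 in the other.
block_a = [s for s in full if math.prod(s) == +1]
block_b = [s for s in full if math.prod(s) == -1]

print(len(block_a), len(block_b))  # 16 16

# Within a block, each cue is balanced, so main effects remain estimable;
# the confounded (five-way) interaction is the effect sacrificed.
balanced = all(sum(s[i] for s in block_a) == 0 for i in range(5))
```

In an incomplete block design, one subgroup of respondents would receive block_a and another block_b; in a half-fraction design, all respondents would receive the same block.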
Owing at least in part to the additional sample and procedural requirements of the
incomplete block design, fractional designs have historically been the approach of
choice among researchers using the confounded factorial approach (Graham & Cable,
2001). Another approach is to randomly select scenarios from the full factorial set,
checking to ensure that variable intercorrelations are not high (Allen & Muchinsky,
1984; Klaas & Dell'Omo, 1991; Pablo, 1994; Viswesvaran & Barrick, 1992; York,
1989). Allen and Muchinsky (1984), for example, randomly selected 36 out of the 72
possible transportation service descriptions; intercorrelations between the variables
were low (r < .25). Both of these fractional designs allow the researcher to examine the
full range of important cues without creating respondent overload. Because they do
not incorporate the full set of scenarios, however, the researcher cannot be sure that
results are unaffected by the particular set of scenarios selected (Graham & Cable,
2001). Relative to the incomplete block design, which uses the full set of scenarios, the
fractional design allows estimation of fewer effects and requires making more
assumptions about which effects are unimportant (Graham & Cable, 2001).
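The random-selection variant amounts to a draw-and-check loop: sample a fraction of the full set, compute the cue intercorrelations, and redraw if any pair exceeds the chosen threshold. The sketch below uses a hypothetical design of four cues with 2, 3, 4, and 3 levels (yielding 72 scenarios), with the .25 cutoff patterned on the Allen and Muchinsky example:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Hypothetical design: four cues with 2, 3, 4, and 3 levels -> 72 scenarios.
full = np.array(list(product(range(2), range(3), range(4), range(3))), dtype=float)

# Draw random half-fractions until every pairwise cue correlation is low,
# mirroring the "select randomly, then check intercorrelations" practice.
while True:
    subset = full[rng.choice(len(full), size=36, replace=False)]
    corr = np.corrcoef(subset, rowvar=False)
    max_r = np.abs(corr - np.eye(4)).max()
    if max_r < 0.25:
        break
```

In the full 72-scenario set the cues are exactly orthogonal; the check guards against the correlation that a random fraction can introduce.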
The confounded factorial design seems to offer researchers a method of studying
how people make decisions without straining respondents or limiting the scope of the
study. Nevertheless, the extent to which this method is a viable alternative to the more
statistically rigorous full factorial design is unclear. We are aware of just one study that
has examined the merits of a full versus a confounded factorial design. In a study of the
effects of five firm attributes on job seekers' perceptions of firm reputation, Graham
and Cable (2001) randomly assigned 108 college students to either a full (32 scenarios)
or an incomplete block (8 scenarios) design. Their results indicated that the estimated
effects of explanatory variables were substantially the same across the two designs and
that a full factorial design generated significantly more stress, fatigue, and negative
reactions to survey length among respondents than did an incomplete block design.
Given the limited empirical research on the viability of the confounded factorial
design, it is probably prudent to use a full factorial design wherever possible. One concern is that a design with too few scenarios lacks sufficient power, an issue that is discussed later in the article. Furthermore, scenarios that contain many factors are
cognitively complex and may require some practice with the response scale before
study respondents begin to process the information reliably. In these situations, a confounded factorial design with a small number of scenarios may have insufficient reliability unless respondents are given several practice scenarios before beginning the
task. Nevertheless, the Graham and Cable (2001) study suggested that this type of confounded factorial design is under certain circumstances an acceptable, and perhaps
preferable, alternative. More specifically, these authors argued that where a confounded factorial design is indicated, researchers should give particular consideration
to the incomplete block design. The primary advantage of this approach is that it
reduces the number of scenarios study participants are asked to evaluate. Moreover, as
noted above, estimation of cue effects using the incomplete block design is not limited
by the exclusion of scenarios, as is the case with the fractional design. Hence, when the
number of salient explanatory variables and/or treatment levels included in the study is
such that a full factorial design would generate an inordinately large number of scenarios, researchers should give serious consideration to an incomplete block design.
Determining what constitutes inordinately large may be a matter of judgment at
this point. Rossi and Anderson (1982) suggested an upper limit of 60 scenarios.
Aiman-Smith et al. (2002) have recommended using no more than 80 written scenarios. Graham and Cable (2001) found that respondents reacted more negatively to 32
than to 8 scenarios, although the mean stress score for the larger survey was relatively
moderate (3.28 on a 7-point scale). It may be that acceptable survey length varies
across individuals and that pretesting may be needed to determine the maximum number of scenarios respondents can reasonably be expected to process. The optimal number may also vary according to the size of the scenario. Rossi and Anderson, for example, maintained that participants in their study could respond to 60 scenarios in 20
minutes. A smaller number might be indicated where the scenarios require more time
to read and process.
Graham and Cable (2001) also suggested that the incomplete block design may be
most appropriate where the examination of individual decision policies is not a central
focus of the inquiry because individual regression equations must be interpreted with
caution when study participants do not evaluate all possible scenarios. Nevertheless,
we are aware of at least three published studies employing a confounded factorial
design in which data analysis included the estimation of individual regression equations (Klaas & Dell'Omo, 1991; Pablo, 1994; Webster & Trevino, 1995). According to
Graham and Cable, regression estimates under these circumstances are likely to be
more useful if researchers employ larger fractions (e.g., one half as opposed to one
quarter). Finally, a confounded factorial design is only appropriate where higher order
interactions are expected to be unimportant to explanations of judgments (Graham &
Cable, 2001; Klaas & Dell'Omo, 1991).
Orthogonality of Cues

The chief theoretical advantage of orthogonality is that it facilitates the assessment
of the independent effects of each of the explanatory variables (Martocchio & Judge,
1994; Zedeck & Kafry, 1977). That is, partitioning out a cue's unique contribution to
variance in the dependent variable is most feasible when it does not covary or overlap
with other cues (Darlington, 1968; Pedhazur, 1982). More specifically, if variation in
one cue is associated with variation in a second cue, then determining which portion of
the variation in the dependent variable can be attributed to the first cue, the second cue,
or a combination of the two becomes difficult (Kennedy, 1989). Indeed, a number of
researchers have suggested that the precise measurement of cue importance (beta
weights) is very difficult in the absence of orthogonality (Darlington, 1968). Evidence
also suggests that cluster analysis, a technique whereby groups of respondents with
similar policies are identified, is more successful when variables are not
intercorrelated (Zedeck, 1977). The problem is that intercorrelation results in unstable
parameter estimates with higher variance, which in turn makes the identification of
discrete patterns of policies more difficult (Kennedy, 1989).
It is for these reasons that most researchers employing the policy-capturing design
create variable combinations to ensure that intercorrelations are 0. As Table 1 shows,
variables are orthogonal in 22 of the 37 studies reviewed for this article. Among the
sizeable minority of studies not taking this approach, variable intercorrelations ranged
from a low of .02 to a high of .91 (Sanchez & Levine, 1989). Given the apparent acceptability of nonorthogonal designs in some cases, and questions about the realism of scenarios in which factors are forced to be orthogonal, we next consider the question of
whether, or under what circumstances, researchers should take care to ensure that variables are completely uncorrelated.
To our knowledge, one study addresses this issue. Lane et al. (1982) compared estimates of variable importance under three different correlation structures and found
that raw-score regression weights did not change across structures, whereas other
measures of importance (simple correlations between explanatory and dependent variables, semipartial correlations, and standardized regression coefficients) did. The
authors concluded that zero intercorrelations are not required to estimate the importance of explanatory variables and that raw-score regression weights are the most
appropriate indicators to use when variables are not orthogonal. They also noted, however, that raw-score regression weights are not independent of the scale used and may
not be ideal in studies where such independence is important. For example, the
observed effect of a one-unit change in intelligence quotient (with a range of 100 or
more) is likely to be considerably smaller than that of a one-unit change in years of
experience (with a range of 25 or less), suggesting, perhaps erroneously, that years of
experience is a more important determinant of the outcome of interest than intelligence. Hence, in decision problems where the variance of cues is dissimilar, the relative importance of these cues cannot be accurately assessed unless the regression
weights are standardized (i.e., independent of the scale). In such cases, Lane et al. recommended using standardized regression coefficients, as the results of their study suggest that the change in these estimates across different correlation structures, although
significant, is relatively small (p. 238).
Variable intercorrelations in the Lane et al. (1982) study were not very high, so it is
not clear whether estimates of variable importance when intercorrelations are high can
be interpreted with any confidence. Furthermore, use of a nonorthogonal design may
limit the researcher's choice of importance measures and make the use of cluster analysis more difficult. Zero or near-zero variable intercorrelation structures would therefore seem to be the preferred design. If an orthogonal design is not possible (due, perhaps, to the creation of unrealistic scenarios), the researcher may want to consider a
design in which variable intercorrelations are relatively low and raw-score regression
weights are used as the measure of importance.
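The Lane et al. result can be illustrated with a small simulation (the policy weights and the correlation of .6 are invented for the example): a judge who applies the same additive raw-score policy under orthogonal and under correlated cue structures yields the same raw regression weights, while the standardized coefficients shift because the variance of the judgments changes.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
true_b = np.array([2.0, 1.0])  # the judge's (hypothetical) raw-score policy

# Same marginal cue variances; only the cue intercorrelation differs.
x_orth = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], n)
x_corr = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], n)

def fit_policy(x):
    y = x @ true_b  # deterministic additive judgment policy
    design = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(design, y, rcond=None)[0][1:]  # raw-score weights
    beta = b * x.std(axis=0) / y.std()                 # standardized weights
    return b, beta

b_orth, beta_orth = fit_policy(x_orth)
b_corr, beta_corr = fit_policy(x_corr)
# Raw weights match the policy under both structures; standardized weights
# differ because intercorrelated cues inflate the variance of the judgments.
```

This is why raw-score weights are the safer importance measure under nonorthogonal designs, as Lane et al. concluded.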
Sample Size and Power

One way to increase the power of a research study is to increase the number of subjects or participants in the study (Cohen, 1988). When planning a research study,
researchers will try to reduce Type II errors and increase power. Specifically, researchers prefer having a reasonably large chance of rejecting the null hypothesis if it, in fact,
is false. However, when designing a policy-capturing study, the number of subjects
may take a secondary role because the main focus typically is the individual analysis
for each subject. In this case, the power of the individual analysis is not based on the
number of subjects, because there is only one subject, but the number of scenarios or
judgments made by each subject. The number of scenarios determines the number of
observations for each analysis. That is, whether the regression weights will be significant is likely to be related to the number of scenarios. In a study employing multiple
regression techniques, the preferred ratio of the number of scenarios to factors is 10:1,
but the minimum ratio is considered to be 5:1 (Cooksey, 1996). These ratios are guidelines, and other factors should be considered, such as the extent of the explained variation in judgments and the intercorrelations of the cues (see Cooksey, 1996, for further
discussion). However, as discussed above, increasing the number of factors, and
hence the number of scenarios, may also create problems such as stress and exhaustion (Graham & Cable, 2001). Thus, there seems to be a trade-off associated with a
more comprehensive survey. Although it offers higher power, a comprehensive survey
may result in fatigue and, in turn, a greater likelihood of reduced reliability.
As discussed previously, researchers sometimes use a confounded factorial design
rather than a full factorial design to reduce fatigue. In some cases, this is done so the
researcher can add additional cues into the judgment process without burdening the
respondent with a large number of scenarios. That is, there are situations or contexts in
which the decision maker generally considers a large number of cues before making a
judgment, creating the need for fractionalization. However, fractionalization also
results in fewer scenarios, and this, in turn, reduces the power of the design. Thus, it is
important that researchers consider both power and fatigue when determining the
number of scenarios to include in the final design. One approach may be to develop at
least five scenarios for each cue. This would be the minimum ratio for studies employing regression techniques (Cooksey, 1996).
Although the size of the sample does not affect the power of the individual analysis,
it can be an issue in studies using other types of analysis. For example, if the researcher
is going to cluster subjects by strategy, large sample sizes offer a more comprehensive
analysis of the grouping process. Large sample sizes are also likely to be more effective when analyzing individual differences between the respondents. For example, in
the Graves and Karren (1992) study, 29 interviewers were used to analyze the different
decision-making strategies. After doing a cluster analysis, 13 clusters were found to
differentiate the various decision-making strategies among the interviewers. Although
it was likely that most if not all of the decision strategies were found, it was rather difficult to estimate the relative popularity of each cluster because there were 13 clusters
among 29 interviewers. Larger sample sizes allow better estimation of the relative
popularity of the clusters. For instance, Klaas and Dell'Omo (1991) used a much larger
sample size (93 managers). Their cluster analysis indicated 7 clusters, and with their
much larger sample size, they were able to estimate the popularity of each cluster.
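The two-stage logic of these studies, estimating each respondent's policy and then grouping similar policies, can be sketched as follows (the cue structure, the two policy types, and the noise level are all invented for illustration; the published studies used formal cluster-analysis procedures):

```python
import numpy as np
from itertools import product
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)

# 16 scenarios from a full factorial of four hypothetical two-level cues.
scenarios = np.array(list(product((0.0, 1.0), repeat=4)))
design = np.column_stack([np.ones(len(scenarios)), scenarios])

# Two simulated policy types (e.g., pay-focused vs. security-focused judges).
policies = [np.array([3.0, 2.0, 0.5, 0.5]), np.array([0.5, 0.5, 3.0, 2.0])]

weights = []
for i in range(30):  # 30 simulated respondents, alternating between the types
    y = scenarios @ policies[i % 2] + rng.normal(0.0, 0.3, len(scenarios))
    weights.append(np.linalg.lstsq(design, y, rcond=None)[0][1:])

# Cluster respondents on their estimated policy (regression) weights.
labels = fcluster(linkage(np.array(weights), method="ward"), t=2, criterion="maxclust")
```

With a sample of this size the relative popularity of each cluster can be read directly from the cluster sizes, which is the point made above about larger samples.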
Relatively small (less than 50 subjects) samples are not uncommon among the published studies using the policy-capturing approach. Rynes et al. (1983), for example,
examined the job application decisions made by 10 college seniors. Because there
were only 10 subjects, individual analyses were conducted, but subjects were not clustered. Having few subjects makes it difficult to do any form of clustering. Furthermore,
it is unlikely that any kind of generalization can realistically be made regarding these
students, as small samples may not be representative of the population. Obtaining representative samples that can be used to generalize results to the population is not possible
without sufficiently large numbers of respondents and the use of probability-sampling
techniques.
The smallest sample among the published studies reviewed for this article consisted
of three subjects (Dougherty et al., 1986). However, in this case, the objective was not
to discern the relative effects of each of the cues but to determine the validity of each of
the three decision makers in making decisions about job candidates. Studies that use
relatively few subjects probably have very different objectives; they are unlikely to
make inferences regarding the external validity of their results. Thus, large sample
sizes should not be a requirement for all policy-capturing studies. What is essential is that the researcher ensure sufficient power when testing hypotheses.
Reliability

Reliability is an important criterion in most research studies, as it is a necessary but
not a sufficient condition for the validity of measures (Carmines & Zeller, 1979). Interestingly, few of the published studies shown in Table 1 have analyzed the reliability of
their decision makers' judgments. This may seem somewhat disturbing because
many of the studies use only single-item dependent measures. There have been some
notable exceptions. Both Rynes et al. (1983) and Sherer et al. (1987) asked subjects to
make replicate judgments on the set of scenarios, allowing them to estimate reliability.
The correlation between the two sets of scenarios represents a test-retest check of the
judgments. Rynes et al. found reliabilities between .75 and .90, averaging approximately .82 for their 10 subjects. Sherer et al. found the average reliability to be about
.78 for the 11 subjects in their study. In a study by Hollenbeck and Williams (1987), a relatively small number of subjects (n = 11) were asked to perform the policy-capturing
study a second time a month later. The median test-retest reliability for these 11 subjects was .72, which suggests some degree of stability over time.
Although the results from these three studies indicate reasonable estimates of reliability (greater than .70), it is noteworthy that very few of the published studies made
reliability estimates. Furthermore, among those that did, relatively few subjects were
asked to duplicate their judgments. It seems that researchers have difficulty asking
subjects to duplicate their judgments when they are asked to process a large number of
scenarios. In most cases, duplication would require an additional experimental session. A study by Cable and Judge (1994) asked subjects to replicate 4 of 32 full factorial scenarios. The authors calculated reliability on the 4 duplicated scenarios for all
subjects and found an average correlation of .82. Although this process of estimating
reliability does not include the full set of items, it still may be a reasonably good compromise when circumstances do not allow researchers to fully duplicate the set of scenarios. Thus, this strategy of limited duplication is recommended as it is not likely to
create fatigue and the researcher is still able to estimate the reliability of each sample.
Furthermore, these duplicated scenarios may warm up participants to the task and thus
lessen start-up effects, a problem discussed by Aiman-Smith et al. (2002).
Summary

Our analysis of key design issues has so far relied on a review of policy-capturing
studies in the management literature. A number of the issues we discuss are also
addressed by Aiman-Smith et al. (2002). Both articles consider the issue of realism in
the design and presentation of scenarios to participants. Our article considers a variety
of approaches and designs that may be utilized to create realism. We discuss our preference for zero or near-zero intercorrelations between the cues when designing a
study, then consider the advantages and disadvantages of various alternative designs
(e.g., fractional and incomplete block design) to the full factorial design. Furthermore,
in the next section, we discuss approaches used by conjoint analysis researchers to deal
with the realism problem. We also propose that realism is more important than consistency when considering the range or difference between cue levels (values). This
Table 2
A Comparison of Conjoint Analysis and Policy-Capturing

                          Conjoint Analysis                 Policy-Capturing
Type of analysis          Decompositional                   Decompositional
Cue presentation method   Profiles method, trade-off        Profiles method
                          method, pairwise combination,
                          adaptive designs
Survey design             Full factorial design,            Full factorial design,
                          fractional design                 fractional design
Level of analysis         Individual                        Individual
Aggregate analyses        Yes, including clustering         Yes, including clustering
Evaluation of stimuli     Metric scales, nonmetric          Metric scales
                          procedures
means that in some cases, some cues will have wider ranges than others. Aiman-Smith
et al., on the other hand, propose that ranges should be about the same even if cue levels
are less realistic. We are in agreement, however, about the importance of consistency
with the number of levels across cues. We also discuss the advantages of orthogonality
among cues and the use of raw regression weights and standardized weights when
there are nonorthogonal designs.
Both articles also address the issue of fatigue and the cognitive limits of the decision
maker. Aiman-Smith et al. (2002) advise the researcher not to use more than five cues.
They also state that if written scenarios are used, the total number of scenarios should
be between 50 and 80. We tend to be less prescriptive. Because there are many potential designs and approaches, we do not specify the number of cues or scenarios to use.
We believe, however, that there should be an absolute minimum ratio of scenarios to
cues (i.e., 5:1).
Finally, both we and Aiman-Smith et al. (2002) address the reliability issue and
suggest that more estimates are desirable. Both advise using duplicate scenarios to calculate reliability. We suggest that these duplicates may assist in lessening start-up
effects, a problem found, as discussed by Aiman-Smith et al., when subjects initially
learn the task.
We next consider the insights offered by studies from the marketing literature that
have utilized a very similar methodology: conjoint analysis.

Conjoint Analysis
Conjoint analysis is a methodology that has been used extensively in marketing and
consumer research to understand how consumers evaluate preferences for products or
services (Green & Srinivasan, 1978, 1990). Like policy-capturing, conjoint analysis
uses an individualized factorial survey approach to examine the effects of product or
service attributes on evaluative judgments. An examination of this methodology may
therefore yield useful guidelines for researchers employing a policy-capturing
approach. We begin with a consideration of the methodological and computational
similarities between the two approaches. Our comparison of these approaches is summarized in Table 2. We include in this discussion an examination of the approaches
market researchers have taken to resolve some of the issues (e.g., information overload, orthogonality) associated with a factorial survey design. Finally, we discuss
research on conjoint analysis that may be applicable to policy-capturing.
Design of Conjoint Analysis Studies

Conjoint analysis is used to measure the relative importance consumers give to
the attributes that describe the products or services of interest. Similar to the policy-capturing approach, conjoint analysis involves the construction of real or hypothetical
products or services by combining the selected levels of each attribute (factor). These
hypothetical products or services are then presented to the respondents, who provide
an overall evaluation. This type of analysis has been called decompositional because
it involves decomposing respondents' preferences, or ratings, to determine the relative
value of each attribute (Hair, Anderson, Tatham, & Black, 1998). Policy-capturing is
also decompositional, as the decision makers are asked for overall evaluations of the
scenario rather than the factors that make up the scenario.
One issue for researchers designing a conjoint-analysis study is choosing a method
for presenting product or service descriptions. Conjoint-analysis researchers have
used a number of presentation methods. The full profile is the most popular presentation method, especially in studies examining fewer than six factors. Each stimulus
(hypothetical product or service) is described separately, most often on a profile card,
and defines varying levels of all of the factors included in the study. Respondents are
asked to either rank-order the stimuli or rate each independently. The decision problem
tends to be more complex than some of the other methods because all the factors are
included in each presentation. Furthermore, as the number of factors increases, so too
does the possibility of information overload. Therefore, it is more likely to be used
with six or fewer factors.
Our review of policy-capturing studies suggests that the full-profile method is the
most popular approach to cue presentation. Those researchers wishing to expand the
scope of their inquiries to a larger number of factors have tended to reduce the number
of scenarios presented to participants (e.g., fractional factorial design) rather than the
number of cues included in the scenarios. In contrast, marketing researchers have
developed a number of alternatives to full-profile presentation. The trade-off method,
for example, entails presenting attributes two at a time and asking respondents to rank-order the full set of combinations for each pair. It is less complex than the full-profile
method and has the advantage of avoiding information overload by presenting only
two attributes at a time. It has a number of limitations, however, and its use has
decreased in recent years. Limitations include the large number of judgments necessary even where the number of factor levels is relatively small, fatigue, the sole use of
nonmetric responses, and the inability to use fractional designs. An alternative to this
method is basically a combination of the trade-off and full-profile methods (Hair et al.,
1998). Referred to as the pairwise comparison method, it involves comparisons of two
profiles containing a subset of the attributes. Respondents typically indicate the
strength of their preference for one profile over another on a rating scale. It is similar to
the trade-off method in that profiles contain a subset of the attributes, but in the
pairwise comparison method, respondents never view more than two profiles at a time.
It is similar to the full-profile method in that profiles include more than two attributes
and metric response measures (ratings) are used. The advantage of the method is that it
allows researchers to examine more than seven factors without creating the problems
of respondent fatigue often associated with other presentation methods (Hair et al.,
1998).
As with policy-capturing studies, the problems associated with information overload are also concerns in the design of conjoint-measurement studies. Market
researchers have addressed these concerns in much the same way as researchers using
the policy-capturing approach: by employing a fractional factorial design. Using a
fractional factorial design serves to reduce the number of scenarios to a manageable
size while maintaining orthogonality (Green, 1974). As discussed earlier, these
designs result in a reduction of interpretable interaction effects. This is not a problem,
however, if the decision maker is using an additive model, as most of the interpretable
variance is assigned to the main effects.
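As a sketch of the fractional idea, consider the simplest case of two-level factors (a setting chosen for clarity, not tied to any study reviewed here). A half fraction of a four-factor design can be built from a defining relation, keeping the main-effect columns orthogonal while halving the number of scenarios.

```python
from itertools import product

# Half fraction of a 2^4 design: set D = A*B*C (defining relation I = ABCD).
# Main effects remain orthogonal, but D is aliased with the ABC interaction.
runs = [(a, b, c, a * b * c) for a, b, c in product([-1, 1], repeat=3)]
print(len(runs))  # 8 runs instead of the 16 in the full 2^4 factorial

# Orthogonality check: every pair of main-effect columns has zero dot product.
cols = list(zip(*runs))
for i in range(4):
    for j in range(i + 1, 4):
        assert sum(x * y for x, y in zip(cols[i], cols[j])) == 0
```

The aliasing is the cost referred to in the text: with only these 8 scenarios, the effect of D cannot be separated from the ABC interaction, which is harmless only if the decision maker's model is essentially additive.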
Researchers using the conjoint-analysis method have examined the effects of as
many as 30 factors and have consequently developed a number of alternative
approaches to the fractional factorial design not found in the policy-capturing literature. The hybrid or adaptive approaches, for example, involve a two-stage procedure in
which respondents are first asked to rate the desirability of the full set of factors.² The adaptive approach is the more popular of these methods, perhaps because software is available that generates individualized scenarios; it is probably most useful when examining 10 or more factors (Green & Srinivasan, 1990). In this
approach, each respondent receives a set of scenarios that only include those factors
designated as the most important in the first stage of the procedure. The scenarios are
then evaluated in the same way as the above methods (Hair et al., 1998).
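A minimal sketch of this two-stage logic follows. The factor names, levels, and stage-1 ratings are invented for illustration, and commercial adaptive conjoint software is considerably more elaborate than this.

```python
from itertools import product

# Stage 1: a hypothetical respondent rates the desirability of each factor (1-7).
levels = {
    "pay": ["low", "high"],
    "benefits": ["basic", "rich"],
    "commute": ["short", "long"],
    "title": ["analyst", "manager"],
    "office": ["shared", "private"],
}
stage1_ratings = {"pay": 7, "benefits": 6, "commute": 5, "title": 3, "office": 2}

def adaptive_scenarios(levels, ratings, keep=3):
    """Stage 2: build full-profile scenarios over the respondent's top-rated factors."""
    top = sorted(ratings, key=ratings.get, reverse=True)[:keep]
    return [dict(zip(top, combo)) for combo in product(*(levels[f] for f in top))]

scenarios = adaptive_scenarios(levels, stage1_ratings)
print(len(scenarios))  # 2^3 = 8 individualized scenarios over pay, benefits, commute
```

Note that a different respondent, with different stage-1 ratings, would receive scenarios built from a different subset of factors, which is exactly why the factors vary across respondents in these methods.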
A third issue for researchers using either the conjoint-analysis or policy-capturing
approach is the impact of forced orthogonality on external validity. That is, creating
a design in which the variables are orthogonal but are naturally correlated in the environment may produce profiles that are not representative of the environment familiar
to the respondents (Green & Srinivasan, 1978, 1990). Where variables are substantially
correlated, Green and Srinivasan (1978) have suggested the use of composite factors,
which provide a summary measure of all correlated subfactors. For example, a medical-cost-sharing variable would summarize the overall level of deductible and
coinsurance provisions, which tend to be highly correlated across health care plans.
This approach avoids the problem of creating unrealistic profiles (e.g., high deductible
and low coinsurance); however, it does not allow the researcher to partial out the
effects of subfactors that may be of more interest to the study.
Alternatively, Steckel, DeSarbo, and Mahajan (1991) devised a new optimizing
methodology that entails creating a survey to ensure that variables are as orthogonal
as possible. A combinatorial optimization procedure is used to create a modified fractional factorial design by identifying and excluding nonrepresentative or unrealistic
profiles (i.e., very unlikely to occur in the environment). An algorithm finds a subset of
the realistic profiles that are as close as possible to being orthogonal. This is not unlike
some policy-capturing designs wherein the researcher takes care to create stimulus
sets in which variable intercorrelations are minimized and realism is enhanced (Beatty
et al., 1988; Klaas & Dell'Omo, 1991; Klaas & Wheeler, 1990).
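The flavor of this design concern can be sketched as follows: screen out profiles deemed unrealistic, then measure how far the surviving set departs from orthogonality. The cues, levels, and realism rule below are invented, and Steckel, DeSarbo, and Mahajan's (1991) actual procedure goes further, using combinatorial optimization to select a near-orthogonal subset of the realistic profiles.

```python
from itertools import product, combinations

def corr(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def max_abs_corr(profiles):
    """Largest absolute pairwise correlation among the cue columns."""
    cols = list(zip(*profiles))
    return max(abs(corr(cols[i], cols[j]))
               for i, j in combinations(range(len(cols)), 2))

# Three cues with levels 0/1/2; a high deductible (2) combined with low
# coinsurance (0) is deemed unrealistic and excluded, as in the text's example.
all_profiles = list(product(range(3), repeat=3))
realistic = [p for p in all_profiles if not (p[0] == 2 and p[1] == 0)]

print(len(realistic))                     # 24 of the 27 profiles survive
print(round(max_abs_corr(realistic), 3))  # screening induces a modest correlation
```

The full factorial is exactly orthogonal; dropping the implausible cells buys realism at the price of a nonzero intercue correlation, which is the trade-off the optimization procedure is designed to manage.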
Data Analysis

Similarities may also be found in the computational issues that arise in conjoint-analysis and policy-capturing studies. One issue is the level of analysis. In both
approaches, analyses are typically carried out at the individual level, which means that
the analyst generates a separate model for predicting preferences for each respondent.
At the individual level, each respondent rates enough profiles that the analyses can be
performed separately for each person. Predictive accuracy is calculated for each
respondent rather than for the whole sample. One of the common uses of these analyses is to group individuals with similar importance values into segments. Researchers
using either conjoint analysis or policy-capturing are interested in better understanding these segments and may combine this information with other variables such as
demographics to derive respondent groupings that are similar in preference (Hair et al.,
1998). Many times, however, an aggregate analysis is used to estimate the relative
importance of the attributes for the whole sample. In some studies, this between-group
analysis was used to test hypotheses about the average effects of attributes.
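When the design is orthogonal and the cues are coded ±1, the individual-level computation is simple enough to sketch directly: each respondent's regression weight for cue j reduces to the average of the cue-rating cross products. The two respondents' ratings below are fabricated for illustration.

```python
from itertools import product

# An orthogonal 2^3 full factorial with cues coded -1/+1 (8 profiles).
scenarios = list(product([-1, 1], repeat=3))

def capture_policy(scenarios, ratings):
    """Per-respondent cue weights: OLS shortcut for an orthogonal ±1 design."""
    n = len(scenarios)
    n_cues = len(scenarios[0])
    return [sum(s[j] * y for s, y in zip(scenarios, ratings)) / n
            for j in range(n_cues)]

# Respondent A weights the first cue heavily; respondent B weights the third.
ratings_a = [4 + 3 * s[0] + 1 * s[1] for s in scenarios]
ratings_b = [4 + 1 * s[0] + 3 * s[2] for s in scenarios]
print(capture_policy(scenarios, ratings_a))  # [3.0, 1.0, 0.0]
print(capture_policy(scenarios, ratings_b))  # [1.0, 0.0, 3.0]
```

Because the recovered weights differ sharply across the two respondents, averaging them would obscure both policies, which is why analyses are typically run separately for each person before any segmentation or aggregation.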
A second computational issue involves the specification of the respondent's composition rule. The rule describes how the respondent combines the factors to obtain
overall worth. The most common rule invokes the additive model, which assumes that
the respondent simply adds up the values of each attribute to get a total value for the
factor or attribute combination. As is the case in policy-capturing studies, the main
effects account for most of the total variation in preferences, and hence, this model suffices for most consumer applications. Alternatively, the composition rule using interactive effects allows for the interaction of two or more attributes. The choice of a composition rule determines the types and number of stimuli that the decision maker must
evaluate. More stimuli are required if the researcher is interested in evaluating interactions. Consider a study using 4 factors, each with 4 levels. If the factors are presented using a full-profile method, in which all factors are included in all scenarios, and the researcher is interested only in estimating main effects, then just 16 (4 factors × 4 levels) of the 256 (4⁴) possible scenarios are needed (Hair et al., 1998). That is, it is possible to estimate main effects using a fractional factorial design. However, the 16 scenarios must be carefully constructed for orthogonality to arrive at the correct estimation of
the main effects. If, on the other hand, interactions are specified as important, additional scenarios are required to assess these effects. In this case, a full factorial design,
with the full set of 256 scenarios (240 more than are needed to assess main effects)
would be required to assess the importance of all 11 interactions.
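The 16-scenario main-effects plan in this example can be constructed explicitly. One standard recipe (ours, not Hair et al.'s) builds an orthogonal array from arithmetic over the finite field GF(4): two columns index the runs, and the remaining two come from a pair of orthogonal Latin squares.

```python
from itertools import product, combinations

# GF(4) on {0,1,2,3}: addition is bitwise XOR; multiplication table below.
GF4_MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]

# 16-run orthogonal main-effects plan for 4 factors at 4 levels each.
runs = [(i, j, i ^ j, GF4_MUL[2][i] ^ j)
        for i, j in product(range(4), repeat=2)]
print(len(runs))  # 16 scenarios instead of the 256 in the full 4^4 factorial

# Strength-2 orthogonality: every pair of factors shows all 16 level
# combinations exactly once, so main effects can be estimated without bias.
for c1, c2 in combinations(range(4), 2):
    assert len({(r[c1], r[c2]) for r in runs}) == 16
```

The check at the end is the "carefully constructed for orthogonality" requirement in the text: any arbitrary subset of 16 scenarios would not generally balance the factor levels against one another.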
Research Results

Because of the similarities between conjoint analysis and policy-capturing, research on either method can be informative in designing these studies. Research on
conjoint analysis has shown that the relative importance of an attribute or factor
increases as the number of levels on which it is defined increases. This occurs even
though the minimum and maximum values of the attribute are held constant (Wittink,
Krishnamurthi, & Nutter, 1982; Wittink, Krishnamurthi, & Reibstein, 1990). This
result was observed in analyses of both rank-order and ratings data. It indicates a serious problem because the estimated regression coefficients are supposed to be unbiased. Applied to policy-capturing, this finding suggests that investigators should probably consider using the same number of levels for all factors when their intention is to
compare the relative importance of these factors.
Over the past three decades, conjoint analysis has been an important tool for those
conducting consumer and marketing research. Its popularity in evaluating consumer
preferences has promoted a broader range of useful methodological techniques that can and should be used to expand the capabilities of the policy-capturing methodology. Specifically, conjoint analysis has offered more flexibility with methods (e.g.,
pairwise comparison and adaptive) that can be used when either few factors or many
factors are utilized. In situations in which factors are substantially correlated and are
causing potential interpretation problems, conjoint researchers have suggested the use
of either composite factors or new optimizing methodologies that remove the unrealistic scenarios. We hope that these advances in conjoint research will help to address some of the issues and problems confronting policy-capturing researchers.

Conclusion
Our analysis of some of the key issues related to the policy-capturing method has indicated not only that this method has been a very effective tool for understanding the processes by which individual decision makers integrate cues of information in making various types of organizational decisions but also that the method is highly flexible and able to adapt to many contexts. Our comparison with conjoint analysis has been quite beneficial, as research advances in that methodology can help solve some of the key methodological problems of the policy-capturing method. Although
we are both positive and quite hopeful regarding the scope and versatility of future
research and practice, the researcher designing a study using this method should
understand its limitations and constraints.
To design a sound and valid policy-capturing study, the researcher should first focus
on the purpose of the research. Clearly, the researcher's intentions are critical in determining how to deal with problematic situations when no ideal solution is available. In
our analysis, we encountered a number of clear trade-offs when evaluating the various
policy-capturing studies. For example, if researchers are going to use an experimental
design, they typically have to choose between a full factorial design and a confounded
factorial design. In the former, the obvious advantages are the ability to assess the full
model, both the linear and nonlinear components, and to find the relative contribution
of all the variables. On the other hand, the investigator has to consider problems of
stress and fatigue, especially if there are many variables, and also whether the additional scenarios will result in unreliable judgments. Using a confounded factorial
design, the researcher may want to consider the loss of power with fewer scenarios and
the inability to assess higher-order interactions. These
considerations, however, may become moot if and when the required number of cues is
far too large to use a full factorial design. In such situations, researchers are likely to use statistical packages that create scenarios with near-zero correlations between the cues. When there are more than 10 factors,
researchers should utilize the hybrid and adaptive techniques that conjoint researchers
have used for many years.
Design decisions by researchers are not all based on some form of trade-off. We
believe that researchers should be careful to avoid designing policy-capturing studies
that lack realism and result in poor validity. In planning their studies, they may want to
check their variables carefully to avoid cues that are correlated in real situations and
would result in implausible combinations if forced to be uncorrelated in the experiment. They should ensure that levels of each cue are applicable to real settings. Furthermore, they may check to see if the constructed scenarios are likely to be interpreted
appropriately within the context of the study. Once this is known, they can then take the
necessary steps to enhance realism. Finally, most of the past studies have not included
a reliability check. Because reliability is a necessary condition for validity, some form
of replication should be included as part of the policy-capturing study.
Notes
1. Evidence suggests that the more sophisticated judges see through the indirect approach,
and eliminating social desirability effects altogether may not always be possible (Mazen, 1990).
2. This so-called self-explication procedure is similar to an approach used in policy-capturing
studies in which researchers run focus groups or interview individuals familiar with the decision
problems to identify salient decision criteria. In policy-capturing studies, factors are invariant
across respondents, whereas in these alternative methods, the factors vary according to individual respondents' desirability ratings.

References
Aiman-Smith, L., Scullen, S. E., & Barr, S. H. (2002). Conducting studies of decision making in
organizational contexts: A tutorial for policy-capturing and other regression-based techniques. Organizational Research Methods, 5, 388-414.
Allen, J. S., & Muchinsky, P. M. (1984). Assessing raters' policies in evaluating proposed services for transporting the physically handicapped. Journal of Applied Psychology, 69, 3-11.
Arnold, H. J., & Feldman, D. C. (1981). Social desirability response bias in self-report choice
situations. Academy of Management Journal, 24, 377-385.
Beatty, J. R., McCune, J. T., & Beatty, R. W. (1988). A policy-capturing approach to the study of
United States and Japanese managers' compensation decisions. Journal of Management,
14, 465-474.
Brannick, M. T., & Brannick, J. P. (1989). Nonlinear and noncompensatory processes in performance evaluation. Organizational Behavior and Human Decision Processes, 44, 97-122.
Bretz, R. D., Jr., & Judge, T. A. (1994). The role of human resource systems in job applicant decision processes. Journal of Management, 20, 531-551.
Cable, D. M., & Judge, T. A. (1994). Pay preferences and job search decisions: A person-organization fit perspective. Personnel Psychology, 47, 317-348.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Beverly Hills, CA:
Sage.
Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New York: John Wiley.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum.
Cooksey, R. W. (1996). Judgment analysis: Theory, methods, and applications. San Diego, CA:
Academic Press.
Cooper, W. H., & Richardson, A. J. (1986). Unfair comparisons. Journal of Applied Psychology,
71, 179-184.
Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161-182.
Dougherty, T. W., Ebert, R. J., & Callender, J. C. (1986). Policy capturing in the employment interview. Journal of Applied Psychology, 71, 9-15.
Dunn, W. S., Mount, M. K., Barrick, M. R., & Ones, D. S. (1995). Relative importance of personality and general mental ability in managers' judgments of applicant qualifications.
Journal of Applied Psychology, 80, 500-509.
Feldman, D. C., & Arnold, H. J. (1978). Position choice: Comparing the importance of organizational and job factors. Journal of Applied Psychology, 63, 706-710.
Graham, M. E., & Cable, D. M. (2001). Consideration of the incomplete block design for policy-capturing research. Organizational Research Methods, 4, 26-45.
Graves, L. M., & Karren, R. J. (1992). Interviewer decision processes and effectiveness: An experimental policy-capturing investigation. Personnel Psychology, 45, 313-340.
Green, P. E. (1974). On the design of choice experiments involving multifactor alternatives.
Journal of Consumer Research, 1, 61-68.
Green, P. E., & Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5, 103-123.
Green, P. E., & Srinivasan, V. (1990). Conjoint analysis in marketing: New developments with
implications for research and practice. Journal of Marketing, 54(4), 3-19.
Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis
(5th ed.). Englewood Cliffs, NJ: Prentice Hall.
Highhouse, S., Luong, A., & Sarkar-Barney, S. (1999). Research design, measurement, and effects of attribute range on job choice: More than meets the eye. Organizational Research
Methods, 2, 37-48.
Hitt, M. A., & Barr, S. H. (1989). Managerial selection decision models: Examination of
configural cue processing. Journal of Applied Psychology, 74, 53-61.
Hitt, M. A., & Middlemist, R. D. (1979). A methodology to develop the criteria and criteria
weightings for assessing subunit effectiveness in organizations. Academy of Management
Journal, 22, 356-374.
Hobson, C. J., Mendel, R. M., & Gibson, F. W. (1981). Clarifying performance appraisal criteria. Organizational Behavior and Human Performance, 28, 164-188.
Hollenbeck, J. R., & Williams, C. R. (1987). Goal importance, self-focus, and the goal-setting
process. Journal of Applied Psychology, 72, 204-211.
Judge, T. A., & Bretz, R. D., Jr. (1992). Effects of work values on job choice decisions. Journal
of Applied Psychology, 77, 261-271.
Judge, T. A., & Martocchio, J. J. (1996). Dispositional influences on attributions concerning absenteeism. Journal of Management, 22, 837-861.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263-289.
Kennedy, P. (1989). A guide to econometrics (2nd ed.). Cambridge, MA: MIT Press.
Klaas, B. S., & Dell'Omo, G. G. (1991). The determinants of disciplinary decisions: The case of
employee drug use. Personnel Psychology, 44, 813-835.
Klaas, B. S., & Wheeler, H. N. (1990). Managerial decision making about employee discipline:
A policy-capturing approach. Personnel Psychology, 43, 117-134.
Lane, D. M., Murphy, K. R., & Marques, T. E. (1982). Measuring the importance of cues in policy capturing. Organizational Behavior and Human Performance, 30, 231-240.
Madden, J. M. (1981). Using policy-capturing to measure attitudes in organizational diagnosis.
Personnel Psychology, 34, 341-350.
Martocchio, J. J., & Judge, T. A. (1994). A policy-capturing approach to individuals' decisions
to be absent. Organizational Behavior and Human Decision Processes, 57, 358-386.
Martocchio, J. J., & Judge, T. A. (1995). When we don't see eye to eye: Discrepancies between
supervisors and subordinates in absence disciplinary decisions. Journal of Management,
21, 251-278.
Mazen, A. M. (1990). The moderating role of social desirability, age, and experience in human
judgment: Just how indirect is policy capturing? Organizational Behavior and Human Decision Processes, 45, 19-40.
Olson, C. A., Dell'Omo, G. G., & Jarley, P. (1992). A comparison of interest arbitrator decision-making in experimental and field settings. Industrial and Labor Relations Review, 45, 711-723.
Orr, J. M., Sackett, P. R., & Mercer, M. (1989). The role of prescribed and nonprescribed behaviors
in estimating the dollar value of performance. Journal of Applied Psychology, 74, 34-40.
Pablo, A. L. (1994). Determinants of acquisition integration level: A decision-making perspective. Academy of Management Journal, 37, 803-836.
Pedhazur, E. J. (1982). Multiple regression in behavioral research. New York: Holt, Rinehart
and Winston.
Roehling, M. V. (1993). Extracting policy from judicial opinions: The dangers of policy capturing in a field setting. Personnel Psychology, 46, 477-502.
Rossi, P. H., & Anderson, A. B. (1982). The factorial survey approach: An introduction. In P. H.
Rossi & S. L. Nock (Eds.), Measuring social judgments: A factorial survey approach.
Beverly Hills, CA: Sage.
Rynes, S., & Lawler, J. (1983). A policy-capturing investigation of the role of expectancies in
decisions to pursue job alternatives. Journal of Applied Psychology, 68, 620-631.
Rynes, S. L., Schwab, D. P., & Heneman, H. G., III. (1983). The role of pay and market pay variability in job applicant decisions. Organizational Behavior and Human Performance, 31,
353-364.
Sanchez, J. I., & Levine, E. L. (1989). Determining important tasks within jobs: A policycapturing approach. Journal of Applied Psychology, 74, 336-342.
Sherer, P. D., Schwab, D. P., & Heneman, H. G., III. (1987). Managerial salary-raise decisions:
A policy-capturing approach. Personnel Psychology, 40, 27-38.
Stahl, M. J., & Zimmerer, T. W. (1984). Modeling strategic acquisition policies: A simulation of
executives' acquisition decisions. Academy of Management Journal, 27, 369-383.
Steckel, J., DeSarbo, W. S., & Mahajan, V. (1991). On the creation of acceptable conjoint analysis experimental designs. Decision Sciences, 22, 435-442.
Stumpf, S. A., & London, M. (1981). Capturing rater policies in evaluating candidates for promotion. Academy of Management Journal, 24, 752-766.
Viswesvaran, C., & Barrick, M. R. (1992). Decision-making effects on compensation wages:
Implications for market wages. Journal of Applied Psychology, 77, 588-597.
Webster, J., & Trevino, L. K. (1995). Rational and social theories as complementary explanations of communication media choices: Two policy-capturing studies. Academy of Management Journal, 38, 1544-1572.
Wittink, D. R., Krishnamurthi, L., & Nutter, J. B. (1982). Comparing derived importance
weights across attributes. Journal of Consumer Research, 8, 471-474.
Wittink, D. R., Krishnamurthi, L., & Reibstein, D. J. (1990). The effect of differences in the
number of attribute levels on conjoint results. Marketing Letters, 1, 113-129.
York, K. M. (1989). Defining sexual harassment in workplaces: A policy-capturing approach.
Academy of Management Journal, 32, 830-850.
Zedeck, S. (1977). An information processing model and approach to the study of motivation.
Organizational Behavior and Human Performance, 18, 47-77.
Zedeck, S., & Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purpose of the appraisal. Journal of Applied Psychology, 67, 752-758.
Zedeck, S., & Kafry, D. (1977). Capturing rater policies for processing evaluation data. Organizational Behavior and Human Performance, 18, 269-294.
Zhou, J., & Martocchio, J. J. (2001). Chinese and American managers' compensation award decisions: A comparative policy-capturing study. Personnel Psychology, 54, 115-145.
Ronald J. Karren is an associate professor of management at the University of Massachusetts at Amherst.
He obtained his Ph.D. in Psychology from the University of Maryland. His current research interests include
testing, interviewing, and human resource decision making.
Melissa Woodard Barringer is an associate professor of management at the University of Massachusetts at
Amherst. She received a Ph.D. in Industrial and Labor Relations from Cornell University. Her research interests are in the areas of total compensation and alternative work arrangements.