
7 Interpreting the Content of the MMPI-2
Critical Items, Content Scales, and Subscales
Critical Items
The empirical keying or contrasted-groups methodology used by Hathaway in developing
the eight basic clinical scales of the MMPI represented a significant innovation in
personality inventory construction. Previous inventories relied on intuitive judgment to
produce items related to a trait of interest and to combine them with other items thought
to be related to the same trait to compose a scale. The items and scales thus composed
tended to be highly face valid (i.e. the items tended to be obviously related to the trait to
be measured). Unfortunately, they were often unsatisfactorily related to external criteria
and were overly vulnerable to the test-taking attitudes of the examinee that would bias
responses and potentially distort the results of testing.
The new approach required only that items selected for inclusion differentiate
between normal and abnormal criterion groups empirically. The authors of the MMPI
gave little consideration to the content of individual items and were concerned only that
the breadth of coverage for the original item pool was sufficient to sample a wide array of
attitudes and behaviors. The justification for the new empirical strategy was provided in
the classic manifesto of Meehl (1945a), "The Dynamics of 'Structured' Personality Tests."
In the decades following the release of the MMPI, clinical needs and advances in
psychometric theory and methods have seen a return to attaching greater importance to
test item content. The dust-bowl empiricist rationale for proscribing the examination
of item content in the clinic was succinctly stated by Meehl (1945b): "The scoring does
not assume a valid self-rating to have been given" (p. 147). But studies demonstrating
comparable validities for personality assessment instruments developed under both
rational and empirical strategies (Hase & Goldberg, 1967), and the appearance of
sophisticated positions defending the importance of test item content (Goldberg &
Slovic, 1967; Jackson, 1971), were instrumental in moderating Meehl's (1945) earlier
position (Meehl, 1971, 1972). These developments were predated, however, by the needs
of clinicians to explore the MMPI as part of and prelude to acquiring proficiency in its use
and the pressing demands of clinical work to understand patients under investigation.
Caldwell (1991) has pointed out that the basic elements of the MMPI that are subject
to interpretation may be located on a gradient of increasing obviousness or transparency,
with the (a) subtle components of the clinical scales and certain additional scales (e.g. K
and, perhaps, Mf), (b) certain highly subtle empirically derived scales (e.g. MAC-R and
O-H), and (c) the basic clinical scales having subtle components corresponding to three
initial steps of this gradient. Intermediate steps would include empirically derived scales
more generally, including the more obvious of the basic clinical scales. Still more obvious
would be the components of the basic clinical scales and scales whose developmental
methodology emphasized internal consistency standards, such as the Wiggins content scales, the
MMPI-2 content scales, and the RC scales. The top-most steps in this gradient would
include scales composed of items with unusually low endorsement rates, such as the
F and FB scales, and the so-called critical items. It should not be surprising that the
growth in the appreciation of test item content and its potential importance in the clinic
should have originated with the distinction to which Wiener and Harmon (Wiener,
1948; Wiener & Harmon, 1946) called attention, between subtle and obvious content,
and critical items.
Initially, clinicians sought access to the patient's responses to certain items thought
to be clearly indicative of psychopathological disturbance. Grayson (1951) gathered a
set of 38 items for use with a VA population. Endorsement of any of these in the course
of psychodiagnostic evaluation was considered sufficient to warrant more detailed
clinical inquiry into the content area of the item, regardless of whether the MMPI profile
appeared pathological. In some cases, these items were considered pathognomonic of a
condition, such as delusions or suicidal ideation. In others, the item would serve as a red
flag or a stop sign, forcing an interruption of the diagnostic process in order to explore the
patient's grounds for its endorsement. These thus came to be called "stop" or "critical" items.
The history of critical items actually goes back to the Woodworth (1920) Personal Data
Sheet, which included 10 neurotic tendency items. These were considered indicators of
neurosis, regardless of the remainder of the individual's responses.
Although the Grayson items were widely adopted (Gravitz, 1968), particularly after
being reprinted in An MMPI Handbook (Dahlstrom & Welsh, 1960), alert practitioners
soon discovered that this group of items was overwhelmingly redundant with the F
and Sc scales. Eighty-five percent (32/38) of these items were scored on one or both of
these, 40 percent (16 items) on the F scale alone (Koss, 1979). Moreover, for 92 percent
(35/38) of the items, the keyed response was True. Thus, however salient the content of
the individual items might be as a springboard for investigation within the interview,
it became clear that the Grayson items placed undue stress on psychoticism, to the
exclusion of other problem areas; that these items, like the F scale itself, were sensitive to
gross deviancy and to a set on the patient's part to exaggerate or conceal psychological
complaints; and that they were also vulnerable to an acquiescent response style, the
inclination to mark items True, regardless of their content.
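The overlap figures cited above are simple set arithmetic, and a minimal sketch may make the calculation concrete. The item numbers below are invented placeholders, not real MMPI item numbers, and the function name `overlap_summary` is hypothetical. The sketch also assumes that "on the F scale alone" means items scored on F but not on Sc.

```python
def overlap_summary(critical_items, scale_f, scale_sc, keyed_true):
    """Summarize how much a critical-item list overlaps two scales.

    All arguments are collections of item numbers. keyed_true holds the
    critical items whose keyed (pathological) response is True.
    """
    critical = set(critical_items)
    f_items = set(scale_f)
    sc_items = set(scale_sc)
    on_either = critical & (f_items | sc_items)   # scored on F, Sc, or both
    on_f_only = (critical & f_items) - sc_items   # assumption: F but not Sc
    n = len(critical)
    return {
        "share_on_f_or_sc": len(on_either) / n,
        "share_on_f_only": len(on_f_only) / n,
        "share_keyed_true": len(critical & set(keyed_true)) / n,
    }


# Hypothetical toy item numbers, NOT real MMPI item numbers.
critical = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
scale_f = {1, 2, 3, 4, 11, 12}
scale_sc = {3, 4, 5, 6, 7, 13}
keyed_true = {1, 2, 3, 4, 5, 6, 7, 8, 9}

summary = overlap_summary(critical, scale_f, scale_sc, keyed_true)
for name, share in summary.items():
    print(f"{name}: {share:.0%}")
```

A list whose items mostly fall on F or Sc and are mostly keyed True would show high values on all three shares, which is the pattern of redundancy and acquiescence sensitivity described for the Grayson items.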
Another set of critical items was selected by Caldwell (1969). Seeking a broader
range of content, Caldwell chose 68 items, distributed among nine categories. As a set,
the Caldwell critical items covered a wider range of problem areas than the Grayson
items. Nevertheless, although not quite so dominated by psychoticism, with 56 percent
(38/68) of the items overlapping F and Sc (20 items, or 33 percent, on F alone), these
items remain relatively saturated with this source of variance. These shortcomings
were in addition to the greatest concern of all: that the Grayson and Caldwell critical
items, with their origins in the rational-intuitive processes of their creators, had
no demonstrated empirical relationships with the symptoms and complaints they
enunciated. Critiques by Greene (1980) and R. G. Evans (1984) noted the occurrence
of the Grayson and Caldwell critical items in normal groups, and other contradictory
evidence for their validity.
Seeking to address the external validity of critical items, Lachar and Wrobel (1979)
investigated the empirical correlates of a large number of items, including those appearing
in the Grayson and Caldwell lists, nominated by clinicians as relevant to 14 common
problem areas for psychiatric in- and outpatients. Lachar and Wrobel ultimately settled
on 111 items, distributed into 11 content areas, of which 99 achieved correlations significant
at the .05 level with counterpart information on problems recorded in patient files; the
remaining 12 items achieved acceptable correlations with closely related criteria.
An alternative approach was taken by Koss and Butcher (1973; Koss, Butcher, &
Hoffmann, 1976) to ensure empirical correlates for critical items by identifying six
crisis situations, each marked by a set of behaviors or complaints exhibited by patients
at the time of their admission to the hospital. After defining the crisis group, Koss and
Butcher asked clinicians to identify MMPI items that corresponded to the behaviors
and complaints characteristic of each of the groups. Nominated items were then
cross-validated on newly admitted patients, with non-crisis psychiatric patients serving as
controls.

Table 7.1 A comparison of the content contained within three sets of critical items
(No. of items is given as MMPI/MMPI-2.)

| Lachar–Wrobel content category | No. of items | Caldwell content category | No. of items | Koss–Butcher content category | No. of items |
|---|---|---|---|---|---|
| Psychological discomfort | | | | | |
| Anxiety and tension | 11/11 | | | Acute anxiety | 9/17 |
| Depression and worry | 16/16 | Distress and depression | 11/11 | Depressed-suicidal ideation | 25/22 |
| Sleep disturbance | 6/6 | Suicidal thoughts | 5/5 | | |
| Reality distortion | | | | | |
| Deviant beliefs | 15/15 | Ideas of reference, persecution, and delusions | 10/1 | Persecutory ideas | 12/11 |
| Deviant thinking and experience | 11/10 | Peculiar experience and hallucinations | 9/9 | Mental confusion | 3/11 |
| Characterological adjustment | | | | | |
| Substance abuse | 4/3 | Alcohol and drugs | 4/3 | Situational stress due to alcoholism | 15/7 |
| Antisocial attitude | 9/9 | Authority problems | 5/5 | | |
| Family conflict | 4/4 | Family discord | 7/7 | | |
| Problematic anger | 4/4 | | | Threatened assault | 3/5 |
| Sexual concern and deviation | 8/6 | Sexual difficulties | 7/6 | | |
| Somatic symptoms | 23/23 | Somatic concerns | 10/10 | | |
Despite the different methods used to develop the Lachar–Wrobel, Caldwell, and
Koss–Butcher critical item sets, Table 7.1 suggests a high degree of similarity in the
content of the items for each set, as well as roughly comparable areas of coverage. Each
critical item list references distress and dysphoria, cognitive disruption, psychotic
ideation, and substance abuse. (See Table A6, pp. 569–576 in Friedman et al., 2001, for
the MMPI-2 items and scoring direction for the Koss–Butcher, Lachar–Wrobel, and
Nichols critical item lists.)
A fourth list was developed by Nichols (1989) when he found many of the
Lachar–Wrobel critical item categories insufficiently homogeneous and those of Caldwell and
of Koss and Butcher too restricted. For example, in his consultations for a neurologist,
he wanted to be able to specify the kinds of somatic complaints endorsed more precisely,
both in terms of their specific content and in terms of the proportion of items endorsed
within a specific content area, such as motor difficulties or genitourinary complaints.
He was also discontented with the inclusion of items in categories implicating psychotic
mentation that might reflect only unusual culturally based beliefs and experiences. For
example, the item "Evil spirits possess me at times," although frequently scored on scales
highly saturated with psychoticism (e.g. F, Pa, BIZ, PSYC, and RC6) and potentially
endorsed as an acknowledgment of the kind of hallucinatory or other anomalous
experience common to psychotic states, may also be endorsed at relatively high
frequency by members of certain religious sects or immigrants from countries wherein
a belief in spirits, evil and otherwise, is more common than in the United States. Nichols
therefore devised a new set of items based on both rational and statistical considerations.
Categories were initially selected from large-scale item factor analyses on a very large
Midwestern psychiatric sample. These categories were then refined by subdividing many
of the somatic factors into more discrete classes of symptoms, by eliminating categories
that essentially duplicated the content of normed scales, such as fears and phobias, and
by examining patterns of item overlap among content scales and the Caldwell,
Koss–Butcher, and Lachar–Wrobel lists. Because of their established validity characteristics,
virtually all of the items in the Koss–Butcher and Lachar–Wrobel sets were retained
on the Nichols Critical Item List (NCIL). More than any of the alternative critical item
lists, the NCIL permits a specific assessment of both the range and the intensity of
symptomatic expressions, particularly within the health/somatic/neurological area. The
NCIL for the MMPI-2 contains 217 items spread over four major classes and 23 specific
item clusters.
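The idea of reporting the proportion of items endorsed within each content cluster can be illustrated with a short sketch. The cluster names echo the examples given above, but the item numbers, keyed directions, and the function name `cluster_endorsement` are all hypothetical, and the sketch assumes that an unanswered item counts as not endorsed.

```python
def cluster_endorsement(responses, clusters):
    """Return, per cluster, (endorsed count, cluster size, proportion endorsed).

    responses: dict mapping item number -> True/False answer of the examinee.
    clusters:  dict mapping cluster name -> dict of item number -> keyed response.
    An item counts as endorsed only when the answer matches the keyed response;
    unanswered items are treated as not endorsed (an assumption of this sketch).
    """
    result = {}
    for name, keyed in clusters.items():
        endorsed = sum(
            1 for item, key in keyed.items() if responses.get(item) == key
        )
        result[name] = (endorsed, len(keyed), endorsed / len(keyed))
    return result


# Hypothetical clusters and item numbers, for illustration only.
clusters = {
    "motor difficulties": {101: True, 102: True, 103: False, 104: True},
    "genitourinary complaints": {201: True, 202: False},
}
responses = {101: True, 102: False, 103: False, 104: True, 201: False, 202: True}

report = cluster_endorsement(responses, clusters)
for name, (endorsed, size, proportion) in report.items():
    print(f"{name}: {endorsed}/{size} endorsed ({proportion:.0%})")
```

Reporting both the count and the proportion keeps small clusters from looking more or less elevated than they are, since one endorsement in a two-item cluster means something different from one in a ten-item cluster.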
Of the two chief controversies surrounding the use of critical items, one is conceptual,
the other statistical. The conceptual issue is whether inventory items should be
considered behavior samples or behavioral signs. According to Koss (1979), the earliest
inventories viewed item responses as veridical self-reports or samples of behavior
that could stand in lieu of actual interview or observational data, thereby providing a
more efficient basis for clinical description. It was in part in reaction to this view, and
the disappointing performance of previous inventories guided by it, that Hathaway
chose to adopt an empirical approach to the composition of his scales. Abandoning
the assignment of items to scales on the basis of judgments a priori, Hathaway left
between-groups differences in endorsement frequency to identify each item as a sign of
the criterion group, its significance to be determined by further investigation. Meehl's
(1945b) enunciation of the empirical rationale emphasized the range of understandings
various people might bring to test items and stressed the fact that a given statement was
endorsed over the content of the statement itself. In his words, the empirical approach
"consists simply in the explicit denial that we accept a self-rating as a feeble surrogate
for a behavior sample, and substitutes the assertion that a self-rating constitutes an
intrinsically interesting and significant bit of verbal behavior, the non-test correlates
of which must be discovered by empirical means"
(Meehl, 1945a, p. 297).
According to Meehl, the importance of a structured inventory response is not so
much in its intrinsic semantic characteristics, its face value, but the fact that certain
kinds of people tend to say certain things about themselves (p. 298).
An unqualified allegiance to the sign approach would obviate the need for selection in
gathering candidate items for the pool; any items would do. In their decision to formulate
items out of their clinical experience, from psychiatric textbooks and interview forms and
from previous personality and attitude scales, Hathaway and McKinley implicitly offered
a bow to the traditional view. Their departure from it came only after the item pool was
established, with the commencement of scale construction. As a result, most of the basic
clinical scales of the MMPI are made up of a mix of items, some seeming to function
more as samples, others apparently operating as signs. Dahlstrom (1969) called attention
to this range in the way items may be understood, and his distinctions tend to parallel
the distinction between obvious and subtle item content. That is, it is the obvious items
that most readily conform to the behavior sample or self-report conception, whereas
the subtle items, to the extent that they are valid, better fit into a conception of items as
behavioral signs.
The statistical issue concerns the reliability of individual items and aggregates
of items such as scales. If test–retest stability is the issue here, the MMPI-2 Manual
(Butcher et al., 2001, Appendixes E and G) makes it clear that over short time periods,
at least, the temporal stability of the majority of individual items is at least as good as
that of the most commonly used scales of the MMPI-2. Nevertheless, the endorsement
of individual items may occur as a result of accident, confusion, misunderstanding,
or similar inadvertence. The detailed probing of critical items, even when logistically
feasible, does not always correct such mishaps or yield satisfactory information. Some
respondents will deny valid endorsements by claiming disability or faux pas; others may
feel excessively pressed and intruded upon by close questioning of item responses. For
these reasons, it is recommended that the psychologist begin any probing with the least
threatening critical items and only then proceed to the more threatening content.
Proponents of the emphasis on item content (e.g. Butcher et al., 1990; Wiggins,
1966) have often focused on the internal consistency, typically measured by
Cronbach's (1951) coefficient alpha, of item aggregates (i.e. scales) as an index of
the adequacy of such aggregates for psychological measurement. Coefficient alpha
operates as a measure of the homogeneity of item aggregates, or the degree to which
the content of one item of an aggregate is similar in content to the other items included
within it. Scales that have been developed using procedures designed to maximize
internal consistency tend to bear a striking resemblance to aggregates derived from
the factor analysis of test items (e.g. Friedman, Sasek, & Wakefield, 1976; Johnson,
Butcher, Null, & Johnson, 1984; Waller, 1999). In comparing scales, coefficient alpha
will serve as an index of the degree of semantic spread that is observed as new items
are added to a preexisting set. For example, starting with an item such as "My dad
is a good fellow," the addition of a second item, "I love my dad," results in a very
small enlargement of the semantic focus of the first item and a correspondingly slight
decline in internal consistency. If to these two items one adds a third, "My mom is a
good mother," the semantic focus is enlarged from positive sentiments about the father
to similar sentiments toward parents. This will result in a further, and probably larger,
increment of decline in internal consistency. The addition of a fourth item, "I enjoy
kids," broadens the focus yet again from parents to parents and children or perhaps
even to people in general and may be associated with a still larger drop in internal
consistency. A fifth addition, "I like romantic movies," would suggest a very sharp
broadening of semantic focus to something like liking things and people and would
be associated with a correspondingly sharp decline in coefficient alpha. Adding a sixth
item, "I hate all my relatives," might either augment both the focus of the aggregate
and the value for internal consistency if it were scored False or essentially destroy its
semantic coherence and internal consistency if scored True. The main point is that
internal consistency reflects the degree of semantic redundancy within a given set of
items and, by implication, the personological redundancy (or strength, or robustness)
of the attribute in the individual who achieves a high endorsement rate for the items
in the set. However, coefficient alpha is always a declining function of the number
of non-identical items in a set. The point at which to close a given set to additional
items is therefore at least somewhat arbitrary, and there is always at least some
trade-off between internal consistency and test–retest stability because internal consistency
favors short scales, whereas temporal stability and classification accuracy (Emons,
Sijtsma, & Meijer, 2007; Kruyen, Emons, & Sijtsma, 2012) favor long ones.
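The effect of broadening an item set on coefficient alpha can be tried out numerically. The following is a minimal sketch using the standard formula for alpha; the response data are made up for illustration, the function name `cronbach_alpha` is hypothetical, and population variances are used throughout (the choice between population and sample variance does not change the result as long as it is applied consistently).

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of numeric scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    where k is the number of items.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up True (1) / False (0) answers from six respondents to three
# closely related items.
homogeneous = [
    (1, 1, 1),
    (1, 1, 1),
    (1, 1, 0),
    (0, 0, 0),
    (0, 0, 0),
    (0, 1, 0),
]
# A fourth item that is only weakly related to the first three.
weakly_related_item = [1, 0, 1, 0, 1, 0]
broadened = [row + (extra,) for row, extra in zip(homogeneous, weakly_related_item)]

alpha_before = cronbach_alpha(homogeneous)
alpha_after = cronbach_alpha(broadened)
print(f"three related items: alpha = {alpha_before:.3f}")
print(f"after broadening:    alpha = {alpha_after:.3f}")
```

With these toy data, alpha falls from about 0.84 to about 0.70 when the weakly related item is appended, which mirrors the drop in internal consistency described above for items that widen the semantic focus of a set.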
The internal consistency approach to scale development contrasts with the empirical
or criterion keying approach, in which items are selected for inclusion on a given scale
because they are associated with higher rates of endorsement by members of the criterion
group than by groups such as normals. The values of coefficient alpha for empirically
derived scales may vary considerably from one scale to the next but are generally modest
in comparison with content-based scales. Each of the items included in an empirical
scale is assumed to provide an increment of non-redundant criterion (e.g. diagnosis in
the case of most of the basic MMPI scales) related variance that will make a unique
contribution to the identification of cases similar to those comprising the criterion
group. Although it is true that such scales may possess considerable common variance
and yield correspondingly high estimates of internal consistency (e.g. Scales 1, 7, and 8),
such variance is not a goal of the method but an artifact of major sources of variation
operating within the item pool at large. In some instances, large common variances may
even attenuate the validity of empirically derived scales by reducing their specificity,
Scale 7, saturated as it is with the first factor, being an example.
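The criterion keying approach described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not a reconstruction of the procedure used for the MMPI: the item labels and response data are invented, the function names are hypothetical, and a fixed difference in endorsement rates (`min_difference`) stands in for whatever statistical test an actual study would apply.

```python
def endorsement_rate(group, item):
    """Proportion of a group answering True to an item."""
    return sum(1 for person in group if person[item]) / len(group)


def select_items_by_contrast(criterion, normals, items, min_difference=0.2):
    """Keep items whose True-rate differs between the two groups.

    Returns a dict mapping each retained item to its keyed response:
    True if the criterion group endorses it more often, False otherwise.
    Item content plays no role in the selection.
    """
    keyed = {}
    for item in items:
        diff = endorsement_rate(criterion, item) - endorsement_rate(normals, item)
        if abs(diff) >= min_difference:
            keyed[item] = diff > 0
    return keyed


# Invented responses: each person is a dict of item label -> True/False.
criterion_group = [
    {"a": True, "b": False, "c": True},
    {"a": True, "b": False, "c": False},
    {"a": True, "b": False, "c": True},
    {"a": False, "b": True, "c": False},
]
normal_group = [
    {"a": False, "b": True, "c": True},
    {"a": False, "b": True, "c": True},
    {"a": True, "b": True, "c": False},
    {"a": False, "b": True, "c": False},
]

scale_key = select_items_by_contrast(criterion_group, normal_group, ["a", "b", "c"])
print(scale_key)
```

Here item "a" is retained and keyed True, item "b" is retained and keyed False, and item "c" is dropped because both groups endorse it at the same rate. Nothing in the selection rewards items for correlating with one another, which is why internal consistency is a by-product of this method and not a goal.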
In this context, the NCIL might be seen as a compromise between the critical
item approach and the content scale approach to the analysis of content. By forming
critical items into relatively many but small sets, or miniscales, the NCIL may be less
vulnerable to interpretive hypotheses based on unintentional endorsements and yield
a more pointed picture of a person's symptoms and complaints than is possible when
pathological responses to a small cluster of items become obscured, given that such
clusters may become hidden within their parent scales.
Another set of small, highly homogeneous item clusters has been proposed as
subscales for the MMPI-2 content scales (Ben-Porath & Sherwood, 1993; Sherwood &
Ben-Porath, 1991), discussed below.
It is primarily in the context of other test data that sets of critical items, or any highly
redundant cluster of items, have potential value to the clinician. Individual critical items
and small sets thereof offer the clinician access to very highly focused components of the
participant's self-report that may be obscured or lost in aggregates that approach the size
of conventional scales, or even scales of high internal consistency, such as content scales.
However, in the absence of supporting indicators in other features of the test, such as
scale scores, profile codes, and the like, critical items are likely to over-predict the kinds
of symptoms, problems, and concerns suggested by their content. It is significant that
with the exception of the now rarely used Grayson items, all critical item lists subdivide
their items into categories determined by content, thereby deemphasizing Grayson's
initial conception of these items as stand-alone or stop items, in favor of a conception
that places critical items in a position intermediate between stop items and formally
developed and normed scales.
Whatever their source, critical items afford the clinician a valuable if not always
reliable channel of communication with the patient. Single and small sets of items are
the means by which the patient can most directly address his or her concerns to the
psychologist within the context of the MMPI-2. Although the various scales and indexes
of the test serve to identify those problem areas that are of significance in the patient's
current life and circumstances, it is only through single-item responses that the patient
can call the clinician's attention to his or her specific problems. The clinician's access
to some of these responses, in the form of critical items, can help to create a channel
of communication that stands to facilitate empathy between therapist and patient and
build a bridge between the phases of assessment and treatment.
Content Scales
As highly homogeneous collections of items with similar content, the MMPI-2 content
scales also provide a means by which the patient can communicate with the clinician.
Because the most immediate access to the symptomatic behavior and concerns of
the patient is through scales having a strong thematic character, content scales are
designed in a way that allows them to respond directly to aspects of the examinee's
self-presentation on the MMPI-2. As Wiggins, Goldberg, and Applebaum (1971) noted, "the
view that the MMPI constitutes an opportunity for communication between S [subject]
and the tester has much to commend it; not the least of which is the likelihood that this
is the frame of reference adopted by the S himself" (p. 403).
