EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT
A QUARTERLY JOURNAL DEVOTED TO THE DEVELOPMENT AND
APPLICATION OF MEASURES OF INDIVIDUAL DIFFERENCES
A Technique for the Study of Problem Solving
H. J. A. Rimoldi
Reprinted from VOLUME FIFTEEN, NUMBER FOUR, WINTER, 1955
A TECHNIQUE FOR THE STUDY OF PROBLEM SOLVING¹
H. J. A. RIMOLDI
The University of Chicago²
Introduction
In the present article a technique for the study of problem
solving will be described. The technique explores the number,
type and sequence of questions asked by a subject in solving a
problem, the main purpose being to analyze the process of
thinking rather than its end product as indicated by a certain
answer. Differentiation between the “cognitive” and “non-cognitive”
elements involved in the process will not be attempted.
One of the criticisms leveled against traditional mental
testing procedures is that they give no direct indication of
how a subject thinks. Tests are usually scored in terms of
answers to a series of problems. It is commonly assumed that
for each problem there is only one right answer and that the
other possible answers are all equally or nearly equally wrong.
However, it can be demonstrated that the same answer
may be the final outcome of different mental processes. A
process closely related to the one used to get a correct answer
may lead to an incorrect answer; and a process usually asso-
ciated with an incorrect answer may lead to a correct one.
Therefore, the scores obtained by using the traditional type of
tests are of limited usefulness in appraising thinking processes
in themselves, or in disclosing the information needed to solve
a problem; and they do not indicate how this information is
used and evaluated by the subject. Teachers would probably
agree that knowledge of these aspects of a subject’s behavior
should improve selection and prediction procedures as well as
teaching techniques.
¹ The author is very thankful for the cooperation given to him by Dr. L. Dragstedt
and his staff at Billings Hospital, University of Chicago, and to the doctors who took
the test at Princeton Hospital, Princeton, New Jersey. Copyright 1955, by H. J. A.
Rimoldi.
² Since September first, 1955, at Loyola University, Chicago, Ill.
Although there has been much experimental work in the
field of problem solving, it has proven difficult to quantify the
results and to standardize the procedures. Among recent
investigations of the subject, the study of Bloom and Broder (1)
stands foremost as an effort to analyze mental processes rather
than mental products. Their description of the most important
categories constitutes a successful attempt to indicate which
are the significant aspects to be considered in studying prob-
lem solving.
Glaser and associates (4) have used a “tab item” technique,
in some ways similar to the one described here, to study
“trouble shooting” in electronics. The “tab item” technique,
however, involves a different purpose and format and somewhat
different procedures in presenting the problem and in scoring
the test.
Description of the Test
The examinee is requested to solve a given problem by
asking questions that he judges necessary for its solution. The
questions that the subject might wish to ask are written on
several cards—one card per question—and the corresponding
answers are given on the back of each card. The examiner
records the questions asked and in what order, obtaining a
sequence which indicates the successive steps followed in the
solution of the problem.
The presentation of the problem can be standardized by
previously deciding upon the number and type of questions
that might be asked by the subjects and upon the nature of the
information provided. Together with relevant questions, others
that are not pertinent to the problem should be presented.
Subjects are instructed to select those questions that they
think will lead most directly to a solution of the problem. The
test ends when the examinee reaches a solution or no longer
wishes to ask questions.
The test can be illustrated by describing a medical problem
prepared by the author and briefly described in an earlier
publication by J. T. Cowles (2). The subject receives, for a specific
clinical case, the type of information usually available to the
physician from the hospital admission chart, from the patient's
complaints, and from other aspects of his clinical history.
Removable cards contained in flat pockets which partially
overlap are evenly arranged on a display folder. On the top
edge of the numbered cards—we shall call them items—the
questions that the examinee may ask are indicated. These
include questions that he might wish to ask of a patient, the
manipulative techniques he might wish to use, the diagnostic
tests he might order, and so forth. By drawing a card and
looking at the reverse side the subject gets information that is
given in the form of verbal reports, laboratory analysis, X-ray
films, etc. For instance, for a question like: Have you been
feverish? the answer might be: Yes, yesterday afternoon had a
temperature of 38°C., and for a question like: Chest X-rays,
the answer might be: Both lung fields normal. Marked
calcification in the arch of the aorta, and so forth.
Items referring to similar procedures utilized by the doctor
can be grouped together, for instance those related to lab-
oratory procedures, auscultation, endoscopy, and so forth.
The experimenter or the subject writes the number of each
item as soon as it is chosen or, if the cards are perforated,
inserts them face down on a pin in the same order in which
they are selected. By inspecting the pile of cards the experi-
menter knows the order in which they were requested.
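The recording procedure amounts to keeping, for each examinee, an ordered list of item numbers. The sketch below is a minimal Python illustration; the `Session` class and its method names are hypothetical and not part of the original apparatus. It assumes that a card already on the pin is not recorded a second time when the subject re-reads it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    """One examinee's run through the card folder (a hypothetical record format)."""

    subject: str
    sequence: list[int] = field(default_factory=list)  # item numbers, in draw order

    def draw(self, item: int) -> None:
        # A drawn card stays on the pin, so re-reading it does not add a new step.
        if item not in self.sequence:
            self.sequence.append(item)

    def position(self, item: int) -> int | None:
        """1-based position at which the item was drawn, or None if it never was."""
        if item in self.sequence:
            return self.sequence.index(item) + 1
        return None


session = Session("subject-01")
for card in [12, 3, 27, 3, 41]:
    session.draw(card)
print(session.sequence)  # [12, 3, 27, 41]
print(session.position(27), session.position(5))  # 3 None
```

The ordered list is all that the scores described below require.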
Study of the Items
Three properties of the items will be defined: utility index,
median value, and dispersion.
Utility Index of Each Item
The selection of a given card may be interpreted as indica-
tive of the information that the subject is trying to obtain at a
given moment during the examination. The answers to some
of the questions asked are more rewarding, information-wise,
than others, and it is possible to assume that the frequency
with which specific questions are selected by the members of
the group is an index of their usefulness in terms of the
information that they are expected to provide for the solution of
the problem. Thus, if a given question is chosen by all the
examinees, it may be assumed that it is judged very useful—
indeed crucial—by the group.
The utility index for a given card will be defined as the ratio
between the number of times that it has been selected and the
total number of subjects in the sample. This value can vary
from 1.00 to .00.
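As a minimal sketch, assuming each subject's record is a list of item numbers in draw order, the utility index of an item is the count of subjects who drew it divided by the number of subjects. The function name `utility_indexes` and the sample draws are illustrative only.

```python
from collections import Counter


def utility_indexes(sequences, items):
    """Utility index per item: share of subjects who selected it, from .00 to 1.00."""
    n_subjects = len(sequences)
    # set(seq) so that each subject counts an item at most once
    counts = Counter(item for seq in sequences for item in set(seq))
    return {item: counts[item] / n_subjects for item in items}


# Hypothetical draws by three subjects from a four-card folder
u = utility_indexes([[1, 2, 3], [2, 3], [2]], items=[1, 2, 3, 4])
print(u[2], u[4])  # 1.0 0.0
```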
Besides defining one experimental property of the items,
the utility index may be used to develop other types of
scores that will be described in the following paragraphs.
Median Value of Each Item
This value indicates during which part of the examination a
given card was selected. Let us assume that we divide the
total examination into four parts. Then it is possible to rescore
the items in terms of the part of the examination in which they
were chosen. For instance those cards selected in the first part
could be given an arbitrary score of one, those in the second
part, a score of two, and so forth. These scores will be called
ordinal scores. For a given sample it is possible to describe
for each item the frequency distribution of its ordinal scores.
The median of these values is taken as an indication of when,
during the examination, a card is most likely to be chosen.
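The following sketch computes ordinal scores and median values. The text does not specify how the examination is divided into parts; this sketch assumes that each subject's own sequence is split by position into four segments of nearly equal length, so the k-th card of a subject who drew n cards falls in part ⌊4(k-1)/n⌋ + 1. Function names are hypothetical.

```python
import statistics
from collections import defaultdict


def ordinal_scores(sequences, parts=4):
    """For each item, the part (1..parts) of each subject's sequence in which it was drawn.

    Assumption: parts split each subject's own sequence by position; the text does
    not say whether parts are defined by elapsed time or by cards drawn.
    """
    scores = defaultdict(list)
    for seq in sequences:
        n = len(seq)
        for pos, item in enumerate(seq):  # pos is 0-based
            scores[item].append(pos * parts // n + 1)
    return dict(scores)


def median_values(sequences, parts=4):
    """Median ordinal score of each item that at least one subject drew."""
    return {item: statistics.median(s) for item, s in ordinal_scores(sequences, parts).items()}


meds = median_values([[5, 1, 2, 3], [1, 5, 3, 2]])
print(meds)  # {5: 1.5, 1: 1.5, 2: 3.5, 3: 3.5}
```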
Dispersion of the Items
From the frequency distribution of the ordinal scores cor-
responding to each item its dispersion can be calculated by
using the interquartile range. It is expected that some cards
will have a smaller dispersion than others. For instance, certain
questions will be asked always or nearly always at the beginning
of the examination, others mostly at the end, others anywhere
during the examination.
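Given the ordinal scores of each item, the dispersion can be sketched with Python's `statistics.quantiles`. The choice of the `"inclusive"` quantile method is an assumption; the text does not say how the quartiles were interpolated, and other conventions give slightly different values for small samples.

```python
import statistics


def interquartile_range(scores):
    """Q3 - Q1 of one item's ordinal scores (needs at least two scores)."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1


def dispersions(ordinal_scores_by_item):
    """IQR per item; items drawn by fewer than two subjects are skipped."""
    return {
        item: interquartile_range(s)
        for item, s in ordinal_scores_by_item.items()
        if len(s) >= 2
    }


# Hypothetical ordinal scores (1 = first quarter ... 4 = last quarter)
print(dispersions({7: [1, 1, 1, 2], 9: [1, 2, 3, 4], 11: [3]}))  # {7: 0.25, 9: 1.5}
```

Item 7, asked almost always early, shows a small dispersion; item 9, asked at any point, shows a larger one.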
Possible Methods of Scoring the Test
Agreement Score
It is possible to describe the average order in which the
items are selected by a criterion group. This order will be
defined as the “optimal sequence”. The criterion group is formed by
a sample of expert scientists recognized by their colleagues as
of outstanding ability. The coefficient of concordance can be
taken as an indication of the agreement between members of
the criterion group (6).
Another sequence can also be described by discussing with
the group of experts the best order in which the questions
should be asked.
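One way to make the optimal sequence and the coefficient of concordance concrete is sketched below. It assumes that, for each expert, the items he did not draw share the average of the remaining ranks (the equal-rank rule the text applies to subjects), orders items by their rank sums across experts, and applies Kendall's W with the standard correction for tied ranks. The tie handling and function names are assumptions, not details given in the text.

```python
def full_ranks(sequence, items):
    """Rank every item for one judge: drawn items get 1..k in draw order, and all
    undrawn items share the average of the remaining ranks k+1..n."""
    n, k = len(items), len(sequence)
    ranks = {item: (k + 1 + n) / 2 for item in items}
    for pos, item in enumerate(sequence, start=1):
        ranks[item] = pos
    return ranks


def optimal_sequence_and_w(sequences, items):
    """Average-rank order of the criterion group plus Kendall's W (tie-corrected).

    `items` must list every item in the folder, drawn or not.
    """
    m, n = len(sequences), len(items)
    rankings = [full_ranks(seq, items) for seq in sequences]
    rank_sums = {i: sum(r[i] for r in rankings) for i in items}
    # Lowest rank sum = drawn earliest on average; ties broken by item number
    optimal = sorted(items, key=lambda i: (rank_sums[i], i))
    mean_sum = m * (n + 1) / 2
    s = sum((rank_sums[i] - mean_sum) ** 2 for i in items)
    # Each judge contributes one tie group: the t = n - k undrawn items
    ties = sum((n - len(seq)) ** 3 - (n - len(seq)) for seq in sequences)
    w = 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
    return optimal, w


# Hypothetical criterion group of two experts, four items
print(optimal_sequence_and_w([[2, 1], [2, 1]], items=[1, 2, 3, 4]))  # ([2, 1, 3, 4], 1.0)
```

W is 1.0 when the experts order the items identically and falls toward 0 as their orders diverge.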
The agreement between a subject’s sequence and the opti-
mal sequence is estimated using the tau coefficient (6). The
higher this value the better the agreement. Other techniques
to indicate degree of agreement can be used, as for example,
those applied in profile analysis (3).
A further point to be considered is the following: let us
assume that the problem has a total of 50 items and the subject
selects only 15 items to reach a solution. How shall we deal
with the remaining items? The procedure thus far used has
been to assign to all the remaining items the same rank under
the assumption that all of them have the same chances of be-
ing requested next in order.
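The agreement score under this equal-rank rule can be sketched as follows. Because the undrawn items are tied, the sketch uses Kendall's tau-b, the tie-adjusted form of the tau coefficient; the text does not state which variant was used. It also assumes that the optimal sequence ranks every item, and the function names are hypothetical.

```python
import math
from itertools import combinations


def full_ranks(sequence, items):
    """Drawn items ranked 1..k in draw order; undrawn items share the mean of k+1..n."""
    n, k = len(items), len(sequence)
    ranks = {item: (k + 1 + n) / 2 for item in items}
    for pos, item in enumerate(sequence, start=1):
        ranks[item] = pos
    return ranks


def tau_b(x, y):
    """Kendall's tau-b between two rank lists of equal length (tie-adjusted)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        ties_x += int(dx == 0)
        ties_y += int(dy == 0)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    # Undefined (division by zero) if either list is entirely tied
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


def agreement_score(subject_seq, optimal_seq, items):
    s, o = full_ranks(subject_seq, items), full_ranks(optimal_seq, items)
    return tau_b([s[i] for i in items], [o[i] for i in items])


# Subject draws only items 1 and 2 of a four-item problem
print(round(agreement_score([1, 2], [1, 2, 3, 4], items=[1, 2, 3, 4]), 4))  # 0.9129
```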
Utility Score
It seems reasonable to think that the best subjects will
tend to select those items that have a higher utility index. For
a given subject his utility score is the sum of the utility in-
dexes of the cards he chose divided by the total number of cards
selected. The utility score can be determined by using utility
indexes as determined in the whole sample of subjects or by
using the utility indexes obtained in the criterion group. This
value may vary between 1.00 and .00, although it will
seldom reach these extreme values.
The results given in this study are based on the utility in-
dexes as calculated for the criterion group.
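A minimal sketch of the utility score, assuming the utility indexes are supplied as a mapping from item number to index (here, hypothetical criterion-group values) and that each card counts once even if re-read:

```python
def utility_score(subject_seq, utility_index):
    """Mean utility index of the distinct cards a subject drew.

    utility_index maps item -> index, here assumed to come from the criterion group;
    items missing from the map are treated as having index .00.
    """
    drawn = set(subject_seq)
    return sum(utility_index.get(item, 0.0) for item in drawn) / len(drawn)


criterion_u = {1: 1.0, 2: 0.5, 3: 0.0}  # hypothetical criterion-group indexes
print(utility_score([1, 2], criterion_u))  # 0.75
print(utility_score([3], criterion_u))     # 0.0
```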
Score Based on Number of Items
The number of cards selected by a subject is another kind
of score. This score might be interpreted in connection with
the agreement and the utility scores, and will be discussed at
greater length in the following sections.
Other indexes can be developed comparing the average num-
ber of items used by the criterion group with the number of
items used by the subjects. It is possible to establish a lower
limit below which a subject’s solution can be interpreted as
guessing; for instance, a predetermined value below the
number of cards corresponding to the optimal sequence.
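The two ideas in the paragraph above can be sketched together. Both the ratio to the criterion group's mean number of cards and the guessing threshold are illustrative; in particular, the hypothetical `margin` parameter stands for the “predetermined value”, for which the text gives no number.

```python
from statistics import mean


def number_of_items_indexes(n_drawn, criterion_counts, optimal_length, margin):
    """Two illustrative indexes built on the number of cards drawn.

    ratio:    cards drawn relative to the criterion group's mean number of cards.
    guessing: True when the subject drew fewer cards than the optimal sequence
              minus a predetermined margin (margin is a hypothetical parameter).
    """
    ratio = n_drawn / mean(criterion_counts)
    guessing = n_drawn < optimal_length - margin
    return ratio, guessing


print(number_of_items_indexes(10, [20, 30], optimal_length=25, margin=5))  # (0.4, True)
print(number_of_items_indexes(22, [20, 30], optimal_length=25, margin=5))
```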
Score Based on the Solution of the Problem
Since the members of the criterion group may give different
solutions to the same problem, a system of differential weights
based on the frequency of these solutions can be developed.
The degree of ambiguity of a problem can be estimated in
terms of the frequency distribution of the different solutions.
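A possible sketch of solution weights and problem ambiguity follows. Weighting each solution by its relative frequency among the criterion group follows the text; summarizing the frequency distribution by its Shannon entropy is a choice made here for illustration, and the solution labels are hypothetical.

```python
import math
from collections import Counter


def solution_weights(criterion_solutions):
    """Weight of each solution = its relative frequency in the criterion group."""
    counts = Counter(criterion_solutions)
    total = len(criterion_solutions)
    return {solution: c / total for solution, c in counts.items()}


def ambiguity(criterion_solutions):
    """Shannon entropy (bits) of the solution distribution; 0 when all experts agree.

    Entropy is a stand-in chosen here; the text only refers to the frequency distribution.
    """
    return -sum(w * math.log2(w) for w in solution_weights(criterion_solutions).values())


experts = ["A", "A", "A", "B"]  # hypothetical solution labels from four experts
weights = solution_weights(experts)
print(weights)                # {'A': 0.75, 'B': 0.25}
print(weights.get("C", 0.0))  # a solution no expert gave scores 0.0
print(round(ambiguity(experts), 3))
```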
Qualitative Analysis
The qualitative analysis of the sequences makes it possible
to infer the crucial information required to solve a problem,
which hypotheses are made by the subject, and how he pro-
ceeds in order to verify them. This information is obtained by
analyzing the spontaneous verbalization of the subjects, by
asking certain questions, or by studying which items are
selected, and when. Some individuals follow a systematic
approach; their “path” is, as it were, straight and direct, and
they select the crucial items in a logical and clear fashion.
Others, on the contrary, change from one hypothesis to another,
following what could be characterized as a devious approach.
Furthermore, some subjects examine again and again the
cards that have been drawn, while others may never do so.
This qualitative analysis may be used as a complement to the
other scores. It is expected that some of the qualitative find-
ings may eventually be quantified.
Application of the Technique to a Medical Situation
The technique has been applied to a group of 38 medical
doctors, including clinicians, surgeons, and advanced medical
students.³ The results obtained from this small group are used
for illustrative purposes only. The test consists of 5 items in which
information is given about a real patient suffering from a cyst
of the right fallopian tube.
³ The author is very thankful to Dr. John T. Cowles for securing some of the data
used in this study.
Three different criterion groups were selected including
three, four, and six doctors, respectively. The optimal sequences
for these three groups were established and their coefficients
of concordance are given in Table 1.⁴ Since the results obtained
using these three groups are fundamentally the same we shall
limit our discussion to the group formed by six experts.
In Table 2 the degree of agreement between each doctor's
sequence and the criterion sequence is expressed in terms of
tau coefficients. The number of cards drawn by each doctor
and the corresponding utility scores are also presented in
Table 2.
Some of the doctors reached the diagnosis that would have
been obtained had the patient been subjected to an exploratory
operation. Most of the doctors who did not reach this diagno-
sis gave one which, on the basis of the non-surgical evidence
presented, was clinically correct.
Discussion of the Findings and of the Technique
The scoring methods described in this article are tentative,
and special studies are now in progress to determine both the
most appropriate methods and the fields in which the
technique can be most successfully applied. In interpreting the
following results the reader should be warned against prema-
ture generalizations in view of the small size of the experimen-
tal sample.
No consistent trend was discovered by plotting for each item
its utility index against its median value; therefore, high
utility indexes may be expected of items selected at any
moment during the examination. Nevertheless, it seems logical to
think that the dispersion of the ordinal scores of a given item
will be smaller when the utility index of the item is greater. It seems
likely that, though the median values corresponding to two items
may be the same, their dispersions may be different. In plotting
for each item its utility index against its dispersion, a negative
trend was discovered; i.e., high utility indexes tend to show
small dispersion, and vice versa. This finding is of interest, and
seems to indicate that for the solution of a problem the timing
of the information requested is important.
⁴ Table 1 and Table 2 have been deposited as Document number 4689 with the
ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress,
Washington 25, D. C. A copy may be secured by citing the Document number and
by remitting $1.25 for photoprints, or $1.25 for 35 mm. microfilm. Advance payment
is required. Make checks or money orders payable to: Chief, Photoduplication Service,
Library of Congress.
For the purpose of determining the agreement score, the
non-selected items were assumed to have equal rank. This
assumption on our part was corroborated by the comments
of the subjects. If, after reaching a solution, subjects were
invited to draw more items, a common response was that,
since none of the remaining cards added pertinent information,
the selection of any one or other of them would be equally
irrelevant.
Figure 1 indicates the relationship between agreement and
utility scores. In general, subjects will secure the greater
amount of information (as estimated by the utility score) by
following the best sequence (as estimated by the agreement
score). Nevertheless, some subjects, as indicated by some of
the points in Figure 1, would tend to maximize the amount of
information at the expense of decreasing their agreement score,
others would tend to maximize the agreement score at the
expense of the utility score, and still others would tend to balance
both of them. If this hypothesis is further verified, it might be
possible to postulate different types of performance in problem
solving situations.
[Figure 1. Agreement score plotted against utility score.]
There is a negative relationship between the number of
cards selected and the utility score (Figure 2). This is partly
explained by the definition of the utility score: since it is an
average, drawing more cards usually means adding items of lower
utility index. Nevertheless, it is conceivable that a subject may
select a few cards with low utility index and consequently show a
lower utility score. This might be the explanation for some of the
points in the upper left-hand quadrant in Figure 2.
[Figure 2. Utility score plotted against number of cards.]
[Figure 3. Agreement score plotted against number of cards.]
The relationship between agreement score and the number
of cards seems to be curvilinear, as shown in Figure 3. There
seems to be an intermediate number of cards above and below
which the agreement score decreases. Thus, for a given problem,
it should be possible to establish lower and upper limits on the
number of cards outside of which the agreement score will be
smaller than certain given values.
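To locate such limits, one option, assumed here rather than taken from the text, is to fit a quadratic (inverted-U) curve of agreement score on number of cards by least squares and solve for the card counts at which the fitted curve crosses a chosen threshold. The data below are synthetic, generated from a known curve so the fit can be checked; they are not the study's results, and the function names are hypothetical.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit y = a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # moments of y
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]  # a, b, c


def card_limits(a, b, c, threshold):
    """Card counts between which the fitted concave curve stays at or above threshold."""
    if c >= 0:
        raise ValueError("expected a concave (inverted-U) fit")
    disc = b * b - 4 * c * (a - threshold)
    if disc < 0:
        return None  # the fitted curve never reaches the threshold
    r1 = (-b + math.sqrt(disc)) / (2 * c)
    r2 = (-b - math.sqrt(disc)) / (2 * c)
    return min(r1, r2), max(r1, r2)


xs = list(range(5, 55, 5))  # hypothetical card counts 5, 10, ..., 50
ys = [0.175 + 0.05 * x - 0.001 * x * x for x in xs]  # synthetic inverted-U scores
a, b, c = fit_quadratic(xs, ys)
print(round(-b / (2 * c), 2))                # peak of the fitted curve: 25.0
print(card_limits(a, b, c, threshold=0.70))  # approximately (15.0, 35.0)
```

A quadratic is only the simplest curve with a single peak; with real data another smooth curve may fit better.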
The technique is highly flexible and it can be used in a
variety of fields. Problems can be prepared, for example, in
medicine, chemistry, law, physiology, and so forth. Situations
that do not require specialized knowledge also can be devised.
The difficulty of the tests may be varied within a wide range.
No information is yet available as to the discriminative power
of the technique. Research is being planned to study this
problem.
At the present time the technique is being used
experimentally in the medical and chemical fields. A “logical game”
based on the principles described in this article has been
prepared by E. R. John in collaboration with the author (5). This
electronic machine has been built for the purpose of studying
problem solving in a situation that is less specialty-bound. A
description of this test will be available in the future.
The technique may be applied to study individual differ-
ences in problem solving. These differences may be good indi-
cators of personality characteristics. By means of appropriate
experimental design it should be possible to obtain evidence
concerning the organization of behavior in subjects showing
various types of personality dynamics. The technique also
seems to lend itself to the study of correlations between persons
(7, 8). Work along these lines has already been planned.
Summary
Usual psychological techniques are more concerned with the
study of mental products than with the study of the
processes leading to the solution of a problem. The technique
described in this article makes it possible to study these proc-
esses by focusing on the choice of information made by a
subject when he attempts to solve a problem.
Several scoring methods are described, and some experimen-
tal results are presented.
The technique can be applied in a variety of fields and
could be used as a complement to the study of correlations be-
tween persons.
The technique is now being used in different fields for the
purpose of determining the best scoring methods, and for
determining its usefulness in education and in the selection of
talent.
REFERENCES
1. Bloom, B. S., and Broder, L. G. Problem Solving Processes of College
Students. Chicago: The University of Chicago Press, 1950.
2. Cowles, J. T. “Current Trends in Examination Procedures.” The
Journal of the American Medical Association, CLV (1954), 1383-1387.
3. Cronbach, L. J., and Gleser, G. C. “Assessing Similarity between
Profiles.” Psychological Bulletin, L (1953), 456-473.
4. Glaser, R., Damrin, D. E., and Gardner, F. M. The Tab Item
Technique for the Measurement of Proficiency in Diagnostic
Problem Solving Tasks. Champaign: University of Illinois,
College of Education, Bureau of Research and Service, 1952.
5. John, E. R., and Rimoldi, H. J. A. “Sequential Observation of
Complex Reasoning.” The American Psychologist, X (1955),
470 (abstract).
6. Kendall, M. G. Rank Correlation Methods. London: Charles
Griffin and Company, 1948.
7. Stephenson, W. “The Foundations of Psychometry: Four Factor
Systems.” Psychometrika, I (1936), 195-209.
8. Stephenson, W. The Study of Behavior: Q-Technique and Its
Methodology. Chicago: The University of Chicago Press, 1953.