
ACTING AND REFLECTING

SYNTHESE LIBRARY

STUDIES IN EPISTEMOLOGY,

LOGIC, METHODOLOGY, AND PHILOSOPHY OF SCIENCE

Managing Editor:

JAAKKO HINTIKKA, Florida State University, Tallahassee

Editors:

DONALD DAVIDSON, University of California, Berkeley


GABRIEL NUCHELMANS, University of Leyden
WESLEY C. SALMON, University of Pittsburgh

VOLUME 211
ACTING
AND REFLECTING
The Interdisciplinary Turn in Philosophy

Edited by

WILFRIED SIEG
Department of Philosophy, Carnegie Mellon University,
Pittsburgh, U.S.A.

KLUWER ACADEMIC PUBLISHERS


DORDRECHT / BOSTON / LONDON
ISBN-13: 978-94-010-7617-3 e-ISBN-13: 978-94-009-2476-5
DOI: 10.1007/978-94-009-2476-5

Published by Kluwer Academic Publishers,
P.O. Box 17, 3300 AA Dordrecht, The Netherlands.

Kluwer Academic Publishers incorporates
the publishing programmes of
D. Reidel, Martinus Nijhoff, Dr W. Junk and MTP Press.

Sold and distributed in the U.S.A. and Canada
by Kluwer Academic Publishers,
101 Philip Drive, Norwell, MA 02061, U.S.A.

In all other countries, sold and distributed
by Kluwer Academic Publishers Group,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved
© 1990 by Kluwer Academic Publishers
Softcover reprint of the hardcover 1st edition 1990
No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
To Ernest Nagel
It would presumably be taken as a sign of extreme naivete, if not callous insensitivity, if one were to ask why all this ardor to reconcile the findings of natural science with the validity of values? ...

The point of the seemingly crass question ... is thus to elicit the radical difference made when the problem of values is seen to be connected with the problem of intelligent action. If the validity of beliefs and judgements about values is dependent upon the consequences of action undertaken in their behalf, if the assumed association of values with knowledge capable of being demonstrated apart from activity, is abandoned, then the problem of the intrinsic relation of science to value is wholly artificial. It is replaced by a group of practical problems: How shall we employ what we know to direct the formation of beliefs about value and how shall we direct our practical behavior so as to test these beliefs and make possible better ones? The question is seen to be just what it has always been empirically: What shall we do to make objects having value more secure in existence? And we approach the answer to the problem with all the advantages given to us by increase of knowledge of the conditions and relations under which doing must proceed.

John Dewey
from John Dewey, The Quest for Certainty, volume 4 of "The
Later Works, 1925-1953", Southern Illinois University Press,
Carbondale and Edwardsville, 1988.

TABLE OF CONTENTS

PREFACE by Wilfried Sieg xi

PART I. PHILOSOPHY? 1

1. Patrick Suppes, PHILOSOPHY AND THE SCIENCES 3

2. Thomas Schwartz, IMPRESSIONS OF PHILOSOPHY 31

3. THE COMPUTATIONAL MODEL OF THE MIND, a panel discussion
with contributions by Dana S. Scott, Gilbert Harman,
John Haugeland, Jay McClelland, and Allen Newell 39

4. Herbert A. Simon, DISCUSSION: PROGRESS IN PHILOSOPHY 57

5. Clark Glymour, PHILOSOPHY AND THE ACADEMY 63

PART II. WORKING 73

1. David Carrier, PALE FIRE SOLVED 75

2. Robin Clark, INCREMENTAL ACQUISITION AND A
PARAMETRIZED MODEL OF GRAMMAR 89

3. Dan Hausman, WHAT ARE GENERAL EQUILIBRIUM
THEORIES? 107

4. Kevin Kelly, EFFECTIVE EPISTEMOLOGY, PSYCHOLOGY,
AND ARTIFICIAL INTELLIGENCE 115
with a rejoinder by Herbert A. Simon,
EPISTEMOLOGY: FORMAL AND EMPIRICAL

5. Jonathan Pressler, THE FLAWS IN SEN'S CASE AGAINST
PARETIAN LIBERTARIANISM 129

6. Teddy Seidenfeld, M.J. Schervish, and J.B. Kadane, DECISIONS
WITHOUT ORDERING 143

7. Wilfried Sieg, REFLECTIONS ON HILBERT'S PROGRAM 171

8. Peter Spirtes, Richard Scheines, and Clark Glymour, THE
TETRAD PROJECT 183

PART III. POSTSCRIPTUM 209

1. Isaac Levi, RATIONALITY UNBOUND 211

PREFACE

In the fall of 1985 Carnegie Mellon University established a Department
of Philosophy. The focus of the department is logic broadly conceived, philos-
ophy of science, in particular of the social sciences, and linguistics. To mark
the inauguration of the department, a daylong celebration was held on April
5, 1986. This celebration consisted of two keynote addresses by Patrick Sup-
pes and Thomas Schwartz, seminars directed by members of the department,
and a panel discussion on the computational model of mind moderated by
Dana S. Scott. The various contributions, in modified and expanded form,
are the core of this collection of essays, and they are, I believe, of more than
parochial interest: they turn attention to substantive and reflective interdis-
ciplinary work.
The collection is divided into three parts. The first part gives perspec-
tives (i) on general features of the interdisciplinary enterprise in philosophy
(by Patrick Suppes, Thomas Schwartz, Herbert A. Simon, and Clark Gly-
mour), and (ii) on a particular topic that invites such interaction, namely
computational models of the mind (with contributions by Gilbert Harman,
John Haugeland, Jay McClelland, and Allen Newell). The second part con-
tains (mostly informal) reports on concrete research done within that enter-
prise; the research topics range from decision theory and the philosophy of
economics through foundational problems in mathematics to issues in aes-
thetics and computational linguistics. The third part is a postscriptum by
Isaac Levi, analyzing directions of (computational) work from his perspective.
The intent of the volume is clearly programmatic: we want to invigorate
and strengthen a tradition-in philosophy-that joins theoretical analysis
and reflection with substantive work in a discipline. How else-but through
such work-to garner the proper material for analysis? Isn't such active
work and a sense of a discipline's history needed to reflect on the direction
or misdirection of particular developments? And isn't, in addition, a criti-
cal philosophical awareness needed to recognize important general problems?
Reflection has to be based on sound analyses not to degenerate into idle spec-
ulation, and its results have to be challenged by genuine problems to test their
adequacy. These questions and remarks apply in particular to philosophy's
interaction with scientific disciplines; there too, we are pushed to interdisci-
plinary work-how else can we thoroughly appreciate that science is not "a
set of technologies" nor "a body of results", but rather "a continuing process
of inquiry whose fruits are the products of a remarkable intellectual method"?
Ernest Nagel, who was teacher and friend to many of us, emphasized the
need to view science in that light not only to uncover the structures of science,
but also for another social end, namely to help overcome "the age-old and
socially costly conflict between the sciences and the humanities". And here
philosophy has a special role; settled by tradition among the humanities, it
is deeply intertwined with the sciences and in particular, with mathematics.
Indeed, with the latter it shares a penchant for pure, shall we say speculative,
thought and the need for working connections to other disciplines: broad
conceptual designs emerge from and have to be measured against multifarious
experience. Nagel admits and emphasizes that science does not exhaust the
modes of experiencing the world. "The primary aim of science is knowledge;
and however precious this fruit of science may be, it clearly is not and cannot
be a substitute for other things which may be equally precious, and which
must be sought for in other ways." But no one who is deeply devoted to the
humanities can ignore the particular dimension of experience to which science
is relevant.
It satisfies that desire [to know] by dissolving as far as it can our romantic illusions
and our provincialisms through the operation of a social process of indefatigable
criticism. It is this critical spirit which is the special glory of modern science.
There are no reasonable alternatives to it for arriving at responsibly supported
conclusions as to where we stand in the scheme of things and what our destinies
are. 1
In the smaller scheme of things that affect our destinies so much more
directly, I want to express my admiration for the vision and courage of the
administration of President Cyert and the faculty at Carnegie Mellon to create
a modern department of philosophy and thus, the occasion. The Inaugural
Celebration was organized by Dan Hausman and Dana Scott; they laid the
groundwork for a most informative and joyful day. As to this volume², I
thank all contributors for the (additional) work of preparing their papers
for publication; my discussions with Tom Schwartz and Teddy Seidenfeld
were important for sharpening its distinctive direction. Finally, my thanks to
Kathryn Black who prepared the manuscript with unstinting care (in LaTeX)
and sound advice in matters of style.

Wilfried Sieg
Pittsburgh, July 1, 1989

1 All quotations are from "Modern Science in Philosophical Perspective", an article pub-
lished in 1959 and reprinted in Nagel's collection of essays Teleology Revisited and
Other Essays in the Philosophy and History of Science, Columbia University Press, New
York, 1979, pp. 7-28.
2The preparation of the volume was, in part, supported by a grant from the Buhl
Foundation.
PART I.

PHILOSOPHY?

Things are what they are,
and their consequences will be what they will be;
why then should we desire to be deceived?
Bishop Butler
PHILOSOPHY AND THE SCIENCES

PATRICK SUPPES

The great tradition in philosophy, from Aristotle to Kant, was that philoso-
phy legislated the methodology and foundations of science. It can be claimed
that, in spite of the many centuries separating Aristotle and Kant, it is still
true that the three most important foundational works on science were Aris-
totle's Posterior Analytics, with many points amplified in the Physics and
the Metaphysics, Descartes' Principles of Philosophy, and at the other end
of the period the very specific working out of the foundations of physics in
Kant's Metaphysical Foundations of Natural Science, with the more general
lines of argument being given in the Critique of Pure Reason. It is not dif-
ficult to trace the enormous impact of Kant on physics in the nineteenth
century, especially German physics, and also psychology, even though Kant
was skeptical of providing the kind of foundations for psychology he gave for
physics.
A different kind of foundational effort was made by logical positivism. In
this case the effort was more to say what was not science but bad metaphysics,
rather than to lay down a detailed foundation for science itself. Certainly
in the tradition of logical positivism there was nothing so close to the actual
spirit of classical physics as is to be found in Kant's Metaphysical Foundations
of Natural Science, or, earlier, in Descartes' Principles.
But those days are gone and done for. I am skeptical that we shall
ever find a revival of the view that philosophy can seriously legislate the
foundations of any science. Indeed, I shall even question as we examine the
matter in more detail that there is a serious sense in which there should be
the foundations of any of the major sciences. The enterprise of foundations,
I want to claim, has become inevitably and irreducibly pluralistic in charac-
ter. The analysis of certain problems or their solutions, because of their wide
conceptual interest, has a foundational character. But there is not some epis-
temological or metaphysical view that can be used to organize in a definitive
way classification of problems as foundational in nature. There is not some
selected and small list of problems that are regarded as the central problems
of the foundations of anyone discipline. Of course, some physicists still talk
this way, but the record speaks for itself: whenever one range of problems
is solved that were regarded at one point as foundational and fundamen-
tal in an absolute sense, a new range of problems replaces them. I see no
reason to be other than skeptical about the ultimate nature of the physical
universe being settled, whether we are concerned with the final version of
the big bang or the final statement of the fundamental forces. In fact, to
make a skeptical prediction, I think it likely that the inappropriateness of
the detailed analysis of forces in Kant's Metaphysical Foundations of Natural
Science will be matched by a corresponding datedness for the current views
of the fundamental physical forces a hundred years hence.
The old theological drive for certainty and salvation is hard to con-
trol and I am sure that there will be continual attempts to put this or that
scientific discipline on an "ultimate" foundational basis, but all that will re-
sult in practice is a partial solution of some interesting problems, which is a
good outcome, or what is a bad outcome-the development of a new form of
scholasticism irrelevant to current scientific work.
Let me give some examples to illustrate this general remark. Mathe-
maticians have currently lost interest in foundations as classically conceived.
The development of classical foundations has become a technically sophisti-
cated and important subdiscipline but its philosophical role has nearly faded
away. A different kind of example where foundational scrutiny is still actively
involved in the main scientific developments is the intense Bayesian contro-
versy in statistics. An example of still another sort is provided by quantum
mechanics. Partly because the literature on physics is now so diverse and so
large, but also because of the focus of much of the foundational literature,
there are large parts of the foundational literature on quantum mechanics that
are really only known to specialists. A good example would be the now quite
extensive literature on quantum-mechanical logic. Another area of greater
interest to physicists in general, but still a subject that has become too spe-
cialized to follow in detail, is the continuing controversy about the existence
of hidden variables. The controversy about hidden variables continues to be
an active area of interest, even to some experimental physicists, but it has
to be regarded as a foundational subject, not as one of the most important
areas of current research in physics.
I mention these various examples just to give a descriptive sense of
the way in which foundational interests interact with a particular discipline.
What I have to say is not meant to be evaluative but I also want to emphasize
that what I have to say is not meant to be a permanent or static descriptive
analysis. The proper attitude, it seems to me, is very much not only pluralis-
tic but dynamic. The periods of great interest in foundations in a discipline
as a whole are periods that wax and wane with particular features of the
development of the discipline.
The kind of sweeping viewpoint that Aristotle or Kant tried to put forth
aggressively in defense of the central role of philosophy is out of the question
now. Current research in physics, for example, is too complicated, technical,
and diverse even for physicists to understand all the various subdisciplines.
It is a hopeless task for philosophers to think of offering some kind of un-
derpinnings for this vast intellectual enterprise. I simply pick physics as an
example. This is certainly true for other disciplines as well. The disciplines
are held together by a traditional conglomeration of ideas, which often be-
come separated over time. There is for most scientific disciplines no serious
unified sense of foundations even possible.
This may sound pessimistic and skeptical about any role philosophers
may have. This is not my view. There is a role for philosophy in relation to the
sciences. We are no longer Sunday's preachers for Monday's scientific workers,
but we can participate in the scientific enterprise in a variety of constructive
ways. Certain foundational problems will be solved better by philosophers
than by anyone else. Other problems of great conceptual interest will really
depend for their solution upon scientists deeply immersed in the discipline
itself, but illumination of the conceptual significance of the solutions can be
a proper philosophical role.
In the rest of this lecture I will try to illustrate these general ideas by
considering three examples of scientific problems and results in a given area
that have philosophical interest-indeed philosophical interest in relation to
long-standing problems in the philosophy of science. But as should be evident
from what I have already said, I do not mean to suggest that the three
examples I have chosen lead to anything like a philosophical claim about
science of the sort we associate with Aristotle or Kant.
The first example deals with randomness and determinism in classical
physics, the second with hidden variables in quantum mechanics, and the
third with the nature of visual space.
So, of my three examples, two are taken from physics and one from
psychology. Examples from other disciplines could as easily have been selected
but the selection of three problem areas had to be made not in terms of some
metaphysical criterion of interest but in terms of problems I happen to know
something about.
Determinism and Randomness
One of the great issues in the philosophy of science in the twentieth
century has been the conflict between the deterministic features of classical
physics and the development of probabilistic models of all kinds of natural
phenomena, with randomness as a central feature of such models. Quantum
mechanics, of course, in the view of many persons, has shown once for all that
there exist significant natural phenomena that are in principle indeterminis-
tic. I have something more to say about quantum mechanics in my second
example. What I want to challenge now in a decisive way is the conventional
6 CHAPTER 1

picture that classical mechanics is deterministic and therefore in no sense


random. There are several ways of getting at the demonstration that this is
a mistaken dichotomy, but I think the most striking example and, indeed,
one of the most striking theorems in the entire history of classical mechanics
arises from detailed consideration of a special case of the three-body prob-
lem, which is without doubt the most extensively studied problem in classical
mechanics. The special case is this. There are two particles of equal mass m₁
and m₂ moving according to Newton's inverse-square law of gravitation in an
elliptic orbit relative to their common center of mass, which is at rest. The
third particle has a nearly negligible mass, so it does not affect the motion
of the other two particles, but they affect its motion. This third particle is
moving along a line perpendicular to the plane of motion of the first two
particles and intersecting the plane at the center of their mass-let this be
the z axis. From symmetry considerations, we can see that the third particle
will not move off the line. The restricted problem is to describe the motion
of the third particle.
To obtain a differential equation in simple form, we normalize the unit
of time so that the temporal period of rotation of the two masses in the
x, y-plane is 2π, we take the unit of length to be such that the gravitational
constant is one, and finally m₁ = m₂ = 1/2, so that m₁ + m₂ = 1. The force
on particle m₃, the particle of interest, from the mass of particle 1 is

F₁ = (m₁ / (z² + r²)) · (−z, r) / √(z² + r²),

where r is the distance in the x, y-plane of particle 1 from the center of mass
of the two-particle system m₁ and m₂, and this center is, of course, just the
point x = y = z = 0. Note that (−z, r)/√(z² + r²) is the unit vector of
direction of the force F₁. Similarly,

F₂ = (m₂ / (z² + r²)) · (−z, −r) / √(z² + r²).

So, simplifying, we obtain as the ordinary differential equation of the third
particle

z̈ = −z / (z² + r²)^(3/2).
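To make the behavior of this equation concrete, here is a minimal numerical sketch (not from the text). It approximates the primaries' distance from the barycenter to first order in the eccentricity, r(t) ≈ (1/2)(1 − e cos t); an exact treatment would solve Kepler's equation. For initial velocities near escape, the intervals tₖ₊₁ − tₖ between successive crossings of the plane z = 0 come out irregular:

```python
import numpy as np
from scipy.integrate import solve_ivp

e = 0.2  # eccentricity of the primaries' elliptic orbit (assumed value)

def rhs(t, y):
    z, v = y
    r = 0.5 * (1.0 - e * np.cos(t))        # first-order approximation to r(t)
    return [v, -z / (z * z + r * r) ** 1.5]

# Start on the plane with a velocity below, but near, the escape velocity.
sol = solve_ivp(rhs, (0.0, 500.0), [0.0, 1.8], max_step=0.05)

z = sol.y[0]
crossings = sol.t[np.where(np.sign(z[:-1]) != np.sign(z[1:]))[0]]
print(np.round(np.diff(crossings), 2))     # irregular intervals t_{k+1} - t_k
```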

The analysis of this easily described situation is quite complicated and tech-
nical, but some of the results are simple to state in informal terms. Near the
escape velocity for the third particle-the velocity at which it leaves and does
not periodically return-the periodic motion is very irregular. In particular,
the following remarkable theorem can be proved. Let t₁, t₂, ... be the times
at which the particle intersects the plane of motion of the other two parti-
cles. Let sₖ be the largest integer equal to or less than the difference between
tₖ₊₁ and tₖ times a constant.¹ Variation in the sₖ's obviously measures the
irregularity in the periodic motion. The theorem, due to the Russian math-
ematicians Sitnikov (1960) and Alekseev (1969a,b), as formulated in Moser
(1973), is this.

Theorem 1. Given that the eccentricity of the elliptic orbits is positive but
not too large, there exists an integer, say a, such that any infinite
sequence of terms sₖ with sₖ ≥ a corresponds to a solution of the
deterministic differential equation governing the motion of the third
particle.²

A corollary about random sequences immediately follows. Let s be any ran-
dom sequence of heads and tails-for this purpose we can use any of the
several variant definitions-Church, Kolmogorov, Martin-Löf, etc. We pick
two integers greater than a to represent the random sequence-the lesser of
the two representing heads, say, and the other tails. We then have:

Corollary. Any random sequence of heads and tails corresponds to a solution
of the deterministic differential equation governing the motion of the
third particle.

In other words, for each random sequence there exists a set of initial conditions
that determines the corresponding solution. Notice that in essential ways the
motion of the particle is completely unpredictable even though deterministic.
This is a consequence at once of the associated sequence being random.
From a general philosophical standpoint, what this example suggests
above all is that the classical dichotomy between deterministic and indeter-
ministic phenomena is not really the one that has been the major worry.
What we are in many contexts mainly concerned with is not determinism but
prediction. What the theorem shows is that the real dichotomy is between
determinism and prediction, not between determinism and randomness. In
other words, we can have systems that are both deterministic and random,
and we can also have systems that are deterministic but completely unpre-
dictable in their behavior. In the present context it is not appropriate to
attempt a detailed disentangling of the relationships between the four con-
cepts of determinism, indeterminism, randomness, and predictability, but I
hope that I have been able to suggest in these rather brief remarks that the
relationship is not that which is often claimed philosophically. There is an-
other point to make in this connection that bears on my general thesis about
the relation between philosophy and the sciences. In discussions of deter-
minism, a well-known paper of Montague "Deterministic Theories" (1974) is
often cited. Montague proves some useful general theorems about determin-
ism in a setting that he formulates precisely for classical mechanics, but from
a mathematical standpoint the proofs of the theorems are all quite simple,
and from a physical standpoint no really interesting phenomena are treated.
In contrast, I would say, by looking more deeply at results in a particular
science, in this case mechanics, we are led to genuinely surprising results, as
reflected in Theorem 1, whose proof demands the full resources of modern
work in mechanics.
Bell's Inequalities in Quantum Mechanics
Bell's inequalities are formulated for measurements of quantum-mechan-
ical spin of pairs of particles originally in the singlet state. A variety of specific
experimental realizations has been given in the literature. Let A and A' be
two possible orientations of apparatus I, and let Band B' be two possible
orientations of apparatus II. Let the measurement of spin by either apparatus
be 1 or -1, corresponding to spin 1/2 or -1/2, respectively. By E(AB), for
example, we mean the expectation of the product of the two measurements
of spin, with apparatus I having orientation A and II having orientation B.
By axial symmetry, we have E(A) = E(A') = E(B) = E(B') = 0, i.e., the
expected spin for either apparatus is 0. Note that we now use the notation A,
A', B, and B' for the random variables whose values are the results of spin
measurements in the four positions of orientation. It is, on the other hand, a
well-known result of quantum mechanics that the covariance (or correlation)
term E(AB) is −cos θ(A,B), where θ(A,B) is the difference in angles of
orientation A and B. Again, by axial symmetry only the difference in the
two orientations matters, not the actual values A and B.
On the assumption that there is a hidden variable that renders the spin
results conditionally independent, i.e., that there is a hidden variable λ such
that E(AB|λ) = E(A|λ)E(B|λ), Bell (1964) derives the following inequali-
ties:

−2 ≤ E(AB) + E(AB') + E(A'B) − E(A'B') ≤ 2,
−2 ≤ E(AB) + E(AB') − E(A'B) + E(A'B') ≤ 2,
−2 ≤ E(AB) − E(AB') + E(A'B) + E(A'B') ≤ 2,
−2 ≤ −E(AB) + E(AB') + E(A'B) + E(A'B') ≤ 2.

(This form of the inequalities is due to Clauser, Horne, Shimony, and Holt,
1969.)
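To see concretely how the quantum covariance E(AB) = −cos θ(A,B) interacts with these bounds, here is a minimal numerical sketch; the particular angles are a standard illustrative choice, not taken from the text:

```python
import numpy as np

def E(theta1, theta2):
    # Quantum covariance for singlet-state spin measurements, as stated
    # in the text: E(AB) = -cos(theta(A, B)).
    return -np.cos(theta1 - theta2)

# Apparatus orientations in radians (a standard CHSH-violating choice).
A, Ap, B, Bp = 0.0, np.pi / 2, np.pi / 4, -np.pi / 4

# The four expressions above; each must lie in [-2, 2] if Bell's
# inequalities hold.
sums = [
    E(A, B) + E(A, Bp) + E(Ap, B) - E(Ap, Bp),
    E(A, B) + E(A, Bp) - E(Ap, B) + E(Ap, Bp),
    E(A, B) - E(A, Bp) + E(Ap, B) + E(Ap, Bp),
    -E(A, B) + E(A, Bp) + E(Ap, B) + E(Ap, Bp),
]
for s in sums:
    print(f"{s:+.4f}  violates the bound: {abs(s) > 2}")
# The first expression equals -2*sqrt(2), outside [-2, 2].
```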

The work described thus far falls in a rather standard way within physics,
but the problem is of such general interest and connects to so many other
issues in philosophy, that it is important to see how Bell's inequalities can be
pursued further in a way that does not really depend upon additional physical
assumptions but on general matters of probability and logic.
The first step to mention is Fine's (1982) proof that Bell's inequalities
hold for the four random variables A, A', B, and B', if and only if there exists
a joint probability distribution of the four random variables compatible with
the four given covariances. Note that it will be part of the joint distribution
to fix the two covariances that are not determined by the experimental data,
namely, the covariance of A and A', and the covariance of B and B'.
Bell obtained the inequalities by reasoning from the existence of a hidden
variable. It is also straightforward to show that a joint probability distribu-
tion compatible with the given covariances implies Bell's inequalities. What
is surprising and interesting about Fine's result is that the inequalities are
sufficient for a joint distribution. On the other hand, the result is mathemati-
cally special. For N > 4, satisfaction of Bell's inequalities for every quadruple
of the N random variables is not a sufficient condition for existence of a joint
distribution.
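Fine's criterion can also be checked by brute force: whether a joint distribution over the sixteen ±1 assignments to A, A', B, B' reproduces the four observed covariances is a linear feasibility problem. The sketch below is one way to set this up (my formulation, not from the text; it does not impose the zero marginal expectations, which would only add four more equality constraints):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def joint_exists(E_AB, E_ABp, E_ApB, E_ApBp):
    # Each column is one of the 16 deterministic assignments (a, a', b, b').
    outcomes = list(itertools.product((-1, 1), repeat=4))
    A_eq = [[1.0] * 16,                              # probabilities sum to 1
            [a * b for a, ap, b, bp in outcomes],    # reproduces E(AB)
            [a * bp for a, ap, b, bp in outcomes],   # reproduces E(AB')
            [ap * b for a, ap, b, bp in outcomes],   # reproduces E(A'B)
            [ap * bp for a, ap, b, bp in outcomes]]  # reproduces E(A'B')
    b_eq = [1.0, E_AB, E_ABp, E_ApB, E_ApBp]
    res = linprog(c=np.zeros(16), A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16)
    return res.success

c = np.sqrt(2) / 2
print(joint_exists(-c, -c, -c, c))          # quantum values: False
print(joint_exists(-0.3, -0.3, -0.3, 0.3))  # within the bounds: True
```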
A second result, related in a more general way to this discussion, is an
earlier theorem of Suppes and Zanotti (1981) that relates the existence of a
hidden variable to the existence of a joint probability distribution. The the-
orem as originally stated by Zanotti and me assumed the random variables
had only two values but, as Paul Holland pointed out (Holland & Rosen-
baum, 1986), the generalization to a finite number of values is immediate. So
the theorem on the existence of a hidden variable or, as it is more generally
called in the philosophical literature, a common cause is as follows:
THEOREM 2. Let X₁, ..., Xₙ be finite-valued random variables. Then
a necessary and sufficient condition that there is a random variable λ
such that X₁, ..., Xₙ are conditionally independent given λ is that there
exists a joint probability distribution of X₁, ..., Xₙ.
In the statement of the theorem, λ is of course what the physicists would call
a hidden variable. What is philosophically interesting about this theorem is
that if no restrictions, for example, physical assumptions about the nature of
the hidden variable, are made, then always trivially we can find one for any
phenomenon for which there exists a joint probability distribution. More-
over, we can find a hidden variable that is deterministically related to the
phenomenological variables.
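The trivial construction behind this remark can be spelled out in a few lines: take λ to be the joint outcome itself, so that conditional on λ each variable is a point mass. A minimal sketch for two binary variables with a hypothetical joint distribution:

```python
import itertools

# Hypothetical joint distribution of two binary random variables X1, X2.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# The trivial common cause: lam ranges over the joint outcomes themselves,
# with P(lam) = joint[lam]. Conditional on lam, each X_i is a point mass.
def p1(x1, lam):
    return 1.0 if x1 == lam[0] else 0.0

def p2(x2, lam):
    return 1.0 if x2 == lam[1] else 0.0

for lam in joint:
    for x1, x2 in itertools.product((0, 1), repeat=2):
        p_joint_given_lam = 1.0 if (x1, x2) == lam else 0.0
        # Conditional independence: P(X1, X2 | lam) = P(X1 | lam) P(X2 | lam).
        assert p_joint_given_lam == p1(x1, lam) * p2(x2, lam)
print("X1, X2 are conditionally independent given the trivial common cause,")
print("and each is a deterministic function of it.")
```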
Of course, when a negative result is anticipated, as in the case of quan-
tum mechanics, it is reasonable to put no conditions whatsoever on the nature
of the hidden variable, for then a negative result is as strong as possible. But
what happens in the case of quantum mechanics is also clear. This reduces
the problem of a hidden variable just to the question of a joint probability
distribution's existing for given random variables. This is a question that
arises, one might say, in a ubiquitous way in quantum mechanics; for exam-
ple, in general the position and momentum of a particle do not have a joint
distribution.
What Theorem 2 shows is that we have in the general case a complete
reduction of the existence of a hidden variable to the existence of a joint prob-
ability distribution of the phenomenologically given random variables. Note
that although the theorem is stated for finite-valued random variables, con-
tinuous distributions may be approximated arbitrarily well by such discrete
distributions.
The next step is to look more carefully in a general methodological way
at what is involved in the existence or nonexistence of a joint distribution
as data are collected in any particular empirical situation. When data are
recorded for several random variables in what we might term the standard
way, then there is no problem of the existence of a joint distribution. Without
trying to define this standard approach in a general way, let me illustrate by
a couple of vivid examples. Suppose we are concerned with the distribution
of height and weight in the population of entering students in American uni-
versities in the fall of 1986. We record a large sample chosen with appropriate
methodology of sampling, and as we observe each student we measure height
and we measure weight. For each individual observed we put in our data
records the height and the weight of the individual. It is an implicit assump-
tion of such procedures that it does not really matter within a few moments
which variable we measure first. So the measurement of one variable does
not have any impact whatsoever on the measurement of the second. If we
measure height first then our procedure for measuring height does not affect
the outcome of the following weight measurement. This assumption about
sequence in time is dependent upon the interval of time between the two
measurements being quite short. If we measured height when the students
were entering the university, and measured weight four years later, we would
have a joint distribution if we identified appropriately each individual, but it
would not in any sense be the joint distribution we had originally planned to
study, namely, the "simultaneous" distribution of height and weight.
What is suggested by these remarks is that the nonstandard cases can
be classified into several different natural categories. For example, we can
obtain a joint distribution of height for individuals where we are measuring
height separated by a fixed number of years. Such temporal distributions are
of great interest, and it is in fact disappointing how poor the data are on
such a longitudinal variable as height in terms of good information about the
sample paths of children's increase in height. In the case of such temporal
separation there is no reason to suppose that the first measurement in any
way interferes with the second.
A second kind of case occurs when the measurement of the first vari-
able definitely interferes with the measurement of the second-interferes in
the sense that the first measurement distorts the nature of the object being
measured in such a way that it affects in a significant fashion the result of
the second measurement. Here the classical scientific cases are to be found in
quantum mechanics. If we measure position of a particle, then in general we
affect the particle's state in making that measurement and therefore when we
measure momentum we get a different measurement than we would have an-
ticipated getting if we had reversed the procedure and measured momentum
first. In other words, we cannot get a "simultaneous" distribution of position
and measurement for particles of atomic or subatomic size. We obtain a joint
distribution but not the one in which we are interested.
There is a way of describing this situation that has not been used too
often but that I think is important from a philosophical standpoint. We can
easily claim that identity conditions have been violated in the following sense.
When we measure the position of a particle, we change in an essential way
the state of the particle and therefore the particle we are now observing is
not, in one clear sense, the same particle.
We need to be somewhat careful in the characterization of identity con-
ditions in these situations. We might want to hold on to a bare identity of
the particle, but claim that what is important is that the properties of the
particle do not have a continuing identity in time. So we cannot get a joint
distribution of position and momentum because when we measure position,
for example, we change the state of the particle in such a way that, if we now
want to measure momentum, the momentum of the particle is significantly
different from the momentum of the particle before the measurement of po-
sition. The identity, in other words, of the property of momentum has been
destroyed. So when we talk about identity conditions here the appropriate
thing, in general, is to talk about properties, although in some cases we can
also be faced with the destruction of the particle itself, as in the case of the
observation of photons.
This violation of identity conditions is not peculiar to quantum me-
chanics. In all kinds of situations, where interaction is expected between
properties and where a measurement or treatment affects one property, we
can anticipate the identity of another property of an object being destroyed
or, if a less extreme term is preferred, changed. A simple but clear example
is the following. Suppose the producer of a certain achievement test wants to
determine if the two forms of the test are parallel. One simple way to do this
would be to give test A to students and then immediately give test B. If test
A had no impact on the state of the student's skill or competence being mea-
sured, then immediate retest with test B would be a good way to determine
that test A and test B were parallel forms measuring the same competence in
the student. Yet almost all psychological ideas about testing would hold that
immediately giving test B after test A would lead to a poor measurement
of parallelness of the tests, for the impact of having just taken test A would
measurably change the student's response to test B. Invasive measurements
of physiological properties can have similar interference effects, even though
we like to think that ordinary physiological measurements used for purposes
of assessing the state of health of an individual do not significantly interfere
with each other.
The third category represents the extreme case of modification, which
has already been hinted at. In this case the first measurement destroys the
object, and consequently the second measurement is not even possible with
bare identity conditions of the object holding. The classic case in quantum
mechanics is the measurement of properties of photons. For many kinds of
measurements of photons one measurement is all we can make, but this is
not special again to quantum mechanics and is not true in general for pho-
tons. A familiar example is that of sampling procedures for testing quality
of objects. Many quality-assurance programs require destruction of the ob-
jects that are sampled and in many such measurements of quality only one
significant measurement is made because that significant measurement, the
one of importance, leads to destruction of the object. When complicated
objects are tested for quality assurance, as, for example, by the Underwriters
Laboratory, we are faced with progressive destruction of the object rather
than destruction by a single measurement. In this case, ordinarily strong
assumptions are made that gradual destruction of the object will not distort
successive measurement on parts that have not been destroyed. Ordinarily
we feel quite comfortable with the decomposition assumptions that are made
in these cases.
A fourth kind of case of great philosophical and theoretical interest is
when the measurements cannot be made in principle but are assumed to exist
or perhaps even have values that can be inferred from other measurements
that are made. We can return to Bell's inequalities to find good examples
of this last category. Note that when we ask for the joint distribution of
random variables A, A', B, and B', we are not given in the Bell inequalities
the two missing covariances, E(AA') and E(BB'). In other words, we do
not observe the correlations between measurements taken on the same side
of the measuring apparatus with different settings and of course at different
times. There is no natural way to do this. We send a particle through the
apparatus, a single particle in principle if not in practice, and we measure
the correlation-which is the same as the covariance for these random vari-
ables whose expectations are zero-and we observe, for example, correlation
E(AB'). But we have no natural way of identifying what we would be talk-
ing about in talking about correlations for separate measurements at separate
times of A and A' or correspondingly of Band B'. Consequently, in asking
about the existence of a joint distribution we are simply asking if there can
exist numerical assignments to the two missing covariances, that is, E(AA')
and E(BB'), such that a joint distribution consistent with all the six covari-
ances can be given. This kind of question is an unusual kind of question. It
is not at all natural, from an experimental standpoint, to ask for the values
of these two missing expectations.
Let me focus very sharply on this question. It seems to me it is not at
all clear what are the identity conditions we are focusing on, either at the
level of properties or at the level of "bare" particles, when we ask for the
two missing covariances. These are covariances that we would not naturally
inquire about. Let us consider a similar situation of a very simple sort from
a setting that is surely noncontroversial. Suppose we have two treatments
for a certain kind of cancer. We give one treatment to some patients and the
other treatment to other patients. We cannot ask for the correlation between
the two treatments because no individual is being given both treatments. To
ask for the correlation of the two treatments does not, from an experimental
standpoint, make sense. Introduction of the correlation of the two treatments
rests upon some further theoretical assumptions not obvious at all on the
surface.
Now in quantum mechanics the whole point of the Bell inequalities is
that they are violated by appropriate choice of angles of measurement for
the four random variables so that no joint distribution exists. We might ask,
well, even though the joint distribution does not exist, can we theoretically
compute the missing covariances E(AA') and E(BB')? As far as I can see the
answer is strictly negative: We cannot. I conclude that, looked at from a con-
ceptual standpoint and keeping in mind the identity conditions we naturally
impose for properties and for "bare" particles, the tests of hidden variable
theories generated by the Bell inequalities and the Bell-type experiments are
not as straightforward as it would be natural to expect. At the very least, we
cannot write out the data tables to generate a joint distribution in the way
that would be, in any ordinary experimental situation, straightforward. The
inference about the nonexistence of hidden variables must be at best quite
indirect.

Visual Space
One of the classic problems in the philosophy of science has been the
analysis of the nature of physical space. As everybody knows, the discovery
of non-Euclidean geometries in the nineteenth century and the development
of the theory of relativity in the twentieth century have changed forever the
long-held idea that physical space is necessarily Euclidean in character. Much
has been made by philosophers of all sorts of the conceptual importance of
the changes in our theories of physical space.
Much less attention has been devoted to the nature of visual space, that
is, the psychological space in which we see objects. This visual space has a
lot of special characteristics. First, we must think of it in binocular terms.
Second, the space is certainly not homogeneous in the way in which Euclidean
space is. The observer looking out in front of himself with a different view-
point on what lies straight ahead, as opposed to what lies to the right or
left, immediately imposes natural distinctive directions in visual space and
thus upsets our ideas of homogeneity so familiar in the discussion of physical
space. Surprisingly, however, the analysis of visual space has not gone into
this problem from a foundational standpoint in very great depth. I will not
have more to say about it here, although I recognize its importance, and it is
easy enough to see it generates an axiomatic problem that as far as I know has
not yet been solved at all, that is, to formulate visual space with appropriate
and direct account taken of the facts just mentioned.
Returning now to the main question, in a previous article (Suppes, 1977)
I looked at the history of discussions of this problem, beginning with Euclid.
Here I want to concentrate on the various methodologies that have been con-
sidered for studying the nature of visual space and also some of the results
that have been obtained experimentally. The subject is complicated. The
number of experiments is large, and often the nature of these experiments is
involved, especially in terms of the actual parameters estimated from data. I
shall therefore not cover in anything like serious depth all aspects even of the
restricted questions I want to consider, but I hope to be able to say enough to
show that the problem of the nature of visual space is in itself an interesting
philosophical one, even if we should not attach to it the same primary im-
portance that has been historically attached to the nature of physical space.
Perhaps the central point to emphasize in the context of the present lecture is
that philosophical speculations about visual space conducted independent of
consideration of the very large modern psychological literature on the ques-
tion seem naive and wholly inappropriate. On the other hand, the traffic
can be two-way: I think philosophers have something to contribute in their
own way to the conceptual discussion of psychologists on the nature of visual
space. I hope that some of the comments I make will give a sense of the kind
of help each group may give the other.
Methodology. What would seem to be, in many ways, the most natural
mathematical approach to the question of the nature of visual space has also
been the method most used experimentally. It consists of considering a finite
set of points. Experimentally, the points are approximated by small point
sources of light of low illumination intensity, displayed in a darkened room.
The intuitive idea of the setting is to make only a finite number of point-light
sources visible and to make these light sources of sufficiently low intensity to
exclude illumination of the surroundings. The second step is to ask the person
making visual judgments to state whether certain geometrical relations hold
among the points. For example, do points a and b appear to be the same
distance from each other as points c and d? (Hereafter in this discussion I shall
refer to points, but it should be understood that I have in mind the physical
realization in terms of point-light sources.) Another kind of question might
be, Does the angle formed by points abc appear to be congruent or equal in
measure to the angle formed by points def?
Another approach to such judgments is not to ask whether given points
have a certain relation but rather to permit the individual making the judg-
ments to manipulate some of the points. For example, first fix points a, b, and
c and then adjust d so that the distance between c and d appears the same as
the distance between a and b. Although the formulation may sound metric
in character, the judgments are often of a qualitative nature-for example,
that of congruence of segments, which I also formulate here as equidistance
of points. However, in other experiments, magnitude estimates of the ratio of
distances are required, in order to apply metric methods of multidimensional
scaling.
Once such judgments are obtained, whether on the basis of fixed relations
or ratios, or by adjusting the position of points, the formal or mathematical
question to ask is whether the finite relational structure representing the
experimental data can be embedded in a two- or three-dimensional space of
a given type-Euclidean, hyperbolic, etc. The dimensionality depends upon
the character of the experiment. In many cases the points will be restricted
to a plane and therefore embedding in two dimensions is required; in other
cases, embedding in three dimensions is appropriate.
By a finite relational structure, I mean as usual a relational structure
whose domain is finite. To give a simple example, suppose that A is the
finite set of points and the judgments we have asked for are judgments of
equidistance of points. Let ≈ be the quaternary relation of congruence. Then
to say that the finite relational structure A = (A, ≈) can be embedded in
three-dimensional Euclidean space is to say that there exists a function φ
defined on A such that φ maps A into the set of three-dimensional Cartesian
vectors of real numbers and such that for every a, b, c, and d in A the
following relation holds:

ab ≈ cd  iff  Σᵢ₌₁³ (φᵢ(a) − φᵢ(b))² = Σᵢ₌₁³ (φᵢ(c) − φᵢ(d))²,

where φᵢ(a) is the i-th coordinate of φ(a). Note that the mapping into vectors
of real numbers is just mapping visual points into the Cartesian representation
of three-dimensional Euclidean space. In principle, it is straightforward to
answer the question raised by this embedding procedure: Given a set of
data from an individual's visual judgments of equidistance between pairs of
points, we can determine in a definite and constructive mathematical manner
whether such a Euclidean embedding is possible.
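For the metric version of the problem, where numerical distances rather than qualitative equidistance judgments are available, one constructive test is classical multidimensional scaling: by a theorem of Schoenberg, the distances are realizable in three-dimensional Euclidean space iff the double-centered matrix of squared distances is positive semidefinite with rank at most three. The sketch below illustrates that test; it is not the procedure discussed in the text:

```python
import numpy as np

def euclidean_embeddable(D, dim=3, tol=1e-9):
    # D is a symmetric matrix of pairwise distances with zero diagonal.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    G = -0.5 * J @ (D ** 2) @ J            # Gram matrix of centered points
    eig = np.linalg.eigvalsh(G)
    return eig.min() >= -tol and (eig > tol).sum() <= dim

# Four points forming a unit square: embeddable in the plane, hence in 3-space.
s = np.sqrt(2.0)
D = np.array([[0, 1, s, 1],
              [1, 0, 1, s],
              [s, 1, 0, 1],
              [1, s, 1, 0]], dtype=float)
print(euclidean_embeddable(D))  # True
```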
Immediately, however, a problem arises. This problem can be grasped
by considering the analogous physical situation. Suppose we are making
observations of the stars and want to test a similar proposition, or some more
complex proposition of celestial mechanics. We are faced with the problem
recognized early in the history of astronomy, and also in the history of geodetic
surveys, that the data are bound not to fit the theoretical model exactly.
The classical way of putting this is that errors of measurement arise, and our
problem is to determine if the model fits the data within the limits of the error
of measurement. In examining data on the advancement of the perihelion of
Mercury, which is one of the important tests of Einstein's general theory
of relativity, the most tedious and difficult aspect of the data analysis is to
determine whether the theory and the observations are in agreement within
the estimated error of measurement.
Laplace, for example, used such methods with unparalleled success. He
would examine data from some particular aspect of the solar system, for ex-
ample, irregularities in the motion of Jupiter and Saturn, and would then
raise the question of whether these observed irregularities were due to errors
of measurement or to the existence of "constant" causes. When the irreg-
ularities were too great to be accounted for by errors of measurement, he
then searched for a constant cause to explain the deviations from the sim-
pler model of the phenomena. In the case mentioned, the irregularities in
the motion of Jupiter and Saturn, he was able to explain them as being due
to the mutual gravitational attraction of the two planets, which had been
ignored in the simple theory of their motion. But Laplace's situation was
different from the present one in the following important respect. The data
he was examining were already rendered in quantitative form and there was
no question of having an analytic representation. Our problem is that we are
faced simultaneously with the problem of both assigning a measurement and
determining the error of that measurement. Because of the complexity and
subtlety of the statistical questions concerning errors of measurement in the
present setting, for purposes of simplification, we shall ignore them, but it is
absolutely essential to recognize that they must be dealt with in any detailed
analysis of experimental data.
Returning to the formal problem of embedding relations among a finite
set of points into a given space, it is surprising to find that the results of the
kind that we need for this perceptual problem are apparently not to be found
in the enormous mathematical literature on geometry. There is a large litera-
ture on finite geometries; for example, Dembowski (1968) contains over 1200
references. Moreover, the tradition of considering finite geometries goes back
at least to the beginning of this century. Construction of such geometries by
Veblen and others was a fruitful source of models for proving independence
of axioms, etc. On the other hand, the literature that culminates in Dem-
bowski's magisterial survey consists almost entirely of projective and affine
geometries that have a relatively weak structure. From a mathematical stand-
point, such structures have been of considerable interest in connection with
a variety of problems in abstract algebra. Some general theorems on embed-
ding of finite structures in projective and affine planes are given in Szczerba
and Tarski (1979) and Szczerba (1984).
The corresponding theory of finite geometries of a stronger type, for
example, finite Euclidean, finite elliptic, or finite hyperbolic geometries, is
scarcely developed at all. As a result, the experimental literature does not
deal directly with such finite geometries, although they are a natural exten-
sion of the weaker finite geometries on the one hand and finite measurement
structures on the other.
A second basic methodological approach to the geometrical character of
visual space is to assume that a standard metric representation already exists
and then to examine which kind of space best fits the data. I shall consider
this approach in some detail. Of especial relevance here is multidimensional
scaling, some results of which are reported.
Luneburg theory of binocular vision. The theory of binocular vision
developed by R.K. Luneburg and his collaborators beginning in the 1940s is
still the most detailed and sophisticated viewpoint to receive both mathemat-
ical and experimental attention. Much of the experimental work I report later
takes as its objective testing directly the Luneburg theory or some modifica-
tion of it; this is certainly true of the extensive experimental work of Tarow
Indow and his collaborators.
Essentially, Luneburg wanted to postulate that the space of binocular
vision must be a Riemannian space of constant curvature K in order to have
free mobility. It is well known that there are just three types of Riemannian
spaces of constant curvature: If K = 0, the space is Euclidean; if K < 0,
hyperbolic; and if K > 0, elliptic. Moreover, Luneburg felt the evidence is
extremely strong for the conclusion that the space of binocular vision of most
persons is hyperbolic. Luneburg and his collaborators adopted a metric view-
point rather than a synthetic one toward hyperbolic space. We recapitulate
some of the main lines of development here. In particular, we begin with
the Luneburg (1950) axioms for determining a metric on visual space that is
unique up to an isometry, that is, a similarity transformation.
Some preliminary definitions are useful. Let A = (A, d) be a metric
space, i.e., A is a nonempty set and d is a function mapping the Cartesian
product of A into the nonnegative real numbers such that:

d(a, b) = 0 if and only if a = b,  (i)

d(a, b) = d(b, a),  (ii)

and

d(a, b) + d(b, c) ≥ d(a, c),  (iii)

for any points a, b, and c in A. In addition, A is metrically convex iff for any
two distinct points a and c in A there exists a third point b in A such that

d(a, b) + d(b, c) = d(a, c).

The metric space A is complete iff any Cauchy sequence of A converges to
a point in A. We define a betweenness relation B_d (relative to d) and an
equidistance relation E_d in the obvious way:

B_d = {(a, b, c) : d(a, b) + d(b, c) = d(a, c), for a, b, c ∈ A},

E_d = {(a, b, c, d) : d(a, b) = d(c, d), for a, b, c, d ∈ A}.

If we think of B_d and E_d as the (idealized) observed betweenness and
equidistance relations in visual space, then roughly speaking any two metrics
for which they are the same are related by an isometry. More explicitly and
precisely, we have the following theorem.
THEOREM 3. Let A = (A, d) and A' = (A, d') be metric spaces that are
complete and metrically convex, and let the betweenness and equidis-
tance relations be the same for the two spaces, i.e., let B_d = B_d' and
E_d = E_d'. Then there is a positive real number c such that for all a
and b in A

d'(a, b) = c d(a, b).

This theorem shows that it is easy to state a condition under which two met-
ric spaces are isomorphic up to multiplication by a constant, in this case the
positive number c. To determine that visual space must be a Riemannian
space of constant curvature, still stronger assumptions are needed. In other
words, just satisfaction, as such, of the numerical relations of betweenness and
congruence in the sense of numerical distance is not sufficient. It is important
to note this, for it might be thought that these conditions on betweenness
and equidistance would be sufficient. The obvious point is that in no sense is
the theorem strong enough to determine that the metric space is Euclidean,
hyperbolic, or elliptic. Luneburg (1948) rightly says that the existence of such
a unique psychometric distance function as expressed in the above theorem
is supported by a variety of classical experiments in visual perception. In
other words, there are many different experiments showing that we do have
sensations of visual distance that can be represented uniquely by a metric up
to selection of a unit of measurement. As Luneburg emphasizes, the assump-
tions of metrical convexity and completeness are needed for the uniqueness
result, even though these axioms are not themselves directly tested in the
relevant experiments.
Much too great a variety of spaces satisfies the hypothesis of the pre-
ceding theorem. We need to tighten the framework in order to have a limited
number of spaces to investigate. Luneburg (1947, 1948, 1950) uses argu-
ments from differential geometry to get the standard result that only in Eu-
clidean, hyperbolic or elliptic spaces, that is, Riemannian spaces of constant
curvature, is it possible to move about visual objects without deformation.
The differential argument is not really satisfactory, but there is a well-known
global argument not mentioned by Luneburg which also establishes this re-
sult. It is one of the most famous problems in the foundations of geometry,
the Helmholtz-Lie problem on the nature of physical space.
Riemann's famous lecture (1854), "Über die Hypothesen, welche der Ge-
ometrie zu Grunde liegen," was responded to by Helmholtz (1868) more than
a decade later in a famous paper, "Über die Thatsachen, die der Geome-
trie zu Grunde liegen." Helmholtz makes it explicit that he wants to move
from hypotheses to facts (Thatsachen) that underlie our conception of space.
He argues that although arbitrary Riemannian spaces are conceivable, actual
physical space has as an essential feature the free mobility of solid (i.e., rigid)
bodies. In metric geometry, a motion is a transformation of the space A
onto itself that preserves distances. Such a transformation or mapping is also
called an isometry. Explicitly, if A = (A, d) is a metric space, then φ is an
isometry or motion if and only if for every a and b in A

d(φ(a), φ(b)) = d(a, b) .
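
As a small numerical check of this definition (an illustration, not from the
text), the following sketch verifies that a rotation of the Euclidean plane
about the origin preserves all pairwise distances and hence is a motion in
the sense just defined.

    import math
    import random

    def rotate(p, theta):
        # Rotation about the origin: the standard example of a plane isometry.
        x, y = p
        return (x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta))

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    random.seed(0)
    pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(10)]

    # d(phi(a), phi(b)) = d(a, b) for every pair, up to floating-point error.
    for a in pts:
        for b in pts:
            assert abs(dist(rotate(a, 0.73), rotate(b, 0.73)) - dist(a, b)) < 1e-9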

Helmholtz based his analysis on four axioms, which we describe informally,
following Freudenthal (1965). The first axiom asserts that space is an n-
dimensional manifold with differentiability properties. The second axiom
asserts there is a metric with motions as isometric transformations. The
third axiom asserts the free mobility of solid bodies, which means that if φ is
an isometric mapping of a set B of points onto a set B' (in the same space),
then φ can be extended to a motion of the whole space. The fourth axiom
requires that the motion should be periodic (and not spiraling). This is often
called the monodromy axiom.
Helmholtz claimed to have proved that the only spaces satisfying his four
axioms are the Euclidean, hyperbolic, and spherical spaces. Sophus Lie (1886)
noticed a gap in Helmholtz's proof. Lie strengthened the axioms and solved
the problem. Some years later, Weyl (1923) weakened Lie's assumptions. The
details of the many subsequent contributions to the problem of weakening
the axioms and retaining essentially Helmholtz's solution are to be found
in Busemann (1955, Section 48) and Freudenthal (1965). The basic aim
of the modern work is to eliminate differentiability assumptions, which are
extraneous to the problem of characterizing the spaces that have free mobility
of solid bodies. It is not appropriate here to formulate in technical detail the
strongest theorems, that is, the ones with the weakest assumptions, that have
been proved about the Helmholtz-Lie problem. The point is that whether we
look at space either physically, as Riemann and Helmholtz certainly did, or
psychologically, we want to have as a property of space-certainly to a very
fine approximation-the property of free mobility of solid bodies in the physical
case and of visual images of bodies in the psychological case.
By this or other lines of argument, following Luneburg, we end up with
three types of Riemannian spaces of constant curvature as the three candi-
dates for visual space. As already remarked, I follow the usual notation to
indicate the constant curvature by K: if K < 0 the space is hyperbolic, if
K = 0 the space is Euclidean, and if K > 0 the space is elliptic.
Using the differential expression for a line element in Riemannian spaces
we can express the line element for these three elementary spaces in the
following simple canonical form:

ds² = (dξ² + dη² + dζ²) / [1 + (K/4)(ξ² + η² + ζ²)]²

where the sensory coordinates ξ, η, ζ are ordinary Cartesian coordinates in
a three-dimensional Euclidean space when K = 0. The origin ξ = η = ζ = 0 is
selected to represent the apparent center of observation of the observer.
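Assuming the canonical form displayed above, the following small sketch
(mine, for illustration) computes the conformal factor of the line element and
shows how the sign of K separates the three cases.

    def line_element_factor(xi, eta, zeta, K):
        # Conformal factor of ds^2 = (dxi^2 + deta^2 + dzeta^2) /
        # (1 + (K/4)(xi^2 + eta^2 + zeta^2))^2.  When K = 0 it is
        # identically 1, i.e., the space is ordinary Euclidean space.
        r2 = xi ** 2 + eta ** 2 + zeta ** 2
        return 1.0 / (1.0 + (K / 4.0) * r2) ** 2

    # At the center of observation every case looks locally Euclidean ...
    assert line_element_factor(0, 0, 0, K=-0.5) == 1.0
    # ... while away from it a hyperbolic metric (K < 0) stretches distances
    # relative to the Euclidean case and an elliptic one (K > 0) shrinks them.
    print(line_element_factor(1, 1, 0, K=-0.5))  # about 1.78
    print(line_element_factor(1, 1, 0, K=0.5))   # 0.64
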
To present the fundamental ideas here in reasonable compass, it is nec-
essary to skip at this point a number of technical details that are important
in actual experimental applications of Luneburg's ideas. In particular, the
equation for the line element is transformed once again to introduce an in-
dividual parameter σ as well as K, which it is anticipated will vary from
individual to individual.
As primary evidence for the hyperbolic nature of visual space, Luneburg
referred to the classical experiments of Hillebrand (1902) and Blumenfeld
(1913). Let me refer here to Blumenfeld's experiments, which were improve-
ments on those of Hillebrand. Blumenfeld performed experiments with so-
called parallel and equidistance alleys. In a darkened room the subject sits
at a table, looking straight ahead, and he is asked to adjust two rows of point
sources of light placed on either side of the normal plane, i.e., the vertical
plane that bisects the horizontal segment joining the centers of the two eyes.
The two furthest lights are fixed and are placed symmetrically and equidis-
tant from the normal plane. The subject is then asked to arrange the other
lights so that they form a parallel alley extending toward him from the fixed
lights. His task is to arrange the lights so that he perceives them as being
straight and parallel to each other in his visual space. This is the task for
construction of a parallel alley. The second task is to construct an equidis-
tance alley. In this case, all the lights except the two fixed lights are turned
off and a pair of lights is presented, which are adjusted as being at the same
physical distance apart as the fixed lights-the kind of equidistance judg-
ments discussed earlier. That pair of lights is then turned off and another
pair of lights closer to him is presented for adjustment, and so forth. The
physical configurations do not coincide, but in Euclidean geometry straight
lines are parallel if and only if they are equidistant from each other along any
mutual perpendiculars. The discrepancies observed in Blumenfeld's experi-
ment are taken to be evidence that visual space is not Euclidean. In both the
parallel-alley and equidistance-alley judgments the lines diverge as you move
away from the subject, but the angle of divergence tends to be greater in the
case of parallel than in the case of equidistance alleys. Since the most distant
pair is the same for both alleys, this means the equidistance alley lies outside
the parallel alley. These results have been taken by Luneburg to support his
hypothesis that visual space is hyperbolic.
There is one obvious reservation to be made about Luneburg's inference
that visual space is hyperbolic. There is no unique concept of lines being
parallel in hyperbolic space. Indow (1979) discusses Luneburg's choice rather
carefully and shows that it has some justification. Essentially he uses or-
thogonality to characterize being parallel. The situation is worse when visual
space's being elliptic is tested by alley data, for no two lines can be parallel in
such a space. A local concept must be used; for any standard choice it can be
shown that in the elliptic case the parallel alley lies outside the equidistance
alley.
Modern experiments. In Luneburg (1947, 1948, 1950) a number of exper-
imental applications of the theory are sketched, for example, determination of
the parameters K and σ for a given observer, quantitative analysis of obser-
vational data for equidistance and parallel alleys, analysis and prediction of
visually congruent configurations, and analysis of what is visually congruent
to infinite horizons in physical space. Detailed analytic suggestions for exper-
iments, quantitative analysis of the data, and determination of parameters were
made later, after Luneburg's premature death, by his associate A.A. Blank
(1953, 1957).
The most extensive early test of Luneburg's theory is found in the re-
port of Hardy, Rand, Rittler, Blank, and Boeder (1953) of the experiments
carried out at the Knapp Memorial Laboratories, Institute of Ophthalmology,
Columbia University. Without entering into a detailed description of the ex-
periments I summarize the experimental setup and their main conclusions.
All experiments were carried out in a darkroom with configurations made
up of a small number of low intensity point sources of light. The intensities
were adjusted to appear equal to the observer but low enough not to per-
mit any perceptible surrounding illumination. The observer's head was fixed
in a headrest and he always viewed a static configuration-no perception of
motion was investigated. All observations were made binocularly and the
observer was permitted to let his point of regard vary over the entire phys-
ical configuration until a stable judgment about the visual geometry of the
configuration was reached. An important condition was that all experiments
were restricted to the horizontal plane.
Their main conclusions were these:

1. There is considerable experimental evidence to support Luneburg's pre-
diction of when two configurations are visually congruent.
2. The experiments on parallel and equidistance alleys confirmed the clas-
sical results of Blumenfeld.
3. The efforts to determine the individual observer constants K and σ were
not quantitatively successful. The main problem was drift of value of
the constants through a sequence of experiments. The values obtained
here and in related experiments supported Luneburg's hypothesis that,
for most persons, visual space is hyperbolic, that is, K < 0.

Some closely related data and analysis are given in Blank (1958, 1961);
in the main the results support the hypothesis that the curvature of visual
space is negative. Other closely related experiments are those of Zajaczkowska
(1956a,b).
The main group to continue in a direct way the theoretical and experi-
mental work of Luneburg, Blank, and the Knapp Memorial Laboratories at
Columbia has been the group centered around Tarow Indow, first at Keio Uni-
versity in Japan, and later at the University of California, Irvine campus. The
list of publications extends over a period of more than two decades, and the
references I give here are far from complete. Indow, Inoue and Matsushima
(1962a,b) reported extensive experiments conducted over a period of three
years to test Luneburg's theory and, in particular, to estimate the individual
parameters K and σ. In the 3-point experiment, three points of light Q0, Q1,
and Q2, were presented in the horizontal plane relative to the subject, but
both horizontally and vertically relative to the darkened room. Q0 and Q1
were fixed, and it was the task of the subject to move Q2 so that the segment
Q1Q2 was visually congruent to the segment Q0Q1. Conditions were similar
in the 4-point experiment except that there were two points Q2 and Q3 to
be adjusted so that Q2Q3 was visually congruent to Q0Q1. Of the 26 experi-
mental runs with six subjects reported in (1962a), for 23 the estimated value
of K was in the range -1 < K < 0 with a satisfactory goodness of fit, which
directly supports Luneburg's theory that visual space is hyperbolic. It should
be mentioned that repeated runs with the same subjects showed considerable
fluctuation in the value of K. In (1962b) the same experimental setup and
subjects were used to replicate the alley experiments of Hillebrand (1902)
and Blumenfeld (1913) mentioned earlier. The equidistant and parallel alleys
were in the relation observed in the earlier investigations and thus supported
Luneburg's theory. But one aspect was theoretically not satisfactory. The
values of K and σ estimated for individual subjects in (1962a) did not satis-
factorily predict the alley data at all. Quite different estimated values were
needed to fit these data.
Indow, Inoue and Matsushima (1963) repeated the experiments of (1962
a,b), but with the points of light located in a spacious field. In the earlier
experiments the most distant point of light was 300 cm from the subject.
In this study it was 1610 cm, made possible by conducting the experiment
in a large, darkened gymnasium. Qualitatively the results agreed with the
earlier experiments, but the quantitative aspects, as reflected in the estimated
parameters K and σ, did not.
Starting in 1967 and extending over a number of years, Indow and as-
sociates have applied multidimensional scaling methods (MDS) to the direct
investigation of the geometrical character of visual space. However, there
are several points about MDS to keep in mind. First, the results would be
difficult to interpret if the number of scaling dimensions exceeded the num-
ber of physical dimensions. Second, MDS is most often used when there are
not strong structural constraints given in advance. We know, on the other
hand, that visual space is approximately Euclidean. Is the accuracy of MDS
sufficient to pick up the sorts of discrepancies found in the alley experiments?
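For concreteness, here is a compact sketch of the classical (Torgerson) form
of metric MDS; it is an illustration only, not the particular procedures used
by Indow and his associates. Given a matrix of judged interpoint distances,
it recovers Euclidean coordinates, and one can then inspect how well the
embedding reproduces the judgments.

    import numpy as np

    def classical_mds(D, dim=2):
        # Classical (Torgerson) metric MDS: embed a distance matrix D in R^dim.
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
        w, V = np.linalg.eigh(B)
        idx = np.argsort(w)[::-1][:dim]       # keep the largest eigenvalues
        return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

    # Hypothetical "judged" distances, here generated from true planar points,
    # so a 2-dimensional Euclidean embedding should reproduce them exactly.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(8, 2))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    Y = classical_mds(D)
    D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    print(np.abs(D - D_hat).max())  # near zero for genuinely Euclidean data
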
Matsushima and Noguchi (1967), using data from experiments of a
Luneburg type-small light points in a dark room-and observation of stars
in the night sky, obtained good fits to the Euclidean metric using MDS, with
the appropriate dimensionality.3 On the other hand, the mapping between
the physical space and the visual space determined by MDS was much more
complicated than that proposed by Luneburg and in fact was too complicated
to describe in any straightforward mathematical fashion. Nishikawa (1967)
continued the same line of investigation by arranging the light stimuli in ways
to test the standard alley results. He also suggests a theoretical approach to
explain the Luneburg-type results which he replicated, on the MDS assump-
tion that visual space is Euclidean. The essence of the approach is to assume
the mapping function between visual and physical space changes substantially
with a change in task and instruction. That there is such an effect seems likely
but Nishikawa's theoretical analysis does not get very far. Similar theoreti-
cal arguments are advanced by Indow (1967), but he expresses appropriate
skepticism about the Euclidean solution being satisfactory. Closely related
empirical results and theoretical ideas are also analyzed with care by Indow
(1968, and also 1974, 1975), who gives a particularly good quantitative ac-
count of the accuracy of the Euclidean model for various subjects when MDS
methods are used.
Both methodologically and conceptually, it is natural to be somewhat
skeptical that verbal estimates of ratios of distances-the MDS method used
in the studies cited above-were sensitive and accurate enough to discriminate
the Euclidean or hyperbolic nature of visual space. Indow in various places
expresses similar skepticism about nonmetric MDS, whose lack of sensitivity
to details is well known. A thorough discussion of these matters is to be found
in Indow (1982), which also extends in a detailed way the methods of MDS
to using a hyperbolic or elliptic metric as well as a Euclidean one. Although
the quantitative fit is not much better than that of the Euclidean metric, the
hyperbolic metric does give a better account of the standard alley data.
I restrict myself to a few other especially pertinent studies. Foley (1964a,
1964b, 1972) undertakes the important task of studying the qualitative prop-
erties of visual space, with an emphasis on whether or not it is Desarguesian.
In the first two papers his answer is tentatively affirmative, and in the last
one negative. Unfortunately, Foley's work represents a line of attack that has
not been followed up by other investigators.
A significant and careful experimental study that reaches some different
conclusions about visual space is that of Wagner (1985). The methodology of
the work is notable for two reasons. First, the experiments were conducted
outdoors in full daylight in a large field with judgments about the geometrical
relations of 13 white stakes. Second, four different procedures were used for
judging distances, angles and areas: magnitude (ratio) estimation, category
estimation, mapping, and perceptual matching, where mapping means con-
structing a simple scale map of what is seen. Only the results for distance
will be discussed here. In this case, perceptual matching was not feasible in
the experimental setup and was not used.
The results for distance are surprising and interesting. The Luneburg
model of hyperbolic space did not fit well at all. What did fit reasonably
well is a Euclidean model of visual space, but the Euclidean visual space is
a nontrivial affine transformation of Euclidean physical space. We may use
x and y axes to discuss the results. The x-axis is the one perpendicular to
the vertical plane through the eyes. It is the depth axis. The y-axis is the
frontal axis passing through the two eyes. Let (x, 0) and (0, y) be two physical
points such that x = y, i.e., along their respective axes the two points are
equidistant from the origin-the point midway between the two eyes. In visual
space, however, x' = 0.5y' approximately; i.e., visual foreshortening amounts to
the perceived distance along the depth axis being half of the physical distance
when perceived frontal distances are equated to the physical distances. Call
this foreshortening factor e, so that x' = ex and y' = y, with e varying
with subjects but being approximately 0.5. This very strong effect is highly
surprising, for it has not been reported in Luneburg-type experiments, even
with illumination. The surprise remains even when account is taken of the
very different stimulus conditions of Wagner's experiments, although in an
earlier study under somewhat similar conditions of full illumination Battro,
Netto and Rozestraten (1976) also got results strongly at variance with the
Luneburg predictions.
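A minimal sketch of this affine model (illustrative only; the factor e = 0.5
and the coordinates are hypothetical numbers, not Wagner's data):

    import math

    def visual_from_physical(x, y, e=0.5):
        # Depth (x) is foreshortened by the factor e; frontal extent (y) is
        # not.  The value e = 0.5 is illustrative and subject-dependent.
        return (e * x, y)

    # Two physical points equidistant from the observer along the two axes ...
    xd, yd = visual_from_physical(4.0, 0.0)   # point on the depth axis
    xf, yf = visual_from_physical(0.0, 4.0)   # point on the frontal axis

    # ... are perceived quite differently: the depth distance looks about
    # half as long as the frontal one.
    print(math.hypot(xd, yd))  # 2.0 in visual space
    print(math.hypot(xf, yf))  # 4.0 in visual space
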
The notable omission, from a variety of viewpoints, is experimental study
of projective geometry, for the essence of vision is the projection of objects
in three-dimensional space onto the retina. Fortunately, Cutting (1986) has
recently published a book on these matters. I will not attempt a résumé of
the many experiments he reports, but concentrate on one fundamental point.
The most important quantitative invariant of projective geometry is the cross
ratio of four collinear points. Let a, b, c, and d be four such points. Then their
cross ratio is (ac · bd)/(bc · ad), where, for example, ac denotes the distance
from a to c. In perceiving lines in motion, i.e., from a continuously
changing perspective, is it the cross ratio we perceive as invariant as the
evidence for rigidity in the actual relative spatial positions of given lines?
Cutting provides evidence that the answer is by and large affirmative. This
result also solves La Gournerie's paradox (1859), described in Pirenne (1975):
linear perspective is mathematically correct for just one fixed point of view,
but almost any position in front of a painting will not disturb our perception.
As Cutting points out, an explanation of the apparent paradox is that when
the cross ratio of points projected onto a plane surface is preserved, it will be
preserved from any viewer position. Further pursuit of this kind of projective
analysis should throw further light on the Euclidean or non-Euclidean nature
of visual space.
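The invariance just appealed to is easy to check numerically (an illustration
of mine; the particular map coefficients are arbitrary): a perspective
projection of one line onto another induces a fractional-linear map on line
coordinates, and the cross ratio in the form given above is unchanged by any
such map.

    def cross_ratio(a, b, c, d):
        # (ac * bd) / (bc * ad), with the points given by coordinates on a
        # line, so that, e.g., ac is the signed distance from a to c.
        return ((c - a) * (d - b)) / ((c - b) * (d - a))

    def projectivity(t, p=2.0, q=1.0, r=0.5, s=3.0):
        # A fractional-linear map t -> (pt + q)/(rt + s); a perspective
        # projection of one line onto another has exactly this form.
        return (p * t + q) / (r * t + s)

    pts = [0.0, 1.0, 2.5, 4.0]
    before = cross_ratio(*pts)
    after = cross_ratio(*[projectivity(t) for t in pts])
    assert abs(before - after) < 1e-9  # the cross ratio is invariant
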
Some conclusions. Luneburg's fundamental hypothesis is the most strik-
ing of any that have been proposed for the nature of visual space just because
of the relentless theoretical push on his part to work out so many of the im-
plications of his fundamental ideas. As far as I can see he is the first person
in the history of thought to make a really satisfactory detailed proposal that
visual space is not Euclidean in character. There are of course predecessors
going all the way back to Thomas Reid in the eighteenth century, but it
is really Luneburg's virtue to have laid out the theory for the first time in
anything like adequate detail.
Unfortunately, as we have seen from the many experiments surveyed, we
cannot come to a simple conclusion of the kind that Luneburg would like to
have found supported in as detailed a way as possible. We cannot conclude
simpliciter that visual space is hyperbolic. Certainly we can give to Luneburg
the point that there are simple experimental configurations in which the judg-
ments of subjects certainly do support his hypothesis. On the other hand,
there is a great variety of evidence supporting the view that with a change
in experimental circumstances, for example, in the kind of lighting, very dif-
ferent results can be obtained. The study of visual space, like the study of
other psychological phenomena, turns out to be quite sensitive to particular
experimental configurations and particular experimental environments. From
a broad methodological standpoint, in fact, it might be claimed that this is
the most severe difficulty in developing an adequate deep-running general
theory in almost any area of psychology.
In any case, it is important in thinking about visual space to contrast
the variety of results to those obtained in the study of physical space. It may
very well be that in an environment of black holes we shall find our ordinary
ideas of physical space no longer at all valid and the nature of physical space
changing rapidly, since it depends upon the swirling environment of the black
hole. But for measurements in human environments and on a human scale
the great constancy of physical space is one of the most fundamental facts
of the universe in which we live. The systematization of these physical facts
many centuries ago was one of the most important achievements of Greek
mathematics and science. It is a mistake to think we can achieve anything
like a similar systematization of great general validity in the case of visual
space, at least if we try to think about visual space in the way we think
about classical geometry. The most obvious distinction is that visual space
is in certain fundamental respects closer to classical physics than to classical
geometry. What I have in mind by this remark is that context is rampant in
classical physics but not at all in classical geometry. If we have two bodies
interacting with each other gravitationally, we completely expect the motion
of the two bodies to be disturbed by the introduction of a third, which changes
the environment and thereby the context. We would in fact be astounded if
no change occurred. Endless other physical examples easily come to mind.
We might even say that the study of dynamics in all branches of classical
physics is to a large extent the study of changing context.
By these remarks I do not mean to suggest that it will be an easy matter
to move from a framework of classical geometry to one of classical physics
and thereby achieve a deeper-running, more satisfactory general theory of
visual space. I am only drawing an analogy when it comes to the treatment
of context. I think we are as yet far from clear how to build theories to take
account of the great variety of context effects that have been experimentally
studied thus far. But I also do not want to suggest that I think the situation
is scientifically hopeless, that the contexts are so complicated and devious
that they cannot be reduced in a feasible way to a theoretical framework. We
have a lot to build on, namely, the kinds of experiments that have supported
very well Luneburg's ideas and the kinds of other experiments, for example,
those of Foley and of Wagner, which go in a different direction but in a
way that we can understand and begin to bring within the fold of a general
theory. It is also important to recognize that physics operates only with a very
selected body of experiments. We do not want to make the mistake of thinking
that we can move in any direct way to the study of visual space in wholly
natural environments. The need for the present is to enlarge the canonical
experiments sufficiently to get a range of variation, but with contexts that
we can manage.
The experimental study of visual space is a tedious business, pursued
today in proper scientific fashion by only a small band of intrepid psychol-
ogists. In many ways, our study of visual space is still at the beginning
because we do not yet have a general theoretical framework within which to
operate. Philosophers in search of generalities about space need to be chary
of having too fixed or detailed views about the nature of visual space. One
conclusion of considerable historical and philosophical interest is that a va-
riety of experiments certainly do support the conclusion that visual space is
not Euclidean.

NOTES
1 The constant is the reciprocal of the period of the motion of the two particles
in the plane.
2 The correspondence between a solution of the differential equation and a
sequence of integers is the source of the term symbolic dynamics. The idea of such
a correspondence originated with G.D. Birkhoff in the 1930s.
3 Subjects were asked to judge ratios of interpoint distances.

REFERENCES

Alekseev, V.M. (1969a). "Quasirandom dynamical systems. I. Quasirandom diffeo-
morphisms," Mathematics of the USSR-Sbornik, 5, 73-128.
Alekseev, V.M. (1969b). "Quasirandom dynamical systems. II. One-dimensional
nonlinear oscillations in a field with periodic perturbation," Mathematics of
the USSR-Sbornik, 6, 505-560.
Battro, A.M., Netto, S.P., & Rozestraten, R.J.A. (1976). "Riemannian geometries
of variable curvature in visual space: Visual alleys, horopters, and triangles in
big open fields," Perception, 5, 9-23.
Bell, J.S. (1964). "On the Einstein Podolsky Rosen paradox," Physics, 1, 195-200.
Blank, A.A. (1953). "The Luneburg theory of binocular visual space," Journal of
the Optical Society of America, 43, 717-727.
Blank, A.A. (1957). "The geometry of vision," British Journal of Physiological
Optics, 14, 154-169, 213.
Blank, A.A. (1958). "Analysis of experiments in binocular space perception," Jour-
nal of the Optical Society of America, 48, 911-925.
Blank, A.A. (1961). "Curvature of binocular visual space. An experiment," Journal
of the Optical Society of America, 51, 335-339.
Blumenfeld, W. (1913). "Untersuchungen über die scheinbare Grösse im Sehraume,"
Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 65, 241-404.
Busemann, H. (1955). The geometry of geodesics. New York: Academic Press.
Clauser, J.F., Horne, M.A., Shimony, A., & Holt, R.A. (1969). "Proposed ex-
periment to test local hidden-variable theories," Physical Review Letters, 23,
880-884.
Cutting, J.E. (1986). Perception with an eye for motion. Cambridge, MA: The
MIT Press.
Dembowski, P. (1968). Finite geometries. New York: Springer-Verlag.
Fine, A. (1982). "Hidden variables, joint probability, and the Bell inequalities,"
Physical Review Letters, 48, 291-295.
Foley, J.M. (1964a). "Desarguesian property in visual space," Journal of the Optical
Society of America, 54, 684-692.
Foley, J.M. (1964b). "Visual space: A test of the constant curvature hypothesis,"
Psychonomic Science, 1, 9-10.
Foley, J.M. (1972). "The size-distance relation and intrinsic geometry of visual
space: Implications for processing," Vision Research, 13, 323-332.
Freudenthal, H. (1965). "Lie groups in the foundations of geometry," Advances in
Mathematics, 1, 145-190.
Hardy, L.H., Rand, G., Rittler, M.C., Blank, A.A., & Boeder, P. (1953). The geom-
etry of binocular space perception. Knapp Memorial Laboratories, Institute of
Ophthalmology, Columbia University College of Physicians and Surgeons.
Helmholtz, H. von (1868). "Über die Thatsachen, die der Geometrie zu Grunde
liegen," Göttinger Nachrichten, 9, 193-221.
Hillebrand, F. (1902). "Theorie der scheinbaren Grösse bei binocularem Sehen,"
Denkschriften d. Wiener Akademie d. Wissenschaften, Mathematisch-
Naturwissenschaftliche Classe, 72, 255-307.
Holland, P.W., & Rosenbaum, P.R. (1986). "Conditional association and unidi-
mensionality in monotone latent variable models," The Annals of Statistics,
14, 1523-1543.
Indow, T. (1967). "Two interpretations of binocular visual space: Hyperbolic and
Euclidean," Annals of the Japan Association for Philosophy of Science, 3,
51-64.
Indow, T. (1968). "Multidimensional mapping of visual space with real and simu-
lated stars," Perception & Psychophysics, 3, 45-53.
Indow, T. (1974). "Applications of multidimensional scaling in perception." In
Handbook of perception, Vol. 2, Psychophysical judgment and measurement
(pp. 493-531). New York: Academic Press.
Indow, T. (1975). "An application of MDS to study of binocular visual space."
U.S.-Japan Seminar: Theory, methods and applications of multidimensional
scaling and related techniques. University of California, August 20-24, San
Diego, Calif.
Indow, T. (1979). "Alleys in visual space," Journal of Mathematical Psychology,
19, 221-258.
Indow, T. (1982). "An approach to geometry of visual space with no a priori map-
ping functions: Multidimensional mapping according to Riemannian metrics,"
Journal of Mathematical Psychology, 26, 204-236.
Indow, T., Inoue, E., & Matsushima, K. (1962a). "An experimental study of the
Luneburg theory of binocular space perception (1). The 3- and 4-point exper-
iments," Japanese Psychological Research, 4, 6-16.
Indow, T., Inoue, E., & Matsushima, K. (1962b). "An experimental study of the
Luneburg theory of binocular space perception (2). The alley experiments,"
Japanese Psychological Research, 4, 17-24.
Indow, T., Inoue, E., & Matsushima, K. (1963). "An experimental study of the
Luneburg theory of binocular space perception (3): The experiments in a
spacious field," Japanese Psychological Research, 5, 10-27.
Lie, S. (1886). "Bemerkungen zu Helmholtz' Arbeit über die Thatsachen, die der
Geometrie zu Grunde liegen," Berichte über die Verhandlungen der Königlich
Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematisch-
Physikalische Classe, 38, 337-342.
Luneburg, R.K. (1947). Mathematical analysis of binocular vision. Princeton, NJ:
Princeton University Press.
Luneburg, R.K. (1948). "Metric methods in binocular visual perception." In Stud-
ies and essays, (pp. 215-240). New York: Interscience.
Luneburg, R.K. (1950). "The metric of binocular visual space," Journal of the
Optical Society of America, 40, 627-642.
Matsushima, K., & Noguchi, H. (1967). "Multidimensional representation of binoc-
ular visual space," Japanese Psychological Research, 9, 83-94.
Montague, R. (1974). "Deterministic theories." Reprinted in R.H. Thomason (Ed.),
Formal philosophy: Selected papers of Richard Montague, (pp. 332-336). New
Haven: Yale Press.
Moser, J. (1973). Stable and random motions in dynamical systems with special
emphasis on celestial mechanics. Hermann Weyl Lectures, the Institute for
Advanced Study. Princeton, NJ: Princeton University Press.
Nishikawa, Y. (1967). "Euclidean interpretation of binocular visual space,"
Japanese Psychological Research, 9, 191-198.
Pirenne, M.H. (1975). "Vision and art." In E.C. Carterette & M.P. Friedman
(eds.), Handbook of perception, Vol. 5, 434-490.
Riemann, B. (1854). "Über die Hypothesen, welche der Geometrie zu Grunde
liegen," Gesellschaft der Wissenschaften zu Göttingen: Abhandlungen, 1866-
67, 13, 133-142.
Sitnikov, K. (1960). "Existence of oscillating motions for the three-body problem,"
Doklady Akademii Nauk, USSR, 133(2), 303-306.
Suppes, P. (1977). "Is visual space Euclidean?" Synthese, 35, 397-421.
Suppes, P., & Zanotti, M. (1981). "When are probabilistic explanations possible?"
Synthese, 48, 191-199.
Szczerba, L.W. (1984). "Imbedding of finite planes," Potsdamer Forschungen,
Reihe B Heft, 41, 99-102.
Szczerba, L.W., & Tarski, A. (1979). "Metamathematical discussion of some affine
geometries," Fundamenta Matliematicae, 104, 115-192.
Wagner, M. (1985). "The metric of visual space," Perception & Psychophysics, 38,
483-495.
Weyl, H. (1923). Mathematische Analyse des Raumproblems. Berlin: Springer.
Zajaczkowska, A. (1956a). "Experimental determination of Luneburg's constants σ
and K," Quarterly Journal of Experimental Psychology, 8, 66-78.
Zajaczkowska, A. (1956b). "Experimental test of Luneburg's theory. Horopter and
alley experiments," Journal of the Optical Society of America, 46, 514-527.

Patrick Suppes
Stanford University
Ventura Hall
Stanford, CA 94305
IMPRESSIONS OF PHILOSOPHY

THOMAS SCHWARTZ

We say that the most dangerous criminal now is the entirely lawless modern philoso-
pher. Compared to him, burglars and bigamists are essentially moral men; my
heart goes out to them. They accept the essential ideal of man; they merely seek
it wrongly. Thieves respect property. They merely wish the property to become
their property that they may more perfectly respect it. But philosophers dislike
property as property; they wish to destroy the very idea of personal possession.
Bigamists respect marriage, or they would not go through the highly ceremonial
and even ritualistic formality of bigamy. But philosophers despise marriage as mar-
riage. Murderers respect human life; they merely wish to attain a greater fullness
of human life in themselves by sacrifice of what seems to them to be lesser lives.
But philosophers hate life itself, their own as much as other people's.
G.K. Chesterton, The Man Who Was Thursday
"What is your husband studying?" asked Mrs. Sadni. It was 1965, I
was a first-year graduate student at the University of Pittsburgh, and my
wife had run into the building-superintendent's wife in the laundry room of
our apartment house.
"Philosophy," my wife replied, glad of the conversational gambit and
unaware of the peril ahead.
"Oh," said Mrs. Sadni, "Philosophy. That's very interesting. What is
his?"
"Well, uh, ..." My failure to prepare my wife for such an elementary
and natural question quickly became apparent.
But the super's wife was ready with a tactful response. "Of course, he
has just begun his studies. It'll be a while before he has his own philosophy."
In her demand for an aphorism ("Underneath our clothes, we are all
naked," I have since suggested to my wife), or perhaps a doctrinal label,
Mrs. Sadni succinctly expressed her impression of philosophy (her image, I
might have called it, intimating wider compass and greater fancy, had that
word not been pilfered by the servants of commerce). Like a material object
seen from different angles in different lights by differently disposed observers,
philosophy casts innumerable impressions, some more revealing or less mis-
leading than others, but none uniquely veridical. Like a material object, phi-
losophy is best understood through the impressions gained from a variety of
vantage points. Like some impressions of a material object, some impressions
of philosophy are noteworthy because especially informative: they enable us
to predict with fair accuracy the impressions to be gained from a fair variety
of vantage points. Others are noteworthy because neglected: like an aerial
view of a house, they are the impressions gained from unusual vantage points.
This book evinces and celebrates an impression of the latter sort-the
subversive impression of philosophy, I like to call it. I approach this impression
through contrast with others.
Grammar reflects the difference between Mrs. Sadni's impression and
the more professional impressions of philosophy. Mrs. Sadni used the word
philosophy as a count noun, like house: she might have wondered whether
more philosophies were to be found at Harvard than at the University of
Pittsburgh. Professional philosophers use the word rather as a mass noun,
like water: abjuring the plural, they think of themselves as doing philosophy
rather than propounding philosophies. They peddle their wares in the form
of arguments and definitions, puzzles and counterexamples, sometimes even
theories and doctrines, but rarely in a package grand enough to be called a
philosophy.
This does not mean that Mrs. Sadni misspoke or used a different word
spelled and pronounced the same as our philosophy. When a political candi-
date says that he and his opponent have different philosophies, he most likely
marks a difference of opinion on matters philosophical by any measure-
matters of justice, liberty, human welfare, and the like. He uses the count
noun to suggest settled conviction, we the mass noun to suggest continued
inquiry.
Some may wish to couch Mrs. Sadni's impression in the etymological
equation of philosophy with the love of wisdom. Since the word was first used,
however, disciplinary meiosis has caused that equation to become false-and,
if uttered by a philosopher, arrogant.
In the popular forum, Mrs. Sadni's impression is sometimes supplanted
by a derisive one, an impression of philosophy as something arcane, even
silly, as something grown men cannot take seriously and should not under-
take at all. (My heart went out to you when I learned that your son has
become a philosopher, Mrs. Schwartz.) We see this impression in Aristo-
phanes' portrayal of Socrates in The Clouds. We see it in Samuel Johnson's
celebrated refutation of Berkeley's phenomenalism: having kicked a rock, he
cried, "Ouch! So much for Bishop Berkeley!" Or perhaps you have heard the
one about Descartes ordering a Big Mac: asked if he wanted French fries, he
replied, "I think not," whereupon he disappeared.
There is something to be said for the derisive impression. Philosophers
are and ought to be a tiny bit more ridiculous than others. The defense
of extreme positions, the suspension of common sense, the use of outré ex-
amples and of arguments too clever easily to fault yet impossible seriously
to accept-these devices often penetrate problems to a depth and with a
precision unattainable by other means, problems not likely to have been so
strenuously attacked had the work not been so much fun.
Undergraduates sometimes give voice to an antiquarian impression of
philosophy. Imagining that the last philosopher died long ago, they expect
philosophy professors to be scholars of philosophy, not philosophers, and phi-
losophy courses to treat of the history and literature of the subject, much
as English courses treat of writings more than writing. They are surprised
when asked to grapple with puzzles or to engage in debate with the classical
authors. The antiquarian impression is not peculiar to students, however.
Educationists, rhetoricians, and especially "theorists" in political science of-
ten claim the title philosopher (necrophilosopher would be more apt) but as
often seem incapable of philosophical assertion except as the tail of a fat dog
of textual commentary.
Not that the history of philosophy can or ought to be neglected. Phi-
losophy does not progress as, for the most part, the sciences do, discarding
some of its history and incorporating the rest, possibly recast in current id-
iom, as part of a continually revised but momentarily consensual body of
doctrine. It is just that philosophy, seen through professional eyes, is a dis-
cipline that takes its forebears seriously, treating them as colleagues rather
than curiosities.
In a recent visit to the Soviet Union, I was struck by another impression
of philosophy. There to address the U.S.S.R. Academy of Sciences, I was
asked early by local colleagues, all applied mathematicians working at the
edge of economics, how I should be described in printed invitations to my
talk. "What is your degree in, economics?"
"No," I said, "philosophy."
"Are you serious?" To a Russian, they told me, philosopher means
priest, more or less. The term was nonpejorative, the point nonpolitical: a
"priest" could preach Marxism rather than Christianity, but preach he must-
sometimes a good thing to do, but not the same thing as doing science. How
could I be a philosopher? Our shared interests lay in the mathematical foun-
dations of choice theory, pure and social. They and I have worked to uncover
anomalies in the classical assumptions of "rational choice" that ground eco-
nomics and much of the social and decision sciences and to develop alternative
foundations. They and I package our product as theorems. How does that
qualify as philosophy?
I had to tell them what I thought philosophy was. Normally, I said, when
we solve problems and answer questions, we rely on tools of inquiry--on con-
cepts and assumptions, on principles of inference and evidence-that we ac-
cept uncritically. But sometimes we question those very tools: we probe their
meaning, ask how they came to be accepted, challenge their validity, search
them for the provenance of anomalies and conundra. When we do, some of
the tools we normally rely on to answer questions are unavailable, having
themselves been called into question. When we do, the questions we face are
foundational. The mark of philosophy is that it specializes in such questions.
Focusing as they have on foundational questions, my Soviet colleagues were
themselves engaged willy-nilly in philosophizing, or so I contended to their
great amusement.
Having begun the story of what I am or was to be, I may as well finish.
Philosopher was out. What about my academic title?
"Professor of Government," 1 said, having shifted my chief disciplinary
allegiance to political science some time ago.
If there were a label worse than philosopher, I had found it. What was
needed was a designation near enough the truth yet not misleading to Soviet
eyes. Could they call me Professor of Economics?
"But I am not one," I said.
They then mentioned a prominent American economist, a mutual friend,
whose title is Professor of Economics and Social Science. Could they use that
title for me?
"It's not my title," 1 replied.
"But could it be? Would it be possible for an institution to give you
such a title?"
"I suppose so."
"Then may we call you that?"
"Well, it's not my title," 1 repeated.
"But it could be, couldn't it?"
"But it's not."
"But it could be."
So it was.
Over the next two weeks, several Soviet colleagues repeated their amuse-
ment and amazement that I was really a philosopher. It was as though a
long-time friend and professional collaborator, apparently female, had just
been revealed to be a male transvestite.
As I prepared to depart, a brilliant young Soviet mathematician said:
"I've thought about what you told us concerning philosophy and foundational
questions, especially your point that we, in a sense, do philosophy. What you
meant, I think, is that to be a philosopher is to be very wise," whereupon he
gave me a carved owl. Maybe, then, the old-fashioned equation of philosophy
with the love of wisdom is not so different as I had supposed from my own
equation of philosophy with foundational inquiry-or perhaps my Russian
friend had had a surplus owl.
Impressions of philosophy vary inside the discipline as well as out: phi-
losophers harbor a variety of self-images. In 1975, while at Carnegie Mellon,
I had occasion to meet with most of the philosophy faculty of a well-known
university (of which I am supposed to say that it will remain nameless, but
unfortunately it already has a name). They told me of declining enrollments,
too few majors, no prospect of a Ph.D. program, administrative pressures-
everything but plague and locusts. What could they do to define a role for
their department, to protect and enhance their claim on university resources?
What were my colleagues and I doing at Carnegie Mellon?
I told them of our interdisciplinary approach to hiring, of our collegial
ties to other departments, of our effort to complement and profit from the
comparative advantages already enjoyed by our institution. I told them of our
advanced courses arranged with other departments and pitched at a combined
audience of philosophy majors and other majors-courses in aesthetics and
art history, in philosophy of technology, in philosophy of mind and cognitive
psychology, in philosophy in literature, in logic and computer science, in
medical ethics and biomedical engineering, in social choice and economics,
in philosophical logic and rhetoric. I told them of joint majors and second
majors and of graduate seminars taught by philosophers to students in other
disciplines. And I cited some statistical successes: the highest enrollments in
our college, more majors with a philosophy faculty of five or six than they
had with sixteen.
Expecting faces to smile and heads to nod, I instead saw jaws drop and
eyeballs roll. Heavily committed to history of philosophy, metaphysics, epis-
temology, and the like, my audience were horror-struck by the thought of
adulterating their product with foreign ingredients, of trading professional
purity for institutional prosperity. Theirs was a parochial self-image, an im-
pression of philosophy as philo-philosophy-a love neither of wisdom nor of
foundational perplexities nor even of the craft of philosophy but of topics
conventionally labeled philosophical.
Although philosophy would hardly exist without its core, to disdain all
but the core is often to miss the tastiest fruit. Philosophy that feeds only on
itself is often the most artful philosophy. It can also be the least fruitful.
That brings me to the subversive impression. It is an interdisciplinary
impression, but it is more and less than that-more specific, less general.
In varying ways and degrees, much of philosophy is about other subjects-
about religion, art, science, technology, politics, psychology, and so on. But
much of that is not the good subversive stuff exemplified in these pages. Of-
ten the philosopher addresses other disciplines from the outside, describing,
interpreting, clarifying, even criticizing and prescribing, playing the role of
anthropologist, reviewer, teacher, or judge. Sometimes the philosopher im-
merses himself in another discipline, learning physics, law, art history, or
whatnot, sojourning with the natives that he might better report and assess
their customs. But the subversive takes an extra step, getting his boots dirt-
ier. He tinkers with foundations from within another discipline, addressing
its practitioners as colleagues, producing a scientific or scholarly product rec-
ognized, accepted, and respected by the members of that discipline. Such a
product might or might not bear the label philosophy. Even if it does not,
it is still likely to exhibit a comparatively explicit and artful treatment of
foundational questions.
Although some of my own work in political science would never get clas-
sified as philosophy, I believe that it bears the stamp of my philosophical
training, that without such a background I would have done it differently,
maybe better in some respects but, I am sure, worse in others. Let me de-
scribe a small example. Political scientists had long argued-and had lately
begun to assert without argument-that despite our vaunted system of checks
and balances, Congress had lost its control of the federal bureaucracy by ne-
glecting its oversight responsibilities. In challenging this "stylized fact," I
began with a definition: Congressional oversight is the attempt by Congress
to detect and remedy administrative violations of legislative goals. Almost
never clearly stated, this definition conforms to usage and neatly fits the issue
of congressional control of the bureaucracy. I then looked for unstated as-
sumptions and principles of evidence: What kinds of behavior were counted
as oversight activities? It had been uncritically assumed, I found, that over-
sight activities must take a particular institutional form. Next I concocted a
counterexample to this assumption, a form of oversight compatible with the
definition but different from the form assumed to be exhaustive. Doffing my
philosophical hat, I went on to construct a model of congressional and ad-
ministrative behavior, to deduce that the second form of oversight would be
adopted in preference to the first, and to adduce empirical evidence that over-
sight activities of the second form are widely conducted. No one has called
my paper on the subject philosophical (it seems less so in the reading than
in my telling), but the mark is there: well or poorly, I tackled a substantive
question in part by turning it into a foundational one.
Sinful temptation awaits the would-be subversive. Like a diplomat, an-
thropologist, or spy, the subversive philosopher must guard against going
native, against experiencing such pride in his nonphilosophical credential or
such fear of rejection by his nonphilosophical colleagues that he hides his her-
itage and takes excessive pains to "pass" as a nonphilosopher-an effort that
only impairs his value to his new colleagues. Fortunately, the temptation thus
to sin is ill-grounded: philosophy commands a fair measure of respect across
campus.
The subversive philosopher also must guard against the temptation to
embrace everyone as a philosopher. Because scholars and scientists of all sorts
address foundational questions to some degree, they are all philosophers to
some degree. The subversive may be tempted to conclude that his closest
nonphilosophical colleagues are philosophers no less than he.
To be sure, nonphilosophers have made important contributions to foun-
dational inquiry, and a philosophy degree has never been essential to the craft
of philosophy. But it is going too far to say that philosophers as such have
no distinctive contribution to make once they have infiltrated the perimeter
of another discipline. What marks philosophers off from others is not that
they address foundational questions but that foundational questions are their
specialty. Philosophers are familiar with such questions in numerous guises
within and across disciplines. They are trained, when addressing such ques-
tions, to be especially careful and explicit about suspending habitual beliefs,
identifying assumptions, drawing distinctions in common idiom when a spe-
cialized argot has failed or been set aside, and the like. Not that philosophers
have more than others to contribute to foundational inquiry (witness physics,
economics, and mathematical logic). But philosophers do enjoy a compar-
ative advantage in certain skills and experiences, an advantage that makes
them useful collaborators. I saw a James Bond movie in which Bond tells his
beautiful Chinese bedmate that Chinese women are different. "Oh," she said,
"you think Chinese women are better."
"Not better," he replied. "Different. Like Russian caviar and Peking
duck."
That's philosophy. Not better. Different-different from other disci-
plines, that is, although not so different as salt fish eggs and roast waterfowl.
Thomas Schwartz
Department of Political Science
University of California
Los Angeles, CA 90024
THE COMPUTATIONAL MODEL OF THE MIND

A PANEL DISCUSSION

The following essays were essentially contributions to a symposium concerned
with the computational model of mind. Dana S. Scott moderated the dis-
cussion and started it by presenting a list of questions, reprinted below. The
participants were Gilbert Harman, John Haugeland, Jay McClelland, Allen
Newell, and Zenon Pylyshyn. (Unfortunately, Pylyshyn could not prepare his
remarks for publication.)
Wilfried Sieg

Dana S. Scott:

The Computational Conception of Mind


Some questions:

• Can machines think?
• What can we learn from the computational paradigm?
• Will computer science influence neurology?
• Are there innate rules?
• What is a rule?
• What is it that is learned?
• What is memory?
• What constitutes an experiment in this field?
• What can be regarded as good model building?
• Should a network analogy be used?
• Can models actually lead to explanation?
• Will there be a theory of comprehension?
Dana S. Scott
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

BENEFITS TO MORAL PHILOSOPHY OF THE COMPUTATIONAL
MODEL OF THE MIND¹

GILBERT HARMAN

I suggest that the computational model of mind may be able to shed light
on certain outstanding problems of philosophical ethics. The computational
model of the mind offers a possible way to explain certain aspects of our moral
thinking. In particular, I am thinking of difficulties that arise in implementing
probabilistic reasoning in a computational model of mind and I am thinking
about the so-called "frame problem" in artificial intelligence.
There are aspects of ordinary moral thinking that can seem irrational
from a certain point of view. It can seem that we ought to think of things in
another and seemingly better way than we actually do. But, when we consider
how this other seemingly better way might be implemented computationally,
we may discover that people could not operate that seemingly better way.
The two examples I am thinking of in philosophical ethics have to do
with the principle of "Double Effect" and the distinction sometimes made
between positive and negative duties. I do not want to suggest that the
computational model of mind by itself makes everything clear, only that it
offers a helpful perspective for looking at some of the issues that arise in
connection with these topics.
Double Effect
In ordinary moral thinking, we distinguish intended bad consequences
of an action from unintended but foreseen bad consequences. For certain
bad consequences, we ordinarily suppose that it is worse to intend to bring
them about than it is to act in such a way that you merely foresee will bring
them about. Philippa Foot gives this example (see Foot 1967). If you do not
contribute to the relief of hunger, people will die as a result. We normally
think it would be good of people to do something, and perhaps they ought
to do something, to relieve hunger. But consider a person who refuses to
contribute to famine relief on the grounds that bodies are needed for medical
research! That somehow seems worse than just not caring enough about
starving people to help out. Why? Because this person acts (or refrains)
with the aim or intention that people should die, whereas someone who just
does not care has no such aim. Notice that the person who refrains in order
that there should be more bodies for medical research even has what might
be thought to be a good end in view, whereas the other person is simply
acting selfishly. Even so, ordinary moral thinking condemns more the person
who refrains in order to provide more bodies for medical research.
Now, on reflection, it may seem silly to make a distinction of this sort
between intended and merely foreseen ends. Shouldn't all foreseeable con-
sequences of an action be taken into account in deciding what to do? The
agent should do that act that has the best consequences, without dividing
the consequences into those that are aimed at and those that are not aimed
at. Any act has certain benefits and certain costs. In deciding whether to
do the act, the costs must be subtracted from the benefits and it would be
foolish to ignore foreseen costs that are not incurred as part of the means to
the benefits. In choosing between various acts, an agent is choosing between
bundles of consequences which must be assessed as wholes.
For similar reasons, it seems an agent should consider, not just the
consequences of actions that are determinately foreseeable, but also various
possible consequences that are more or less likely to ensue. The agent should
try to maximize expected utility.
This all makes sense until the issue is considered from the point of view
of the computational conception of mind. For when you consider how an
agent might compute the expected utility of an act by considering all possible
consequences of the act, multiplying the utility of each consequence by its
probability, and then summing these results up, when you consider all that,
you see that this is not something that could normally be computed, since
there will be indefinitely many consequences that would have to be considered.
Furthermore, keeping track of the probabilities is also not computation-
ally feasible in a realistic system, since these probabilities will have to be
updated in the light of new information, and the number of conditional prob-
abilities a system needs to keep track of is an exponentially exploding function
of the number of possible evidence statements! (see Harman 1986, chapter 3)
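The two computational points can be made vivid with a small sketch (toy
numbers of my own, not anything from the literature discussed): computing
expected utility is trivial for an explicitly listed, finite set of consequences,
while the table of conditional probabilities grows as 2^n in the number of
evidence statements.

    def expected_utility(outcomes):
        # Expected utility from an explicit, finite list of (probability,
        # utility) pairs -- innocuous here, infeasible when the consequences
        # to be enumerated are indefinitely many.
        return sum(p * u for p, u in outcomes)

    # A toy act with three foreseen consequences:
    print(expected_utility([(0.7, 10.0), (0.2, -5.0), (0.1, -50.0)]))  # 1.0

    # The updating problem: n yes/no evidence statements generate 2^n possible
    # bodies of evidence, each in principle calling for its own conditional
    # probabilities -- an exponentially exploding table.
    for n in (10, 20, 40):
        print(n, "evidence statements ->", 2 ** n, "evidence combinations")
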
This suggests that a computationally feasible agent will have to be quite
restricted in the consequences of actions that it can consider. Normally, such
an agent will have a very simple-minded approach to deciding what to do,
simple-minded anyway from the point of view of maximizing expected utility.
The agent will see that a certain plan will enable the agent to attain some
goal. Normally, the only consequences of action that are considered will be
those involved in that plan. Side effects and further consequences will not be
thought about, even if the agent is aware of them.
This needs to be looked at in more detail, but it may well turn
out that an adequate account of a computationally feasible agent must see
an important distinction between intended means and ends that are part of
the agent's plan and foreseen side effects and further consequences that are
not part of that plan. [I discuss this issue further in Harman 1986, chapter 9.]
Positive/Negative
The other distinction is that between positive and negative duties. This
is sometimes put like this: it is worse to violate the negative duty not to injure
someone than to violate the positive duty of helping to prevent someone from
being injured. Foot gives the following example: Five patients are dying in
room 204. We can save them by manufacturing a serum. But the machinery
for making the serum gives off a noxious gas that is piped into room 206 of
the hospital. Alas, there is a patient in that room who cannot be moved.
Can we operate the machinery and produce the serum that will save the five
patients, although this process will kill the sixth patient in room 206? It
seems we cannot. Anyway, it is not obvious that we can do this. That seems
to be because our positive duty to the five dying patients is not as strict as
our negative duty not to injure the sixth patient in room 206.
On the other hand, consider this different scenario. A noxious gas is
being produced and sent through a pipe into room 204 where there are five
patients who will die unless this is stopped. There is no way to turn off the
gas or move the patients. However, we could move the pipe so that it no
longer seeps into room 204. Unfortunately, it will then seep into room 206
where there is a single patient. There is nothing else we can do. Under these
conditions is it OK to move the pipe? It seems we can (see Thomson 1985).
The general principle seems to be that it is much better to deflect an
ongoing harmful process from a larger group onto a smaller group than it is
to save a larger group by initiating a process that harms a smaller group.
Now, there are many questions that might be raised about this. One is why
we should think in these terms at all. Why should we say that in the second
case a process has been diverted without saying this in the first case? Why
can't we say in the first case where the five are dying and need a serum that
a process threatening harm to those five might be diverted so as to instead
threaten the patient in room 206 with harm?
I suspect that this has something to do with the so-called "frame prob-
lem" in artificial intelligence (see Hayes 1973). The problem arises in the
computational theory of the mind when we envision a system that antici-
pates what the future will be like given certain changes. It seems that such
a system has to be able to suppose that most things will remain the same,
making exceptions for the few things that change. The system will suppose
there is an unchanging background framework with a few changes happening
within that framework. If the system has specifically to reach a conclusion
about each aspect of each future state, then the problem is computationally
intractable.
Now one element of stability will be to suppose that there are relatively
enduring objects with relatively enduring properties. Another, I suggest, will
be to assume that there are certain processes that occur, perhaps following
relatively fixed "scripts". By dividing the world into objects and processes,
the computational frame problem is made more manageable. If we did
not so structure the world, we would not be able to foresee anything about
the future.
Here then are two ways in which I think research into the computational
theory of mind might be relevant to philosophical ethics.

NOTES
1 The preparation of this paper was supported in part by research grants to
Princeton University from the James S. McDonnell Foundation, the Defense Ad-
vanced Research Projects Agency of the Department of Defense and the Office of
Naval Research under Contracts Nos. N00014-85-C-0456 and N00014-85-K-0465;
and the National Science Foundation under Cooperative Agreement No. DCR-
8420948 and under NSF grant number IST8503968. The views and conclusions
contained in this document are those of the author and should not be interpreted
as necessarily representing the official policies, either expressed or implied, of the
McDonnell Foundation, the Defense Advanced Research Projects Agency, or the
U.S. Government.

REFERENCES
Foot, P. (1967) "The Problem of Abortion and the Doctrine of the Double Effect."
Oxford Review, no. 5. Reprinted in P. Foot (1978) Virtues and Vices. Oxford:
Blackwell, pp. 19-32.
Harman, G. (1986) Change in View. Cambridge, MA: MIT Press.
Hayes, P.J. (1973) "The Frame Problem and Related Problems in Artificial Intel-
ligence," in A. Elithorn and D. Jones (eds.) Artificial and Human Think-
ing. Jossey-Bass. Reprinted in Bonnie Lynn Webber and Nils J. Nilsson
(eds.) Readings in Artificial Intelligence. Los Altos, CA: Morgan Kaufmann,
pp. 223-230.
Thomson, J.J. (1985) "The Trolley Problem," The Yale Law Journal 94, reprinted
in W. Parent (ed.), (1986) Rights, Restitution, and Risk: Essays in Moral
Theory. Cambridge, MA: Harvard University Press, pp. 94-116.

Gilbert Harman
Department of Philosophy
Princeton University
Princeton, NJ 08544
PHILOSOPHY AT CARNEGIE MELLON: PAST, PRESENT, FUTURE

JOHN HAUGELAND

Professor Scott opened this panel discussion with several interesting ques-
tions for our consideration. I would like to say a few words about a couple of
these-beginning, however, with the one that Dana himself deemed the least
promising: "Can a machine think?" Most people who address this question
proceed as if the hard part were deciding what is meant by 'think', or (once
some definition has been proposed) deciding whether a machine could fall un-
der the definition. In other words, the issue is conceived in terms of "drawing
the line" between those entities that can think (including many people) and
those entities that cannot think (including many machines).
And, once the issue is so conceived, you can have all kinds of lovely
squabbles about whether such and such a system or set of systems constitutes
"partial solutions" or "half-way successes"-all of which gets fairly boring
fairly quickly. In my view, we won't really be able to tell how far along
any particular work may have been, or even whether it was really "along the
way" at all, until we're essentially done, and can look back and gauge the
whole path. Of course, throughout the history of psychology, there have been
occasional claims that we were essentially done; but I don't think 1986 would
be a very good year for such a claim. Right at the moment, we happen to
know too much about what we don't happen to know about the mind.
Perhaps that's why Professor Scott thought the question about thinking
machines so unpromising. But it seems to me that it's much more promising
if, instead of focusing on the word 'think', we focus on the word 'machine'.
Our conceptual resources for understanding what machines are and can do
have literally exploded in the last two generations, the last one generation,
and indeed, even in the last few years. After all, it is a new conception
of machine, and not a new conception of thought, that has philosophically
fueled the emergence and ascendancy of Artificial Intelligence models in our
lifetimes.
To be sure, I say "philosophically" partly because today's events cel-
ebrate the University's recognition (at long last) that philosophy has an
important place at Carnegie Mellon. But that is only part of my reason;
for, in sooth, there is a brilliant heritage of philosophy here. Taking the
deepest sense of the term 'philosophy'-the sense that has nothing to do
with academic compartmentalization, but is rather our highest intellectual
accolade-it was two philosophers who launched the field of Artificial Intelli-
gence at Carnegie Tech, almost thirty years ago. These pioneers wove their
new understanding of machines into the great mentalist tradition of Hobbes,
Descartes, and Kant, thereby breathing into that tradition new life and hope;
a nascent Philosophy Department is genuinely honored by their presence this
afternoon.
I can illustrate my point about machines by nit-picking at a remark made
by another philosopher, John Searle. When he asks rhetorically whether a
machine can think, his first answer is: "Obviously yes; we are such machines
ourselves." My quarrel with this reply is not that it's false, but that it's
nearly vacuous, for no interesting sense has been given to 'machine'. One of
the great virtues of the computational model of thinking is that it rests on
a very precise and powerful conception of the relevant machine: specifically,
symbol processing machines. Of course, that insight did not mean there was
no more work to do; quite the contrary, suddenly there was lots of work to
do-fraught with unprecedented possibilities.
Conceptually, there was the formidable task of exploring and elaborating
the architectural understanding of the new machines, developing such ideas
as list processing, heuristic control, pattern directed inference, and so on.
Empirically, new ground had to be broken in the formulation and testing of
psychological models offering a hitherto unimaginable combination of detail,
scope, and rigor. Finally, there were enormous philosophical challenges in
working out the crucial but too often implicit new notions of symbol, knowl-
edge, meaning, understanding, and the like.
The last point deserves a little further comment, for it will have reper-
cussions in what follows. Basically, the "Good Old Fashioned AI" ("GOFAI")
conception of symbols derives from turn-of-the-century work in formal logic
and mathematics. Fundamentally, a symbol is a complex digital token with
a meaning (i.e., an interpretation relating it to some "outside world") that is
fully determined by its composition. The relevant composition comprises only
the (arbitrary) meanings of the constituent atomic tokens, and the structure
or "syntax" of the complex as such. This is profound suggestion, since, as we
now know, it is compatible with an account of "processing" that is simulta-
neously formalizable (hence mechanizable) and yet also semantics-preserving.
This pair of properties lies at the heart of the idea that the mind is a symbol
processing machine.
I belabor these foundations, familiar no doubt to everyone, because the
very precision and generality that are their strength may also be their undoing.
Let me explain what I'm getting at by turning deftly to the second of Professor
Scott's questions that I want to address. "What can we learn," he asks, "from
the computational paradigm in cognitive science?" Now I take it that the
appearance of the term 'paradigm' in the question automatically entitles me
to two brief quotations from Kuhn's Structure of Scientific Revolutions; the
first is on page 65:
Anomaly appears only against the background provided by [a] paradigm. The more
precise and far-reaching that paradigm is, the more sensitive an indicator it provides
of anomaly and hence of an occasion for paradigm change.
So far, I have mentioned no "anomalies" confronted by GOFAI; and I have no
intention of starting now. I merely invite each of you to reflect for a moment
on your inner feelings of satisfaction with computational cognitive science.
Has it progressed as much as it seemed like it would 10, 15, or 20 years ago?
Have there been as many exhilarating new ideas in the first six years of the
1980's as there were, say, in the first six years of the 1970's or the 1960's? I
have no illusions that everyone here will answer these questions in the same
way, or, indeed, that it would prove anything if they did.
Rather, my purpose is to set up a different point. There has been one
cluster of new ideas in the 1980's, so exhilarating in some quarters as to
start a bandwagon, under the banner of "new connectionism" or "parallel
distributed processing." What's more, the very conceivability and manage-
ability of these models are again intimately bound up with our still rapidly
expanding understanding of what can be meant by machine. In the mean-
time, the same precision in a paradigm that renders anomalies recognizable
also makes it possible to discern which new directions are not evolution-
ary developments but revolutionary usurpers. The ultimate complexion of
PDP models is perhaps not yet clear; but they certainly do not seem essen-
tially predicated on the assumption of semantics-preserving transformations
of complex interpreted tokens.
In other words, it seems to me as if the excitement of the 1980's lies
not within GOFAI, but rather without it, among a still forming band of
pretenders to its throne. In the long run, this is a far more serious threat
than a few disputable anomalies, as is emphasized in my second passage from
Kuhn:
The decision to reject one paradigm is always simultaneously the decision to accept
another, and the judgment leading to that decision involves the comparison of both
paradigms with nature and with each other. [p. 77]
Of course, the results of that judgment, if indeed it comes to such a judgment
in this case, are not yet in. It is not my purpose here to announce or even
predict the collapse of Good Old Fashioned AI; we are all well reminded of
Mark Twain's quip about the prematurity of his obituary. And, while we're
waiting, we might also remember that the alternative approaches have only
begun to scratch the surface of the difficulties that they will ultimately need
to contend with.
In closing, I think it fitting to observe that, no matter how these issues
are resolved, this University is again at the forefront, attempting to under-
stand the mind in terms of the state of the art in the metaphysics of machines.
What a good time to open a new Department, for the future of philosophy
looks bright at Carnegie Mellon.
John Haugeland
Department of Philosophy
University of Pittsburgh
Pittsburgh, PA 15260
THE BASIS OF LAWFUL BEHAVIOR: RULES OR CONNECTIONS

JAY MCCLELLAND

What is the basis of lawful behavior? What knowledge underlies it, and how
is it acquired? My colleagues and I have been working toward a new kind
of answer to these questions. We have discovered that lawful behavior can
emerge from the performance of a network of simple computing elements. We
have also discovered that these networks can learn to behave lawfully through
experience.
To illustrate, let us consider a simple kind of lawful behavior: the pro-
ductive use of the past tense of English. Even reasonably young children can
form the past tenses of familiar words in English. More than this, they can
form the past tenses of made-up forms that they have never heard. Jean
Berko demonstrated this in experiments on young children in 1958. Even
more strikingly, young children often regularize irregular words; they say
things like "taked" and "goed". Berko took this evidence of the productive
use of the past tense as evidence that the child had acquired the rule. To
quote her 1958 paper:
If a child knows that the plural of witch is witches, he may simply have memorized
the plural form. If, however, he tells us that the plural of "gutch" is "gutches", we
have evidence that he actually knows, albeit unconsciously, one of those rules which
the descriptive linguist, too, would set forth in his grammar.
Berko's argument sounds reasonable, but on close scrutiny a question
arises. Exactly what is the form of the unconscious knowledge of the rule? Is
it written down in the mind in some sort of explicit form, simply inaccessible
to overt report? Do the processing mechanisms actually consult these rules,
and do the learning mechanisms actually formulate, evaluate, and/or modify
members of the rule set?
My colleagues and I have begun to develop an alternative to this type
of account. In our view, the implicit knowledge is stored in connections
among simple processing units organized into networks. While the behavior
of the network may be describable (at least approximately) as conforming
to a system of rules, the network models have properties that differ from
explicit formal rule systems. These properties allow them to capture several
important characteristics of the language acquisition process, as we see it
occurring in the human language learner.
To give you the flavor of our approach, I will describe a computer simu-
lation model David Rumelhart and I have developed that learns to produce
past tenses of English verbs from exemplars. The model greatly simplifies the
past tense learning task, compared to the task as the child confronts it, and
isolates it from the rest of language acquisition. These simplifications allow
us to focus on the basic point, which is to illustrate how lawful behavior can
be acquired by a network.
In our version of the task, the model is presented with training pairs,
consisting of the present tense form of a word, paired with the corresponding
past tense form. Thus it might be shown go-went, like-liked, etc. Its task is
to learn to produce the appropriate past tense form, given the root form as
its input.
The model consists, primarily, of two sets of simple computing elements
(see Figure 1). Each element is a very simple device that takes on an activa-
tion of 0 or 1, based on the weighted sum of inputs from other units. One of
these networks is used to represent the root form of the word, and the other
is used to represent the past tense form.
Processing works like this. When a root form is presented, it produces
a pattern of activation over the root form units via an encoding network that
translates the sequence of phonemes into a pattern of activation. Each of the
root form units represents a phonological property, and if a unit is turned on,
we can think of this as indicating that the property it stands for is present
in the root form of the word being processed. There is a large number of
units, and each word turns on a large subset of them. The representations
of different words overlap with each other in this representation, in that they
share many properties-but each word has its own unique set of properties
that represents it.
Now in most models, representations can be seen as patterns, but in
these models, they are patterns of a particular kind-they are active patterns
that can activate other units through connections. Each of the units in the
root network has a connection to every unit in the past tense network, and
whenever a unit is on it sends signals to all of the units it is connected to.
These signals are weighted by the connections, which may be positive or
negative. If positive, they tend to turn the receiving unit on; if negative, they
tend to turn it off. The receiving units add up the signals they receive, and if
the net input is strongly excitatory they come on with high probability; if it
is strongly inhibitory they stay off with high probability. Intermediate values
produce intermediate probabilities of the unit coming on.
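For concreteness, here is a minimal sketch of one such unit. The code and
the logistic squashing function are assumptions of the sketch, since the text
says only that strongly excitatory net input makes activation highly probable
and intermediate input yields intermediate probabilities.

    # A single probabilistic threshold unit, sketched in Python; this is
    # not the authors' simulation code.
    import math
    import random

    def unit_state(inputs, weights):
        # Weighted sum of incoming 0/1 signals...
        net = sum(x * w for x, w in zip(inputs, weights))
        # ...mapped to a probability of coming on: strongly excitatory
        # input gives p near 1, strongly inhibitory input gives p near 0.
        p_on = 1.0 / (1.0 + math.exp(-net))
        return 1 if random.random() < p_on else 0

    # Two active excitatory lines and one inactive inhibitory line:
    # the unit comes on with high probability.
    print(unit_state([1, 1, 0], [4.0, 3.0, -5.0]))
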
[Figure 1: A very small connectionist network, consisting of two groups of units like
those used in our simulations of past-tense learning. The input units are arranged in
a row along the bottom of the figure; the output units are in a row down the right-
hand edge; the connections among the units are indicated in the square array. The
"+", "-", and "." symbols above the connections indicate excitatory, inhibitory,
and null connections, respectively. These particular connection strengths would
allow the indicated input pattern (dark circles along the bottom) to produce the
indicated output pattern (dark circles along the right edge). (From Rumelhart &
McClelland, 1986, reprinted with permission.)]

Now it turns out that this kind of network can be trained to find values
of the connections from one set of units to another so that an arbitrary
pattern on the input units will produce a particular output pattern on the
other set of units. The training procedure is very simple. We just present the
input pattern and allow it to produce an output pattern based on the current
values of the connection strengths. Then for each output unit, we compare
the obtained pattern with the desired one. When a unit is not active that
should have been, we increase the strength of the connections coming into it
from each active input unit. This means that next time the same input will
be more likely to turn this unit on. When a unit is active that should not have
been, we decrease the strength of the connections coming into it from each
active input unit. This means that next time the same input pattern will be
less likely to turn this unit on. If we carry out this procedure repeatedly with
the same pattern pair, we can guarantee whatever level of accuracy we wish.
In fact, we can train a network to respond correctly to all the members of a
large ensemble of patterns in this way (as long as certain technical conditions
are met).
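The procedure just described is essentially a perceptron-style
error-correction rule, and it fits in a few lines. The rendering below uses
invented names and is a schematic reconstruction, not the Rumelhart and
McClelland simulation code.

    # Schematic reconstruction of the described training step.
    # weights[i][j] connects input unit i to output unit j;
    # inputs, target, and output are 0/1 vectors.
    def train_step(weights, inputs, target, output, lr=0.1):
        for j, (want, got) in enumerate(zip(target, output)):
            if want == got:
                continue                      # correct units are left alone
            delta = lr if want == 1 else -lr  # strengthen or weaken
            for i, x in enumerate(inputs):
                if x == 1:                    # only connections from active inputs change
                    weights[i][j] += delta
        return weights

Repeated over the whole training set, updates of this kind drive each output
unit toward the desired response, subject to the technical conditions
mentioned above.
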
Now, think with me about the following experiment. Suppose we train
the model with a set of patterns that all exemplify the regular past tense
pattern of English. That is, we present successive pairs like "like-liked",
"hate-hated", "love-loved", etc. For each, we present the root form, we test
to see what the network generates, and we adjust the connections wherever
there are discrepancies between the obtained output and the correct past
tense form. The network will develop strong connections from input features
to the corresponding output feature. Initial "l" in the input will activate
initial "l" in the output, etc. In addition, it will learn to add the correct,
"regular" past-tense ending.
While the pattern of connections is built up from experience with par-
ticular exemplars, the model comes to be able to act in accordance with the
past tense rule. Not only can it correctly form past tenses of words in the
training set, but it can also do very well on the past tenses of words it has
not seen before.
OK, you say, so that's a mechanism that learns to act in accordance with
the past tense rule, but so what? Why should I believe the mind really works
this way, instead of in terms of some real rule induction process? And anyway,
even if I accept that it really does work this way, why shouldn't I just ignore
this and treat the model as a statement about the implementation details?
After all, the mechanism behaves just as if it did have the rule, doesn't it?
What difference does it make?
It makes a lot of difference. For mechanisms like this have a lot of
properties that correspond to what we see in the language learning of young
children. First of all, the mechanism is not thrown by noise-in this case
exceptions in its inputs. It can learn, gradually, to find a set of connections
that captures both the regular pattern and the exceptions in the same set
of weights. Early on in learning, if it receives a small number of exceptions
mixed in with a large number of regular verbs, it learns the regular pattern
and overregularizes the irregular forms. As I mentioned before, we see this
phenomenon of overregularization in the past tense usage of young children.

Anticipated:

1. The model exhibits over-regularization responses ("go" → "goed").

2. The model exhibits variability in its responses during transitional phases of
acquisition ("go" → "goed" coexists with other responses).

3. The transition to the adult state is very gradual (regularization errors persist
well into grade school, becoming less and less frequent).

4. The "penetration" of the "past-tense rule" is less than perfect; children are
better at using it with familiar words than with novel forms, even as late as
third grade.

Unanticipated:

5. A special type of transition error, in which irregular past tense forms are
combined with the addition of the "-ed" ending, appears late in the
transitional phase, when regularizations are occurring only about 10% of the
time (examples are "wented" and "ated").

6. Among irregular forms, those involving no change in forming the past tense
(e.g., "hit", "bid") are easiest to learn.

7. Correspondingly, monosyllabic verbs ending in "t" or "d" that should have
"-ed" added tend to be used in past-tense contexts with no change (this
includes made-up verbs like "mott" as well as real ones like "pet" as in "he
petted the dog").

8. Irregular verbs involving vowel changes only are regularized more than irreg-
ular verbs involving vowel change and a final consonant change (e.g., verbs
like "sing" are regularized more than verbs like "seek").

Table 1: Correspondences between the Simulation Model and Acquisition Data

Rumelhart and I have run several simulation experiments using training
lists consisting of a mixture of regular and irregular verbs. These simulations
exhibit a number of features that are characteristic of the acquisition of the
past tense of English. One might think that something as simple as the past
tense would not be a rich field of empirical evidence, but in fact it is. In Table
1, I have enumerated several aspects of the models' behavior that are actually
observed in the speech of children learning English as their first language.
I am very enthusiastic about this model, but I don't want to give the
impression that I think it is perfect. It does have flaws, but these are due,
I think, to the simplifications that we incorporated to illustrate the basic point
that lawful behavior could emerge from a network of simple processing units.
Rather than dwell on how we intend to improve the model, I will return
briefly to the basic issue.
When they see lawfulness in behavior, cognitive scientists since the late
50's have been quick to jump in and say that this lawful behavior indicates
knowledge of rules. While it is often acknowledged that lawful behavior need
not necessarily be based directly on systems of rules, attempts to make ex-
plicit theories about the mechanisms that underlie lawful behavior have gener-
ally been couched in terms of rule systems. Until recently, as Zenon Pylyshyn
once said, this approach has been the only straw afloat.
A growing group of researchers is working on a second straw. The mem-
bers of this group view our work in the development of connectionist, dis-
tributed network models of cognitive processes as an attempt to construct
explicit theories in which lawful behavior is an emergent property. We think
this approach has great promise, and we are now actively engaged in extend-
ing it to sentence processing and other, higher-level cognitive tasks.

REFERENCES
Rumelhart, D.E. and J.L. McClelland (1986) "On learning the past tenses of En-
glish verbs," in J.L. McClelland, D.E. Rumelhart and the PDP research group,
Parallel Distributed Processing: Explorations in the Microstructure of Cog-
nition. Vol. 2: Psychological and Biological Models. Cambridge, MA: MIT
Press/Bradford Books.

Jay McClelland
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA 15213
ARE THERE ALTERNATIVES?

ALLEN NEWELL

Let me start with two short comments on what others have said, and then
give a tiny non-lecture.
First, I had a particular reaction to Dana Scott's list. It seemed to
me to have an odd characteristic: it assumed that the problem that we are
dealing with in understanding the mind is very special in comparison with the
sciences. It raised questions about what experiments could mean in this field,
and it raised questions about what we could learn from the computational
paradigm. Whereas it seems to me that the basic question is simply one of
psychological science: we will have understood the nature of mind when the
psychological theories of it get tight enough and good enough so that it is
clear that we have mind caught in the scientific net. It will then look like any
other kind of science, but with those particular theories. The computational
conception of mind is certainly a theory, as John Haugeland put it. It is
certainly a model that's around and says something, and it is the basis for
lots of existing detailed theories of the mind.
Second, I was rather happy that John Haugeland at least made a step in
the direction of saying what the computational theory of mind is, namely sym-
bolic systems. There are other definitions, one of them that it is a computer-
like model of the mind. That definition shares a virtue with a position that
Herb Simon, Alan Perlis, and I took a long time ago in a letter to Science,
where we defined computer science as being concerned with all the phenom-
ena that grow out of the computer. We understand more about the nature
of computation as the computer evolves through time. Unimagined aspects
emerge. As a result, our whole view of it becomes enriched, and attempts
to characterize exactly what is there at a particular point in history don't
stultify things. But putting such caution aside, I certainly see the notion of
symbolic structures and symbolic computations as fundamental. However, I
would like to correct John a little bit. He used the word "invent". Herb and
I would both argue that we recognized symbol systems simply as something
that was around at the time in the nature of computers and the way they
were being used.
The non-lecture I'm going to give is to make a simple point. Namely,
we are in danger of not having any alternatives to the computational theory
of mind. This might seem a disturbing prospect to those who view science as
selecting among competing theories. I think the danger is real.
However, I don't think the danger has serious consequences, because
other kinds of alternatives have been around for a while.
Let me note the kinds of alternatives that have been put forth. These
are alternative views of what the mind could be all about. One of them
is stimulus-response theory; another one is Gestalt Fields; a third is the
Freudian psychological view, which is to say, an energy model at the bottom,
with a dramatic model overlaying it; finally, though not so widely known,
mathematical psychology in the 1950's took Markoff systems as essentially a
frame within which to cast all mental action.
What each of these different views provides is a space of systems within
which to search for an explanation of mind. They do not say what is the
exact theory of the mind. They simply say that the mind will be a symbolic
system, if we talk about the computational theory, or that it will be a Markoff
system, or whatever. But the different views definitely provide alternatives.
In fact, there are additional alternatives around, if one just looks for
them. One is ecological physics, essentially the Gibsonian point of view.
There is also the phenomenological view. I attribute this to Hubert Dreyfus,
just because he's been the representative of it to cognitive science. It says that
human action always arises out of an inarticulate background. This is taken
as inconsistent with the computational view, which is taken to entail that
everything is articulated. Hence, since the background can't be articulated it
can't be modeled in the computer. This is usually presented as an argument
to show that the computational view can't be right, but from our point of
view it is better seen as just an alternative system view. Yet another view is
cybernetic systems; that is, we should try to model the mind as a feedback
system, described as a set of differential equations. The last one is neural
systems. I'll come back to this view in a moment, because it is the interesting
one.
The feature of all the other system frameworks, it seems to me, is that
they can't compete. Now that we've had 25 or 30 years of work within
the computational model in cognitive psychology, linguistics, and artificial
intelligence, the amount of experimentation, the number of regularities, the
number of domains within which computational microtheories exist-all of
this is now so vast compared to what was explained, say, by stimulus-response
theory, by Gestalt field theory, that there simply are no viable competitors
around. I don't expect this to be conclusive for those who still believe in
other views, especially (as in phenomenology) when those views are based on
arguments and not data. It is my personal assessment.
The one possible exception is neural systems. Quite clearly there will ul-
timately be a view of the mind arising out of neural systems. These systems,
as immediately present to the eye, are certainly different from the computa-
tional systems we've been talking about.
There are three views available on the nature of neural systems in rela-
tion to symbolic systems. One view is that neural systems are the substrate
out of which the architecture is composed. This leads to systems that look
like the symbolic systems that we now see. A second view is that a more com-
plicated relation exists between neural and symbolic systems. People have
tried to invent various forms of that relationship. For instance, the system
is a symbolic system and what's going on in the neural substrate is not only
the support for the symbol system but also most of the learning behavior.
That is, the performance of the system can be described in terms of symbols
and their processing, but the learning cannot. It might even be that the sub-
strate composes symbolic systems from moment to moment. If you strobe the
system at any moment, you always see a symbolic system, but really all the
dynamics must be described in neural terms. The third view is that, when we
really understand the way the mind works in neural terms, the whole notion
of symbolic systems and symbolic computations will wash away.
In this last view, we may indeed have an alternative to the notion of the
computational system. In the other views, the analysis of the systems is entirely
computational. The issues that go back and forth between Jay McClelland
and his friends are about algorithms, representation, and so on. They are just
focused on particular types of algorithms realized in particular technologies.
But they are no more outside the computational paradigm than are the rest
of us symbolic folk.
In sum, although a small chance exists that we will see a new paradigm
emerge for mind, it seems unlikely to me. Basically, there do not seem to be
any viable alternatives. This position is not surprising. In lots of sciences
we end up where there are no major alternatives around to the particular
theories we have. Then, all the interesting kinds of scientific action occur
inside the major view. It seems to me that we are getting rather close to that
situation with respect to the computational theory of mind.
Allen Newell
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
DISCUSSION: PROGRESS IN PHILOSOPHY

HERBERT A. SIMON

Is there such a thing as progress in philosophy? Nowadays, some even ques-
tion if there is progress in science, and strong doubts have been expressed by
Feyerabend and others. However, most of us have thought that, at least
in some sense, the continual accumulation of facts and the continual re-
evaluation of theories in the natural and social sciences constitutes progress.
What about philosophy? Philosophy was described this morning as a
kind of residual field-and historically, that is what it has been. It is the
field that remains when other fields such as physics and psychology wander
off and become independent. But I don't think a field likes to characterize
itself simply as residual. What is the alternative? This morning I thought
we came very close, and Pat Suppes specifically, to defining philosophy as
another empirical science--the particular empirical science that attacks very
broad and fundamental questions, but attacks them with the aid of facts.
If that is an acceptable definition of philosophy, then what has been going
on here this afternoon is one example of what activity in philosophy might be
like. This afternoon, we were dealing with a classical philosophical question,
the mind-body question. Some of us were taking an empirical approach to
it by asking the further question of whether machines think. That certainly
sounds like a philosophical question, although some might consider it a silly
one. To me it sounds like a fruitful way to ask the mind-body question-a
way that allows its answer to be approached empirically.
To answer the question of whether machines think, we must first define
some terms. In particular, we have to define what thinking is. Then we pick
some machines and put them to the test, to see whether they do the things
that we have agreed to call "thinking." It is easy to make the test operational
if we adopt the position of personal solipsism. Because, for all of us, thinking
has a double aspect-it presents two faces to us. First, there is the thinking
that we ourselves do, that I'm doing right now. We can experience that
thinking directly, going on inside our own heads. In contrast to our own
thinking is the thinking that other people do, which we can recognize only
indirectly. To be sure, we can empathize with the thinking of others and we
can suppose that they are having experiences like those we have when we are
thinking, but there is no way to test directly the validity of our empathy and
suppositions.

But from a more solipsistic, hard-headed standpoint, the reason we know
that our friends are thinking is that the appropriate frowns appear on their
faces at the appropriate moments, or that they put the symbols on pieces of
paper that indicate they understand a task and are performing it successfully.
That is how we judge that our friends are thinking, and how the college
entrance boards judge whether candidates for college admission are thinking.
If we adopt evidences like these as our test of whether thinking is taking
place, then it becomes very easy to decide whether a computer is thinking.
We simply present to the computer the very same tests we have presented
to people and observe what the computer does. If a computer performs on
the test as a thinking person would, then we conclude that the computer was
thinking. This is, of course, a weak form of the Turing Test.
There are many other questions in philosophy that could profitably be
approached in this way, and some of which actually are being approached
in this way nowadays. Some of these are sub-questions under the general
question of what thinking is, and some of them are questions beloved of
phenomenologists. For example, the question of the nature of intuition is
central to the concerns of people like Dreyfus who talk about the limitations
of machines.
Intuition can be investigated in the same way as other thinking. It does
little good to ask "What does it mean for me to have intuitions?" But it is
very profitable to ask, "What does it mean for other people to have intuitions?
How can I tell that they do have them? What are the symptoms? What are
the tests and criteria that lead me to that kind of conclusion?" If we can
agree on the use of the word "intuition" in an operational way, then we can
begin to ask whether systems like computers, appropriately programmed, do
or don't have intuitions. It becomes an empirical question.
I happen to believe that the questions of whether computers can think
and whether they have intuitions have been answered for many years, because
we've seen numerous examples of computer programs that exhibit all the
behaviors we expect to observe in human beings who are thinking or having
intuitions. Some of those programs date back to the late 1950's. But some
people are unconvinced, presumably requiring a higher standard of evidence,
so perhaps there's some point in continuing the empirical work on this topic.
For our present purposes the important point is that these are some
definitely philosophical questions that can now be approached in ways that
weren't available before this marvelous device, the computer, entered into the
scene.
Another example of a philosophical issue we've touched upon today is
whether models are theories-or in what sense models are theories-or which
of the meanings of the word "model" are synonymous with some of the mean-
ings of the word "theory." If the model is in the form of a computer program
(just as if it were in the form of a machine built in your shop), then its
properties are easily examined and analyzed and you can decide quite readily
whether, and in what sense, it constitutes a theory for certain phenomena.
You can determine what kinds of predictions it makes about the phenomena
and at what grain size, or level of resolution. Then you can decide, on the
basis of the power and accuracy of its predictions, whether it is reasonable to
call it a theory.
Using computers is probably not the only way to do philosophy. Some
philosophers are accustomed, or even addicted, to doing their philosophizing
with the help of mathematics and logic. (There are even some philosophers
who think you can do philosophical work with ordinary language, but evalu-
ating that claim would raise a whole new set of considerations.) I expect that
there's going to be a great deal of competition in the future (and I hope also a
great deal of collaboration) between those who do philosophy with computers
and those who do it with the tools of logic and mathematics. Here we can
turn our theories on ourselves and ask which of these tools is more likely to
be profitable and in what contexts.
My own hunch is that the tools we've been hearing about this afternoon-
the approach to philosophical questions by writing computer programs, seeing
how the computers then behave, analyzing the structures of the programs-
is going to be particularly useful in gaining an understanding of complex or
ultra-complex systems. This is not really a radical hypothesis. In almost ev-
ery field of science today scientists are finding that they can get very useful,
if not always elegant, answers from computers to a wide range of questions
where they can't solve the equations in closed form. As a result, we find a
proliferation today of scientific analysis by computer simulation.
There is no doubt a lesson in this for our philosophical enterprise. As
we build theories of the mind, should we be trying to derive mathematical
theorems about the mind? Should we be concerned primarily with whether
the mind, or some functioning aspect of it, is equivalent to a Turing machine,
or is some kind of finite-state or infinite-state automaton? Are these going
to be central questions? Undoubtedly, there are some interesting questions
of these sorts, but most of the central questions we want to answer have
to do with systems of such complexity that we shall have to be satisfied (if
"satisfied" is the right word) with answering them on the basis of empirical
evidence rather than proofs of theorems. We will have to answer them with
the help of the rather baroque techniques of computer simulation. Dealing
with very complex objects, we should not expect mathematics to do the
whole job for us, and we shouldn't expect the whole field to become highly
formalized in a short time-if ever.
If we look at the not wholly unrelated domain of computer science, we
can see this very clearly. There is a portion of computer science, very com-
pletely represented in this room, that proves mathematically a number of
important and interesting mathematical properties of computers and com-
puter programs. But I think it is an historical fact about the past forty years
that a lot of empirical study has been required to understand computers; and
today, computer science has much more the flavor of an empirical, experimen-
tal science than of a branch of mathematics. Al Newell, Alan Perlis, and I
published a definition of computer science some years ago (Science 157:1373-
74, September 22, 1967) that claimed: Computer science is the study of the
phenomena surrounding computers. "Computers plus algorithms," "living
computers," or simply "computers" all come to the same thing-the same
phenomena.
Pursuing computer science requires us to run computers and then ex-
amine their actual behavior because we don't have the wits-in the face of
such complexity-to sum up what is going on in a limited number of precise
theorems. In this respect computer science resembles molecular biology much
more than classical mechanics, which almost became a branch of mathemat-
ics.
Similarly, as philosophy moves in the direction that was suggested by
Pat Suppes, a lot of our results are not going to take the form of formal
proofs, but are going to be empirical results for which we have evidence but
not the kind of certainty that we traditionally expected from logic. What
I am propagandizing for here is something that in other contexts and for
other purposes I have called bounded rationality. We human beings usually
have reasons for what we do, and in trying to understand those reasons we
have constructed what is referred to as a theory of rationality. The theory
takes many forms. In economics and mathematical statistics its most popular
current form is the theory that people maximize their expected utilities.
As we have constructed this very nice formal theory of rationality, we
have discovered simultaneously (empirically) that it gives us only a very
coarse approximation to the kind of human behavior that actually occurs
in this world-the behavior we have in mind when we say that people usually
have reasons for what they do. While it is very important to understand the
theory of subjective utility, as a highly idealized and simplified notion of what
rationality is all about-a notion that can live in Plato's heaven of ideas-at
the same time it is very important to understand the notion of rationality
that is consistent with the very limited computational capabilities, thinking
capabilities, of human beings. We need a theory of rationality attuned to the
limited thinking abilities that human beings have.
Professor Harman provided an interesting interpretation of some ethical
conundrums, suggesting that one reason we don't settle these questions im-
mediately by maximizing subjective expected utility is that they present us
with a level of computational complexity we are not prepared to cope with.
I don't know whether that is a correct explanation for the difficulty of these
ethical questions, for it is an empirical claim as to why people experience
these questions as conundrums and don't answer them immediately. But I
think Professor Harman's explanation contains a clue to one of the uses to
which we can put our understanding of our minds as computational devices.
With a better empirical understanding-not merely an idealized notion-of
the computations we human beings are really capable of, we obtain a new
theory of human reason.
The new theory of reason is not just a descriptive theory. If I claimed
only that, you would want to send me back to the department of psychology,
and properly so. On the contrary, an empirically based theory of human
reason could then be converted into a normative version relevant to creatures
who do have these kinds of computational limitations to their thinking. It is
cold comfort to know that if human beings followed the dictates of subjective
expected utility, or some other idealized theory of rationality, they would then
be able to make wholly consistent and transitive choices. It is cold comfort
because I know that, as a human being, I live in a world that is orders of
magnitude too complex for the process of calculation called for by
the theory to work.
We would find it highly useful to have, in philosophy, theories of rational-
ity applicable to creatures who are bounded in their computational abilities.
Ethics is one area where they would obviously be useful; the theory of dis-
covery, which was mentioned this afternoon, is another. It has been said (by
Popper and Kuhn, among others) that there can't be a normative theory of
discovery that is relevant for philosophy. But scientists, like other human
beings, have reasons for what they do. As we build a computational theory
of the mind, we can derive from it a theory of scientific discovery, normative
as well as descriptive. The theory would not specify how perfectly rational
men and women go about making scientific discoveries optimally, but would
specify what some of the heuristic procedures are that would be reasonable
for a creature of very limited computational ability to apply if that creature
wanted to find out about the regularities of the environment in which it lives.
We can apply this notion of bounded rationality to philosophical questions
about the discovery and confirmation of theories. And if we incorporate in
confirmation theory some notion of the limits of the human mind, it will take
on quite a different appearance from the one it has today.
My assertions here would, of course, take some proving. At the mo-
ment, I advance them only as examples of the challenging problems that face
philosophy today, and of the powerful approach to these problems that the
computer in particular, and the empirical approach in general, offers us. These
examples foreshadow a philosophy that deals with very general and funda-
mental human questions, not excluding their empirical underpinnings-the
facts of the world that might be relevant to answering them. This is the kind
of philosophy that we at Carnegie Mellon now have an opportunity to work
at within the friendly environment of an official and formal department, as
we have been doing unofficially and informally for many years. We now have
an opportunity to participate even more intensively in what promises to be
one of the central philosophical ventures of the coming generation.
Herbert A. Simon
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA 15213
PHILOSOPHY AND THE ACADEMY

CLARK GLYMOUR

Philosophy is an awkward discipline, set among departments of human-
ities like some kangaroo among the cattle, like some odd and ugly double-
ities like some kangaroo among the cattle, like some odd and ugly double-
headed duckling. The awkwardness does not result entirely from social con-
trivance; it is a real and essential consequence of how well the philosophical
tradition has met the demands that philosophical questions impose.
There was a time, roughly from 1750 until somewhere in this century,
when philosophy seemed to stand as its own subject, a species apart. When
Hume wrote, natural philosophy had become physics and chemistry and biol-
ogy, subjects that could be pursued without constant re-examination of their
foundations, but whose foundations remained withal full of puzzles and trou-
bles. In the eighteenth century one could continue to wonder about which
faculty of mind controlled scientific belief or aesthetic judgment, about the
consistency and intelligibility of infinitesimals, about the natural foundations
of morals and politics, about the propriety of belief in the unobserved, about
the character of scientific explanation, about the connections between ratio-
nality and action. Such questions were principally (but surely not exclusively)
the province of philosophers, some of whom, such as Mill and Kant, produced
what various contemporaries regarded as definitive answers. A separation of
labor, however incomplete, came naturally enough. Kant wrote his Prolegom-
ena for teachers of philosophy, and such a purpose made sense at the time.
Physicists, chemists, biologists, later psychologists and sociologists, took in-
struction from philosophers, and some of them learned the foundations of
their subjects like a lesson. Of course, sometimes disputes broke out, and
chemists and (perhaps especially) mathematicians found themselves arguing
metaphysics and epistemology. But often, philosophical authority framed the
understanding of the scientific enterprise. Near the turn of our century, Josiah
Willard Gibbs introduced his new statistical mechanics with the remark that
its aim was to provide the a priori foundations for the science of heat. New-
ton may have set his problem, but Kant provided the sense that Gibbs saw in
it. Freud learned a conception of the methods and goals of psychology from
Brentano, who was in this regard Mill's messenger to Vienna.
The general recognition that the task of philosophy was to provide the
foundations of science, morals and religion set philosophy apart from other
enterprises of letters, from philology or history or literature, for example. It
gave to philosophy a pretension, authority and scope not found so clearly
in other disciplines; it made philosophers experts of a kind deemed relevant
in both scientific and moral enterprises. It carried on the tradition of phi-
losophy as an enterprise concerned with producing general knowledge, and
only incidentally, if at all, an enterprise concerned with producing anecdote,
beauty, entertainment, witticism, sympathy. The sensibility that the aim
and result of philosophy is the production of general knowledge remained
alive even through the worst of philosophers. However badly they argued,
however opaquely they expressed themselves, philosophers claimed to say
something worth knowing. Something, moreover, that was general and not
anecdotal; something that would provide the real justification for the enter-
prises of science, morality, politics and art. In one way or another some of
the philosophers' claims or criticisms were recognized, whether by scientists
or moralists or artists; with that recognition of substance went a recognition
of a special philosophical authority and role, and a measure of deference.
For philosophy, times have changed, both in intellectual life and among
hoi polloi. In America at least, there remains no popular conception of the
philosopher as a contributor to knowledge. The New York Times Magazine
may occasionally try to bring the profession of philosophy to the attention of
the broader class of American intellectuals, but its rare efforts have left no
firm impression. The popular, even intellectually popular, understanding of
philosophy is of a useless enterprise, whose practitioners keep alive knowledge
of old books and otherwise serve as secular pontiffs, moralizing to little effect.
The contrast with philosophy and philosophers early in this century is strik-
ing. Husserl, for all the poverty of his philosophy, had a powerful impact on
his scientific contemporaries at Göttingen, and through them on the world.
The first edition of Hermann Weyl's Raum, Zeit, Materie, advised the reader
that the theory of relativity was a fruit of the phenomenological method. At
roughly the same time, John Dewey was telling parents and educators the
pragmatist theory of education, and in the pages of the New Republic telling
liberals and progressives that they should support Woodrow Wilson's war
effort. Parents, educators, liberals and progressives listened. There is today
no philosophical figure in any English speaking country who has the audience
and influence that, for example, Dewey had in the United States in 1917.
(And if all philosophical advice were as poor as Dewey's, perhaps its neglect
would be for the best.)
Today the popular conception of philosophy and philosophers is not
very different from the general academic conception held by chemists, physi-
cists, economists, biologists, engineers, statisticians and ever so many others:
Philosophers are the thing if you need a scold or an antiquarian. That this
conception is abroad is partly a social artifact of professionalization and the
organization of university disciplines, but the social processes have done their
work only in reaction to philosophical theories that address ancient and en-
during questions.
A piece of academic middle-brow wisdom is that philosophy is a stage of
inquiry consisting principally of speculation and vagary: when sensible people
discover some central basic truths in a domain, and a method of systematic
inquiry is settled on, a science separates from philosophy. The middle-brow
picture is vaguely of philosophers speculating about how things move, un-
til some sensible fellow such as Newton makes a science of it. For the last
three hundred years, the only historical period in which philosophy could
be considered a distinct discipline and profession, I think that is very much
the wrong picture. Two other sorts of things have happened instead. First,
and most importantly, philosophical writing has itself developed or prompted
articulate theories with a rich structure. These theories are fundamentally
philosophical in a traditional sense; they concern very general norms or ide-
als for categories of action, and, of course, the metaphysical structures that
underlie such norms. Disciplines that distinguish themselves from philoso-
phy have developed through embracing those answers, and because of that
embrace the distinctions drawn between these enterprises, on the one hand,
and a particular philosophical tradition, on the other hand, are largely profes-
sional and social rather than fundamental and intellectual. Second, especially
in physics but also to some extent in biology, certain philosophical questions
of a metaphysical or conceptual kind have become part of the professional
texture of a scientific subject, so that someone trained in a scientific disci-
pline can pursue these questions and be recognized as a physicist or biologist,
and not as a philosopher. Especially in the 20th century, entire subjects have
been founded on explicitly philosophical theories, and philosophical questions,
or certain kinds of answers to them, have moved into the center of various
disciplines. In neither of these kinds of cases is it a question of philosophical
vagaries and speculation coming to be replaced by an empirical study, or by
an uncontroverted empirical theory. By anything other than social and pro-
fessional measures, economics and physics and many other disciplines have
not abandoned philosophy; they have embraced it.
Consider a few examples.
1. One of the great philosophical questions concerns how evidence ought
to transform belief; another concerns how interest and belief ought to
determine action. The issues are as old as Plato. Bayesian statistics
and decision theory form a large sub-discipline of statistics, and another
of economics, whose ancestral tree is rooted in a philosophical theory
about how to form and change belief, and how to act rationally. The
sources for central ideas in decision theory are brief passages in Pas-
cal's Pensees; the subjective or personal interpretation of probability
as individual degree of belief or opinion is given in few, if any, places
so clearly as in Hume's Enquiry Concerning Human Understanding.
The use of conditional probability in inverse inference was, of course,
developed mathematically by Thomas Bayes, but Bayes' essay and his
idea were published and made known by Richard Price as a response
to Hume's skepticism. The notion of a general measure of well-being,
utility, was developed in the writings of Bentham and Mill, and elab-
orated for morals by Sidgwick. Utility and probability as degrees of
belief were reconceived and combined by Frank Ramsey in 1926. Ram-
sey's single essay contains in outline most of the ideas of the theory of
rationality upon which entire branches of contemporary social science
are founded. The workaday efforts of Bayesian statisticians, decision
theorists and econometricians produce consequences and applications
and variants of that philosophical theory; it was simply given to them.
If part of philosophy is to judge the answers to philosophical questions,
and if we judge an answer to a question by seeing to its implications and
presuppositions, then much of contemporary social science and statis-
tics is part of the enterprise of philosophy, no matter how remote these
disciplines may be from the profession of philosophy. The relation is
symmetric; no philosopher who now wishes to address the ancient ques-
tions of rational action and change of belief can do so intelligently while
in ignorance of the fruits, some bitter, some sweet, of Ramsey's theory.
2. Aristotle's Prior Analytics and his Posterior Analytics present a the-
ory of demonstration whose centerpiece is the theory of the categorical
syllogism, the first answer to the questions: What is a proof? Why do
proofs show necessity? Is there a means to determine whether or not
claims necessitate other claims? Through the work of commentators
on his metaphysics, notably Porphyry and Boethius, Aristotle's system
also gave birth to combinatorics (at least in Europe; in India and China
the same results had other sources). A broken but robust tradition of
logical investigation extends from Aristotle to Leibniz, who combined
proof-theoretic and combinatorial ideas into an early algebra of logic.
The real advances in theories that answer these questions came in the
19th century with the work first of Boole, then Frege. Boole and Frege
were each mathematicians by profession, but they each labored in aid
of answers to philosophical questions about mathematical and scientific
knowledge. For Boole the questions were closely related to Aristotle's,
and concerned how mathematical demonstration might aid in the infer-
ence of causes; for Frege, as for Plato, the fundamental questions con-
cerned the very nature of mathematical knowledge. Frege's enormous
achievement was continued early in this century by men who applied
their mathematical abilities in aid of philosophical projects. The result
of these efforts was modern mathematical logic, and the modern for-
mulations of the theory of sets; Aristotle and Leibniz would recognize
modern proof theory and semantics as answers to their questions, and
they would recognize the theorems of model theory and proof theory as
illuminations of those answers.
3. The theory of computation is another fruit of philosophical inquiries
into the nature of mathematical knowledge. The modern theory of
computation rests on a particular answer to the question: What is it
for a function to be computable? That answer, given by Alonzo Church
and Alan Turing, was only possible because of work in logical theory
in the first third of this century, work prompted in large part by issues
in the philosophy of mathematics. In the 1930s, Church made Prince-
ton's philosophy department the birthplace of the modern theory of
computation.
4. Cognitive science is an enterprise whose practitioners aim to understand
human cognition as the result of computational procedures executed by
an organic computer, the human brain. Its practitioners are principally
psychologists and computer scientists, and they come in many vari-
eties. Ultimately, the ambition is to understand computationally how
from infancy one forms a conception of the world and an understand-
ing of language, acquires other skills, and solves the myriad "problems"
whose instances adults face in practical life. The computational concep-
tion of mind has a long philosophical ancestry; one can find passages
in Hobbes, for example, that clearly endorse it. But the very prac-
tice of giving explicitly computational theories of cognitive capacities
is equally indebted to the philosophical tradition. The first explicitly
computational theory of cognitive capacities is Rudolf Carnap's Der
Logische Aufbau der Welt. Carnap's book offered an account of how
concepts of color, sound, place, and object could be formed from ele-
ments consisting of gestalt experiences and a relation ("recollection of
similarity") between such experiences. The theory was given as a logical
construction, but also as what Carnap called a "fictive procedure". The
procedural characterization is in fact a series of algorithms that take as
input a finite list of pairs of objects (the "elementary experiences") such
that there is a recollection of similarity between the first and the second
member of each pair. The book was of course written before there were
computers or programming languages, but it would nowadays be an
undergraduate effort to put the whole thing into LISP code; a sketch of
such a procedure appears after this list. Carnap's
role in the genesis of cognitive science continued through his students:
Walter Pitts, who was instrumental in the development of neural nets
as computational devices, and Herbert Simon.
5. There is a booming subject in computer science that studies the possi-
bilities and limits of computational systems that learn. Computational
learning theory has a philosophical history that can, without artifice, be
traced to Plato, but its modern source is in Hans Reichenbach's attempt
to fashion a reply to Hume's skepticism about induction. Reichenbach
posed learning problems as the task of learning the limiting value of an
infinite sequence from increasingly larger initial segments of that se-
quence. He construed "learning" as forming a sequence T of conjectures
about the value of the limit such that T converges to the same limit (a
sketch of this convergence criterion also appears after the list).
Hilary Putnam combined Reichenbach's set up with recursion theory to
create difficulties for Carnap's confirmation theory: for any Carnapian
measure, there is an hypothesis that never receives confirmation above
1/2, even though only positive instances of the hypothesis are obtained.
About the same time E. Mark Gold independently formulated the same
framework, and Putnam and Gold independently and simultaneously
published essentially the same results about limiting recursion theory.
Most of computational learning theory has developed from this work.
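Purely by way of illustration, a Carnap-style procedure of the sort item 4
describes might be sketched as follows, in Python rather than LISP; the
function names and the toy data are assumptions of this sketch, not Carnap's.
It groups pairwise-similar elementary experiences into maximal "similarity
circles", the construction from which the Aufbau builds its quality classes:

    from itertools import combinations

    def similarity_circles(pairs):
        """Given a finite list of pairs (x, y), each recording a 'recollection
        of similarity' between elementary experiences x and y, return the
        maximal sets of pairwise-similar experiences (Carnap-style
        'similarity circles', the raw material of quality classes)."""
        elements = sorted({e for pair in pairs for e in pair})
        similar = {frozenset(p) for p in pairs}

        def all_similar(group):
            # A group qualifies if every two of its members are recalled similar.
            return all(frozenset((x, y)) in similar
                       for x, y in combinations(group, 2))

        # Brute force over subsets: fine for the finite inputs Carnap envisaged.
        circles = [set(c) for n in range(1, len(elements) + 1)
                   for c in combinations(elements, n) if all_similar(c)]
        # Keep only the maximal circles, those contained in no larger circle.
        return [c for c in circles if not any(c < d for d in circles)]

    # A toy input of five 'elementary experiences' and their recalled similarities.
    pairs = [("e1", "e2"), ("e2", "e3"), ("e1", "e3"), ("e3", "e4"), ("e4", "e5")]
    print(sorted(tuple(sorted(c)) for c in similarity_circles(pairs)))
    # [('e1', 'e2', 'e3'), ('e3', 'e4'), ('e4', 'e5')]

Brute force over subsets suffices here because the input is a finite list of
pairs; nothing in the philosophical point turns on efficiency.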
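The convergence criterion of item 5 can be made similarly concrete. In the
following sketch the rule "conjecture the most recently observed value" is an
illustrative choice, not Reichenbach's own posit; it identifies the limit of
exactly the eventually constant sequences:

    def last_value_learner(segment):
        """Conjecture that the most recently observed value is the limit.
        This naive rule identifies, in the limit, the limiting value of
        any sequence that is eventually constant."""
        return segment[-1]

    def conjectures(sequence, learner):
        """The learner's conjecture after each initial segment; the learner
        succeeds just in case this sequence of conjectures converges to
        the limit of the observed sequence."""
        return [learner(sequence[:n]) for n in range(1, len(sequence) + 1)]

    # A sequence that settles on 1 after some noise: the conjectures converge.
    settling = [0, 1, 0, 1] + [1] * 6
    print(conjectures(settling, last_value_learner))
    # [0, 1, 0, 1, 1, 1, 1, 1, 1, 1]

    # An oscillating sequence has no limit, and the conjectures never settle.
    oscillating = [0, 1] * 5
    print(conjectures(oscillating, last_value_learner))
    # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]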
In these examples, a philosophical theory with sufficiently rich struc-
ture has been embraced as an analysis or explanation or norm, and pursuits
carried out within that philosophical theory have become other disciplines.
Philosophy has not been abandoned in these enterprises but embraced. To
accept an answer as standard, and as a great advance, is not to accept it as
final. Modern logic was created by Frege's work; it did not stop with Frege's
work, and modern logicians do not confine themselves to working out the
consequences of Frege's systems, just as modern statisticians do not confine
themselves to working out the consequences of Ramsey's system. If philos-
ophy requires constantly re-examining assumptions, then that requirement
is met by reflective practitioners of the disciplines that have sprung from
philosophical theories.
The enduring questions of philosophy have come to be integral parts of
other subjects, woven thoroughly into other disciplines. The theory of proof
and demonstration, which formed the center of epistemology from Aristotle to
Descartes, has become part of mathematics itself, done as well, and more, in
departments of mathematics and computer science as in departments of phi-
losophy. The philosophy of science, within which tradition located the theory
of demonstration, has become a distributed subject; physicists write philoso-
phy of physics as well now (and about as much) as professional philosophers,
statisticians consider foundations of inductive inference as often as philoso-
phers, and philosophical biologists have become commonplace. The philos-
ophy of mind as Hume and Kant practiced it is done now by psychologists
released from the chains imposed on them by behaviorism. Philosophical
logic, metaphysics, and epistemology are done more now in departments of
computer science than in departments of philosophy, and done almost exactly
in the way that many philosophers pursued those subjects twenty and thirty
years ago. The evidence is easy to find: walk through Carnegie Mellon, Stan-
ford or MIT and see what is written where the walls turn black (or green).
Ethics is now a subject pursued by economists, taught in medical schools and
schools of engineering, and even, shudder as one may, in schools of business.
The deference and the division of labor are quite gone.
So philosophical has the work of many sciences, especially the social
sciences, become, that debts have come to be reversed. The writings of
many philosophical celebrities of the last twenty years show their debt to
the "positivist" social sciences. Donald Davidson's influential and celebrated
essays on the philosophy of mind are the result of intelligent and insightful
reflection on what results if we think of the mind as operating on decision
theoretic principles. The central argument of Rawls' A Theory of Justice is
borrowed from decision theory as well. One cannot even pretend that one
is carrying philosophical news to benighted social scientists or psychologists.
The latter gave Davidson's writings little response, but the writings were not
intended to elicit one: the formal sources of the philosophy were suppressed.
A Theory of Justice
found criticisms as trenchant (if not as lengthy) from economists as from
philosophers.
So philosophical questions and real philosophical work have been taken
up by many who do not profess the subject. Whether in thinking about
rational action, or rational change of belief, or the nature of proof, or the
nature of logical necessity, or the structure of the computable, or the limits of
the learnable, the practitioners of other disciplines have proceeded to explore
the implications of a philosophical position with mathematical rigor, and
have not shied from the complexity that such rigor may bring. The results
have been spectacular. So spectacular, that whether or not the dominant
philosophical theories are endorsed, no one can any longer claim to have
thought intelligently about any of these philosophical topics unless the results
of the practitioners of the various scientific disciplines are systematically taken
into account. Philosophy is faced with a simple choice: Either master the
methods and results of logic, probability theory, decision theory, computation
theory, and many related subjects, and use them, or abandon whole batteries
of traditional philosophical issues.
Attempts at compromise between these two alternatives seem chiefly to
produce triviality. Recently I observed an exchange between my colleague in
the computer science department at Carnegie Mellon, Allen Newell, and a
very prominent philosopher. In a public lecture, the philosopher argued that
the business of epistemology is to determine the processes by which humans
form belief, and to determine the circumstances and limits of their reliability,
and he argued that these are entirely scientific questions. Newell asked what
would seem to serious people an obvious question: Since the philosopher
thought of the theory of knowledge as that kind of scientific enterprise, and
since he claimed to be interested in the theory of knowledge, why did he not
have a laboratory in order to pursue the subject, or failing that, why did
he not pursue mathematical theorems about the limits of the learnable? I
think no one was impressed with the philosopher's defense that epistemology
requires a division of labor, and he had done his part in telling us what
epistemology is. The philosopher had, in effect, made clear one instance
of the choice that confronts the members of his profession, but he had not
chosen.
To the choice, one natural response of those who profess philosophy has
been to seek a certain insularity, and thereby to gain the distinction of phi-
losophy from other subjects. Nothing serves this purpose better than the
professionalization of the history of philosophy, a subject that no one other
than professional philosophers (and the odd and courageous classicist) wishes
to pursue. Admirable and interesting as such historical concerns may be, the
professionalization of the history of philosophy serves the function of disen-
gaging philosophy from the consequences of the theories it has produced. It
is a means of taking the second alternative, and abandoning the attempt to
seriously address many of the fundamental and traditional questions of phi-
losophy. Another, perhaps less common, response has been to join arms with
the meta-literati, professors of literature and history and modern languages,
and, as Richard Rorty urges, practice dropping names, and hinting at big
pictures. Still another raft to insularity is to adopt a general skeptical posi-
tion and use it to dismiss without further examination the century's work in
logic, probability, psychology and the social sciences.
There is also a segment of the community of professional philosophers
who have instead chosen to try to seriously address traditional philosophi-
cal questions in the light of what we know now. Their work presupposes a
knowledge of relevant parts of 20th century science and mathematics that aid
in philosophical pursuits and that explore the implications of philosophical
theories. They make little effort to hide their sources, concerns or methods by
banishing every equation and symbol from their papers. They view the his-
tory of their subject as enmeshed with other histories. They quite accept that
no deference is owed them because of their profession, and quite reject any
disdain founded on the same fact. They seek no disciplinary excuse to ignore
most of the science that bears on their questions. Such philosophers suffer
occasionally from the peculiar awkwardness of philosophy as an academic de-
partment. They are told, if they listen, that they are not humanists. The
National Endowment for the Humanities is constituted more or less deliber-
ately to exclude them, and Departments of English hold them in contempt.
But of course they are humanists, if the pursuit of philosophical questions is
a part of the humanities. More importantly, they remain philosophers, the
kind of workers among contemporaries in that profession with whom Aristo-
tle and Leibniz and Hobbes and Hume would find an interesting conversation
germane to their concerns.
Hume would have time for the occasional cocktail party, I am sure, but he
would tire of it soon enough, and want to know what had become of Reverend
Bayes' ideas; Aristotle, having not been acquainted with Mrs. Onassis, would
find himself without really big names to drop. He might settle happily for
some news about theories of proof. So should we.
Clark Glymour
Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
PART II.

WORKING.

- to claim no infallibility
and to exempt no proposed solution
to a problem from intense criticism.
Ernest Nagel
PALE FIRE SOLVED

DAVID CARRIER

Such diverse philosophers as Davidson, Derrida and Goodman have given ar-
guments supporting the claim that unrevisable interpretations of artworks are
impossible. Although this view has been supported by appeal to radical his-
toricism, Heidegger's account of language and the deconstructionists' texts,
it may be defended by quite respectable philosophical arguments. An ideal
interpretation, Alexander Nehamas writes, "would account for all the text's
features"; to interpret is to place that text "in a context which accounts for
as many of its features as possible." 1 All interpretations thus are partial for
the ultimately trivial reason that, just as no map can represent all features
of what it maps, so "no reading can ever account for all of a text's features."
To interpret is "to understand an action" and thus to "understand an agent
and therefore other actions and agents as well ..." Hence "each text is inex-
haustible: its context is the world." Just as there is no way that the world
is independently of how it is described, so there is no way that an artwork
is apart from how it is interpreted; different descriptions of the world or an
artwork point out different features or describe differently given features of
those entities.
An artwork, unlike the world, is the creation of an agent and so crit-
ics who reject this view of interpretation often appeal to some notion of the
artist's intentions; an artwork means, they assert, what the artist intended
it to mean. Thus E.D. Hirsch distinguishes the meaning and significance of
an artwork. 2 An artwork can have one meaning for its creator, and another
for later audiences who, placing it in different contexts, may understand it
differently. Unless we could thus distinguish meaning and significance, Hirsch
argues, we could not coherently practice interpretation. But this argument,
Nehamas' account implies, does not rule out multiple interpretations. If the
"author is postulated as the agent whose actions account of the text's fea-
tures," then to interpret is to imagine an agent who might have produced that
text; and the known facts about the actual author do not uniquely define the
qualities of this agent. We can, of course, sometimes appeal to the artist's
account of his or her intentions; but just as my own account of my actions
need not be correct, so the artist is not a privileged interpreter of his or her
artwork.
This very general argument has guided my study of interpretation in
the visual arts. In a series of papers on old master painters and a book on
art criticism, I study the practice of interpretation in art criticism and art
history.3 This research requires much discussion of actual interpretations and
many visual references, unwieldy materials for a presentation in this context.
Here I further explore this argument using a relatively well known, easily
accessible novel which, since it has not been interestingly interpreted, provides
a good vehicle for raising these theoretical issues. My aim is not to do literary
criticism, but to use this example to study the theory of interpretation.
Vladimir Nabokov's Pale Fire is a famously puzzling classic. I have
read some thirteen substantial accounts; many others exist and since there
are journals devoted to the interpretation of modern literature, many more
will appear. 4 The problem all interpreters encounter is very simple. The
book consists of four parts: a poem in Popeian couplets, "Pale Fire," by
an imaginary poet, John Shade; a foreword, commentary and index by an
imaginary literary critic, Charles Kinbote. The poem is about the death of
Shade's unhappy daughter and his reflections on the possibility of immortality;
the commentary, the reader quickly discovers, seems only tangentially to
refer to the poem. Kinbote, we learn, is in exile from Zembla, a kingdom
which once he ruled; and he, a great admirer of Shade, talked to the poet
about Zembla and expected "Pale Fire" to be about that kingdom. But since
it is not, his commentary is actually an autonomous novella.
The evidence for this analysis is straightforward. Kinbote is a parody
of the academic commentator. Shade's lines "a curio: RED SOX BEAT
YANKS 5-4, ON CHAPMAN'S HOMER ..." (97-8), a famous sports headline,
he reads as "a reference to the title of Keats' famous sonnet" (75).5 Given
an anonymous note, "you have haL ...s real bad, chum" (63) he concludes
that the word is hallucinations, misspelt; an apter possibility, halitosis, has
the right number of letters. Given Nabokov's own experience of American
academic life, and his practice of literary criticism, it is natural to expect him
to make such a commentary.
Whether we then find Pale Fire very funny-"though our knowledge
about it may increase tenfold, the essential mystery will remain intact" or
find it "a mirthless hoax," a sort of "long tiresome game of Scrabble" is a
matter of taste. 6 "Pale Fire" is often beautiful, and the novel is funny; and
we may enjoy them even if the commentary has little to tell us about the
poem.
In fact, Pale Fire is a puzzle with a unique solution, and since none of its
commentators have understood this, it is not surprising that their accounts
are unhelpful. My argument relies, first, on study of one precedent, the
explicit presentation of such a puzzle in Lolita, and then on the evidence
within Pale Fire. Only the commentary on Pale Fire is original.
The second part of Lolita is an elaborate detective story. 7 Her ex-lover,
Humbert Humbert, seeks to decipher the clues left by the man who abducted
her. We are given many references to French and English literature; even li-
cense plates, WE 1564, SH 1616-the dates of Shakespeare-are used. Hum-
bert cannot solve the puzzle, and when he does find Lolita because she writes
to him, the solution is given in a way which further teases the reader. 8
"Do you really want to know who it was? Well, it was -" And softly ... she emit-
ted a little mockingly ... the name that the astute reader has guessed long ago.
Waterproof. Why did a flash from Hourglass Lake cross my consciousness? (248)
She is interrupted in mid-phrase, but if we recall that 165 pages earlier
Humbert's waterproof watch was mentioned, we read another dialogue:
"I once saw ... two children ... right here, making love ... , Next time 1 expect to see
fat old Ivoe in the ivory, He really is a freak ... last time he told me a completely
indecent story about his nephew. It appears-" "Hello there," said John's voice.
(83)
This dialogue, too, is interrupted in mid-phrase; but that nephew, we
realize when we find some further clues, was Lolita's abductor. His novel,
Nabokov clearly indicates, is a puzzle, an artifact constructed so that here
everything fell into order,
"into the pattern of branches that 1 have woven throughout this memoir with the ex-
press purpose of having ripe fruit fall at the right moment ... of rending that golden
and monstrous peace through the satisfaction of logical recognition ... ". (248)
Nabokov tells us that Lolita is a puzzle, gives the information needed
to solve it, and finally tells the solution. He does not say that Pale Fire is a
puzzle and so here the argument must be indirect. If it is a puzzle, solving that
puzzle should explain the central problem of the book, the relation between
"Pale Fire" and Kinbote's texts. Though the poem is a fiction about a real
place, Appalachia, and the commentary about an imaginary kingdom, things
pass from one of these worlds to the other. 9 The Zemblan terrorist Jakob
Gradus becomes the lunatic assassin, Jack Gray; King Charles of Zembla-
Gradus' intended victim-turns into an eccentric Professor, Charles Kinbote,
who fails to prevent the murder of Shade. Kinbote translates from Zemblan
back into English Timon Afinsken:
The sun is a thief: she lures the sea and robs it. The moon is a thief: he steals his
silvery light from the sun. The sea is a thief: it dissolves the moon.
This is a recognizable paraphrase of Timon of Athens (IV,3,441-5): Sun
and moon change sexes, and the account of the sun/moon/sea obviously
relates to the rivalry discussed by Kinbote between his commentary and the
poem it ostensibly reflects.
The sun's a thief, and with his great attraction robs the vast sea; the moon's an
arrant thief, and her pale fire she snatches from the sun; The sea's a thief, whose
liquid surge resolves the moon into salt tears.
"Silvery light" is a synonym for "pale fire" but since the poet is interested
in more than a synonym, not surprisingly Kinbote cannot locate the poem's
title in Timon Akinfksen. Much of the novel involves such word play, as when
Kinbote notes that the name of Shade's murderer is hidden in the last line of
the poem, "alike great temples and Tanagra dust" (155).
When Shade, led by a newspaper report of a mystical experience like
his own of an image of "a tall white fountain" (707), finds that "fountain" was
a misprint for "mountain"; when, if we look up 'word golf' in the index, we
read:
Word golf, S's predilection for it, 819; see Lass.
Lass, see Mass.
Mass, Mars, Mare, see Male.
Male, see Word golf.
Such play may seem trivial, but only attention to such details permits solu-
tion of the puzzle. 10 When, for example, Shade writes "Help me, Will! Pale
Fire" (961-2), Kinbote correctly concludes this means, "look in Shakespeare"
and finds in the Zemblan version of Timon of Athens "nothing that could be
regarded as an equivalent of 'pale fire'" (191-2). What was lost in the transla-
tions of Timon of Athens into Zemblan and back into English was just those
particular words that Shade needed.
Here, then, are my ground rules: if there is a puzzle to be solved, Pale
Fire must clearly identify it; if the puzzle exists, it must be solvable with
available clues; when it is solved, it must tell us something important about
the book.
King Charles escapes while the revolutionaries look for the crown jewels,
whose location is discussed three times: the queen inquires about them-"he
revealed to her their unusual hiding place, and she melted in girlish mirth"
(142); Kinbote says that they are not in the palace but "were, and still are
cached in a totally different-and quite unexpected corner of Zembla" (163);
and Gray deposits his raincoat and suitcase in "a station locker-where, I
suppose they are still lying as snug as my gemmed scepter, ruby necklace
and diamond-studded crown in-no matter where" (185). As in Lolita, a key
dialogue is interrupted in mid-course. Looking in the index,
Crown Jewels, 130, 681; see Hiding place.
Hiding place, potaynik (q.v.).
Taynik, Russ., secret place; see Crown Jewels.
In response to an interviewer, Nabokov said that they are hidden "in
the ruins of some old barracks near Kobaltana (q.v.)", for which the index
entry reads: "Kobaltana, a once fashionable mountain resort ... now a cold
and desolate spot of difficult access and no importance ... not in the text." 11
The location of the jewels is a mystery. Is there within the book evi-
dence giving their location? Learning that they are in an unexpected corner
of Zembla is an important clue. Appalachia and Zembla are different, but
not unconnected worlds. Consider how some things move from one to the
other. Responding to Shade's musings on death, Kinbote says-"The ideal
drop is from an aircraft" (148); and we later learn that in Zemblan Kinbote
means 'regicide'. When King Charles parachutes from a plane into America,
the narrative voice conspicuously shifts from the third person, "he descended
by parachute from a chartered plane" (165) to first-"while ... the chauffeur ...
was doing his best to cram the bulky and ill-folded parachute into the boot,
I relaxed on a shooting stick" (166)-signaling the transformation of Charles
into Kinbote as he moves from Zembla to our world. We can, similarly,
trace the movement of Kinbote's copy of Timon Afinsken from Zembla to
this world. Charles was imprisoned in a room whose closet, he discovered
as a child, contained that book and a secret passage way leading out of the
palace to the Zemblan theater. As he re-enters that closet to escape "an
object fell with a miniature thud; he guessed what it was and took it with
him as a talisman" (87); this object, we may reasonably infer, was that book.
Finally, Nabokov's reference to Kobaltana points to another connection be-
tween Zembla and this world. Kinbote writes his commentary in a mountain
resort, difficult to find, where he had planned to come to follow Shade; a
town in "Wyoming or Utah or Montana", in "Utana on the Idoming border"
(121). Could Utana be the equivalent in this world of the Zemblan resort
Kobaltana?
Philosophers have noted that a person or thing can be identified in
another possible world only by picking out those qualities which constitute
its essence. 12 The crown jewels are in some corner of Zembla which, since
Appalachia and Zembla are not disjoint spaces, could be in our world. To
locate them, we must identify their essential properties. Crown jewels are,
essentially, precious things. What is most precious to Kinbote is the text of
"Pale Fire" which he takes from Shade as the poet is murdered, hides at
the bottom of a closet, "from which I exited as if it had been the end of the
secret passage that had taken me all the way out of my enchanted castle and
right from Zembla to this Arcady" (198); and presents, after fending off rival
editors, with this commentary. 13 In Edgar Allan Poe's "The Purloined Letter"
the police search a house for a stolen letter which, cleverly, is hidden by being
placed in plain sight; in Pale Fire, analogously, what is most precious is that
text of "Pale Fire" which is in our hands. For the reader who, convinced that
the crown jewels can be found, reads forward and backward looking for clues,
the moment of realization that he unknowingly has been holding what he was
searching for, the text of the poem, cannot but be comic.
The earlier interpreters could not convincingly explain the relation be-
tween "Pale Fire" and the commentary. If that commentary is only occasion-
ally relevant to the poem, then the book is but a modest joke. And while
it is possible to admire poem and commentary as autonomous works, there
is then no real sense in which they are one artwork, Pale Fire. Normally
a commentary is about a text when it tells us the meaning of that text; for
example, it translates foreign words, annotates obscure names and explains
obsolete slang. In that sense, as Kinbote recognizes, his commentary is not
about "Pale Fire"
"what did I have? An autobiographical, eminently Appalachian, rather old-fashion
narrative ... but void of my magic ..." (200).
But since the description of an artwork is not exhausted by an ac-
count of its content-for that description does not tell us how that content is
represented-a commentary may also tell us how to think of that content. To
learn that "Pale Fire" is the crown jewels is to learn that this poem describing
mundane events is a precious thing. Nabokov repeatedly identifies the artist
with the magician: he describes "the baffling mirror, the black velvet back-
drop, the implied associations and traditions-which the native illusionist ...
can use" (Lolita, 288).
In Pale Fire both Shade and Kinbote describe art thus:
It sufficed that I in life could find
Some kind of link-and-bobolink, some kind
Of correlated pattern in the game,
Plexed artistry and something of the same
Pleasure in it as they who played it found (812-5)
Kinbote views
Shade perceiving and transforming the world, taking it in and taking it apart, re-
combining its elements in the very process of storing them up so as to produce at
some unspecified date an organic miracle, the fusion of image and music, a line of
verse (11),
a process he compares to a conjurer's trick. Like the magician, the artist is a
puzzle creator who makes everyday things appear magical.
Kinbote's commentary tells us not what, but how "Pale Fire" presents
its content, and thus it is amusing that so many commentators have admired
or hated the book without understanding it. Sometimes paintings have been
praised for their organic unity when, it is later discovered, they were mutilated
or enlarged after the artist's death; some site-specific works were admired in
ignorance of the fact that they were intended to be seen from a particular
vantage point; Rembrandt's The Nightwatch was thus mis-named because of
accumulated dirt, which now has been removed; and often abstract paintings
are reproduced upside down. 14
To read Pale Fire as poem plus unconnected novel is, I claim, as badly
mistaken. This discovery may change our evaluation of the book. Is Nabokov
a terribly uneconomical artist who, to conceal his puzzle, presents so many
distracting clues; or an admirably subtle creator of an artwork which was for
a long time incorrectly read? Such a re-evaluation will require further debate.
As aesthetician, I am interested in what his novel tells us about the theory of
interpretation, and here recollecting my response to this discovery is useful.
Originally I concluded that I had produced an unrevisable interpretation.
That made me feel proud. Not only had I shown all the earlier interpreters
to be incorrect; I had produced a counter-example to the thesis that there
are no unrevisable interpretations. On reflection, I recognized that both of
these claims could not be correct; my response, rather, showed that an agent
need not be in the best position to understand his own activity. Although I
had recognized a feature of Pale Fire not noticed earlier, that did not mean
that interpretation of the book would cease; on the contrary, even I was
led to further interpret. Nor was it clear that I had the right to be proud.
Discovering the facts, unlike producing an original interpretation, requires no
especial brilliance. What exactly had I accomplished?
Compare a literary work with a chess puzzle. If white can win in five
moves, after sacrificing his queen and bishops, we are not interested in other
possible moves. That solution, when discovered, is unique, which means that
the puzzle is exhaustible. We think that the creation and interpretation of
art is a more exalted activity than the devising and solving of puzzles. Chess
puzzle creators obey the rules of the game; Nabokov created the rules for his
literary games. Once a puzzle is solved, there is nothing more to say about
it. Once a novel is recognized to be a puzzle and solved, interpretation can
continue. Consider Lolita, for example.
Solving the puzzle, and learning the identity of her abductor, changes how
we think of Humbert's obsession. What is perverse about him is not that he
desires young girls, but that for him in love there is no reciprocity; he desires
her, but not that she desire to be desired by him. Such sexuality, Thomas
Nagel has argued, is intrinsically perverse, unlike sex involving the wrong,
or wrong kind of partner. 15 But because Lolita is a puzzle, we readers must
attend as closely to it as does Humbert to Lolita; only that close attention
permits us to decipher the puzzle. The phrase 'body of the text' nicely
emphasizes the parallel between his obsession with her body and the reader's
attention needed to gather the clues. Once the reader recognizes that he or
she has thus become like Humbert, how Humbert's perversity is thought of
changes. And whether we are then alarmed at our identification with him, or
more understanding of his perversion, in any case our view of the book and
ourselves has changed.
Recognizing that Pale Fire is a puzzle, we may reasonably expect that
future interpretations of it will be different. Here a parallel with the use of
facts in interpretation of artworks is helpful. When Caravaggio's birth date
and the year of his first major public commission were determined, it was
no longer possible to believe, as the leading authority had asserted, that
work to be the product of a prodigy. Caravaggio was in his late 20's when he did
this painting. 16 But knowing that fact, though it did require the revision of
earlier accounts, did not mean that interpretation ceased. Future accounts,
and there have been many, must be consistent with this fact. Similarly,
discovering that Pale Fire is a puzzle does not foreclose interpretation, but
only requires that later interpretations all be consistent with this fact.
This analysis preserves the fact/interpretation contrast. Pale Fire is a
puzzle, and that fact about the work interpreters must now take into account.
Is there a certain arbitrariness in this procedure? Were I rather to say that I
have produced a new interpretation of that work, then if I am correct I would
have an unrevisable interpretation. Only commitment to the doctrine that
interpretations always are revisable, it could seem, leads me to classify this
analysis as the discovery of a fact rather than of an unrevisable interpretation.
What is unrevisable, a wit might say, is the claim: interpretations always are
revisable.
This wit reasons poorly. The fact/interpretation distinction implies that
to speak of a fact rather than an interpretation is to indicate that all future
interpretations must be consistent with the fact that I have discovered. Once
a chess puzzle is solved, there is nothing more to say; once I solved Pale Fire,
the way is opened to new interpretations. Precisely because it does not lead
us to discard the fact/interpretation distinction, my solution tells us some-
thing interesting about that distinction. To discover a counter-example to the
claim that interpretations always are revisable is impossible because when we
produce an unrevisable account we call it a factual discovery. That contrast
is not arbitrary, but marks a category distinction. It is a fact that Caravag-
gio was born in 1571, and a fact that Pale Fire is a puzzle; interpretation of
Caravaggio's painting and Nabokov's book is guided by these facts. A fact
is not unrevisable but true or false; an interpretation, rather, is suggestive,
plausible, and original, or not. This category distinction reflects the different
ways that facts and the body of existing interpretations guide further study
of an artwork. A new account of the facts leads to the rejection of earlier
conflicting accounts; a new interpretation must be consistent with the facts,
but takes issue with existing interpretations. These facts can be disregarded
only if we can discover that they are not facts.
A well-known sequence of interpretations of Poe's "The Purloined Let-
ter" illustrates the problems created by eliding the fact/interpretation dis-
tinction. 17 Lacan allegorizes the story as an oedipal drama; Derrida replies
that there are not three, but four characters in the story; and Barbara John-
son argues that Derrida too fails to properly place the work in context. This
activity of re-describing the context of this story can go on indefinitely. It is
one of his detective stories; an American story much admired by the French;
part of Poe's oeuvre; a work of mid-nineteenth century literature; ... 18 There
are an indefinite number of ways that it can be placed in context and so an
indefinite number of different ways that it can be interpreted. But such novel
accounts do not therefore show that earlier interpretations were incomplete,
for here the notion of completeness is irrelevant.
Just as we revise interpretations, so we change our judgment about the
facts. Just as further documentation might demonstrate that Caravaggio was
not born in 1571, so additional evidence could show that I am wrong to believe
Pale Fire a puzzle. Additional documentation from Nabokov's posthumous
notes could, perhaps, show that I am wrong about the facts. But such argu-
mentation differs in kind from argumentation about an interpretation. Many
historians have asserted that Caravaggio's early paintings are homoerotic; al-
though no writer before 1951 so described them, they certainly seem to show
enticing young men. But some critics hold that these are illusionistic or alle-
gorical images and perhaps were not intended to be erotic. So argumentation
continues, and it is possible that there is no knock down evidence permitting
a choice between these interpretations. By contrast, argumentation about
Caravaggio's birthdate could begin again only if new documentation were
produced.
My reading of Pale Fire, and my interest in alternative interpretations,
owes much to Arthur Danto's The Transfiguration of the Commonplace. 19 His
thesis is that visually identical artifacts are, when placed in differing contexts,
interpreted differently. So, when in the 1960's Andy Warhol made a Brillo
box and placed it in an art gallery, he created a quite different thing from the
physically identical box in a grocery; unlike the grocery Brillo box, his was
an artwork. We might think of "Pale Fire" in a related way. By itself, it is
a not unusual poem; within Pale Fire, it is part of the puzzle I have solved.
Within Kinbote's commentary a Dantoesque example occurs. He refers
to Robert Frost's poem with "two closing lines identical in every syllable, but
one personal and physical, and the other metaphysical and universal" (136).
This is ironic since Nabokov disliked the work of Frost, whose "Stopping by
Woods on a Snowy Evening" concludes:
The woods are lovely, dark and deep
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.
Nabokov's name does not appear in the index to The Transfiguration of
the Commonplace, but these lines, Danto has said, were "one of the impulses"
for his book, "though ... in true Nabokovian fashion, the example disappeared
from the manuscript it helped inspire."20
My fact/interpretation distinction develops Danto's idea that to inter-
pret is to put into context. Where I disagree with him, perhaps, is in allowing
that often there are different ways of constructing that context. A fact about
an artwork is internal to it; facts are facts in any context. Constructing a
context provides one revisable way of interpreting a work; one work may
be put in various contexts. Visual artworks here provide helpful examples.
When it was discovered that a well known Pontormo painting praised by some
historians as a beautiful organic whole was but part of a larger, now partly
destroyed work, it became clear that the earlier interpretations, since they
described but a portion of Pontormo's painting, were probably incorrect. 21
That what remains is but a fragment is a fact about that painting which any
further interpretation must take into account. My inside/outside metaphor
underlines the different ways facts and interpretations enter into such debate.
A fact is a feature of the artwork itself; an interpretation, one way of putting
that work in context. Whatever context the work is put in, an interpreta-
tion must be true to the facts; but it is possible to find different ways of
constructing contexts. Every correct interpretation of Pale Fire must, I be-
lieve, be consistent with the fact that the crown jewels are "Pale Fire"; but
since an interpretation accounts for only some of the facts about a work, an
interpretation need not even mention those jewels.
It is belief in this fact/interpretation distinction-a spatial metaphor-
which separates me and Danto from the deconstructionists. Louis Althusser
attacks just this distinction in terms readers of Philosophy and the Mirror of
Nature will find familiar: "the paradox of the theoretical field is that it is
an infinite because definite space, i.e., it has no limits, no external frontiers
separating it from nothing, precisely because it is defined and limited within
itself, carrying in itself the finitude of its definition which by excluding what
it is not, makes it what it is." 22
Such spatial metaphors, he concludes, are to be rejected. The problem
he and his fellow structuralists and poststructuralists then face is to establish
some notion of validity of interpretation. My opposed argument, here and
elsewhere, is that studying artworks requires both establishing the facts and
interpretation, and that though all interpretations must be true to the facts,
any given interpretation may be replaced or supplemented by another. Thus,
I claim, interpretation is truth valued. 23

NOTES

1 Alexander Nehamas, "The Postulated Author: Critical Monism as a Regu-
lative Ideal," Critical Inquiry, 8, 1 (1981), pp. 144, 148, 149.
2 E.D. Hirsch, Validity in Interpretation. (New Haven and London: Yale
University Press, 1967).
3 "Manet and his interpreters" Art History, 8, 3, (1985), pp. 320-35; "Ekphra-
sis and Interpretation: Two Modes of Art Historical Interpretation" The British
Journal of Aesthetics, XXVII, 1 (1987):20-31; Artwriting. (Amherst: Univ. of
Mass. Pr, 1987)
4 See Julia Bader, Crystal Land: Artifice in Nabokov's English Novels. (Berke-
ley: Univ. of Calif. Pr, 1972); Laurie Clancy, The Novels of Vladimir Nabokov.
(New York: St. Martins, 1984); Andrew Field, Nabokov: His Life in Art. (Boston:
Little, Brown & Co., 1968); David Galef, "The Self-Annihilating Artists of Pale
Fire," Twentieth Century Literature, 31 (1985), pp. 421-37; H. Grabes, Fictitious
Biographies: Vladimir Nabokov's English Novels. (The Hague: Mouton, 1977);
L.L. Lee, Vladimir Nabokov. (Boston: Twayne Pub, 1976); John Lyons' contribu-
tion to L.S. Dembo ed. Nabokov: The Man and his Work. (Madison: Univ. of
Wis. Pr, 1967); David Packman, Vladimir Nabokov: The Structure of Literary De-
sire. (Columbia & London: Univ. of Missouri, Pr, 1982); Peter J. Rabinowitz,
"Truth in Fiction: A Reexamination of Audiences", Critical Inquiry, 4 (1977)
pp. 121-41; Alden Sprowles' contribution to C. Proffer ed., A Book of Things about
Vladimir Nabokov. (Ann Arbor: Ardis, 1974); Page Stegner, Escape into Aesthet-
ics: The Art of Vladimir Nabokov. (New York: Dial Pr., 1966); Tony Tanner, City
of Words: American Fiction 1950-1970. (New York: Harper & Row, 1977), Ch. 1.
The one useful account, from which I do borrow, is Mary McCarthy, "A Bolt from
the Blue," reprinted in her The Writing on the Wall and Other Literary Essays.
(New York: Harcourt, Brace & World, 1970), pp. 15-34.
5 Vladimir Nabokov, Pale Fire. (New York: Berkley Books 1968); all refer-
ences included in the text, the poem referred to by line and Kinbote's Foreword
and Commentary by page.
6 Stegner, Escape, p. 131; Field, Nabokov, p. 315; Hugh Kenner, A Homemade
World: The American Modernist Writers. (New York: Knopf, 1975), p. 211.
7 My account of Lolita is drawn entirely from Carl Proffer, Keys to Lolita.
(Bloomington: Indiana Univ. Pr., 1968); all references included in the text.
8 Vladimir Nabokov, Lolita. (New York: Berkley, 1966); all references in-
cluded in the text.
9 McCarthy discusses many of these inversions.
86 CHAPTER 1

10 Nabokov was deeply distrustful of psychoanalysis and so it is interesting to
note the parallels between his interest in word play and Freudian views of language;
here the most useful account is Arthur Danto, "Freudian Explanations and the
Language of the Unconscious," J. Smith ed., Psychoanalysis and Language. (New
Haven: Yale Univ. Pr, 1978), pp. 325-53.
11 Vladimir Nabokov, Strong Opinions. (New York: McGraw-Hill, 1973),
p. 92.
12 See Saul Kripke, Naming and Necessity. (Cambridge: Harvard Univ. Pr.,
1980). This claim is, of course, inconsistent with his theory of essences.
13 A discarded draft printed by Kinbote also makes this connection: As children
playing in a castle find / In some old closet full of toys, behind / The animals and
masks, a sliding door / (four words heavily crossed out) a secret corridor - (77). I
regret my inability to provide a plausible hypothesis about those four words here
replaced with five.
14 See my "Art and Its Preservation," The Journal of Aesthetics and Art Crit-
icism, XLIII,3 (1985), pp. 291-300; "Art and its Spectators," The Journal of Aes-
thetics and Art Criticism, XLV,1 (1986), pp. 5-17; E. Haverkamp-Begemann, Rem-
brandt: 'The Nightwatch'. (Princeton: Princeton Univ. Pr., 1982): Artforum and
Art in America frequently publish 'corrections' about upside-down photographs.
15 Thomas Nagel, "Sexual Perversion," reprinted in R. Baker & F. Elliston
(eds.), Philosophy of Sex. (Buffalo: Prometheus Books, 1975), pp. 247-60.
16 Roberto Longhi's account is reprinted in his Opere complete, vol. IV (Flo-
rence, 1968), pp. 82-143; a full discussion of the problem appears in Howard Hib-
bard, Caravaggio. (New York: Harper & Row, 1983); the philosophical issues are
discussed in my "The Transfiguration of the Commonplace: Caravaggio and His
Interpreters," Word and Image, III,1 (1987): 41-73.
17 The texts of Lacan and Derrida appear in Yale French Studies, 48 (1973),
and the discussions, with full bibliography, in Barbara Johnson, "The Frame of
Reference: Poe, Lacan, Derrida," Yale French Studies, 55/56 (1977), pp. 457-505.
18 For example, Poe's detective is akin to the connoisseur: both are experts
at identifying the authentic original. See Carlo Ginzburg, "Clues: Morelli, Freud,
and Sherlock Holmes," U. Eco and T.A. Sebeok (eds.), The Sign of Three: Dupin,
Holmes, Peirce. (Bloomington: Indiana Univ. Pr, 1983), pp. 81-118.
19 Arthur Danto, The Transfiguration of the Commonplace. (Cambridge: Har-
vard Univ. Pr., 1981).
20 In a letter of 4.19.83
21 See Leo Steinberg, "Pontormo's Capponi Chapel," Art Bulletin, 58 (1974),
pp. 386-98.
22 Louis Althusser & Etienne Balibar, Reading Capital, trans. B. Brewster
(London: Verso Editions, 1979), p. 27.
23 Thanks to Arthur Danto, Alexander Nehamas, Marianne Novy, Mark Roskill;
for the last two words of my essay-Dana Scott; and to Richard Hennessy, whom
this essay is for.
David Carrier
Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
INCREMENTAL ACQUISITION AND A PARAMETERIZED MODEL
OF GRAMMAR1

ROBIN CLARK

Work in generative grammar over the past thirty years has been guided by
the problem of how it is that we can arrive at such a rich state of knowledge
about our native language given limited exposure to impoverished data. This
problem, the "Projection Problem" (cf., Baker, 1979 and the references cited
there), is stated in (1) below:
(1) The Projection Problem
What relation exists between a human being's early linguistic expe-
rience and his resulting adult intuitions (e.g., judgments concerning
grammaticality, ambiguity, entailment, etc.)?
I will argue, here, that certain computational principles constrain the kinds
of hypotheses we can formulate about the relationship between adult compe-
tence (knowledge about language) and early linguistic experience and, there-
fore, these principles must be taken into account in hypotheses about the
form of Universal Grammar.
Let us begin by considering some of the intuitions which must be ac-
counted for by a theory of grammar. In particular, we must develop our
theory of grammar so that we can explain how we come to know facts like
the following (where the '*' indicates ungrammaticality):

(2) a. It is likely that John will be late.
b. John is likely to be late.
c. It is probable that John will be late.
d. *John is probable to be late.
The examples in (2a-b) illustrate what has traditionally been called
"raising to subject." That is, the surface subject in (2b) is taken as the
logical subject of the predicate to be late and not as the logical subject of
the entire predicate is likely to be late. Note that the surface forms of (2a)
and (2c) are virtually identical, differing only with respect to the choice of
adjective (likely versus probable). How, then, can we account for the fact that
native speakers of English do not accept (2d) on analogy with (2b)?
Similarly, consider the examples in (3), which illustrate a phenomenon
commonly referred to as "dative shift":


(3) a. Bill sent his subscription to Mary.
b. Bill sent Mary his subscription.
c. Bill transferred his subscription to Mary.
d. *Bill transferred Mary his subscription.
Examples (3a-b) indicate that the structure [VP Verb NP1 to NP2] may
be related to the structure [VP Verb NP2 NP1] (where "VP" abbreviates
Verb Phrase and "NP" abbreviates Noun Phrase). Note, as above, that the
surface form of (3a) is identical with that of (3c), differing only with respect
to choice of lexical items. We are again faced with the problem of explaining
why example (3d) cannot be formed on analogy with example (3b). To put
the point somewhat differently, why does the language learner systematically
avoid making certain obvious generalizations?
Consider, finally, the examples in (4) (where underlining indicates coref-
erence or, strictly speaking, "binding"):
(4) a. Every advisor fears his students.
b. John fears his students.
c. His students fear John.
d. *His students fear every advisor.
As the underlining indicates, (4a) and (4b) may be interpreted roughly as (5a)
and (5b), respectively:
(5) a. for all x, x an advisor, x fears x's students
b. for x = John, x fears x's students
The contrast between (4c) and (4d) is of some interest (note that the
'*' assigned to (4d) is with respect to an interpretation). While (4c) may be
taken as more or less synonymous with John's students fear him (i.e., John),
(4d) may not be interpreted as synonymous with the bound reading of every
advisor's students fear him; in other words, (4d) may not be interpreted as:
(6) for all x, x an advisor, x's students fear x
The theory of grammar must account for why it is that the interpretation
given in (6) is not associated
with the string in (4d). Again, one might argue that the obvious analogy
between (4c) and (4d) is blocked by some principle of grammar.
Generative grammar has traditionally taken its object of inquiry to be
the problem of characterizing the initial set of principles which constrain
the kinds of hypotheses that children can make about the adult grammar
they are to acquire (cf., the discussion in Chomsky, 1965 or, more recently,
Chomsky, 1986). Generative grammarians, then, seek to characterize those
properties of mind that guide the language learner into making certain kinds
of generalizations about the adult grammar while preventing him from making
certain other kinds of generalizations. "Universal Grammar" is the collection
of those properties of mind which constrain the form of linguistic hypotheses
that the learner makes.
The relationship between early linguistic experience, Universal Gram-
mar and the target adult grammar has often been expressed by the following
diagram: 2

(7) {PLD} → [LAD] → Adult Knowledge of Language

Where PLD stands for the "Primary Linguistic Data", a representation of the
early linguistic experience (see below), and LAD stands for the "Language
Acquisition Device", a representation of Universal Grammar.
that the primary linguistic data is fed into a language acquisition device
with the device returning a hypothesis about the form of the adult grammar.
Given that our theory of Universal Grammar is sufficiently constrained, we
would expect that our theory could guarantee convergence to the target adult
grammar for any appropriate pairing between natural language and primary
linguistic data. This is merely to say that any natural language can be learned
by any (biologically well-formed) child.
Note that this formulation of the problem is entirely consistent with
the formulation of a general theory of learning found in Osherson, Stob &
Weinstein (1986), where the human intellectual endowment is characterized
as:

(8) Human Intellectual Endowment = f : early experience → competence


That is, the human intellectual endowment is a function that maps early
experience (in some domain) to knowledge (of that domain). The "modular
nativist" claim of Chomsky, seen from the characterization of learning in (8),
amounts to the following claim:
(9) Human "Language Organ" = f : early linguistic experience → Adult
Grammar
There is a function f (= LAD) from early linguistic experience to adult
grammars, and f does not reduce to f in (8).3
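Read computationally, (8) and (9) say no more than that learning can be modeled as a function from finite early evidence to a hypothesis. A minimal Python sketch of this reading (the type names are illustrative choices of mine, not from the text):

```python
from typing import Callable, FrozenSet

# A grammar is modeled, very crudely, by the language it generates:
# a finite stand-in for a set of strings.
Grammar = FrozenSet[str]

# PLD: the primary linguistic data -- a finite set of positive examples.
PLD = FrozenSet[str]

# (8)/(9): the endowment maps early experience to competence; the LAD
# is one such function, specialized to the linguistic domain.
LanguageAcquisitionDevice = Callable[[PLD], Grammar]
```

The modular nativist claim is then the claim that no single all-purpose function of this type yields adult grammars as a special case.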
Linguists have generally made the following set of assumptions about
the child's linguistic experience:

(10) a. PLD is a set of strings presented to LAD (no direct evidence of
        linguistic structure).
     b. PLD is positive (no negative instances allowed; cf., Brown & Hanlon,
        1970; Newport, Gleitman & Gleitman, 1977).
     c. Presentation of PLD to and acquisition by LAD is instantaneous.
Assumption (10a) is warranted given that children do not receive direct
evidence about how the adult grammar represents well-formed utterances.
Note that evidence about constituent structure (what constitutes a phrase)
may be given to the child indirectly in the form of intonation breaks, pauses,
etc. Such indications are not direct evidence of structure and it is by no
means a given that children interpret such "cues" as indicating constituent
structure. Assumption (10a), then, provides a very strong constraint on the
primary data.
Assumption (10b), that the primary linguistic data is positive, basically
guarantees that the learner does not have access to grammaticality judg-
ments; that is, sentences do not come explicitly marked as "ungrammatical."
Thus, the learner cannot limit his hypotheses about the language on the basis
of known negative instances (strings that do not constitute sentences of the
language). At first, it may appear that assumption (10b) is too stringent:
children are surely corrected by their elders when they make mistakes. Note,
however, that this tutoring is not given consistently, nor is it the case that
children are necessarily aware of what aspect of their utterance is being cor-
rected (cf., the references cited under (10b) as well as the helpful discussion
in Wexler & Culicover, 1980). Assumption (10b) may, then, be viewed as a
"worst case" assessment which forces us to construct a more robust theory of
grammar.4
Finally, (10c) assumes that the entire set of primary linguistic data is
presented to the learner at once. This assumption is made largely for pur-
poses of simplicity. The syntactic theorist need not consider the order of
presentation of data to the learner and, therefore, need not consider hypothe-
ses constructed by the learner prior to converging on the adult grammar. We
should note that the assumption might have the further ramification that
the learner could converge on the adult grammar over a variety of differ-
ent orders of data presentation. This would help account for the fact that
speakers of the same language arrive at strikingly similar states of knowledge
despite the potential diversity of early linguistic experience. At this stage
of research, however, it is far from obvious that this result could be said to
follow from assumption (10c); it certainly does not follow logically from such
an assumption.
A computational theory of acquisition, parallel to the competence the-
ory, makes largely the same set of assumptions for the same reasons; note,
however, that instantaneous presentation of data is not assumed:

(11) Computational Assumptions (cf., Berwick, 1985)
     a. PLD is a set of strings presented to LAD (no direct evidence of
        linguistic structure).
     b. PLD is positive (no negative instances allowed).
     c. Presentation of PLD to LAD is unordered.
     d. Acquisition by LAD is insensitive to ordering and proceeds through
        stages.
Assumptions (11c-d) result from the failure to assume that the presentation
of data to the learner is instantaneous. We will assume that the data are
presented to the learner one sentence at a time. Given this new assumption,
it is crucial that we build our theory so as to be as invariant as possible
over different orders of presentation in order to account for the diversity
of potential environments for learning. Finally, the competence theory must
take into account "intermediate" hypotheses (stages) forwarded by the learner
before positing the adult grammar. This adds a further empirical problem
to the computational theory: Is it possible to simulate the stages children
pass through during language acquisition (cf., Brown, 1973 for a discussion
of stages of acquisition)?
Note, finally, that both the competence and the computational theories
of language learnability must assume that the set of primary linguistic data is
finite. In other words, the learner must hypothesize the adult grammar after
a finite number of instances and must stand by that hypothesis thereafter.5
In fact, the bound on the size of the primary linguistic data must be quite
severe if we are to account for the apparent rapidity of language acquisition; a
theory which could not guarantee learnability after finite instances or which
required an astronomical (though finite) number of instances to guarantee
convergence would be of no interest as a theory of learnability. Given the
assumptions that the set of primary data consists of positive instances and
is strictly bounded in size, a theory of language learnability will apparently
be forced to posit a richly structured language acquisition device in order to
account for the detailed nature of adult linguistic knowledge.
Let us turn now to a substantive principle which has recently been pro-
posed by Berwick (cf., Berwick, 1985 and the references cited there):6
(12) The Subset Principle
     If data set Di is consistent with languages Li, Lj such that Li is a subset
     of Lj, guess Li.
In other words, if the primary linguistic data is compatible with the grammars
of two languages where one language is a superset of the other language,
the learner must guess that the target language is the smaller of the two
languages. Suppose that the learner hypothesized the larger language; then,
if the actual target language is the subset language, no positive examples
will force the learner to retract his hypothesis. This is so precisely because
the target language is a proper subset of the learner's hypothesized language,
so all examples in the primary linguistic data will be compatible with the
learner's hypothesis.
Suppose, on the other hand, that the learner advances the subset lan-
guage as his hypothesis and the target is the superset language. Then there
is at least one string in the target language that is not in the hypothesized
language, again because of the proper subset relation that holds between the
two languages. But then there is, at least potentially, at least one positive
example which will force the learner to retract his hypothesis. The Subset
Principle forces conservative acquisition of the kind required by our learn-
ability assumptions (cf., (10) and (11), above); in particular, given that we
require that the learner see only positive examples, the learner's guesses must
be sufficiently conservative as to avoid making an overly general hypothesis
which could not be counterexemplified on the basis of available evidence.7
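Since (12) is stated as a guessing rule, it can be sketched directly. In the following Python fragment (names and data are hypothetical; languages are modeled as finite sets of strings, and the consistent candidates are assumed to be nested, as in the cases examined below), the learner always guesses a smallest language consistent with the evidence:

```python
def consistent(data, language):
    # A candidate is consistent with positive data if it contains
    # every example observed so far.
    return data <= language

def subset_principle_guess(data, candidates):
    """(12): among candidate languages consistent with the data, guess
    the smallest, so that an over-general hypothesis is never advanced
    while a consistent subset is still available."""
    live = [lang for lang in candidates if consistent(data, lang)]
    return min(live, key=len)   # adequate when the candidates are nested

# Toy illustration: L1 is a proper subset of L2; the single datum fits
# both, so the learner must guess L1.
L1 = {"John saw who?"}
L2 = L1 | {"who did John see"}
assert subset_principle_guess({"John saw who?"}, [L2, L1]) == L1
```

Only positive data could ever dislodge the guess, which is exactly why the smaller language must be tried first.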
Following Berwick (1986), let us apply the Subset Principle to a particu-
lar case.8 It has long been known that English allows for so-called "unbounded
dependencies" where a wh-phrase may be related to a gap somewhere in the
sentence (in the examples I have indicated the position of the gap with "_"):

(13) a. Who [S did John see __ ]
     b. Who [S did John see [NP friends of __ ]]
     c. Who [S did Mary think that [S John saw __ ]]
     d. Who [S did Bill say that [S Mary thought that [S John saw __ ]]]
Note that the wh-phrases in (13a), (13c) and (13d) are interpreted as having
the semantic role normally assigned to the direct object of the verb see,
although the wh-phrase does not occupy this position in any of the examples.
It would appear from the examples in (13) that the gap may be separated from
the wh-phrase by arbitrarily long stretches of sentence. The obvious inductive
generalization, then, is to relate a wh-phrase to a gap located somewhere in
the sentence.
Note, however, that there are structures where the wh-phrase/gap rela-
tion is systematically excluded:
(14) a. *who [S were [NP friends of __ ] seen by John]
     b. *who [S did Mary deny [NP the rumor that [S John saw __ ]]]
     c. *who [S did Mary wonder [which man saw __ ]]
     d. *who [S did Mary wonder [which stories John told __ to __ ]]
     e. *who [S did Bill visit England after [S John saw __ ]]
The data in (14) show that the obvious generalization is incorrect: It is
not always possible to relate an initial wh-phrase to a gap somewhere in
the sentence. Example (14a) shows that the wh-phrase/gap relation is not
possible when the gap is properly contained within a subject noun phrase
(= NP); (14a) should be compared with (13b) where the gap is properly
contained within an object NP and the wh-phrase/gap relation is well-formed.
The ill-formed relation is shown schematically in (15):

(15) *[S' wh-phrase [S ... [subject ... gap ...] ...]]

Compare the relation in (13b) with the ill-formed relation in example
(14b). In (14b), the wh-phrase is related to a gap inside an object NP, but
the gap itself is properly contained within a clause inside of the object. Thus,
although a wh-phrase may be related to a gap contained within an object,
this relation is possible only if the gap is not contained within a clause that
forms a part of the NP. We can show this relation schematically as follows:

(16) *[S' wh-phrase [S ... [NP ... [clause ... gap ...] ...] ...]]


Now consider example (14c). At first, this example may appear to be
isomorphic, up to lexical items, to examples (13c) and (13d). The difference
is that the wh-phrase in (14c) is related to a gap properly contained within a
clause that is itself introduced by a wh-phrase, while the wh-phrases in (13c)
and (13d) are related to gaps in clauses that are not themselves introduced
by wh-phrases. We can show the relevant properties of examples like (14c)
schematically as follows (where identity of subscripts attached to the wh-
phrase and gap show that they are to be taken as related):

(17) *[S' wh-phrase_i [S ... [S' wh-phrase_j [S ... gap_j ... gap_i ...]]]]

Finally, consider (14d). In this example, the clause containing the gap
to be related to an initial wh-phrase is an adverbial clause fixing the time
of an event with respect to another event. As (14d) shows, this sort of wh-
phrase/gap relation is ill-formed:

(18) *[S' wh-phrase [S ... [adverbial ... gap ...] ...]]

The above data would seem to indicate that the wh-phrase/gap relation
is restricted to cases where the gap is properly contained within an object
clause or an object NP. This generalization is, again, not quite correct as the
data in (19) show:

(19) a. Who [S did Mary think that [S John saw __ ]]
     b. *who [S did Mary whisper that [S John saw __ ]] (cf., Mary whis-
        pered that John saw Bill.)
     c. *who [S did Mary croak that [S John saw __ ]] (cf., Mary croaked
        that John saw Bill.)
     d. *who [S did Mary giggle that [S John saw __ ]] (cf., Mary giggled
        that John saw Bill.)
Notice that the examples in (19b-d) are identical (up to the choice of the
verb) with example (19a) (cf., also the examples in (13)). The above data
would seem to argue that the ability to relate a wh-phrase to a gap is at least
partially a function of the lexical properties of verbs. Following standard
terminology (cf., Erteschik, 1973), we will refer to verbs, like think in (19a),
as "bridge" verbs and verbs, like whisper, croak and giggle in (19b-d), as
"non-bridge" verbs. Note that non-bridge verbs tend to fall into semantic
classes, like verbs of manner of saying. We assume that the ability to extract
from an object clause is partially a function of the lexical semantics of verbs;
given that the learner is sensitive to lexical semantics, the bridge/non-bridge
distinction should follow. We can summarize (19) with the following schema:
(20) *[S' wh-phrase [S ... V(non-bridge) [clause ... gap ...] ...]]

The data in (13), (14) and (19) have been accounted for by means of a
constraint on the relation between gaps and their antecedents:
(21) Subjacency Condition
No rule can involve two elements X and Y in the structure below if both
A and B are bounding nodes:
... X ... [A ... [B ... Y ...]B ...]A ...

We need not linger over the technical niceties associated with (21). It is
sufficient to see that the Subjacency Condition requires that antecedent/gap
relations be short distance. In particular, the Subjacency Condition forces us
to take apparent long distance antecedent/gap relations as in (13d) as consist-
ing of a series of short "leaps." These leaps are mediated by a complementizer
node, Comp, occupied by complementizers like that or whether.
Notice that clauses (category S) are unique in being introduced by a
complementizer. If we take the bounding nodes to be minimally S and NP,
then the schema given in (16) (repeated here as (22)) follows since the gap is
separated from the wh-phrase by an S and an NP:

(22) *[S' wh-phrase [S ... [NP ... [clause ... gap ...] ...] ...]]

Consider, now, the bridge/non-bridge verb distinction:


(23) a. *[S' wh-phrase [S ... V(non-bridge) [S' [S ... gap ...]]]]
     b. [S' wh-phrase [S ... V(bridge) [S' [S ... gap ...]]]]

The distinction follows if we assume that, in the general case, S' counts as
a bounding node in addition to S and NP. In the non-bridge verb case, the
wh-phrase would be separated from the gap by at least an S and an S' (the rel-
evant nodes are underlined in (23a)). Bridge verbs would have the exceptional
property of rendering the S' of their complement clause transparent to
the operation of the Subjacency Condition. The relation between the wh-
phrase and the gap in (23b) could then be broken into two sub-chains-one
between the wh-phrase and the embedded Comp(lementizer) node and the
other between the embedded Comp node and the gap-both of which obey
the Subjacency Condition (although the summation of the two chains appar-
ently violates Subjacency). This is essentially the analysis given to the bridge
verb phenomenon in Chomsky (1980).
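The bookkeeping behind (21)-(23) can be made concrete with a toy checker. The sketch below is a simplified model of my own, not Chomsky's formulation: a chain is recorded as a sequence of hops (wh-phrase to Comp to ... to gap), each annotated with the nodes it crosses, and the chain is licit only if every hop crosses at most one bounding node, with bridge verbs exempting the S' of their complement:

```python
BOUNDING_NODES = {"S", "S'", "NP"}   # the English setting in the text

def subjacent(hops, bounding=BOUNDING_NODES, bridge_hops=()):
    """Each hop is the list of node labels it crosses. Indices listed
    in bridge_hops are hops into the complement of a bridge verb,
    whose S' is treated as transparent."""
    for i, crossed in enumerate(hops):
        relevant = [n for n in crossed if n in bounding]
        if i in bridge_hops and "S'" in relevant:
            relevant.remove("S'")      # bridge verb: S' is transparent
        if len(relevant) > 1:          # more than one bounding node
            return False
    return True

# (23b): think is a bridge verb, so both short hops are legal.
assert subjacent([["S"], ["S'", "S"]], bridge_hops={1})
# (23a): whisper is not, so the second hop crosses both S' and S.
assert not subjacent([["S"], ["S'", "S"]])
```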
The preceding would be of little interest if languages did not vary with
respect to the nodes they selected to count as bounding nodes. We could
simply stipulate as part of the specification of the initial state of the language
acquisition device that antecedent/gap relationships obeyed Subjacency and
that the bounding nodes were S, S' and NP. It would be up to the learner to
discover which verbs have the bridge property; notice that the learner would
have positive examples of the form of questions like those in (13) for this
discovery.
Interestingly, however, languages do differ as to which nodes they se-
lect to count as bounding nodes for purposes of the Subjacency Condition.
Italian, for example, selects different bounding nodes from those selected by
English, as pointed out in Rizzi (1982); as a result, the locality effects on
antecedent/gap pairs differ in Italian, as the data in (24) illustrate (I will
again use the device of co-subscripting antecedent/gap pairs):

(24) a. Il solo incarico [S' che_i [S non sapevi [S' a chi_j [S avrebbero affidato
        __i __j]]]] è poi finito a te.
        "The only charge that you didn't know to whom they would entrust
        has been entrusted exactly to you."
     b. Tuo fratello, [S' a cui_i [S mi domando [S' che storie_j [S abbiano
        raccontato __j __i]]]], era molto preoccupato.
        "Your brother, to whom I wonder which stories they told, was very
        troubled."
     c. La nuova idea di Giorgio, [S' di cui_i [S immagino [S' che cosa_j [S
        pensi __j __i]]]], diverrà presto di pubblico dominio.
        "Giorgio's new idea, of which I imagine what you think, will soon be-
        come known to everyone."
As the examples in (24) show, the following is a well-formed antecedent/gap
relation in Italian (compare (25) with (17)):
(25) [S' wh-phrase_i [S ... [S' wh-phrase_j [S ... gap_j ... gap_i ...]]]]
The facts in (24) can be accounted for if Italian differs from English in that
English selects S', S and NP as bounding nodes whereas Italian selects only
S' and NP as bounding nodes (cf., Rizzi, 1982). In other words, the set of
bounding nodes is not fixed but is, rather, parameterized. Languages thus
may vary with respect to the value they assign this parameter.
If the preceding account of cross-linguistic variation is on the right track,
then the set of bounding nodes must be, in whole or in part, a function
of linguistic experience rather than a theorem of the initial setting of the
language acquisition device. Given that the primary linguistic data consists
of positive data (i.e., no explicit indication of ungrammaticality is consistently
given), the learner's task would seem to be massive, even assuming that the
language acquisition device provides the information that the target language
must obey the Subjacency Condition.
It is precisely in this type of situation that the Subset Principle can help
us provide a theory of language learnability. More concretely, suppose that
language Ll lacks (surface) antecedent/gap relations. In other words, Ll is
a language like Chinese (cf., Huang, 1982) which does not allow wh-phrases
to be fronted in the syntax.9 Presumably, the canonical way of asking a wh-
question in such a language would be as in (26a), where the wh-phrase occurs
in situ, rather than (26b), where the wh-phrase has been fronted (where the
'*' is with respect to L1):

(26) a. John saw who?
     b. *who (did) John see __
It has often been noted that there is a strict correlation between semantic
roles and word order in early stages of child language (cf., Berwick, 1985, and
Bowerman, 1973). The hypothesis that the target language is L1 accounts for
this empirical observation in that the grammar for L1 does not allow for the
displacement of constituents: L1 does not make provision for antecedent/gap
relationships.
Suppose that language L2 is identical to L1 except that it has (strictly)
bounded wh-movement (e.g., Russian, cf., Chomsky, 1980). That is, the
grammar of L2 makes provision for antecedent/gap relations but only for
those of a strictly local kind. We can capture this by claiming that the
bounding nodes for L2 are S', S and NP. Thus, example (27a) would be
well-formed in L2 since the antecedent/gap relation crosses only a single S
node, while example (27b) would be ill-formed in L2 since the antecedent/gap
relation must cross at least an S node and an S' node in one step:
(27) a. [S' who [S (did) John see __ ]]
     b. *[S' who [S (did) Mary think [S' that [S John saw __ ]]]]
Note that L1 is a proper subset of L2 since L2 contains all the sentences
allowed by the grammar of L1 and, in addition, contains some sentences not
allowed by the grammar of L1, namely, those involving bounded movement.
Suppose that language L3 is identical to L2 in having wh-movement
(that is, it licenses antecedent/gap relations) but, in addition, L3 allows for
bridge verbs. That is, L3 defines the set of bounding nodes for Subjacency as
NP, S and S' but allows S' to be transparent in certain environments, as in
English. Given that think is a bridge verb and whisper is a non-bridge verb,
L3 has the following array of facts:
(28) a. [S' who [S (did) John see __ ]]
     b. [S' who [S (did) Mary think [S' that [S John saw __ ]]]]
     c. *[S' who [S (did) Mary whisper [S' that [S John saw __ ]]]]
Example (28b) would be grammatical since think renders the S' node
of its complement clause transparent to Subjacency; thus, the relationship
between who and the gap after see may be captured as the summation of two
smaller chains: One chain consists of who and the Comp of the embedded
clause and the other chain consists of the Comp of the embedded clause and
the gap after see. Note that there is no way to break down the antecedent/gap
relation in (28c) into smaller chains that obey Subjacency since whisper is not
a bridge verb; in particular, there is no way to establish a sub-chain between
the wh-phrase and the Comp node of the embedded clause which does not
cross both an S node and an S' node. Note that L2 (and hence L1) is a proper
subset of L3 since L3 contains all the sentences contained in L2 in addition to
allowing for some apparently unbounded antecedent/gap relations (depending
upon properties of the intervening verbs).
Finally, suppose that the grammar of language L4 allows antecedent/gap
relations like those allowed by the grammar of L3, but that L4 has defined its
bounding nodes to be NP and S'. In other words, L4 would allow long-distance
antecedent/gap relations like those found in Italian (cf., the examples in (24)).
Notice that by our assumption, L4 is exactly like L3 except that L4 contains
examples like (29) while L3 does not (again, I have used co-subscripting to
disambiguate the antecedent/gap relations):

(29) [S' what_i [S (did) Mary wonder [S' who_j [S __j saw __i]]]]
We have constructed the above languages such that L1 is a subset of L2
which is a subset, in turn, of L3 which is, itself, a subset of L4:

(30) L1 ⊂ L2 ⊂ L3 ⊂ L4

Notice that if the learner guesses that the target language is L4 on the basis
of data from L2, his guess will be consistent with all of the primary data.
The hypothesis grammar (L4) will assign structural descriptions to all of
the strings in L2. Thus, if the target language is L2 and the learner has
hypothesized L4, then no evidence will ever force the learner to retract his
hypothesis. But then it would seem perverse to say that the learner has
acquired the target language since the grammar hypothesized by the learner
will accept as sentences strings that are ill-formed with respect to the target
language (cf., also note 5).
We can shed some light on the above problem by appealing to the Subset
Principle. We need to construct our theory of the language acquisition device
(and hence our characterization of Universal Grammar) in such a way that
the learner first assumes that the target language resembles L1; that is, the
language does not have a movement rule which establishes antecedent/gap
relations and the bounding nodes for Subjacency are defined (vacuously in
this case) to be S, S' and NP. Since the learner must acquire the lexical
properties of individual words (for example, hit is a transitive verb which
must have an object), data of the form given in (31) will force the learner to
retract his hypothesis that the target language is L1:
(31) What did Bill hit _?
The example in (31) is incompatible with the hypothesis that there is no
movement in the language since the only way to reconcile the above example
with the known lexical properties of hit is to assume that the wh-phrase,
what, is interpreted as the direct object.
Given the characterization of hypothesis languages (above), the learner
will next hypothesize that the target language is like L2. That is, the target
allows for movement, but the bounding nodes for Subjacency remain (non-
vacuously, now) S, S' and NP. Thus, the learner hypothesizes that the target
language allows for strictly bounded movement (see the discussion of L2,
above). Note, however, that this hypothesis will be incompatible with data
like that in (32):
(32) [S' what [S did John say [S' that [S he saw __ ]]]]
That is, the assumption that the target language is like L2 with respect to
movement will result in a grammar that fails to assign a structural description
to examples like that in (32). Hence, the learner must retract his hypothesis
that the target language is like L2 and, by the Subset Principle and our
characterization of the hypothesis languages, he must assume that the target
language is like L3: That is, the grammar for the target language allows for
a class of bridge verbs.10
If the target language is like English, then the learner will not need to
revise his hypothesis grammar any further (at least with respect to conditions
on movement). Suppose, however, that the target language is like Italian or
French in that the set of bounding nodes for Subjacency consists of S' and
NP rather than S, S' and NP. Here again, rather simple primary data will be
sufficient to force the learner to revise his hypothesis. Consider, for example,
the following wh-question from French:
(33) [S' à qui_i [S est-ce que vous vous demandez [S' qu'_j [S est-ce que Jean
     a donné __j __i]]]]
     "To whom do you wonder what John gave?"
The example in (33) is incompatible with the assumption that S is a bounding
node since the chain between à qui (to whom) and its associated gap cannot
be broken into sub-chains due to the presence of que (what) in the embedded
complementizer position. The only possible structural description for (33)
contains information that the chain between à qui and its gap crosses two
S nodes, thus forcing the assumption that the target language is like the
hypothesis language L4 with respect to antecedent/gap relations.
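The learning path just traced, from L1 through L4, is itself algorithmic, and a toy rendering may make it vivid. The fragment below is a sketch under heavy simplifications of my own (the ADMITTED table simply hard-codes which toy sentences each hypothesis admits; it is not a parser): the learner moves up the subset hierarchy only when a positive example fails to parse under the current hypothesis.

```python
# The hypothesis space, smallest language first (Subset Principle).
HYPOTHESES = ["L1: no movement",
              "L2: movement; bounding nodes NP, S, S'",
              "L3: L2 plus bridge verbs",
              "L4: movement; bounding nodes NP, S' only"]

# Which hypotheses admit which (toy) sentences; cf. (31)-(33).
ADMITTED = {
    "What did Bill hit __?":              HYPOTHESES[1:],
    "what did John say that he saw __":   HYPOTHESES[2:],
    "a qui ... qu'est-ce que Jean a donne __ __": HYPOTHESES[3:],
}

def parses(hypothesis, sentence):
    # Stand-in for assigning a structural description.
    return hypothesis in ADMITTED.get(sentence, HYPOTHESES)

def learn(stream):
    level = 0                       # start with the smallest language
    for sentence in stream:         # positive data, one at a time
        while not parses(HYPOTHESES[level], sentence):
            level += 1              # retract; try the next superset
    return HYPOTHESES[level]

print(learn(["What did Bill hit __?",
             "what did John say that he saw __"]))   # settles on L3
```

Because the hypotheses are nested, the final state is the same under any order of presentation of these data, which is the invariance that assumption (11d) asks for.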
Notice that example (33) is still a relatively simple sentence, involv-
ing a single embedded clause. If we count the number of S nodes in the
structural description of any given sentence, we can associate the resulting
integer with the sentence (called the "degree" of the phrase-marker-i.e., the
structural description of the sentence). The following constraint has been
hypothesized:11

(34) Boundedness of Minimal Degree of Error (BDE)
     For any base grammar B there exists a finite integer U such that for any
     possible adult grammar A and learner C, if A and C disagree on any
     phrase-marker b generated by B, then they disagree on some phrase-
     marker b' generated by B, with b' of degree at most U.
The BDE guarantees that discrepancies between the grammar of the
hypothesis language and the grammar of the target language are detectable
on relatively simple input data. In general, given the effects of the Subjacency
Condition, we may set the degree constant, U, at 2. But then no phrase-
marker of depth greater than or equal to 3 will be required for acquisition
to be successful. That is, the learner need not be presented with an input
sentence which requires a structural description of depth, say, 15 in order
to have evidence to retract his hypothesis. The effect of the BDE, then, is
that the grammar of the target language must be learnable from some finite
sequence of relatively simple input sentences. If we cannot guarantee this
property, then presumably there are target languages such that the learner
cannot converge on the target grammar within bounded time. In short, if
the grammar cannot be learned within some strictly bounded time, then it
cannot be a grammar for a natural language.
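One ingredient of the BDE is straightforward to operationalize: the degree of a phrase-marker. A minimal sketch (the tree representation is my own toy choice; the text supplies none) counts S nodes and uses the bound U = 2 suggested above to filter the evidence a learner ever needs to consult:

```python
U = 2   # the degree constant suggested by the Subjacency discussion

def degree(tree):
    """Degree of a phrase-marker: the number of S nodes it contains.
    A tree is a (label, children) pair, e.g. ("S", [("NP", []), ...])."""
    label, children = tree
    return (label == "S") + sum(degree(child) for child in children)

def relevant_evidence(phrase_markers):
    # By the BDE, any learner/adult disagreement already shows up on
    # phrase-markers of degree at most U; deeper input may be ignored.
    return [pm for pm in phrase_markers if degree(pm) <= U]

# "Who did Mary think that John saw?": one matrix S, one embedded S.
example = ("S'", [("Comp", []),
                  ("S", [("NP", []),
                         ("VP", [("V", []),
                                 ("S'", [("Comp", []),
                                         ("S", [])])])])])
assert degree(example) == 2
```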
We should note that the BDE is really double-edged since it also requires
that the grammar of any natural language does not contain rules which can
be satisfied only by a phrase-marker of degree greater than U (= 2). Suppose
that the grammar for some natural language contained such a rule. Then
there must be input sentences which have a degree greater than or equal to
the depth required by that rule. By the BDE, such evidence is irrelevant and
may be disregarded by the learner. If a linguistic theory allows for grammars
which violate the BDE, then it is characterizing grammars which cannot
correspond to any natural language grammar in that we cannot guarantee
that they can be learned within some bounded period of time.12
If the above arguments are correct, then we must construct our theory
of UG in a manner that is consistent with the Subset Principle and the BDE.
That is, the learner initially assumes that the target grammar lacks movement
and has bounding nodes for the Subjacency Condition set at NP, Sand S'.
But the initial setting of the language acquisition device (in other words, Uni-
versal Grammar) is exactly the object of study for generative grammarians.
Given that the Subset Principle and the BDE are computational principles
designed to guarantee learnability given a finite sequence of positive data, it
is apparent that the theorist must take elements of computation theory into
account in order to provide a fair account of Universal Grammar.
While the Subset Principle and the BDE allow us to provide a princi-
pled explanation of certain aspects of language acquisition (for example, why
children do not over-generalize with respect to syntactic locality principles
like Subjacency), there are domains where the Subset Principle, for example,
seems irrelevant. One case is the over-generalization of morphological rules
by children:
(35) a. I goed.
b. He hitted me.
c. I want liting. (French-English Bilingual)
In the above examples, goed, hitted and liting (a combination of French lit
("bed") and the English affix -ing) are not actual words of English. These
stand as apparent counterexamples to the conservative acquisition forced by
the Subset Principle since no positive evidence is sufficient to establish the
non-existence of these forms. The child will simply never hear these examples
in adult speech. It may be that in these cases correction plays an important
role or other computational principles are sufficient to allow the learner to
retract these forms.
As the above indicates, further work in the area of learnability is nec-
essary. We have only scratched the surface of the Projection Problem. Nev-
ertheless, we may expect the relationship between linguistic theory and the
computational theory of learnability to be a long and fruitful one.

NOTES

1 I wish to thank Eric Nyberg and Kevin Kelly for many helpful discussions.
Most of the material in this paper was originally presented at a Philosophy Collo-
quium at Carnegie Mellon University in January 1987.
2 But see White (1982) for an alternative formulation along with a discussion
of some of the relevant psycholinguistic literature.
3 This is not to deny that other cognitive domains are relevant to language
acquisition; given that humans use language it is hard to see how the linguistic
component could be completely segregated from other cognitive domains. What
(9) denies is that there is an all-purpose learning function which yields grammars
as a special case. For discussion of this point see Piatelli-Palmarini (1980). Notice
that the claim is empirical; to falsify it, one need only provide the characterization
of a completely general learning function which yields adult linguistic competence
as a special case.
4 The theory of learnability given by linguistic theory could be viewed as a
limit theory of acquisition. It may be that a learner could gain access to some
information through correction or explicit tutoring, but the theory of grammar
can ensure learnability in the absence of such information. Given the inconsistent
nature of negative information, it would represent a severe weakening of the theory
to assume the presence of such information.
5 We don't want the learner to posit the adult grammar as a hypothesis only
to reject it later, never to return to it. In such a case, we would say that the
target language was not learned, although the learner momentarily hypothesized
the correct grammar.
6 The Subset Principle is based on work done by Angluin (1978).
7 We should note that the exact scope of application of the Subset Princi-
ple is far from uncontroversial. See Hyams (1986) for a discussion of a case of
overapplication of the Subset Principle.
8 While we agree with Berwick (1985) that the Subset Principle accounts for
the setting of language-particular parameters regarding syntactic locality principles,
our account differs from his on a number of points.
9 In point of fact, children do appear to produce wh-questions at a rather
early stage. Klein (1982) and Hyams (1986) argue that early wh-questions are
produced by the phrase structure component and, hence, do not involve a movement
rule which establishes antecedent/gap relations. If this is so, then the Subjacency
Condition is irrelevant as a filter on well-formed applications of movement at this
stage since movement simply does not exist. See Klein (1982) for some discussion.
10 Note that the child must establish classes of lexical items (e.g., the class
of bridge verbs as opposed to the class of non-bridge verbs). The Subset Principle
forces conservative acquisition which is example-driven. It need not, however, force
us to the relatively weak position that the child laboriously records lexical properties on
an item-by-item basis. That is, an example of a particular lexical item occurring
in some syntactic configuration may allow the learner to generalize to the class
containing that lexical item. For some suggestive work on lexical classes, see Keil
(1979).
11 For discussion of the BDE, see Culicover & Wexler (1977) and Wexler &
Culicover (1980). Note that their original work was concerned with the learning of a
transformational component given base structures (roughly, a thematic representa-
tion) and surface forms. Since the child grammar and the adult grammar agreed on
base forms, divergence of surface forms could be traced unambiguously to the trans-
formational component. Work by Chomsky and others (see, for example, Chomsky,
1980 and Chomsky, 1981) has reduced the transformational component to a single
rule which can be stated as part of Universal Grammar and, therefore, need not be
learned on the basis of experience. As I will argue, the BDE still places a strong
constraint on linguistic theory. In particular, we may take a broader interpretation
of the BDE such that any divergence at any level of linguistic representation will
be detectable from a structural description of strictly bounded degree. For recent
discussions of this line of investigation, see Atkinson (1986) and Morgan (1986).
12 It could, of course, come about that in the best case the learner could
"accidentally" converge on the target grammar given an appropriate ordering of
the input evidence (starting, say, from the most complex structures and proceeding
to the simplest). Recall, however, our assumption that learnability was guaranteed
over random sequences of input evidence. If we take the goal of linguistic theory to
be a bounding theory on language acquisition, characterizing the limits of learnability,
a theory which cannot guarantee the effects of the BDE is without interest.

REFERENCES
Angluin, D. (1978) "Inductive Inference of Formal Languages from Positive Data,"
Information and Control, 45, 117-35.
Atkinson, M. (1986) "Learnability," P. Fletcher & M. Garman (eds.) Language
	Acquisition. Cambridge University Press, Cambridge.
Baker, C.L. (1979) "Syntactic Theory and the Projection Problem," Linguistic In-
	quiry, 10.4, 533-82.
Berwick, R. (1985) The Acquisition of Syntactic Knowledge. The MIT Press, Cam-
	bridge, MA.
Bowerman, M. (1973) Early Syntactic Development. Cambridge University Press,
Cambridge.
Brown, R. (1973) A First Language. Harvard University Press, Cambridge, MA.
Brown, R. & C. Hanlon (1970) "Derivational Complexity and the Order of Acqui-
sition of Child Speech," J.R. Hayes (ed.) Cognition and the Development of
Language. Wiley, New York.
Chomsky, N. (1965) Aspects of the Theory of Syntax. The MIT Press, Cambridge,
MA.
Chomsky, N. (1980) "On Binding," Linguistic Inquiry, 11, 1-46.
Chomsky, N. (1981) Lectures on Government and Binding. Foris Publications,
Dordrecht, Holland.
Chomsky, N. (1986) Knowledge of Language: Its Nature, Origin, and Use. Praeger
Publishers, New York.
Culicover, P. & K. Wexler (1977) "Some Syntactic Implications of a Theory of
Language Learnability," P. Culicover, T. Wasow & A. Akmajian (eds.) Formal
Syntax. Academic Press, Inc. New York.
Erteschik, N. (1973) On the Nature of Island Constraints. MIT PhD Dissertation.
Hyams, N. (1986) Language Acquisition and the Theory of Parameters. D. Reidel,
Dordrecht, Holland.
Huang, J. (1982) Logical Relations in Chinese and the Theory of Grammar. MIT
PhD Dissertation.
Keil, F. (1979) Semantic and Conceptual Development. Harvard University Press,
Cambridge, MA.
Klein, S. (1982) Syntactic Theory and the Developing Grammar: Reestablishing the
Relationship between Linguistic Theory and Data from Language Acquisition.
UCLA PhD Dissertation.
Morgan, J. (1986) From Simple Input to Complex Grammar. The MIT Press,
Cambridge, MA.
Newport, E., H. Gleitman & L. Gleitman (1977) "Mother, please, I'd rather do it
myself: Some effects and non-effects of maternal speech style," Snow & Fergu-
son (eds.) Talking to Children: Language Input and Acquisition. Cambridge
University Press, New York.
Osherson, D., M. Stob & S. Weinstein (1986) Systems that Learn. The MIT Press,
	Cambridge, MA.
Piatelli-Palmarini, M., ed. (1980) Language and Learning. Harvard University
Press, Cambridge, MA.
Rizzi, L. (1982) Issues in Italian Syntax. Foris Publications, Dordrecht, Holland.
Wexler, K. & P. Culicover (1980) Formal Principles of Language Acquisition. The
	MIT Press, Cambridge, MA.
White, L. (1982) Grammatical Theory and Language Acquisition. Foris Publica-
	tions, Dordrecht, Holland.

Robin Clark
Department of Linguistics
University of California
Los Angeles, CA 90024
WHAT ARE GENERAL EQUILIBRIUM THEORIES?1

DAN HAUSMAN

A little philosophy of science can be a troubling thing. Simplified treatments
of philosophy of science maintain that scientists formulate generalizations or
theories, derive implications from them, and retain these generalizations or
theories (albeit with some caution) as long as they pass the experimental
tests. This story is heavily oversimplified, and its general inadequacies are
familiar and shall not be repeated here. But this story is both well-known
and a simplification of a truth, not mere error.
If one accepts this oversimplified vision of science, much work in the-
oretical economics is hard to understand, for it clearly does not consist in
the presentation and examination of testable theories. Much of it is, instead,
better interpreted as conceptual development, and not a less significant part
of empirical science for this interpretation.
In this paper I shall focus on a particular stream of theoretical work in
economics concerning which controversy and dispute have raged. The ele-
gant theories of general equilibrium which have been developed during the
past three decades have left many economists puzzled, since they appear to
have little to do with real economies. Gerard Debreu in his classic Theory
of Value states that his theory is concerned with the explanation of prices
(1959, p. ix). Others as distinguished as Kenneth Arrow and Frank Hahn
deny that general equilibrium theories are explanatory (1971, pp. vi-viii).
Moreover, some prominent economists (Blaug, 1980, 187-92) and prominent
philosophers (Rosenberg 1983) have argued that work in general equilibrium
theory is not empirical science at all. I shall here offer a philosophical inter-
pretation of what those mathematical structures called general equilibrium
theories are. I shall defend their cognitive worth and their place within eco-
nomics, although I shall concede that they are without explanatory power.
To understand what general equilibrium theories are, one must first
understand what equilibrium theory is. General equilibrium theories are ap-
plications of (and thus not identical to) equilibrium theory. They are, as
will be discussed later, the result of combining equilibrium theory with aux-
iliary hypotheses of the right sort. "Equilibrium theory" is my name for the
fundamental theory of microeconomics. Although economists do not use my
terminology, most regard what I call "equilibrium theory" or "the basic equi-
librium model" as fundamental to virtually all economic theory. They hope
to be able to reduce, or at least relate, macroeconomic theories to equilibrium
theory. They hope to be able to augment the basic equilibrium model to deal
with questions of economic growth and change. This is the model they rely on
in specific empirical research and in many welfare recommendations. When
one has succeeded in saying what equilibrium models are, one has largely
succeeded in saying what neoclassical economics is.
Among the various assumptions common to different neoclassical mod-
els, one can distinguish two different kinds. Some, like "Agents' preferences
are transitive" or "Entrepreneurs attempt to maximize profits" should be re-
garded as the fundamental "laws" of neoclassical economics-although they
are, to be sure, very messy and problematic. Other assumptions like "Com-
modities are infinitely divisible" or "Agents have perfect information" have
(when taken to be claims about the world) narrower scope and are not re-
garded as assertions or discoveries of economics. Economists are pleased when
these simplifications can be relaxed. Although such simplifications are essen-
tial in most economic theorizing and are common constituents of neoclassical
models, they are not really assertions of economics nor are they, I suggest,
part of fundamental economic theory or of the fundamental assumptions of
equilibrium models. I think one can best understand what neoclassical eco-
nomics is by focusing on its fundamental laws or principles.
"Equilibrium theory" is my name for these fundamental laws or princi-
ples. It is helpful to divide them into four groups:
1) Utility theory: Individuals have complete and transitive preferences and
choose that option that they most prefer.
2) Economic preference: Individuals prefer "larger" commodity bundles
to smaller. Commodities possess diminishing marginal utility or dimin-
ishing marginal rates of substitution for all individuals.
3) Production: Increasing any input (other inputs held constant) increases
output at (eventually) a diminishing rate. Increasing all inputs in a cer-
tain proportion increases output in the same proportion. Entrepreneurs
or firms attempt to maximize profits.
4) Equilibrium: An equilibrium that reconciles the activities of individuals
(in which there is no excess demand on any market) exists.
Although utility functions are often immediately defined as ranging over
commodity bundles, it is helpful to recognize that utility theory is much more
general. It might be regarded as a way of making specific the idea that
people are instrumentally rational. Many economists regard it as defining
rationality. Utility theory is silent concerning the content of preferences and
does not imply that individuals are egoistic or that there is some sensation or
entity called "utility" which is the sole or ultimate goal of individual action.
GENERAL EQUILIBRIUM THEORIES 109

To say that agents are utility maximizers is to say no more than that they
do what they most prefer.
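A code-level caricature may make the point vivid: "maximizing utility" requires no utility substance at all, only a complete and transitive ranking together with choice of a most preferred available option. A minimal sketch (goods and ranking invented purely for illustration):

```python
# A complete, transitive preference ordering, least to most preferred.
PREFERENCE_ORDER = ["leisure", "bread", "opera"]

def prefers(a, b):
    # Completeness and transitivity hold by construction: any two
    # options are comparable via their positions in the list.
    return PREFERENCE_ORDER.index(a) > PREFERENCE_ORDER.index(b)

def choose(options):
    # "Utility maximization": pick a most preferred available option.
    best = options[0]
    for option in options[1:]:
        if prefers(option, best):
            best = option
    return best

assert choose(["bread", "leisure"]) == "bread"
```

Nothing in the sketch mentions egoism or a sensation of utility; the ranking could encode any content whatever.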
"Non-satiation", the generalization that individuals prefer more com-
modities to fewer, identifies the options that individuals face with commodity
bundles. It implicitly declares individuals to be self-interested or mutually
disinterested. All they care about is the absolute size of the commodity bun-
dle they wind up with. "Economic rationality" might be (and often implicitly
is) defined as utility theory plus non-satiation. Diminishing marginal utility
is sometimes thought (quite implausibly) to be part of economic rationality.
But it is simply an empirical generalization about people's preferences for
mixes of commodities.
Diminishing returns to a variable input is, like diminishing marginal
utility, a fairly well-founded empirical generalization. Constant returns to
scale, on the other hand, is one of the principles or laws with which economists
are least happy. More than any other in the list, it is included largely because
it is needed for mathematical proofs of the existence of equilibrium. Profit
maximization is a mare's nest of its own (see, for example, Friedman 1953).
Obviously there is something to it, but there is plenty of evidence of its
incorrectness.
The claim that an equilibrium exists might seem an odd candidate for
a fundamental "law" of neoclassical economics, since it is never, or virtually
never, stated as an assumption in neoclassical models. Instead, the existence
of equilibrium is something to be proven. But it is not something that inciden-
tally happens to be provable in a great many neoclassical models. The models
are constructed so as to permit one to prove that some sort of equilibrium can
obtain. Even though the proposition makes its explicit appearance typically
as a theorem, it remains a fundamental "law" of neoclassical economics.
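The content of this fourth "law" can be illustrated in miniature. The sketch below is a standard textbook two-good exchange example built on assumptions of my own choosing (Cobb-Douglas expenditure shares, one unit of each good), not anything asserted by the theories under discussion; it locates the relative price at which excess demand vanishes, an equilibrium of the kind whose existence the models are constructed to prove:

```python
# Two consumers, two goods x and y; consumer 1 owns one unit of x,
# consumer 2 one unit of y. With y as numeraire and p the price of x,
# consumer i spends the fraction a[i] of her wealth on x.
a = [0.3, 0.6]                      # expenditure shares (assumed)

def excess_demand_x(p):
    wealth = [p * 1.0, 1.0]         # endowment values at prices (p, 1)
    demand = (a[0] * wealth[0] + a[1] * wealth[1]) / p
    return demand - 1.0             # demand for x minus its supply

# An equilibrium price drives excess demand to zero; since excess
# demand here falls monotonically in p, bisection finds the root.
lo, hi = 1e-6, 100.0
for _ in range(60):
    mid = (lo + hi) / 2
    if excess_demand_x(mid) > 0:
        lo = mid                    # demand exceeds supply: raise p
    else:
        hi = mid
print(round((lo + hi) / 2, 4))      # 0.8571, i.e. a[1] / (1 - a[0])
```

By Walras's law the market for y clears at the same price, so this single condition pins down the equilibrium.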
The various constituent claims of equilibrium theory might be regarded
as the basic principles or laws that neoclassical economists have discovered.
Or, if one wants to postpone questions of assessment, one might regard them
merely as the fundamental assumptions in neoclassical models and leave aside
questions about the applicability of such models. They are not all equally
central and significant. Various simplifications such as perfect information or
infinite commodity divisibility will also be common constituents of neoclassi-
cal models, but, as mentioned above, such simplifications are not as essential
to neoclassical economics as are the propositions of equilibrium theory.
Taken as genuine assertions about the world, the four groups of propo-
sitions discussed above make up equilibrium theory, the fundamental theory
of neoclassical economics. They are an articulation of a basic vision of eco-
nomic life that was around long before neoclassical economics was. In that
vision, which can already be found in Adam Smith, individuals are thought
of as rational and self-interested and as interacting only through voluntary
exchanges. Smith and his intellectual descendants then sought to show how
the result of such exchanges is a systematic and beneficial organization of
the economy. To point out that equilibrium theory is an articulation of this
vision is not automatically to criticize it.
Fundamental models do not by themselves enable one to say much about
the world. What makes the basic equilibrium model significant is that it forms
the core of partial and general equilibrium analyses. In partial equilibrium
models, markets are assumed to be isolated from one another and there is
often (largely implicit) aggregation, as in the common assumption that there
are only two commodities. General equilibrium models often avoid such iso-
lating and aggregating assumptions and attempt to deal with the general
interdependence of markets-although, of course, truly heroic assumptions
are needed for the exercise. In any case, both partial and general equilibrium
models are augmentations of the basic equilibrium model, which are designed
to enable one to come to terms with specific practical or theoretical questions
(see Green 1981).
There are two quite different varieties of general equilibrium theories.
One of these is of practical use, while the other is quite abstract. The first
kind is exemplified by input-output models. By assuming, for example, that
there are constant production co-efficients and that demand will show special
constancies, one can set up a model of an economy with perhaps a hundred
different commodities and industries and, with the help of a computer, investi-
gate how it operates. Practical general equilibrium theories raise no questions
that do not arise equally with respect to partial equilibrium analyses.
Theories of the second kind, which I shall call "abstract general equilib-
rium theories", place no limitations on the interdependence of markets or on
the nature of production and demand beyond those implicit in the "laws".
When economists speak of general equilibrium theory, it is usually this ab-
stract variety that they have in mind. It is abstract general equilibrium theory
with which I am concerned in this paper. Given the abstractness and lack of
specification in abstract general equilibrium theory, many economists regard
it as the fundamental theory of contemporary economics. As the previous
discussion suggests, this seems a mistake. Equilibrium theory is the funda-
mental theory. General equilibrium theory is a particular application of the
fundamental theory.
What confuses matters is that applying equilibrium theory as the general
equilibrium theorists do serves no clear explanatory or predictive purposes.
Nor are these theorists attempting to develop a theory of a more specific
subject matter within economics. The stipulations they make concerning
information, markets and the like are ill-suited for any such purposes. The-
ories of intertemporal general equilibrium assert or assume that agents have
complete and accurate knowledge concerning the availability and prices of
commodities and concerning the production possibilities both in the present
and the future! They also stipulate that there is a complete set of commodity
futures markets on which present commodities (or titles to future commodi-
ties) can be freely exchanged for titles to future commodities of every kind
and date (see Koopmans 1957, pp. 105-26; Malinvaud 1972, ch. 10 and Bliss
1975, ch. 3). Since such claims render the theory so obviously either false or
inapplicable to real economies, little testing can be done. Furthermore, the
fact that reality does not satisfy, even approximately, such assumptions of the
theories leaves abstract general equilibrium theories with little if any predic-
tive worth. Given the falsity of stipulations such as perfect information, one
wants to know what the point is of abstract general equilibrium theories.
One further peculiarity of abstract general equilibrium theories is that
they take the form of existence proofs. One demonstrates that the axioms
(which include reformulations of the claims in the first three groups above
and stipulations or auxiliary hypotheses of the kinds discussed) are sufficient
conditions for the existence of an economic equilibrium. Abstract general
equilibrium theories thus seem to have the form of explanatory arguments
where the explanandum is the existence of an economic equilibrium. Yet
construing general equilibrium theories as explanations of economic equilibria
with various properties is implausible, since there is no fact of equilibrium to
be explained. 2 Such peculiar theories thus appear to be without explanatory
power. What then are they doing as such a prominent part of a supposedly
empirical science?
This is a difficult question upon which leading theorists disagree. Some
believe, mistakenly (as argued at greater length in ch. 7 of my 1981b) that
general equilibrium theories serve at least in part to explain prices (Debreu
1959, p. ix; Malinvaud 1972, p. 242). C.J. Bliss denies that abstract general
equilibrium theories 'represent reality', but claims that nevertheless they are
a good point of departure and a good guide to which concepts are central and
fundamental (1975, p. 301). Although Bliss's view suggests important truths,
it is misleading. Many economists, particularly when they are concerned
about how to justify their theories, are tempted to say that they only provide
some sort of logic of economic phenomena or that they are merely bags of
tools into which theorists dip when convenient. These claims have a certain
truth to them, which I have tried to capture by distinguishing equilibrium
theory from its applications and hinting at a distinction between models and
theories (see my 1981b, ch. 3). These claims do not, however, resolve problems
of justification. If an economic theory is only a logic or a bag of tools or a
guide to which concepts are central, we still need to ask whether it is a good
logic or a good bag of tools or a good guide. If, as in the case of general
equilibrium theories, there are no empirical applications, we have no way of
answering these questions.
There is, however, more to the attitude toward general equilibrium the-
ory that Bliss and others hold than the above argument recognizes. General
equilibrium theory may be of great heuristic value (see Green 1981). Al-
though heuristics is itself a complicated subject, one can show that general
equilibrium theories have been of heuristic value merely by showing that they
have in fact helped in developing valuable empirical economic theories. No-
tice that the heuristic value of general equilibrium theories is independent of
the existence proofs (the arguments) that such theories provide. Where gen-
eral equilibrium theories have been most valuable has been in the invention
of conceptual and mathematical devices (dated commodities, for example)
which are useful in other theories.
Yet it seems to me that the existence proofs that general equilibrium
theories provide are themselves also of value. Roy Weintraub argues that the
existence proofs show that the "hard core" of neoclassical economics, which
includes the claim that there are equilibrium states, is consistent, and that
without such proofs the general research strategy of neoclassical economics
would be futile (1985a, 1985b, esp. ch. 7). But the mere consistency of the
"hard core" propositions of neoclassical economics or of the "laws" listed
above (which embody these propositions) can be established in very simple
models and does not require the sophisticated mathematical work of the past
four decades.
My views are closest to those expressed by Frank Hahn and Kenneth
Arrow. They largely deny that general equilibrium theories say anything
about real economies, but they insist, rather unclearly, that they remain a
serious and valuable part of economics (1971, pp. vi-viii).
Since the Eighteenth Century many economists have believed that, given
reasonably favorable conditions, self-interested voluntary exchanges lead to
coherent and efficient economic organization. Yet the theories which econ-
omists have possessed have not enabled them to explain how this order
comes about nor even to show how it is possible that such order could come
about. Economic theorists might thus reasonably be in doubt concerning
both whether their theoretical framework captures the crucial features of the
economy and whether it is likely to lead them to an adequate theory. In
pursuing and developing equilibrium theory, will one ever be able to explain
how self-interested individual action within certain institutional constraints
can lead to coherent economic order? Do theorists really have a grip on the
most important and central economic regularities? Will economists ever be
able to understand whether the results of individual actions are truly efficient
and whether they lead to the achievement of other goals we might have?
In proving the existence of equilibria under various conditions, I take the
abstract general equilibrium theorists to be providing explanations in princi-
ple of the characteristics of possible (although imaginary) economic states. In
doing so they demonstrate that equilibrium theory is capable of explaining
at least some sorts of complicated economic equilibria, and thus they give
one reason to believe that economists are on the track of an adequate gen-
eral economic theory. This sort of an explanation of a possibility needs to
be distinguished carefully both from explaining 'How possibly?' in the sense
of Hempel and Dray and from any discussions of the feasibility of economic
equilibria. Hempel's and Dray's view is that sometimes things happen con-
trary to our expectations which need explaining (away) (see Hempel, 1965,
pp. 428-30). But economists are not trying to show that the existence of
equilibrium is consistent with prior beliefs. Nor, despite Hahn's claims (1973,
p. 324), are abstract general equilibrium theorists concerned with how or
whether a competitive equilibrium is practically possible or feasible. We do
not need all this theory to know that real semi-competitive capitalism does
not regularly achieve full employment. If one did need general equilibrium
theories for the purpose, they would not help anyway, since the existence
proofs that the theories provide show only what conditions are sufficient for
competitive equilibria, not what conditions are necessary.
The abstract general equilibrium theorists have shown that were the
world very much simpler than it will ever be, economists could use their laws
to explain in principle how economies work. If one regards the resemblances
between the imaginary worlds of the theories and actual economies as at all
significant, these demonstrations give us reason to believe, in Mill's words
(1843, Bk. VI, ch. III, sec. 1), that economists know the laws of the "greater
causes" of economic' phenomena. Theorists thus have reason to believe that
they are on the right track. We should regard the existence proofs as providing
this sort of theoretical reassurance, not as explanations. Note in addition
that these abstract general equilibrium theories may help to improve current
economics. By progressively weakening and complicating the stipulations
needed in order to demonstrate the existence of more complex equilibria,
economists come closer to being able to apply the theory to real economies.

NOTES

1 This paper derives from my 1981a and my unpublished 1982. Ed Green
provided useful criticisms during the lunch-time seminar where a version of this
paper was delivered.
2 This is an overstatement. Portions of economies may in exceptional circum-
stances approximate equilibria. On rare and special occasions general equilibrium
theories may thus be applicable and explanatory. If these theories have real impor-
tance, it is not for this exceptional applicability.

REFERENCES
Arrow, K. and F. Hahn (1971) General Competitive Analysis. San Francisco:
Holden-Day.
Blaug, M. (1980) The Methodology of Economics: Or How Economists Explain.
Cambridge: Cambridge University Press.
Bliss, C. (1975) Capital Theory and the Distribution of Income. Amsterdam: North
Holland.
Debreu, G. (1959) Theory of Value. New York: Wiley.
Friedman, M. (1953) "The Methodology of Positive Economics," pp. 3-43 of Essays
in Positive Economics. Chicago: University of Chicago Press.
Green, E. (1981) "On the Role of Fundamental Theory in Positive Economics,"
pp. 5-15 of J. Pitt, ed. Philosophy in Economics. Dordrecht: Reidel.
Hahn, F. (1973) "The Winter of Our Discontent," Economica 40:323-30.
Hausman, D. (1981a) "Are General Equilibrium Theories Explanatory?" pp. 17-32
of J. Pitt, ed. Philosophy in Economics. Dordrecht: Reidel.
Hausman, D. (1981b) Capital, Profits and Prices: An Essay in the Philosophy of
Economics. New York: Columbia University Press.
Hausman, D. (1982 unpublished) "The Conceptual Structure of Neoclassical Eco-
nomics," address at the 1982 meetings of the American Economic Association.
Hempel, C. (1965) Aspects of Scientific Explanation and Other Essays in the Phi-
losophy of Science. New York: Macmillan.
Koopmans, T. (1957) Three Essays on the State of Economic Science. New York:
McGraw-Hill.
Malinvaud, E. (1972) Lectures on Microeconomic Theory. Amsterdam: North-
Holland.
Mill, J.S. (1843) A System of Logic. Rpt. London: Longmans Green & Co., 1949.
Rosenberg, A. (1983) "If Economics Isn't a Science: What Is It?" Philosophical
Forum 14:296-314.
Weintraub, E.R. (1985a) "Appraising General Equilibrium Theories," Economics
and Philosophy 1:23-37.
Weintraub, E.R. (1985b) General Equilibrium Analysis: Studies in Appraisal. Cam-
bridge: Cambridge University Press.

Dan Hausman
Department of Philosophy
University of Wisconsin
Madison, WI 53706
EFFECTIVE EPISTEMOLOGY, PSYCHOLOGY, AND ARTIFICIAL
INTELLIGENCE

KEVIN KELLY

Introduction
In this paper, I discuss the epistemological relevance of computation the-
ory. First, I dispense with standard arguments against the epistemological
interest of so-called "discovery methods", which are procedures that generate
good hypotheses. Then I examine the importance of computational concepts
in the theory of justified belief. Finally, I compare the aims and methods
of a computationally informed epistemology with those of the related fields
of cognitive psychology and artificial intelligence. I conclude that artificial
intelligence is most interesting when viewed as an approach to effective epis-
temology rather than as an adjunct to cognitive psychology.
Hypothesis Generation
Prior to the Nineteenth Century, many philosophers, scientists, and
methodologists were interested in finding procedures that generate or dis-
cover knowledge. In his Posterior Analytics, Aristotle attempted to provide
an account of how to discover causes by finding missing terms in incomplete
explanations. Francis Bacon envisioned something like an industry for gener-
ating scientific knowledge. And many subsequent methodologists, including
the likes of Newton, Whewell, Herschel, and Mill, all believed that there exist
good, effective methods for making causal discoveries.
In the early Twentieth Century, however, the study of hypothesis gen-
eration procedures was largely abandoned in epistemology. There were some
good reasons for this shift in attitude. For one thing, philosophers had to con-
tend with the new edifice of modern physics. It was quite natural for them
to analyze the riches at hand rather than to study procedures for generating
more.
Moreover, many philosophers had become familiar with mathematical
logic and probability theory. These disciplines provide excellent tools for
the study of the structure and justification of scientific theories. But if hy-
pothesis generation methods are to be of use to beings like us, they should
be explicit procedures. And the proper setting for the study of procedures is
computation theory. But computation theory was largely unavailable to epis-
temologists in the first half of this century. Hence the formal tools available
to epistemologists during this period also directed them to study logical and
probabilistic relations of evidential support rather than effective methods for
generating hypotheses from evidence.
Epistemologists did not appeal to these practical reasons for postponing
the study of hypothesis generation methods. Rather, they attempted to prove
that it is a mistake for epistemologists to study such methods. Three distinct
strategies of argument were pursued.
(A) There are no good discovery methods to be found,
(B) even if there were, we should not use them, and
(C) even if it were not reprehensible to use them, such methods would be of
interest to psychologists, not to epistemologists.
(A) was seriously proposed by Rudolf Carnap, Karl Popper, and Carl
Hempel, (B) is supported by the philosopher Ron Giere and many frequentist
statisticians, and (C) was proposed by Popper and Hempel, and has been
revived recently by the philosopher Larry Laudan.
You Can't. So far as (A) is concerned, it is a non-trivial matter to prove
that a problem cannot be solved by a computer. And this assumes that
one can state with mathematical precision what the problem to be solved is.
Traditional advocates of (A) did not state the discovery problem they had
in mind with any precision; nor did they have the theoretical wherewithal
to prove a problem uncomputable. So (A) originated as a bluff, and has not
advanced beyond this state in the philosophical literature.1
You Shouldn't. There is, at least, a plausible argument for position (B).
1. Evidence only supports an hypothesis if it can possibly refute the hy-
pothesis.
2. But if a sensible procedure looks at the available evidence to construct
an hypothesis to fit it, then the evidence looked at cannot possibly
refute the hypothesis constructed to fit it.
3. Therefore, the evidence input to an hypothesis generator does not sup-
port the hypothesis generated.
4. But an hypothesis guessed without looking at the data could possibly
be refuted by the available evidence.
5. So if one wants hypotheses supported by the current evidence, one
should not use a generation procedure that looks at the data.

But the plausibility of this argument hinges on an equivocation over the sense
of "possibility" in premises (1) and (2).
Consider what it means in premise (1) for evidence to "possibly refute
an hypothesis." To say that some evidence can possibly refute a given hy-
pothesis is not to say that the relation of inconsistency possibly holds between
the evidence and the hypothesis. For notice that inconsistency is a logical
relation, which either holds necessarily or necessarily fails to hold. So if it
possibly holds, then it holds necessarily. But then the first premise says that
evidence confirms an hypothesis only if this evidence (necessarily) refutes the
hypothesis, which is absurd.
I propose, instead, that the sense of "possibly refutes" in premise (1) is
this: our actual evidence supports our hypothesis only if we might possibly
have sampled evidence that (necessarily) refutes the hypothesis, where our
evidence collection procedure is fixed over all possible worlds. Under this
construal, premise (1) is plausible. For suppose I test my hypothesis that "all
ravens are black" by using a flying scoop that collects only black things. There
is no possible world in which this procedure collects evidence that refutes my
hypothesis, and the evidence collected does seem unpersuasive. The same can
be said about a procedure that can collect only uninformative or tautologous
evidence. Such evidence is consistent with any hypothesis, so our procedure
cannot possibly produce evidence that contradicts our hypothesis. So when
we talk about evidence possibly refuting an hypothesis, we are really talking
about dispositions of our evidence gathering procedure.
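This dispositional point can be made concrete. The following sketch (my own illustration, with invented worlds and predicates; nothing here is from the text) contrasts a biased collector, which cannot gather a refuting instance of "all ravens are black" in any possible world, with an unbiased collector, which can do so in any world that contains a refuter:

```python
# A minimal sketch (hypothetical worlds and predicates) of the
# dispositional reading of "possibly refutes": whether evidence can
# refute depends on the collection procedure, held fixed across worlds.

def refutes(evidence):
    """Evidence refutes 'all ravens are black' iff it contains a
    non-black raven."""
    return any(kind == "raven" and color != "black"
               for kind, color in evidence)

def black_scoop(world):
    """A biased collector: the flying scoop that gathers only black things."""
    return [(kind, color) for kind, color in world if color == "black"]

def open_scoop(world):
    """An unbiased collector: gathers whatever the world contains."""
    return list(world)

worlds = [
    [("raven", "black"), ("shoe", "brown")],
    [("raven", "white"), ("raven", "black")],   # a world with a refuter
]

# The biased scoop cannot possibly collect refuting evidence ...
print(any(refutes(black_scoop(w)) for w in worlds))   # False
# ... while the open scoop can, in any world where a refuter exists.
print(any(refutes(open_scoop(w)) for w in worlds))    # True
```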
But now consider premise (2). When we say that the output of a hypoth-
esis generator that uses the evidence to patch together an hypothesis cannot
possibly be refuted by the input evidence, what we mean is that the proce-
dure is sure never to output an hypothesis refuted by its input, regardless of
what the input is. That is, the computational structure of the procedure ne-
cessitates the consistency of the output with the evidential input. The same
can be said of premise (4).
So the argument equivocates between properties of evidence gathering
procedures and properties of hypothesis generation procedures. To clarify this
point, consider a procedure that produces an unrefuted hypothesis for any
consistent evidence. Now collect some evidence and feed it to the procedure.
Eventually, an hypothesis pops out. Suppose further that the evidence gath-
ering procedure is unbiased with respect to the hypothesis produced: that
is, it can collect evidence that refutes the hypothesis in any world in which
there is such evidence. In this case we can say that the evidence might have
refuted the hypothesis produced because we might have gathered evidence
that refutes it, despite the fact that the generation procedure never produces
an output that is refuted with respect to its input. Hence, an hypothesis
produced by a generation procedure that peeks at the data can nonetheless
be supported by this data, at least so far as premise (1) is concerned.
There are other arguments against the reliance on discovery methods.
These include the familiar claim that test statistics are "biased" if one looks at
the data used in the test and the claim that such methods are too unreliable.
The response to the first objection is that test statistics mean the same thing
(from a frequentist point of view) when a discovery procedure employs them
as when a human uses them. In neither case is the test actually repeated
forever. In both cases, a new hypothesis is conjured as soon as the first is
rejected. Of course, it would be a mistake to confuse the test significance
level with the mechanical procedure's probability of conjecturing the truth,
but it would be equally mistaken to confuse the human's reliability with the
significance level of his tests. The response to the second objection is similar.
The reliability of humans dreaming up hypotheses and employing statistical
tests is unknown, but can probably be exceeded by the reliability of well-
designed formal methods.2
Who Cares? Finally, consider claim (C), that theory generation proce-
dures are of interest to psychologists but not to epistemologists. The usual
argument for this position is that the epistemologist is interested only in the
justification of hypotheses, not in their discovery. How an hypothesis hap-
pens to be discovered or constructed is a mere matter of fact, whereas one's
justification in believing it is a normative, philosophical issue (Popper, 1968).
But it is equally true that what people happen to believe is a mere matter
of fact, while deciding which discovery method one ought to choose is a nor-
mative, philosophical issue. In general, what people ought to believe and the
scientific methods they ought to choose are both normative, epistemological
questions, whereas how they happen to dream up conjectures or to come to
believe them are psychological questions.
Of course, it would be another matter if one could show that no interest-
ing normative issues arise in the choice of a method for generating good the-
ories. But of course there are many such issues. What sorts of inputs should
an hypothesis generator receive? Should an adequate method converge to
the truth on complete evidence? If so, how shall we construe convergence?
Must the method know when convergence is achieved? Should the method's
conjectures always be confirmed with respect to the input evidence? Should
its conjectures always be consistent with the evidence? Should it maintain
coherence among one's beliefs? Should a finite being be capable of following
it? On the face of it, these are all interesting normative questions concerning
hypothesis generation methods.
Finally, proponents of (C) might propose that the answers to such ques-
tions are all parasitic on finding an adequate theory of confirmation. Hence
the study of theory generating procedures is not irrelevant to epistemology,
but it is nonetheless redundant.3
But this position is also implausible, for there are obvious criteria of
evaluation for hypothesis generators that cannot be maximized jointly with
the aim of producing only highly probable or highly confirmed hypotheses.
One thing we might desire in a method is that it converge to the truth in a
broad range of possible worlds (Putnam, 1963). Another is that it be effective
so that it can be of use to us. But Scott Weinstein (Osherson et al., 1986) has
demonstrated that there is an effective discovery method that can converge to
the truth in more possible worlds4 than any effective method which generates
an hypothesis of maximal posterior probability on any given evidence.5 That
is, producing probable hypotheses is at odds with effectiveness and converging
to the truth in a broad range of possible worlds. It is also known that for
some problems, a method that sometimes conjectures hypotheses inconsistent
with the input evidence can converge to the truth in more worlds than any
method whose conjectures are always consistent with the evidence. It is clear
from these examples that theories of rational belief can be at odds with other
criteria of evaluation for discovery methods. Hence, all interesting normative
questions concerning such methods need not be reducible to questions about
probability and confirmation.
Rational Belief
So far, I have argued that computation theory is essential to the study
of theory generation methods and that the study of theory generation is an
abstract, normative topic suitable for philosophical investigation. But com-
putation theory also has direct relevance to standard theories of hypothesis
evaluation. It can show that proposed norms of hypothesis evaluation are
in conflict for beings with limited cognitive resources. For example, some
Bayesians like to issue the following Three Commandments to the teeming
multitude:

1. Thou shalt be coherent (i.e., thou shalt distribute thy degrees of belief
as a probability measure on Ye algebra of propositions).
2. Thou shalt modify thy degrees of belief by conditionalization on Ye
evidence.
3. Thou shalt not be dogmatic (i.e., thou shalt not believe Ye contingent
propositions to Ye degree 1 or to Ye degree 0).
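On a finite outcome space, the three commandments can be stated operationally. A minimal sketch (my own names and numbers, purely for illustration):

```python
# A minimal sketch (hypothetical example) of the three Bayesian norms
# on a finite outcome space.

def coherent(belief):
    """Coherence: non-negative degrees of belief summing to one, so that
    events receive the induced probability measure."""
    return all(p >= 0 for p in belief.values()) and \
        abs(sum(belief.values()) - 1.0) < 1e-9

def nondogmatic(belief):
    """Non-dogmatism: no contingent outcome gets degree 0 or 1."""
    return all(0.0 < p < 1.0 for p in belief.values())

def conditionalize(belief, evidence):
    """Conditionalization on an event (a set of outcomes with positive
    prior probability)."""
    total = sum(p for w, p in belief.items() if w in evidence)
    return {w: (p / total if w in evidence else 0.0)
            for w, p in belief.items()}

prior = {"rain": 0.3, "snow": 0.1, "clear": 0.6}
assert coherent(prior) and nondogmatic(prior)

posterior = conditionalize(prior, {"rain", "snow"})
print(posterior)            # {'rain': 0.75, 'snow': 0.25, 'clear': 0.0}
print(coherent(posterior))  # True; note the update itself is dogmatic about 'clear'
```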
The advantage of the first commandment is that it prevents one from
taking bets he must lose, and the advantage of the second two is that anyone
who follows them will (in his own opinion anyway) converge to the truth on
increasing, complete evidence. But Haim Gaifman (Gaifman & Snir, 1982)
has shown the following. Assuming that one can talk about arithmetic, and
assuming that one is coherent, one's degrees of belief become increasingly
impossible to compute as one becomes less dogmatic. That is, for any com-
putational agent, avoiding hopeless bets is at odds with convergence to the
truth. This is a computational result that should be of interest even to epis-
temologists who ignore questions of theory generation. What might have
been taken to be mutually reinforcing reasons to be a Bayesian are actually
conflicting desiderata for real agents.
But some philosophers may insist that epistemology concerns norms for
ideal agents rather than for merely actual ones. And unlike every system
of interest to man-including himself, robots, animals, and even aliens from
other planets-an "ideal agent's" logical and mathematical ability is unlim-
ited by the results of computation theory. It is fair to ask why epistemology
should be for such peculiar agents, who are limited like all ordinary agents in
one respect (observationally) but not in another (cognitively).
One proposal is that ideal agents are an abstraction, and that ab-
stractions are a practical necessity in all formal reasoning. Messy and ill-
understood factors may be disregarded in favor of elegant, formal principles.
But if every system in which we are interested seems to have a limitation,
and if we have an elegant formal theory about this limitation, as we do in the
case of computability, then the abstraction argument suggests more of sloth
than of virtue.
A stronger negative position is that epistemology is the study of justified
belief for any agent whose observational power is limited. Since the limitations
of ideal agents transfer to any agent with other limitations, epistemological
results for ideal agents are more general, and therefore more interesting than
those for computationally limited agents.
While it is true that the limitations on ideal agents are limitations on all
more limited agents, it is false that any solution to the problems of an ideal
agent is a solution to the problems of a real one. In Gaifman's theorem, for
example, we see that the Bayesian Commandments cannot both be observed
by a real agent. So generality of applicability of epistemological principles is
no reason to focus on ideal agents.
A third position is that although ideals are not themselves achievable
by real agents, they are normative in the sense of promoting the realizable
actions that better approximate them. This is fair enough, but notice that
it is no argument for ignoring computation in epistemology. Rather, it is an
invitation to characterize the sense in which a proposed ideal can be effectively
approximated.
An adequate theory of approximating an ideal must have at least two
parts. First, there must be a concept of distance from the ideal so that some
acts can be determined to be further from the ideal than others. Second,
there must be a motivation for achieving better approximations of the ideal
that is a natural generalization of the motivation for achieving the ideal itself.
If there is no well-motivated account of effective approximation for a proposed
ideal, then the ideal is not normative in the sense of guiding action, for either
there is no way to tell which actions are better than others, or there is no
motivation for calling the better actions better.
For example, consider the Bayesian ideal of coherence. It turns out
that there are hypothesis spaces over which there are countably additive
probability distributions, but no computable, countably additive probabil-
ity distributions.6 So coherence is an unachievable ideal when one entertains
certain classes of hypotheses. The question is then how to approximate co-
herence in such a case. And whatever the metric of approximation, it had
better be the case that better effective approximations of coherence yield
more benefits analogous to immunity to Dutch book. So we also need to
invent something like "degrees of Dutch book", and to show that better de-
grees of approximation to coherence achieve lower degrees of Dutch book.
Without such a theory, coherence over rich hypothesis spaces is a fatuous,
non-normative ideal. The moral is this: if you insist on advising the pub-
lic to hitch its wagon to a star, you ought to provide a way to characterize
which broken wagons came closer to the stellar rendezvous than others and
to explain why coming closer is better than not trying at all.
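The benefit at stake, immunity to Dutch book, is itself easy to exhibit in miniature. A sketch (hypothetical numbers, not from Gaifman's paper): a bettor whose degrees of belief in an event and in its negation sum to more than one will accept a pair of bets that lose money no matter what happens.

```python
# A minimal sketch (hypothetical numbers) of a Dutch book against an
# incoherent bettor whose beliefs in A and not-A sum to more than 1.

belief_A, belief_not_A = 0.7, 0.6       # incoherent: 0.7 + 0.6 > 1

# At odds the bettor regards as fair, a bet with stake 1 on an event
# costs (degree of belief) * 1 and pays 1 if the event occurs.
cost = belief_A + belief_not_A          # 1.3 paid up front for both bets

for A_occurs in (True, False):
    payoff = 1                          # exactly one of the two bets pays
    print(A_occurs, payoff - cost)      # -0.3 either way: a sure loss
```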
Effective Epistemology and Psychology
Since humans believe, reason, and attempt to justify their beliefs, and
since psychologists intend to explain these facts with computational models,
effective epistemology and cognitive psychology can appear quite similar. But
similarity is not identity. The epistemologist's aim has often been to improve
the human condition rather than to describe it.7 It is easy to see that epis-
temologists can evaluate lots of possible methods for testing and generating
hypotheses or for altering degrees of belief even if no human ever uses them.
But when we move from the question of ends to the question of means,
the relationship between psychology and epistemology becomes more tangled.
For if past success is any evidence of a heuristic principle's soundness, then
to the extent that humans are successful in science, it would seem fruitful
to codify and to recommend the heuristics that successful humans use. And
these principles may be covert, in the sense that no human could "find" them
in an introspective exercise. So to find them, the epistemologist would have
to resort to the usual experimental and theoretical techniques of cognitive
psychologists. On this approach, epistemology can look just like empirical
psychology, even though its aims are quite distinct. Herbert Simon is a
notable proponent of this conception of epistemological method.
There is nothing wrong with beginning effective epistemology with the
study of covert human methods, provided the empirical problems involved
are not too onerous. There are weak evolutionary and empirical arguments
in their favor as a place to start in the search for good techniques. But
these arguments are just the starting point for an epistemological analysis of
these techniques. For example, one might plausibly expect Einstein's implicit
search heuristics to be good ones for generating scientific theories, for he did,
after all, invent relativity theory. But it is possible that his commitments to
field theory and to symmetry conditions made him a very inflexible theory
generator. Perhaps he would have been a crank in any possible world with a
messier space-time structure than ours.
It does not suffice to reply that success in the actual world is all that
counts. First of all, one only uses a theory generator when one does not know
which possible world he is in. Hence, a method with strong a priori com-
mitments to one world in the set is undesirable from the point of view of the
user even if its commitment happens to be true. Moreover, each new subject
matter in our universe presents us with an entirely new hidden structure to
contend with. To succeed in theorizing about these different subject matters,
we must essentially succeed in decoding many distinct possible structures.
The structure of cognition is a world apart from the structure of a cell's
chemical pathways, which in turn is a world apart from the dynamical struc-
ture of a galaxy. Hence, past success in one subject matter is not persuasive
evidence of future success in others, unless one knows on formal grounds that
one's method is general.
A lucky crank can look like a genius in his subject without being one.
One difference between the crank and the genius is that the latter is more
flexible than the former. A more flexible agent can succeed in a broader
range of possible circumstances. Success in a world involves several factors,
some of which are in conflict. Would the generator's conjectures fit the input
evidence that might be encountered in this world, or are there inputs for
which its hypotheses are vacuous, irrelevant, or false? Can the procedure
converge to the truth in the world in question? Another aspect of rational
method is efficiency, for an inefficient method costs more to yield the same
benefit. So another important question is whether there exist much faster
methods that are as flexible as the method in question.
Questions of the efficiency, generality, and correctness of algorithms are
just the sorts of questions computation theory can address in a mathematical
manner. If a discovery algorithm appears to work pretty well over a class of
trials, epistemologists should feel under some obligation to prove some facts
about its scope and limits, its strong suits and blind spots. Psychologists are
under no such obligation, although such analyses may facilitate the explana-
tory and predictive power of psychological theory.8
Effective Epistemology and Artificial Intelligence
Artificial intelligence has emerged from its academic honeymoon. The-
oretical computer scientists already view it as a kind of clean witchcraft, in
which eyes of newts are mixed with warts of toads-but only symbolically. A
little pseudo-probability here, a little Aristotelian metaphysics there, and a
good deal of unintelligible hacking to hold it all together, and voila! Franken-
stein's monster surprises its creator as it cranks through its unexpected be-
haviors.
When pressed regarding the lack of theoretical depth in the field, many
AI proponents slide into the posture of cognitive modelling. Since the brain's
procedures may be messy or poorly motivated, why shouldn't cognitive mod-
els be the same way? But in this view, the absence of psychological evidence
in most AI articles may raise the questioning eyebrows of cognitive psycholo-
gists, whose models are usually less detailed, but whose empirical arguments
are often very sophisticated. But AI is a much more creditable study when
it is interpreted as effective epistemology. Like epistemologists (but unlike
psychologists), AI technicians can pursue interesting studies in blissful igno-
rance of how humans actually work. If an AI programmer were to develop a
super-human discovery engine that can demonstrably satisfy various method-
ological criteria, he would be overjoyed. So would an epistemologist. And an
AI enthusiast, like an epistemologist, has no qualms about saying that you
ought to use his method.
And like epistemologists (but unlike computation theorists), AI techni-
cians rarely have clear conceptions of the problems their procedures are to
solve. This is not so much a shortcoming in AI practice as a fundamental fact
about the vague subject matter of the field. In computation theory, a problem
is just a mathematical function. A program solves a problem (function) just
in case it computes it. So the natural order of business in computation theory
is to define a function extensionally, and to decide whether it is computable
or not, how impossible it is to compute if it cannot be computed, and how
long it takes to compute if it can be computed.
The usual technique in artificial intelligence is quite different. A "prob-
lem" is a vaguely specified area of human competence-say, learning. Once a
problem area is specified, the AI programmer typically begins to play around
with data structures and procedures that seem to do what humans take them-
selves to be doing when they address problems in the relevant area. And if
the resulting procedures take a lot of time to run, the offending subroutines
are altered until they run quickly---even if this alters the input-output be-
havior of the overall system. The final result is a program that runs large
cases on a real computer and that goes through steps that seem reasonable
to humans, but whose overall input-output behavior may be quite unknown
to the designer.
From the computation theorist's point of view this way of proceeding
appears underhanded. Whenever the problem to be solved by his current
program becomes too difficult, the AI programmer changes the program until
it solves an easier problem-and calls the result progress. The computation
theorist accepts no progress other than solving the same (mathematically
precise) problem in a less costly way. And finding a moderately efficient
algorithm for an easier problem should never be confused with finding a very
efficient solution to a given problem.
But if the AI programmer does not usually make mathematical progress,
his approach can make a kind of epistemological progress. If the view pro-
pounded in this essay is correct, epistemologists should strive for rational,
effective methods for discovering, testing, and maintaining theories. And in
searching for such methods, one may either examine effective methods that
seem plausible to see whether they are rational, or one may first propose con-
straints on rational behavior and then analyze the computational difficulties
of these behaviors. At worst, the AI approach of starting with procedures
can lead to a rambling, unintelligible program that runs on a computer but
which carries our understanding of the kinematics of rational belief not one
whit further. At best, it focuses attention on principles that can possibly be
normative for real robots and people. On the other hand, the standard, philo-
sophical approach of proposing abstract principles of rationality can lead to
non-normative, inapproximable ideals which, in the phrase of Hilary Putnam,
are "of no use to anybody". At best, it focuses attention on the motivation
for a method, rather than on getting the method to run on a computer.
Prospects
I have argued that effective methods of discovery and hypothesis eval-
uation are not only acceptable objects of epistemological study, but are its
proper objects-so far as physically possible beings are concerned. One can
begin the study in various ways. One can attempt to discover the human
methods as a bench-mark and then evaluate and improve them; one can de-
sign computer programs that seem to perform intelligently and then analyze
them; or one can define abstract criteria of adequacy and subsequently search
for procedures that satisfy them. The first approach involves the techniques
of cognitive psychology, the second is the approach of artificial intelligence,
and the third is the standard approach of epistemologists.
Some logicians and computer scientists have already been busy bridging
the gap. For example, there is an extensive interdisciplinary literature span-
ning the fields of recursion theory, linguistics, statistics and logic that focuses
on the ability of discovery methods to converge to the truth in various classes
of possible worlds.9 These studies juggle the desideratum of convergence to
the truth with those of effectiveness, complexity, verisimilitude and confirma-
tion. Other research centers on the effectiveness of probabilistic norms, as we
have seen in the case of Gaifman's paper.
The cross-fertilization of computer science and epistemology is still in
its infancy, and the prospects for discoveries are still good. My own research
concerns computational issues in the discovery of complete universal theories.
Another natural project would be to find a well motivated theory of effec-
tively approximable Bayesian coherence. Still another would be to investigate
the computational complexity of non-dogmatic coherence over propositional
languages. Only an artificial disdain for practicality prevents a revolutionary,
computational reworking of epistemology. The tools are ready and waiting.

NOTES

1 An exception is Hilary Putnam's "'Degree of Confirmation' and Inductive
Logic". My response to this argument may be found in The Automated Discovery
of Universal Theories.
2 For a more thorough discussion, see Glymour et al., 1986.
3 Larry Laudan has made roughly this point.
4 In the sense of set inclusion.
5 A method is said to be Bayesian if its conjecture always has maximal pos-
terior probability on the input evidence.
6 Proof: Let propositions be partial recursive functions, and let hypotheses be
indices drawn from an acceptable numbering (Rogers, 1967) of the partial recursive
functions. Hence, any two indices of the same function are equivalent, and must
therefore be assigned the same probability values. Suppose for reductio that P is a
computable, countably additive probability distribution on the partial recursive
functions. P cannot be uniform, for there is a countable infinity of partial recursive
functions, and if P were to assign each function f the same value r, the countable
sum over all functions would be either zero or unbounded, not one. Let i be an
index and let φi be the ith partial recursive function. Let [i] be the set of all j such
that P(j) = P(i). Since P is not uniform, i can be chosen so that [i] is neither the
set of all indices nor the empty set. Since P is computable, [i] is a recursive set (on
input k, just compute P(k) and P(i) and see whether the results are identical). But
notice that [i] is the set of all indices of some non-universal and non-empty subset
of the partial recursive functions. By Rice's theorem (Rogers, 1967), no such set of
indices is recursive. Hence, P is not effective. Q.E.D.
7 Quine's views being a notable exception.
8 As a case in point, consider the application of the theory of learnability to
linguistics by Kenneth Wexler (Wexler & Culicover, 1983).
9 For a good survey, see (Angluin, 1980).

REFERENCES
Angluin, D. (1980) "Finding Patterns Common to a Set of Strings," Journal of
Computer and System Sciences, 21:46-62.
Gaifman, H. & Snir, M. (1982) "Probabilities over Rich Languages, Testing and
Randomness," Journal of Symbolic Logic, 47:495-548.
Glymour, C., Kelly, K., Scheines, R. & Spirtes, P. (1986) Discovering Causal Struc-
ture: Artificial Intelligence for Statistical Modelling. New York, NY: Academic
Press.
Kelly, K.T. (1986) The Automated Discovery of Universal Theories. Ph.D. Thesis,
University of Pittsburgh.
Osherson, D.N., Stob, M. & Weinstein, S. (1986) Mechanical Learners Pay a Price
for Bayesianism.
Popper, K.R. (1968) The Logic of Scientific Discovery. New York, NY: Harper &
Row.
Putnam, H. (1963) "'Degree of Confirmation' and Inductive Logic," in P.A. Schilpp
(ed.), The Philosophy of Rudolf Carnap. LaSalle, IL: Open Court.
Rogers, H. (1967) Theory of Recursive Functions and Effective Computability. New
York, NY: McGraw-Hill.
Wexler, K. & Culicover, P.W. (1983) Formal Principles of Language Acquisition.
Cambridge, MA: MIT Press.

Kevin Kelly
Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
comprehensively, and formally in our conceptual models of them. One can
properly say that the central problem of "computational epistemology" is pre-
properly say that the central problem of "computational epistemology" is pre-
cisely to understand how mind, its power so mismatched with the complexity
of what it is trying to grasp, can succeed even in a gross and approximate
way in dealing with its external environment. And we have only that same
imperfect, bounded human rationality (and the aid computers can give us)
as our tool for building epistemological theories. Recognition of this state
of affairs should instill in us modesty about how much of the complexity of
these phenomena we are likely to capture in formal theorems, and how much
in less tidy theories arrived at through painstaking empirical observation and
experimentation. And an examination of the history of the other sciences
might even persuade us that the formalization of a theory is often a (very
useful) cleanup operation that can be performed after the shapes of theories
have been discerned by observation.
Professor Kelly asserts that "if a discovery algorithm appears to work
pretty well over a class of trials, epistemologists should feel under some obli-
gation to prove some facts about its scope and limits, its strong suits and
blind spots." Investigate its scope and limits, its strong spots-yes. Prove
theorems about them-maybe. I confess that I feel only a mild obligation of
this sort since I do not want to restrict my knowledge, even about theoreti-
cal matters, to what I can capture in formally proved theorems. One of the
things the past thirty years have taught us is that epistemology can be an
empirical as well as a mathematical science.
Empirical scientists should not allow themselves to be cowed by epithets
like "clean witchcraft" or "unintelligible hacking." It is time that we look to
the natural sciences-to physics, chemistry and especially to biology-for our
models, and not accept the unsubstantiated and unsubstantiable claim that
theorem proving is the royal road to theoretical knowledge about epistemology
or any other subject.

NOTES

1 And "theory" is not at all limited to those things that can be formally
demonstrated-as distinct from those that are verified empirically. Readers may
be surprised that I include mathematics-and philosophy for that matter-in my
claim. But one need only recall the thousands of hours that such giants as Euler and
Newton spent "playing" with numbers in their search for theorems in number theory
and combinatorial arithmetic to recognize what a large role empirical investigation
has played in the development of mathematics. A contemporary example is the
use of the computer by Mitchell Feigenbaum and others to discover the surprising
behavior of simple non-linear differential equations in the transition between laminar
and turbulent, or chaotic, behavior, a precursor to building an elegant new formal
theory for the phenomena thus discovered empirically.
THE FLAWS IN SEN'S CASE AGAINST PARETIAN LIBERTARIANISM

JONATHAN PRESSLER

Introduction
Every society is characterized by a system of social institutions. We re-
gard some of these systems as superior to others. But no one who thinks very
long about social institutions believes that any currently existing system is
optimal. When we try to explain why existing systems are deficient, we usu-
ally appeal to normative principles that formulate conditions which we think
any fully adequate social system should satisfy. There are, of course, differ-
ences of opinion over these principles. Some social and political philosophers
advocate adequacy conditions that others dispute. However, such disagree-
ments usually take place in the context of a shared assumption that it is
possible (at least in principle) for some system of social institutions to meet
all of the adequacy conditions that need to be imposed on such systems.
But is there any internally consistent set that contains all of the ade-
quacy conditions that need to be imposed on systems of social institutions?
Over the past 25 years, social choice theorists have proved a number of sur-
prising theorems which suggest that there is inherent conflict even among
the normative principles that all social philosophers would find it reasonable
to apply to institutional systems. A series of such theorems, for example,
seem to suggest that there is a deep inconsistency between the following two
propositions:
(I) A society's institutions ought to guarantee that all members of society
have some things that they are free to do, or not do, as they please.
(II) If everyone in society prefers one available alternative to another, soci-
ety's institutions should not permit the realization of the latter alter-
native.
This apparent inconsistency, first discovered by Amartya Sen, has come to
be known as the Paradox of the Paretian Libertarian.1 A Paretian (after the
Italian economist Vilfredo Pareto) is someone who subscribes to proposition
(II). A libertarian, broadly construed, is someone who subscribes to proposi-
tion (I).
Although Sen's unquestionably valid formal results may seem to show
that Paretian libertarianism is an untenable doctrine, I shall argue that they
do not really support this conclusion. More explicitly, I shall try to show that
whenever Sen posits a Paretian-libertarian conflict to explain an apparently
troubling result in social choice theory, the difficulty can be better dealt
with either by claiming that the theorem in question imposes overly strong
background conditions on social choice mechanisms or by claiming that it
relies on an unacceptable construal of individual liberty.
Preference Profiles, Institutional Systems, and Social Choice
Functions
In the course of my discussion, I shall be using the following expressions
as technical terms of art: "preference profile", "system of social institutions",
"social choice function". Briefly stated, a preference profile is a sequence of
orderings that models the preferences of society's members over a fixed set of
alternative outcomes. Each ordering in the profile represents the preferences
of a particular individual. A system of social institutions is a mechanism
that assigns social choice functions to preference profiles. These social choice
functions are rules that select a non-empty set of "best" outcomes from the
class of available outcomes. Such a set of "best" outcomes is standardly called
a choice set.
An example involving an election between two candidates, Cain and
Abel, will serve to illustrate the connection between preference profiles, social
institutions and social choice functions. Suppose that our system of social
institutions stipulates that the candidate preferred by a simple majority of
voters is to hold office. In that case, if the majority prefer Cain to Abel, our
system of institutions will select a social choice function that identifies {Cain's
holding office} as the set of best outcomes. On the other hand, if a majority
of voters prefer Abel to Cain, the choice function that our institutions select
will make {Abel's holding office} the choice set. (Notice that majority rule
itself should not be construed as a social choice function. Rather, it is an
institution that places constraints on the selection of choice functions.)
Sen's Original Theorem
Armed with the foregoing introduction to basic terminology, let us turn
our attention to the theorem that originally led Sen to claim that there is
a paradoxical conflict between libertarianism and Paretianism. This theo-
rem says that four apparently plausible acceptability constraints on social
institutions cannot jointly be satisfied by any system of institutions. The
first of these constraints is the Unrestricted Domain Condition-for short,
"Condition U".

Definition 1 Condition U is satisfied by a system M of social institutions
if, and only if, M assigns a social choice function to every preference
profile.
In other words, to say that a system of social institutions ought to satisfy
the Unrestricted Domain Condition is to say that, no matter what people's
individual preferences happen to be, the institutional system ought to specify
a choice function. The basic idea behind Condition U is very simple: an ideal
system of social institutions is one that can respond to all situations that
might arise in society. Thus, if we assume that such a system will contain a
set of election rules, this set should ensure that, no matter who is running or
what people's preferences are with respect to the candidates, someone will be
elected to office.
The second acceptability constraint that plays a role in Sen's origi-
nal theorem is the Binary Choice Condition-or, to borrow Tom Schwartz's
acronym, "BICH". 2 Although BICH is a constraint on institutional systems,
binariness itself is a property of choice functions. A social choice function
is binary when, and only when, the alternatives that it deems best in any
given set S of available outcomes are just those which are not bested in any
pair-wise comparison between members of S.

Definition 2 A system of social institutions satisfies BICH just in case it
assigns only binary choice functions to preference profiles.

So, if there is an election in an institutional system that satisfies BICH, the
result of the election will be the same as the one that would have come about if
we had run an exhaustive series of pair-wise elections between the candidates
and had selected just those candidates who didn't get defeated in any of these
pair-wise contests.
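Binariness is easy to render as a small program. A minimal sketch (my own encoding; simple majority rule stands in for the pair-wise comparisons, as in the Cain and Abel election above):

```python
# A minimal sketch (hypothetical encoding) of a binary choice function:
# the best options in S are those not bested in any pair-wise
# comparison between members of S.

def beats(x, y, profile):
    """x bests y when a strict majority of the orderings in the profile
    rank x above y (simple majority rule, for illustration)."""
    pro = sum(1 for order in profile if order.index(x) < order.index(y))
    return pro > len(profile) - pro

def binary_choice(available, profile):
    """The choice set: alternatives unbeaten in every pair-wise contest."""
    return {x for x in available
            if not any(beats(y, x, profile) for y in available if y != x)}

# Three voters, most preferred candidate listed first; the binary choice
# set over {Cain, Abel} reproduces the simple-majority election above.
profile = [("Cain", "Abel"), ("Cain", "Abel"), ("Abel", "Cain")]
print(binary_choice({"Cain", "Abel"}, profile))   # {'Cain'}
```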
The final two constraints that enter into Sen's original theorem are the
ones that appear to be most central to the Paretian-Libertarian Paradox.
The less complex of the two is the Weak Pareto Condition-or, more simply,
Condition P.

Definition 3 A system of social institutions satisfies Condition P if, and
only if, it assigns choice functions in such a way as to guarantee that x
is the sole member of the choice set in any context where x and y are
the only available alternatives and everyone strictly prefers x to y.

Notice that Condition P only places constraints on the content of choice sets in
contexts where there are exactly two available alternatives. Thus, by itself, P
constitutes an extremely weak demand. However, it becomes somewhat more
robust when it is combined with BICH. If an institutional system satisfies
both BICH and P, then, no matter how large the set of available alternatives
happens to be, if it contains both x and y and everyone prefers the former to
the latter, y will be excluded from the choice set.
The fourth and final constraint that Sen appeals to in his original the-
orem is Condition L, the Minimal Libertarian Condition. L can best be
explained in terms of a concept called "decisiveness". An individual i is deci-
sive for a pair of alternatives (x, y) just in case x is guaranteed to be the best
alternative in the set {x, y} if i strictly prefers x to y. In other words, i is
decisive for (x, y) in a given system M of social institutions if, and only if, in
every choice context where i strictly prefers x to y and no other alternatives
are available, M selects a choice function that picks x as the best available
alternative. The same individual i is said to be both ways decisive for (x, y)
in institutional system M if, and only if, he is decisive for that pair in M and,
in addition, he is decisive for (y, x) in M.
Given the notion of bi-directional decisiveness, Minimal Libertarianism
can be formulated very succinctly:

Definition 4 A system of social institutions satisfies Condition L if, and only
if, it assigns choice functions in such a way as to guarantee that there
are at least two individuals i and j such that i is both ways decisive for
a pair (x, y) and j is both ways decisive for a pair (w, z).

Like Condition P, Condition L only restricts the content of choice sets
in contexts where there are exactly two available alternatives. Once again,
however, more substantial implications emerge when we add BICH to our list of
institutional constraints. In order to state these implications in an economical
way, it will be useful to introduce a modified notion of decisiveness. Let us say
that a system of social institutions makes an individual i both ways strongly
decisive for a pair of alternatives (x, y) just in case it assigns choice functions
in such a way as to guarantee that

(a) whenever i prefers x to y and x is available, y does not belong to the set
of best alternatives, and
(b) whenever i prefers y to x and y is available, x does not belong to the set
of best alternatives.

We can use the concept of bi-directional strong decisiveness to formulate
a more robust brand of minimal libertarianism called "Condition L*":

Definition 5 A system of social institutions satisfies Condition L* if, and
only if, it assigns choice functions in such a way as to guarantee that
there are at least two individuals i and j such that i is both ways
strongly decisive for a pair (x, y) and j is both ways strongly decisive
for a pair (w, z).
Unlike Condition L, Condition L* has direct implications for social choice in
contexts where there are more than two available alternatives. Obviously,
the difference in power between L and L* stems from the difference between
mere decisiveness and strong decisiveness. However, if an institutional system
satisfies BICH, then anyone who is decisive for a given pair of alternatives
automatically becomes strongly decisive for that pair. Thus, if a system of
social institutions meets both BICH and Condition L, it follows that the system
also satisfies L*.
We have now introduced all of the constraints on institutional systems
that play a role in the theorem which prompted Sen to posit the existence of
a paradoxical conflict between Paretianism and libertarianism. So, without
further ado, let us state Sen's result:

Theorem 1 No system of social institutions that satisfies Condition U and
BICH can satisfy Conditions P and L.

Sen has illustrated this theorem with a well-known example involving
two individuals and a single copy of Lady Chatterley's Lover.3 The two in-
dividuals are the prudish Mr. 1 and the lascivious Ms. 2; the single copy of
Lawrence's notorious novel is the one that resides in the public library. Mr. 1
and Ms. 2 both have library cards that give them standard borrowing privi-
leges. Thus, each may borrow the copy of Lady Chatterley's Lover, provided
that it isn't already checked out. We assume that the library is just about
to close for the day, that no potential borrower other than 1 or 2 remains in
the building, and that the lone copy of Lady Chatterley's Lover is still in the
stacks.
In these circumstances there are, at most, three available options:
a: the prudish Mr. 1 checks out the library's copy of Lady Chatterley's
Lover;
b: the lascivious Ms. 2 checks out the library's copy of Lady Chatterley's
Lover;
n: no one checks out the library's copy of Lady Chatterley's Lover.

Mr. 1, being a prude, likes n the best. But if either he or Ms. 2 is going to
take out Lady Chatterley's Lover, he would rather that he be the one to do
so, since he doesn't want 2's already shameful morals to be further corrupted.
Mr. 1 therefore prefers option a to option b. Whereas he likes n the most,
n is Ms. 2's least favorite alternative. She wants Lady Chatterley's Lover to
be read, and she knows that no one will read it if the novel just sits on the
library shelf gathering dust. One might think that she would most like to
check the book out herself. However, she would really prefer that the prudish
Mr. 1 check it out; for she believes that if the book is in 1's possession, he may
well be tempted to expand his erotic horizons by reading Lawrence's steamy
prose. To sum matters up, then, we have the following preference orderings
for Mr. 1 and Ms. 2:

Mr. 1: n,a,b
Ms. 2: a,b,n
(Here every alternative is strictly preferred to each option that falls to its
right.) Notice that 1 and 2 both prefer a to b. Let us suppose that everyone
else in their society also has this preference.
Imagine now that Ms. 2 is not in a position to check out Lady Chatterley's
Lover (either because she has forgotten her library card or simply isn't in the
library). This leaves a and n as the only available alternatives. But where
these are the only available options, it seems sensible to say that the choice
set should be {a} if 1 prefers a to n, and {n} if 1 prefers n to a. In other
words, the institutional system should make 1 both ways decisive for (a, n).
Similarly, 2 should be made both ways decisive for the pair (b, n) (for if b and
n are the only available options, it should be entirely up to 2 which option
gets realized).
Now then, if 1 is both ways decisive for (a, n), the fact that he prefers
n to a means that a does not belong to the choice set when a and n are the
only available alternatives. And if 2 is both ways decisive for (b, n), the fact
that she prefers b to n means that b is not a member of the choice set when
only band n are available. But BICH implies that if an outcome y is excluded
from the choice set when x and y are the only available options, then y is
excluded from the choice set whenever x is available. So, assuming that 1
and 2 are both ways decisive for (a, n) and (b, n) respectively, when the class
of available alternatives is {a, b, n}, the choice set cannot contain either a or
n. This leaves b as the only possible member of the choice set. However,
both 1 and 2 prefer a to b; and we have assumed that this preference is
shared by everyone else as well. Consequently, the combination of BICH and
Condition P excludes b from the choice set. We are thus left with an empty
set of best alternatives. But to say that the set of best alternatives is empty
is tantamount to saying that no system of social institutions which satisfies
BICH and Condition P can cope with a situation in which people have the
powers and preferences that we have just described. This is a troubling result;
for the powers that we have assigned to 1 and 2 seem entirely innocuous, and
the preference distribution that we have stipulated is perfectly conceivable.
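The derivation just given can be traced mechanically. A minimal sketch (my own encoding of the example's assumptions, not Sen's formalism) shows the three exclusions emptying the choice set:

```python
# A minimal sketch (hypothetical encoding) of how BICH, Condition P,
# and the two decisiveness assignments empty the choice set.

available = {"a", "b", "n"}
mr1 = ("n", "a", "b")          # Mr. 1's ordering, most preferred first
ms2 = ("a", "b", "n")          # Ms. 2's ordering
society = [mr1, ms2]           # everyone else also prefers a to b

def prefers(order, x, y):
    return order.index(x) < order.index(y)

excluded = set()
# Mr. 1 is both ways decisive for (a, n) and prefers n, so under BICH
# a is excluded whenever n is available.
if prefers(mr1, "n", "a"):
    excluded.add("a")
# Ms. 2 is both ways decisive for (b, n) and prefers b, so under BICH
# n is excluded whenever b is available.
if prefers(ms2, "b", "n"):
    excluded.add("n")
# Condition P with BICH: everyone prefers a to b, so b is excluded.
if all(prefers(order, "a", "b") for order in society):
    excluded.add("b")

print(available - excluded)    # set(): no best alternative remains
```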
Why Theorem 1 Fails to Support the Existence of a
Paretian-Libertarian Paradox4
As Sen sees it, the Lady Chatterley's Lover example illustrates a deep
conflict between the demands of Paretianism and those of libertarianism.
However, a very different lesson might be inferred from Sen's example. To
present this lesson in the clearest light, it will be useful to make the following
assumptions about individuals 1 and 2: both are rational; each knows the
other to be rational; each is aware of the other's preferences; each respects
the other's rights; and each knows which alternatives are available. Given
these assumptions, it is not difficult to see which outcome will be realized
when the set of available alternatives is {a, b, n}. Since Ms. 2 would rather
check out Lady Chatterley's Lover than have no one borrow it at all, she will
take the book out if Mr. 1 fails to do so. But for 1, the least desirable outcome
is the one in which 2 borrows Lawrence's novel. So, given his knowledge that
this outcome will occur unless he checks the novel out himself, he will certainly
check it out. And this means that a will be realized.
We see, then, that where a, b, and n are all available, a will come about
naturally, through a process that does not violate anyone's rights. Notice also
that a is a Pareto-efficient outcome (because if either b or n were to result,
2 would be in a situation that she prefers less than a). Finally, an outcome
other than a could only be produced through the violation of someone's rights.
(Thus, 1 would violate 2's rights if he refused to check out Lady Chatterley's
Lover and then used force to prevent 2 from borrowing the book.) But if a
given Pareto-efficient outcome 0 would arise naturally through a process in
which no rights are violated and no other outcome could be produced except
by violating someone's rights, then, surely, 0 is the only socially permissible
outcome. Furthermore, if 0 is the only socially permissible outcome, it must
be the only outcome that we can properly regard as best. So, in the choice
context that we have been considering, a is the sole best outcome. In other
words, an institutional system will deal properly with this context only if it
selects a choice function whose value is a when the set of available options is
{a, b, n}.
Now that we have determined what the choice set should be when a,
b, and n are all available, let us eliminate b from the set of available op-
tions without changing any other feature of the choice context that we have
been examining. (We may once again imagine that 2 either has forgotten
her library card or has already left the library without checking out Lady
Chatterley's Lover.) In this slightly modified context, it is plain that the
prudish Mr. 1 can produce n, the outcome that he likes best, just by leaving
Lawrence's novel on the shelf. So this is surely what he will do. Any other
outcome could only result. from the violation of 1's right not to borrow the
book. Moreover, n is Pareto-efficient. Consequently, when a and n are the
only available alternatives, n is the only outcome that we can properly regard
as best. Thus, in order to deal appropriately with the choice context that
results when we eliminate b from the set of available options, an institutional
system must select a choice function whose value is n when the set of available
options is {a,n}.
To sum matters up, we have just considered two choice contexts in which
people have the very same preferences. The only difference between the con-
texts is that the set of available alternatives is {a, b, n} in the first and just
{a, n} in the second. But as a result of this difference, the proper choice set
for the first context is not the same as the proper choice set for the second.
The best outcome when a, b, and n are all available is a; but the best outcome
when only a and n are available is n.
Since both contexts are characterized by the same preference profile
F, any system of social institutions will select choice functions in such a way
that the same choice function will determine the choice sets for both contexts.
Thus, an institutional system will deal appropriately with the contexts that
we have just been considering only if the choice function C that it assigns to
preference profile F is such that C({a, b, n}) = a and C({a, n}) = n. However,
a choice function that has this property cannot be binary; for if C is binary
and C({a, b, n}) = a, then for every option x in {a, b, n}, a must belong to
C({a, x}).
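
Because this binariness argument is purely combinatorial, it can be checked mechanically. The following sketch is not from the text; the helper name and the pairwise choice C({b, n}) = {b} are illustrative assumptions consistent with the example:

```python
# A minimal sketch: a choice function is "binary" when choices from any menu
# are recoverable from pairwise choices, i.e., x is chosen from S iff x is
# chosen from {x, y} for every y in S.

def is_binary(choice):
    """choice maps frozenset menus to the set of 'best' elements."""
    for menu, chosen in choice.items():
        if len(menu) <= 2:
            continue
        derived = {x for x in menu
                   if all(x in choice[frozenset({x, y})]
                          for y in menu if y != x)}
        if derived != chosen:
            return False
    return True

# The choice behavior the Lady Chatterley's Lover example calls for:
C = {
    frozenset({"a", "b"}): {"a"},        # everyone prefers a to b
    frozenset({"a", "n"}): {"n"},        # with b unavailable, 1 shelves the book
    frozenset({"b", "n"}): {"b"},        # 2 would borrow it rather than see n
    frozenset({"a", "b", "n"}): {"a"},
}

print(is_binary(C))   # False: the pairwise choices would force an empty set
```

(The pairwise-derived "best" set for {a, b, n} comes out empty, which also reproduces the empty choice set that BICH and Condition P generate.)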
The foregoing considerations suggest that we should not require systems
of social institutions to select only binary choice functions. In other words, not
only is BICH an inappropriate adequacy constraint on institutional systems, it
is a constraint that no acceptable system of social institutions can satisfy. But
once we recognize that BICH should not be imposed on institutional systems,
we eliminate an important reason for thinking that the Lady Chatterley's
Lover example illustrates a conflict between Paretianism and libertarianism.
While a system of social institutions will not be able to cope with the powers
and preferences that people have in this example if it is a system that meets
both Condition P and BICH, some institutional systems that satisfy P without
satisfying BICH can cope with these powers and preferences. Indeed, some
such systems have the desirable property of judging that a is best when the
available options are a, b, and n, but n is best when only a and n are available.
A similar conclusion applies to Theorem 1. This theorem only suggests
that there is a paradoxical conflict between Paretianism and libertarianism
when BICH is assumed to be a basic acceptability condition for institutional
systems. Once we reject this assumption, the alleged Paretian-libertarian
conflict seems to dissolve.
Sen's First Attempt to Resuscitate the Paradox


Sen has offered two replies to those who want to escape his paradox by
jettisoning BICH. In one of these replies, he appeals to a result that has been
proved by Batra and Pattanaik.⁵ This result involves a condition that Robert
Sugden has dubbed "Minimal Consistency".⁶

Definition 6 A system of social institutions is minimally consistent if, and
only if, every social choice function C in its range has the following
property: if C deems that x beats y in a pair-wise comparison between
x and y, then it doesn't judge y to be one of the best alternatives in a
larger set containing both x and y unless it also judges x to be one of
the best alternatives in that larger set.

The import of minimal consistency can readily be understood by considering
a three-way election between Shadrach, Meshach, and Abednego.
Minimal consistency requires that Abednego not be a winner in this three-
way contest unless its winners include everyone who would beat Abednego
in pair-wise elections between him and the other two candidates. Thus, if
Shadrach would defeat Abednego in a pair-wise contest and Abednego is a
winner in the three-way election, Shadrach must also be a winner in that
election.
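
A hypothetical sketch of the same test (the function name and the election profile are invented for illustration):

```python
# Minimal Consistency: if x beats y in the pairwise contest, then y wins a
# larger contest containing both only if x wins it as well.

def minimally_consistent(choice):
    for menu, chosen in choice.items():
        for x in menu:
            for y in menu:
                pair = frozenset({x, y})
                if x != y and pair in choice and choice[pair] == {x}:
                    if y in chosen and x not in chosen:
                        return False
    return True

# Shadrach beats Abednego pairwise, yet Abednego alone wins the
# three-way election: Minimal Consistency is violated.
election = {
    frozenset({"Shadrach", "Abednego"}): {"Shadrach"},
    frozenset({"Shadrach", "Meshach", "Abednego"}): {"Abednego"},
}
print(minimally_consistent(election))   # False
```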
Minimal Consistency is a weaker constraint than BICH. (Indeed, it is
even weaker than α, a consequence of BICH to which Sen has long subscribed.⁷)
It may therefore seem significant that Batra and Pattanaik have shown that

Theorem 2 No system of social institutions that satisfies Minimal Consistency
and Condition U can satisfy both Condition L and Condition P.⁸

In light of this theorem and the fact that a system of social institutions
can be minimally consistent without satisfying BICH, Sen suggests that the
basic problem underlying his own theorem cannot be solved merely by showing
that an acceptable institutional system must generate some non-binary
choice functions.⁹ This is a perfectly valid point. If we permitted institutional
systems to generate non-binary choice functions, but required them
to be minimally consistent, the Paradox of the Paretian Libertarian would
remain a serious problem.
However, the same considerations that militate against BICH also undermine
Minimal Consistency. We have already seen that an institutional
system will not be able to deal appropriately with the Lady Chatterley's
Lover example unless the system's range contains a choice function C such
that C({a, n}) = n and C({a, b, n}) = a. But a system of institutions that
selects such a choice function cannot be minimally consistent; for if it selects
a choice function C such that C({a, n}) = n, Minimal Consistency permits
a to belong to C({a, b, n}) only if n belongs as well.
Since it is clear that we need a system of social institutions that can
deal properly with choice contexts similar to those described in the Lady
Chatterley's Lover example, it is plain that Minimal Consistency is not an
acceptable adequacy constraint on institutional systems. But once we see
that a complete system of social institutions need not (and, indeed, should
not) be minimally consistent, Theorem 2 no longer suggests that there is
a paradoxical conflict between Paretianism and libertarianism: although a
minimally consistent institutional system that satisfies Condition U cannot
meet conditions P and L, many systems of social institutions that satisfy U
meet P and L as well. We must therefore conclude that Sen's appeal to this
theorem does not breathe any life back into the Paretian Libertarian Paradox.
Sen's Second Attempt to Revive the Paradox
Sen's second effort to resuscitate his paradox employs modifications of
conditions P and L. As I noted when I first introduced these conditions, if an
institutional system satisfies both BICH and P, then, no matter how large the
set of available alternatives happens to be, if it contains both x and y and
everyone prefers the former to the latter, y will be excluded from the choice
set. This consequence of BICH and P suggests a somewhat more robust version
of the latter constraint:

Definition 7 A system of social institutions satisfies Condition P* if, and
only if, it assigns choice functions in such a way as to guarantee that
whenever everyone prefers x to y and x is available, y does not belong
to the set of best alternatives.

P* is the natural Paretian counterpart of L*, the fortified version of libertarianism
that I defined in order to clarify the effect of BICH on Condition L. To
repeat,

Definition 5 A system of social institutions satisfies Condition L* if, and
only if, it assigns choice functions in such a way as to guarantee that
there are at least two individuals i and j such that i is both ways
strongly decisive for a pair (x, y) and j is both ways strongly decisive
for a pair (w, z).

(Recall that an institutional system makes an individual i strongly decisive
for a pair of alternatives (x, y) just in case it assigns choice functions in such
a way as to guarantee that whenever i prefers x to y and x is available, y
does not belong to the set of best alternatives.)
According to Sen, an institutional system cannot meet the demands of
Paretianism and libertarianism unless it satisfies conditions P* and L*. But,
as Sen shows,

Theorem 3 No system of social institutions that satisfies Condition U can
satisfy both Condition P* and Condition L*.¹⁰

Since the kinds of individual preferences that make it impossible for an in-
stitutional system to satisfy the conjunction of U, P*, and L* are ones that
might well arise in actual social choice contexts, Sen does not think that we
can avoid a Paretian-libertarian conflict by weakening Condition U. He thus
concludes that Theorem 3 provides a clear case for the existence of a deep
incompatibility between the demands of Paretianism and libertarianism.
In order to illustrate the problems that can be generated by combining
U, P*, and L*, Sen turns once again to his Lady Chatterley's Lover example.¹¹
He proposes that the prudish Mr. 1 and lascivious Ms. 2 should be regarded
as both ways strongly decisive for (a, n) and (b, n), respectively. Given the
preferences that Sen has assigned to 1 and 2, this "decisiveness distribution"
guarantees that neither a nor n will be a best element in {a, b, n}. Moreover,
since everyone prefers a to b, P* tells us that b cannot qualify as a best
element of {a, b, n}. Therefore, the set of best alternatives in {a, b, n} is
empty. But (by definition) a social choice function cannot generate an empty
set of best alternatives. Consequently, no institutional system that satisfies
P* can select a choice function that is capable of coping with a context in
which people have the kinds of powers and preferences that Sen has assigned
to the characters in his Lady Chatterley's Lover example.
But has Sen made a reasonable assignment of powers to prudish 1 and
lascivious 2? As we have already seen, when the available alternatives are
a, b, and n, a can be realized without infringing anyone's rights. Thus, a
concern for the rights of 1 and 2 does not justify eliminating a from the choice
set. However, Sen thinks that 1's rights imply that 1 is both ways strongly
decisive for (a, n); and this bi-directional strong decisiveness, together with
1's preference for n over a, eliminates a from the choice set. We may therefore
conclude that Sen is making a mistake when he associates 1's rights with bi-
directional strong decisiveness for (a, n).¹²
More generally, Sen mischaracterizes libertarianism when he associates
rights with strong decisiveness. We can grant individuals all the rights that
libertarianism requires without making anyone strongly decisive for any pairs
of outcomes. This being the case, libertarian considerations do not demand
that institutional systems satisfy Condition L*. But once we agree that a
proper concern for individual rights is consistent with the rejection of L*,
there is no temptation to think that Theorem 3 establishes the existence of
a Paretian-libertarian paradox.
Conclusion
In summary, Sen has not made a compelling case for his contention that
there is a paradoxical conflict between Paretianism and libertarianism. His
original theorem established that an institutional system cannot meet con-
ditions P and L if it has an unrestricted domain and its range is limited to
binary choice functions. But no Paretian-libertarian paradox emerges from
this result once we see that a compelling case can be made for institutional
systems that generate non-binary choice functions. Furthermore, the fact
that acceptable systems of social institutions must also violate Minimal Con-
sistency shows that Sen cannot draw a genuine Paretian-libertarian paradox
from the fact that an institutional system with an unrestricted domain must
violate either P or L if it is to be minimally consistent. Sen's final attempt to
defend the existence of a paradoxical conflict between Paretianism and liber-
tarianism is based on the result that a system of social institutions with an
unrestricted domain cannot satisfy the conjunction of conditions P* and L*.
However, since an institutional system can respect individual rights without
meeting L*, this last attempt to show that there is a Paretian-libertarian
paradox is no more successful than Sen's earlier efforts.

NOTES
1 See Sen, "The Impossibility of a Paretian Liberal", Journal of Political Econ-
omy, 78 (January/February 1970),152-7; Collective Choice and Social Welfare. San
Francisco: Holden Day, 1970, chps. 6 & 6*; "Liberty, Unanimity, and Rights", in
Sen, Choice, Welfare and Measurement. Cambridge, MA: M.I.T. Press, 1982, pp.
291-326; "Liberty and Social Choice", Journal of Philosophy, 80 (1983), 5-28.
2 Thomas Schwartz, The Logic of Collective Choice. New York: Columbia
University Press, 1985.
3 Sen uses this example in "The Impossibility of a Paretian Liberal" and in
Collective Choice and Social Welfare. To clarify certain matters that Sen's original
presentation leaves obscure, I shall offer a slightly modified version of the example.
These changes do not have any substantive importance.
4 The basic arguments in the following two sections bear a resemblance to
reasoning that has recently been presented by Robert Sugden. (See "Why Be Con-
sistent? A Critical Analysis of Consistency Requirements in Choice Theory", Eco-
nomica, 52 (May 1985), 167-83.) However, Sugden and I arrived at our arguments
independently.
5 Raveendra N. Batra & Prasanta K. Pattanaik, "On Some Suggestions for
Having Non-Binary Social Choice Functions", Theory and Decision, 3 (1972), 1-11.
6 Sugden, op. cit.
7 A system of social institutions satisfies α if, and only if, for every social choice
function C in its range, C judges an alternative x to be one of the best alternatives
in a set S only if it also judges x to be one of the best alternatives in every subset
of S to which x belongs. Other names for α are "the Chernoff Condition" and "the
Independence of Irrelevant Alternatives". Sen staunchly defends α in Collective
Choice and Social Welfare, calling it "a most appealing condition" (p. 81) and "a
very basic requirement of rational choice" (p. 17). The relation between BICH and
α is discussed at length in Sen, "Social Choice Theory: A Re-examination", Econometrica,
45 (January 1977), 53-89. See also Blair et al., "Impossibility Theorems
without Collective Rationality", Journal of Economic Theory, 13 (1976), 361-379.
8 Batra & Pattanaik, op. cit.
9 Sen, "Liberty, Unanimity and Rights", p. 311.
10 This theorem is discussed in Collective Choice and Social Welfare, pp. 81-2,
and in "Liberty, Unanimity and Rights", p. 311.
11 Collective Choice and Social Welfare, p. 82.
12 Indeed, an institutional system that makes 1 both ways strongly decisive
for (a, n) actually undermines 1's liberty. Given the rights that we would ordinarily
attribute to 1, he is in a position to prevent the realization of b, his least favorite
alternative. To prevent b from being realized, 1 only has to check out Lady Chatterley's
Lover. But this is precisely what he cannot do if he is both ways strongly
decisive for (a, n). For if he checks out Lawrence's novel, he realizes a; and a is precluded
by the combination of his preference for n over a and his strong decisiveness
for (a, n).
Jonathan Pressler
Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
DECISIONS WITHOUT ORDERING

T. SEIDENFELD, M.J. SCHERVISH, and J.B. KADANE

Abstract
We review the axiomatic foundations of subjective utility theory with a view
toward understanding the implications of each axiom. We consider three differ-
ent approaches, namely, the construction of utilities in the presence of canonical
probabilities, the construction of probabilities in the presence of utilities, and the
simultaneous construction of both probabilities and utilities. We focus attention on
the axioms of independence and weak ordering. The independence axiom is seen to
be necessary in order to prevent a form of Dutch Book in sequential problems.
Our main focus is to examine the implications of not requiring the weak order
axiom. We assume that gambles are partially ordered. We consider both the con-
struction of probabilities when utilities are given and the construction of utilities
in the presence of canonical probabilities. In the first case we find that a partially
ordered set of gambles leads to a set of probabilities with respect to which the ex-
pected utility of a preferred gamble is higher than that of a dispreferred gamble.
We illustrate some comparisons with theories of upper and lower probabilities. In
the second case, we find that a partially ordered set of gambles leads to a set of lex-
icographic utilities each of which ranks preferred gambles higher than dispreferred
gambles.

1. Introduction: Subjective Expected Utility [SEU] theory

The theory of (subjective) expected utility is a normative account of
rational decision making under uncertainty. Its well known tenets are spotlighted
by the familiar, canonical decision problem in which Sj : j = 1, ..., n
is a partition, and Oij is the outcome of option i (act i) in state j. That is,
acts are functions from states to outcomes. This problem is illustrated in
Figure 6.1.
In the canonical decision problem, states are value-neutral and act inde-
pendent. The value of an outcome does not depend upon the state in which
it is rewarded, and the choice of an act does not alter the agent's opinion
(uncertainty) about the states. In insurance terms, there are no "moral haz-
ards."
General Assumption: Acts are weakly ordered by (weak) preference,
≼, a reflexive, transitive relation with full comparability between any two
acts.

            S1    ...   Sj    ...   Sn
    A1      O11   ...   O1j   ...   O1n
    Ai      Oi1   ...   Oij   ...   Oin
    Am      Om1   ...   Omj   ...   Omn

Figure 6.1: Canonical Decision Matrix

Subjective Expected Utility [SEU] Thesis: There is a real-valued
utility U(·), defined over outcomes, and a personal probability p(·), defined
over states, such that

A1 ≼ A2 if and only if Σj p(Sj)·U(O1j) ≤ Σj p(Sj)·U(O2j).
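
A minimal numerical sketch of the thesis (the probabilities and utilities below are invented for illustration):

```python
# Rank two acts by expected utility: A1 is weakly dispreferred to A2 just in
# case sum_j p(Sj)U(O1j) <= sum_j p(Sj)U(O2j).

p = {"S1": 0.3, "S2": 0.7}                   # personal probability over states
U = {"A1": {"S1": 10.0, "S2": 2.0},          # U(O_ij): utility of act i's
     "A2": {"S1": 4.0,  "S2": 5.0}}          # outcome in state j

def expected_utility(act):
    return sum(p[s] * U[act][s] for s in p)

print(expected_utility("A1"), expected_utility("A2"))   # 4.4 vs. 4.7: A1 is dispreferred
```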


There are several well-trodden approaches to the normative justification
of the SEU thesis, which we discuss in the remainder of this section.
1.1 Utility Given Probability
The seminal efforts of J. von Neumann and O. Morgenstern (1947) provide
necessary and sufficient conditions for an expected utility representation
of preference over (simple) lotteries: acts specified by a probability on a (finite
subset of a) set of rewards. Their theory uses one "structural" axiom
and three axioms on preference ≼.
Structural Axiom: Acts are simple lotteries (Li), i.e., simple distributions
over a set of rewards. The domain of acts is closed under convex
combinations of distributions, denoted by αL1 + (1 − α)L2.
Weak-order Axiom: ≼ is a reflexive, transitive relation over pairs of
lotteries, with comparability between any two lotteries.
Independence Axiom: for all L1, L2, L3, and 0 < α ≤ 1,
L1 ≼ L2 if and only if αL1 + (1 − α)L3 ≼ αL2 + (1 − α)L3.
Archimedean Axiom: for all L1 ≺ L2 ≺ L3, ∃(0 < α, β < 1)
βL1 + (1 − β)L3 ≺ L2 ≺ αL1 + (1 − α)L3.
Figure 6.2: Curves of Indifference with Three Rewards

A particularly simple illustration of this theory involves lotteries over
three rewards (r1 ≺ r2 ≺ r3), where the reward ri is identified with the
degenerate lottery having point-mass P(ri) = 1 (i = 1, 2, 3). Following the
excellent presentation by Machina (1982), we have a simple geometric model
for what is permitted by expected utility theory. Figure 6.2 depicts the
consequences of the axioms above.
According to the axioms, indifference curves (∼) over lotteries are parallel,
straight lines of (finite) positive slope. Li is (strictly) preferred to Lj,
Lj ≺ Li, just in case the indifference curve for Li is to the left of the indifference
curve for Lj. Hence, in this setting, expected utility theory permits
one degree of freedom for preferences, corresponding to the choice of a slope
for the lines of indifference.
Another version of this example occurs with the decision theoretic reconstruction
of "most powerful" Neyman-Pearson tests of a simple "null"
hypothesis (h0) versus a simple rival alternative (h1). We face the binary
decision given by the matrix:

                 h0    h1
    accept h0    a     b
    reject h0    c     d
where we suppose that outcomes b and c are each dispreferred to either
outcome a or d. In the usual jargon, c is the outcome of a type 1 error and
b is the outcome of a type 2 error. By the assumption that states are "act
independent," without loss of generality, we may rewrite the matrix with
utility outcomes:

ho hI
accept ho o -(I-x)
reject ho -x 0
where 0 < x < 1. The expected utility hypothesis requires that accepting ho
is not preferred to (j) rejecting ho just in case (1- Po)/Po ~ x/(l- x), where
Po is the "prior" probability of h o.
Suppose we have the option of conducting an experiment E (with a
sample space of possible experimental outcomes denoted by Ω), where the
conditional probabilities p(·|h0) and p(·|h1) over Ω are specified by the description
of E. A (Neyman-Pearson) statistical test of h0 against h1, based
on E, is defined by a critical region R ⊆ Ω, with the understanding that h0
is rejected iff R occurs. Associated with each statistical test are two quantities,
(α, β), where α = p(R|h0) is the probability of a type 1 error, and
β = p(R^c|h1) is the probability of a type 2 error.
According to the N-P theory, two tests may be compared by their (α, β)
numbers. Say that T2 dominates T1 if α2 ≤ α1, β2 ≤ β1, and at least one
of these inequalities is strict. This agrees with the ranking of tests by their
expected utility since (prior to observing the outcome of the experiment) the
expected utility of test T, having errors (α, β), is given by:

−[x·p(R & h0) + (1 − x)·p(R^c & h1)] = −[x·α·p0 + (1 − x)·β·(1 − p0)],

so that T1 ≺ T2 if T2 dominates T1 (except for the trivial cases of certainty:
p0 = 0 or p0 = 1, when T2 ∼ T1 is still possible, but then there hardly is
need for a "test" of h0).
Given an experiment E, there are numerous, mutually undominated tests
based on E. For example, consider the family of undominated tests of h0:
μ = 0 versus h1: μ = 1 from the observation of a normally distributed
random variable X ∼ N[μ, σ²], with specified variance σ². These are just
the family of "best," i.e., most powerful tests of h0 versus h1, which, by the
Neyman-Pearson lemma, is the family of likelihood ratio tests for the datum
x. Table 6.1 lists some (α, β) values for undominated tests from six such
experiments: σ = 1/4; 1/3; 2/5; 1/2; 1; and 4/3.
Three of these families, corresponding to σ = 1/3, σ = 1/2, and σ = 4/3,
are depicted by the curves in Figure 6.3; the graph shows the tangents to
these three curves at α = .05, and these .05-α-level tangents are not parallel.
Table 6.1: The "best" β-values for twelve α-values and six experiments

    σ =     .250   .333   .400   .500   1.000  1.333
    α                      β-values
    .010    .047   .250   .431   .628   .908   .942
    .020    .026   .172   .327   .521   .854   .904
    .030    .017   .131   .268   .452   .811   .871
    .040    .012   .106   .227   .401   .773   .841
    .045    .011   .096   .210   .380   .756   .828
    .050    .009   .088   .196   .361   .740   .814
    .055    .008   .080   .184   .344   .725   .802
    .060    .007   .074   .172   .328   .710   .789
    .070    .006   .064   .153   .300   .683   .766
    .080    .005   .055   .137   .276   .657   .744
    .090    .004   .049   .123   .255   .633   .722
    .100    .003   .043   .111   .236   .611   .702

A statistical test of h0 versus h1 is a lottery involving the three prizes −x,
−(1−x), and 0. As before, if the preferences among such tests satisfy the expected
utility hypothesis, then the indifference curves (of equally desirable tests) are
parallel straight lines.
In Figure 6.3, these indifference curves have negative slopes equal to
−x·p0/(1 − x)(1 − p0). (The slopes are negative because smaller (α, β) values
are better.) Thus, expected utility theory is in conflict with the popular
convention of choosing the "best" test with a fixed α-level, e.g., α = .01 or
α = .05. That is, when testing simple hypotheses, in order to agree with
expected utility theory the choice of α must reflect the precision of the experiment.
(See, also, Lindley [1972, p. 14], who gives this argument for the
special case of "0-1" losses.) In a purely "inferential" (non-decision-theoretic)
Bayesian treatment of testing a simple hypothesis versus a composite alternative,
Jeffreys [1971, p. 248] argues for the same caveat about constant
α-levels.
Figure 6.3: Families of (α, β) Pairs for Undominated Tests

A dramatic illustration of this lesson can be seen with the aid of the
table above. Suppose an agent prefers undominated tests with α = .05 over
rivals. Then, for the experiment corresponding to σ = 1/4, test T2 is preferred
to test T1, where (α1 = .01, β1 = .047) and (α2 = .05, β2 = .010). Likewise,
for the experiment corresponding to σ = 4/3, test T4 is preferred to test T3,
where (α3 = .09, β3 = .723) and (α4 = .05, β4 = .814). However, test T5,
the "50-50" mixture of tests T1 and T3, is preferred to test T6, the "50-50"
mixture of tests T2 and T4, as (α5 = .05, β5 = .385) and (α6 = .05, β6 = .412),
so that T5 dominates T6. This is the decision-theoretic analogue of
Cox's (1958) example involving the failure of the ancillarity principle within
Neyman-Pearson theory.
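
The arithmetic of this illustration is easy to reproduce; here is a small sketch using the (α, β) pairs quoted above (the helper name is ours):

```python
# Randomized mixtures of tests mix their error probabilities linearly.

def mix(t1, t2, w=0.5):
    """(alpha, beta) of a w : (1 - w) randomized mixture of two tests."""
    return (w * t1[0] + (1 - w) * t2[0], w * t1[1] + (1 - w) * t2[1])

T1, T2 = (0.01, 0.047), (0.05, 0.010)   # sigma = 1/4 experiment
T3, T4 = (0.09, 0.723), (0.05, 0.814)   # sigma = 4/3 experiment

T5 = mix(T1, T3)   # mixture of the two *rejected* tests: (0.05, 0.385)
T6 = mix(T2, T4)   # mixture of the two *chosen* tests:   (0.05, 0.412)

# T5 dominates T6: equal alpha, strictly smaller beta.
print(T5, T6, T5[0] == T6[0] and T5[1] < T6[1])
```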
1.2 Probability Given Utility
The "Dutch Book" argument, tracing back to Ramsey (1931) and de-
Finetti (1937), offers prudential grounds for action in conformity with per-
sonal probability. Under several "structural" assumptions about combina-
tions of stakes (that is, assumptions about the combination of wagers), your
betting policy is consistent ("coherent") only if your "fair" odds are proba-
bilities.
A simple bet on/against event E, at odds of r : 1−r, with a total stake
S > 0 (say, bets are in $ units), is specified by its payoffs, as follows:

                      E              ¬E
    bet on E          win (1−r)S     lose rS
    bet against E     lose (1−r)S    win rS

(By writing S < 0 we can reverse betting "on" or "against.")


The general assumption (that acts are weakly ordered by ≼) entails
that there is a preference among the options of betting on, betting against, and
abstaining from betting (whose consequences are "status quo," or net $0,
regardless of whether E or ¬E). The special ("structural") assumptions about
the stakes for bets require, in addition:
the stakes for bets require, in addition:
(a) Given an event E, a betting rate r : 1−r and a stake S, your preferences
satisfy exactly one of three profiles.
Either betting on ≺ abstaining ≺ betting against E,
or betting on ∼ abstaining ∼ betting against E,
or betting against ≺ abstaining ≺ betting on E.
(b) The (finite) conjunction of favorable/fair/unfavorable bets is favorable/
fair/unfavorable. (A conjunction of bets is favorable in case it is pre-
ferred to abstaining, unfavorable if dispreferred to abstaining, and fair
if indifferent to abstaining.)
(c) Your preference for outcomes is continuous in rates, in particular, each
event E carries a unique "fair odds" rE for betting on E.
Note: It follows from these assumptions that your attitude towards a simple
bet is independent of the size of the stake.
Dutch Book Theorem: If your fair betting odds are not probabilities,
then your preferences are incoherent, i.e., inconsistent with the preference for
sure gains. Specifically, there is then some "favorable" combination of bets
which is dominated by abstaining, i.e., some "favorable" combination where
you pay out in each state of a finite (exhaustive) partition.
you payout in each state of a finite (exhaustive) partition. (See Shimony
(1955), for an elegant proof using the linear structure of these bets.)
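
Here is a sketch of the sure-loss construction behind the theorem, with invented betting rates that fail to sum to 1 (names and numbers are illustrative only):

```python
# If "fair" rates on an exhaustive partition sum to more than 1, buying every
# bet "on" at unit stakes loses (sum of rates - 1) in every state; if they sum
# to less than 1, betting "against" all of them does the same.

rates = {"E1": 0.5, "E2": 0.4, "E3": 0.3}   # incoherent: the rates sum to 1.2

def payoff_bet_on(event, true_state, r, stake=1.0):
    # win (1 - r)*S if the event obtains, lose r*S otherwise
    return (1 - r) * stake if event == true_state else -r * stake

direction = 1 if sum(rates.values()) > 1 else -1   # on all, or against all

for state in rates:
    net = direction * sum(payoff_bet_on(e, state, r) for e, r in rates.items())
    print(state, round(net, 10))   # -0.2 in every state: a sure loss
```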
The "Dutch Book" argument can be extended to include conditional
probability, p(·|·), through the device of called-off bets. A called-off bet on
(against) H given E, at odds of r : (1−r) with total stake S (> 0), is specified
by its payoffs, as follows.

                      H ∩ E          ¬H ∩ E        ¬E
    bet on H          win (1−r)S     lose rS       0 (the bet is called off)
    bet against H     lose (1−r)S    win rS        0 (the bet is called off)
By including called-off bets within the domain of acts to be judged favorable/indifferent/unfavorable
against abstaining, and subject to the same structural
assumptions (a-c) imposed above, coherence of "fair" betting odds entails:
r(H|E) · r_E = r(H∩E), where r(H|E) is the "fair called-off" odds on H
given E. This result gives the basis for interpreting conditional probability,
p(H|E), by the fair "called-off" odds r(H|E), for then we have:

p(H|E) · p(E) = p(H ∩ E),

the axiomatic requirement for conditional probabilities.
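
A quick numerical check (with invented probabilities) that setting the called-off rate on H given E to p(H ∩ E)/p(E) gives the called-off bet zero expected payoff:

```python
p_HE, p_notH_E = 0.2, 0.3        # p(H & E) and p(not-H & E), so p(E) = 0.5
r = p_HE / (p_HE + p_notH_E)     # the fair called-off rate r(H|E) = 0.4
S = 1.0                          # total stake

# payoffs: win (1-r)S on H & E, lose rS on not-H & E, 0 if E fails
expectation = p_HE * (1 - r) * S - p_notH_E * r * S
print(r, round(expectation, 12))  # 0.4, 0.0: the called-off bet is fair
```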


1.3 Simultaneous Axiomatizations of (Personal) Probability and
Utility.
We distinguish two varieties:
(i) without extraneous "chances," as in Savage's (1954) theory.
(ii) with extraneous "chances," a continuation of the von Neumann-Morgen-
stern approach, as in Anscombe & Aumann's (1963) theory of "horse
lotteries." Horse lotteries are a generalization of lotteries as illustrated
in Figure 6.4.

An outcome of act Ai, when state Sj obtains (when "horse j" wins), is
the von Neumann-Morgenstern lottery Lij. The Anscombe-Aumann theory
is the result of taking the von Neumann-Morgenstern axiomatization of ≼
(the Weak-order, Independence and Archimedean postulates), and adding an
assumption that states are value-neutral.

Figure 6.4: Anscombe-Aumann "Horse Lotteries"
2. Independence and Consistency in Sequential Choices
We are interested in relaxing the "ordering" postulate, without aban-
doning the normative standard of coherence (consistency) and without losing
the representation ("measurement") of our modified theory. First, however,
let us compare two programs for generalizing expected utility in order to
justify the concern for consistency:
Program ¬I: delete the "independence" postulate. Illustrations: Samuelson
(1950); Kahneman & Tversky's "Prospect Theory" (1979); Allais (1979);
Fishburn (1981); Chew & MacCrimmon (1979); McClennen (1983); and especially
Machina (1982, 1983, the latter of which has an extensive bibliography).
Program ¬O: delete the "ordering" postulate. Illustrations: I.J. Good
(1952); C.A.B. Smith (1961), related to the "Dutch Book" argument; I. Levi
(1974, 1980); Suppes (1974); Walley & Fine (1979); Wolfenson & Fine (1982);
Schick (1984).
And in Group Decisions: Savage (1954, §7.2); Kadane & Sedransk
(1980), and Kadane et al. (1990), applied to clinical trials.
Also, "regret" models involve a failure of "ordering" if we define the
relation ≼ by their choice functions, which violate "independence of irrelevant
alternatives" (Sen's properties α and β, 1977): Savage (1954, §13.5); Bell &
Raiffa (1979); Loomes & Sugden (1982), and Fishburn (1983).
A criticism of program ¬I: Consider elementary problems where we apply
the modified theory ¬I to simple lotteries. Thus, we discuss the case, like
the von Neumann-Morgenstern setting, where "probability" is given and we
try to quantify (represent) the value of "rewards."
There is a technical difficulty with the theory that results from just the
two postulates of "weak-ordering" and the usual "Archimedean" requirement.
It is that these two are insufficient to guarantee a real-valued "utility" representation
of ≼ (see Fishburn, 1970, §3.1). We can avoid this detail and
also simplify our discussion by assuming that lotteries are over (continuous)
monetary rewards; we assume that lotteries have $-equivalents and more $ is
better.
Under these assumptions, and to underscore the normative status of coherence,
let us investigate what happens when a particular consequence of
"independence" is denied.
Mixture dominance ("betweenness"): If lotteries L1, L2 are each preferred
(dispreferred) to a lottery L3, so too each convex combination of L1
and L2 is preferred (dispreferred) to L3.
Here is an illustration of sequential inconsistency for a failure of mixture
dominance. Let L1 ∼ L2 ∼ $5.00, but .5L1 + .5L2 ∼ $6.00: the agent prefers
the "50-50" mixture of L1 and L2 to each of them separately. Then, by
continuity of (ordinal) utility over dollar payoffs, there is a fee, $ε, such
that, e.g.,

.5(L1 − ε) + .5(L2 − ε) ∼ $5.75,

where Li − ε denotes the modification of Li obtained by reducing each payoff
in Li by the fee $ε. Assume $4.00 ≺ (Li − ε) (i = 1, 2).
Consider two versions of a sequential decision problem, depicted by the
decision trees in Figures 6.5 and 6.6. "Choice" nodes are denoted by a □
and "chance" nodes are denoted by •. In the first version (Figure 6.5), at
node A the agent may choose between plans 1 and 2. These lead to terminal
choices at nodes B, depending upon how a "fair" coin lands at the intervening
chance nodes. If the agent chooses plan 1 (at A) and the coin lands "heads,"
he faces a (terminal) choice between lottery L1 and the certain prize of $5.50.
If, instead, the coin lands "tails," he faces a (terminal) choice between L2 and
the certain prize of $5.50.
The decision tree is known to the agent in advance. He can anticipate
(at A) how he will choose at subsequent nodes, if only he knows what his
preferences will be at those junctures. In the problem at hand, we suppose
the agent knows that, at B, he will not change his preferences over lotteries.
(There is nothing in the flip of the coin to warrant a shift in his valuation
of specified, von Neumann-Morgenstern lotteries.) For example, according to
our assumptions, at A he prefers a certain $5.50 to the lottery L1. Thus, we
assume that at B, too, he prefers the $5.50 to L1.
□ designates choice points; • designates chance points; ⇒ designates the chosen
alternative. At choice node A option 2 is preferred to option 1. At each choice
node B this preference is reversed.
Figure 6.5: First Version of the Sequential Decision
At choice node A option 1 is preferred to option 2. The tree results by replacing
Li − ε (i = 1, 2) from Figure 6.5 with their $-equivalents under ∼.
Figure 6.6: Second Version of the Sequential Decision
Then, at A, the agent knows which terminal options he will choose at
nodes B and plans accordingly. If he selects plan 1, he will get $5.50. If he
selects plan 2, he will get lottery L1 − ε with probability 1/2 and he will get
lottery L2 − ε with probability 1/2. But this he values at $5.75; hence, plan 2
is adopted.
The decision program ¬I requires the "ordering" postulate for terminal
decisions. Thus, at choice nodes such as B, the agent is indifferent between
lotteries that are judged equally desirable (∼) according to his preferences
(≼). The second version of the sequential choice problem (Figure 6.6) results
by replacing the lotteries at the (terminal) nodes B by their sure-dollar
equivalents under ∼. In this version, by the same reasoning, the agent rejects
plan 2 and adopts plan 1. This is an inconsistency within the program since,
at nodes B, the agent's preferences are given by the weak ordering, ≼, yet
his (sequential) choices do not respect the indifferences, ∼, generated by ≼.
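
A hypothetical numerical sketch of the reversal (the $4.75 fee-adjusted equivalents and the $1.00 mixture premium are invented figures chosen to match the text's $5.75 valuation):

```python
# An agent whose values violate mixture dominance evaluates the two versions
# of the tree differently, even though the terminal options are indifferent.

value = {"L1-fee": 4.75, "L2-fee": 4.75}   # $-equivalents of L_i minus the fee

def value_of_mixture(x, y, premium=1.00):
    # the agent prizes a 50-50 mixture above its components: betweenness fails
    return 0.5 * (value[x] + value[y]) + premium

# Version 1: plan 1 yields $5.50 for sure; plan 2 yields the 50-50 mixture of
# the lotteries L1-fee and L2-fee, which the agent values at $5.75.
plan1_v1, plan2_v1 = 5.50, value_of_mixture("L1-fee", "L2-fee")

# Version 2: the terminal lotteries are replaced by their $-equivalents, and a
# 50-50 mixture of two sure $4.75 payments is worth just $4.75.
plan1_v2, plan2_v2 = 5.50, 0.5 * (4.75 + 4.75)

print(plan2_v1 > plan1_v1, plan2_v2 < plan1_v2)   # True True: a reversal
```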
Let us call such inconsistency in sequential decisions an episode of "sequential
incoherence." Then, we can generalize this example and show:
Theorem If ≼ is a weak order (1) of simple lotteries satisfying the Archimedean
postulate (3) with sure-dollar equivalents for lotteries, and if ≼ respects
stochastic dominance in payoffs (a greater chance at more $ is better), then
a failure of "independence" (2) entails an episode of sequential incoherence
(see Seidenfeld (1988)).
However, as Levi's decision theory (one which relaxes the ordering postulate
rather than "Independence") avoids sequential incoherence (Levi, 1986),
we see that it is not necessary for decisions to agree with expected utility theory
in order that they be sequentially coherent.
3. Representation of Preferences without "Ordering"
Next, we discuss the representation of an alternative theory falling within
program ¬O: to weaken the "ordering" assumption. Again, let us begin with
the more elementary problem where we try to quantify values for the rewards
when "probability" is given, analogous to the von Neumann-Morgenstern
setting.
Let R = {ri : i = 1, ...} be a countable set of rewards, and let L =
{L : L is a discrete lottery, i.e., a discrete P on R}. As before, define the convex
combination of two lotteries αL1 + (1 − α)L2 = L3 by P3 = αP1 + (1 − α)P2.
We consider a theory with three axioms:
We consider a theory with three axioms:
Axiom 1: Preference -< is a strict partial order, being transitive and
irreflexive. (This weakens the "weak order" assumption, since noncompara-
bility, -, need not be transitive.)
Axiom 2 (independence): For all L1, L2, and L3, and for all 1 ≥ α > 0,
αL1 + (1 − α)L3 ≺ αL2 + (1 − α)L3 iff L1 ≺ L2.
Axiom 3: A suitable Archimedean requirement. (Difficulties with axiom
3 are discussed below.)
Say that a real-valued utility U agrees with the partial order ≺ iff:
whenever L1 ≺ L2, then Σi P1(ri)U(ri) < Σi P2(ri)U(ri).
We hope to show that ≺ is represented by a (convex) set of agreeing
utilities. That is, we seek to show there is a (maximal) set of agreeing utilities,
U_≺, where (by the unanimity rule)

L1 ≺ L2 iff for each U ∈ U_≺, Σi P1(ri)U(ri) < Σi P2(ri)U(ri).
Aside on related results: Aumann (1962) proved that when R is finite,
there exists a real-valued utility agreeing with ≺, provided axioms like 1-3
hold. A lottery is simple if its support is a finite set of rewards. Kannai (1963)
extended Aumann's result to simple lotteries on a countable reward set by
strengthening the Archimedean axiom 3. (More precisely, these theories deal
with a reflexive and transitive partial order, which identifies indifferences,
not just with the irreflexive part ≺.) These two studies, as well as Fishburn's
(1970, ch. 9) simplification of Aumann's work, use an embedding of the partial
order in a separable, normed linear space. Their proofs have a common
theme. Represent a lottery L by a vector of its probability P, with coordinates
corresponding to the elements of R. Because a lottery is simple, all but finitely
many of its coordinates are zero. Call a vector difference (P2 − P1) "favorable"
when L1 ≺ L2. The set of "favorable" vectors forms a convex cone, and a
Separating Hyperplane Theorem (Klee, 1955) yields a utility. (However, the
separability assumption prohibits using this method when, e.g., the reward
set R is uncountable.)
There are three observations which help to explain some of the difficulties
that arise in carrying out our project for representing preferences given by
partial orders.
(1) The usual Archimedean axiom won't do; it is too restrictive.
Example 1: R = {r0, r*, r1}, with r0 ≺ r* ≺ r1, but for no 0 < α < 1 is it the
case that αr0 + (1 − α)r1 ≺ r*. However, this partial order can be represented
by a set of utilities, U = {Ux : 0 < x < 1} with Ux(r0) = 0, Ux(r1) = 1 and
Ux(r*) = x. This is illustrated in Figure 6.7.
Hence, in general, to represent a partial order generated by a set of utilities,
a weakening of the usual Archimedean postulate is necessary.
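
A small sketch checking Example 1 numerically under the unanimity rule (the grid over 0 < x < 1 is an illustrative discretization):

```python
import numpy as np

# U_x(r0) = 0, U_x(r1) = 1, U_x(r*) = x, for 0 < x < 1.
xs = np.linspace(0.01, 0.99, 99)      # a grid standing in for the open interval

for alpha in (0.1, 0.5, 0.9):
    eu_mixture = 1 - alpha            # E U_x[alpha*r0 + (1-alpha)*r1], same for all x
    below = bool(np.all(eu_mixture < xs))   # mixture below r* under *every* U_x?
    above = bool(np.all(eu_mixture > xs))   # mixture above r* under *every* U_x?
    print(alpha, below, above)        # False False: the mixture is incomparable with r*
```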
Figure 6.7: Example of Restrictions of the usual Archimedean Axiom

The two (convex) sets differ by the presence of the point identified by the
arrow. The common partial order is generated by the "unanimity" rule.
Figure 6.8: Two Convex Sets of Utilities which Generate the Same Partial Order
(2) Two different convex sets of utilities can generate the same partial
order. That is, given convex sets U1 and U2, we can define the partial orders
≺1 and ≺2 according to the "unanimity" rule. However,
Example 2: it may be that ≺1 = ≺2, though U1 ≠ U2. See Figure 6.8 for an
illustration.
When we shift from representing indeterminate utility (given determinate
"chances") to the dual task of representing indeterminate probability
(given a determinate utility, by assuming favorable bets combine according
to the "Dutch Book" assumptions; see §4), this phenomenon causes difficulties
for the representation of conditional probabilities. (Also, contrast this
with Aumann's example, 1964, p. 210.)
(3) Last, though the representation of indeterminate preferences over
lotteries (given determinate "chances") is by convex sets of utilities, and similarly
the dualized representation of indeterminate betting odds (given bets in
stakes which behave like utiles; see §4) is by convex sets of probabilities,
when we turn to the simultaneous representation of indeterminate preferences
and beliefs (through "horse lotteries"), convexity may fail. The set {(P, U)} of
probability-utility pairs which agree with a partially ordered preference over
horse lotteries may not be convex (nor even connected). However, convexity
is assured for both sets: {(P, U*) : U* fixed} and {(P*, U) : P* fixed}.
Here is an example of non-convexity of the set of probability-utility pairs
agreeing with a partial order, ≺, over "horse lotteries."
Example 3: There are two uncertain states (S, ¬S) and three rewards (r0, r*,
r1), with r1 preferred to r0, r0 ≺ r1, but where r* is ≺-incomparable with
either r0 or r1. Consider the two acts, A1 and A2, defined by the payoffs:

            S     ¬S
    A1      r0    r1
    A2      r1    r*

Fix the utilities U(r0) = 0 and U(r1) = 1, and let p(S) denote the probability
of state S. Then Figure 6.9 shows the regions where A1 is preferred or A2 is
preferred.
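
A sketch reconstructing the comparison behind Figure 6.9 (the grid of p(S) values and the sample values of U(r*) are invented):

```python
# With U(r0) = 0 and U(r1) = 1 fixed, classify (p(S), U(r*)) pairs by which
# act receives the higher expected utility.

def eu_A1(p):
    return 1 - p                       # A1 pays r0 (utility 0) in S, r1 in not-S

def eu_A2(p, u_star):
    return p + (1 - p) * u_star        # A2 pays r1 in S, r* in not-S

for u_star in (0.5, -2.0, -4.0):
    prefers_A2 = [round(p, 1) for p in (i / 10 for i in range(1, 10))
                  if eu_A2(p, u_star) > eu_A1(p)]
    print(u_star, prefers_A2)          # the A2 region shrinks as U(r*) falls
```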
This example shows why the proof techniques based on the Separating
Hyperplane results are inappropriate for identifying the (maximal) set of
pairs {(P, U) : (P, U) agrees with ≺} for "horse lotteries."
Our proof procedure for giving a representation of a strict preference over
horse lotteries is to modify Szpilrajn's (1930) argument that, by transfinite
induction, every partial order may be extended to a total order. The modification
involves preserving the other axioms: "Independence," "Archimedes,"
A1 is preferred in the convex region; A2 is preferred in the non-convex region
(axes: p(S) and U(r*)).
Figure 6.9: Regions of Preference for Example 3

and "value-neutrality" of states. In the Appendix we illustrate this technique
for representing strict partial orders of von Neumann-Morgenstern lotteries
by convex sets of (lexicographic) utilities.
4. Representation of Beliefs without "Ordering"
By appeal to the Separating Hyperplane theorem, we may generalize
the Dutch Book argument to establish the coherence of beliefs for partially
ordered gambles, including the case (discussed by C.A.B. Smith, 1961) of
"medial" odds. Consider the finite partition of states {Sj : j = 1, ..., n}, and
define a gamble as a vector of n real values, Ai = (ri1, ..., rin), where rij is
the (utility of the) reward generated by Ai when state Sj obtains. Denote
the constant gamble rj = 0 (corresponding to "no bet," or "status quo")
by 0, and define the set of favorable gambles, F, to be those which are
preferred to 0 in pairwise comparisons. As in the Dutch Book argument, we
make structural assumptions about the value of the rewards, assuring that
the magnitudes of the rewards behave like utilities.
Figure 6.10: Different Convex Sets of Probabilities which Generate the Same Partial
Order

Structural assumptions:
(i) (weak dominance over 0) If rij ≥ 0 (all j), with a strict inequality
for some j, then Ai is favorable.
(ii) (scalars) If Ai is favorable, so too is cAi = (..., c·rij, ...), for c > 0.
(iii) (convex combinations) If Ah and Ai are favorable, so too is the
convex combination xAh + (1 − x)Ai = (..., x·rhj + (1 − x)·rij, ...), for
0 ≤ x ≤ 1.
Representation theorems relating to F:
Theorem 1 (coherence of F):
(i) 0 ∉ F iff there is a maximal, non-empty convex set P of probabilities
with the property that ∀Ai ∈ F, ∀p ∈ P, Σj p(Sj)·rij > 0.
(ii) Moreover, if F is open, or if F ∪ {0} is closed, then Ai ∈ F provided
∀p ∈ P, Σj p(Sj)·rij > 0.
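
For a polytope P, the unanimity test in clause (i) reduces to checking the expectation at each vertex, since expectation is linear in p; here is a sketch with invented vertices:

```python
P_vertices = [
    (0.5, 0.3, 0.2),    # each vertex is a probability over states s1, s2, s3
    (0.3, 0.4, 0.3),
    (0.4, 0.2, 0.4),
]

def favorable(gamble):
    # positive expected payoff at every vertex implies positivity on all of P
    return all(sum(p_j * r_j for p_j, r_j in zip(p, gamble)) > 0
               for p in P_vertices)

print(favorable((1.0, -0.3, -0.3)))   # True: positive at all three vertices
print(favorable((1.0, -1.0, 0.0)))    # False: negative at (0.3, 0.4, 0.3)
```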
Figure 6.11: Supporting Lines Determined by Odds Alone (the supporting lines
mark the lower and upper probabilities of the atoms s1, s2, s3 for the convex set P1)

We may extend this to include conditional probabilities by paralleling
the device of "called-off" bets, used to show coherence of conditional odds in
the Dutch Book argument. Then,
Theorem 2 (coherence of conditionally favorable gambles):
Let F_E (⊆ F) be the set of favorable gambles, called off in case event E
fails to occur, i.e., ∀Ai ∈ F_E, rij = 0 if Sj ∈ E^c. Assume that 0 ∉ F.
(i) Then ∀Ai ∈ F_E, ∀p ∈ P, Σj p(Sj|E)·rij > 0.
(ii) Moreover, if Ai is called off when E fails and either F_E is open or
F_E ∪ {0} is closed, then Ai ∈ F_E provided ∀p ∈ P, Σj p(Sj|E)·rij > 0.
In both theorems, the closure conditions imposed in clause (ii) reflect
the severity of the problem illustrated in Figure 6.10, which is dual to the
problem illustrated in Example 2 (p. 156).
The favorable gambles F are a subset of those preferred to "no bet"
under the partial order (≺_P) generated by the "unanimity" rule adapted to
the set P.
Figure 6.12: Supporting Lines Determined by Odds and Called-Off Bets (the supporting
lines for the convex set P2 are fixed by lower and upper odds for s2:s3 given
not-s1, for s2:s1 given not-s3, and for s1:s3 given not-s2)

Denote the closure of F by cl(F), and denote by F⁻ the set that
results from taking each favorable gamble and changing the sign of its payoffs.
It is straightforward to verify that P is a unit set (expected utility theory)
just in case ≺_P is a weak order. That occurs if and only if cl(F) ∪ F⁻ = R^n
(the space of all gambles on the n states Sj). In other words, when P is not
a unit set, there will be gambles A1 and A2 with A1 ≺_P A2 but where none
of A1, A1⁻, A2, or A2⁻ is favorable.
We illustrate sets P for the elementary case of three states, n = 3,
in Figures 6.11-6.13. The figures use barycentric coordinates. Each trinomial
distribution on {s1, s2, s3} is a point in the simplex having vertices
{(1,0,0), (0,1,0), (0,0,1)}. Figure 6.10 shows different convex sets of probabilities
that generate the same preferences under the "unanimity" rule. Figure 6.11
shows the supporting lines for a set P1 which arises merely by specifying odds
at which betting "on" and "against" the (atomic) events Sj become favorable.
The set P1 is the largest one agreeing with these upper and lower probabilities.
As noted by Levi (1980, p. 198), typically, infinitely many convex subsets
of P1 carry the same probability intervals.

A convex set such that no proper subset has the same upper and lower probabilities
for the atoms.
Figure 6.13: Supporting Lines which Overdetermine the Vertices
Figure 6.12 illustrates the supporting lines for a set P2 given, in addition,
by bounds on the favorable "called-off" bets F_{sj^c}. P2 is properly included
within P1, has the same upper and lower probabilities, and is the largest
set agreeing with all six pairs of unconditional and conditional odds. As
Levi (1974, and 1980, p. 202) points out, we can distinguish between two
sets having different supporting lines, e.g., P1 and P2, with a gamble that is
favorable for only one of them.
Figure 6.13 illustrates how just a few supporting lines can overdetermine
the vertices (and thereby all) of a convex set. The simplest case is when the
supporting lines corresponding to the upper and lower unconditional odds
fix the convex set, P3, uniquely. That is, there is no proper subset of P3
with the same upper and lower probabilities. Hence, the set of favorable
gambles, F, is fully determined once these upper and lower betting odds are
given. (This corrects a minor error in Levi's (1980, p. 202) presentation.
There, the set "B1" has upper and lower unconditional and conditional odds
which overdetermine its vertices. Thus, the proper subset "B2" does not have
the same range of unconditional and conditional odds as "B1".) We plan to
investigate the computational issues relating to the measurement of a convex
set, P, using the set of favorable gambles, F. How efficiently can we locate
supporting lines which overdetermine the vertices of a set?
5. Summary
We have illustrated a variety of axiomatic and consistency arguments
used to justify the normative status of expected utility theory (§1). When
(only) the "independence" axiom is denied, inconsistency in sequential choice
results (§2). We argue, instead, for a generalization of expected utility theory
by relaxing the "ordering" postulate. The resulting theory admits representations
in terms of sets of probabilities and utilities (§3). By analogy
with the "Dutch Book" betting argument, we prove that coherence of a partially
ordered (strict) preference over gambles (as identified by the set of its
strictly "favorable" gambles) is represented by a convex set of probabilities
(§4). Sometimes this representation is fixed by very few comparisons, making
measurement feasible.

Appendix
Representation of a strict partial order by a convex set of lexicographic
utilities.

Defs. Let REW be a set of rewards, REW = {r_a : a ≤ β}. A lottery, L,
is a discrete probability distribution over REW, L = {p(·) : p(r_a) ≥ 0,
Σ p(r_a) = 1}. Let Supp(L) be the support of p(·). (A simple lottery
is a lottery with finite support.) Denote by L_REW the set of simple
lotteries over REW. Given two lotteries L1 = {p1(·)} and L2 = {p2(·)},
define their convex combination by L3 = xL1 + (1 − x)L2 = {xp1(·) +
(1 − x)p2(·)}. Then L_REW is a (Herstein & Milnor, 1953) mixture set.
Axioms The following two are our axioms for a strict partial order, ▷, over
L_REW.
A1 ▷ is a transitive and irreflexive relation on L_REW × L_REW.
A2 (Independence) For all L1, L2 and L3, and for each 0 < x ≤ 1:
xL1 + (1 − x)L3 ▷ xL2 + (1 − x)L3 iff L1 ▷ L2.
Def. When neither L1 ▷ L2 nor L2 ▷ L1, we say the two lotteries are incomparable
(by preference), which we denote by L1 ∼ L2.
Incomparability is not transitive, unless ▷ is a weak order.
Theorem 1: Let REW be a reward set of arbitrary cardinality and let L_REW
be the set of simple lotteries over these rewards. Let ▷ be a strict partial
order defined over elements of L_REW. Then there is an extension of ▷
to ▷* which is a total ordering of L_REW satisfying axiom 2.
Combining Theorem 1 with Hausner's (1954) important result (since a
total order is a "pure" weak order), we arrive at the following consequence.
Corollary 1: There is a lexicographic real-valued utility, U, which agrees
with ▷, i.e., if L1 ▷ L2 then E_U[L1] < E_U[L2].
(Note: A lexicographic utility U is a (well ordered) sequence of real-valued
utilities, U = {U_a : U_a is a real-valued utility, for each a < β}.
When U is a lexicographic utility, E_U[L1] < E_U[L2] is said to
obtain if E_{U_a}[L1] < E_{U_a}[L2] at the first utility U_a in the sequence U
which gives L1 and L2 different expected values, provided one such U_a
exists.)
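
A minimal sketch of the lexicographic comparison (the two-utility sequence and the lotteries are invented):

```python
# E_U[L1] < E_U[L2] is decided by the first component utility in the
# sequence that gives the two lotteries different expected values.

REWARDS = ("r0", "rstar", "r1")
U_SEQUENCE = (
    {"r0": 0.0, "rstar": 1.0, "r1": 1.0},   # U_0: ties rstar with r1
    {"r0": 0.0, "rstar": 0.5, "r1": 1.0},   # U_1: breaks the tie
)

def expected(u, lottery):
    return sum(lottery[r] * u[r] for r in REWARDS)

def lex_less(l1, l2, eps=1e-12):
    for u in U_SEQUENCE:
        e1, e2 = expected(u, l1), expected(u, l2)
        if abs(e1 - e2) > eps:
            return e1 < e2
    return False                             # no utility separates them

L1 = {"r0": 0.0, "rstar": 1.0, "r1": 0.0}    # point-mass on rstar
L2 = {"r0": 0.0, "rstar": 0.0, "r1": 1.0}    # point-mass on r1
print(lex_less(L1, L2))   # True: U_0 ties them at 1.0; U_1 separates (0.5 < 1.0)
```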
Proof of Theorem 1: Let {L_γ : γ < k} (γ ranging over ordinals, k a cardinal)
be a well ordering of L_REW. Let ▷ be a partial order on L_REW satisfying
axioms 1 and 2. By induction, we define a sequence of extensions
of ▷, {▷_λ : λ ≤ k}, where each ▷_λ preserves both axioms and where
▷_k is a total order on L_REW. The partial order ▷_λ, corresponding to
stage λ in the k sequence of extensions, is obtained by contrasting lotteries
L_α and L_β, where Γ(α, β) = λ under the canonical well ordering
Γ of k × k → k. We define extensions for successor and limit ordinals
separately.
Successor ordinals: Suppose ▷_λ satisfies axioms 1 and 2. Let Γ(α, β) =
λ + 1 and (for convenience) suppose max[α, β] = β. Define ▷_{λ+1} as
follows.
Case 1: If α = β, then ▷_{λ+1} = ▷_λ.
Otherwise,
Case 2: L_μ ▷_{λ+1} L_ν iff either
(i) L_μ ▷_λ L_ν (so ▷_{λ+1} extends ▷_λ), or
(ii) L_α ∼_λ L_β & ∃(0 < x < 1) with xL_μ + (1 − x)L_β ▷_λ (or
=) xL_ν + (1 − x)L_α.
Limit ordinals: Let Γ(α, β) = λ < k, λ a limit, and (for convenience)
again assume max[α, β] = β.
Case 1: If α = β, then take ▷_λ = ∪_{δ<λ}(▷_δ). That is, L_μ ▷_λ L_ν obtains
just in case ∃(δ < λ) L_μ ▷_δ L_ν.
Case 2: If α ≠ β, then define ▷_λ as: L_μ ▷_λ L_ν iff either (i) ∃(δ <
λ) L_μ ▷_δ L_ν (so ▷_λ extends all preceding ▷_δ), or (ii) ∀(δ < λ) L_α ∼_δ L_β
& ∃(δ < λ) ∃(0 < x < 1) with xL_μ + (1 − x)L_β ▷_δ (or =) xL_ν + (1 − x)L_α.
Next, we show (by transfinite induction) that ▷_λ satisfies the two axioms,
assuming ▷ (= ▷_0) does. First, consider successor stages, where
the extension is of the form ▷_{λ+1}.
Axiom 1, irreflexivity. We argue indirectly. Assume, for some lottery
L_μ, L_μ ▷_{λ+1} L_μ. Since L_μ ▷_λ L_μ is precluded, by hypothesis of
induction, it must be that (ii): ∃(0 < x < 1) with
xL_μ + (1 − x)L_β ▷_λ (or =) xL_μ + (1 − x)L_α.
Since ▷_λ satisfies axiom 2, L_β ▷_λ (or =) L_α. If either L_β ▷_λ L_α or
L_β = L_α, then ▷_{λ+1} = ▷_λ, contradicting the hypothesis L_μ ▷_{λ+1} L_μ.
Axiom 1, transitivity. Assume L_μ ▷_{λ+1} L_ν and L_ν ▷_{λ+1} L_ψ. There are
four cases to consider, since each ▷_{λ+1} relation may obtain in one of
two ways. The combination where clause (ii) is used for both provides
the greatest generality (the other cases being analyzed in the same way).
Thus, we have: ∃(0 < x, y < 1) with
xL_μ + (1 − x)L_β ▷_λ (or =) xL_ν + (1 − x)L_α
and also
yL_ν + (1 − y)L_β ▷_λ (or =) yL_ψ + (1 − y)L_α.
Since ▷_λ satisfies axioms 1 and 2, we may "mix" these to yield

w(xL_μ + (1 − x)L_β) + (1 − w)(yL_ν + (1 − y)L_β)

▷_λ (or =)

w(xL_ν + (1 − x)L_α) + (1 − w)(yL_ψ + (1 − y)L_α).

Choose w·x = (1 − w)y, cancel the common "L_ν" terms (according to
axiom 2), and regroup (by "reduction") to arrive at: ∃(0 < v < 1)
vL_μ + (1 − v)L_β ▷_λ (or =) vL_ψ + (1 − v)L_α,
where v = wx/(1 − y + wy). Hence, L_μ ▷_{λ+1} L_ψ, as desired.
Axiom 2. We are to show L_μ ▷_{λ+1} L_ν iff
xL_μ + (1 − x)L_ψ ▷_{λ+1} xL_ν + (1 − x)L_ψ.
There are two cases.
Case 1: L_μ ▷_λ L_ν occurs just in case xL_μ + (1 − x)L_ψ ▷_λ xL_ν + (1 − x)L_ψ
(by axiom 2). By the definition of ▷_{λ+1}, we obtain the desired result:
xL_μ + (1 − x)L_ψ ▷_{λ+1} xL_ν + (1 − x)L_ψ.
Case 2: vL_μ + (1 − v)L_β ▷_λ (or =) vL_ν + (1 − v)L_α occurs just in case
yL_ψ + (1 − y)(vL_μ + (1 − v)L_β) ▷_λ (or =)
yL_ψ + (1 − y)(vL_ν + (1 − v)L_α),
according to axiom 2. Choose y = v(1 − x)/[v(1 − x) + x] and regroup
terms to yield: w(xL_μ + (1 − x)L_ψ) + (1 − w)L_β ▷_λ (or =) w(xL_ν +
(1 − x)L_ψ) + (1 − w)L_α, where w = v/[v(1 − x) + x]. By the definition
of ▷_{λ+1}, we obtain the desired result:

xL_μ + (1 − x)L_ψ ▷_{λ+1} xL_ν + (1 − x)L_ψ.

This establishes axioms 1 and 2 for successor stages, ▷_{λ+1}.
The argument with limit stages is similar.
Axiom 1, irreflexivity. Again, we argue indirectly. Assume L_μ ▷_λ L_μ.
By hypothesis of induction, ¬∃(δ < λ) L_μ ▷_δ L_μ. So we may assume
L_α ≠ L_β and ∀(δ < λ) L_α ∼_δ L_β and ∃(δ < λ) ∃(0 < x < 1) with
xL_μ + (1 − x)L_β ▷_δ (or =) xL_μ + (1 − x)L_α. But, by the hypothesis of
induction, ▷_δ satisfies axiom 2; hence, L_β ▷_δ (or =) L_α, a contradiction.
Axiom 1, transitivity: Assume L_μ ▷_λ L_ν and L_ν ▷_λ L_ψ. Again there
are four cases, and again we discuss the most general case, where clause
(ii) is used to obtain these ▷_λ preferences. Thus, we have: ∃(0 <
x, y < 1) ∃(δ, δ′ < λ) with
xL_μ + (1 − x)L_β ▷_δ (or =) xL_ν + (1 − x)L_α
and also
yL_ν + (1 − y)L_β ▷_δ′ (or =) yL_ψ + (1 − y)L_α.
Without loss of generality, let δ = max[δ, δ′]. Then
yL_ν + (1 − y)L_β ▷_δ (or =) yL_ψ + (1 − y)L_α,
since ▷_δ extends ▷_δ′. Now, repeat the "mixing" and "cancellation"
steps used with the parallel case for successor stages. This yields the
desired conclusion: L_μ ▷_λ L_ψ.
Axiom 2: For this axiom, the reasoning is the same as used with axiom
2 in the successor case, modified to apply to the appropriate (preceding)
stage ▷_δ.
Last, define $\rhd_k = \bigcup_{\delta < k} \rhd_\delta$. Hence, $\rhd_k$ is a total order of $L_{REW}$ which satisfies axiom 2. Every two (distinct) lotteries are compared under $\rhd_k$, i.e., $\forall(L_\alpha \neq L_\beta \in L_{REW})\, L_\alpha \rhd_k L_\beta$ or $L_\beta \rhd_k L_\alpha$. $\Box$

Next, we state, without proof, a simple lemma.



Lemma 1: If lexicographic utilities $U_1$ and $U_2$ both agree with the strict partial order $\rhd$, then so too does their convex mixture $xU_1 + (1-x)U_2$.
Also, sets of lexicographic utilities generate a strict partial order ac-
cording to the "unanimity" rule, as we now show.
Lemma 2: Each set of lexicographic utilities $\mathcal{U} = \{U : U$ a lexicographic utility over $REW\}$ induces a strict partial order $\rhd_{\mathcal{U}}$ (satisfying axioms 1 and 2) under the "unanimity" rule:
$$L_\alpha \rhd_{\mathcal{U}} L_\beta \ \text{iff}\ \forall(U \in \mathcal{U})\ E_U[L_\alpha] < E_U[L_\beta].$$
Proof: The lemma is evident from the fact that each lexicographic utility induces a weak ordering $\preceq_U$ of $L_{REW}$, satisfying axiom 2, according to the definition:
$$L_\alpha \prec_U L_\beta \ \text{iff}\ E_U[L_\alpha] < E_U[L_\beta].$$
Recall, $E_U[L_\alpha] < E_U[L_\beta]$ obtains if $E_u[L_\alpha] < E_u[L_\beta]$ for the first utility $u$ (if one exists) in the sequence $U$ which assigns $L_\alpha$ and $L_\beta$ different expected utilities. Each utility $u$ (hence, $\preceq_U$) supports axiom 2, as:
$$E_u[L_\alpha] < E_u[L_\beta] \ \text{iff}\ E_u[xL_\alpha + (1-x)L_\gamma] < E_u[xL_\beta + (1-x)L_\gamma]. \quad \Box$$
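To make the "unanimity" rule concrete, here is a small Python sketch (ours, not part of the chapter): lotteries are probability vectors over a finite reward set, a lexicographic utility is a finite sequence of ordinary utilities compared at the first stage where the expected values differ, and all function names are our own illustrative choices.

    def lex_expected(lottery, lex_utility):
        # Expected-utility vector of a lottery under a lexicographic utility:
        # lottery is a tuple of probabilities over rewards; lex_utility is a
        # sequence of ordinary utilities, each a tuple indexed by rewards.
        return tuple(sum(p * u[r] for r, p in enumerate(lottery)) for u in lex_utility)

    def strictly_prefers(l1, l2, lex_utility):
        # Python's tuple "<" is itself lexicographic: l2 is ranked above l1 iff
        # the first stage separating the two lotteries favors l2.
        return lex_expected(l1, lex_utility) < lex_expected(l2, lex_utility)

    def unanimously_prefers(l1, l2, utilities):
        # The "unanimity" rule of Lemma 2: every lexicographic utility in the
        # set must rank l2 strictly above l1.
        return all(strictly_prefers(l1, l2, lu) for lu in utilities)

    U1 = [(0, 1, 2)]                 # a one-stage (ordinary) utility
    U2 = [(0, 1, 1), (0, 0, 1)]      # ties at stage one broken at stage two
    La, Lb = (1.0, 0.0, 0.0), (0.0, 0.5, 0.5)
    print(unanimously_prefers(La, Lb, [U1, U2]))   # True: both rank Lb higher
    Lc, Ld = (0.0, 1.0, 0.0), (0.5, 0.0, 0.5)
    print(unanimously_prefers(Lc, Ld, [U1, U2]))   # False: U1 is indifferent

In the second pair the utilities do not agree (one is indifferent), so neither lottery is unanimously preferred: the induced relation is a strict partial order that leaves some pairs uncompared.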
As is evident from the proof of Theorem 1, if $L \sim L'$, i.e., if neither $L \rhd L'$ nor $L' \rhd L$, then there are alternative extensions of $\rhd$ in which $L \rhd_\delta L'$ and in which $L' \rhd_\delta L$. This observation, together with the two lemmas and Corollary 1, establishes the following representation for strict partial orders $\rhd$.

Theorem 2: Each strict partial order $\rhd$ over a set $L_{REW}$ is identified by a maximal, convex set $\mathcal{U}$ of lexicographic utilities that agree with it. In symbols, $\rhd = \rhd_{\mathcal{U}}$, where $\rhd_{\mathcal{U}}$ is the strict partial order induced by $\mathcal{U}$ under the "unanimity" rule.
Of course, in light of problem (2) (p. 156), it can be that there is a proper (convex) subset $\mathcal{U}' \subset \mathcal{U}$ where $\rhd = \rhd_{\mathcal{U}'}$ as well; hence, the maximality of $\mathcal{U}$ is necessary for uniqueness of the representation.


Teddy Seidenfeld
Department of Philosophy
M.J. Schervish
J.B. Kadane
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213
REFLECTIONS ON HILBERT'S PROGRAM

WILFRIED SIEG

Introduction
Hilbert's Program deals with the foundations of mathematics from a
very special perspective; a perspective that stems from Hilbert's answer to
the question "WHAT IS MATHEMATICS?". The popular version of his ''for-
malist" answer, radical at Hilbert's time and shocking to thoughtful mathe-
maticians even today, is roughly this: the whole "thought-content" of math-
ematics can be uniformly expressed in a comprehensive formal theory, math-
ematical activity reduces to the manipulation of symbolic expressions, and
mathematics itself is just "ein Formelspiel". Hilbert defended his "playful"
view of mathematics against intuitionistic attack by remarking:
The formula game that Brouwer so deprecates has, besides its mathematical value,
an important general philosophical significance. For this formula game is carried out
according to certain definite rules, in which the TECHNIQUE OF OUR THINK-
ING is expressed. These rules form a closed system that can be discovered and
definitively stated. The fundamental idea of my proof theory is none other than to
describe the activity of our understanding, to make a protocol of the rules according
to which our thinking actually proceeds. Thinking, it so happens, parallels speaking
and writing: we form statements and place them one behind another. If any
totality of observations and phenomena deserves to be made the object of serious
and thorough investigation, it is this one. ...1
For my purposes, Hilbert's "computational conception of the mathemat-
ical mind" is of no consequence; what is of interest is the very possibility of
the formal representation of mathematics. Hilbert tried to exploit this pos-
sibility in his foundational program for philosophical ends. The explicit and
not altogether modest goal was to resolve foundational problems once and
for all by mathematical means.2 It is well-known that Hilbert's program was
refuted already in the early thirties by work of Gödel's. So you may wonder
why one would want to reflect on such an extravagant program and its under-
lying formalist doctrine-more than fifty years after its refutation?-There
is one main reason on which I want to focus: the program can be given a
modified formulation, and the resulting general REDUCTIVE PROGRAM is
not refuted by Gödel's work. In its pursuit most fascinating results have been
obtained that are of contemporary philosophical interest.
Formal Reflection In this part of my talk I want to discuss the Hilbert
program in its original and modified form. I start out by describing the central


metamathematical problem and the failure of the original program. Then


I will move on to the (enforced) modification and discuss one particularly
convincing partial solution.
Metamathematical Problems The claim that reasoning is rule-governed, ex-
pressed so vigorously in the quote I read to you, is by no means original
with Hilbert. It was formulated explicitly already in the 17th century by
Hobbes and, under his influence, by Leibniz; indeed, they viewed thinking
as just a kind of calculation. 3 Two developments in the late 19th century
made this claim more plausible at least for mathematical reasoning: first, the
radicalization of the axiomatic method (most vividly expressed in Hilbert's
own "Grundlagen der Geometrie") and, second, the remarkable extension of
logic through Frege's work. The latter provided an expressive formal lan-
guage and an appropriate logical calculus, that made it possible for the first
time to represent complex informal reasoning by formal derivations built up
according to fixed logical rules. Given a suitable axiomatic starting-point,
e.g., Russell and Whitehead's type theory or Zermelo's axioms for set the-
ory, mathematics could be systematically and formally developed. That was
the quasi-empirical background for Hilbert's foundational considerations; at
their center was the conviction that a radical reduction of set theoretic to
constructive mathematics should be possible.
Hilbert's conviction of the reducibility was based on two penetrating ob-
servations, one flash of connecting insight, and a programmatic demand. 4 The
penetrating observations have to come first, for sure, and can be formulated
as follows: the finite structures of symbols constituting a formal theory can be
taken as proper objects of mathematical study; the crucial notions concerning
these objects are decidable. These observations amount to recognizing the
mathematical character of the syntax of formal systems and to making ex-
plicit, what had been a normative, epistemologically motivated requirement
on formal objects and notions. With a flash of connecting insight, Hilbert
exploited them in a strategic way to locate investigations of (the syntax of)
formal theories within a part of mathematics that was wholly acceptable to
constructivist mathematicians like Kronecker and Brouwer. He called that
part of mathematics FINITIST and was convinced that he had found a conclusive
way of transcending the foundational disputes of the time-by justifying clas-
sical mathematics in her formalized garb on radically constructivist grounds.
The way to achieve this goal was indicated by the programmatic demand to
establish the consistency of classical mathematics within finitist mathemat-
ics. Clearly, the crucial questions were: (1) in what sense, if any, does a
finitist consistency proof justify classical mathematics? and (2) can one find
a finitist consistency proof for all of mathematics?

If one assumes (in accord with the practice in the Hilbert school and sub-
sequent analyses, e.g., by Kreisel and Tait) that finitist mathematics is a part of elementary number theory, then the answer to the second question is very brief, namely, NO! That is a trivial consequence of Gödel's Incompleteness
Theorems. In spite of the negative answer to question (2) I want to address
question (1), as it serves as a springboard for a modified version of Hilbert's
program. Assume that we are dealing with a standard, comprehensive formal
theory P, e.g., Zermelo-Fraenkel set theory. The consistency statement for P
is provably equivalent to the so-called reflection principle

$$\mathrm{Pr}(d, \ulcorner s \urcorner) \rightarrow s,$$

where $\mathrm{Pr}$ is the canonical proof predicate for $P$ and $\ulcorner s \urcorner$ is the $P$-translation of the finitist statement $s$. Thus, a finitist consistency proof for $P$ guaran-
tees that every P-provable finitist statement s is finitistically true. Hilbert
expressed this fact by saying that a consistency proof eliminates ideal ele-
ments from proofs of real statements. The parallel of Hilbert's position to
an INSTRUMENTALIST position with regard to scientific theories should be
obvious; in particular, if one replaces real (or finitist) by observational and
ideal by theoretical. In short, a finitist consistency proof would justify the
instrumental use of P for establishing real statements. To re-emphasize: this
instrumental justification was to extend to all of classical mathematics and
was to be based on the fixed, absolute foundation of finitist mathematics.
Keeping this in mind, one might say that Hilbert was striving for an ABSO-
LUTE consistency proof.5
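As a minimal illustration (mine, not in the original text) of the easy direction of that equivalence: instantiating the reflection principle with the finitistically refutable statement $s := (0 = 1)$ yields
$$\mathrm{Pr}(d, \ulcorner 0 = 1 \urcorner) \rightarrow 0 = 1,$$
which amounts to $\neg \exists d\, \mathrm{Pr}(d, \ulcorner 0 = 1 \urcorner)$, i.e., the consistency of $P$; the converse direction is the substantive half of the equivalence.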
A Partial Solution The goal of obtaining an absolute consistency proof for
all of mathematics had to be abandoned. The general reductive program was
developed, however, and it has been pursued with great vigor and mathemat-
ical success for now almost half a century.6 The basic task of the modified
program can be seen as follows: find a significant part of mathematical prac-
tice, formalized in P*, and an appropriate constructive theory F*, such that
F* proves the partial reflection principle for P*:

$$\mathrm{Pr}^*(d, \ulcorner s \urcorner) \rightarrow s.$$


Here, d is any P*-derivation and s is in a class C of F*-statements. It follows
immediately that P* is conservative over F* with respect to all statements
in C and, consequently, consistent relative to F*. The questions that had
sweeping general answers in the original program had to be addressed anew,
indeed in a more subtle way. In particular the following questions had to
be addressed: (1) Which parts of mathematical practice can be represented

in a certain P*? And (2) What are (the grounds for) the principles of a
"corresponding" constructive F*? Briefly put, if a metamathematical con-
servation result has been obtained, it has to be complemented by additional
mathematical and philosophical work establishing its foundational interest by
answering these questions.
The actual proof theoretic work has focused on a particular part of math-
ematical practice, namely analysis. Hilbert and Bernays viewed this central
mathematical discipline as decisive for the (success or failure of) the reductive
program. But what is the framework P* in which analysis can be formally
presented?-In a supplement to the second volume of their "Grundlagen der
Mathematik" 7 they showed that second-order arithmetic suffices for this task.
It is for parts of this formal theory that the reductive program has been car-
ried out successfully, in rather striking and surprising ways. Let me describe
one reductive result that answers the two questions I asked earlier. It turned
out, through refined mathematical work that started with Weyl in 1918, was
inspired by constructivist ideas,8 and was in a way completed by Takeuti, Fe-
ferman, and Friedman,9 ... well, it turned out that variants of the theory of
arithmetic properties are sufficient for the REPRESENTATION OF the PRAC-
TICE of analysis. The theory of arithmetic properties is a weak subsystem of
second-order arithmetic in which only the existence of arithmetically definable
sets is guaranteed. But how can it be that analysis is carried out with just
arithmetically definable sets? Isn't for example the impredicative least upper
bound principle crucial for any substantial development?10-A version of the
principle, restricted to arithmetically definable sequences of sets, is provable
in the weak theory and suffices for applications, as the detailed mathematical
work shows. This is in my view a rather remarkable fact and all the more so,
as the theory of arithmetic properties is conservative over classical elementary
number theory and thus consistent relative to intuitionistic number theory.11
So we do have, using the earlier way of speaking, a justification of classical
analysis on the basis of weak constructive principles.
There is a tremendous variety of additional results and open questions.
The work goes, as you may suspect, into two different directions. On the
one hand one tries to establish reductive results for stronger subsystems of
second-order arithmetic for foundational reasons; on the other hand one tries
to push parts of analysis through in even weaker theories for computational
reasons. 12 I want to mention two results in this connection. The first concerns
classical, impredicative theories that are reducible to intuitionistic theories
of well-founded trees. The second concerns much weaker theories that are
actually reducible to finitist mathematics (when the latter is taken to include
primitive recursive arithmetic); nevertheless, they are of great mathematical
strength. I will discuss the first result only; the second result is taken up
briefly in the following remark.

Remark Though Hilbert's "computational conception of the mathematical


mind" is of no consequence here, the mechanical features of formal theories
are being used for a variety of purposes. First of all, comprehensive formal theories for classical and constructive mathematics can be implemented on computers:
Andrews, for example, is refining his TP-system based on Church's finite
type theory; a version of Martin-Lof's intuitionistic type theory is used by
Constable for constructive mathematics. Secondly, partial mechanizations
can be used for "computer-assisted research", as in the proof of the four-
color conjecture. And, thirdly, proofs in formal theories can provide direct
computational information. This point is most closely related to the detailed
pursuit of the modified Hilbert program; it was Kreisel who focused on it by
asking "What more than its truth do we know, if we have proved a theorem
by restricted means?" (here: in a weak subsystem of second or higher order
arithmetic). One answer to this question is given by characterizing the class F
of provably recursive functions of a formal theory T. If R( x, y) is a quantifier-
free arithmetic statement and

$$T \vdash (\forall x)(\exists y)\, R(x, y),$$

then we actually know that

$$\text{for some } f \text{ in } F\colon\ T \vdash (\forall x)\, R(x, f(x));$$

i.e., we obtain $F$-bounds for $\Pi^0_2$-sentences provable in $T$. Such bounds can


as a matter of fact be extracted from proofs by mechanical means, namely,
means used in proof theoretic consistency arguments. For such results to
be of genuine computational interest the class F has to be "small", yet the
theory T has to be strong for mathematical practice. A first, very important
step in this direction was made by Friedman, who introduced a subsystem of
second order arithmetic, $WKL_0$: it is of remarkable mathematical strength,
as shown by detailed work of Friedman and Simpson, but is conservative over
primitive recursive arithmetic. Consequently, its class of provably recursive
functions consists of exactly the primitive recursive ones.-Here is an area of
current research, where computational issues interact with rich mathematical
and metamathematical ones.
Structural Reduction A natural starting-point for elucidating the philo-
sophical significance of reductive results is a closer examination of the goals of
constructive (relative) consistency proofs. It is in the course of such an exam-
ination that the concept of "structural reduction" is introduced. To give you
some concrete sense of what is intended and what has been achieved, I start
out by describing the intuitionistic theories of well-founded trees, theories to
which some impredicative subsystems of classical analysis can be reduced.
That is one of the results I alluded to a minute ago.

Well-Founded Trees Inductively defined (i.d.) classes are given by generalized inductive defini-


tions and have been used in constructive mathematics ever since Brouwer.
Two familiar examples are well-founded trees of finite sequences of natural
numbers (so-called unsecured sequences) and Borel sets. The former were em-
ployed by Brouwer in his justification of bar-induction, the latter in Bishop's
original development of measure theory. In spite of the fact that i.d. classes
can be avoided in the current practice of constructive analysis, particular ones
are of intrinsic foundational interest.
The constructive (well-founded) trees form such a distinguished class,
called $O$. $O$ is given by two inductive clauses, namely (i) if $e$ is $0$, then $e$ is in $O$, and (ii) if $e$ is (the Gödel-number of) a recursive function enumerating elements of $O$, then $e$ is in $O$. The elements of $O$ are thus generated by joining recursively given sequences of previously generated elements of $O$ and can be
pictured as infinite, well-founded trees. Locally, the structure of such a tree
is as follows:

[Figure: local structure of a tree in $O$; the surviving node label is $\{\{e\}(3)\}(0)$, a node in the subtree coded by $\{e\}(3)$, where $\{e\}(0), \{e\}(1), \{e\}(2), \ldots$ code the immediate subtrees of the tree coded by $e$.]

Higher tree classes are obtained by a suitable iteration of this definition along
a given recursive well-ordering of the natural numbers. Suitable means that
branchings in trees are taken over the natural numbers and also over already
given lower tree classes. Their constructive appeal consists partly in this:
the trees reflect their build-up according to the generating clauses of their
definition directly and locally in an effective way. If one views the clauses
as inference rules, then the constructive trees are infinitary derivations and
show that they fall under their definition.
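As a toy illustration (ours, not in the text): if the recursively enumerated, infinitely branching joins of clause (ii) are replaced by finite tuples, membership in the resulting miniature analogue of $O$ is decided by exactly the structural recursion that the generating clauses induce.

    # A finitely branching stand-in for the tree class O (an illustration only,
    # not O itself): clause (i) admits the leaf 0; clause (ii), which in the text
    # joins a recursively enumerated sequence of previously generated trees, is
    # modelled here by a finite tuple of subtrees.
    def in_O(e):
        if e == 0:                       # clause (i)
            return True
        if isinstance(e, tuple) and e:   # clause (ii), finite branching
            return all(in_O(sub) for sub in e)
        return False

    tree = (0, (0, 0), ((0,), 0))        # a small well-founded tree
    print(in_O(tree))                    # True: built up by the two clauses
    print(in_O((0, 1)))                  # False: 1 is generated by neither clause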
Constructive theories for $O$ have been formulated as extensions of intu-
itionistic number theory with two principles for O. The first principle

$$(\forall x)(A(O, x) \rightarrow Ox)$$

is a definition principle making explicit that applications of the defining clauses to elements of $O$ yield elements of $O$. $A(O, x)$ is the disjunction of the

antecedents of the generating clauses for $O$ formulated above. The second principle

$$(\forall x)(A(S, x) \rightarrow Sx) \rightarrow (\forall x)(Ox \rightarrow Sx)$$

is a schematic proof principle expressing that one can give arguments by induction on $O$. Here, $S$ is any formula in the language of the theory, and $A(S, x)$ is obtained from $A(O, x)$ by replacing all occurrences of $O$ by occurrences of $S$. Proofs by this principle, "similar" to that for ordinary induction in number theory or epsilon induction in set theory, follow or parallel the construction of the elements of $O$. The resulting theory is called $ID_1(O)$. For the higher tree classes the definition and proof principles can be formulated in a similar, albeit more complicated manner. The theory is denoted by $ID_{<\lambda}(O)$, when the iteration proceeds along arbitrary initial segments of the given well-ordering of type $\lambda$.
It is to such intuitionistic theories that some strong classical, impred-
icative theories can be reduced. 13 The latter involve in the simplest case the
comprehension principle for sets of natural numbers

$$(\exists X)(\forall x)(Xx \leftrightarrow Sx)$$

for formulas $S$ of the form $(\forall Y)(\exists y)\, R(y, Y)$, where $R$ is quantifier-free and may


contain set and number parameters.-From an intuitionistic point of view
(but also the vantage point of Poincaré or Russell) such a principle is vi-
ciously circular. Sets are assumed to be constructed. And, as Gödel put it,
[i]n this case there must clearly exist a definition (namely the description of
the construction) which does not refer to a totality to which the object defined
belongs. That condition, crucial for a constructive understanding of sets, is
violated by impredicatively defined sets. So, not surprisingly, such definitions
are intuitionistically inadmissible. The reduction of the subsystem of analysis
described above is intuitionistically nevertheless most satisfactory and guar-
antees, in particular, relative consistency. But what, if anything, has been
gained from a more general perspective?-Let me try to address this question
by analyzing the epistemological goals of the (modified) Hilbert Program and
by formulating two (philosophical) tasks.
Special Status In my earlier description of Hilbert's Program I emphasized
its instrumentalist goal: a consistency proof allows us to recognize the truth
of statements that are meaningful from a restricted standpoint, but in whose
proof problematic principles have been used. As in the case of instrumental-
ism for scientific theories two questions arise immediately and underly any
close examination of the epistemological point of reductions of "problematic"

theories P to "secure" theories F. Here, I first consider the question "What is


the special status of (constructive) F's?". The second, complementary prob-
lem "What are reasons for adopting particular (classical) P's?" has to wait
for another occasion.
The first question was answered by Hilbert for finitist mathematics quite
directly. According to Hilbert, finitist statements form the contentual, intu-
itively founded part of mathematics; they deal, and I quote,
with certain extralogical concrete objects that are intuitively present as immediate
experience prior to all thought. If logical inference is to be reliable, it must be
possible to survey these objects completely in all their parts, and the fact that
they occur, that they differ from one another, and that they follow each other, or
are concatenated, is immediately given intuitively, together with the objects, as
something that neither can be reduced to anything else nor requires reduction.14
However appealing such a radical position may be, it is difficult to see how it
supports Hilbert's claim, that (restricted versions of) the principles of induc-
tion and recursion are finitistically correct. Bernays, in a very thoughtful and
informative essay written before Godel's incompleteness results were known,
describes matters differently. The finitist part of mathematics is not founded
directly on the intuition of concrete objects, but rather, "it corresponds to a
standpoint where one REFLECTS already on the general characteristics of in-
tuitively given objects".15 And here is the first philosophical task, of interest
quite independently of the special context in which I am presenting it:
(I) Analyze this reflection on finite configurations and investigate, whether and how
the proof principle of complete induction and the definition principle of primitive
recursion can be founded on it.
Significant contributions have been made by Bernays, Kreisel, Parsons, and
Tait,16 but I think there continue to be non-trivial issues.
Gödel's discovery of the incompleteness phenomena undermined the spe-
cial status of finitist mathematics as an absolute basis and (refuting Hilbert's
formalism as an even plausible view) led to the reductive program, freed from
harsh metamathematical restrictions and radical philosophical aims. Yet that
program preserves, it seems to me, Hilbert's fundamental insight: to recog-
nize the consistency of an axiomatic theory it is not necessary to exhibit a(n
infinite) structure; it is sufficient to solve a combinatorial problem concern-
ing finite structures, to wit, formal derivations. Bernays17 formulated the
epistemological interest of this idea most succinctly:
In taking ... the deductive structure of a formalized theory as the object of investiga-
tion, the theory is so-to-speak projected into number theory. The number theoretic
structure obtained in this way is in general essentially different from the structure
intended by the theory. However, that structure can serve the purpose of recogniz-
ing the consistency of the theory from a standpoint that is more elementary than
the assumption of the intended structure.

What is aimed for is, perhaps, best called a STRUCTURAL REDUCTION.


The hope had been that the principles used for the solution of the combina-
torial problem could be recognized as correct on the basis of finite structures;
that hope had to be given up. We know now that infinitary objects have to
be admitted: e.g., in Gödel's consistency proof for number theory, primitive recursive functionals of finite type;18 in Gentzen-style proofs, infinitary but constructive "cut-free" derivations that can be coded as well-founded trees.
In any event, there is a clear sense in which the structural reduction of classi-
cal impredicative theories to intuitionistic theories of well-founded trees can
be viewed as an epistemological one. To assess, however, its epistemological
significance requires a detailed analysis of the principles employed in its proof
together with their justification in terms of the given constructive objects. 19
Above, I tried to indicate some features that point to the special status of
the intuitionistic theories for well-founded trees. Nevertheless, we still have
to address the obvious modification of the earlier philosophical question, for-
mulated again as a task:
(II) Extend the reflection on finite configurations to constructive well-founded infi-
nite trees and investigate, whether and how induction and recursion principles can
be founded on it.
The special character of these constructive objects is certainly recognized,
but there is-to my knowledge-no sustained conceptual analysis.
Concluding Remarks
In the background of these considerations are issues that are grouped in
the contemporary discussion of foundational problems under the heading of
Platonism and Constructivism. Gödel has perhaps most poignantly formu-
lated the general role of constructive consistency proofs. He remarked that
giving such a proof for classical mathematics means, and I quote, "to replace
its axioms about abstract entities of an objective platonic realm by insights
about the given operations of our mind". Avoiding the schematic (and also enigmatic) opposition in Gödel's remarks,20 I see the task of such proofs in
relating two complementary aspects of our mathematical experience, namely,
• the impression, that mathematics deals with structures of abstract objects
that are independent of us, and
• the conviction, that principles for some structures are immediately evident,
because we can grasp the build-up or construction of their elements.

Hilbert sought a radical solution to bridge the apparent gulf between


classical and constructive mathematics. 21 The radical solution cannot be ob-
tained. The issues are much more delicate than the foundational dispute in
the twenties (or, for that matter, the writing of some contemporary authors)
leads one to believe. On the one hand, the mathematical work reported above

shows that classical analysis can be carried out in a tiny corner of Cantor's
paradise, where even constructivists can feel at home (aware of the requisite
reductive result). On the other hand, the "vicious circularity" of impred-
icative definitions does not lead to contradictions in all cases; at least that
much follows from the above proof theoretic results even for constructivists.-
The general formulation of the two aspects of mathematical experience (cries
out for analytical substance for the concepts of structure and construction
and) suggests concrete work for philosophical analysis and attendant meta-
mathematical investigations. Together, they promise to give us a deepened
understanding of what is characteristic of and possibly problematic in clas-
sical mathematics and of what is characteristic of and taken for granted as
convincing in constructive mathematics.

NOTES
1 Hilbert, "Die Grundlagen der Mathematik" (1927). This paper was presented
to the Mathematical Seminar in Hamburg in July 1927; it was published in: Ab-
handlungen aus dem mathematischen Seminar der Hamburgischen Universität 6 (1928), pp. 65-85. A translation can be found in van Heijenoort (ed.), From Frege to Gödel, Cambridge, 1967, 464-479.-Incidentally, it was Hilbert who provoked-by the very formulation of the Entscheidungsproblem for predicate logic-Church's
and Turing's work on computability, work that is central in theoretical computer
science.
2 Those foundational issues go back to the 19th century and were discussed,
in particular, during the seventies and eighties. A brief account of this history is
given in my paper "Foundations for Analysis and Proof Theory", Synthese 60(2),
1984, 159-200.
3 Hobbes, De Corpore, in particular the section "Computatio Sive Logica",
and Leibniz, De Arte Combinatoria.
4 On the more psychological side Bernays reports in C. Reid's Hilbert biogra-
phy (pp. 173-174):
For Hilbert's program ... experiences out of the early part of his scientific career (in
fact, even out of his student days) had considerable significance; namely, his resistance to
Kronecker's tendency to restrict mathematical methods and, particularly, set theory. ...
In addition, two other motives were in opposition to each other-both strong tendencies
in Hilbert's way of thinking. On one side, he was convinced of the soundness of existing
mathematics; on the other side, he had-philosophically-a strong scepticism. ... The
problem for Hilbert ... was to bring together these opposing tendencies, and he thought
that he could do this through the method of formalizing mathematics.
5 You may ask, can this instrumentalism (radical formalism) be reconciled
with Hilbert's view of the soundness (and meaningfulness) of mathematics? My
answer is "YES, somewhat plausibly, if the formal theories under consideration are
complete." -The notion of completeness figures already significantly in Hilbert's
earliest foundational paper "Über den Zahlbegriff" (1899). The formalisms for
elementary number theory and analysis set up in the twenties were believed to be

complete in the Hilbert school. If a formalism allows then, as Hilbert put it, "to
express the whole content of mathematics in a uniform way", and if it provides
"a picture of the whole science", why not make a bold methodological turn and
avoid all the epistemological problems connected with the (classical treatment of
the) infinite?
6 See e.g., Kreisel, "A Survey of Proof Theory", Journal of Symbolic Logic 33, 1968, pp. 321-388; Takeuti, Proof Theory (Second edition), North Holland Pub-
lishing Company, Amsterdam 1987; Feferman, "Theories of Finite Type related
to Mathematical Practice", Handbook of Mathematical Logic, Amsterdam 1977,
pp. 913-971; Buchholz, Feferman, Pohlers, Sieg, "Iterated Inductive Definitions
and Subsystems of Analysis: Recent Proof-Theoretical Studies", Springer Lecture
Notes in Mathematics 897, Berlin, Heidelberg, New York 1981.
7 Supplement IV ("Formalismen zur deduktiven Entwicklung der Analysis")
in volume II of Grundlagen der Mathematik (Second edition), 1970.
8 Most important, apart from the work of Brouwer's, are two books: Weyl,
Das Kontinuum, Leipzig, 1918; Bishop, Foundations of Constructive Analysis, New York, 1967.
9 Takeuti, Two Applications of Logic to Mathematics, Princeton, 1978; Fe-
ferman, l.c., see 6; Friedman, "A Strong Conservative Extension of Peano Arith-
metic", in: The Kleene Symposium (Barwise, Keisler, Kunen, eds.), Amsterdam,
1980, pp. 113-122.
10 A (set existence) principle is called impredicative if sets, whose existence is
guaranteed by the principle, are given by formulas involving quantifiers the range
of which includes those very sets.
11 That is a result of Gödel and Gentzen mentioned below.
12 This distinction between foundational and computational reduction is elab-
orated in Sieg, "Reductions of Theories for Analysis", in: Foundations of Logic and
Linguistics, (Dorn and Weingartner, eds.), New York, 1985, pp. 199-231. There is
also a discussion of the (mainly mathematical) interest of reductions of the second
type.
13 For the detailed description of these theories I have to refer to the litera-
ture, e.g., the Lecture Notes volume listed in 6.-The reference below is to Gödel's
"Russell's Mathematical Logic", reprinted in Benacerraf and Putnam's collection of
selected readings Philosophy of Mathematics, Cambridge University Press, Second
Edition, 1983, p. 456.
14 Hilbert, "On the Infinite", in: van Heijenoort, l.c., p. 376.
15 Bernays, "Die Philosophie der Mathematik und die Hilbertsche Beweis-
theorie", published in 1930 and reprinted in Abhandlungen zur Philosophie der
Mathematik, Darmstadt, 1976, pp. 17-61, especially p. 40.
16 Kreisel, "Mathematical Logic", in: Lectures on Modern Mathematics, vol. III, Saaty (ed.), New York, 1965, pp. 95-195, in particular, pp. 168-173. Parsons, "Intuition in Constructive Mathematics", in: Language, Mind, and Logic (Butterfield,
ed.), Cambridge University Press, 1986, and other papers referred to there; Tait,
"Finitism", Journal of Philosophy 78, 1981, pp. 524-546.

17 Bernays, "Die schematische Korrespondenz und die idealisierten Strukturen", reprinted in the volume of essays mentioned in 15, p. 186. This paper was published originally in 1970. I should mention that essentially the same view is expressed in Bernays's 1930 paper mentioned above; except that there the finitist standpoint is taken as "absolute". After the discussion of completeness ("deduktive Abgeschlossenheit"), including a footnote stating (incorrectly, as we know) that completeness of a formal theory is not as far reaching as its decidability, we read (on p. 59):
In the domain of these and related questions a considerable field of problems still lies open. But these problems are not of such a kind that they constitute an objection to the standpoint we have adopted. We must only keep in mind that the formalism of statements and proofs, with which we represent our formation of ideas, does not coincide with the formalism of the structure we intend in our thinking. The formalism suffices to formulate our ideas of infinite manifolds, but it is in general not able to generate the manifold, as it were, combinatorially out of itself.
18 Gödel, "Eine bisher noch nicht benützte Erweiterung des finiten Standpunktes", Dialectica 12 (1958), pp. 280-287.
19 There is a genuine conceptual problem for attempts to extend the reductive
work concerning impredicative subsystems of analysis beyond the strongest system
that has been treated: i.d. classes are provably not sufficient. We have to find a
new, broader concept of "constructive mathematical object", if there is to be any
prospect for genuine advances.
20 That is quoted from Reid, l.c., p. 218.
21 One has to keep in mind that for Hilbert there was only one constructive mathematics, as finitist and intuitionist mathematics were assumed to be coextensive.
Wilfried Sieg
Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
THE TETRAD PROJECT1

PETER SPIRTES, RICHARD SCHEINES, and CLARK GLYMOUR

1. Introduction

Questions about causality have long concerned philosophers. These


questions have included: What is a causal relation? Can it be reduced to
some other, more primitive kind of relation? What types of causal relations
are there? Does it make sense for a later event to cause an earlier event?
Does every event have a cause? Is the world deterministic?
Surprisingly, however, one fundamental epistemological question-indeed,
from a practical point of view the fundamental question-about causality has
received relatively little attention from philosophers. That question is, how
can we reliably determine whether or not events of one sort can cause those of
another sort? For example, how can we reliably determine whether penicillin
cures syphilis? In this example, and in many cases in the sciences and in
everyday life the answer is that there are reliable experimental procedures for
answering questions about causal relations.
However, in many domains either practical or ethical considerations
make experimentation impossible. Obvious practical considerations prevent
astrophysicists from performing experiments on the formation of galaxies
or the birth of the universe. Obvious ethical considerations prevent social sci-
entists from determining the effects of poverty by randomly selecting a group
of children to live in poverty. Nevertheless, as the example of astrophysics
shows, genuine science without experimentation is possible.
Where experiments cannot be performed it is often possible to collect
statistical data. The problem that we will address is how causal relations
can reliably be inferred from statistical data; far less progress has been made
on this question than on the question of how to infer causal relations when
experimentation is possible.
We have devised a procedure that, granted some substantive assump-
tions, can reliably infer some causal relationships from statistical data. The
procedure has been implemented in a computer program, TETRAD II. The
results obtained with the program have been impressive and surprising. In
Monte Carlo simulation tests, TETRAD II has reliably inferred causal rela-
tions from statistical data; when applied to empirical data, it has often found
causal models that perform better on standard statistical tests than models
suggested by researchers not using TETRAD II. Of course, while TETRAD II


can be applied in a wide variety of situations, it is not applicable everywhere.


Restrictions upon its use will be described in more detail below.
The development of such a program has required that certain traditional
questions in the philosophy of science be addressed, but other questions, even
those about causality, have proved inconsequential. The TETRAD II program
does not, for example, require an analysis, probabilistic or otherwise, of the
notion of causality, but it does suppose an understanding of a connection
between causality and conditional independence. The procedures of the pro-
gram do not require an analysis of the notion of "statistical explanation,"
and none of the philosophical theories of that notion play any role; but the
program does depend on a general notion of scientific explanation that is not
restricted in its application to statistical theories, and it depends on criteria
for judging whether or not one explanation is better than another. Philos-
ophy of science has long been concerned with the logical clarification and
mathematical reconstruction, for epistemological purposes, of various kinds
of scientific theories. The TETRAD programs require a special reconstruction
of the formal structure of classes of statistical theories that have a causal
component, and this reconstruction aims not only to catch distinctions im-
plicit in the inferences made with such theories, but also to capture features
of theories that are relevant to their capacity to provide good explanations.
The philosophical literature has almost entirely ignored questions about how
to search for the best (or better, or adequate) explanations, but such ques-
tions, and answers to them, are at the very center of the TETRAD project.
The remainder of this paper will describe some of the results of that project.
2. The Problem of Inferring Causal Structure from Statistical
Data
A social scientist or psychologist or epidemiologist or biologist attempt-
ing to develop a good statistical theory faces many difficult problems. The
researcher must choose what variables to measure and how to measure them,
as well as worrying about sampling technique and sample size. He or she must
also consider how the variables are distributed, and worry about whether
measures of a variable in one individual or place or time are correlated with
measures of that same variable in another individual or place or time. The
researcher must ask whether the relations among the variables are linear, or
non-linear, or even discontinuous. These are demanding tasks; fortunately
there are a variety of data analytic techniques to help one along and to test
hypotheses concerning these questions. Suppose the investigator has passed
these hurdles and has arrived, however tentatively, at the usual statistical
modeling assumptions: the distribution is multinormal, or nearly so, and
there is no autocorrelation, or at least not much. Assume that the investiga-
tor has covariance data for six variables.

The problems that now must be faced make the data analysis problems
seem to pale in comparison. There are $4^{15}$ alternative possible theories of the
causal relationships among the six variables, and only one of those theories
can be true. How can the researcher find the one needle in the enormous
haystack of possibilities? Even if the goal is not to find the one unique cor-
rect theory, but only to reduce the possibilities to a handful, or to find a close
approximation to the correct model, how is that to be done? The "conven-
tional wisdom" says that the investigator must apply his or her "substantive
knowledge" of the domain, but this is mostly whistling in the dark. To get
the number of possibilities down to manageable size will take much more sub-
stantive knowledge than we usually have about social or behavioral domains.
Suppose, for example, that one is quite certain that the causal relations are
acyclic. That is, no sequence of directed edges in the true graph of the causal
relations leads from one variable back to that same variable. The number
of alternative causal models consistent with this is still greater than three
million. Suppose (contrary to what is usual in many social science studies)
that the variables are ordered by time, and the researcher therefore knows
that variables occurring later cannot possibly be causes of variables occur-
ring earlier. There are still 5! or 120 alternative models compatible with this
restriction.
Now let us repeat the same sequence of calculations when there are twelve variables in the data set. Without any restrictions imposed, there are $4^{66}$ alternative causal models, only one of which can be true. If the researcher
knows that there are no cyclic paths in the true model, the number of alterna-
tives is truly astronomical: 521,939,651,343,829,405,020,504,063 (see Harary
& Palmer 1973). And even if the researcher is lucky enough to be able to
totally temporally order the variables and knows that later variables cannot
cause earlier variables, the number of alternatives is reduced to 11! or a mere
39,916,800.
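The arithmetic behind these counts can be reproduced directly. The Python sketch below (ours) reads the $4^{15}$ and $4^{66}$ figures as four possible causal relations for each of the $\binom{6}{2} = 15$, respectively $\binom{12}{2} = 66$, variable pairs (our gloss), and uses the standard inclusion-exclusion recurrence for labeled acyclic digraphs (cf. the Harary & Palmer reference above) for the acyclic counts.

    from math import comb, factorial

    def labeled_dags(n):
        # a(n), the number of labeled directed acyclic graphs on n nodes,
        # via the standard inclusion-exclusion recurrence
        a = [1]                          # a(0) = 1: the empty graph
        for m in range(1, n + 1):
            a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                         for k in range(1, m + 1)))
        return a[n]

    print(4 ** comb(6, 2))     # 4^15 theories over six variables
    print(labeled_dags(6))     # 3,781,503: "greater than three million"
    print(4 ** comb(12, 2))    # 4^66 theories over twelve variables
    print(labeled_dags(12))    # 521,939,651,343,829,405,020,504,063
    print(factorial(11))       # 39,916,800 time-ordered alternatives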
These counts are very conservative. For example, they do not in-
clude the possibility considered by every researcher, that some part of the
correlations is due to unmeasured variables that have effects on two or more
measured variables. When such possibilities are included the already enor-
mous numbers are enormously increased. What can an investigator possibly
do to select the one correct model from the enormous number of possible
causal models? Unfortunately, in practice, even the best of researchers usu-
ally take a wild guess, or indulge their prejudices rather than their knowledge.
One, or two, or sometimes even three, causal models are suggested, the ap-
propriate equations are written down, the parameters are estimated and the
model is subjected to some statistical test. If the test is passed, the researcher
and journal editors are happy and the thing is published. No one likes to men-
tion (or even think about) the millions, billions or zillions of alternatives that

[Figure 8.1: Timberlake and Williams' Regression Model]

have not been considered, or the dark and troubling fact that among those
alternatives there may be many which would pass the same statistical test
quite as well or even better, and which would also square with well-established
substantive knowledge about the domain.
The existence of alternative models superior to those that are published
is not just an idle worry. Consider the following example taken from the
American Sociological Review (Timberlake & Williams 1984). Sociologists
have long debated whether a large influx of foreign capital causes third world
countries to become more or less politically repressive. M. Timberlake and
K. Williams are two such researchers who attempted to answer this question
with data measured between 1967 and 1977 on 72 "peripheral" countries.
They used the following four variables: the level of foreign capital investment
received (fi), the level of economic development as measured by the log of
the energy consumed per capita (en), an index of the lack of civil liberties (cv), and an index of the level of political repression (po). They offered the
following regression model (Fig. 8.1) in order to examine the effect of fi on
po. The variable e is an error term, curves connecting pairs of independent
variables represent unexplained correlations, and there is an arrow from A to
B iff A is an immediate cause of B.
Since the regression coefficient connecting fi to po is estimated to be
both positive and significantly different than zero, Timberlake and Williams
concluded that foreign capital investment causes higher levels of political
repression in peripheral countries.
However, even among just these four variables there are hundreds of
different possible causal models. The technique of multiple regression that
Timberlake and Williams used to construct their model, while often employed
in the social sciences, rules out the vast majority of these models a priori,
even though many of them are perfectly plausible. Using TETRAD (an early

Figure 8.2: Our Alternative

version of the TETRAD II program) we devised a series of alternative models


that perform quite well on standard statistical tests, and are just as plausible
on substantive grounds as Timberlake and Williams' model. One of our
alternatives is depicted in Fig. 8.2.
In this model, the estimated effect of foreign investment on economic
development is both positive and significant; thus it asserts that more foreign
money causes more economic development. The effect of economic devel-
opment on political repression is significant but negative; thus this model
predicts that economic development inhibits political repression. Therefore,
according to this model more foreign investment causes less political repres-
sion. Using the same variables and the same data, but a different causal
structure, we arrive at a conclusion completely opposed to the one Timber-
lake and Williams endorse.
We cannot be sure that Timberlake and Williams' model is wrong, or that
one of our alternative models is correct. However, the existence of models
that are superior to Timberlake and Williams' model on statistical grounds,
and that are equally plausible, illustrates the need for some systematic way of
searching through the vast number of possible causal models.
3. TETRAD II
The overall aim of the TETRAD II program is to provide a fully automated
system in which the user can enter data together with whatever fragmentary
knowledge he or she may have about the causal processes among the measured
variables, as well as information, if any is available, about which measured
variables do or do not have common causes. The aim is to minimize the sta-
tistical sophistication required of the user, and so far as possible to reduce the
input required, other than the data, to the specification of simple constraints
on directed graphs. Ideally the input, except for the data, could be given by
drawing a few pictures. What the user will receive in return is a set of alter-
native statistical/causal models that explain the data, and that include all
of the "best" explanations of the data that are consistent with the input the

user has provided. In addition, the program will automatically prepare input
files for programs that provide parameter estimates and statistical tests for
each model desired. The parts of TETRAD II so far implemented perform these
tasks, but not with the full generality or efficiency that we think is possible.
The core of TETRAD II is a procedure that takes a partially specified
statistical/causal theory and data, and searches for causal connections sup-
ported by the data but not included in the theory. The procedure depends
on the following ideas:
1. Typically, causal hypotheses for statistical data can be represented by
directed graphs;
2. Any directed graph determines a class of joint probability distributions
on random variables corresponding to the vertices of the graph, and all
distributions in the class share certain independence assumptions;
3. There are patterns or constraints among covariances of any probabil-
ity distribution whose independence assumptions are represented by a
directed graph, and these constraints are determined entirely by the
graph;
4. The geometrical features of graphs that determine probability distribu-
tions satisfying specific constraints are computable;
5. The absence of geometrical features that determine models satisfying
specific constraints is inheritable-if a graph does not have such a
feature, no extension of that graph will have the feature;
6. Enough of the sample distribution theory of the constraints in question
is known to make possible statistical tests for the constraints in the
multinormal case;
7. The combination of these facts makes possible a fully computerized
search which seeks graphs that explain the patterns found in sample
data, that do not generate patterns not found in the sample data, and
that are as simple as possible;
8. The asymptotic reliability of the search procedure can (we hope) be
demonstrated.

We use heuristic search techniques to search for those causal models that
best explain the data, i.e., whose implied constraints most closely match the
constraints judged to hold in the population.
There are four components to this strategy:
1. determining the constraints implied by a given causal model;
2. judging what constraints hold in the population;

3. judging how closely a given causal model's pattern of implied constraints


matches a pattern of constraints judged to hold in the population;
4. searching efficiently for those causal models whose pattern of implied
constraints matches a pattern of constraints judged to hold in the pop-
ulation.
We will briefly describe each of these components in subsequent sections;
first, however, it is necessary to describe more precisely the relationship be-
tween causal models and graphs.
Probabilistic Models and Graphs We have already seen that directed
graphs can be used to encode causal theories: there is a directed edge from A
to B in graph G iff A directly causes B in theory T. Directed acyclic graphs
can also be used to encode all of the conditional independence relations in a
wide variety of joint probability distributions. Conditional independence re-
lations true of a probability distribution P imply important constraints upon
the covariance matrix of any population correctly described by P. We search
for graphs that encode probability distributions which imply constraints upon
covariance matrices that most closely match the constraints that we judge to
hold in the population. Then we read the causal relations from the graphs.2 In
this subsection we describe how directed graphs can be used to encode condi-
tional independence relations. The definitions and theorems in this subsection
are all based on the work of Judea Pearl (1988) and his colleagues.
Let P be a joint probability distribution over a set of variables U, and let X, Y, and Z be subsets of U. Lowercase letters such as x represent a configuration of values of the variables in a subset of variables. X and Y are conditionally independent given Z in P iff P(X = x | Y = y, Z = z) = P(X = x | Z = z). Following Pearl, we use the notation I(X, Z, Y)_P to denote the conditional independence of X and Y given Z in P. If x and y are individual variables, then I(x, Z, y) implies that the partial correlation of x and y on Z is equal to 0; in multinormal distributions, the converse is also true.
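In the multinormal case, this equivalence gives a direct computational handle on conditional independence: estimate the partial correlation from a covariance matrix and ask whether it is zero. The following minimal sketch (our illustration in Python; the function name is ours, and the precision-matrix formula is standard linear algebra, not TETRAD II code) computes the partial correlation of two variables given a conditioning set.

```python
import numpy as np

def partial_corr(cov, i, j, Z=()):
    """Partial correlation of variables i and j given the variables in Z,
    read off the inverse of the covariance submatrix over {i, j} union Z."""
    idx = [i, j, *Z]
    P = np.linalg.inv(np.asarray(cov)[np.ix_(idx, idx)])  # precision matrix
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# In a multinormal distribution, I(x, Z, y) holds iff this value is 0.
```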
In order to explain how graphs represent conditional independencies we will introduce a number of definitions. A directed graph is an ordered pair <V, E>, where V is a set of nodes, and E is a set of ordered pairs of nodes. A directed path P from v_i to v_j is an ordered sequence of vertices <v_i, ..., v_j>, such that every pair of vertices <v_x, v_y>, where v_y is an immediate successor of v_x in P, is in E; v_i is the source of P, and v_j is the sink of P. An acyclic path contains each vertex at most once. An acyclic graph contains no cyclic paths. An undirected path P between v_i and v_j is an ordered sequence of vertices <v_i, ..., v_j> such that for every adjacent pair of vertices v_x and v_y in the sequence, either <v_x, v_y> is in E or <v_y, v_x> is in E, and no vertex appears more than once. A node x in an undirected path P has converging arrows iff P contains a subsequence <v_i, x, v_k> such that <v_i, x> is in E and <v_k, x> is in E. A node x has a descendant y iff there is a directed path from x to y. We illustrate a number of these concepts in Fig. 8.3.

[Figure 8.3: Graph Concepts — example of a directed path P: <u, v, z, y>; example of an undirected path U: <u, v, z, w>; example of a node with converging arrows on U: x.]
If X, Y, and Z are three disjoint subsets of nodes in a directed acyclic
graph G, then Z d-separates X from Y iff there is no undirected path from
a node in X to a node in Y such that
1. every node with converging arrows is in Z or has a descendant in Z; and
2. every other node is outside Z.
Fig. 8.4 (taken from Pearl 1988) illustrates the concept of d-separation.
We adopt the convention (following Pearl) that a directed acyclic graph G represents a probability distribution P just when Z d-separates X and Y in G iff I(X, Z, Y)_P.

[Figure 8.4: D-Separation — {v} and {w} are d-separated by {u}; {v} and {w} are not d-separated by {u, y}.]
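For small graphs the d-separation condition can be checked directly from the definition. The sketch below is our illustration, not TETRAD II's implementation; it enumerates undirected paths explicitly and so is exponential in the worst case, but it makes the two clauses of the definition concrete.

```python
def undirected_paths(edges, start, end):
    """All acyclic undirected paths between start and end in a directed
    graph given as a set of (parent, child) pairs."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    def extend(path):
        if path[-1] == end:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from extend(path + [n])
    yield from extend([start])

def descendants(edges, x):
    """All nodes reachable from x by a directed path."""
    kids = {}
    for a, b in edges:
        kids.setdefault(a, set()).add(b)
    out, stack = set(), [x]
    while stack:
        for c in kids.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(edges, x, y, Z):
    """True iff Z d-separates {x} from {y}: every undirected path is blocked."""
    Z = set(Z)
    for path in undirected_paths(edges, x, y):
        active = True
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            if (a, b) in edges and (c, b) in edges:
                # node with converging arrows: blocks unless it or a
                # descendant is in Z
                if b not in Z and not (descendants(edges, b) & Z):
                    active = False
                    break
            elif b in Z:
                # every other node must be outside Z
                active = False
                break
        if active:
            return False
    return True

# A chain u -> z -> y plus converging arrows u -> x <- w:
E = {('u', 'z'), ('z', 'y'), ('u', 'x'), ('w', 'x')}
print(d_separated(E, 'u', 'y', {'z'}))   # True: z blocks the chain
print(d_separated(E, 'u', 'w', set()))   # True: the collider x blocks
print(d_separated(E, 'u', 'w', {'x'}))   # False: conditioning on x activates
```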
There are several limitations to representing conditional independence relations in this way. First, different graphs can represent the same set of conditional independence relations. Second, not every probability distribution can be represented graphically. Unfortunately, it is difficult to give a complete characterization of precisely which probabilistic causal models can be represented by directed graphs (see Pearl 1988). Nevertheless, the convention is a useful one for the following reason: if a graph encodes a set S of atomic conditional independence statements, then it also encodes all atomic conditional independence statements that logically follow from S.3
The most fully developed part of TETRAD II is intended to correct mis-
specifications in probabilistic causal models that can be represented graphi-
cally. However, in order to give simple, concrete illustrations of various ideas
used in TETRAD II, we will describe one particular kind of probabilistic causal
model that can be represented graphically. Structural equation models,
or linear causal models are widely used throughout the social sciences,
in social psychology and psychometrics, in educational research, market re-
search, epidemiology and areas of biology. Regression models, path analytic
models, and factor analytic models are special cases of structural equation
models. Each structural equation model consists of the following four parts:

• A set of random variables with a joint probability distribution.


• A set of distribution assumptions about the random variables.
• A set of linear equations relating the random variables.
• A set of causal relations among the random variables.

A specific example of a structural equation model is given below.

• The random variables are {v, w, x, y, T, e1, e2, e3, e4}. The variables v, w, x, and y are measured variables, T is a latent variable, and e1, e2, e3, e4 are "error" or "disturbance" terms.
[Figure 8.5: Causal Graph — Model I: directed edges from the latent variable T to each of the measured variables v, w, x, and y, and from each error term to its measured variable.]

• The set of linear equations is:

  v = aT + e1
  w = bT + e2
  x = cT + e3
  y = dT + e4

• The collection of all variables is multinormally distributed, and T and the e_i are uncorrelated and have unit variance and zero mean.

• The set of causal relations is: {<T, v>, <T, w>, <T, x>, <T, y>, <e1, v>, <e2, w>, <e3, x>, <e4, y>}, where <r, s> is in the set of causal relations if and only if r is an immediate cause of s.

The set of causal relations can also be represented in a directed graph, in which there is an edge from r to s if and only if r is an immediate cause of s. The directed graph representation of the set of causal relations for this example is depicted in Fig. 8.5.
The linear equations are by convention expressed in a canonical form in
which r appears in the equation for s if and only if r is a direct cause of s.
For example, v is expressed as a function of el, while el is not expressed as a
function of v. Using this convention we can recover from the graph important
parts of the statistical model. The graph encodes both the form of the linear
equations, and assumptions of statistical independence that are implicit in
the statistical model. The graph does not encode the particular numerical
values of the linear coefficients, the variances of the independent variables, or
the joint distribution family (e.g., multinormal).4
The graph is not only a vivid representation of the form of the equations of a structural equation model; it also determines certain kinds of statistical constraints, or overidentifying constraints, that a structural equation model may imply. Several classes of constraints concern tetrad differences. A tetrad difference is the determinant of a 2 × 2 submatrix of the covariance matrix: γ_ij γ_kl − γ_ik γ_jl, where γ_ij is the covariance between i and j, and i, j, k, and l are distinct variables.

[Figure 8.6: Two Causal Graphs — Model II and Model III.]
Figs. 8.5 and 8.6 illustrate how the overidentifying constraints implied by
a structural equation model can be determined from the graph of the model.
(In Fig. 8.6 the labels on each edge of the graph represent the corresponding
coefficient in the set of linear equations.)
The tetrad difference in Model I is

  γ_vw γ_xy − γ_vx γ_wy = abcd σ_T^4 − abcd σ_T^4 = 0

In this case, the tetrad difference vanishes regardless of the values of the free parameters (the linear coefficients and the distributions of the independent variables). When a structural equation model robustly specifies a vanishing tetrad difference in this way, we say the model strongly implies the vanishing tetrad difference.
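A quick numerical check makes the robustness concrete. Under Model I's equations, with T and the errors independent and of unit variance, the implied covariance matrix of (v, w, x, y) is λλᵀ + I for λ = (a, b, c, d), and the tetrad difference vanishes for whatever coefficients one draws. The code and parameter draws below are our illustration, not part of the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_I_cov(a, b, c, d):
    """Implied covariance matrix of (v, w, x, y) for Model I: each
    measured variable is coeff * T + error, with T and the errors
    independent, zero-mean, unit-variance."""
    lam = np.array([a, b, c, d])
    return np.outer(lam, lam) + np.eye(4)   # lam lam^T * var(T) + error variances

for _ in range(3):
    a, b, c, d = rng.normal(size=4)
    g = model_I_cov(a, b, c, d)
    tetrad = g[0, 1] * g[2, 3] - g[0, 2] * g[1, 3]   # gamma_vw gamma_xy - gamma_vx gamma_wy
    print(round(tetrad, 12))   # 0.0, no matter which parameters were drawn
```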
The tetrad difference in Model II is

  γ_vw γ_xy − γ_vx γ_wy = cdf σ_T^2 σ_e1^2

In Model II, the tetrad difference is positive regardless of the distributions of the independent variables, as long as the product of coefficients cdf is positive. We say in this case that given the sign of cdf, the tetrad difference is strongly implied to be positive.
The tetrad difference in Model III may be zero for particular values of the coefficients and variances, such as a = d = f = g = σ_e1^2 = σ_e2^2 = 1, but it is not zero for other values of the coefficients and variances; hence this constraint is not robust in Model III. Even if the signs of the coefficients were given, Model III does not imply that the tetrad difference is positive or negative.
Geometric Determination of Constraints The causal graph alone determines whether or not a given vanishing tetrad difference or vanishing partial correlation is robustly implied.5 The causal graph together with information about the signs of the coefficients labeling the edges can also place constraints upon the signs of non-vanishing tetrad differences and correlations.
As an example, we will state a necessary and sufficient condition for a causal graph to strongly imply a first order vanishing partial correlation. First, however, a few simple definitions are needed.
A non-empty path contains more than one vertex. A trek between distinct vertices i and j is a pair of directed acyclic paths from a common source V, with sinks i and j respectively, that intersect only at V, such that at most one of the paths is empty. For a trek t_ij between i and j, i(t_ij) is the path from the source of the trek to i.
Theorem 1: An acyclic graph G strongly implies that ρ_ij.k = 0 iff
1. k occurs on every trek between i and j; and
2. every trek between i and k is an acyclic path from k to i, or every trek between j and k is an acyclic path from k to j.
Other simple geometrical facts about treks determine whether or not a model robustly implies a vanishing tetrad constraint.
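Theorem 1 can be checked literally, by enumeration, on small graphs. The sketch below is our brute-force illustration; TETRAD II itself uses much faster modified graph algorithms, as the authors note in the section on search.

```python
from itertools import product

def directed_paths(edges, src):
    """All acyclic directed paths starting at src, including the
    'empty' path [src] with no edges."""
    kids = {}
    for a, b in edges:
        kids.setdefault(a, []).append(b)
    def extend(path):
        yield path
        for c in kids.get(path[-1], []):
            if c not in path:
                yield from extend(path + [c])
    yield from extend([src])

def treks(edges, nodes, i, j):
    """All treks between i and j: pairs of directed paths from a common
    source to i and to j, intersecting only at the source, with at most
    one of the two paths empty."""
    found = []
    for v in nodes:
        to_i = [p for p in directed_paths(edges, v) if p[-1] == i]
        to_j = [q for q in directed_paths(edges, v) if q[-1] == j]
        for p, q in product(to_i, to_j):
            if set(p) & set(q) == {v} and len(p) + len(q) > 2:
                found.append((p, q))
    return found

def strongly_implies_vanishing(edges, nodes, i, j, k):
    """Theorem 1, checked by enumeration."""
    if not all(k in p or k in q for p, q in treks(edges, nodes, i, j)):
        return False
    def treks_are_paths_from_k(a):
        # a trek (path to a, path to k) with empty side [k] is just a
        # directed path from k to a
        return all(q == [k] for p, q in treks(edges, nodes, a, k))
    return treks_are_paths_from_k(i) or treks_are_paths_from_k(j)

# Model I (Fig. 8.5): every trek between v and w passes through T, and
# every trek between v and T is a directed path from T, so rho_vw.T = 0:
E = {('T', 'v'), ('T', 'w'), ('T', 'x'), ('T', 'y'),
     ('e1', 'v'), ('e2', 'w'), ('e3', 'x'), ('e4', 'y')}
V = {'T', 'v', 'w', 'x', 'y', 'e1', 'e2', 'e3', 'e4'}
print(strongly_implies_vanishing(E, V, 'v', 'w', 'T'))   # True
```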
Judging Whether a Constraint Holds in the Population We per-
form statistical tests to judge whether or not a given constraint holds in the
population. For example, suppose i and j are multinormally distributed. Let r_ij be the sample correlation, where the size of the sample is N. Then Fisher's z statistic is

  z = (1/2) ln[(1 + r_ij) / (1 − r_ij)]

If the population correlation is 0, then z √(N − 3) is approximately distributed as a standard normal. Hence, we can calculate the probability p that r_ij is as large or larger than the observed value, given that the correlation is zero in the population. If p is larger than the significance level (chosen by the user) we judge the correlation to vanish in the population; otherwise we judge that it does not vanish in the population.
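A minimal implementation of this test might look as follows. The √(N − 3) scaling and the two-sided p-value are the standard conventions for Fisher's transform; the chapter does not spell these details out, so treat them, and the function name, as our assumptions.

```python
import math
from statistics import NormalDist

def vanishing_corr_test(r, N, alpha=0.05):
    """Two-sided Fisher z test of a zero population correlation, given
    sample correlation r on N observations. Returns True when the data
    give no reason to reject the vanishing constraint at level alpha."""
    z = 0.5 * math.log((1 + r) / (1 - r))        # Fisher's z transform
    stat = z * math.sqrt(N - 3)                  # ~ N(0, 1) under rho = 0
    p = 2 * (1 - NormalDist().cdf(abs(stat)))    # two-sided p-value
    return p > alpha                             # judge the correlation to vanish

print(vanishing_corr_test(0.03, 200))   # True: consistent with rho = 0
print(vanishing_corr_test(0.30, 200))   # False: constraint judged not to hold
```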
Unfortunately, all of the tests that we employ make the highly restrictive assumption that the variables are multinormally distributed, as well as making a number of other assumptions that are at best approximately true. We do not know how sensitive the tests we employ are to deviations from these assumptions. We are currently searching for statistical tests that make less restrictive assumptions.
Scoring At the most abstract level, the principles of scientific explanation
should apply in common to the social and to the natural sciences. Good social
science and natural science explanations differ in the mathematical setting
and the substance, not the form. Repeatedly, philosophers of science have
pointed out that scientific explanation addresses both particular facts and
regularities or patterns in the data. For example, Newtonian gravitational
theory explained those regularities of planetary motion that we call Kepler's
laws, while Dalton's atomic theory explained regularities such as the law
of definite proportions. Many philosophers and historians of science have
also suggested various forms for the explanation of regularities or patterns
in the data, forms that are supposed to be characteristic of good scientific
explanations (see Causey 1977, Glymour 1979, Riddell 1980, Rosenkrantz
1977). The details of philosophical explication are not pertinent here, but
one version of an explanatory principle could be crudely described as follows:
Other things being equal, if one theory implies a constraint
judged to hold in the data for all values of its free parameters, and
another theory implies that constraint only for specific values of its
free parameters, the first theory provides a better explanation of
the pattern than does the second theory.
There is both historical and theoretical justification for this principle.
Historically, refinements of this explanatory principle have been used to argue
for the superiority of Copernican astronomy as against Ptolemaic astronomy
in the 16th and 17th centuries, the superiority of atomic chemistry in the 19th
century, and the superiority of general relativity over Newtonian gravitation
theory early in this century. Theoretically, in many cases (and in particular
in the case of linear modeling) it can be shown that under a wide variety of
probability distributions over the free parameters of a theory, the probability
of a constraint holding only for specific values of the free parameters is zero.
In the context of linear causal theories, this principle becomes what we
call the Explanatory Principle: Other things being equal, prefer models
that strongly imply constraints on the covariance matrix that are judged to
hold in the population. Hence, if a constraint is strongly implied by a model M, and is judged to hold in the population, we give M credit; the tighter the constraint holds in the sample, the more credit M gets.
A second principle we use in scoring models is the Falsification Prin-
ciple: Other things being equal, prefer models that do not strongly imply
constraints that are judged not to hold in the population. If M strongly
implies a constraint that is judged not to hold in the population then M re-
ceives discredit; the less tightly the constraint holds in the sample, the more
discredit M receives.
Unfortunately, the principles can conflict. For example, suppose that
model M' is a modification of model M, formed by adding an extra edge to
M, and that M' implies both fewer constraints that are judged to hold in
the population and fewer constraints judged not to hold in the population.
Then M' is superior to M with respect to the Falsification Principle, but in-
ferior to M with respect to the Explanatory Principle. Because the principles
can conflict, TETRAD II introduces a scoring function (the TETRAD-score)
that balances the relative merits and demerits of models.6 The TETRAD-score
is controlled by a weight parameter that is used to determine the relative
importance of the Explanatory and Falsification Principles. The scoring func-
tion has several desirable asymptotic properties, but we do not know whether
the particular value for weight that we use is optimal.
Finally, if several different models have identical TETRAD-scores, we can
sometimes break ties by using the Simplicity Principle: Other things being
equal, prefer simpler models (i.e., models with higher degrees of freedom).
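The chapter does not give the formula for the TETRAD-score, so the following sketch is only a schematic illustration of how a weight parameter might trade the Explanatory Principle's credit against the Falsification Principle's discredit; every name and numerical choice here is our own assumption, not TETRAD II's actual scoring function.

```python
def tetrad_like_score(implied, test_p, weight=0.5):
    """Schematic stand-in for the TETRAD-score (the real formula is not
    published in this chapter). implied: set of constraints strongly
    implied by the model; test_p: dict mapping each testable constraint
    to the p-value of its statistical test; weight in [0, 1] sets the
    relative importance of the two principles."""
    credit = sum(p for c, p in test_p.items()
                 if c in implied and p > 0.05)       # Explanatory Principle
    discredit = sum(1 - p for c, p in test_p.items()
                    if c in implied and p <= 0.05)   # Falsification Principle
    return weight * credit - (1 - weight) * discredit
```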
Search TETRAD II searches a tree of elaborations to an initial model. In
many cases the search space is quite large, but the search can be performed
in a reasonable amount of time for the following reasons.

• We have modified well-known, fast algorithms for analyzing directed graphs to determine the set of all vanishing and positive tetrad differences implied by a model.

• We store most of the computational work required to evaluate a model M and re-use it to evaluate elaborations of M.

• We are able to calculate a T-maxscore for each model M, which places an upper limit on the TETRAD-scores that any elaboration of M could receive. If any model M receives a T-maxscore that is lower than the TETRAD-score of some model M′ already examined, then M and all of its elaborations can be eliminated from consideration, since none of them could possibly receive a TETRAD-score as high as M′'s. Often, this can be used to conclusively eliminate from consideration many millions of models without ever explicitly generating or examining them.
TETRAD II searches for models that have high TETRAD-scores. We will illustrate the search that TETRAD II uses in the following very simple case.
Suppose that we are looking for the elaboration of a model M that has the
highest TETRAD-score, and that there are only four possible edges, e1, e2, e3,
and e4 that could be added to M. The top half of Fig. 8.7 illustrates a full
search in which no candidates could be eliminated. Each node corresponds
to the model obtained by adding its edge to its parent. We generate the
first level of nodes (2, 10, 14, and 16) by considering all elaborations of node
1, the initial model, and then ordering these nodes left to right by their
T-maxscore. We then explore the subtree that represents all elaborations
of the most promising (leftmost) node. The numbers labeling the nodes
correspond to the order in which we explore their subtrees. Notice that in
the level of nodes that are elaborations of node 2, (3, 7, and 9), edge e4 is
more promising than edge e3, whereas the reverse is true on level 1. This
illustrates the interactive effect edge additions often exhibit. Whereas the
model that results from adding edge e3 to the initial model is more promising
than the one that adds edge e4, once edge e1 is added to the initial model the
order reverses.
We can safely eliminate many models from our search for the highest
TETRAD-scores without either generating them or examining them (assuming
our starting point is a submodel of the true model) by eliminating edges with
low T-maxscores. The bottom half of Fig. 8.7 illustrates this point. If, on
level 1, T-maxscore(e4) is less than TETRAD-score(Initial Model M), we can
eliminate all nodes in which e4 occurs without ever visiting them. Hence,
all of the nodes that represent models that contain M + e4 can be safely
eliminated from consideration without ever generating them. The tree that
results has 9 nodes instead of 16.
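In outline, the pruned search amounts to a branch-and-bound procedure. The sketch below is an idealized illustration (assuming population-level scores, so that pruning on T-maxscore is safe); the function signatures are our assumptions, and the real program also relaxes the cutoff on sample data, as described below.

```python
def search(initial, elaborations, tetrad_score, t_maxscore):
    """Branch-and-bound sketch of the pruned search (illustrative only).
    elaborations(m) yields the one-edge extensions of model m;
    t_maxscore(m) is an upper bound on the TETRAD-score of every
    elaboration of m. Models must be hashable. Returns the set of
    examined models tied for the best score."""
    best_score = tetrad_score(initial)
    best = {initial}

    def visit(m):
        nonlocal best_score, best
        # order children most-promising-first, as in Fig. 8.7
        for m2 in sorted(elaborations(m), key=t_maxscore, reverse=True):
            if t_maxscore(m2) < best_score:
                continue   # m2 AND all of its elaborations are eliminated
            s = tetrad_score(m2)
            if s > best_score:
                best_score, best = s, {m2}
            elif s == best_score:
                best.add(m2)
            visit(m2)

    visit(initial)
    return best
```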
What is the relationship between a high TETRAD-score and the true
model? If we assume that
• M is a submodel of the true model, and
• the TETRAD-scores and T-maxscores are calculated from the popula-
tion covariances (instead of the sample covariances that we are actually
given), and
• every vanishing tetrad difference that holds in the population holds
because of the causal structure of the true model, and not because of
the particular parameter values of the true model,
then the correct model is at least tied for the highest TETRAD-score.
[Figure 8.7: TETRAD II's Search Strategy — top: the full search tree over the candidate edges e1–e4; bottom: the tree-like search after eliminating every node containing e4, leaving 9 nodes instead of 16.]

Of course, in actual practice, we calculate T-maxscore and TETRAD-score on the basis of a sample covariance matrix, not a population covariance matrix; nor do we know that every vanishing tetrad difference that is judged to hold in the population actually does hold in the population, or that it holds in the population because of the causal structure of the true model. Hence, in actual practice we allow the T-maxscore of a model to decrease to a user-specified percentage of its parent's T-maxscore before we cut off the search. However, as the sample sizes increase without limit, the probability approaches 1 that the sample covariance matrix is very close to the population covariance matrix.
In many cases, an unrestricted search of the kind described here is too
slow to be practical. For this reason we have allowed the user to enter substan-
tive knowledge about the domain to narrow down the search. For example,
a user can specify that only acyclic graphs are to be considered, or that a
variable A cannot be the cause of a variable B (perhaps because A occurred
before B). We have generally found that the combination of TETRAD II's au-
tomatic respecification by data analysis together with substantive knowledge
is necessary in order to make the search reasonably fast, and the number of
suggested models usefully small.
4. Case Studies
How reliable is TETRAD II in practice? TETRAD II has performed quite
well on both Monte Carlo simulation studies and on actual empirical data.
We describe both kinds of tests below.
Monte Carlo Studies Often, the application of a procedure to empiri-
cal data sets is of little use in judging the reliability of the procedure. The
reason is obvious: in empirical cases we generally don't know what the true
model is, so we can't judge whether the procedures have found it. We can
judge whether the procedures turn up something that isn't absurd, and we
can judge whether the procedures find models that pass statistical tests, but
neither of these features is of central importance. The issue that is of central
importance is whether or not the automated model revision procedures find
the truth. Sometimes we can obtain empirical tests of models produced by
the automatic searches, and they may provide some evidence of the reliability
of the procedures. However, such tests have rarely been obtained, and they
cannot be relied upon since one does not know whether the initial model given
to the program is empirically correct. It is also possible to do mathemati-
cal analyses of the power of a discovery procedure to distinguish or identify
alternative structures in the limit, as the sample size grows without bound.
Some results of this kind pertinent to the methods employed in TETRAD II
are implicit in Discovering Causal Structure, and we have subsequently
proved a number of other limiting properties of the TETRAD II procedures.
However, limit results do not address the behavior of automated discovery procedures on samples of realistic sizes, and it is that which ought most to concern empirical researchers.

[Figure 8.8: Initial Graph and Correct Graph — Part A shows Model 2, the structural equation model used to generate the data; Part B shows the starting model given to the programs.]

The best available solution is to apply Monte Carlo simulation methods to assess the reliability of model respecification procedures. We used a random number generator to generate data for a specified sample size from a specified structural equation model (see, for example, Part A of Fig. 8.8). Then part of the model used to generate the data is given to the procedures (see Part B of Fig. 8.8), and we see with what reliability the procedures
can recover information about the missing parts of the models used to gen-
erate the data. In this way, the reliability of the procedures was tested in
nearly ideal circumstances: the true structural equation model was known,
the sampling was random, and distribution assumptions were satisfied to a
good approximation. In a recent study we compared the reliability of TETRAD
II to two other programs, LISREL VI and EQS, which also attempt to improve
given initial causal models. However, the procedures employed by LISREL
VI and EQS are based on approaches very different from those employed in
TETRAD II.
For each of nine cases, twenty data sets with sample size 200 and twenty data sets with sample size 2,000 were obtained by first generating values for each of the exogenous variables7 (including error variables) with a random number generator giving a standard normal distribution,8 then calculating the value of each of these variables' immediate effects, and continuing to
propagate values through the system until all variables had values for a single
case. We repeated this process for the n cases in a single data set. Finally,
we calculated variances and covariances from the raw data, and these were
input to the programs.
We chose the nine different models that generated the data because they
represent typical causal structures found in the social science literature. In
most cases, we generated the linear parameters at random. We gave all of
the exogenous variables standard normal distributions.
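The generation scheme just described amounts to propagating draws through the model in causal order. A minimal sketch of that procedure (our illustration, not the authors' code; the coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(coeffs, order, n):
    """Generate n cases from a linear structural equation model: draw
    each exogenous variable from a standard normal, then compute each
    endogenous variable from its immediate causes in causal order.
    coeffs[v] maps each direct cause of v to its linear coefficient;
    order is a causal (topological) ordering of all variables,
    error terms included."""
    data = {}
    for v in order:
        parents = coeffs.get(v, {})
        if not parents:                       # exogenous: random draw
            data[v] = rng.standard_normal(n)
        else:                                 # endogenous: linear combination
            data[v] = sum(c * data[p] for p, c in parents.items())
    return data

# Model I of Fig. 8.5, with hypothetical coefficients a, b, c, d:
a, b, c, d = 0.9, 0.8, 0.7, 0.6
coeffs = {'v': {'T': a, 'e1': 1}, 'w': {'T': b, 'e2': 1},
          'x': {'T': c, 'e3': 1}, 'y': {'T': d, 'e4': 1}}
order = ['T', 'e1', 'e2', 'e3', 'e4', 'v', 'w', 'x', 'y']
data = simulate(coeffs, order, 2000)
cov = np.cov(np.vstack([data[v] for v in ['v', 'w', 'x', 'y']]))
print(cov)   # the variances and covariances given to the programs
```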
For each data set and initial model, TETRAD II produces a set of alter-
native elaborations that are tied for the highest TETRAD-score. In this study
the sets consisted of between 1 and 13 models, although in most cases the
set consisted of between 1 and 4 models. LISREL VI and EQS both produce as
output a single elaboration of the initial model. In all cases, the information
provided by each program is scored "correct" when the output contains the
true model.9
At sample size 2000, TETRAD II's set included the correct re-specification
in 95% of the cases, LISREL VI found the right model 18.8% of the time
and EQS 13.3%. At sample size 200, TETRAD II's set included the cor-
rect re-specification 52.2% of the time, while LISREL VI corrected the mis-
specification 15.0% of the time, and EQS corrected the mis-specification 10.0%
of the time.
Of course, to some extent, this is comparing apples and oranges. TETRAD
II outputs a list of suggested models, while both LISREL VI and EQS output
only a single suggested model. This gives TETRAD II an advantage in pro-
ducing reliable output. In order to make a fairer comparison, we ask two
questions: How reliable would LISREL VI and EQS be if they output a list
of suggested models? How reliable would TETRAD II be if it output a single
model? We consider each of these questions in turn.
We believe that the first question is more illuminating than the second
question. For a variety of reasons, it is more sensible to produce a list of
suggested models when the data cannot discriminate between them, than to
simply randomly choose one from the list to suggest. (Strangely enough,
random selection from a list of equally good models is in effect what both
LISREL VI and EQS do.) Unfortunately, we cannot give a precise answer to the
question. LISREL VI and EQS had no automatic method for producing a list
of models instead of a single model, and the size of the study we conducted
made it impossible to conduct such a search by hand. Two things are clear,
however. First, if LISREL VI or EQS were to search for a list of models to suggest, rather than a single model, they would vastly increase the amount of time they take to perform their searches; in some cases the searches could
not be completed in a reasonable amount of time. Second, if such a search
could be completed in a reasonable amount of time, both LISREL VI and EQS
would in some of the cases that we examined be more reliable. However,
overall they would still not be as reliable as TETRAD II, because in some of
the cases where LISREL VI and EQS performed quite badly (and TETRAD II
performed quite well), the answers that LISREL VI and EQS suggested would
not be improved by conducting a search for a list of models.
Would TETRAD II be more reliable than LISREL VI or EQS if it output only one model? The answer depends on how one selects a single model from TETRAD II's set. If we have no reason to believe that any one model in TETRAD II's set is more likely to be the true one than is any other, we could simply choose one randomly. We calculate the expected single-model reliability of TETRAD II in the following way. We assume first that when TETRAD II outputs a list of n models for a given covariance matrix, the probability of selecting any particular one of the models to be TETRAD II's single-model guess is 1/n. Hence, instead of counting a list of length n that contains the correct model as 1 correct answer, we count it as 1/n correct answers.10 Then we simply divide the expected number of correct answers by the number of trial runs.
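Concretely, the expected single-model reliability can be computed as follows; the run counts and list lengths here are hypothetical, purely to illustrate the 1/n counting:

```python
def expected_single_model_reliability(correct_list_lengths, total_runs):
    """Each run whose output list (of length n) contains the true model
    counts as 1/n of a correct answer; divide by the number of runs."""
    return sum(1 / n for n in correct_list_lengths) / total_runs

# e.g. 3 runs: two correct lists of lengths 2 and 4, one incorrect run
print(expected_single_model_reliability([2, 4], 3))   # 0.25
```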
Were TETRAD II to output a single model, its single model reliability
at sample size 2000 would drop from 95% to about 42.3%. On our data at
sample size 2000 LISREL VI has a reliability of 18.8% and EQS has a reliability
of 13.3%. At sample size 200 TETRAD II's single model reliability is 30.2%,
compared to LISREL's reliability of 15.0% and EQS' reliability of 10.0%.
In many realistic settings we would often have substantive reasons to
prefer one model over another, however. If the substantive knowledge is
worth anything, and we use it to select a single model M, then M is more
likely to be true than a model selected at random from TETRAD II'S set of
suggested models.

Empirical Tests TETRAD II was quite successful on Monte Carlo simulation tests. However, these tests were not completely realistic. For example,
they ensured that all of the modeling assumptions (such as multinormality)
made by TETRAD II were exactly satisfied, and that we always started with a
correct submodel of the true model. This raises the question of how TETRAD
II performs on empirical data sets, where its modeling assumptions are not
exactly satisfied, and we can't know that our initial model is a submodel of
the true model.
[Figure 8.9: Results — percentage of correct answers for TETRAD II, LISREL VI, and EQS at sample sizes n = 2000 and n = 200.]

Unfortunately, given an empirical data set, we generally don't know what the true model is, so we can't judge the reliability of TETRAD II. However, we can ask that TETRAD II at least suggest a variety of plausible models that
perform as well or better on statistical tests than the models suggested by
researchers not using TETRAD II. We have generally found this to be the
case. In (Glymour 1987) we described a number of applications of TETRAD
(a predecessor of TETRAD II) to empirical data sets. We describe a typical
example here.
In 1973, H. Costner and R. Schoenberg proposed a method for identi-
fying misspecifications in multiple indicator models. Their procedure, while
not automated, is like the TETRAD II procedure in several important respects.
Costner and Schoenberg illustrated their technique with a model of data for
industrial and political development (Fig. 8.10).
In this model, GNP represents gross national product, Energy represents
logarithm of total energy consumption in megawatt hours per capita, Labor
represents a labor force diversification index, Exec represents an index of ex-
ecutive functioning, Party represents an index of political party organization,
Power is an index of power diversification, and CI represents the Cutright
Index of political representation.
Using their revision procedure, Costner and Schoenberg arrived at a
modification of the initial model (Fig. 8.11).
Their revised model has a chi-square statistic with p = .053 and 11 degrees of freedom. It is a common practice in the social sciences to accept a model with any p value greater than .05, and stop searching for further alternatives. However, using TETRAD we discovered a number of plausible alternatives to this model that perform better on statistical tests; these are listed in the chart below. The edges listed in the chart are the edges that were added to Costner and Schoenberg's initial model.

[Figure 8.10: Costner and Schoenberg's initial model.]

[Figure 8.11: Costner and Schoenberg's revised model.]

  Additions to Skeleton       TTR    Equations Explained   p(χ²)
  1)  en-po, gp-ex, en-pa     .220   6                     0.1189
  2)  en-po, la-pa, en-ex     .202   6                     0.4792
  3)  en-po, la-pa, po-ex     .202   6                     0.4085
  4)  en-po, la-ex, ex-pa     .223   5                     0.1108
  5)  en-po, en-ex, gp-pa     .270   5                     0.5752
  7)  en-po, en-ex, ci-la     .241   4                     0.5198
  8)  en-po, la-ex, pa-en     .213   3                     0.1234
  9)  la-po, en-ex, ex-gp     .358   5                     0.1222
  10) la-po, en-ex, pa-en     .328   5                     0.1526

  Table 8.1: Goodness of Fit Results
5. New Things to Do
The TETRAD project has generated a number of interesting questions
for which we as yet have only partial answers. Each advance towards more
complete answers to these questions promises to increase the power of the
discovery procedure.
The most restrictive assumption that TETRAD II makes is that the vari-
ables are multinormally distributed. Reliable distribution-free, or at least
more robust, methods of detecting constraints are therefore much to be de-
sired.
The elaboration procedure of TETRAD II requires an initial graph of
causal relations. Often, a user doesn't even have this much knowledge about a
domain. TETRAD II currently includes a crude method of building a complete
causal model from the data alone. The current method needs to be improved
because it is too slow and suggests too many models.
A distinct set of procedures is needed to build "path models" with-
out unmeasured common causes from fragmentary causal knowledge. We are
seeking results that connect the existence of a directed edge in a causal model
with the conditional dependence on every other set of variables of the ver-
tices connected by the edge. Results already obtained guarantee that such
conditions will be adequate save in graphs that are unusual in the modeling
literature.
The work of Judea Pearl (1988) has shown how to determine from a
causal graph robust implications of higher order vanishing partial correlations.
These have not yet been implemented in TETRAD II. We have discovered other
classes of constraints on covariance matrices which we do not yet know how
to determine from a graph. Introducing consideration of such constraints
into the search procedures may improve the discriminations the program can
make.
We have done some preliminary characterization of patterns of con-
straints that can only be robustly explained by the introduction of latent,
unmeasured variables. We are attempting to characterize more generally the
conditions under which latent (unmeasured) variables should be introduced
into a model.
We are attempting to find and prove a characterization of graph equiv-
alence over any set of independence assumptions, or any set of tetrad con-
straints. If a single graph could be used to generate a set of equivalent
graphs, it might be possible to speed up our search procedure considerably.
We do not know how to determine from cyclic causal graphs what con-
straints are robustly implied (except in the case of first order partial correla-
tions). We are attempting to extend our current techniques to cyclic graphs.
We know some worst-case properties of our algorithms and the search
problem. We plan to investigate expected case properties of our algorithms
and the search problem.

NOTES

1 The research reported in this paper was supported by the Office of Naval Research
under Contract numbers N00014-88-K-0194 and N00014-89-J-1964.
2 An important question concerns justification for assuming that the graph that en-
codes the causal relations that generates a probability distribution P should be isomorphic
to some graph that perfectly encodes the conditional independence relations true in P. This
will be the subject of a forthcoming paper.
3 Atomic conditional independence statements are of the form I(X, Z, Y)_P, and contain
no quantifiers or propositional connectives.
4 It is possible that when we graphically represent a probabilistic theory P in the manner just described, for certain values of the free parameters of P, I(X, Z, Y)_P holds even though X and Y are not d-separated by Z in the graph. However, for a wide variety of probability distributions over the free parameters of P, the probability of this occurring is zero.
5 Judea Pearl has shown how to use a causal graph to determine whether partial correlations of any order vanish. We have not yet incorporated higher order partial correlations into TETRAD II, but we plan to do so.
6 The original TETRAD program has no such scoring function. It is left to the user
to balance the Explanatory and Falsification principles.
7 An exogenous variable is a cause but not an effect.
8 First we pseudo-randomly selected a number from a uniform distribution by calling
the "random" system call on the UNIX operating system. Then we input this number to a
function which turned it into a pseudo-random sample from a standard normal distribution.
9 We have devised an elaborate classification for cases in which the programs were not correct, e.g., LISREL's model was in TETRAD II's set of best suggestions, but we neglect to go into it here (see Spirtes et al., forthcoming).
10 Actually, to simplify the calculations, we also assume that the length of the lists
output by TETRAD II for all of the covariance matrices generated by a single model was
in each case equal to the average length of the lists. This is a fairly good approximation
in most cases.

REFERENCES

Causey, Robert L. (1977) Unity of Science. D. Reidel Publishing Company.


Glymour, C. (1979) "Explanations, Tests, Unity, and Necessity," Nous, 14:31-50.
Glymour, C., Scheines, R., Spirtes, P., and Kelly, K. (1987) Discovering Causal
Structure. San Diego, CA: Academic Press.
Harary, F., and Palmer, E. (1973) Graphical Enumeration. New York: Academic
Press.
Pearl, Judea (1988) Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. San Mateo, CA: Morgan Kaufmann Publishers.
Riddell, R.C. (1980) "Parameter Disposition in Pre-Newtonian Planetary Theories,"
Archive for History of Exact Sciences, 23:87-157.
Rosenkrantz, R. (1977) Inference, Method and Decision. Dordrecht, NL: D. Reidel.
Spirtes, P., Scheines, R., and Glymour, C. (forthcoming) "Simulation Studies of
the Reliability of Computer Aided Specification Using the TETRAD, EQS,
and LISREL programs," Sociological Methods and Research.
Timberlake, M., and Williams, K.R. (1984) "Dependence, Political Exclusion, and
Government Repression: Some Cross-National Evidence," American Sociolog-
ical Review, 49:141-146.

Peter Spirtes
Richard Scheines
Clark Glymour
Department of Philosophy
Carnegie Mellon University
Pittsburgh, PA 15213
PART III.

POSTSCRIPTUM.

The natural man is impatient with doubt and suspense: he impatiently hurries to be shut of it. A disciplined mind takes delight in the problematic, and cherishes it until a way out is found that approves itself upon examination.

John Dewey
RATIONALITY UNBOUND

ISAAC LEVI

Patrick Suppes contends that philosophers "are no longer Sunday's preachers for Monday's scientific workers". Yet, precisely because of the diversity and complexity Suppes rightly recognized in scientific disciplines, there is a need
complexity Suppes rightly recognized in scientific disciplines, there is a need
for sermonizing about science which might be less urgent in times of greater
simplicity. To say this is not to say that such exhortation is peculiarly suited
to philosophers. The problems calling for preaching and advocacy are not the
province of any particular profession or discipline. Yet, one can at least hope
that a sensibility to the predicaments which make it desirable to reflect on the
direction that scientific inquiry not merely will be but ought to take will be
fostered in a climate of interdisciplinary cooperation spiced with familiarity
with important philosophical traditions.
It is undeniable, for example, that the development of computer and
automation technology has had and will have an enormous impact on our
culture and surely ought to be grist for the mill of philosophical reflection.
There is ample indication on the pages of this volume of the willingness
of sophisticated and serious thinkers to take up the challenge. Kevin Kelly,
Herbert Simon, Richard Scheines, Peter Spirtes, and Clark Glymour are both
Sundays' preachers and Mondays' workers who combine advocacy with im-
plementation. A common theme running throughout these discussions is that
it is, indeed, possible and desirable to construct an efficient and effective (i.e.,
recursive) method for generating and deciding the acceptability of solutions
to problems. The examination of claims such as these is of first rate philo-
sophical importance even though a thorough scrutiny calls for some familiar-
ity with the special disciplines relevant to the thesis as well as philosophical
sophistication.
As Simon has made clear many times, his concern with such "discov-
ery procedures" arises as part of an effort to offer an account of "bounded
rationality". Consider the problem of choosing strategies in chess. Von Neu-
mann and Morgenstern point out that, for the ideally rational players as
conceived in game theoretical discussions, the game of chess is trivial. Un-
fortunately, limitations of memory, computational capacity and emotional
stability prevent beginning players and grand masters alike from identifying
and implementing ideally optimal strategies. The advice which may be culled
from norms of ideal rationality will be of little help to the chess player or, for
that matter, to any other decision maker facing a decision problem of moderate or considerable complexity. Prescriptions of ideal rationality should be supplemented or replaced by alternative prescriptions better suited to human intellectual and emotional capacities whose implementation is likely to incur minimal economic and social cost.
I take it that most authors who have objected to efforts to construct a
"logic" of discovery have intended to deny that there is an effective criterion
applicable in all cases for deciding whether a given putative solution to a
given problem is acceptable (in whatever sense of "acceptable" is appropriate
to the problem). This, of course, does not apply to the game of chess where
an effective criterion of admissibility is available. Hence, there is an effective
method for generating solutions to chess problems meeting the requirements
of ideal "unbounded" rationality. However, implementation of such methods
would be inefficient and costly even if technologically feasible.
For Simon, therefore, it does not matter whether we are considering a
problem where there is an effective method for generating admissible solutions
or not if it is not economically, politically, psychologically and computation-
ally feasible to generate admissible solutions according to the standards of
unbounded rationality. Although he sometimes seems to be attempting to
refute those who are sceptical of logics of discovery, he is not, strictly speak-
ing, doing so. His concern with methods of generating hypotheses is present
whether there is an effective procedure for deciding the admissibility of solu-
tions to problems in a given class of cases or not.
Even though Simon does not refute sceptics of logics of discovery, he does
dismiss them, in effect, as irrelevant. According to Simon, as I understand
him, we should replace norms of ideal rationality with norms of bounded
rationality. "Ought" implies "can". It is clear that we cannot live up to the
dictates of ideal rationality. We should, therefore, compromise such ideals so
as to obtain standards of rationality better suited to human capacity. Simon
endorses views such as this in many places. Kevin Kelly follows him in this
quite explicitly in his essay in this volume. Spirtes, Scheines and Glymour do
not address the point explicitly but appear to take it for granted in developing
their themes.
If accommodating our ideals to our capacities were the right approach
to take, I would be more sympathetic than I actually am with Kevin Kelly's
conclusion that "effective methods of discovery and hypothesis evaluation
are not only acceptable objects of epistemological study, but are its proper
objects-so far as physically possible beings are concerned". But it seems to
me that Kelly's (and Simon's) contention misperceives the human and social
significance of computer and automation technology.
It is precisely because we are creatures of flesh and blood who are inca-
pable of living up to the norms of ideal rationality to which we are committed that the "computer revolution" is so important. The reason is that new tech-
nologies have enlarged our capacities to live up to the norms of ideal ratio-
nality better than we hitherto dreamed we could do. Just as automation
can enhance worker productivity, so too computers enhance computational
capacity. No doubt they cannot allow us to meet standards of perfection;
but our improved computational capacity and, indeed, our improved memory
enable us to solve problems with great precision which we could only address
with crude approximations (if at all) in the past.
Observe that if we were already perfect and perfectly efficient calculators,
there would be little interest in these new technologies. What is the point in
seeking to improve our capacities when they cannot be improved?
The most obvious benefit we reap from the availability of computers is
that they enable us to be more rational by enhancing our computational ca-
pacity and improving our memory. Such a benefit must be denied by those
who would tailor our standards of rationality to fit our capacities. Those
who focus on bounded rationality without reference to an ideal can say in-
stead that as our computational capacities are enhanced, we may modify our
standards of rationality. But they cannot explain why such changes are a
benefit. Moreover, since when computational capacities are enlarged, there
may be many alternative normative systems which are implementable given
the enhanced capacity, it becomes unclear not only whether any change in
norms is a benefit but which change is.
If this is right, then it is a mistake to see theories of bounded rational-
ity as replacing theories of ideal rationality. The study of effective methods
of problem solving is important because it may conceivably provide us with
means for improving human capacity to live up to standards of ideal ratio-
nality. Indeed, unless we have such standards of ideal rationality, the study
of techniques aimed at improving our computational capacity seems blind
and without any sensible aim or purpose. Theories of bounded rationality
lower our sights by compromising our ideals. If they were taken seriously
as substitutes for theories of ideal rationality, the development of computer
technology would lose much of its raison d'être.
It may, perhaps, be objected that in spite of the advances in computer
technology and programming, we face important problems where the demands
of ideal rationality are beyond our abilities and our means to satisfy them.
We cannot simply wait until new breakthroughs enhance our capacities and
enable us to meet the requirements. Theories of bounded rationality should
not, perhaps, replace theories of ideal rationality entirely. But theories of
ideal rationality should be supplemented by theories of bounded rationality
for just such predicaments.
As long as the understanding is that theories of bounded rationality "supplement" but do not replace theories of ideal rationality, neither the
importance of addressing the difficulties they attempt to remedy nor the suc-
cesses that are sometimes achieved need be disputed. Nonetheless, we should
not forget that if and when it becomes feasible to meet the requirements of
ideal rationality rather than invoke some second best criterion that should be
done.
The relation between ideal and bounded rationality I have just sketched
does not require that those who are committed to principles of ideal rational-
ity are able to live up to them. To be sure, it is to be expected that conformity
to such principles is feasible on many important occasions and that there is
at least hope that we may enhance our capacities through better training,
psychotherapy and technology.
Kelly acknowledges that one might take the view that ideals are not
themselves achievable by real agents. And he seems to admit-at least
tacitly-that such ideals have some interest even for students of bounded
rationality. However, his conception of the relation between such ideals and
human capacities is different than the one defended here. According to Kelly,
"although ideals are not themselves achievable by real agents, they are norma-
tive in the sense of promoting the realizable actions that better approximate
them".
Standards of ideal rationality are not limiting cases of standards of
bounded rationality. The standards are ideal in the sense that the capac-
ity to satisfy them is one which an ideally rational agent, in contrast to a
human agent, possesses. To advocate a standard of ideal rationality is not
to promote realizable actions which approximate the prescriptions of such
standards. Rather it is to promote conformity to such standards wherever
that is feasible. The standard is an ideal in the sense that often we are inca-
pable of meeting its requirements. In that case, we are urged to devise ways
and means of satisfying these requirements in a larger number of cases. We
approximate ideal rationality better and better by improving our capacities
to live up to the strict letter of the ideal and not by promoting actions which
approximate but do not meet the requirements of ideal rationality.
Consider the strict Bayesian view of rational decision making. A rational
agent is to restrict his choice to options bearing maximum expected utility
among his feasible options where expected utilities are derived from the prob-
ability function representing the agent's state of probability judgment (credal
state) and a positive affine transformation of a utility function representing
the agent's evaluation of the consequences of his options.
It is not true, as Kelly claims, that the strict Bayesian ideal is never
achievable. There are many problems where agents are perfectly capable of
identifying Bayes' solutions which maximize expected utility. Strict Bayesian ideals are sometimes but by no means always unsatisfiable. Moreover, the
limitations on our capacities to satisfy them are open to modification through
new therapies and technologies designed to enhance our capacities.
But we cannot always wait for help which, in any case, may never come.
Suppose that the agent is not able to compute the expected utilities and
thereby identify options optimal in expected utility. He is not able to live
up to his strict Bayesian ideal. Does the strict Bayesian ideal counsel him to
come as close to living up to it as is feasible? Kelly seems to think so. I do
not. The strict Bayesian counsel is to maximize expected utility. Nothing else
will do. A strict Bayesian may look for a "Bayes-non-Bayes compromise", as
I.J. Good put it, when strict Bayesian principles cannot be followed. But
these compromises are not to be regarded as deriving from strict Bayesian
principles.
To be sure, strict Bayesian principles can provide some guidance when
the agent cannot observe them perfectly. The agent may have at his disposal
information which entails the expected utilities of all the options. However,
the calculations required to derive the expected utilities may be too complex
or too costly for the agent to carry out in the requisite time. In determin-
ing expectations, an agent might use techniques of approximation without
compromising his strict Bayesian ideals at all if he can be assured that the
approximations are good enough to yield him the same policy recommenda-
tions as exact calculations would.
Computation, however, need not be the problem. There might be no
obstacle to the computation of expected utilities from credal probabilities
and utilities but the agent may be so unclear as to his strict Bayesian credal
state that he might leave open whether one option or another maximizes
expected utility. Nonetheless, the strict Bayesian agent could still act as if he
had a numerically definite credal state and, indeed, could do so in a manner
consistent with the partial information he has concerning his unknown credal
state. Indeed, he might do so in several different ways which yield conflicting
policy recommendations. The agent would not be maximizing expected utility
in accordance with his true (but unidentified) convictions. And the theory of
ideal rationality would not be in a position to specify what the agent should
do except perhaps to insist that the agent act in a manner consonant with
the general principles of Bayesian rationality.
In both types of situation just considered, we see how strict Bayesian
ideals can guide choices under conditions of "bounded rationality" without
presupposing the idea that realizable actions approximate the recommenda-
tions of normative theory.

Consider then the case where the agent is asked to bet on the integer in the billionth place in the decimal expansion of π. The agent knows that to be coherent (that is, to obey the requirements of consistent credal probability judgment), his credal probability that the integer is 9 must be either 1 or 0 depending upon whether 9 is or is not the integer in question. Thus, the agent
knows in advance that to be coherent, his belief probability should be 0 or 1.
Moreover, he also knows that if his belief probability is 0 (1) when 9 is (is not)
the integer in question, he will fail to be credally coherent. Hence, the advice
to choose in a way which maximizes expected utility relative to a coherent
credal probability cannot be implemented unless the agent can establish the
identity of the integer in the billionth place in the decimal expansion of π.
In the first type of case, the information about the problem available to
the decision maker entails an exact solution which the decision maker cannot
identify owing to lack of computational capacity. Still the strict Bayesian
ideal can control our demands on an approximate solution.
In the second kind of case, the decision maker lacks information suf-
ficient to entail an exact solution and, hence, could not identify an exact
solution even if his computational capacities were adequate to the job. The
indefiniteness here derives from the decision maker's incapacity to identify
his beliefs and desires accurately enough. Here too, the strict Bayesian ideal
can identify constraints on what the agent should do.
In the third kind of case, the decision maker lacks information about his
credal probabilities just as in the second case. But unlike the second case, his
failure of self knowledge is due to his incapacity to make calculations from
information he does have. I do not understand how strict Bayesian principles
can be deployed to give advice here except to suggest that the agent make the
calculations or borrow the results from someone who has. The fact, however,
that strict Bayesian norms fail to give advice in this case implies no deficiency
in those ideals. If problems involving gambles on the billionth place in the
decimal expansion of π are urgent, efforts to compute the integer belonging
at that place ought to be intensified.
To say all this is not to suggest that it is foolish to propose procedures
for addressing cases such as this when they arise. If we cannot make the
computation required, some surrogate recipe for decision making might be
deployed. We should not, however, assess this recipe with respect to how close
it is to the strict Bayesian limiting ideal. In the example under discussion, it is
sometimes suggested by strict Bayesians that the agent should evaluate risks
as if each integer has a 0.1 chance of filling the billionth place even though
the suggestion clearly flouts the requirements of strict Bayesian doctrine. This
recommendation is no better and no worse than any other recommendation
which violates the canons of strict Bayesian rationality-at least as far as
strict Bayesianism is concerned. And I would not know how to begin to characterize the "closeness" of this proposal to the Bayesian ideal.
Whether we think of cases where the ideals of strict Bayesian rationality
give some sort of counsel as to what to do even though the ideals cannot be
fully satisfied or whether we think of cases where they cannot, we should not
expect these ideals or any others to guide action by recommending realizable
actions that better approximate them. If we did insist on this, our under-
standing of the norms would hinge on the conception of approximation we
favor as Kelly suggests. Ideal rationality would be related to bounded ratio-
nality as a limiting case. As I have indicated, our understanding of norms
should depend on no such thing. We should be able to apply the norms
strictly in a broad spectrum of cases. We should be in a position to enlist the
aid of psychotherapy, logic and assorted technologies (such as the printing
press and computers) to secure sufficient emotional stability, memory and
computational capacity to solve more and more complex problems according
to the specifications of the ideals. To the extent that these "helping" dis-
ciplines cannot as yet provide us with the resources to solve our problems
according to the standards laid down by our conception of ideal rationality,
there is no sense in which ideal rationality urges us to approximate confor-
mity to the ideal standard. It encourages, instead, strict conformity in a
(hopefully) increasing class of cases.
One of the important differences between the way Kelly suggests ideal standards of rationality relate to practice and the view I am pressing here concerns the critical attention devoted to the character of the ideal standards.
According to Kelly, the pressing problems-including those of primary episte-
mological interest-focus on implementation. Our chief concern is to devise
ways and means of becoming better and better copies of ideally rational
agents. This requires a better understanding of what is required to get closer
to the ideal. Once we have a good understanding of this, we do not need
to examine the precise details of strict satisfaction of the ideal. By way of
contrast, concern with strict satisfaction and devising ways and means for
realizing it remain important according to the alternative approach. Critical
scrutiny of the ideal itself rather than of the criterion for measuring distance
between the ideal and the second best remains a paramount epistemological
concern.
The ideal imposed by strict Bayesianism presupposes that when confronted
with a decision problem representable by characterizing options as functions
from states to consequences (under the assumption that states are
probabilistically independent of acts), the agent is committed to a system
of credal probability judgments for the states (a credal state) representable
by a unique probability function over the algebra generated by the states, a
valuation of consequences representable by a utility function unique up to a
positive affine transformation and a valuation of the acts representable by an
expected utility function unique up to a positive affine transformation. The
acts which the agent is rationally permitted to choose are restricted to those
which maximize expected utility.
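Stated as a procedure, the ideal just described is simple enough; the
difficulty lies in the commitments it presupposes. The sketch below is
schematic, and its states, probabilities and utilities are illustrative
assumptions.

    # A schematic sketch of the strict Bayesian ideal: acts are functions
    # from states to consequences, the credal state is a single probability
    # function, and the admissible acts are those which maximize expected
    # utility. All names and numbers are illustrative assumptions.

    def expected_utility(act, credal, utility):
        return sum(p * utility[act[state]] for state, p in credal.items())

    def admissible(acts, credal, utility):
        eu = {name: expected_utility(a, credal, utility) for name, a in acts.items()}
        best = max(eu.values())
        return [name for name, value in eu.items() if value == best]

    credal = {'rain': 0.3, 'shine': 0.7}               # a unique probability function
    utility = {'wet': -5, 'picnic': 10, 'indoors': 2}  # unique up to positive affine transformation
    acts = {'go_out': {'rain': 'wet', 'shine': 'picnic'},
            'stay_in': {'rain': 'indoors', 'shine': 'indoors'}}
    print(admissible(acts, credal, utility))           # ['go_out']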
There are many ways one might be sceptical about the strict Bayesian
ideal. One widely held view is that the so-called "independence postulate"
needs to be modified. Another alternative rejects the assumption that ideal
rationality entails commitment to credal states representable by unique prob-
abilities and commitment to evaluations of consequences representable by util-
ity functions unique up to a positive affine transformation. According to this
view the acts which are the feasible options will not, in general, be evaluated
by an expected utility function unique up to a positive affine transformation
and, indeed, will not even be weakly ordered with respect to expected utility.
On the other hand, there is a clear sense in which the independence postulate
is satisfied.
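The second alternative can be made concrete in the same schematic terms:
replace the single probability function with a set of them, and count an act
as admissible when it maximizes expected utility relative to at least one
member of the set. The sketch below implements that rule; it is one natural
reading of the view, not the only one, and the numbers remain illustrative.

    # A sketch of the rival ideal: the credal state is a set of probability
    # functions, and an act is admissible if it maximizes expected utility
    # relative to at least one member of the set. Acts admissible in this
    # sense need not be weakly ordered, yet independence is undisturbed.

    def expected_utility(act, credal, utility):
        return sum(p * utility[act[state]] for state, p in credal.items())

    def set_admissible(acts, credal_set, utility):
        chosen = set()
        for credal in credal_set:
            eu = {name: expected_utility(a, credal, utility) for name, a in acts.items()}
            best = max(eu.values())
            chosen.update(name for name, value in eu.items() if value == best)
        return sorted(chosen)

    utility = {'wet': -5, 'picnic': 10, 'indoors': 2}
    acts = {'go_out': {'rain': 'wet', 'shine': 'picnic'},
            'stay_in': {'rain': 'indoors', 'shine': 'indoors'}}
    credal_set = [{'rain': 0.3, 'shine': 0.7}, {'rain': 0.9, 'shine': 0.1}]
    print(set_admissible(acts, credal_set, utility))   # ['go_out', 'stay_in']

Both acts survive, and nothing in the result ranks one above the other: that
is the failure of weak ordering in miniature.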
As is well known, there is considerable evidence to show that individuals
do deviate from the standards of strict Bayesianism. In many cases, devia-
tion can be imputed to various forms of confusion which inhibit the agent's
capacity to make the requisite calculations or comprehend the task being set
for him or her to undertake. In other cases, however, deviant behavior seems
to persist even when the agents are emotionally stable, comprehend the prob-
lem and have clearly made the requisite computations. This seems to be the
case, for example, in decision problems of the kind invented by M. Allais and
D. Ellsberg.
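The Allais problem, at least, can be stated precisely enough to show that no
amount of computation rescues the strict Bayesian analysis of the modal
choices. With the usual payoffs (in millions), most subjects prefer A to B
and D to C. Algebraically, A over B requires 0.11u(1) > 0.10u(5) + 0.01u(0),
while D over C requires the reverse strict inequality, so no utility function
fits both; the brute-force search below merely illustrates the fact.

    # The classic Allais lotteries (payoffs in millions of dollars). The
    # modal pattern of choices is A over B together with D over C; the
    # random search illustrates that no assignment of utilities makes both
    # preferences maximize expected utility.
    import random

    def eu(lottery, u):
        return sum(p * u[x] for x, p in lottery.items())

    A = {1: 1.00}
    B = {5: 0.10, 1: 0.89, 0: 0.01}
    C = {1: 0.11, 0: 0.89}
    D = {5: 0.10, 0: 0.90}

    hits = 0
    for _ in range(100_000):
        u = {0: 0.0, 1: random.random(), 5: random.random()}   # u(0) fixed by affine freedom
        if eu(A, u) > eu(B, u) and eu(D, u) > eu(C, u):
            hits += 1
    print(hits)   # 0: the modal pattern cannot maximize expected utility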
But if the deviation from strict Bayesian rationality is (i) counted as a
deviation from ideal rationality and (ii) cannot be blamed on factors which
inhibit calculation, the deviation is not only a violation of the ideal standard
but also of the recommendations of bounded rationality, which direct the agent
to satisfy the requirements of ideal rationality to the best of his or her abilities.
If the strict Bayesian is going to defend his position by showing that the
deviant behavior is irrational even according to the requirements of bounded
rationality, he will have to show that his standard of ideal rationality is
preferable to alternative ideals just as, for example, H. Raiffa tries to do
in responding to challenges to the independence postulate. But this requires
a systematic critical inspection of ideals. Once we have established that the
agent is computationally and emotionally competent for the task at hand, the
issue of bounded rationality loses its relevance. More familiar preoccupations
with ideals of rational choice come to the fore.
Students of bounded rationality do not, to my knowledge, address cases
of behavior allegedly deviating from widely acknowledged norms which can-
not be explained as due to emotional distress or limitations of memory and
computational capacity. The existence of such deviations implies that there
may be empirically grounded controversies concerning ideal rationality of fun-
damental importance which are not reducible to controversies over bounded
rationality. Recognition of this would oblige them to soften their emphasis
on the epistemological centrality of identifying norms of bounded rational-
ity. They would be obliged to give a hearing to authors from M. Allais to
M. Machina who seek to tinker with the independence axiom in order to save
the ideal rationality of the deviant responses in the Allais problem. They
would also have to consider the alternative response to the problems of Allais
and Ellsberg which preserves independence while abandoning the assumption
that the feasible options are orderable with respect to expected utility.
The Seidenfeld-Schervish-Kadane paper is concerned with exploring prop-
erties of decision theories which give up the assumption that the feasible
options ought to be weakly ordered with respect to value. Included in the
discussion is an argument showing that modification of the independence pos-
tulate yields some extremely uncomfortable consequences which are avoidable
if one gives up the weak ordering requirement instead. This section of the
Seidenfeld-Schervish-Kadane paper is, I suggest, an important contribution
to the examination of the merits of rival proposals concerning what a de-
cision theory purporting to characterize ideal standards of rational choice
ought to be. No attention is paid to the question of bounded rationality
and none seems relevant. The strict Bayesian view, theories which modify
the independence postulate entrenched in strict Bayesian doctrine but insist
that choices be weakly ordered in a way reflecting the values of the decision
maker, and theories which retain the independence postulate but abandon the
weak orderability assumption are all extremely demanding on the memories,
computational capacities and motivations of human agents. It is doubtful
whether there are any considerations to be obtained from theories of compu-
tation which can be helpful in deciding the merits of these alternative ideals
in a non-question-begging way. To the contrary, the ideals we adopt give
direction to efforts to deploy computer technology to overcome the limitations
due to our bounded rationality; without such direction, our efforts would be
blind.
In the light of all this, what should we say about the chess playing
program Simon considers?
Do the standards of ideal rationality advocated, let us say, by von Neu-
mann and Morgenstern, entail that failure to choose a winning chess strategy
is a mark of irrationality-i.e., failure to conform to the requirements of ideal
rationality?
This would be so if the chess player had as his or her feasible strategies
all possible plans for making moves given the moves of his opponent.
It is clear, however, that most "games" in the "complete tree of possible
games" are not feasible for the players to play. The reason is the limited com-
putational capacities and memories of the players. A strategy is feasible for a
player only if he can implement it, and implementability entails that the agent
be able to follow the specifications of the policy-in this case the instructions
as to how to move at each step depending on the previous moves by the agent
and his opponent. Hence, there may very well be intellectual limits on the
capacities of the decision maker precluding strategies which might otherwise
have been feasible for him or her from being so. In such cases, there is no
failure of ideal rationality. Principles of ideally rational decision making do
not prescribe which options ought to be feasible in a decision problem but only
which options are admissible given a set of feasible options. We may agree
that agents who, due to their lack of computational capacity, cannot identify
more than three strategies for playing chess four moves ahead are not very
good chess players. But if they pick the best of these strategies (say the one
which maximizes the prospect of victory in the set of three strategies), they
have met the requirements of ideal rationality and not merely satisfied them
to a good degree of approximation.
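The scale of the infeasibility is easy to estimate. On rough assumptions of
the kind Shannon made famous (about 35 legal moves per position, games of
about 80 plies), the count of distinct games alone is astronomical, and a
strategy, which must specify a move at every position the player could face,
inhabits a far larger space still.

    # A back-of-the-envelope estimate of the "complete tree of possible
    # games", on rough Shannon-style assumptions: about 35 legal moves per
    # position and games of about 80 plies. These are crude estimates, not
    # facts about chess.
    branching, plies = 35, 80
    games = branching ** plies                                # exact big-integer count
    print(f"distinct games: about 10^{len(str(games)) - 1}")  # about 10^123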
As I understand it, the problem Simon is addressing is the task of enlarg-
ing the space of feasible options faced by the decision maker. Enlarging the
set of feasible options should, from the vantage point of the decision maker,
improve his predicament by enhancing his control-provided the new feasible
options include options bearing higher values. Of course, there are costs to
searching for new options-including costs of computation. Such costs can
preclude continuing the search until all "objectively" available options have
become feasible for the decision maker. Given the costs, "aspiration levels"
are determined which allow one to say that one searches for new options un-
til an option which yields benefits at or above the aspiration level has been
identified.
Once a set of feasible options is identified in this way, the rational agent
ought ideally to choose the option which maximizes value. Of course, the
option chosen will be one which "satisficed" in the context of the search for
new options.
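Schematically, then, the picture has two stages: a costly search generates
the feasible set, halting by the satisficing rule, and ideal rationality then
demands maximizing over what the search has turned up. Every number in the
sketch below is an illustrative assumption, not a parameter drawn from
Simon's account.

    # A sketch of satisficing search followed by maximizing choice. Options
    # are discovered at a cost per look; search halts once an option meets
    # the aspiration level; the agent then picks the best option found.
    # Distribution, cost and aspiration level are illustrative assumptions.
    import random

    def search_then_maximize(aspiration, cost_per_look, max_looks=1000, seed=0):
        rng = random.Random(seed)
        found, spent = [], 0.0
        for _ in range(max_looks):
            value = rng.uniform(0, 100)        # discover a new feasible option
            found.append(value)
            spent += cost_per_look
            if value >= aspiration:            # the satisficing stopping rule
                break
        return max(found), spent, len(found)   # then maximize over the feasible set

    best, spent, looks = search_then_maximize(aspiration=90, cost_per_look=1)
    print(f"chose {best:.1f} after {looks} looks, at search cost {spent:.0f}")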
Thus, the tasks for which Simon's, Kelly's, and Spirtes, Scheines, and
Glymour's proposals seem best suited are the tasks of abduction: the
identification of potential solutions to problems or feasible options for realizing given
goals. Whatever the merits of these ideas may turn out to be, it is impor-
tant, in my opinion, to disassociate them from issues arising from the fact
that standards of ideal rationality are often violated due to limitations of
computational capacity and memory, and due to emotional instability.
Appreciation of the importance of this point has become widespread,
thanks in large measure to the efforts of authors like Simon among economists
and I.J. Good among statisticians. And in recent years it has become in-
creasingly recognized that the new computer technologies can contribute to
an intelligent response to the issues raised.
What I have sought to do here is to suggest that the relevance of such
technologies to problems about limitations of computational capacity and
ideal rationality is more controversial than it is often taken to be. Some
authors urge us to tailor our prescriptions to fit our capacities. For such au-
thors, we should look to AI and cognate fields for help in identifying those
capacities. Others, like myself, insist that we should urge workers in artifi-
cial intelligence to devote less time to devising models of the mind and more
effort to what, in any case, is the most impressive contribution of the com-
puter revolution-to enhancing our control over our environment and above
all improving our capacity to be rational. Sorting out the issues involved here
calls for philosophical acumen, technical competence and the kind of inter-
disciplinary exchanges which Carnegie Mellon has made integral to the life of
the University.
It would be wrong to suggest that the topic of bounded rationality and
the question of discovery are the only philosophical issues of interest which
call for informed involvements with substantive developments in mathemat-
ical, natural and social sciences or the arts. The pages of this volume offer
testimony to the diversity of topics and attitudes which can become grist for
philosophical reflection in an interdisciplinary setting. What I have sought
to make clear by addressing (albeit briefly) the topic of bounded rationality
is that there is still room for philosophical preaching. There are controversial
issues concerning what we ought to do thrown up by technological innovation
and social change which extend beyond issues of public policy and private
morality to touch the ways we ought to reason with one another. Were the
issues uncontroversial, the preaching would be gratuitous. But the facts sug-
gest otherwise; and it is fortunate that institutions like Carnegie Mellon are
prepared to encourage such preaching informed by the best fruits of inquiry.
Isaac Levi
Department of Philosophy
Columbia University
New York, NY 10027
