Mathematical Psychology

Jean-Claude Falmagne (a), Michael D. Lee (a)

(a) University of California, Irvine, 3151 Social Sciences Plaza A, Irvine CA 92697-5100

Abstract
The article reviews the field from a historical perspective, starting from the
works of Fechner and Thurstone, and outlining the basic theoretical concepts
in four traditional areas: learning, psychophysics, measurement and choice,
and response times. More recent topics that have emerged from these areas
are also briefly mentioned together with their most prominent contributors.
Keywords: Bayesian statistics, choice theory, Decision Field Theory,
European Mathematical Psychology Group, Fechner's Law, Law of
Comparative Judgment, Markov chain processes, mathematical learning
theory, measurement theory, psychophysics, sequential sampling models,
signal detection theory, Society for Mathematical Psychology, stimulus
sampling theory, utility theory
1. Introduction
Mathematics has been used in psychology for a long time, and for different
purposes. When William James (1890/1950) writes

$$\text{Self-esteem} = \frac{\text{Success}}{\text{Pretensions}}$$

he is using mathematical notation metaphorically. Because no method is provided to measure
the three variables, the equation cannot be taken literally. James means to convey, by a
dramatic formula, the idea that if your pretensions increase without a corresponding increase
of your success, your self-esteem will suffer. Such usages of mathematics, which Miller (1964)
calls discursive, can be found in psychological discourse at least since Aristotle. Despite
their historical interest, we shall not review such metaphorical uses of mathematics here
(see, instead, Boring 1950 or Miller 1964).

Email addresses: jcf@uci.edu (Jean-Claude Falmagne), mdlee@uci.edu (Michael D. Lee)

Preprint submitted to Encyclopedia of the Social and Behavioral Sciences, April 16, 2012

In this article, we reserve the term mathematical psychology for the elaboration and the
testing of mathematical theories and models for behavioral data. In principle, such a theory
entails an economical representation of a particular set of data in mathematical terms, where
economical means that the number of free parameters of the theory is substantially smaller than
the number of degrees of freedom in the data. In that sense, mathematical psychology plays for
behavioral data the role that mathematical physics or mathematical biology plays for physics
or biology, respectively. In the best cases, the theory is cast in probabilistic terms and is
testable by standard statistical methods. The large majority of such mathematical theories for
behavioral data have emerged from four partially overlapping traditional fields: psychophysics,
learning, choice, and response times. Each of these fields is outlined below from the standpoint
of the prominent mathematical models that have been proposed. Other topics to which mathematical
psychologists have devoted much work are also mentioned.
2. The Precursors: Fechner and Thurstone
Gustav Theodor Fechner (1801-1887; see Boring 1950) was by training
an experimental physicist with a strong mathematical background. While
his interests were diverse and his contributions many, ranging from experimental physics to philosophy, we only consider him here as the founder of
psychophysics. His main purpose was the measurement of sensation in a
manner imitating the methods used for the fundamental scales of physics,
such as length or mass. Because sensation could not be measured directly,
Fechner proposed to evaluate the difference between the sensations evoked
by two stimuli by the difficulty of discriminating between them.
To be more specific, we introduce some notation. We write x, y, etc.
for positive real numbers representing physical intensities measured on some
ratio scale, such as sound pressure level. Let P (x, y) be the probability that
stimulus x is judged to be louder than stimulus y. Cast in modern terms
(cf. Luce and Galanter 1963; Falmagne 1985/2002), Fechner's idea amounts
to finding a real-valued function u defined on the set of physical intensities,
and a function F (both strictly increasing and continuous) such that

$$P(x, y) = F[u(x) - u(y)]. \qquad (1)$$

This equation is supposed to hold for all pairs of stimuli (x, y) such that
$0 < P(x, y) < 1$ (subjectively, x is close to y). If such an equation holds,
then the function u can be regarded as a candidate scale for the measurement
of sensation in the sense of Fechner. A priori, it is by no means clear that
a scale u satisfying (1) necessarily exists. A necessary condition is the so-called
Quadruple Condition: $P(x, y) \le P(x', y') \iff P(x, x') \le P(y, y')$,
where the equivalence is assumed to hold whenever the four probabilities
are defined. In the literature, the problem of constructing such a scale,
or of finding sufficient conditions for its existence, has come to be labeled
Fechner's Problem. An axiomatic discussion of Fechner's Problem can be
found in Falmagne (1985/2002; see also Krantz et al. 1971).
For various reasons, partly traditional, psychophysicists often prefer to
collect their data in terms of discrimination thresholds, which can be obtained
as follows from the discrimination probabilities. We define a sensitivity function
$\xi : (y, \pi) \mapsto \xi_\pi(y)$ by the equivalence $\xi_\pi(y) = x \iff P(x, y) = \pi$.
The Weber function^1 $\Delta : (y, \pi) \mapsto \Delta_\pi(y)$ is then defined by the equation
$\Delta_\pi(y) = \xi_\pi(y) - \xi_{.5}(y)$. A representation of these fundamental concepts of
classic psychophysics is given in Figure 1, in a special case where $P(y, y) = .5$.
For each value of y, the S-shaped function $x \mapsto P(x, y)$ is called a psychometric function.

[Figure 1: The Weber function $\Delta_\pi$ in the case where $P(y, y) = .5$ for all stimuli y; thus
$\xi_{.5}(y) = y$. The S-shaped function $x \mapsto P(x, y)$ is referred to as a psychometric function.]

Footnote 1: E. H. Weber (1795-1878), a colleague of Fechner, professor of anatomy and physiology
at Leipzig.

Starting with Weber himself, the Weber function has been investigated
experimentally for many sensory continua. In practice, $\Delta_\pi(y)$ is estimated by
stochastic approximation (cf. Robbins and Monro 1951; Wasan 1969; Levitt
1970) for one or a few criterion values of the discrimination probability $\pi$
and for many values of the stimulus y. A typical finding is that, for stimulus
values in the midrange of the sensory continuum and for some criterion values
$\pi$, $\Delta_\pi(y)$ grows approximately linearly with y, according to the equation

$$\Delta_\pi(y) = y\,C(\pi), \qquad (2)$$

where C is a function depending on the criterion. The label Weber's Law
is attached to this equation. It is easily shown that (2) is equivalent to the
homogeneity equation $P(\lambda x, \lambda y) = P(x, y)$ ($\lambda > 0$), which in turn leads
to $P(x, y) = H(\log x - \log y)$, with $H(s) = P(e^s, 1)$. This means that the
function u in (1) has the form

$$u(y) = A \log y + B, \qquad (3)$$

where the constants A > 0 and B arise from uniqueness considerations for
both u and F. Equation (3) has been dubbed Fechner's Law. Our discussion indicates
that, in the framework of (1), Weber's Law and Fechner's Law are
equivalent. Much of psychophysics evolved from Fechner's ideas.
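
The link between a logarithmic scale and Weber's Law can be checked numerically. The following is a minimal sketch, assuming for illustration that F in (1) is the standard normal distribution function and that u(y) = A log y with A = 4; these choices, the function names, and the bisection routine are assumptions of the example, not part of the theory. It computes the sensitivity function by root finding and shows that the Weber fraction is the same at every stimulus level.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def P(x, y, A=4.0):
    """Discrimination probability under (1), with u = A*log and F = phi (illustrative choice)."""
    return phi(A * math.log(x) - A * math.log(y))

def sensitivity(y, pi, A=4.0):
    """xi_pi(y): the intensity x with P(x, y) = pi, found by bisection on a log scale."""
    lo, hi = y * 1e-3, y * 1e3
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if P(mid, y, A) < pi:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def weber(y, pi, A=4.0):
    """Weber function: xi_pi(y) - xi_.5(y)."""
    return sensitivity(y, pi, A) - sensitivity(y, 0.5, A)

# Weber fractions at three stimulus levels; under Weber's Law they coincide.
ratios = [weber(y, 0.75) / y for y in (1.0, 10.0, 100.0)]
print(ratios)
```

Under these assumptions the common value of the ratios plays the role of C(.75) in equation (2).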
The most durable contribution of Louis Leon Thurstone (1887-1955) to
mathematical psychology is his Law of Comparative Judgments (Thurstone
1927a,b; see Bock and Jones 1968), a cornerstone of binary choice theory
closely related to (1) of Fechner's Problem. Thurstone supposed that a subject
confronted with a choice between two alternatives x and y (where x, y
are arbitrary labels and do not necessarily represent numerical values) makes
a decision by comparing the sampled values of two random variables $U_x$ and
$U_y$ associated with the alternatives. Suppose that these random variables are
independent and normally distributed, with means $\mu(x)$, $\mu(y)$, and variances
$\sigma^2(x)$, $\sigma^2(y)$, respectively. Denoting by $\Phi$ the distribution function of
a standard normal random variable and with $\mathbb{P}$ standing for the probability
measure, this leads to

$$P(x, y) = \mathbb{P}(U_x > U_y) = \Phi\!\left(\frac{\mu(x) - \mu(y)}{\sqrt{\sigma^2(x) + \sigma^2(y)}}\right). \qquad (4)$$

Assuming that all the random variables have the same variance $\sigma^2 = \alpha^2/2$,
we obtain

$$P(x, y) = \Phi(u(x) - u(y)), \qquad (5)$$

with $u(x) = \mu(x)/\alpha$ and $F = \Phi$, a special case of (1). Equations (4) and (5) are
called Case III and Case V of the Law of Comparative Judgments, respectively.
Thurstone's model has been widely applied in psychophysics and
choice. Other models in the same vein have been proposed by various researchers
(e.g. Luce 1959; Luce and Suppes 1965). Thurstone's other important
contributions are in learning theory (e.g. Thurstone 1919), and especially
Multiple-Factor Analysis (e.g., Thurstone 1947).
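
Equation (4) can be illustrated by simulation. The sketch below assumes arbitrary illustrative means and variances and hypothetical function names; it compares the closed-form Case III probability with the relative frequency of U_x > U_y in simulated draws, and it is not meant as an estimation procedure.

```python
import math
import random

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def case_iii(mu_x, mu_y, var_x, var_y):
    """Choice probability P(x, y) from equation (4)."""
    return phi((mu_x - mu_y) / math.sqrt(var_x + var_y))

def simulate(mu_x, mu_y, var_x, var_y, trials=200_000, seed=0):
    """Relative frequency of U_x > U_y for independent Gaussian U_x, U_y."""
    rng = random.Random(seed)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    wins = sum(rng.gauss(mu_x, sd_x) > rng.gauss(mu_y, sd_y) for _ in range(trials))
    return wins / trials

# Illustrative parameter values (assumptions of this example).
predicted = case_iii(1.0, 0.5, 1.0, 2.0)
observed = simulate(1.0, 0.5, 1.0, 2.0)
print(predicted, observed)
```

Setting both variances to alpha^2 / 2 in `case_iii` gives the Case V form (5), since the denominator then reduces to alpha.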
3. The Beginning: Mathematical Learning Theory
Two papers mark the beginning of mathematical psychology as a distinct
research field: one by Estes (1950) entitled Toward a Statistical
Theory of Learning, and another by Bush and Mosteller (1951). These works
were the highlights of a movement spearheaded by R.R. Bush and R.D. Luce
at the University of Pennsylvania, and R.C. Atkinson, W.K. Estes and P.
Suppes at Stanford, which set out to formalize mathematical learning theories
in terms of stochastic processes, and especially Markov processes (cf.
Bharucha-Reid, 1960; Parzen, 1994). There were good reasons for such a
development at that time. The previous decade had been plagued by fruitless
controversies concerning the basic mechanisms of learning. While a considerable
experimental literature on learning was available (cf. Hilgard, 1943), the
statistical tools in use for the analysis of the data were poor, and the prominent
theories ambiguous. When mathematics was used, it was used metaphorically.
Moreover, the scope of the theories was ambitious, covering a vast
class of experimental situations loosely connected to each other conceptually.
A typical example of such an endeavor is Hull (1943). By contrast, the
Markov models developed by Bush, Estes and their followers were designed
for specific experimental situations. Under the influence of Suppes, a philosopher
of science from Stanford who played a major role in the development of
mathematical psychology, these models were often stated axiomatically. As
a consequence, the predictions of the models could in many cases be derived
by straightforward mathematical arguments. Two classes of models were
investigated.
3.1. Finite State Markov Chains
In these models, the basic idea is that the subject's responses in a learning
experiment are the reflection of some internal states, which coincide with
the states of a finite Markov chain. The transitions of the Markov chain are
governed by the triple of events (stimulus, response, reinforcement) occurring
on each trial. In many situations, the number of states of the Markov chain is
small. We sketch a simple case of the so-called one-element model, in which
the chain has only two states which we denote by N and K. Intuitively, N
stands for the naive state, and K for the cognizant state. On each trial of
the experiment, the subject is presented with a stimulus and has to provide a
response, which is either labeled as C (correct) or as F (false). For example,
suppose that the subject has to identify a rule used to classify some cards
into two piles. Say, a drawing on each card has either one or two lines, which
can be either both straight or both curved, and either both vertical or both
horizontal. The subject must discover that all the cards with curved lines
and only those, go on the left pile. The subject is told on each trial whether
the chosen pile is the correct one.
In words, the four axioms of the model are as follows: [M1] The subject
always begins the experiment in the naive state N; [M2] the probability of
a transition from the naive state N to the cognizant state K is equal to
some parameter $0 < \theta \le 1$, constant over trials, regardless of the subject's
response; [M3] in state N, the probability of a correct placement is equal to
a parameter $0 < \alpha \le 1$, constant over trials; [M4] in state K, the response is
always correct. The derivation of the model can either be based on a 2-state
Markov chain with state space {N, K}, or on a 3-state Markov chain with
state space {(N, C), (N, F), (K, C)} (where N, K denote the subject's states
and C, F the responses). In either case, the derivations are straightforward.
Notice that, from Axioms [M1] and [M2], the trial number of the first
occurrence of state K has a geometric distribution with parameter $\theta$. Writing $S_n$
and $R_n$ for the cognitive state and the response provided on trial n = 0, 1, ...
(thus, $S_n \in \{N, K\}$ and $R_n \in \{C, F\}$), we easily get, for n = 0, 1, ...,

$$P(S_n = N) = (1 - \theta)^n,$$
$$P(R_n = C \mid S_n = N) = \alpha,$$
$$P(R_n = C \mid S_n = K) = 1,$$
$$P(R_n = C) = 1 - (1 - \theta)^n (1 - \alpha),$$

which implies, with $p_n = P(R_n = C)$,

$$p_{n+1} = (1 - \theta)\, p_n + \theta. \qquad (6)$$

A strong prediction of this model is that the number of correct responses
recorded before an error occurring on trial number n+1 should be binomially
distributed with parameters $\alpha$ and n (i.e., all the subject's responses have
been generated by the naive state). This prediction is surprisingly difficult
to reject, at least in some situations (Suppes and Ginsburg, 1963).
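
The axioms [M1] to [M4] translate directly into a small simulation. The sketch below is only an illustration under assumed parameter values (theta = 0.2, alpha = 0.5) and hypothetical function names; it compares the simulated proportion of correct responses on each trial with the closed-form expression for P(R_n = C).

```python
import random

def simulate_one_element(theta, alpha, n_trials, rng):
    """One simulated subject: list of booleans, True meaning a correct response (C)."""
    state = "N"                                   # [M1] start in the naive state
    responses = []
    for _ in range(n_trials):
        if state == "K":
            responses.append(True)                # [M4] always correct in state K
        else:
            responses.append(rng.random() < alpha)  # [M3] correct with probability alpha
            if rng.random() < theta:              # [M2] transition N -> K after the trial
                state = "K"
    return responses

def predicted_p(theta, alpha, n):
    """Closed form P(R_n = C) = 1 - (1 - theta)^n (1 - alpha)."""
    return 1.0 - (1.0 - theta) ** n * (1.0 - alpha)

rng = random.Random(1)
theta, alpha, n_trials, n_subjects = 0.2, 0.5, 10, 20000
runs = [simulate_one_element(theta, alpha, n_trials, rng) for _ in range(n_subjects)]
observed = [sum(run[n] for run in runs) / n_subjects for n in range(n_trials)]
predicted = [predicted_p(theta, alpha, n) for n in range(n_trials)]
for n in range(n_trials):
    print(n, round(observed[n], 3), round(predicted[n], 3))
```

The list `predicted` also satisfies the recursion (6), which can be verified term by term.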
3.2. Linear Operators Models
To facilitate the comparison, we take the same concept identification experiment as above, involving the classification of cards into two piles, and we
consider a simple representative of this class of models. As before, we denote
by $R_n$ the response on trial n. We take as the sample space the set of all
sequences $(R_0, R_1, \ldots, R_n, \ldots)$, with $R_n \in \{C, F\}$ for n = 0, 1, .... We also
define

$$p_0 = P(R_0 = C),$$
$$p_n = P(R_n = C \mid R_{n-1}, \ldots, R_0), \qquad n = 1, 2, \ldots.$$

Let $0 < \theta \le 1$ be a parameter. The axiom of the model has two cases, for
n = 0, 1, ...,

[L1]  $p_{n+1} = (1 - \theta)\, p_n + \theta$   if $R_n = F$,
[L2]  $p_{n+1} = p_n$   if $R_n = C$.


Thus, the model has the two parameters $p_0$ and $\theta$, and learning occurs
only when false responses are provided. As in the case of the finite Markov
chain models, many predictions could be derived from such models, which
could then be tested on the type of learning data traditionally collected by
the experimenters. For both classes of models, the results of such enterprises
were often quite successful. Nevertheless, interest in such models waned
during the sixties, at least for learning situations, because researchers gradually
realized that their simple mechanisms were far too primitive to capture
all the intricacies revealed by more sophisticated analyses of learning data
(see especially Yellott, 1969). General presentations of this topic can be
found in Atkinson and Estes (1963) or Atkinson et al. (1965) for the finite
state Markov learning models, and in Estes and Suppes (1959) or Sternberg
(1963) for the linear operator models. A mathematical discussion of Markov
processes for learning models is contained in Norman (1972).
Despite the partial failure of these models to provide a satisfactory detailed
explanation of traditional learning data, their role was nevertheless
essential in the introduction of modern probability theory (in particular
stochastic processes) and axiomatic methods in theoretical psychology, and
in promoting the emergence of mathematical psychology as a field of research.
In recent years, a renewed interest in learning theory has appeared on the
part of some economists.
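
The linear operator model of Section 3.2 can be sketched in a few lines. This is a minimal illustration with assumed parameter values and hypothetical names, reading the axiom as an update of the response probability after each trial: an error moves the probability toward 1, and a correct response leaves it unchanged.

```python
import random

def simulate_linear_operator(p0, theta, n_trials, rng):
    """Simulate one subject; returns (responses, probs).

    responses[n] is True for a correct response on trial n,
    probs[n] is the probability of a correct response on trial n.
    """
    p = p0
    responses, probs = [], []
    for _ in range(n_trials):
        probs.append(p)
        correct = rng.random() < p
        responses.append(correct)
        if not correct:
            p = (1.0 - theta) * p + theta   # [L1] learning occurs after an error
        # [L2] after a correct response p is left unchanged
    return responses, probs

rng = random.Random(2)
responses, probs = simulate_linear_operator(p0=0.5, theta=0.3, n_trials=50, rng=rng)
print([round(p, 3) for p in probs[:10]])
```

In contrast with the one-element model, the probability of a correct response here depends on the whole history of errors, which is why the sample space is taken to be the set of response sequences.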
4. Psychophysics
The main research topics in psychophysics can be traced back to the ideas
of Fechner and Thurstone outlined in Section 2 of this article. Fechner's method of measuring sensation is indirect, and based on the difficulty
of discriminating between two stimuli. Under the impetus of S.S. Stevens,
a psychologist from Harvard, different experimental methods for scaling
sensation became popular.
4.1. Direct Scaling Methods
In the case of the magnitude estimation method, Stevens (1957) asked
subjects to make direct numerical judgments of the intensities of stimuli.
For example, a subject may be presented with a pure tone of some intensity
x presented binaurally, and would be required to estimate the magnitude
of the tone on a scale from 1 to 100. Typically, each subject would only
be asked to provide one or a couple of such estimations, and the data of
many subjects would be combined into an average or median result which we
denote by $\psi(x)$. In many cases, these results would be fit reasonably well by
the so-called Power Law $\psi(x) = \alpha x^{\beta}$ or such variants as $\psi(x) = \alpha x^{\beta} + \gamma$ and
$\psi(x) = \alpha (x - \gamma)^{\beta}$. In the cross modality matching method, the subject is
presented with a stimulus from one sensory continuum (e.g., loudness),
and is required to match its intensity with that of some other stimulus, from
a different sensory continuum (e.g., brightness). Power laws were often also
obtained in these situations. In the context of a discussion concerning the
measurement of sensation, the difference of forms between (3) and the Power
Law was deemed important. While not much mathematical theorizing was
involved in any particular application of these ideas, a real challenge was
offered by the need to construct a comprehensive theory linking all important
aspects of psychophysical methods and data. The most ambitious effort in
this direction is due to Krantz (1972). For a slightly different approach to
direct scaling, see Anderson (1981).
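
As an illustration of how the basic Power Law can be confronted with magnitude estimates, the sketch below fits the two-parameter form by ordinary least squares in log-log coordinates. The data are synthetic and noise-free, and the function name and parameter values are assumptions of this example; real data would call for a careful treatment of error and of the three-parameter variants.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log(alpha) + beta * log x; returns (alpha, beta)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    alpha = math.exp(my - beta * mx)
    return alpha, beta

# Synthetic magnitude estimates generated from alpha = 2, beta = 0.6 (illustrative values).
xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [2.0 * x ** 0.6 for x in xs]
alpha_hat, beta_hat = fit_power_law(xs, ys)
print(alpha_hat, beta_hat)
```

A straight line in log-log coordinates is the signature of the Power Law, whereas Fechner's Law (3) predicts a straight line when only the stimulus axis is logarithmic.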
4.2. Functional Equation Methods
Because the data from psychophysical experiments are typically noisy,
the theoretician may be reluctant to make specific assumptions regarding
the form of some functions entering in the equations of a model. An example
of such a model is (1), in which the functions u and F are not specified a
priori. In such cases, the equations themselves may sometimes specify the
functions implicitly. For instance, if we assume that both Weber's Law and
(1) hold, then Fechner's Law must also hold, that is, the function u must
be logarithmic, as in (3). Many more difficult cases have been analyzed (see
Aczél et al., 2000, for the application of functional equation methods in the
behavioral sciences).
4.3. Signal detection theory
Response strategies are often available to a subject in a psychophysical
experiment. Consider a situation in which the subject must detect a
low intensity stimulus presented over a background noise. On some trials,
just the background noise is presented. The subject may have a bias to
respond YES on some trials even though no clear detection occurred. This
phenomenon prevents a straightforward analysis of the data because some
successful YES responses may be due to lucky guesses. A number of signal
detection theories have been designed for parsing out the subject's response
strategy from the data. The key idea is to manipulate the subject's strategy
by systematically varying the payoff matrix, that is, the system of rewards
and penalties given to the subject for his or her responses. These responses fall into four
categories: correct detection or hit; correct rejection; incorrect detection
or false alarm; and incorrect rejection or miss. An example of a payoff
matrix is displayed in Table 1, in which the subject collects 4 monetary units
in the case of a correct detection, and loses 1 such unit in the case of a false
alarm.

Table 1: An example of payoff matrix. The subject collects 4 monetary units in the case
of a correct detection (or hit).

                          Responses
                    Yes                   No
Stimulus   Yes      4 (Hit)               -2 (Miss)
           No       -1 (False alarm)      3 (Correct rejection)

For any payoff matrix $\theta$, we denote by $p_s(\theta)$ and $p_n(\theta)$ the probabilities
of a correct detection and of a false alarm, respectively. Varying the
payoff matrix $\theta$ over conditions yields estimates of points $(p_n(\theta), p_s(\theta))$ in
the unit square. It is assumed that these points lie on a Receiver-Operator-Characteristic
(ROC) curve representing the graph of some ROC function
$\rho : p_n(\theta) \mapsto p_s(\theta)$. The function $\rho$ is typically assumed to be continuous
and strictly increasing. The basic notion is that the subject's strategy varies
along the ROC curves, while the discriminating ability varies across these
curves. The following basic random variable model illustrates this interpretation.
Suppose that to each stimulus s is attached a random variable $U_s$
representing the effect of the stimulus on the subject's sensory system. Similarly,
let $U_n$ be a random variable representing the effect of the noise on that
system. The random variables $U_s$ and $U_n$ are assumed to be independent.
We also suppose that the subject responds YES whenever some threshold
$\lambda_\theta$ (depending on the payoff matrix $\theta$) is exceeded. We obtain the two
equations

$$p_s(\theta) = P(U_s > \lambda_\theta), \qquad p_n(\theta) = P(U_n > \lambda_\theta). \qquad (7)$$

The combined effects of detection ability and strategy on the subject's performance
can be disentangled in this model, however. Under some general
continuity and monotonicity conditions and because $U_s$ and $U_n$ are independent, we get

$$P(U_s > U_n) = \int_{-\infty}^{\infty} P(U_s > \lambda)\, dP(U_n \le \lambda) = \int_0^1 \rho(p)\, dp, \qquad (8)$$

with $\rho$ the ROC function and after changing variable from $\lambda$ to $p_n(\theta) = p$.


Thus, for a fixed pair (s, n), the area under the ROC curve, which does
not depend upon the subject's strategy, is a measure of the probability that
$U_s$ exceeds $U_n$. Note that (8) remains true under any arbitrary continuous
strictly increasing transformation of the random variables. For practical
reasons, specific hypotheses are often made on the distributions of these
random variables, which are (in most cases) assumed to be Gaussian, with
expectations $\mu_s = E(U_s)$ and $\mu_n = E(U_n)$, and a common variance equal
to 1. Replotting the ROC curves in (standard) normal-normal coordinates,
we see that each replotted ROC curve is a straight line with a slope equal to
1 and an intercept equal to $\mu_s - \mu_n$.
Obviously, this model is closely related to Thurstone's Law of Comparative
Judgments. Using derivations similar to those leading to (4) and (5)
and defining $d'(s, n) = \mu_s - \mu_n$, we obtain $P(U_s > U_n) = \Phi(d'(s, n)/\sqrt{2})$,
an equation linking the basic signal detectability index $d'$ and the area under
the ROC curve. The index $d'$ has become a standard tool not only in
sensory psychology, but also in other fields where the paradigm is suitable
and the subject's guessing strategy is of concern. Multidimensional versions
of this Gaussian signal detection model have been developed. Various other
models have also been considered for such data, involving either different
assumptions on the distributions of the random variables $U_s$ and $U_n$, or
even completely different models, such as threshold models (Krantz, 1969).
Presentations of this topic can be found in Green and Swets (1974) and
Macmillan and Creelman (2004).
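
The relation between the area under the ROC curve and the index d' can be verified numerically for the equal-variance Gaussian model. The sketch below is an illustration only: it traces the ROC curve by sweeping the YES threshold, integrates it with the trapezoidal rule, and compares the result with the closed form. Function names, the threshold grid, and the value d' = 1 are assumptions of the example.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def roc_point(threshold, mu_s, mu_n):
    """(false-alarm rate, hit rate) for a YES criterion at `threshold`, unit variances."""
    p_n = 1.0 - phi(threshold - mu_n)
    p_s = 1.0 - phi(threshold - mu_s)
    return p_n, p_s

def area_under_roc(mu_s, mu_n, steps=4000):
    """Trapezoidal area under the ROC curve, sweeping the threshold from high to low."""
    lo, hi = min(mu_s, mu_n) - 8.0, max(mu_s, mu_n) + 8.0
    points = [roc_point(hi - (hi - lo) * i / steps, mu_s, mu_n) for i in range(steps + 1)]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

d_prime = 1.0                                   # illustrative value of mu_s - mu_n
auc = area_under_roc(mu_s=d_prime, mu_n=0.0)
closed_form = phi(d_prime / math.sqrt(2.0))
print(auc, closed_form)
```

Moving the threshold changes the point on the curve but not the area, which is the sense in which the area separates detection ability from response strategy.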
Mathematical models for multidimensional psychophysics were also developed. One approach takes the guise of Geometric representations of perceptual phenomena, which is the title of a seminal volume on the topic (Luce
et al., 1995, see, in particular Indow 1995). Another approach emphasizes
psychological dissimilarity as the foundational concept (Dzhafarov, 2011).
5. Measurement and Choice
Because they were preoccupied with the scientific bases of their discipline,
a number of mathematical psychologists have devoted considerable efforts to
the elucidation of the foundation of measurement theory, that is, the set of
principles governing the use of numbers in the statement and discussion of
scientific facts and theories. An account of the results can be found in the
three volumes of Foundations of Measurement (Krantz et al., 1971; Luce
et al., 1990; Suppes et al., 1989).
The literature on Choice Theory is extensive. Contributors come from
diverse fields including mathematical psychology, but also microeconomics,
political science, and business, the latter two being concerned with the
study of voter or consumer choice. Early on, the literature was dominated by
Thurstone's Law of Comparative Judgments, which still remains influential.
Many other situations and models have been analyzed, however, and we only
give a few pointers here. A generalization of Thurstone's model is obtained by
dropping the assumption of normality of the random variables $U_x$ and $U_y$ in
the last part of (4). Despite many attempts, the problem of characterizing
this model in terms of conditions on the binary choice probabilities, posed by
Block and Marschak (1960), is still unsolved. In other words, we do not know
which set of necessary and sufficient conditions on the choice probabilities
P(x, y) guarantees the existence of the random variables $U_x$, $U_y$ satisfying
the first part of (4) for all x and y in the choice set. A number of partial
results have been obtained, however.
In the multiple choice paradigm, the subject is presented with a subset
Y of a basic finite set $\mathcal{X}$ of objects and is required to select one of the
objects in Y. We denote by P(x; Y) the probability of selecting x in Y. By
abuse of notation, we also write $P(X; Y) = \sum_{x \in X} P(x; Y)$. Suppose that
P > 0. The Choice Axiom, proposed by Luce (1959), states that, for all
$Z \subseteq Y \subseteq W \subseteq \mathcal{X}$, we have

$$P(Z; Y)\, P(Y; W) = P(Z; W). \qquad (9)$$

Defining the function $v : x \mapsto P(x; \mathcal{X})$, (9) yields immediately
$P(x; Y) = v(x) \left[\sum_{y \in Y} v(y)\right]^{-1}$, for all $Y \subseteq \mathcal{X}$ and $x \in Y$.
This model plays an important role in the literature. In the binary case, it has
an interpretation in terms of random variables as in the Thurstone model, but
these random variables,
rather than being Gaussian, have a negative exponential distribution.
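
The ratio form of the choice probabilities and the Choice Axiom (9) can be checked against each other on a small example. The sketch below uses three hypothetical objects with assumed scale values v; the names and values are illustrative only. It verifies the product rule for every nested chain of nonempty subsets.

```python
from itertools import combinations

def luce_choice(x, offered, v):
    """P(x; Y) = v(x) / sum of v(y) over the offered set Y."""
    return v[x] / sum(v[y] for y in offered)

def subset_prob(subset, offered, v):
    """P(Z; Y): probability that the choice from Y falls in the subset Z of Y."""
    return sum(luce_choice(z, offered, v) for z in subset)

def nonempty_subsets(items):
    """All nonempty subsets of a tuple of items, as tuples."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def choice_axiom_holds(v, tol=1e-12):
    """Check P(Z; Y) P(Y; W) = P(Z; W) for all nested nonempty Z, Y, W."""
    for W in nonempty_subsets(tuple(v)):
        for Y in nonempty_subsets(W):
            for Z in nonempty_subsets(Y):
                lhs = subset_prob(Z, Y, v) * subset_prob(Y, W, v)
                if abs(lhs - subset_prob(Z, W, v)) > tol:
                    return False
    return True

v = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical scale values
print(luce_choice("a", ("a", "b"), v), choice_axiom_holds(v))
```

Only the ratios of the scale values matter: multiplying every v(x) by the same positive constant leaves all choice probabilities unchanged.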
In the general case of such a random variable model for the multiple
choice paradigm, we simply suppose that to each x in $\mathcal{X}$ is attached a random
variable $U_x$ such that, for all subsets X of $\mathcal{X}$ and all x in X, we have

$$P(x; X) = \mathbb{P}(U_x = \max\{U_y \mid y \in X\}). \qquad (10)$$

The general characterization problem for this model has been solved by
Falmagne (1978), who states necessary and sufficient conditions for the existence
of the random variables satisfying (10). His paper also contains a uniqueness
result. As in the binary case, specific assumptions can be made on the
distributions of these random variables.
Other models, based on different principles, have also been proposed for
the multiple choice paradigm. For example, in the elimination by aspects
model, due to Tversky (1972), a subject's choice of some object x in a set X
is regarded as resulting from an implicit Markovian-type process gradually
narrowing down the acceptable possibilities. For reviews of probabilistic
choice models see Luce and Suppes (1965), or Suppes et al. (1989). A sample
of some of the results can be found in Marley (1997).
6. Response Time Models
The time or latency of a response has been used as a behavioral index
of the sensory or mental processes involved in the task since the inception
of experimental psychology in the 19th century. Many mathematical models
are based on Donders' idea that the observed response time is a sum of a
number of unobservable components including at least a sensory, a decision
and a motor response part (Donders, 1868/1969). These models make various
assumptions on the distributions of the component times, which are often
taken to be independent. McGill (1963), for instance, assumes that the
component times are all distributed exponentially and independently, with
possibly different parameters, so that their sum is distributed as a general
gamma random variable.
Another category of models is grounded on the assumption that the observed response results from an unobservable accumulation of evidence with
absorbing boundaries. These models are known as sequential sampling models (Busemeyer and Rapoport, 1988; Laming, 1968; Link and Heath, 1975;
Ratcliff, 1978; Vickers, 1979), and are based on sequential sampling methods from statistics, including the sequential probability ratio test (Wald and
Wolfowitz, 1948). As psychological models, sequential sampling processes
assume the latent accumulation of information or evidence from a stimulus, based on a series of samples. When the accumulated evidence reaches a
boundary or threshold, the decision corresponding to that boundary is made,
and the number of samples taken provides a measure of the time taken to
make the decision. This means the models generate a joint distribution over
both decisions and response times. There are also some theoretical accounts
of confidence that can be related to the dynamics of sequential sampling processes, and so allow this third behavioral variable to be modeled (Vickers,
1979; Pleskac and Busemeyer, 2010). In some reaction time situations, the
successive stimuli follow each other in a fast paced sequence. In such cases,
prominent sequential effects appear. Markovian models explaining such sequential effects have been developed (Falmagne, 1965; Falmagne et al., 1975).
Within the general sequential sampling framework, there are many models and model classes making different assumptions about the way evidence
is accumulated, and the form of the boundaries. Random-walk or drift diffusion models (Ratcliff, 1978; Ratcliff and McKoon, 2008) assume a single tally
is maintained, race or accumulator models (Smith and Vickers, 1988; Vickers, 1970) maintain separate tallies for each alternative decision, and ballistic
models (Brown and Heathcote, 2008) also maintain separate tallies but without stochastic variability. In most models, the sampling process that generates evidence from a stimulus is assumed to be homogeneous, and boundaries
are also assumed to be constant throughout the decision-making process.
There are, however, some exceptions (Smith, 2000), especially considering
more general utilities incorporating deadlines or time pressure (Frazier and
Yu, 2008), or considering mechanisms for the learning or self-regulation of
the boundaries over trials (Busemeyer and Myung, 1992; Simen et al., 2006;
Vickers, 1979).
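
The accumulation-to-boundary idea behind these models can be sketched with the simplest case, a discrete random walk with a single tally and two constant, symmetric boundaries. The code below is an illustration under assumed values (probability 0.6 that a sample favors the upper alternative, boundaries at plus and minus 5) with hypothetical names; it is not an implementation of any of the specific published models cited above.

```python
import random

def random_walk_trial(p_up, boundary, rng, max_steps=10_000):
    """Accumulate +1/-1 evidence samples until +boundary or -boundary is reached.

    p_up is the probability that a single sample favors the upper alternative.
    Returns (decision, number_of_samples); decision is None if max_steps is reached.
    """
    evidence = 0
    for step in range(1, max_steps + 1):
        evidence += 1 if rng.random() < p_up else -1
        if evidence >= boundary:
            return "upper", step
        if evidence <= -boundary:
            return "lower", step
    return None, max_steps

rng = random.Random(3)
trials = [random_walk_trial(p_up=0.6, boundary=5, rng=rng) for _ in range(5000)]
p_upper = sum(decision == "upper" for decision, _ in trials) / len(trials)
mean_steps = sum(steps for _, steps in trials) / len(trials)
print(p_upper, mean_steps)
```

Each simulated trial yields both a decision and a number of samples, which is how such models produce a joint distribution over choices and response times. Raising the boundary trades speed for accuracy.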
Sequential sampling models were at first mostly applied to simple visual
and perceptual decision-making phenomena, but have also found application in two other areas. One of these is in modeling higher-order cognitive
processes such as categorization (Nosofsky and Palmeri, 1997) and judgment and preference phenomena, especially through Decision Field Theory
(Busemeyer and Townsend, 1993). The other is in combining behavioral and
neuroscientific data relating to the time course of simple decisions (Gold and
Shadlen, 2007; Smith and Ratcliff, 2004). There is recent work on hierarchical extensions of the model, to account for individual differences in subjects,
differences between stimuli, and other sources of variation beyond the level of
a single decision trial (Rouder et al., 2003; Vandekerckhove et al., in press).
Early work in the modeling of response times relied on analytic results,
making the comparison of models to data, and the estimation of parameters
from data tractable. Computational approaches have played a progressively
more important role, but there continues to be important foundational mathematical development (Navarro and Fuss, 2009; Smith, 2000). Luce (1986) is
a basic reference for the models and their applications, and Smith (2000) and
Bogacz et al. (2006) provide reviews of many of the relevant mathematical
and statistical results.


7. Other topics
From these four traditional areas, research in mathematical psychology
has grown to include a wide variety of subjects. Current research includes
many aspects of perception, cognition, memory, decision-making, and more
generally information processing (cf. Dosher and Sperling, 1998). In some
cases, the models can be seen as more or less direct descendants of those
proposed by earlier researchers in the field. The multinomial processing tree
models (Batchelder and Riefer, 1999), for instance, are in the spirit of the
Markovian models of the learning theorists.
However, as with response time modeling, the advent of powerful computers also gave rise to different types of models for which the predictions could
be obtained by simulation, rather than by mathematical derivation. Representative of this trend are the parallel distributed processing or connectionist
models (Rumelhart and McClelland, 1986), scaling and clustering models of
stimulus representation (Shepard, 1980; Navarro and Griffiths, 2008) and a
wide range of cognitive process models, including especially models of category learning (Ashby and Maddox, 2005; Kruschke, 2008; Nosofsky, 1992)
and memory (Clark and Gronlund, 1996; Norman et al., 2008). Despite
the computational emphasis, there continue to be important mathematical
results for some of these models (Myung et al., 2007; Navarro, 2005).
Most recently, Bayesian methods have had a rapid and widespread influence over the areas traditionally studied by mathematical psychology. There
are at least three types of Bayesian influence on the field (Lee, 2011). One
type involves applying Bayesian statistics to analyze data and to compare
and evaluate models (Pitt et al., 2002; Rouder et al., 2009; Kruschke, 2011).
Another influence is as a framework for extending cognitive process model
accounts of complicated behavioral data, allowing the incorporation of individual differences, stimulus variability, covariate information, and other
hierarchical and latent mixture structure (Lee, 2008; Rouder et al., 2007).
The third influence Bayesian methods have had is as a theoretical metaphor
for the mind, to contrast with alternatives like the information processing
or connectionist metaphors. The Bayesian view treats the mind as applying Bayesian inference to sparse and noisy data to learn about richly structured mental hypothesis spaces (Chater et al., 2006). This approach has
risen quickly in prominence, and many Bayesian models have been developed across a wide range of phenomena, including especially generalization,
concept learning, and inductive reasoning (Anderson, 1991; Kemp and Tenenbaum, 2008; Griffiths and Tenenbaum, 2009; Tenenbaum and Griffiths, 2001).
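As a rough illustration of the Bayesian view described above, the following sketch applies Bayes' rule to a small discrete hypothesis space. Everything in it is an assumption made for illustration and is not taken from any of the cited models: the three candidate concepts, the uniform prior, the observed examples, and the likelihood, which assumes each example is sampled uniformly at random from the true concept.

```python
from fractions import Fraction


def posterior(hypotheses, prior, data):
    """Return P(h | data) for each named hypothesis via Bayes' rule.

    hypotheses: dict mapping a name to the set of items in that concept
    prior:      dict mapping the same names to prior probabilities
    data:       list of observed examples
    """
    unnormalized = {}
    for name, members in hypotheses.items():
        if all(x in members for x in data):
            # Illustrative assumption: examples are drawn uniformly from
            # the concept, so each one has probability 1 / |concept|.
            likelihood = Fraction(1, len(members)) ** len(data)
        else:
            # A concept that excludes an observed example is ruled out.
            likelihood = Fraction(0)
        unnormalized[name] = prior[name] * likelihood

    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("data are impossible under every hypothesis")
    return {name: value / evidence for name, value in unnormalized.items()}


# Hypothetical concepts over the integers 1..100.
hypotheses = {
    "even": set(range(2, 101, 2)),
    "multiples_of_10": set(range(10, 101, 10)),
    "all": set(range(1, 101)),
}
prior = {name: Fraction(1, 3) for name in hypotheses}

post = posterior(hypotheses, prior, [10, 30, 60])
for name, p in post.items():
    print(name, float(p))
```

With only three examples, nearly all posterior mass (1000/1009) goes to the smallest concept consistent with the data. This is one simple way in which sparse data can still support strong inductive generalization.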
Finally, we mention psychometrics. Research in psychometrics concerns the elaboration of statistical models and techniques for the analysis of
test results. As suggested by the term, the main objective is the assignment
of one or more numbers to a subject for the purpose of measuring some mental or physical traits. In principle, such a topic could be regarded as part of
our subject. For historical reasons, however, this line of work has remained
separate from mathematical psychology. The research on knowledge spaces
is an alternative to psychometrics. Instead of measuring a person's numerical
aptitude in a subject, it uses stochastic algorithms to uncover the knowledge
state of a person, which is the set of all concepts mastered by the person in
that subject (Falmagne and Doignon, 2011).
8. The Journals, the Researchers, the Society
The research results in mathematical psychology are mostly published in
specialized journals such as the Journal of Mathematical Psychology, Mathematical Social Sciences, Psychometrika, Econometrica and Mathématiques,
Informatique et Sciences Humaines. Some of the work also appears in mainstream
publications such as Psychological Review or Psychonomic Bulletin & Review. Early on, the research was typically produced by psychologists, and
the work often had a strong experimental component. Over the last couple of
decades, other researchers have become interested in the field, coming especially
from economics, applied mathematics, and machine learning.
The Society for Mathematical Psychology (SMP) was founded in 1979.
The society manages the Journal of Mathematical Psychology, and organizes
a yearly meeting gathering several hundred participants coming from all over
the world. The European Mathematical Psychology Group (EMPG) is an
informal association of about one hundred scientists who meet every summer
at a European university. The first meeting was in 1971.
9. Cross References
43013. Bayesian theory, history of applications; 43017. Computational
approaches to model evaluation; 43021. Connectionist approaches; Decision and choice: behavioral decision research; 43029. Decision and Choice:
Economic Psychology; 43031. Decision and Choice: Luce's Choice Axiom;
43033. Decision and choice: random utility models of choice and response
time; 43034. Decision and Choice: Utility and Subjective Probability, Contemporary Theories; 43037. Diffusion and Random Walk Processes; 43039.
Discrete state models of information processing; 43040. Dynamic decision
making; 43043. Functional Equations in Behavioral and Social Sciences;
43047. Information theory; 43049. Knowledge spaces; 43050. Learning:
mathematical learning theory; 43051. Learning: Mathematical Learning
Theory, History; 43055. Markov decision processes; 43059. Mathematical
Psychology: History; 43062. Measurement theory: history and philosophy;
43064. Measurement: Representational Theory of; 43065. Memory Models,
Quantitative; 43066. Model Testing and Selection, Theory of; 43081. Psychometrics: Multidimensional Scaling in Psychology; 43084. Psychophysical
laws and theory, history; 43089. Sequential decision making; 43090. Signal Detection Theory; 43091. Signal Detection Theory, History of; 43094.
Stochastic dynamic models (choice, response, time); 43109. Decision and
Choice: Risk, Theories; 43113. Mathematical Learning Theory
10. References
Bush, R.R., & Mosteller, F., 1951. A mathematical model for simple
learning. Psychological Review, 58, 313-323.
Estes, W.K., 1950. Toward a statistical theory of learning. Psychological
Review, 57, 94-107.
Falmagne, J.C., 1985/2002. Elements of Psychophysical Theory. Oxford
University Press, New York. Reprinted in 2002.
Green, D.M., Swets, J.A., 1974. Signal Detection Theory and Psychophysics.
Krieger, New York.
Luce, R.D., 1959. Individual Choice Behavior. Wiley, New York.
Luce, R.D., 1986. Response Times: Their Role in Inferring Elementary
Mental Organization. Oxford University Press, New York.
Luce, R.D., Bush, R.R., & Galanter, E. (Eds.), 1963. Handbook of Mathematical Psychology, Volume 1. Wiley, New York.

Luce, R.D., Bush, R.R., & Galanter, E. (Eds.), 1963. Handbook of Mathematical Psychology, Volume 2. Wiley, New York.
Luce, R.D., Bush, R.R., & Galanter, E. (Eds.), 1965. Handbook of Mathematical Psychology, Volume 3: Representation, axiomatization, and invariance. Wiley, New York.
Norman, F., 1972. Markov Processes and Learning Models. Academic
Press, New York.
Shepard, R.N., 1980. Multidimensional scaling, tree-fitting, and clustering. Science 214, 390-398.
Stevens, S.S., 1957. On the psychophysical law. Psychological Review 64,
153-181.
Thurstone, L.L., 1927. A law of comparative judgement. Psychological
Review, 34, 273-286.
Thurstone, L.L., 1947. Multiple-Factor Analysis. University of Chicago
Press.
11. Relevant Websites
The Society for Mathematical Psychology. http://www.mathpsych.org.
References
Aczél, J., Falmagne, J., Luce, R., 2000. Functional equations in the behavioral sciences. Mathematica Japonica 52, 469-512.
Anderson, J., 1991. The adaptive nature of human categorization. Psychological Review 98, 409-429.
Anderson, N., 1981. Foundations of Information Integration Theory. Academic Press, New York.
Ashby, F., Maddox, W., 2005. Human category learning. Annual Review of
Psychology 56, 149-178.
Atkinson, R., Bower, G., Crothers, E., 1965. An Introduction to Mathematical Learning Theory. Wiley, New York.
Atkinson, R., Estes, W., 1963. Stimulus sampling theory, in: Luce, R.D.,
Bush, R.R., Galanter, E. (Eds.), Handbook of Mathematical Psychology,
Volume 2. Wiley, New York, pp. 121-268.
Batchelder, W., Riefer, D., 1999. Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review 6, 57-86.
Bharucha-Reid, A., 1960. Elements of the Theory of Markov Processes and
their Applications. McGraw-Hill, New York.
Block, H., Marschak, J., 1960. Random ordering and stochastic theories of
responses, in: Olkin, I., Ghurye, S., Hoeffding, W., Madow, W., Mann,
H. (Eds.), Contributions to Probability and Statistics. Stanford University
Press, Stanford, CA, pp. 97-132.
Bock, R., Jones, L., 1968. The Measurement and Prediction of Judgement
and Choice. Holden-Day, San Francisco, CA.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., Cohen, J., 2006. The physics
of optimal decision making: A formal analysis of models of performance in
two-alternative forced choice tasks. Psychological Review 113, 700-765.
Boring, E., 1950. A History of Experimental Psychology. Prentice Hall,
Englewood Cliffs, NJ.
Brown, S., Heathcote, A., 2008. The simplest complete model of choice
response time: Linear ballistic accumulation. Cognitive Psychology 57,
153-178.
Busemeyer, J., Myung, I., 1992. An adaptive approach to human decision
making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General 121, 177-194.
Busemeyer, J., Rapoport, A., 1988. Psychological models of deferred decision
making. Journal of Mathematical Psychology 32, 91-134.
Busemeyer, J., Townsend, J., 1993. Decision field theory: A dynamic
cognitive approach to decision making. Psychological Review 100,
432-459.
Bush, R., Mosteller, F., 1951. A mathematical model for simple learning.
Psychological Review 58, 313-323.
Chater, N., Tenenbaum, J., Yuille, A., 2006. Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Sciences 10, 287-291.
Clark, S., Gronlund, S., 1996. Global matching models of recognition memory: How the models match the data. Psychonomic Bulletin & Review 3,
37-60.
Donders, F., 1868/1969. On the speed of mental processes. Acta Psychologica
30, 412-431. Translated by W. G. Koster.
Dosher, B., Sperling, G., 1998. A century of information processing theory: Vision, attention and memory, in: Hochberg, J., Cutting, J.E. (Eds.),
Handbook of Perception and Cognition at Century's End: History, Philosophy, Theory. Academic Press, San Diego, CA, pp. 199-252.
Dzhafarov, E., 2011. Mathematical foundations of universal Fechnerian scaling, in: Berglund, B., Rossi, G.B., Townsend, J.T., Pendrill, L. (Eds.),
Measurements With Persons. Psychology Press, New York, pp. 185-210.
Estes, W., 1950. Toward a statistical theory of learning. Psychological Review
57, 94-107.
Estes, W., Suppes, P., 1959. Foundations of linear models, in: Bush, R.,
Estes, W. (Eds.), Studies in Mathematical Learning Theory. Stanford University Press, Stanford, CA, pp. 137-179.
Falmagne, J.C., 1965. Stochastic models for choice reaction time with application to experimental results. Journal of Mathematical Psychology 2,
77-124.
Falmagne, J.C., 1978. A representation theorem for finite random scale systems. Journal of Mathematical Psychology 18, 52-72.
Falmagne, J.C., 1985/2002. Elements of Psychophysical Theory. Oxford
University Press, New York. Reprinted in 2002.
Falmagne, J.C., Cohen, S., Dwivedi, A., 1975. Two-choice reactions as an
ordered memory scanning process, in: Rabbitt, P., Dornic, S. (Eds.), Attention and Performance V. Academic Press, pp. 296-344.
Falmagne, J.C., Doignon, J.P., 2011. Learning spaces, in: Interdisciplinary
Mathematics Series. Springer, Heidelberg.
Frazier, P., Yu, A., 2008. Sequential hypothesis testing under stochastic
deadlines, in: Platt, J., Koller, D., Singer, Y., Roweis, S. (Eds.), Advances
in Neural Information Processing Systems 20. MIT Press, Cambridge, MA,
pp. 465-472.
Gold, J., Shadlen, M., 2007. The neural basis of decision making. Annual
Review of Neuroscience 30, 535-574.
Green, D., Swets, J., 1974. Signal Detection Theory and Psychophysics.
Krieger, New York.
Griffiths, T., Tenenbaum, J., 2009. Theory-based causal induction. Psychological Review 116, 661-716.
Hilgard, E., 1943. Theories of Learning. Appleton Century Crofts, New York.
second edition.
Hull, C., 1943. Principles of Behavior. Appleton Century Crofts, New York.
Indow, T., 1995. Psychophysical scaling: Scientific and practical applications,
in: Luce, R.D., D'Zmura, M., Hoffman, D.D., Iverson, G.I., Romney, A.K. (Eds.),
Geometric Representations of Perceptual Phenomena. Papers in Honor of
Tarow Indow for his 70th Birthday. Erlbaum, Mahwah, NJ, pp. 1-28.
James, W., 1890/1950. The Principles of Psychology. Volume I. Holt.
Reprinted by Dover Publications.
Kemp, C., Tenenbaum, J., 2008. The discovery of structural form. Proceedings of the National Academy of Sciences 105, 10687-10692.
Krantz, D., 1969. Threshold theories of signal detection. Psychological Review 76, 308-324.
Krantz, D., 1972. A theory of magnitude estimation and cross modality
matching. Journal of Mathematical Psychology 9, 168-199.
Krantz, D., Luce, R., Suppes, P., Tversky, A., 1971. Foundations of Measurement, Volume 1: Additive and Polynomial Representations. Academic
Press, New York.
Kruschke, J., 2008. Models of categorization, in: Sun, R. (Ed.), The Cambridge Handbook of Computational Psychology. Cambridge University
Press, New York, pp. 267-301.
Kruschke, J., 2011. Doing Bayesian Data Analysis: A Tutorial with R and
BUGS. Academic Press / Elsevier.
Laming, D., 1968. Information Theory of Choice Reaction Time. Academic
Press, London.
Lee, M., 2008. Three case studies in the Bayesian analysis of cognitive models.
Psychonomic Bulletin & Review 15, 1-15.
Lee, M., 2011. How cognitive modeling can benefit from hierarchical Bayesian
models. Journal of Mathematical Psychology 55, 1-7.
Levitt, H., 1970. Transformed up-down methods in psychoacoustics. The
Journal of the Acoustical Society of America 49, 467-476.
Link, S., Heath, R., 1975. A sequential theory of psychological discrimination.
Psychometrika 40, 77-105.
Luce, R., 1959. Individual Choice Behavior. Wiley, New York.
Luce, R., 1986. Response Times: Their Role in Inferring Elementary Mental
Organization. Oxford University Press, New York.
Luce, R., D'Zmura, M., Hoffman, D., Iverson, G., Romney, A.K. (Eds.), 1995.
Geometric Representations of Perceptual Phenomena. Papers in Honor of
Tarow Indow for his 70th Birthday. Erlbaum, Mahwah, NJ.
Luce, R., Galanter, E., 1963. Discrimination, in: Luce, R., Bush, R.,
Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 1.
Wiley, pp. 191-244.
Luce, R., Krantz, D., Suppes, P., Tversky, A., 1990. Foundations of Measurement, Volume 3: Representation, axiomatization, and invariance. Academic Press, San Diego, CA.
Luce, R., Suppes, P., 1965. Preference, utility and subjective probability, in:
Luce, R.D., Bush, R.R., Galanter, E. (Eds.), Handbook of Mathematical
Psychology, Volume 3. Wiley, pp. 252-410.
MacMillan, N., Creelman, C., 2004. Detection Theory: A User's Guide (2nd
ed.). Erlbaum, Hillsdale, NJ.
Marley, A. (Ed.), 1997. Choice, Decision and Measurement. Papers in Honor
of R. Duncan Luce. Erlbaum, Mahwah, NJ.
McGill, W., 1963. Stochastic latency mechanisms, in: Luce, R.D., Bush,
R.R., Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 1. Wiley, New York, pp. 309-360.
Miller, G., 1964. Mathematics and Psychology. McGraw-Hill, New York.
Myung, J., Montenegro, M., Pitt, M., 2007. Analytic expressions for BCDMEM models of recognition memory. Journal of Mathematical Psychology
51, 198-204.
Navarro, D., 2005. Analyzing the RULEX model of category learning. Journal
of Mathematical Psychology 49, 259-275.
Navarro, D., Fuss, I., 2009. Fast and accurate calculations for first-passage
times in Wiener diffusion models. Journal of Mathematical Psychology 53,
222-230.
Navarro, D., Griffiths, T., 2008. Latent features in similarity judgment: A
nonparametric Bayesian approach. Neural Computation 20, 2597-2628.
Norman, F., 1972. Markov Processes and Learning Models. Academic Press,
New York.
Norman, K., Detre, G., Polyn, S., 2008. Computational models of episodic
memory, in: Sun, R. (Ed.), The Cambridge Handbook of Computational
Psychology. Cambridge University Press, New York, pp. 189-224.
Nosofsky, R., 1992. Similarity scaling and cognitive process models. Annual
Review of Psychology 43, 25-53.
Nosofsky, R., Palmeri, T., 1997. An exemplar-based random walk model of
speeded classification. Psychological Review 104, 266-300.
Parzen, E., 1994. Stochastic Processes. Holden-Day, San Francisco, CA.

Pitt, M., Myung, I., Zhang, S., 2002. Toward a method of selecting among
computational models of cognition. Psychological Review 109, 472-491.
Pleskac, T., Busemeyer, J., 2010. Two-stage dynamic signal detection: A
theory of confidence, choice, and response time. Psychological Review
117, 864-901.
Ratcliff, R., 1978. A theory of memory retrieval. Psychological Review 85,
59-108.
Ratcliff, R., McKoon, G., 2008. The diffusion decision model: Theory and
data for two-choice decision tasks. Neural Computation 20, 873-922.
Robbins, H., Monro, S., 1951. A stochastic approximation method. The
Annals of Mathematical Statistics 22, 400-407.
Rouder, J., Lu, J., Speckman, P., Sun, D., Morey, R., Naveh-Benjamin, M.,
2007. Signal detection models with random participant and item effects.
Psychometrika 72, 621-642.
Rouder, J., Speckman, P., Sun, D., Morey, R., Iverson, G., 2009. Bayesian t tests
for accepting and rejecting the null hypothesis. Psychonomic Bulletin
& Review 16, 225-237.
Rouder, J., Sun, D., Speckman, P., Lu, J., Zhou, D., 2003. A hierarchical
Bayesian statistical framework for response time distributions. Psychometrika 68, 589-606.
Rumelhart, D., McClelland, J. (Eds.), 1986. Parallel distributed processing.
Explorations in the microstructure of cognition, Volume 1. MIT Press,
Cambridge, MA.
Shepard, R., 1980. Multidimensional scaling, tree-fitting, and clustering.
Science 214, 390-398.
Simen, P., Cohen, J., Holmes, P., 2006. Rapid decision threshold modulation
by reward rate in a neural network. Neural Networks 19, 1013-1026.
Smith, P., 2000. Stochastic dynamic models of response time and accuracy:
A foundational primer. Journal of Mathematical Psychology 44, 408-463.

Smith, P., Ratcliff, R., 2004. The psychology and neurobiology of simple
decisions. Trends in Neurosciences 27, 161-168.
Smith, P., Vickers, D., 1988. The accumulator model of two-choice discrimination. Journal of Mathematical Psychology 32, 135-168.
Sternberg, S., 1963. Stochastic learning theory, in: Luce, R.D., Bush, R.,
Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 2.
Wiley, New York.
Stevens, S., 1957. On the psychophysical law. Psychological Review 64,
153-181.
Suppes, P., Ginsburg, R., 1963. A fundamental property of all-or-none models, binomial distribution of responses prior to conditioning, with application to concept formation in children. Psychological Review 70, 139-171.
Suppes, P., Krantz, D., Luce, R., Tversky, A., 1989. Foundations of Measurement, Volume 2. Academic Press, San Diego, CA.
Tenenbaum, J., Griffiths, T., 2001. Generalization, similarity, and Bayesian
inference. Behavioral and Brain Sciences 24, 629-640.
Thurstone, L., 1919. The learning curve equation. Psychological Monographs
26, 1-51.
Thurstone, L., 1927a. A law of comparative judgement. Psychological Review
34, 273-286.
Thurstone, L., 1927b. Psychological analysis. American Journal of Psychology 38, 368-389.
Thurstone, L., 1947. Multiple-Factor Analysis. University of Chicago Press.
Tversky, A., 1972. Elimination by aspects: A theory of choice. Psychological
Review 79, 281-299.
Vandekerckhove, J., Tuerlinckx, F., Lee, M., in press. Hierarchical diffusion
models for two-choice response time. Psychological Methods.
Vickers, D., 1970. Evidence for an accumulator model of psychophysical
discrimination. Ergonomics 13, 37-58.
Vickers, D., 1979. Decision Processes in Visual Perception. Academic Press,
New York, NY.
Wald, A., Wolfowitz, J., 1948. Optimal character of the sequential probability
ratio test. Annals of Mathematical Statistics 19, 326-339.
Wasan, M., 1969. Stochastic approximation. Cambridge Tracts in Mathematics and Mathematical Physics 58.
Yellott, Jr, J., 1969. Probability learning with non contingent success. Journal of Mathematical Psychology 6, 541-575.
