NOTES
Generalized Infinite Products for Powers of $e^{1/k}$ 161
Scott Ginebaugh
REVIEWS
The Philosophies of Mathematics 188
by Alan Baker
MATHBIT
Another Proof That There Are Infinitely Many Primes 169
EDITOR
Susan Jane Colley
Oberlin College
ASSOCIATE EDITORS
David Aldous, University of California, Berkeley
Elizabeth S. Allman, University of Alaska Fairbanks
David H. Bailey, University of California, Davis
Scott T. Chapman, Sam Houston State University
Allan Donsig, University of Nebraska-Lincoln
Michael Dorff, Brigham Young University
John Ewing, Math for America
Stephan Ramon Garcia, Pomona College
Luis David Garcia Puente, Sam Houston State University
Sidney Graham, Central Michigan University
J. Roberto Hasfura-Buenaga, Trinity University
Michael Henle, Oberlin College
Tara Holm, Cornell University
Lea Jenkins, Clemson University
Gary Kennedy, Ohio State University, Mansfield
Chawne Kimber, Lafayette College
Daniel Krashen, University of Georgia
Jeffrey Lawson, Western Carolina University
Susan Loepp, Williams College
Jeffrey Nunemacher, Ohio Wesleyan University
Bruce P. Palka, National Science Foundation
Paul Pollack, University of Georgia
Adriana Salerno, Bates College
Edward Scheinerman, Johns Hopkins University
Anne V. Shepler, University of North Texas
Frank Sottile, Texas A&M University
Susan G. Staples, Texas Christian University
Sergei Tabachnikov, Pennsylvania State University
Daniel Velleman, Amherst College
Cynthia Vinzant, North Carolina State University
Steven H. Weintraub, Lehigh University
Kevin Woods, Oberlin College
MANAGING EDITOR
Bonnie K. Ponce
ELECTRONIC PRODUCTION AND PUBLISHING MANAGER
Beverly Joy Ruedi
NOTICE TO AUTHORS

The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers.

The MONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged.

Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue.

Submission of articles, notes, and filler pieces is required via the MONTHLY’s Editorial Manager System. Initial submissions in pdf or LaTeX form can be sent to Editor Susan Jane Colley at www.editorialmanager.com/monthly. The Editorial Manager System will cue the author for all required information concerning the paper. The MONTHLY has instituted a double-blind refereeing policy. Manuscripts that contain the authors’ names will be returned. Questions concerning submission of papers can be addressed to the Editor-Elect at monthly@maa.org. Authors who use LaTeX can find our article/note template at www.maa.org/monthly.html. This template requires the style file maa-monthly.sty, which can also be downloaded from the same webpage. A formatting document for MONTHLY references can be found there too.

Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor at monthly@maa.org.

The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-331-1622 for more information.

See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles: www.maa.org/monthly.html.

Proposed problems and solutions may be submitted to Problem Editor Daniel Ullman online via https://americanmathematicalmonthly.submittable.com/submit. Questions but not submissions may be addressed to monthlyproblems@maa.org.

Advertising correspondence should be sent to:
MAA Advertising
1529 Eighteenth St. NW
Washington DC 20036
Phone: (202) 319-8461
E-mail: advertising@maa.org

Further advertising information can be found online at www.maa.org.

Change of address, missing issue inquiries, and other subscription correspondence can be sent to:
maaservice@maa.org
or
The MAA Customer Service Center
P.O. Box 91112
Washington, DC 20090-1112
(800) 331-1622
(301) 617-7800

Recent copies of the MONTHLY are available for purchase through the MAA Service Center at the address above.

Microfilm Editions are available at: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106.

The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, NW, Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2017, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright 2017 Mathematical Association of America. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, NW, Washington, DC 20036-1385.
The Image of a Square
Annalisa Crannell, Marc Frantz, and Fumiko Futamura
Abstract. Every quadrangle is the perspective image of a square. We illustrate this statement
by using perspective art techniques and by analogy to the visualization of conic sections.
We also give examples of how understanding perspective images of squares can be applied
fruitfully in the areas of photogrammetry (determining true relative sizes of real-world objects
from a photograph) and linear algebra (more specifically, in the decomposition of projective
transformations).
1. INTRODUCTION. What looks like a square? Which geometric shapes are the
images of squares? Brook Taylor (of Taylor series fame) literally illustrated the centrality of squares to perspective artists in the first drawing of his New Principles of Linear Perspective, published in 1719 [17]. Taylor was both an accomplished mathematician and a skillful landscape painter. New Principles brought together Taylor’s interests in mathematics and drawing; in its preface he noted that the subject of perspective “. . . has still been left in so low a degree of Perfection, as it is found to be, in the Books that have been hitherto wrote upon it.” His book introduced, among other things,
the usefulness of a “vanishing line” (a generalization of the more familiar “vanishing
point”), and stirred a revival of interest in the mathematics of perspective in Europe [1].
Figure 1 shows Taylor’s setup, which illustrates that the trapezoid abcd is the perspective image of the square ABCD. A question Taylor could have asked himself (but apparently never did) is: How much could we deform abcd and still be able to make the same claim? Could a, b, c, d be the vertices of a diamond? What about a kite?
Could they be the vertices of a nonconvex shape such as a Penrose dart? The answer
is surprising: Every quadrangle is the perspective image of a square.
The goal of this paper is to provide some visually compelling insight into the correspondence between squares and their many images, an insight that not only incorporates familiar images of direct perspective such as Taylor’s, but also allows for somewhat more complicated interpretations of perspective (such as the disconnected pools of light in Figure 2). Along the way, we draw analogies between our main theorem and images of conic sections.
In addition to giving the theorem a robust visual interpretation, we will also give
examples of how understanding perspective images of squares can be applied fruit-
fully in the areas of photogrammetry (determining true relative sizes of real-world
objects from a photograph) and linear algebra (more specifically, in the decomposition
of projective transformations). We’ll use these fields to dig into some of the “why we
care” aspect of this subject.
But first, we will need some definitions and background.
Figure 1. The first figure of Brook Taylor’s New Principles of Linear Perspective [17]. Here ABCD depicts a square, and abcd depicts the image of a square (used with the permission of the Max Planck Institute for the History of Science).
the lampshade, the bulb projects two trapezoidal pools of light on the wall. At first
glance these two pools of light don’t seem too remarkable, possibly because we think
of the bulb projecting the upper square opening ABCD of the lampshade onto the wall
and ceiling, and the lower opening onto the wall and floor. However, there is a more
interesting way of looking at this image.
It is possible to regard the square ABCD as being completely projected onto the
wall, which we think of as an infinite plane π. The image of any point, say A, is the
unique point on the wall collinear with A and the center of projection O. Following
the dashed “connector” OA, we see that the image of A is Aπ , and similarly the image
of B is Bπ . The image of C is a little different, because O lies between C and its image
Cπ . The same applies to D and Dπ . Notice how crucial the bottom of the lampshade is
in physically realizing the complete projection. The result is that the interior of ABCD
is projected to two separate, unbounded pools of light.
The unboundedness prompts a second look, revealing something we have glossed over. The midpoints of AD and BC on the top of the lampshade are marked in white. A line from O to either of these points is parallel to the plane π; hence the images of these points are in some sense infinitely distant. If we were to be really precise about it (which we won’t be), we would need the complete formalism of spaces like $\mathbb{E}^2$ and $\mathbb{E}^3$, called extended Euclidean space (see [6, pp. 60–62] or [14, p. 84] for a formal definition). For the purposes of this paper, it suffices to think of these spaces as the union of all points in $\mathbb{R}^2$ or $\mathbb{R}^3$ (called ordinary points) together with additional points (called points at infinity) such that the lines in any family of parallel lines meet at exactly one point at infinity and the planes in any family of parallel planes meet in exactly one line at infinity. In particular, the images of the midpoints of AD and BC are points at infinity belonging to the (extended) plane π. (Geometers will be familiar with a similar space, real projective space, or $P\mathbb{R}^3$; working in $\mathbb{E}^3$ instead of $P\mathbb{R}^3$ allows us to use the standard metric properties of $\mathbb{R}^n$ such as distance between ordinary points and angles between ordinary lines.)

100
© THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124

Figure 2. A wall-mounted lamp with a square lampshade (a rectangular parallelepiped with square ends).
What we would really like to say is that the quadrangle Aπ Bπ Cπ Dπ is the image
of the square ABCD. However, the term “quadrangle” often refers to figures having
line segments as edges (see, for example, [18]). If we trace in order the points Aπ , Bπ ,
Cπ, Dπ in Figure 2, we trace a “bow tie” like that in Figure 3(a). But we should
not regard that figure as the image of the square ABCD; for example, our previous
remarks show that the interior of segment AD does not map to the interior of Aπ Dπ ,
and the interior of BC does not map to the interior of Bπ Cπ . To address this issue, we
will modify our usual geometric definitions somewhat. Let us agree that, given four
coplanar points A, B, C, D, no three of which are collinear, “the quadrangle ABCD”
refers to the points A, B, C, and D, called vertices, and the infinite lines AB, BC,
CD, and DA, called edges. The same goes for the quadrangle Aπ Bπ Cπ Dπ . It will also
be useful to refer to the diagonals of ABCD, namely the infinite lines AC and BD.
Thus, the notation determines which of the six lines associated with the quadrangle
ABCD are to be considered edges and which diagonals. In Figure 3(b) the edges of the
quadrangles ABCD and Aπ Bπ Cπ Dπ are drawn with solid lines, and the diagonals with
dashed lines.
In fact, we regard the edges and diagonals not just as infinite lines but as extended lines, meaning that each contains a point at infinity. In a natural way, a quadrangle ABCD is a parallelogram if AB ∥ CD (that is, AB is parallel to CD, meaning that they meet at a point at infinity) and AD ∥ BC. A parallelogram is a rectangle if adjacent edges (that is, edges with a common vertex) are perpendicular, say AB ⊥ BC, and a rectangle is a square if its diagonals are perpendicular; that is, if AC ⊥ BD. Therefore the lamp in Figure 2 projects a square ABCD like that in Figure 3(b) to a quadrangle Aπ Bπ Cπ Dπ like the one next to it. The edges AD and BC, being parallel, have a common point at infinity, and as we will show, that point projects to the center of the “×” in Figure 3(b), that is, to the intersection Aπ Dπ · Bπ Cπ.

Figure 3. We choose the two figures in (b) to represent quadrangles ABCD and Aπ Bπ Cπ Dπ, rather than those in (a). The extended solid lines are edges and the dashed lines are diagonals.
Before discussing that point at infinity, we add a few more auxiliary parts to our
concept of a quadrangle. To motivate the choice of terminology, Figure 4 portrays a
quadrangle ABCD as the top of a box drawn in perspective. The box could be, say, an office building seen from an airplane, with a horizon line v seen in the distance. In imitation of Taylor’s perspective drawing terminology [1, p. 8], we define the principal vanishing points of ABCD to be the intersections V = AD · BC and V′ = AB · CD of nonadjacent edges of the quadrangle. These points determine the line v, which we will call the vanishing line of ABCD. The points W = v · AC and W′ = v · BD are the vanishing points of the diagonals.
Figure 4. Auxiliary parts of a quadrangle, portrayed as the top of a building with a distant horizon.
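For readers who want to experiment, these auxiliary parts are easy to compute in homogeneous coordinates: an ordinary point (x, y) is written (x, y, 1), a point at infinity has last coordinate 0, and both the line through two points and the intersection of two lines are given by a cross product. The Python sketch below is our own illustration rather than anything from the article; the function names `cross`, `hom`, `affine`, and `auxiliary_parts` and the sample trapezoid are hypothetical.

```python
def cross(p, q):
    """Cross product of homogeneous 3-vectors.

    Applied to two points it gives the line through them; applied to two lines
    it gives their intersection (a point at infinity if the last entry is 0).
    """
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def hom(point):
    """Lift an ordinary point (x, y) to homogeneous coordinates (x, y, 1)."""
    x, y = point
    return (float(x), float(y), 1.0)


def affine(p, eps=1e-12):
    """Return (x, y) for an ordinary point, or None for a point at infinity."""
    if abs(p[2]) < eps:
        return None
    return (p[0] / p[2], p[1] / p[2])


def auxiliary_parts(A, B, C, D):
    """Principal vanishing points V, V', diagonal vanishing points W, W', and line v."""
    A, B, C, D = (hom(P) for P in (A, B, C, D))
    V = cross(cross(A, D), cross(B, C))         # V  = AD . BC
    V_prime = cross(cross(A, B), cross(C, D))   # V' = AB . CD
    v = cross(V, V_prime)                       # vanishing line through V and V'
    W = cross(v, cross(A, C))                   # W  = v . AC
    W_prime = cross(v, cross(B, D))             # W' = v . BD
    return V, V_prime, W, W_prime, v


# A trapezoid in the spirit of Taylor's abcd: AB is parallel to CD.
V, V_prime, W, W_prime, v = auxiliary_parts((0, 0), (4, 0), (3, 2), (1, 2))
print(affine(V), affine(V_prime), affine(W), affine(W_prime))
# (2.0, 4.0) None (6.0, 4.0) (-2.0, 4.0)
```

Because AB ∥ CD in this example, V′ comes out as a point at infinity (last coordinate 0), and the vanishing line v is the horizontal line y = 4 through V.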
disk represents the center of the × of Figure 3(b), that is, the point Vπ = Aπ Dπ · Bπ Cπ
on the wall that is the image of the infinitely distant point V . Although the lamp may
seem unremarkable at first, each of its parts—the bulb, the shade, the post, and the
wall mount—has an interesting interpretation.
A few more remarks before we present the main theorem. We have seen that the light rays of the light bulb O define a bijective mapping from the plane of ABCD to the plane of the wall. Given distinct extended planes σ and π, and a point O on neither of them, we call $f_O\colon \sigma \to \pi$ a perspectivity with center O if for each point X ∈ σ, the point $X_\pi = f_O(X)$ is the point of π collinear with O and X. Likewise, $f_O^{-1}\colon \pi \to \sigma$ is also a perspectivity with center O. The intersection line σ · π is a line of fixed points, called the axis of $f_O$. Given a set S in σ, we call $f_O(S)$ the perspective image of S (under $f_O$). Since S is in turn the perspective image of $f_O(S)$ under $f_O^{-1}$, the two sets are said to be in perspective. In particular, we say that a quadrangle (in one plane) is the perspective image of a square (in another) if a perspectivity maps the vertices and edges of the square to the vertices and edges, respectively, of the quadrangle. We note also that points at infinity are sometimes called directions, because every line through such a point runs in the same (parallel) direction. We will use the convention that two such points represent perpendicular directions when every line containing one point is perpendicular to every line containing the other.
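A perspectivity is simple to compute: $X_\pi$ is where the line OX meets π. The minimal Python sketch below assumes a hypothetical coordinate model of the lamp in Figure 2 (bulb O at the origin, wall π the plane x = 2, top of the lampshade the square with corners (±1, ±1, 1)); these numbers are ours, chosen only for illustration. The function returns None when OX is parallel to π, the case of the white midpoints whose images are points at infinity, and a negative parameter t when O lies between X and its image, as happens for C and D.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def perspectivity(O, plane_point, plane_normal, X, eps=1e-12):
    """Image X_pi = f_O(X) on the plane pi through plane_point with normal plane_normal.

    Returns (X_pi, t) with X_pi = O + t (X - O), or None when the line OX is
    parallel to pi, in which case X_pi is a point at infinity.  A negative t
    means that O lies between X and its image.
    """
    direction = sub(X, O)
    denom = dot(plane_normal, direction)
    if abs(denom) < eps:
        return None
    t = dot(plane_normal, sub(plane_point, O)) / denom
    return tuple(o + t * d for o, d in zip(O, direction)), t


O = (0.0, 0.0, 0.0)                                         # the bulb
wall_point, wall_normal = (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # the wall pi: x = 2
square = {"A": (1, 1, 1), "B": (1, -1, 1), "C": (-1, -1, 1), "D": (-1, 1, 1)}

for name, X in square.items():
    print(name, perspectivity(O, wall_point, wall_normal, X))
print("midpoint of AD:", perspectivity(O, wall_point, wall_normal, (0, 1, 1)))
```

In this model A and B reach the wall with t = 2, C and D with t = −2 (the bulb lies between each of them and its image), and the midpoints of AD and BC have no ordinary image, matching the description of Figure 2.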
To assist in reading and following the notation, we have adapted Andersen’s mnemonic for choosing variable names [1]: π and σ for planes, with σ containing the square, a for the axis between π and σ, v for vanishing lines, and V, V′, W, W′ for vanishing points.
3. MAIN THEOREM. The theorem that every quadrangle is the perspective image
of a square is both known and unfamiliar. It is known in the sense that the theorem
appears in various papers and books in the mathematical literature (Dörrie, for
example, called this theorem one of the “true jewels of mathematical miniature work”
and used a diagram of the proof to adorn the cover of his book [4]). But proofs have
tended to fall into one of two camps. The proofs that appeal to visual perspective
restrict themselves implicitly to the case of convex polygons (most notably, see [5]);
approaches that allow for more general configurations (for example [4], [9], and [19])
often discard the visual interpretation, although even in those cases the accompanying
diagrams show the usual convex setup. Perhaps this disconnect between “visualiz-
ing” and “proving” explains why the theorem is also unfamiliar; it seems to appear
infrequently in modern projective geometry texts, and sometimes its appearance in the
literature even comes as a conjecture or a puzzle (see the concluding question of [13]
and the contest at [12]).
Since versions of our main theorem are proved elsewhere, we give an informal
proof, concentrating on the “generic” case of a quadrangle ABCD in which no two
edges are parallel, so that the vanishing points V, V′, W, W′ are all ordinary.
The method of the proof is easily adapted to the case of the bow tie quadrangle Aπ Bπ Cπ Dπ of Figure 3(b), which resulted from the lamp projection. In Figure 6 the plane π of Aπ Bπ Cπ Dπ lies horizontally, with one of the vanishing points of the quadrangle given by Vπ = Aπ Dπ · Bπ Cπ, in analogy to Figure 4. Where are the other three vanishing points? With regard to V′π = Aπ Bπ · Cπ Dπ, we have Aπ Bπ ∥ Cπ Dπ, hence V′π is the point at infinity (that is, the direction) parallel to Aπ Bπ and Cπ Dπ. We therefore draw the vanishing line vπ through Vπ parallel to Cπ Dπ as shown, and locate Wπ = vπ · Aπ Cπ and W′π = vπ · Bπ Dπ. To locate the center of projection O as in the proof, let σ′ be the plane through vπ perpendicular (for convenience) to π, and draw a semicircle in σ′ with diameter Wπ W′π. Since V′π is at infinity, there is no semicircle in σ′ with diameter Vπ V′π, but if we imagine V′π as an ordinary point on vπ to the left (say) of Vπ, and then move V′π farther and farther to the left, a semicircle connecting the two points stays anchored at Vπ and locally looks more and more like a ray perpendicular to vπ at Vπ. This turns out to be the correct approach; as shown in Figure 6, O is the intersection of the perpendicular to vπ at Vπ with the semicircle having diameter Wπ W′π.
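The location of O reduces to plane geometry in σ′. Measuring positions along the vanishing line, a point (x, h) of σ′ sees a segment PQ of that line at a right angle exactly when (x − p)(x − q) + h² = 0, so intersecting the circles with diameters VV′ and WW′ amounts to solving a linear equation for x. The Python sketch below is our own, with a hypothetical function name and sample positions; when V′ is at infinity it uses the limiting perpendicular at V described above. For a genuine quadrangle, W and W′ are harmonic conjugates with respect to V and V′ (a standard property of the complete quadrangle), which is why the circles meet off the line.

```python
import math


def center_of_projection(v1, v2, w1, w2):
    """Locate O = (x, h) in the plane sigma' through the vanishing line.

    v1, v2, w1, w2 are the positions of V, V', W, W' along the vanishing line;
    pass v2=None when V' is at infinity.  A point (x, h) sees the segment from
    p to q at a right angle exactly when (x - p)(x - q) + h**2 == 0.
    """
    if v2 is None:
        # Limiting case: the "semicircle" on VV' becomes the perpendicular at V.
        x = v1
    else:
        # Subtracting the two circle equations leaves a linear equation in x.
        denom = (w1 + w2) - (v1 + v2)
        if denom == 0:
            raise ValueError("the two circles are concentric")
        x = (w1 * w2 - v1 * v2) / denom
    h_squared = -(x - w1) * (x - w2)   # from the circle with diameter WW'
    if h_squared <= 0:
        raise ValueError("the circles do not meet off the vanishing line")
    return x, math.sqrt(h_squared)


print(center_of_projection(-1.0, 1.0, 2.0, 0.5))    # V, V', W, W' all ordinary
print(center_of_projection(2.0, None, 6.0, -2.0))   # V' at infinity
```

Both calls return O above the vanishing line; its mirror image below the line would serve equally well.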
The other parts of the proof can be done analogously to the proof of Theorem 1, as
in Figure 7. Observe that the location of O is visually consistent with the location of
the light bulb in Figure 2.
Figure 7. Completing the square for the bow tie quadrangle Aπ Bπ Cπ Dπ. If we stood above the lamp in Figure 2 and looked down, ABCD would be the top of the lampshade and Aπ Bπ Cπ Dπ its image on the wall.
wall plane π is slanted, as might happen in an attic room. The square opening ABCD
of the lampshade projects to a quadrangle Aπ Bπ Cπ Dπ that includes a dart shape, dia-
grammed in the inset at the upper right.
Our two lamp examples are reminiscent of a common example of conic sections,
namely the light patterns cast by a lamp with a circular cylindrical shade. With the
bulb at the center of the shade, the light streams out in a double-napped circular cone,
and the sections of the cone of light by walls, floor, and ceiling are conic sections. If
instead of a circular cone like $x^2 + y^2 = z^2$ we consider a surface of the form
we get a “square cone” like that in Figure 9, composed of a pair of pyramids with
square horizontal cross sections. That is, we get the kind of volume illuminated by the
square lampshades in our examples, and each section of such a cone is the perspective
image of a square. We think of the vertex O = (0, 0, 0) as the center of perspective,
and choose a square, horizontal slice ABCD as the square of interest. In Figure 9 the
plane π meets only the lower pyramid, the intersection being a convex quadrangle
Aπ Bπ Cπ Dπ . Indeed, the idea of looking at the intersection of a pyramid with a plane
was the basis for a proof that Emch published in this journal in 1917 illustrating, in his
words, “the importance of perspective as an introduction to projective geometry” [5].
Figure 8. A floor lamp with a square lampshade casts light on an attic wall.
But for more interesting quadrangles, we need more than just one of the pyramids;
we need the full cone. In Figure 10 we see two views of a situation in which the
plane π tilts so that it meets all four faces of the upper pyramid, and just two faces of
the lower pyramid. The intersection of the plane and the cone is a dart-type quadrangle
Aπ Bπ Cπ Dπ , like that created by the lamp in Figure 8. The formula for the square cone
is easy to work with, and we encourage readers to use a graphing program to explore
the interesting variety of “square conics” analogous to circles, ellipses, parabolas, and
hyperbolas. All of them are images of a square!
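Since every section of the square cone by a plane not through O is the perspective image of the square slice, the vertices Aπ, Bπ, Cπ, Dπ are simply the points where the plane meets the four edge lines of the cone. For readers without a graphing program at hand, here is a small Python sketch in that spirit. It assumes (our choice, not the article’s) that the square slice ABCD lies at height z = 1 with corners (±1, ±1, 1) and that the plane is written z = mx + ny + c with c ≠ 0; the function names are hypothetical. It reports on which pyramid each vertex lands.

```python
def section_vertices(m, n, c):
    """Meet the plane z = m*x + n*y + c with the four edge lines of the square cone.

    Assumed setup: vertex O = (0, 0, 0) and square slice ABCD at z = 1 with
    corners (1, 1, 1), (1, -1, 1), (-1, -1, 1), (-1, 1, 1), taken in order.
    Each vertex is returned with the pyramid it lies on, or None if its
    edge line is parallel to the plane (an image at infinity).
    """
    if c == 0:
        raise ValueError("the plane must not pass through O")
    corners = {"A": (1, 1), "B": (1, -1), "C": (-1, -1), "D": (-1, 1)}
    result = {}
    for name, (sx, sy) in corners.items():
        # Points of the edge line are s*(sx, sy, 1); substitute into the plane.
        denom = 1 - m * sx - n * sy
        if denom == 0:
            result[name] = None
            continue
        s = c / denom
        result[name] = ((s * sx, s * sy, s), "upper" if s > 0 else "lower")
    return result


def nappe_split(vertices):
    """Count the vertices lying on the upper and on the lower pyramid."""
    placed = [v[1] for v in vertices.values() if v is not None]
    return placed.count("upper"), placed.count("lower")


print(nappe_split(section_vertices(0.0, 0.0, -2.0)))   # (0, 4)
print(nappe_split(section_vertices(0.75, 0.75, 1.0)))  # (3, 1)
```

With the horizontal plane z = −2 all four vertices lie on the lower pyramid, giving a convex section as in Figure 9. With the tilted plane z = 0.75x + 0.75y + 1, three vertices lie on the upper pyramid and one on the lower; for these particular numbers the vertices are (−2, −2, −2), (1, −1, 1), (−0.4, −0.4, 0.4), and (−1, 1, 1), and the image of C lies inside the triangle formed by the other three, so the quadrangle is a dart.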
5. IMITATING THE MASTERS. Our work so far has given the impression that a
square and its quadrangle image always (or often) lie in separate planes. But a result
similar to Theorem 1 holds even when we restrict all objects to a single plane. The
proof of Theorem 1 leads to the solution of a same-plane drawing problem that is
essentially the reverse of a type investigated by Renaissance masters such as Leone
Battista Alberti (1404–1472) and Piero della Francesca (1415–1492), as well as mathematician Brook Taylor. In Figure 11, which is essentially a partial version of
Figure 5, we think of the planes σ and π as the front and top, respectively, of a
box (rectangular parallelepiped). The problem posed by the masters involved a reg-
ular polygon or circle drawn undistorted on the front of a box, and they sought to
reproduce a copy of that object seen obliquely in perspective on the top of the box.
The solutions generally resulted in a figure on the top that was the perspective image
of the one on the front (see, for example, [11, pp. 186–189]). The proofs given here show that the reverse can be done: start with a quadrangle ABCD in the plane π (a square distorted by perspective) and draw a square Aσ Bσ Cσ Dσ in the plane σ that is the perspective image of ABCD under some perspectivity from π to σ.
Figure 11. We reverse Taylor’s method for constructing the perspective image of a square.
Proof. With just a minor difference, the proof imitates that of Theorem 1. Let ABCD be a quadrangle in a plane π with vanishing points V, V′, W, W′ and vanishing line v. Choose O on the intersection of the circles with diameters VV′ and WW′. Let a be a line in π parallel to v, and let $F_O$ be the perspective collineation with center O and axis a such that $V_\pi := F_O(V)$ is the point at infinity on OV. Since v and a are parallel, they have a common point P at infinity, and $P_\pi := F_O(P) = P$ since P lies on a. But $F_O(V)$ is also a point at infinity, hence $F_O$ maps v and the points V, V′, W, and W′ to infinity as in the proof of Theorem 1. By the same reasoning as in that earlier proof, Aπ Bπ Cπ Dπ is a square, where $A_\pi = F_O(A)$, $B_\pi = F_O(B)$, etc.

¹ A fine point needed here is that we also consider the planes σ and σ′ to be parallel in space to the plane of the page, on which the whole configuration is projected. Thus parallel lines in the separate planes are drawn parallel in the figure.
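Because the proof is constructive, it can be checked numerically. The Python sketch below is our own, not the authors’ code, and assumes the generic case in which V, V′, W, W′ are all ordinary. It locates O on the two circles in π, takes the axis a to be the line through A parallel to v (an arbitrary choice on our part; the proof only asks that a be parallel to v), and realizes $F_O$ in homogeneous coordinates as P ↦ P + k(a·P)O, a standard form of a perspective collineation with center O and axis a, with k chosen so that $F_O(V)$ has last coordinate 0. The function name `square_up` and the sample quadrangle are hypothetical.

```python
import math


def cross(p, q):
    """Join of two points, or meet of two lines, in homogeneous coordinates."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def to_xy(p):
    """Ordinary point from homogeneous coordinates (assumes p[2] != 0)."""
    return (p[0] / p[2], p[1] / p[2])


def square_up(A, B, C, D):
    """Send quadrangle ABCD to a square by a perspective collineation F_O of its own plane.

    Generic case only: no two edges parallel, so V, V', W, W' are ordinary points.
    """
    A, B, C, D = ((float(x), float(y), 1.0) for x, y in (A, B, C, D))
    V = to_xy(cross(cross(A, D), cross(B, C)))     # V  = AD . BC
    Vp = to_xy(cross(cross(A, B), cross(C, D)))    # V' = AB . CD
    v = cross((*V, 1.0), (*Vp, 1.0))               # vanishing line
    W = to_xy(cross(v, cross(A, C)))               # W  = v . AC
    Wp = to_xy(cross(v, cross(B, D)))              # W' = v . BD

    # Unit vector along v, and signed positions along v measured from V.
    length = math.dist(V, Vp)
    u = ((Vp[0] - V[0]) / length, (Vp[1] - V[1]) / length)

    def along(P):
        return (P[0] - V[0]) * u[0] + (P[1] - V[1]) * u[1]

    v1, v2, w1, w2 = 0.0, along(Vp), along(W), along(Wp)

    # O = V + x*u + h*n lies on the circles with diameters VV' and WW'.
    x = (w1 * w2 - v1 * v2) / ((w1 + w2) - (v1 + v2))
    h = math.sqrt(-(x - w1) * (x - w2))
    O = (V[0] + x * u[0] - h * u[1], V[1] + x * u[1] + h * u[0], 1.0)

    # Axis a: the line through A parallel to v.
    a = (v[0], v[1], -(v[0] * A[0] + v[1] * A[1]))
    k = -1.0 / dot(a, (*V, 1.0))   # makes the last coordinate of F_O(V) zero

    def F(P):
        # Homology with center O and axis a: fixes each point of a and
        # maps each line through O to itself.
        s = k * dot(a, P)
        return to_xy(tuple(p + s * o for p, o in zip(P, O)))

    return [F(P) for P in (A, B, C, D)]


corners = square_up((0, 0), (4, 0), (5, 3), (1, 2))
for P in corners:
    print(P)
```

The vertex A stays fixed because it lies on the axis, and the four image points have equal sides and equal diagonals. Since a is parallel to v, every point of v has the same value of a·P, so V′, W, and W′ are sent to infinity along with V, exactly as the proof requires.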
As we will see in the next two sections, this corollary inspired by Renaissance artists
leads to intriguing modern applications.
Although the statements of Corollaries 2 and 3 ought to be well known, we have not
been able to find them elsewhere in the literature. Rather, a common computer vision
technique (see [8], [16]) decomposes a projective collineation into three components
as the composition of an orientation-preserving similarity, an affine transformation,
and a perspective collineation.
Figure 13 illustrates Corollary 2, which says that the projective collineation of Figure 12 is a product $F_O \circ S$. The figure H2 in Figure 13 is the image of H0 under the similarity S, and the point O is the center of the perspective collineation $F_O$ that maps H2 to H1. We have drawn all the connectors to show that they indeed meet at O.
Observe that in this case we could contract H2 toward O and then reflect it in O, to
get a figure congruent (not just similar) to H0 as in Corollary 3.
Figure 13. A decomposition of the projective collineation in Figure 12. The figure H0 is mapped to H2 by a similarity, and H2 is mapped to H1 by a perspective collineation with center O.
Now obviously the resulting images are the correct shape—they’re square, as guar-
anteed by Corollary 1—but are they the correct relative sizes? To see that they are,
imagine that the actual receding wall is covered by a square grid whose lines are par-
allel to the edges of the dark square, as suggested in Figure 16. Of course, we see
the receding wall at an oblique angle, so those grid squares don’t appear square; they
are just more quadrangles in the plane of the photograph. Each such quadrangle has the same principal vanishing points V, V′ as the dark flyer; likewise, the diagonals of these quadrangles have the same vanishing points as the diagonals of the dark flyer (not shown in the figure). Thus $F_O$ maps these quadrangles to squares aligned with the square image of the dark square, resulting in a square grid, as shown to the right of a. In fact, the image is a faithful, undistorted reproduction of the hallway wall, reversed from left to right. Since the perspective collineation $F_O$ maps points to points and lines to lines and preserves intersections, it maps any object on the oblique grid so that its image has the corresponding intersection points with the square grid, hence the relative sizes of the dark
square and the light square in Figure 15 are exactly as shown. From this figure, we see
that the light flyer is actually larger than the dark one, which answers our question.
We can be even more specific: a standard room is 8 feet high. Measuring carefully shows us that the light square in Figure 15 is 1/4 as long as the vertical corner edge of the hallway, so that square is two feet on each side, whereas the near dark square has 3/4 the side length of the light square, or 18 inches on a side. That is, with largely geometric techniques (as opposed to numerical or computational techniques), we can discover photogrammetric information from a photograph or perspective drawing.
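The arithmetic behind these two lengths, assuming as above that the vertical corner edge is 8 feet long:

$$
\tfrac{1}{4} \times 8\ \text{ft} = 2\ \text{ft} = 24\ \text{in},
\qquad
\tfrac{3}{4} \times 24\ \text{in} = 18\ \text{in}.
$$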
Brook Taylor’s view. Observe that in using Corollary 1, we treated the objects in the
photograph as figures all in the same plane. That is, we solved the problem by con-
structing a perspective collineation—a map from the plane to itself. But alternatively,
we could have considered this as a problem involving several planes, using Theorem 1
to construct a perspectivity from the plane of the receding wall to the plane of the adja-
cent wall facing us in the photograph. In that case, however, we must bring in some
notions we didn’t treat explicitly in our proof of Theorem 1.
For example, although our proof of Theorem 1 was confined to the situation where
all vanishing points of the given quadrangle are ordinary, the vanishing points of the
flyers on the receding wall lie at infinity, since the edges of the flyers are actually
parallel. Consequently, their common vanishing line, whose image in the photograph
is the line v in Figure 15, lies at infinity also. We can see the line in the diagram, but
it’s infinitely far away in space. To choose a plane σ′ through this line parallel to the wall plane σ facing us, we must therefore choose the so-called plane at infinity, the union of all the points and lines at infinity. Thus the center of perspective O, which lies in σ′, is also a point at infinity. In other words, the dashed connectors emanating radially
from O are actually parallel in space to one another—they merely appear to converge
because we see them in perspective, like sunbeams—and the associated perspectivity
with center O is a parallel projection from the receding wall to the wall facing us.
In fact, it can be shown that these rays are parallel in space to the floor and ceiling,
and pierce each wall at a 45° angle. This map is therefore an isometry that causes a
reflected image of the receding wall to appear on the wall facing us, as though one wall
were folded at the corner onto the other.
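The folding can be made concrete with a parallel projection, which is a perspectivity whose center lies at infinity. In the hypothetical coordinates below (ours, not the article’s), the vertical corner of the hallway is the z-axis, the receding wall lies in the plane x = 0, the wall facing us lies in the plane y = 0, and the rays run in the direction (1, −1, 0), parallel to the floor and at 45° to each wall. The sketch checks that a distance survives the projection.

```python
import math


def parallel_project(X, direction, plane_point, plane_normal):
    """Move X along a fixed direction until it meets the target plane."""
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if denom == 0:
        raise ValueError("direction is parallel to the target plane")
    s = sum(n * (p - x) for n, p, x in zip(plane_normal, plane_point, X)) / denom
    return tuple(x + s * d for x, d in zip(X, direction))


rays = (1, -1, 0)                                   # parallel to the floor, 45 degrees to both walls
facing_point, facing_normal = (0, 0, 0), (0, 1, 0)  # facing wall: y = 0

P, Q = (0, 3, 1), (0, 0.5, 7)                       # two points on the receding wall x = 0
P2 = parallel_project(P, rays, facing_point, facing_normal)
Q2 = parallel_project(Q, rays, facing_point, facing_normal)
print(P2, Q2)                                       # (3.0, 0.0, 1.0) (0.5, 0.0, 7.0)
print(math.dist(P, Q), math.dist(P2, Q2))
```

Each point keeps its height and its distance from the corner, so in this model the receding wall is laid onto the facing wall as if folded along the corner.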
We mention the notion of folding as an isometry because Taylor himself described
the same phenomenon in his construction (the reverse of ours, starting with the square
I have observed that the Shapes of the Representations of Figures on a Plane don’t at all depend
upon the Angle the Picture makes with that Plane.
Our investigation into the perspective images of squares began with Taylor’s three-
dimensional interpretation, and then used those results to move our investigations into
two-dimensional applications. Taylor’s observation—that a same-plane square must
necessarily be the same size as a different-plane square that has the same quadrangle
image—allows us to come full circle.
Or, perhaps we should say, it allows us to come full square.
ACKNOWLEDGMENT. The authors wish to thank the referees, whose careful and critical readings of our
earlier drafts were invaluable to us in our revisions. This work was supported by NSF TUES Grant DUE-
1140135.
REFERENCES
1. K. Andersen, Brook Taylor’s Work on Linear Perspective. Springer-Verlag, New York, 1992.
2. H. Anton, Elementary Linear Algebra. Seventh ed. John Wiley & Sons, New York, 1994.
3. H. S. M. Coxeter, Projective Geometry. Second ed. Springer, New York, 2003.
4. H. Dörrie, 100 Great Problems of Elementary Mathematics. Dover, New York, 1965.
5. A. Emch, A problem in perspective, Amer. Math. Monthly 24 (1917) 379–382, http://dx.doi.org/10.2307/2973980.
6. H. Eves, A Survey of Geometry. Revised ed. Allyn and Bacon, Boston, 1972.
7. M. Frantz, A car crash solved—with a Swiss army knife, Math. Mag. 84 (2011) 327–338, http://dx.doi.org/10.4169/math.mag.84.5.327.
8. R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision. Second ed. Cambridge Univ.
Press, New York, 2003.
9. J. L. S. Hatton, The Principles of Projective Geometry Applied to the Straight Line and Conic. Cambridge
Univ. Press, London, 1913.
10. E. Michaelsen, U. Stilla, Pose estimation from airborne video sequences using a structural approach for
the construction of homographs and fundamental matrices, Lecture Notes in Computer Science Vol. 3138.
Ed. by A. Fred, T. Caelli, R. P. W. Duin, A. Campilho, and D. de Ridder, Springer, Berlin, 2004. 486–494,
http://dx.doi.org/10.1007/978-3-540-27868-9_52.
11. D. Pedoe, Geometry and the Visual Arts. Dover, New York, 2011.
12. Problem 2013-2c, Newsletter of the Delft Institute of Applied Mathematics (December 2013) 293,
http://www.nieuwarchief.nl/home/problems/pdf/uitwerking-2013-2.pdf.
13. W. H. Richardson, Projection of a quadrangle into a parallelogram, Amer. Math. Monthly 73 (1966) 644–645, http://dx.doi.org/10.2307/2314807.
14. D. Row, T. J. Reid, Geometry, Perspective Drawing, and Mechanisms. World Scientific Publishing,
Hackensack, NJ, 2012.
15. K. Shoemake, T. Duff, Matrix animation and polar decomposition, Proc. Conf. Graphics Interface 92
(1992) 258–264.
16. M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision. Cengage Learning,
Boston, 2014.
17. B. Taylor, New Principles of Linear Perspective: or the Art of Designing on a Plane the Representations
of all sorts of Objects, in a more General and Simple Method than has been done before. London, 1719,
http://echo.mpiwg-berlin.mpg.de/MPIWG:C0RQ3H5B.
18. E. W. Weisstein, Quadrangle—From MathWorld, A Wolfram Web Resource, http://mathworld.wolfram.com/Quadrangle.html.
19. C. R. Wylie, Jr., Introduction to Projective Geometry. Dover, New York, 1970.
20. X. Zhang, Projection matrix decomposition in AR—A study with Access3D, in Mixed and Augmented Reality, 2004, Third IEEE and ACM International Symposium on ISMAR 2004. 258–259, http://dx.doi.org/10.1109/ISMAR.2004.48.
114
© THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
ANNALISA CRANNELL wishes she had had a course on projective geometry at some point in her life.
Nonetheless, she enjoys sharing the subject with her colleagues in the mathematical community and with her
Franklin & Marshall students.
Franklin & Marshall College, Lancaster, PA 17604
annalisa.crannell@fandm.edu
MARC FRANTZ, a former painter, is a research associate in mathematics at Indiana University. He loves the
visual approach to mathematics, especially links between mathematics and art.
Indiana University, Bloomington, IN 47405
mfrantz@indiana.edu
work. The calculations and timings shown in this paper were performed using Mathematica 10.0.2.0 on a 2.8 GHz Macintosh iMac computer.
1. THE METHODS. In this paper, we will look at three particular methods or techniques that are commonly used in experimental math and see examples of applying each to MONTHLY problems.
Other computer methods that are useful for many MONTHLY problems, but not covered in this paper, are the mechanical summation methods of Gosper, Wilf, Zeilberger, et al. These are especially useful for problems involving sums with binomial coefficients. They are illustrated in a very illuminating and entertaining article [21] published earlier in this MONTHLY. These methods are often included as part of experimental math, but they produce both the final answer and the proof, so they are not heuristic in the way that the methods we consider here are.
Constant recognition. The Inverse Symbolic Calculator Plus (ISC+) [8] is an online service that attempts to identify a constant, given a good numerical approximation to the constant. According to its website, ISC+ “uses a combination of lookup tables and integer relation algorithms in order to associate a closed form representation” with the given approximation. It is used to identify values that come up in research, such as definite integrals or infinite series: we calculate the value to high precision (the rule of thumb is that 15 digits are needed) and ask ISC+ for a closed-form candidate. Such problems are very common in the MONTHLY Problems section, and the value can often be discovered by this method. Even with computers, it is sometimes difficult to calculate a value to 15 digits, and we will see examples of this in this paper.
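To make the lookup idea concrete, here is a toy recognizer in Python. It is only a sketch under our own assumptions. The seven-entry table `KNOWN_CONSTANTS`, the helper name `recognize`, the denominator bound, and the tolerance are all hypothetical choices. ISC+'s real tables and integer relation machinery are far larger. For each tabulated constant $c$, the sketch looks for a small-denominator rational $p/q$ with $x \approx (p/q)\,c$.

```python
import math
from fractions import Fraction

# Hypothetical mini-table; ISC+ uses vastly larger tables plus integer relation algorithms.
KNOWN_CONSTANTS = {
    "1": 1.0,
    "pi": math.pi,
    "pi**2": math.pi ** 2,
    "e": math.e,
    "log(2)": math.log(2),
    "sqrt(2)": math.sqrt(2),
    "sqrt(3)": math.sqrt(3),
}

def recognize(x, max_denominator=100, rel_tol=1e-12):
    """List candidates 'p/q * name' such that x is within rel_tol of (p/q) * constant."""
    matches = []
    for name, c in KNOWN_CONSTANTS.items():
        # Best rational approximation to x / c with a small denominator
        ratio = Fraction(x / c).limit_denominator(max_denominator)
        if ratio != 0 and abs(x - float(ratio) * c) <= rel_tol * abs(x):
            matches.append(f"{ratio} * {name}")
    return matches

print(recognize(3.141592653589793))    # double-precision value of pi
print(recognize(0.4112335167120566))   # a value that appears later in this paper
```

With only 15 or so significant digits, the tolerance cannot be made much tighter. This is the same limitation that makes the rule of thumb of about 15 digits a minimum rather than a guarantee.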
Even in the old days, we might have attempted to guess the value of a series by adding up several dozen terms. If we got a sum of 3.14159, we would probably guess that the series summed to π and attempt to prove this using known facts about π, including other series whose values involve π. With computers we can get more digits; if the answer were 3.1415926535897932385, we would be even more confident that the answer was π and would be willing to work harder to prove this. Plugging our 20-digit π suspect into ISC+ indeed produces π.
The ISC+ table is enormous, and the lookup method almost always produces a candidate if you have enough digits. However, guessing a constant from a high-precision approximation is far from infallible. We like mathematical problems to have neat answers. For example, to 30 digits, we have
$$e^{\pi\sqrt{163}} = 262537412640768743.999999999999.$$
Anyone looking at the right-hand side would guess that it represents an integer, but to 35 digits, we have
$$e^{\pi\sqrt{163}} = 262537412640768743.99999999999925007.$$
To 42 digits, this agrees with π/8, but in fact, it is not π/8. A collection of even more
spectacular examples of misleading near matches is in [9].
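The near-integer above is easy to reproduce with Python's standard `decimal` module. In this sketch, the helper `decimal_pi` is the $\pi$ recipe from the `decimal` documentation. The working precision of 50 significant digits is our own choice; for a number of this size, it leaves about 32 digits after the decimal point.

```python
from decimal import Decimal, getcontext

def decimal_pi():
    """Compute pi to the current precision (recipe from the Python decimal docs)."""
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s  # round back to the working precision

getcontext().prec = 50
value = (decimal_pi() * Decimal(163).sqrt()).exp()
gap = value.to_integral_value() - value   # distance up to the nearest integer
print(value)
print(gap)
```

The gap is below $10^{-12}$, so double precision (about 16 significant digits for an 18-digit integer part) could not even see that this number is not an integer.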
Integer relation detection. The third method is integer relation detection, in which
we seek to express a given constant as a rational linear combination of known con-
stants. An ancient example is the greatest common divisor of two integers, which we
know can be expressed as such a combination: gcd(a, b) = ax + by for some integers
x, y.
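The gcd example can be made concrete with the extended Euclidean algorithm, which produces the integer coefficients $x, y$ explicitly. This is a standard textbook routine, shown here as a small Python sketch; the function name `extended_gcd` is our own.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y (iterative extended Euclid)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each update preserves the invariants old_r = a*old_x + b*old_y and r = a*x + b*y
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(240, 46)
print(g, x, y)   # 2 -9 47
```

In the language of the next paragraph, $(x, y, -1)$ is an integer relation among the three numbers $a$, $b$, and $\gcd(a, b)$.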
The general integer relation detection problem is: Given a set of $n$ numbers $c_k$, attempt to find an integer linear combination of them that is very nearly 0; that is, find integers $a_k$, not all zero, such that $\sum_{k=1}^{n} a_k c_k \approx 0$. If successful, and the combination is exactly 0, this means that any of the $c_k$ that have a nonzero coefficient can be expressed as a rational linear combination of the others. Ferguson and Bailey's PSLQ algorithm [15] and the Lenstra–Lenstra–Lovász (LLL) lattice reduction algorithm [19] are two well-known integer relation detection algorithms. Mathematica's solver is the function FindIntegerNullVector; the Mathematica documentation does not reveal which algorithm it uses.
Another (slightly disguised) example of integer relation detection is the question of whether a given number $x$ is an algebraic number (that is, a zero of a polynomial with integer coefficients). We can recast this question as: For some $n$, is there an integer relation among the numbers $1, x, \ldots, x^n$? In other words, are there integers $a_0, \ldots, a_n$, not all zero, such that $a_n x^n + \cdots + a_0 = 0$? If we could show that a mystery number was a zero of a particular polynomial, we would then know a lot about it, even if we could not get an explicit representation. Take the simple example that we are given a number $x$ that is approximately
$$x \approx 3.146264369941972342329135.$$
Is $x$ algebraic? (Clearly, the right-hand side is algebraic because it is rational, but the question is really whether $x$ is the root of a polynomial with small coefficients.) This can be answered using FindIntegerNullVector and a suitable number of powers of $x$ (say $n = 10$). Mathematica also has a function RootApproximant specifically for answering whether a number is algebraic, and it says that $x$ satisfies
$$x^4 - 10x^2 + 1 = 0.$$
For most purposes, this would be almost as good as an explicit form. In this example, because the equation is so simple, we can in fact find the explicit form. A few keystrokes in Mathematica give the roots, and looking at their numerical values shows $x = \sqrt{2} + \sqrt{3}$.
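For intuition about this kind of search, here is a deliberately naive Python sketch. It looks for a monic integer polynomial of low degree with a root near $x$. This is plain brute force, not PSLQ or LLL. Several choices are our own assumptions:

- the restriction to monic polynomials, which makes it really an algebraic-integer test;
- the degree limit 4;
- the coefficient bound 10;
- the tolerance;
- the name `find_monic_polynomial`.

It is feasible only because the answer happens to have tiny coefficients.

```python
import itertools
import math

def find_monic_polynomial(x, max_degree=4, bound=10, tol=1e-9):
    """Brute-force search for a monic integer polynomial with a root near x.

    Tries degrees 1..max_degree with lower coefficients in [-bound, bound];
    returns the coefficients (highest degree first), or None.
    """
    for degree in range(1, max_degree + 1):
        for lower in itertools.product(range(-bound, bound + 1), repeat=degree):
            coeffs = (1,) + lower
            value = 0.0
            for c in coeffs:              # Horner's rule
                value = value * x + c
            if abs(value) < tol:
                return coeffs
    return None

x = 3.146264369941972342329135
print(find_monic_polynomial(x))                      # (1, 0, -10, 0, 1)
print(math.isclose(x, math.sqrt(2) + math.sqrt(3)))  # True
```

The second line checks the explicit form numerically. Real integer relation algorithms avoid the exponential cost of this enumeration.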
One early and spectacular example of integer relation detection is the Bailey–Borwein–Plouffe formula for π in base 16 (see, for example, [10, Chapter 2]):
$$\pi = \sum_{i=0}^{\infty}\frac{1}{16^i}\left(\frac{4}{8i+1} - \frac{2}{8i+4} - \frac{1}{8i+5} - \frac{1}{8i+6}\right).$$
This formula allows the calculation of any base-16 digit of π with a moderate amount of effort and without calculating the preceding digits. The formula was hard to discover but, once discovered, can be proved easily using only calculus. More examples of integer relation detection are in an article [4] published earlier in this MONTHLY.
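The digit-extraction property can be sketched in a few lines of Python. Multiply the formula above by $16^d$ and keep only fractional parts. The terms with $i \le d$ are then reduced with modular exponentiation (`pow(16, d - i, 8*i + j)`), and the rapidly decaying tail is added in floating point. This is the usual textbook version of the idea in double precision, so it is reliable only for modest positions $d$. The helper names are our own.

```python
def bbp_fraction(j, d):
    """Fractional part of sum_{i>=0} 16^(d-i) / (8i + j)."""
    s = 0.0
    for i in range(d + 1):                 # head: exact modular powers of 16
        denom = 8 * i + j
        s = (s + pow(16, d - i, denom) / denom) % 1.0
    i = d + 1
    while True:                            # tail: terms shrink by about 1/16 per step
        term = 16.0 ** (d - i) / (8 * i + j)
        if term < 1e-17:
            break
        s += term
        i += 1
    return s % 1.0

def pi_hex_digit(d):
    """Hex digit of pi at position d + 1 after the point (d = 0 gives the first)."""
    x = (4 * bbp_fraction(1, d) - 2 * bbp_fraction(4, d)
         - bbp_fraction(5, d) - bbp_fraction(6, d)) % 1.0   # frac(16^d * pi)
    return "0123456789ABCDEF"[int(16 * x)]

print("".join(pi_hex_digit(d) for d in range(10)))  # 243F6A8885
```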
$$\sum_{n=1}^{\infty}\frac{1}{\sinh 2^n}.$$
This series converges extremely rapidly, so it is easy to get a good numerical approximation: The first five terms give about 28 digits of accuracy. Mathematica gives to 15 digits that
$$K = \sum_{n=1}^{5}\frac{1}{\sinh 2^n} = 0.313035285499331.$$
The ISC+ “standard lookup” does not identify this constant, but the “advanced lookup” yields the transformed value $1/(1+K) = \tanh(1)$, in other words, $K = 1/\tanh 1 - 1$.
We are prompted to conjecture
$$\sum_{n=1}^{\infty}\frac{1}{\sinh 2^n} = \frac{1}{\tanh 1} - 1, \qquad (1)$$
which is plausible because of the hyperbolic functions on both sides and checks out numerically: If we sum the first 10 terms, the two sides agree to about 900 decimals. This is strong evidence but not a proof; we still use traditional hand methods to get a proof.
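In double precision, the check of (1) is a one-liner, since $1/\sinh 2^n$ falls below $10^{-13}$ by $n = 5$. The minimal Python sketch below confirms only about 15 digits, not the 900 mentioned above; that would need arbitrary-precision arithmetic.

```python
import math

# Partial sum of the left side of (1); the omitted n = 6 term is about 3e-28.
lhs = sum(1 / math.sinh(2 ** n) for n in range(1, 6))
rhs = 1 / math.tanh(1) - 1
print(lhs, rhs)
print(1 / (1 + lhs), math.tanh(1))   # the transformed value found by the advanced lookup
```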
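Because the hyperbolic functions have expressions in terms of the exponential function, we might try expanding both sides of (1) as power series in $e^{-1}$ and see if they match. We have on the left, using the geometric series, that
$$\sum_{n=1}^{\infty}\frac{1}{\sinh 2^n} = \sum_{n=1}^{\infty}\frac{2}{\exp(2^n) - \exp(-2^n)} = 2\sum_{n=1}^{\infty}\frac{\exp(-2^n)}{1 - \exp(-2\cdot 2^n)} = 2\sum_{n=1}^{\infty}\sum_{m=0}^{\infty}\exp\bigl(-2^n(2m+1)\bigr).$$
Every even positive integer can be written uniquely as $2^n(2m+1)$ with $n \ge 1$ and $m \ge 0$, so the left side equals $2\sum_{k=1}^{\infty} e^{-2k}$. On the right,
$$\frac{1}{\tanh 1} - 1 = \frac{e + e^{-1}}{e - e^{-1}} - 1 = \frac{2e^{-2}}{1 - e^{-2}} = 2\sum_{k=1}^{\infty} e^{-2k},$$
so the two sides are equal, and our numerically inspired conjecture is proved.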
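$$m_{ij} = \begin{cases} -1 & \text{if } j \mid (i+1),\\ 0 & \text{if } j \nmid (i+1),\end{cases}$$
$$M_6 = \begin{pmatrix} -1 & -1 & 0 & 0 & 0\\ -1 & 0 & -1 & 0 & 0\\ -1 & -1 & 0 & -1 & 0\\ -1 & 0 & 0 & 0 & -1\\ -1 & -1 & -1 & 0 & 0 \end{pmatrix}. \qquad (2)$$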
We work out the first few terms as examples and get that for n = 2 through n = 25
the values of det Mn are
A number theorist might recognize this sequence, but anyone can ask the OEIS about
it. One of the OEIS hints is “enter about 6 terms, starting with the second term,” so we
ask about the subsequence −1, 0, −1, 1, −1, 0. OEIS immediately replies with 1,399
matches, of which the one rated most relevant is its sequence A008683, the Möbius
function μ(n). This sequence in fact matches all 24 of our calculated values, so we
conjecture det Mn = μ(n).
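The experiment itself is easy to reproduce. The following Python sketch is our own code, not the Mathematica session behind the text. It uses exact rational Gaussian elimination and a trial-division $\mu(n)$. It builds $M_n$ from the rule for $m_{ij}$ and compares $\det M_n$ with $\mu(n)$ for $n = 2, \ldots, 25$.

```python
from fractions import Fraction

def matrix_m(n):
    """The (n-1) x (n-1) matrix M_n with m_ij = -1 if j | (i+1), else 0 (1-based)."""
    size = n - 1
    return [[-1 if (i + 1) % j == 0 else 0 for j in range(1, size + 1)]
            for i in range(1, size + 1)]

def det(a):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in a]
    size, result = len(a), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                   # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result

def mobius(n):
    """Moebius function by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:                 # squared prime factor
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

print([int(det(matrix_m(n))) for n in range(2, 26)])
print(all(det(matrix_m(n)) == mobius(n) for n in range(2, 26)))
```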
The matrices $M_n$ have an obvious recursive structure in the sense that $m_{ij}$ does not depend on $n$, and so the upper left $(k-1)\times(k-1)$ submatrix is always $M_k$. The determinants have a further recursive structure: If we expand by minors along the bottom row, the minor determinant for column $j$ is $\pm\det M_j$. This is because, in forming the minor, the $M_j$ at the upper left is preserved while the $-1$ terms in the superdiagonal slide into the diagonal. For example, the minor for column 3 and row 5 in (2) is
$$\begin{vmatrix} -1 & -1 & 0 & 0\\ -1 & 0 & 0 & 0\\ -1 & -1 & -1 & 0\\ -1 & 0 & 0 & -1 \end{vmatrix}.$$
In this example, $M_3$ is in the upper left, and the matrix in the lower right has all $-1$ along the diagonal and all 0 above the diagonal.
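The block to the right of $M_j$ is zero, and the lower-right block is an $(n-j-1)\times(n-j-1)$ lower triangular matrix with $-1$ on its diagonal. The minor therefore has determinant $(-1)^{n-j-1}\det M_j$; in the example above, this gives $(-1)^{2}\det M_3 = -1$. If we make the convention that $\det M_1 = 1$ for the empty matrix $M_1$, this evaluation is still true for $j = 1$. The bottom row of $M_n$ is row $n-1$, and its entry in column $j$ is $-1$ exactly when $j \mid n$. Therefore, expanding $\det M_n$ by minors along the bottom row gives us a recurrence: We have for $n > 1$ that
$$\det M_n = \sum_{j<n,\ j\mid n}(-1)(-1)^{(n-1)+j}(-1)^{n-j-1}\det M_j = -\sum_{j<n,\ j\mid n}\det M_j.$$
This rearranges as
$$\sum_{j\mid n}\det M_j = 0 \qquad (n > 1).$$
The Möbius function satisfies
$$\sum_{d\mid n}\mu(d) = \begin{cases} 1, & n = 1;\\ 0, & n > 1. \end{cases}$$
(This is the first formula in the OEIS entry A008683.) This is the same recurrence satisfied by $\det M_n$, and $\det M_n$ and $\mu(n)$ have the same starting value of 1, so we have by induction that $\det M_n = \mu(n)$.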
$$\sum_{n=1}^{50} a_n^2 = 9.869604401089358618834491,$$
Surprisingly, the same $\pi^2$ values turn up! We still do not know where the $\pi^2$ comes from, but comparing the tables, we conjecture that
$$\sum_{n=1}^{\infty} a_n^2 = \lim_{n\to\infty} 2^n a_n + \text{simple function of } a.$$
In fact, this is easy to prove now that we have thought of it. Rearrange, square, and rearrange the recurrence (3) to get
$$a_{n+1}^2 = 2^{n+1}a_{n+1} - 2^n a_n.$$
The sum of the right-hand sides telescopes, so
$$\sum_{n=1}^{\infty} a_n^2 = a_1^2 + \sum_{n=1}^{\infty} a_{n+1}^2 = a_1^2 + \lim_{n\to\infty} 2^n a_n - 2a_1 = \lim_{n\to\infty} 2^n a_n + a^2 - 2a,$$
so the “simple function” is $a^2 - 2a$, and this gives the right answer for the four examples we tried. (We are assuming temporarily that $\lim 2^n a_n$ exists; this will be proved later when we evaluate it.)
So we have reduced the sum problem to an asymptotic problem for the general term, which should be easier. The recurrence (3) has a lot of $2^n$ in it, and it should be easier to think about if we reparameterize to get rid of them. If we define $b_n = a_n/2^n$, we get
$$2^{n+1}b_{n+1} = 2^n - \sqrt{2^n\left(2^n - 2^n b_n\right)} = 2^n - 2^n\sqrt{1 - b_n},$$
which rearranges to
$$b_{n+1} = \frac{1 - \sqrt{1 - b_n}}{2} \quad\text{with } b_1 = a/2, \text{ so } 0 \le b_1 \le 1. \qquad (4)$$
This is much simpler: Not only is the $2^n$ gone, but each $b$ value depends only on the previous value and not on $n$. That is, it is an iteration, $b_{n+1} = f(b_n)$ where $f(x) = \frac12\left(1 - \sqrt{1 - x}\right)$. This is attractive not only because it is simpler but also because there is a systematic (but complicated) theory to get asymptotic values of sequences defined by an iteration (see [14, Chapter 8]).
In our case, rather than apply the systematic theory, we will make an observation that leads to a quick solution. Remembering the $\pi^2$ and the square roots, we might be reminded of the half-angle formulas for the trigonometric functions, of which the most common are
$$\cos\frac{\theta}{2} = \sqrt{\frac{1+\cos\theta}{2}} \quad\text{and}\quad \sin\frac{\theta}{2} = \sqrt{\frac{1-\cos\theta}{2}}.$$
Neither of these has exactly the same form as our recurrence, but if we square the second one, we can get a half-angle formula for $\sin^2$ that does have the right format, namely
$$\sin^2\frac{\theta}{2} = \frac{1-\cos\theta}{2} = \frac{1-\sqrt{1-\sin^2\theta}}{2},$$
valid for $0 \le \theta \le \pi/2$, where $\cos\theta \ge 0$.
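Therefore, we define
$$\theta_1 = \arcsin\sqrt{b_1} \quad\text{and}\quad \theta_{n+1} = \tfrac12\theta_n,$$
so that $b_n = \sin^2\theta_n$ and
$$a_n = 2^n\sin^2\frac{\theta_1}{2^{n-1}}$$
is the solution to the recurrence (3). Then using $\lim_{x\to\infty} x^2\sin^2(c/x) = c^2$ we calculate
$$\lim_{n\to\infty} 2^n a_n = \lim_{n\to\infty} 2^{2n}\sin^2\frac{\theta_1}{2^{n-1}} = 4\theta_1^2 = 4\arcsin^2\sqrt{\frac{a}{2}}.$$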
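The identity can be checked numerically. This Python sketch iterates the recurrence in the form (4). It rewrites $b_{n+1} = (1-\sqrt{1-b_n})/2$ as the algebraically equal $b_n/\bigl(2(1+\sqrt{1-b_n})\bigr)$. That rewriting is our own choice: it avoids the cancellation that ruins the direct formula in floating point when $b_n$ is small. The sketch then compares $\sum a_n^2$ with $4\arcsin^2\sqrt{a/2} + a^2 - 2a$ for a few starting values $a \in [0, 2]$. The cutoff of 60 terms is an assumption, ample here because $a_n^2$ decays like $4^{-n}$.

```python
import math

def sum_of_squares(a, terms=60):
    """Sum a_n^2 for recurrence (3) with a_1 = a, iterating b_n = a_n / 2^n."""
    b = a / 2                                  # b_1 = a_1 / 2
    total = 0.0
    for n in range(1, terms + 1):
        a_n = 2 ** n * b
        total += a_n ** 2
        b = b / (2 * (1 + math.sqrt(1 - b)))   # same as (1 - sqrt(1 - b)) / 2
    return total

def closed_form(a):
    """lim 2^n a_n + a^2 - 2a, with lim 2^n a_n = 4 arcsin^2(sqrt(a/2))."""
    return 4 * math.asin(math.sqrt(a / 2)) ** 2 + a ** 2 - 2 * a

for a in (2.0, 1.0, 0.5):
    print(a, sum_of_squares(a), closed_form(a))
```

For $a = 2$ we get $\theta_1 = \pi/2$, and the sum is exactly $\pi^2$.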
$$\sum_{k=1}^{\infty}\left(\frac{k^k}{k!\,e^k} - \frac{1}{\sqrt{2\pi k}}\right). \qquad (5)$$
Today, Mathematica can identify the sum immediately and directly, but back in 2000,
when this problem was posed, Mathematica was not as smart. Let’s see how experi-
mental math can help us identify the sum.
The sum converges slowly (the general term is about $1/k^{3/2}$), so brute force does not work. There is a very precise asymptotic formula (an extension of Stirling's formula; see, for example, [24, p. 140, formula 5.11.1]) for $\ln k!$, which begins
$$\ln k! = \left(k + \tfrac12\right)\ln k - k + \tfrac12\ln(2\pi) + \frac{1}{12k} - \frac{1}{360k^3} + \frac{1}{1260k^5} - \cdots.$$
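We therefore have
$$\begin{aligned}
\frac{k^k}{k!\,e^k} &= \frac{1}{\sqrt{2\pi k}}\exp\left(-\frac{1}{12k} + \frac{1}{360k^3} - \frac{1}{1260k^5} + \cdots\right)\\
&= \frac{1}{\sqrt{2\pi k}}\left(1 - \frac{1}{12k} + \frac{1}{288k^2} + \frac{139}{51840k^3} - \frac{571}{2488320k^4} + \cdots\right).
\end{aligned}$$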
The first term of this will cancel with the other term in the sum (5), and we can use as many of the remaining terms as we think are useful. Using three terms reduces the required numerical work to a reasonable level for a computer. Write
$$a_k = -\frac{1}{12k} + \frac{1}{288k^2} + \frac{139}{51840k^3}$$
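so that we have
$$\begin{aligned}
\sum_{k=1}^{\infty}\left(\frac{k^k}{k!\,e^k} - \frac{1}{\sqrt{2\pi k}}\right)
&= \sum_{k=1}^{\infty}\left(\frac{k^k}{k!\,e^k} - \frac{1}{\sqrt{2\pi k}}(1 + a_k)\right) + \frac{1}{\sqrt{2\pi}}\sum_{k=1}^{\infty}\frac{a_k}{\sqrt{k}}\\
&= \sum_{k=1}^{\infty}\left(\frac{k^k}{k!\,e^k} - \frac{1}{\sqrt{2\pi k}}(1 + a_k)\right) && (6)\\
&\quad + \frac{1}{\sqrt{2\pi}}\left(-\frac{1}{12}\zeta\!\left(\tfrac32\right) + \frac{1}{288}\zeta\!\left(\tfrac52\right) + \frac{139}{51840}\zeta\!\left(\tfrac72\right)\right), && (7)
\end{aligned}$$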
where ζ is the Riemann zeta function. The zeta series converge slowly too, but a lot is known about them and how to calculate them more quickly, and we can let Mathematica figure them for us. To 25 decimals, we get
$$(7) = -0.08378540362877196918178047.$$
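Two pieces of this computation are easy to check in plain Python.

- The coefficients 1, −1/12, 1/288, 139/51840, −571/2488320 follow from exponentiating the Stirling correction exactly with `fractions.Fraction`.
- The sum (6) can be added up directly in double precision, using `math.lgamma` for $\ln k!$. Its terms are $O(k^{-9/2})$, so 200 terms (our choice) are plenty at this precision.

The helper names are hypothetical, and this sketch does not compute the zeta values in (7). Subtracting the value of (7) above from the value of (5) reported in this section predicts (6) ≈ −0.000284105.

```python
import math
from fractions import Fraction

def exp_series(f, order):
    """Coefficients g_0..g_order of exp(f(t)) for a power series f with f[0] = 0.

    Uses g' = f' g, i.e. n g_n = sum_{k=1}^{n} k f_k g_{n-k}.
    """
    g = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        g[n] = sum(k * f[k] * g[n - k] for k in range(1, min(n, len(f) - 1) + 1)) / n
    return g

# Stirling correction -1/(12k) + 1/(360k^3) - 1/(1260k^5), in powers of t = 1/k
f = [Fraction(0), Fraction(-1, 12), Fraction(0), Fraction(1, 360), Fraction(0), Fraction(-1, 1260)]
coefficients = exp_series(f, 4)
print(coefficients)

def term_ratio(k):
    """k^k / (k! e^k), evaluated through logarithms to avoid overflow."""
    return math.exp(k * math.log(k) - math.lgamma(k + 1) - k)

def a_k(k):
    return -1 / (12 * k) + 1 / (288 * k ** 2) + 139 / (51840 * k ** 3)

def sum_6(terms=200):
    """Partial sum of (6)."""
    return sum(term_ratio(k) - (1 + a_k(k)) / math.sqrt(2 * math.pi * k)
               for k in range(1, terms + 1))

print(sum_6())
```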
$$\frac{571}{2488320\sqrt{2\pi}}\sum_{k=N}^{\infty}\frac{1}{k^{9/2}} \approx \frac{571}{(7/2)\,2488320\sqrt{2\pi}\,N^{7/2}} < \frac{10^{-4}}{N^{7/2}}.$$
$$(5) \approx -0.0840695087276559961999.$$
This explains the second part of the answer, so we would now need to show
$$\lim_{n\to\infty}\left(\sum_{k=1}^{n}\frac{k^k}{k!\,e^k} - \frac{2}{\sqrt{2\pi}}\sqrt{n}\right) = -\frac{2}{3}.$$
The sum here is also a limiting case of a known function, in this case the Lambert $W$-function (see, for example, [24], section 4.13, p. 111), whose power series expansion is
$$W(z) = \sum_{k=1}^{\infty}(-1)^{k-1}\frac{k^{k-1}}{k!}z^k.$$
Formally, we want to study the derivative at $z = -1/e$, but this point is on the circle of convergence and the series for $W'(z)$ does not converge there, so we have to work inside the circle and take a limit. This can be done by appealing to properties of this function; the details are in the published solution [18]. Another experimental math treatment of this problem is in [10, pp. 81–85].
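$$\sum_{n=0}^{\infty}(-1)^n\left(\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{n+k}\right)^2. \qquad (8)$$
$$\sum_{k=0}^{\infty}(-1)^k a_k = \sum_{n=0}^{\infty}\frac{\Delta^n a_0}{2^{n+1}} \qquad (\text{here } \Delta a_k = a_k - a_{k+1}).$$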
Our particular example is worked out in [17, p. 246, Example 1], where we find
$$\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{n+k} = \sum_{k=0}^{\infty}\frac{1}{2^{k+1}(n+k+1)\binom{n+k}{k}}. \qquad (9)$$
The right-hand side converges quickly, and to get 15 decimals, we only need about 50 terms.
The square of the inner sum is about $1/(4n^2)$, and the outer sum is an alternating series, so we would need about $10^7$ or $10^8$ terms to get 15 decimals, and each of those has an inner sum of 50 terms. That is a lot of terms, and we need a better way.
We will attempt to get a good value with much less work by using the following observation. We know that the partial sums of an alternating series lie alternately above and below the series value (and that the error is less than the first omitted term). Empirically, it is further true that for series with slowly and smoothly decreasing terms, the series value is almost exactly halfway between two successive partial sums (or, what is the same, the series value is almost exactly the partial sum plus half the first omitted term). To take a simple example, $\ln 2 = 0.693147$. The first 100 terms of the series $\ln 2 = \sum_{k=1}^{\infty}(-1)^{k-1}/k$ give a poor approximation of 0.688172, but adding half the next term gives the much better approximation 0.693123. (This heuristic observation has been worked out in more generality and detail as the method of “repeated averaging”; see, for example, [11, p. 72] and [12, p. 278].)
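The $\ln 2$ example takes only a few lines of Python to reproduce; `N = 100` matches the text.

```python
import math

N = 100
partial = sum((-1) ** (k - 1) / k for k in range(1, N + 1))
averaged = partial + 0.5 * (-1) ** N / (N + 1)   # add half the first omitted term
print(f"{partial:.6f} {averaged:.6f} {math.log(2):.6f}")   # 0.688172 0.693123 0.693147
```

The averaged value is about 200 times closer to $\ln 2$ than the raw partial sum.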
Our method is to truncate the outer sum of (8) after 100,000 terms, and estimate each term (and the first omitted term) using Euler's transformation (9) with 50 terms. That is, write
$$d_n = \sum_{k=0}^{50}\frac{1}{2^{k+1}(n+k+1)\binom{n+k}{k}},$$
and sum all the included terms and add half the next term, giving to 25 digits
$$\begin{aligned}
(8) &\approx \sum_{n=0}^{10^5-1}(-1)^n d_n^2 + \frac12 d_{100000}^2\\
&= 0.411233516699556597589303 + 0.000000000012499875000313\\
&= 0.411233516712056472589616.
\end{aligned}$$
This takes about 40 seconds in Mathematica, which is reasonable. The ISC+ (with advanced lookup) identifies this as
$$\frac{\pi^2}{24} = 0.4112335167120566091181038,$$
which is plausible, especially because the terms in the outer sum are very nearly $1/(4n^2)$. However, naively applying this estimate to the sum gives $\sum_{n=1}^{\infty}(-1)^{n-1}\frac{1}{4n^2} = \frac18\zeta(2) = \frac{\pi^2}{48}$, only half the calculated value, and it is not clear how $\zeta(2)$ might be generated.
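The whole scheme can be sketched in Python with ordinary floats: Euler's transformation (9) for each inner sum, then truncation of the outer sum plus half the first omitted term. To keep the run short, this sketch departs from the text in two ways, both our own choices:

- it truncates after $N = 10{,}000$ outer terms rather than 100,000;
- it stops each inner sum early once its terms are negligible.

Expect agreement with $\pi^2/24$ to roughly 10 digits or better, not the 25 digits above.

```python
import math

def inner_sum(n, terms=50):
    """d_n: Euler-transformed inner sum (9), truncated after k = terms."""
    total = 0.0
    binom = 1.0   # C(n+k, k) at k = 0
    power = 2.0   # 2^(k+1) at k = 0
    for k in range(terms + 1):
        term = 1.0 / (power * (n + k + 1) * binom)
        total += term
        if term < 1e-18 * total:          # remaining terms are negligible
            break
        binom *= (n + k + 1) / (k + 1)    # C(n+k+1, k+1) from C(n+k, k)
        power *= 2.0
    return total

def outer_sum(N):
    """Truncate (8) after N terms and add half of the first omitted term."""
    s = sum((-1) ** n * inner_sum(n) ** 2 for n in range(N))
    return s + 0.5 * (-1) ** N * inner_sum(N) ** 2

print(outer_sum(10_000), math.pi ** 2 / 24)
```

As a sanity check, `inner_sum(0)` is the Euler-transformed series for $\ln 2$.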
However, thinking about the double (or triple) series and rummaging through zeta function lore might make us think of Tom Apostol's evaluation [1] of $\zeta(2)$ using the double integral
$$\zeta(2) = \int_0^1\int_0^1\frac{1}{1-xy}\,dx\,dy.$$
(This method appeared earlier as an exercise in LeVeque [20, Section 6-10, exercise 6, p. 122], and later Apostol independently rediscovered it and popularized it.) Apostol then used an extremely clever change of variables to evaluate the integral. It is easy to turn our sum into a double integral, too, and it looks a little like Apostol's:
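$$\begin{aligned}
\sum_{n=0}^{\infty}(-1)^n\left(\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{n+k}\right)^2
&= \sum_{n=0}^{\infty}(-1)^n\left(\int_0^1\sum_{k=1}^{\infty}(-1)^{k-1}x^{n+k-1}\,dx\right)^2\\
&= \sum_{n=0}^{\infty}(-1)^n\left(\int_0^1\frac{x^n}{1+x}\,dx\right)^2\\
&= \sum_{n=0}^{\infty}\int_0^1\int_0^1\frac{(-1)^n x^n y^n}{(1+x)(1+y)}\,dx\,dy\\
&= \int_0^1\int_0^1\frac{1}{(1+x)(1+y)(1+xy)}\,dx\,dy. && (10)
\end{aligned}$$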
Somewhat miraculously, Mathematica knows the value of this integral: π 2 /24, which
confirms our guess. If we trust Mathematica, our job is done!
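Without a computer algebra system, the double integral (10) can at least be confirmed numerically. This sketch applies a composite Simpson rule in each variable. The rule and the grid of 200 subintervals per axis are our own choices.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x, y):
    return 1 / ((1 + x) * (1 + y) * (1 + x * y))

n = 200
value = simpson(lambda x: simpson(lambda y: integrand(x, y), 0.0, 1.0, n), 0.0, 1.0, n)
print(value, math.pi ** 2 / 24)
```

The integrand is smooth on the closed square, so Simpson's rule converges quickly here; this is evidence, not a proof.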
$$\operatorname{Li}_2(x) = \sum_{n=1}^{\infty}\frac{x^n}{n^2} = -\int_0^x\frac{\ln(1-t)}{t}\,dt. \qquad (11)$$
We can verify the indefinite integral by hand by differentiating, but it is easier to work forward now that we have the hint of using $\operatorname{Li}_2$. We expand the integrand of (10) in partial fractions twice to get
$$\begin{aligned}
\int_0^1\int_0^1\frac{1}{(1+x)(1+y)(1+xy)}\,dy\,dx
&= \int_0^1\frac{1}{1-x^2}\int_0^1\left(\frac{1}{1+y} - \frac{x}{1+xy}\right)dy\,dx\\
&= \int_0^1\frac{1}{1-x^2}\bigl(\ln 2 - \ln(1+x)\bigr)\,dx\\
&= \frac12\int_0^1\frac{1}{1+x}\bigl(\ln 2 - \ln(1+x)\bigr)\,dx + \frac12\int_0^1\frac{1}{1-x}\bigl(\ln 2 - \ln(1+x)\bigr)\,dx.
\end{aligned}$$
The first integral is easily evaluated as $\frac12\ln^2 2$. To evaluate the second integral, we make the change of variables $x = 1 - 2t$ to get
$$\int_0^1\frac{1}{1-x}\bigl(\ln 2 - \ln(1+x)\bigr)\,dx = \int_{1/2}^{0}\frac{1}{2t}\bigl(-\ln(1-t)\bigr)(-2\,dt) = \operatorname{Li}_2\!\left(\tfrac12\right) - \operatorname{Li}_2(0) = \frac{\pi^2}{12} - \frac12\ln^2 2,$$
where we have used the value $\operatorname{Li}_2(0) = 0$ from the definition (11), and the value $\operatorname{Li}_2(\frac12) = \pi^2/12 - \frac12\ln^2 2$ that comes from setting $x = \frac12$ in the functional equation (see [24, p. 611, formula 25.12.6]):
$$\operatorname{Li}_2(x) + \operatorname{Li}_2(1-x) = \frac16\pi^2 - (\ln x)\bigl(\ln(1-x)\bigr) \quad\text{for } 0 < x < 1.$$
Combining this with the first integral, the $\ln^2 2$ terms cancel and we are left with $(10) = \pi^2/24$.
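Here is a quick numerical check of the dilogarithm value and of the final cancellation, summing the series in (11) at $x = 1/2$. The 59-term cutoff is our choice; the terms shrink like $2^{-n}$.

```python
import math

li2_half = sum(0.5 ** n / n ** 2 for n in range(1, 60))   # series (11) at x = 1/2
closed = math.pi ** 2 / 12 - 0.5 * math.log(2) ** 2
# Half of the first integral plus half of the second should give pi^2 / 24
total = 0.5 * (0.5 * math.log(2) ** 2) + 0.5 * li2_half
print(li2_half, closed, total, math.pi ** 2 / 24)
```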
7. A RAPIDLY CONVERGING PRODUCT. MONTHLY problem 11677 [25] asks for an evaluation of
$$P = \prod_{n=1}^{\infty}\left(1 + 2e^{-n\pi\sqrt{3}}\cosh\frac{n\pi}{\sqrt{3}}\right).$$
so we get about 1.5 significant digits for each term we take in the product.
Taking the first 15 terms and calculating to 25 digits, we get
$$P \approx 1.028032541689576770462884.$$
But now we hit a snag: We ask ISC+ about this, and it says it found nothing, both in the
standard lookup and the advanced lookup. (We asked on March 18, 2016; the database
is updated continually, and ISC+ may someday be able to identify this constant.)
Because the item we seek is a product, we wonder if we would have better luck working with its logarithm, $\ln P = \sum_{n=1}^{\infty}\ln(1 + 2a_n)$. Taking the first 15 terms of this and calculating to 25 digits, we get
$$\ln P \approx 0.02764682187200888558353500.$$
This has the same problem, though: ISC+ cannot find it.
The ISC+ lookups almost always work for MONTHLY problems, perhaps because those usually have neat answers, but this is an exception, and we look at other methods to identify the number. Testing whether it is an algebraic integer using RootApproximant does not produce any useful answers. It does misidentify the 25-digit version of $\ln P$ as
$$\frac{5657351 - \sqrt{29079344023205}}{9578834},$$
which agrees to 24 digits but is not correct. We will try integer relation detection.
There are two challenges to using integer relation detection. The first is that often the desired number must be calculated to a very high precision, sometimes to hundreds of digits. For our example, this is not much of a problem because the product converges so rapidly. The second is guessing which constants should go into a linear combination to get the desired number. These guesses are based on experience and similar expressions for which we know the constants. In MONTHLY problems we are not given the context, and we may not have any experience with the particular expressions, so guessing the constants may be especially challenging.
We are going to work with $\ln P$ again. We do not have much idea what constants to use, but we will guess that we should include the constants that appear explicitly in the product, namely $\pi$, $\sqrt{3}$, and $\pi\sqrt{3}$, and their logarithms, $\ln\pi$ and $\ln 3$. (Do not use both $\pi\sqrt{3}$ and $\pi/\sqrt{3}$ because one is a rational multiple of the other, and do not use both $\ln 3$ and $\ln\sqrt{3}$, for the same reason.) A good rule of thumb when looking for a logarithm
in other words
$$P = \frac{e^{\pi\sqrt{3}/18}}{\sqrt[4]{3}}. \qquad (12)$$
We test this against the product with 200 terms and find they agree to about 317 digits,
so we conjecture that this is the correct value of the product.
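The product and the relation behind (12) can be reproduced in double precision. This Python sketch evaluates 15 factors of $P$, writing $2e^{-n\pi\sqrt3}\cosh(n\pi/\sqrt3)$ as a sum of two exponentials. It then runs a naive brute-force search for a small integer relation among $\ln P$, $\pi\sqrt{3}$, and $\ln 3$. This is not PSLQ or LLL. Restricting the search to these three constants, with coefficients of size at most 40, is our own simplification. With the full candidate list above, brute force would be hopeless, which is why real integer relation algorithms matter. The relation found, $36\ln P - 2\pi\sqrt3 + 9\ln 3 = 0$, is equivalent to (12).

```python
import itertools
import math

def product_P(terms=15):
    """Partial product for P; 2 e^{-n pi sqrt3} cosh(n pi/sqrt3) as two exponentials."""
    r3 = math.sqrt(3)
    p = 1.0
    for n in range(1, terms + 1):
        two_a_n = (math.exp(-n * math.pi * (r3 - 1 / r3))
                   + math.exp(-n * math.pi * (r3 + 1 / r3)))
        p *= 1 + two_a_n
    return p

def small_relation(values, bound=40, tol=1e-9):
    """Brute-force search for integers (c0, c1, ...) with c0 > 0 and
    |sum c_i v_i| < tol.  Only viable for a handful of values."""
    for first in range(1, bound + 1):
        for rest in itertools.product(range(-bound, bound + 1), repeat=len(values) - 1):
            coeffs = (first,) + rest
            if abs(sum(c * v for c, v in zip(coeffs, values))) < tol:
                return coeffs
    return None

P = product_P()
candidates = [math.log(P), math.pi * math.sqrt(3), math.log(3)]
print(P)
print(small_relation(candidates))   # (36, -2, 9)
```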
Unfortunately, the explicit answer does not seem to point to any method of proof. One oddity that might catch our eye is the 18th root; that is, in the product we have a term with $\exp(\pi\sqrt{3})$, but in the final answer, it appears to the 1/18 power. If we know a lot about special functions, this might remind us of the modular functions and in particular of the Dedekind eta function, which includes a 1/12 power and which appears in a discriminant formula to the 24th power:
$$\eta(\tau) = e^{\pi i\tau/12}\prod_{n=1}^{\infty}\left(1 - e^{2\pi i n\tau}\right), \qquad \operatorname{Im}\tau > 0.$$
This turns out to be the key observation, as it is possible to express the given product in terms of a ratio of eta function values, and a functional equation allows us to express the ratio as $1/\sqrt[4]{3}$. The complete solution is in [25].
ACKNOWLEDGMENT. Many thanks to the referees for their thorough reviews and helpful comments.
REFERENCES
1. T. M. Apostol, A proof that Euler missed: Evaluating ζ (2) the easy way, Math. Intelligencer 5 no. 3
(1983) 59–60, http://dx.doi.org/10.1007/BF03026576.
2. D. H. Bailey, J. M. Borwein, Experimental Math Website, http://experimentalmath.info/.
3. D. H. Bailey, J. M. Borwein, N. J. Calkin, R. Girgensohn, D. R. Luke, V. H. Moll, Experimental Mathematics in Action. A K Peters, Wellesley, MA, 2007.
4. D. H. Bailey, J. M. Borwein, V. Kapoor, E. W. Weisstein, Ten problems in experimental mathematics,
Amer. Math. Monthly, 113 (2006) 481–509, http://dx.doi.org/10.2307/27641975.
5. D. Beckwith, L. Zhou, A determinant by Möbius inversion: 11179. Amer. Math. Monthly, 114 (2007)
550, http://www.jstor.org/stable/27642263.
6. J. M. Borwein, D. H. Bailey, Mathematics by Experiment: Plausible Reasoning in the 21st Century.
Second ed. A K Peters, Wellesley, MA, 2008.
7. J. M. Borwein, D. H. Bailey, R. Girgensohn, Experimentation in Mathematics: Computational Paths to
Discovery. A K Peters, Natick, MA, 2004.
8. P. Borwein, J. Borwein, S. Plouffe, Inverse Symbolic Calculator Plus at University of Newcastle
(Australia), maintained at the University of Newcastle, https://isc.carma.newcastle.edu.au/.
9. J. M. Borwein, P. B. Borwein, Strange series and high precision fraud. Amer. Math. Monthly, 99 (1992)
622–640, http://dx.doi.org/10.2307/2324993.
10. J. M. Borwein, K. Devlin, The Computer as Crucible: An Introduction to Experimental Mathematics.
A K Peters, Wellesley, MA, 2009.
11. G. Dahlquist, Å. Björck, Numerical Methods. Dover Publications, Mineola, NY, 2003.
12. G. Dahlquist, Å. Björck, Numerical Methods in Scientific Computing. Vol. I. Society for Industrial and
Applied Mathematics, Philadelphia, 2008.
13. P. P. Dályay, H. Widmer, Evaluate a series: 11604, Amer. Math. Monthly, 120 (2013) 476, http://dx.doi.org/10.4169/amer.math.monthly.120.05.469.
14. N. G. de Bruijn, Asymptotic Methods in Analysis. Corrected reprint of the Third (1970) ed. Dover Publications, New York, 1981.
15. H. R. P. Ferguson, D. H. Bailey, S. Arno, Analysis of PSLQ, an integer relation finding algorithm. Math.
Comp., 68 (1999) 351–369, http://dx.doi.org/10.1090/S0025-5718-99-00995-3.
16. O. Furdui, B. Bradie, An alternating sum of squares of alternating sums: 11682, Amer. Math. Monthly,
122 (2015) 78–79, http://dx.doi.org/10.4169/amer.math.monthly.122.01.75.
17. K. Knopp, Theory and Application of Infinite Series. Second ed. Dover Publications, New York, 1990.
18. D. E. Knuth, C. C. Rousseau, A Stirling series: 10832, Amer. Math. Monthly, 108 (2001) 877–878,
http://dx.doi.org/10.2307/2695574.
19. A. K. Lenstra, H. W. Lenstra, Jr., L. Lovász, Factoring polynomials with rational coefficients, Math. Ann.,
261 (1982) 515–534, http://dx.doi.org/10.1007/BF01457454.
20. W. J. LeVeque, Topics in Number Theory. Vol. I. Addison-Wesley Publishing Co., Reading, MA, 1956.
21. I. Nemes, M. Petkovšek, H. Wilf, D. Zeilberger, How to do MONTHLY problems with your computer, Amer. Math. Monthly, 104 (1997) 505–519, http://dx.doi.org/10.2307/2975078.
22. OEIS Foundation Inc., The On-Line Encyclopedia of Integer Sequences (2011), http://oeis.org/.
23. H. Ohtsuka, Problem proposed: 11853, Amer. Math. Monthly, 122 (2015) 700, http://dx.doi.org/10.4169/amer.math.monthly.122.7.700.
24. NIST Handbook of Mathematical Functions, Eds. F. W. J. Olver, D. W. Lozier, R. F. Boisvert, C. W. Clark.
Cambridge Univ. Press, New York, 2010. Also online at http://dlmf.nist.gov.
25. A. Stadler, R. Boukharfane, Dedekind η function disguised: 11677. Amer. Math. Monthly, 121 (2014)
951–952, http://dx.doi.org/10.4169/amer.math.monthly.121.10.946.
26. E. C. Titchmarsh, The Theory of the Riemann Zeta Function. Second ed. Ed. and with a preface by
D. R. Heath-Brown. The Clarendon Press, Oxford Univ. Press, New York, 1986.
ALLEN STENGER is a retired software developer and current math hobbyist. He received a B.S. in mathematics from Emory and an M.A. in mathematics from Penn State. He is an editor of the Missouri Journal of Mathematical Sciences. His mathematical interests are number theory and classical analysis. This paper is expanded from a talk he gave at the 2013 MAA Southwestern Section meeting in Socorro, New Mexico.
2892 95th St., Boulder, CO 80301-4916
StenBiz@gmail.com
Abstract. An additive system is a collection of sets that gives a unique way to represent either
all nonnegative integers, or all nonnegative integers up to some maximum. A structure theorem
of de Bruijn gives a certain form for an additive system of infinite size. This form is not, in
general, unique. We improve de Bruijn’s theorem by finding a unique form for an additive
system of arbitrary size. Our proof gives a concrete construction that allows us to test easily
whether a collection of sets is an additive system. We also show how to determine most of the
structure of an additive system if we are only given its union.
1, 2, 3, 4, 5, 6, 7, 8, 9
10, 20, 30, 40, 50, 60, 70, 80, 90
100, 200, 300, 400, 500, 600, 700, 800, 900
1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000
The key point is that any number can be written as this type of a sum, and this can
be done in one and only one way. This is the mathematical object we study in this
paper.
Definition. An additive system of infinite size is a collection of sets such that every
number can be written, in one and only one way, as a sum of numbers from the collec-
tion, with no two numbers selected from the same set.
http://dx.doi.org/10.4169/amer.math.monthly.124.2.132
MSC: Primary 11B13, Secondary 11A63
An important variation is when the sums only go up to a certain maximum. In the
next definition, note that we are looking at a finite collection of finite sets.
Definition. An additive system of size z is a collection of sets such that every number
less than z can be written, in one and only one way, as a sum of numbers from the
collection, with no two numbers selected from the same set. We also require that no
number greater than or equal to z can be written as such a sum.
In Figure 1, each selected number determines one nonzero digit of the sum, which is
why the collection is an additive system (of infinite size). If instead we limit ourselves
to the first three sets from Figure 1, this would be an additive system of size 1000
because the sums we get go from 0 to 999. (We justify 1000 as the size, rather than
999, by noting that the numbers from 0 to 999 are the 1000 smallest numbers.)
Each set from Figure 1 consists of all positive numbers with a single designated
nonzero digit. The collection is an additive system because every number can be writ-
ten uniquely in the decimal (base 10) system. A binary (base 2) variation would be the
collection of the sets {1}, {2}, {4}, {8}, {16}, {32}, . . . . This is another additive system
of infinite size. If we took, say, only the first four of these sets, we would instead get
an additive system of size 16.
Now consider the collection of five sets from Figure 2. It may not be as obvious
that this is another additive system of infinite size, because it is something of a hybrid
between base 5 and binary. Things become clearer if we consider a cashier using U.S.
currency. The cashier has five drawers of coins, each of which has an endless sup-
ply of a certain type of coin. The first drawer contains pennies (each worth 1 cent),
the next contains nickels (each worth 5 cents), and the other three drawers contain
quarters, half-dollars, and dollar coins (each worth 25 cents, 50 cents, and 100 cents,
respectively). If the cashier wishes to get a specified amount of money as efficiently
as possible, the value of the pennies is an element of the first set, {1, 2, 3, 4}, the value
of the nickels is in the second set, {5, 10, 15, 20}, and so on. (We measure value in
cents here.) For example, to make $27.43, the cashier uses $27 in dollar coins, $0.25
in quarters, $0.15 in nickels, and $0.03 in pennies. In terms of our additive system,
2743 = 3 + 15 + 25 + 2700. This is why the collection from Figure 2 is another addi-
tive system of infinite size. Taking only the first three sets would give an additive
system of size 50 because our cashier could uniquely create any value from 0 to 49.
{1, 2, 3, 4}
{5, 10, 15, 20}
{25}
{50}
{100, 200, 300, 400, 500, . . .}
Figure 2. An additive system built on U.S. coins.
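The cashier's rule is the usual greedy one: start with the most valuable coin and take as many coins as fit into what remains. Below is a minimal Python sketch. The names `COINS` and `cashier` are ours, not from the text. The result lists the value taken from each drawer in the order of Figure 2, with 0 for an unused drawer. The dollar coins are never limited, which matches the infinite final set.

```python
COINS = [1, 5, 25, 50, 100]  # cents: pennies, nickels, quarters, half-dollars, dollar coins


def cashier(amount):
    """Value in cents taken from each drawer, listed in the order of Figure 2.

    Greedy: starting from the most valuable coin, take as many coins as fit into
    what is left. Each nonzero value lies in the matching set of Figure 2.
    """
    taken = [0] * len(COINS)
    for i in reversed(range(len(COINS))):
        taken[i] = (amount // COINS[i]) * COINS[i]
        amount -= taken[i]
    return taken


print(cashier(2743))  # [3, 15, 25, 0, 2700]
```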
Exercise. Suppose the cashier above uses dimes (each worth 10 cents) instead of half-dollars. If we list the coin types in increasing value, the values that each type can contribute then form the collection from Figure 3. Show that this collection is not an additive system by showing that 26 can be written in two ways.
In Figures 1 and 2, each set consists of consecutive positive multiples of its smallest
element. Furthermore, the smallest element of the first set is 1, and the smallest
element of every other set is the first multiple of the previous set's smallest element
that does not appear in that previous set. What we are saying, more or less, is that these additive systems generalize
place values. Not all additive systems follow this pattern, but those that do are called
British number systems. (Recall that our full definitions will come later.) The collection
from Figure 3 is not a British number system since 25 is not the first multiple of 10 that
fails to be in {10, 20}. Of course, the collection from Figure 3 is not even an additive
system. There do exist additive systems that are not British number systems. Let us
look at a couple of examples.
Figure 4 involves some unusual notation. The idea here is that we have two sets. The
first consists of all positive numbers with all “even” digits zero. The second consists of
all positive numbers with all “odd” digits zero. More precisely, the numbers indicated
by · · · 0⋄0⋄0⋄0⋄ are those whose 10s digit, 1000s digit, 100 000s digit, etc. are
all zero; each “⋄” can be replaced by any digit. So this set contains 408 and 901 and
20 407. With · · · ⋄0⋄0⋄0⋄0, on the other hand, we have numbers in which the
1s digit, the 100s digit, the 10 000s digit, etc. are all zero. So 50 and 8090 are in the
second set, as is 100 020. We allow any ⋄ to be replaced by zero, as long as they are
not all replaced by zero, since we specifically exclude zero from each of the two sets.
(We could instead have described the first set as consisting of the “full word” nonzero
integers that are “compatible” with the “partial word” · · · 0⋄0⋄0⋄0⋄; the second set
is based on the partial word · · · ⋄0⋄0⋄0⋄0. See, e.g., [1] for a complete explanation
of this terminology and its notation, which is close to the notation we use.)
· · · 0⋄0⋄0⋄0⋄0⋄
· · · ⋄0⋄0⋄0⋄0⋄0
Figure 4. An additive system that is not a British number system; either even or odd digits are zero.
Although 4567 is in neither set, we can write it as a sum with one number from each
set: 4567 = 507 + 4060, since 507 is in the first set and 4060 in the second. Similarly,
7 672 091 = 7 070 001 + 602 090. These are the only ways to write this type of sum
for 4567 or 7 672 091. So, reasoning as we did for Figure 1, this is an additive system
of infinite size, because the digits tell us the one and only one way to write a sum
for a given number. The collection is not a British number system because neither set
consists only of consecutive multiples of its smallest element. Also note that neither
of the sets from Figure 4 is, by itself, an additive system.
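Every digit goes to exactly one of the two sets, so the unique decomposition can be read off digit by digit. Here is a minimal Python sketch; the function name is ours. A part equal to 0 means that set is not used in the sum.

```python
def split_alternate_digits(n):
    """Write n = first + second, where `first` keeps the 1s, 100s, 10 000s, ...
    digits of n and `second` keeps the 10s, 1000s, 100 000s, ... digits."""
    first = second = 0
    place = 1      # current power of 10
    position = 0   # 0 for the 1s digit, 1 for the 10s digit, ...
    while n > 0:
        n, digit = divmod(n, 10)
        if position % 2 == 0:
            first += digit * place
        else:
            second += digit * place
        place *= 10
        position += 1
    return first, second


print(split_alternate_digits(4567))     # (507, 4060)
print(split_alternate_digits(7672091))  # (7070001, 602090)
```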
The collection from Figure 5 is another example. Reasoning as with the collec-
tion from Figure 4, this collection is not a British number system. To see that it is
an additive system, compare it to the additive system from Figure 2. What if the
cashier of that example has only three drawers? The first drawer has pennies and half-
dollars, the second has nickels, and the third has quarters and dollar coins. When the
cashier continues to produce specified amounts of money as efficiently as possible,
each set from Figure 5 gives the value that can come out of one drawer. For example,
whereas with Figure 2, 794 is the sum 4 + 15 + 25 + 50 + 700, with Figure 5, we
have 794 = 54 + 15 + 725. That is, whereas the cashier originally created $7.94 with
$0.04, $0.15, $0.25, $0.50, and $7.00 from the five drawers, now from the three draw-
ers $0.54, $0.15, and $7.25 are used.
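One way to model the three-drawer cashier in code is to run the five-drawer computation first. Then add together the values of coin types that now share a drawer. Below is a Python sketch. `MERGED` is our encoding of the drawer layout just described: pennies with half-dollars, nickels alone, and quarters with dollar coins.

```python
COINS = [1, 5, 25, 50, 100]  # pennies, nickels, quarters, half-dollars, dollar coins


def five_drawers(amount):
    """Greedy cashier for Figure 2: value (in cents) taken from each coin type."""
    taken = [0] * len(COINS)
    for i in reversed(range(len(COINS))):
        taken[i] = (amount // COINS[i]) * COINS[i]
        amount -= taken[i]
    return taken


# Indices (into COINS) of the coin types sharing each of the three drawers of Figure 5.
MERGED = [(0, 3), (1,), (2, 4)]


def three_drawers(amount):
    """Value taken from each drawer when coin types are associated as in Figure 5."""
    taken = five_drawers(amount)
    return [sum(taken[i] for i in group) for group in MERGED]


print(five_drawers(794))   # [4, 15, 25, 50, 700]
print(three_drawers(794))  # [54, 15, 725]
```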
We can always form a new additive system from another by “associating” some
sets together. This is how we created Figure 5 from Figure 2. To be more specific,
associating sets means replacing them with a single set. We create the new set by
taking all positive sums that use at most one number from each of the replaced sets. For example,
the first set from Figure 5 consists of the sums using the first set (pennies) and fourth
set (half-dollars) from Figure 2. Figure 4 is another example of an association. We
created it from Figure 1 by associating every other set together. In general, when we
associate sets together, we call the new additive system a quotient.
Some quotients are more useful than others, and this turns out to be a crucial obser-
vation. If we begin with a British number system, then it is not so useful to associate
sets that are adjacent. For example, if from Figure 2 we associated quarters and half-
dollars, two adjacent denominations, then the sets {25} and {50} would be replaced by
{25, 50, 75}. The new quotient is not especially interesting because the new collection
is just another British number system, namely the one in which our cashier has no
half-dollars. The useful quotients are those in which we only associate sets that are not
adjacent, and we define a term for this. We say such a quotient is mixed. For example,
Figures 4 and 5 are mixed quotients of the additive systems from Figures 1 and 2,
respectively.
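For finite sets, association and the mixed condition are easy to express in code. In this Python sketch, all names are ours. The sets are indexed in the order of the collection, and a list of index groups says which sets are associated. A quotient is mixed when no group contains two adjacent indices. The example reproduces the first set of Figure 5, restricted to the first four sets of Figure 2. It also reproduces the set {25, 50, 75} from the paragraph above.

```python
from itertools import product


def associate(collection, group):
    """Replace the sets with indices in `group` by one set: all positive sums
    that use at most one number from each of those sets."""
    choices = [[0] + sorted(collection[i]) for i in group]
    return {sum(pick) for pick in product(*choices)} - {0}


def quotient(collection, groups):
    """Associate the sets in each group; `groups` partitions the indices."""
    return [associate(collection, group) for group in groups]


def is_mixed(groups):
    """True if no group contains two adjacent indices i and i + 1."""
    return all(i + 1 not in group for group in groups for i in group)


figure2 = [{1, 2, 3, 4}, {5, 10, 15, 20}, {25}, {50}]  # first four sets of Figure 2
print(sorted(associate(figure2, (0, 3))))  # [1, 2, 3, 4, 50, 51, 52, 53, 54]
print(sorted(associate(figure2, (2, 3))))  # [25, 50, 75]
print(is_mixed([(0, 3), (1,), (2,)]))      # True
print(is_mixed([(0,), (1,), (2, 3)]))      # False
```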
We can now state our fundamental structure theorem.
As mentioned above, this is similar to a theorem of de Bruijn [2], but ours goes
further because de Bruijn’s form is not unique. We get uniqueness by using slightly
different definitions. As in Figure 2, our British number systems can include an infi-
nite set, provided that it is the final one. On the other hand, in de Bruijn’s definition,
every British number system is a collection of finite sets, such as in Figure 1. Our
uniqueness also depends on limiting our quotients to ones that are mixed, something
that was unnecessary for de Bruijn’s result. Also note that de Bruijn’s theorem is only
for additive systems of infinite size. For additive systems of finite size, Theorem 1
is essentially present in Munagi’s work [3], which, in turn, depends on de Bruijn’s
theorem.
After giving his key lemma, de Bruijn states, “The theorem easily follows by
repeated applications of the following lemma.” More work is needed, however, to
prove the theorem from this lemma, and this is what Nathanson recently provided
in [4]. Nathanson’s proof, however, may be hard to apply to a particular additive
system, such as the collection from Figure 5. We avoid this problem by giving our
construction first, on its own. Then we prove that we have constructed the unique form
for the original additive system. It is also worth noting that our proof is shorter than
Nathanson’s.
Theorem 2. Suppose two additive systems have equal unions. If either system has
infinite size or if the two systems have equal size, then the additive systems are equal.
The hypotheses for Theorem 2 may seem a bit arbitrary, and indeed, we will later
weaken them. To see that different additive systems can have the same union, consider
the following.
Exercise. Find two distinct British number systems with union {1, 2, 3, 4, 5, 10, 15}.
Why does this not contradict Theorem 2?
As we will see in Section 6, given an additive system, or even just its union, there is
an easy way to determine if another additive system has the same union. It turns out,
however, that there never will be a third additive system with that union.
Before we begin the next section, let us discuss a couple of ways in which our
approach diverges from the work that others have done. First, our definition of an
additive system (which we make more precise shortly) is actually new, because our
collections contain only positive numbers. In earlier definitions, such sets always contain
zero. Rather than considering the sums as we described them above, those definitions
consider taking exactly one element from each set, with finitely many nonzero. Our
equivalent definition omits all zeros because we find it easier to discuss a sum t + u
rather than something like t + u + 0 + 0 + 0 + 0 + · · · , where we must include a zero
for each set we are not using.
Second, we should mention our choices for terminology. Unfortunately, the vocab-
ulary already used in this subject has been inconsistent. Of the terms we use, some
already exist and others we invented. Our goal has been to select terms that are both
appropriately descriptive and concise.
Our use of additive system comes from Nathanson [4], who also uses unique rep-
resentation system. Such a collection is called a number system by de Bruijn [2] and
a complementing system of subsets by Munagi [3]. Munagi uses the term usual to
describe the additive systems that de Bruijn and Nathanson call British number sys-
tems. As mentioned above, our British number systems are related but somewhat dif-
ferent. Finally, our quotient is what de Bruijn and Munagi call a degeneration, but
Nathanson calls a contraction. We regret introducing yet another term, but feel that
the benefits are worth it. It is common for equivalence relations (on the sets of a col-
lection, in this case) to give rise to a quotient, one for each equivalence class. That
is exactly what happens here. For readers who know about such things, our defini-
tion indeed meets the category-theoretic requirements of a quotient, provided that we
define a morphism between two additive systems to be a map between the collections
in which each set of one collection goes to a set that contains it.
We now start again from the beginning, adding in the details we have omitted so
far.
This definition does not distinguish between ordered collections, e.g., $(A_i)_{0 \le i < I}$, and unordered collections, e.g., $\{A_s\}_{s \in S}$.
The empty collection is the unique additive system of size 1. The reader may wish to
verify that there are unique additive systems of size 2 and 3, but two additive systems
of size 4.
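The claim about sizes 2, 3, and 4 can be checked by brute force. Every element of the union of an additive system of size z is less than z, so it suffices to try every partition of every subset of {1, …, z − 1}. The sketch below is in Python and uses helper names of our own. It is exponential in z and is meant only for small cases.

```python
from itertools import combinations, product


def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                # `first` in a block of its own
        for i in range(len(partition)):            # or `first` joins block i
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def is_additive_system_of_size(collection, z):
    """The sampling sums must be exactly 0, 1, ..., z - 1, each obtained once."""
    choices = [[0] + block for block in collection]
    sums = sorted(sum(pick) for pick in product(*choices))
    return sums == list(range(z))


def additive_systems(z):
    """All additive systems of size z, by brute force (small z only)."""
    found = []
    numbers = list(range(1, z))  # every element of the union is less than z
    for r in range(len(numbers) + 1):
        for union in combinations(numbers, r):
            for collection in set_partitions(list(union)):
                if is_additive_system_of_size(collection, z):
                    found.append(collection)
    return found


for z in range(1, 5):
    print(z, additive_systems(z))
```

For z = 4, the two systems found are {1}, {2} and {1, 2, 3}.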
We can rephrase this definition as follows. A is an additive system of size z if and
only if the operation of summation is a bijective map from the set of A-samplings to
the set of nonnegative integers less than z.
We write |A| for the size of an additive system A.
Here is one way in which we can create an additive system from a smaller one.
Here is what happens if we begin with the empty collection and use Proposition 3
I times.
In a British number system $B = (B_i)_{0 \le i < I}$, the set $B_{i-1}$ is always finite for $1 \le i < I$, because otherwise $\min B_i$ could not satisfy the final condition above. For $I$ finite, however, it is possible to have $B_{I-1}$ infinite, as in Figure 2.
Note that $|B| = \infty$ if and only if either $I = \infty$, or $I < \infty$ and $B_{I-1}$ is an infinite set.
It also turns out that British number systems are characterized by (2) of Proposition 4,
up to ordering. See Theorem 8.
Proof. Define $b_i = \min B_i$ for $0 \le i < I$. Note that we always have $b_0 = 1$, and that if $I = \infty$, then $\lim_{i \to \infty} b_i = \infty = b_I$.
For $0 \le j \le I$ and $j < \infty$, define $B^j = (B_i)_{0 \le i < j}$. Then $B^j$ is an additive system of size $b_j$. This is trivial for $j = 0$, and then follows by induction using Proposition 3. This completes (1) if $I < \infty$.
For the rest of (1) and also for (2), consider a nonnegative integer $n < b_I$. Fix $j$ as small as possible such that $n < b_j$. Notice that $j < \infty$. Since the $B$-samplings with sum $n$ are exactly the $B^j$-samplings with sum $n$, exactly one $B$-sampling $\sigma$ has sum $n$. This finishes (1) if $I = \infty$. Finally, for (2), suppose $\sigma$ is nonempty, so that $j \ge 1$ and $n \ge b_{j-1}$. Apply Proposition 3 with $A = B^{j-1}$ and $B = B_{j-1}$ to get that $\max \sigma$ is the largest element of $\cup B^j$ that is less than or equal to $n$. Since every element of $\cup B \setminus \cup B^j$ is larger than $n$, $\max \sigma$ is also the largest element of $\cup B$ that is less than or equal to $n$.
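The pattern of Figures 1 and 2 can be read as a recipe. Under that reading, a British number system with finite sets is determined by how many multiples each set contains. The proof above shows that the first j sets form an additive system of size $b_j$. Below is a Python sketch under that reading; the name `british_number_system` and its `counts` argument are ours. Each new set starts at the first multiple missing from the previous set. The size is the starting value that the next set would have.

```python
def british_number_system(counts):
    """Build a finite British number system from the number of elements of each set.

    Set i is {b, 2b, ..., k*b}, where b is its smallest element and k = counts[i].
    The first set starts at 1, and each later set starts at the first multiple
    of b missing from the previous set. Returns the sets and the size b_I.
    """
    sets, b = [], 1
    for k in counts:
        sets.append(set(range(b, (k + 1) * b, b)))
        b = (k + 1) * b  # first multiple of b not in the set just built
    return sets, b


print(british_number_system([9, 9, 9])[1])     # 1000 (first three sets of Figure 1)
print(british_number_system([1, 1, 1, 1])[1])  # 16
print(british_number_system([4, 4, 1])[1])     # 50 (first three sets of Figure 2)
```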
Given a British number system B and a positive integer n < |B|, the following
“greedy” algorithm allows us to find the B-sampling that has sum n. Say the desired
B-sampling is {s1 , s2 , . . . , st } with s1 > s2 > · · · > st . By (2) of Proposition 4, s1 is
the largest element of ∪B that is less than or equal to n. Having found s1 , . . . , si , either
s1 + · · · + si = n, in which case i = t, or we apply the same result to see that si+1 is
the largest element of ∪B that is less than or equal to n − (s1 + · · · + si ).
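The greedy procedure needs only the union. Below is a minimal Python sketch; the name `greedy_sampling` is ours. As in the text, it assumes that the union comes from a British number system B and that 0 ≤ n < |B|. It needs only the elements of the union that are at most n. For such inputs the chosen elements strictly decrease. So a single pass through the union in decreasing order makes the same choices as repeatedly searching for the largest element that fits.

```python
def greedy_sampling(union, n):
    """The B-sampling with sum n, found greedily from the union of B alone.

    Assumes the union comes from a British number system B, that 0 <= n < |B|,
    and that `union` contains every element of the union that is at most n.
    """
    sampling, remaining = [], n
    for s in sorted(union, reverse=True):  # largest elements first
        if s <= remaining:
            sampling.append(s)
            remaining -= s
    if remaining != 0:
        raise ValueError("n is not a sampling sum for this union")
    return sampling


figure2_union = {1, 2, 3, 4, 5, 10, 15, 20, 25, 50} | set(range(100, 2800, 100))
print(greedy_sampling(figure2_union, 2743))  # [2700, 25, 15, 3]
print(greedy_sampling({1, 2, 4, 8}, 13))     # [8, 4, 1]
```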
This statement, once proved, also allows us to determine whether the arbitrary col-
lection A is an additive system.
One direction is trivial, so assume A is an additive system. Since $B'_i = B'_j$ if and only if $A_i = A_j$, the assignment $B'_i \mapsto A_i$ gives a well-defined, one-to-one map from $B/\sim$ to $A$. We must show that this map is onto and that $B'_i = A_i$ for all $i$. Being additive systems,
A and B/ ∼ are each collections of disjoint nonempty sets of positive integers. Thus,
it is enough to show that the following holds for all positive integers n.
Let N be a positive integer, and assume this statement for all positive integers n < N .
One consequence of this inductive hypothesis is that if we have a set of positive inte-
gers less than N , then the set is an A-sampling if and only if it is a (B/ ∼)-sampling.
In order to reduce to N < |B/ ∼|, we show that if N = |B/ ∼|, then |A| =
|B/ ∼|. This implies our induction statement for all n ≥ N , since no such n is in any
set of A or of B/ ∼.
So suppose N = |B/ ∼|. Every positive integer n < N is a (B/ ∼)-sum, and thus,
by our induction hypothesis, also an A-sum, so |A| ≥ |B/ ∼|. If |A| > |B/ ∼| = N ,
then N is the sum of an A-sampling σ , which leads to the following contradiction.
By (2), N is not in any set of A (since |B| = |B/ ∼|). Thus, σ contains at least
two elements. But then every element of σ would be less than N , so our induction
hypothesis would imply that σ is a (B/ ∼)-sampling, contradicting N = |B/ ∼|. So
we assume N < |B/ ∼|.
If $N$ is an element of $\cup B$, say $N \in B_i$, then $N$ is both in $A_i$, by (1), and also in $B'_i$, by the definition of a quotient. Since this is what we need to show, we also assume $N$ is not in $\cup B$.
Let σ N be the (B/ ∼)-sampling with sum N . If σ N has more than one element,
then, as above, by our induction hypothesis, σ N also is an A-sampling. In this case, N
is in no set of $A$ or of $B/\sim$. Thus we have reduced to our final case: $N$ is not in $\cup B$, and $\sigma_N = \{N\}$ is the $(B/\sim)$-sampling with sum $N$; say $N$ is an element of some set $B'_j$ of $B/\sim$. We must show that $N$ is in $A_j$.
If B is a British number system and N ∈ ∪B, then B<N also satisfies the definition
of a British number system. By Proposition 4, |B<N | = N . This is a special case of
(1) implying (2) in the following result.
Proof. Since A is an additive system, every nonnegative integer less than N is the sum
of a unique A-sampling, which a priori is an A<N -sampling. Furthermore, since {N }
is the only A-sampling with sum N , we see that N is not an A<N -sum. Thus A<N
is an additive system if and only if all A<N -sums are less than N , and in this case
|A<N | = N , so (2), (3), and (4) are equivalent.
Write A = B/ ∼ as a mixed quotient. In general, a B<N -sum may not be an A<N -
sum. For example, consider the B<52 -sum 54 when B and A are the collections from
Figures 2 and 5, respectively. However, every A<N -sum is always a B<N -sum by the
definition of a quotient. Thus, if (1) holds, then an A<N -sum is less than N because, as
mentioned after the definition of truncation, B<N is a British number system of size
N . This gives us (4).
On the other hand, if (1) fails, then by Lemma 6, we can find an A<N -sampling
{t, u} such that N < t + u, which means that (4) fails.
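For finite collections, the condition that every A<N-sum is less than N can be tested mechanically. The sketch below is in Python and uses names of our own. It takes A<N to consist of the nonempty sets {a ∈ A_i : a < N}; that reading of the truncation is our assumption. The sets of Figures 2 and 5 are listed only up to 300, which is enough here. The sketch reproduces the example above: 54 is a B<52-sum but not an A<52-sum.

```python
from itertools import product


def truncate(collection, N):
    """Assumed reading of A_{<N}: keep elements below N, drop sets left empty."""
    kept = [{x for x in s if x < N} for s in collection]
    return [s for s in kept if s]


def sampling_sums(collection):
    """All sums using at most one number from each set."""
    choices = [[0] + sorted(s) for s in collection]
    return {sum(pick) for pick in product(*choices)}


# Sets listed only up to 300, which is enough for the truncations used below.
figure2 = [{1, 2, 3, 4}, {5, 10, 15, 20}, {25}, {50}, {100, 200, 300}]
figure5 = [{1, 2, 3, 4, 50, 51, 52, 53, 54}, {5, 10, 15, 20},
           {25, 100, 125, 200, 225, 300}]

print(54 in sampling_sums(truncate(figure2, 52)))      # True: 54 = 4 + 50
print(54 in sampling_sums(truncate(figure5, 52)))      # False
print(max(sampling_sums(truncate(figure5, 52))) < 52)  # False: the condition fails for N = 52
print(max(sampling_sums(truncate(figure5, 25))) < 25)  # True: the condition holds for N = 25
```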
We can now characterize which additive systems are British number systems.
Proof. Write A = B/ ∼ as a mixed quotient. Then (1) holds if and only if ∼ is trivial
in the sense that A = B, except for ordering. Thus (1) is equivalent to (2) by Proposi-
tion 7 and the remark that follows it.
Trivially (1) implies (3), and we proved that (1) implies (4) in Proposition 4.
Suppose (1) does not hold. We could use Lemma 6 to show that (3) and (4) do not
hold, but we prefer to prove this directly. Write $B = (B_i)_{0 \le i < I}$, and define $b_i = \min B_i$.
There exist $j < k$ such that $B_j \sim B_k$; since $\sim$ is a mixing equivalence, $j + 1 < k$. Then $b_j < b_{j+1} < b_k$ with $b_j$ and $b_k$ in $B'_j$ and $b_{j+1} \in B'_{j+1} \ne B'_j$, so (3) does not hold.
Finally, consider the A-sampling $\sigma = \{b_{j+1}, b_k\}$. We have that $b_j + b_k$ is an element of $B'_j$ that is greater than $\max \sigma$ and less than the sum of $\sigma$, so (4) also does not hold.
Theorem 9. If the unions of two additive systems are equal, then either the additive
systems are equal or one is the reduction of the other.
As in our proof of Theorem 1, here we begin with a rather general situation. From an
arbitrary set of positive integers U , we construct a collection CU of disjoint nonempty
sets such that ∪CU = U . Later we restrict U to be the union of some additive system.
If U is the empty set, let CU be the empty collection. If U is nonempty, but does not
contain both 1 and 2, let CU = {U }. Except for these special cases, i.e., if U contains
both 1 and 2, our construction has three parts. First we define a subset TU ⊆ U . Then
we define a function fU from U to the set of nonnegative integers. Finally we define
the collection CU .
Our goal for $T_U$ is for it to contain 1 and 2 and also satisfy the following. Given consecutive elements $d^- < d$ of $T_U$, define $c = d - d^-$.
i. If $c + d \notin U$ and $2d \notin U$, then $d$ is the largest element of $T_U$.
ii. If $c + d \notin U$ and $2d \in U$, then $2d$ is the smallest element of $T_U$ that is larger than $d$.
iii. If $c + d \in U$, then $c + d$ is the smallest element of $T_U$ that is larger than $d$.
Beginning with the set {1, 2}, these conditions can be used to progressively determine
the elements of TU in ascending order, so it is clear that there exists a unique set of
positive integers TU that contains 1 and 2 and satisfies these conditions. Since here we
only consider sets U that contain 1 and 2, we have TU ⊆ U .
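The three conditions translate directly into a loop that builds $T_U$ from {1, 2} upward. Below is a Python sketch; the name `build_T` is ours. Each step only asks whether c + d or 2d lies in U. So the elements of $T_U$ up to some limit can be found from the elements of U up to twice that limit. This also lets the function run on a truncated infinite union. The example set {1, 2, 4, 5} is our own. It is the union of {1, 4, 5} and {2}, the mixed quotient of the binary system {1}, {2}, {4} obtained by associating {1} and {4}.

```python
def build_T(U, limit):
    """Elements of T_U that are at most `limit`, in increasing order.

    U is a set of positive integers containing 1 and 2. Each step only asks
    whether c + d or 2d lies in U, with c < d <= limit, so U only needs to be
    known up to 2 * limit.
    """
    T = [1, 2]
    while True:
        d_minus, d = T[-2], T[-1]
        c = d - d_minus
        if c + d in U:        # condition iii
            nxt = c + d
        elif 2 * d in U:      # condition ii
            nxt = 2 * d
        else:                 # condition i: d is the largest element of T_U
            break
        if nxt > limit:
            break
        T.append(nxt)
    return [t for t in T if t <= limit]


U = {1, 2, 4, 5}  # union of {1, 4, 5} and {2}, our own small example
print(build_T(U, max(U)))  # [1, 2, 4]
```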
Exercise. Find TU when U is the union of the collection from Figure 4; repeat using
Figure 5. Can you guess what TU is when U is the union of an additive system? (Hint:
An alternate construction of TU begins with {1}. If the maximum of the current set is
We next define the function fU from U to the set of nonnegative integers. For u in
U , let t be the greatest element of TU such that t ≤ u. (Since 1 is in TU , t exists.) Then
let fU (u) be the smallest positive integer r such that r + t is in U ; if no such positive
integer exists, set fU (u) = 0. Note that fU (u) = 0 if and only if max U = u ∈ TU . In
particular, fU (u) = 0 for at most one u ∈ U .
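For a finite set U, the definition of $f_U$ is just as direct. This Python sketch uses names of our own. It repeats `build_T` so that it runs on its own, and it finds t with a binary search in the sorted list $T_U$. The example uses the same small set {1, 2, 4, 5} as before.

```python
from bisect import bisect_right


def build_T(U, limit):
    """Elements of T_U up to `limit` (U must contain 1 and 2)."""
    T = [1, 2]
    while True:
        c, d = T[-1] - T[-2], T[-1]
        nxt = c + d if c + d in U else 2 * d if 2 * d in U else None
        if nxt is None or nxt > limit:
            break
        T.append(nxt)
    return [t for t in T if t <= limit]


def f_U(U, T, u):
    """f_U(u) for a finite set U: the smallest r > 0 with r + t in U, where t is
    the largest element of T_U with t <= u; 0 if there is no such r."""
    t = T[bisect_right(T, u) - 1]  # valid because 1 is in T and u >= 1
    gaps = [x - t for x in U if x > t]
    return min(gaps) if gaps else 0


U = {1, 2, 4, 5}
T = build_T(U, max(U))
print({u: f_U(U, T, u) for u in sorted(U)})  # {1: 1, 2: 2, 4: 1, 5: 1}
```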
Exercise. Using the exercise above, find fU when U is the union of the collection
from Figure 4; repeat using Figure 5. Speculate on a general result for fU when U is
the union of an additive system.
Finally, let $C_U$ be the collection of all nonempty sets of the form $f_U^{-1}(\{n\})$, where $n$ is a nonnegative integer. Clearly the sets of $C_U$ are disjoint and $U = \cup C_U$.
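Putting the three parts together for a finite set U, including the two special cases, gives the following Python sketch; the names are ours. The sets of $C_U$ come out ordered by their $f_U$-values. That order is only a presentation choice, since a collection need not be ordered.

```python
from bisect import bisect_right


def build_T(U, limit):
    """Elements of T_U up to `limit` (U must contain 1 and 2)."""
    T = [1, 2]
    while True:
        c, d = T[-1] - T[-2], T[-1]
        nxt = c + d if c + d in U else 2 * d if 2 * d in U else None
        if nxt is None or nxt > limit:
            break
        T.append(nxt)
    return [t for t in T if t <= limit]


def f_U(U, T, u):
    """Smallest r > 0 with r + t in U (t = largest element of T_U that is <= u), else 0."""
    t = T[bisect_right(T, u) - 1]
    gaps = [x - t for x in U if x > t]
    return min(gaps) if gaps else 0


def C_U(U):
    """The collection C_U for a finite set U of positive integers."""
    U = set(U)
    if not U:                 # special case: empty collection
        return []
    if not {1, 2} <= U:       # special case: U does not contain both 1 and 2
        return [U]
    T = build_T(U, max(U))
    groups = {}
    for u in sorted(U):       # the sets of C_U are the nonempty fibers of f_U
        groups.setdefault(f_U(U, T, u), set()).add(u)
    return [groups[key] for key in sorted(groups)]


print(C_U({1, 2, 4, 5}))  # [{1, 4, 5}, {2}]
```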
Here is what we can say about CU when U is the union of an additive system.
Theorems 2 and 9 follow directly from this.
Here is how to determine whether there exist s and z as in (2). A necessary con-
dition is that U has finitely many elements, but at least two. In that case, the only
possible value of z is the difference between the two largest elements of U ; s times
that difference must equal the largest element of U . In particular, at most one pair
(s, z) satisfies (2).
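The test just described takes only a few lines. Below is a Python sketch for a finite set U; the name `find_s_z` is ours. Besides the two necessary conditions, it checks that the s largest elements really are z, 2z, …, sz. That is the form of the condition that reappears as (ii) in the proof of Theorem 10.

```python
def find_s_z(U):
    """Return the unique (s, z) with s >= 2, z >= 1 such that the s largest
    elements of the finite set U are z, 2z, ..., sz, or None if there is none."""
    elements = sorted(U)
    if len(elements) < 2:
        return None
    z = elements[-1] - elements[-2]  # difference of the two largest elements
    if elements[-1] % z != 0:        # s * z must equal the largest element
        return None
    s = elements[-1] // z            # s >= 2 because z < max U
    if s > len(elements):
        return None
    if elements[-s:] != [k * z for k in range(1, s + 1)]:
        return None
    return s, z


print(find_s_z({1, 2, 3, 4, 5, 10, 15}))  # (3, 5)
print(find_s_z({1, 2, 4, 5}))             # None
```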
Here are the intermediate results we need to prove Theorem 10.
Lemma 12. Suppose m ∈ Bi for some 0 ≤ i < I , and bi = min Bi . Then either
bi + m = |B| and m is the largest element of ∪B, or bi + m is the smallest element
of ∪B that is larger than m.
Lemma 13. Consider $B/\sim$, where $\sim$ is any equivalence relation on $B$. Suppose $u \in B'_j$ and that $t$ is the largest element of $\cup B$ such that $t \le u$. Then $t \in B'_j$.
Lemma 12 follows directly from the definition of a British number system. For
Lemma 13, let σ be the B-sampling with sum u. By the definition of a quotient, every
element of σ is in $B'_j$, and by (2) in Proposition 4, $\max \sigma = t$. Now we can prove
Proposition 11.
Proof. Let T = TU , f = fU , and bi = min Bi for all 0 ≤ i < I .
For (1), first notice that for any additive system A, |A| ≥ 3 if and only if ∪A contains
1 and 2. Since U contains 1 and 2, we have that |B| = |B/ ∼| ≥ 3, so ∪B also
contains 1 and 2. Therefore, we can show that T = ∪B by showing that ∪B satisfies
the three defining conditions of $T$. So let $d^- < d$ be consecutive elements of $\cup B$ and $c = d - d^-$. Denote by $d^+$ the smallest element of $\cup B$ that is larger than $d$, if it exists.
Say $d^- \in B_i$. By Lemma 12, $c = b_i$. This also means $c \le d^- < d$, so that $c + d < 2d$.
Now, $d \in B_i$ or $d = b_{i+1}$. Consider first $d \in B_i$. Again applying Lemma 12, either $c + d = |B|$ and $d^+$ does not exist, or $d^+ = c + d \in \cup B \subseteq U$. Thus, either the first or third of the defining properties of $T$ applies.
For $d = b_{i+1}$, once again we apply Lemma 12. Either $2d = |B|$ and $d^+$ does not exist, or $d^+ = 2d \in \cup B \subseteq U$. So either the first or second of the defining properties of $T$ applies, provided that we can show $c + d \notin U$. Since $\sim$ is a mixing equivalence, $B'_i \ne B'_{i+1}$, so $\{b_i, b_{i+1}\}$ is a $(B/\sim)$-sampling, and thus $b_i + b_{i+1} = c + d \notin U$.
This completes (1).
For (2), let $u \in B'_j$ and let $t$ be the largest element of $\cup B$ such that $t \le u$. By (1) and the definition of $f$, $f(u) = f(t)$. Since $t \in B'_j$ by Lemma 13, we can replace $u$ by $t$ and assume $u \in \cup B$ and $t = u$. Specifically, $u$ is in a set of $B$ that is equivalent to $B_j$; replacing $j$ if necessary, we may assume $u \in B_j$.
Assume now that $f(u) > 0$. We claim that $f(u) \ge \min B'_j$. Since $f(u) > 0$, it suffices to consider $0 < n < \min B'_j$ and show that $n + u \notin U$. Indeed, if $\sigma$ is the $(B/\sim)$-sampling with sum $n$, then no element of $\sigma$ is in $B'_j$ since $n < \min B'_j$. Thus $\sigma \cup \{u\}$ is the unique $(B/\sim)$-sampling with sum $n + u$. Since $\sigma \cup \{u\}$ has more than one element, $n + u$ is not in $U = \cup (B/\sim)$. For (2), we therefore only need to show that $f(u) \le \min B'_j$.
Let $j_0$ be the smallest nonnegative integer such that $B'_j = B'_{j_0}$; then $\min B'_j = b_{j_0}$. If $j_0 < j$, then $\{b_{j_0}, u\}$ is a sampling of the equivalence class of $B_j$ (regarded as a collection of sets), so $b_{j_0} + u \in B'_j \subseteq U$ and thus $f(u) \le b_{j_0} = \min B'_j$. So we may also assume $j_0 = j$. We thus need to show $f(u) \le b_j$.
By Lemma 12, since $u \in B_j$, either $b_j + u = |B|$ or $b_j + u \in \cup B$. If $b_j + u \in \cup B \subseteq U$, then $f(u) \le b_j$, as desired. Finally, if $b_j + u = |B|$, then since every element of $U$ is less than $|B/\sim| = |B|$, we have $f(u) < b_j$, completing (2). (Actually, this final case cannot occur, since we already showed $f(u) \ge \min B'_j = b_{j_0} = b_j$.)
Now consider u 1 and u 2 in U such that fU (u 1 ) and fU (u 2 ) are positive. By (2), u 1
and u 2 are in the same set of B/ ∼ if and only if fU (u 1 ) = fU (u 2 ). (Distinct sets of
B/ ∼ are disjoint, and thus have different minimum elements.) We then get (3) from
the definition of CU .
For (4), we have that u = max U is the unique element on which fU is zero. This
implies that I < ∞ and that CU has the desired form, by using the same reasoning as
in (3). To get the desired form for B/ ∼, it is enough to prove these two claims:
• $B'_{I-1} = B_{I-1}$, and
• if $z \in U$ and $z \ge b_{I-1}$, then $z \in B_{I-1}$.
In Proposition 11, notice that (3) and the final part of (4) imply that CU and B/ ∼
are equal, except possibly for the membership of elements with fU (u) = 0, and there
is at most one such element.
We can now prove Theorem 10.
Proof. Let C = CU . If U does not contain both 1 and 2, then it is easy to see that A
must be the empty collection or {{1}}. In either case, there do not exist integers s and
z as in (2), and the special cases of our construction show that C = A. So we may
assume that U contains 1 and 2.
Consider the following conditions.
i. $A \ne C$.
ii. There exist integers $s \ge 2$ and $z \ge 1$ such that the $s$ largest elements of $U$ are $z, 2z, \ldots, sz$.
iii. $C$ is reducible. (In particular, $C$ is an additive system.)
iv. $A$ is the reduction of $C$.
We show that (i) implies that (ii), (iii), and (iv) hold. This establishes all of Theorem 10 except for the first part of (2); for that we must separately prove that (ii) implies (iii).
Using Theorem 1, write $A$ as the mixed quotient $B/\sim$ so we can apply Proposition 11. Say $B = (B_i)_{0 \le i < I}$, and let $b_i = \min B_i$ for all $0 \le i < I$.
Suppose $A \ne C$. By (3) of Proposition 11, $f_U$ cannot be always positive, so $f_U$ is zero on $\max U = \max \cup B$. Since $U$ has a maximum element, $I < \infty$.
Let $z = b_{I-1}$ and let $s \ge 1$ be the number of elements in $B_{I-1}$, so that $B_{I-1} = \{z, 2z, \ldots, sz\}$ and $sz = \max U$. By (4) of Proposition 11,
If s were equal to 1, then we would have A = C, and thus s ≥ 2. In particular, the first
equation above now gives us (ii). Substituting the first equation above into the second,
we get
This gives us (iii) and (iv), since Proposition 7 implies that A<z is an additive system
of size z.
Now assume (ii). Recalling from (1) of Proposition 11 that TU = ∪B, let us show
that z and 2z are in TU .
Let t be the largest element of TU that is strictly less than 2z; say t ∈ Bi . Since
TU ⊆ U , t ≤ z. We claim that
Indeed, let us consider the two possible conclusions when applying Lemma 12 to t. We
could have that bi + t = |B| = |B/ ∼| > max U ≥ 2z or that bi + t is the smallest
element of ∪B that is larger than t. In the latter case, bi + t ∈ TU , so bi + t ≥ 2z, with
equality giving us that 2z ∈ ∪B = TU .
Using the inequality that appears in our claim, we can conclude that bi ≤ t ≤ z ≤
(bi + t) /2, so we must have bi = t = z. In particular, z ∈ TU , and applying the second
part of our claim, 2z ∈ TU .
We now apply the definitions of TU , fU , and C = CU to conclude the following.
First, z, 2z, . . . , sz are all in TU . Second, fU (u) = 0 exactly for u = sz, and fU (u) =
z if and only if u ∈ {z, 2z, . . . , (s − 1) z}. (For “only if,” we use that u ≥ fU (u), which
follows from (2) of Proposition 11.) Third, {sz} and {z, 2z, . . . , (s − 1) z} are sets of
C. Thus C = C<z ∪ {{z, 2z, . . . , (s − 1) z} , {sz}}. By Proposition 7, C<z is an additive
system of size z, so C is reducible. This completes our proof.
Recall that the construction in the proof of Theorem 1 allows us to test an arbitrary
collection to see if it is an additive system. We can now perform a similar test with
unions. Given an arbitrary set U of positive integers, construct CU . Then U is the
union of an additive system if and only if CU is an additive system, something we can
determine by applying our earlier test. Indeed, if U is the union of an additive system,
then CU is an additive system by Theorem 10. Conversely, if CU is an additive system,
then U is the union of the additive system CU itself, since U = ∪CU holds for arbitrary
U.
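For a finite set U, the test can be sketched in Python as follows; the names are ours. The sketch does not use the test from the proof of Theorem 1, which is not reproduced here. Instead it checks the definition of an additive system directly by brute force, so it is only practical for small examples.

```python
from bisect import bisect_right
from itertools import product


def build_T(U, limit):
    """Elements of T_U up to `limit` (U must contain 1 and 2)."""
    T = [1, 2]
    while True:
        c, d = T[-1] - T[-2], T[-1]
        nxt = c + d if c + d in U else 2 * d if 2 * d in U else None
        if nxt is None or nxt > limit:
            break
        T.append(nxt)
    return [t for t in T if t <= limit]


def f_U(U, T, u):
    """Smallest r > 0 with r + t in U (t = largest element of T_U that is <= u), else 0."""
    t = T[bisect_right(T, u) - 1]
    gaps = [x - t for x in U if x > t]
    return min(gaps) if gaps else 0


def C_U(U):
    """The collection C_U for a finite set U of positive integers."""
    U = set(U)
    if not U:
        return []
    if not {1, 2} <= U:
        return [U]
    T = build_T(U, max(U))
    groups = {}
    for u in sorted(U):
        groups.setdefault(f_U(U, T, u), set()).add(u)
    return [groups[key] for key in sorted(groups)]


def is_additive_system(collection):
    """Brute force: sampling sums must be 0, 1, ..., z - 1, each exactly once."""
    choices = [[0] + sorted(s) for s in collection]
    sums = sorted(sum(pick) for pick in product(*choices))
    return sums == list(range(len(sums)))


def is_union_of_additive_system(U):
    """U is such a union exactly when C_U is an additive system."""
    return is_additive_system(C_U(U))


for U in ({1, 2, 3, 4, 5, 10, 15}, {1, 2, 4, 5}, {1, 3}, {1, 2, 3, 5}):
    print(sorted(U), is_union_of_additive_system(U))
```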
Let us pick up one additional earlier thread. At the end of Section 2, we saw that if
we are given a British number system B and a positive integer n < |B|, we can find
the B-sampling with sum n by a greedy algorithm. That is, we successively choose
the largest possible elements from the additive system’s union in a way so that we get
partial sums that do not exceed n. We also saw in Theorem 8 that for any additive
system that is not a British number system, this greedy algorithm must fail for some n.
What if we begin with n and an additive system that may not be a British number
system? What if we are only given the union U of such an additive system? Let us
show how to find the sampling σ with sum n, assuming that such a sampling exists,
i.e., n is smaller than the size of the additive system. To avoid trivialities, assume U
contains 1 and 2.
First construct TU , or at least construct those elements of TU that are less than
or equal to n. Then use the same greedy algorithm with TU to find σ0 . (So σ0 is
the sampling of the underlying British number system, i.e., the one whose mixed
quotient is the additive system with which we began.) To determine σ , apply fU to
each element of σ0 . Whenever fU fails to be one-to-one on σ0 , replace elements by
their sum. For example, if U is the union of the sets from Figure 5 and n = 373,
then σ0 = {3, 20, 50, 300}. Since the values of fU on these elements are 1, 5, 1,
and 25, respectively, σ = {53, 20, 300}. We leave it to the reader to use Proposi-
tion 11, especially (2), to justify this algorithm. Special care is needed if there is
$u \in \sigma_0$ with $f_U(u) = 0$. In this case, in the notation of Proposition 11, we can use that $u \in B_{I-1} = B'_{I-1}$ to see that it is in a different set of $B/\sim$ from all other elements of $\sigma_0$.
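Below is a Python sketch of this procedure; the names are ours, not from the text. It assumes that U contains 1 and 2 and that n is less than the size of the additive system. It also assumes U is known at least up to 2n, which covers every membership test it makes. When computing $f_U(u)$, it only searches r ≤ t. This relies on $f_U(u) = f_U(t) \le t$, which follows from the bound $u \ge f_U(u)$ quoted in the proof of Theorem 10, applied to $t \in T_U \subseteq U$. An element with $f_U(u) = 0$ is kept in a group of its own, as the special case above requires. The union of Figure 5 is encoded by hand in `figure5_union`.

```python
from bisect import bisect_right


def build_T(U, limit):
    """Elements of T_U up to `limit`; U must be known up to 2 * limit."""
    T = [1, 2]
    while True:
        c, d = T[-1] - T[-2], T[-1]
        nxt = c + d if c + d in U else 2 * d if 2 * d in U else None
        if nxt is None or nxt > limit:
            break
        T.append(nxt)
    return [t for t in T if t <= limit]


def f_value(U, T, u):
    """f_U(u), searching only r <= t, since f_U(u) = f_U(t) <= t for such unions."""
    t = T[bisect_right(T, u) - 1]
    for r in range(1, t + 1):
        if t + r in U:
            return r
    return 0


def decompose(U, n):
    """Sampling with sum n of the additive system whose union is U."""
    T = build_T(U, n) if n >= 1 else []
    # Greedy algorithm on T_U, the union of the underlying British number system.
    sigma0, remaining = [], n
    for t in reversed(T):
        if t <= remaining:
            sigma0.append(t)
            remaining -= t
    # Elements with equal positive f_U-values lie in the same set: add them up.
    totals = {}
    for u in sigma0:
        value = f_value(U, T, u)
        key = value if value > 0 else ("alone", u)
        totals[key] = totals.get(key, 0) + u
    return sorted(totals.values())


def figure5_union(bound):
    """Union of the Figure 5 sets up to `bound` (our encoding of the three drawers)."""
    U = {1, 2, 3, 4, 50, 51, 52, 53, 54}  # pennies and half-dollars
    U |= {5, 10, 15, 20}                  # nickels
    U |= {25} | {100 * k + q for k in range(1, bound // 100 + 1) for q in (0, 25)}  # quarters and dollars
    return {u for u in U if u <= bound}


U = figure5_union(2 * 794)
print(decompose(U, 373))  # [20, 53, 300]
print(decompose(U, 794))  # [15, 54, 725]
```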
ACKNOWLEDGMENT. Many thanks to two referees who offered many detailed suggestions.
REFERENCES
1. F. Blanchet-Sadri, Algorithmic combinatorics on partial words, Int. J. Found. Comput. Sci. 23 (2012)
1189–1206, http://dx.doi.org/10.1142/S0129054112400473.
2. N. G. de Bruijn, On number systems, Nieuw Arch. Wisk. 4 no. 3 (1956) 15–17.
3. A. O. Munagi, k-Complementing subsets of nonnegative integers, Int. J. Math. Math. Sci. 2005 no. 2
(2005) 215–224, http://dx.doi.org/10.1155/IJMMS.2005.215.
4. M. B. Nathanson, Additive systems and a theorem of de Bruijn, Amer. Math. Monthly 121 (2014) 5–17,
http://dx.doi.org/10.4169/amer.math.monthly.121.01.005.
MICHAEL MALTENFORT received his Ph.D. from the University of Chicago in 1997. After twelve years
at Truman College, one of the City Colleges of Chicago, he moved to Northwestern University, where he has
been a Lecturer and College Adviser since 2013. He has loved square dancing since 1985 and has been a square
dance caller since 2002.
Department of Mathematics, Northwestern University, Evanston, IL 60208
malt@northwestern.edu
Rotating Multiple Sets of Labeled Points to
Bring Them Into Close Coincidence:
A Generalized Wahba Problem
Bisharah Libbus, Gordon Simons, and Yi-Ching Yao
Abstract. While attempting to better understand the 3-dimensional structure of the mam-
malian nucleus as well as a rigid-body kinematics application, the authors encountered a
naturally arising generalized version of the Wahba (1965) problem concerned with bring-
ing multiple sets of labeled points into close coincidence after making appropriate rotations
of these sets of labeled points. Our solution to this generalized problem entails the develop-
ment of a computer algorithm, described and analyzed herein, that generalizes and utilizes an
analytic formula, derived by Grace Wahba (1965), for determining space satellite attitudes,
that task being to find a suitable rotation that brings one set of m labeled points into close
coincidence, in a least-squares sense, with a second set of m labeled points.
m
S(M1 , . . . , Mk ) = wi j ||Mi ai − M j a j ||2 , (1)
0≤i< j≤k =1
where M0 = In (the n × n identity matrix), and where ||v|| denotes the Euclidean norm
of vector v.
When k = 1, this is known as “the Wahba problem,” thus explaining why we
refer to the problem of minimizing (1) as “the generalized Wahba problem” (GWP).
Grace Wahba [17], as a graduate student, using nothing more than linear algebra and
some clever reasoning, obtained an explicit formula for the rotation matrix M1 , while
addressing a compelling need by space scientists, in 1965, to estimate satellite atti-
tudes: given two sets of m labeled n-dimensional points {a1 , . . . , am } and {b1 , . . . , bm },
find a rotation matrix M that brings the second set into the best least-squares coinci-
dence with the first, i.e.,find a rotation matrix M that minimizes Wahba’s (unweighted)
loss function S(M) = m=1 ||a − Mb ||2 . See Figure 1 with m = 4 and n = 3, where
the unit vectors a ( = 1, 2, 3, 4) are representations, in the satellite reference frame,
of the directions of four observed objects, and the unit vectors b ( = 1, 2, 3, 4) are
representations of the corresponding observations in a known reference frame.
See [4] for an elegant analytic solution for the optimizer M in the Wahba problem,
the direct use of which can sometimes make an accurate computation of M difficult.
Markley [8] provides a computationally more accurate approach based on a singular
value decomposition of an n × n matrix with n = 3 (in the context of satellite attitude
http://dx.doi.org/10.4169/amer.math.monthly.124.2.149
MSC: Primary 65F30, Secondary 65K05; 15B10
b4 a4
150
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
1
1
4
1
2
4
3 2
2
4 3
3
M1 and M2 . But measurement errors are to be expected, and the resulting ambiguity
in the data can be resolved by minimizing S(M1 , M2 ) (that is, by seeking a best least-
squares fit of the data, subject to the requirement that M1 and M2 are 3 × 3 rotation
matrices).
For rigid-body applications, it makes sense to adjust the weights wi j in (1) appro-
priately to reflect the fact that one is dealing with a sequence of contiguous, time-
ordered rotational motions, such as by using larger weights when j − i is small.
Cellular nuclei of eukaryotic organisms, where chromosomes reside, provide the
setting for an entirely different application of the GWP. The focus of attention is on the
ends of chromosomes, called telomeres. While it is known that telomeres are anchored
to the nuclear envelope (cf. Alsheimer et al. [1], Crabbe et al. [2], and Moens et al.
[9]), so as to facilitate the required motion of chromosomes within the cellular nucleus
during the various phases of cellular activity, the anchoring details are not very well
understood. However, specific proteins are being identified as playing key roles in the
association of telomeres with the nuclear envelope (cf. Hou et al. [6], Kind and van
Steensel [7], Postberg et al. [10], and Schmidt et al. [12]). It is tempting to suspect,
but presently it cannot be ascertained, whether the arrangement of anchoring points
is unique, with each telomere occupying a fixed attachment location to the nuclear
envelope relative to all other telomeres, an arrangement that is common to all nuclei
of the given type. We shall refer to this suspected arrangement as the “fixed anchoring
points” (FAP) hypothesis. To be specific, assume that we are observing k + 1 cellu-
lar nuclei. Now, if it is possible to independently rotate the latter k nuclei, together
with their telomere attachments, so as to bring their corresponding telomere locations
into close coincidence with the corresponding telomere locations in the first cellular
nucleus, this would provide strong evidence in support of the FAP hypothesis. To be
even more specific, we might compute a function like S(M1 , . . . , Mk ) in (1), and if the
minimum possible value of this function is larger than some specified threshold value,
then we might reasonably view this as providing a sound statistical basis for rejecting
the FAP hypothesis.
As a practical matter, it is not presently possible to compute the 3-dimensional
locations of telomeres within their cellular nuclei. So this appealing approach toward
testing the FAP hypothesis is not presently feasible to implement. However, one of
the authors of this paper has experimentally secured 2-dimensional projections of the
missing 3-dimensional data on telomere locations, and, with this incomplete data,
the current authors have been able to convincingly reject the alternative hypothesis that
the attachment points of telomeres to the nuclear envelope occur randomly. Unfortu-
nately, this provides scant evidence for the validity of the FAP hypothesis.
152
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
Since S(M) = m=1 (||a ||2 + ||b ||2 ) − 2 m=1 aT Mb , it is apparent that mini-
mizing S(M) is equivalent to maximizing
m
S(M) = aT Mb = tr (A T M B), (2)
=1
where the notations (·)T and tr (·) denote a matrix transpose and a matrix trace, respec-
tively, and where A = (a1 , . . . , am ) and B = (b1 , . . . , bm ) are n × m matrices. Given
any n × n (nonnegative-definite) diagonal matrix D = diag(δ1 , . . . , δn ) with δ1 ≥
n
δ2 ≥ · · · ≥ δn ≥ 0, it follows from Lemma 1 that tr (G D) = i=1 G ii δi is maxi-
mized over all rotation matrices G = (G i j ) ∈ S O(n) n when G = J(1) (= In ), the n × n
identity matrix (which attains the maximum value i=1 δi ). Moreover, by Lemma 2,
tr (G D) is maximized over all orthogonal matrices G whose determinants are equal to
−1 when G = J(−1) , the n × n identity matrix with its last diagonal element replaced
n−1
by −1 (which attains the maximum value i=1 δi − δn ). Now, let U DV T be a singular
value decomposition of the matrix product AB T , where the diagonal elements of the
diagonal matrix D satisfy δ1 ≥ δ2 ≥ · · · ≥ δn ≥ 0 and where U and V are appropri-
ately chosen orthogonal matrices of dimension n × n. Observe, for any rotation matrix
M, that
S(M) = tr (A T M B) = tr ((A T M B)T ) = tr (B T M T A) = tr (M T (AB T ))
= tr (M T (U DV T )) = tr ((V T M T U )D) = tr (G D),
m
S(M1 , . . . , Mk ) = wi j (||ai ||2 + ||a j ||2 )
0≤i< j≤k =1
m
− wi j (Mi ai )T (M j a j ).
0≤i= j≤k =1
m
S(M1 , . . . , Mk ) = −2 wi j (Mi ai )T (M j a j ) + S(− j)
i∈{0,...,k}\{ j} =1
= −2 tr (B Tj M j A j ) + S(− j) ,
where A j = (a j1 , . . . , a jm
) (matrix of dimension n × m), the th column of the n × m
dimensional matrix B j is i∈{0,...,k}\{ j} wi j Mi ai ( = 1, . . . , m), and S(− j) is a sum of
terms that do not involve M j . It follows that minimizing S(M1 , . . . , Mk ) over M j with
the other Mi ’s fixed is equivalent to maximizing tr (B Tj M j A j ) over M j (cf. (2) with A
and B replaced by B j and A j , respectively), and can be readily solved using any avail-
able algorithm (e.g., Markley’s singular value decomposition method in Section 2). As
described for the fixed index j ∈ {1, . . . , k}, we shall refer to this approach toward
reducing the size of S(M1 , . . . , Mk ) as a j-step, applied to a general current state
(M1 , . . . , Mk ). Our algorithm for minimizing S(M1 , . . . , Mk ) now takes shape: (i) start
with an arbitrary state (configuration) (M10 , . . . , Mk0 ) ∈ S O(n) × · · · × S O(n), called
the seed; then (ii) update this state through a sequence of j-steps, updating one rotation
matrix at a time, cycling through the possible choices for j in some prescribed manner.
What we will call algorithm A1 uses the trivial cycling strategy: cycling through the
indices {1, . . . , k} repeatedly, starting with the index 1. What we will call algorithm A2
cycles through these indices by choosing at each step a “best possible j-step,” i.e., one
that reduces the current value of S(M1 , . . . , Mk ) as much as possible. More precisely, if
the current (best possible) j-step is for j = j ∈ {1, . . . , k}, then to determine the next
(best possible) j-step, we need to compare all the j-steps with j ∈ {1, . . . , k} \ { j }
and choose one that yields the smallest (updated) value of S(M1 , . . . , Mk ). It follows
that the two algorithms A1 and A2 coincide for k = 2, while algorithm A2 is more
time consuming for k > 2.
For algorithm A2 , unlike for algorithm A1 , the frequency distribution of j-steps per-
formed ( j = 1, . . . , k) could become significantly uneven when k ≥ 3. But empirical
evidence indicates otherwise. What we observe is that, after a few j-steps, the pattern
of j values chosen by the algorithm starts to repeat according to some permutation of
the integers 1, . . . , k, continuing in this way until the current values of S(M1 , . . . , Mk )
cease to change (apart from round-off errors). Effectively, convergence has occurred.
Table 1 is a typical example for k = 5, with the values of j broken up into 40 ver-
tical blocks of 5, describing a total of 200 j-steps. It can be seen that the process of
repetition of the permutation (5, 3, 2, 4, 1) begins with the 42nd j-step.
Whatever the cycling strategy used, we shall let (M1r , . . . , Mkr ) denote the state
(rotation matrix configuration) at the end of the r th step, r = 1, 2, . . . . Clearly
S(M1r , . . . , Mkr ) decreases in r . So two natural questions arise: (1) As r → ∞, does
(M1r , . . . , Mkr ) converge? and (2) Does
hold? Since these remain open questions for algorithm A1 , we shall focus our attention
on algorithm A2 , addressing them from a theoretical standpoint as well as we presently
can.
An illustrative example for the algorithms Ai , for i = 1, 2, clarifies what can go
wrong. For k = 2, m = 4, and n = 3, we repeatedly computed the minimizing config-
urations (M1∗ , M2∗ ) for randomly generated 3 × 4 data A j , for j = 0, 1, 2, twice with
the same data, starting each trial with a different set of seeds, and checking for possible
disagreement. For simplicity we set all of the weights wi j in (1) equal to 1. (Note that
154
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
Table 2. An example for A1 = A2 with k = 2 :
Two Distinct Seed-Dependent Limits for the Same Data
A0 A1 A2
0.56 0.42 0.99 0.62 −0.09 −0.36 −0.45 −0.90 0.00 0.83 −0.40 −0.40
0.82 0.91 −0.09 0.79 0.72 −0.59 −0.06 −0.36 −0.91 0.11 −0.54 0.44
0.13 0.00 −0.07 −0.01 −0.69 −0.72 0.89 −0.23 0.40 0.55 −0.74 −0.80
the two algorithms are the same for k = 2.) On the 302nd repetition of this process, we
finally encountered a case of disagreement, as shown in Table 2. The pair of seeds used
in the second trial are randomly generated rotation matrices. As the table shows, these
give rise to a pair of limiting rotation matrices that differ from those resulting from
the simple pair of seeds (I3 , I3 ) used for the first trial where I3 is the 3 × 3 identity
matrix. The limiting values of S(M1r , M2r ) (as r → ∞) for this example, are 12.81672
and 12.52939, respectively (with an approximate ratio of 1.02). Extensive empirical
studies of this sort, conducted by the authors with randomly generated data and trial
seeds, have never produced more than two different limiting configurations (M1∗ , M2∗ )
when k = 2, m = 4, and n = 3. So we are confident that the smaller value 12.52939,
for this example, corresponds to a genuine global minimum for the sum in (1).
What the two limiting configurations appearing in Table 2 have in common is
important to note. They both have the appearance of being a global-minimum con-
figuration in that no further improvement (reduction of S(M1∗ , M2∗ )) is possible by the
application of an additional j-step. But, of course, one truly is and the other is not
a global-minimum configuration. In what follows, we will describe both configura-
tions as “stationary configurations.” This is an important concept for us to consider
at this point because our methodology naturally leads to the discovery of stationary
configurations that might or might not be global-minimum configurations. Whether a
stationary configuration is truly a global-minimum configuration depends on the seed
and the cycling strategy chosen.
A configuration (M1∗ , . . . , Mk∗ ) is said to be stationary in S O(n) × · · · × S O(n)
(with respect to the sets of points {a j } and the weights wi j ) if for each j = 1, . . . , k,
To prove Theorem 1, we need the following lemma whose proof is relegated to the
appendix.
156
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
While for technical reasons we can only address the convergence issue for algo-
rithm A2 (instead of A1 ) as in Theorem 1, this result provides theoretical support of
convergence for A1 as the two algorithms are close cousins. (Recall that A1 = A2 for
k = 2.) Indeed, we have performed extensive simulation studies with n = 3, and for
most of the simulation studies we have used algorithm A1 (instead of A2 ), and always
observed apparent convergence of (M1r , . . . , Mkr ) as r gets large. It should be remarked
that if the sequence of (M1r , . . . , Mkr ) generated by algorithm A1 converges, then the
limiting configuration is necessarily stationary.
Theorem 1 assumes that the number of stationary configurations is finite. To get
some idea of how many stationary configurations there can be in the worst possible
data situation, we carried out extensive simulation studies for m = 4, n = 3, and
k = 2, . . . , 10, and found that the maximum numbers of stationary configurations
are 2, 2, 3, 3, 4, 4, 4, 4, 5, respectively. This is of practical importance when one is
worried that the stationary configuration found is not a global-minimum configura-
tion. One can simply use algorithm A1 repeatedly with randomly generated seeds. For
k = 5, m = 4, and n = 3 as an example, one is very likely to end up with the same
stationary configuration over and over again, because there is only one stationary con-
figuration (which is necessarily the global-minimum configuration). But if one finds
a second stationary configuration, then the configuration that yields the larger value
of S(M1 , . . . , M5 ) can be discarded. Continuing, no new stationary configuration is
likely to be found, but if a third stationary configuration is encountered, one can again
discard the configuration corresponding to the larger value of S(M1 , . . . , M5 ). At this
point, one can continue with randomly generated seeds, but this table says that, based
on an enormous number of examples we have checked, the maximum number of sta-
tionary configurations for k = 5 is 3, and new ones will not be found by continuing.
So as a practical matter, one is bound by persistence (even in the worst possible data
situation) to find the global minimum one seeks. Lest it seem to the reader that this
process, as outlined (to make certain that the true global minimum is found), will be
time consuming, the actual computational time on a PC will be a few minutes at most,
and probably considerably less, simply because algorithm A1 converges so rapidly.
We concede that the above discussion is based solely on empirical evidence without
rigorous theoretical justification.
Due to the possible presence of multiple stationary configurations, one can never
know for sure if the limiting (stationary) configuration of (M1r , . . . , Mkr ) corresponds
to the global minimum of S(M1 , . . . , Mk ). However, it appears to us that this issue is
likely to be insignificant in practice for the following reasons:
• for the vast majority of the simulated data sets, there appears to be only one
stationary configuration which would necessarily correspond to the global minimum
of S(M1 , . . . , Mk );
• in the rare cases when multiple stationary configurations arise, evidence suggests
that the k + 1 sets of labeled points in the corresponding data set {a j , = 1, . . . , m},
for j = 0, . . . , k, cannot be brought into very close coincidence by properly cho-
sen rotation matrices M1 , . . . , Mk , which, of course, is the objective. In view of the
rather large size of the global minimum for the example described in Table 2, an
inability to obtain a close coincidence of the corresponding labeled points is evi-
dent;
• for these exceptional simulated cases of multiple stationary configurations, we
have observed that the largest of the S(M1∗ , . . . , Mk∗ ) values is only a few percent-
age points larger than the smallest, namely the one corresponding to the global
minimum of S(M1 , . . . , Mk ); cf. the ratio of 1.02 for the example described in
Table 2.
Proof of Lemma 1. The case δn ≥ 0 is trivial. We now assume δn < 0. Since the
function
F(δ1 , . . . , δn ) := max tr (G D)
G∈S O(n)
n
is continuous in δ1 , . . . , δn , it suffices to prove F(δ1 , . . . , δn ) = i=1 δi for δ1 > · · · >
δn−1 > −δn > 0. Let M ∈ S O(n) maximize tr (G D) over G ∈ S O(n), i.e.,
n
F(δ1 , . . . , δn ) = tr (M D) = Mii δi . (5)
i=1
We claim that Mi j = 0 for i = j, from which it follows easily that Mii = 1 for all i
n
and F(δ1 , . . . , δn ) = i=1 δi .
It remains to establish the claim. For each pair (i, j) with i = j, let Ri j (θ) be the
identity matrix with the elements at the four locations (i, i), (i, j), ( j, i), and ( j, j)
replaced by cos θ, − sin θ, sin θ, and cos θ, respectively. Since Ri j (θ) ∈ S O(n), we
have M Ri j (θ) ∈ S O(n), and
f i j (θ) : = tr (M Ri j (θ)D)
= δi (Mii cos θ + Mi j sin θ) + δ j (M j j cos θ − M ji sin θ)
+ δh Mhh .
h∈{1,...,n}\{i, j}
d
0= f i j (θ)|θ=0 = δi Mi j − δ j M ji , i.e., δi Mi j = δ j M ji . (6)
dθ
158
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
By (6) with i = 1, we have
n
n
δ12 = δ12 M12j = δ 2j M 2j1 ,
j=1 j=1
which together with δ12 > δ 2j for j = 1 and nj=1 M 2j1 = 1 implies that M11
2
= 1, which
in turn implies that M1 j = M j1 = 0 for j = 1. Applying (6) repeatedly shows that
Mi j = 0 for i = j. The proof is complete.
To show that Q is stationary, suppose to the contrary that for some 1 ≤ j ≤ k and
some M ∗j ∈ S O(n),
With ε = c − c∗ > 0, a standard continuity argument shows that there exists a δ > 0
such that S(M1 , . . . , M j−1 , M ∗j , M j+1 , . . . , Mk ) < c − ε/2 whenever |Mi − Mi | < δ
for all i ∈ {1, . . . , k} \ { j}. Since Q r → Q , there is an such that d(Q r , Q ) < δ,
r
i.e., |Mi − Mi | < δ for all 1 ≤ i ≤ k. By the definition of algorithm A2 ,
r r r r
S(Q r +1 ) ≤ min S(M1 , . . . , M j−1
, M j , M j+1
, . . . , Mk )
Mj
r r r r
≤ S(M1 , . . . , M j−1
, M ∗j , M j+1
, . . . , Mk )
< c − ε/2 < c,
which contradicts the fact that S(Q r ) monotonically decreases to c, completing the
proof.
REFERENCES
1. M. Alsheimer, E. von Glasenapp, R. Hock, R. Benavente, Architecture of the nuclear periphery of rat
pachytene spermatocytes: Distribution of nuclear envelope proteins in relation to synaptonemal complex
attachment sites, Mol. Biol. Cell 10 (1999) 1235–1245.
2. L. Crabbe, A. J. Cesare, J. M. Kasuboske, J. A. Fitzpatrick, J. Karlseder, Human telomeres are tethered
to the nuclear envelope during postmitotic nuclear assembly, Cell Rep. 2 (2012) 1521–1529.
3. A. H. J. de Ruiter, J. R. Forbes, On the solution of Wahba’s problem S O(n), J. Astronaut. Sci. 60 (2013)
1–31.
4. J. L. Farrell, J. C. Stuelpnagel, Problem 65-1: A least squares estimate of spacecraft attitude, SIAM Rev.
8 (1966) 384–386.
BISHARAH LIBBUS is a retired geneticist. He received his Master’s degree in biology from the American
University of Beirut, Lebanon (AUB), and his Ph.D. in genetics from the University of Missouri, Columbia. He
was a postdoc at the Johns Hopkins University before holding various faculty positions at Haigazian College
and AUB in Beirut, the University of Vermont, and a senior visiting scientist at the National Institutes of
Health. His interest in chromosome order grew out of his study of mammalian meiosis and male chromosome
organization.
401 Ironwoods Dr., Chapel Hill, NC 27516
blibbus@gmail.com
GORDON SIMONS is a retired professor of statistics from the University of North Carolina, where he taught
statistics and probability for 38 years. He received undergraduate and Master’s degrees in mathematics and a
Ph.D. in statistics, all from the University of Minnesota, and then was a postdoc at Stanford University for
two years before coming to the University of North Carolina in 1968 to teach, and to direct the education and
research of graduate students in statistics. He chaired the Department of Statistics for seven years during his
tenure at UNC.
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill NC 27599-3260
gsimons@live.unc.edu
YI-CHING YAO received an undergraduate degree in electrical engineering from National Taiwan University
in 1976 and a Ph.D. in mathematics (specialized in statistics) from MIT in 1982. He was with the Department
of Statistics, Colorado State University during 1983–1995. Since 1995, he has been a research fellow at the
Institute of Statistical Science, Academia Sinica, Taiwan. His research interests are in the areas of applied
probability and statistics. He served on the editorial board for the journals of the Annals of Statistics, Bernoulli,
Sankhyā, and Statistica Sinica.
Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan, ROC
yao@stat.sinica.edu.tw
160
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
NOTES
Edited by Vadim Ponomarenko
Abstract. Catalan and Pippenger discovered infinite products for various values related to e.
Based on these, infinite products for e1/2 and e2/3 were found by Sondow and Yi [3], who
furthermore conjectured that the products could be generalized for a power of e1/k . The dis-
coveries in this paper result from attempting to prove this conjecture. By generalizing Sondow
and Yi’s products, infinite products for powers of e1/k are found.
Based on Catalan and Pippenger’s products, Sondow and Yi [3] discovered the follow-
ing formulas:
√ 1/2 1/4
e 2 66 10 10 14 14 1/8
= ··· , (3)
2 3 57 9 11 13 15
1/3
e2/3 3 3 6 6 9 1/9 9 12 12 15 15 18 18 21 21 24 24 27 1/27
√ = · · · . (4)
3 2 4578 10 11 13 14 16 17 19 20 22 23 25 26
Products (2),(3), and (4) are similar in that they were proved by calculating the nth
partial product and applying Stirling’s asymptotic formula
√
N ! ∼ 2π N (N /e) N (N → ∞).
Using a similar method, we found generalizations of the infinite products (3) and
(4). These products were found based on the fact that (3) can be rewritten as
√ 1/2 2 2 1/4 1/8
e 2 2 ·3 24 (5 · 7)2
= ··· (5)
2 3 5·7 9 · 11 · 13 · 15
http://dx.doi.org/10.4169/amer.math.monthly.124.2.161
MSC: Primary 11Y60, Secondary 40A20
1/3 1/9
e2/3 3 35 22
√ =
3 2 4·5·7·8
1/27
317 (2 · 4 · 5 · 7 · 8)2
· · · · . (6)
10 · 11 · 13 · 14 · 16 · 17 · 19 · 20 · 22 · 23 · 25 · 26
2. THEOREMS.
1/k
∞
n −k n−1 −1 n−1 k+1
1/k n
e(k−1)/k k k−2 kk k !
= . (7)
k 1/(k−1) (k − 1)! n=2
k n !k n−2 !k
1/k
∞
n −k n−1 −1 n−2
1/k n
e(k−1)/k k k−2 kk (k n−1 !/(k n−2 !k k ))k−1
= . (8)
k 1/(k−1) (k − 1)! n=2
k n !k n−2 !/(k n−1 !2 k k n−1 −k n−2 )
In this form, Sondow and Yi’s formula (6) is the special case k = 3.
Equation (8) can be written in a form similar to (4), where for any k ≥ 3 the first
factor is
1/k
kk k
··· (9)
23 k−1
Thus, Sondow and Yi’s formula (4) is the special case k = 3 of (9) and (10).
1/5 1/25
e4/5 555 5 10 10 10 10 15 15 15 15 20 20 20 20 25 25 25
=
51/4 234 6 7 8 9 11 12 13 14 16 17 18 19 21 22 23 24
1/125
25 30 30 30 30 35 120 120 120 120 125 125 125
· ··· ··· .
26 27 28 29 31 32 117 118 119 121 122 123 124
162
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
When written in the form of (8), this is
1/5 1/25
e4/5 53 519 (2 · 3 · 4)4
=
51/4 2·3·4 6 · 7 · 8 · 9 · 11 · 12 · · · · 19 · 21 · 22 · 23 · 24
1/125
599 (2 · 3 · 4 · 6 · 7 · 8 · 9 · 11 · 12 · · · · 19 · 21 · 22 · 23 · 24)4
· ··· .
26 · 27 · 28 · 29 · 31 · 32 · · · · 119 · 121 · 122 · 123 · 124
Note that (7) also works when k = 2, which provides the product
√ 1/4 1/8
e 2 46 8 10 12 14 1/16 16 18 20 22 24 26 28 30 1/32
= ··· .
2 3 57 9 11 13 15 17 19 21 23 25 27 29 31
where
k n+1 !k n−1 !
a0 = 1, an = for n ≥ 1.
k n !2 k (k−1)k n−1
Theorem 2 can be written in a form similar to (3), where the first factor is
1/k
k k k k
··· ··· 2 (12)
k+1 2k − 1 2k + 1 k −1
where
and
Equation (13) becomes much simpler when expanded. The denominator contains all
integers from k n to k n+1 excluding multiples of k. The numerator contains all multiples
of k from k n to k n+1 , excluding multiples of k 2 , k times. This is easily seen in an
example.
1/3
e4/3 3333 12 12 12 15 15 15 21 21 21 24 24 24 1/9
=
9 4578 10 11 13 14 16 17 19 20 22 23 25 26
30 30 30 33 75 78 78 78 1/27
· ··· ···
28 29 31 32 76 77 79 80
1/3 1/9
e4/3 34 312 (4 · 5 · 7 · 8)3
=
9 4·5·7·8 10 · 11 · 13 · 14 · · · · 22 · 23 · 25 · 26
36 1/27
3 (10 · 11 · 13 · 14 · · · · 22 · 23 · 25 · 26)3
· ··· .
28 · 29 · 31 · 32 · · · · 76 · 77 · 79 · 80
Furthermore, we can see that Sondow and Yi’s formula (3) is a specific case of this
generalized form where k = 2.
3. PROOFS.
Proof of Theorem 1. From (7) we can see that cancellations occur when computing
the partial products, which are
n
n nk n+1 − 2(n + 1)k n + (n − 1)k n−1 + 1
E n := k + n−1
k − k n−1 − k n−v = .
v=1
k−1
(14)
Thus the nth partial product is
1/k n
k En k n−1 !
.
kn !
164
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
Replacing E n with (14) and combining like terms give us
3/2−k/2
−1/(k−1) (k−1)/k
k k n (k−1) e .
e(k−1)/k
.
k 1/(k−1)
Proof of Theorem 2. From (11), notice that cancellations occur when computing the
partial products, which are
2
1/k 2
1/k 2 2 2
1/k 3
k (k−1) k 2k(k−1) k 3k (k−1)
, 3 , 4 2 3 2 (k−1)k 2 ,
k 2 !/k!2 k k−1 k !k!/k 2 !2 k (k−1)k k !k !/k ! k
n−1 2
1/k n
k nk (k−1)
. . . , n+1 n−1 n 2 (k−1)k n−1 ,....
k !k !/k ! k
ACKNOWLEDGMENT. The author is grateful to Pei-yong Wang for advising the research and the writing
of this paper.
REFERENCES
1. E. Catalan, Sur la constante d’Euler et la fonction de Binet, C. R. Acad. Sci. Paris Sér. I Math. 77 (1873)
198–201.
2. N. Pippenger, An infinite product for e, Amer. Math. Monthly 87 (1980) 391.
√
3. J. Sondow, H. Yi, New Wallis- and Catalan-type infinite products for π , e and 2 + 2, Amer. Math.
Monthly 117 (2010) 912–917.
Abstract. It is well known that when a prime p is congruent to 1 modulo 4, the sum of the
quadratic residues equals the sum of the quadratic nonresidues. In this note, we give ele-
mentary proofs of V.-A. Lebesgue’s analogous results for the case where p is congruent to 3
modulo 4.
This theorem was established by V.-A. Lebesgue [6, p. 144 (16)] using trigonomet-
rical series that were well known to Gauss [3, Article 362]. As we show in the final
section of this note, the result may also
√ be deduced from standard formulas for the
class number of the quadratic field Q[ − p]. Part (a) is obtained in this way in [1,
Cor. 13.4].
We mention in passing that Dirichlet proved that, when p is congruent to 3 modulo
4 and p > 3, there are more quadratic residues than nonresidues in Zlp . It would be
interesting to have a simple direct proof of this fact; see [8]. The arguments in this
note do not seem to furnish such a proof.
Lemma.
(a) If p ≡ 7 (mod 8), then Q = np.
(b) If p ≡ 3 (mod 8), then 3 Q = np + 2p .
http://dx.doi.org/10.4169/amer.math.monthly.124.2.166
MSC: Primary 11A15
166
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
Proof. Let σ be the doubling function x → 2x on Z, and consider the function σ̄
induced on Z p by σ . Notice that if x ∈ Q l , then σ̄ (x) = σ (x) = 2x, and if x ∈ Q u ,
then σ̄ (x) = σ (x) − p = 2x − p.
(a) When p ≡ 7 mod 8, the function σ̄ preserves Q. Now, since Q u has n
elements,
Q= σ̄ (Q) = σ (Q l ) + σ (Q u ) − np = σ (Q) − np,
that is, Q = 2 Q − np, giving Q = np, as required.
(b) When p ≡ 3 (mod 8), the function σ̄ sends
quadratic
residues to quadratic
nonresidues. Now, since Q u has n elements, and Z p = 2p ,
p p
Q= − σ̄ (Q) = − σ (Q l ) + σ (Q u ) − np
2 2
p
= − σ (Q) + np.
2
p
Thus, Q= 2
− 2 Q + np, giving 3 Q = np + 2p , as required.
p−1
1 i
h=− i ,
p i=1 p
where pi denotes the Legendre symbol. Using our notations, the preceding may be
written
1
h= ( N− Q). (2)
p
( p−1)/2
1 i
h= .
2− 2
i=1
p
p
Thus,
h = (r − n)/δ, (3)
1 : if p ≡ 7 mod 8,
δ=
3 : if p ≡ 3 mod 8.
This establishes the lemma of the previous section. The theorem then follows as before.
REFERENCES
168
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
Collège Calvin, Geneva, Switzerland 1211
christian.aebi@edu.ge.ch
p1 = 2, p2 = 3, p3 = 5, ..., pk .
Since 2, 3, 5, 7, and 11 are prime, we have k > 4. We will use “lg” to denote the base
2 logarithm. Note that we have the crude bounds k > lg k and k lg k > 1.
Let N = k 3k . Given any positive integer n ≤ N , we can write
1 + lg N = 1 + 3k lg k < 4k lg k < k · k · k = k 3 ,
so there are strictly fewer than k 3 possibilities for each a j , and hence fewer than (k 3 )k
possibilities for the tuple (a1 , . . . , ak ). That is, there are fewer than k 3k possibilities
for the prime factorization of n, so it is not possible to construct prime factorizations
for all positive integers n ≤ k 3k .
REFERENCE
1. G. J. Chaitin, Toward a mathematical definition of life, in The Maximum Entropy Formalism, Ed.
R. D. Levine, M. Tribus, MIT Press, Cambridge, 1979. 477–498.
http://dx.doi.org/10.4169/amer.math.monthly.124.2.169
MSC: Primary 11A41
Abstract. We consider the problem of characterizing all functions f defined on the set of inte-
gers modulo n with the property that an average of some nth roots of unity determined by f
is always an algebraic integer. Examples of such functions with this property are linear func-
tions. We show that, when n is a prime number, the converse also holds. That is, any function
with this property is representable by a linear polynomial. Finally, we give an application of
the main result to the problem of determining self-perfect isometries for the cyclic group of
prime order p.
1 a f (x)+bx
n−1
μa,b
f = ω is an algebraic integer for every a, b ∈ Zn . (1)
n x=0
2. PRELIMINARIES. We shall use the symbol (=) to denote ordinary equality. The
symbol (≡) will be used to denote congruence (mod n) (i.e., equality in Zn ).
First we show that any function representable by a linear polynomial function sat-
isfies condition (1).
1 β
n−1
μa,b
f = ω = ωβ ,
n x=0
http://dx.doi.org/10.4169/amer.math.monthly.124.2.170
MSC: Primary 11R04
170
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
which is an algebraic integer. If α ≡ 0, then ωα = 1. Consequently,
1 αx+β
n−1
ωβ 1 − ωαn
μa,b = ω = = 0,
f
n x=0 n 1 − ωα
The next result shows that it is not necessary to check (1) for all pairs (a, b).
1 k(a f (x)+bx) 1
n−1 n−1
μka,kb
f = ω = σk (ω)a f (x)+bx = σk μa,b
f .
n x=0 n x=0
Finally, we give a necessary and sufficient condition for the average of roots of unity
to be an algebraic integer. This is a standard result in algebraic number theory.
Proof. The sufficiency is clear. Let μ denote the average of ω1 , . . . , ωn and assume
that it is an algebraic integer. By the triangle inequality, |μ| ≤ 1 with equality if and
only if ω1 = · · · = ωn . Moreover, |μ | ≤ 1 for all algebraic conjugates μ of μ. If not
all ωi ’s are equal, then |μ| < 1. As a result, we also have |α| < 1, where α is the
product of all algebraic conjugates of μ. But α is an integer, which implies that α must
be 0. It follows that μ = 0.
1 f (x)+bx
p−1
f =
μ1,b ω
p x=0
Proof. By Lemma 4, we have that, for each b, either f (x) + bx is constant modulo
f = 0. Suppose there is b0 ∈ Z p such that f (x) + b0 x is constant modulo p.
p or μ1,b
Then it is clear that f is of the form f (x) ≡ αx + β for all x ∈ Z p .
p−1 f (x)+bx
If there is no b with f (x) + bx constant modulo p, then x=0 ω = 0 for
all b. Since the minimal polynomial of ω is 1 + X + · · · + X p−1 , we must have that
x → f (x) + bx is a permutation modulo p, for all b. This means that the set W f has
cardinality p. Hence, by Lemma 6, f is a linear polynomial.
1 a f (x)+bx
p−1
f =
μa,b ω
p x=0
Remark. It is possible that a function f has a nonlinear form and still satisfies (1). The conjecture says that such a function should be representable by a linear polynomial in Z_n[X].
For example, when n = 6, it can be checked that f(x) ≡ x^3 + x satisfies (1). However, f is representable by a linear polynomial, since x^3 + x ≡ 2x for all x ∈ Z_6 (indeed, x^3 − x = (x − 1)x(x + 1) is divisible by 6).
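The n = 6 example can be checked directly. A minimal sketch, assuming ω = e^{2πi/6}, that confirms x^3 + x ≡ 2x on Z_6 and that every average μ_f^{a,b} is 0 or 1:

```python
import cmath

n = 6
omega = cmath.exp(2j * cmath.pi / n)

def f(x):
    return (x ** 3 + x) % n

# x^3 + x agrees with the linear polynomial 2x on every residue mod 6
agrees_with_2x = all(f(x) == (2 * x) % n for x in range(n))

def mu(a, b):
    """(1/n) * sum_x omega^(a f(x) + b x)."""
    return sum(omega ** ((a * f(x) + b * x) % n) for x in range(n)) / n

# Each average should be (numerically) 0 or 1, both algebraic integers
averages = [mu(a, b) for a in range(n) for b in range(n)]
all_zero_or_one = all(min(abs(v), abs(v - 1)) < 1e-9 for v in averages)
print(agrees_with_2x, all_zero_or_one)
```

Since af(x) + bx ≡ (2a + b)x, the average is 1 when 2a + b ≡ 0 (mod 6) and 0 otherwise.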
4. CONNECTION TO PERFECT ISOMETRIES. Corollary 8 has an application to the representation theory of finite groups, especially to the problem of finding self-perfect isometries for the cyclic group C_p of prime order p. For the purpose of illustration, we define perfect isometries only for this special case. Interested readers are referred to [1] for the definition of perfect isometries for general blocks of finite groups.
Throughout this section, let G = C_p. Denote by R(G) the free abelian group generated by Irr(G), the set of all irreducible complex characters of G. We will regard R(G) as lying in CF(G),¹ the space of complex-valued class functions of G.
Let I : R(G) → R(G) be a linear map. Define a generalized character μ_I of G × G by
$$\mu_I(g,h) = \sum_{\chi\in\mathrm{Irr}(G)} I(\chi)(g)\,\chi(h), \quad \text{for all } g,h\in G.$$
$$\chi_x(u^a) = \omega^{ax}, \quad a = 0, 1, \ldots, p-1.$$
$$I_f(\chi_x) = \chi_{f(x)}, \quad x = 0, 1, \ldots, p-1.$$
$$\mu_I(g,h) = \mu_I(u^a,u^b) = \sum_{x=0}^{p-1} I(\chi_x)(u^a)\,\chi_x(u^b) = \sum_{x=0}^{p-1}\omega^{af(x)+bx}.$$
Thus, condition (1) is precisely the requirement that μ_I satisfy the integrality condition. This is the only condition to consider, as the separation condition is satisfied for any bijection f.
If f is a linear bijection, then by Lemma 2 the integrality condition is satisfied. Thus, I_f is a perfect isometry.
Conversely, if I_f is a perfect isometry, then μ_I(u^a, u^b)/p is an algebraic integer for all a, b. It follows from Corollary 8 that f must be linear.
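To make the character computation concrete, here is a small sketch for p = 5 (our choice), with χ_x(u^a) = ω^{ax} and I_f(χ_x) = χ_{f(x)} as above. It builds μ_I(u^a, u^b) from character values, confirms that it matches the exponential sum, and contrasts the linear bijection f(x) = 2x + 1, where every |μ_I/p| is 0 or 1, with the nonlinear bijection f(x) = x^3, where some |μ_I/p| lies strictly between 0 and 1. By the lemma on averages of roots of unity, such a value cannot be an algebraic integer. The function names are illustrative.

```python
import cmath

p = 5
omega = cmath.exp(2j * cmath.pi / p)

def chi(x, a):
    """Character value chi_x(u^a) = omega^(a x) for the cyclic group C_p = <u>."""
    return omega ** ((a * x) % p)

def mu_I(f, a, b):
    """mu_I(u^a, u^b) = sum_x I_f(chi_x)(u^a) chi_x(u^b), where I_f(chi_x) = chi_{f(x)}."""
    return sum(chi(f(x), a) * chi(x, b) for x in range(p))

def exponential_sum(f, a, b):
    """sum_x omega^(a f(x) + b x)."""
    return sum(omega ** ((a * f(x) + b * x) % p) for x in range(p))

def linear(x):
    return (2 * x + 1) % p

def cube(x):  # a nonlinear bijection of Z_5, since gcd(3, 5 - 1) = 1
    return x ** 3 % p

for f in (linear, cube):
    moduli = sorted({round(abs(mu_I(f, a, b)) / p, 4) for a in range(p) for b in range(p)})
    print(f.__name__, moduli)
```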
Remark. The following actions on Irr(G) are well known to give bijections on the
set.
¹ This is an inner product space with the standard inner product of group characters.
ACKNOWLEDGMENT. The second author is supported by MUIC Seed Grant Research 017/2015. We wish
to express our gratitude toward the anonymous referee and the editor for helpful comments and suggestions
leading to improvements of this work.
REFERENCES
1. M. Broué, Isométries parfaites, types de blocs, catégories dérivées, Astérisque 181 no. 182 (1990) 61–192.
2. L. E. Dickson, The analytic representation of substitutions on a power of a prime number of letters with a discussion of the linear group, Ann. Math. 11 no. 1/6 (1896) 65–120.
3. W. W. Stothers, On permutation polynomials whose difference is linear, Glasgow Math. J. 32 no. 2 (1990)
165–171.
Maximal Area of Equilateral Small Polygons
Charles Audet
Abstract. We show that among all equilateral polygons with a given number of sides and the
same diameter, the regular polygon has the maximal area.
http://dx.doi.org/10.4169/amer.math.monthly.124.2.175
MSC: Primary 51M16, Secondary 97G40
$$A^{\mathrm{reg}}_{2n} = A^{\mathrm{reg}}_{n} + \frac{n\,c^{\mathrm{reg}}_n}{4}\left(1 - \sqrt{1-\left(c^{\mathrm{reg}}_n\right)^2}\right).$$
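Proof. When n is even, the side length and area of the small regular n-sided polygon satisfy $c^{\mathrm{reg}}_n = \sin(\pi/n)$ and $A^{\mathrm{reg}}_n = \frac{n}{8}\sin(2\pi/n)$. It follows that
$$\begin{aligned}
A^{\mathrm{reg}}_{2n} - A^{\mathrm{reg}}_{n} &= \frac{n}{4}\left(\sin\frac{\pi}{n} - \frac{1}{2}\sin\frac{2\pi}{n}\right)\\
&= \frac{n}{4}\left(\sin\frac{\pi}{n} - \sin\frac{\pi}{n}\cos\frac{\pi}{n}\right)\\
&= \frac{n}{4}\left(c^{\mathrm{reg}}_n - c^{\mathrm{reg}}_n\sqrt{1-\left(c^{\mathrm{reg}}_n\right)^2}\right),
\end{aligned}$$
and the result follows.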
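Lemma 1 can be spot-checked numerically. The sketch below simply transcribes the formulas c_n^reg = sin(π/n) and A_n^reg = (n/8) sin(2π/n) from the proof and compares both sides of the lemma for even n from 4 to 200.

```python
import math

def side_reg(n):
    """Side length c_n^reg = sin(pi/n) of the small regular n-gon (n even, diameter 1)."""
    return math.sin(math.pi / n)

def area_reg(n):
    """Area A_n^reg = (n/8) sin(2 pi/n) of the small regular n-gon (n even)."""
    return n / 8 * math.sin(2 * math.pi / n)

def lemma1_residual(n):
    """|A_2n^reg - (A_n^reg + (n c/4)(1 - sqrt(1 - c^2)))| with c = c_n^reg."""
    c = side_reg(n)
    rhs = area_reg(n) + n * c / 4 * (1 - math.sqrt(1 - c * c))
    return abs(area_reg(2 * n) - rhs)

print(max(lemma1_residual(n) for n in range(4, 202, 2)))
```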
We now show that if the area of a small polygon exceeds that of the regular one, then there exists another polygon with twice as many vertices whose area exceeds that of the regular polygon with the same number of vertices by at least the same amount.
Proof. Let P be a small equilateral n-sided polygon such that $\delta := A(P) - A^{\mathrm{reg}}_n > 0$, where n ≥ 4 is an even number. Define $c = \frac{1}{n}\mathcal{P}(P)$ to be the length of the equilateral sides of the polygon P.
The equilateral polygon Q is constructed from P by adding one vertex near the
center of each side of P . Each added vertex is at distance h away from the center of
one side of P , where h > 0 is taken to be as large as possible so that the diameter of
Q remains equal to one. The initial polygon P is represented in Figure 2 by solid lines, the added vertices are depicted by white circles, and Q is delimited by the dotted lines.
The areas of the two polygons are related as follows:
$$A(Q) = A(P) + \frac{nch}{2} = A^{\mathrm{reg}}_n + \frac{nch}{2} + \delta. \tag{1}$$
Figure 2. The dotted equilateral polygon Q is obtained by adding n vertices at the same distance h from each
side of the equilateral n-sided polygon P .
We next compute a valid lower bound $\underline{h}$ on the distance h from each added vertex to the polygon P. The left part of Figure 3 illustrates two opposite sides of the n-sided polygon, together with the added vertices labeled A and B. By construction, the distance between them satisfies |AB| ≤ 1. The value of h diminishes when these sides are moved away from each other. The minimal value of h occurs when both sides are parallel and the two pairs of vertices are at unit distance, as illustrated in the right part of Figure 3. The distance h from the added vertices A and B to the polygon satisfies
$$h \ge \underline{h} := \frac{1}{2}\left(1 - \sqrt{1-c^2}\right).$$
Figure 3. The distance from an added vertex to the polygon is larger when the corresponding sides are not parallel to each other: $h \ge \underline{h}$.
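Here is one concrete placement of the extremal configuration of Figure 3; the coordinates are our own choice, not taken from the figure. The two parallel sides of length c are horizontal and lie at distance d = √(1 − c²), so that both diagonals have unit length, and A and B sit at distance h̲ outside the midpoints of these sides. The sketch confirms that this choice of h gives |AB| = 1 exactly.

```python
import math

def h_lower(c):
    """Lower bound (1/2)(1 - sqrt(1 - c^2)) on the offset of the added vertices."""
    return 0.5 * (1 - math.sqrt(1 - c * c))

def extremal_configuration(c):
    """Two parallel sides of length c whose diagonals have unit length, plus A and B."""
    d = math.sqrt(1 - c * c)                      # distance between the parallel sides
    bottom = ((-c / 2, -d / 2), (c / 2, -d / 2))
    top = ((-c / 2, d / 2), (c / 2, d / 2))
    h = h_lower(c)
    A = (0.0, -d / 2 - h)                         # added vertex outside the bottom side
    B = (0.0, d / 2 + h)                          # added vertex outside the top side
    return bottom, top, A, B

for c in (0.2, 0.5, 0.8):
    bottom, top, A, B = extremal_configuration(c)
    print(c, math.dist(bottom[0], top[1]), math.dist(bottom[1], top[0]), math.dist(A, B))
```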
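Recall that the maximal area enclosed by an n-sided polygon of a given perimeter is achieved by the regular polygon [5]. Because $A(P) > A^{\mathrm{reg}}_n$, this implies that $\mathcal{P}(P) > \mathcal{P}^{\mathrm{reg}}_n$ and $c > c^{\mathrm{reg}}_n$. From equation (1), Lemma 1, and the fact that the function $c \mapsto c\left(1-\sqrt{1-c^2}\right)$ is monotone increasing, we see that
$$\begin{aligned}
A(Q) &\ge A^{\mathrm{reg}}_n + \frac{nc\,\underline{h}}{2} + \delta\\
&= A^{\mathrm{reg}}_n + \frac{nc}{4}\left(1-\sqrt{1-c^2}\right) + \delta\\
&\ge A^{\mathrm{reg}}_n + \frac{n\,c^{\mathrm{reg}}_n}{4}\left(1-\sqrt{1-\left(c^{\mathrm{reg}}_n\right)^2}\right) + \delta = A^{\mathrm{reg}}_{2n} + \delta.
\end{aligned}$$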
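The two facts used in this chain can be probed numerically. In the sketch below (names are ours), g(c) = c(1 − √(1 − c²)) is checked to be increasing on a grid of [0, 1], and at c = c_n^reg the bound A_n^reg + (n/4)g(c) + δ, which equals A_n^reg + nch̲/2 + δ, reproduces A_{2n}^reg + δ, as Lemma 1 predicts.

```python
import math

def g(c):
    """g(c) = c (1 - sqrt(1 - c^2)); the argument uses that g is increasing on [0, 1]."""
    return c * (1 - math.sqrt(1 - c * c))

def area_reg(n):
    """Area (n/8) sin(2 pi/n) of the small regular n-gon, n even."""
    return n / 8 * math.sin(2 * math.pi / n)

def q_lower_bound(n, c, delta):
    """A_n^reg + n c h_lower / 2 + delta, rewritten as A_n^reg + (n/4) g(c) + delta."""
    return area_reg(n) + n / 4 * g(c) + delta

grid = [k / 10_000 for k in range(10_001)]
increasing = all(g(x) < g(y) for x, y in zip(grid, grid[1:]))

n, delta = 6, 0.01
c_reg = math.sin(math.pi / n)
print(increasing, q_lower_bound(n, c_reg, delta) - (area_reg(2 * n) + delta))
```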
Repeated applications of this last lemma allow us to prove the main result.
Theorem 3. For any integer n ≥ 3, the small n-sided equilateral polygon with the
greatest area is the regular small polygon.
Proof. The largest small polygon is the regular one when n is odd [10]. Suppose, for contradiction, that there exists a small equilateral n-sided polygon P (with n even) for which $A(P) = A^{\mathrm{reg}}_n + \delta$ for some scalar δ > 0.
Applying Lemma 2 repeatedly yields a sequence of small equilateral polygons with 2n, 4n, 8n, … sides, each with an area exceeding that of the small regular polygon with the same number of sides by at least δ. However,
$$\lim_{m\to\infty} A^{\mathrm{reg}}_{2m} + \delta = \lim_{m\to\infty}\frac{m}{4}\sin\frac{\pi}{m} + \delta = \frac{\pi}{4} + \delta$$
implies a contradiction: there exists a small equilateral polygon whose area exceeds π/4, the area of the circle with unit diameter.
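The limit used in the proof is easy to observe numerically. Starting from n = 6 (an arbitrary even choice), the sketch lists the areas of the small regular polygons with n·2^j sides; they increase and approach π/4 ≈ 0.7854 from below, which is the limit the contradiction relies on.

```python
import math

def area_reg(k):
    """Area (k/8) sin(2 pi/k) of the small regular k-gon, k even."""
    return k / 8 * math.sin(2 * math.pi / k)

n = 6
areas = [area_reg(n * 2 ** j) for j in range(30)]
print(areas[:4], areas[-1], math.pi / 4)
```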
REFERENCES
1. C. Audet, P. Hansen, F. Messine, The small octagon with longest perimeter, J. Combin. Theory Ser. A 114
(2007) 135–150.
2. C. Audet, P. Hansen, F. Messine, Ranking small regular polygons by area and by perimeter, J. Appl. Ind. Math. 3 (2009) 21–27.
Original Russian text: Diskretnyi Analiz i Issledovanie Operatsii, 15 (2008) 65–73.
3. C. Audet, P. Hansen, F. Messine, S. Perron, The minimum diameter octagon with unit-length sides:
Vincze’s wife’s octagon is suboptimal, J. Combin. Theory Ser. A 108 (2004) 63–75.
4. C. Audet, P. Hansen, F. Messine, J. Xiong, The largest small octagon, J. Combin. Theory Ser. A 98 (2002)
46–59.
5. V. Blåsjö, The isoperimetric problem, Amer. Math. Monthly 112 (2005) 526–566.
6. R. L. Graham, The largest small hexagon, J. Combin. Theory Ser. A 18 (1975) 165–170.
7. D. Henrion, F. Messine, Finding largest small polygons with GloptiPoly, J. Global Optim. 56 (2013)
1017–1028.
8. M. J. Mossinghoff, A $1 problem, Amer. Math. Monthly 113 (2006) 385–402.
9. M. J. Mossinghoff, An isodiametric problem for equilateral polygons, Contemp. Math. 457 (2008) 237–252.
10. K. Reinhardt, Extremale Polygone gegebenen Durchmessers, Jahresber. Deutsch. Math. Verein 31 (1922)
251–270.
11. N. K. Tamvakis, On the perimeter and the area of the convex polygon of a given diameter, Bull. Greek
Math. Soc. 28 (1987) 115–132.
PROBLEMS AND SOLUTIONS
Edited by Gerald A. Edgar, Daniel H. Ullman, Douglas B. West
with the collaboration of Paul Bracken, Ezra A. Brown, Daniel Cranston, Zachary Franco,
Christian Friesen, László Lipták, Rick Luttmann, Frank B. Miles, Leonard Smiley, Kenneth
Stolarsky, Richard Stong, Walter Stromquist, Daniel Velleman, and Fuzhen Zhang.
PROBLEMS
11957. Proposed by Éric Pité, Paris, France. Let m and n be two integers with n ≥ m ≥ 2. Let S(n, m) be the Stirling number of the second kind, i.e., the number of ways to partition a set of n objects into m nonempty subsets. Show that
$$n^m\,S(n,m) \ge m^n\binom{n}{m}.$$
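Before attempting a proof, the inequality can be tested for small cases. The sketch computes S(n, m) exactly with the standard recurrence S(n, m) = m·S(n − 1, m) + S(n − 1, m − 1) and checks n^m S(n, m) ≥ m^n C(n, m) for 2 ≤ m ≤ n ≤ 25. This is evidence, not a proof.

```python
from math import comb

def stirling2_table(N):
    """S[n][m] = Stirling numbers of the second kind for 0 <= m <= n <= N."""
    S = [[0] * (N + 1) for _ in range(N + 1)]
    S[0][0] = 1
    for n in range(1, N + 1):
        for m in range(1, n + 1):
            S[n][m] = m * S[n - 1][m] + S[n - 1][m - 1]
    return S

def inequality_holds(N):
    """Check n^m S(n, m) >= m^n C(n, m) for all 2 <= m <= n <= N (exact integers)."""
    S = stirling2_table(N)
    return all(n ** m * S[n][m] >= m ** n * comb(n, m)
               for n in range(2, N + 1) for m in range(2, n + 1))

print(inequality_holds(25))
```

Equality occurs at m = n, since S(n, n) = C(n, n) = 1.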
where the outer sum is over all 2^n choices of (σ_1, …, σ_n) ∈ {1, −1}^n.
11960. Proposed by Ulrich Abel, Technische Hochschule Mittelhessen, Friedberg, Germany. Let m and n be natural numbers, and, for i ∈ {1, …, m}, let a_i be a real number with 0 ≤ a_i ≤ 1. Define
m
1 m
f (x) = 2 (1 + ai x) − m (1 + ai x) .
mn n
x i=1 i=1
Let k be a nonnegative integer, and write f^{(k)} for the kth derivative of f. Show that f^{(k)}(−1) ≥ 0.
http://dx.doi.org/10.4169/amer.math.monthly.124.2.179
11962. Proposed by Elton Hsu, Northwestern University, Evanston, IL. Let {X_n}_{n≥1} be a sequence of independent and identically distributed random variables each taking the values ±1 with probability 1/2. Find the distribution of the random variable
1 X1 1 X2 1
+ + + ··· .
2 2 2 2 2
11963. Proposed by Gheorghe Alexe and George-Florin Serban, Braila, Romania. Let a_1, …, a_n be positive real numbers with $\prod_{k=1}^{n} a_k = 1$. Show that
$$\sum_{i=1}^{n}\frac{(a_i + a_{i+1})^4}{a_i^2 - a_i a_{i+1} + a_{i+1}^2} \ge 12n,$$
where a_{n+1} = a_1.
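The sketch below assumes the constraint is ∏_{k=1}^n a_k = 1; with ∑ a_k = 1 instead, equal values a_k = 1/n would give a sum of only 16/n. It draws random positive tuples, rescales them to have product 1, and checks the cyclic sum against 12n. This is a random spot check, not a proof, and the helper names are ours.

```python
import math
import random

def cyclic_sum(a):
    """sum_i (a_i + a_{i+1})^4 / (a_i^2 - a_i a_{i+1} + a_{i+1}^2), with a_{n+1} = a_1."""
    n = len(a)
    total = 0.0
    for i in range(n):
        x, y = a[i], a[(i + 1) % n]
        total += (x + y) ** 4 / (x * x - x * y + y * y)
    return total

def random_instance(n):
    """Random positive reals rescaled so that their product is 1."""
    a = [math.exp(random.gauss(0.0, 1.0)) for _ in range(n)]
    g = math.exp(sum(math.log(x) for x in a) / n)   # geometric mean
    return [x / g for x in a]

random.seed(2)
worst = min(cyclic_sum(random_instance(n)) / (12 * n)
            for n in range(2, 9) for _ in range(500))
print(worst)   # ratio of the sum to 12n; should stay >= 1
```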
SOLUTIONS
Adding this and the analogous inequalities obtained by cycling the variables, we obtain
$$f(x,y,z)+f(y,z,x)+f(z,x,y) \le \frac{3\left(42a^2+42b^2+42c^2\right)}{14^2} = \frac{9\left(a^2+b^2+c^2\right)}{14},$$
which is the desired inequality.
Editorial comment. Paolo Perfetti proved the stronger result
$$f(x,y,z)+f(y,z,x)+f(z,x,y) \le \frac{9}{14} \le \frac{9}{14}\left(\tan^2 x+\tan^2 y+\tan^2 z\right).$$
Also solved by A. Ali (India), S. Baek (Korea), R. Bagby, P. P. Dályay (Hungary), O. Geupel (Germany),
P. Perfetti (Italy), R. Stong, R. Tauraso (Italy), and the proposer.
Concyclic or Collinear
11779 [2014, 456]. Proposed by Michel Bataille, Rouen, France. Let M, A, B, C, and D be distinct points (in any order) on a circle Γ with center O. Let the medians through M of triangles MAB and MCD cross lines AB and CD at P and Q, respectively, and meet Γ again at E and F, respectively. Let K be the intersection of AF with DE, and let L be the intersection of BF with CE. Let U and V be the orthogonal projections of C onto MA and D onto MB, respectively, and assume U ≠ A and V ≠ B. Prove that A, B, U, and V are concyclic if and only if O, K, and L are collinear.
[Figure: circle with center O showing the points M, A, B, C, D, E, F, K, L, P, Q, U, and V.]
Solution by Richard Stong, Center for Communications Research, San Diego, CA. The
problem is not quite correct. We must also assume that E and F do not coincide, hence
K and L do not coincide. (If K and L coincide, then O, K , L are clearly collinear, but
A, U, B, V need not be concyclic.)
Let R be the radius of Γ, let N be the point where lines AC and BD intersect, and let X and Y be the reflections of O across lines AC and BD, respectively. The claim is the equivalence of (1) A, B, U, V are concyclic, and (2) O, K, L are collinear. We show that each of these is equivalent to (3) M is equidistant from X and Y.
(1) ⇔ (3). Note that A, B, U, V are concyclic if and only if the powers from M are equal: |MA| · |MU| = |MB| · |MV|. From trigonometry and the extended law of sines,
$$|MB|\cdot|MV| = 4R^2\sin\left(\tfrac12\angle MOB\right)\sin\left(\tfrac12\angle MOD\right)\cos\left(\tfrac12\angle BOD\right).$$
If K ≠ L, then this is the unique line through K and L. Hence, O, K, and L are collinear if and only if
Plugging in the formulas for e and f above and factoring out (a − b)(c − d), this becomes
$$m\left(\frac{1}{a}+\frac{1}{c}-\frac{1}{b}-\frac{1}{d}\right) + (a+c-b-d)\,\overline{m} = \frac{(ab-cd)(ad-bc)}{abcd}.$$
Since x = a + c and y = b + d, this is the equation of the line perpendicular to XY through
the midpoint (a + b + c + d)/2 of XY. Hence, O, K , and L are collinear if and only if
|MX| = |MY|.
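The last step can be checked numerically. The sketch assumes, as the formulas suggest, that Γ is the unit circle centered at the origin, that lowercase letters are the complex coordinates of the corresponding points (so 1/a = ā), and that the second m in the displayed equation is the conjugate m̄. It verifies that X = a + c and Y = b + d, and that |MX|² − |MY|² equals the right-hand side minus the left-hand side of the displayed equation, so that equation describes exactly the points equidistant from X and Y.

```python
import cmath
import random

random.seed(1)

def on_unit_circle():
    return cmath.exp(1j * random.uniform(0.0, 2 * cmath.pi))

def reflect_origin(u, v):
    """Reflection of the origin O across the line through u and v."""
    direction = v - u
    t = (-u * direction.conjugate()).real / abs(direction) ** 2   # foot = u + t*direction
    return 2 * (u + t * direction)

def check_once(tol=1e-8):
    a, b, c, d, m = (on_unit_circle() for _ in range(5))
    X, Y = reflect_origin(a, c), reflect_origin(b, d)
    w = a + c - b - d
    rhs = (a * b - c * d) * (a * d - b * c) / (a * b * c * d)
    lhs = m * (1 / a + 1 / c - 1 / b - 1 / d) + w * m.conjugate()
    # |MX|^2 - |MY|^2 = rhs - lhs, so lhs = rhs exactly when |MX| = |MY|
    return (abs(X - (a + c)) < tol and abs(Y - (b + d)) < tol
            and abs((abs(m - X) ** 2 - abs(m - Y) ** 2) - (rhs - lhs)) < tol)

print(all(check_once() for _ in range(1000)))
```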
Also solved by R. Chapman (U. K.), J.-P. Grivaux (France), C. R. Pranesachar (India), and the proposer.
Altitudes of a Tetrahedron
11783 [2014, 549] and 11797 [2014, 738]. Proposed by Zhang Yun, Xi’an City, Shaanxi, China. Given a tetrahedron, let r denote the radius of its inscribed sphere. For 1 ≤ k ≤ 4, let h_k denote the distance from the kth vertex to the plane of the opposite face. Prove that
$$\sum_{k=1}^{4}\frac{h_k - r}{h_k + r} \ge \frac{12}{5}.$$
Editorial comment. The problem was inadvertently repeated as Problems 11783 and 11797.
Several solvers noted that equality holds if and only if the face areas are equal. This
does not require, however, that the tetrahedron be regular. Some solvers noted that the
n-dimensional analogue of the inequality holds with lower bound n(n + 1)/(n + 2).
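Since V = rS/3 = h_k S_k/3, where S_k are the face areas and S is their sum, each term equals (S − S_k)/(S + S_k), so the inequality depends only on the face areas. The sketch below (helper names are ours) evaluates the sum for a regular tetrahedron, for a non-regular tetrahedron with congruent faces, and for random tetrahedra. The first two give 12/5, in line with the editorial comment, and the random ones stay above it.

```python
import math
import random

def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def tetra_sum(P):
    """Sum over k of (h_k - r)/(h_k + r) for the tetrahedron with vertices P[0..3]."""
    areas = []
    for k in range(4):                      # face k is opposite vertex k
        a, b, c = (P[i] for i in range(4) if i != k)
        nrm = cross(sub(b, a), sub(c, a))
        areas.append(0.5 * math.sqrt(dot(nrm, nrm)))
    V = abs(dot(sub(P[1], P[0]), cross(sub(P[2], P[0]), sub(P[3], P[0])))) / 6
    r = 3 * V / sum(areas)                  # inradius: V = r * (total area) / 3
    h = [3 * V / s for s in areas]          # heights: V = h_k * area_k / 3
    return sum((hk - r) / (hk + r) for hk in h)

regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
disphenoid = [(1, 2, 3), (1, -2, -3), (-1, 2, -3), (-1, -2, 3)]  # congruent faces, not regular
random.seed(0)
samples = [tetra_sum([tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(4)])
           for _ in range(2000)]
print(tetra_sum(regular), tetra_sum(disphenoid), min(samples))
```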
Also solved by A. Ali (India), S. Baek (Korea), R. Bagby, M. Bataille (France), D. M. Bătinetu-Giurgiu &
T. Zvonaru (Romania), I. Borosh, R. Boukharfane (Morocco), R. Chapman (U. K.), N. Curwen (U. K.),
P. P. Dályay (Hungary), M. Dincă (Romania), D. Fleischman, H. S. Geun (Korea), O. Geupel (Germany),
M. Goldenberg & M. Kaplan, J. G. Heuver (Canada), S. Hitotumatu (Japan), E. J. Ionaşcu, Y. J. Ionin,
B. Karaivanov (U.S.A.) & T. S. Vassilev (Canada), O. Kouba (Syria), D. Lee (Korea), O. P. Lossers (Nether-
lands), V. Mikayelyan (Armenia), R. Nandan, Y. Oh (Korea), P. Perfetti (Italy), I. Pinelis, C. R. Pranesachar
(India), Y. Shim (Korea), J. C. Smith, R. Stong, T. Viteam (India), M. Vowe (Switzerland), T. Zvonaru &
N. Stanciu (Romania), GCHQ Problem Solving Group (U. K.), Missouri State University Problem Solving
Group, University of Louisiana at Lafayette Math Club, and the proposer.
(d) Prove that if H, K, and L are the respective projections of P onto AB, AC, and BC, then the area of triangle HKL is $\frac{3\sqrt{3}}{16}(R^2 - r^2)$.
(e) With the same notation, prove that $|HK|^2 + |KL|^2 + |HL|^2 = \frac{9}{4}(R^2 + r^2)$.
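Parts (d) and (e) can be spot-checked numerically. The sketch places A = r, B = rω, C = rω² with ω = e^{2πi/3} and P = Re^{iθ}, assuming R > r. It projects P onto the three side lines and compares the results with (3√3/16)(R² − r²) and (9/4)(R² + r²) for a few positions of P. The values r = 1, R = 2 and the helper names are our choices.

```python
import cmath
import math

def project(p, u, v):
    """Orthogonal projection of the point p onto the line through u and v (complex numbers)."""
    d = v - u
    t = ((p - u) * d.conjugate()).real / abs(d) ** 2
    return u + t * d

def triangle_area(p, q, s):
    return abs(((q - p).conjugate() * (s - p)).imag) / 2

def pedal_data(r, R, theta):
    """Area of HKL and |HK|^2 + |KL|^2 + |HL|^2 for P = R e^{i theta}."""
    w = cmath.exp(2j * math.pi / 3)
    A, B, C = complex(r), r * w, r * w * w
    P = R * cmath.exp(1j * theta)
    H, K, L = project(P, A, B), project(P, A, C), project(P, B, C)
    squares = abs(H - K) ** 2 + abs(K - L) ** 2 + abs(H - L) ** 2
    return triangle_area(H, K, L), squares

r, R = 1.0, 2.0
for theta in (0.3, 1.1, 2.5, 4.0):
    area, squares = pedal_data(r, R, theta)
    print(area, 3 * math.sqrt(3) / 16 * (R**2 - r**2), squares, 9 / 4 * (R**2 + r**2))
```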
Solution by TCDmath Problem Group, Trinity College, Dublin, Ireland.
(a) We represent the points by complex numbers: A = r, B = rω, C = rω², O = 0, P = z, where r > 0 and ω = e^{2πi/3}. We compute
(b) We have
$$|PA|\,|PB|\,|PC| = \left|(z-r)(z-r\omega)(z-r\omega^2)\right| = \left|z^3 - r^3\right|.$$
This quantity takes its maximum value R³ + r³ when z³ = −R³, that is, when z = Re^{iθ} with 3θ ≡ π (mod 2π), i.e., when P lies on one of the altitudes of the triangle, on the opposite side from the vertex. It takes its minimum value R³ − r³ when z³ = R³, that is, when P lies on one of the three altitudes of the triangle on the same side as the vertex.
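A numerical sketch of part (b), with r = 1 and R = 1.5 chosen arbitrarily: it compares the product of distances with |z³ − r³| and scans θ on a grid to locate the extremes R³ + r³ (at θ = π/3) and R³ − r³ (at θ = 0).

```python
import cmath
import math

def distance_product(r, R, theta):
    """|PA| |PB| |PC| for A = r, B = r w, C = r w^2 and P = R e^{i theta}."""
    w = cmath.exp(2j * math.pi / 3)
    z = R * cmath.exp(1j * theta)
    return abs(z - r) * abs(z - r * w) * abs(z - r * w * w)

r, R = 1.0, 1.5
grid = [2 * math.pi * k / 3600 for k in range(3600)]
values = [distance_product(r, R, t) for t in grid]
print(max(values), R**3 + r**3, min(values), R**3 - r**3)
```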
(c) Heron’s formula for the area Δ of a triangle with sides a, b, c is
$$16\Delta^2 = 2\left(a^2b^2+b^2c^2+c^2a^2\right) - \left(a^4+b^4+c^4\right) = \left(a^2+b^2+c^2\right)^2 - 2\left(a^4+b^4+c^4\right).$$
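The two polynomial forms of Heron’s formula can be compared against the classical s(s − a)(s − b)(s − c) form on a few integer triangles. A small sketch:

```python
def heron_sq16(a, b, c):
    """16 * Area^2 from the classical form s(s-a)(s-b)(s-c), s = (a+b+c)/2."""
    s = (a + b + c) / 2
    return 16 * s * (s - a) * (s - b) * (s - c)

def form_pairs(a, b, c):
    """2(a^2 b^2 + b^2 c^2 + c^2 a^2) - (a^4 + b^4 + c^4)."""
    return 2 * (a*a*b*b + b*b*c*c + c*c*a*a) - (a**4 + b**4 + c**4)

def form_squares(a, b, c):
    """(a^2 + b^2 + c^2)^2 - 2(a^4 + b^4 + c^4)."""
    return (a*a + b*b + c*c) ** 2 - 2 * (a**4 + b**4 + c**4)

for sides in [(3, 4, 5), (5, 5, 6), (7, 8, 9)]:
    print(sides, heron_sq16(*sides), form_pairs(*sides), form_squares(*sides))
```

For the 3-4-5 triangle all three forms give 576 = 16 · 6².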
Then by the Stolz–Cesàro theorem, we have
$$\lim_{n\to\infty}\frac{\log(x_{n+1}/x_n)}{x_{n+1}} = \lim_{n\to\infty}\frac{\log x_{n+1} - \log x_n}{x_{n+1}} = \lim_{n\to\infty}\frac{\log x_n}{x_1+\cdots+x_n} = \beta. \tag{1}$$
This implies that x_{n+1} < x_n for large n since x_n > 0 and lim_{n→∞} x_n = 0 by assumption.
Since x_n goes to zero as n → ∞, (1) implies lim_{n→∞} βx_{n+1} = lim_{n→∞} log(x_{n+1}/x_n) = 0.
It follows that
$$\lim_{n\to\infty}\frac{x_{n+1}}{x_n} = 1. \tag{2}$$
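Since $\lim_{m\to\infty}\frac{1}{(N+m)x_N} = 0$,
$$\frac{1}{\beta+\epsilon} \le -\lim_{m\to\infty}(N+m)x_{N+m} = -\lim_{n\to\infty} n x_n \le \frac{1}{\beta-\epsilon}.$$
Let ε approach 0 to obtain lim_{n→∞} n x_n = −1/β > 0. Thus,
$$\lim_{n\to\infty}\log(n x_n) = \lim_{n\to\infty}(\log n + \log x_n) = \log\left(-\frac{1}{\beta}\right).$$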
However, if this holds, then, since log n → ∞, it must be the case that
$$(m_a + m_b + m_c)^2 \le 4s^2 - 16Rr + 5r^2,$$
due to Xiao-Guang Chu and Xue-Zhi Yang. (See J. Liu, “On an inequality for the medians
of a triangle,” Journal of Science and Arts, 19 (2012) 127–136.) The second is
$$s \le \left(3\sqrt{3} - 4\right)r + 2R,$$
A Middle Subspace
11792 [2014, 648]. Proposed by Stephen Scheinberg, Corona del Mar, CA. Show that every
infinite-dimensional Banach space contains a closed subspace of infinite dimension and
infinite codimension.
Solution by University of Louisiana at Lafayette Math Club, Lafayette, LA. Let V be an
infinite-dimensional normed vector space (we do not require completeness). We construct a
sequence of linearly independent vectors v0 , v1 , . . . in V and a sequence of bounded linear
functionals λ0 , λ1 , . . . such that λi (v j ) = δi, j for all nonnegative integers i and j. Choose
a nonzero v0 ∈ V . By the Hahn–Banach theorem, there is a bounded linear functional λ0
on V with λ_0(v_0) = 1. Suppose that nonzero vectors v_0, …, v_k ∈ V and bounded linear functionals λ_0, …, λ_k have been defined such that λ_i(v_j) = δ_{i,j} for i, j ∈ {0, …, k}. The vector subspace $\bigcap_{i=0}^{k}\ker\lambda_i$ has infinite dimension since it has finite codimension in V, which is infinite-dimensional. In particular, there exists nonzero $v_{k+1} \in \bigcap_{i=0}^{k}\ker\lambda_i$. The
functional λ_{k+1} may be defined by λ_{k+1}(v_j) = 0 for 0 ≤ j ≤ k and λ_{k+1}(v_{k+1}) = 1 and then extended by the Hahn–Banach theorem to a bounded linear functional on V. This is well defined because v_0, …, v_{k+1} are linearly independent: applying λ_j (0 ≤ j ≤ k) to $\sum_{i=0}^{k+1} c_i v_i = 0$ shows that c_j = 0, and then c_{k+1} = 0 because v_{k+1} ≠ 0. Continuing in this way, we construct the desired sequence v_0, v_1, ….
Let W be the closure of the linear span of {v_0, v_2, v_4, …}. The subspace W has infinite dimension, since the v_i are linearly independent. We claim also that W has infinite codimension, that is, that V/W is infinite-dimensional. We prove this by showing that the cosets v_1 + W, v_3 + W, v_5 + W, … are linearly independent. Suppose otherwise, that there is some n and there are some scalars α_1, …, α_n ∈ ℝ, at least one of them nonzero, such that $\sum_{i=1}^{n}\alpha_i v_{2i-1} \in W$. Say α_j ≠ 0. Since λ_{2j−1}(v_i) = 0 for even i, the linear functional λ_{2j−1} vanishes on their linear span and therefore, by continuity, on the closure W. This contradicts
$$\lambda_{2j-1}\left(\sum_{i=1}^{n}\alpha_i v_{2i-1}\right) = \alpha_j \ne 0.$$
Editorial comment. This inequality seems to have first appeared in M. S. Klamkin and
D. J. Newman, An inequality for the sums of unit vectors, Univ. Beo. Publ. Elek. Fac., Ser.
Mat. i. Fiz. 338–352 (1971) 47–48. A more accessible reference is G. D. Chakerian and
M. S. Klamkin, Inequalities for Sums of Distances, this MONTHLY 80 (1973) 1009–1017.
Also solved by M. Aassila (France), U. Abel (Germany), K. F. Andersen (Canada), R. Bagby, E. Bojaxhiu
(Albania) & E. Hysnelaj (Australia), R. Boukharfane (France), F. Brulois, P. Budney, S. Byrd & R. Nichols,
N. Caro (Brazil), R. Chapman (U. K.), W. J. Cowieson, P. P. Dályay (Hungary), P. J. Fitzsimmons, N. Grivaux
(France), E. A. Herman, Y. J. Ionin, E. G. Katsoulis, J. H. Lindsey II, O. P. Lossers (Netherlands), V. Muragan
& A. Vinoth (India), M. Omarjee (France), M. A. Prasad (India), R. Stong, R. Tauraso (Italy), J. Van Hamme
(Belgium), J. Zacharias, R. Zarnowki, New York Math Circle, and the proposers.
What does it mean to assert a mathematical claim, for example that there is a prime
between 5 and 10? If the claim is true, then what makes it true? And how do we come
to know it in the first place? It is apparently basic questions such as these that drive
the field of philosophy of mathematics. That these questions arise for even the most
elementary mathematical propositions makes the philosophical project to elucidate
the nature of mathematics accessible to nonspecialists. It also makes it frustratingly
inconclusive.
Before delving into contemporary philosophy of mathematics, let us begin by cast-
ing a glance back one hundred years to the early part of the twentieth century. At this
time, philosophers of mathematics were focused on the following question:
(i) Are the central claims of our core mathematical theories true? If so, what makes
them true? [foundation]
Interestingly, the project of addressing the foundation question was taken up not just
by philosophers, but also by a number of prominent mathematicians. Many philoso-
phers view this period as the “golden age” of philosophy of mathematics. Although
the foundation of mathematics went through a series of crises during this time, the
issues being addressed were of interest to the wider mathematical community. The
“Big Four” philosophical views on the nature of mathematics that emerged during this
period were logicism, intuitionism, formalism, and platonism.
According to logicism, the truths of mathematics are ultimately truths of logic. Once
appropriate definitions of the basic terms are given, statements such as “2 + 2 = 4”
can be seen to be true solely by virtue of the meanings of the expressions involved,
as is sometimes the case with nonmathematical claims, such as “All bachelors are
unmarried.” Logicism began with the work of the German philosopher Gottlob Frege
in the late nineteenth century, was taken up by Bertrand Russell in the early twentieth
century, and culminated with the massive (and massively impenetrable) three-volume
work, Principia Mathematica, published between 1910 and 1913 by Russell and Whitehead.
According to intuitionism, which was championed by the Dutch mathematicians
Brouwer and Heyting, and which took its inspiration from the philosophy of Immanuel
Kant, mathematical entities such as numbers are created by the mental acts of mathe-
maticians. Intuitionism is a form of antirealism, since it denies that there is a preexist-
ing universe of mathematical entities waiting to be described. However, mathematical
claims can be true if they are proved in the right way. In particular, proofs of the
existence of some particular mathematical entity must proceed by giving an explicit
“recipe” for constructing the given entity.
http://dx.doi.org/10.4169/amer.math.monthly.124.2.188
According to formalism, mathematical claims are meaningless strings of symbols
that are manipulated according to explicitly stated formal rules. This is a more radical
form of antirealism, since it denies that mathematical claims are even true (or false
either)! The most famous defender of formalism was David Hilbert. Hilbert was not a
formalist about the whole of mathematics, only the part that deals with infinite totali-
ties. For Hilbert, a mathematical claim such as, “There is a prime number between 10
and 20” can be expressed as a finite string of claims about finite numbers (i.e., “Either
10 is prime or 11 is prime or . . . or 20 is prime.”) and thus is meaningful. By contrast, a
claim such as, “There is no largest prime number,” cannot be expressed as a finite string
of claims about finite numbers, and hence, on Hilbert’s view, is not capable of being
true or false. For Hilbert, infinitary statements are merely vehicles for moving between
(meaningful) finitary claims. The use of infinitary claims is permissible provided that
we can show that their use never results in inconsistency.
Famously, all three of these philosophical views of mathematics ran into technical
difficulties. Frege’s version of logicism was dealt a fatal blow by Russell’s paradox.
On Frege’s view, every property determines a set of things that have that property.
Russell’s paradox asks about the property, which we shall denote by p, of not being
self-membered. Is the set S determined by p a member of itself? If so, then it does
have the determining property p, implying it is not a member of itself. But if it is not
a member of itself, then it has property p, implying that it is a member of itself! Rus-
sell’s own response to this paradox was to build a version of logicism that separates
objects and sets made up of those objects into different levels. This paved the way
to the development of modern set theory. While set theory seems to make an excel-
lent foundation for the rest of mathematics, it does not vindicate logicism because set
theory is not logic.
The main technical problem with intuitionism is that it requires an underlying logic
that rejects the Law of the Excluded Middle. This is the principle that, for any state-
ment p, either p or not- p is true. If p is a mathematical existence claim, then one way
to prove p, in classical mathematics, is to show that the assumption that not- p leads
to a contradiction. The intuitionist rejects this form of reductio ad absurdum proof,
since what the intuitionist requires in order to establish p is the construction of a par-
ticular example that fits the existence claim. For example, if p is the claim that there
exist irrational numbers which, when raised to irrational powers, are rational, then the
intuitionist will demand at least one specific example of such a number. This feature
of intuitionism is not contradictory, as was the case with Frege’s logicism, but it does
conflict with mainstream mathematical practice. David Hilbert famously complained
that “taking the Principle of the Excluded Middle from the mathematician . . . is the
same as . . . prohibiting the boxer the use of his fists.”
Hilbert’s own preferred philosophy of mathematics, formalism, ran into its own
roadblock in the formidable shape of Gödel’s celebrated incompleteness theorems. A
corollary of these theorems is that a consistent system strong enough for arithmetic
cannot prove its own consistency. This means that the finitary part of mathematics
cannot be relied upon to guarantee the consistency of the infinitary (and, for
Hilbert, meaningless) parts.
There was also a fourth philosophical view floating around in the early twentieth
century, one with much older roots and with some well-known proponents, including
such luminaries as G. H. Hardy and Kurt Gödel. Here is a characteristic passage from
Hardy’s A Mathematician’s Apology:
I believe that mathematical reality lies outside us, that our function is to discover
or observe it, and that the theorems which we prove, and which we describe
grandiloquently as our “creations,” are simply our notes of our observations.
This view has been held, in one form or another, by many philosophers of high
While not a view about the foundations of mathematics per se, platonism is in an
important sense the most straightforwardly realist position of all: mathematicians are
exploring and describing an abstract landscape that exists independently of us. While
not subject to the technical problems that afflicted the preceding three views, platonism
runs into severe difficulties answering a second philosophical question:
(ii) How do we come to know the truth of the central claims of our core mathematical
theories? [knowledge]
Note that Hardy’s talk of “observations” in the above passage is at best metaphorical.
No mathematician has ever literally observed a mathematical object.
What does philosophy of mathematics look like today? Fast-forwarding a hundred
years from the foundational controversies of the early twentieth century, we can see
successors of each of the “Big Four” philosophical views on the nature of mathematics.
In addition to a shift from the foundation question to the knowledge question, a third
question has also come to increasing prominence in contemporary debates:
(iii) What explains the usefulness of mathematics in science, and its applicability more
generally to the world? [application]
In the remainder of this review, I shall briefly outline the four most prominent current
philosophies of mathematics, and suggest in each case a book that explores the given
position in more detail.
First up is neologicism. For decades after Frege’s logicism was torpedoed by Rus-
sell’s paradox, it was assumed that this dealt a fatal blow to logicism more generally.
It was not until the early 1980s that philosophers noticed a relatively straightforward
way to salvage the core aspects of Frege’s approach while avoiding Russell’s para-
dox. In his original work, Frege proposes an axiom that he calls “Basic Law V.” One
implication of Basic Law V is that for every property there is a set of objects that fall
under that property. Frege uses Basic Law V to prove a key foundational result, that the
number of Fs is equal to the number of Gs if and only if the F-objects can be put into
one-to-one correspondence with the G-objects. This latter result has come to be known
as “Hume’s Principle.” Basic Law V is what gives rise to Russell’s paradox. (Consider
the property of not being self-membered. According to Basic Law V, there is a set S
of objects with this property. Is S a member of itself? Either answer leads to
contradiction.) However, Basic Law V plays very little role in Frege’s system other than
to prove Hume’s principle, and Hume’s principle itself does not fall prey to Russell’s
paradox. Neologicism proposes to jettison Basic Law V and instead use Hume’s prin-
ciple, plus logic, as the foundation of arithmetic. Hume’s principle is one example of a
family of principles known as abstraction principles. Another goal of neologicism is to
find a way of distinguishing “good” (i.e., consistent) abstraction principles from “bad”
(i.e., inconsistent) abstraction principles and to find foundational abstraction principles
for other areas of mathematics. (An example from geometry is the principle that the
direction of line M is equal to the direction of line N if and only if M is parallel to N .)
An excellent book-length summary of neologicism, including both the philosophical
motivations and the technical results, is Fixing Frege by John Burgess [1].
The second of our four contemporary philosophies of mathematics is structural-
ism. While not a direct successor of intuitionism (in the way that neologicism is a
direct successor of logicism), structuralism shares with the older intuitionist position
a downplaying of the status of mathematical objects as mind-independent entities.
For the structuralist, mathematics is about structure, not objects. Indeed, any collec-
tion of objects with the right structure can serve to instantiate a given mathematical
theory. Take the natural numbers, for example. According to structuralism, numbers
are simply places in the natural number structure. There is no independent object that
is the number 17. Structuralism helps to address all three of the issues that preoccupy
philosophers of mathematics. It helps with the knowledge question, since knowledge
of structures seems more tractable than knowledge of abstract objects. It helps with
the applicability question, since structures are by their nature realizable in multiple
ways by different physical phenomena. And it fits well with the way that mathemati-
cians talk about mathematics, and with research into structure-centered foundations
for mathematics such as category theory. Philosopher Stewart Shapiro has done a lot
to articulate and defend structuralism and his book Philosophy of Mathematics: Struc-
ture and Ontology [4] is an excellent overview of this position.
Formalism not only falls foul of Gödel’s results, it also flies in the face of math-
ematical practice. When mathematicians describe themselves as formalists, they tend
to use this label merely to emphasize their view of the importance of rigor and proof.
Few actually believe—as formalism dictates—that mathematical claims are meaning-
less strings of symbols. The next philosophical position I shall discuss is fictional-
ism. According to the fictionalist, core mathematical claims are meaningful, but they
are false. A claim such as “7 is a prime number” is akin to a claim about a fictional
character, such as “Sherlock Holmes is a pipe-smoking detective.” Each is an accept-
able claim to make, in the right context, yet each is, strictly speaking, false. Sherlock
Holmes does not exist, and nor does the number 7. On the fictionalist view, what math-
ematicians are doing is setting out fictional scenarios and then exploring their conse-
quences. Thus, for example, the story of arithmetic might begin: “Once upon a time
there was a number, 0, that was the successor of no number, and it had a successor, 1,
and . . . .” Fictionalism does well on the knowledge question: we make up our mathe-
matical fictions, so there is no problem explaining how we know about what happens
in them. More of a problem is the applicability question. The Sherlock Holmes stories
may be entertaining, but they are not particularly useful. What makes our mathemati-
cal fictions so invaluable for theorizing about the physical world? Mary Leng develops
and defends a broadly fictionalist position in her book Mathematics and Reality [3].
This brings us to the fourth and final philosophical position, known as indispens-
abilist platonism (IP). The strategy underlying IP is to use the applicability of math-
ematics to answer the knowledge question for platonism. We begin by noting that
science makes reference to a variety of theoretical entities such as electrons, genes,
and black holes. We believe in the existence of these entities because they are part of
our best scientific theories. But science also refers to a variety of mathematical entities
such as numbers, sets, and functions. Moreover, these mathematical entities are seem-
ingly indispensable to science: we do not know how to formulate our theories without
them. IP argues that this provides sufficient grounds for believing in the existence of
numbers, sets, and functions. In brief, we ought to believe in the literal truth of math-
ematics because we believe our best scientific theories and we need mathematics for
our best scientific theories. The pros and cons of indispensabilist platonism have been
much discussed over the past two decades. Mark Colyvan’s book The Indispensability
of Mathematics [2] provides a nice overview of this position.
Philosophers of mathematics have traditionally focused their attention on a very nar-
row selection of core areas of mathematics such as arithmetic, geometry, and set the-
ory. This has changed over the past several decades, with philosophers now routinely
bringing in a more diverse array of examples from fields such as topology, group the-
ory, linear algebra, and knot theory. This broadening of perspective has gone hand
REFERENCES
© THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 124
Now Available in the Anneli Lax
New Mathematical Library
Common Sense Mathematics
Ethan D. Bolker and Maura B. Mast
Common Sense Mathematics is a text for a one-semester college-level
course in quantitative literacy. The text emphasizes common sense and
common knowledge in approaching real problems through popular news items and finding useful math-
ematical tools and frames with which to address those questions