Abstract Set Theory (Skolem)

NOTRE DAME MATHEMATICAL LECTURES Number 8
ABSTRACT SET THEORY

by
THORALF A. SKOLEM Professor of Mathematics University of Oslo, Norway
NOTRE DAME, INDIANA 1962
Copyright 1962 UNIVERSITY OF NOTRE DAME
COMPOSITION BY BELJAN, ANN ARBOR, MICHIGAN PHOTOLITHOPRINTED BY CUSHING - MALLOY, INC. ANN ARBOR, MICHIGAN, UNITED STATES OF AMERICA 1962
PREFACE
The following pages contain a series of lectures on abstract set theory given at the University of Notre Dame during the Fall Semester 1957-58. After some historical remarks the chief ideas of Cantor's theory, now usually called the naive set theory, are explained. Then the axiomatic theory of Zermelo-Fraenkel is developed and some critical remarks added. In particular the set-theoretic relativism is emphasized as a natural consequence of the application of Löwenheim's Theorem to the axioms of set theory. Other versions of axiomatic set theory which logically are of very similar character are not dealt with. However, the simple theory of types, Quine's theory and the ramified theory of types are treated to a certain extent. Also Lorenzen's operative mathematics and the intuitionist mathematics are outlined. Further, there is a short remark on the possibility of finitist mathematics in a strict sense and finally some hints are given about the possibility of a set theory based on a logic with an infinite number of truth values. The book "Transfinite Zahlen" by H. Bachmann has been very useful, in particular for the writing of parts 6 and 8. Some references to the literature on these subjects occur scattered in the text, but no attempt has been made to set up a complete list. Such a task seems indeed scarcely worth while, because very extensive and complete lists can be found both in the mentioned book of Bachmann and in the book "Abstract Set Theory" by A. Fraenkel.
Th. Skolem.
CONTENTS
1. Historical remarks. Outlines of Cantor's theory
2. Ordered sets. A theorem of Hausdorff
3. Axiomatic set theory. Axioms of Zermelo and Fraenkel
4. The well-ordering theorem
5. Ordinals and alephs
6. Some remarks on functions of ordinal numbers
7. On the exponentiation of alephs
8. Sets representing ordinals
9. The notions "finite" and "infinite"
10. The simple infinite sequence. Development of arithmetic
11. Some remarks on the nature of the set-theoretic axioms. The set-theoretic relativism
12. The simple theory of types
13. The theory of Quine
14. The ramified theory of types. Predicative set theory
15. Lorenzen's operative mathematics
16. Some remarks on intuitionist mathematics
17. Mathematics without quantifiers
18. The possibility of set theory based on many-valued logic
ABSTRACT SET THEORY
by
Thoralf A. Skolem
1. Historical remarks. Outlines of Cantor's theory
Almost 100 years ago the German mathematician Georg Cantor was studying the representation of functions of a real variable by trigonometric series. This problem interested many mathematicians at that time. Trying to extend the uniqueness of representation to functions with infinitely many singular points he was led to the notion of a derived set. This was not only the beginning of his study of point sets but led him later to the creation of transfinite ordinal numbers. This again led him to develop his general set theory. The further development of this, the different variations or modifications of it that have been proposed in more recent years, and the discussions and criticisms with regard to this subject will constitute the contents of my lectures on set theory.
One ought to notice that there have been some anticipations of Cantor's theory. For example B. Bolzano wrote a paper with the title Paradoxien des Unendlichen (1851) (Paradoxes of the Infinite), where he mentioned some of the astonishing properties of infinite sets. Already Galilei had noticed the remarkable fact that a part of an infinite set in a certain sense contained as many elements as the whole set. On the other hand it ought to be remarked that about the same time that Cantor set out his ideas some other people were busy developing what we today call mathematical logic. These investigations concerned among other things the fundamental notions and theorems of mathematics, so that they should naturally contain set theory as well as other more elementary or ordinary parts of mathematics. A part of the work of another German mathematician, R. Dedekind, was also devoted to studies of a similar kind. In particular, his book "Was sind und was sollen die Zahlen" belongs here. In my first talks I will however confine my subject to an exposition of the most characteristic ideas in Cantor's work, mostly done in the years 1874-97.
The real reason for a mathematician to develop a general set theory was of course the fact that in mathematics we often have to do not only with single mathematical objects but also with collections of them. Therefore the study of properties of such collections, even infinite ones, must be of very great importance. There is one fact to which I would like to call attention. Most of mathematics and perhaps above all the classical set theory has been developed in accordance with the philosophical attitude called Platonism. This standpoint means that we consider the mathematical objects as existing before and independent of our actual thinking. Perhaps an illustrating way of expressing it is to say that when we are thinking about mathematical objects we are looking at eternal preexisting objects. It seems clear that the word "existence"
according to Platonism must have an absolute meaning so that everything we talk about shall either exist or not in a definite way. This is the philosophical background for classical mathematics generally and perhaps in particular for classical set theory. Being aware of this, Cantor explicitly cites Plato. Everybody is used to saying that a mathematical fact has been discovered, not that it has been invented. That shows our natural tendency towards Platonism. Whether this philosophical attitude is justified or not, however, I will not discuss now. It will be better to postpone that to a later moment.
When Cantor developed his theory of sets he liked of course to conceive the notion "set" as generally as possible. He therefore desired to give a kind of definition of this notion in accordance with this most general conception. A definition in the proper sense this could not be, because a definition in the proper sense means an explanation of a notion by means of more primitive or previously defined notions. However, it is evident that the notion "set" is too fundamental for such an explanation. Cantor says that a set is a collection of arbitrary well-defined and well-distinguished objects. What is achieved, perhaps, by this explanation is the emphasis that there shall be no restriction whatever with regard to the nature of the considered objects or to the way these objects are collected into a whole. Taking the Platonist standpoint, it is clear that this whole, the collection, must itself again be considered as one of the objects the set theory talks about and therefore can be taken as an object in other collections. This is indeed clear, because there are no restrictions as to the nature of the objects.
Now we are very well acquainted with sets in daily life. These sets are finite, but I shall not now enter into the distinction between finite and infinite sets. The most important mathematical property of the finite sets is the number of their elements.
By the way I write $m \in M$, expressing that $m$ is an element of or belongs to $M$. Indeed this notation is used everywhere in the literature. If we are to compare two finite sets $M$ and $N$ with regard to number, we may do so by pairing off the elements, distributing as far as possible the elements of $M$ and $N$ into disjoint pairs. Let us for simplicity assume $M$ and $N$ disjoint, that is, without common elements. If it is possible to distribute the elements of $M$ and $N$ into disjoint pairs $(m,n)$, $m \in M$, $n \in N$, such that all $m \in M$ and all $n \in N$ occur in these pairs, then it is evident that there are just as many elements $m$ in $M$ as elements $n$ in $N$. If, however, we may build a set of pairs $(m,n)$ such that all $m$ occur, but not all $n$, then in the case of finite sets $M$ possesses fewer elements than $N$. It is clear that it must be possible to compare sets by considering such sets of disjoint pairs in the case of infinite sets as well. This leads to one of the most important notions not only in the classical set theory but also in ordinary mathematics, namely, the notion of one-to-one correspondence or mapping. We say that $f$ is a one-to-one correspondence between the sets $M$ and $N$, if $f$ is a set of mutually disjoint pairs $(m,n)$ such that each $m \in M$ and each $n \in N$ occurs in one of the pairs. In order to be able to take into account the case that $M$ and $N$ have some common elements, it is necessary to replace the simple notion pair $\{a,b\}$, which means the set containing $a$ and $b$ as elements, with the notion ordered pair $(a,b)$, which can be conceived as $\{\{a,b\}, \{a\}\}$. However I will here, to begin with, use the notions ordered pair, triple etc. as known ideas without worrying about an analysis of them.
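The two notions just introduced can be tried out on small finite sets. The following is only a sketch in Python: the function names are illustrative and not from the text, Python's built-in tuples stand in for the pairs of a correspondence, and `frozenset` is used to build the set that represents an ordered pair.

```python
def ordered_pair(a, b):
    """Represent the ordered pair (a, b) as the set {{a, b}, {a}}."""
    return frozenset({frozenset({a, b}), frozenset({a})})


def is_one_to_one_correspondence(f, M, N):
    """Check that f, a set of pairs (m, n), pairs off M and N completely.

    Every m in M and every n in N must occur in exactly one pair.
    """
    f = set(f)
    firsts = [m for m, _ in f]
    seconds = [n for _, n in f]
    covers_both = set(firsts) == set(M) and set(seconds) == set(N)
    no_repeats = len(set(firsts)) == len(f) == len(set(seconds))
    return covers_both and no_repeats


M = {1, 2, 3}
N = {2, 3, 4}  # M and N overlap, so the order inside each pair matters
f = {(1, 2), (2, 3), (3, 4)}
print(is_one_to_one_correspondence(f, M, N))     # True
print(ordered_pair(1, 2) == ordered_pair(2, 1))  # False
```

With the unordered pair $\{a,b\}$ the two pairs in the last line would coincide, which is why the ordered pair is needed as soon as $M$ and $N$ share elements.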
Possessing the notion one-to-one correspondence or mapping, we may obtain this generalisation of the number concept: $M$ and $N$ have the same cardinal number, if a mapping $f$ exists of $M$ on $N$. This circumstance is written $M \sim N$. Cantor says that the cardinal number $\overline{\overline{M}}$ of $M$ is what remains, if we make an abstraction with regard to the individual characters of its elements. This definition is made much clearer by Russell, who says that $\overline{\overline{M}}$ is the set of all sets $N$ being $\sim M$. Further, this definition of the relation $\leq$ between cardinals was natural: $\overline{\overline{M}} \leq \overline{\overline{N}}$ if $M$ is $\sim$ a subset of $N$. Further $\overline{\overline{M}} < \overline{\overline{N}}$ if $M \sim$ a subset of $N$, but $N$ not $\sim M$.
Let us again introduce some notations. I shall write $A \subseteq B$ when the set $A$ is contained in $B$, and $A \subset B$, if $A$ is contained in $B$, but not inversely $B$ in $A$. Then we know that for the finite sets as we encounter them in everyday life, there is never a mapping of the set on a proper part of itself. Thus, if $M$ is finite, $N \subset M \rightarrow \overline{\overline{N}} < \overline{\overline{M}}$. Dedekind uses this as a definition of finite sets: A set $M$ is finite, if it is not $\sim$ any proper part $N$ of itself. On the other hand, if we look at the simplest infinite set we know, namely the number series $0,1,2,\ldots$, then it is easily seen that this set admits a mapping on a proper part of itself, for example, the set of positive integers $1,2,\ldots$. It is said that already Galilei wondered about this, and found it an astonishing property of an infinite set, that a proper part of it could in a certain sense be said to possess just as many elements as the whole set.
Some further notations may suitably be mentioned now. We write $M \cup N$ resp. $M \cap N$ as the notation for the union of $M$ and $N$, resp. the intersection of $M$ and $N$. Thus $M \cup N$ contains as elements all the elements of $M$ and $N$ and only these, while $M \cap N$ contains as elements just the common elements of $M$ and $N$. If $M \cap N$ is empty, i.e., $M$ and $N$ are disjoint, I shall often write $M + N$ instead of $M \cup N$. Both operations can be generalized very far.
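Dedekind's criterion can be checked by brute force on a small set. The sketch below is only an illustration in Python (the function name is hypothetical): it tries every map of a finite set into itself and looks for a one-to-one map whose image is a proper part. For the number series only an initial segment of the map $n \rightarrow n + 1$ can be displayed.

```python
from itertools import product


def maps_onto_proper_part(M):
    """Search all maps M -> M for a one-to-one map whose image is a proper part of M."""
    elems = list(M)
    for images in product(elems, repeat=len(elems)):
        one_to_one = len(set(images)) == len(elems)
        if one_to_one and set(images) != set(elems):
            return True
    return False


print(maps_onto_proper_part({"a", "b", "c"}))  # False: finite in Dedekind's sense

# For the number series 0, 1, 2, ... the map n -> n + 1 is one-to-one and
# never produces 0, so its image is the proper part 1, 2, 3, ...
# Only an initial segment can be inspected here.
print([n + 1 for n in range(6)])  # [1, 2, 3, 4, 5, 6]
```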
Let $T$ be a set whose elements are again sets $A,B,C,\ldots$. Then I will write $ST$ and $DT$ as denotations for the union of all sets $A,B,C,\ldots$, resp. the intersection of all $A,B,C,\ldots$ In natural analogy to the arithmetic of finite sets, addition of cardinals is defined thus: $\overline{\overline{M}} + \overline{\overline{N}} = \overline{\overline{M + N}}$, if $M$ and $N$ are disjoint, and generally, if $A,B,C,\ldots$, constituting all elements of $T$, are disjoint in pairs, then $\overline{\overline{ST}}$ is said to be the sum of the cardinals of all the elements $A,B,C,\ldots$ of $T$. These definitions are justified by the simple theorem: If $A \sim A'$, $B \sim B'$, $C \sim C'$, $\ldots$, any two of $A,B,C,\ldots$ as well as any two of $A',B',C',\ldots$ being disjoint, then $ST \sim ST'$, $T'$ denoting the set of all $A', B', C', \ldots$.
The proof of this theorem is of course quite trivial, but as we shall see later, the so-called axiom of choice must be applied.
Multiplication of cardinals is defined in the simplest way by taking again disjoint sets. If $M$ and $N$ are two such sets, we may build the set of all pairs $\{m,n\}$, where $m$ and $n$ run independently through all elements of $M$ and $N$. This set will be written $M \cdot N$. It is again easy to see that if $M \sim M'$, $N \sim N'$, where $M'$ and $N'$ are again disjoint, then the set $M' \cdot N'$ of all pairs $\{m', n'\}$ will be $\sim M \cdot N$. Therefore we may define an operation on the cardinal numbers called multiplication by putting
$\overline{\overline{M}} \cdot \overline{\overline{N}} = \overline{\overline{M \cdot N}}$.
This can then in an obvious way be generalized to the general case, where $T$ is a set of mutually disjoint sets $A,B,C,\ldots$. Letting $PT$ denote the set of all sets which consist of just one element from each of $A,B,C,\ldots$, we say that $\overline{\overline{PT}}$ is the product of the cardinal numbers $\overline{\overline{A}}, \overline{\overline{B}}, \overline{\overline{C}}, \ldots$ Using ordered pairs we may define the Cartesian product $M \times N$ of $M$ and $N$. This is the set of all ordered pairs $(m,n)$ such that $m \in M$, $n \in N$. Of course $\overline{\overline{M \times N}} = \overline{\overline{M \cdot N}}$.
A natural assumption after the discovery that the natural number series is $\sim$ proper parts of itself was that many sets of mathematical objects ought to possess the same cardinal number as the number series, even if they contained the latter as a proper subset. This assumption Cantor proved to be correct. Quite trivial is the remark that the series of integers $\geq$ a certain negative integer is of the same cardinality as the series of non-negative integers. A little more remarkable is the fact that this is true of the set of all rational integers, negative, positive or zero. The last fact is verified by writing the integers, for example, in this order:
$0, -1, 1, -2, 2, -3, 3, \ldots$
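The position of each integer in this list can be computed directly, and so can the integer standing at a given position. A minimal sketch in Python, with positions counted from 0 and with function names chosen here only for illustration:

```python
def position(x):
    """Position (counted from 0) of the integer x in the list 0, -1, 1, -2, 2, ..."""
    return 2 * x if x >= 0 else -2 * x - 1


def integer_at(y):
    """Inverse map: the integer standing at position y of the list."""
    return y // 2 if y % 2 == 0 else -(y + 1) // 2


print([integer_at(y) for y in range(7)])  # [0, -1, 1, -2, 2, -3, 3]
```

Since each of the two functions undoes the other, every integer occupies exactly one position and every position is occupied.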
Or in other words, if we put for $x \geq 0$
$y = 2x$ and for $x < 0$ $y = -2x - 1$,
then this function $y$ of $x$ furnishes a one-to-one correspondence between all integers on the one hand and the non-negative ones on the other hand. Let $P$ denote the set of all pairs of non-negative integers, while $N$ is the set of the non-negative integers themselves. Then one finds that
$z = \binom{x + y + 1}{2} + x \qquad (1)$
yields a one-to-one correspondence between $P$ and $N$. Indeed to every pair $(x, y)$ corresponds a unique value of $z$ and to each value of $z$ there is only one pair of non-negative integers $x, y$ such that the above equation is fulfilled. Similarly the set of all ordered n-tuples $(x_1, \ldots, x_n)$, all $x_i \in N$, has the same cardinal number as $N$. All sets possessing this cardinal number are called denumerable. Turning to the more often considered sets of numbers, Cantor proved that the set of all rational numbers is denumerable. We can take the rationals in the form $\frac{a}{b}$, $b > 0$, $a$ and $b$ coprime integers. Then we arrange the rationals so that $|a| + b$ successively takes the values $1,2,3,\ldots$ and those for which $|a| + b$ has the same value we arrange according to their magnitude. Thus we obtain the sequence
$\frac{0}{1}, \frac{-1}{1}, \frac{1}{1}, \frac{-2}{1}, \frac{-1}{2}, \frac{1}{2}, \frac{2}{1}, \frac{-3}{1}, \frac{-1}{3}, \frac{1}{3}, \frac{3}{1}, \frac{-4}{1}, \frac{-3}{2}, \frac{-2}{3}, \frac{-1}{4}, \frac{1}{4}, \frac{2}{3}, \frac{3}{2}, \frac{4}{1}, \ldots$
containing all the rational numbers. Cantor proved also that even the set of all algebraic numbers is denumerable. This can be done in the simplest way as follows. Every algebraic number is a root of an irreducible equation $a_n x^n + \cdots + a_0 = 0$ for some $n$, the $a_0, \ldots, a_n$ being integers with 1 as greatest common divisor. Now we can arrange the tuples $a_n, \ldots, a_0$ in a sequence by taking the successively increasing values of $m = |a_n| + \cdots + |a_0| + n$. Those with the same $m$ we can take according to increasing values of $n$, and for those with the same value of $m$ and $n$, which are only finite in number, we arrange the corresponding roots first according to their absolute value and finally those which have the same absolute value we arrange according to increasing amplitude.
One might get the impression that all infinite sets were denumerable. However, Cantor proved that the set of all real numbers, even all reals between 0 and 1, is not denumerable. His proof is performed by the diagonal method, called after him in the literature: Cantor's diagonal method. We know that every real number $\geq 0$ and $< 1$ can be written as a decimal fraction $0.a_1 a_2 \ldots$ and this decimal fraction is unique, if we require that there shall not occur only 9's from a certain place on. Then let us assume that
$\alpha_1 = 0.a_{11} a_{21} \ldots$
$\alpha_2 = 0.a_{12} a_{22} \ldots$
$\ldots$
were all reals $\geq 0$ and $< 1$. Let the real number $\beta$ be $0.b_1 b_2 \ldots$, where $b_r$ for each $r$ is the next digit after $a_{rr}$ (0 when $a_{rr}$ is 9) except when all $a_{ii}$ from a certain $i$ on are 8; then we take the $b_i$ as 7, for example. Then obviously $0 \leq \beta < 1$, while $\beta$ is $\neq$ every $\alpha_r$. Thus the set of reals $\geq 0$ and $< 1$ is not denumerable. This means that in Cantor's theory we have to do with different infinite cardinals.
It is now natural to ask, if spaces of higher dimensions would yield greater cardinals. Cantor showed that this is not the case. His result that e.g., a plane could be mapped onto a line or say onto a segment of a straight line astonished the mathematical world at that time. I shall now set out a proof of the fact that the first quadrant of a plane, say in Cartesian coordinates the set of all pairs of positive real numbers $x,y$, can be mapped on the real numbers $z > 0$. The definition of such a mapping is particularly easy when we make use of the development of reals in continued fractions. Any positive real number $\alpha$ can be developed thus:
$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$
where $a_0 \geq 0$, while $a_1, a_2, \ldots$ are all $\geq 1$. Now I define the correspondence so that the points $(x, y)$, where $x$ and $y$ are both irrational, are mapped on the irrational $z > 2$, the points $(x,y)$, where $x$ is irrational, $y$ rational, are mapped on the irrational $z$ such that $1 < z < 2$, the points $(x, y)$, where $x$ is rational, $y$ irrational, are mapped on the irrationals $z < 1$, and finally the rational points $(x,y)$ are mapped on the rationals $z$. This mapping is defined as follows. As often as $x$ and $y$ are both irrational, their continued fractions being
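$x = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}, \qquad y = y_0 + \cfrac{1}{y_1 + \cfrac{1}{y_2 + \cdots}},$
the corresponding $z$ shall be
$z = x_0 + 2 + \cfrac{1}{y_0 + 1 + \cfrac{1}{x_1 + \cfrac{1}{y_1 + \cfrac{1}{x_2 + \cfrac{1}{y_2 + \cdots}}}}}$
If $x$ is irrational, but $y$ rational, the corresponding $z$ shall be
$z = 1 + \cfrac{1}{n + \cfrac{1}{x_0 + 1 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}}$
where $n$ is the number given to $y$ in an enumeration of all rationals. If $x$ is rational, $y$ irrational, the corresponding $z$ is, when $n$ is the number of $x$,
$z = \cfrac{1}{n + \cfrac{1}{y_0 + 1 + \cfrac{1}{y_1 + \cfrac{1}{y_2 + \cdots}}}}$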
Finally the (x,y) where x and y are both rational and > 0, are mapped in an arbitrary way on the rational numbers z > 0. Cantor also proved generally that the set UM of subsets of a set M was of higher cardinality than M; however, I will talk about this theorem later.
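Two of the constructions used earlier in this section can be checked mechanically on finite data: the pairing formula (1) for pairs of non-negative integers, and the digit rule of the diagonal method. The following Python sketch is an illustration only. The function names are hypothetical, the inverse `unpair` is a standard computation that the text does not spell out, and the exceptional case of the diagonal rule (digits 8 from a certain place on) concerns infinite expansions and is therefore left out.

```python
from math import comb, isqrt


def pair(x, y):
    """Formula (1): z = C(x + y + 1, 2) + x for non-negative integers x, y."""
    return comb(x + y + 1, 2) + x


def unpair(z):
    """Recover the unique pair (x, y) with pair(x, y) == z."""
    s = (isqrt(8 * z + 1) - 1) // 2  # s = x + y
    x = z - comb(s + 1, 2)
    return x, s - x


def diagonal_digits(rows):
    """Given digit rows a_r1, a_r2, ..., return b_r = next digit after a_rr (0 after 9)."""
    return [(row[r] + 1) % 10 for r, row in enumerate(rows)]


print([pair(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (0, 2)]])  # [0, 1, 2, 3]
print(unpair(pair(3, 5)))                                        # (3, 5)
print(diagonal_digits([[1, 4, 1], [9, 9, 9], [5, 0, 8]]))        # [2, 0, 9]
```

The digit list returned by `diagonal_digits` differs from the r-th row in the r-th place, which is all the diagonal argument requires.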
2. Ordered sets. A theorem of Hausdorff
One obtains a more complete idea of Cantor's work by studying his theory of ordered sets. As to the notion "ordered set" this is nowadays mostly defined in the following way: A set $M$ is ordered by a set $P \subseteq M^2$, if and only if the following statements are valid: 1) No pair $(m,m)$, $m \in M$, is $\in P$. 2) For any two different elements $m$ and $n$ of $M$ either $(m,n) \in P$ or $(n,m) \in P$ but not both at the same time. 3) For all $m,n,p \in M$ we have $(m,n) \in P \;\&\; (n,p) \in P \rightarrow (m,p) \in P$ (transitivity). As often as $(m,n) \in P$, we also say $m$ is less than $n$ or $m$ precedes $n$, written $m < n$. If $M$ and $N$ are ordered sets and there exists a one-to-one order-preserving correspondence between them, Cantor said that they were of the same order type and wrote $M \simeq N$. They are also called similar. Evidently two ordered sets of the same order type possess the same cardinal number; but the inverse need not be the case. Only for finite sets is it so that two ordered sets of the same cardinality are also of the same type. Cantor denoted by $\overline{M}$ the order type of $M$. That two infinite ordered sets possessing the same cardinal number may have different order types is seen by the simple example of the set of positive integers on the one hand and that of the negative integers on the other. Both sets are denumerable, but obviously not ordered with the same type, because the former has a first member, which the other has not, whereas the latter has a last member, which the former does not possess.
Cantor studied to a certain extent the denumerable types, also types of the same cardinality as the continuum, but above all he studied the so-called well-ordered sets. In this short survey of Cantor's theory I shall only mention some of the most remarkable of his results and add a theorem of Hausdorff. It will be necessary to define addition and multiplication of ordered sets. If $A$ and $B$ are ordered by $P_A$ and $P_B$ while $A$ and $B$ are disjoint, the sum set $A + B$ will be ordered by $P_A + P_B + A \times B$. We have of course to distinguish between $A + B$ and $B + A$.
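For finite sets the three conditions on $P$, and the ordering of a sum, can be verified directly. The sketch below is a Python illustration under the assumption that pairs are Python tuples; `is_ordering` and `ordered_sum` are names introduced here, not notation from the text.

```python
from itertools import product


def is_ordering(M, P):
    """Check conditions 1) to 3) for a set P of pairs over M."""
    M, P = set(M), set(P)
    inside = P <= set(product(M, repeat=2))
    irreflexive = all((m, m) not in P for m in M)
    exactly_one = all(((m, n) in P) != ((n, m) in P)
                      for m in M for n in M if m != n)
    transitive = all((m, p) in P
                     for (m, n) in P for (q, p) in P if n == q)
    return inside and irreflexive and exactly_one and transitive


def ordered_sum(A, PA, B, PB):
    """Order the sum of disjoint A and B by P_A + P_B + A x B (all of A before all of B)."""
    return set(A) | set(B), set(PA) | set(PB) | set(product(A, B))


A, PA = {1, 2}, {(1, 2)}
B, PB = {"x", "y"}, {("y", "x")}
S, PS = ordered_sum(A, PA, B, PB)
print(is_ordering(S, PS))                   # True
print(PS == ordered_sum(B, PB, A, PA)[1])   # False: A + B and B + A are ordered differently
```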
This addition may be extended to the case of an ordered set $T$ of ordered sets $A,B,C,\ldots$ which are mutually disjoint. Indeed the union (or sum) $ST$ will then be ordered by the sum of the sets $P_A, P_B, P_C, \ldots$ and the products $X \times Y$ when $(X,Y)$ runs through all pairs which are the elements of the ordering set $P_T$ for $T$. By the product of two ordered sets $A$ and $B$ we understand $A \times B$ ordered lexicographically: that means that $(a_1, b_1)$ precedes $(a_2, b_2)$ if either $a_1$ precedes $a_2$, or $a_1 = a_2$, but $b_1$ precedes $b_2$. This definition also admits generalization, but that will not be necessary just now. If a one-to-one correspondence exists between the ordered sets $M$ and $N$ such that the order is reversed by the correspondence, then $N$ is said to be of the inverse order type of $M$. For example the order type of the set of negative integers is the inverse of the type of the positive integers. Cantor denotes the inverse of the order type $\alpha$ by $\alpha^*$. Thus $\omega$ and $\omega^*$ denote the types of the sets of positive and of negative integers.
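The lexicographic product and the inverse ordering can be written down for finite ordered sets as follows. This is a sketch only, in Python, with hypothetical helper names; the helper `listing` simply sorts the elements by the number of their predecessors so that the result can be read off.

```python
def product_order(A, PA, B, PB):
    """Order A x B lexicographically: the first components decide, ties go to the second."""
    pairs = [(a, b) for a in A for b in B]
    return {(p, q) for p in pairs for q in pairs
            if (p[0], q[0]) in PA or (p[0] == q[0] and (p[1], q[1]) in PB)}


def inverse_order(P):
    """Reverse every pair, giving the inverse ordering."""
    return {(n, m) for (m, n) in P}


def listing(M, P):
    """List the elements of a finite ordered set from first to last."""
    return sorted(M, key=lambda m: sum((n, m) in P for n in M))


A, PA = [0, 1], {(0, 1)}
B, PB = ["x", "y"], {("x", "y")}
AB = [(a, b) for a in A for b in B]
P = product_order(A, PA, B, PB)
print(listing(AB, P))                 # [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]
print(listing(AB, inverse_order(P)))  # [(1, 'y'), (1, 'x'), (0, 'y'), (0, 'x')]
```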
An interesting class of ordered sets are the dense ones. An ordered set is called dense, if there is always an element between two arbitrary ones. The simplest example is the set of rational numbers in their natural order. This set is also open, which means that there is no first and no last member. Now we have the remarkable theorem: There is one and only one open and dense denumerable order type.
Proof. Let $A = \{a_1, a_2, \ldots\}$ and $B = \{b_1, b_2, \ldots\}$ be two denumerable sets, both open and dense. First we let $a_1$ correspond to $b_1$. Then $a_1$ divides the remaining elements of $A$ into those $< a_1$ and those $> a_1$. Let $a_{m_1}$ be the $a_i$ with least index which is $< a_1$ and $a_{m_2}$ the $a_i$ with least index which is $> a_1$. Either $m_1$ or $m_2$ is 2. Letting $b_{n_1}$ be the $b_j$ with least $j$ which is $< b_1$, while $b_{n_2}$ is the $b_j$ with least $j$ which is $> b_1$, then either $n_1$ or $n_2$ is 2. We let $b_{n_1}$ correspond to $a_{m_1}$ and $b_{n_2}$ to $a_{m_2}$. Now every remaining $a_i$ from $A$ is either $< a_{m_1}$, or $> a_{m_1}$ but $< a_1$, or $> a_1$ but $< a_{m_2}$, or $> a_{m_2}$, which gives 4 cases. There are 4 corresponding cases for the remaining $b_j$. Then if $a_{m_3}$ is the $a_i$ with least $i$ such that $a_i < a_{m_1}$ and $b_{n_3}$ the $b_j$ with least $j$ such that $b_j < b_{n_1}$, we let $a_{m_3}$ correspond to $b_{n_3}$ and so on. It is easily seen how we obtain in this way an order-preserving correspondence between the $a_i$ and the $b_j$. One has only to remark that if $a_m$ is the $a_i$ with the least $i$ which has not already got any corresponding $b_j$, then it gets one when in the different intervals between the already chosen $a_{m_r}$ further $a_{m_s}$ are chosen.
We have further: In an open and dense denumerable set a subset can be found similar to any given denumerable ordered set. This is seen in a similar way as in the proof of the preceding theorem. Indeed if $b_1, b_2, \ldots$ are elements of an arbitrary denumerable ordered set while $a_1, a_2, \ldots$ is an open and dense denumerable set, then we may map $b_1$ on $a_1$. Then according as $b_2 < b_1$ or $> b_1$ we map $b_2$ on an element $a' < a_1$ or $> a_1$.
Then $b_3$ is either less than both $b_1$ and $b_2$ or lies between $b_1$ and $b_2$ or is greater than both. Respectively we map $b_3$ on an element $a''$ having the same order relation to $a_1$ and $a'$ and so on.
Let us use the term scattered set for a set having no dense subset. Then an interesting theorem of Hausdorff says that every ordered set is either scattered or the sum of a set $T$ of such sets, where $T$ is dense. Proof: It is easy to understand that if an interval $a$ to $b$ in an ordered set is scattered and the interval $b$ to $c$ as well, then the whole interval $a$ to $c$ has the same property. Indeed, if $d < e$ both belong to a dense set $S$ then the set of all $x \in S$ such that $d \leq x \leq e$ constitutes a dense subset of $S$, and a possible dense subset of the interval $a$ to $c$ must either contain at least 2 elements in the interval $a$ to $b$ or at least 2 in the interval $b$ to $c$. Therefore the statement that the interval between $a$ and $b$ in an ordered set $M$ is scattered is transitive so that we can divide $M$ into classes $A,B,C,\ldots$ such that in each class any two different elements furnish a scattered interval. These classes are therefore successive parts of $M$, each of them scattered. On the other hand, if there are
two different ones $A$ and $B$, there must always be a $C$ between, else $A$ and $B$ would amalgamate into one class. Thus the set $T$ of the successive scattered parts of $M$ must be dense.
As to the denumerable ordered sets I should like to mention two facts which will be useful when I talk about Cantor's second number class. If a denumerable ordered set $M$ has no first element, then it is coinitial with $\omega^*$, and if it has no last element, it is cofinal with $\omega$. These statements mean that we can in the first instance find an infinite sequence of type $\omega^*$ in the set such that there is no earlier element than all these in $M$, and in the second instance we may find an infinite sequence of type $\omega$ such that there is no element in $M$ after all these. Proof: In the first case let the elements of $M$ be $a_1, a_2, \ldots$, let $a_{n_1}$ be the $a_i$ with least $i$ such that $a_i < a_1$, further $a_{n_2}$ be the $a_i$ with least $i$ such that $a_i < a_{n_1}$, etc. Clearly $1 < n_1 < n_2 < \ldots$ If $a_m$ were $<$ every $a_{n_r}$, then we should have $m > 1, n_1, n_2, \ldots$, which is absurd. Similarly in the second case.
Among the ordered sets, the well-ordered ones, namely those possessing a least element in every non-empty subset, are especially important. That well-ordering is equivalent to the principle of transfinite induction is well known. This principle says that if a statement $S$ is always valid for an element of a well-ordered set $M$ when it is valid for all predecessors, then $S$ is valid for all elements of $M$. Further I ought to mention that the sum of a well-ordered set $T$ of well-ordered sets $A,B,C,\ldots$ is again a well-ordered set. If $T$ is denumerable and an enumeration is simultaneously given for each element $A,B,C,\ldots$ of $T$, then the sum is a well-ordered denumerable set. Also the product of two well-ordered sets is again well-ordered. The order types of the well-ordered sets are called ordinal numbers. These ordinals Cantor has introduced by a creative process which is very characteristic of his way of thinking. I will now give an exposition of this creative process.
He begins with the null set 0 containing no element. Then since this 0 is an object of thought he has obtained one thing which he denotes by 1. (We may think of 1 as the set $\{0\}$, see the later axiomatic theory). Now, conceiving 0 and 1 as ordinals he has the right to write $0 < 1$. Then he has this set of two ordinals which represents the ordinal 2. Having obtained $0 < 1 < 2$ he has an ordered set representing the ordinal 3. Now he has $0 < 1 < 2 < 3$ which furnishes a well-ordered set with 4 elements, etc. Now he thinks this process continued infinitely so that he obtains the set of all non-negative integers $0 < 1 < 2 < \ldots$ This well-ordered set, however, represents an infinitely great ordinal $\omega$. Then he has a set containing all finite integers together with $\omega$. This is a well-ordered set representing a greater ordinal than $\omega$, denoted by $\omega + 1$. Proceeding in this way he obtains after a while a well-ordered set consisting of two infinite series of increasing ordinals. This set represents a still greater ordinal written as $\omega + \omega$.
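The finite part of this process can be imitated with sets. The sketch below, in Python, assumes the reading suggested by the parenthetical remark above, namely that each finite ordinal is represented by the set of all smaller ordinals (so 1 is $\{0\}$ and 2 is $\{0, 1\}$). The function name is illustrative, and the transfinite steps $\omega$, $\omega + 1$, ... are of course outside the reach of such a loop.

```python
def ordinal(n):
    """Set representing the finite ordinal n: the set of all smaller ordinals."""
    current = frozenset()                # 0 is the null set
    for _ in range(n):
        current = current | {current}    # next ordinal: all earlier ones plus the last
    return current


zero, one, two, three = (ordinal(k) for k in range(4))
print(one == frozenset({zero}))       # True: 1 is the set {0}
print(two == frozenset({zero, one}))  # True: 2 is the set of the two ordinals 0 and 1
print(two in three, len(three))       # True 3
```

In this representation "less than" between ordinals is simply membership.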
It is evident that all the infinite sets hitherto introduced are denumerable. But now Cantor collects all ordinals of denumerable well-ordered sets. This set represents an ordinal that is not denumerable. Strictly speaking the axiom of choice is being taken into account here, but Cantor uses that as an evident principle without even being aware of it. According to this principle we have that a denumerable set of denumerable or finite sets has a denumerable union. Now let us assume that the ordinals of finite and denumerable sets constitute a denumerable set. Then this set is cofinal with $\omega$, because there is evidently no greatest ordinal of this kind. Thus we may assume that $\alpha_1 < \alpha_2 < \alpha_3 < \ldots$ is a sequence of type $\omega$, such that every denumerable ordinal is $\leq$ some $\alpha_r$. However we could then find finite or denumerable ordinals $\beta_1, \beta_2, \ldots$ such that $\alpha_{r+1} = \alpha_r + \beta_r$ for every $r$, but now the ordinal
$\gamma = \alpha_1 + \beta_1 + \beta_2 + \ldots$
must be denumerable. Nevertheless $\gamma$ is clearly $>$ every $\alpha_r$, so that we get a contradiction. Therefore the sequence of all finite and denumerable ordinals represents a non-denumerable ordinal. This was by Cantor denoted by $\Omega$.
Cantor used the first letter aleph, written $\aleph$, of the Hebrew alphabet with indices to denote the cardinal numbers of well-ordered sets. The cardinal of $\omega$, that is the cardinal number of the denumerable sets, he called $\aleph_0$, the cardinal of $\Omega$ he called $\aleph_1$. He proved that every subset of $\Omega$ is either finite or has the cardinal $\aleph_0$ or the cardinal $\aleph_1$. Indeed if we have a subset of $\Omega$ we may enumerate successively the elements of the subset by the elements $0,1,2,\ldots,\omega,\omega+1,\ldots$ of $\Omega$ and then either this enumeration will stop with a finite number $n$ or it will stop with some $\alpha < \Omega$ or it does not stop, so that the subsequence also has the ordinal $\Omega$. The finite ordinals are also called those of the first number class, the denumerable ones those of the second class. Now Cantor again collects the ordinals of cardinal number $\aleph_1$ and proves similarly that they constitute a sequence of still greater cardinality $\aleph_2$. There is no cardinal between $\aleph_1$ and $\aleph_2$. The ordinals belonging to well-ordered sets whose cardinal number is $\aleph_1$ are said to be the numbers of the third number class. In this way he continues and obtains an increasing infinite sequence of alephs
$\aleph_0, \aleph_1, \aleph_2, \ldots,$
each $\aleph_{n+1}$ being the cardinal number of the set of all ordinals represented by well-ordered sets with cardinal number $\aleph_n$. These latter ordinals are those of class $n + 2$. But now he collects all ordinals belonging to all the classes with finite number. Then he obtains a set with a cardinal number which is suitably denoted $\aleph_\omega$, being $>$ every $\aleph_n$, $n$ finite, while there is no cardinal between the $\aleph_n$ and this $\aleph_\omega$. From $\aleph_\omega$ he then derives $\aleph_{\omega+1}$, $\aleph_{\omega+2}$ etc. Quite generally there is an $\aleph_\alpha$ for every ordinal $\alpha$.
It must be conceded that Cantor's set theory, and in particular his creation of ordinals, is a grandiose mathematical idea. But what was at that time the reaction of the mathematical world to all this? In the first instance the
reaction was rather unfavourable. No wonder, these ideas were too new and too strange. However, very soon the reaction became favourable for two reasons: 1) Cantor's way of thinking was of the same nature as, for example, Cauchy's and Weierstrass's treatment of analysis and the theory of functions, 2) Many of the notions introduced by Cantor were useful in ordinary mathematics. There were, however, also some opponents, above all Kronecker and Poincaré. Kronecker did not only attack Cantor's theory of sets but also most of ordinary analysis. He required decidable notions. Poincaré's main objection was that in set theory so-called non-predicative definitions are used which according to him (and also Russell) are logically objectionable.
The situation for Cantor's theory became indeed very much changed after 1897. In this year the Italian mathematician Burali-Forti discovered that the theory of transfinite ordinals leads to a contradiction. According to the Platonist point of view the existing ordinals are well-defined and well-distinguished objects such that they, according to Cantor's definition, should constitute a set. This set is well-ordered, therefore it represents an ordinal. However the ordinal represented by a well-ordered set of ordinals is always greater than all ordinals in the set. Thus we obtain an ordinal which is greater than all ordinals, which is absurd.
Another still better known antinomy was discovered a few years later (1903), namely Russell's. Ordinary sets are not elements of themselves. According to Platonism the existing sets which are not elements of themselves ought to constitute a set $U$. We have then the logical equivalence $x \notin x \leftrightarrow x \in U$. If, however, we put here $U$ instead of $x$, which should be allowed because the equivalence should be generally valid, we get $U \notin U \leftrightarrow U \in U$ which of course is absurd.
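That Russell's equivalence can never be satisfied does not depend on what the objects are; it is a purely logical fact about any membership relation. As an illustration only, the following Python sketch (with a hypothetical function name) runs through every possible membership relation on a small finite domain and searches for an object $U$ with $x \in U$ exactly when $x \notin x$.

```python
from itertools import product


def russell_set_exists(size):
    """Search every membership relation on `size` objects for an object u
    such that, for all x:  x is an element of u  <=>  x is not an element of x."""
    domain = range(size)
    cells = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(cells)):
        member = dict(zip(cells, bits))  # member[(x, y)] means "x is an element of y"
        for u in domain:
            if all(member[(x, u)] == (not member[(x, x)]) for x in domain):
                return True
    return False


print([russell_set_exists(k) for k in (1, 2, 3)])  # [False, False, False]
```

The search fails for the reason given in the text: taking $x = u$ the condition would require `member[(u, u)]` to differ from itself.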
Also Cantor's theorem that the set UM of all subsets of M is of greater cardinality than M leads to an absurdity when we ask if there is a greatest cardinal or not. Indeed according to this theorem there is no greatest cardinal. But the union of all sets ought on the other hand to have the greatest possible cardinal number.
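Cantor's theorem, mentioned above, excludes in particular any one-to-one mapping of $UM$ into $M$. For a small finite $M$ this can be checked exhaustively. The following is only an illustrative sketch and not part of the lectures: the names `power_set` and `diagonal_collision` are hypothetical, and the diagonal set used is the set of all $f(X)$ with $f(X) \notin X$. For every map $f$ from the subsets of a three-element set into that set, the sketch exhibits two distinct subsets with the same image.

```python
from itertools import combinations, product

def power_set(M):
    """All subsets of M as frozensets (hypothetical helper)."""
    items = sorted(M)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_collision(f, subsets):
    """Given f: subsets of M -> M (as a dict), return two distinct subsets
    with the same image, located by means of the diagonal set N."""
    N = frozenset(f[X] for X in subsets if f[X] not in X)
    # If f(N) were not in N, the definition of N would put it in N;
    # hence f(N) is in N.
    assert f[N] in N
    # So f(N) = f(X) for some X with f(X) not in X, and this X differs from N.
    X = next(X for X in subsets if f[X] == f[N] and f[X] not in X)
    return X, N

M = [0, 1, 2]
subsets = power_set(M)
checked = 0
for values in product(M, repeat=len(subsets)):
    f = dict(zip(subsets, values))
    X, N = diagonal_collision(f, subsets)
    assert X != N and f[X] == f[N]
    checked += 1
print(checked)  # number of maps examined
```

The exhaustive loop is feasible only because $M$ is tiny; the argument inside `diagonal_collision` does not depend on the size of $M$.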
3. Axiomatic set theory. Axioms of Zermelo and Fraenkel

The discovery of the antinomies made it clear that a revision of the principles of set theory was necessary. The attempt to improve set theory which is best known among mathematicians is the axiomatic theory first set forth by Zermelo. I shall present his theory in a somewhat more precise form, replacing his vague notion "definite Aussage" (= definite statement) by the notion of proposition or propositional function in the first order predicate calculus. We assume that we are dealing with a domain $D$ of objects together with the membership relation $\in$, so that all propositions are built up from atomic propositions of the form $x \in y$ by use of the logical connectives $\&$, $\vee$, $\neg$, $\rightarrow$ (and, or, not, if - then) and the quantifiers $(x)$, $(Ex)$ (for all $x$, for some $x$). Then the following axioms are assumed valid. I write them both in logical symbols and in ordinary language.
1. Axiom of extensionality. If $x$ and $y$ have just the same elements, then $x = y$. In symbols: $(z)(z \in x \leftrightarrow z \in y) \rightarrow (x = y)$. Here $x = y$ has the usual meaning, so that $(x = y) \rightarrow (U(x) \leftrightarrow U(y))$, where $U$ is an arbitrary predicate. Hence we also have $(x = y) \rightarrow (z)(x \in z \leftrightarrow y \in z)$.
2. Axiom of the small sets.
a) There exists a set without elements, denoted by the symbol $0$. Because of 1. there can be only one such set. $(Ex)(y)(y \notin x)$.
b) For every object $m$ in $D$ there exists a set $\{m\}$ containing $m$, but only $m$, as element. $(x)(Ey)(x \in y \& (z)(z \in y \rightarrow (z = x)))$.
c) For all $m$ and $n$ in $D$ there exists a set $\{m, n\}$ containing $m$ and $n$, but only these, as elements. $(x)(y)(Ez)(x \in z \& y \in z \& (u)(u \in z \rightarrow (u = x) \vee (u = y)))$.
Of course b) might be omitted because it follows from c) by putting $n = m$.
3. Axiom of separation. Let $C(x)$ be a propositional function with $x$ as the only free variable, and $m$ an arbitrary set. Then there exists a set consisting of all elements $x$ of $m$ having the property $C(x)$. $(x)(Ey)(z)(z \in y \leftrightarrow C(z) \& z \in x)$.
4. Axiom of the power set. For every set $m$ there exists a set $Um$ whose elements are just all subsets of $m$. $(x)(Ey)(z)(z \in y \leftrightarrow (u)(u \in z \rightarrow u \in x))$.
5. Axiom of the union. For every set $m$ there exists a set $Sm$ whose elements are just all elements of the elements of $m$. $(x)(Ey)(z)(z \in y \leftrightarrow (Eu)(z \in u \& u \in x))$.
6. The axiom of choice. Let $T$ be a set whose elements are mutually disjoint sets $A, B, C, \ldots \neq 0$. Then there exists a set $M$ having just one element in common with each of the sets $A, B, C, \ldots$. $(x)((y)(z)(y \in x \& z \in x \& y \neq z \rightarrow (u)(u \notin y \vee u \notin z)) \rightarrow (Ev)(w)(w \in x \rightarrow (Et)(t \in v \& t \in w \& (s)(s \in v \& s \in w \rightarrow s = t))))$.
These are the most general axioms set up by Zermelo (1908). Most of the general theorems of set theory are proved by the aid of these axioms. However, in order to ensure the existence of infinite sets Zermelo added:
7. The axiom of infinity. There exists a set $U$ such that $0 \in U$ and whenever $x \in U$, $\{x\}$ is $\in U$ as well. $(Ex)(0 \in x \& (y)(y \in x \rightarrow \{y\} \in x))$.
Later Fraenkel introduced a further axiom which is more powerful with regard to the proof of the existence of large transfinite cardinals, namely the following.
8. Let the binary relation $F(x,y)$ (= propositional function of two free variables $x, y$ and any number of bound variables, derived from the membership relation by the means of the predicate calculus) be such that $(x)(y)(z)(F(x,z) \& F(y,z) \rightarrow (y = x))$. Then to every set $m$ there exists a set $n$ such that $x \in n \leftrightarrow (Ey)(y \in m \& F(x,y))$. Or written more completely: $(u)(v)(w)(F(u,w) \& F(v,w) \rightarrow (u = v)) \rightarrow (x)(Ey)(z)(z \in y \leftrightarrow (Eu)(u \in x \& F(z,u)))$.
The following development of the Zermelo-Fraenkel set theory is carried out in such a way that it could be formalized in the predicate calculus. Such a procedure would however be very cumbersome if it were performed in all details. Therefore I have chosen an exposition that is somewhat more informal and more like the ordinary mathematical procedures.
Theorem 1. $(x)(Ey)(y \notin x)$. That means that to each set $M$ we may find an object $a$ such that $a \notin M$. Therefore the total domain $D$ is not a set.
Proof: According to the axiom of separation, the $x \in M$ for which $x \notin x$ is true constitute the elements of a set $N$. Then $N \notin M$. Otherwise $N \in N$ would imply $N \notin N$ and inversely.
Theorem 2. To each $M$ and $N$ there is an $M'$ such that $M' \sim M$ and $M' \cap (M \cup N) = 0$.
Proof: Let $a$ be $\notin S(M \cup N)$; such an $a$ exists by Theorem 1. The pairs $\{a,m\}$, where $m$ runs through $M$, constitute a set $M'$, obviously $\sim M$ because the pairs $(m, \{a,m\})$ furnish a one-to-one correspondence between $M$ and $M'$. Indeed if $\{a,m_1\} \neq \{a,m_2\}$, then $m_1 \neq m_2$, and if $m_1 \neq m_2$, then $\{a,m_1\} \neq \{a,m_2\}$, because else we must have $m_1 = m_2$ or $m_1 = a \& m_2 = a$, whence again $m_1 = m_2$. Now $M'$ is disjoint to $M \cup N$, because otherwise we would have an element $m$ of $M$ such that $\{a,m\} \in M \cup N$, whence $a \in S(M \cup N)$, contrary to supposition.
Theorem 3. Let $T$ be a set of sets $A, B, C, \ldots \neq 0$. Then there exists a set $T'$ of sets $A', B', C', \ldots$ together with a one-to-one correspondence between $T$ and $T'$ such that the unions $ST$ and $ST'$ are disjoint while $A', B', C', \ldots$ are mutually disjoint and resp. $\sim A, B, C, \ldots$.
Proof: According to the previous theorem a set $P$ exists which is disjoint to $T \cup ST$, while $P \sim T$, which means that we have a one-to-one mapping $f(X) = X''$ such that $X''$ runs through $P$ when $X$ runs through $T$. For every $X \in T$ the pairs $\{f(X), x\}$, where $x$ runs through $X$, constitute a set $F(X)$. The function $F$ has an inverse. Indeed, as often as $X_1 \neq X_2$, $F(X_1)$ and $F(X_2)$ will be disjoint, because $f(X_1) \neq f(X_2)$, and if we compare two elements from $F(X_1)$ and $F(X_2)$, namely $\{f(X_1), x_1\}$ and $\{f(X_2), x_2\}$, we cannot have $f(X_1) = x_2$, because $X_2$ and $P$ are disjoint. Therefore $F$ and its inverse $F'$ give a one-to-one correspondence between $T$ and $T'$ when $T'$ is the set of all $F(X) = X'$, $X$ running through $T$. For every $X \in T$, the pair $\{f(X), x\} \in X'$ will correspond uniquely to $x \in X$. If this pair is called $g_X(x)$, then $g_X$ and its inverse yield a mapping between $X$ and $X'$. In this way we have obtained a simultaneous mapping of the elements of $X$ and those of $X'$ for all $X$. Thus the theorem is proved.
However we may add the following remark: The function $g$ is such that if $x \in X$ then $x' = g_X(x)$ is $\in X'$ and $x \in x'$. We have: To every $x \in X$ the $x' = g_X(x)$ is the element of $X'$ such that $x \in x'$, and inversely if $x' \in X'$ is given, the $x \in X$ such that $g_X(x) = x'$ is the element of $X$ which is $\in x'$. The simultaneous mapping of the elements $x$ in the diverse $X$ onto the elements $x'$ of the diverse $X'$ is therefore here constructed so that $x \in x'$ when $x$ and $x'$ correspond. Now according to the axiom of choice there exists a set $W$ having just one element in common with every set $X'$. If this element is denoted by $w(X')$, being a function of $X'$ (this function is the set of pairs $(X', x')$ where $\{x'\} = W \cap X'$), then we have $W \cap X' = \{w(X')\}$ and $g_X^{-1}(w(X')) \in X$, i.e. $g_X^{-1}(w(F(X))) \in X$. Thus we have found a function, namely $g^{-1} w F$, of the elements of $T$ which has as its value for each $X$ an element of $X$. This is the general principle of choice.
Even without the axiom of choice we can introduce addition and multiplication of cardinals, although only in the case of a finite number of operands. Indeed if $\varphi$ is a set of ordered pairs $(a,a')$ yielding a mapping of a set $A$ onto a set $A'$, $\psi$ a similar set furnishing a mapping of $B$ onto $B'$, $A \cap B = 0$, $A' \cap B' = 0$, then $\varphi + \psi$ is a mapping of $A + B$ onto $A' + B'$. Therefore we can just as in the case of the naive set theory define the sum of the cardinals of two disjoint sets as the cardinal of the sum. Similar remarks are valid for multiplication. If we take the more general case of addition, however, where the number of cardinal numbers to be added together is infinite, then the definition of addition is only possible when the axiom of choice is presupposed. If $T$ is a set of mutually disjoint sets $A, B, C, \ldots$, $T'$ a set of disjoint sets $A', B', C', \ldots$, while $F$ is a mapping of $T$ onto $T'$ consisting of the pairs $(A,A'), (B,B'), \ldots$, then if $A \sim A'$, $B \sim B'$, ... we can prove by the axiom of choice that the union $ST$ is $\sim ST'$. Indeed according to supposition there is a set $\Phi_A$ of mappings of $A$ onto $A'$, a set $\Phi_B$ of mappings of $B$ onto $B'$, .... Then according to the axiom of choice there exists a set consisting of one element $\varphi_A$ from $\Phi_A$, $\varphi_B$ from $\Phi_B$, ..., and the union of these is then a mapping $\varphi$ of $ST$ on $ST'$. Without the axiom of choice we can only formulate the following theorem: Let $T$ and $T'$ be as mentioned above, and let us assume that a set of mappings is given consisting of just one mapping of $A$ onto $A'$, one of $B$ onto $B'$, etc., for all elements $X$ resp. $X'$ of $T$ resp. $T'$; then $ST \sim ST'$.
There is on the other hand one important theorem concerning the comparison of cardinals which can be proved without the axiom of choice, namely the Bernstein Theorem.
Theorem 4. Let $M$ be $\sim M'$, $M' \subseteq M_1 \subseteq M$. Then $M \sim M_1$.
Remark: I use for every subset $A$ of $M$ the notation $A'$ for the image of $A$ by the same mapping as of $M$ onto $M'$.
Proof: We put $M_1 = Q + M'$, or in other words $Q = M_1 - M'$.
Let $T$ be the set of subsets $A$ of $M$ which have the properties 1) $Q \subseteq A$, 2) $A' \subseteq A$. $T$ is not empty because at least $M \in T$. Then let $A_0$ be the intersection of all elements of $T$. I denote this also by $DT$. Obviously $A_0$ has still the properties 1) and 2), i.e., $A_0 \in T$ or
3) $Q \subseteq A_0$,
4) $A_0' \subseteq A_0$.
3) and 4) furnish
5) $Q \cup A_0' \subseteq A_0$,
whence
$$(Q \cup A_0')' \subseteq A_0',$$
whence a fortiori
6) $(Q \cup A_0')' \subseteq Q \cup A_0'$.
From 6) it follows that
$$Q \cup A_0' \in T,$$
whence
7) $A_0 \subseteq Q \cup A_0'$.
5) and 7) yield
$$A_0 = Q + A_0',$$
noticing that $Q \cap A_0' \subseteq Q \cap M' = 0$. Now we have
$$M_1 = Q + M' = Q + A_0' + (M' - A_0') = A_0 + (M' - A_0'),$$
whence, $A_0$ being $\sim A_0'$,
$$M_1 \sim A_0' + (M' - A_0') = M'$$
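which, since $M' \sim M$, is the theorem. An immediate consequence is that if
$$M \sim N_1 \subseteq N \quad \text{and} \quad N \sim M_1 \subseteq M,$$
then
$$M \sim N.$$
Indeed it follows from $N_1 \subseteq N$ and $N \sim M_1$ that $N_1 \sim M_2$, where $M_2$ is a certain subset of $M_1$, so that since $M \sim N_1$
$$M \sim M_2 \subseteq M_1 \subseteq M,$$
whence after the previous theorem
$$M \sim M_1 \sim N.$$
Corollary: If $M \sim M' \subset M$, $\mathfrak{m} = \overline{\overline{M}}$, then
$$\mathfrak{m} + 1 = \mathfrak{m}.$$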
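The set $A_0$ in the proof of Theorem 4 can be made concrete in a simple case. The following sketch is an illustration under assumptions that are not in the text: it takes $M$ to be the natural numbers $0, 1, 2, \ldots$, the mapping $n \mapsto n + 2$ of $M$ onto $M' = \{2, 3, \ldots\}$, and $M_1 = \{1, 2, 3, \ldots\}$, so that $Q = \{1\}$. The names `in_A0` and `h` are hypothetical. Here $A_0$ consists of everything reached from $Q$ by repeated application of the mapping, and `h` moves the elements of $A_0$ by the mapping while leaving all other elements of $M_1$ fixed.

```python
def f(n):
    """The assumed mapping of M = {0, 1, 2, ...} onto M' = {2, 3, ...}."""
    return n + 2

def in_M_prime(n):
    return n >= 2

def in_Q(n):
    """Q = M1 - M' with M1 = {1, 2, 3, ...}, hence Q = {1}."""
    return n == 1

def in_A0(n):
    """n lies in A0 if stepping back along f from n eventually lands in Q."""
    while True:
        if in_Q(n):
            return True
        if not in_M_prime(n):
            return False
        n -= 2  # inverse of f on M'

def h(n):
    """Candidate one-to-one mapping of M1 onto M'."""
    return f(n) if in_A0(n) else n

# On the initial stretch 1..20 of M1 the values of h fill 2..21 without repetition.
values = [h(n) for n in range(1, 21)]
print(sorted(values) == list(range(2, 22)))
```

Only a finite stretch is checked, so this is a plausibility test of the construction and not a proof; composing `h` with the inverse of `f` would then give the correspondence between $M_1$ and $M$.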
It may be remarked that we have not used the axiom of choice in the proof of this theorem. As examples of other simple theorems of a certain interest, provable as well without the axiom of choice, I will mention Cantor's theorem and the very simple one below concerning the case $\mathfrak{m}$ and $\mathfrak{n} \geq 2$.
Theorem 5. (Cantor's theorem). For every set $M$ we have $\overline{\overline{M}} < \overline{\overline{UM}}$.
Proof: In the first place the pairs $(m,\{m\})$ yield a mapping of $M$ on a subset of $UM$, namely the subset consisting of all sets $\{m\}$ where $m \in M$. In the second place, no one-to-one mapping $f$ of $UM$ into $M$ can exist. Indeed, let us assume the existence of such a mapping $f$ and let $N$ be the set of all $f(X)$ for subsets $X$ of $M$ for which $f(X) \notin X$. Then we should have $(X)(X \subseteq M \rightarrow (f(X) \in N \leftrightarrow f(X) \notin X))$. Putting in particular $N$ into this formula instead of $X$, we obtain, since $N \subseteq M$, $f(N) \in N \leftrightarrow f(N) \notin N$, which is absurd.
Using the cardinal number notation this theorem may be written
$$2^{\mathfrak{m}} > \mathfrak{m},$$
because it is seen that the cardinal number of $UM$ must be $2^{\mathfrak{m}}$ when $\mathfrak{m}$ denotes the cardinal number of $M$. This is perhaps seen most convincingly in the following way. Let $M'$ be $\sim M$ and $M \cap M' = 0$, $f$ being a mapping of $M$ onto $M'$. We know that such $M'$ and $f$ exist. For every $m \in M$ I write $f(m) = m'$. Then we can get a one-to-one correspondence between $UM$ and the product of all the pairs $\{m,m'\}$. Let $N$ be $\subseteq M$. Then as often as $m \in M$ is also $\in N$ we let the corresponding element of the product contain $m$ as element, otherwise it contains just $m'$. Since the set of all pairs $\{m,m'\}$ evidently has the same cardinal number $\mathfrak{m}$ as $M$, the product must be of cardinality $2^{\mathfrak{m}}$.
A consequence of this theorem is that a set of sets representing all cardinals does not exist. Indeed, if $T$ is such a set, then $\overline{\overline{ST}} \geq \overline{\overline{X}}$, $X$ an arbitrary element of $T$, and Cantor's theorem says that $\overline{\overline{UST}} > \overline{\overline{ST}}$. Hence $\overline{\overline{UST}} > \overline{\overline{X}}$ for all $X \in T$. It may be suitably mentioned here that the sum of the cardinals belonging to a set of sets with no greatest cardinal has already a cardinal $>$ all cardinals in the set.
It is often asserted that the following theorem, also due to Bernstein, can be proved without the axiom of choice. However, the usual proof, at least, does not fulfill this requirement, so that I think it is a mistake. The theorem is, when $\mathfrak{m}$ and $\mathfrak{n}$ denote cardinals:
Theorem 6. If $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$, then $\mathfrak{m}$ and $\mathfrak{n}$ are comparable. What is meant is that either $\mathfrak{m} \leq \mathfrak{n}$ or $\mathfrak{n} \leq \mathfrak{m}$.
Proof: The supposition $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$ means that we are given two disjoint sets $M$ and $N$ together with a mapping of $M + N$ onto $M \times N$. This means again that the set of all pairs $(m,n)$, $m \in M$, $n \in N$, is divided into two disjoint parts $A$ and $B$ where $A$ is mapped onto $M$, $B$ onto $N$. Now, if there is a particular $m'$ such that all $(m',n)$, $n$ running through $N$, are $\in A$, then $N$ is $\sim$ a subset of $A$, whence $N \sim$ a subset of $M$. If no such $m'$ exists, then for each $m \in M$ there is at least one $n$ such that $(m,n) \in B$. Then one says it is evident that $B$ contains a subset which is $\sim M$, whence $M \sim$ a subset of $N$.
Theorem 7. Let the cardinals $\mathfrak{m}$ and $\mathfrak{n}$ be $\geq 2$. Then $\mathfrak{m} + \mathfrak{n} \leq \mathfrak{m}\mathfrak{n}$.
Proof: We have two sets $M$ and $N$ with at least two elements and we can assume $M$ and $N$ disjoint. Let $m_1 \neq m_2$ be $\in M$, $n_1 \neq n_2$ be $\in N$.
Let $P$ be the set of all $\{m_1, n\}$, $n$ running through $N$. Then $P$ is $\sim N$. Further let $Q$ be the set of all $\{m, n_1\}$, $m$ running through $M - \{m_1\}$, besides the pair $\{m_2, n_2\}$. It is evident that $Q$ is $\sim M$. Further $P$ and $Q$ are disjoint. Thus $P + Q$ is a subset of $M \cdot N$ which is $\sim M + N$, which proves the theorem.
It is seen that the hypothesis of the theorem cannot be weakened. Indeed if it is only supposed that one of the two cardinals is $\geq 2$, the other, say $\mathfrak{n}$, being $= 1$, the theorem is not valid for finite $\mathfrak{m}$.
The theorem can be generalized. Let $T$ be a set of at least 2 elements, each element of $T$ containing at least two elements, the elements of $T$ being mutually disjoint. Using the axiom of choice we may assume that we have chosen two elements of each $X \in T$. Let $A, B, C, \ldots$ be the elements of $T$ and let $a_1, a_2, b_1, b_2, \ldots$ be the chosen elements from $A, B, C, \ldots$. Then the product $PT$ contains subsets $A_1, B_1, C_1, \ldots$ consisting respectively of the elements
$A_1$: $(a_r, b_1, c_1, \ldots)$ with $r \neq 1$, and $(a_1, b_2, c_2, \ldots)$,
$B_1$: $(a_1, b_s, c_1, \ldots)$ with $s \neq 1$, and $(a_2, b_1, c_2, \ldots)$,
$C_1$: $(a_1, b_1, c_t, \ldots)$ with $t \neq 1$, and $(a_2, b_2, c_1, \ldots)$,
and so on, and it is evident that $A_1 \sim A$, $B_1 \sim B$, .... This means that $PT$ contains a subset $\sim ST$, so that $\overline{\overline{ST}} \leq \overline{\overline{PT}}$.
Theorem 8. If $0 < \mathfrak{m} \leq \mathfrak{n}$, then every set of cardinality $\mathfrak{n}$ can be divided into a set $T$ of cardinal $\mathfrak{m}$ of non-void mutually disjoint subsets.
Proof: Let $M \subseteq N$, $\overline{\overline{M}} = \mathfrak{m}$, $\overline{\overline{N}} = \mathfrak{n}$ and $m \in M$. For each $x \in M$ a subset $N_x$ of $N$ is defined thus: If $x \neq m$, then $N_x = \{x\}$, while in the case $x = m$, $N_m = (N - M) + \{m\}$. It is evident that the $N_x$, $x$ running through $M$, are all mutually disjoint and their union (sum) is $N$.
The inverse of this would be that if a set $N$ is the union of a set $T$ of non-void mutually disjoint sets, then $\overline{\overline{T}} \leq \overline{\overline{N}}$. However, without the axiom of choice this can only be proved if $T$ is finite. Indeed, in order to prove this assertion one has to find a subset of $N$ which is $\sim T$. This is possible if we can choose one element $a$ from each element $A$ of $T$; then the pairs $(a,A)$ yield a mapping of $T$ on a subset of $N$. Otherwise we have no means of proof.
On the other hand we may prove the following theorem. Let $N$ be the sum of mutually disjoint and non-void sets $N_x$, $x \in M$, so that to each $x \in M$ corresponds just this single $N_x$. Then $M$ is $\sim$ a subset of the power set of $N$, so that $\mathfrak{m} = \overline{\overline{M}} \leq 2^{\mathfrak{n}}$, $\mathfrak{n} = \overline{\overline{N}}$. To every subset $X \subseteq M$ we let correspond the subset $N_X$, namely the sum of all $N_x$, $x$ running through $X$, which is $\subseteq N$. For different $X$ these corresponding $N_X$ are different; therefore $2^{\mathfrak{m}} \leq 2^{\mathfrak{n}}$. If we had $\mathfrak{m} = 2^{\mathfrak{n}}$, then $2^{\mathfrak{m}} \leq \mathfrak{m}$, which is not the case, by Cantor's theorem. Thus $\mathfrak{m} < 2^{\mathfrak{n}}$.
Theorem 9. (Zermelo). Let $T$ be mapped on $T'$ in such a manner that as often as $M \in T$ corresponds to $M' \in T'$, $\overline{\overline{M}} < \overline{\overline{M'}}$. Then $\overline{\overline{ST}} < \overline{\overline{PT'}}$.
Proof: We may assume that the elements $A, B, C, \ldots$ of $T$ are $\neq 0$. Then $\overline{\overline{A'}}, \overline{\overline{B'}}, \ldots$ are all $\geq 2$. By theorem 7 we then know that $\overline{\overline{ST'}} \leq \overline{\overline{PT'}}$. Further it is clear that $\overline{\overline{ST}} \leq \overline{\overline{ST'}}$. Thus $\overline{\overline{ST}} \leq \overline{\overline{PT'}}$ and it suffices to prove that $PT'$ cannot be mapped on a subset $S$ of $ST$. Let us assume that such a mapping were possible.
The subset $S$ of $ST$ can be written as $A_0 + B_0 + C_0 + \ldots$, where $A_0$ is the intersection of $S$ and $A$, $B_0$ that of $S$ and $B$, .... The elements of $PT'$ are of the form $\{a', b', c', \ldots\}$, where $a' \in A'$, $b' \in B'$, .... Let us take into account those which correspond to the elements of $A_0$. If $a'$ varies, the corresponding $a \in A_0$ varies. Therefore the $a'$ occurring in the elements $\{a', b', c', \ldots\}$ which are mapped on the elements of $A_0$ can only constitute a proper subset $A_1$ of $A'$, because else $A'$ would have to be $\sim$ a subset of $A_0$, which contradicts the assumption $\overline{\overline{A}} < \overline{\overline{A'}}$. Similarly the $b'$ occurring in the elements $\{a', b', c', \ldots\}$ which are mapped on the elements of $B_0$ must constitute a proper subset $B_1$ of $B'$, and so on. Now $PT'$ also contains, according to the axiom of choice, an element $\{a_0, b_0, c_0, \ldots\}$, where $a_0 \in A' - A_1$, $b_0 \in B' - B_1$, .... However this element cannot correspond to any element of $ST$. Indeed it cannot be mapped on an element of $A_0$, for example, because if it could, $a_0$ would have to be one of the elements of $A_1$.
4. The well-ordering theorem

After all this I shall now prove, by use of the choice principle, that every set can be well-ordered. First I shall give another version of the notion "well-ordered", different from the usual one. We may say that a set $M$ is well-ordered if there is a function $R$, having $M$ as domain of the argument values and $UM$ as domain of the function values, such that if $N \supset 0$ is arbitrary and $\in UM$, there is a unique $n \in N$ such that $N \subseteq R(n)$. I have to show that this definition is equivalent to the ordinary one.
If $M$ is well-ordered in the ordinary sense, then every non-void subset $N$ has a unique first element. Then it is clear that if $R(n)$, $n \in M$, means the set of all $x \in M$ such that $n \leq x$, the other definition is fulfilled by this $R$. Let us, on the other hand, assume that we have a function $R$ of the said kind. Letting $N$ be $\{a\}$, one sees that always $a \in R(a)$. Let $N$ be $\{a,b\}$, $a \neq b$. Then either $a$ or $b$ is such that $N \subseteq R(a)$ resp. $R(b)$. If $N \subseteq R(a)$, then we put $a < b$. Since then $N$ is not $\subseteq R(b)$, we have $a \notin R(b)$. Now let $b < c$ in the same sense, that is, $c \in R(b)$, $b \notin R(c)$. Then it is easy to see that $a < c$. Indeed we shall have $\{a,b,c\} \subseteq$ either $R(a)$ or $R(b)$ or $R(c)$, but $b \notin R(c)$, $a \notin R(b)$. Hence $\{a,b,c\} \subseteq R(a)$ so that $\{a,c\} \subseteq R(a)$, i.e. $a < c$. Thus the defined relation $<$ is a linear ordering. Now let $N$ be an arbitrary non-void subset of $M$ and $n$ be the element of $N$ such that $N \subseteq R(n)$. Then if $m \in N$, $m \neq n$, we have $m \in R(n)$, which means that $n < m$. Therefore the linear ordering is a well-ordering.
Theorem 10. Let a function $\varphi$ be given such that $\varphi(A)$, for every $A$ such that $0 \subset A \subseteq M$, denotes an element of $A$. Then $UM$ possesses a subset $\mathfrak{M}$ such that to every $N \subseteq M$ and $\supset 0$ there is one and only one element $N_0$ of $\mathfrak{M}$ such that $N \subseteq N_0$ and $\varphi(N_0) \in N$.
Proof: I write generally $A' = A - \{\varphi(A)\}$. I shall consider the sets $P \subseteq UM$ which, like $UM$, possess the following properties:
1) $M \in P$,
2) $A \in P \rightarrow A' \in P$ for all $A \subseteq M$,
3) $T \subseteq P \rightarrow DT \in P$.
These sets $P$ constitute a subset $C$ of $UUM$. They are called $\Theta$-chains by Zermelo. I shall show that the intersection $DC$ of all elements of $C$ is again a $\Theta$-chain, that is, $DC \in C$. It is seen at once that $DC$ possesses the properties 1) and 2). Now let $T \subseteq DC$. Then, if $P \in C$, we have $T \subseteq P$, and since 3) is valid for $P$, also $DT \in P$. Since this is true for all $P$, we have $DT \in DC$ as asserted. Thus I have proved that $DC \in C$.
In the sequel I put $DC = \mathfrak{M}$ and I assert that $\mathfrak{M}$ has the property mentioned in the theorem. Obviously $\mathfrak{M}$ is the least $\Theta$-chain. Let $0 \subset N \subseteq M$, and let $N_0$ be the intersection of all $Q \in \mathfrak{M}$ for which $N \subseteq Q$; then $N \subseteq N_0$. Further $\varphi(N_0) \in N$, because otherwise $N_0' = N_0 - \{\varphi(N_0)\}$ would still contain $N$ and be $\in \mathfrak{M}$, which is a contradiction, since this would mean that $N_0$ is contained in $N_0 - \{\varphi(N_0)\}$. Thus we have proved the first half of the theorem. The proof of the latter half is considerably more laborious. It will be suitable first to prove the following:
Lemma. Let $A \in \mathfrak{M}$ have the property that for every $X \in \mathfrak{M}$ either $X \subset A$ or $X = A$ or $A \subset X$. Then $A'$ possesses the same property. By the way, we may notice that such an $A$ exists, $M$ having this property.
Proof: If $X \in \mathfrak{M}$ is such that $A = X$ or $A \subset X$, then $A' \subset X$. Therefore, we only need to consider the case $X \subset A$. The question is whether some $\mathfrak{B} \in \mathfrak{M}$ could exist such that $\mathfrak{B} \subset A$ but $\mathfrak{B}$ not $\subseteq A'$, or in other words, $\varphi(A)$ still $\in \mathfrak{B}$. I will denote by $\mathfrak{M}^*$ the subset of $\mathfrak{M}$ which remains after having removed all these $\mathfrak{B}$ from $\mathfrak{M}$. I shall show that $\mathfrak{M}^*$ is a $\Theta$-chain.
1) $M \in \mathfrak{M}^*$ because $M \in \mathfrak{M}$ and $M$ cannot possibly be a $\mathfrak{B}$. Indeed each $\mathfrak{B}$ is $\subset A$.
2) Let $B \in \mathfrak{M}^*$. If $A \subset B$, then $B'$ is not $\subset A$, so that $B'$ is not a $\mathfrak{B}$. On the other hand $B' \in \mathfrak{M}$, since $B \in \mathfrak{M}$. Then $B' \in \mathfrak{M}^*$ in this case. If $A = B$, then $B' = A'$ so that $\varphi(A) \notin B'$, whence again $B'$ is not a $\mathfrak{B}$, so that $B' \in \mathfrak{M}^*$. Finally, let $B \subset A$. Then $\varphi(A)$ must be $\notin B$; otherwise $B$ would be a $\mathfrak{B}$, against the supposition $B \in \mathfrak{M}^*$. But then a fortiori $\varphi(A) \notin B'$, so that $B'$ is not a $\mathfrak{B}$. Therefore $B' \in \mathfrak{M}^*$.
3) Let $T \subseteq \mathfrak{M}^*$. Should $DT$ be a $\mathfrak{B}$, we would have
$$(DT \subset A) \& (\varphi(A) \in DT).$$
Then $\varphi(A)$ is $\in$ every element $C$ of $T$. Since every $C$ is not a $\mathfrak{B}$, we must have $C$ not $\subset A$ for every $C \in T$ and thus, because of the supposed property of $A$, $A \subseteq C$ for all $C \in T$, whence $A \subseteq DT$, so that $DT$ is no $\mathfrak{B}$. Hence $DT \in \mathfrak{M}^*$.
However, since $\mathfrak{M}$ is the minimal $\Theta$-chain and $\mathfrak{M}^*$ is a $\Theta$-chain $\subseteq \mathfrak{M}$, we have $\mathfrak{M}^* = \mathfrak{M}$, which means that the elements $\mathfrak{B}$ do not exist. This proves our lemma.
Now let $\mathfrak{M}_1$ be the subset of $\mathfrak{M}$ consisting of all $A \in \mathfrak{M}$ such that for every $X \in \mathfrak{M}$ we have either $X \subset A$ or $X = A$ or $A \subset X$. I shall show that $\mathfrak{M}_1$ is a $\Theta$-chain, so that it coincides with $\mathfrak{M}$.
1) $M$ is $\in \mathfrak{M}_1$. This is evident since every $X \in \mathfrak{M}$ is $\subseteq M$.
2) If $A \in \mathfrak{M}_1$, then $A' \in \mathfrak{M}_1$. That is just the lemma proved above.
3) Let $T$ be $\subseteq \mathfrak{M}_1$. Then for every $N \in T$ and every $X \in \mathfrak{M}$ we have either $N \subseteq X$ or $X \subset N$. Let $X$ be an arbitrary element of $\mathfrak{M}$. Then either there is an element $N$ of $T$ such that $N \subseteq X$, and then $DT \subseteq X$, or we have for all $N \in T$ that $X \subseteq N$, whence $X \subseteq DT$. Thus $DT \in \mathfrak{M}_1$.
Hence it follows that $\mathfrak{M}_1$ is a $\Theta$-chain and therefore $= \mathfrak{M}$. This means that if $A$ and $B$ are $\in \mathfrak{M}$, we always have one of the three cases $A \subset B$, $A = B$, $B \subset A$. Further it ought to be noticed that if $B \subset A$, then $B \subseteq A'$, else we should have $A' \subset B$, which obviously is impossible when $B \subset A$.
All this makes it now possible to prove the latter half of our well-ordering theorem, namely that if $N \neq 0$ is $\subseteq M$ there is only one $N_0 \in \mathfrak{M}$ such that $\varphi(N_0) \in N$ and $N \subseteq N_0$. We have seen that there is such an $N_0$. Every element $P$ of $\mathfrak{M}$ such that $P \subset N_0$ is $\subseteq N_0'$, so that $\varphi(N_0) \notin P$, whence $N$ is not $\subseteq P$. Every other element $P$ of $\mathfrak{M}$ is such that $N_0 \subset P$, whence $N_0 \subseteq P'$, whence again $\varphi(P) \notin N_0$ so that also $\varphi(P) \notin N$. Thus $N_0$ is the only element of $\mathfrak{M}$ with the two properties $N \subseteq N_0$ and $\varphi(N_0) \in N$.
We can now define a function $R$ from $M$ to $\mathfrak{M}$ thus: As often as $N \in \mathfrak{M}$ & $\varphi(N) = m$, we write $N = R(m)$. It follows in particular from the theorem just proved that for every $m \in M$ a unique $N \in \mathfrak{M}$ exists such that $\{m\} \subseteq N$ while $m = \varphi(N)$, so that $N = R(m)$. Thus $R$ and $\varphi$ are inverse functions. It is easy to see that $\varphi$ maps $\mathfrak{M}$ onto $M$. Indeed, if $N_1 \subset N_2$, then $N_1 \subseteq N_2'$ so that $\varphi(N_2) \notin N_1$ whereas $\varphi(N_1) \in N_1$. Hence $\varphi(N_1) \neq \varphi(N_2)$, so that $\varphi$ furnishes a one-to-one correspondence between $\mathfrak{M}$ and $M$. Therefore there exists an inverse function mapping $M$ onto $\mathfrak{M}$, that is the function $R$.
Before entering into a more thorough treatment of the well-ordered sets and the ordinals I would like to remind you of some notations I shall use. An initial part $A$ of an ordered set $O$ shall mean a subset $A$ of $O$ such that if $x \in A$ and $y < x$, then always also $y \in A$, or in logical symbols $(x)(y)((x \in A) \& (y < x) \rightarrow y \in A)$. Similarly a terminal part $C$ of $O$ is to be understood. An interval $B$ shall be used in the meaning $B \subseteq O$ and $(x)(y)(z)(x \in B \& y \in B \& (x < z) \& (z < y) \rightarrow z \in B)$. These parts $A, B, C$ may be closed or open; for example, an initial part $A$ may have a last element, in which case it is said to be closed, or not, in which case it is open.
An interval $B$ may be open or closed, or open to the left and closed to the right, or inversely. It ought to be noticed that the union of a set of initial parts is again an initial part. If $a \in O$, the set of all $x < a$ constitutes an initial part. This I shall call the initial section corresponding to $a$. It ought to be noticed that if $O$ is well-ordered, every initial part which is not $O$ itself is an initial section.
Theorem 11. Let a well-ordered set $M$ be mapped into itself by a function $f$ which preserves the order, that is $a < b \rightarrow f(a) < f(b)$ for all $a$ and $b \in M$. Then for all $m \in M$ we have $m \leq f(m)$.
Proof: Let us assume that the theorem is not true. That would mean that the subset $N$ of $M$ of all those $x$ for which $x > f(x)$ was not void. Let $m$ denote the least element of $N$. Then we should have
$$m > f(m) = m',$$
and because $m' \notin N$, $m' \leq f(m')$. However, since $f$ is order-preserving and $m > m'$, we should have $f(m) > f(m')$, that is $m' > f(m')$, which is a contradiction.
It follows that if $M$ is mapped by a function $f$ onto $M$ with preservation of order, then $f(x) = x$ for all $x$. Indeed, according to the theorem, we have $f(x) \geq x$ and $f^{-1}(x) \geq x$, that is, $x = f(x)$.
From this it again follows that if a well-ordered set $M$ is mapped with preservation of order onto another well-ordered set $M'$, then this mapping is unique. Indeed if $f$ and $g$ both map $M$ onto $M'$, then $fg^{-1}$ maps $M$ onto $M$, so that $fg^{-1}(x)$ is $x$ and therefore $f(x) = g(x)$ for all $x$.
Theorem 12. If $M$ is mapped by $f$ with preservation of order onto an initial part $A$ of itself, then $A = M$ and the mapping is the identical one. We may also say: $M$ cannot be mapped onto an initial section of itself.
Proof: Let $f$ map $M$ onto $A$, $A$ an initial part of $M$. Then no element $m$ of $M$ can be $>$ every element $x$ of $A$, because $f(m)$ should belong to $A$, so that $m > f(m)$, which contradicts the previous theorem. Thus every $m \in M$ is $\leq$ an $x \in A$, whence $m \in A$, that is, $A = M$.
Noticing that an initial part of a well-ordered set $M$ is either $M$ itself or a section of $M$, we have that if $M \simeq N$ (meaning $M$ and $N$ are similar), then neither $M \simeq N_1$ nor $N \simeq M_1$, $M_1$ and $N_1$ denoting sections of $M$ resp. $N$.
Theorem 13. Let $M$ and $N$ be well-ordered sets. Then either $M \simeq N_1$, $N_1$ a section of $N$, or $M \simeq N$, or $M_1 \simeq N$, $M_1$ a section of $M$.
Proof: Let $I$ be the set of all initial parts of $M$ that are similar to initial parts of $N$, the latter constituting a set $J$. Then the union $SI$ is in an obvious way similar to $SJ$. Now either $SI$ must be $= M$ or $SJ = N$. Else $SI$ will be the section belonging to an element $i$ of $M$ and $SJ$ the section delivered by $j \in N$. But then $SI + \{i\}$ would be similar to $SJ + \{j\}$, which contradicts the definition of $I$. Now, if $SI = M$, either $M \simeq N$ or $M \simeq$ a section $N_1$ of $N$, according as $SJ$ is $N$ or $N_1$; else $SI$ is a section $M_1$ of $M$ while $SJ = N$, so that $M_1 \simeq N$.
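For a finite set the construction used in the proof of Theorem 10 can be carried out explicitly. The following sketch is an illustration only and not part of the text: it assumes a finite $M$ of integers and takes as choice function the hypothetical `phi = max`. In the finite case the least chain reduces to the sequence $M, M', M'', \ldots$ obtained by repeatedly removing the chosen element, and the sketch checks that every non-void subset $N$ has exactly one $N_0$ in the chain with $N \subseteq N_0$ and the chosen element of $N_0$ lying in $N$. The names `theta_chain` and `witnesses` are likewise hypothetical.

```python
from itertools import combinations

def theta_chain(M, phi):
    """Return [M, M', M'', ...] with A' = A - {phi(A)}, ending with the empty set."""
    chain = [frozenset(M)]
    while chain[-1]:
        A = chain[-1]
        chain.append(A - {phi(A)})
    return chain

def nonempty_subsets(M):
    items = sorted(M)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def witnesses(N, chain, phi):
    """All non-void N0 in the chain with N contained in N0 and phi(N0) in N."""
    return [N0 for N0 in chain if N0 and N <= N0 and phi(N0) in N]

M = frozenset({1, 3, 4, 5, 9})
phi = max  # hypothetical choice function: always pick the largest element
chain = theta_chain(M, phi)
# The induced well-ordering lists the elements in the order in which phi picks them.
order = [phi(A) for A in chain if A]
all_unique = all(len(witnesses(N, chain, phi)) == 1 for N in nonempty_subsets(M))
print(order, all_unique)
```

Any other choice function on the non-void subsets of $M$ could be substituted for `max`; the resulting well-ordering changes accordingly.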
5. Ordinals and alephs

It is now natural to say that an ordinal $\alpha$ is $<$ an ordinal $\beta$ if $\alpha$ is the order-type of a well-ordered set $A$, $\beta$ the type of $B$, such that $A$ is similar to an initial section of $B$. It is clear that $\alpha < \beta \& \beta < \gamma \rightarrow \alpha < \gamma$ and that $\alpha < \beta$ excludes $\beta < \alpha$. Thus all ordinals are ordered. However, this ordering is also a well-ordering. Let us namely consider an arbitrary set or even class $C$ of well-ordered sets. Let $M$ be one of the sets in $C$. Its ordinal number $\mu$ may be the least of all represented by the considered sets. If not, there are other sets in $C$ which are similar to sections of $M$. These sections are furnished by elements of $M$ and among these there is a least one. The corresponding initial section represents then the least ordinal of all furnished by the sets in $C$.
Theorem 14. A terminal part or an interval of a well-ordered set is similar to some initial part of it.
It is obviously sufficient to prove this for a terminal part. According to the comparability theorem, otherwise the whole set $M$ would have to be similar to an interval of itself, but that contradicts the fact that we should have $x \leq f(x)$ for all $x \in M$. A consequence of this is that we always have $\alpha \leq \alpha + \beta$ and $\beta \leq \alpha + \beta$.
I have earlier defined addition and multiplication of ordered sets. We may define multiplication and exponentiation for well-ordered sets in such a way that well-ordered sets result. First I will repeat the definition of addition: Let $T$ be a well-ordered set of well-ordered sets $A, B, C, \ldots$ which we assume mutually disjoint. Then the sum $ST$ is well-ordered thus: Any two elements of the same element $X$ of $T$ retain their order in $X$. If $X$ precedes $Y$ in $T$, then every element of $X$ precedes every element of $Y$ in $ST$. It is indeed easy to see that $ST$ is well-ordered in that way. Let namely $M$ be $\subseteq ST$ and $\neq 0$. Then the diverse $X \in T$ which furnish elements of $M$ constitute a non-void subset of $T$. Since $T$ is well-ordered there is a least element of this subset, $N$ say. Since $N$ is well-ordered there is a least element $m$ in the subset $M \cap N$ of $N$. Obviously $m$ is the least element of $M$.
Multiplication I will define as follows. Let us again consider a well-ordered set $T$ of mutually disjoint well-ordered sets $A, B, C, \ldots \neq 0$. Let $a_0, b_0, c_0, \ldots$ be the least elements of $A, B, C, \ldots$. Then I take a subset $P$ of $A \cdot B \cdot C \cdots$ in the previous sense, namely the set $P$ consisting of all elements of $A \cdot B \cdot C \cdots$ which contain only a finite number of elements different from $a_0, b_0, c_0, \ldots$. This set $P$ is then ordered by the principle of last differences, which means that if $a, b, c, \ldots$ and $a', b', c', \ldots$ are two elements of the product, then $a, b, c, \ldots < a', b', c', \ldots$ if at some place $m < m'$ but at no later place $m_1 > m_1'$. Exponentiation is defined by letting all factors in a product be similar well-ordered sets.
Lemma. Let $T$ be a well-ordered set of well-ordered sets $A, B, C, \ldots$ such that if $X$ and $Y$ are elements of $T$ and $X < Y$ in $T$, then $X \subset Y$ and the order of the elements of $X$ remains unaltered in $Y$.
Then the union $ST$ is well-ordered and two elements of $ST$ are ordered as in some element $X$ of $T$.
Proof: If $T$ contains a last (greatest) element $M$, then the truth of the lemma is immediately clear, because in this case $ST = M$. Therefore we may assume that $T$ does not contain any last element. Let us then consider a subset $N$ of $ST$, $0 \subset N$. There will be elements $X$ of $T$ containing elements belonging to $N$. Let $X_0$ be the first of these $X$. Then $X_0 \cap N$ is a subset $\neq 0$ of the well-ordered set $X_0$, so that there is a first element in $X_0 \cap N$, which obviously is the first element in $N$. Thus it is proved that $ST$ is well-ordered. It is evident that two elements of $ST$ will both occur in some element of $T$ and have there the same relation of order.
Now let us consider the product $P$ of the well-ordered set $T$ of well-ordered factors $A, B, C, \ldots$. The product belonging to an initial section of $T$ may be called a partial product and be denoted by $P_X$, if the section of $T$ is given by $X$. It is understood that the elements of $P_X$ shall, for each $Y \geq X$ in $T$, contain $y_0$ only. I shall first prove that if all these partial products are well-ordered, so is $P$. Indeed as often as $X < Y$, $P_X \subset P_Y$, so that the partial products constitute a well-ordered set of well-ordered sets of the kind considered in the lemma. Now if there is no last element in $T$ (no last factor in $P$) then $P$ is the union of all $P_X$ and is therefore well-ordered according to the lemma. If there is a last factor $F$ then $P = P_F \cdot F$ where
$P_F$ is well-ordered according to supposition, and since the product of two well-ordered sets is well-ordered, $P$ is well-ordered. Now let us look at the case that some partial products were not well-ordered. There must then be a least $X_0$ among all the $X \in T$ for which $P_X$ is not well-ordered. Then $P_{X_0}$ is the union of all $P_Y$, where $Y$ precedes $X_0$ in $T$, if $X_0$ has no predecessor; else, if $F$ is the predecessor, we have $P_{X_0} = P_F \cdot F$ where $P_F$ and $F$ are well-ordered. Further all these $P_Y$ are well-ordered. But then again, according to the lemma or because the product of two well-ordered sets is well-ordered, $P_{X_0}$ is well-ordered, which is a contradiction. Therefore all partial products are well-ordered, which as we just saw implies that $P$ itself is well-ordered. Thus we have proved:
Theorem 15. The product $P$ of a well-ordered set of well-ordered sets is well-ordered.
I would like to prove that the product $\alpha\beta$ can be conceived as the result of adding $\beta$ sets each of ordinal number $\alpha$. Let $A$ have the ordinal $\alpha$, $B$ the ordinal $\beta$. Then $\alpha\beta$ is the ordinal number of the set $P$ of pairs $(a,b)$ ordered according to last differences as explained. Let $M_b$ be the set of all pairs with the last element $b$ and $T$ the set of all these $M_b$. Then $ST$, well-ordered as explained above, is just the sum $P$ of all $M_b$. Each of these has the ordinal $\alpha$.
It is easy to verify that the associative laws hold for addition and multiplication. Also the distributive law $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$ is seen to be valid. On the other hand, the commutative laws do not hold, nor does the distributive formula $(\alpha + \beta)\gamma = \alpha\gamma + \beta\gamma$. I shall give some examples.
$$1 + \omega = \omega < \omega + 1, \qquad 2 \cdot \omega = \omega < \omega \cdot 2,$$
and therefore
$$(1 + 1)\omega = \omega < 1 \cdot \omega + 1 \cdot \omega.$$
One can also notice that not always $(\alpha\beta)^\gamma = \alpha^\gamma \beta^\gamma$. For example
$$(2 \cdot 2)^\omega < 2^\omega \cdot 2^\omega, \qquad (2 \cdot (\omega + 1))^2 > 2^2 (\omega + 1)^2.$$
On the other hand, if $\lambda = \sum_{\xi < \eta} \beta_\xi$, then $\alpha^\lambda = \prod_{\xi < \eta} \alpha^{\beta_\xi}$, and in particular
$$\alpha^{\beta + \gamma} = \alpha^\beta \cdot \alpha^\gamma.$$
We have seen that the ordinal numbers are well-ordered by the relation $<$. It is then natural to ask how the cardinal numbers behave. Because of the comparability of the ordinals it is immediately clear that the cardinal numbers are comparable; indeed, if $M$ and $N$ are any two sets and they are in some way well-ordered, then either $M$ is similar to, and thus equivalent to, some initial part of $N$ or inversely. Thus we have either $\overline{\overline{M}} \le \overline{\overline{N}}$ or $\overline{\overline{N}} \le \overline{\overline{M}}$. Now let $T$ be a set of sets. I assert that the cardinal numbers represented by the elements $A, B, C, \ldots$ of $T$ are well-ordered by the relation $<$ as earlier defined. Evidently it suffices to prove that there is a least cardinal represented
by the elements of $T$, because then the same will be true for every subset of $T$. Now let $M$ be $\in T$. If $\overline{\overline{M}}$ is the smallest cardinal represented by any element of $T$, then our assertion is correct. Otherwise there will be some elements $X$ of $T$ representing smaller cardinals. All these $X$ we may assume well-ordered. Then each of them is similar to an initial section of $M$ given by an element $m$ of $M$. Among these $m$ there will be a least one $m_0$. The section given by $m_0$ then furnishes the least cardinal number among the mentioned $X$. Thus the cardinal numbers are also well-ordered by the relation $<$. More exactly expressed: All cardinals $\le$ a given cardinal constitute a well-ordered sequence according to their magnitude. The least of the transfinite ones, the cardinal of the denumerable sets, we denote, as Cantor did, by $\aleph_0$, the following by $\aleph_1$, and so on.

If $\alpha$ is a transfinite ordinal, i.e. $\omega \le \alpha$, then we have $1 + \alpha = \alpha$, because we may write $\alpha = \omega + \beta$, whence $1 + \alpha = 1 + (\omega + \beta) = (1 + \omega) + \beta = \omega + \beta = \alpha$. More generally we have of course $n + \alpha = \alpha$, $n$ finite. Further it may be noticed that if $\alpha$ is the ordinal of a set $M$ without last element, or in other words $\alpha$ is without immediate predecessor, then for every finite ordinal $n$ we have $n\alpha = \alpha$. We can first prove that $\alpha = \omega\beta$, whence $n\alpha = n(\omega\beta) = (n\omega)\beta = \omega\beta = \alpha$, since $n\omega$ is evidently $= \omega$. That $\alpha$ indeed is a multiple of $\omega$ is seen by distributing the elements of $M$ into classes, putting any two elements into the same class which are either neighbors or have only a finite number of elements between them. It is clear that every class is of type $\omega$, and the whole set is the sum of a well-ordered set of these classes, which means that $\alpha = \omega\beta$, $\beta$ denoting the ordinal of the set of the classes.

Among all ordinals whose cardinal number is $\aleph_\alpha$ there will be a least, usually written $\omega_\alpha$. This $\omega_\alpha$ belongs to a very remarkable class of ordinals called principal ordinals.
The definition is: An ordinal $\alpha$ is a principal one, if the equation $\alpha = \beta + \gamma$ only has the solutions $\beta < \alpha$, $\gamma = \alpha$ and $\alpha = \beta$, $\gamma = 0$. One may also say that the ordinal represented by a well-ordered set $M$ is principal, if $M$ is similar to every terminal part of itself.

Proof that $\omega_\alpha$ is principal: Let $\omega_\alpha = \beta + \gamma$, $\gamma > 0$. We know that $\gamma$ is the ordinal of some initial part of $M$, if $M$ has the ordinal $\omega_\alpha$. If this initial part of $M$ is not $M$ itself, it is an initial section, so that $\gamma < \omega_\alpha$, and according to the definition of $\omega_\alpha$ we have that the cardinal number $\overline{\gamma}$ of $\gamma$ must be $< \aleph_\alpha$. Further $\overline{\beta}$ is also $< \aleph_\alpha$, because $\beta$ is the ordinal of some initial section of $M$. But the sum of two alephs $< \aleph_\alpha$ is again $< \aleph_\alpha$. Thus $\gamma$ must be $= \omega_\alpha$.

Since it is clear that every transfinite cardinal $\aleph$ may be given by a well-ordered set without last element (indeed the least ordinal with cardinal number $\aleph$ cannot have a predecessor because $1 + \aleph = \aleph$), we obtain from the relation $n\alpha = \alpha$ just mentioned that always $n\aleph = \aleph$ for finite $n$. Hence for every aleph $\aleph_\alpha$ in particular $\aleph_\alpha + \aleph_\alpha = \aleph_\alpha$. Further if $\aleph_\beta < \aleph_\alpha$, we obtain
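$\aleph_\alpha \le \aleph_\beta + \aleph_\alpha \le \aleph_\alpha + \aleph_\alpha = \aleph_\alpha$, which means that $\aleph_\beta + \aleph_\alpha = \aleph_\alpha$.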
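Thus the sum of two alephs is the greater one of them. Further, if $\aleph_\beta$ and $\aleph_\gamma$ are both $< \aleph_\alpha$, also $\aleph_\beta + \aleph_\gamma < \aleph_\alpha$.

The division of ordinals may be performed thus. Let $\alpha$ be given and $\beta > 0$. We consider the ordinals $\gamma$ which are such that for some $\delta$

$\alpha = \beta\gamma + \delta$.

I assert that there is a greatest value of $\gamma$ here. Indeed the assumption that the $\beta\gamma_\lambda$, where $\gamma_1 < \gamma_2 < \ldots$, are all $\le \alpha$ yields $\beta \lim \gamma_\lambda \le \alpha$, where $\lim \gamma_\lambda$ is the least ordinal $>$ every $\gamma_\lambda$. This is perhaps most easily seen by writing $\gamma_2 = \gamma_1 + \gamma'_2$, $\gamma_3 = \gamma_2 + \gamma'_3$, ... and generally $\gamma_{\lambda+1} = \gamma_\lambda + \gamma'_{\lambda+1}$. Then $\lim \gamma_\lambda = \sum_\lambda \gamma'_\lambda$, putting $\gamma'_1 = \gamma_1$, and we have by the distributive law for multiplication

$\beta \lim \gamma_\lambda = \sum_\lambda \beta\gamma'_\lambda$.

But the several $\beta\gamma'_\lambda$ will represent the ordinals of different disjoint intervals of a well-ordered set of ordinal $\alpha$. Thus $\sum_\lambda \beta\gamma'_\lambda \le \alpha$. If $\kappa$ is the greatest value of $\gamma$, we have

$\alpha = \beta\kappa + \rho$, $\rho < \beta$.

Indeed, if $\rho$ were $= \beta + \rho'$, we should obtain $\alpha = \beta(\kappa + 1) + \rho'$, so that $\kappa$ would not be the maximal $\gamma$. In the particular case $\beta = \omega$ we get $\alpha = \omega\kappa + n$, $n$ finite. Thus we again get the above result, that if $\alpha$ is the ordinal of a well-ordered set without last element, it is of the form $\omega\kappa$.

It is easily seen that $\beta^{\lim \gamma_\lambda} = \lim \beta^{\gamma_\lambda}$. As a consequence of this there is a maximal power $\beta^{\gamma_1} \le \alpha$. Then the division of $\alpha$ by $\beta^{\gamma_1}$ yields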
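$\alpha = \beta^{\gamma_1}\nu_1 + \alpha'$, $\alpha' < \beta^{\gamma_1}$, $\nu_1 < \beta$.

Now again there is a maximal power of $\beta$, $\beta^{\gamma_2}$ say, $\le \alpha'$. Then we obtain

$\alpha' = \beta^{\gamma_2}\nu_2 + \alpha''$, $\alpha'' < \beta^{\gamma_2}$, $\nu_2 < \beta$.

Since the sequence $\alpha, \alpha', \alpha'', \ldots$ is decreasing, there is a least one which must be $0$. Then we have

$\alpha = \sum_{r=1}^{m} \beta^{\gamma_r}\nu_r$, $m$ finite, all $\nu_r < \beta$.

Of particular interest is the case $\beta = \omega$. We obtain the result that every ordinal can be written in the form

$\alpha = \sum_{r=1}^{m} \omega^{\gamma_r} n_r$, $\gamma_1 > \gamma_2 > \ldots$,

$m$ positive and finite, all $n_r$ positive and finite. It is clear by the method of construction that this form is unique.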
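The normal form just obtained lends itself to mechanical computation. The following is a minimal sketch, not part of the lectures, which assumes that only ordinals below $\omega^\omega$ are needed, so that all exponents $\gamma_r$ are finite. The names `nat`, `OMEGA`, `add` and `mul` are chosen here for illustration. An ordinal is stored as the tuple of its pairs $(\gamma_r, n_r)$, and Python's tuple comparison then agrees with the order of the ordinals.

```python
# Ordinals below omega^omega in the normal form sum of omega^e * c:
# a tuple of (exponent, coefficient) pairs, exponents strictly decreasing,
# coefficients positive. The empty tuple stands for 0.
# (Representation and names are assumptions made for this sketch.)

def nat(n):
    """The finite ordinal n."""
    return ((0, n),) if n > 0 else ()

OMEGA = ((1, 1),)

def add(a, b):
    """Ordinal sum a + b: terms of a below the leading term of b are absorbed."""
    if not b:
        return a
    lead_exp, lead_coef = b[0]
    kept = tuple(t for t in a if t[0] > lead_exp)
    same = [c for (e, c) in a if e == lead_exp]
    if same:
        return kept + ((lead_exp, same[0] + lead_coef),) + b[1:]
    return kept + b

def mul(a, b):
    """Ordinal product a * b, using a(x + y) = ax + ay term by term in b."""
    if not a or not b:
        return ()
    a_exp, a_coef = a[0]
    result = ()
    for (e, c) in b:
        if e > 0:
            # a * omega^e * c depends only on the leading exponent of a
            term = ((a_exp + e, c),)
        else:
            # a * c for finite c > 0 multiplies only the leading coefficient
            term = ((a_exp, a_coef * c),) + a[1:]
        result = add(result, term)
    return result

if __name__ == "__main__":
    print(add(nat(1), OMEGA) == OMEGA)      # 1 + omega compared with omega
    print(OMEGA < add(OMEGA, nat(1)))       # omega compared with omega + 1
    print(mul(nat(2), OMEGA), mul(OMEGA, nat(2)))
```

With these definitions the examples given earlier for the failure of the commutative laws, such as $1 + \omega = \omega < \omega + 1$ and $2 \cdot \omega = \omega < \omega \cdot 2$, can be checked directly.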
It is seen that $\alpha$ cannot be principal without being simply a power of $\omega$. On the other hand every power of $\omega$ is easily seen to be principal. If $\gamma_1$ is kept fixed in the above expression while $\gamma_2, \gamma_3, \ldots$, $m$, and the $n_r$ vary, we get all numbers $< \omega^{\gamma_1 + 1}$. If also $\gamma_1$ varies but is kept $< \alpha$, $\alpha$ a limit number, we get all ordinals $< \omega^\alpha$.

I will show how we can set up a very simple one-to-one correspondence between the elements of a well-ordered set $M$ of ordinal equal to a power of $\omega$ on the one hand and the ordered pairs $(a,b)$ which are the elements of $M^2$ on the other. To every pair

$\alpha = \sum_{k \le q} \omega^{\alpha_k} m_k$, $\beta = \sum_{k \le q} \omega^{\alpha_k} n_k$
we let correspond the number

$\gamma = \sum_{k \le q} \omega^{\alpha_k} f(m_k, n_k)$,

where $f(m_k, n_k)$ is a one-to-one correspondence between the non-negative integers and their pairs. We set $\gamma = 0$ for $\alpha = \beta = 0$. If this is applied to $\omega_\alpha$, considering the cardinal number $\aleph_\alpha$, we obtain $\aleph_\alpha^2 = \aleph_\alpha$. Of course we then also get $\aleph_\alpha^n = \aleph_\alpha$ by an easy induction. Because of the well-ordering theorem we then have that $\mathfrak{m}^2 = \mathfrak{m}$ for every transfinite cardinal $\mathfrak{m}$. It is now very remarkable that, if inversely it is presupposed that this formula is valid for every transfinite cardinal number $\mathfrak{m}$, then every set can be well-ordered. Thus we have

Theorem 16. The general validity of $\mathfrak{m}^2 = \mathfrak{m}$ implies the general principle of choice and inversely.

If we look at the proof of the earlier theorem stating that $\mathfrak{m}$ and $\mathfrak{n}$ are comparable when $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$, we notice that if $\mathfrak{n}$ say is an aleph, then we need not use the axiom of choice in the proof. Further, if simultaneously it is known that $\mathfrak{n}$ is not $\le \mathfrak{m}$, we get $\mathfrak{m} \le \mathfrak{n}$ and then $\mathfrak{m}$ is an aleph. Now $\mathfrak{m}$ being an arbitrary cardinal number, it is always possible to define an aleph which is not $\le \mathfrak{m}$. This was first done by F. Hartogs (Math. Ann. 76, 438, 1915). Let $M$ be a set such that $\overline{\overline{M}} = \mathfrak{m}$. There are some subsets of $M$ which can be well-ordered. We take into account all well-orderings of all these subsets and distribute these well-ordered subsets into classes of similarity. Every such class is then a set corresponding to an ordinal and these sets constitute again a certain set. To the ordinals represented by the members of this set there exist always greater ordinals, e.g. the sum of all the ordinals. Among these greater ordinals there is a least one, $\lambda$ say. Then $\overline{\lambda}$ is not $\le \mathfrak{m}$, because this would mean that there exists a subset of $M$ which can be well-ordered with ordinal number $\lambda$, whereas $\lambda$ is greater than every ordinal $\alpha$ for which this was the case. Thus $\overline{\lambda}$ is an aleph which cannot be $\le \mathfrak{m}$. Hence the correctness of our assertion, that if always $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$ then every set is well-ordered.
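However, to be perfectly correct we must assume $\mathfrak{m}^2 = \mathfrak{m}$ for any inductive infinite cardinal number. Now if always $\mathfrak{m}^2 = \mathfrak{m}$, we have $(\mathfrak{m} + \mathfrak{n})^2 = \mathfrak{m} + \mathfrak{n}$, whence at any rate

$\mathfrak{m}\mathfrak{n} \le \mathfrak{m} + \mathfrak{n}$.

However we have proved earlier that if $\mathfrak{m}$ and $\mathfrak{n}$ are $\ge 2$, then $\mathfrak{m} + \mathfrak{n} \le \mathfrak{m}\mathfrak{n}$. Thus we obtain $\mathfrak{m}\mathfrak{n} = \mathfrak{m} + \mathfrak{n}$.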
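To return for a moment to the correspondence between $M$ and $M^2$ set up earlier in this part: it rests on a one-to-one correspondence $f(m,n)$ between the non-negative integers and their pairs with $f(0,0) = 0$. The sketch below is an illustration only. It assumes Cantor's diagonal enumeration as the choice of $f$ (the text leaves $f$ unspecified), and it restricts itself to ordinals below $\omega^\omega$, written as dictionaries from exponent to coefficient. All function names are hypothetical.

```python
# An assumed concrete choice of f: Cantor's diagonal pairing of the
# non-negative integers. Any bijection with f(0, 0) = 0 would serve.

def pair(m, n):
    """One-to-one map from pairs of non-negative integers onto 0, 1, 2, ..."""
    return (m + n) * (m + n + 1) // 2 + n

def unpair(k):
    """Inverse of pair."""
    s = 0
    while (s + 1) * (s + 2) // 2 <= k:
        s += 1                      # s ends as the number m + n of the diagonal
    n = k - s * (s + 1) // 2
    return s - n, n

def pair_ordinals(a, b):
    """Combine two ordinals given as {exponent: coefficient} coefficientwise."""
    out = {}
    for e in set(a) | set(b):
        c = pair(a.get(e, 0), b.get(e, 0))
        if c:                       # pair(m, n) == 0 only for m == n == 0
            out[e] = c
    return out

def unpair_ordinal(g):
    """Recover the pair of ordinals from the combined ordinal."""
    a, b = {}, {}
    for e, c in g.items():
        m, n = unpair(c)
        if m:
            a[e] = m
        if n:
            b[e] = n
    return a, b

if __name__ == "__main__":
    alpha = {2: 1, 0: 3}            # omega^2 + 3
    beta = {1: 2}                   # omega * 2
    gamma = pair_ordinals(alpha, beta)
    print(gamma, unpair_ordinal(gamma) == (alpha, beta))
```

Because only finitely many coefficients of $\alpha$ and $\beta$ differ from $0$ and $f(0,0) = 0$, the combined ordinal again has only finitely many terms, which is what makes the correspondence well defined.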
6. Some remarks on functions of ordinal numbers
A function $f(x)$ is called monotonic, if $(x < y) \to (f(x) \le f(y))$. It is called strictly increasing, if $(x < y) \to (f(x) < f(y))$. The function is called seminormal, if it is monotonic and continuous, that is if $f(\lim \alpha_\lambda) = \lim f(\alpha_\lambda)$, $\lambda$ here indicating a sequence with ordinal number of the second kind, i.e., without immediate predecessor, while $(\lambda_1 < \lambda_2) \to (\alpha_{\lambda_1} < \alpha_{\lambda_2})$. The function is called normal, if it is strictly increasing and continuous; $\xi$ is called a critical number for $f$, if $f(\xi) = \xi$.

Theorem 17. Every normal function possesses critical numbers and indeed such numbers $>$ any $\alpha$.

Proof: Let $\alpha$ be chosen arbitrarily and let us consider the sequence $\alpha, f(\alpha), f^2(\alpha), \ldots$. Then if $\alpha_\omega = \lim f^n(\alpha)$, we have $f(\alpha_\omega) = f(\lim f^n(\alpha)) = \lim f^{n+1}(\alpha) = \alpha_\omega$, that is, $\alpha_\omega$ is a critical number for $f$.

Examples. 1) The function $1 + x$ is normal. Critical numbers are all $x = \omega + \alpha$, $\alpha$ arbitrary. 2) The function $2x$ is normal. Critical numbers are all of the form $\omega\alpha$, $\alpha$ arbitrary. 3) The function $\omega^x$ is normal. Critical numbers of this function are called $\varepsilon$-numbers. The least of them is the limit of the sequence $\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots$.

I will mention the quite trivial fact that every increasing function $f$ is such that $f(x) \ge x$ for every $x$.

Theorem 18. Let $g(x) \ge x$ for all $x$ and $\alpha$ be an arbitrary ordinal; then there is a unique semi-normal function $f$ such that $f(0) = \alpha$, $f(x+1) = g(f(x))$.

Proof clear by transfinite induction.

Theorem 19. If $f$ is a semi-normal function and $\beta$ is an ordinal which is not a value of $f$, while $f$ possesses values $< \beta$ and values $> \beta$, then there is among the $x$ such that $f(x) < \beta$ a maximal one $x_0$ such that $f(x_0) < \beta < f(x_0 + 1)$.
Proof trivial, because if $f(x_\lambda) < \beta$ for all $\lambda$ in a sequence without last element, then $f(\lim x_\lambda) = \lim f(x_\lambda) \le \beta$, but the equality sign is excluded.

Let $A$ be a set of ordinal numbers without maximal element. A subset $B$ is said to be closed in $A$, if every limit of a sequence in $B$ is $\in B$, if it is $\in A$. If $B$ is closed in $A$ and cofinal with $A$ it is called a band of $A$.

Remark. Every band consists of the values of a normal function, and the inverse is true, if the set of the arguments is cofinal with $A$.

Theorem 20. If $M$ and $N$ are bands of $A$, so is $M \cup N$.

Proof. Of course $M \cup N$ is cofinal with $A$. An arbitrary sequence $S$ in $M \cup N$ without last element is either such that from a certain point on all elements belong to $M$ say, then the limit is in $M$; or there are always greater elements both in $M$ and in $N$, and then there is a common limit in $M$ and $N$.

Theorem 21. If $M$ and $N$ are bands of $A$ and $A$ is as already indicated without last element, but not cofinal with $\omega$, then $M \cap N$ is a band of $A$.

Proof. We assume that after a certain $\alpha_0 \in M$ there are no common elements in $M$ and $N$. Then we have an increasing sequence thus: $\alpha_{2n+1}$ is the first element of $N$ which is $> \alpha_{2n}$, $\alpha_{2n+2}$ the first element of $M$ which is $> \alpha_{2n+1}$. Then $\lim_{n < \omega} \alpha_n$ is $\in A$ and therefore $\in M$ and $\in N$, which is contrary to the assumption.

Theorem 22. Let $f(\alpha,\beta)$ be normal with respect to $\beta$. Then it is not an always increasing function with respect to $\alpha$.

Proof. If $\alpha_1 < \alpha_2$, then the normal functions $f(\alpha_1,\beta)$ and $f(\alpha_2,\beta)$ of $\beta$ have a common critical value $\xi$ according to the last theorem, so that $f(\alpha_1,\xi) = f(\alpha_2,\xi) = \xi$.

Let us however, following E. Jacobsthal, consider the functions having the following two properties:
1) $f(\alpha,\beta)$ is for constant $\alpha$ a normal function of $\beta$.
2) $f(\alpha,\beta)$ is for constant $\beta$ a monotonic function of $\alpha$ with $f(\alpha,\beta) > \alpha$.
Further let us call $f_1$ a generating function for $f$ when $f(\alpha,\beta+1) = f_1(f(\alpha,\beta), \alpha)$. This equation together with $f(\alpha,0)$ defines $f$ when $f$ is continuous.

Theorem 23. If $f_1$ has for $\alpha > 1$, $\beta > 1$
the property 2) and is monotonic in $\beta$, while $f$ is continuous and $f(\alpha,1)$ increasing in $\alpha$, then $f$ satisfies 1) and 2).
Proof. When $\alpha > 1$, one has $f(\alpha,1) > 1$, namely $f(\alpha,1) \ge \alpha > 1$. If, for $\alpha > 1$ and $\beta \ge 1$, $f(\alpha,\beta)$ is monotonic in $\alpha$ and $f(\alpha,\beta) > 1$, then because of the
definition of $f$ above $f(\alpha,\beta+1)$ is monotonic in $\alpha$ and $f(\alpha,\beta+1) = f_1(f(\alpha,\beta), \alpha) > f(\alpha,\beta)$ (see 2)). If $\lambda$ is a limit number, and if, for $\alpha > 1$ and $1 \le \beta < \lambda$, $f(\alpha,\beta)$ is monotonic in $\alpha$, then $f(\alpha,\lambda)$ is monotonic in $\alpha$. Thus for $\alpha > 1$ and $\beta \ge 1$ we have that $f(\alpha,\beta)$ is monotonic in $\alpha$ and a normal function in $\beta$. Further, for $\alpha > 1$ we have, because of $f(\alpha,1) \ge \alpha$, also $f(\alpha,\beta) > \alpha$ for $\beta > 1$.

Now, if one starts with $\varphi_0(\alpha,\beta) = \alpha + 1$ and defines $\varphi_{r+1}(\alpha,\beta)$ by using $\varphi_r$ as generating function for $r = 0, 1, 2$, putting $\varphi_1(\alpha,0) = \alpha$, $\varphi_2(\alpha,0) = 0$, $\varphi_3(\alpha,0) = 1$, then we obtain $\varphi_1(\alpha,\beta) = \alpha + \beta$, $\varphi_2(\alpha,\beta) = \alpha\beta$, $\varphi_3(\alpha,\beta) = \alpha^\beta$. An immediate result is that these functions have the properties 1) and 2).

Definitions: 1) Let us say that $f$ with generating function $f_1$ satisfies a generalized distributive law when a function $f_2$ exists such that

(1) $f_1(f(\alpha,\beta), f(\alpha,\gamma)) = f(\alpha, f_2(\beta,\gamma))$.

If $f_2 = f_1$, we say that $f$ satisfies the special distributive law.
2) We may say that $f$ fulfills a generalized associative law, if a function $f_3$ exists such that

(2) $f(f(\alpha,\beta),\gamma) = f(\alpha, f_3(\beta,\gamma))$.

If $f_3 = f$, $f$ satisfies the special associative law.

Theorem 24. If $f$ satisfies the general associative law, then $f_3$ satisfies the special associative law.

Proof. If in the formula (2) we put $\alpha = f(\xi,\alpha')$, $\beta = \beta'$, $\gamma = \gamma'$, the formula (2) yields

$f(f(f(\xi,\alpha'),\beta'),\gamma') = f(f(\xi,\alpha'), f_3(\beta',\gamma'))$

and by application of (2) twice on the left and once on the right side we get

$f(f(\xi, f_3(\alpha',\beta')),\gamma') = f(\xi, f_3(f_3(\alpha',\beta'),\gamma')) = f(\xi, f_3(\alpha', f_3(\beta',\gamma')))$,

whence, because $f(\xi,\beta)$ is increasing in $\beta$,

$f_3(f_3(\alpha',\beta'),\gamma') = f_3(\alpha', f_3(\beta',\gamma'))$,

and that is the special associative law for $f_3$.

Theorem 25. If $f$, being generated by $f_1$, satisfies both laws (1) and (2), then $f_2$ is generating function of $f_3$ and $f_3$ satisfies the special distributive law.

Proof. We have

$f(f(\alpha,\beta),\gamma+1) = f_1(f(f(\alpha,\beta),\gamma), f(\alpha,\beta)) = f_1(f(\alpha, f_3(\beta,\gamma)), f(\alpha,\beta)) = f(\alpha, f_2(f_3(\beta,\gamma),\beta))$

and

$f(f(\alpha,\beta),\gamma+1) = f(\alpha, f_3(\beta,\gamma+1))$,

whence $f_3(\beta,\gamma+1) = f_2(f_3(\beta,\gamma),\beta)$, that is, $f_2$ is generating function for $f_3$. Further, by (1)
$f(\xi, f_2(f_3(\alpha,\beta), f_3(\alpha,\gamma))) = f_1(f(\xi, f_3(\alpha,\beta)), f(\xi, f_3(\alpha,\gamma)))$,

which by (2), (1), (2) successively yields

$f_1(f(f(\xi,\alpha),\beta), f(f(\xi,\alpha),\gamma)) = f(f(\xi,\alpha), f_2(\beta,\gamma)) = f(\xi, f_3(\alpha, f_2(\beta,\gamma)))$.

By comparison of the first and last expressions containing $\xi$ one obtains

$f_2(f_3(\alpha,\beta), f_3(\alpha,\gamma)) = f_3(\alpha, f_2(\beta,\gamma))$,

that is, $f_3$ satisfies the special distributive law.

Theorem 26. If $f$ is defined by $f_1$, $f(\alpha,0) = 0$ or $1$, $f$ satisfying the generalized distributive law, and if $f_3$ is defined as a continuous function with $f_2$ as generating function, by $f_3(\alpha,0) = 0$, $f_3(\alpha,\beta+1) = f_2(f_3(\alpha,\beta), \alpha)$, then $f$ satisfies the associative law (2).

Proof. This law (2) is valid for $\gamma = 0$, because $f(f(\alpha,\beta),0) = 0$ or $1$ and $f(\alpha, f_3(\beta,0)) = f(\alpha,0) = 0$ or $1$. If the law is valid for $\gamma$, then it is valid for $\gamma + 1$, because

$f(f(\alpha,\beta),\gamma+1) = f_1(f(f(\alpha,\beta),\gamma), f(\alpha,\beta))$,

which because of the supposition of induction is

$= f_1(f(\alpha, f_3(\beta,\gamma)), f(\alpha,\beta)) = f(\alpha, f_2(f_3(\beta,\gamma),\beta)) = f(\alpha, f_3(\beta,\gamma+1))$.

If the law is valid for all $\gamma < \gamma_0$, $\gamma_0$ a limit number, then it is true for $\gamma_0$, because

$f(f(\alpha,\beta),\gamma_0) = \lim_{\gamma<\gamma_0} f(f(\alpha,\beta),\gamma) = \lim_{\gamma<\gamma_0} f(\alpha, f_3(\beta,\gamma)) = f(\alpha, f_3(\beta,\gamma_0))$.

Theorem 27. Let $f$ be defined by $f_1$, $f(\alpha,0) = 0$, $f_1(\alpha,0) = \alpha$ or $f(\alpha,0) = 1$, $f_1(\alpha,1) = \alpha$, while the special associative law is valid for $f_1$, and $f_1$ is continuous in $\beta$; then $f$ satisfies the distributive law (1) with $f_2(\alpha,\beta) = \alpha + \beta$.

Proof. The formula (1) is valid for $\gamma = 0$, because $f_1(f(\alpha,\beta), f(\alpha,0)) = f(\alpha,\beta)$. Let us assume its truth for $\gamma$. Then we have

$f_1(f(\alpha,\beta), f(\alpha,\gamma+1)) = f_1(f(\alpha,\beta), f_1(f(\alpha,\gamma),\alpha))$,

and since the special associative law is valid for $f_1$ this becomes

$f_1(f_1(f(\alpha,\beta), f(\alpha,\gamma)), \alpha) = f_1(f(\alpha,\beta+\gamma), \alpha) = f(\alpha,\beta+\gamma+1)$.

If formula (1) with $f_2(\alpha,\beta) = \alpha + \beta$ is valid for all $\gamma < \gamma_0$, $\gamma_0$ a limit number, then it is valid for $\gamma_0$, because

$f_1(f(\alpha,\beta), f(\alpha,\gamma_0)) = \lim_{\gamma<\gamma_0} f_1(f(\alpha,\beta), f(\alpha,\gamma)) = \lim_{\gamma<\gamma_0} f(\alpha,\beta+\gamma) = f(\alpha,\beta+\gamma_0)$.

Applying the last two theorems to the three elementary arithmetical operations, $\varphi_1(\alpha,\beta) = \alpha + \beta$, $\varphi_2(\alpha,\beta) = \alpha\beta$, $\varphi_3(\alpha,\beta) = \alpha^\beta$, it is seen that the associative and distributive laws of these are all derivable from the special associative law of addition

$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$.
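Indeed, if we put $f_1 = \varphi_1$, $f = \varphi_2$ in Theorem 27 we get

$\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$,

and putting $f_1 = \varphi_1$, $f_2 = \varphi_1$, $f = \varphi_2$, $f_3 = \varphi_2$, Theorem 26 yields

$(\alpha\beta)\gamma = \alpha(\beta\gamma)$.

Further, if we put $f_1 = \varphi_2$, $f = \varphi_3$, Theorem 27 yields

$\alpha^{\beta+\gamma} = \alpha^\beta \alpha^\gamma$,

while putting $f_1 = \varphi_2$, $f_2 = \varphi_1$, $f = \varphi_3$, $f_3 = \varphi_2$ one obtains, according to Theorem 26,

$(\alpha^\beta)^\gamma = \alpha^{\beta\gamma}$.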
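The way in which addition, multiplication and exponentiation arise from one another by means of generating functions can be tried out on the finite ordinals, where continuity at limit numbers plays no role. The following sketch is an illustration under that restriction only. The helper `generated` and the Python names `phi0` to `phi3` are chosen here and merely mirror the definition $f(\alpha,\beta+1) = f_1(f(\alpha,\beta),\alpha)$.

```python
# Finite ordinals only: the recursion f(a, b + 1) = f1(f(a, b), a)
# together with a starting value f(a, 0). Names are assumptions of this sketch.

def generated(f1, start):
    """Return f with f(a, 0) = start(a) and f(a, b + 1) = f1(f(a, b), a)."""
    def f(a, b):
        value = start(a)
        for _ in range(b):
            value = f1(value, a)
        return value
    return f

def phi0(a, b):
    """The starting function: successor of the first argument."""
    return a + 1

phi1 = generated(phi0, lambda a: a)   # phi1(a, 0) = a, gives a + b
phi2 = generated(phi1, lambda a: 0)   # phi2(a, 0) = 0, gives a * b
phi3 = generated(phi2, lambda a: 1)   # phi3(a, 0) = 1, gives a ** b

if __name__ == "__main__":
    print(phi1(3, 4), phi2(3, 4), phi3(3, 4))
```

On small arguments one can then confirm the laws (1) and (2) in the four instances just derived from Theorems 26 and 27.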
7. On the exponentiation of alephs

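We have seen that an aleph is unchanged by elevation to a power with finite exponent. I shall add some remarks concerning the case of a transfinite exponent. Since $2^{\aleph_0} > \aleph_0$, we have $(2^{\aleph_0})^{\aleph_0} \ge \aleph_0^{\aleph_0}$, but $(2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \aleph_0} = 2^{\aleph_0}$. On the other hand $2^{\aleph_0} \le \aleph_0^{\aleph_0}$. Hence

$2^{\aleph_0} = \aleph_0^{\aleph_0}$.

Of course we then have for arbitrary finite $n \ge 2$

$n^{\aleph_0} = 2^{\aleph_0}$,

and not only that. Let namely $\aleph_0 < \mathfrak{m} \le 2^{\aleph_0}$. Then

$2^{\aleph_0} \le \mathfrak{m}^{\aleph_0} \le (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$,

whence

$\mathfrak{m}^{\aleph_0} = 2^{\aleph_0}$.

In a similar way we obtain for an arbitrary $\aleph_\alpha$

$\mathfrak{m}^{\aleph_\alpha} = 2^{\aleph_\alpha}$

for all $\mathfrak{m} > 1$ and $\le 2^{\aleph_\alpha}$. From our axioms, in particular the axiom of choice, we have derived that every cardinal is an aleph. Therefore $2^{\aleph_\alpha}$ is an aleph. We can also prove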
by the axiom of choice that $2^{\aleph_\alpha} > \aleph_{\alpha+1}$ or perhaps $= \aleph_{\alpha+1}$. One has never succeeded in proving one of these two alternatives, and according to a result of Gödel such a decision is impossible. However, in many applications of set theory it has been convenient to introduce the so-called generalized continuum hypothesis or aleph hypothesis, namely

$2^{\aleph_\alpha} = \aleph_{\alpha+1}$.
In particular the equation $2^{\aleph_0} = \aleph_1$ is called the continuum hypothesis. Of course this assumption means that we introduce a new axiom, namely the following: Let $M$ be a well-ordered set, $UM$ as usual the set of its subsets, and $N$ such a well-ordered set that every initial section of $N$ is $\sim M$, while $N$ itself is not $\sim M$. Then there exists in our domain $D$ a set $\varphi$ of ordered pairs which yields a one-to-one correspondence between $UM$ and $N$. If we have the axiom of choice, we may say more simply that if $M$ is infinite, then every subset of $UM$ is either $\sim$ a subset of $M$ or it is $\sim UM$.

On the other hand there are a few aleph formulas which can be proved without the (generalized) continuum hypothesis. I shall give some of these. A theorem of König says:

Theorem 28. If $\gamma$ runs through all ordinals $< \lambda$, where $\lambda$ is a limit number, then

$\sum_{\gamma<\lambda} \aleph_\gamma < \prod_{\gamma<\lambda} \aleph_\gamma$.
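This follows from the general inequality theorem of Zermelo proved earlier. By the way, we have $\sum_{\gamma<\lambda} \aleph_\gamma = \aleph_\lambda$ of course. As a particular case we have $\aleph_\omega < \aleph_0 \aleph_1 \aleph_2 \cdots$. Since $\aleph_0 \aleph_1 \aleph_2 \cdots$ is $\le \aleph_\omega^{\aleph_0}$, we obtain the inequality

$\aleph_\omega < \aleph_\omega^{\aleph_0}$.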
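Similarly $\aleph_{\omega_1}^{\aleph_1}$ is $> \aleph_{\omega_1}$, etc. An equation of Hausdorff is

Theorem 29. $\aleph_{\alpha+1}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} \cdot \aleph_{\alpha+1}$, where $\alpha$ and $\beta$ are arbitrary ordinals.

Proof. 1) Let $\alpha < \beta$ so that $\alpha + 1 \le \beta$. Then, since $\aleph_{\alpha+1} \le \aleph_\beta < 2^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta}$, both sides of the equation are $= 2^{\aleph_\beta}$.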
2) Let $\alpha \ge \beta$. Then we can write
=
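whence the asserted equation. A theorem of Tarski is:

Theorem 30. If $\overline{\gamma} \le \aleph_\beta$, then $\aleph_{\alpha+\gamma}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} \cdot \aleph_{\alpha+\gamma}^{\overline{\gamma}}$.

The proof can be given by transfinite induction with respect to $\gamma$. The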
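theorem is true for $\gamma = 0$. Let us assume its truth for $\gamma$. Then by Theorem 29

$\aleph_{\alpha+\gamma+1}^{\aleph_\beta} = \aleph_{\alpha+\gamma}^{\aleph_\beta} \aleph_{\alpha+\gamma+1} = \aleph_\alpha^{\aleph_\beta} \aleph_{\alpha+\gamma}^{\overline{\gamma}} \aleph_{\alpha+\gamma+1} = \aleph_\alpha^{\aleph_\beta} \aleph_{\alpha+\gamma+1}^{\overline{\gamma+1}}$.

Now let $\lambda$ be a limit number such that $\overline{\lambda} \le \aleph_\beta$, while the theorem is assumed valid for all $\gamma < \lambda$. Then

$\aleph_{\alpha+\lambda} = \sum_{\gamma<\lambda} \aleph_{\alpha+\gamma} < \prod_{\gamma<\lambda} \aleph_{\alpha+\gamma}$

according to the theorem of König. Hence

$\aleph_{\alpha+\lambda}^{\aleph_\beta} \le \prod_{\gamma<\lambda} \aleph_{\alpha+\gamma}^{\aleph_\beta} = \prod_{\gamma<\lambda} \aleph_\alpha^{\aleph_\beta} \aleph_{\alpha+\gamma}^{\overline{\gamma}} = (\aleph_\alpha^{\aleph_\beta})^{\overline{\lambda}} \prod_{\gamma<\lambda} \aleph_{\alpha+\gamma}^{\overline{\gamma}} \le \aleph_\alpha^{\aleph_\beta} \aleph_{\alpha+\lambda}^{\overline{\lambda}}$,

while on the other hand

$\aleph_\alpha^{\aleph_\beta} \aleph_{\alpha+\lambda}^{\overline{\lambda}} \le \aleph_{\alpha+\lambda}^{\aleph_\beta} \aleph_{\alpha+\lambda}^{\aleph_\beta} = \aleph_{\alpha+\lambda}^{\aleph_\beta}$.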
Therefore the theorem is valid for $\lambda$ and is proved.

I shall further mention without proof the following two theorems:
1) In order that $2^{\aleph_\alpha} = \aleph_\beta$ it is necessary and sufficient that $\beta$ is the least ordinal number $\xi$ such that $\aleph_\xi^{\aleph_\alpha} < \aleph_{\xi+1}^{\aleph_\alpha}$.
2) We have $2^{\aleph_\alpha} = \aleph_\beta$ if and only if $\beta$ is the least ordinal number $\xi$ such that $\aleph_\xi^{\aleph_\alpha} = \aleph_\xi$.

A further question concerning the cardinal numbers is whether the so-called inaccessible cardinals exist. An aleph $\aleph_\Omega$ would be called inaccessible if $\omega_\Omega = \Omega$, or if one prefers, $\Omega = \aleph_\Omega$. This question may again be undecidable, so that the introduction of further axioms might be desirable. However, I will not pursue this subject further here.
8. Sets representing ordinals
There exists a class of sets of such a particular structure that they may suitably be said to represent ordinal numbers. I shall first mention the definition by R. M. Robinson (1937). A set $M$ is an ordinal, if
1) $M$ is transitive. That a set $M$ is transitive means that it contains its union. In symbols: $(x)(y)((x \in y) \;\&\; (y \in M) \to (x \in M))$.
2) Every non-empty subset $N$ of $M$ is basic, which means that it is disjoint to one of its elements. In logical symbols: $(Ex)(x \in N \;\&\; (x \cap N = 0))$.
3) If $A \neq B$, $A \in M$ and $B \in M$, then either $A \in B$ or $B \in A$.
I shall call every set $M$ with the properties 1), 2), 3) an R-ordinal.

Remark 1. If $\mathfrak{M}$ is a class of R-ordinals, then the intersection of all elements of $\mathfrak{M}$ is again an R-ordinal. Indeed, if $M_0$ is this intersection, we have that if $A \in B$, $B \in M_0$, then $A \in B$, $B \in M$ for every $M$ in $\mathfrak{M}$, whence $A \in M$ because $M$ is transitive, whence $A \in M_0$, because this is valid for every $M$ in $\mathfrak{M}$. Thus $M_0$ is transitive. Let $0 \subset N \subseteq M_0$. Then for any $M$ in $\mathfrak{M}$ we have $0 \subset N \subseteq M$, whence by 2) $N$ is basic, so that $M_0$ has the property 2). Finally let $A$ and $B$ be different and $\in M_0$. Then for any $M$ in $\mathfrak{M}$ we have $A$ and $B \in M$, whence by 3) either $A \in B$ or $B \in A$. Thus $M_0$ has the property 3).

Remark 2. Further it may be remarked that if $M$ is an R-ordinal we have $M \notin M$, because $M \in M$ would mean that the subset $\{M\}$ of $M$ was not basic.

Theorem 31. Every R-ordinal $M$ is the set of all its transitive proper subsets.

Proof. Let $C$ be $\in M$. Since $M$ is transitive, $C$ must be $\subseteq M$. Indeed $C$ is $\subset M$; $C = M$ is impossible, because that would mean $M \in M$, which is impossible by Remark 2. Further $C$ must be transitive. Indeed let $A \in B$, $B \in C$. Then $B \in M$, whence $B \subseteq M$, whence $A \in M$, whence $A \subseteq M$. By 3) we have either $A \in C$ or $C \in A$ or $A = C$. I assert that $C \in A$ and $C = A$ are impossible. Indeed, $C \in A$ would imply that $\{A,B,C\}$ is not basic, and $C = A$ would mean that $\{A,B\}$ is not basic. Hence $A \in C$, that is, $C$ is transitive. So far I have proved that every element $C$ of $M$ is a transitive proper subset of $M$.
Let, on the other hand, $C$ be a transitive proper subset of $M$. Then $0 \subset M - C$, so that by 2) an element $A$ of $M - C$ exists such that $A \cap (M - C) = 0$. Then, if $B \in C$, neither $A = B$ nor $A \in B$, because of the transitivity of $C$. Therefore $B \in A$, and thus $C \subseteq A$, because $B \in C$ yields $B \in A$ for all $B$. Since $A \subseteq M$ and $A \cap (M - C) = 0$, it follows that $A \subseteq C$, whence $A = C$, whence $C \in M$. Thus I have proved that every transitive proper subset of $M$ is element of $M$.

Remark 3. It is clear according to this that every element of an R-ordinal is an R-ordinal.
Theorem 32. If $A$ and $B$ are R-ordinals, $A \in B \leftrightarrow A \subset B$.
Proof. $A \in B$ yields, because of the transitivity of $B$, $A \subseteq B$, but $A = B$ is excluded. If $A \subset B$, then it follows from the previous theorem that $A \in B$.

Theorem 33. Any class $K$ of R-ordinals is well-ordered by the relation $\in$.

Proof. Let $A \neq B$ both belong to $K$. The intersection $A \cap B$ is, according to Remark 1 above, an R-ordinal. If we had $A \cap B \subset A$ and $\subset B$, then by the preceding theorem $A \cap B$ would be $\in A$ and $\in B$, whence $A \cap B \in A \cap B$, which is impossible. Thus either $A \subset B$ or $B \subset A$, whence $A \in B$ or $B \in A$, so that $K$ is linearly ordered by $\in$. Now let $K'$ be a subclass of $K$ and $D$ be the intersection of all elements of $K'$. According to Remark 1 above, $D$ is an R-ordinal, and if $A$ belongs to $K'$, $D \subseteq A$ and therefore $D \in A$ whenever $A \neq D$. On the other hand $D$ must itself belong to $K'$, for if it did not, $D$ would be element of each $A$ in $K'$ and thus $\in D$, but $D \in D$ is impossible. This shows that there is in $K'$ a first element with regard to the relation $\in$.

It is also evident according to this that every R-ordinal is a well-ordered set with regard to the membership relation.

Theorem 34. Every transitive set $M$ of R-ordinals is an R-ordinal.

Proof. If $A$ and $B$ are two different elements of $M$, either $A \in B$ or $B \in A$ according to the preceding theorem. Further, if $N \subseteq M$ and $0 \subset N$, there is a first element $E$ of $N$. Then as often as $C \in E$, $C$ is $\notin N$. Thus $N$ is basic.

It is clear that every transitive set $M$ of R-ordinals is the least R-ordinal following all $A \in M$. In particular, if $M$ has an immediate predecessor $N$, then $M = N + \{N\}$, otherwise $M = SM$.

Gödel has (1939) defined an ordinal number as a set $M$ with the three properties
1) $M$ is transitive.
2) If $0 \subset N \subseteq M$, $N$ is basic.
3) Every element of $M$ is transitive.
Let us call these sets $M$ G-ordinals. I shall show that they are just the same sets as the R-ordinals.

Let us assume that $M$ is a G-ordinal and that there are elements of $M$ which are not R-ordinals. These constitute a set $S \neq 0$ and by 2) an element $B$ of $S$ exists such that $B \cap S = 0$. Now let $C \in B$.
Then since $B \subseteq M$, so that $C \in M$, we must have $C \in M - S$, because otherwise $C \in S$, which is impossible, $B \cap S$ being $= 0$. It follows that $C$ is an R-ordinal. According to the last theorem, $B$ is also an R-ordinal, which is a contradiction. Therefore all elements of $M$ are R-ordinals, so that $M$ itself is an R-ordinal.

Let, inversely, $M$ be an R-ordinal. Then every element of $M$ is transitive, as we have shown above. Thus $M$ is a G-ordinal.

Further, Bernays has defined (1941) an ordinal number as a set $M$ with the two properties
1) $M$ is transitive.
2) Every transitive proper subset of $M$ is $\in M$.
We will say that every $M$ satisfying this definition is a B-ordinal. I shall show that the B-ordinals are again the same sets as the R- or G-ordinals.

Let $M$ be an R-ordinal. According to Theorem 31 every transitive proper subset of $M$ is an element of $M$, that is, $M$ is a B-ordinal.

Let, on the other hand, $M$ be a B-ordinal, $S$ be the set of elements of $M$ which are R-ordinals. If $A \in B$, $B \in S$, then, according to Remark 3 above, $A$ is an R-ordinal, that is, $A \in S$. Thus $S$ is transitive. By Theorem 34, $S$ is an R-ordinal. Now, if $S$ were $\neq M$, $S$ would be a transitive proper subset of $M$, therefore $S \in M$, whence $S \in S$, which is absurd. Hence $S = M$, so that $M$ is an R-ordinal.

Zermelo has (1915) set up the definition of ordinals, which we will call Z-ordinals, having the three properties
1) $M = 0$ or $0 \in M$.
2) For every element $A \in M$ we have either $A \cup \{A\} = M$ or $A \cup \{A\} \in M$.
3) For every $N \subseteq M$ we have either $SN = M$ or $SN \in M$.
I shall show that the Z-ordinals are the same as the B-ordinals.

Let $M \neq 0$ be a Z-ordinal and let $A$ be the set of all B-ordinals $B$ such that $B \subseteq M$ and $B \in M$. Whenever $B' \in B \in A$, $B'$ is a B-ordinal $\subset B$, whence $B' \subset M$ and $B' \in M$, so that $B' \in A$. Thus $A$ is transitive. Therefore $A$ is a B-ordinal. We have $A \subseteq M$, but $A \notin M$. Indeed $A \in M$ would mean that $A \in A$. Now $A$ may be $= B \cup \{B\}$ with $B \in M$, whence by 2) $A = M$; or $A$ is $= SA$, $A$ the set of the preceding B-ordinals, and since $SA \in M$ is excluded, we get by 3) that $A = M$. Thus $M$ is a B-ordinal.

Let $M$ be a B-ordinal. If $M \neq 0$, then $0 \in M$, because $0$ is a proper transitive subset. If $A \in M$, then $A \cup \{A\}$ may be $= M$. If not, $A \cup \{A\}$ is a transitive proper subset of $M$ and therefore $\in M$. Let $N$ be $\subseteq M$. Then $SN$ may be $= M$. If not, $SN$ is a transitive proper subset of $M$ and therefore $\in M$. Thus $M$ is a Z-ordinal.

Finally v. Neumann has defined (1923) a set $M$ as an ordinal number, we may say N-ordinal, as follows: A set $M$ is an ordinal, if it can be well-ordered in such a way that every element is identical with its corresponding initial section.

Let $M$ be an N-ordinal. If $B \in M$ and $A \in B$, then $B$ is an initial section of $M$ and therefore $A \in M$. Thus $M$ is transitive. Let $S$ be a transitive proper subset of $M$ and $B \in S$, while $A$ precedes $B$ in the well-ordering of $M$. Then $A \in B$, because $B$ is identical with the initial section of $M$ consisting of all elements of $M$ preceding $B$. Since $S$ is transitive we have $A \in S$. Thus $S$ is an initial part of $M$, and because $S \subset M$ an initial section of $M$. $S$ is identical with this section and is therefore $\in M$. Hence $M$ is a B-ordinal. If, inversely, $M$ is a B-ordinal, one sees by the theorems above that it is well-ordered by $\in$ such that every element $m$ of $M$ is the set of all elements $n$ preceding $m$.
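For finite sets the equivalence of these definitions can be tested by direct enumeration. The sketch below is an illustration only, under the assumption that all sets involved are hereditarily finite, so that they can be modelled by nested Python `frozenset` values. The function names are chosen here and do not come from the lectures.

```python
from itertools import combinations

# Hereditarily finite sets modelled as nested frozensets (an assumption of
# this sketch); the empty frozenset plays the role of the set 0.

def von_neumann(n):
    """The set obtained from 0 by n applications of A -> A u {A}."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset([x])
    return x

def subsets(m):
    """All subsets of m, as frozensets."""
    items = list(m)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_transitive(m):
    """Every element of an element of m is an element of m."""
    return all(x in m for y in m for x in y)

def is_basic(n):
    """n is disjoint to one of its elements."""
    return any(not (x & n) for x in n)

def is_r_ordinal(m):
    """Properties 1), 2), 3) of Robinson's definition."""
    return (is_transitive(m)
            and all(is_basic(n) for n in subsets(m) if n)
            and all(a in b or b in a for a in m for b in m if a != b))

def is_g_ordinal(m):
    """Goedel's definition: transitive, basic subsets, transitive elements."""
    return (is_transitive(m)
            and all(is_basic(n) for n in subsets(m) if n)
            and all(is_transitive(x) for x in m))

def is_b_ordinal(m):
    """Bernays' definition: transitive, and transitive proper subsets are elements."""
    return is_transitive(m) and all(
        s in m for s in subsets(m) if s != m and is_transitive(s))

if __name__ == "__main__":
    four = von_neumann(4)
    print(is_r_ordinal(four), is_g_ordinal(four), is_b_ordinal(four))
```

A transitive set which is not an ordinal, for instance $\{0, \{0\}, \{\{0\}\}\}$, is rejected by all three tests, in agreement with the equivalences proved above.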
9. The notions "finite" and "infinite"

We will now leave for a while the theory of transfinite numbers and deal with the notion "finite set". There are different possible definitions of this notion and with the aid of the well-ordering theorem they can be proved to be equivalent. Without the axiom of choice the proof of this equivalence seems impossible. I shall prove that the well-ordered finite sets are just the well-ordered sets that are also inversely well-ordered, that is, there is in every non-empty subset also a last element.

Definition of the notion inductive finite set: A set $u$ is inductive finite, if the following statement is true:

$(x)\bigl(x \in UUu \;\&\; (0 \in x) \;\&\; (y)(z)(y \in x \;\&\; z \in u \to y \cup \{z\} \in x) \to u \in x\bigr)$.

In ordinary language this means that every set $x$ of subsets of $u$, such that $0 \in x$ and as often as $y \in x$ and $z \in u$, always $y \cup \{z\} \in x$, contains $u$ as element.

Remark. Such sets $x$ of subsets always exist. Indeed $Uu$ is such a set $x$.

According to this definition we of course have the following principle of induction: If a statement $S$ is valid for $0$, and $S$ is always valid for $y \cup \{z\}$ if it is true for $y$, $y \subset u$, $z \in u$, $u$ inductive finite, then $S$ is valid for $u$. I shall now prove a few theorems on the inductive finite sets.
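The definition can be made concrete for a small set $u$ by actually forming the least set $x$ of subsets of $u$ which contains $0$ and is closed under the passage from $y$ to $y \cup \{z\}$. The following sketch is an illustration only; the name `inductive_closure` is chosen here, and the computation of course terminates only because $u$ is given as a finite Python collection.

```python
# The least family x of subsets of u with 0 in x and y u {z} in x
# whenever y in x and z in u. (Illustrative helper, not from the lectures.)

def inductive_closure(u):
    u = frozenset(u)
    family = {frozenset()}          # the empty set 0 belongs to x
    frontier = [frozenset()]
    while frontier:
        y = frontier.pop()
        for z in u:
            bigger = y | {z}        # y u {z}
            if bigger not in family:
                family.add(bigger)
                frontier.append(bigger)
    return family

if __name__ == "__main__":
    u = {"a", "b", "c"}
    x = inductive_closure(u)
    print(frozenset(u) in x, len(x))
```

Since every set $x$ of the kind mentioned in the definition must contain this least one, $u$ is inductive finite exactly when $u$ occurs in it.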
Theorem 35. If $u$ is inductive finite, so is $u \cup \{m\}$.

Proof. It suffices to assume $m \notin u$. Let $x$ be a set of subsets of $u \cup \{m\}$ such that $0 \in x$ and if $y \in x$ and $z \in u \cup \{m\}$ then $y \cup \{z\} \in x$. Further, let $x'$ be the subset of $x$ consisting of all elements of $x$ which are $\subseteq u$. Then $0 \in x'$ and as often as $y \in x'$, $z \in u$, we have $y \cup \{z\} \in x$ and therefore also $y \cup \{z\} \in x'$. Thus, $u$ being inductive finite, $u \in x'$. But $u \in x$ and $m \in u \cup \{m\}$ yields $u \cup \{m\} \in x$. Hence the theorem is correct.

Theorem 36. Every subset of an inductive finite set $u$ is inductive finite.

Proof. Let $v$ be $\subseteq u$. I consider the set $x$ of subsets $w$ of $u$ such that $w \cap v$ is inductive finite. It is obvious that $0 \in x$, because the set $0$ is inductive finite. Let $y$ be $\in x$ and $z \in u$. Then $y \cap v$ is inductive finite and $(y \cup \{z\}) \cap v$ is either $y \cap v$, namely when $z \notin v$, or $(y \cap v) + \{z\}$, namely if $z \in v$. But by the preceding theorem also $(y \cap v) + \{z\}$ is inductive finite. Thus as often as $y \in x$, $z \in u$, we have $y \cup \{z\} \in x$. Since $u$ is inductive finite, it follows that $u \in x$. Hence $u \cap v$ is inductive finite, that is, $v$ is inductive finite.

It follows easily from this that each subset $v$ of $u$, $u$ inductive finite, must be an element of every set of subsets of the kind mentioned in the definition of inductive finiteness.

Theorem 37. If $u$ and $v$ are inductive finite, so is $u \cup v$.

Proof. We consider the set $x$ of all subsets $w$ of $u$ such that $w \cup v$ is inductive finite. Obviously $0 \in x$. Let $y \in x$ and $z \in u$. By the previous theorem, $y$ is inductive finite. Further $y \cup v$ is inductive finite, so that $y \cup \{z\} \cup v$ is also inductive finite, which means that $y \cup \{z\} \in x$. Since $u$ is inductive finite, $u \in x$. This again means that $u \cup v$ is inductive finite.
Theorem 38. If $T$ is an inductive finite set of inductive finite sets $A, B, C, \ldots$, then $ST$ is inductive finite.

Proof. We consider the subsets $V$ of $T$ such that $SV$ is inductive finite. Obviously $0$ is one of them. If $V$ is one of them and $K \in T$, then $V \cup \{K\}$ is one of these subsets of $T$ according to the previous theorem, because $S(V \cup \{K\}) = SV \cup K$. Therefore, since $T$ is inductive finite, $T$ itself is one of these subsets, that is, $ST$ is inductive finite.

It is evident that if $A$ is inductive finite, and there is a one-to-one correspondence between $A$ and $A'$, then $A'$ is inductive finite. Using this it is easily proved that the product of two inductive finite sets is again of this kind, and further, that if $T$ is an inductive finite set of inductive finite sets, the product $PT$ is inductive finite.

Theorem 39. If $u$ is inductive finite, every set $y$ of subsets of $u$ contains a maximal element $x$. This is in symbols

$(u \text{ inductive finite}) \to (y)\bigl(y \in UUu \to (Ex)(x \in y \;\&\; (z)((z \in y) \to (Et)(t \in x \;\&\; t \notin z) \lor (x = z)))\bigr)$.

Proof. Let us consider the subsets of $u$ for which this theorem is valid. Certainly $0$ is one of these. Let $y$ be one of them. Then, if $z \in u$, also $y \cup \{z\}$ will be such a subset of $u$. Let, namely, $M$ be a set of subsets of $y \cup \{z\}$. If all these subsets of $y \cup \{z\}$ are actually subsets of $y$, then according to supposition there is a maximal element in $M$. Otherwise there are elements of $M$ of the form $y' \cup \{z\}$, where $y' \subseteq y$. These $y'$ constitute a set $M'$ of subsets of $y$, so that there is a maximal one, say $y_0$, among them. But then $y_0 \cup \{z\}$ is a maximal element in $M$. Hence, since $u$ is inductive finite, the theorem is true for $u$.

The inverse is also true, namely:

Theorem 40. If every set of subsets of $u$ contains a maximal element, then $u$ is inductive finite.
Proof. In particular there is a maximal element in every set $x$ of subsets such that $0 \in x$ and $(y \in x) \,\&\, (z \in u) \to (y \cup \{z\} \in x)$. But in this case it is obvious that there is no other maximal element than $u$ itself, which proves the theorem.
We might therefore just as well define a finite set as a set with the property that there is a maximal subset in every set of subsets. We have seen that this notion coincides with the notion inductive finite, and we may notice that we have proved this without any use of the axiom of choice.
A further definition of finiteness is the following: A set $M$ is called Dedekind finite if there is no one-to-one correspondence between $M$ and any proper subset $M'$ of $M$.
Theorem 41. If $M$ is Dedekind finite, so is $M \cup \{m\}$.
Proof. If $m \in M$, nothing is to be proved. Let $m$ be $\notin M$, and let us assume that $f(x)$, where $x$ runs through $M \cup \{m\}$, furnishes a one-to-one correspondence between $M \cup \{m\}$ and a proper part $N$ of that set. If $N$ were $\subseteq M$, then $f(x)$ would map $M$ on a proper part of $M$, contrary to supposition. We may therefore assume $N = N_1 + \{m\}$, where $N_1 \subset M$. If $f(m)$ were $= m$, $f$ would map $M$ onto $N_1$. Then we would have to assume that $f(m) \in N_1$. In
this case $f^{-1}(m) \in M$, so that one may define a mapping $g$ such that $g(x) = f(x)$ for all $x \neq m$ and $\neq n = f^{-1}(m)$, with $g(m) = m$ and $g(n) = f(m)$. Then $g$ would map $M$ onto $N_1$.
Theorem 42. Every inductive finite set is Dedekind finite.
Proof. Let $M$ be inductive finite. Let $\mathfrak{M}$ be the set of all Dedekind finite subsets of $M$. Then $0 \in \mathfrak{M}$ and by the previous theorem $N + \{m\} \in \mathfrak{M}$ whenever $N \in \mathfrak{M}$. Thus we have $M \in \mathfrak{M}$.
In this treatment of the notions of finiteness we have hitherto not used the axiom of choice. This is needed, however, to prove the inverse of the last theorem. As a matter of fact, as far as I know, nobody has been able to prove that without the axiom of choice. I shall give two versions of the proof.
Theorem 43. Every inductive infinite set is Dedekind infinite.
Proof. That the set $u$ is inductive infinite means that there exists a set $x$ of subsets of $u$ such that $u \notin x$ in spite of the circumstance that $0 \in x$ and whenever $y \in x \,\&\, z \in u$, we have $y \cup \{z\} \in x$. It is clear that there is no subset of $u$ occurring as a greatest element of $x$. Now let us assume the principle of choice, that we have a function $f$ of the subsets $y$ of $u$ such that always $f(y) \in y$. Then we can define a $g(y)$ for all $y \in x$ thus: $g(y) = f(u - y)$. Then we may remark that the set $x$ has the two properties: 1) $0 \in x$, 2) whenever $y \in x$ also $y + \{g(y)\} \in x$. All these $x$ together constitute a subset $\mathfrak{X}$ of $UUu$. Let $x_0$ be the intersection $D\mathfrak{X}$ of all these $x$. Then $x_0$ still possesses the properties 1) and 2). Furthermore, for every $y \in x_0$, where $0 \neq y$, there is a $y_{-1} \in x_0$ such that $y = y_{-1} + \{g(y_{-1})\}$. Otherwise $x_0 - \{y\}$ would still possess the properties 1) and 2), which is contrary to the definition of $x_0$. Then we may define a mapping of $u$ on a proper part of $u$ as follows. We let $u - Sx_0$ be mapped identically onto itself, while every $g(y)$, where $y \in x_0$, shall be the image of $g(y_{-1})$ for the corresponding $y_{-1}$. This provides a mapping of $Sx_0$ onto the proper part $Sx_0 - \{g(0)\}$.
Indeed every $z \in Sx_0$ must be a $g(y)$ for some $y \in x_0$, because otherwise we could remove all elements $y$ containing the element $z$ from $x_0$ and still have a subset $x$ with the properties 1) and 2).
Theorem 44. If an inductive finite set is well-ordered, it is also inversely well-ordered by the same ordering.
Proof. Let $M$ be inductive finite. We consider the set $T$ of all subsets $N$ for which the theorem is valid. We have $0 \in T$. Let $N$ be $\in T$ and $m \in M$ but not $\in N$. By every well-ordering of $N + \{m\}$, either $m$ will precede all elements of $N$ or come after all these, or $m$ will divide $N$ into an initial part $N_1$ and a terminal part $N_2$ so that all elements of $N_1$ precede $m$ while all of $N_2$ succeed $m$. But since every non-empty subset of $N$ has both a first and a last element, one sees that every subset of $N + \{m\}$ which is not empty has this property as well. Therefore $M \in T$, which means that the theorem is true for $M$.
Theorem 45. If a set $M$ is well-ordered and also inversely well-ordered, it is inductive finite.
Proof. Let us assume the existence of elements $y$ of $M$ such that the set of all $x \leq y$ was not inductive finite. Among these $y$ there is then a least one, say $m$. There is a predecessor $m_1$ of $m$. Then the set of all
$x \leq m_1$ is inductive finite. But according to a previous theorem then also the set of the $x \leq m$ must be inductive finite. Therefore the set of all $x \leq y$ is inductive finite for arbitrary $y$. Taking $y$ then as the last element, one sees the truth of the theorem.
Using the last theorems we obtain another version of the proof of the statement that every inductive infinite set $M$ is Dedekind infinite. However we must also use the well-ordering theorem, so that this proof depends on the axiom of choice as well. Let $M$ be well-ordered. Then after our preceding results this well-ordering of $M$ cannot simultaneously be an inverse well-ordering. Thus there is a subset $M_1 \neq 0$ without a last element. The set of all elements $x \leq$ an element $y$ of $M_1$ is then an initial part $N$ of $M$ without last element. Every element $n$ of $N$ has a successor $n' \in N$. We may then define a mapping $f$ of $M$ into a proper part of $M$ by putting $f(n) = n'$ for every $n \in N$ and $f(n) = n$ for every $n$ not $\in N$.
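For very small sets the content of Theorem 42 can be checked by brute force. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, sets are represented as Python lists, and the exhaustive search is feasible only for a handful of elements. The map `successor` illustrates, by contrast, a one-to-one map of the natural numbers into a proper part of themselves.

```python
from itertools import product

def injective_self_maps(elements):
    """Yield every one-to-one map of `elements` into itself, as a dict."""
    elements = list(elements)
    for images in product(elements, repeat=len(elements)):
        if len(set(images)) == len(elements):  # no two arguments share an image
            yield dict(zip(elements, images))

def is_dedekind_finite(elements):
    """True if no one-to-one self-map has a proper subset as its range."""
    elements = list(elements)
    return all(set(f.values()) == set(elements)
               for f in injective_self_maps(elements))

def successor(n):
    """The map n -> n + 1 on the natural numbers: one-to-one, and 0 is never a value."""
    return n + 1
```

The search runs through all $n^n$ self-maps of an $n$-element set, so it is meant as a check of the definition and not as a practical test.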
10. The simple infinite sequence. Development of arithmetic
Let $M$ be a Dedekind infinite set, $f$ a one-to-one correspondence between $M$ and a proper part $M'$ of $M$. Let $0$ denote an element of $M$ not in $M'$. I denote generally by $a'$ the image $f(a)$ of $a$, also by $P'$, when $P \subseteq M$, the set of all $p' = f(p)$ when $p$ runs through $P$. Let $N$ be the intersection of all subsets $X$ of $M$ possessing the two properties 1) $0 \in X$, 2) $(x)(x \in X \to x' \in X)$. Then $N$ is called a simple infinite sequence or the $f$-chain from $0$. We may say that it is the natural number series. It is evident that $N$ has the properties 1) and 2). Further we have the principle of induction: A set containing $0$ and for every $x$ in it also containing $x'$ contains $N$.
Theorem 46. $(y)(y \in N \to (Ex)((y = x') \,\&\, (x \in N)) \lor y = 0)$.
This means that any element of $N$ is either $0$ or the $f$-image of another element of $N$. The proof is easy: Let us assume that $n \in N$ and $\neq 0$ and $\neq$ every $x'$ when $x \in N$. Then $N - \{n\}$ would still possess the properties 1) and 2), which is absurd.
In order to develop arithmetic it is above all necessary to define the two fundamental operations addition and multiplication. Usually these as well as any other arithmetical functions are introduced by the so-called recursive definitions. I shall show how we are able to use here the ordinary explicit definitions which can be formulated with the aid of the predicate calculus.
I shall introduce addition and multiplication by defining the sets of ordered triples $(x,y,z)$ such that $x + y = z$ resp. $xy = z$. We may consider the sets $X$ of triples $(a,b,c)$, where $a, b, c$ are $\in N$, which have the two properties:
1) All triples of the form $(a,0,a)$ are $\in X$.
2) Whenever $(a,b,c)$ is $\in X$, $(a,b',c')$ is $\in X$.
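The smallest set of triples with the properties 1) and 2) can be computed on a finite initial segment of the number series. The Python sketch below is a hedged illustration, not part of the text's argument: the name `addition_triples` and the cut-off `bound` are assumptions, and ordinary integers with `n + 1` stand in for the elements of $N$ and the map $n \to n'$.

```python
def addition_triples(bound):
    """Least set of triples (a, b, c), all entries <= bound, closed under
    1) (a, 0, a) belongs to the set, and
    2) if (a, b, c) belongs to the set, so does (a, b + 1, c + 1)."""
    triples = {(a, 0, a) for a in range(bound + 1)}       # property 1)
    frontier = set(triples)
    while frontier:
        # property 2), applied to the triples found in the previous round
        new = {(a, b + 1, c + 1) for (a, b, c) in frontier if c + 1 <= bound}
        frontier = new - triples
        triples |= frontier
    return triples
```

Building the set upward from the triples $(a,0,a)$ gives, within the cut-off, the same result as intersecting all sets with the two properties, since every such set must contain each triple produced along the way.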
It is clear that there exist such sets $X$. Indeed the set $X_0$ of all triples $(a,b,c)$, where $a, b, c$ are $\in N$, is one of them. Now let $S$ be the intersection of all these $X$. I shall show that $S$ is just the set of triples $(a,b,c)$ such that $a + b = c$ according to the usual meaning of addition. First of all it is clear that $S$ itself is one of the sets $X$ with the properties 1) and 2). Further, the following inversion of 2) is true:
Theorem 47. Whenever $(a,b',c') \in S$, we have $(a,b,c) \in S$.
Proof. Let us assume that we had a triple $(a,b',c') \in S$ while $(a,b,c) \notin S$. Then it is seen that $S - \{(a,b',c')\}$ would still have the two properties. Indeed if $(\alpha,\beta,\gamma) \in S - \{(a,b',c')\}$, then $(\alpha,\beta,\gamma) \in S$, whence $(\alpha,\beta',\gamma') \in S$, whence again $(\alpha,\beta',\gamma') \in S - \{(a,b',c')\}$ unless $\alpha = a$, $\beta = b$, $\gamma = c$, which however cannot be the case, since $(a,b,c) \notin S$, whereas $(\alpha,\beta,\gamma) \in S$.
Using Theorem 46 we may also formulate Theorem 47 thus:
$(x)(y)(z)[(x,y,z) \in S \,\&\, (y \neq 0) \,\&\, (z \neq 0) \to (Eu)(Ev)((x,u,v) \in S \,\&\, (y = u') \,\&\, (z = v'))]$.
Theorem 48. $(a,b',0) \notin S$.
Proof. If, for some $a, b$, we had $(a,b',0) \in S$, it is seen that $S - \{(a,b',0)\}$ would still satisfy the requirements 1) and 2).
Theorem 49. $(x)(y)((x,0,y) \in S \to (x = y))$.
Proof. Indeed, if $(a,0,b)$ with $b \neq a$ were $\in S$, then $S - \{(a,0,b)\}$ would still possess the properties 1) and 2).
Theorem 50. $(x)(y)(((x,y,0) \in S) \to (x = 0) \,\&\, (y = 0))$.
Proof. Let $(a,b,0)$ be $\in S$. According to Theorem 48 we have $b = 0$ because of Theorem 46. Then Theorem 49 yields $a = 0$.
Theorem 51. $(x)(y)(z)(u)(((x,y,z) \in S) \,\&\, ((x,y,u) \in S) \to (z = u))$.
Proof. Let $P(b)$ be the proposition $(x)(z)(u)(((x,b,z) \in S) \,\&\, ((x,b,u) \in S) \to (z = u))$. Then $P(0)$ is true. Indeed, if $(a,0,c) \in S$ and $(a,0,d) \in S$, it follows from Theorem 49 that $c = a$ and $d = a$, whence $c = d$. Let us assume that $P(b)$ is true for some $b$.
Then, if $(a,b',c)$ and $(a,b',d)$ are $\in S$, we have by Theorem 47 that $c = c_1'$, $d = d_1'$ for some $c_1$ and $d_1$ while $(a,b,c_1) \in S$ and $(a,b,d_1) \in S$, whence because of the assumed validity of $P(b)$ it follows that $c_1 = d_1$, whence $c = d$. Hence by complete induction the general validity of $P(b)$ is proved.
Theorem 52. $(x)(y)(Ez)((x,y,z) \in S)$.
Proof. Let $P(b)$ here denote $(x)(Ez)((x,b,z) \in S)$. Then $P(0)$ is true. Let us assume that $P(b)$ is true for some $b$. Then for arbitrary $a$ there is a $c$ such that $(a,b,c) \in S$, whence $(a,b',c') \in S$, so that $P(b')$ is true. Thus the theorem is proved by complete induction.
The two last theorems show that for every $x$ and $y$ there is just one $z$ such that $(x,y,z) \in S$. We may therefore, instead of $(a,b,c) \in S$, write $c = a + b$. If, further, $0'$ is called $1$, we have $a + 1 = a'$ and the equations
$a' \neq 0, \quad (a' = b') \to (a = b), \quad a + 0 = a, \quad a + b' = (a + b)'$
are generally valid. As is well known we may derive the commutative and associative laws of addition by complete induction. This will be carried out later even in the more difficult case of predicative set theory based on the ramified theory of types.
Now let us consider the sets $Y$ of triples with the two properties:
1) all triples $(a,0,0)$ are $\in Y$,
2) whenever $(a,b,c) \in Y$ and $(c,a,d) \in S$, we have $(a,b',d) \in Y$.
It is evident that such sets of triples exist. Indeed the set of all triples is such a $Y$. Now let $P$ be the intersection of all these $Y$. Then it is clear that $P$ is again such a $Y$, but we can also prove the following inversions of the properties 1) and 2):
Theorem 53. If $(a,0,b) \in P$, then $b = 0$.
Proof. Indeed, if $(a,0,b)$ were $\in P$, $b \neq 0$, then $P - \{(a,0,b)\}$ would not only have the property 1), which is immediately seen, but also 2). Let $(\alpha,\beta,\gamma)$ be $\in P - \{(a,0,b)\}$ and $(\gamma,\alpha,\delta) \in S$. Then $(\alpha,\beta,\gamma) \in P$ together with $(\gamma,\alpha,\delta) \in S$ yields $(\alpha,\beta',\delta) \in P$, whence $(\alpha,\beta',\delta) \in P - \{(a,0,b)\}$ because $(\alpha,\beta',\delta)$ cannot coincide with $(a,0,b)$.
Theorem 54. If $(a,b',c) \in P$, then $(Ez)((a,b,z) \in P \,\&\, (z,a,c) \in S)$.
Proof. Let us assume that we had $(a,b',c) \in P$, while for all $z$ either $(a,b,z) \notin P$ or $(z,a,c) \notin S$. Let us consider the set $P' = P - \{(a,b',c)\}$. This set has obviously the property 1). Now let $(\alpha,\beta,\gamma)$ be $\in P'$ and therefore $\in P$. As proved above, there exists a unique $\delta$ such that $(\gamma,\alpha,\delta) \in S$. Then $(\alpha,\beta',\delta) \in P$ and therefore also $(\alpha,\beta',\delta) \in P'$ unless $\alpha = a$, $\beta = b$, $\delta = c$. This is impossible, however, because in such a case we should have $(a,b,\gamma) \in P$ and $(\gamma,a,c) \in S$. Thus $P'$ would also possess the property 2), and that is absurd.
Theorem 55. $(x)(y)(z)(u)((x,y,z) \in P \,\&\, (x,y,u) \in P \to (z = u))$.
Proof. Let $S(b)$ denote the statement $(x)(z)(u)((x,b,z) \in P \,\&\, (x,b,u) \in P \to (z = u))$. Then $S(0)$ is true because $(x,0,z) \in P \to (z = 0)$ and $(x,0,u) \in P \to (u = 0)$ (see Theorem 53). Let us assume that $S(b)$ is true, and let us look at the conjunction $(a,b',c_1) \in P \,\&\, (a,b',c_2) \in P$. If this condition is fulfilled, we have according to Theorem 54 that $x$ and $y$ exist such that $(a,b,x) \in P \,\&\, (a,b,y) \in P$ together with $(x,a,c_1) \in S \,\&\, (y,a,c_2) \in S$. Because of the validity of $S(b)$ this yields first $x = y$, whence $c_1 = c_2$ by Theorem 51.
Theorem 56. $(x)(y)(Ez)((x,y,z) \in P)$.
Proof. Let $S(b)$ here be the statement $(x)(Ez)((x,b,z) \in P)$. Then $S(0)$ is obviously true. Let $S(b)$ be true and let us assume $(a,b,c) \in P$. Then by Theorem 52 there exists a $d$ such that $(c,a,d) \in S$, whence $(a,b',d) \in P$.
The two last theorems show that to every $a, b$ there exists a unique $c$ such that $(a,b,c) \in P$. Therefore we may instead of $(a,b,c) \in P$ write $c = ab$, $c$ being a function of $a$ and $b$. Further, we have besides the earlier formulas $a' \neq 0$, $(a' = b') \to (a = b)$, $a + 0 = a$, $a + b' = (a + b)'$ also
$a \cdot 0 = 0, \quad ab' = ab + a.$
These, together with $(a = b) \to (a = c \to b = c)$, beside the principle of induction
and the predicate calculus, constitute, however, the axiom system for formal number theory, see, for example, R. L. Goodstein, Mathematical Logic, p. 44. Thus we see that the development of ordinary arithmetic is possible in the Zermelo-Fraenkel set theory.
The method I used here to replace the recursive definition of addition and multiplication by explicit definitions can be used quite generally for other recursive definitions. The primitive recursive schema, for example, is:
$f(0, a_2, \ldots, a_n) = g(a_2, \ldots, a_n)$
$f(a_1 + 1, a_2, \ldots, a_n) = h(f(a_1, \ldots, a_n), a_1, \ldots, a_n)$
Here $g$ and $h$ are previously defined functions with $n-1$ respectively $n+1$ arguments, while $f$ is the function to be defined. From the set-theoretic standpoint we may replace this recursive definition by the following explicit one. That $g$ and $h$ are already known may be expressed by saying that we have a set $G$ of $n$-tuples and a set $H$ of $(n+2)$-tuples of elements of $N$ such that for arbitrary $a_1, \ldots, a_{n-1}$ there is just one $b$ such that $(a_1, \ldots, a_{n-1}, b) \in G$ and for arbitrary $a_1, \ldots, a_{n+1}$ there is just one $b$ such that $(a_1, \ldots, a_{n+1}, b) \in H$. Then we consider all sets $X$ of $(n+1)$-tuples of elements of $N$ which possess the two properties:
1) Whenever $(a_2, \ldots, a_n, b) \in G$, we have $(0, a_2, \ldots, a_n, b) \in X$.
2) Whenever $(a_1, a_2, \ldots, a_n, b) \in X$ and $(b, a_1, \ldots, a_n, c) \in H$, we have $(a_1 + 1, a_2, \ldots, a_n, c) \in X$.
Then the intersection $F$ of all sets $X$ of this kind yields the function $f$, namely, as often as $(a_1, \ldots, a_n, b)$ is $\in F$, we have $b = f(a_1, \ldots, a_n)$, and inversely.
But also other kinds of recursions may be treated in the same way. As a further example we may take the definition of the Ackermann-Péter function, namely:
$\varphi(0, n) = n + 1$
$\varphi(m + 1, 0) = \varphi(m, 1)$
$\varphi(m + 1, n + 1) = \varphi(m, \varphi(m + 1, n))$.
We consider here the sets $Z$ of triples with the three properties:
1) All triples $(0, n, n + 1)$ are $\in Z$.
2) Whenever $(m, 1, n)$ is $\in Z$, so is $(m + 1, 0, n)$.
3) For arbitrary $m, n, h, k$ we have
$(m + 1, n, h) \in Z \,\&\, (m, h, k) \in Z \to (m + 1, n + 1, k) \in Z$.
If $\Phi$ is the intersection of all these sets $Z$, one proves easily that to every pair $a, b$ there is just one $c$ such that $(a,b,c) \in \Phi$. Thus $c$ is a function $\varphi$ of $a, b$, and this $\varphi$ is just the function defined by the recursive schema.
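The three closure properties for the Ackermann-Péter function can be tried out on a bounded range of integers. The following Python sketch is an illustration under assumptions that are not in the text: the name `ackermann_triples` is hypothetical, and only triples whose entries do not exceed the cut-off `bound` are kept, so that the closure is a finite set.

```python
def ackermann_triples(bound):
    """Least set of triples with entries <= bound closed under the rules
    1) (0, n, n + 1) belongs to the set,
    2) if (m, 1, n) belongs to the set, so does (m + 1, 0, n),
    3) if (m + 1, n, h) and (m, h, k) belong to the set, so does (m + 1, n + 1, k)."""
    z = {(0, n, n + 1) for n in range(bound)}                 # rule 1)
    while True:
        new = set()
        for (m, n, h) in z:
            if n == 1 and m + 1 <= bound:                     # rule 2)
                new.add((m + 1, 0, h))
            if m >= 1 and n + 1 <= bound:                     # rule 3)
                new |= {(m, n + 1, k) for (p, q, k) in z
                        if p == m - 1 and q == h}
        if new <= z:          # nothing further can be derived
            return z
        z |= new
```

With a cut-off of 30 the closure contains, for instance, the triples $(2, 3, 9)$, $(3, 2, 29)$ and $(4, 0, 13)$, and no pair of arguments occurs with two different values. Values that exceed the cut-off, such as the one for the arguments $4, 1$, are simply absent.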
11. Some remarks on the nature of the set-theoretic axioms. The set-theoretic relativism
Most of the axioms of the Zermelo-Fraenkel theory have the form: The class of all elements for which a certain statement is valid is a set, or, in other words, the domain $D$ contains an element $M$ such that all the objects in the class, and only these, are $\in M$. We might call these axioms "defining axioms," because the set which is declared to exist is also defined. There are two axioms at least, however, which are not of this kind, namely, the axiom of infinity and the axiom of choice. The axiom I mentioned expressing the general aleph hypothesis is of course not a defining axiom. As I have shown (see Mathematica Scandinavica, vol. 5, p. 40) the axiom of infinity can be put into defining form. The easiest way of doing that is to use the notion of ordinal set introduced in §8. We may define a finite ordinal as an ordinal set $M$ such that
$(Ex)((x \in M) \,\&\, (M = x^*)) \,\&\, (y)(y \in M \to (Ez)(z \in y \,\&\, y = z^*))$.
Here $x^*$ means $x \cup \{x\}$. Then the axiom of infinity can be expressed by saying that the finite ordinals constitute a set.
The axiom of choice has given rise to many discussions. The reason for this is of course its non-constructive character. But people who desire to retain as much as possible of the old Cantor theory feel obliged to maintain that axiom. It is also quite clear that from an axiomatic point of view one must be allowed to study the consequences of any axioms whatever. On the other hand it cannot be denied that this axiom also leads to consequences which one scarcely had expected. I shall mention a couple of examples of this without entering into the proofs. In Hausdorff's book "Grundzüge der Mengenlehre" one finds the proof of the following statement: It is possible to divide the surface of a sphere into 4 disjoint parts $A, B, C, D$ such that $A$ is a denumerable set of points, while $B, C, D$ are mutually congruent and at the same time $B$ is congruent to $C + D$.
That two sets of points are congruent means of course that they arise from one another by a rotation of the sphere. Still more astonishing is a result obtained by Banach and Tarski which has later been improved by some other authors. In an article "Decompositions of a sphere" by T. J. Dekker and J. de Groot in Fund. Math. XLIII it is proved that it is possible to divide a 3-dimensional unit sphere into 5 disjoint pieces, each piece being a connected set, such that by suitable translations and rotations these pieces can be put together again so that two unit spheres are formed.
In the last instance it is a matter of personal taste whether one wants to have a set theory without or with an axiom of choice. A similar remark must be made with regard to the aleph hypothesis or the hypothesis of the existence of inaccessible cardinals etc.
From a purely logical point of view it would already be interesting to study a set theory with only defining axioms. I have proved (see my address "Some remarks on set theory" in the report of the International Congress of Mathematicians, Cambridge, Mass., 1950) that in such a theory the introduction of any set $M$ can be brought into the form
(1) $x \in M \leftrightarrow \varphi(x)$,
where $\varphi(x)$ is a propositional function containing only $x$ as a free variable while there may be an arbitrary number of bound variables, $\varphi$ being built from atomic expressions $x \in y$, $y \in x$, $y \in z$, etc. by the logical connectives and the quantifiers. One might think in the first instance that there is a more general way of defining new sets, namely, by writing
(2) $x \in M \leftrightarrow \varphi(x, N, P, R, \ldots)$,
where $N, P, R, \ldots$ are previously defined sets entering into the expression $\varphi$. However, it is possible to prove that every set defined by an equivalence of the form (2) is already definable by (1). Indeed the reduction of a definition (2) to the form (1) can be performed by introducing the definitions of $N, P, R, \ldots$ into (2) and repeating, if necessary, this procedure. If $N, P, R, \ldots$ are defined by (1), we get at once the form (1) by introducing their definitions. If $N, P, R, \ldots$ are themselves defined again by (2), the process must be repeated. The simplest example of reduction from (2) to (1) is the case that $M$ is defined by the axiom of separation applied to a set $N$ which is defined in the form (1). If indeed
(3) $x \in N \leftrightarrow A(x)$
and
$x \in M \leftrightarrow (x \in N) \,\&\, B(x)$,
then $x \in M \leftrightarrow A(x) \,\&\, B(x)$, which is of the form (1). Let us take as a little more complicated example the definition of the set $M$ of all non-empty subsets of $N$, where $N$ is defined by (3). First we have
$(x \in M) \leftrightarrow (x \in UN) \,\&\, (Ey)(y \in x)$,
but
$(x \in UN) \leftrightarrow (x \subseteq N) \leftrightarrow (z)(z \notin x \lor z \in N) \leftrightarrow (z)(z \notin x \lor A(z))$,
so that we obtain
$(x \in M) \leftrightarrow (Ey)(y \in x) \,\&\, (z)(z \notin x \lor A(z))$.
It is now easy to understand the correctness of the theorem:
Theorem 57. In a set theory where the axioms are all of the form: The class so and so is a set, the definable sets constitute a denumerable class.
Proof. We obtain all sets $M$ by taking in (1) all propositional functions which, by the operations of the predicate calculus, can be built from atomic statements $y \in z$ and only contain $x$ as a free variable. We may replace $x$ by $x_0$, letting the bound variables be denoted by $x_1, x_2, \ldots$. Further, $\varphi(x)$ may be written in prenex normal form, while its matrix is written in conjunctive normal form. Then we will get an enumeration of all $\varphi(x)$ by enumerating all finite sequences consisting first of some pairs of integers corresponding to the quantifiers of the prefix, the first number being the index of the variable $x$ which is quantified, the last number being 0 or 1 according as the quantifier is universal or existential. This sequence of pairs is then followed by a finite
sequence of finite sequences of triples, each triple corresponding to an atomic statement $x_m \in x_n$, the last number in the triple being 0 or 1 according as $x_m \in x_n$ occurs unnegated or negated and the first numbers being $m$ and $n$.
Of course this class of definable sets is not itself a set, or, in other words, it is no object in the domain $D$; neither is the enumeration of these sets a correspondence which occurs as a set in $D$.
These considerations are put in a clearer light by the application to axiomatic set theory of the Löwenheim theorem, or more exactly a generalization of this. The theorem of Löwenheim says that if $F$ is a well-formed formula of the first order predicate calculus with certain predicate variables $A, B, C, \ldots$, either the negation $\overline{F}$ is provable or $F$ can be satisfied in the natural number series by suitable determination of $A, B, C, \ldots$ in that domain of individuals. The generalization which I proved in 1919 says that the same is true for an enumerated set of such formulas, say $F_1, F_2, \ldots$, that is, either the negation of a conjunction of finitely many of them is provable or the whole set of formulas can be satisfied by suitable determination in $N$ of the predicates occurring in them. Since the axioms of our set theory are either such formulas or are schemas each case of which is such a formula, it is clear that the generalized Löwenheim theorem can be applied. Therefore we have that if the axioms are consistent, it must be possible to determine the relation $\in$ between the natural numbers in such a way that all our axioms become valid. This result appears paradoxical, but it is not difficult to understand how it can be explained. Indeed, the existence of sets in our domain $D$ is given by the axioms, and we have no guarantee that it should not be possible in other ways to introduce further sets. Therefore we have no reason to expect that, for example, the subsets of an infinite set which we can prove to exist in $D$ are all of the subsets in an absolute sense.
We must be content with a relativistic conception of set theory. Everything must be conceived in relation to $D$ as it is supposed to be by the axioms, and we must abandon the idea that the axioms shall yield an absolute notion of "set" as in Cantor's theory. That $M$ is not $\sim N$ means in the axiomatic theory that there is in $D$ no set $F$ of pairs $(m,n)$, $m \in M$, $n \in N$, yielding a one-to-one correspondence between $M$ and $N$. But that does not mean that we cannot find such a set at all. There might be such a set, but outside $D$. In this way there might be a one-to-one correspondence between the Zermelo number series consisting of the elements $0, \{0\}, \{\{0\}\}, \ldots$ and the whole domain $D$, but this correspondence is not one of the sets of pairs which occur in $D$. Because of the general character of the theorem of Löwenheim and its generalization, it is clear that this set-theoretic relativism is unavoidable if we desire to have an exact formulation of set theory at all. Of course it shows the illusory character of the absolutist conceptions of Cantor's theory.
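The coding of propositional functions by finite sequences of integers, used in the proof of Theorem 57, can be sketched in Python. This is a minimal illustration under assumptions: the names `encode` and `to_natural`, the use of length markers to separate the prefix from the clauses, and the packing of a sequence into a single number are choices made here and are not taken from the text.

```python
def encode(prefix, matrix):
    """Code a formula in prenex form with matrix in conjunctive normal form.

    prefix: list of pairs (variable index, 0 for universal / 1 for existential)
    matrix: list of clauses, each a list of triples (m, n, sign) standing for
            the atom "x_m in x_n", with sign 0 if unnegated and 1 if negated.
    Returns a flat tuple of integers; the length markers keep the code unambiguous.
    """
    seq = [len(prefix)]
    for index, kind in prefix:
        seq += [index, kind]
    seq.append(len(matrix))
    for clause in matrix:
        seq.append(len(clause))
        for m, n, sign in clause:
            seq += [m, n, sign]
    return tuple(seq)

def to_natural(seq):
    """One-to-one map from finite sequences of non-negative integers to natural
    numbers: each entry n is written as n ones followed by a zero, after a
    leading one, and the resulting string is read as a binary numeral."""
    return int("1" + "".join("1" * n + "0" for n in seq), 2)
```

For example, the propositional function $(Ex_1)(x_1 \in x_0)$, which defines the class of non-empty sets, is passed as `encode([(1, 1)], [[(1, 0, 0)]])`. Since different sequences receive different numbers, the formulas of the form (1) are mapped one-to-one into the natural numbers.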
12. The simple theory of types
In order to avoid the logical paradoxes, Russell invented the theory of types. The idea is to distribute all objects of thought into different types or, in other words, to assume that they can be put into different layers or at different levels. We have some original objects called objects of type 0 (or 1 if one prefers). Sets of these objects or relations between them are objects of type 1. Sets of these again are objects of type 2, and so on. Further, the membership relation $x \in y$ shall only have a meaning if $y$ is of type $n + 1$ as often as $x$ is of type $n$. Composite propositional functions $\varphi(x)$ built up from atomic propositions $x \in y$ then have meaning only if it is possible to attach numbers to the occurring variables such that the symbol $y$ in every occurring atomic proposition $x \in y$ gets the number $n + 1$ when $x$ gets the number $n$. Such expressions $\varphi(x)$ are called stratified. We may now set up the following axiom of comprehension: For any stratified $\varphi(x)$ there exists a $y$ such that the equivalence $x \in y \leftrightarrow \varphi(x)$ is generally valid, that is, it is valid for all $x$ of type $n$ if $y$ is of type $n + 1$. Since we do not introduce negative types, there will be a lowest possible type for $x$ in $\varphi(x)$, say $n_0$. Then the axiom asserts
(I) $(Ey)(x)(x \in y \leftrightarrow \varphi(x))$,
where the range of the universal quantifier is the domain of all objects of type $n$, $n \geq n_0$, and the range of $(Ey)$ is the domain of objects of type $n + 1$.
The identity relation $x = y$ might be introduced as an undefined notion beside the membership relation $\in$. Then we would have to set up the axiom $(x = y) \to (\psi(x) \leftrightarrow \psi(y))$ for every stratified $\psi(x)$. It is simpler, however, to use only $\in$ as an undefined notion and define $=$ by letting $x = y$ stand for the validity of this equivalence for every stratified $\psi$. We then also need, however, the axiom of extensionality
(II) $(z)(z \in x \leftrightarrow z \in y) \to (x = y)$.
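Whether numbers can be attached to the variables of an expression in the required way can be decided mechanically. The Python sketch below is an illustration under assumptions: the name `stratify` is hypothetical, an expression is represented only by the list of its atomic propositions, each given as a pair `(x, y)` read as "$x \in y$", and variables are identified by name, so bound variables are assumed to have been given distinct names beforehand.

```python
from collections import defaultdict, deque

def stratify(atoms):
    """Try to attach an integer to each variable so that every atom (x, y),
    read as "x in y", gets number(y) == number(x) + 1.
    Returns such an assignment as a dict, or None if none exists."""
    edges = defaultdict(list)
    for x, y in atoms:
        edges[x].append((y, 1))     # y must lie one level above x
        edges[y].append((x, -1))    # x must lie one level below y
    level = {}
    for start in list(edges):
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, step in edges[v]:
                if w not in level:
                    level[w] = level[v] + step
                    queue.append(w)
                elif level[w] != level[v] + step:
                    return None     # two atoms demand different numbers for w
    lowest = min(level.values(), default=0)
    return {v: n - lowest for v, n in level.items()}   # lowest number becomes 0
```

For the atoms $x \in y$ and $y \in z$ the function returns the numbers 0, 1, 2 for $x, y, z$, while for the single atom $x \in x$ it returns `None`, since $x$ would need two different numbers.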
It is seen at once that the axioms of the power set and the union in the Zermelo-Fraenkel theory are valid statements here, and also the axiom of separation for stratified $\varphi(x)$. As to the axioms of the small sets, these are also valid with the restriction that $\{a,b\}$ can be built only when $a$ and $b$ are of the same type. It must be noticed, however, that we get not only universal sets of different types but also null sets of different types. Indeed $(Ey)(x \in y \lor x \notin y)$ and $(y)(x \in y \,\&\, x \notin y)$ used as $\varphi(x)$ in (I) define, if $y$ runs through all individuals of type $n + 1$, the universal set of type $n$ respectively the null set of type $n$.
Because of the restriction in building the set $\{a,b\}$, we ought to look at the union and intersection of two sets. If $A(x)$ and $B(x)$ are two stratified
propositional functions with only $x$ as free variable, then also $A(x) \,\&\, B(x)$ and $A(x) \lor B(x)$ will be stratified. This is seen as follows: If we can attach numbers to $x$ and the other (the bound) variables in $A(x)$ and do the same for $B(x)$, then it is possible to do this for $A(x)$ and $B(x)$ in such a way that $x$ is assigned the same number in $A$ and $B$. As a consequence of this, we can, for every type $\geq$ a certain one, always build the union $\hat{x}(A(x) \lor B(x))$ and the intersection $\hat{x}(A(x) \,\&\, B(x))$ of the sets $\hat{x}A(x)$ and $\hat{x}B(x)$. It must be remarked that $\overline{A(x)}$ is also stratified when $A(x)$ is, so that we get a complementary set to every given set.
There is a certain difficulty with regard to relations and functions. One would have liked to be able to conceive a binary relation as a set of pairs, and it would have been nice if this set could have been of the same type as a set of single elements. However, this would require the introduction of ordered pairs, triples, and so on, as new objects of the same type as the different terms in these sequences. Thus an ordered pair $(a,b)$, where $a$ and $b$ are of a certain type, should again be an object of this type. This would mean a certain complication. Instead of that one could let the sign $\in$ stand for a binary relation in the case $x \in y$, and a ternary one if an ordered pair $(x,y)$ is $\in z$, and so on. Probably this is not advisable. The best thing to do is, I should think, to introduce the ordered pairs, triples, and so on, as sets. Also by this procedure one has to tolerate a certain complication, because the set of all $x$ such that $A(x)$ will not be of the same type as the set of all $(x,y)$ such that $B(x,y)$. For example, if we have to do with a set $N$ representing the number series, then the set of all primes $p$ will be of the same type as $N$, but the set $P$ of all ordered pairs $(x,y)$, where $x \in N$, $y \in N$, will be of a type 2 units higher. Indeed $(a,b) = \{\{a,b\}, \{a\}\}$ is of a type 2 units higher than the type of $a$ and $b$.
The set $\{\{p\}\}$ will however be of the same type as the set $P$. So far as I can see, it will be best to consider the ordered pairs, triples, etc., as sets.
If we should try to develop mathematics, basing it on the simple theory of types, it would be desirable to have an axiom of infinity for the things of type 0. Indeed, if there is only a finite number of individuals of type 0, there can be only a finite number of each of the higher types. The development of arithmetic will then already be difficult and analysis would scarcely be possible. Now the axiom of infinity might be set up in different ways. We might assume a one-to-one correspondence $f$ given between the set $V$ of all things of type 0 and a proper subset $V'$ of $V$. This mapping $f$ would then be a fundamental notion in the theory beside the relation $\in$. We may manage so that we don't introduce such an extra notion. We may assume the axiom
(III) $(x)(x \text{ is inductive finite} \to (Ey)(y \notin x))$,
where $y$ runs through all objects of type 0, $x$ through all objects of type 1. Then there will exist sets $x$ of type 1 with $0, 1, 2, \ldots$ elements. Introducing the notion cardinal number for the sets of type 1, every one of these cardinals is a set of type 2, and the finite cardinals constitute a set of type 3 which can be taken as the natural number series. Starting with this, the introduction of negative integers, fractions, real numbers, etc., can be performed in just the usual way. One has to take care of the type distinctions, but it is quite easy to develop ordinary mathematics in this way.
Some small changes will often be necessary to carry over the theorems
and their proofs from the Zermelo-Fraenkel theory to the simple theory of types. Bernstein's equivalence theorem with its proof remains unchanged. Cantor's theorem that $UM$ is always of higher cardinality than $M$ must be expressed thus: Let $EM$ be the set of all unit sets $\{m\}$ contained in $M$. Then $EM < UM$. The previous definition of well-ordering (see §4) must be slightly changed to this wording: A set $M$ is well-ordered if there is a function $R$ from $EM$ to $UM$ such that, for $0 \subset N \subseteq M$, there is a unique $n \in N$ such that $N \subseteq R(\{n\})$. The wording of Theorem 10 must now be: Let a function $\varphi$ be given such that $\varphi(A)$, for every $A$ such that $0 \subset A \subseteq M$, denotes a unit subset of $A$. Then there is a subset $\mathfrak{M}$ of $UM$ such that to every $N \subseteq M$ there is one and only one element $N_0$ of $\mathfrak{M}$ such that $N \subseteq N_0$ and $\varphi(N_0) \subseteq N$.
Such slight changes will be necessary in many of the previous theorems and proofs. If we look at Theorem 6, for example, there can be no meaning in an equivalence between $M + N$ and $M \cdot N$ or even $M \times N$, because the elements of $M \cdot N$ are of type $t + 1$ and those of $M \times N$ are of type $t + 2$ when those of $M$ and $N$ are of type $t$. If, however, we replace $M$ by its set of unit subsets $EM$ and $N$ by $EN$, then $EM + EN$ and $M \cdot N$ will be of the same type, and an equivalence between these two sets will be meaningful. Similarly we can compare $EEM + EEN$ and $M \times N$.
I don't think it is necessary to carry out in detail these small changes in the considerations. By the way, it may be remarked that functions may well be introduced such that arguments and values are not of the same type, but if functions should be conceived as special cases of relations, and relations as sets of sequences conceived as sets, such a procedure must be avoided.
13. The theory of Quine

There have been many attempts to avoid the introduction of types, which are inconvenient. One of these is the theory of Quine. An exposition of this can be found in the book "Logic for Mathematicians" recently published by B. Rosser. Quine's theory is something intermediate between the axiomatic theory of Zermelo-Fraenkel and Russell's type theory. It has in common with the former the feature that there are no type distinctions. On the other hand it has in common with the latter the feature that only stratified propositional functions are admitted for the definition of new sets. Indeed we have in Quine's theory the following axiom of comprehension:
$(Ey)(x)(x \in y \leftrightarrow \varphi(x))$
with the whole domain of objects as range of variation of $x$ and $y$. Of course $y$ must not occur in $\varphi(x)$. It is easy to see that here we again get only one null set $\Lambda$ and only one universal set $V$. We may for example use these definitions:
$x \in \Lambda \leftrightarrow (y)(x \in y \,\&\, x \notin y), \quad x \in V \leftrightarrow (Ey)(x \in y \lor x \notin y)$.
Obviously the set $V$ is $\in V$. Nevertheless Russell's antinomy cannot be deduced, because the propositional function $x \notin x$ is not stratified, so that no
set $M$ can be introduced such that $x \in M$ should be $\leftrightarrow \overline{x \in x}$. The ordinary constructions of new sets are, however, valid. If $A(x)$ and $B(x)$ are stratified, say without free variables other than $x$, also $A(x) \,\&\, B(x)$ and $A(x) \vee B(x)$ are stratified, making the definition of the intersection and the union of two sets possible. Further, if $A(x)$ is stratified, and $x$ does not occur in $A(y)$, then $(Ey)(x \in y \,\&\, A(y))$ is stratified as well. This shows the existence of the union of all elements of a given set. Further $(x)(\overline{x \in y} \vee A(x))$ is stratified so that we can always build the set of all subsets of a given set. Since $\overline{A(x)}$ is also stratified, there always exists a complementary set to any given set. There is therefore a greater possibility for the introduction of new sets in this theory than in Zermelo's. In spite of this, however, it turns out that the existence of infinite sets is not any more provable in Quine's theory than in Zermelo's, so that an axiom of infinity is just as much needed here. This is due to the fact that the propositional functions needed for the definition of an infinite set are not stratified. In Rosser's book the axiom of infinity is set up thus: $(m)(n)(m \in Nn \,\&\, n \in Nn \,\&\, m + 1 = n + 1 \rightarrow m = n)$. Here $Nn$ means the set of natural numbers, where the natural numbers are defined as the cardinals of finite sets. The axiom has the effect that none of these cardinals coincides with the set $\Lambda$, or in other words, there exist finite cardinal numbers as large as we please. The sequence of natural numbers is then infinite.

It is interesting to look at Cantor's theorem. In type theory we could not compare $UM$ with $M$. Here we can do that, but Cantor's theorem is not generally valid. That it cannot be generally valid is clear, because at any rate it cannot be true for $V$. However, if we modify the theorem a little, saying that $UM$ is of higher cardinality than $EM$ (this was also the formulation we could use in type theory), then we get a correct statement.
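The stratification condition used above can be made concrete on small examples. The following Python sketch is only an illustration under simplifying assumptions: a formula is represented merely by the list of its membership atoms, each pair `(x, y)` standing for "x is an element of y", and the function name `is_stratified` is invented for this example. It tests whether integer levels can be given to the variables so that the right-hand variable of every atom lies exactly one level above the left-hand one; identity atoms and the distinction between free and bound variables are ignored.

```python
from collections import defaultdict, deque

def is_stratified(atoms):
    """atoms: list of (x, y) pairs, each standing for the atom 'x in y'.

    Returns True if levels can be assigned with level(y) = level(x) + 1
    for every atom, which is the stratification requirement.
    """
    graph = defaultdict(list)
    for x, y in atoms:
        graph[x].append((y, 1))    # level(y) = level(x) + 1
        graph[y].append((x, -1))   # level(x) = level(y) - 1
    level = {}
    for start in list(graph):
        if start in level:
            continue
        level[start] = 0           # each connected part gets its own base level
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, step in graph[v]:
                if w not in level:
                    level[w] = level[v] + step
                    queue.append(w)
                elif level[w] != level[v] + step:
                    return False   # two atoms demand incompatible levels
    return True

print(is_stratified([("x", "x")]))              # False: x in x
print(is_stratified([("x", "y"), ("y", "z")]))  # True:  x in y & y in z
print(is_stratified([("x", "y"), ("y", "x")]))  # False: x in y & y in x
```

The first call reproduces the reason why Russell's antinomy is blocked: the single atom "x in x" would require the level of x to exceed itself by one.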
This circumstance shows again that $M$ and $EM$ cannot always be equivalent. This appears very peculiar, but if we try to prove the equivalence between $M$ and $EM$ in general, this turns out to be impossible, because we would have to use propositional functions which are not stratified. Nevertheless, in many particular cases the use of non-stratified formulas can be avoided. We therefore have to distinguish between sets $M$ for which we can prove the equivalence between $M$ and $EM$ and those for which this is not provable. The former kind of sets are said to be Cantorian and Can $M$ is written for the statement $M \sim EM$. Rosser mentions in his book that the statement Can $M$ is provable not only for the natural number series, $M = Nn$, but for all the sets which occur in ordinary mathematics. Since $UV \subseteq V$, we have $\overline{\overline{UV}} \le \overline{\overline{V}}$. On the other hand $\overline{\overline{UV}} > \overline{\overline{EV}}$, so that

(1) $\overline{\overline{EV}} < \overline{\overline{V}}$.

From this relation it follows (see the proof below) that

(2) $\overline{\overline{EEV}} < \overline{\overline{EV}}$,

so that the sets $V, EV, EEV, \ldots$ will possess decreasing cardinal numbers. The existence of such a decreasing sequence of cardinals shows that these cardinals cannot be alephs, whence it follows that not all sets can be well-ordered. Therefore, the axiom of choice cannot be added to the other axioms of Quine's theory without contradiction. We may express this fact by saying that the principle of choice can be proved false in Quine's theory. This was pointed out by Specker.

Proof that (2) follows from (1): Because of (1) there exists a mapping of the set of all unit sets $\{m\}$ on a subset of $V$. Indeed the identical mapping is of that kind. However, the identical mapping maps the set of all $\{\{m\}\}$ on just this subset of all sets $\{m\}$. Let us on the other hand assume that $EV$ could be mapped onto $EEV$. The mapping would then consist of mutually disjoint pairs $(\{m\}, \{\{n\}\})$. However, the certainly existing set of pairs $(m, \{n\})$ would then furnish a mapping of $V$ on $EV$ contrary to (1). Hence (2) follows from (1).

Quine's theory does not seem to have many adherents among mathematicians. The reason for this is presumably the existence in it of such sets as $V$ which are elements of themselves, pathological sets as they are called. I don't think, however, that this circumstance ought to worry mathematicians, because it is not necessary to take these abnormal sets into account in the development of the ordinary mathematical theories.
14. The ramified theory of types. Predicative set theory
I have already mentioned Poincaré's objection to Cantor's set theory, that one makes use of the so-called non-predicative definitions. These definitions collect objects in such a way that the totality of these objects, or objects logically dependent upon that totality, are considered as belonging to the same totality, so that the definition has a circular character. It might perhaps be better to say that a non-predicative definition is the definition of an entity by a logical expression containing a bound variable such that the defined entity is one of the possible values of this variable. However, instead of trying to explain this generally, I think it is better to take a characteristic example. Let us consider mankind, the domain of all human beings. We have the binary relation "x is a child of y" which I write $Ch(x,y)$. Let us try to define descendant of $P$, $P$ any given person. If we make use of the notion of finite number we may proceed thus: We define the relation $Ch^n(x,y)$ recursively by letting $Ch^1(x,y)$ stand for $Ch(x,y)$ and $Ch^{n+1}(x,y)$ stand for $(Ez)(Ch^n(x,z) \,\&\, Ch(z,y))$. Then the proposition "x is a descendant of P" may be written $(En)\,Ch^n(x,P)$.
All this is quite clear and simple, but notice that we have to use quantifiers that are logically very different in nature, namely, on the one hand, quantifiers with mankind as range of variation, and, on the other hand, a quantifier extended over natural numbers. What appears most unsatisfactory, however, is the circumstance that the notion natural number itself is of the same kind as the notion descendant of $P$. Indeed we can say that the numbers are the descendants of 0 by the successor relation $+1$; therefore the above definition only refers one descendant relation to another. We may therefore ask if we can give a definition of a purely logical character that is independent of the notion of natural number. Following Frege and Dedekind we may do that by letting "z is a descendant of P" stand for

$(X)\big(X(P) \,\&\, (x)(y)(X(y) \,\&\, Ch(x,y) \rightarrow X(x)) \rightarrow X(z)\big)$,

where $X$ runs through all classes of human beings. In ordinary language the wording of this is: That $z$ is a descendant of $P$ means that $z$ belongs to every class $X$ with the two properties, 1) $P$ belongs to $X$, 2) whenever $y$ belongs to $X$ and $x$ is a child of $y$, then $x$ belongs to $X$. This is a typical example of a non-predicative definition because the defined class "descendant of P" is itself one of the values which the variable $X$ is assumed to run through. Of course this definition is quite in order in the axiomatic set theory of Zermelo, also in Quine's theory, and in the simple theory of types as well. But in the case of such theories we have the question of consistency. The older and more natural point of view was that we should be able to set up a kind of reasoning which could be considered reliable so that we were assured a priori that contradiction would never arise. If we should try to set up such a logic, then the ramified theory of types, a theory where non-predicative definitions are excluded, might be assumed to be the correct one. It could be reasonable to assume that this theory is really a perfectly reliable one.
Then, if we could believe this, a proof of consistency of this theory would be something out of the way, namely unnecessary and without point, because the reasoning yielding this proof could not be considered more reliable than the theory itself. In the ramified theory of types we have, just as in the simple theory, a distinction of type such that $a \in b$ only has a meaning when the type of $b$ is a unit more than that of $a$. However we have also a distinction of order between objects of the same type. Thus if a class of objects of type zero is defined in such a way that only quantifiers extended over objects of type zero are used, then this class is of first order. If a class, still of objects of type 0, is defined so that beside possible quantifiers extended over objects of type 0, there are also quantifiers extended over the just mentioned classes of order 1, then this class is said to be of order 2, and so on. A similar distinction of order must take place for the objects of type 1, 2, .... But there are even further distinctions, because a class of objects of type 0, say, can also be defined by a logical expression containing quantifiers extended over objects of type 2 or even higher types. I shall, however, not try to go into further detail in this rather complicated affair, but rather give some examples of the kind of reasoning that is possible when we proceed in a predicative manner.
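The two definitions of "descendant of P" discussed above, the recursive one through $Ch^n$ and the Frege-Dedekind one through all classes $X$, can be compared on a small finite domain. The Python sketch below is only an illustration: the five "persons" and the child relation are made-up sample data, and the function names are invented for this example. On a finite domain the quantifier over all classes $X$ can be carried out by brute force over all subsets.

```python
from itertools import combinations

people = ["P", "a", "b", "c", "q"]
# Ch(x, y): "x is a child of y"; the pairs below are made-up sample data.
Ch = {("a", "P"), ("b", "a"), ("c", "a")}

def descendants_by_iteration(p):
    """(En) Ch^n(x, p): follow the child relation 1, 2, 3, ... steps down."""
    found, frontier = set(), {p}
    while frontier:
        frontier = {x for (x, y) in Ch if y in frontier} - found
        found |= frontier
    return found

def closed_classes(p):
    """All classes X with X(p) and (x)(y)(X(y) & Ch(x, y) -> X(x))."""
    for r in range(len(people) + 1):
        for subset in combinations(people, r):
            X = set(subset)
            if p in X and all(x in X for (x, y) in Ch if y in X):
                yield X

def descendants_by_intersection(p):
    """z belongs to every class X of the kind above."""
    return set.intersection(*closed_classes(p))

print(sorted(descendants_by_iteration("P")))     # ['a', 'b', 'c']
print(sorted(descendants_by_intersection("P")))  # ['P', 'a', 'b', 'c']
```

The second result contains P itself, because the condition $X(P)$ puts P into every admissible class. The intersection is moreover one of the classes yielded by `closed_classes("P")`, which is exactly the circularity that makes the definition non-predicative.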
As a first example we may look at the proof of the Bernstein theorem of equivalence. We had sets $M$, $M'$, $M_1$ such that

$M \sim M'$, $M' \subseteq M_1 \subseteq M$

and we proved the existence of a 1-1 correspondence between $M$ and $M_1$. In the proof of this which I gave earlier I used, however, at one point a non-predicative definition, namely, reckoning $DT$ as a subset of $M$ in the same meaning as the diverse elements of $T$. If we assume that the correspondence between $M$ and $M'$ is of 1st order, $M$, $M'$ and $M_1$ sets of 1st order, and we let $T$ be the set (which of course is of type one unit higher than the type of $M$, $M'$, $M_1$) of all subsets $A$ of 1st order such that, for $Q = M_1 - M'$,

$Q \subseteq A$, $A' \subseteq A$,

then $DT$ is a subset of 2nd order and the earlier conclusion that $A_0 = DT$ is $\in T$ is no longer valid. Nevertheless we may prove the identity

$A_0 = Q + A_0'$

which we obtained in the earlier proof, but it must now be shown in a different way. Let us here write $D$ instead of $A_0$. Then I shall first show that we have

$D = Q + D'$.

Let us assume that a $d$ existed such that $d \in D$, but $d \notin Q \,\&\, d \notin D'$. The assumption $d \notin D'$ means that an $X \in T$ exists such that $d \notin X'$, because $D'$ is just the intersection of all $X'$, where $X \in T$. On the other hand we have $d \in X$ and $d \notin Q$. Now the set $Y = X - \{d\}$ is of order 1 just as $X$ and still $Q$ is $\subseteq Y$. Let $y$ be $\in Y$. Then $y \in X$, whence $y' \in X$ because $X' \subseteq X$. Hence $y' \in Y$, because $y'$ cannot be $= d$, since $d \notin Y'$ and $y' \in Y'$. Thus we have proved that

$Q \subseteq Y$ and $Y' \subseteq Y$

so that $Y \in T$. Now we had $d \notin Y$, whence $d \notin D$, which is a contradiction. Therefore I have shown that if $d \in D$, then $d \in Q \vee d \in D'$, that is

(1) $D \subseteq Q \cup D'$.

Since $Q \subseteq A$ for every $A \in T$ we have $Q \subseteq D$, and since $A' \subseteq A$ for every $A \in T$ we get $D' \subseteq D$. Thus

(2) $Q \cup D' \subseteq D$.

(1) and (2) then yield $D = Q + D'$ as before, and the remaining part of the proof can be carried out just as before.

There are however also theorems in the usual set theory which are no longer provable in predicative set theory. As an example I shall mention Cantor's theorem that $UM$ always possesses higher cardinality than $M$. We must replace $M$ by $EM$ of course, so that we would have to try to prove the
nonexistence of a 1-1 correspondence between $UM$ and $EM$. Our earlier proof was essentially due to the possibility of deriving a contradiction by considering the set $N$ of all $m \in M$ such that, if $F$ was the assumed correspondence, $m \notin X$ where $X$ was the subset of $M$ corresponding to $\{m\}$ by $F$, that is, $(X, \{m\}) \in F$. Translating the last phrase into logical symbols we have

$m \in N \leftrightarrow (X)\big(X \in UM \rightarrow ((X, \{m\}) \in F \rightarrow m \notin X)\big)$.

Since this expression contains the quantifier $(X)$ extended over all sets $X$ of order 1 say, the defined set $N$ of elements $m$ is of order 2. But then we cannot substitute $N$ instead of $X$ and the derived contradiction disappears. Then Cantor's theorem is no longer provable as before. One might perhaps think that it could be proved in a quite different way, but that does not seem to be the case. In my opinion one has little reason to be worried because of the necessity to drop this theorem. Indeed the distinction of order compensates for the fact that we don't have the usual distinction of cardinality.

As a further example of predicative reasoning I shall develop elementary arithmetic basing it as before on a definition of the simple infinite sequence, now, however, taking into account the order distinction. I prefer now to talk about classes, relations, etc., instead of sets. Also I think the considerations will be easier, if I use suffixes to denote the different orders. To begin with I assume that we have a class $M$ and a binary relation $f_1(x,y)$ both of order 1. The relation $f_1$ is supposed to be a 1-1 correspondence. The identity relation $x = y$ is assumed to be a relation of order 1; but for simplicity I assume the axiom $(x = y) \,\&\, \Phi(x) \rightarrow \Phi(y)$ valid for $\Phi$ of arbitrary orders. Then we assume

$f_1(x,y) \,\&\, f_1(z,y) \rightarrow (x = z)$,
$f_1(x,y) \,\&\, f_1(x,u) \rightarrow (y = u)$.

For simplicity I denote $y$, whenever $f_1(x,y)$ takes place, by $x'$. The class of 1st order consisting of all $x'$, $x$ running through $X_1$, I denote by $X_1'$. Then I assume that $M' \subset M$ and $0$ may denote an element of $M$ not in $M'$.
I denote by $N_2$ the class defined thus: $n \in N_2 \leftrightarrow (X_1)(0 \in X_1 \,\&\, (x)(x \in X_1 \rightarrow x' \in X_1) \rightarrow (n \in X_1))$ or, as I now prefer to write it,

$N_2(n) \leftrightarrow (X_1)(X_1(0) \,\&\, (x)(X_1(x) \rightarrow X_1(x')) \rightarrow X_1(n))$.

The class of type 2 whose elements are all $X_1$ for which $X_1(0) \,\&\, (x)(X_1(x) \rightarrow X_1(x'))$ may be denoted by $T$. Similarly $N_3$ is defined thus:

$N_3(n) \leftrightarrow (X_2)(X_2(0) \,\&\, (x)(X_2(x) \rightarrow X_2(x')) \rightarrow X_2(n))$,

etc. Corresponding to these definitions we have the following principles of induction. If a class $X_r$ of order $r$ contains $0$ and besides $x$ always contains $x'$, then $X_r$ contains the whole class $N_{r+1}$. We may regard $N_2, N_3, \ldots$ as successively sharpened determinations of the natural number series. Now I shall show how we can define a ternary relation of second order, $S_2(x,y,z)$, such that, conceiving $S_2(x,y,z)$ as $x + y = z$, we obtain the ordinary theorems of addition.
Let us consider the ternary relations of first order $X_1(x,y,z)$ with the two properties 1) $(x)X_1(x,0,x)$, 2) $(x)(y)(z)(X_1(x,y,z) \rightarrow X_1(x,y',z'))$. They constitute a class $Tr$ of type 2. These have an intersection $S_2(x,y,z)$ and trivially we have $(x)S_2(x,0,x)$ and $(x)(y)(z)(S_2(x,y,z) \rightarrow S_2(x,y',z'))$. I shall prove such statements as

$(x)(N_2(x) \rightarrow (x = 0 \vee N_2'(x)))$

or in other words

$(x)(N_2(x) \rightarrow (x = 0 \vee (Ey)(N_2(y) \,\&\, (x = y'))))$.

Further

$(x)(y)\overline{S_2(x,y',0)}$, $(x)(z)(S_2(x,0,z) \rightarrow (x = z))$ and $(x)(y)(S_2(x,y,0) \rightarrow x = 0 \,\&\, y = 0)$.

Proof of $(x)(N_2(x) \rightarrow x = 0 \vee (Ey)(N_2(y) \,\&\, (x = y')))$. Let us assume the existence of an individual $a$ such that $N_2(a) \,\&\, (a \neq 0) \,\&\, \overline{N_2'(a)}$. Because of $N_2(a)$ we have for every $X_1 \in T$ that $X_1(a)$. Now let $X_1^*$ be $X_1 - \{a\}$. Then I shall show that for at least one $X_1$, $X_1^*$ would still have the properties 1) and 2) so that $X_1^* \in T$, whence $\overline{N_2(a)}$, a contradiction. Indeed we have $X_1^*(0)$ since $X_1(0)$ and $a \neq 0$. Further, if $X_1^*(\alpha)$, then $X_1(\alpha)$, whence $X_1(\alpha')$, $X_1$ being $\in T$, whence again $X_1^*(\alpha')$ unless $a = \alpha'$. Now there must be at least one $X_1 \in T$ for which this is not the case, because otherwise we should have $N_2'(a)$ contrary to the assumption concerning $a$. Since there is an $X_1^* \in T$ such that $\overline{X_1^*(a)}$, we should have $\overline{N_2(a)}$, which is a contradiction.

Proof of $\overline{S_2(a,b',0)}$ for arbitrary $a$ and $b$. Let us assume $S_2(a,b',0)$. Then we have $X_1(a,b',0)$ for every $X_1 \in Tr$. Let $X_1^*$ be $X_1 - \{(a,b',0)\}$. Then $X_1^*$ still has the property 1), because $(x,0,x)$ can never be $= (a,b',0)$, 0 being $\neq$ every $y'$. However, $X_1^*$ also possesses the property 2). Indeed if $X_1^*(\alpha,\beta,\gamma)$, then $X_1(\alpha,\beta,\gamma)$, whence $X_1(\alpha,\beta',\gamma')$, whence $X_1^*(\alpha,\beta',\gamma')$, unless $(\alpha,\beta',\gamma')$ were $= (a,b',0)$, which is impossible because $\gamma' \neq 0$. But $X_1^* \in Tr$ and $\overline{X_1^*(a,b',0)}$ yields $\overline{S_2(a,b',0)}$.

Proof of $S_2(a,0,c) \rightarrow (a = c)$. Let us assume $S_2(a,0,c) \,\&\, (a \neq c)$. Then for every $X_1 \in Tr$ we have $X_1(a,0,c)$. Let $X_1^*$ be $X_1 - \{(a,0,c)\}$. Then it is seen again that $X_1^*$ will still possess the two properties, so that $X_1^* \in Tr$. Since $\overline{X_1^*(a,0,c)}$, it follows that $\overline{S_2(a,0,c)}$, which is contrary to supposition.

Then the truth of $S_2(a,b,0) \rightarrow a = 0 \,\&\, b = 0$ follows from the last three statements.

Proof of $S_2(a,b',c') \rightarrow S_2(a,b,c)$. Let us assume for some $a,b,c$ that $S_2(a,b',c') \,\&\, \overline{S_2(a,b,c)}$. Then for an arbitrary element $X_1$ of $Tr$ we have $X_1(a,b',c')$, whereas for a certain $X_1$ we have $\overline{X_1(a,b,c)}$. Let $X_1^*$ be $X_1 - \{(a,b',c')\}$ for such an $X_1$. Then it is seen immediately that $X_1^*$ has the property 1). It has the property 2) as well. Indeed, let $X_1^*(\alpha,\beta,\gamma)$ be true. Then $X_1(\alpha,\beta,\gamma)$ is true, whence $X_1(\alpha,\beta',\gamma')$, whence $X_1^*(\alpha,\beta',\gamma')$, unless $(\alpha,\beta',\gamma') = (a,b',c')$, which however would mean $(\alpha,\beta,\gamma) = (a,b,c)$, but that is impossible because we have $\overline{X_1(a,b,c)}$ but $X_1(\alpha,\beta,\gamma)$. Hence $X_1^* \in Tr$ so that $\overline{X_1^*(a,b',c')}$ leads to $\overline{S_2(a,b',c')}$ contrary to supposition.
Proof of $S_2(0,b,c) \rightarrow (b = c)$. Let $X_1$ be $\in Tr$ and $X_1^*$ be what remains of $X_1$ when all triples $(0,y,z)$ with $y \neq z$ are removed from $X_1$. Obviously $X_1^*$ is of order 1 just as $X_1$ is. I assert that also $X_1^* \in Tr$. Indeed for every triple $(\alpha,0,\alpha)$ we have $X_1(\alpha,0,\alpha)$ whence also $X_1^*(\alpha,0,\alpha)$. Otherwise $(\alpha,0,\alpha)$ would be of the form $(0,y,z)$ with $y \neq z$, but that is not the case. Thus $X_1^*$ has the property 1). Let us assume $X_1^*(\alpha,\beta,\gamma)$. Then $X_1(\alpha,\beta,\gamma)$, whence $X_1(\alpha,\beta',\gamma')$, whence also $X_1^*(\alpha,\beta',\gamma')$ unless $(\alpha,\beta',\gamma')$ is of the form $(0,y,z)$ with $y \neq z$, that is, $\alpha = 0$, $\beta' \neq \gamma'$. But then we should have $\overline{X_1^*(\alpha,\beta,\gamma)}$. Thus $X_1^* \in Tr$ and since $S_2(0,b,c) \rightarrow X_1^*(0,b,c)$ we have $b = c$.

Theorem 58. $(x)(y)(z)(S_2(x',y,z') \rightarrow S_2(x,y,z))$.

Proof. For each $X_1 \in Tr$ we let $X_1^*$ be what remains of $X_1$ when all triples $(x',y,z')$ are removed for which we have $X_1(x',y,z')$ but not $X_1(x,y,z)$, that is, $X_1^*(x',y,z') \leftrightarrow X_1(x',y,z') \,\&\, X_1(x,y,z)$. Further all triples $(x,y,0)$ are removed for which $x$ or $y$ is $\neq 0$. Then $X_1^*$ has the property 1). Indeed for all $(\alpha,0,\alpha)$ we have $X_1(\alpha,0,\alpha)$, whence $X_1^*(\alpha,0,\alpha)$, because if $\alpha = \alpha_1'$, we have also $X_1(\alpha_1,0,\alpha_1)$. Now let us assume $X_1^*(\alpha,\beta,\gamma)$. Then $X_1(\alpha,\beta,\gamma)$, whence $X_1(\alpha,\beta',\gamma')$, whence $X_1^*(\alpha,\beta',\gamma')$, unless $(\alpha,\beta',\gamma') =$ a certain $(x',y,z')$ for which $X_1(x',y,z') \,\&\, \overline{X_1(x,y,z)}$. That would mean $X_1(\alpha,\beta',\gamma') \,\&\, \overline{X_1(\alpha_1,\beta',\gamma)}$ with $\alpha = \alpha_1'$. Let us first consider the case $\gamma \neq 0$, that is, $\gamma = \gamma_1'$ for a certain $\gamma_1$. Then because of $X_1^*(\alpha,\beta,\gamma)$ we have $X_1(\alpha,\beta,\gamma) \,\&\, X_1(\alpha_1,\beta,\gamma_1)$. But $X_1(\alpha_1,\beta,\gamma_1)$ yields $X_1(\alpha_1,\beta',\gamma)$ so that we get a contradiction. It remains for us to look at $X_1^*(\alpha,\beta,0)$. This requires $\alpha = \beta = 0$. But $X_1(0,0',0')$ is true and therefore also $X_1^*(0,0',0')$ because $(0,0',0')$ is not removed from $X_1$ by the construction of $X_1^*$. Thus $X_1^*$ has the property 2) as well, so that $X_1^* \in Tr$. Now let $a,b,c$ be arbitrary. I assert that $S_2(a',b,c') \rightarrow S_2(a,b,c)$. Let us assume $S_2(a',b,c') \,\&\, \overline{S_2(a,b,c)}$. Then there exists an $X_1 \in Tr$ such that $X_1(a',b,c') \,\&\, \overline{X_1(a,b,c)}$. We build the corresponding $X_1^*$ as above. Then we have $X_1^* \in Tr$ and $\overline{X_1^*(a',b,c')}$, whence $\overline{S_2(a',b,c')}$, which is a contradiction.

Corollary. $(x)(y)(z)(S_2(x',y,z') \rightarrow S_2(x,y',z'))$.

Proof. $S_2(a',b,c') \rightarrow S_2(a,b,c) \rightarrow S_2(a,b',c')$.

I will only mention that such a statement as $(y)(N_2(y) \rightarrow (x)(Ez)X_1(x,y,z))$ is easily proved. I shall not make any use of that, but instead prove the following theorems.

Theorem 59. $(y)(N_3(y) \rightarrow (x)(z)(u)(S_2(x,y,z) \,\&\, S_2(x,y,u) \rightarrow (z = u)))$.

Proof. Let $C_2$ be the class of all $y$ such that $(x)(z)(u)(S_2(x,y,z) \,\&\, S_2(x,y,u) \rightarrow (z = u))$. Clearly $C_2(0)$ is true, because $S_2(a,0,c)$ is only true for $a = c$. Now let $C_2(b)$ be true. If, then, for certain $a,c,d$ we have $S_2(a,b',c) \,\&\, S_2(a,b',d)$, then according to a remark above, $c$ must be $= c_1'$ for a certain $c_1$ and $d = d_1'$ likewise, whence $S_2(a,b,c_1) \,\&\, S_2(a,b,d_1)$, whence, because of
$C_2(b)$, $c_1 = d_1$, whence $c = d$. Thus $C_2(0) \,\&\, (y)(C_2(y) \rightarrow C_2(y'))$ is true, whence the theorem, because of the definition of $N_3$.

Theorem 60. $(y)(N_3(y) \rightarrow (x)(Ez)S_2(x,y,z))$.

Proof. Let $C_2$ here be the class of all $y$ such that $(x)(Ez)S_2(x,y,z)$. Obviously $C_2(0)$ is true. Let us assume $C_2(b)$ and let $a$ be arbitrary. Then we have $S_2(a,b,c)$ for a certain $c$, whence $S_2(a,b',c')$, whence $C_2(b')$. Hence the theorem.

The last two theorems may be combined in the single statement $(y)(N_3(y) \rightarrow (x)(E!z)S_2(x,y,z))$, where $E!$ means "there exists one and only one". Of course this yields in particular

$(x)(y)(N_3(x) \,\&\, N_3(y) \rightarrow (E!z)S_2(x,y,z))$,

but the question arises, whether the $z$ here again is an element of $N_3$. I shall now show that this is really the case. Let $C_2$ denote an arbitrary class of 2. order with the two properties 1) $C_2(0)$ and 2) $(x)(C_2(x) \rightarrow C_2(x'))$. Then for every such class $C_2$ I construct another class $C_2^*$ thus:

$C_2^*(y) \leftrightarrow (x)(C_2(x) \rightarrow (Ez)(S_2(x,y,z) \,\&\, C_2(z)))$.

Now I assert that $C_2^*$ has again the properties 1) and 2). The truth of $C_2^*(0)$ is immediately seen, because we have $S_2(x,0,x)$ and $C_2(x) \rightarrow C_2(x)$. Let us assume $C_2^*(b)$. Then for an arbitrary $a$ with $C_2(a)$ we have a unique $c$ such that $S_2(a,b,c)$ and $C_2(c)$. Hence $S_2(a,b',c') \,\&\, C_2(c')$, and according to a theorem above we cannot have $S_2(a,b',d)$ unless $d = c'$. Thus $C_2^*(b')$ follows from $C_2^*(b)$.

Theorem 61. $(x)(y)(N_3(x) \,\&\, N_3(y) \rightarrow (Ez)(S_2(x,y,z) \,\&\, N_3(z)))$.

Proof. According to the definition of $C_2^*$ we have for arbitrary $C_2$ of the supposed kind $(x)(y)(C_2(x) \,\&\, C_2^*(y) \rightarrow (Ez)(S_2(x,y,z) \,\&\, C_2(z)))$. Now $N_3$ is $\subseteq C_2$ and $\subseteq C_2^*$. Therefore $(x)(y)(N_3(x) \,\&\, N_3(y) \rightarrow (Ez)(S_2(x,y,z) \,\&\, C_2(z)))$. Here $C_2$ is an arbitrary chain of 2. order, that is, a class of 2. order with the properties 1) and 2). Therefore we may just as well write

$(x)(y)(N_3(x) \,\&\, N_3(y) \rightarrow (Ez)(S_2(x,y,z) \,\&\, (X_2)(X_2(0) \,\&\, (u)(X_2(u) \rightarrow X_2(u')) \rightarrow X_2(z))))$

which, by taking into account the definition of $N_3$, is just our theorem.

In this way we have succeeded in obtaining a ternary relation $S_2(x,y,z)$ which in $N_3$ will play the role of addition, as I shall show.

Theorem 62. $(z)(N_3(z) \rightarrow (x)(y)(u)(v)(w)(S_2(x,y,v) \,\&\, S_2(v,z,u) \,\&\, S_2(y,z,w) \rightarrow S_2(x,w,u)))$.

Proof. Let $C_2(b)$ denote $(x)(y)(u)(v)(w)(S_2(x,y,v) \,\&\, S_2(v,b,u) \,\&\, S_2(y,b,w) \rightarrow S_2(x,w,u))$.
Clearly $C_2$ is a class of second order. We have that $C_2(0)$ is true, because $S_2(v,0,u) \,\&\, S_2(y,0,w) \rightarrow (u = v) \,\&\, (y = w)$. Let $C_2(b)$ be true and let us assume $S_2(x,y,v) \,\&\, S_2(v,b',u) \,\&\, S_2(y,b',w)$. Then we have $u = u_1'$, $w = w_1'$ for some $u_1$, $w_1$ and $S_2(v,b,u_1) \,\&\, S_2(y,b,w_1)$ which, together with $S_2(x,y,v)$, because of $C_2(b)$, yields $S_2(x,w_1,u_1)$, whence $S_2(x,w,u)$. Thus the implication $C_2(b) \rightarrow C_2(b')$ is generally valid. Then the theorem follows from the definition of $N_3$. A fortiori we have

$(x)(y)(z)(u)(v)(w)(N_3(x) \,\&\, N_3(y) \,\&\, N_3(z) \,\&\, N_3(u) \,\&\, N_3(v) \,\&\, N_3(w) \rightarrow (S_2(x,y,v) \,\&\, S_2(v,z,u) \,\&\, S_2(y,z,w) \rightarrow S_2(x,w,u)))$.

This is the associative law of addition.

Theorem 63. $(x)(N_3(x) \rightarrow (y)(z)(S_2(x,y,z) \rightarrow S_2(y,x,z)))$.

Proof. Let $C_2(a)$ be an abbreviation for $(y)(z)(S_2(a,y,z) \rightarrow S_2(y,a,z))$. Then $C_2(0)$ is true because, according to a result above, $S_2(0,y,z) \rightarrow (y = z)$ and $S_2(y,0,z) \leftrightarrow (y = z)$. Let us assume the truth of $C_2(a)$ and let $S_2(a',b,c)$ be true. Then by some results above we have $c = c_1'$ for a certain $c_1$ and $S_2(a',b,c_1') \rightarrow S_2(a,b',c_1')$ so that because of $C_2(a)$, we also get $S_2(b',a,c)$, whence $S_2(b,a',c)$. Therefore we have $(y)(z)(S_2(a',y,z) \rightarrow S_2(y,a',z))$, so that $C_2(a) \rightarrow C_2(a')$. According to the definition of $N_3$, the theorem must be valid. A fortiori we have

$(N_3(x) \,\&\, N_3(y) \,\&\, N_3(z) \rightarrow (S_2(x,y,z) \rightarrow S_2(y,x,z)))$.

This is the commutative law of addition. Thus the ternary relation $S_2(x,y,z) \,\&\, N_3(x) \,\&\, N_3(y) \,\&\, N_3(z)$, which we can write $z = x + y$, is a relation of 3. order which has the ordinary properties of addition, in particular, $x + (y + z) = (x + y) + z$, $x + y = y + x$.

Now let us define a relation "less than or equal to" of second order, namely, $M_2(x,y) \leftrightarrow (Ez)S_2(x,z,y)$. Then inside $N_3$ the following theorems hold.

Theorem 64. $M_2(a,b) \,\&\, M_2(b,c) \rightarrow M_2(a,c)$.

Proof. The hypothesis of the implication amounts to $S_2(a,d,b) \,\&\, S_2(b,e,c)$ for some $d$ and $e$. According to Theorem 60 there is an $f$ such that $S_2(d,e,f)$. Then Theorem 62 furnishes $S_2(a,f,c)$, whence $M_2(a,c)$.
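The definition of $S_2$ as an intersection, and the laws proved for it, can be checked mechanically on a finite truncation. The Python sketch below rests on assumptions not in the text: the universe is cut off at a hypothetical bound `N` (so that property 2) is applied only where $y'$ and $z'$ still exist), and the intersection of all relations with properties 1) and 2) is computed as the least relation closed under them, which coincides with that intersection. It is an illustration, not a substitute for the order-sensitive proofs above.

```python
N = 6                      # hypothetical bound; the class M of the text is infinite
U = range(N + 1)           # 0, 1, ..., N with successor x' = x + 1

def least_addition_relation():
    """Least X with 1) X(x,0,x) and 2) X(x,y,z) -> X(x,y',z')."""
    S = {(x, 0, x) for x in U}                       # property 1)
    frontier = set(S)
    while frontier:
        new = {(x, y + 1, z + 1) for (x, y, z) in frontier
               if y + 1 <= N and z + 1 <= N} - S     # property 2)
        S |= new
        frontier = new
    return S

S2 = least_addition_relation()

def M2(x, y):
    """M2(x, y) <-> (Ez) S2(x, z, y)."""
    return any((x, z, y) in S2 for z in U)

# Theorem 63: S2(x,y,z) -> S2(y,x,z)
commutative = all((y, x, z) in S2 for (x, y, z) in S2)
# Theorem 62: S2(x,y,v) & S2(v,z,u) & S2(y,z,w) -> S2(x,w,u)
associative = all((x, w, u) in S2
                  for (x, y, v) in S2
                  for (v2, z, u) in S2 if v2 == v
                  for (y2, z2, w) in S2 if y2 == y and z2 == z)
# Theorem 64: M2(a,b) & M2(b,c) -> M2(a,c)
transitive = all(M2(a, c) for a in U for b in U for c in U
                 if M2(a, b) and M2(b, c))

print(commutative, associative, transitive)   # True True True
```

On this truncation `S2` consists exactly of the triples $(x, y, x + y)$ with $x + y \le N$, so the relation behaves as ordinary addition wherever the sum stays inside the universe.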
Theorem 65. $(y)(N_3(y) \rightarrow (x)(M_2(x,y) \vee M_2(y,x)))$.

Proof. Let $C_2(b)$ be $(x)(M_2(x,b) \vee M_2(b,x))$. Then $C_2(0)$ is true, because $M_2(0,x)$ is obviously true. Let us assume $C_2(b)$. If $M_2(x,b')$ is true, we have at once the alternative required for $C_2(b')$, and $M_2(x,b')$ is true if $M_2(x,b)$ is. Otherwise we have $M_2(b,x)$, that is, $(Ez)S_2(b,z,x)$. If $z \neq 0$, we have $z = z_1'$ and $S_2(b,z,x) \rightarrow S_2(b',z_1,x)$, that is, $M_2(b',x)$. If $z = 0$, we have $x = b$, whence $M_2(x,b')$. Thus $C_2$ is a chain of 2. order, and hence $(y)(N_3(y) \rightarrow C_2(y))$, which is the theorem.

It follows that $M_2$ will have the ordinary properties of the relation $\le$ in $N_3$.
Now in order to develop elementary arithmetic we must introduce multiplication. This can again be done by considering some ternary relations. It must be remarked, however, that these relations ought to be chosen as 1. order relations $Y_1(x,y,z)$. Otherwise we might have to make a transition to unnecessarily high orders of the number series. It would not be advantageous to take, for example, the relations $Z_2(x,y,z)$ which have the properties 1) $(x)Z_2(x,0,0)$ and 2) $(x)(y)(z)(Z_2(x,y,z) \,\&\, S_2(z,x,u) \rightarrow Z_2(x,y',u))$. It is better to introduce addition and multiplication simultaneously as follows. Let us consider all quaternary relations $U_1(x,y,z,u)$ such that $U_1$ is true only for $u = 0$ or 1 and has the properties

1) $(x)U_1(x,0,x,0)$,
2) $(x)U_1(x,0,0,1)$,
3) $(x)(y)(z)(U_1(x,y,z,0) \rightarrow U_1(x,y',z',0))$,
4) $(x)(y)(z)(U_1(x,y,z,1) \,\&\, U_1(z,x,u,0) \rightarrow U_1(x,y',u,1))$.

Then if $S_2(x,y,z)$ denotes the intersection of all $U_1(x,y,z,0)$ and $P_2(x,y,z)$ the intersection of all $U_1(x,y,z,1)$, one is able to show that in a suitable $N_n$ all of the ordinary principles of addition and multiplication are provable, $x + y = z$ meaning $S_2(x,y,z)$ and $xy = z$ meaning $P_2(x,y,z)$. However, I will not carry out all that here in detail, in particular for the reason that different procedures are possible.

One fact ought to be noticed: The relation $S_2(x,y,z)$, which in $N_3$ defined addition, does that also in $N_n$ for any $n > 3$, that is, every $N_n$ is closed with regard to this addition. Let us, for example, consider $N_4$. If $N_4(a)$ and $N_4(b)$, then $N_3(a)$ and $N_3(b)$ so that a unique $c$ exists such that $S_2(a,b,c) \,\&\, N_3(c)$. But how can we conclude $N_4(c)$? This can be seen thus: Let $S_3(x,y,z)$ be the intersection of all $X_2(x,y,z)$ with the properties 1) and 2). Then we can prove in the same way as above that $(x)(y)(N_4(x) \,\&\, N_4(y) \rightarrow (Ez)(S_3(x,y,z) \,\&\, N_4(z)))$. Furthermore let us write the $z$ for which $S_3(x,y,z) \,\&\, N_4(z)$ as $x +' y$. Now it is obvious that $S_3(x,y,z) \rightarrow S_2(x,y,z)$.
Hence, for arbitrary $a$ and $b$ such that $N_4(a)$ and $N_4(b)$, we get that $c = a +' b \rightarrow c = a + b$, so that the result of the operation $+'$ is the same as the result of $+$. In the same way the other operations we may introduce, such as multiplication, exponentiation, etc., all will retain their meaning for the natural number sequences of higher orders. I must confine my remarks to these hints, which I nevertheless hope are sufficient to show that a purely logical development of arithmetic similar to that given by Dedekind in his work "Was sind und was sollen die Zahlen" is possible even in the ramified type theory.
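The simultaneous introduction of addition and multiplication through quaternary relations, described above, can likewise be tried out on a finite truncation. The Python sketch below assumes that the four closure properties read: $x + 0 = x$; $x \cdot 0 = 0$; $x + y = z$ implies $x + y' = z'$; and $xy = z$ together with $z + x = u$ implies $xy' = u$, with the fourth argument 0 marking the sum and 1 the product. The bound `N`, the function name `least_U` and the computation of the intersection as a least closed relation are assumptions of this illustration.

```python
N = 12   # hypothetical bound on the universe {0, ..., N}

def least_U():
    """Least quaternary relation closed under the four properties."""
    U = set()
    for x in range(N + 1):
        U.add((x, 0, x, 0))   # 1) x + 0 = x
        U.add((x, 0, 0, 1))   # 2) x * 0 = 0
    changed = True
    while changed:
        changed = False
        new = set()
        for (x, y, z, u) in U:
            if u == 0 and y < N and z < N:
                new.add((x, y + 1, z + 1, 0))          # 3)
            if u == 1 and y < N:
                # 4) U(x,y,z,1) & U(z,x,w,0) -> U(x,y',w,1)
                for w in range(N + 1):
                    if (z, x, w, 0) in U:
                        new.add((x, y + 1, w, 1))
        if not new <= U:
            U |= new
            changed = True
    return U

U_rel = least_U()
S2 = {(x, y, z) for (x, y, z, u) in U_rel if u == 0}   # sum slice
P2 = {(x, y, z) for (x, y, z, u) in U_rel if u == 1}   # product slice

print((3, 4, 7) in S2, (3, 4, 12) in P2)   # True True
```

Within the bound, the slice with last argument 0 is the graph of addition and the slice with last argument 1 the graph of multiplication, so a single inductive definition yields both operations at once.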
If we turn to analysis it must be remarked that the classical form of it cannot be obtained. Indeed it will be necessary to distinguish between real numbers of different orders. A class of real numbers of 1. order which is bounded above possesses an upper bound, but this bound may then be a real number of order 2. Nevertheless a great part of analysis can be developed as usual, namely, the most useful part of it dealing with continuous functions, closed point-sets, etc. The reason for this is that it is often possible to prove theorems of reducibility, namely, theorems saying that a class (or relation) of a certain order coincides with one of lower order. I will not enter into this but only refer the reader to the book: "Das Kontinuum" by H. Weyl, where he has developed such a kind of predicative analysis.
15. Lorenzen's operative mathematics

In more recent years the German mathematician P. Lorenzen has set forth a system of mathematics which in some respects resembles the ramified theory of types, but it has also one important feature in common with the simple theory of types, namely, that the simple infinite sequence and similar notions are characterized by an induction principle which is assumed valid within all layers of objects. Lorenzen talks namely about layers of objects, not of types or orders. To begin with he takes into account some original objects, say numerals, figures built up in a so-called calculus as follows. We have the rules of production

$1$, $k \rightarrow k1$,
which means that the object or symbol 1 is originally given and whenever we have a symbol or a string of symbols $k$ we may build the string $k1$ obtained by placing 1 after $k$. He introduces the notion "system". A system is a finite set of symbols. The systems are obtained by the rules $x$ and $X \rightarrow X, x$. The length or cardinal number of a system $X$ is denoted by $|X|$. He gives the rules $|x| = 1$, $|X, x| = |X|1$ for these lengths. Now the explanation of the successive layers of language is as follows. From certain originally given symbols called atoms, say $u_1, \ldots, u_n$, he constructs strings of symbols by the schema

$x \rightarrow x u_1$
$\ldots$
$x \rightarrow x u_n$
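The rules of production described above are easy to imitate. The Python sketch below is only an illustration: the function names are invented, the stroke numerals are represented as strings of the character `1`, the strings over atoms are assumed to start from the single atoms, and the length rules are assumed to read $|x| = 1$ and $|X, x| = |X|1$.

```python
def numerals(limit):
    """Figures derivable by the rules  1  and  k -> k1."""
    k = "1"
    for _ in range(limit):
        yield k
        k = k + "1"          # place 1 after k

def strings_over(atoms, max_len):
    """Strings obtained by repeatedly applying x -> x u_i.

    Starting from the single atoms is an assumption of this sketch.
    """
    layer = list(atoms)
    result = list(layer)
    for _ in range(max_len - 1):
        layer = [x + u for x in layer for u in atoms]
        result.extend(layer)
    return result

def length(system):
    """|x| = 1 and |X, x| = |X|1: one stroke per object of the system."""
    strokes = ""
    for _ in system:
        strokes += "1"
    return strokes

print(list(numerals(4)))             # ['1', '11', '111', '1111']
print(strings_over(["a", "b"], 2))   # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
print(length(["p", "q", "r"]))       # 111
```

The length of a system is thus itself a figure of the numeral calculus, not a number taken from outside.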
Further, he introduces logical symbols, first $\wedge$, $\vee$, $\rightarrow$, $\neg$ denoting conjunction, disjunction, implication and negation respectively, then $\bigwedge_x, \bigwedge_y, \ldots$ which are universal quantifiers, $\bigvee_x, \bigvee_y, \ldots$ which are existential quantifiers, symbols which express membership, namely, that something belongs to a class, relation, etc., an operational symbol having the same meaning as Russell's $\hat{x}$, and finally symbols which are called operators of induction. These last ones have the following significance: Let $A_1, A_2, \ldots$ be propositional expressions built up from membership propositions, while we have a schema of production leading from $A_1, A_2, \ldots$ to $m$-tuples; then the operator of induction written before this schema denotes the relation that is the set of all $m$-tuples which can be constructed by the schema. The symbolic figures obtained in this way constitute what Lorenzen calls the first layer of language and denotes by $S_1$. Whenever $S_n$, the $n$th layer of language, has been constructed, he defines the $(n+1)$th layer $S_{n+1}$ as consisting of all figures belonging to $S_n$ together with all further ones which can be derived from them by the same means we used in deriving the first layer from the atoms $u_1, \ldots, u_n$. By this procedure it is necessary to distinguish between variables in different layers, for example, by writing the number of the layer as a subscript just as I used an order subscript above in the ramified type theory. The construction of layers can, however, be continued transfinitely. Indeed, after having performed the construction of the layers $S_n$ with finite $n$, Lorenzen defines $S_\omega$, $\omega$ the least transfinite ordinal, as the union of all $S_n$, $n < \omega$. Now it becomes possible to introduce $S_{\omega+1}, S_{\omega+2}, \ldots$ and their union $S_{\omega 2}$, and so on. He can introduce all $S_\alpha$, where $\alpha$ is any constructed transfinite ordinal. One sees the resemblance between this theory and the ramified theory of types.
In both theories an expression containing a bound variable extended over a previously obtained range is considered as belonging to a new range of symbols. The presence of the operators of induction means that there is not, as in the previously treated systems, any attempt to reduce the inductive or recursive definitions to the explicit ones, an attempt which caused so much trouble above, in particular in the case of the predicative set theory. In accordance with this attitude in Lorenzen's system, the principle of complete induction remains unchanged by transition to higher layers. It is obvious that the objects of a certain layer $S_\Theta$ can be enumerated. In his book "Operative Logik und Mathematik" he shows this in detail for $S_1$ and it is easy to see how that can be carried out for an arbitrary layer $S_\Theta$. The formula giving this enumeration does, however, not belong to $S_\Theta$ but to $S_{\Theta+1}$. Sometimes, of course, a set belonging to $S_\Theta$ may have an enumeration belonging to $S_\Theta$. We may then say that the set is denumerable in $S_\Theta$. Otherwise the set is nondenumerable in $S_\Theta$. All this shows that the notion "denumerable" must be conceived in a relative sense. This result we also obtained by application of the Löwenheim theorem to the axiomatic set theory in so far as denumerable models must exist for any consistent set theory. But in Lorenzen's theory this relativism is obtained immediately.
In connection with this we may notice that the problem concerning the principle of choice disappears. Indeed, the enumeration in $S_{\Theta+1}$ of the objects constituting the layer $S_\Theta$ makes possible at once the simultaneous choice of one element from every set in $S_\Theta$. On the other hand, it is not certain that we can find a formula in $S_\Theta$ furnishing such a choice for a set of sets in $S_\Theta$. Thus we have again a relativity with regard to the existence of choice functions. Now let us consider real numbers, defined, for example, as initial parts of the ordered set $R$ of rational numbers, and sets of reals, all belonging to the layer $S_\Theta$, where $\Theta$ is a limit number. Then it is possible to prove for each set $M$ of real numbers, $M$ as well as the elements of $M$ belonging to $S_\Theta$, that if $M$ is bounded below, it possesses a greatest lower bound $y$ also in $S_\Theta$. Indeed $y$ is the intersection of all elements of $M$ considered as initial parts of $R$. Since $M \in S_\Theta$, we have $M \in S_\theta$, $\theta$ some ordinal $< \Theta$. In the definition of $y$ all occurring variables belong to $S_\theta$, but there is a universal quantifier extended over $S_\theta$. Thus $y$ is a real number occurring in $S_{\theta+1}$. However, since $\Theta$ is a limit number we have $\theta + 1 < \Theta$. Therefore the lower bound $y$ always again belongs to $S_\Theta$. More special theorems, such as the existence of a convergent subsequence of a bounded sequence of reals, and that every convergent sequence (in the sense of Cauchy) has a real number as limit, are easily proved. The theory of neighborhoods and coverings is more difficult. In order to be able to develop the usual covering theorems, Lorenzen finds it necessary to take into account sets of real numbers belonging to essentially higher layers than the real numbers themselves. He chooses two limit numbers, $\Theta_1 < \Theta_2$. The considered real numbers shall all belong to $S_{\Theta_1}$, whereas sets of, and relations between, these reals are allowed to belong to $S_{\Theta_2}$.
The classes and relations which already belong to $S_{\Theta_1}$ are called primary; those which belong to $S_{\Theta_2}$ but not to $S_{\Theta_1}$ are called secondary. It may be noticed that by taking into account also the secondary sets we are enabled to say that all the reals in an interval constitute a set, namely, a secondary one. Indeed it is clear that all these numbers, belonging to $S_{\Theta_1}$, constitute a set that occurs in $S_{\Theta_1+1}$. Similar remarks can be made for neighborhoods. Lorenzen now succeeds in proving the Heine-Borel theorem, which here has the wording: To every primary covering, that is, a primary set of neighborhoods, one can find a finite covering, that is, a finite set of such neighborhoods. A further important notion is that of a quasi-primary function: That $y = f(x_1, \ldots, x_n)$ is quasi-primary means that, whenever $x_1, \ldots, x_n$ are primary real numbers, that is, they belong to $S_{\Theta_1}$, $y$ is a primary real. Of course every primary function is quasi-primary, but the converse is not always true. Thus, for example, $x + y$ is quasi-primary but not primary. Indeed the set of all triples $(x, y, z)$ such that $x + y = z$, where $x, y, z$ run through $S_{\Theta_1}$, does not belong to $S_{\Theta_1}$, but to $S_{\Theta_1+1}$. For the quasi-primary functions Lorenzen proves theorems analogous to the theorems in ordinary analysis concerning functions of real numbers. Thus he proves that a continuous quasi-primary function on a closed interval
is uniformly continuous. Further he proves that the values of such a function on a closed interval are bounded and that the upper and lower bounds are attained. He also proves that such a function takes every value between two of its values. If a quasi-primary function has a derivative for every (primary) real number, then this derivative is again a quasi-primary function. He also develops a theory of integration, defining first the Riemann integral, later also Lebesgue's. It might seem that a measure theory must be impossible in this system, because by ordinary concepts the measure should be 0 for denumerable sets, and here all sets are denumerable in a sufficiently high layer. However, the distinction between primary and secondary sets makes a definition of measure possible in such a way that the primary sets all get the measure 0, but not the secondary sets. This system has one great advantage over the previous ones, namely, that the objects we are dealing with are all definitely and explicitly given. It is true of course that the unsolvability or even undecidability of many problems remains as before, but we know what we are talking about. In the previous theories it was at any rate not required that our considerations should be restricted to the definable or constructible objects.
16. Some remarks on intuitionist mathematics
Of great interest is the so-called intuitionism, which above all is due to the Dutch mathematician L. E. J. Brouwer. This theory is essentially characterized by the requirement that an assertion of the existence of a mathematical object must contain a means of finding or constructing such an object. Further, the use of such a formal logical principle as "tertium non datur" is only justified if we have a decision procedure. The intuitionist critique of classical mathematics is similar to the critique of Kronecker, who also declared that a great part of ordinary mathematics was only words. It would lead too far, however, if I should give in these lectures a detailed exposition of the intuitionist foundation of mathematics. I must confine my exposition here to a few remarks which I hope will give an idea of the intuitionist way of reasoning. The conjunction $p \ \&\ q$ retains its usual meaning also in intuitionist logic. The disjunction $p \lor q$ can be asserted if and only if either $p$ can be asserted or $q$ can. The negation $\neg p$ shall mean that the assumption $p$ leads to a contradiction. The implication $p \to q$ means that we are in possession of a certain construction which will furnish a proof of $q$ as soon as a proof of $p$ is available. The assertion $(x)p(x)$ is justified if we possess a schema showing the property $p(x)$ for an arbitrary $x$, and $(Ex)p(x)$ can be asserted if we know an $x$ with the property $p$ or at least have a method for constructing such an $x$. Since we have no general method to prove either $p$ or $\neg p$, the tertium non datur, $p \lor \neg p$, is not generally valid. It can be proved that $p \to \neg\neg p$ is generally true, but not the converse implication. Such differences in the propositional logic of course cause differences in predicate logic. As an interesting example of the difference between the classical and the intuitionist way of stating a theorem, I will take an example mentioned in the book "Intuitionism" of Heyting. Let us define a real number $\rho$ by writing an infinite decimal fraction as follows. As long as no sequence of digits 0,1,2,3,4,5,6,7,8,9 has occurred in the development of $\pi = 3.14\ldots$ as a decimal fraction, there shall only be digits 3 in the development of $\rho$; however, if it should happen that the digits in the places $n-9, \ldots, n$ should be just $0, 1, \ldots, 9$, then all digits after the $n$-th shall be 0 in the development of $\rho$. Then it is easy to prove that
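it is absurd that $\rho$ should differ from $\frac{1}{3}$ and also from every fraction $\frac{10^n - 1}{3 \cdot 10^n}$. This can, in classical mathematics, be expressed thus:

$$\rho = \frac{1}{3} \quad \text{or} \quad \rho = \frac{10^n - 1}{3 \cdot 10^n} \ \text{ for a certain } n.$$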
However, this is not correct intuitionistically, because the last statement would mean that we are able to prove either that $\rho = \frac{1}{3}$ or that $\rho = \frac{10^n - 1}{3 \cdot 10^n}$ for a certain $n$. But in order to do that we would have to decide whether a sequence $0, 1, \ldots, 9$ occurs in the development of $\pi$ or not. This we are unable to do at present. This is an example of the circumstance that the two statements $(Ex)p(x)$ and $\neg(x)\neg p(x)$, which are equivalent in classical logic, are not generally equivalent in intuitionist logic.

Let a real number generator (abbreviated rng) be any sequence of rational numbers $a_n$ such that for every positive integer $k$ we can find another positive integer $n$ such that
$$|a_{n+p} - a_n| < \frac{1}{k}$$
for all $p$. We put $a = b$ when for every $k$ we can find $n$ such that
$$|a_{n+p} - b_{n+p}| < \frac{1}{k}$$
for all $p$. Further, $a \neq b$ may mean $\neg(a = b)$, that is, the assumption $a = b$ leads to a contradiction. On the other hand, $a \mathrel{\#} b$ shall mean that we know a $k$ and an $n$ such that, for all $p$,
$$|a_{n+p} - b_{n+p}| > \frac{1}{k},$$
while $a < b$ shall mean that we know a $k$ and an $n$ such that
$$b_{n+p} - a_{n+p} > \frac{1}{k}$$
for all $p$.
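Heyting's number from the example preceding the definition of an rng can be made concrete by a small sketch. The Python below is only an illustration under stated assumptions: the names `rho_digits` and `rho_value` are hypothetical, and the decimals of $\pi$ are not computed but supplied by the caller as a finite list of the digits after the point. It shows that every finite truncation of the number is decided by finitely many digits of $\pi$ (the truncations form an rng, since the $n$-th one differs from every later one by less than $10^{-n}$), whereas no finite amount of such computation decides which of the two classical alternatives holds.

```python
from fractions import Fraction

PATTERN = tuple(range(10))  # the digit sequence 0,1,...,9


def rho_digits(pi_decimals):
    """Yield the decimals of rho, given the decimals of pi after the point.

    The digit is 3 up to and including the place n at which the pattern
    0,1,...,9 is first completed; every later digit is 0.
    """
    window = ()
    found = False
    for digit in pi_decimals:
        if found:
            yield 0
            continue
        window = (window + (digit,))[-10:]  # keep the last ten digits seen
        yield 3
        if window == PATTERN:
            found = True


def rho_value(pi_decimals):
    """Exact rational value of the truncation of rho fixed by a finite digit list."""
    total = Fraction(0)
    for place, digit in enumerate(rho_digits(pi_decimals), start=1):
        total += Fraction(digit, 10 ** place)
    return total


# First 50 decimals of pi; the pattern does not occur among them.
PI_50 = [int(c) for c in "14159265358979323846264338327950288419716939937510"]
print(list(rho_digits(PI_50)) == [3] * 50)

# A made-up digit stream (not pi) in which the pattern ends at place n = 12.
fake = [7, 7] + list(range(10)) + [5, 5, 5]
print(rho_value(fake) == Fraction(10 ** 12 - 1, 3 * 10 ** 12))
```

With the made-up stream the truncation equals $\frac{10^{12} - 1}{3 \cdot 10^{12}}$, the second classical alternative with $n = 12$; with the 50 genuine decimals of $\pi$ the question is simply left open.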
It is then possible to prove a lot of theorems about these rng. A real number is the set of all rng which are equal to a certain rng. The intuitionist notion of set will be explained below. I shall mention a few of the most important theorems about the rng. One proves that $a = b$ is equivalent to $\neg\neg(a = b)$, or, in other words, if the assumption $a \neq b$ leads to a contradiction, then $a = b$. Further, if $a \mathrel{\#} b$, then for every $c$ we have $a \mathrel{\#} c \lor b \mathrel{\#} c$. It is clear that $a \mathrel{\#} b \to a \neq b$. Further, $a \mathrel{\#} b$ is equivalent to $a < b \lor b < a$. Instead of $\neg(a < b)$ one writes $a \nless b$. Then we have that $a \nless b \ \&\ b > c \to a > c$. Addition, subtraction and multiplication of the rng's $a$ and $b$ are defined by taking the rng with the general term $a_n + b_n$, $a_n - b_n$, $a_n b_n$ respectively, whereas the quotient $\frac{a}{b}$ is defined as $a \cdot \frac{1}{b}$ under the assumption $b \mathrel{\#} 0$, where $\frac{1}{b}$ is the rng $c$ whose general term is $c_n = \frac{1}{b_n}$ whenever $b_n \neq 0$ and $c_n = 0$ if $b_n = 0$. It is then trivial to prove the associative, commutative and distributive laws. It may be noticed that $a + b \mathrel{\#} 0 \to a \mathrel{\#} 0 \lor b \mathrel{\#} 0$. For a more thorough study of this subject I recommend Heyting's book. As an introduction to the intuitionist set theory it is convenient to define the notion ips, that is, infinitely proceeding sequence of natural numbers. We are dealing with an ips if we first choose a natural number $a_1$ and, for every $n$, as soon as $a_1, \ldots, a_n$ have been chosen, we choose $a_{n+1}$. What determines these choices, whether they obey a certain law or are made at random or more or less arbitrarily, is irrelevant. We are justified in saying that an ips is something that becomes, not that is. If we let a mathematical entity correspond to every finite initial sequence $a_1, \ldots, a_n$ of an ips, we obtain an infinitely proceeding sequence of such entities. A set can be built in two ways: 1) there may be a common way of generating its elements, 2) one considers all elements having a common property. The sets which are obtained in the first manner are called spreads. The sets obtained according to the second point of view are called species. The definition of a spread is as follows: One has two rules, a spread rule and a complementary rule. The spread rule $\Lambda$ determines a process for the generation of ips in the following way. 1) $\Lambda$ determines for every natural number $k$ whether it is allowed to be the first element of an ips or not. 2) Every allowed sequence $a_1, \ldots, a_{n+1}$ shall be generated from an earlier allowed sequence $a_1, \ldots, a_n$. 3) Whenever an allowed sequence $a_1, \ldots, a_n$ is given, the rule $\Lambda$ determines, for any natural number $k$, whether $a_1, \ldots, a_n, k$ is an allowed sequence or not.
4) To every admitted sequence $a_1, \ldots, a_n$ at least one natural number $k$ can be found such that $a_1, \ldots, a_n, k$ is an admitted sequence. The complementary rule $\Gamma$ determines for every allowed sequence $a_1, \ldots, a_n$ a corresponding mathematical object $b_n$. Some elucidating examples, taken from Heyting's book, may suitably be mentioned here. 1) Let $r_1, r_2, \ldots$ be an enumeration of all rational numbers. We build a spread $M$ by letting the rule $\Lambda_M$ be this: Every natural number is admitted as $a_1$. Whenever $a_1, \ldots, a_n$ is an admitted sequence, $a_1, \ldots, a_n, a_{n+1}$ shall be admitted if and only if
$$|r_{a_n} - r_{a_{n+1}}| < 2^{-n}.$$
The rule $\Gamma_M$ shall be: To every admitted sequence $a_1, \ldots, a_n$ we let correspond the rational number $r_{a_n}$. It is easy to see that the elements of $M$ are rng, and indeed, if $c$ is an arbitrary rng, we can find an element $m$ of $M$ such that $m = c$. Thus $M$ is simply the continuum consisting of all rng. 2) If the rule $\Lambda_M$ in example 1 is restricted by adding the requirement $0 \le r_{a_n} \le 1$ for every $n$, then $M$ is the spread consisting of all rng $x$ such that $0 \le x \le 1$. 3) If the rule $\Lambda_M$ in example 2 is further restricted by the requirement that for each $n > 1$ we shall have
1 r 2" r
then $M$ will consist of all rng $y$ such that $0 < y < 1$. It is evident that by changing the rules $\Lambda_M$ and $\Gamma_M$ one can obtain the most varied spreads of rng. A simple example of a species is the notion real number. A real number is the species whose elements are all rng equal to a given one. A general remark is that the definition of an element of a species must always precede the definition of the species in order to avoid circular definitions. In the intuitionist theory, too, we have the operations of union and intersection of two species. If $\in$ as usual means the membership relation, we have the definitions: $S \subseteq T$ stands for $(x)(x \in S \to x \in T)$; $S = T$ means $(S \subseteq T) \ \&\ (T \subseteq S)$. Further we have for arbitrary $x$ the equivalences $(x \in S \cap T) \leftrightarrow (x \in S) \ \&\ (x \in T)$ and $(x \in S \cup T) \leftrightarrow (x \in S) \lor (x \in T)$. Letting $\notin$ mean the negation of $\in$ in the intuitionist sense, we have the following definition of the difference species $S - T$: $(x \in S - T) \leftrightarrow (x \in S) \ \&\ (x \notin T)$. It must then be noticed that we don't always have $S = T \cup (S - T)$. That is only the case if $T \subseteq S$ and we are able, for every $x \in S$, to prove either $x \in T$ or $x \notin T$. A subspecies $T$ of $S$ is called detachable when we possess such a decision method to decide for any $x \in S$ whether it is $\in T$ or not. A characteristic notion is "$S$ is congruent to $T$". That means $\neg(Ex)(x \in S \ \&\ x \notin T \ \lor\ x \notin S \ \&\ x \in T)$, which can also be written $\neg(Ex)(x \in S \ \&\ x \notin T) \ \&\ \neg(Ex)(x \notin S \ \&\ x \in T)$. As an example of the use of this notion I shall mention the theorem:
Let $T \subseteq S$ and $S' = T \cup (S - T)$. Then $S$ and $S'$ are congruent.
Proof. First we have $S' \subseteq S$ because $T \subseteq S$ and $S - T \subseteq S$. Hence $\neg(Ex)(x \notin S \ \&\ x \in S')$. Indeed, this is equivalent to $\neg(Ex)(x \notin S \ \&\ (x \in T \lor x \in S \ \&\ x \notin T))$, which again is equivalent to $\neg(Ex)((x \notin S \ \&\ x \in T) \lor (x \notin S \ \&\ x \in S \ \&\ x \notin T))$, which is equivalent to $\neg(Ex)(x \notin S \ \&\ x \in T)$, which follows from $(x)(x \in T \to x \in S)$. Therefore it remains only to prove that $\neg(Ex)(x \in S \ \&\ x \notin S')$. But $x \notin S'$ yields $x \notin T$ and $\neg(x \in S \ \&\ x \notin T)$, so that $x \in S \ \&\ x \notin S'$ would give both $x \in S \ \&\ x \notin T$ and its negation, which is a contradiction. Simple examples of detachable subspecies of the natural number sequence are given by the even or the odd numbers. The linear continuum can be shown to have no other detachable subspecies than itself and the null species. A species is said to be finite if there is a 1-to-1 correspondence between it and an initial part $1, \ldots, n$ of the natural number series. It is called denumerable if there is such a correspondence between the species and the whole number series. A species is called numerable if it can be mapped onto a detachable subspecies of the sequence of natural numbers. An important notion is "finitary spread" or, more briefly, "fan". A fan is a spread with such a spread law that there are only finitely many allowed first terms, and for every $n$ every admitted sequence with $n$ terms has only a finite number of sequences with $n + 1$ terms as admitted continuations. Above all the so-called fan theorem is important here. It says that if $\varphi(\alpha)$ is an integral-valued function of $\alpha$, $\alpha$ varying through the different elements of the fan, then the value of $\varphi$ is already determined by a finite initial sequence of $\alpha$. Therefore, if $\varphi(\alpha_1) = m$, there exists an $n$ such that $\varphi(\alpha_2) = m$ whenever $\alpha_2$ has the same first $n$ terms as $\alpha_1$. An important application of the fan theorem is the proof of the statement that every function which is continuous on a bounded and closed point species is uniformly continuous on the point species. Further, such covering theorems as that of Heine-Borel can be proved. However, not all of the theorems of classical analysis can be proved in intuitionist mathematics.
I must confine my exposition of intuitionism to these scattered remarks. A more thorough exposition would require a more complete treatment of intuitionist logic, and that would take more space than I have at my disposal here.
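The notions of spread rule and complementary rule used in this section can be sketched in code. In the hypothetical Python below, a spread rule is modelled as a decidable predicate `allowed` on finite sequences of natural numbers and a complementary rule as a function `gamma`; these names, the helpers `extensions`, `admitted` and `initial_values`, and the particular spread chosen are assumptions made for this illustration and are not taken from Heyting's book. The spread shown is a fan: only the terms 0 and 1 are admitted, and to $a_1, \ldots, a_n$ is assigned the rational number $\sum_i a_i 2^{-i}$.

```python
from fractions import Fraction


def allowed(seq):
    """Spread rule: decide whether a finite sequence is admitted.

    Every term must be 0 or 1, so each admitted sequence has exactly
    two admitted continuations and the spread is a fan.
    """
    return all(term in (0, 1) for term in seq)


def gamma(seq):
    """Complementary rule: the object assigned to an admitted sequence."""
    return sum(Fraction(term, 2 ** i) for i, term in enumerate(seq, start=1))


def extensions(seq, candidates=range(10)):
    """Admitted one-step continuations of seq among the given candidates."""
    return [seq + (k,) for k in candidates if allowed(seq + (k,))]


def admitted(length):
    """All admitted sequences of the given length, generated step by step."""
    level = [()]
    for _ in range(length):
        level = [ext for seq in level for ext in extensions(seq)]
    return level


def initial_values(choices):
    """Objects assigned to the successive initial segments of a choice sequence."""
    seq = ()
    for k in choices:
        seq += (k,)
        if not allowed(seq):
            raise ValueError("choice leaves the spread")
        yield gamma(seq)


print(len(admitted(3)))                    # admitted sequences of length 3
print(list(initial_values([1, 0, 1, 1])))  # the values 1/2, 1/2, 5/8, 11/16
```

Only finite initial segments are ever inspected, which matches the view that an ips is something that becomes; the list of choices passed to `initial_values` may come from a law or from arbitrary choices alike.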
17. Mathematics without quantifiers

In all the theories we have treated above we have made use of the logical quantifiers, the universal one and the existential one. We have used them without scruples even in the case of an infinite number of objects. There is now a way of developing mathematics, in particular arithmetic, without the use of these operations which, in the case of an infinite number of objects, may be considered as an extension or extrapolation of conjunction and disjunction in the finite case. If we shall really consider the infinite as something becoming, something not finished or finishable, one might argue that we ought to avoid the quantifiers extended over an infinite range. Such a theory is possible. I myself published in 1923 a first beginning of such a strict finitist mathematics. I treated arithmetic, showing that by the use of
free variables for general statements, basing the theory on the principles of definition by recursion and proof by complete induction, ordinary arithmetic could be developed in a very natural way. Later this theory, called Recursive Arithmetic, has been more perfectly formalized, first in Hilbert-Bernays, "Grundlagen der Mathematik", Vol. 1, 1934, §7, later also by H. B. Curry (Amer. J. Math. Vol. 63, 1941, pp. 263-282). But the most complete exposition of this kind of mathematics has been given by R. L. Goodstein. He has extended the use of these purely finitist methods also to analysis. However, since this kind of mathematics avoids set theory in its proper sense rather than replacing it by a new form of it, I find no reason to pursue this subject further in these lectures on set theory.
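The principle of definition by recursion mentioned above can be illustrated by a short sketch. The Python below is an illustration under stated assumptions and not the formalism of any of the cited works: the names `succ`, `add`, `mul` and `less` are hypothetical, natural numbers are represented by Python integers, and each function is given by its two recursion equations. The final line tests finitely many instances of the commutative law; such a test is of course no substitute for a proof by complete induction with free variables.

```python
def succ(n):
    """Successor, the only primitive operation used below."""
    return n + 1


def add(a, b):
    """a + 0 = a;  a + succ(b) = succ(a + b)."""
    return a if b == 0 else succ(add(a, b - 1))


def mul(a, b):
    """a * 0 = 0;  a * succ(b) = (a * b) + a."""
    return 0 if b == 0 else add(mul(a, b - 1), a)


def less(a, b):
    """a < 0 is false;  a < succ(b) holds if a < b or a = b."""
    return False if b == 0 else (less(a, b - 1) or a == b - 1)


print(add(3, 4), mul(3, 4), less(2, 3))

# Finitely many instances of a + b = b + a; an instance check, not a proof.
print(all(add(a, b) == add(b, a) for a in range(8) for b in range(8)))
```

No quantifier over an infinite range occurs anywhere: the general statement $a + b = b + a$ is expressed with free variables, and each instance is verified by a finite computation.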
18. The possibility of set theory based on many-valued logic

It is well known that it is possible to set forth logical calculi, both propositional calculi and predicate calculi, where the statements can have more than the two truth values of classical logic. It is then natural to ask whether it might be easier to obtain a consistent set theory by taking into account many-valued logics. One might think that it could then perhaps be possible to avoid the distinction of type (and order), even if we maintained a general axiom of comprehension allowing the greater number of truth values. I myself have investigated the possibility of using truth functions of the kind proposed by Łukasiewicz. My results are published in a paper "Bemerkungen zum Komprehensionsaxiom" (Zeitschr. f. math. Logik und Grundlagen d. Math., Bd. 3, S. 1-17 (1957)). The basic logic is as follows: The truth values are numbers between 0 and 1. The values of $p \ \&\ q$, $p \lor q$, $\neg p$ are respectively min (value of $p$, value of $q$), max (value of $p$, value of $q$), 1 - value of $p$. Further the value of $(x)p(x)$ is the minimum of the values of $p(x)$ for the diverse $x$. The value of $(Ex)p(x)$ is the maximum of the values of $p(x)$. In the case of finitely many truth values they are the diverse multiples of the least one $\neq 0$. Some of my results are: If we are to have an unrestricted axiom of comprehension, a consistent theory is impossible if the number of truth values is finite. On the other hand, it seems to be possible to obtain a consistent set theory with an unrestricted axiom of comprehension if all rational numbers $\geq 0$ and $\leq 1$ are allowed as truth values. I was able to prove that a rudimentary set theory, where the axiom of comprehension $(Ey)(x)(x \in y \leftrightarrow \varphi(x))$ is only used in the case that $\varphi(x)$ is built up from the atomic membership propositions by use of the logical connectives $\&$, $\lor$, $\neg$ alone, is consistent.
It ought to be noticed, however, that in any set theory where we use quantifiers extended over the whole domain, the sets introduced by the axiom of comprehension are defined relative to the total domain, so that the whole theory in that respect is circular. If we want to avoid circularity, we must accept a distinction of the objects we are dealing with into types, orders or layers, or
whatever we prefer to call these subdivisions of our domain. In any case, research concerning set theories based on many-valued logic must be continued before we can say whether it is really promising or not.
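The truth functions described in this section are simple enough to compute directly. The following Python sketch is only an illustration: all function names are hypothetical, truth values are represented as exact fractions between 0 and 1, and the quantifiers are evaluated over a finite list of values. It also checks one elementary point connected with the axiom of comprehension, under the assumption (made for this sketch only) that an equivalence holds when both sides have the same truth value: a set $r$ with $(x)(x \in r \leftrightarrow \neg(x \in x))$ would require a value $v$ of $r \in r$ with $v = 1 - v$, and such a value exists among the truth values only if $\frac{1}{2}$ is one of them. This observation is not the argument of the cited paper and by itself decides nothing about consistency.

```python
from fractions import Fraction


def conj(p, q):
    """Value of p & q: the minimum of the two values."""
    return min(p, q)


def disj(p, q):
    """Value of p v q: the maximum of the two values."""
    return max(p, q)


def neg(p):
    """Value of the negation: 1 - value of p."""
    return 1 - p


def forall(values):
    """Value of (x)p(x) over finitely many x: the minimum of the values."""
    return min(values)


def exists(values):
    """Value of (Ex)p(x) over finitely many x: the maximum of the values."""
    return max(values)


def truth_values(m):
    """The m + 1 truth values 0, 1/m, 2/m, ..., 1 (multiples of the least nonzero one)."""
    return [Fraction(i, m) for i in range(m + 1)]


def negation_fixed_points(values):
    """Truth values v with v = 1 - v."""
    return [v for v in values if neg(v) == v]


print(negation_fixed_points(truth_values(1)))  # two truth values
print(negation_fixed_points(truth_values(2)))  # three truth values
```

With two truth values the list is empty, which is the familiar Russell situation; with three truth values it contains $\frac{1}{2}$, so this particular instance of comprehension no longer forces a contradiction.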
