
EMS Textbooks in Mathematics
EMS Textbooks in Mathematics is a book series aimed at students or professional mathematicians seeking an
introduction into a particular field. The individual volumes are intended to provide not only relevant techniques,
results and their applications, but afford insight into the motivations and ideas behind the theory. Suitably
designed exercises help to master the subject and prepare the reader for the study of more advanced and specialized literature.
Jørn Justesen
Tom Høholdt
A Course In Error-Correcting Codes
European Mathematical Society
Authors:

Jørn Justesen
COM
Technical University of Denmark
Bldg. 371
DK-2800 Kgs. Lyngby
Denmark

Tom Høholdt
Department of Mathematics
Technical University of Denmark
Bldg. 303
DK-2800 Kgs. Lyngby
Denmark
2000 Mathematics Subject Classification: 94-01; 12-01
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available in the Internet at http://dnb.ddb.de.
ISBN 3-03719-001-9
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission
of the copyright owner must be obtained.
© 2004 European Mathematical Society
Contact address:
European Mathematical Society Publishing House
Seminar for Applied Mathematics
ETH-Zentrum FLI C1
CH-8092 Zürich
Switzerland
Phone: +41 (0)1 632 34 36
Email: info@ems-ph.org
Homepage: www.ems-ph.org
Printed on acid-free paper produced from chlorine-free pulp. TCF
Printed in Germany
9 8 7 6 5 4 3 2 1
Contents

Preface

1 Block codes for error-correction
1.1 Linear codes and vector spaces
1.2 Minimum distance and minimum weight
1.3 Syndrome decoding and the Hamming bound
1.4 Weight distributions
1.5 Problems

2 Finite fields
2.1 Fundamental properties of finite fields
2.2 The finite field F_{2^m}
2.3 Minimal polynomials and factorization of x^n − 1
2.4 Problems

3 Bounds on error probability for error-correcting codes
3.1 Some probability distributions
3.2 The probability of failure and error for bounded distance decoding
3.3 Bounds for maximum likelihood decoding of binary block codes
3.4 Problems

4 Communication channels and information theory
4.1 Discrete messages and entropy
4.2 Mutual information and capacity of discrete channels
4.2.1 Discrete memoryless channels
4.2.2 Codes and channel capacity
4.3 Problems

5 Reed-Solomon codes and their decoding
5.1 Basic definitions
5.2 Decoding Reed-Solomon codes
5.3 Vandermonde matrices
5.4 Another decoding algorithm
5.5 Problems

6 Cyclic codes
6.1 Introduction to cyclic codes
6.2 Generator and parity check matrices of cyclic codes
6.3 A theorem on the minimum distance of cyclic codes
6.4 Cyclic Reed-Solomon codes and BCH codes
6.4.1 Cyclic Reed-Solomon codes
6.4.2 BCH codes
6.5 Problems

7 Frames
7.1 Definitions of frames and their efficiency
7.2 Frame quality
7.2.1 Measures of quality
7.2.2 Parity checks on frames
7.3 Error detection and error correction
7.3.1 Short block codes
7.3.2 Convolutional codes
7.3.3 Reed-Solomon codes
7.3.4 Low density codes and turbo codes
7.4 Problems

8 Convolutional codes
8.1 Parameters of convolutional codes
8.2 Tail-biting codes
8.3 Parity checks and dual codes
8.4 Distances of convolutional codes
8.5 Punctured codes
8.6 Linear systems as encoders
8.7 Unit memory codes
8.8 Problems

9 Maximum likelihood decoding of convolutional codes
9.1 Finite state descriptions of convolutional codes
9.2 Maximum likelihood decoding
9.3 Problems

10 Combinations of several codes
10.1 Product codes
10.2 Concatenated codes (serial encoding)
10.2.1 Parameters of concatenated codes
10.2.2 Performance of concatenated codes
10.2.3 Interleaving and inner convolutional codes
10.3 Problems

11 Decoding Reed-Solomon and BCH codes with the Euclidean algorithm
11.1 The Euclidean algorithm
11.2 Decoding Reed-Solomon and BCH codes
11.3 Problems

12 List decoding of Reed-Solomon codes
12.1 A list decoding algorithm
12.2 An extended list decoding algorithm
12.3 Factorization of Q(x, y)
12.4 Problems

13 Iterative decoding
13.1 Low density parity check codes
13.2 Iterative decoding of LDPC codes
13.3 Decoding product codes
13.4 Parallel concatenation of convolutional codes (turbo codes)
13.5 Problems

14 Algebraic geometry codes
14.1 Hermitian codes
14.2 Decoding Hermitian codes
14.3 Problems

A Communication channels
A.1 Gaussian channels
A.2 Gaussian channels with quantized input and output
A.3 ML decoding

B Solutions to selected problems
B.1 Solutions to problems in Chapter 1
B.2 Solutions to problems in Chapter 2
B.3 Solutions to problems in Chapter 3
B.4 Solutions to problems in Chapter 4
B.5 Solutions to problems in Chapter 5
B.6 Solutions to problems in Chapter 6
B.7 Solutions to problems in Chapter 7
B.8 Solutions to problems in Chapter 8
B.9 Solutions to problems in Chapter 9
B.10 Solutions to problems in Chapter 10
B.11 Solutions to problems in Chapter 11
B.12 Solutions to problems in Chapter 12
B.13 Solutions to problems in Chapter 13
B.14 Solutions to problems in Chapter 14

C Table of minimal polynomials

Bibliography

Index
Preface
In this book we present some topics in coding theory which we consider to be particularly important and interesting, both from the point of view of theory and of applications. Some of the results are new, most are not, but the choice of subjects reflects a part of the development of coding theory through the last decade. Thus some classical results have been omitted, and several recent results are included. However, the presentation is new in many places.

We have kept the amount of detail at a minimum. Only the necessary mathematics is presented, the coding constructions are concentrated on the most important techniques, and decoding algorithms are presented in their basic versions. However, we have included proofs of all essential results.

The aim has been to make the book a suitable starting point for independent investigations in the subject. The learning of the basic tools is supported by many problems, and there are more advanced problems and project suggestions for continuing in the direction where the reader has particular background and interest. This also means that in order to fully understand the subject, it is essential to solve the problems and work on the projects. In Appendix B we give solutions to some of the problems, and instructors can get the complete set of solutions from us. We strongly suggest that the problems are supplemented by computer exercises, and in most cases the projects will require a small amount of programming. On the web page of the book we give examples of exercises in Maple® and Matlab®.

The book grew out of a course at The Technical University of Denmark (covering one third of a semester) and is written for an audience of primarily graduate or advanced undergraduate students. It requires some background in elementary linear algebra and algorithms. Some background in computer science or electrical engineering will also facilitate the understanding, but mostly a certain maturity is needed.

We have not included references in the text, since we do not expect them to be of immediate use to the reader. However, there is an annotated bibliography and references to other resources for further study. Many students have helped in improving the text; in particular we want to acknowledge the important effort by Bergþór Jónsson.

Lyngby, November 2003
Jørn Justesen, Tom Høholdt
Chapter 1
Block codes for error-correction
This chapter introduces the fundamental concepts of block codes and error correction. Codes are used for several purposes in communication and storage of information. In this book we discuss only error-correcting codes, i.e. we assume that in the received message some symbols are changed, and it is the objective of the coding to allow these errors to be corrected. The discussion of error mechanisms and channel models is postponed to Chapter 4; here we simply consider the number of errors that the code can correct. One of the most important classes of codes, the linear block codes, is described in terms of vector spaces, and some concepts from linear algebra are assumed. In particular this includes bases of vector spaces, matrices, and systems of linear equations. Initially we let all codes be binary, i.e. the symbols are only 0 and 1, since this is both the simplest and the most important case, but later on we will consider other symbol alphabets as well.
1.1 Linear codes and vector spaces
A block code C is a set of M codewords

C = {c_1, c_2, . . . , c_M}
c_i = (c_{i0}, c_{i1}, . . . , c_{i,n−1})

where the codewords are n-tuples, and we refer to n as the length of the code. The elements c_{ij} belong to a finite alphabet of q symbols. For the time being we consider only binary codes, i.e. the alphabet is {0, 1}, but later we shall consider larger alphabets. The alphabet will be given the structure of a field, which allows us to do computations on the codewords. The theory of finite fields is presented in Chapter 2.

Example 1.1.1. The binary field F_2.
The elements are denoted 0 and 1, and we do addition and multiplication according to the following rules: 0 + 0 = 0, 1 + 0 = 0 + 1 = 1, 1 + 1 = 0 and 0 · 0 = 1 · 0 = 0 · 1 = 0, 1 · 1 = 1. One may say that addition is performed modulo 2, and in some contexts the logical operation + is referred to as exclusive or (xor).
2 Block codes for error-correction
With few exceptions we shall consider only linear codes, which are described as vector spaces.

If F is a field and n a natural number, then the elements of F^n can be seen as a vector space V = (F^n, +, F), where for x, y ∈ F^n and f ∈ F

x = (x_0, x_1, . . . , x_{n−1})
y = (y_0, y_1, . . . , y_{n−1})
x + y = (x_0 + y_0, x_1 + y_1, . . . , x_{n−1} + y_{n−1})
f x = (f x_0, f x_1, . . . , f x_{n−1})

This may be a familiar concept with F as the field of real numbers, but we shall use other fields, in particular F_2, in the following examples. Note that a vector space has a basis, i.e. a maximal set of linearly independent vectors, and that any vector is a linear combination of elements from the basis. The dimension of a vector space is the number of elements in a basis.
In V we have an inner product defined by

x · y = x_0 y_0 + x_1 y_1 + · · · + x_{n−1} y_{n−1}

So the value of the inner product is an element of the field F. If two vectors x and y satisfy x · y = 0, they are said to be orthogonal. Note that we can have x · x = 0 with x ≠ 0 if F = F_2.
Definition 1.1.1. A linear (n, k) block code C is a k-dimensional subspace of the vector space V.

The code is called linear since, if C is a subspace, we have

c_i ∈ C and c_j ∈ C ⇒ c_i + c_j ∈ C
c_i ∈ C and f ∈ F ⇒ f c_i ∈ C

In particular the zero vector is always a codeword. The number of codewords is M = q^k, where q is the number of elements in the field F.
Example 1.1.2. An (n, k) = (7, 4) binary block code.
Consider a code consisting of the following 16 vectors:
0000000 1111111
1000110 0111001
0100011 1011100
0010101 1101010
0001111 1110000
1100101 0011010
1010011 0101100
1001001 0110110
It may readily be verified that any sum of two codewords is again a codeword.
1.1 Linear codes and vector spaces 3
When a code is used for communication or storage, the information may be assumed to be a long sequence of binary digits. The sequence is segmented into blocks of length k. We may think of such a block of information as a binary vector u of length k. We shall therefore need an encoding function that maps k-vectors onto codewords. In a systematic encoding, we simply let the first k coordinates be equal to the information symbols. The remaining n − k coordinates are sometimes referred to as parity check symbols, and we shall justify this name below.

Instead of listing all the codewords, a code may be specified by a basis of k linearly independent codewords.

Definition 1.1.2. A generator matrix G of an (n, k) code C is a k × n matrix whose rows are linearly independent codewords of C.

If the information is the vector u of length k, we can state the encoding rule as

c = uG    (1.1)
Example 1.1.3. A basis for the (7, 4) code.
We may select four linearly independent vectors from the list above to give the generator matrix

G = ( 1 0 0 0 1 1 0 )
    ( 0 1 0 0 0 1 1 )
    ( 0 0 1 0 1 0 1 )
    ( 0 0 0 1 1 1 1 )
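The encoding rule (1.1) is easily tried out on a computer. The following minimal sketch (our addition, not part of the book) encodes all 16 information vectors with the G above, using Python with numpy and reducing modulo 2:

    import numpy as np
    from itertools import product

    # Generator matrix of the (7, 4) code from Example 1.1.3.
    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 0, 1, 1],
                  [0, 0, 1, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 1, 1]])

    # Encode every information vector u of length 4 as c = uG over F_2.
    for u in product([0, 1], repeat=4):
        c = np.dot(u, G) % 2
        print(''.join(map(str, u)), '->', ''.join(map(str, c)))

The 16 outputs are exactly the codewords listed in Example 1.1.2, in a different order.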
The same code, in the sense of a set of words or a vector space, may be described by different generator matrices or bases of the vector space. We usually just take one that is convenient for our purpose. However, G may also be interpreted as specifying a particular encoding of the information. Thus row operations on the matrix do not change the code, but the modified G represents a different encoding mapping. Since G has rank k, we can obtain a convenient form of G by row operations in such a way that k columns form a k × k identity matrix I. We often assume that this matrix can be chosen as the first k columns and write the generator matrix as

G = (I, A)

This form of the generator matrix gives a systematic encoding of the information.

We may now define a parity check as a vector h of length n which satisfies

G h^T = 0

where h^T denotes the transpose of h. The parity check vectors again form a subspace of V, of dimension n − k.

Definition 1.1.3. A parity check matrix H for an (n, k) code C is an (n − k) × n matrix whose rows are linearly independent parity checks.
4 Block codes for error-correction
So if G is a generator matrix for the code and H is a parity check matrix, we have

G H^T = 0

where 0 is a k × (n − k) matrix of zeroes.

From the systematic form of the generator matrix we may find H as

H = (A^T, I)    (1.2)

where I now is an (n − k) × (n − k) identity matrix. If such a parity check matrix is used, the last n − k elements of the codeword are given as linear combinations of the first k elements. This justifies calling the last symbols parity check symbols.

Definition 1.1.4. Let H be a parity check matrix for an (n, k) code C and let r ∈ F^n; then the syndrome s = syn(r) is given by

s = H r^T    (1.3)

We note that if the received word is r = c + e, where c is a codeword and e is the error pattern (also called the error vector), then

s = H(c + e)^T = H e^T    (1.4)

The term syndrome refers to the fact that s reflects the error in the received word. The codeword itself does not contribute to the syndrome, and for an error-free codeword s = 0.

It follows from the above definition that the rows of H are orthogonal to the codewords of C. The code spanned by the rows of H is what is called the dual code C^⊥, defined by

C^⊥ = {x ∈ F^n | x · c = 0 for all c ∈ C}

It is often convenient to talk about the rate of a code, R = k/n. Thus the dual code has rate 1 − R.
Example 1.1.4. A parity check matrix for the (7, 4) code.
We may write the parity check matrix as

H = ( 1 0 1 1 1 0 0 )
    ( 1 1 0 1 0 1 0 )
    ( 0 1 1 1 0 0 1 )
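As a quick numerical check (our addition), one can verify that G H^T = 0 for the matrices of Examples 1.1.3 and 1.1.4, and that the syndrome of a received word depends only on the error pattern, cf. (1.4):

    import numpy as np

    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 0, 1, 1],
                  [0, 0, 1, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 1, 1]])
    H = np.array([[1, 0, 1, 1, 1, 0, 0],
                  [1, 1, 0, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])

    print(np.dot(G, H.T) % 2)            # the 4 x 3 all-zero matrix

    c = np.dot([1, 0, 1, 1], G) % 2      # an arbitrary codeword
    e = np.array([0, 0, 0, 0, 0, 1, 0])  # a single error
    r = (c + e) % 2
    # Both syndromes agree and equal the column of H where the error sits:
    print(np.dot(H, r) % 2, np.dot(H, e) % 2)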
1.2 Minimum distance and minimum weight
In order to determine the error correcting capability of a code we will introduce the following useful concept.

Definition 1.2.1. The Hamming weight of a vector x, denoted w_H(x), is equal to the number of nonzero coordinates.
Note that the Hamming weight is often simply called weight.

For a received vector r = c_j + e the number of errors is the Hamming weight of e. We would like to be able to correct all error patterns of weight ≤ t for some t, and we may use the following definition to explain what we mean by that.

Definition 1.2.2. A code is t-error correcting if for any two codewords c_i ≠ c_j, and for any error patterns e_1 and e_2 of weight ≤ t, we have c_i + e_1 ≠ c_j + e_2.

This means that it is not possible to get a certain received word from making at most t errors in two different codewords.
A more convenient way of expressing this property uses the notion of Hamming distance.

Definition 1.2.3. The Hamming distance between two vectors x and y, denoted d_H(x, y), is the number of coordinates where they differ.

It is not hard to see that

d_H(x, y) = 0 ⟺ x = y
d_H(x, y) = d_H(y, x)
d_H(x, y) ≤ d_H(x, z) + d_H(z, y)

So d_H satisfies the usual properties of a distance; in particular the third property is the triangle inequality. With this distance V becomes a metric space.

As with the weight, the Hamming distance is often simply called distance; hence in the following we will often skip the subscript H on the weight and the distance.

Definition 1.2.4. The minimum distance of a code, d, is the minimum Hamming distance between any pair of different codewords.
So the minimum distance can be calculated by comparing all pairs of codewords, but for linear codes this is not necessary, since we have

Lemma 1.2.1. In an (n, k) code the minimum distance is equal to the minimum weight of a nonzero codeword.

Proof. It follows from the definitions that w(x) = d(0, x) and that d(x, y) = w(x − y). Let c be a codeword of minimum weight. Then w(c) = d(0, c), and since 0 is a codeword we have d_min ≤ w_min. On the other hand, if c_1 and c_2 are codewords at minimum distance, we have d(c_1, c_2) = w(c_1 − c_2), and since c_1 − c_2 is again a codeword we get w_min ≤ d_min. We combine the two inequalities and get the result.
Based on this observation we can now prove

Theorem 1.2.1. An (n, k) code is t-error correcting if and only if t < d/2.
Proof. Suppose t < d/2, and suppose we had two codewords c_i and c_j and two error patterns e_1 and e_2 of weight ≤ t such that c_i + e_1 = c_j + e_2. Then c_i − c_j = e_2 − e_1, but w(e_2 − e_1) = w(c_i − c_j) ≤ 2t < d, contradicting the fact that the minimum weight is d. On the other hand, suppose that t ≥ d/2 and let c be a codeword of weight d; we may assume t < d, since otherwise 0 and c already give 0 + c = c + 0. Change d − t of the nonzero positions of c to zeroes to obtain y, and note that d − t ≤ t. Then d(0, y) = t and d(c, y) = d − t ≤ t, but 0 + y = c + (y − c), so the code is not t-error correcting.
Example 1.2.1. The minimum distance of the (7, 4) code from Example 1.1.2 is 3, so t = 1.
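For a code this small the minimum distance can be verified by brute force: by Lemma 1.2.1 it is the minimum weight over the 2^4 − 1 nonzero codewords. A sketch (our addition):

    import numpy as np
    from itertools import product

    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 0, 1, 1],
                  [0, 0, 1, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 1, 1]])

    # Minimum weight of a nonzero codeword (= minimum distance, Lemma 1.2.1).
    d = min(int(np.sum(np.dot(u, G) % 2))
            for u in product([0, 1], repeat=4) if any(u))
    print(d)    # prints 3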
Finding the minimum distance of a code is difficult in general, but the parity check matrix can sometimes be used. This is based on the following simple observation.

Lemma 1.2.2. Let C be an (n, k) code and H a parity check matrix for C. Then, if j columns of H are linearly dependent, C contains a nonzero codeword whose nonzero elements are confined to some of the corresponding positions, and if C contains a word of weight j, then there exist j linearly dependent columns of H.

This follows directly from the definition of matrix multiplication, since for a codeword c we have Hc^T = 0.

Lemma 1.2.3. Let C be an (n, k) code with parity check matrix H. The minimum distance of C equals the minimum number of linearly dependent columns of H.

This follows immediately from Lemma 1.2.2 and Lemma 1.2.1.

For a binary code this means that d ≥ 3 if and only if the columns of H are distinct and nonzero.
One of our goals is, for given n and k, the construction of codes with large minimum distance. The following important theorem ensures the existence of certain codes; these can serve as a reference for other codes.

Theorem 1.2.2 (The Varshamov-Gilbert bound). There exists a binary linear code of length n, with at most m linearly independent parity checks and minimum distance at least d, if

1 + \binom{n-1}{1} + \cdots + \binom{n-1}{d-2} < 2^m
Proof. We shall construct an m × n matrix such that no d − 1 columns are linearly dependent. The first column can be any nonzero m-tuple. Now suppose we have chosen i columns so that no d − 1 of them are linearly dependent. There are

\binom{i}{1} + \cdots + \binom{i}{d-2}

linear combinations of these columns taken d − 2 or fewer at a time. If this number is smaller than 2^m − 1, we can add an extra column such that still no d − 1 columns are linearly dependent.
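The condition of Theorem 1.2.2 is easy to evaluate numerically. The sketch below (ours, not the book's) computes, for given n and d, the smallest m for which the theorem guarantees a code; compare Problem 1.5.6.

    from math import comb

    def gv_parity_checks(n, d):
        # Smallest m with 1 + C(n-1,1) + ... + C(n-1,d-2) < 2^m.
        s = sum(comb(n - 1, j) for j in range(d - 1))
        m = 0
        while 2 ** m <= s:
            m += 1
        return m

    # A binary (15, 15 - m) code with minimum distance >= 5 exists:
    print(gv_parity_checks(15, 5))    # prints 9, i.e. a (15, 6) code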
For large n, the Gilbert-Varshamov bound indicates that good codes exist, but there is no known practical way of constructing such codes. It is also not known if long binary codes can have distances greater than indicated by the Gilbert-Varshamov bound. Short codes can have better minimum distances. The following examples are particularly important.

Definition 1.2.5. A binary Hamming code is a code whose parity check matrix has all nonzero binary m-vectors as columns.

So the length of a binary Hamming code is 2^m − 1 and the dimension is 2^m − 1 − m, since it is clear that the parity check matrix has rank m. The minimum distance is at least 3 by the above, and it is easy to see that it is exactly 3. The columns can be ordered in different ways; a convenient method is to represent the natural numbers from 1 to 2^m − 1 as binary m-tuples.
Example 1.2.2. The (7, 4) code is a binary Hamming code with m = 3.
Definition 1.2.6. An extended binary Hamming code is obtained by adding a zero column to the parity check matrix of a Hamming code and then adding a row of all 1s.

This means that the length of the extended code is 2^m. The extra parity check is c_0 + c_1 + · · · + c_{n−1} = 0, and therefore all codewords in the extended code have even weight and the minimum distance is 4. Thus the parameters (n, k, d) of the binary extended Hamming code are (2^m, 2^m − m − 1, 4).

Definition 1.2.7. A biorthogonal code is the dual of the binary extended Hamming code.

The biorthogonal code has length n = 2^m and dimension k = m + 1. It can be seen that the code consists of the all 0s vector, the all 1s vector, and 2n − 2 vectors of weight n/2 (see Problem 1.5.9).
Example 1.2.3. The (16, 11, 4) binary extended Hamming code and its dual.
Based on the definition above we can get a parity check matrix for the (16, 11, 4) code as

( 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 )
( 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 )
( 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 )
( 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 )
( 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 )

The same matrix is a generator matrix for the (16, 5, 8) biorthogonal code.
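Definitions 1.2.5 and 1.2.6 translate directly into a construction. The sketch below (our addition) builds a parity check matrix of the Hamming code of length 2^m − 1, taking column i to be the binary representation of i, and then extends it; the result has the same columns as in Example 1.2.3, with the rows in a different order.

    import numpy as np

    def hamming_H(m):
        # Columns are the numbers 1, ..., 2^m - 1 as binary m-tuples.
        n = 2 ** m - 1
        return np.array([[(i >> b) & 1 for i in range(1, n + 1)]
                         for b in range(m - 1, -1, -1)])

    def extended_H(m):
        # Add a zero column, then a row of all 1s (Definition 1.2.6).
        H = np.hstack([hamming_H(m), np.zeros((m, 1), dtype=int)])
        return np.vstack([H, np.ones((1, 2 ** m), dtype=int)])

    print(hamming_H(3))     # a parity check matrix of the (7, 4) code
    print(extended_H(4))    # one of the (16, 11, 4) code of Example 1.2.3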
1.3 Syndrome decoding and the Hamming bound
When a code is used for correcting errors, one of the important problems is the design of a decoder. One can consider this as a mapping from F_q^n into the code C, as an algorithm, or sometimes even as a physical device. We will usually see a decoder as a mapping or as an algorithm. One way of stating the objective of the decoder is: for a received vector r, select as the transmitted codeword c a codeword that minimizes d(r, c). This is called maximum likelihood decoding for reasons we will explain in Chapter 3. It is clear that if the code is t-error correcting and r = c + e with w(e) ≤ t, then the output of such a decoder is c.

It is often difficult to design a maximum likelihood decoder, but if we only want to correct t errors, where t < d/2, it is sometimes easier to get a good algorithm.

Definition 1.3.1. A minimum distance decoder is a decoder that, given a received word r, selects the codeword c that satisfies d(r, c) < d/2 if such a word exists, and otherwise declares failure.

It is obvious that there can be at most one codeword within distance d/2 from a received word.
Using the notion of syndromes from the previous section we can think of a decoder as a mapping of syndromes to error patterns. Thus for each syndrome we should choose an error pattern with the smallest number of errors, and if there are several error patterns of equal weight with the same syndrome we may choose one of these arbitrarily. Such a syndrome decoder is not only a useful concept, but may also be a reasonable implementation if n − k is not too large.

Definition 1.3.2. Let C be an (n, k) code and a ∈ F^n. The coset containing a is the set a + C = {a + c | c ∈ C}.

If two words x and y are in the same coset and H is a parity check matrix of the code, we have Hx^T = H(a + c_1)^T = Ha^T = H(a + c_2)^T = Hy^T, so the two words have the same syndrome. On the other hand, if two words x and y have the same syndrome, then Hx^T = Hy^T and therefore H(x − y)^T = 0; this is the case if and only if x − y is a codeword, and therefore x and y are in the same coset. We therefore have

Lemma 1.3.1. Two words are in the same coset if and only if they have the same syndrome.

The cosets form a partition of the space F^n into q^{n−k} classes, each containing q^k elements.

This gives an alternative way of describing a syndrome decoder: Let r be a received word. Find a vector f of smallest weight in the coset containing r (i.e. syn(f) = syn(r)) and decode into r − f. A word of smallest weight in a coset can be found once and for all; such a word is called a coset leader. With a list of syndromes and corresponding coset leaders, syndrome decoding can be performed as follows: Decode into r − f, where f is the coset leader of the coset corresponding to syn(r). In this way we actually do maximum likelihood decoding and can correct q^{n−k} error patterns. If we only list the cosets where the coset leader is unique and have the corresponding syndromes, we do minimum distance decoding.
We illustrate this in
Example 1.3.1. Let C be the binary (6, 3) code with parity check matrix

H = ( 1 1 1 1 0 0 )
    ( 1 0 1 0 1 0 )
    ( 1 1 0 0 0 1 )

From the parity check matrix we see that we have the following syndromes and coset leaders:

syndrome   coset leader
(000)^T    000000
(111)^T    100000
(101)^T    010000
(110)^T    001000
(100)^T    000100
(010)^T    000010
(001)^T    000001
(011)^T    000011

In the last coset there is more than one word of weight 2, so we have chosen one of these, but for the first seven cosets the coset leader is unique.
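A table like the one above can be generated mechanically: run through all words in order of increasing weight and record the first word seen with each syndrome as the coset leader. A sketch of the resulting syndrome decoder (ours; the choice among equal-weight leaders is as arbitrary as in the example):

    import numpy as np
    from itertools import product

    # Parity check matrix of the (6, 3) code of Example 1.3.1.
    H = np.array([[1, 1, 1, 1, 0, 0],
                  [1, 0, 1, 0, 1, 0],
                  [1, 1, 0, 0, 0, 1]])

    # Coset leaders: for each syndrome keep the first word of lowest weight.
    leaders = {}
    for v in sorted(product([0, 1], repeat=6), key=sum):
        leaders.setdefault(tuple(np.dot(H, v) % 2), np.array(v))

    def decode(r):
        f = leaders[tuple(np.dot(H, r) % 2)]   # leader of the coset of r
        return (r + f) % 2                     # decode into r - f (= r + f)

    print(decode(np.array([0, 1, 1, 1, 1, 1])))   # outputs a codeword of C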
This illustrates
Lemma 1.3.2. Let C be an (n, k) code. C is t-error correcting if and only if all words of weight ≤ t are coset leaders.

Proof. Suppose C is t-error correcting. Then d = w_min > 2t. If two words of weight ≤ t were in the same coset, their difference would be in C and have weight at most 2t, a contradiction.

To prove the converse, suppose all words of weight ≤ t are coset leaders, but that c_1 and c_2 were codewords at distance ≤ 2t. Then we can find e_1 and e_2 such that c_1 + e_1 = c_2 + e_2 with w(e_1) ≤ t and w(e_2) ≤ t, and therefore syn(e_1) = syn(e_2), contradicting that e_1 and e_2 are in different cosets.
We note that the total number of words of weight ≤ t is

1 + (q-1)\binom{n}{1} + (q-1)^2 \binom{n}{2} + \cdots + (q-1)^t \binom{n}{t}

and that the total number of syndromes is q^{n−k}, so we have
Theorem 1.3.1 (The Hamming bound). If C is an (n, k) code over a field F with q elements that corrects t errors, then

\sum_{j=0}^{t} (q-1)^j \binom{n}{j} \le q^{n-k}.

The bound can be seen as an upper bound on the minimum distance of a code with given n and k, as an upper bound on k if n and d are given, or as a lower bound on n if k and d are given.
[Figure 1.1: Gilbert (x) and Hamming (o) bounds on the number of parity symbols n − k for binary codes of length 255, plotted against the minimum distance d.]
In the binary case the bound specializes to

\sum_{j=0}^{t} \binom{n}{j} \le 2^{n-k}

For the binary Hamming codes, where t = 1, we have 1 + n = 2^{n−k}, so the bound is satisfied with equality. For this reason these codes are called perfect.
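The two bounds of Figure 1.1 can be tabulated with a few lines of code. The following sketch (our addition) computes, for n = 255 and a given d, the number of parity symbols n − k required by the Hamming bound and sufficient according to the Varshamov-Gilbert bound:

    from math import comb

    def hamming_nk(n, d):
        # Hamming bound: sum_{j=0..t} C(n, j) <= 2^(n-k), t = (d-1)//2.
        s = sum(comb(n, j) for j in range((d - 1) // 2 + 1))
        m = 0
        while 2 ** m < s:
            m += 1
        return m    # n - k must be at least this large

    def gv_nk(n, d):
        # Varshamov-Gilbert: 1 + C(n-1,1) + ... + C(n-1,d-2) < 2^m suffices.
        s = sum(comb(n - 1, j) for j in range(d - 1))
        m = 0
        while 2 ** m <= s:
            m += 1
        return m    # n - k = m parity symbols certainly suffice

    for d in (3, 5, 9, 17, 33):
        print(d, hamming_nk(255, d), gv_nk(255, d))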
1.4 Weight distributions
If we need more information about the error-correcting capability of a code than what is indicated by the minimum distance, we can use the weight distribution.

Definition 1.4.1. The weight distribution of a code is a vector A = (A_0, A_1, . . . , A_n), where A_w is the number of codewords of weight w. The weight enumerator is the polynomial

A(z) = \sum_{w=0}^{n} A_w z^w.
We note that for a linear code the number A_w is also the number of codewords at distance w from a given codeword. In this sense the geometry of the code as seen from a codeword is the same for all codewords.

We note an important result on the weight distribution of dual codes.

Theorem 1.4.1 (MacWilliams). If A(z) is the weight enumerator of a binary (n, k) code C, the weight enumerator B(z) of the dual code C^⊥ is given by

B(z) = 2^{-k} (1+z)^n A((1-z)/(1+z))    (1.5)

Because of the importance of the theorem we give a proof, even though it is quite lengthy (and may be skipped at a first reading).
Proof. Let H be a parity check matrix of C. Let H_ext be the matrix that consists of all linear combinations of the rows of H, so H_ext has as rows all the codewords of C^⊥. Let y ∈ F_2^n and define the extended syndrome

s_ext = H_ext y^T

It is obvious that

c ∈ C ⟺ Hc^T = 0 ⟺ H_ext c^T = s_ext = 0    (1.6)

Let

s_ext = (s_1, s_2, . . . , s_{2^{n-k}})

and define

E_j = {x ∈ F_2^n | s_j = 0},  j = 1, 2, . . . , 2^{n-k}

From (1.6) we have that

C = E_1 ∩ E_2 ∩ · · · ∩ E_{2^{n-k}} = F_2^n \ (Ē_1 ∪ Ē_2 ∪ · · · ∪ Ē_{2^{n-k}})

where

Ē_j = F_2^n \ E_j = {x ∈ F_2^n | s_j = 1}.

If we let

E = Ē_1 ∪ · · · ∪ Ē_{2^{n-k}}

we have

C = F_2^n \ E    (1.7)

The weight enumerator of F_2^n is

\sum_{i=0}^{n} \binom{n}{i} z^i = (1+z)^n,

so if we let E(z) denote the weight enumerator of E we have

A(z) = (1+z)^n − E(z)

In the following we will determine E(z).

We first note that, by linearity, if s_ext ≠ 0, then s_ext consists of 2^{n−k−1} 0s and 2^{n−k−1} 1s. That means that a word from E is in exactly 2^{n−k−1} of the Ē_j s, and we therefore have

2^{n-k-1} E(z) = \sum_{j=1}^{2^{n-k}} Ē_j(z)

where Ē_j(z) is the weight enumerator of Ē_j. Let w_j denote the Hamming weight of the j-th row of H_ext; then

Ē_j(z) = (1+z)^{n-w_j} \sum_{k odd} \binom{w_j}{k} z^k = (1+z)^{n-w_j} · (1/2) ((1+z)^{w_j} − (1-z)^{w_j})
       = (1/2)(1+z)^n − (1/2)(1+z)^{n-w_j}(1-z)^{w_j}

From this we get

E(z) = (1+z)^n − 2^{-(n-k)} (1+z)^n \sum_{j=1}^{2^{n-k}} ((1-z)/(1+z))^{w_j}
     = (1+z)^n − 2^{-(n-k)} (1+z)^n B((1-z)/(1+z))    (1.8)

and therefore

A(z) = 2^{-(n-k)} (1+z)^n B((1-z)/(1+z))

By interchanging the roles of C and C^⊥ we get the result.
From (1.5) we can get

2^k \sum_{i=0}^{n} B_i z^i = \sum_{w=0}^{n} A_w ((1-z)/(1+z))^w (1+z)^n

2^k \sum_{i=0}^{n} B_i z^i = \sum_{w=0}^{n} A_w (1-z)^w (1+z)^{n-w}

2^k \sum_{i=0}^{n} B_i z^i = \sum_{w=0}^{n} A_w \sum_{m=0}^{w} \binom{w}{m} (-z)^m \sum_{l=0}^{n-w} \binom{n-w}{l} z^l    (1.9)

from which we can find B if A is known, and vice versa. The actual calculation can be tedious, but can be accomplished by many symbolic mathematics programs.
Example 1.4.1. (Example 1.2.3 continued) The biorthogonal (16, 5, 8) code has weight enumerator

A(z) = 1 + 30 z^8 + z^{16}

The weight enumerator for the (16, 11, 4) extended Hamming code can then be found using (1.9):

B(z) = 2^{-5} ((1+z)^{16} + 30 (1-z)^8 (1+z)^8 + (1-z)^{16})
     = 2^{-5} ( \sum_{j=0}^{16} \binom{16}{j} (z^j + (-z)^j) + 30 \sum_{m=0}^{8} \binom{8}{m} (-z)^m \sum_{l=0}^{8} \binom{8}{l} z^l )
     = 1 + 140 z^4 + 448 z^6 + 870 z^8 + 448 z^{10} + 140 z^{12} + z^{16}
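Such calculations are indeed easy with a symbolic package. The sketch below (ours, not part of the book) recomputes B(z) of Example 1.4.1 directly from (1.5) using sympy:

    from sympy import symbols, expand, cancel, Rational

    z = symbols('z')
    n, k = 16, 5
    A = 1 + 30*z**8 + z**16     # weight enumerator of the (16, 5, 8) code

    # MacWilliams transform (1.5): B(z) = 2^(-k) (1+z)^n A((1-z)/(1+z)).
    B = expand(cancel(Rational(1, 2**k) * (1 + z)**n
                      * A.subs(z, (1 - z) / (1 + z))))
    print(B)
    # 1 + 140*z**4 + 448*z**6 + 870*z**8 + 448*z**10 + 140*z**12 + z**16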
1.5 Problems
Problem 1.5.1 Consider the following binary code
( 0 0 0 0 0 0 )
( 0 0 1 1 1 1 )
( 1 1 0 0 1 1 )
( 1 1 1 1 0 0 )
( 1 0 1 0 1 0 )
1) Is this a linear code?
2) Add more words such that the resulting code is linear.
3) Determine a basis of this code.
Problem 1.5.2 Let C be the binary linear code of length 6 with generator matrix
G = ( 1 0 1 0 1 0 )
    ( 1 1 1 1 0 0 )
    ( 1 1 0 0 1 1 )
14 Block codes for error-correction
1) Determine a generator matrix for the code of the form (I, A).
2) Determine a parity check matrix for the code C^⊥.
3) Is (1,1,1,1,1,1) a parity check for the code?
Problem 1.5.3 A linear code has the generator matrix
G = ( 1 0 0 0 1 1 0 0 1 1 1 0 )
    ( 0 1 0 0 0 1 1 0 0 1 1 1 )
    ( 0 0 1 0 0 0 1 1 1 0 1 1 )
    ( 0 0 0 1 1 0 0 1 1 1 0 1 )
1) Determine the dimension and the minimum distance of the code and its dual.
2) How many errors do the two codes correct?
Problem 1.5.4 Let the columns of the parity check matrix of a code be h_1, h_2, h_3, h_4, h_5, so

H = ( h_1 h_2 h_3 h_4 h_5 )

with each h_j a column.
1) What syndrome corresponds to an error at position j?
2) Express H(10011)^T using h_1, h_2, h_3, h_4, h_5.
3) Show that if (1, 1, 1, 1, 1) is a codeword, then h_1 + h_2 + h_3 + h_4 + h_5 = 0.
Problem 1.5.5 Let C be a code with minimum distance d and parity check matrix H.
1) Show that H has d linearly dependent columns.
2) Show that any d − 1 or fewer columns of H are linearly independent.
3) What is d then?
Problem 1.5.6 Use the Gilbert-Varshamov bound to determine k such that there exists a binary
(15, k) code with minimum distance 5.
Problem 1.5.7
1) Determine the parameters of the binary Hamming codes with m = 3, 4, 5 and 8.
2) What are the parameters of the extended codes?
Problem 1.5.8 In Example 1.1.4 we gave a parity check matrix for a Hamming code.
1) What is the dimension and the minimum distance of the dual code?
2) Show that all pairs of codewords in this code have the same distance.
3) Find the minimum distance of the dual to a general binary Hamming code.
These codes are called equidistant because of the property noted in 2).
Problem 1.5.9 Consider the biorthogonal code B(m) of length 2^m and dimension m + 1.
1) Show that B(2) contains (0, 0, 0, 0), (1, 1, 1, 1) and six words of weight 2.
2) Show that B(m) contains the all 0s vector, the all 1s vector and 2n − 2 vectors of weight n/2.
3) Replace {0, 1} with {1, −1} in all the codewords and show that these are orthogonal as real vectors.
Problem 1.5.10 Let G be a generator matrix of an (n, k, d) code C where d ≥ 2. Let G′ be the matrix obtained by deleting a column of G and let C′ be the code with generator matrix G′.
1) What can you say about n′, k′ and d′?
This process of obtaining a shorter code is called puncturing.
Another way of getting a shorter code from an (n, k, d) code C is to force an information symbol, x say, to be zero and then delete that position.
This process is called shortening.
2) What are the parameters of the shortened code?
Problem 1.5.11 Let H be a parity check matrix for an (n, k) code C. We construct a new code C_ext of length n + 1 by defining c_n = c_0 + c_1 + · · · + c_{n−1}.
What can be said about the dimension, the minimum distance and the parity check matrix of C_ext?
Problem 1.5.12 Let C be a binary (n, k) code.
1) Show that the number of codewords that have 0 at position j is either 2^k or 2^{k−1}.
2) Show that \sum_{c ∈ C} w(c) ≤ n 2^{k-1}. (Hint: Write all the codewords as rows of a 2^k × n matrix.)
3) Prove that

d_min ≤ n 2^{k-1} / (2^k − 1)

This is the so-called Plotkin bound.
Problem 1.5.13 Let C be the code with generator matrix

G = ( 1 0 0 1 0 1 )
    ( 0 1 0 1 1 1 )
    ( 0 0 1 0 1 1 )

1) Determine a parity check matrix for C.
2) Determine the minimum distance of C.
3) Determine the cosets that contain (111111), (110010) and (100000) respectively, and find for each of these the coset leader.
4) Decode the received word (111111).
Problem 1.5.14 Let C be the code of length 9 with parity check matrix

H = ( 0 1 0 0 1 1 0 0 0 )
    ( 0 1 1 1 0 0 1 0 0 )
    ( 1 1 1 1 0 0 0 1 0 )
    ( 1 1 1 0 1 0 0 0 1 )
1) What is the dimension of C?
2) Determine the minimum distance of C.
3) Determine coset leaders and the corresponding syndromes for at least 11 cosets.
4) Is 000110011 a codeword?
5) Decode the received words 110101101 and 111111111.
Problem 1.5.15 Let C be an (n, k) code with minimum distance 5 and parity check matrix H.
Is it true that H(1110 . . . 0)^T = H(001110 . . . 0)^T?
Problem 1.5.16 Determine a parity check matrix for a code that has the following coset leaders:
000000, 100000, 010000, 001000, 000100, 000010, 000001, 110000.
Problem 1.5.17 Find an upper bound on k for which there exists a (15, k) code with minimum
distance 5. Compare the result with Problem 1.5.6.
Problem 1.5.18 An (8, 1) code consists of the all 0s word and the all 1s word.
What is the weight distribution of the dual code?
Problem 1.5.19 In this problem we investigate the existence of a binary (31, 22, 5) code C.
1) Show that the Hamming bound does not rule out the existence of such a code.
2) Determine the number of cosets of C and determine the number of coset leaders of weight
0, 1, and 2 respectively.
3) Determine for each of the cosets, where the coset leaders have weights 0, 1 or 2, the maximal
number of words of weight 3 in such a coset.
4) Determine for each of the cosets, where the coset leader has weight at least 3, the maximal
number of words of weight 3 in such a coset.
5) Show that this leads to a contradiction and therefore that such a code does not exist!
Problem 1.5.20 Let C be the binary (32, 16) code with generator matrix G = (I, A), where I is a 16 × 16 identity matrix and

A = ( J Ĩ Ĩ Ĩ )
    ( Ĩ J Ĩ Ĩ )
    ( Ĩ Ĩ J Ĩ )
    ( Ĩ Ĩ Ĩ J )

where Ĩ is a 4 × 4 identity matrix and

J = ( 1 1 1 1 )
    ( 1 1 1 1 )
    ( 1 1 1 1 )
    ( 1 1 1 1 )
1) Prove that C = C^⊥, i.e. the code is self dual. (Why is it enough to show that each row in G is orthogonal to any other row?)
2) Determine a parity check matrix H.
3) Find another generator matrix of the form G′ = (A′, I).
4) Prove that for a self dual code the following statement is true: If two codewords both have weights that are multiples of 4, that is also true for their sum.
5) Prove that the minimum distance of the code is a multiple of 4.
6) Prove that the minimum distance is 8.
7) How many errors can the code correct?
8) How many cosets does the code have?
9) How many error patterns are there of weight 0, 1, 2 and 3?
10) How many cosets have coset leaders of weight at least 4?
Problem 1.5.21 Project Write a program for decoding the (32, 16) code above using syndrome
decoding. If more than three errors occur, the program should indicate a decoding failure and
leave the received word unchanged.
We suggest that you generate a table of error patterns using the syndrome as the address. Initially
the table is filled with the all 1s vector (or some other symbol indicating decoding failure). Then
the syndromes for zero to three errors are calculated, and the error patterns are stored in the
corresponding locations.
Assume that the bit error probability is 0.01.
1) What is the probability of a decoding failure?
2) Give an estimate of the probability of decoding to a wrong codeword.
Problem 1.5.22 Project Write a program that enables you to determine the weight enumerator
of the binary (32, 16) code above.
One way of getting all the codewords is to let the information vector run through the 2^16 possibilities by converting the integers from 0 to 2^16 − 1 to 16-bit vectors and multiplying these on the generator matrix.
Problem 1.5.23 Project A way to construct binary codes of prescribed length n and minimum distance d is the following: List all the binary n-vectors in some order and choose the codewords one at a time such that each has distance at least d from the previously chosen words. This construction is greedy in the sense that it selects the first vector on the list that satisfies the distance test. Clearly, different ways of listing all length-n binary vectors will produce different codes. One such list can be obtained by converting the integers from 0 to 2^n − 1 to binary n-vectors.
Write a program for this greedy method.
1) Try distance 3, 5 and 6. Do you get good codes?
2) Are the codes linear? If so, why?
You might try listing the binary vectors in a different order.
Chapter 2
Finite fields
As we have stated earlier, we shall consider codes where the alphabet is a finite field. In this chapter we define finite fields and investigate some of their most important properties. We have chosen to cover in the text only what we think is necessary for coding theory. Some additional material is treated in the problems.
2.1 Fundamental properties of finite fields
We begin with the definition of a field.

Definition 2.1.1. A field F is a nonempty set S with two binary operations, + and ·, and two different elements of S, 0 and 1, such that the following axioms are satisfied:

1. ∀x, y: x + y = y + x
2. ∀x, y, z: (x + y) + z = x + (y + z)
3. ∀x: x + 0 = x
4. ∀x ∃(−x): x + (−x) = 0
5. ∀x, y: x · y = y · x
6. ∀x, y, z: x · (y · z) = (x · y) · z
7. ∀x: x · 1 = x
8. ∀x ≠ 0 ∃x^{−1}: x · x^{−1} = 1
9. ∀x, y, z: x · (y + z) = x · y + x · z

Classical examples of fields are the rational numbers Q, the real numbers R, and the complex numbers C.
If the number of elements |S| of S is finite, we have a finite field. It is an interesting fact that finite fields can be completely determined: they exist if and only if the number of elements is a power of a prime, and they are essentially unique. In the following we will consider only the cases |S| = p, where p is a prime, and |S| = 2^m.

Theorem 2.1.1. Let p be a prime and let S = {0, 1, . . . , p − 1}. Let + and · denote addition and multiplication modulo p, respectively. Then (S, +, ·, 0, 1) is a finite field with p elements, which we denote F_p.
Proof. It follows from elementary number theory that axioms 1-7 and 9 are satisfied. That the nonzero elements of S have a multiplicative inverse (axiom 8) can be seen in the following way:

Let x be a nonzero element of S and consider the set S̃ = {1 · x, 2 · x, . . . , (p − 1) · x}. We will prove that the elements of S̃ are different and nonzero, so that S̃ = S \ {0} and in particular there exists an i such that i · x = 1.

It is clear that 0 ∉ S̃, since 0 = i · x, 1 ≤ i ≤ p − 1, implies p | i·x, and since p is a prime therefore p | i or p | x, a contradiction.

To see that the elements of S̃ are different, suppose that i · x = j · x where 1 ≤ i, j ≤ p − 1. We then get p | (i·x − j·x), i.e. p | (i − j)x, and again since p is a prime we get p | (i − j) or p | x. But x < p and i − j ∈ {−(p − 2), . . . , p − 2}, so therefore i = j.
Example 2.1.1. The finite field F_2.
If we let p = 2, we get the field F_2 that we have already seen.

Example 2.1.2. The finite field F_3.
If we let p = 3 we get the ternary field with elements 0, 1, 2, in which

0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 0 + 2 = 2 + 0 = 2, 1 + 1 = 2,
1 + 2 = 2 + 1 = 0, 2 + 2 = 1, 0 · 0 = 0 · 1 = 1 · 0 = 2 · 0 = 0 · 2 = 0,
1 · 1 = 1, 1 · 2 = 2 · 1 = 2, 2 · 2 = 1 (so 2^{−1} = 2!)
We will now prove that multiplication in any finite field F with q elements can essentially be done as addition of integers modulo q − 1. This results from the fact that F contains an element α, a so-called primitive element, such that F \ {0} = {α^i | i = 0, 1, . . . , q − 2} and α^{q−1} = 1. Therefore α^i · α^j = α^{(i+j) mod (q−1)}.

Definition 2.1.2. Let F be a finite field with q elements, and let a ∈ F \ {0}. The order of a, ord(a), is the smallest positive integer s such that a^s = 1.
The set {a^i | i = 1, 2, . . . } is a subset of F and must therefore be finite, so there exist i_1 and i_2 with i_1 > i_2 such that a^{i_1} = a^{i_2}, and hence a^{i_1 − i_2} = 1. This means that there exists an i such that a^i = 1, and therefore also a smallest such i, so the order is well defined.
Lemma 2.1.1. Let F be a finite field with q elements, and let a, b ∈ F \ {0}; then

1. ord(a) = s ⇒ a, a^2, . . . , a^s are all different.
2. a^j = 1 ⟺ ord(a) | j.
3. ord(a^j) = ord(a) / gcd(ord(a), j).
4. ord(a) = s, ord(b) = j, gcd(s, j) = 1 ⇒ ord(ab) = sj.
Proof.

1. If a^i = a^j with 0 < i < j ≤ s, we get a^{j−i} = 1 with 0 < j − i < s, contradicting the definition of order.

2. If ord(a) = s and j = sh, we get a^j = a^{sh} = (a^s)^h = 1. If a^j = 1, we let j = sh + r with 0 ≤ r < s and get 1 = a^j = a^{sh+r} = (a^s)^h a^r = a^r, and therefore r = 0 by the definition of order, so s | j.

3. Let ord(a) = s and ord(a^j) = l; then 1 = (a^j)^l = a^{jl}, so by 2 we get s | jl and therefore s/gcd(s, j) divides (j/gcd(s, j)) l; since s/gcd(s, j) and j/gcd(s, j) are coprime, s/gcd(s, j) divides l. On the other hand (a^j)^{s/gcd(j,s)} = (a^s)^{j/gcd(j,s)} = 1, so again by 2 we have l | s/gcd(j, s), and therefore l = s/gcd(j, s).

4. (ab)^{sj} = (a^s)^j (b^j)^s = 1 · 1 = 1, so by 2 ord(ab) | sj, and since gcd(s, j) = 1 this means that ord(ab) = l_1 l_2 where l_1 | s and l_2 | j. So 1 = (ab)^{l_1 l_2}, and therefore 1 = ((ab)^{l_1 l_2})^{s/l_1} = a^{s l_2} b^{s l_2} = b^{s l_2}, so by 2, j | l_2 s, and since gcd(j, s) = 1 we have j | l_2 and hence j = l_2. In the same way we get s = l_1, and the claim is proved.
The lemma enables us to prove

Theorem 2.1.2. Let F be a finite field with q elements; then F has an element of order q − 1.

Proof. Since |F \ {0}| = q − 1, it follows from 1 of Lemma 2.1.1 that the order of any element is at most q − 1. Let α be an element of maximal order and β any element of F \ {0}. Let ord(α) = r and ord(β) = s. We will first show that s | r.

Suppose not; then there exist a prime p and natural numbers i and j such that r = p^i a, s = p^j b with j > i and gcd(a, p) = gcd(b, p) = 1. We have from Lemma 2.1.1, 3 that

ord(α^{p^i}) = r / gcd(r, p^i) = a   and   ord(β^b) = s / gcd(s, b) = p^j

and since gcd(a, p^j) = 1 we get from Lemma 2.1.1, 4

ord(α^{p^i} β^b) = a p^j > a p^i = r

contradicting the assumption that r is the maximal order. So we have ord(β) | ord(α).

This implies that every element of F \ {0} is a zero of the polynomial z^{ord(α)} − 1, and since a polynomial of degree n can have at most n zeroes (see Theorem 2.2.2), we conclude that ord(α) ≥ q − 1, hence ord(α) = q − 1, and the theorem is proved.
Corollary 2.1.1. The order of any nonzero element in a finite field with q elements divides q − 1.

Corollary 2.1.2. Any element β of a finite field with q elements satisfies β^q − β = 0.

The theorem does not give a method to find a primitive element; usually one has to use trial and error.
Example 2.1.3. 3 is a primitive element of F_17.
Since the possible orders of a nonzero element of F_17 are 1, 2, 4, 8 and 16, but 3^2 = 9, 3^4 = 13 and 3^8 = 16, we see that 3 must have order 16.
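The trial-and-error search is easily automated. A sketch (our addition) that computes the order of every nonzero element of F_17:

    def order(a, p):
        # Multiplicative order of a modulo the prime p.
        x, s = a % p, 1
        while x != 1:
            x = (x * a) % p
            s += 1
        return s

    p = 17
    for a in range(1, p):
        print(a, order(a, p))
    # The elements of order p - 1 = 16, e.g. a = 3, are the primitive ones.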
2.2 The finite field F_{2^m}
In the following we will consider some of the properties of polynomials with coefficients in a field, and present the construction of a finite field with 2^m elements.

Recall that if F is a field, then F[x] denotes the set of polynomials with coefficients from F, i.e. expressions of the form

a_n x^n + · · · + a_1 x + a_0

where a_i ∈ F. We have the notion of the degree of a polynomial (denoted deg) and can do addition and multiplication of polynomials.
Theorem 2.2.1. Let a(x), b(x) ∈ F[x], b(x) ≠ 0; then there exist unique polynomials q(x) and r(x) with deg(r(x)) < deg(b(x)) such that

a(x) = q(x)b(x) + r(x)

Proof. To prove the uniqueness, suppose a(x) = q_1(x)b(x) + r_1(x) and a(x) = q_2(x)b(x) + r_2(x), where deg(r_1(x)) < deg(b(x)) and deg(r_2(x)) < deg(b(x)). We then get r_2(x) − r_1(x) = b(x)(q_1(x) − q_2(x)), but since the degree of the polynomial on the left-hand side is smaller than the degree of b(x), this is only possible if q_1(x) − q_2(x) = 0 and r_2(x) − r_1(x) = 0.
To prove the existence, we first note that if deg(b(x)) > deg(a(x)) we have a(x) = 0 · b(x) + a(x), so the claim is obvious here. If deg(b(x)) ≤ deg(a(x)), let a(x) = a_n x^n + · · · + a_1 x + a_0 and b(x) = b_m x^m + · · · + b_1 x + b_0, and look at the polynomial a(x) − b_m^{−1} a_n x^{n−m} b(x). This has degree < n, so we can use induction on the degree of a(x) to get the result.
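The existence part of the proof is really an algorithm: repeatedly cancel the leading term of a(x). The sketch below (our addition) carries this out over F_p, with a polynomial represented as a list of coefficients, entry i being the coefficient of x^i:

    def poly_divmod(a, b, p):
        # Return (q, r) with a = q*b + r and deg r < deg b, over F_p.
        # The leading coefficient b[-1] must be nonzero.
        a, b = a[:], b[:]
        db = len(b) - 1
        inv = pow(b[db], -1, p)          # b_m^(-1) mod p
        q = [0] * max(len(a) - db, 1)
        while len(a) - 1 >= db and any(a):
            da = len(a) - 1
            f = (a[da] * inv) % p        # cancel the leading term of a
            q[da - db] = f
            for i in range(db + 1):
                a[da - db + i] = (a[da - db + i] - f * b[i]) % p
            while len(a) > 1 and a[-1] == 0:
                a.pop()                  # drop leading zeroes
        return q, a

    # (x^3 + x + 1) divided by (x + 1) over F_2: q = x^2 + x, r = 1.
    print(poly_divmod([1, 1, 0, 1], [1, 1], 2))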
Theorem 2.2.2. A polynomial of degree n has at most n zeroes.

Proof. Let a be a zero of the polynomial f(x) ∈ F[x]. By the above we have that f(x) = q(x)(x − a) + r(x) with deg(r(x)) < 1, so r(x) must be a constant. Since a is a zero of f(x), we have 0 = f(a) = q(a) · 0 + r(a), and therefore r(x) = r(a) = 0, so f(x) = q(x)(x − a), where q(x) has degree n − 1. If b, b ≠ a, also is a zero of f(x), it must be a zero of q(x), so we can repeat the argument and eventually get the result.
A polynomial f(x) ∈ F[x] is called irreducible if f(x) = a(x)b(x) implies that deg(a(x)) = 0 or deg(b(x)) = 0 (i.e. either a(x) or b(x) is a constant).

Irreducible polynomials play the same role in F[x] as the prime numbers do for the integers, since it can be shown that any polynomial can be written (uniquely) as a product of irreducible polynomials. From this it follows that if f(x) is irreducible and f(x) | a(x)b(x), then f(x) | a(x) or f(x) | b(x). Actually the proofs are fairly easy modifications of the corresponding proofs for the integers.
Example 2.2.1. Irreducible polynomials from F_2[x] of degree at most 4.
Of degree 1 these are x and x + 1. Irreducible polynomials of higher degree must have constant term 1, since otherwise x would be a factor, and an odd number of terms, since otherwise 1 would be a zero and therefore x − 1 would be a factor. So of degrees 2 and 3 we get x^2 + x + 1, x^3 + x + 1 and x^3 + x^2 + 1. Of degree 4 we get x^4 + x + 1, x^4 + x^3 + 1 and x^4 + x^3 + x^2 + x + 1. The polynomial x^4 + x^2 + 1, which also has an odd number of terms and constant term 1, is not irreducible, since x^4 + x^2 + 1 = (x^2 + x + 1)^2.
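A small search program reproduces this list. In the sketch below (ours), a binary polynomial is stored as an integer bit mask (bit i is the coefficient of x^i), and irreducibility is tested by trial division by all smaller polynomials of positive degree:

    def pdeg(f):
        return f.bit_length() - 1

    def pmod(a, b):
        # Remainder of a modulo b; polynomials over F_2 as bit masks.
        while a and pdeg(a) >= pdeg(b):
            a ^= b << (pdeg(a) - pdeg(b))
        return a

    def irreducible(f):
        # Trial division by every smaller polynomial of positive degree.
        return all(pmod(f, g) != 0 for g in range(2, f))

    # All irreducible binary polynomials of degree at most 4:
    for f in range(2, 32):
        if irreducible(f):
            print(bin(f))    # e.g. 0b10011 is x^4 + x + 1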
We are now ready to construct the field F_{2^m}.

As elements of F_{2^m} we take the 2^m m-tuples with elements from F_2, and addition is defined coordinatewise. This implies that a + a = 0. It is easy to see that the first four axioms for a field are satisfied.

Multiplication is somewhat more complicated. Let f(x) ∈ F_2[x] be an irreducible polynomial of degree m. Let a = (a_0, a_1, . . . , a_{m−1}) and b = (b_0, b_1, . . . , b_{m−1}), with corresponding polynomials a(x) = a_{m−1}x^{m−1} + · · · + a_1 x + a_0 and b(x) = b_{m−1}x^{m−1} + · · · + b_1 x + b_0. We define the multiplication as a(x)b(x) modulo f(x), i.e. if a(x)b(x) = q(x)f(x) + r(x), where deg(r(x)) < m, we set a · b = r = (r_0, r_1, . . . , r_{m−1}). If we let 1 = (1, 0, . . . , 0), we see that 1 · a = a.

With the addition and multiplication we have just defined we have constructed a field. This is fairly easy to see; again the most difficult part is to prove the existence of a multiplicative inverse for the nonzero elements.

To this end we copy the idea from F_p. Let a ∈ F_{2^m} \ {0}. The set A = {a · h | h ∈ F_{2^m} \ {0}} does not contain 0, since this would, for the corresponding polynomials, imply that f(x) | a(x) or f(x) | h(x). Moreover the elements of A are different, since if a · h_1 = a · h_2, we get for the corresponding polynomials that f(x) | a(x)(h_1(x) − h_2(x)), and since f(x) is irreducible this implies f(x) | a(x) or f(x) | (h_1(x) − h_2(x)). But this is only possible if h_1(x) = h_2(x), since f(x) has degree m while both a(x) and h_1(x) − h_2(x) have degree < m. This gives that A = F_{2^m} \ {0}, and in particular that 1 ∈ A.

It can be proven that for any positive integer m there exists an irreducible polynomial of degree m with coefficients in F_2, and therefore the above construction gives, for any positive integer m, a finite field with q = 2^m elements; we denote it F_q. It can also be proven that F_q is essentially unique.
Example 2.2.2. The finite field F_4.
Let the elements be 00, 10, 01 and 11; then we get the following table for the addition:

+  | 00 10 01 11
00 | 00 10 01 11
10 | 10 00 11 01
01 | 01 11 00 10
11 | 11 01 10 00

Using the irreducible polynomial x^2 + x + 1 we get the following table for multiplication:

·  | 00 10 01 11
00 | 00 00 00 00
10 | 00 10 01 11
01 | 00 01 11 10
11 | 00 11 10 01

From the multiplication table it is seen that the element 01, corresponding to the polynomial x, is primitive, i.e. has order 3.
We have seen that multiplication in a finite field is easy once we have a primitive element, and from the construction above addition is easy when we have the elements as binary m-tuples. Therefore we should have a table listing all the binary m-tuples and the corresponding powers of a primitive element in order to do calculations in F_{2^m}. We illustrate this in
Example 2.2.3. The finite field F_16.
The polynomial x^4 + x + 1 ∈ F_2[x] is irreducible and can therefore be used to construct F_16. The elements are the binary 4-tuples (0, 0, 0, 0), . . . , (1, 1, 1, 1). These can also be considered as all binary polynomials of degree at most 3. If we calculate the powers of (0, 1, 0, 0), which corresponds to the polynomial x, we get:

x^0 = 1, x^1 = x, x^2 = x^2, x^3 = x^3, x^4 = x + 1, x^5 = x^2 + x, x^6 = x^3 + x^2,
x^7 = x^3 + x + 1, x^8 = x^2 + 1, x^9 = x^3 + x, x^10 = x^2 + x + 1, x^11 = x^3 + x^2 + x,
x^12 = x^3 + x^2 + x + 1, x^13 = x^3 + x^2 + 1, x^14 = x^3 + 1, x^15 = 1.
This means we can use (0, 1, 0, 0) as a primitive element, which we call α, and list all the binary 4-tuples and the corresponding powers of α.

binary 4-tuple   power of α   polynomial
0000             –            0
0001             α^0          1
0010             α^1          x
0100             α^2          x^2
1000             α^3          x^3
0011             α^4          x + 1
0110             α^5          x^2 + x
1100             α^6          x^3 + x^2
1011             α^7          x^3 + x + 1
0101             α^8          x^2 + 1
1010             α^9          x^3 + x
0111             α^10         x^2 + x + 1
1110             α^11         x^3 + x^2 + x
1111             α^12         x^3 + x^2 + x + 1
1101             α^13         x^3 + x^2 + 1
1001             α^14         x^3 + 1
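The table can be generated by starting from 1 and repeatedly multiplying by x, reducing modulo x^4 + x + 1 whenever the degree reaches 4. A sketch (our addition), with elements stored as 4-bit masks whose printed form matches the 4-tuples of the table:

    m, poly = 4, 0b10011          # the modulus x^4 + x + 1
    exp = [1]                     # exp[i] will hold alpha^i as a bit mask
    for _ in range(2**m - 2):
        t = exp[-1] << 1          # multiply the previous power by x
        if t & (1 << m):
            t ^= poly             # reduce modulo x^4 + x + 1
        exp.append(t)

    for i, v in enumerate(exp):
        print(f'alpha^{i:2d} = {v:04b}')

    # Multiplication via the table: alpha^i * alpha^j = alpha^((i+j) mod 15).
    log = {v: i for i, v in enumerate(exp)}
    def mul(a, b):
        return 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]

    print(bin(mul(0b0011, 0b0011)))   # (x + 1)^2 = x^2 + 1, i.e. 0b101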
If f(x) ∈ F_2[x] is irreducible and has degree m, it can be used to construct F_{2^m}, as we have seen. If we do this, then there exists an element α ∈ F_{2^m} such that f(α) = 0, namely the element corresponding to the polynomial x. This follows directly from the way F_{2^m} is constructed.

Actually, as we shall see, the polynomial f(x) can be written as a product of polynomials from F_{2^m}[x] of degree 1.
2.3 Minimal polynomials and factorization of x^n − 1
Our main objective in this section is to present an algorithm for the factorization of x^n − 1 into irreducible polynomials from F_2[x]. We do this at the end of the section. Along the way we shall define the so-called minimal polynomials and prove some of their properties.

Lemma 2.3.1. If a, b ∈ F_{2^m}, then (a + b)^2 = a^2 + b^2.

This follows since the cross term ab + ab equals 0 in F_{2^m}.
Theorem 2.3.1. Let f (x) F
2
m[x]. Then f (x) F
2
[x] f (x
2
) = f (x)
2
.
Proof. Let f(x) = f_k x^k + ··· + f_1 x + f_0. Then f(x)^2 = (f_k x^k + ··· + f_1 x + f_0)^2 = f_k^2 x^{2k} + ··· + f_1^2 x^2 + f_0^2 and f(x^2) = f_k x^{2k} + ··· + f_1 x^2 + f_0. So

f(x)^2 = f(x^2) ⇔ f_i^2 = f_i ⇔ f_i ∈ F_2.
Corollary 2.3.1. Let f(x) ∈ F_2[x]. If β ∈ F_{2^m} is a zero of f(x), then so is β^2.

Theorem 2.3.2. (x^m − 1)|(x^n − 1) ⇔ m|n.
Proof. Follows from the identity

x^n − 1 = (x^m − 1)(x^{n−m} + x^{n−2m} + ··· + x^{n−km}) + x^{n−km} − 1,

where k is the largest integer with km ≤ n.
Theorem 2.3.3. F_{2^m} is a subfield of F_{2^n} ⇔ m|n.

Proof. If F_{2^m} is a subfield of F_{2^n}, then F_{2^n} contains an element of order 2^m − 1 and therefore (2^m − 1)|(2^n − 1), and so m|n. If m|n we have (2^m − 1)|(2^n − 1) and therefore (x^{2^m − 1} − 1)|(x^{2^n − 1} − 1), so (x^{2^m} − x)|(x^{2^n} − x) and hence F_{2^m} = {x | x^{2^m} = x} ⊆ F_{2^n} = {x | x^{2^n} = x}.
Definition 2.3.1. Let β be an element of F_{2^m}. The minimal polynomial of β, m_β(x), is the polynomial in F_2[x] of lowest degree that has β as a zero.

Since β is a zero of x^{2^m} − x, there indeed exists a binary polynomial with β as a zero. The minimal polynomial is unique, since if there were two of the same degree their difference would have lower degree but still have β as a zero.
Theorem 2.3.4. Let β be an element of F_{2^m} and let m_β(x) ∈ F_2[x] be the minimal polynomial of β. Then:
1. m_β(x) is irreducible.
2. If f(x) ∈ F_2[x] satisfies f(β) = 0, then m_β(x)|f(x).
3. x^{2^m} − x is the product of the different minimal polynomials of the elements of F_{2^m}.
4. deg(m_β(x)) ≤ m, with equality if β is a primitive element.

Proof.
1. If m_β(x) = a(x)b(x) we would have a(β) = 0 or b(β) = 0, contradicting the minimality of the degree of m_β(x).
2. Let f(x) = m_β(x)q(x) + r(x) with deg(r(x)) < deg(m_β(x)). This gives r(β) = 0 and therefore r(x) = 0.
3. Any element of F_{2^m} is a zero of x^{2^m} − x, so by 2. we get the result.
4. Since F_{2^m} can be seen as a vector space over F_2 of dimension m, the m + 1 elements 1, β, ..., β^m are linearly dependent, so there exists (a_0, a_1, ..., a_m), a_i ∈ F_2, such that a_m β^m + ··· + a_1 β + a_0 = 0. If β is a primitive element, 1, β, ..., β^{m−1} must be linearly independent, since if not, the powers of β would give fewer than 2^m − 1 different elements.
Theorem 2.3.5. x^{2^m} − x equals the product of all binary irreducible polynomials whose degrees divide m.
Proof. Let f(x) be an irreducible polynomial of degree d where d|m. We want to prove that f(x)|x^{2^m} − x. This is trivial if f(x) = x, so we assume f(x) ≠ x. Since f(x) is irreducible it can be used to construct F_{2^d}, and f(x) is the minimal polynomial of some element in F_{2^d}; so by 2. above we have f(x)|x^{2^d − 1} − 1, and since d|m we have from Theorem 2.3.2 (with x = 2) that (2^d − 1)|(2^m − 1) and therefore (x^{2^d − 1} − 1)|(x^{2^m − 1} − 1). Conversely, if f(x) is irreducible, divides x^{2^m} − x and has degree d, we shall prove that d|m. Again we can assume f(x) ≠ x and hence f(x)|x^{2^m − 1} − 1. We can use f(x) to construct F_{2^d}. Let α ∈ F_{2^d} be a zero of f(x) and let γ be a primitive element of F_{2^d}, say

γ = a_{d−1}α^{d−1} + ··· + a_1 α + a_0.   (2.1)

Since f(α) = 0 we have α^{2^m} = α, and by equation (2.1) and Lemma 2.3.1 therefore γ^{2^m} = γ, and hence γ^{2^m − 1} = 1. Then the order 2^d − 1 of γ must divide 2^m − 1, so by Theorem 2.3.2 d|m and we are finished.
Definition 2.3.2. Let n be an odd number and let j be an integer, 0 ≤ j < n. The cyclotomic coset containing j is defined as

{ j, 2j mod n, ..., 2^i j mod n, ..., 2^{s−1} j mod n }

where s is the smallest positive integer such that 2^s j mod n = j.

If we look at the numbers 2^i j mod n, i = 0, 1, ..., they cannot all be different, so the s in the definition above indeed exists.
If n = 15 we get the following cyclotomic cosets:
{0}
{1, 2, 4, 8}
{3, 6, 12, 9}
{5, 10}
{7, 14, 13, 11}
We will use the notation that if j is the smallest number in a cyclotomic coset, then that coset is called C_j. The subscripts j are called coset representatives. So the above cosets are C_0, C_1, C_3, C_5 and C_7. Of course we always have C_0 = {0}.
It can be seen from the definition that if n = 2^m − 1 and we represent a number i as a binary m-vector, then the cyclotomic coset containing i consists of that m-vector and all its cyclic shifts.
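The cosets are easy to compute mechanically; the short Python sketch below (ours) reproduces the cosets modulo 15 listed above.

```python
def cyclotomic_cosets(n):
    """Partition {0, ..., n-1} into cyclotomic cosets mod n (Definition 2.3.2)."""
    remaining, cosets = set(range(n)), []
    while remaining:
        j = min(remaining)        # j is the coset representative
        coset, i = [], j
        while i not in coset:     # keep doubling mod n until we return to j
            coset.append(i)
            i = 2 * i % n
        cosets.append(coset)
        remaining -= set(coset)
    return cosets

for c in cyclotomic_cosets(15):
    print(f"C_{c[0]} =", c)
# C_0 = [0], C_1 = [1, 2, 4, 8], C_3 = [3, 6, 12, 9], C_5 = [5, 10], C_7 = [7, 14, 13, 11]
```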
The following algorithm is based on

Theorem 2.3.6. Suppose n is an odd number and that the cyclotomic coset C_1 has m elements. Let α be a primitive element of F_{2^m} and β = α^{(2^m − 1)/n}. Then

m_{β^j}(x) = ∏_{i ∈ C_j} (x − β^i).
Proof. Let f_j(x) = ∏_{i ∈ C_j} (x − β^i); then f_j(x^2) = (f_j(x))^2 by Definition 2.3.2 and the fact that β has order n. It then follows from Theorem 2.3.1 that f_j(x) ∈ F_2[x].
We also have that f_j(β^j) = 0, so m_{β^j}(x)|f_j(x) by Theorem 2.3.4. It follows from Corollary 2.3.1 that the elements β^i with i ∈ C_j also are zeroes of m_{β^j}(x), and therefore m_{β^j}(x) = f_j(x).
Corollary 2.3.2. With notation as above we have

x^n − 1 = ∏_j f_j(x),

where j runs through the coset representatives.
We can then give the algorithm. Since this is the first time we present an algorithm, we emphasize that we use the concept of an algorithm in the common informal sense of a computational procedure which can be effectively executed by a person or a suitably programmed computer. There is abundant evidence that computable functions do not depend on the choice of a specific finite set of basic instructions or on the programming language. An algorithm takes a finite input and terminates after a finite number of steps, producing a finite output. In some cases no output may be produced, or equivalently a particular message is generated. Some programs do not terminate on certain inputs, but we exclude these from the concept of an algorithm. Here, algorithms are presented in standard algebraic notation, and it should be clear from the context that the descriptions can be converted to programs in a specific language.
Algorithm 2.3.1. Factorization of x^n − 1
Input: An odd number n.
1. Find the cyclotomic cosets modulo n.
2. Find the number m of elements in C_1.
3. Construct the finite field F_{2^m}, select a primitive element α and put β = α^{(2^m − 1)/n}.
4. Calculate
   f_j(x) = ∏_{i ∈ C_j} (x − β^i), j = 0, 1, ....
Output: The factors of x^n − 1: f_0(x), f_1(x), ....
One can argue that this is not really an algorithm, because in step 3 one needs an irreducible polynomial in F_2[x] of degree m.
The table in Appendix C not only gives such polynomials up to degree 16; they are also chosen such that they are minimal polynomials of a primitive element. For m ≤ 8 we have moreover given the minimal polynomials for α^j, where j is a coset representative.
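For small fields the whole algorithm fits in a short program. The Python sketch below is ours; the dictionary PRIM_POLYS plays the role of Appendix C and only covers the degrees needed for small examples (the masks encode the standard primitive polynomials x^2 + x + 1, x^4 + x + 1 and x^8 + x^4 + x^3 + x^2 + 1, which need not be the ones in the book's table).

```python
PRIM_POLYS = {2: 0b111, 4: 0b10011, 8: 0b100011101}

def cosets(n):                                 # step 1: cyclotomic cosets mod n
    seen, out = set(), []
    for j in range(n):
        if j not in seen:
            c, i = [], j
            while i not in c:
                c.append(i)
                i = 2 * i % n
            out.append(c)
            seen |= set(c)
    return out

def factor_x_n_minus_1(n):
    cs = cosets(n)
    m = len(next(c for c in cs if 1 in c))     # step 2: m = |C_1|
    q, poly = 1 << m, PRIM_POLYS[m]
    antilog = [1]                              # step 3: construct F_{2^m}
    for _ in range(q - 2):
        v = antilog[-1] << 1                   # multiply by alpha
        antilog.append(v ^ poly if v & q else v)
    log = {v: i for i, v in enumerate(antilog)}

    def mul(a, b):                             # field multiplication via log tables
        return 0 if 0 in (a, b) else antilog[(log[a] + log[b]) % (q - 1)]

    e = (q - 1) // n                           # beta = alpha^((2^m - 1)/n)
    factors = []
    for c in cs:                               # step 4: f_j = prod (x - beta^i)
        f = [1]                                # f[k] is the coefficient of x^k
        for i in c:
            root = antilog[e * i % (q - 1)]
            g = [0] * (len(f) + 1)
            for k, coef in enumerate(f):
                g[k + 1] ^= coef               # x * f
                g[k] ^= mul(root, coef)        # + root * f (minus = plus in char. 2)
            f = g
        factors.append(f)                      # coefficients come out in {0, 1}
    return factors

print(factor_x_n_minus_1(15))
# The factor from C_0 is [1, 1], i.e. x + 1; each remaining factor has degree |C_j|.
```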
Example 2.3.1. Factorization of x^51 − 1
The cyclotomic cosets modulo 51 are
C_0 = {0}
C_1 = {1, 2, 4, 8, 16, 32, 13, 26}
C_3 = {3, 6, 12, 24, 48, 45, 39, 27}
C_5 = {5, 10, 20, 40, 29, 7, 14, 28}
C_9 = {9, 18, 36, 21, 42, 33, 15, 30}
C_11 = {11, 22, 44, 37, 23, 46, 41, 31}
C_17 = {17, 34}
C_19 = {19, 38, 25, 50, 49, 47, 43, 35}
We see that |C_1| = 8. Since (2^8 − 1)/51 = 5 we have that β = α^5, where α is a primitive element of F_{2^8}.
Using the table from Appendix C (the section with m = 8) we get for the minimal polynomials:
m_1(x) = 1 + x
m_β(x) = m_{α^5}(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x + 1
m_{β^3}(x) = m_{α^15}(x) = x^8 + x^7 + x^6 + x^4 + x^2 + x + 1
m_{β^5}(x) = m_{α^25}(x) = x^8 + x^4 + x^3 + x + 1
m_{β^9}(x) = m_{α^45}(x) = x^8 + x^5 + x^4 + x^3 + 1
m_{β^11}(x) = m_{α^55}(x) = x^8 + x^7 + x^5 + x^4 + 1
m_{β^17}(x) = m_{α^85}(x) = x^2 + x + 1
m_{β^19}(x) = m_{α^95}(x) = x^8 + x^7 + x^4 + x^3 + x^2 + x + 1
2.4 Problems
Problem 2.4.1 In this problem we are considering F_17.
1) What is the sum of all the elements?
2) What is the product of the nonzero elements?
3) What is the order of 2?
4) What are the possible orders of the elements?
5) Determine for each of the possible orders an element of that order.
6) How many primitive elements are there?
7) Try to solve the equation x^2 + x + 1 = 0.
8) Try to solve the equation x^2 + x − 6 = 0.
Problem 2.4.2 Let F be a field.
1) Show that a·b = 0 ⇒ a = 0 or b = 0.
2) Show that {0, 1, 2, 3} with addition and multiplication modulo 4 is not a field.

Problem 2.4.3 Let a ∈ F_q. Determine

∑_{j=0}^{q−2} (a^i)^j
Problem 2.4.4 Determine all binary irreducible polynomials of degree 3.

Problem 2.4.5 Construct F_8 using f(x) = x^3 + x + 1, that is, explain what the elements are and how to add and multiply them.
Construct a multiplication table for the elements of F_8 \ {0, 1}.
Problem 2.4.6 Which of the following polynomials can be used to construct F_16?
x^4 + x^2 + x, x^4 + x^3 + x^2 + 1, x^4 + x + 1, x^4 + x^2 + 1.
Problem 2.4.7 We consider F_16 as constructed in Example 2.2.3.
1) Determine the sum of all the elements of F_16.
2) Determine the product of all the nonzero elements of F_16.
3) Determine all the primitive elements of F_16.
Problem 2.4.8
1) The polynomial z^4 + z^3 + 1 has zeroes in F_16. How many?
2) The polynomial z^4 + z^2 + z has zeroes in F_16. How many?
Problem 2.4.9 Let f(x) = (x^2 + x + 1)(x^3 + x + 1).
Determine the smallest number m such that f(x) has five zeroes in F_{2^m}.

Problem 2.4.10 What are the possible orders of the elements in F_{2^5}? And in F_{2^6}?
Problem 2.4.11 Factorize x^9 − 1 over F_2.

Problem 2.4.12 Factorize x^73 − 1 over F_2.

Problem 2.4.13 Factorize x^85 − 1 over F_2.

Problem 2.4.14 Factorize x^18 − 1 over F_2.

Problem 2.4.15 Is x^8 + x^7 + x^6 + x^5 + x^4 + x + 1 an irreducible polynomial over F_2?
Problem 2.4.16
1) Show that f(x) = x^4 + x^3 + x^2 + x + 1 is irreducible in F_2[x].
2) Construct F_16 using f(x), that is, explain what the elements are and how to add and multiply them.
3) Determine a primitive element.
4) Show that the polynomial z^4 + z^3 + z^2 + z + 1 has four roots in F_16.
Problem 2.4.17 Let f(x) ∈ F_2[x] be an irreducible polynomial of degree m. Then f(x) has a zero α in F_{2^m}.
By solving 1)–7) below you shall prove the following: f(x) has m different zeroes in F_{2^m} and these have the same order.
1) Show that f(α^2) = f(α^4) = ··· = f(α^{2^{m−1}}) = 0.
2) Show that α, α^2, ..., α^{2^{m−1}} have the same order.
3) Show that α^{2^i} = α^{2^j}, j > i ⇒ α^{2^{j−i}} = α. (You can use the fact that a^2 = b^2 ⇒ a = b.)
Let s be the smallest positive number such that α^{2^s} = α.
4) Show that α, α^2, ..., α^{2^{s−1}} are different.
5) Show that g(x) = (x − α)(x − α^2)···(x − α^{2^{s−1}}) divides f(x).
6) Show that g(x) ∈ F_2[x].
7) Show that g(x) = f(x) and hence s = m.
Problem 2.4.18
1) Determine the number of primitive elements of F_32.
2) Show that the polynomial x^5 + x^2 + 1 is irreducible over F_2.
3) Are there elements in F_32 that have order 15?
4) Is F_16 a subfield of F_32?
We construct F_32 using the polynomial x^5 + x^2 + 1. Let α be an element of F_32 that satisfies α^5 + α^2 + 1 = 0.
5) What is (x − α)(x − α^2)(x − α^4)(x − α^8)(x − α^16)?
6) α^4 + α^3 + α = α^i. What is i?
F_32 can be seen as a vector space over F_2.
7) Show that the dimension of this vector space is 5.
Let β be an element of F_32, β ≠ 0, 1.
8) Show that β is not a root of a binary polynomial of degree less than 5.
9) Show that 1, β, β^2, β^3, β^4 is a basis for the vector space.
10) What are the coordinates of β^8 with respect to the basis 1, β, β^2, β^3, β^4?
Problem 2.4.19 Let C be a linear (n, k, d) code over F_q.
1) Show that d equals the minimal number of linearly dependent columns of a parity check matrix H.
2) What is the maximal length of an (n, k, 3) code over F_q?
3) Construct a maximal length (n, k, 3) code over F_q and show that it is perfect, i.e. that 1 + (q − 1)n = q^{n−k}.
Chapter 3
Bounds on error probability for
error-correcting codes
In this chapter we discuss the performance of error-correcting codes in terms of error probabilities. We derive methods for calculating the error probabilities, or at least we derive upper bounds on these probabilities. As a basis for the discussion some common probability distributions are reviewed in Section 3.1. Some basic knowledge of probability theory will be assumed, but the presentation is largely self-contained.
In applying notions of probability, the relation to actual observations is always an issue. In the applications we have in mind, the number of transmitted symbols is always very large, and the probabilities are usually directly reflected in observable frequencies. Many system specifications require very small output error probabilities, but even in such cases the errors usually occur with a measurable frequency. In a few cases the errors are so rare that they may not be expected to occur within the lifetime of the system. Such figures should be interpreted as design parameters that may be used for comparison with reliability figures for critical system components.
3.1 Some probability distributions

Assume that the symbols from the binary alphabet {0, 1} have probabilities P[1] = p, P[0] = 1 − p. If the symbols in a string are mutually independent, we have

P[x_1, x_2, ..., x_n] = ∏_i P[x_i]   (3.1)

Lemma 3.1.1. The probability that a string of length n consists of j 1s and n − j 0s is given by the binomial distribution:

P[n, j] = \binom{n}{j} p^j (1 − p)^{n−j}   (3.2)
Proof. Follows from (3.1).

The expected value of j is μ = np, and the variance is σ^2 = np(1 − p).
When n is large, it may be convenient to approximate the binomial distribution by the Poisson distribution

P[j] = e^{−μ} μ^j / j!   (3.3)

where again μ is the expected value and the variance is σ^2 = μ. The formula (3.3) may be obtained from (3.2) by letting n go to infinity and p to zero while μ = np is kept fixed.
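As a quick numerical illustration (our sketch, with the parameters n = 256 and p = 0.01 that also appear in Problem 3.4.4):

```python
from math import comb, exp, factorial

n, p = 256, 0.01
mu = n * p                                        # expected number of errors, 2.56
for j in range(6):
    binom = comb(n, j) * p**j * (1 - p)**(n - j)  # binomial probability (3.2)
    poisson = exp(-mu) * mu**j / factorial(j)     # Poisson approximation (3.3)
    print(f"j = {j}: binomial {binom:.4f}, Poisson {poisson:.4f}")
```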
3.2 The probability of failure and error for bounded distance decoding

If a linear binary (n, k) code is used for communication, we usually assume that the codewords are used with equal probability, i.e. each codeword c_j has probability

P[c_j] = 2^{−k}

If errors occur with probability p, and are mutually independent and independent of the transmitted symbol, we say that we have a binary symmetric channel (BSC).
For codes over larger symbol alphabets, there may be several types of errors with different probabilities. However, if p is the probability that the received symbol is different from the one transmitted, and the errors are mutually independent, (3.2) and (3.3) can still be used.
In this section we consider bounded distance decoding, i.e. all patterns of at most t errors and no other error patterns are corrected. In the particular case t = ⌊(d − 1)/2⌋ such a decoding procedure is, as in Chapter 1, called minimum distance decoding.
If more than t errors occur in bounded distance decoding, the word is either not decoded, or it is decoded to a wrong codeword. We use the term decoding error to indicate that the decoder produces a word different from the word that was transmitted. We use the term decoding failure to indicate that the correct word is not recovered. Thus decoding failure includes decoding error. We have chosen to define decoding failure in this way because the probability of decoding error, P_err, is typically much smaller than the probability of decoding failure, P_fail. If this is not the case, or there is a need for the exact value of the probability that no decoded word is produced, it may be found as P_fail − P_err.
From (3.2) we get

Theorem 3.2.1. The probability of decoding failure for bounded distance decoding is

P_fail = 1 − ∑_{j=0}^{t} \binom{n}{j} p^j (1 − p)^{n−j} = ∑_{j=t+1}^{n} \binom{n}{j} p^j (1 − p)^{n−j}   (3.4)
The latter sum may be easier to evaluate for small p. In that case one, or a few, terms often give a sufficiently accurate result. Clearly, for minimum distance decoding, the error probability depends only on the minimum distance of the code.
If more than t errors occur, a wrong word may be produced by the decoder. The probability of this event can be found exactly for bounded distance decoding. We consider only the binary case.

Lemma 3.2.1. Let the zero word be transmitted and the weight of the decoded word be w. If the error pattern has weight j and the distance from the received vector to the decoded word is l, we have

j + l − w = 2i ≥ 0

Proof. Since j is the distance from the transmitted word to the received vector, w the distance to the decoded word, and l the distance from the received vector to the decoded word, it follows from the triangle inequality that i ≥ 0. The error sequence consists of j − i errors among the w nonzero positions of the decoded word and i errors in other coordinates. Thus l = i + (w − j + i).
We can now find the number of such vectors.

Lemma 3.2.2. Let the weight of the decoded word be w; then the number of vectors at distance j from the transmitted word and at distance l from the decoded word is

T(j, l, w) = \binom{w}{j−i} \binom{n−w}{i}   (3.5)

Here i = (j + l − w)/2, for (j + l − w) even and w − l ≤ j ≤ w + l. Otherwise T is 0.

Proof. From Lemma 3.2.1.
Once a program for evaluating the function defined in Lemma 3.2.2 is available, or it has been computed for a limited range of parameters, some of the following expressions ((3.6) and (3.10)) may be readily evaluated.
The weight enumerator for a linear code was defined in Section 1.4:

A(z) = ∑_{w=0}^{n} A_w z^w

where A_w is the number of codewords of weight w. The probability of decoding error is now found by summing (3.5) over all codewords, over j, and l ≤ t.
Theorem 3.2.2. The error probability for bounded distance decoding on a BSC with probability p is

P_err = ∑_{w>0} ∑_{j=w−t}^{w+t} ∑_{l=0}^{t} A_w T(j, l, w) p^j (1 − p)^{n−j}   (3.6)

Proof. Since the code is linear, we may assume that the zero word is transmitted. The received vector can be within distance t of at most one nonzero word, so the probability is found exactly by the summation.
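Both T(j, l, w) from (3.5) and the sum (3.6) are easy to program. A sketch (ours), with the weight distribution truncated to the low-weight terms that dominate for small p:

```python
from math import comb

def T(j, l, w, n):
    """Number of vectors at distance j from 0 and l from a weight-w word, (3.5)."""
    if (j + l - w) % 2 or not (abs(w - l) <= j <= w + l):
        return 0
    i = (j + l - w) // 2
    return comb(w, j - i) * comb(n - w, i)

def p_err(n, t, A, p):
    """Decoding error probability for bounded distance decoding, (3.6)."""
    return sum(A[w] * T(j, l, w, n) * p**j * (1 - p)**(n - j)
               for w in range(1, n + 1) if A[w]
               for j in range(max(w - t, 0), w + t + 1)
               for l in range(t + 1))

A = [0] * 17
A[4] = 140                    # (16, 11, 4) code: only the weight-4 words matter here
print(p_err(16, 1, A, 0.01))  # close to the leading term 560 p^3 = 5.6e-4
```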
Figure 3.1: A (6, 3, 3) code is obtained from the codewords listed in Example 1.1.2 by eliminating the first symbol in the eight codewords that start with a 0. In the figure, x represents the first two bits of a codeword, y the next two, and z the last two. Vectors with Hamming distance 1 are represented as neighbors in this way, and it is readily checked that the Hamming distance between codewords is at least 3. The lines illustrate Lemma 3.2.2: There is a vector at distance j = 4 from the zero word and at distance l = 1 from a codeword of weight w = 3. In this case i = 1.
Theorem 3.2.2 indicates that the weight enumerator contains the information required to find the error probability. However, finding the weight enumerator usually requires extensive computations. It is usually possible to get a sufficiently good approximation by expanding (3.6) in powers of p and only including a few terms.

Example 3.2.1. Error probability of the (16, 11, 4) extended Hamming code
The code can correct one error. Two errors are detected, but not corrected. Thus we may find the probability of decoding failure from (3.4) as

P_fail = 1 − (1 − p)^16 − 16p(1 − p)^15 = 120p^2 + ···

If p is not very large, decoding errors occur mostly as a result of three errors. From (3.5) we find T(3, 1, 4) = 4, and P_err can be found from (3.6) using A_4 = 140. Actually three errors are always decoded to a wrong codeword. Thus the first term of the expansion in powers of p is

P_err = 560p^3 + ···
3.3 Bounds for maximum likelihood decoding of binary block codes

It is often possible to decode more than d/2 errors, but for such algorithms it is much more difficult to calculate the exact error probability. In this section we give some bounds for this case.
The following concept is particularly important:

Definition 3.3.1. Maximum likelihood (ML) decoding maps any received vector, r, to a codeword c such that the distance d(r, c) is minimized.

The term likelihood refers to the conditional probability of receiving r given that c is transmitted. If we assume that all codewords are used equally often, ML decoding minimizes the probability of decoding error. If the closest codeword is not unique, we choose one of them. The corresponding error patterns are then corrected, while other error patterns of the same weight cause decoding error. In Section 1.3 we discussed how decoding could be based on syndrome tables. In principle one error pattern in each coset may be decoded, and if these error patterns are known, we could extend the summation in (3.4) and calculate the error probability. However, this approach is feasible only for short codes.
This section gives some upper bounds for the error probability of ML decoding of binary codes. These bounds depend only on the weight enumerator, and as for Theorem 3.2.1 it is often sufficient to know the first terms of the enumerator.
Theorem 3.3.1. For a code with weight enumerator A(z) used on the BSC with bit error probability p, an upper bound on the probability of decoding error is

P_err ≤ ∑_{w>0} ∑_{j>w/2} A_w \binom{w}{j} p^j (1 − p)^{w−j} + (1/2) ∑_{j>0} A_{2j} \binom{2j}{j} p^j (1 − p)^j   (3.7)
Proof. Since the code is linear, we may assume that the zero word is transmitted. An error occurs if the received vector is closer to some nonzero codeword than to the zero vector. For each codeword of odd weight w > 0, the probability that the received vector is closer to this codeword than to the zero word is

∑_{j>w/2} \binom{w}{j} p^j (1 − p)^{w−j}

since there must be errors in more than w/2 of the positions where the nonzero codeword has a 1. We now get an upper bound on the probability of the union of these events by taking the sum of their probabilities. When w is even and there are errors in half of the positions, there are at least two codewords at distance w/2. If a pattern of weight w/2 occurs only in a single nonzero codeword, a decoding error is made with probability 1/2. If the same pattern occurs in more than one word, the error probability is higher, but the pattern is then counted with weight 1/2 at least twice in (3.7).
Because most errors occur for j close to w/2, we can use this value of j for all error patterns. Since half of all error patterns have weights greater than w/2, we now find a simpler approximation

P_err < ∑_{w>0} A_w 2^{w−1} (p − p^2)^{w/2}   (3.8)

Introducing the function Z = √(4p(1 − p)), which depends only on the channel, we get

P_err < (1/2) ∑_{w>0} A_w Z^w   (3.9)
This is a useful bound when the number of errors corrected is fairly large. Equations (3.7) and (3.9) indicate that the error probability depends not only on the minimum distance, but also on the number of low weight codewords. As p increases, codewords of higher weight may contribute more to the error probability, because their number is much greater.
In (3.7) we overestimate the error probability whenever a vector is closer to more than one nonzero word than to the zero word. Actually even an error pattern of weight d/2 may be equally close to the zero word and to several nonzero words, but that usually happens only in relatively few cases.
When calculating the terms in (3.7) we do not specify the values of the error pattern outside the w positions under consideration. Thus if there are several errors in other positions, it is more likely that the vector is also close to another codeword, and vectors of high weight are clearly counted many times (but of course they have small probability).
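The bound (3.9) needs only the weight enumerator and one line of arithmetic; a sketch (ours), using the (32, 16, 8) enumerator of Problem 3.4.5:

```python
from math import sqrt

A = {8: 620, 12: 13888, 16: 36518, 20: 13888, 24: 620, 32: 1}

def bound_3_9(A, p):
    Z = sqrt(4 * p * (1 - p))               # the channel parameter Z
    return 0.5 * sum(Aw * Z**w for w, Aw in A.items())

print(bound_3_9(A, 0.01))   # dominated by the 620 codewords of weight 8
```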
We can get a better bound by keeping track of the weight of the entire error pattern. The number of error patterns of weight j > w/2 that are closer to a particular nonzero word of weight w than to the zero word may be found using Lemma 3.2.2 as

∑_{l<j} T(j, l, w)

We may again take a union bound by multiplying this number by A_w and summing over w. This sum is an upper bound on the number of errors of weight j that may cause a decoding error. As long as the j-th term is less than the total number of weight j vectors, we include it in the bound. However, when j becomes larger, we simply assume that all vectors of weight j cause errors. Thus the bound (Poltyrev's bound) becomes
Theorem 3.3.2. For an (n, k, d) code used on the BSC with transition probability p, an upper bound on the error probability is

P_err ≤ ∑_{w>0} A_w ∑_{w/2 < j ≤ J} ∑_{l<j} T(j, l, w) p^j (1 − p)^{n−j}   (3.10)
        + ∑_{j>J} \binom{n}{j} p^j (1 − p)^{n−j}
        + (1/2) ∑_{l>0} ∑_{l ≤ j ≤ J} A_{2l} T(j, j, 2l) p^j (1 − p)^{n−j}
Equation (3.10) is true for any choice of J, but the value should be chosen to minimize the right side. As in (3.7) an extra term has been added to account for error patterns of weight w/2. This bound usually gives an excellent approximation to the actual probability of error with maximum likelihood decoding for all p of interest.
Example 3.3.1. Consider the (16, 5, 8) code with weight distribution A(z) = 1 + 30z^8 + z^16. Using (3.7) we get the bound

P_err ≤ 30 ( (1/2)\binom{8}{4} p^4 (1 − p)^4 + \binom{8}{5} p^5 (1 − p)^3 + ··· )

In Theorem 3.3.2 we may include the weight 4 errors in the same way, and we again get 30 · 35 = 1050 as the first term. Among error patterns of weight 5 we find the 30 · 56 included in the expression above, which are patterns with five errors among the eight positions of a nonzero word. However, there are additional error patterns obtained by adding a single error to a weight 4 error pattern in any of the eight zero coordinates of the codeword. These errors were implicit in the first term, but by counting them explicitly we see that the number, 1050 · 8 + 30 · 56 = 10080, exceeds the total number of weight 5 vectors, 4368. Thus the bound should be

P_err ≤ 1050 p^4 (1 − p)^12 + \binom{16}{5} p^5 (1 − p)^11 + \binom{16}{6} p^6 (1 − p)^10

A more detailed analysis shows that some 4-tuples are shared between the codewords, and thus the number of weight 4 errors is slightly overestimated. Actually some weight 5 errors are also corrected, and thus the second term is also a little too large.
3.4 Problems
Problem 3.4.1 Consider the (16, 11, 4) code used on a BSC with p = 0.01.
1) What is the mean value and the standard deviation of the number of errors in a block?
2) What are the probabilities of 0, 1, 2, 3, and 4 errors?
3) Compare these results to the approximation using the Poisson distribution.
4) What is the probability of decoding failure?
5) What is the probability of decoding error (as indicated in Example 3.2.1 this number is closely
approximated by the probability of 3 errors)?
Problem 3.4.2 The weight distribution of a (15, 7) code is
[1, 0, 0, 0, 0, 18, 30, 15, 15, 30, 18, 0, 0, 0, 0, 1]
1) How many errors can the code correct?
Assume minimum distance decoding.
2) If the error probability is p = 0.01, what is the probability of decoding failure?
3) How many error patterns of weight 3 cause decoding error (either an exact number or a good
upper bound)?
4) Give an approximate value of the probability of decoding error based on (3.6).
Problem 3.4.3 There exists a (16, 8, 6) code with weight enumerator A(z) = 1 + 112z^6 + 30z^8 + 112z^10 + z^16 (this is one of the few non-linear codes we shall mention, but all questions may be answered as for a linear code).
1) How many errors can the code correct?
2) What is the probability of decoding failure with p = 0.01?
3) How many error patterns of weight 4 and 6 have distance 2 to a particular codeword of weight 6?
Problem 3.4.4 Consider the distribution of the number of errors in a code of length n = 256
and p = 0.01.
1) What is the average number of errors?
2) What is the probability that more than eight errors occur?
3) Is the Poisson distribution sufficiently accurate in this case?
Problem 3.4.5 A (32, 16, 8) code has weight enumerator
A(z) = 1 + 620z^8 + 13888z^12 + 36518z^16 + 13888z^20 + 620z^24 + z^32
1) If maximum likelihood decoding is assumed, find an upper bound on the error probability.
2) Is (3.9) a useful approximation?
3) Is the bound significantly improved by using (3.10)?
Note that since all codewords have even weight, half of the cosets contain only error patterns of even weight and the other half of the cosets only errors of odd weight.
4) Is it possible that there are 35·A_8 cosets containing 2 error patterns of weight 4?

Problem 3.4.6 Project: Using a syndrome decoder, simulate minimum distance decoding of a (32, 16, 8) code.
Does the error probability agree with the calculated value?
Similarly, perform ML decoding and compare the results with the theoretical value.
Chapter 4
Communication channels and
information theory
In Chapter 3 we discussed the Binary Symmetric Channel as a model of independent
errors in a transmitted sequence. We now give a more general treatment of models
of communication channels. An information channel is a model of a communication
link or a related system where the input is a message and the output is an imperfect
reproduction of it. The discussion is based on results from information theory, which
provides ways of measuring the amounts of information that can be transferred through
channels with given properties.
4.1 Discrete messages and entropy
To discuss how information is transmitted through a channel that introduces errors we
need at least a simple model of the messages we want to transmit. We assume that the
sender has an unlimited amount of data that he, or she, wants to send to a receiver. The
data is divided into messages, which we usually assume to be strings of independent
binary symbols. One reason for using this particular kind of message is that it is often
possible to convert a more realistic message into a string of binary data. We shall give a
more formal model of the messages here, because the concepts involved will be needed
for describing the channels.
For a discrete memoryless source the output is a sequence of independent random variables with the same properties. Each output, X, has values in a finite alphabet {x_1, x_2, ..., x_r}. The probability of x_j is P(x_j) = p_j. It is often convenient to refer to the probability distribution of X as the vector Q(X) = (p_1, p_2, ..., p_r). As a measure of the amount of information represented by X we define

Definition 4.1.1. The entropy of a discrete memoryless source, X, is

H(X) = E[−log P(X)] = −∑_j p_j log p_j   (4.1)
where E indicates the expected value. Usually the logarithms are taken with base 2 in this context, and the (dimensionless) unit of information is a bit.
We note that for an alphabet of r symbols, the maximal value of H is log r, and this value is reached when all symbols have probability 1/r.
Example 4.1.1. The entropy of a discrete source
A source has an alphabet of four symbols with probability distribution (1/2, 1/4, 1/8, 1/8). If the symbols all had probability 1/4, the entropy would be two bits. Now we find

H = (1/2) log 2 + (1/4) log 4 + (1/8) log 8 + (1/8) log 8 = 1/2 + 1/2 + 3/4 = 7/4

The source symbols may be represented by on the average 7/4 binary symbols if we use the source code a: 0, b: 10, c: 110, d: 111.
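Recomputing the example (a small sketch of ours): the entropy and the average code length agree.

```python
from math import log2

probs = [1/2, 1/4, 1/8, 1/8]
H = -sum(p * log2(p) for p in probs)                       # entropy, (4.1)
avg_len = sum(p * l for p, l in zip(probs, [1, 2, 3, 3]))  # lengths of 0, 10, 110, 111
print(H, avg_len)   # both equal 1.75 = 7/4
```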
The importance of the information theoretic quantities is related to the coding theorems, which indicate that sequences of symbols can be mapped into a standard representation with the same information content, usually binary symbols. Thus an encoder maps a (fixed or variable) number of source symbols to a (fixed or variable) number of binary encoded symbols such that (on the average) N source symbols are mapped to approximately N·H(X) encoded symbols. The code in Example 4.1.1 is a simple example of a variable length source code. Most interesting sources are more complicated, and in particular they have memory. Coding for such sources is often referred to as data compression.
4.2 Mutual information and capacity of discrete channels

An information channel is a model of a communication link or a related system where the input is a message and the output is an imperfect reproduction of it. The channel may describe a number of restrictions on the message and various transmission impairments.
In this section we introduce the concept of mutual information as a measure of the amount of information that flows through the channel, and we shall define the capacity of a channel.
4.2.1 Discrete memoryless channels

In a discrete memoryless channel the input and output are sequences of symbols, and the current output depends only on the current input. The channel connects a pair of random variables, (X, Y), with values from finite alphabets {x_1, x_2, ..., x_r} and {y_1, y_2, ..., y_s}. The channel is described by the conditional probability of y_i given x_j, P(y_i|x_j) = p_{ji}. It is often convenient to represent the channel by the transition matrix Q(Y|X) = [p_{ji}]. The probability distribution of the output variable, Y, then becomes

Q(Y) = Q(X)Q(Y|X)
As a measure of the amount of information about X represented by Y we define

Definition 4.2.1. The mutual information of the pair (X, Y) is

I(X; Y) = E[log(P(y|x)/P(y))] = ∑_j P(x_j) ∑_i p_{ji} (log p_{ji} − log P(y_i))   (4.2)
We note that for a given X, the mutual information I(X; Y) cannot exceed H(X). This value is reached when Q is a unit matrix, i.e. each value of X corresponds to a unique value of Y, since in that case P(X, Y) = P(Y).
It may also be noted that I(Y; X) = I(X; Y). The symmetry follows from rewriting P(Y|X) as P(Y, X)/P(X) in the definition of I. This is an unexpected property since we tend to think of the flow of information going from X to Y (carried by a flow of energy). But actually we may interpret I as the amount of information about Y provided by X.

Lemma 4.2.1. I(X; Y) = H(Y) − H(Y|X) = H(X) − H(X|Y).

This follows immediately from the definition and is a convenient way of calculating I. Usually it is easier to calculate I from the first form, and this calculation is indicated on the right side of (4.2). The term H(Y|X) can be found from the transition probabilities of the channel, and H(Y) is found from the output distribution, which is obtained by multiplying the input distribution by the transition matrix. To use the second form we need to calculate H(X|Y) from the reverse transition probabilities.

Definition 4.2.2. The capacity of a discrete channel, C(Y|X), is the maximum of I with respect to P(X).

The definition of capacity may appear straightforward, but analytical solutions are often difficult. In many cases of interest, the symmetry of the transition probabilities suggests that the maximum is obtained with symmetric input probabilities. If necessary the maximum over a few parameters can be computed numerically.
Example 4.2.1. The binary symmetric channel
The single most important channel model is the binary symmetric channel (BSC). This channel models a situation where random errors occur with probability p in binary data. The transition matrix is

Q = [ 1−p    p  ]
    [  p    1−p ]

For equally distributed inputs, the output has the same distribution, and the mutual information may be found from Lemma 4.2.1 as (in bits/symbol)

C = 1 − H(p)   (4.3)

where H is the binary entropy function

H(p) = −p log p − (1 − p) log(1 − p)   (4.4)

The entropy function is symmetric with respect to p = 1/2, and reaches its maximal value, 1, there. Thus the capacity of the BSC drops to 0 at p = 1/2, which is expected since the output is then independent of the input. For small p the capacity decreases quickly with increasing p, and for p = 0.11 we get C = 1/2.
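A short check (our sketch) of the value quoted for p = 0.11:

```python
from math import log2

def h(p):                       # binary entropy function, (4.4)
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(1 - h(0.11))              # capacity (4.3); approximately 0.5
```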
For a long (n, k) block code, we expect the number of errors to be close to np. For the code to correct this number of errors it is necessary that the number of syndromes, 2^{n−k}, is at least equal to the number of error patterns (this is the Hamming bound). We relate this result to the capacity of the channel through the following useful approximation to the binomial coefficients:

Lemma 4.2.2.

∑_{j=0}^{m} \binom{n}{j} ≤ 2^{nH(m/n)}   (4.5)
Proof. For m/n < 1/2 we can use the binomial expansion to get

1 = (m/n + (1 − m/n))^n = ∑_{j=0}^{n} \binom{n}{j} (m/n)^j (1 − m/n)^{n−j}
  ≥ ∑_{j=0}^{m} \binom{n}{j} ((m/n)/(1 − m/n))^j (1 − m/n)^n
  ≥ ∑_{j=0}^{m} \binom{n}{j} ((m/n)/(1 − m/n))^m (1 − m/n)^n = 2^{−nH(m/n)} ∑_{j=0}^{m} \binom{n}{j}

and the inequality follows.

Thus np errors can be corrected only if H(p) < 1 − k/n, i.e. R = k/n < C.
The main difficulty in proving that one can get a small probability of decoding error for rates close to capacity is related to the fact that it is not possible to construct long codes that reach the Hamming bound. Thus while it is possible to correct most errors of weight close to np, not all errors can be corrected.
Example 4.2.2. The binary erasure channel
We say that the channel erases symbols if it outputs a special character, ?, instead of the one sent. If actual errors do not occur, the transition matrix becomes

Q = [ 1−p   p    0  ]
    [  0    p   1−p ]

The capacity of this channel, the binary erasure channel (BEC), is easily found from the last version of Lemma 4.2.1 as C = 1 − p. Thus the information is simply reduced by the fraction of the transmitted symbols that are erased. If a long (n, k) code is used on the BEC, the syndrome equations provide a system of linear equations that may be solved to give the erased symbols. If close to np symbols are erased and the rate of the code is less than 1 − p, there are more equations than variables. Since we know that at least the transmitted word is a solution, it is usually unique. However, for any set of j columns of the matrix to be linearly independent, we must have j < d, and for a binary code d is much smaller than n − k. Thus it is not always possible to correct the erasures.
Example 4.2.3. Let the transition matrix of a discrete channel be

Q = [ 1/2  1/2   0    0  ]
    [  0   1/2  1/2   0  ]
    [  0    0   1/2  1/2 ]
    [ 1/2   0    0   1/2 ]

Because of the symmetry, all the input symbols may be chosen to have probability 1/4. The mutual information is then readily calculated from Lemma 4.2.1 as I = 1. This value cannot be increased by changing the input distribution, but it is more convenient to use symbols 1 and 3 with probability 1/2, and in this case one bit is transmitted without error.
Other discrete channels may be used as models of the signal processing in modulators/demodulators (modems). Many channel models of practical interest are derived by assuming that the input symbols are real values, and that the channel adds independent Gaussian noise of zero mean and some variance to the input. In Appendix A we discuss coding for such channels and give the capacity of a Gaussian noise channel.
4.2.2 Codes and channel capacity

The importance of the capacity is related to coding theorems, which indicate that k message bits can be reliably communicated by using the channel a little more than n = k/C times. Thus an encoder maps the k message bits to n encoded symbols using a code consisting of 2^k vectors. A coding theorem states that for any code length n and for any rate R < C (in bits per channel symbol), there is a positive constant E(R) such that for some codes the error probability satisfies

P(e) < 2^{−nE(R)}   (4.6)

For most channels of interest it has not been possible to give direct proofs by constructing good codes. Instead the proofs rely on averages over large sets of codes. We give an outline of a proof for the BSC.
If a class of linear block codes uses all nonzero vectors with equal probability, the average weight enumerator is obtained by scaling the binomial distribution to the right number of words:

A(z) = 1 + ∑_{w>0} 2^{−n+k} \binom{n}{w} z^w   (4.7)

It may be noted that with this distribution Theorem 1.4.1 gives a B(z) of the same form with k' = n − k. For small weight, w, the number of codewords is less than 1, and the Varshamov-Gilbert bound, Theorem 1.2.2, indicates the first weight where the distribution exceeds 1. Combining the bound (3.9) with this weight distribution we get the following result:
Theorem 4.2.1. For rates R < R_0 and any block length n there exist block codes such that the error probability on a BSC satisfies

P(e) < 2^{−n(R_0 − R)}   (4.8)

where

R_0 = 1 − log(1 + Z)   (4.9)

Proof. From (3.9) and (4.5) we get

P(e) < 2^{−n+k+nH(w/n)+w log Z}

The channel parameter Z = √(4p(1 − p)) was introduced in Chapter 3. Here we find the exponent in the bound by taking the largest term, which occurs for w/n = Z/(1 + Z). The result then follows.
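Comparing R_0 of (4.9) with the capacity (4.3) shows the gap between the two exponent arguments; a small sketch:

```python
from math import log2, sqrt

def r0(p):                       # the rate R_0 of (4.9)
    return 1 - log2(1 + sqrt(4 * p * (1 - p)))

def capacity(p):                 # BSC capacity (4.3), written out
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)

for p in (0.01, 0.05, 0.11):
    print(f"p = {p}: R_0 = {r0(p):.3f}, C = {capacity(p):.3f}")
# R_0 < C throughout, so (4.8) gives no guarantee for rates between R_0 and C
```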
A different method is needed to obtain a positive exponent for rates close to capacity.
For a memoryless channel, we can reach the maximal value of I by using independent inputs. However, it is not clear how such symbols could be used for communicating a message efficiently. If we want to transmit k information bits reliably, we must use the channel at least n = k/C times in order to get enough information. A block code is exactly a rule for selecting a vector of n symbols in such a way that with high probability we can recover the message from the received vector.
When the inputs are no longer independent, less information is in general transmitted through the channel. In principle the loss could be calculated by converting the channel into a vector channel. Now the input is one of the 2^k equally likely message vectors, and since the channel is memoryless, the transition probabilities are found as the product of the symbol probabilities.
We can now derive some information from Lemma 4.2.1: Assume that we want to get the mutual information close to k = nC. One version of the lemma gives that H(X) = k, so H(X|Y) should be zero. Thus an output vector should almost always indicate a unique transmitted message. The other version of the lemma gives

H(Y_1, Y_2, ..., Y_n) ≈ nC + nH(Y|X) = nH(Y)

Thus the output distribution should be close to that of n independent symbols, and the channel should spread out the output vectors enough to eliminate the effect of the selected input vectors.
4.3 Problems

Problem 4.3.1 A memoryless source has three symbols and probability distribution (1/2, 1/4, 1/4).
1) What is the entropy of the source?
2) What is the largest entropy of a source with three symbols?
3) Give a variable length binary code for the source.
4) Let u_i be a sequence of independent variables with probability distribution (1/2, 1/2), and let v_i be generated as v_i = (u_i − u_{i−1})/2. Find the probability distribution of v_i. Is V a memoryless source? Argue that the entropy of V is 1 bit.
Problem 4.3.2 A memoryless channel has transition probabilities

Q = [ 1/2  3/8  1/8   0  ]
    [  0   1/8  3/8  1/2 ]

1) Find the output distribution when the input distribution is (1/2, 1/2).
2) Find the mutual information between input and output for this distribution.
3) What is the channel capacity?
Problem 4.3.3 A binary symmetric channel has transition probability p = 0.05.
1) What is the capacity of the channel?
2) If a code of length 256 is used, what is the average number of errors?
3) What is the Hamming bound on the number of errors that can be corrected with a (256, 160)
code?
4) What would be the probability of decoding error if such a code could be used?
5) What is the Gilbert bound for the same parameters?
Problem 4.3.4 The binary Z channel. In some binary channels one of the input symbols is much more likely to be in error than the other. Consider the channel with transition matrix

Q = [  1    0 ]
    [ 1−p   p ]

1) Let the input symbols have probability 1/2 each. What is the mutual information?
2) How much bigger is the channel capacity for p = 0.1, p = 1/4, p = 1/2?
3) Find an analytic expression.
Problem 4.3.5 A discrete memoryless channel has 17 input and output symbols, both indicated by [x_1, x_2, x_3, ..., x_17]. The probability of error, i.e. the probability that Y ≠ X, is p.
1) What is the channel capacity if all other symbols occur with the same probability, p/16?
2) What is the capacity if the only error symbols are x_{(j±1 mod 17)} when x_j is transmitted and their probability is p/2?
3) Evaluate the capacities for p = 0.11.
Chapter 5
Reed-Solomon codes and
their decoding
Reed-Solomon codes were discovered in 1959 and have since then been one of the most important classes of error-correcting codes. The applications range from CDs and DVDs to satellite communications. In this chapter we will describe the Reed-Solomon codes and also give some decoding algorithms for these codes.
5.1 Basic definitions

Before introducing the codes we will first prove an upper bound on the minimum distance of any code.

Theorem 5.1.1 (The Singleton bound). Let C be an (n, k) code with minimum distance d. Then

d ≤ n − k + 1

Proof. We give three different proofs of the theorem, in one of which we do not assume that the code is linear.
1) Choose the information vector to consist of k − 1 zeroes and one nonzero. Then the weight of the corresponding codeword is at most (n − k) + 1.
2) The rank of a parity check matrix H for the code is n − k, so n − k + 1 columns of H are linearly dependent. Therefore the minimum number of dependent columns (i.e. d) must be smaller than or equal to n − k + 1.
3) If we delete d − 1 fixed positions from all the q^k codewords, they are still different since each pair differs in at least d positions. There are q^{n−d+1} vectors with the remaining positions, and thus k ≤ n − d + 1 and the result follows.
Definition 5.1.1 (Reed-Solomon codes). Let x_1, ..., x_n be different elements of a finite field F_q. For k ≤ n consider the set P_k of polynomials in F_q[x] of degree less than k. A Reed-Solomon code consists of the codewords

(f(x_1), f(x_2), ..., f(x_n)) where f ∈ P_k

It is clear that the length of the code is n ≤ q. The code is linear since if

c_1 = (f_1(x_1), ..., f_1(x_n)) and c_2 = (f_2(x_1), ..., f_2(x_n)),

then

a·c_1 + b·c_2 = (g(x_1), ..., g(x_n))

where a, b ∈ F_q and g(x) = a·f_1(x) + b·f_2(x).
The polynomials in P_k form a vector space over F_q of dimension k, since there are k coefficients. We now invoke a fundamental theorem of algebra (Theorem 2.2.1) twice: Two distinct polynomials cannot generate the same codeword, since the difference would be a polynomial of degree < k and it cannot have n zeroes, so the dimension of the code is k. A codeword has weight at least n − k + 1 since a polynomial of degree < k can have at most k − 1 zeroes. Combining this with Theorem 5.1.1 we get

Theorem 5.1.2. The minimum distance of an (n, k) Reed-Solomon code is n − k + 1.

In many applications one takes x_i = α^{i−1}, i = 1, 2, ..., q − 1, where α is a primitive element of F_q, so in this case we have x_i^n = 1, i = 1, ..., n, and n = q − 1.
From the definition of the codes it can be seen that one way of encoding these codes is to take k information symbols i_0, i_1, ..., i_{k−1} and encode them as

(i(x_1), i(x_2), ..., i(x_n))

where i(x) = i_{k−1}x^{k−1} + ··· + i_1 x + i_0.
This is a non-systematic encoding; we will describe a systematic encoding in a problem in the next chapter.
Since for Reed-Solomon codes we must have n ≤ q, there are no interesting binary codes. Codes over F_q where q is a prime make easier examples, and in particular the field F_11 is useful for decimal codes, since there is no field with ten elements. However, in most practical cases we have q = 2^m.
Example 5.1.1. Reed-Solomon codes over F_11
Since 2 is a primitive element of F_11 we can take x_i = 2^{i−1} mod 11, i = 1, 2, ..., 10. With k = 5 and i(x) = i_4 x^4 + ··· + i_1 x + i_0 we get as the corresponding codeword

(i(1), i(2), i(4), i(8), i(5), i(10), i(9), i(7), i(3), i(6))

So
(1, 0, 0, 0, 0) is encoded into (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(0, 1, 0, 0, 0) is encoded into (1, 2, 4, 8, 5, 10, 9, 7, 3, 6)
(0, 0, 1, 0, 0) is encoded into (1, 4, 5, 9, 3, 1, 4, 5, 9, 3)
(0, 0, 0, 1, 0) is encoded into (1, 8, 9, 6, 4, 10, 3, 2, 5, 7)
(0, 0, 0, 0, 1) is encoded into (1, 5, 3, 4, 9, 1, 5, 3, 4, 9)
and these five codewords can be used as the rows of a generator matrix of the code, which is a (10, 5, 6) code over F_11.
5.2 Decoding Reed-Solomon Codes

In this section we describe the first of three minimum distance decoding algorithms for Reed-Solomon codes (in Chapter 12 we present an algorithm for correcting more errors). We will first present the idea and give a formal algorithm later.
Let r = c + e be a received word, and assume that w(e) ≤ t = ⌊(n − k)/2⌋. The idea is to determine a bivariate polynomial

Q(x, y) = Q_0(x) + y·Q_1(x) ∈ F_q[x, y] \ {0}

such that
1. Q(x_i, r_i) = 0, i = 1, ..., n.
2. deg(Q_0) ≤ n − 1 − t.
3. deg(Q_1) ≤ n − 1 − t − (k − 1).
The polynomial Q(x, y) is called an interpolating polynomial for the received word.
We first prove:

Theorem 5.2.1. There is at least one nonzero polynomial Q(x, y) which satisfies conditions 1–3.

Proof. Condition 1 gives n homogeneous linear equations in the coefficients, and there are (n − 1 − t + 1) + (n − 1 − t − (k − 1) + 1) ≥ n + 1 possible coefficients, so indeed the system has a nonzero solution.
We also have

Theorem 5.2.2. If the transmitted word is generated by g(x) and the number of errors is less than d/2, then g(x) = −Q_0(x)/Q_1(x).

Proof. c = (g(x_1), ..., g(x_n)) and r = c + e with w(e) ≤ t. The polynomial Q(x, y) satisfies Q(x_i, g(x_i) + e_i) = 0, and since e_i = 0 for at least n − t i's, we see that the univariate polynomial Q(x, g(x)) has at least n − t zeroes, namely the x_i's where g(x_i) = r_i. But Q(x, g(x)) has degree at most n − t − 1, so Q(x, g(x)) = 0 and therefore Q_0(x) + g(x)Q_1(x) = 0, and hence g(x) = −Q_0(x)/Q_1(x).
The maximal degrees of the components of Q will be used frequently in the following, so we define

l_0 = n − 1 − t and l_1 = n − 1 − t − (k − 1)

Note that since Q(x, y) = Q_1(x)(y + Q_0(x)/Q_1(x)) = Q_1(x)(y − g(x)), the x_i's where the errors occurred are among the zeroes of Q_1(x); therefore the polynomial Q_1(x) is called an error locator polynomial.
The algorithm now can be presented as follows:

Algorithm 5.2.1.
Input: A received word r = (r_1, r_2, ..., r_n).
1. Solve the system of linear equations

[ 1  x_1  x_1^2 ... x_1^{l_0}   r_1  r_1·x_1 ... r_1·x_1^{l_1} ]
[ 1  x_2  x_2^2 ... x_2^{l_0}   r_2  r_2·x_2 ... r_2·x_2^{l_1} ]
[ ...                                                          ]
[ 1  x_n  x_n^2 ... x_n^{l_0}   r_n  r_n·x_n ... r_n·x_n^{l_1} ]
· (Q_{0,0}, Q_{0,1}, Q_{0,2}, ..., Q_{0,l_0}, Q_{1,0}, Q_{1,1}, ..., Q_{1,l_1})^T = (0, 0, ..., 0)^T   (5.1)

2. Put

Q_0(x) = ∑_{j=0}^{l_0} Q_{0,j} x^j, Q_1(x) = ∑_{j=0}^{l_1} Q_{1,j} x^j, g(x) = −Q_0(x)/Q_1(x)

3. If g(x) ∈ F_q[x],
   output: (g(x_1), g(x_2), ..., g(x_n));
   else
   output: failure.

Notice in the system above that each row of the matrix corresponds to a pair (x_i, r_i).
We have already seen that if the number of errors is smaller than half the minimum distance, then the output of the algorithm is the sent word.
Example 5.2.1. Decoding the (10, 5, 6) Reed-Solomon code over F_11
We treat the code from Example 5.1.1 and suppose we receive r = (5, 9, 0, 9, 0, 1, 0, 7, 0, 5). We have l_0 = 7 and l_1 = 3, and therefore we get 10 equations with 12 unknowns. The matrix becomes

[ 1  1  1  1  1  1  1  1  5  5  5  5 ]
[ 1  2  4  8  5 10  9  7  9  7  3  6 ]
[ 1  4  5  9  3  1  4  5  0  0  0  0 ]
[ 1  8  9  6  4 10  3  2  9  6  4 10 ]
[ 1  5  3  4  9  1  5  3  0  0  0  0 ]
[ 1 10  1 10  1 10  1 10  1 10  1 10 ]
[ 1  9  4  3  5  1  9  4  0  0  0  0 ]
[ 1  7  5  2  3 10  4  6  7  5  2  3 ]
[ 1  3  9  5  4  1  3  9  0  0  0  0 ]
[ 1  6  3  7  9 10  5  8  5  8  4  2 ]

The system has as a solution

(4, 1, 2, 2, 2, 9, 1, 0, 7, 3, 10, 0)

corresponding to Q_0(x) = x^6 + 9x^5 + 2x^4 + 2x^3 + 2x^2 + x + 4 and Q_1(x) = 10x^2 + 3x + 7. We then get g(x) = x^4 + x^3 + x^2 + x + 1, corresponding to the codeword c = (5, 9, 0, 6, 0, 1, 0, 7, 0, 4), so we have corrected two errors in positions corresponding to 2^3 (= 8) and 2^9 (= 6), and one sees that indeed 8 and 6 are the zeroes of Q_1(x).
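The whole algorithm can be traced in a short program. The sketch below (ours) sets up the system (5.1) over F_11, finds one nonzero solution by Gaussian elimination, divides −Q_0 by Q_1 and re-encodes. Any nonzero solution of (5.1) works by Theorem 5.2.2, so the kernel vector found need not be the one quoted above.

```python
p, n, k = 11, 10, 5
xs = [pow(2, i, p) for i in range(n)]            # evaluation points of Example 5.1.1
t = (n - k) // 2
l0, l1 = n - 1 - t, n - 1 - t - (k - 1)

def kernel_vector(M):
    """One nonzero solution of M v = 0 over F_p, by Gaussian elimination."""
    M = [row[:] for row in M]
    rows, cols, pivots, r = len(M), len(M[0]), {}, 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)             # inverse mod p (p prime)
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        pivots[c], r = r, r + 1
    free = next(c for c in range(cols) if c not in pivots)
    v = [0] * cols
    v[free] = 1                                  # set one free variable to 1
    for c, row in pivots.items():
        v[c] = -M[row][free] % p
    return v

def decode(rw):
    M = [[pow(x, j, p) for j in range(l0 + 1)] +
         [ri * pow(x, j, p) % p for j in range(l1 + 1)]
         for x, ri in zip(xs, rw)]               # the matrix of system (5.1)
    sol = kernel_vector(M)
    num = [-c % p for c in sol[:l0 + 1]]         # -Q_0
    den = sol[l0 + 1:]                           # Q_1
    while den[-1] == 0:
        den.pop()
    g = [0] * (len(num) - len(den) + 1)          # long division -Q_0 / Q_1
    inv = pow(den[-1], p - 2, p)
    for i in reversed(range(len(g))):
        g[i] = num[i + len(den) - 1] * inv % p
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - g[i] * d) % p
    if any(num):
        return None                              # failure: Q_1 does not divide Q_0
    return [sum(c * pow(x, j, p) for j, c in enumerate(g)) % p for x in xs]

print(decode([5, 9, 0, 9, 0, 1, 0, 7, 0, 5]))
# [5, 9, 0, 6, 0, 1, 0, 7, 0, 4], the codeword found in Example 5.2.1
```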
5.3 Vandermonde matrices

In this section we give some results for matrices with a special structure. As we will demonstrate, these matrices play a significant role in the following; in particular we find parity check matrices for Reed-Solomon codes.

Lemma 5.3.1. Let α ∈ F_q be an element of order n, i.e. α^n = 1 and α^i ≠ 1 for 0 < i < n, and let x_j = α^{j−1}, j = 1, 2, ..., n. Let

A = [ 1      1      ...  1     ]            B = [ x_1    x_2    ...  x_n   ]
    [ x_1    x_2    ...  x_n   ]                [ x_1^2  x_2^2  ...  x_n^2 ]
    [ ...                      ]      and       [ ...                      ]
    [ x_1^a  x_2^a  ...  x_n^a ]                [ x_1^s  x_2^s  ...  x_n^s ]

where s + a + 1 ≤ n. Then

B·A^T = 0

Before we prove the claim we note that it also follows that A·B^T = 0.
Proof. Let C = B·A^T. Since b_{ij} = x_j^i and a_{rj} = x_j^{r−1} we get

c_{ir} = ∑_{j=1}^{n} x_j^i x_j^{r−1} = ∑_{j=1}^{n} x_j^{i+r−1} = ∑_{j=1}^{n} (α^{i+r−1})^{j−1}

Since

i + r − 1 ≤ s + a ≤ n − 1, so α^{i+r−1} ≠ 1,

we get

c_{ir} = ((α^{i+r−1})^n − 1) / (α^{i+r−1} − 1)

but (α^n)^{i+r−1} = 1 and the result follows.
If a = n − 1 the A-matrix is a so-called Vandermonde matrix, and here the determinant can be calculated.

Theorem 5.3.1. Let

D_n = det [ 1  x_1  x_1^2  ...  x_1^{n−1} ]
          [ 1  x_2  x_2^2  ...  x_2^{n−1} ]
          [ ...                           ]
          [ 1  x_n  x_n^2  ...  x_n^{n−1} ]

where x_i ∈ F_q. Then

D_n = ∏_{i>j} (x_i − x_j)

Proof. By induction on n. We have D_2 = x_2 − x_1. D_n can be considered as a polynomial of degree n − 1 in the variable x_n. This has zeroes x_1, x_2, ..., x_{n−1}, and since the coefficient of x_n^{n−1} is D_{n−1} we get

D_n = ∏_{i=1}^{n−1} (x_n − x_i) · D_{n−1}

and the result follows.
Corollary 5.3.1.

det [ x_{i_1}    x_{i_2}    ...  x_{i_t}   ]
    [ x_{i_1}^2  x_{i_2}^2  ...  x_{i_t}^2 ]
    [ ...                                  ]
    [ x_{i_1}^t  x_{i_2}^t  ...  x_{i_t}^t ]
= x_{i_1} ··· x_{i_t} ∏_{1≤l<j≤t} (x_{i_j} − x_{i_l})
For later use we also have

Corollary 5.3.2.

det [ x_{i_1}    ...  x_{i_{s−1}}    S_1  x_{i_{s+1}}    ...  x_{i_t}   ]
    [ x_{i_1}^2  ...  x_{i_{s−1}}^2  S_2  x_{i_{s+1}}^2  ...  x_{i_t}^2 ]
    [ ...                                                               ]
    [ x_{i_1}^t  ...  x_{i_{s−1}}^t  S_t  x_{i_{s+1}}^t  ...  x_{i_t}^t ]
= x_{i_1} ··· x_{i_{s−1}} x_{i_{s+1}} ··· x_{i_t} ∏_{1≤l<j≤t, l,j≠s} (x_{i_j} − x_{i_l}) ∑_{r=1}^{t} p_r^{(s)} S_r

where the p_r^{(s)} are the coefficients of the polynomial

P^{(s)}(x) = ∑_{r=1}^{t} p_r^{(s)} x^r = (−1)^{s−1} x ∏_{m=1, m≠s}^{t} (x − x_{i_m})

Proof.

d = det [ x_{i_1}    ...  x_{i_{s−1}}    x    x_{i_{s+1}}    ...  x_{i_t}   ]
        [ x_{i_1}^2  ...  x_{i_{s−1}}^2  x^2  x_{i_{s+1}}^2  ...  x_{i_t}^2 ]
        [ ...                                                               ]
        [ x_{i_1}^t  ...  x_{i_{s−1}}^t  x^t  x_{i_{s+1}}^t  ...  x_{i_t}^t ]

= x_{i_1} ··· x_{i_{s−1}} x_{i_{s+1}} ··· x_{i_t} ∏_{1≤l<j≤t, l,j≠s} (x_{i_j} − x_{i_l}) (−1)^{s−1} x ∏_{m=1, m≠s}^{t} (x − x_{i_m})

by Corollary 5.3.1, and hence

d = x_{i_1} ··· x_{i_{s−1}} x_{i_{s+1}} ··· x_{i_t} ∏_{1≤l<j≤t, l,j≠s} (x_{i_j} − x_{i_l}) P^{(s)}(x)

so by replacing the x^j with S_j the claim follows.
We return to the matrix A and write the elements as powers of α:

A = [ 1  1    ...  1          ]
    [ 1  α    ...  α^{n−1}    ]
    [ ...                     ]
    [ 1  α^a  ...  α^{(n−1)a} ]

In the special case where A is a square matrix, i.e. a = n − 1, and g is the vector of coefficients of a polynomial g(x) of degree < n, then

G = A·g^T

is a vector of values of g(x). The inverse matrix has the same form. In this way the encoding of Reed-Solomon codes can be interpreted as a finite field transform (sometimes called a finite field Fourier transform). Similarly, the syndrome can be described as the transform of the received word.
Based on the above results it is easy to obtain a parity check matrix for a Reed-Solomon code in the special case when x_j = α^{j−1}, where α is an element of order n in F_q (so n|q − 1). A generator matrix for the code is

G = [ 1          1          ...  1          ]
    [ x_1        x_2        ...  x_n        ]
    [ ...                                   ]
    [ x_1^{k−1}  x_2^{k−1}  ...  x_n^{k−1}  ]
It then follows from Lemma 5.3.1 that a parity check matrix for the code is

H = [ x_1        x_2        ...  x_n        ]
    [ x_1^2      x_2^2      ...  x_n^2      ]
    [ ...                                   ]
    [ x_1^{n−k}  x_2^{n−k}  ...  x_n^{n−k}  ]

or, written with powers of α,

H = [ 1  α        ...  α^{n−1}         ]
    [ 1  α^2      ...  (α^2)^{n−1}     ]   (5.2)
    [ ...                              ]
    [ 1  α^{n−k}  ...  (α^{n−k})^{n−1} ]
Example 5.3.1 (Example 5.1.1 continued). A parity check matrix for the (10, 5, 6) code over F_11 is then

[ 1  2  4  8  5 10  9  7  3  6 ]
[ 1  4  5  9  3  1  4  5  9  3 ]
[ 1  8  9  6  4 10  3  2  5  7 ]
[ 1  5  3  4  9  1  5  3  4  9 ]
[ 1 10  1 10  1 10  1 10  1 10 ]
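That the generator and parity check matrices of this code are indeed orthogonal, as Lemma 5.3.1 promises, can be checked in a few lines (a sketch):

```python
p = 11
xs = [pow(2, i, p) for i in range(10)]
G = [[pow(x, i, p) for x in xs] for i in range(5)]       # rows 1, x, ..., x^4
H = [[pow(x, i, p) for x in xs] for i in range(1, 6)]    # rows x, x^2, ..., x^5
print(all(sum(a * b for a, b in zip(g, h)) % p == 0
          for g in G for h in H))                        # True
```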
5.4 Another decoding algorithm

In this section we present another version of minimum distance decoding of RS codes (historically this was the first such algorithm).
We treat the case where x_i = α^{i−1}, where α is an element of order n in F_q.
Here the decoding problem is split into two stages. First we find an error locator polynomial, i.e. Q_1(x), and then we determine the error values.
Let r = c + e be a received word with w(e) < d/2. Let the syndrome S = (S_1, S_2, ..., S_{n−k}) be

S = H·r^T

where H is the parity check matrix (5.2). With r = (r_1, r_2, ..., r_n) and r(x) = r_n x^{n−1} + ··· + r_2 x + r_1 this means that

S_i = r(α^i) = e(α^i).

In the following we also refer to the individual S_i's as syndromes.
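For the received word of Example 5.2.1 the syndromes are quickly computed (a sketch; α = 2 has order 10 in F_11):

```python
p, alpha = 11, 2
r = [5, 9, 0, 9, 0, 1, 0, 7, 0, 5]            # received word of Example 5.2.1

def syndrome(i):
    """S_i = r(alpha^i) with r(x) = r_n x^{n-1} + ... + r_2 x + r_1."""
    a = pow(alpha, i, p)
    return sum(rj * pow(a, j, p) for j, rj in enumerate(r)) % p

print([syndrome(i) for i in range(1, 6)])     # S_1, ..., S_5: [8, 8, 3, 10, 7]
```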
Theorem 5.4.1. An error locator Q_1 is a solution to the system of linear equations

[ S_1      S_2        ...  S_{l_1+1} ]   [ Q_{1,0}   ]   [ 0 ]
[ S_2      S_3        ...  S_{l_1+2} ] · [ Q_{1,1}   ] = [ 0 ]
[ ...                                ]   [ ...       ]   [...]
[ S_{l_1}  S_{l_1+1}  ...  S_{2l_1}  ]   [ Q_{1,l_1} ]   [ 0 ]
Proof. We use the results of Section 5.2 to prove first that if Q_1(x) is an error locator then the system of equations is satisfied, and then that if we have a solution to the system of equations we can get the interpolating polynomial Q(x, y).
We note that the system of equations determining Q(x, y) can be written as

[ 1  x_1  ...  x_1^{l_0} ]   [ Q_{0,0}   ]   [ r_1  r_1·x_1  ...  r_1·x_1^{l_1} ]   [ Q_{1,0}   ]   [ 0 ]
[ 1  x_2  ...  x_2^{l_0} ] · [ ...       ] + [ r_2  r_2·x_2  ...  r_2·x_2^{l_1} ] · [ ...       ] = [...]   (5.3)
[ ...                    ]   [ Q_{0,l_0} ]   [ ...                              ]   [ Q_{1,l_1} ]   [ 0 ]
[ 1  x_n  ...  x_n^{l_0} ]                   [ r_n  r_n·x_n  ...  r_n·x_n^{l_1} ]
Since the first part of this system is independent of the received word we can remove it by multiplying the system with
$$B = \begin{pmatrix}
x_1 & x_2 & \dots & x_n \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
\vdots & \vdots & & \vdots \\
x_1^{l_1} & x_2^{l_1} & \dots & x_n^{l_1}
\end{pmatrix}$$
it follows from Section 5.3 that we get
$$\begin{pmatrix}
x_1 & x_2 & \dots & x_n \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
\vdots & \vdots & & \vdots \\
x_1^{l_1} & x_2^{l_1} & \dots & x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
r_1 & r_1 x_1 & \dots & r_1 x_1^{l_1} \\
r_2 & r_2 x_2 & \dots & r_2 x_2^{l_1} \\
\vdots & \vdots & & \vdots \\
r_n & r_n x_n & \dots & r_n x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \\ \vdots \\ Q_{1,l_1}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0
\end{pmatrix}$$
This is the same as
$$\begin{pmatrix}
x_1 & x_2 & \dots & x_n \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
\vdots & \vdots & & \vdots \\
x_1^{l_1} & x_2^{l_1} & \dots & x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
r_1 & 0 & \dots & 0 \\
0 & r_2 & \dots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \dots & r_n
\end{pmatrix}
\begin{pmatrix}
1 & x_1 & \dots & x_1^{l_1} \\
1 & x_2 & \dots & x_2^{l_1} \\
\vdots & \vdots & & \vdots \\
1 & x_n & \dots & x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \\ \vdots \\ Q_{1,l_1}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0
\end{pmatrix}$$
If we let
$$D(r) = \begin{pmatrix}
x_1 & x_2 & \dots & x_n \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
\vdots & \vdots & & \vdots \\
x_1^{l_1} & x_2^{l_1} & \dots & x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
r_1 & 0 & \dots & 0 \\
0 & r_2 & \dots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \dots & r_n
\end{pmatrix}
\begin{pmatrix}
1 & x_1 & \dots & x_1^{l_1} \\
1 & x_2 & \dots & x_2^{l_1} \\
\vdots & \vdots & & \vdots \\
1 & x_n & \dots & x_n^{l_1}
\end{pmatrix}$$

then an easy calculation shows that $d_{ij} = S_{i+j-1}$, $i = 1, \ldots, l_1$, $j = 1, \ldots, l_1 + 1$, and that $D(r) = D(e)$.
So $Q_1(x)$ is a solution to the system

$$\begin{pmatrix}
S_1 & S_2 & \dots & S_{l_1+1} \\
S_2 & S_3 & \dots & S_{l_1+2} \\
\vdots & \vdots & & \vdots \\
S_{l_1} & S_{l_1+1} & \dots & S_{2l_1}
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \\ \vdots \\ Q_{1,l_1}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0
\end{pmatrix} \tag{5.4}$$
On the other hand, if $Q_1(x)$ solves (5.4), we have that
$$\begin{pmatrix}
x_1 & x_2 & \dots & x_n \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
\vdots & \vdots & & \vdots \\
x_1^{l_1} & x_2^{l_1} & \dots & x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
r_1 & r_1 x_1 & \dots & r_1 x_1^{l_1} \\
r_2 & r_2 x_2 & \dots & r_2 x_2^{l_1} \\
\vdots & \vdots & & \vdots \\
r_n & r_n x_n & \dots & r_n x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \\ \vdots \\ Q_{1,l_1}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0
\end{pmatrix}$$
and therefore the vector
$$\begin{pmatrix}
r_1 & r_1 x_1 & \dots & r_1 x_1^{l_1} \\
r_2 & r_2 x_2 & \dots & r_2 x_2^{l_1} \\
\vdots & \vdots & & \vdots \\
r_n & r_n x_n & \dots & r_n x_n^{l_1}
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \\ \vdots \\ Q_{1,l_1}
\end{pmatrix}$$
is in the nullspace of B.
This, as follows from (5.3), is contained in the space spanned by the columns of
$$\begin{pmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{l_0} \\
1 & x_2 & x_2^2 & \dots & x_2^{l_0} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \dots & x_n^{l_0}
\end{pmatrix}$$
and therefore (5.3) has a solution $Q_0(x)$.
The above argument holds for any solution $Q_1(x)$, so in the case where there is a choice we select the solution of lowest degree.
When $Q_1(x)$ has been determined, its zeroes $\alpha^{i_1}, \alpha^{i_2}, \ldots, \alpha^{i_t}$ are found, usually by testing all q elements of $F_q$. Since

$$He^T = (S_1, S_2, \ldots, S_{2l_1})^T,$$

the error values can then be found by solving the system of equations
$$\begin{pmatrix}
x_{i_1} & x_{i_2} & \dots & x_{i_t} \\
x_{i_1}^2 & x_{i_2}^2 & \dots & x_{i_t}^2 \\
\vdots & \vdots & & \vdots \\
x_{i_1}^t & x_{i_2}^t & \dots & x_{i_t}^t
\end{pmatrix}
\begin{pmatrix}
e_{i_1} \\ e_{i_2} \\ \vdots \\ e_{i_t}
\end{pmatrix}
=
\begin{pmatrix}
S_1 \\ S_2 \\ \vdots \\ S_t
\end{pmatrix} \tag{5.5}$$
By Cramer's rule we get
$$e_{i_s} = \frac{\begin{vmatrix}
x_{i_1} & x_{i_2} & \dots & x_{i_{s-1}} & S_1 & x_{i_{s+1}} & \dots & x_{i_t} \\
x_{i_1}^2 & x_{i_2}^2 & \dots & x_{i_{s-1}}^2 & S_2 & x_{i_{s+1}}^2 & \dots & x_{i_t}^2 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
x_{i_1}^t & x_{i_2}^t & \dots & x_{i_{s-1}}^t & S_t & x_{i_{s+1}}^t & \dots & x_{i_t}^t
\end{vmatrix}}{\begin{vmatrix}
x_{i_1} & x_{i_2} & \dots & x_{i_{s-1}} & x_{i_s} & x_{i_{s+1}} & \dots & x_{i_t} \\
x_{i_1}^2 & x_{i_2}^2 & \dots & x_{i_{s-1}}^2 & x_{i_s}^2 & x_{i_{s+1}}^2 & \dots & x_{i_t}^2 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
x_{i_1}^t & x_{i_2}^t & \dots & x_{i_{s-1}}^t & x_{i_s}^t & x_{i_{s+1}}^t & \dots & x_{i_t}^t
\end{vmatrix}}$$
and therefore, using Corollaries 5.3.1 and 5.3.2,

$$e_{i_s} = \frac{x_{i_1} \cdots x_{i_{s-1}} x_{i_{s+1}} \cdots x_{i_t} \prod_{\substack{1 \le l < j \le t \\ l, j \neq s}} \left( x_{i_j} - x_{i_l} \right)}{x_{i_1} \cdots x_{i_t} \prod_{1 \le l < j \le t} \left( x_{i_j} - x_{i_l} \right)} \sum_{r=1}^{t} p_r^{(s)} S_r = \frac{\sum_{r=1}^{t} p_r^{(s)} S_r}{P^{(s)}(x_{i_s})} \tag{5.6}$$
We now have the following

Algorithm 5.4.1. (Peterson's)
Input: A received word $r = (r_1, r_2, \ldots, r_n)$.
1. Calculate the syndromes $S_i = r(\alpha^i)$, $i = 1, 2, \ldots, n - k$, where $r(x) = r_n x^{n-1} + \cdots + r_2 x + r_1$.
2. Find the solution $Q_1(x)$ of lowest degree to the system

$$\begin{pmatrix}
S_1 & S_2 & \dots & S_{l_1+1} \\
S_2 & S_3 & \dots & S_{l_1+2} \\
\vdots & \vdots & & \vdots \\
S_{l_1} & S_{l_1+1} & \dots & S_{2l_1}
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ \vdots \\ Q_{1,l_1}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ \vdots \\ 0
\end{pmatrix}$$
3. Find the zeroes of $Q_1(x)$: $\alpha^{i_1}, \ldots, \alpha^{i_t}$, say.
4. Find the error values by solving the system (5.5) or use the formula (5.6).
Output: The error vector $(e_1, e_2, \ldots, e_n)$.
Example 5.4.1. (Example 5.2.1 continued) We use the (10, 5, 6) code over $F_{11}$ and receive

$$r = (5, 9, 0, 9, 0, 1, 0, 7, 0, 5).$$

With $r(x) = 5x^9 + 7x^7 + x^5 + 9x^3 + 9x + 5$ we get the syndromes

$$S_1 = r(2) = 8,\quad S_2 = r(4) = 8,\quad S_3 = r(8) = 3,\quad S_4 = r(5) = 10.$$

The corresponding system of equations is

$$\begin{pmatrix} 8 & 8 & 3 \\ 8 & 3 & 10 \end{pmatrix}
\begin{pmatrix} Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

A solution is $Q_1(x) = 10x^2 + 3x + 7$ with zeroes $8 (= 2^3)$ and $6 (= 2^9)$, so the error polynomial has the form $bx^9 + ax^3$. To find a and b we get from the equation $He^T = S$ the two equations $8a + 6b = 8$ and $9a + 3b = 8$, which give $a = 3$ and $b = 1$, so the error polynomial is $x^9 + 3x^3$, and therefore the codeword is $c(x) = r(x) - e(x) = 4x^9 + 7x^7 + x^5 + 6x^3 + 9x + 5$, corresponding to the result obtained in Example 5.2.1.
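The steps of the example can be checked with the following Python sketch (ours, not from the book) of Algorithm 5.4.1 for this particular code; it assumes at most two errors, so the small subsystems can be solved directly by Cramer's rule.

```python
# Peterson's algorithm (Algorithm 5.4.1) for the (10, 5, 6) Reed-Solomon
# code over F_11 with x_i = 2^(i-1), reproducing Example 5.4.1.

q, n, k, alpha = 11, 10, 5, 2
inv = lambda a: pow(a, q - 2, q)          # inverses in the prime field F_11

r = [5, 9, 0, 9, 0, 1, 0, 7, 0, 5]        # received word (r_1, ..., r_n)
r_at = lambda x: sum(r[i] * pow(x, i, q) for i in range(n)) % q

# Step 1: syndromes S_i = r(alpha^i), i = 1, ..., n - k.
S = [r_at(pow(alpha, i, q)) for i in range(1, n - k + 1)]
print(S[:4])                              # [8, 8, 3, 10] as in the example

# Step 2: the homogeneous system (5.4) is 2 x 3; its solution space is one
# dimensional, so we may scale Q_{1,2} = 10 to match the text and solve the
# remaining 2 x 2 system by Cramer's rule.
Q12 = 10
a11, a12, b1 = S[0], S[1], (-S[2] * Q12) % q
a21, a22, b2 = S[1], S[2], (-S[3] * Q12) % q
det = (a11 * a22 - a12 * a21) % q
Q10 = ((b1 * a22 - b2 * a12) * inv(det)) % q
Q11 = ((a11 * b2 - a21 * b1) * inv(det)) % q
print(Q10, Q11, Q12)                      # 7 3 10: Q_1(x) = 10x^2 + 3x + 7

# Step 3: zeroes of Q_1, found by testing all field elements.
zeroes = [x for x in range(q) if (Q10 + Q11 * x + Q12 * x * x) % q == 0]
print(zeroes)                             # [6, 8], i.e. 2^9 and 2^3

# Step 4: error values from the system (5.5) with t = 2.
x1, x2 = zeroes
det = (x1 * x2 * x2 - x2 * x1 * x1) % q
e1 = ((S[0] * x2 * x2 - S[1] * x2) * inv(det)) % q
e2 = ((S[1] * x1 - S[0] * x1 * x1) * inv(det)) % q
print(e1, e2)                             # 1 and 3: e(x) = x^9 + 3x^3
```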
5.5 Problems
Problem 5.5.1
1) Find two binary codes of length 4 that satisfy Theorem 5.1.1 with equality.
2) Find a (4, 2) ternary code that satisfies Theorem 5.1.1 with equality.
Problem 5.5.2 Consider a (6, 4) Reed-Solomon code over $F_7$ where $x_i = 3^{i-1}$.
1) Find a generator matrix for the code using the polynomials 1, x, $x^2$ and $x^3$.
2) Find a generator matrix in systematic form. Which polynomials generate the rows?
3) What is the minimum distance?
4) Find a codeword from $f(x) = x^3 + x$ and add 2 to position 3.
5) What are the degrees of $Q_0(x)$ and $Q_1(x)$ and how many coefficients are there in Q(x, y)?
6) Use the decoding algorithm of Section 5.2 to correct the error (Algorithm 5.2.1).
7) Find a parity check matrix of the code.
8) How are single errors corrected from the two syndromes?
Problem 5.5.3 Consider a (7, 3) Reed-Solomon code over $F_8$ with $x_i = \alpha^{i-1}$ where $\alpha$ is a primitive element.
1) What is the codeword corresponding to $f(x) = x^2 + x$?
2) Add a new position by including $x_8 = 0$. What are the parameters of the new code?
3) Add two errors to the codeword from 1) and use the decoding algorithm from Section 5.2.
4) Find a parity check matrix for the code.
5) How can you tell from the syndrome whether a received word contains one or two errors?
Problem 5.5.4
1) Find the error locator for the received word in Problem 5.5.3 3) using Peterson's algorithm (Algorithm 5.4.1).
2) Check that the right positions are roots.
3) Calculate the error values.
Problem 5.5.5 Let $\alpha$ be a primitive element of $F_{16}$ satisfying $\alpha^4 + \alpha + 1 = 0$ and consider the (15, 9, 7) Reed-Solomon code over $F_{16}$ determined by $x_i = \alpha^{i-1}$, $i = 1, 2, \ldots, 15$.
1) Determine a generator matrix for this code.
2) Encode the information sequence 1, 1, 1, 1, 1, 1, 1, 1, 1.
3) Determine a parity check matrix for C.
4) Decode the received word $(\alpha^4, \alpha, \alpha^2, \alpha^3, \alpha, \alpha^5, \alpha^6, \alpha^7, \alpha, \alpha^9, \alpha^{10}, \alpha^{11}, \alpha^{12}, \alpha^{13}, \alpha^{14})$ using Peterson's algorithm.
Problem 5.5.6
1) Show that 3 is a primitive element of $F_{17}$.
Consider a Reed-Solomon code over $F_{17}$ with $x_i = 3^{i-1}$, $i = 1, \ldots, 16$, where we encode $i(x) = i_9 x^9 + \cdots + i_1 x + i_0$.
2) What are the parameters of this code?
3) If we include 0 as a point and take as codewords $(i(0), i(x_1), \ldots, i(x_{16}))$, what are then the parameters of this code?
Add the information symbol $i_9$ as an extra code position. (One may think of this as $i(\infty)$; why?)
4) If $i_9 = 0$, how many other positions can be 0? Find the parameters of this code.
We will use the code from 3) to correct three errors.
5) What are the coefficients of the interpolating polynomial Q(x, y), where y = r(x) is the received word? Set up the equations to determine Q(x, y) assuming i(x) = 0. Try also i(x) = 1.
Chapter 6
Cyclic Codes
In this chapter we will introduce a special, but still important, class of codes that have a nice mathematical structure. In particular it turns out to be easy to estimate the minimum distance of these codes.
6.1 Introduction to cyclic codes
Definition 6.1.1. An (n, k) linear code C over $F_q$ is called cyclic if any cyclic shift of a codeword is again a codeword, i.e. if

$$c = (c_0, c_1, \ldots, c_{n-1}) \in C \Rightarrow \tilde{c} = (c_{n-1}, c_0, \ldots, c_{n-2}) \in C.$$
Example 6.1.1. The (7, 3) code over $F_2$ that consists of the codewords
(0, 0, 0, 0, 0, 0, 0), (1, 0, 1, 1, 1, 0, 0), (0, 1, 0, 1, 1, 1, 0), (0, 0, 1, 0, 1, 1, 1),
(1, 0, 0, 1, 0, 1, 1), (1, 1, 0, 0, 1, 0, 1), (1, 1, 1, 0, 0, 1, 0), (0, 1, 1, 1, 0, 0, 1)
can be seen to be a cyclic code.
The properties of cyclic codes are more easily understood if we treat words as polynomials in $F_q[x]$. This means that if $(a_0, a_1, \ldots, a_{n-1}) \in F_q^n$ we associate the polynomial $a(x) = a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \in F_q[x]$.
In the following we will not distinguish between codewords and code polynomials.
The first observation is
Lemma 6.1.1. If

$$c(x) = c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 \quad \text{and} \quad \tilde{c}(x) = c_{n-2}x^{n-1} + \cdots + c_0 x + c_{n-1}$$

then

$$\tilde{c}(x) = x c(x) - c_{n-1}(x^n - 1)$$
The lemma is proved by direct calculation.
Theorem 6.1.1. Let C be a cyclic (n, k) code over $F_q$ and let g(x) be the monic polynomial of lowest degree in $C \setminus \{0\}$.
Then
1. g(x) divides c(x) for every $c \in C$.
2. g(x) divides $x^n - 1$ in $F_q[x]$.
3. $k = n - \deg(g(x))$.
We first note that g(x) is uniquely determined, since if there were two, their difference (since the code is linear) would have lower degree and would be a codeword.
Proof of the theorem.
It is clear from the definition of g(x) that $k \le n - \deg(g(x))$, since there are only $n - \deg(g(x))$ positions left.
If g(x) has degree s, then $g(x) = g_{s-1}x^{s-1} + \cdots + g_1 x + g_0 + x^s$, so from Lemma 6.1.1 we get that $x^j g(x)$ is in C if $j \le n - 1 - s$. Therefore the polynomials $a(x)g(x)$ where $\deg(a(x)) \le n - 1 - s$ are also codewords of C.
It is also easy to see that the $x^j g(x)$, $j \le n - 1 - s$, are linearly independent codewords of C, so $k \ge n - s$, and we then have $k = n - s$, proving 3.
To prove 1, suppose $c(x) \in C$; then $c(x) = a(x)g(x) + r(x)$ where $\deg(r(x)) < \deg(g(x))$. Since $\deg(a(x)) \le n - 1 - s$ we have that $a(x)g(x)$ is a codeword and therefore that $r(x) = c(x) - a(x)g(x)$ is also in the code. Since $\deg(r(x)) < \deg(g(x))$ this implies that $r(x) = 0$ and therefore g(x) divides c(x), and 1 is proved.
2 follows directly from the lemma since g(x) divides c(x) and also $\tilde{c}(x)$.
The polynomial g(x) in the theorem is called the generator polynomial for the cyclic
code C.
Example 6.1.2. (Example 6.1.1 continued) We see that $g(x) = x^4 + x^3 + x^2 + 1$ and that the codewords all have the form $(a_2 x^2 + a_1 x + a_0)g(x)$ where $a_i \in F_2$, and that $x^7 - 1 = (x^4 + x^3 + x^2 + 1)(x^3 + x^2 + 1)$.
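Both claims of the example are easy to verify by machine. The following Python sketch (ours, not part of the text) divides $x^7 - 1$ by g(x) over $F_2$ and lists the eight codewords $a(x)g(x)$; binary polynomials are lists of coefficients, index i holding the coefficient of $x^i$.

```python
def poly_divmod2(num, den):
    """Long division of binary polynomials: returns (quotient, remainder)."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for shift in range(len(num) - len(den), -1, -1):
        if num[shift + len(den) - 1]:
            quot[shift] = 1
            for i, d in enumerate(den):
                num[shift + i] ^= d
    return quot, num[:len(den) - 1]

def poly_mul2(a, b):
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                prod[i + j] ^= bj
    return prod

g = [1, 0, 1, 1, 1]              # g(x) = x^4 + x^3 + x^2 + 1
x7_1 = [1, 0, 0, 0, 0, 0, 0, 1]  # x^7 + 1 (= x^7 - 1 over F_2)

quot, rem = poly_divmod2(x7_1, g)
print(quot, rem)                 # quotient x^3 + x^2 + 1, remainder 0

for m in range(8):               # the eight codewords (a_2 x^2 + a_1 x + a_0) g(x)
    a = [(m >> i) & 1 for i in range(3)]
    print(poly_mul2(a, g))
```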
So to a cyclic code corresponds a divisor of $x^n - 1$, and a natural question is therefore whether to any divisor of $x^n - 1$ there corresponds a cyclic code. We answer that in the affirmative in
Theorem 6.1.2. Suppose $g(x) \in F_q[x]$ is monic and divides $x^n - 1$.
Then $C = \{\, i(x)g(x) \mid i(x) \in F_q[x],\ \deg(i(x)) < n - \deg(g(x)) \,\}$ is a cyclic code with generator polynomial g(x).
Proof. It is obvious that C is a linear code, and if it is cyclic the generator polynomial is g(x) and hence the dimension is $n - \deg(g(x))$, so we only have to prove that C is cyclic.
To this end let $g(x) = g_{s-1}x^{s-1} + \cdots + g_1 x + g_0 + x^s$ and $h(x) = \frac{x^n - 1}{g(x)} = x^{n-s} + h_{n-s-1}x^{n-s-1} + \cdots + h_1 x + h_0$.
Let $c(x) = i(x)g(x)$ where $\deg(i(x)) < n - s$; then

$$\tilde{c}(x) = xc(x) - c_{n-1}(x^n - 1) = xi(x)g(x) - c_{n-1}h(x)g(x) = (xi(x) - c_{n-1}h(x))g(x).$$

Now $c_{n-1} = i_{n-s-1}$, so indeed $xi(x) - c_{n-1}h(x)$ has degree $< n - s$ and therefore $\tilde{c}(x)$ is also in C.
The two theorems combined tell us that we can study cyclic codes by studying the divisors of $x^n - 1$. In the case q = 2 and n odd we gave a method for finding divisors of $x^n - 1$ in Section 2.3.
Example 6.1.3. Binary cyclic codes of length 21
Using the algorithm of Section 2.3 we have

$$x^{21} - 1 = (x-1)(x^6+x^4+x^2+x+1)(x^3+x^2+1)(x^6+x^5+x^4+x^2+1)(x^2+x+1)(x^3+x+1).$$

With $g(x) = (x^6 + x^4 + x^2 + x + 1)(x^3 + x^2 + 1)$ we get a (21, 12) binary code. With $g_1(x) = (x^6 + x^4 + x^2 + x + 1)(x^3 + x + 1)$ we also get a (21, 12) code.
6.2 Generator- and parity check matrices
of cyclic codes
Let C be an (n, k) cyclic code over $F_q$. As proved in Section 6.1 the code C has a generator polynomial g(x) of degree n − k, that is, $g(x) = x^{n-k} + g_{n-k-1}x^{n-k-1} + \cdots + g_1 x + g_0$, and we also saw that $x^j g(x)$, $j = 0, 1, \ldots, k-1$, gave linearly independent codewords. This means that a generator matrix of C is
$$G = \begin{pmatrix}
g_0 & g_1 & g_2 & \dots \\
0 & g_0 & g_1 & \dots \\
0 & 0 & g_0 & \dots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$
So G has as its first row the coefficients of g(x), and the remaining k − 1 rows are obtained as cyclic shifts.
To get a parity check matrix we observe that $g(x)h(x) = x^n - 1$, since h(x) was defined exactly in this way, and, if $c(x) = i(x)g(x)$, we get that $c(x)h(x) = i(x)g(x)h(x) = i(x)(x^n - 1)$, so the polynomial c(x)h(x) does not contain any terms of degrees $k, k+1, \ldots, n-1$ and therefore

$$\sum_{i=0}^{n-1} c_i h_{j-i} = 0 \quad \text{for } j = k, k+1, \ldots, n-1,$$

where $h_s = 0$ if s < 0.
From this we get that the vectors

$$(h_k, h_{k-1}, \ldots, h_0, 0, \ldots, 0),\ \ldots,\ (0, \ldots, 0, h_k, h_{k-1}, \ldots, h_0)$$
give n − k independent parity check equations, so a parity check matrix is

$$H = \begin{pmatrix}
h_k & h_{k-1} & h_{k-2} & \dots \\
0 & h_k & h_{k-1} & \dots \\
0 & 0 & h_k & \dots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$
So H has as its first row the coefficients of h(x) in reverse order, and the remaining n − k − 1 rows are cyclic shifts of the first row. Therefore we have
Theorem 6.2.1. If C is an (n, k) cyclic code with generator polynomial g(x), then the dual code $C^\perp$ is also cyclic and has generator polynomial $g^\perp(x) = h_0 x^k + \cdots + h_{k-1}x + h_k = x^k h(x^{-1})$, where $h(x) = \frac{x^n - 1}{g(x)}$.
Example 6.2.1. For the code considered in Example 6.1.1 we get

$$G = \begin{pmatrix}
1 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1
\end{pmatrix}$$

and

$$H = \begin{pmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}$$
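The following Python sketch (ours, not part of the text) rebuilds G and H of this example from the coefficients of g(x) and $h(x) = x^3 + x^2 + 1$ and checks that all rows are orthogonal over $F_2$:

```python
# G and H for the (7, 3) cyclic code of Example 6.1.1, built from g and h.
n, k = 7, 3
g = [1, 0, 1, 1, 1]          # g_0, ..., g_4
h = [1, 0, 1, 1]             # h_0, ..., h_3
hrev = h[::-1]               # coefficients of h(x) in reverse order

G = [[g[j - i] if 0 <= j - i < len(g) else 0 for j in range(n)]
     for i in range(k)]
H = [[hrev[j - i] if 0 <= j - i < len(hrev) else 0 for j in range(n)]
     for i in range(n - k)]

# every row of G is orthogonal to every row of H over F_2
assert all(sum(a * b for a, b in zip(gr, hr)) % 2 == 0
           for gr in G for hr in H)
print(G)
print(H)
```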
6.3 A theorem on the minimum distance of cyclic codes
In this section we prove a lower bound on the minimum distance of a cyclic code (the
so-called BCH-bound).
Theorem 6.3.1. Let g(x) be the generator polynomial of a cyclic (n, k) code C over $F_q$ and suppose that g(x) has among its zeroes $\beta^a, \beta^{a+1}, \ldots, \beta^{a+d-2}$, where $\beta \in F_{q^m}$ has order n.
Then $d_{\min}(C) \ge d$.
Proof.

$$H = \begin{pmatrix}
1 & \beta^a & \beta^{2a} & \dots & \beta^{a(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \beta^{a+d-2} & \beta^{2(a+d-2)} & \dots & \beta^{(n-1)(a+d-2)}
\end{pmatrix}$$
is a parity check matrix of C (properly interpreted), and if we take the determinant of the matrix consisting of d − 1 columns of H we get:

$$\begin{vmatrix}
\beta^{a i_1} & \beta^{a i_2} & \dots & \beta^{a i_{d-1}} \\
\vdots & \vdots & & \vdots \\
\beta^{(a+d-2) i_1} & \beta^{(a+d-2) i_2} & \dots & \beta^{(a+d-2) i_{d-1}}
\end{vmatrix}
=
\begin{vmatrix}
x_1^a & \dots & x_{d-1}^a \\
\vdots & & \vdots \\
x_1^{a+d-2} & \dots & x_{d-1}^{a+d-2}
\end{vmatrix}$$

where $x_j = \beta^{i_j}$. This is then

$$x_1^a \cdots x_{d-1}^a
\begin{vmatrix}
1 & \dots & 1 \\
x_1 & \dots & x_{d-1} \\
\vdots & & \vdots \\
x_1^{d-2} & \dots & x_{d-1}^{d-2}
\end{vmatrix}$$

so from Theorem 5.3.1 we get

$$x_1^a \cdots x_{d-1}^a \prod_{i>j} (x_i - x_j) \neq 0$$
So any d 1 columns are linearly independent and the result follows.
Example 6.3.1. (Example 6.1.3 continued)
The (21, 12) code with $g(x) = (x^6 + x^4 + x^2 + x + 1)(x^3 + x^2 + 1) = m_\beta(x)m_{\beta^3}(x) = x^9 + x^8 + x^7 + x^5 + x^4 + x + 1$ has minimum distance at least 5, since the generator has $\beta, \beta^2, \beta^3, \beta^4$ among its zeroes. Since $c(x) = (x^2 + x + 1)g(x) = x^{11} + x^9 + x^4 + x^3 + 1$ is a codeword, the minimum distance is 5. This is actually best possible. For the (21, 12) code with generator $g_1(x) = m_\beta(x)m_{\beta^9}(x)$ the bound only gives $d_{\min} \ge 3$, but the true minimum distance is again 5.
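Since the code has only $2^{12}$ codewords, the claim of the example can also be verified by exhaustive search. A Python sketch (ours, not part of the text):

```python
# Brute-force check that the (21, 12) code with generator
# g(x) = x^9 + x^8 + x^7 + x^5 + x^4 + x + 1 has minimum distance 5.

def poly_mul2(a, b):
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                prod[i + j] ^= bj
    return prod

g = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1]   # coefficients of g, lowest degree first

d_min = 21
for m in range(1, 2 ** 12):          # all nonzero information polynomials
    i_poly = [(m >> b) & 1 for b in range(12)]
    d_min = min(d_min, sum(poly_mul2(i_poly, g)))
print(d_min)                         # 5
```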
6.4 Cyclic Reed-Solomon codes and BCH-codes
In this section we shall see that a large class of the Reed-Solomon codes as defined in Chapter 5 are cyclic codes, and we will determine their generator polynomials. We will also consider (in the case where $q = 2^m$) the subcode consisting of the binary words of the original code and show that this is also a cyclic code with a generator polynomial that is easy to determine. It then follows from the definition that indeed we have a lower bound on the minimum distance. We shall also indicate how Peterson's decoding algorithm works for decoding these binary codes.
6.4.1 Cyclic Reed-Solomon codes
Let $\alpha$ be a primitive element of $F_q$, let n be a divisor of q − 1 and let $\beta = \alpha^{\frac{q-1}{n}}$. Let $C_s$ be the Reed-Solomon code with $x_i = \beta^{i-1}$, $i = 1, 2, \ldots, n$, obtained by evaluating polynomials in $P_s = \{\, f(x) \in F_q[x] \mid \deg(f(x)) < s \,\}$.
Theorem 6.4.1. $C_s$ is a cyclic (n, s) code over $F_q$ with generator polynomial

$$g(x) = (x - \beta)(x - \beta^2) \cdots (x - \beta^{n-s})$$
Proof. The code is cyclic since if $c = (f(\beta^0), f(\beta), \ldots, f(\beta^{n-1}))$, then $\tilde{c} = (f(\beta^{n-1}), f(\beta^0), f(\beta), \ldots, f(\beta^{n-2}))$, so if we define $f_1(x) = f(\beta^{-1}x)$, then $\tilde{c} = (f_1(\beta^0), f_1(\beta), \ldots, f_1(\beta^{n-1}))$, since $\beta^n = 1$.
It follows from Section 6.3 that a parity check matrix for the code $C_s$ is

$$H = \begin{pmatrix}
1 & \beta & \dots & \beta^{n-1} \\
1 & \beta^2 & \dots & \beta^{2(n-1)} \\
\vdots & \vdots & & \vdots \\
1 & \beta^{n-s} & \dots & (\beta^{n-s})^{n-1}
\end{pmatrix}$$
That means that the generator polynomial has zeroes $\beta, \beta^2, \ldots, \beta^{n-s}$, so the generator polynomial of this Reed-Solomon code is

$$g(x) = (x - \beta)(x - \beta^2) \cdots (x - \beta^{n-s}).$$
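For the running example over $F_{11}$ (n = 10, s = 5, $\beta = 2$), the generator polynomial of Theorem 6.4.1 can be computed directly; the following Python sketch (ours, not part of the text) multiplies out the linear factors:

```python
# Generator polynomial of the cyclic (10, 5) Reed-Solomon code over F_11
# with beta = 2, as the product (x - beta)(x - beta^2)...(x - beta^(n-s)).

q, n, s, beta = 11, 10, 5, 2

g = [1]                                  # the constant polynomial 1
for i in range(1, n - s + 1):
    root = pow(beta, i, q)
    new = [0] * (len(g) + 1)
    for j, c in enumerate(g):
        new[j] = (new[j] - root * c) % q   # contribution of -root * g(x)
        new[j + 1] = (new[j + 1] + c) % q  # contribution of x * g(x)
    g = new

print(g)   # [1, 9, 2, 8, 4, 1]: g(x) = x^5 + 4x^4 + 8x^3 + 2x^2 + 9x + 1
```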
6.4.2 BCH-codes
In the special case where $q = 2^m$ we look at the binary words in the Reed-Solomon codes $C_s$, which we call $C_s(\text{sub})$; this is the so-called subfield subcode.
Theorem 6.4.2. The code $C_s(\text{sub})$ is a linear cyclic code whose generator polynomial is the product of the different minimal polynomials of $\beta, \beta^2, \ldots, \beta^{n-s}$. The code has minimum distance at least n − s + 1.
Proof. The code is linear since the sum of two codewords again is in the code, and cyclic since the Reed-Solomon code is. The generator polynomial has among its zeroes $\beta, \beta^2, \ldots, \beta^{n-s}$, and the binary polynomial of lowest degree with these zeroes is exactly the product of the different minimal polynomials of $\beta, \beta^2, \ldots, \beta^{n-s}$. Since the codewords are still words from the Reed-Solomon code, the minimum distance is at least n − s + 1.
In this case we can omit all the even powers of $\beta$, since if $\beta$ is a zero of a binary polynomial so is $\beta^2$. If we choose a basis for $F_{2^m}$ as a vector space over $F_2$ and replace the powers of $\beta$ in the parity check matrix for the Reed-Solomon code with binary m-columns (the coordinates), we get a parity check matrix (with possibly linearly
dependent rows) for the subfield subcode. From this follows that the dimension of $C_s(\text{sub})$ is at least $n - m\left\lceil\frac{n-s}{2}\right\rceil$. Usually this is a very weak bound; in specific cases where we have the generator polynomial g(x) there is of course no problem, since then $k = n - \deg(g(x))$. The true minimum distance of these codes can be hard to find, however in many cases the bound actually gives the truth. With $n = 2^m - 1$ and $d_{\min} = 2t + 1$ we get $n - k \le t \log(n)$. For low rates this is very close to the Hamming bound.
The codes obtained in this way are called (narrow sense) BCH-codes after Bose, Chaudhuri and Hocquenghem, who first considered them. One of the nice things is that you can guarantee (a lower bound on) the minimum distance of the code, and it was exactly this that originally was the rationale for the construction.
Peterson's decoding algorithm can be used to decode these codes as well, as we are going to illustrate.
Example 6.4.1. Decoding of a (15, 5, 7) BCH-code
This code corresponds to the case where n = 15, s = 9 and $q = 2^4$, so the generator polynomial has among its roots $\beta, \beta^2, \ldots, \beta^6$, and in this case $\beta = \alpha$, where $\alpha$ is a primitive element of $F_{16}$. Since $m_\alpha(x) = m_{\alpha^2}(x) = m_{\alpha^4}(x)$ and $m_{\alpha^3}(x) = m_{\alpha^6}(x)$, we get that the generator polynomial is $m_\alpha(x)m_{\alpha^3}(x)m_{\alpha^5}(x) = x^{10} + x^9 + x^8 + x^6 + x^5 + x^2 + 1$, where we have used the table to find the minimal polynomials. The dimension of the code is therefore 15 − 10 = 5, and since the generator has weight 7 and we know that the minimum distance is at least 7, we have that $d_{\min} = 7$.
In the following we use the version of $F_{16}$ that has a primitive element $\alpha$ which satisfies $\alpha^4 + \alpha^3 + 1 = 0$. Actually we already used that in the determination of the minimal polynomials above.
If we receive the word $r(x) = x^5 + x^4 + x^3 + x$ we first calculate the syndromes.
We get $S_1 = r(\alpha) = \alpha^3$, $S_2 = r(\alpha^2) = S_1^2 = \alpha^6$, $S_3 = r(\alpha^3) = \alpha^6$, $S_4 = r(\alpha^4) = S_2^2 = \alpha^{12}$, $S_5 = r(\alpha^5) = \alpha^5$ and $S_6 = r(\alpha^6) = S_3^2 = \alpha^{12}$.
We now solve the system

$$\begin{pmatrix}
S_1 & S_2 & S_3 & S_4 \\
S_2 & S_3 & S_4 & S_5 \\
S_3 & S_4 & S_5 & S_6
\end{pmatrix}
\begin{pmatrix}
Q_{1,0} \\ Q_{1,1} \\ Q_{1,2} \\ Q_{1,3}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0
\end{pmatrix}$$

and get:
$$Q_{1,0} = \alpha^7,\quad Q_{1,1} = \alpha^8,\quad Q_{1,2} = \alpha^3,\quad Q_{1,3} = 1$$

and therefore $Q_1(x) = x^3 + \alpha^3 x^2 + \alpha^8 x + \alpha^7$, which has zeroes $\alpha^0$, $\alpha^{10}$ and $\alpha^{12}$.
From this we get as codeword $c(x) = r(x) + x^{12} + x^{10} + 1 = x^{12} + x^{10} + x^5 + x^4 + x^3 + x + 1$.
It turns out that $c(x) = (x^2 + x + 1)g(x)$.
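The field arithmetic of this example is easy to mechanize with log/antilog tables. The following Python sketch (ours, not part of the text) rebuilds $F_{16}$ from $\alpha^4 = \alpha^3 + 1$ and reproduces the six syndromes:

```python
# Log/antilog tables for F_16 built from alpha^4 = alpha^3 + 1, used to
# verify the syndromes of Example 6.4.1.

exp = [1] * 15                   # exp[i] = alpha^i as a 4-bit integer
for i in range(1, 15):
    t = exp[i - 1] << 1          # multiply by alpha
    if t & 0b10000:
        t ^= 0b11001             # reduce with alpha^4 + alpha^3 + 1 = 0
    exp[i] = t
log = {v: i for i, v in enumerate(exp)}

r = [0, 1, 0, 1, 1, 1] + [0] * 9   # r(x) = x^5 + x^4 + x^3 + x, low degree first

def syndrome(i):
    acc = 0
    for deg, c in enumerate(r):
        if c:
            acc ^= exp[(i * deg) % 15]   # add (alpha^i)^deg in F_16
    return acc

for i in range(1, 7):
    S = syndrome(i)              # all six syndromes happen to be nonzero here
    print('S_%d = alpha^%d' % (i, log[S]))
# S_1 = alpha^3, S_2 = alpha^6, S_3 = alpha^6, S_4 = alpha^12,
# S_5 = alpha^5, S_6 = alpha^12, as in the example
```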
6.5 Problems
Problem 6.5.1 $g(x) = x^6 + x^3 + 1$ divides $x^9 - 1$ in $F_2[x]$.
1) Show this.
g(x) can be used as generator polynomial of a binary cyclic (9, k) code C, i.e. $C = \{\, i(x)g(x) \mid i(x) \in F_2[x],\ \deg(i(x)) < 3 \,\}$.
2) What is the dimension of C?
3) Determine a generator matrix of C.
4) Is $x^8 + x^6 + x^5 + x^3 + x^2 + 1$ a codeword of C?
5) What can you say about the minimum distance of C?
Problem 6.5.2 The polynomial $x^{15} - 1$ can be factored into irreducible polynomials over $F_2$ in the following way:

$$x^{15} - 1 = (x+1)(x^2+x+1)(x^4+x+1)(x^4+x^3+1)(x^4+x^3+x^2+x+1)$$

Let C be the binary cyclic code of length 15 that has generator polynomial $g(x) = (x+1)(x^4+x+1)$.
1) What is the dimension of C?
2) Is $x^{14} + x^{12} + x^8 + x^4 + x + 1$ a codeword in C?
3) Determine all cyclic binary (15, 8) codes.
4) How many cyclic binary codes of length 15 are there?
Problem 6.5.3 Let C be the cyclic code of length 15 that has generator polynomial $g(x) = x^4 + x + 1$.
1) Determine a parity check matrix of C.
2) Find the minimum distance of C.
3) What is C?
4) What is the dimension of $C^\perp$?
Problem 6.5.4 Let g(x) be the generator polynomial of a cyclic code of length n over $F_2$.
Show that $g(1) = 0$ if and only if all codewords have even weight.
Problem 6.5.5 Let C be a binary cyclic code of odd length n.
Show that if C contains a word of odd weight then it contains the all-1s word, i.e. (1, 1, 1, 1, ..., 1).
Problem 6.5.6 Let C be the cyclic (21, 12) code over $F_2$ that has generator polynomial $g(x) = m_\beta(x)m_{\beta^3}(x)$.
1) What can you say about the minimum distance of C?
2) Determine the generator polynomial of $C^\perp$.
3) What can you say about the minimum distance of $C^\perp$?
Problem 6.5.7 Let C be the cyclic code of length 63 over $F_2$ that has generator polynomial $(x+1)(x^6+x+1)(x^6+x^5+1)$.
What can you say about the minimum distance of C?
Problem 6.5.8
1) Determine a generator polynomial for a cyclic code of length 31 over $F_2$ that can correct 3 errors.
2) What is the dimension of the code you found?
3) Can you do better?
Problem 6.5.9 Let C be a binary (n, k) code. Recall that an encoding rule is called systematic if the information $(i_0, i_1, \ldots, i_{k-1})$ is encoded into a codeword $(c_0, c_1, \ldots, c_{n-1})$ that contains $(i_0, i_1, \ldots, i_{k-1})$ in k fixed positions.
A cyclic (n, k) code consists of all the words of the form i(x)g(x) where i(x) has degree < k and g(x) is the generator polynomial of the code.
1) Show that the encoding rule $i(x) \to i(x)g(x)$ is not systematic by looking at the (7, 4) code with generator polynomial $x^3 + x + 1$.
We now use the rule $i(x) \to x^{n-k}i(x) - (x^{n-k}i(x) \bmod g(x))$.
2) Show that $x^{n-k}i(x) - (x^{n-k}i(x) \bmod g(x))$ is a codeword in the cyclic code with generator g(x).
3) Prove that this is a systematic encoder.
4) Use the same code as above to encode 1011 using the systematic encoder.
Problem 6.5.10 Let C be a cyclic (n, k) code with generator polynomial g(x).
Let $h(x) = \frac{x^n - 1}{g(x)}$ and let a(x) be a polynomial of degree < k with $\gcd(h(x), a(x)) = 1$.
Let $C' = \{\, c(x) = i(x)a(x)g(x) \bmod (x^n - 1) \mid \deg(i(x)) < k \,\}$.
Show that $C' = C$.
Problem 6.5.11 $x^{23} - 1 = (x+1)m_\beta(x)m_{\beta^5}(x)$ where $m_\beta(x) = x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1$.
Let C be the (23, 12) code that has generator polynomial $m_\beta(x)$.
1) Show that C has minimum distance at least 5.
Let $C_{\text{ext}}$ be the (24, 12) code obtained by adding an overall parity check.
2) Show that the weights of all the codewords in $C_{\text{ext}}$ are divisible by 4.
3) Show that C has minimum distance 7.
4) Show that C is a perfect code.
This is the famous binary Golay code.
Problem 6.5.12 Let C be a cyclic (n, k) code whose generator polynomial g(x) is the product of the different minimal polynomials of the elements $\beta, \beta^2, \beta^3, \ldots, \beta^{2t}$, where $\beta \in F_{2^m}$ has order n.
We then know that $d_{\min}(C) \ge 2t + 1$.
Consider the case where n = a(2t + 1).
1) Show that $\beta, \beta^2, \beta^3, \ldots, \beta^{2t}$ are not zeroes of $x^a - 1$.
2) Show that $m_{\beta^s}(x)$, $s = 1, 2, \ldots, 2t$, does not divide $x^a - 1$.
3) Show that $\gcd(g(x), x^a - 1) = 1$.
4) Show that $p(x) = \frac{x^n - 1}{x^a - 1}$ is a binary polynomial.
5) Show that $g(x) \mid p(x)$.
6) What is the weight of p(x)?
7) Show that the minimum distance of C is 2t + 1.
Problem 6.5.13 We have

$$x^{31} - 1 = (x+1)(x^5+x^2+1)(x^5+x^3+1)(x^5+x^4+x^3+x^2+1)(x^5+x^3+x^2+x+1)(x^5+x^4+x^2+x+1)(x^5+x^4+x^3+x+1).$$

1) How many binary cyclic codes of length 31 are there?
2) Is there one of these with dimension 10?
Let C be the code with generator polynomial $g(x) = (x+1)(x^5+x^2+1)$.
3) Is $x^7 + x^5 + x^4 + x^2 + 1$ a codeword of C?
4) Is $x^7 + x^5 + x^4 + 1$ a codeword of C?
5) What can you say about the minimum distance of C?
Problem 6.5.14 Determine the generator polynomials and the dimensions of the cyclic codes of length 63 that correct 4, 5, 9, 10, 11 and 15 errors.
Problem 6.5.15 Show that there exist cyclic $(n, k, d_{\min})$ binary codes with the following parameters:
(21, 12, 5), (73, 45, 10), (127, 71, 19).
Problem 6.5.16 When we construct binary cyclic codes of length n we use minimal polynomials for powers of an element $\beta$ of order n. It is a natural question if the choice of this element is important. This problem treats the question in a special case.
Let n be an odd number and j a number such that $\gcd(n, j) = 1$.
1) Show that the mapping $\pi: \{0, 1, \ldots, n-1\} \to \{0, 1, \ldots, n-1\}$ defined by $\pi(x) = xj \bmod n$ is a permutation, i.e. a bijective mapping.
Let $C_1$ be the cyclic code that has $m_\beta(x)$ as generator polynomial, and let $C_2$ be the code that has $m_{\beta^j}(x)$ as generator polynomial.
2) Show that $(c_0, c_1, \ldots, c_{n-1}) \in C_1 \Leftrightarrow (c_{\pi(0)}, c_{\pi(1)}, \ldots, c_{\pi(n-1)}) \in C_2$.
3) What does that mean for the two codes?
Chapter 7
Frames
So far we have considered transmission of independent symbols and block codes. However, in order to assure correct handling and interpretation, the data is usually organized in files with a certain structure. In a communication system the receiver also needs structure in order to decode and further process the received sequence correctly. A general discussion of the architecture of communication systems is outside the scope of this text. However, we can capture some essential aspects of the discussion by assuming that the information is transmitted in frames of some fixed length. In some practical systems the frames are already part of the data structure, in other cases the data is segmented to facilitate the transmission.
It is often desirable to express the performance of the error-correcting code in terms of the quality of the frames. We define some commonly used parameters for frame quality and relate them to the error probability of the codes. If it were possible to construct good long block codes and decode them with the error probability indicated in Chapter 3, the required quality could clearly be reached. However, at this time there is no known practical method for achieving this performance. Thus we have to study coding methods that give a satisfactory quality using short codes or simple codes with suboptimal error-correcting properties. In many cases a combination of two or more coding steps is used, and the combined construction is adapted to the length of the frame.
7.1 Definitions of frames and their efficiency
Definition 7.1.1. A data frame is an entity of data that can be independently stored, communicated, and interpreted. It consists of a header, a data field, and a parity field.
We assume that the data field consists of a fixed number of source symbols, although some systems use frames of variable length identified in the header. When information is transmitted over a channel, it is encoded using a suitable error-correcting code. The transfer frame consists of a fixed number of channel symbols, which in general belong to a different alphabet.
Definition 7.1.2. A transfer frame is an entity of encoded information transmitted over the channel. It consists of a header (possibly separately encoded) and a body.
The headers identify the beginning of the frame, a function which is referred to as frame synchronization. We note that in general the two kinds of frames are different, and in some systems they are handled independently: they have separate headers, and they may have different lengths and boundaries.
The data frame header may contain information relevant to the interpretation of the contents, like a frame number, some identification of the contents, and possibly pointers for unpacking. In addition the transfer frame header contains information about the routing, including the receiver address. Thus it serves a function similar to the label on a physical parcel. In the present context we shall not discuss the security of the information, i.e. the cryptographic coding and signatures.
In the following discussion we usually refer to a binary channel, and we assume that the two stages of framing have been combined. Thus we shall make references to the frame header, but the structure will not be discussed in detail.
The data field consists of $K_f$ binary data symbols. It is in general protected by a parity field of $B_f$ bits, which are parity checks that are added to allow the user to verify the integrity of the frame. The data frame is then encoded using a channel code of rate R. The header may be encoded by the same code, or it may be treated differently to simplify the processing. We let $H_f$ indicate the total number of channel symbols used for transmitting the headers.
The addition of header information reduces the efficiency of the communication. Thus it may be relevant to measure this overhead by the following quantity:
Definition 7.1.3. The transmission efficiency, $\eta$, is the ratio of channel symbols representing the data and parity fields to the total number of transmitted channel symbols. The transmission overhead is $1 - \eta$.
It follows from the definition that we have

Lemma 7.1.1.
$$1 - \eta = \frac{H_f}{K_f + B_f + H_f}$$
As discussed in Chapter 4, we can achieve reliable communication of K information symbols by transmitting N channel symbols provided that $\frac{K}{N}$ is less than the channel capacity, C. However, for a given frame length we have to reduce the rate to get a desired reliability.
Definition 7.1.4. The information efficiency is the ratio $\mu = \frac{R}{C}$, and similarly the information loss is $1 - \frac{R}{C}$.
In terms of the parameters of the frame we have:

Lemma 7.1.2.
$$\mu = \frac{K_f}{\left( K_f + B_f \right) C}$$
A high efficiency can only be obtained with long frames. The transmission overhead $1 - \eta$ can be reduced by using a frame that is long compared to the header. We may assume the length of the header to be almost independent of the frame length, but various practical constraints limit the frame length. Since we assume that the frames are processed independently, the length of the frame limits the information efficiency. We would get the best reliability by encoding a frame using a good block code of length N, and when the frame length is given, there is a maximal value of $\mu$ that allows a desired performance to be reached.
In Example 7.1.1 we demonstrate that at least in principle a small probability of decoding failure is possible for a realistic frame size and a low transmission overhead.
Example 7.1.1. Frames on a BSC
If N = 8192 binary symbols and the error probability on the channel is $p = \frac{1}{64}$, we have on the average 128 errors. Assuming that the errors are independent, the probability of having j errors is given by (3.2) or (3.3). If we could use a code to correct 192 errors with bounded distance decoding, (3.4) gives

$$P_{\text{fail}} = 5 \cdot 10^{-8}.$$

From (4.3) we find the capacity of the BSC with p = 1/64 to be C = 0.884. As discussed in relation to Lemma 4.2.2, the Hamming bound is really based on the same calculation as the capacity. Since we want to correct a fraction of errors equal to $\frac{192}{8192} = \frac{3}{128}$, we need a code of rate

$$R \le 1 - H\left(\frac{3}{128}\right) = 0.84$$

It should be noted that the minimum distance of a code of length 8192 and rate 0.84 is probably only about 194 (by the Gilbert bound, Theorem 1.2.2). However, the following calculation shows that the error probability is not increased much by the low weight codewords. Using the approximation to the weight distribution (4.7) we find the expected number of codewords of weight 200 to be about $10^{12}$. The contribution to the error probability estimated by the union bound (3.7) is vanishing. This example shows that for realistic frame sizes we can have a fairly high information efficiency,

$$\mu = \frac{0.84}{0.884} = 0.95,$$

and still have a low probability of error. However, a block code with these parameters may not be practical.
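The numbers of the example can be reproduced as follows (our Python sketch, not part of the text; like the example, it uses the Poisson approximation for the tail probability):

```python
# Capacity, Hamming-bound rate, information efficiency and failure
# probability for the frame of Example 7.1.1.
from math import log2, log, exp, lgamma

N, p, t = 8192, 1 / 64, 192

H = lambda x: -x * log2(x) - (1 - x) * log2(1 - x)   # binary entropy

C = 1 - H(p)                 # capacity of the BSC: about 0.884
R = 1 - H(t / N)             # rate allowed by the Hamming bound: about 0.84
print(C, R, R / C)           # information efficiency R/C is about 0.95

# Probability of more than t errors (decoding failure for bounded distance
# decoding), using the Poisson approximation with mean N*p = 128.
lam = N * p
log_pmf = lambda j: -lam + j * log(lam) - lgamma(j + 1)
P_fail = sum(exp(log_pmf(j)) for j in range(t + 1, t + 400))
print(P_fail)                # about 5e-8, as stated in the example
```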
7.2 Frame quality
7.2.1 Measures of quality
In a communication network, the transmission of a frame can go wrong in a number of ways. The address may be corrupted, or the package may be lost or delayed. It is the purpose of the error-correction to ensure that the frame is delivered to the user without errors, or that transmission errors are at least detected. The performance can be characterized by two numbers:
Definition 7.2.1. The probability of undetected error, P(ue), is the probability that a frame is passed on to the user as error-free while in fact parts of the data are in error.
Definition 7.2.2. The probability of frame error, P(fe), is the probability that the receiver detects errors in the data, but is not able to correct them, or at least is not able to do so with sufficient reliability.
In some applications the receiver may discard unreliable frames (and the user may be able to have them retransmitted). In other cases it is preferable to get a frame even though it is slightly corrupted. In our discussion we shall assume that frames with detected errors are discarded, and we shall analyze the probability of this event, the probability of frame error. It should clearly be small enough to allow normal communication to take place, but it does not have to be extremely small ($10^{-5}$ may be a typical target value). On the other hand the probability of undetected error should be much smaller ($< 10^{-10}$, although for voice communication there is no such requirement).
7.2.2 Parity checks on frames
If there are few errors, it may be sufficient to include a system of parity checks in the frame structure. Often cyclic versions of Hamming codes or subcodes of Hamming codes are used for this purpose, and we shall discuss only this case.
Lemma 7.2.1. A cyclic binary code defined by the generator polynomial

$$g(x) = p(x)(x + 1)$$

where p(x) is a primitive polynomial of degree m has parameters

$$(n, k, d) = (2^m - 1,\ 2^m - m - 2,\ 4)$$
The maximum number of information symbols is $2^m - m - 2$, but the code can be shortened by leaving out leading information bits. The m + 1 parity bits are known as the frame check sequence or the CRC (cyclic redundancy check). Usually the frame check sequence is not used for error correction, but errors in the transmission are detected. If more than two errors occur, we may estimate the probability that the frame is accepted as the probability that a random sequence produces the zero frame check sequence, i.e.

$$P(\text{ue}) \approx P[t > 2] \cdot 2^{-m-1} \tag{7.1}$$

Thus the length of the sequence is chosen to give a sufficient frame length and an acceptable reliability.
Example 7.2.1. Standard CRCs
The following two polynomials are used for standard CRCs:

$$x^{16} + x^{12} + x^5 + 1 = (x+1)(x^{15} + x^{14} + x^{13} + x^{12} + x^4 + x^3 + x^2 + x + 1) \quad \text{CRC-ITU-T}$$

$$x^{16} + x^{15} + x^2 + 1 = (x+1)(x^{15} + x + 1) \quad \text{CRC-ANSI}$$

Both polynomials are products of x + 1 and irreducible polynomials of degree 15 and period 32767. Thus they generate distance 4 subcodes of Hamming codes, and all combinations of at most three errors in at most 32751 data bits are detected.
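As an illustration of how such a frame check sequence is computed, here is a Python sketch (ours, not from the book) of bitwise polynomial division with the CRC-ITU-T generator; the names are ad hoc, and no particular standard framing conventions (bit ordering, preset values) are implied.

```python
# Frame check sequence by polynomial division over F_2 with the
# CRC-ITU-T generator x^16 + x^12 + x^5 + 1.

POLY = 0x11021            # bits of x^16 + x^12 + x^5 + 1

def crc16(bits):
    """Remainder of bits(x) * x^16 divided by the generator. Appending
    these 16 bits to the data makes the whole frame divisible by it."""
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if reg & 0x10000:
            reg ^= POLY
    for _ in range(16):   # multiply by x^16 (flush the register)
        reg <<= 1
        if reg & 0x10000:
            reg ^= POLY
    return reg

data = [1, 0, 1, 1, 0, 0, 1, 0]
fcs = crc16(data)
fcs_bits = [(fcs >> i) & 1 for i in range(15, -1, -1)]
print(fcs, crc16(data + fcs_bits))   # second value 0: frame accepted

bad = data[:]
bad[3] ^= 1
print(crc16(bad + fcs_bits))         # nonzero: the error is detected
```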
7.3 Error detection and error correction
In this section we discuss the performance of different codes in relation to communication of frames. The discussion also serves as motivation for the various coding structures introduced in subsequent chapters.
7.3.1 Short block codes
For long frames and channels with a significant error probability, p, we cannot simply use a good block code and correct t = Np errors. There is no known method for solving such a decoding problem with a realistic amount of computation. Therefore we have to use codes and decoding methods that require less computation. A frame of length N may be broken into m codewords from an (n, k) code. Here we assume that k is chosen to divide the number of information symbols in the frame.
In Chapter 3 we presented bounds on the error probability for block codes. Assume that based on a bound or on simulations of the particular code we know the probability of decoding error, P(e).
Lemma 7.3.1. If the frame consists of $\frac{N}{n}$ blocks, and the noise is independent, the probability of a correct frame is

$$1 - P(\text{ue}) = \left( 1 - P(e) \right)^{N/n} \tag{7.2}$$
Thus we can get a very small probability of undetected errors only if the error rate on the channel is very small, and the system is not very efficient. In order to improve the performance we may restrict the number of errors corrected in each block. However, with a probability of decoding failure, P(f), we similarly get a bigger probability of frame error.
Lemma 7.3.2. If the frame consists of $\frac{N}{n}$ blocks, and the noise is independent, the probability of frame error is

$$P(\text{fe}) = 1 - \left( 1 - P(f) \right)^{N/n} \tag{7.3}$$
This may be the best approach if data is transmitted in short frames. If separate encoding of the header field is used, it may also be preferable to only correct a limited number of errors. However, for the situation of main interest here, it is more efficient to combine error correction with a frame check sequence as discussed in the previous section. By using 16 or 32 bits for this purpose it is possible to make the probability of undetected error very small, and we can still use the full power of the short block code for error correction. If there are several equally probable transmitted words for a particular received block, we may choose to leave it as received.
In the degraded frames some blocks are either not decoded or decoded to a wrong word. We may calculate (a close approximation to) the bit error probability by assuming that the number of bit errors is $\frac{d}{2}$ for a block that is not decoded and d for a wrong word. These bits are equally likely to hit the information symbols and the parity symbols in the code, so the number should be multiplied by the rate, R. Clearly it is an advantage in this situation that the encoding is systematic, since decoding might otherwise change more information bits. However, the output bit error probability is not necessarily a good measure of the quality of the received signal when the errors are concentrated in a few blocks. For some purposes the bits are going to be used as larger symbols, and a symbol error probability would be more relevant. If the error probability is used for predicting the performance of subsequent stages of processing, a more accurate description of the error statistics is also required (an assumption of independent errors might lead to wrong conclusions). Such a description might again be a symbol error probability or a combination of bit errors and symbol errors.
Example 7.3.1. Frame reliability of block codes
If a (64, 37) block code is used for the frames in Example 7.1.1, each frame contains 128 blocks. It follows from the Hamming bound, Theorem 1.3.1, that in most cases it is possible to correct 6 errors. Assuming bounded distance decoding with six errors, we find the probability of decoding failure from (3.4):

$$P_{\text{fail}} = 8 \cdot 10^{-5}$$

Thus the average number of blocks decoded to a wrong codeword is 0.01 in each frame. If a decoding error causes 13 bits to be in error, the average bit error probability would be

$$P_{\text{bit}} = 13 \cdot \frac{0.01}{N} = 1.6 \cdot 10^{-5},$$

but that is not a satisfactory level.
We could correct fewer errors in each block and reduce the probability of undetected error. There is a (64, 36) code with minimum distance 12, and we might do minimum distance decoding correcting five errors. However, in this case the probability of decoding failure may be found from (3.4) to be

$$P_{\text{fail}} = 5 \cdot 10^{-4}$$

and Lemma 7.3.2 gives P(fe) = 0.06, which is clearly too high. If we use maximum likelihood decoding, but apply a 16 bit CRC, the probability of accepting a frame with a false decoding is reduced by a factor of

$$2^{-16} = 1.5 \cdot 10^{-5}.$$

Lemma 7.3.1 gives P(ue) = 0.01, but with the CRC check this is reduced to

$$P(\text{ue}) \approx 1.5 \cdot 10^{-7}$$

while the probability of frame error is

$$P(\text{fe}) \approx 10^{-2}.$$

The example shows that with a short block code supplemented with a CRC check we can get an acceptable performance, but the rate, $\frac{37}{64} = 0.58$, is not close to capacity.
The decoder must somehow know how to segment the received sequence into blocks in the correct way. In the setting we consider here, the obvious solution to the problem of block synchronization is to partition the frame starting from the sync pattern. However, block codes, and cyclic codes in particular, are quite vulnerable to shifts of the symbol sequence within the frame. In addition all linear codes have the undesirable feature that the all 0s word appears. For this reason a fixed coset of such a code is sometimes used rather than the original code. One such method is to add a pseudo random sequence at the transmitter and the receiver. If the sequence is not a codeword, it may serve to shift the code to a suitable coset.
7.3.2 Convolutional codes
Another approach to error correction in long frames is convolutional codes. Here parity checks are inserted in the transmitted stream following a simple periodic pattern. The encoded string consists of blocks of length n, each representing k information symbols, but (n, k) are usually very small integers, and each information symbol is used in the encoding of M + 1 blocks (M being referred to as the memory of the code).
The theory of convolutional codes and their decoding is developed in Chapters 8 and 9. We shall adopt the point of view that a convolutional code is a generic structure that allows the code to be adapted to the frame length in question. Once the frame length and the method of ending the encoding are specified, we have a long block code. The encoding gives a minimum distance, which is independent of the block length, and thus the error probability is similar to that of a short block code. Although a convolutional code is encoded as one long block, a decoding error changes only a limited segment.
7.3.3 Reed-Solomon codes
When coding is used on a channel with a significant bit error probability, most errors are corrected by a suitably chosen short block code or convolutional code. However, as mentioned earlier, the probability of decoding error may not be acceptable. The probability of undetected errors can be reduced by having a frame check sequence as part of the data.
However, Reed-Solomon codes provide a much more powerful alternative for the parity field in the data frame. The data (which often consists of bytes or 16-32 bit words) is first encoded using a Reed-Solomon code over a big field, most often the field of 8-bit symbols, $F_{2^8}$. Such a code allows several errors to be corrected, often 8 or 16, but at the same time the probability of undetected error is low.
The Reed-Solomon codes are used alone on channels with relatively few errors, but they can also be combined with a binary channel code. Such a two-stage coding system is called a concatenated code, and these codes are the subject of Chapter 10.
7.3.4 Low density codes and turbo codes
In some sense the advantage of all suboptimal code structures is that some of the error correction can be performed on the basis of local parity checks involving a small set of symbols. A more direct approach is to select a parity check matrix which is very sparse, i.e. each row contains only a relatively small number of 1s. Such codes, called low-density parity check codes, and their decoding are treated in Chapter 13.
So-called turbo codes are a related idea based on iteration between two very simple convolutional codes. The data frame is first encoded using one encoder, and then the data are permuted and encoded again. In the decoding the estimated reliabilities of the information symbols are exchanged between the two decoders, and the decoding is iterated until they agree. This topic is also treated in Chapter 13.
7.4 Problems
Problem 7.4.1 Consider frames consisting of 32 words of 16 bits. The first word is the frame header, which combines header information for both the data frame and the transmission frame. Thirty words are used to transmit 11 data bits each, protected by the (16, 11, 4) extended Hamming code. The last word is a parity check on the frame excluding the header (the mod 2 sum of the data words).
1) Find the transmission efficiency of the frame.
2) The frame is transmitted over a BSC with bit error probability p = 0.005. Find the information efficiency.
3) Prove that the last word is also a word in the extended Hamming code.
4) With the given bit error probability, what is the probability that a word
a) Is correct?
b) Contains one error?
c) Contains two errors or another even number, but is not a codeword?
d) Contains an odd number of errors > 1?
e) Is a codeword different from the one sent?
5) The receiver uses the Hamming code to correct an error and detect double errors. If a double error is detected, the parity word at the end of the frame is used to correct it. What is the probability that
a) The frame is detected in error (parity word not satisfied) or there are two or more double errors?
b) The frame is accepted with errors?
Problem 7.4.2 Frames are transmitted on a BSC with $p = \frac{1}{128}$. The length of the frame is 4096 bits excluding the unencoded header of the transmission frame.
1) What is the capacity of the channel?
2) Is it possible, in principle, to communicate reliably over the channel using codes of rate
a) $\frac{14}{15}$?
b) $\frac{15}{16}$?
3) Assuming a code rate of $R = \frac{3}{4}$,
a) How many information bits are transmitted?
b) What is the information efficiency?
4) If a block code of length 256 is used,
a) What is the average number of bit errors in a block?
b) How many errors, T, can be corrected (assuming that this number is found from the Hamming bound)?
5) If T errors cause a decoding failure and T + 1 errors a decoding error, what are the probabilities of these events? The probability distribution for the number of errors is the binomial distribution, but it may be approximated by the Poisson distribution.
6) If only this code is used, what are the probabilities of frame error and undetected error?
7) A 16 bit CRC is used to reduce the probability of undetected error.
a) Is the sequence long enough?
b) How much is the information efficiency reduced?
Chapter 8
Convolutional codes
While most results in coding theory are related to block codes, more complex structures are found in many applications. In this chapter we present the definitions and basic properties of some types of codes, which are collectively referred to as convolutional codes. The common characteristic of these codes is that the block length is variable; often it is matched to the frame length, while the code is defined by a local encoding rule.
Even though convolutional codes and block codes have been studied in parallel for several decades, many aspects of convolutional codes are still not well understood, and there are several approaches to the theory. Thus the present chapter does not follow any established standard for terminology and notation.
8.1 Parameters of convolutional codes
Throughout this chapter N denotes the length of the encoded frame and R the rate of the convolutional code. All codes considered in this chapter are binary.
The essential difference between convolutional codes and block codes is that the frame length is variable and does not directly enter into the definition of the code. The following sequence of definitions will lead to the definition of a convolutional code as a family of block codes with the same nonzero segment in the rows of the generator matrix.
Let $R = \frac{k}{n}$ where k and n are integers, and let N be a multiple of n. Most of the examples have $R = \frac{1}{2}$, since this case is particularly important and in several ways easier to analyze than other rates.
Definition 8.1.1. A frame input, u, is a binary vector of RN information symbols. An information vector, v(u), is a binary vector of length N, where

$$v_j = \begin{cases} u_i & \text{for } j = \left\lfloor \frac{i}{R} \right\rfloor \\ 0 & \text{otherwise} \end{cases}$$
Thus the input bits are distributed evenly through the frame. We refer to these positions
as information positions even though the encoding is not necessarily systematic.
Definition 8.1.2. A convolutional encoding of the information vector v(u) by a binary generating vector, g, where $g_0 = g_m = 1$, maps v(u) to the encoded N-vector

$$y_j = \sum_{i=0}^{m} g_i v_{j-i} \tag{8.1}$$

where the sum is modulo 2, N > m, and the indices are interpreted modulo N.
We refer to the composition in (8.1) as a cyclic convolution of v and g, and the term convolutional code refers to this form of the encoding. It is simple to perform the encoding as a non-cyclic convolution, and we can make this modification by inserting mR 0s at the end of the frame input. The reduction of the information efficiency is small when N is much larger than m. In real frames, a known synchronization pattern in the header may serve the same function.
The rate is often a ratio between very small integers, $R = \frac{k}{n}$. In this case the encoded sequence can be segmented into blocks of length n, each containing k information symbols and n − k parity symbols. When we write sequences of such blocks, they are separated by periods in order to simplify the reading.
The encoding rule has been chosen in such a way that a particular information symbol only affects a short segment of the encoded frame. Later we shall see that an error similarly only affects a short segment of the syndrome.
Example 8.1.1. A rate $\frac{1}{2}$ convolutional code
A code that is often used in examples has $R = \frac{1}{2}$ and g = (11.10.11). A codeword consists of blocks each of two symbols, where the first can be chosen as an information symbol and the second as a parity bit. For N = 16 the frame input u = (11001000) is first converted into the information vector v = (10.10.00.00.10.00.00.00) and then convolved with g to give y = (11.01.01.11.11.10.11.00).
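The encoding of the example is reproduced by the following Python sketch (ours, not part of the text) of the cyclic convolution (8.1):

```python
# Cyclic (tail-biting) convolutional encoding for the rate 1/2 code with
# g = (11.10.11), reproducing Example 8.1.1.

N = 16
g = [1, 1, 1, 0, 1, 1]
u = [1, 1, 0, 0, 1, 0, 0, 0]            # frame input, RN = 8 bits

# spread the input bits: v_j = u_i for j = 2i (R = 1/2)
v = [0] * N
for i, bit in enumerate(u):
    v[2 * i] = bit

# cyclic convolution y_j = sum_i g_i v_{j-i} (indices mod N, sum mod 2)
y = [sum(g[i] * v[(j - i) % N] for i in range(len(g))) % 2 for j in range(N)]

blocks = ['%d%d' % (y[j], y[j + 1]) for j in range(0, N, 2)]
print('.'.join(blocks))   # 11.01.01.11.11.10.11.00 as in the example
```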
Lemma 8.1.1. For a given generating vector, g, the convolutional encoding defines a linear (N, K) block code with dimension $K \le RN$.
Proof. The linearity follows from (8.1), and the dimension is at most equal to the number of symbols in u.
It is easy to find examples of generating vectors such that K < RN for some N. However, the discussion below will show that such codes are usually not interesting. The following definition serves to eliminate this case.
Definition 8.1.3. An encoding with generating sequence, g, and rate, R, is non-catastrophic if all block codes obtained from the convolutional encoding (8.1) with sufficiently large N have dimension K = RN.
The encoding of a convolutional code may be expressed in matrix form. The matrix
that corresponds to the encoding rule has a particular form since only a narrow band of
symbols can be nonzero.
Definition 8.1.4. The K by N generator matrix for a convolutional code, G(N), is defined by

$$y = uG(N)$$

where y is the encoded vector defined by (8.1).
Example 8.1.2. (Example 8.1.1 continued)
For the code with $R = \frac{1}{2}$ and g = (11.10.11), the generator matrix G(16) is obtained in the following way: In the first row of the generator matrix, g is followed by N − 6 zeros. Row 2 and the following rows are obtained as cyclic shifts of the previous row by two positions.
$$G = \begin{pmatrix}
1&1&1&0&1&1&0&0&0&0&0&0&0&0&0&0 \\
0&0&1&1&1&0&1&1&0&0&0&0&0&0&0&0 \\
0&0&0&0&1&1&1&0&1&1&0&0&0&0&0&0 \\
0&0&0&0&0&0&1&1&1&0&1&1&0&0&0&0 \\
0&0&0&0&0&0&0&0&1&1&1&0&1&1&0&0 \\
0&0&0&0&0&0&0&0&0&0&1&1&1&0&1&1 \\
1&1&0&0&0&0&0&0&0&0&0&0&1&1&1&0 \\
1&0&1&1&0&0&0&0&0&0&0&0&0&0&1&1
\end{pmatrix}$$
Lemma 8.1.2. A generating sequence is non-catastrophic if and only if there is a length L such that an encoded sequence can contain at most L consecutive 0s when the information vector has at least one 1 for every m bits.
Proof. If there could be an arbitrarily long sequence of 0s, the input could be periodic with some period N'. But this would imply that a nonzero vector was encoded as the zero word by the length N' code, and thus the rank would be less than RN'. If there is a length N' code with dimension less than RN', we can find a nonzero input that is encoded as the zero vector. By repeating this input, we can get an arbitrarily long zero encoded sequence in a long code.
The proof indicates that a catastrophic encoding would have the undesirable property that a periodic nonzero input would produce only a finite nonzero encoded segment in the output sequence. This also implies that two inputs could differ in arbitrarily many positions while the outputs were the same except for a finite number of symbols. For long frames such an encoding would tend to give many errors in the decoded information symbols (catastrophic error-propagation).
Definition 8.1.5. The integer $M = R(m + 1) - 1$ is called the memory of the code.
The parameters of a convolutional code are given as (n, k, M). If $R = \frac{k}{n}$ and the length of g, m + 1, is (M + 1)n, the encoder stores kM input symbols. More generally we have
Lemma 8.1.3. Every encoded symbol depends on at most the previous M + 1 information symbols (with the indices taken modulo RN).
Proof. In (8.1) $y_j$ depends on m + 1 consecutive symbols of v. If the first of these is the information symbol $u_i = v_j$, $j = \left\lfloor \frac{i}{R} \right\rfloor$, then $u_{M+i+1}$ is the symbol $v_{j'}$,
$$j' = \left\lfloor \frac{i + M + 1}{R} \right\rfloor \ge \left\lfloor \frac{i}{R} \right\rfloor + \left\lfloor \frac{M + 1}{R} \right\rfloor \ge j + m + 1.$$

(When A and B are integers and R < 1, $A \ge BR$ implies $\left\lfloor \frac{A}{R} \right\rfloor \ge B$.) Thus it is outside the range of the summation, which can therefore include at most M + 1 information symbols.
Example 8.1.3. (Example 8.1.1 continued)
The code considered earlier has parameters (2, 1, 2). The encoding in the example is readily seen to be non-catastrophic, since a nonzero input cannot produce two consecutive zero output blocks. The memory of the code is 2. The (16, 8) block code generated by this matrix has minimum distance 4, since (10.00.10.00.10.00.10.00) is a codeword. All longer block codes generated by g have minimum distance 5.
The encoding rule (8.1) is slightly too restrictive in the general case. The sequence g should be defined as a periodic function and allowed to alternate between k generating vectors. Thus there would be k different types of rows in the generator matrix. Such codes will be further discussed in Sections 8.3 and 8.5. Note that we do not require k and n to be mutually prime, and we can introduce a longer period by multiplying k and n by the same factor. Thus a (4, 2) code indicates that the rows in the generator matrix of a rate $\frac{1}{2}$ code alternate between two different generators.
As discussed in this section, the concept of convolutional codes is closely related to a particular form of the encoding. We may choose to define the code as such in the following (non-standard) way:
Definition 8.1.6. An (n, k, M) convolutional code is the sequence of block codes of length N = jn obtained by the encoding rule of (8.1).
8.2 Tail-biting codes
In our definition of convolutional codes we have used cyclic convolution as the standard encoding rule. When we want to emphasize that a code is defined in this way, in particular if a relatively short frame is considered, we refer to such a code as a tail-biting code. This concept provides a particularly smooth transition from block codes to convolutional codes. Short tail-biting codes are useful as alternative descriptions of some block codes, and they have the same properties as block codes in general, whereas a long tail-biting code is one of several equivalent ways of defining a convolutional code.
Many good codes with small block lengths have generator matrices of this form. An equivalent form of the generator matrix for a rate 1/2 code can be obtained by collecting the even numbered columns into a circulant matrix and the odd numbered columns into a matrix with the same structure. Such codes are sometimes called quasi-cyclic.
Example 8.2.1. A short tail-biting code
A (16, 8, 5) block code may be described as a rate 1/2 tail-biting code with generator sequence g = (11.11.00.10). This code has the largest possible minimum distance for a linear (16, 8) code, although there exists a non-linear (16, 8, 6) code. Longer codes with the same generator also have minimum distance 5.
Example 8.2.2. The form of the generator matrix for tail-biting codes suggests a relation to cyclic codes.
A (15, 5) cyclic code with minimum distance 7 is generated by

$$g(x) = x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1.$$
In this case a generator matrix may be written taking shifts of the generator by three positions at a time. It can be proved (as discussed in Problem 8.8.5) that longer tail-biting codes generated by the same generator also have minimum distance 7. Unfortunately this construction (and other constructions derived from good block codes) does not produce the best convolutional codes.
8.3 Parity checks and dual codes
Parity check matrices are of great importance in the analysis of block codes. In this section we introduce a parallel concept for convolutional codes, although at this point it is not as widely used. For a given tail-biting code and a fixed length N, we can find a parity check matrix by solving a system of linear equations as discussed in Chapter 1. However, it is a natural requirement that H should have a structure similar to that of G, and in particular the nonzero part of the rows should be independent of N. With this constraint the problem becomes more difficult, and in fact it may not be possible to get a parity check matrix that is quite as simple as G.
For non-catastrophic codes of rate R = 1/2, the parity check matrix has the same form as G, and the nonzero sequence h is obtained by simply reversing g. The orthogonality of G and H follows from the fact that each term in the inner product appears twice.
Example 8.3.1. (Example 8.1.2 continued)
The parity check matrix has rows that are shifts of h = (11.01.11). Thus by letting the first row represent the first parity check involving the first block, H(16) becomes

$$H = \begin{pmatrix}
1&1&0&0&0&0&0&0&0&0&0&0&1&1&0&1\\
0&1&1&1&0&0&0&0&0&0&0&0&0&0&1&1\\
1&1&0&1&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&1&1&0&1&1&1&0&0&0&0&0&0&0&0\\
0&0&0&0&1&1&0&1&1&1&0&0&0&0&0&0\\
0&0&0&0&0&0&1&1&0&1&1&1&0&0&0&0\\
0&0&0&0&0&0&0&0&1&1&0&1&1&1&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&0&1&1&1
\end{pmatrix}$$

The parity symbol in position j is associated with a row of H that has its last nonzero entry in position j. Thus the parity symbol is a function of previously received symbols.
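The orthogonality of G and H can be verified mechanically. A small sketch (Python; continuing the conventions of the earlier sketch, with the rows of both matrices taken as cyclic shifts by n positions, and the alignment of h chosen to reproduce the first row of H(16) above):

```python
g = [1, 1, 1, 0, 1, 1]   # g = (11.10.11)
h = [1, 1, 0, 1, 1, 1]   # h = (11.01.11), g reversed
n, N = 2, 16

def shifts(v, offset):
    """Rows placed at cyclic shifts of n positions, starting at `offset`."""
    rows = []
    for i in range(N // n):
        row = [0] * N
        for j, bit in enumerate(v):
            row[(n * i + j + offset) % N] ^= bit
        rows.append(row)
    return rows

G = shifts(g, 0)
H = shifts(h, -4)   # offset -4 puts the last nonzero entry of row 1 in block 1

ok = all(sum(a & b for a, b in zip(gr, hr)) % 2 == 0 for gr in G for hr in H)
print("every row of G is orthogonal to every row of H:", ok)
```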
For codes of other rates we can always find the parity check sequence, h, by solving a system of linear equations. If we look again at the row of H that has its last nonzero entry in position j, the last row of G that we have to consider is the one that has its first nonzero symbols in the same block. If M is the memory of the code, the previous M rows may also have nonzero symbols in the same block, and h has to be orthogonal to these rows. It is not clear offhand how long the vector h must be, but whenever we extend it by a block of n symbols, we have to consider another block of G, which implies k < n rows. Thus we eventually have more variables than equations, and we can choose h as the shortest nonzero vector that is orthogonal to the relevant segment of G.
If we use this approach for a non-catastrophic code with rate (n−1)/n, we get a parity check matrix consisting of cyclic shifts of a single row. However, in general there are n − k different equations producing different vectors. The calculations get more complicated if there are multiple solutions, or if different systems lead to parity vectors of different length. We choose to restrict our analysis to the case where there is a unique solution of the following type:
Definition 8.3.1. A rate R convolutional encoder is called regular if the generator matrix has nonzero coefficients g_{i,j} in row i only for

$$\left\lceil \frac{i}{R} \right\rceil \le j \le \left\lceil \frac{i+M+1}{R} \right\rceil - 1.$$

Similarly the coefficients of the parity check matrix are nonzero only for

$$\left\lceil \frac{i}{1-R} \right\rceil \le j \le \left\lceil \frac{i+M+1}{1-R} \right\rceil - 1,$$

and the nonzero part of the i-th row of each matrix is the unique nonzero vector orthogonal to the corresponding segment of the other matrix.

Thus the nonzero part of a row of H is h = (h_{m′}, h_{m′−1}, . . . , h_0) where M + 1 = ⌈(1 − R)(m′ + 1)⌉.
Example 8.3.2. Dual regular codes
The concept of dual codes will be illustrated by a pair of codes with rates 1/3 and 2/3. The generator sequence for the first code is

g = (111.101.011.110)

and the memory is M = 3 (even though Definition 8.3.1 appears to allow the last coefficient of g to be number 11, the conditions for the dual code cannot be satisfied in that case). The nonzero part of h has its last coefficient in one of the last two positions in a block. In both cases a sequence of length 5 has to satisfy only four linear equations, and thus there are nonzero solutions. A segment of the H matrix becomes

$$H = \begin{pmatrix}
\ddots & & & & & \\
000 & 001 & 010 & 110 & 000 & 000 \\
000 & 000 & 110 & 101 & 000 & 000 \\
 & 000 & 001 & 010 & 110 & 000 \\
 & 000 & 000 & 110 & 101 & 000 \\
 & & & & & \ddots
\end{pmatrix}$$

Similarly four blocks of H provide 11 equations with g as the only nonzero solution. Thus we have m′ = 5 and M = 3.
Definition 8.3.2. A syndrome for a received sequence r is defined as

$$s_j = \sum_{i=0}^{m'} r_{j-i} h_i \qquad (8.2)$$

where s_j = 0 if r is a codeword.
Lemma 8.3.1. A single error affects at most M +1 syndrome bits.
We may use H as the generator matrix for a dual rate 1/n code, but it is preferable to define the dual convolutional code by the matrix that has the order of the blocks reversed (a similar convention is used for cyclic block codes as discussed in Section 6.2). It follows from the definition of regular codes that the dual pair has the same memory.
Lemma 8.3.2. A pair of dual regular codes have the same memory.
8.4 Distances of convolutional codes
For block codes the Hamming distance between codewords gives important information about the error-correcting properties. In particular the minimum distance is important. We present a similar parameter for convolutional codes.

If we eliminate the last M bits of the frame input u (or force them to be zero), the encoding (8.1) becomes a regular non-cyclic convolution. This way of adapting the convolutional encoding to a particular frame length is called termination.

Definition 8.4.1. A terminated (n, k, M) convolutional code is the sequence of block codes of length N = jn obtained by the encoding rule (8.1) with the last M input bits set to zero.
Thus a terminated code has rate less than R, but for a long frame the difference is small. However, in the analysis of convolutional codes it is also useful to consider short terminated codes. If we consider j consecutive rows of the generator matrix and the segment of the encoded sequence where these rows are not all zero, the length N′ of the sequence satisfies j = ⌈R(N′ − m)⌉. The set of encoded sequences is a linear (N′, j) code called the j-th terminated code.
Definition 8.4.2. The j-th row distance of the convolutional code is the minimum weight of a nonzero vector spanned by j consecutive rows.

Thus the j-th row distance is the minimum distance of the j-th terminated code. Since each terminated code contains all shorter codes, we have

Lemma 8.4.1. The row distance is a decreasing function of j.
Definition 8.4.3. The free distance, d_f, is the minimum value of the row distance.

In most codes of interest the row distance is constant from j = 1, or it is reached with a very small value of j. The free distance d_f is also the minimum distance of the convolutional code as defined in Definition 8.1.2 and Lemma 8.1.1 for any sufficiently large N. The free distance is the most important measure of the error-correcting capability of a convolutional code.
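For small codes the row distances, and hence the free distance, can be found by exhaustive search over the span of j consecutive rows of the generator matrix (no wrap-around, as in a terminated code). A sketch for the code of Example 8.1.1, with g = (11.10.11) assumed:

```python
from itertools import product

g = [1, 1, 1, 0, 1, 1]   # g = (11.10.11)
n = 2

def row_distance(j):
    """Minimum weight of a nonzero combination of j consecutive rows."""
    length = n * (j - 1) + len(g)        # support of the j shifted rows
    best = len(g) * j                    # any weight is at most this
    for u in product([0, 1], repeat=j):
        if not any(u):
            continue
        word = [0] * length
        for i, ui in enumerate(u):
            if ui:
                for t, bit in enumerate(g):
                    word[n * i + t] ^= bit
        best = min(best, sum(word))
    return best

print([row_distance(j) for j in range(1, 8)])
# The free distance is the limiting value of this non-increasing sequence.
```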
Lemma 8.4.2. For a given non-catastrophic convolutional code, all tail-biting codes of length greater than some constant N′ have minimum distance d_f.
Proof. Since the long tail-biting codes contain all shorter terminated codes, they have words of weight d_f. A codeword that is not contained in a terminated code is produced by a nonzero input in the full length of the code. However, for a non-catastrophic code such a codeword cannot have arbitrarily long zero segments, and thus the weight must increase with the length of the code. For sufficiently long codes, the weights of these codewords must exceed d_f.
Theorem 8.4.1. A convolutional code with free distance d_f can correct any error pattern of weight less than d_f/2 if N is sufficiently large.
Proof. The theorem follows from Lemma 8.4.1 and a similar result for block codes.
Lemma 8.4.1 provides a link between the free distance of convolutional codes and the minimum distance of tail-biting codes. However, the following result is more useful for predicting possible values of d_f.
Theorem 8.4.2. The free distance of a convolutional code is upper bounded by the minimum distance of the best (N′, j) block code, where j = R(N′ − m).

Proof. The terminated code is a block code with these parameters, and its minimum distance, the row distance, is an upper bound on the free distance of the convolutional code.
Since the largest value of the minimum distance of any block code with given param-
eters is known for a large number of cases, it is possible to obtain close bounds on the
free distance. In spite of this important connection between the minimum distances,
the result does not suggest any constructions.
Example 8.4.1. We may obtain an upper bound on the distance of a (3, 1) code with M = 3 by considering the parameters of block codes obtained by termination: (11, 1), (14, 2), (17, 3), . . . . It follows that the upper bound is 9. For rate 2/3 we should compare with block codes (7, 2), (10, 4), (13, 6), . . . , and the upper bound is 4.
In this section we have shown that convolutional codes have a parameter, the free distance, which is closely related to minimum distances of block codes. However, as N increases, this distance is fixed. Thus while it is useful to know that even in short segments, all error patterns of weight less than d_f/2 can be corrected, we need to know how the number of errors can be increased for long segments. We return to this problem in Chapter 9.
8.5 Punctured codes
In Problem 1.5.10 we have considered two methods for decreasing the length of a block code by one symbol. We give the definition here to emphasize the terminology.

Definition 8.5.1. A code is shortened when an information symbol is first forced to be zero and then eliminated. A code is punctured when a parity symbol is eliminated.
Thus the rate is decreased by shortening and increased by puncturing the code. The
opposite operations are called lengthening and extending. We have
Lemma 8.5.1. The dual of a shortened code is the punctured dual code.
Proof. The dual of the original code has a parity symbol where the original code has an
information symbol. If this symbol is forced to be zero, the remaining positions satisfy
all the parity checks, and the position can be punctured.
These concepts may be applied to convolutional codes, but usually symbols are punctured in a periodic pattern through the entire frame or a predefined part of the frame. The main advantage of puncturing is that the rate of the code can be increased while the structure of the decoder is unchanged. In particular the same decoder can be used. We return to this aspect in Chapter 9.
Most of the high-rate codes that are used in current systems are obtained by puncturing codes of rate 1/2. Puncturing may be used to change between different rates depending on the bit error probability, or the technique may be used to vary the number of correctable errors so that some information symbols are given additional protection. The definition of general (n, k, M) codes in Section 8.1 was partially motivated by the structure of punctured codes.
Example 8.5.1. Starting from the rate 1/2 code in Example 8.1.1, we can puncture every other parity symbol to get a rate 2/3 code with generators g′ = (1101) and g″ = (11111). We still have M = 2. In order to find the generator of the dual code we can solve a system of linear equations or we can shorten the dual code by eliminating the same symbols. Before we can eliminate the symbols from the dual code, we must find a linear combination of rows such that all the relevant symbols are zero. In this case it is simply the sum of three consecutive shifts of h, h′ = (111.011.110). However when this generator is interpreted as generating a code of rate 1/3, the memory is still 2.
8.6 Linear systems as encoders
Convolutional codes have been characterized by the parameters (n, k, M). The linear encoding rule can be expressed in matrix form by the convolution

$$y_j = \sum_{i \ge 0} G_i u_{j-i} \qquad (8.3)$$

Here the input blocks, u, have length k, the output blocks, y, length n, and the matrices, G_i, are k by n. Note that for k = 1, (8.3) coincides with (8.1), but if k is not very small, this is a much wider definition of encoding rules than (8.1).

The encoding equation suggests using methods from linear systems or power series, and in particular the technique of generating functions (Z-transforms). This allows us to express the code by a polynomial generating function

$$y(D) = u(D)G(D) \qquad (8.4)$$

The indeterminate, D, is used in the literature, and it may be thought of as an element in an extension field of the symbol field (as Z is complex in usual linear systems).
The D notation allows us to manipulate sequences in a more compact notation, but the most important difference is that we can write infinite (eventually periodic) sequences as rational functions. The notation is used to study questions relating to encoders, notably equivalences between encoders and duality of encoders. Since we assumed a finite memory in (8.1), the entries in G(D) will be polynomials of degree at most m. However by suitable scalings and row operations we can obtain a generator matrix in systematic form, i.e. a form containing a k by k identity matrix while the remaining entries may be rational functions in D. Such systematic encoders are sometimes preferred.
Example 8.6.1. (Example 8.1.1 continued)
We may write the generator and parity check matrices as

$$G(D) = (1 + D + D^2,\ 1 + D^2) \qquad H(D) = (1 + D^2,\ 1 + D + D^2)$$

Note that the polynomial vectors are orthogonal, but to convert them to binary vectors, the symbol D in H must be interpreted as a left shift of one block.
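Over GF(2) the polynomial arithmetic is easy to mechanize by representing a polynomial in D as an integer bit mask (an encoding choice for this sketch, not the book's notation). The orthogonality of G(D) and H(D) above can then be checked directly:

```python
def pmul(a, b):
    """Carry-less product of two GF(2) polynomials given as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# Bit i of a mask is the coefficient of D^i.
G = [0b111, 0b101]   # (1 + D + D^2, 1 + D^2)
H = [0b101, 0b111]   # (1 + D^2, 1 + D + D^2)

inner = pmul(G[0], H[0]) ^ pmul(G[1], H[1])
print("G(D) . H(D)^T =", inner)   # 0 confirms the vectors are orthogonal
```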
Example 8.6.2. (Example 8.3.2 continued)
The generator matrix for a rate 1/3 code may be written in transform notation as

$$G(D) = (1 + D + D^3,\ 1 + D^2 + D^3,\ 1 + D + D^2)$$

Similarly we may take the transform of the parity check matrix, but in order to get the same result of polynomial multiplication and scalar products of vectors, we need to use negative powers of D or reverse the order of the blocks:

$$H(D) = \begin{pmatrix} 1 & 1 + D & D^2 \\ 1 + D & D & 1 \end{pmatrix}$$
The encoding rules for convolutional codes are often non-systematic. The generator
matrix for any particular tail-biting code could be converted to systematic form, but the
structure would be changed. The D notation allows us to reduce the generator matrix
to systematic form if rational functions are allowed for the parity symbols.
Example 8.6.3. (Example 8.3.2 continued)
The generator matrix

$$G(D) = (1 + D + D^3,\ 1 + D^2 + D^3,\ 1 + D + D^2)$$

can be replaced by the systematic encoding matrix

$$G_s(D) = \left(1,\ \frac{1 + D^2 + D^3}{1 + D + D^3},\ \frac{1 + D + D^2}{1 + D + D^3}\right)$$
If H is seen as the generator matrix for the dual rate 2/3 code, we can similarly modify the encoding to make it systematic:

$$H_s(D) = \begin{pmatrix} 1 & 0 & \dfrac{1+D+D^3}{1+D+D^2} \\ 0 & 1 & \dfrac{1+D^2+D^3}{1+D+D^2} \end{pmatrix}$$
The transform notation is particularly convenient for (n, 1) codes where we can write the generator matrix as

$$G(D) = (G_1(D), G_2(D), \ldots, G_n(D))$$
We have the following result.

Theorem 8.6.1. An (n, 1) code is non-catastrophic if and only if the G_i do not have a nontrivial common factor.

Proof. An ultimately periodic input may be written as a rational function, and the output is a polynomial (and thus finite) if and only if the denominator divides all G_i.
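Theorem 8.6.1 reduces the non-catastrophic property to a gcd computation over GF(2). A sketch in the same bit-mask representation as the sketch above, applied to the generators of Example 8.1.1:

```python
def pmod(a, b):
    """Remainder of GF(2) polynomial division (bit-mask representation)."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

g0, g1 = 0b111, 0b101           # 1 + D + D^2 and 1 + D^2
d = pgcd(g0, g1)
print("gcd bit mask:", bin(d))  # a gcd of 1 means non-catastrophic
```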
8.7 Unit memory codes
Some of the algebraic methods developed for block codes can be used to describe convolutional codes with larger block size.
A unit memory code (UMC) is generated by an encoding rule of the type

$$y_j = u_j G_0 + u_{j-1} G_1 \qquad (8.5)$$
Here G_0 is a generator matrix for an (n, k) block code and G_1 is a k by n matrix with rank m ≤ k. When m < k, it is convenient to write the generator matrices as

$$G = [G_0, G_1] = \begin{pmatrix} G_{00} & 0 \\ G_{01} & G_{11} \end{pmatrix}$$

where G_{00} is a k − m by n matrix.
Using this structure we shall define a convenient class of codes.

Definition 8.7.1. The unit memory code generated by G is regular if the linear subspaces spanned by G_{00}, G_{01} and G_{11} share only the zero vector.
With this definition of a unit memory code we have

Lemma 8.7.1. The dual of a regular UMC is also a regular UMC.

Proof. Any vector in the code belongs to the space spanned by the rows of G_{00}, G_{01} and G_{11}, and the dimension of this space is k + m (it may be the whole space). Thus if H_{00} is a basis for the dual space, it generates the parity checks that are satisfied by all blocks (it might be just the zero vector).
We may find the parity checks of the code by solving the following system of linear equations:

$$\begin{pmatrix}
G_{00} & 0 \\
G_{11} & 0 \\
G_{01} & G_{11} \\
0 & G_{01} \\
0 & G_{00}
\end{pmatrix}
\begin{pmatrix}
H_{00}^\top & H_{01}^\top & 0 \\
0 & H_{11}^\top & H_{00}^\top
\end{pmatrix} = 0 \qquad (8.6)$$
It follows from our assumptions that the rank of the coefficient matrix is 2k + m, and the rank of H_{00} is n − k − m. Thus the dimension of the remaining solutions is

$$2n - 2k - m - 2(n - k - m) = m$$

Thus we have determined a repeated set of n − k parity checks of the form corresponding to the definition. Notice that without the assumption that the subspaces spanned by the three matrices in G be disjoint (except for the zero vector), the dual code might have a different structure, and in particular it could have a memory greater than 1.
Example 8.7.1. A double error-correcting unit memory code
A regular (n, n − r) UMC may be defined by the parity check matrix

$$H = [H_0, H_1] = \begin{pmatrix} e & 0 \\ H_{01} & H_{11} \end{pmatrix}$$

where the block length is n = 2^{r−1}, and e is an all 1s vector. The remaining syndrome bits are specified as for the BCH codes correcting two errors: The columns of H_{01} are powers of a primitive (n − 1)-th root of unity, α, and a zero column; the columns of H_{11} are powers of either α^3 or α^{−1}, which must also be a primitive (n − 1)-th root. The algebraic structure allows us to give an outline of a decoding algorithm for correcting error patterns containing at most j + 1 errors in any segment of j consecutive blocks:
Algorithm 8.7.1. Decoding a UMC
Input: Received word.
1. Assume that the errors have been corrected up to block i − 1. Calculate the next r syndrome bits, which involve only received block r_i: s_i = H_0 r_i^⊤. Since H_0 is also the parity check matrix for an extended Hamming code, we can immediately correct single errors and detect double errors.
2. When a single error has been corrected, we can remove the term H_1 r_i^⊤ from the following syndrome and continue.
3. If there are two errors in block i, this will be detected as even parity and a nonzero syndrome. In this situation the decoding is delayed until a block with even parity occurs.
4. In the following blocks we detect at first only the parity. Blocks with odd parity contain single errors, and they will be decoded later.
5. Since we have assumed that j blocks contain at most j + 1 errors, and the first block contained two errors, the next block with even parity must be correct.
6. We can then decode the previous blocks in the reverse direction using H_{11}, since we assumed that all columns of this matrix are distinct.
7. Finally the two errors in block i can be decoded using a decoding method for double error correcting BCH codes, since we now have both syndromes available.
Output: Decoded word.
The example shows how convolutional codes can use available syndrome bits to correct slightly
heavier error patterns, but this advantage comes at the expense of a variable decoding delay.
8.8 Problems
Problem 8.8.1 Consider a rate 1/2 convolutional code with generating vector g = (11.10.01.11).
1) How is the information sequence u = (10111000) encoded?
2) Write a generator matrix for the tail-biting code of length 16. What is the dimension of the code?
3) How many consecutive zero blocks can there be in a nonzero codeword?
4) Is the encoding non-catastrophic?
5) What is the memory of the code?
6) Find the minimum distance of the code of length 16.
7) What is the minimum distance of a code of length 64?
Problem 8.8.2 A rate 1/3 convolutional code has generating vector g = (111.101.110).
1) Is the encoding non-catastrophic?
2) What is the memory of the code?
3) Encode the sequence u = (11010).
Problem 8.8.3 Consider the same code as in Problem 8.8.1.
1) Prove that the matrix from part 2) of Problem 8.8.1 is also a parity check matrix for the code.
2) Find the syndrome for a single error.
3) In a long code the syndrome is found to be (. . . 00011000 . . . ). Find a likely error pattern.
4) Same question with the syndrome (. . . 0001000 . . . ).
Problem 8.8.4 Consider the same code as in Problem 8.8.2.
1) Find a parity check matrix for the tail-biting code of length 15.
2) Is the code regular?
Problem 8.8.5 Consider the same codes as in Problems 8.8.1 and 8.8.2.
1) Find the row distances of the codes and their free distances.
2) How many errors can the codes correct?
3) Give examples of patterns with higher weight that lead to decoding error.
Problem 8.8.6 Consider the (2, 1, 3) code from Problem 8.8.1 and puncture the code to get a rate 2/3 code.
1) Write a generator matrix of length 15 for this code.
2) Find a parity check matrix for the code (or a generator matrix for the dual code).
Problem 8.8.7 Same codes as in Problems 8.8.1 and 8.8.2.
1) Write generator matrices and parity check matrices in D notation.
2) Verify that the rows of these matrices are orthogonal when the terms are multiplied as poly-
nomials.
Problem 8.8.8 Consider the convolutional (2, 1, 4) code with generator matrix

$$G(D) = (g_0(D), g_1(D)) = (1 + D + D^4,\ 1 + D^2 + D^3 + D^4)$$
1) Show that g_0 is irreducible, while g_1 factors into two primary polynomials.
2) Show that it follows from 1) that the weight of short encoded sequences is at least 7.
3) For the information sequence u = (1011), find the weight of the encoded sequence.
4) How many consecutive blocks can be 00?
Problem 8.8.9 A convolutional (2, 1, 6) code which is often used has generator matrix

$$G(D) = (g_0(D), g_1(D)) = (1 + D + D^2 + D^5 + D^6,\ 1 + D^2 + D^3 + D^4 + D^6)$$

The free distance is 10.
1) Show that g_0 is irreducible, while g_1 factors into two binary polynomials.
2) Find the information polynomial, u(D), of lowest degree such that for some j,

$$u(D)g_1(D) = 1 + D^j$$

3) Find the weight of the encoded sequence for this information polynomial.
Problem 8.8.10 Let g be the generating vector for a non-catastrophic R = 1/2 encoding.
1) Is it possible to get the same code with a different vector?
2) Write g as a polynomial, G(D). Is G(D)(1 + D^2) a generating vector for a non-catastrophic encoding?
3) Prove that the encoding is non-catastrophic if and only if g(x) does not have a factor that is a polynomial in D^2.
Problem 8.8.11 Consider the (8, 4) unit memory code generated by the matrix:

$$G = \begin{pmatrix} e & 0 \\ G_0 & G_1 \end{pmatrix}$$

where

$$G_0 = \begin{pmatrix}
0&1&0&0&1&0&1&1\\
0&0&1&0&1&1&1&0\\
0&0&0&1&0&1&1&1
\end{pmatrix}, \quad
G_1 = \begin{pmatrix}
0&1&1&1&0&1&0&0\\
0&0&1&0&0&1&1&1\\
0&0&0&1&1&1&0&1
\end{pmatrix}$$

and e is a vector of 1s.
1) Show that the code is regular.
2) Verify that the dual code has the same form.
3) Verify that the code is constructed as discussed in Example 8.7.1.
4) Find the free distance.
Chapter 9
Maximum likelihood decoding of convolutional codes

The main topic of this chapter is an efficient algorithm for ML decoding of convolutional codes. The algorithm is described in detail in Section 9.2, but before we present it we need a new way of describing convolutional codes.
9.1 Finite state descriptions of convolutional codes
In this section we describe convolutional codes as output sequences of finite state machines.
Definition 9.1.1. A finite state machine is defined by a finite set of states,

$$\Sigma = \{\sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_r\},$$

an input alphabet A, a next state function from Σ × A to Σ, an output alphabet B, and an output function from Σ × A to B. In any instance of time, j, the machine receives an input symbol, makes a transition from its current state to the state given by the next state function and produces the output symbol given by the output function.
The encoding of a convolutional code may be described by a finite state machine (the encoder). The input is an information symbol, u_j. It is usually most convenient to identify the state by the previous M input symbols, [u_{j−1}, u_{j−2}, . . . , u_{j−M}], and we shall refer to state σ_i as the state where the previous input bits are the binary expansion of the integer i with the most recent symbol as the least significant bit.
The next state is obtained by shifting the state variables to the right and adding the new input as the first state variable. Thus a transition from state i leads to either state 2i or 2i + 1. An important consequence of this labeling of the states is that all encoders with a given memory have the same states and next state mapping.
The output, y_j, is a string of symbols, which are determined as a function of the input and the state, i.e. of M + 1 input symbols. For a linear code, the output function may be found from the generator matrix as the product of G and the input vector, but much of this description can be used even when linearity is not assumed.
Lemma 9.1.1. A finite state encoder for a code with memory M has 2^M states.
For sets of finite strings associated with finite state machines it is often useful to adopt the notation of regular sets. We give a brief presentation of this concept here, although it is not essential for understanding the results of this chapter.
If a and b are sets of strings from the finite alphabet A,
a + b indicates the union of the sets,
ab indicates concatenation, the set of strings consisting of a string from a followed by a string from b,
∅ indicates the empty set (not an empty string),
a* indicates the set of finite repetitions of a: {λ, a, aa, aaa, . . . }, where λ is the empty string.
Definition 9.1.2. A regular set is a set of finite strings which is described by a finite expression composed of the symbols of the alphabet and the compositions +, concatenation, and *.
Example 9.1.1. It is convenient to use parentheses in the usual way:

$$a^*(b + c)d = bd + cd + abd + acd + \cdots$$
The relation to finite state machines is established by the following result.

Theorem 9.1.1. The set of strings, inputs or outputs, associated with the transitions from a given initial state to a given final state in a finite state machine is a regular set.
The output function is given as a 2^M × 2^M matrix, Λ, where the row and column indices 0 ≤ i < 2^M refer to a numbering of the states given above. In particular state 0 corresponds to an all 0 vector of state variables. Each entry in Λ, Λ_{i′i}, is a regular set describing the binary vectors associated with the transition from state i to state i′.

Lemma 9.1.2. If Λ is the output matrix with regular sets as entries, Λ^b gives the sets of output associated with transitions in b steps.
Example 9.1.2. (Example 8.1.1 continued)
For the (2, 1) code with g = (11.10.11), the encoder has four states. With the chosen numbering of the states we may write Λ as

$$\Lambda = \begin{pmatrix}
00 & \emptyset & \emptyset & 11 \\
11 & \emptyset & \emptyset & 00 \\
\emptyset & 01 & 10 & \emptyset \\
\emptyset & 10 & 01 & \emptyset
\end{pmatrix}$$

If the encoder is initially in state 00 and the input is (101100), the sequence of states is (01, 10, 01, 11, 10, 00) and the output (11.10.00.01.01.11).
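The example can be reproduced by simulating the finite state machine directly. A sketch (Python; the state is kept as a tuple of the M most recent input bits, a convenience equivalent to the integer labels above, and states are printed with the most recent bit last, as in the example):

```python
g = [(1, 1), (1, 0), (1, 1)]   # g = (11.10.11) split into blocks g_0, g_1, g_2
M = len(g) - 1

def encode(u, state=(0, 0)):
    """Run the encoder; return the output blocks and the state sequence."""
    outputs, states = [], []
    for bit in u:
        window = (bit,) + state            # u_j, u_{j-1}, ..., u_{j-M}
        y0 = sum(w * gt[0] for w, gt in zip(window, g)) % 2
        y1 = sum(w * gt[1] for w, gt in zip(window, g)) % 2
        state = window[:M]                 # keep the M most recent inputs
        outputs.append(f"{y0}{y1}")
        states.append("".join(str(b) for b in reversed(state)))
    return outputs, states

y, s = encode([1, 0, 1, 1, 0, 0])
print(".".join(y))        # 11.10.00.01.01.11, as in the example
print(", ".join(s))       # 01, 10, 01, 11, 10, 00
```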
Definition 9.1.3. A chain of state transitions in the finite state machine is a vector of states indexed by j and a vector of possible state transitions such that if the transition at time j is from state σ_i to state σ_{i′}, the transition at time j + 1 starts in state σ_{i′}.
The convolutional encoding (8.1) may now be described by chains.
Lemma 9.1.3. An encoded vector in a tail-biting code is the output associated with a
chain of length N that starts and ends in the same state.
Lemma 9.1.4. An encoded vector in a terminated code is a chain of length N that starts
and ends in state zero.
In the remaining part of this section we discuss properties of linear codes. A chain that has the same initial and final state is called a circuit. For a linear code there is always a circuit of length 1 with y = (0 . . . 0) from state 0.

Definition 9.1.4. A finite state description is non-catastrophic if there is no other circuit with all 0 output than repetitions of the length 1 circuit from state 0.

It follows from the discussion in Section 8.1 that this definition is consistent with the previous use of the term non-catastrophic.
In Chapter 8 we discussed the properties of codewords in various block codes associated with convolutional codes. It is convenient to break long encoded vectors into shorter segments, which are often referred to as the codewords of the convolutional code.

Definition 9.1.5. A codeword of a linear convolutional code is the output associated with a circuit in the encoder that starts and ends in state 0 and does not go through this state in between.

We refer to the number of transitions of the encoder as the length of the codeword. Thus the zero word is the only word of length 1, and other words have length at least M + 1.
Example 9.1.3. (Example 8.1.1 continued)
The (2, 1, 2) code has one word of length 1, 00, and no words of length 2. The generator,
11.10.11 is the only word of length 3 and also the only word of weight 5. There are two words
of weight 6, one with length 4 and one with length 5: (11.01.01.11) and (11.10.00.10.11).
The concept of codewords leads again to the free distance.
Theorem 9.1.2. The free distance is the minimum weight of a nonzero codeword.
In Chapter 8 we introduced row distances as minimum distances of terminated codes. The following definition introduces a concept that describes how the weights of codewords increase with increasing length.

Definition 9.1.6. The j-th extended row distance is the minimum weight of a nonzero codeword of length j + M.
In Section 9.2 we shall show in detail that a decoding error consists in choosing an encoded vector that differs from the one transmitted by the addition of a codeword. This is the motivation for studying the weight enumerator of a convolutional code.

Definition 9.1.7. The weight enumerator A(z) for a convolutional code is a formal power series

$$A(z) = \sum_{w} A_w z^w \qquad (9.1)$$

where A_w is the number of codewords of weight w.
Sometimes it is convenient to replace z by two variables x and y indicating the number
of 0s and 1s in the codeword. In that way it is also possible to determine the length of
the codewords. However, we leave this extension as an exercise.
The weight enumerator may be found by enumerating the codewords that start and end in state 0. In order to eliminate sequences that pass through this state, we need the matrix Λ′, which is obtained from Λ by eliminating all transitions from state zero (i.e. the first column is set to ∅). If γ_i indicates the set of sequences that start in state 0 and end in state i, the regular sets may be collected into a vector γ.
Theorem 9.1.3. The set of nonzero codewords is the element γ_0 in the solution to the system of equations

$$\Lambda'\gamma = \gamma - \beta$$

where the vector β has Λ_{10} in the second coordinate and is zero otherwise.

Proof. γ_i consists of sequences that start in state 0, reach some state i′, and then make a transition to state i. This is the left side of the equation. In addition state 1 can be reached directly from state 0.
Since we are usually not interested in finding the actual sequences but only the weights, we replace this relation by a system of linear equations where the entries are polynomials in z. The matrix Λ(z) is obtained from Λ by replacing all entries of weight w by z^w, and Λ′(z) is obtained from Λ′ in the same way. An output of weight 0 is replaced by a 1, and zeros are filled in where no transitions occur (replacing ∅). The functions Γ_i(z) indicate the weight enumerator for the segments of codewords that start in state 0 but end in state i (the row index); these functions now satisfy

$$\Lambda'(z)\begin{pmatrix} \Gamma_0(z)\\ \Gamma_1(z)\\ \vdots\\ \Gamma_r(z) \end{pmatrix}
= \begin{pmatrix} \Gamma_0(z)\\ \Gamma_1(z)\\ \vdots\\ \Gamma_r(z) \end{pmatrix}
- \begin{pmatrix} 0\\ \beta_1(z)\\ \vdots\\ 0 \end{pmatrix} \qquad (9.2)$$
where the entry β_1(z) on the right side represents the transitions from state zero to the other states. Since the transitions from state 0 were eliminated from Λ′, the function Γ_0 enumerates the chains that end here, which are the codewords. Thus we can find A(z) = Γ_0(z) by solving a system of symbolic equations. With a suitable programming tool that can be done even for a moderately large number of states. It follows from this calculation that we have

Theorem 9.1.4. The weight enumerator of a convolutional code is the rational function Γ_0(z).
Example 9.1.4. (Example 8.1.1 continued)
The codewords of the code from Example 8.1.1 may be found by removing the zero circuit of length 1. This gives the following system of equations:

$$\begin{pmatrix}
0 & 0 & 0 & z^2 \\
0 & 0 & 0 & 1 \\
0 & z & z & 0 \\
0 & z & z & 0
\end{pmatrix}
\begin{pmatrix} \Gamma_0\\ \Gamma_1\\ \Gamma_2\\ \Gamma_3 \end{pmatrix}
= \begin{pmatrix} \Gamma_0\\ \Gamma_1\\ \Gamma_2\\ \Gamma_3 \end{pmatrix}
- \begin{pmatrix} 0\\ z^2\\ 0\\ 0 \end{pmatrix}$$

And we obtain the weight enumerator as

$$A(z) = \Gamma_0(z) = \frac{z^5}{1-2z} = z^5 + 2z^6 + 4z^7 + \cdots$$
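The first coefficients of A(z) can be cross-checked by counting codewords directly: enumerate chains that leave state 0, avoid it, and return, grouped by accumulated weight. A sketch (Python; same encoder conventions as the simulation in this section, with the count truncated at a maximum weight — the pruning terminates because every circuit avoiding state 0 has positive weight in a non-catastrophic code):

```python
from collections import Counter

g = [(1, 1), (1, 0), (1, 1)]
W_MAX = 8

def step(state, bit):
    """One transition: returns (next state, output weight)."""
    window = (bit,) + state
    y0 = sum(w * gt[0] for w, gt in zip(window, g)) % 2
    y1 = sum(w * gt[1] for w, gt in zip(window, g)) % 2
    return window[:2], y0 + y1

A = Counter()
start, w0 = step((0, 0), 1)                 # a codeword must leave state 0
frontier = {start: Counter({w0: 1})}
while frontier:
    nxt_frontier = {}
    for state, weights in frontier.items():
        for bit in (0, 1):
            nxt, dw = step(state, bit)
            for w, cnt in weights.items():
                if w + dw > W_MAX:
                    continue               # prune chains already too heavy
                if nxt == (0, 0):
                    A[w + dw] += cnt       # back in state 0: a codeword
                else:
                    nxt_frontier.setdefault(nxt, Counter())[w + dw] += cnt
    frontier = nxt_frontier

print(sorted(A.items()))   # expect weights 5, 6, 7, ... with counts 1, 2, 4, ...
```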
The weight enumerator is primarily of interest as a tool for obtaining upper bounds on the error probability. By following the derivation of the union bound in Chapter 3, we get an upper bound on the probability that the decoder makes an error in the first block. Assume that the zero word is transmitted. An ML decoder will select one of the nonzero words leaving state 0 in the first block, and this will happen if the received sequence is closer to one of these words than to the zero sequence. Thus the same calculation as the one used to derive (3.7) gives

$$P(e) < \sum_{w>0}\ \sum_{j \ge w/2} A_w \binom{w}{j} p^j (1-p)^{w-j} \qquad (9.3)$$

which may be approximated by

$$P(e) < \frac{1}{2} A(Z) \qquad (9.4)$$

where Z is the function

$$Z = \sqrt{4p(1-p)}$$

The bound (9.3) is closely related to (3.9) in Chapter 3, but it is particularly useful for convolutional codes since it allows us to use the rational form of the weight enumerator.
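With the rational weight enumerator from Example 9.1.4, evaluating (9.4) becomes a one-line computation. A sketch (assuming A(z) = z^5/(1 − 2z); the bound is only finite when Z < 1/2, i.e. for small p):

```python
from math import sqrt

def event_error_bound(p):
    """Evaluate (9.4) for the code of Example 8.1.1."""
    Z = sqrt(4 * p * (1 - p))
    if Z >= 0.5:
        return None        # the geometric series A(Z) diverges
    return 0.5 * Z**5 / (1 - 2 * Z)

for p in (0.001, 0.01, 0.05):
    print(p, event_error_bound(p))
```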
9.2 Maximum likelihood decoding
The process of decoding a convolutional code is conveniently described as a search for
a chain that disagrees with the received sequence in the smallest number of positions.
We present several properties of maximum likelihood decoding which lead to the im-
portant Viterbi decoding algorithm for convolutional codes (we use this name although
the algorithm may be seen as a special case of the principle of dynamic programming).
Earlier we introduced the transition matrix with entries Λ_{i′i}, which are the blocks that the encoder outputs in making a transition from state i to state i′. For a received block, r_j, we may find the number of differences between Λ_{i′i} and r_j as

$$e_{i'i}(j) = W\!\left(\Lambda_{i'i} + r_j\right)$$

where W indicates Hamming weight. Thus we may store these values in a table indexed by the values of r.
Example 9.2.1. Consider the decoding of the (3, 1) code with M = 3 and

g = (111.011.110.111)

The encoder has memory 3 and eight states. The state transition matrix is

$$\Lambda = \begin{pmatrix}
000 & \emptyset & \emptyset & \emptyset & 111 & \emptyset & \emptyset & \emptyset \\
111 & \emptyset & \emptyset & \emptyset & 000 & \emptyset & \emptyset & \emptyset \\
\emptyset & 011 & \emptyset & \emptyset & \emptyset & 100 & \emptyset & \emptyset \\
\emptyset & 100 & \emptyset & \emptyset & \emptyset & 011 & \emptyset & \emptyset \\
\emptyset & \emptyset & 110 & \emptyset & \emptyset & \emptyset & 001 & \emptyset \\
\emptyset & \emptyset & 001 & \emptyset & \emptyset & \emptyset & 110 & \emptyset \\
\emptyset & \emptyset & \emptyset & 101 & \emptyset & \emptyset & \emptyset & 010 \\
\emptyset & \emptyset & \emptyset & 010 & \emptyset & \emptyset & \emptyset & 101
\end{pmatrix}$$
For a received zero block, the e_{i′i} are found as the number of 1s in each entry of this matrix. We refer to this integer matrix as E_{000}. Similarly for each of the other seven values of a received block we need a table of the corresponding error weights. For 101 we have

$$E_{101} = \begin{pmatrix}
2 & \infty & \infty & \infty & 1 & \infty & \infty & \infty \\
1 & \infty & \infty & \infty & 2 & \infty & \infty & \infty \\
\infty & 2 & \infty & \infty & \infty & 1 & \infty & \infty \\
\infty & 1 & \infty & \infty & \infty & 2 & \infty & \infty \\
\infty & \infty & 2 & \infty & \infty & \infty & 1 & \infty \\
\infty & \infty & 1 & \infty & \infty & \infty & 2 & \infty \\
\infty & \infty & \infty & 0 & \infty & \infty & \infty & 3 \\
\infty & \infty & \infty & 3 & \infty & \infty & \infty & 0
\end{pmatrix}$$

Here the symbol ∞ is used instead of ∅ (or a sufficiently large number).
For a BSC with error probability p we have

Lemma 9.2.1. The probability of receiving r_j, given that the previous state was state i and the following state i′, is

$$P[r] = p^{e_{i'i}(j)}(1-p)^{n-e_{i'i}(j)} = (1-p)^n \left(\frac{p}{1-p}\right)^{e_{i'i}(j)}$$
For a chain of transitions, the probability becomes a product of such terms, and it is more convenient to consider the logarithm of the probability. Leaving out constants that are independent of the error pattern, we get

$$\log\left(P[r]\right) \approx K \sum_j e_{i'i}(j)$$

The Maximum Likelihood (ML) chain for the frame is the one that maximizes the conditional probability, which is the one that minimizes the sum of the errors (since the constant is negative).
If the probability distribution on the states given the received sequence up to time j − 1 is q(j − 1), the distribution after receiving r_j may be found by multiplying q(j − 1) by a matrix of conditional probabilities. The algorithm below may be interpreted as identifying the largest term in each of these sums of products.
Lemma 9.2.2. If the maximum likelihood chain goes through state i at time j, it has the smallest number of errors among all chains that reach state i at time j.

Proof. Assume that another chain reaching state i at time j′ had a smaller total number of errors. We could then follow this chain for j < j′, but since the possible transitions after time j′ depend only on the state at time j′, the rest of the chain could be chosen as the ML chain. In this way we would get a smaller total number of errors, contradicting the assumption.
The principle of dynamic programming is that for each instance of time, j, and for each state, the optimal chain leading to that state may be computed by extending the optimal chains at time j − 1. Let the accumulated number of errors on the chain leading to state i be α_i(j − 1). Each chain may be extended in 2^k different ways, and similarly each state may be reached from 2^k previous states. For each possible received block r(j) we find the number of errors associated with a particular state transition, e_{i′i}(j), from the relevant E_r table. The new errors are added to the accumulated number, and we select the best chain into each state:

$$\alpha_{i'}(j) = \min_i\left[\alpha_i(j-1) + e_{i'i}(j)\right] \qquad (9.5)$$

The updated α-vector is used in the next computation, and in the examples we store them in a matrix. However, the old values are not needed, and in an actual decoder it is sufficient to have storage for the two vectors involved in the current computation. The chain reaching state i at time j is selected among the transitions from possible previous states. For each state and each instance of time, the selected previous state must be stored (if the minimum in (9.5) is not unique, one is selected arbitrarily). When we reach the end of the frame and find the smallest accumulated number of errors, these return addresses are used to find the chain which becomes the decoded sequence.
We can now state the Viterbi algorithm:

Algorithm 9.2.1. ML Decoding (Viterbi)
Input: Received sequence.
1. Initialize α_i(0) as 0 for possible initial states and ∞ for other states.
2. For each received block, update α_i(j) according to (9.5) and extend the optimal chains accordingly.
3. Among the possible final states, select the one that has the smallest α_i(N), and decode the received sequence by tracing the chain backwards from this state.
Output: Decoded chain.
It follows from the derivation that we have
Theorem 9.2.1. The Viterbi algorithm determines the maximum likelihood code se-
quence.
Having determined the most likely transmitted sequence, we can now recover the information sequence (or that may be done when storing the chains).

For a terminated code, we let the zero state be the only initial and final state. In order to be sure to find the maximum likelihood word for a tail-biting code, we may have to try each state as the initial state one at a time, and find α(N) for the same state. However, by letting the decoding continue some time beyond t = N, we have a high probability of finding the best codeword.
Example 9.2.2. (Example 9.2.1 continued)
If the encoder is started in state 000 and the received sequence is

r = (000.101.000.101.000.101.000)

we calculate the vectors α(j) using (9.5) and the matrices E_{000} and E_{101} (in order to avoid writing out all eight E-matrices we have chosen an error pattern with only two different received blocks). Since each chain can be extended by a 0 or a 1, two states are reached from state 0 at time j = 1, 4 at time j = 2 and all eight at time j = 3. From time 4 each state can be reached by two branches, and we select the one with the smallest number of errors. The decoding process is illustrated in Figure 9.1.

Figure 9.1: Decoding of an 8-state convolutional code. The columns show the vectors α(j) for time j = 1, . . . , 7. α(j) is calculated from the current input and α(j − 1) using (9.5); the calculation of α_3(5) is indicated by dotted arrows. The full set of connections between the states is shown as dotted lines between time 4 and 5; this graphical representation of the matrix Λ is often called a trellis. The state at time 7 is assumed to be 0, and the full arrows indicate the trace-back of the decoded sequence.

Thus at time 5 state 3 can be reached from state 1 or 5. In the first case the chain already had three errors and one error is added in the transition, whereas in the last case there have been four errors in the first four blocks and the transition to state 3 adds two additional errors. Assuming that the final state here is the zero state, we can choose y = (000.111.100.101.001.111.000) as the transmitted sequence and u = (0110000) as the information.
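A direct implementation of Algorithm 9.2.1 for this example is short. The sketch below (Python; terminated frames starting and ending in state 0, states kept as tuples of the three most recent inputs, branch metrics computed on the fly instead of reading the E tables, and ties broken arbitrarily as the algorithm allows; each survivor stores its whole input history, which is wasteful but keeps the trace-back trivial):

```python
g = [(1, 1, 1), (0, 1, 1), (1, 1, 0), (1, 1, 1)]   # g = (111.011.110.111)
M = len(g) - 1
INF = float("inf")

def branch(state, bit):
    """Next state and output block for one encoder transition."""
    window = (bit,) + state
    y = tuple(sum(w * gt[c] for w, gt in zip(window, g)) % 2 for c in range(3))
    return window[:M], y

def viterbi(received):
    zero = (0,) * M
    alpha = {zero: (0, [])}        # state -> (accumulated errors, input bits)
    for r in received:
        new = {}
        for state, (err, bits) in alpha.items():
            for bit in (0, 1):
                nxt, y = branch(state, bit)
                e = err + sum(a != b for a, b in zip(y, r))
                if e < new.get(nxt, (INF, None))[0]:
                    new[nxt] = (e, bits + [bit])
        alpha = new
    return alpha[zero]             # terminated frame: final state is 0

r = [(0,0,0), (1,0,1), (0,0,0), (1,0,1), (0,0,0), (1,0,1), (0,0,0)]
errors, u = viterbi(r)
print(errors, u)   # the example decodes u = (0110000)
```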
Any time the decoded chain leaves the correct sequence of states, a decoding error oc-
curs, and we shall say that the error event continues until the chain reaches a correct
state (whether all blocks between these two instances are correct or not). We shall refer
to such a decoding error as a fundamental error event.
Lemma 9.2.3. A fundamental error event of length L is the output of a chain leaving state 0 at time j and reaching state 0 for the first time at time j + L, i.e. a codeword of length L.

Proof. It follows from the linearity of the code that we may find the error sequences by adding the two sequences, and that the result is itself a codeword. As long as the chains agree, the sum is 0 and the finite state machine stays in state 0. The sum sequence returns to state 0 when the two sequences reach the same state.
The probability of such an error event is related to the extended row distances.
Lemma 9.2.4. For a fundamental error event of length L to occur the number of errors must be at least half of the (L − M)-th extended row distance.
In Section 9.1 we derived the upper bound (9.3) on the probability of an error event starting in the first block. We can interpret the bound as an upper bound on the probability that an error event starts in any particular block, j, sometimes called the event error probability. This probability is smaller than the probability of an error starting in the first block because the decoder may select a wrong branch starting before time j. To arrive at the expected number of wrong blocks, the probabilities of the error events must be multiplied by their length.
The Viterbi algorithm is a practical way of performing maximum likelihood decoding. The complexity is directly related to the number of states, and thus it increases exponentially with the memory of the code. Decoders with a few thousand states can be implemented. It is an important aspect of the algorithm that it can use soft decision received values, i.e. real values of the noisy signal, rather than binary values. The weights e_{i′i}, which we derived as the number of errors per block, may be replaced by the logarithm of the conditional probability or a suitable approximation. Decoding for Gaussian noise channels is discussed in Appendix A.

The algorithm can be used to perform ML decoding of block codes if they can be described as tail-biting codes. In principle other block codes can be decoded using the same principle if the decoder is based on a finite state description of the encoder, but the tail-biting structure is important as a way of keeping the number of states small.
9.3 Problems
Problem 9.3.1 Consider again the rate 1/2 convolutional code with generating vector g = (11.10.01.11).
1) What are the codewords of weight 6?
2) How many states are there in the encoder?
3) If the input is a periodic repetition of 11100.11100. . . . , what is the encoded sequence?
4) Write out the transition matrix for the code.
5) Find the codewords of length 4, 5, and 6.
6) Find the weight enumerator of the code.
7) How many words of weight 7 and 8 are in the code?
8) Which of the following error patterns will always be correctly decoded?
a : 00.10.00.00.10.00 . . .
b : 01.10.00.00.00.00 . . .
c : 11.00.01.00.00.00 . . .
d : 11.00.00.00.00.11.00 . . .
e : 11.00.10.00.00.00 . . .
f : 00.00.00.01.11.00 . . .
9) Find an upper bound on the error probability using (9.3) when p = 0.01.
10) What is the largest value of p such that (9.3) gives a probability < 1?
11) Write out the E matrices needed for decoding.
12) Assume that there is a single error in the first block. Find those weights α_i that are less than 4. Repeat the problem with two errors in the first block.
13) How can the information symbols be recovered from the decoded sequence?
Problem 9.3.2 Consider the (2, 1, 2) code in Example 8.1.1.
1) In the transition matrix Λ(z), replace z with variables x and y such that x^i indicates i 0s and y^j indicates j 1s. Calculate the two-variable weight enumerator.
2) How many codewords have four 0s and six 1s?
Problem 9.3.3 Consider the matrix Λ for the code in Example 8.1.1, with sequences (regular sets) as entries.
1) Find Λ^2 and Λ^3.
2) Check that the generator sequence appears in the expected place in the last matrix.
3) Set up the system of linear equations in regular sets as discussed in Section 9.1 and eliminate as many variables as possible by substitution.
4) In order to solve for γ_0 the following result is needed:
Let A, B, C be regular sets and A = B + AC; then A = BC*.
Find a regular set description of the set of nonzero codewords.
5) Compare the results of 9.3.3 4) and 9.3.2 1).
Problem 9.3.4 (project) Use a program for symbolic computation to find the weight enumerator of the convolutional code given in Problem 8.8.8.

Problem 9.3.5 (project)
1) Write a program for encoding and decoding the code from Problem 8.8.8.
2) Decode terminated frames of length 1000 information bits with bit error probability p = 0.02 and p = 0.05.
3) Compare the results to the upper bounds.
4) Modify the encoder/decoder to use tail-biting frames.
Chapter 10
Combinations of several codes
As discussed in Chapter 7, long codes are often required to obtain sufficient reliability, but a single long block code or a convolutional code cannot provide the required performance with an acceptable amount of computation. Here we give an introduction to coding methods where several codes, or several stages of coding, are combined in a frame. The difficulty of constructing good long codes led to this approach in an early stage of the study of error-correcting codes, and it continues to be a focal point of much development of communication systems.

If the length of the combined code is chosen to match the length of a transfer frame, the performance is also directly related to the required frame error probability. We discuss the performance from this point of view, and also briefly mention some other issues related to the application of such codes in communication systems or data storage.

When several codes are combined, the received symbols are stored in a buffer, and the decoders work on only a subset of these symbols. In this way the total length will have an impact on the decoding delay and possibly other secondary properties of the system, but it does not directly determine the complexity of the decoding. The total code may not have a very good minimum distance, but in most cases error patterns will be decoded even though the weight exceeds half the minimum distance. Since the error probability depends on a combination of weights of error patterns and the number of such patterns, a code with a relatively small number of low weight words can still have an acceptable performance.
10.1 Product codes
Product codes are the simplest form of composite codes, but nevertheless it is still a construction of interest.

Definition 10.1.1. A product code is a vector space of n_1 by n_2 arrays such that each row is a codeword in a linear (n_1, k_1, d_1) code, and each column is a codeword in a linear (n_2, k_2, d_2) code.
Theorem 10.1.1. The parameters of a product code are (N, K, D) = (n_1 n_2, k_1 k_2, d_1 d_2).
Proof. The information symbols may be arranged in a k_1 by k_2 array. The parity checks on rows and columns of information symbols may be computed independently. The parity checks on parity checks may be computed in two ways, but it follows from linearity that the results are the same: The last rows may be calculated as a linear combination of previous rows, and consequently they are codewords in the row code. Thus (N, K) are the products stated in the lemma. The product of the minimum distances is a lower bound since a nonzero codeword has at least one nonzero row of weight at least d_1 and each nonzero column has weight at least d_2. A minimum distance codeword of weight d_1 may be repeated in d_2 rows to form a codeword of weight exactly d_1 d_2. In fact these are the only codewords of that weight.
For simplicity we shall assume that the row and column codes are identical with parameters (n, k, d) unless different parameters are explicitly given.

In the simplest approach to decoding, each column of the received word is first decoded separately, and a decoding algorithm for the row codes is then applied to the corrected word. We refer to such an approach as serial decoding.
Lemma 10.1.1. If a product code is decoded serially, any error pattern of weight at most (D − 1)/4 is decoded.

Proof. If d is odd, the worst case situation is that several columns contain (d + 1)/2 errors and are erroneously decoded. However given the total number of errors, there can be at most (d − 1)/2 such columns, and the errors will be decoded by the row codes. Of course, if the rows had been decoded first, the errors would have been decoded immediately. On the other hand an error pattern with at least d/2 errors in each row and column cannot be decoded immediately, and thus we cannot be sure to decode ⌊d/2⌋² errors.
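Serial decoding is easy to express once bounded distance decoders for the component codes are given. A schematic sketch (Python; `decode_col` and `decode_row` are hypothetical stand-ins for component decoders that return a corrected word, or the input unchanged when decoding fails):

```python
def decode_product(received, decode_col, decode_row):
    """Serial decoding of a product code given as a list of rows over GF(2)."""
    n2, n1 = len(received), len(received[0])
    # First pass: decode every column with the column code.
    cols = [decode_col([received[i][j] for i in range(n2)]) for j in range(n1)]
    array = [[cols[j][i] for j in range(n1)] for i in range(n2)]
    # Second pass: decode every row of the corrected array with the row code.
    return [decode_row(row) for row in array]
```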
Any error pattern with at most (D − 1)/2 errors can be corrected, but it is not trivial to extend the decoding algorithm to reach this limit. Furthermore, to ensure small probability of frame error, the algorithm should also decode most errors of somewhat greater weight. We return to this decoding problem in Chapter 13.
The following lemma is useful for estimating the error probability.
Lemma 10.1.2. If the number of minimum weight codewords in the two component codes is a_1 and a_2, the number of words of weight d_1 d_2 in the product code is a_1 a_2.
Proof. Since the weight of a nonzero row is at least d_1 and the weight of a column is at least d_2, a minimum weight codeword can have only d_1 nonzero columns and d_2 nonzero rows. Thus the number of codewords of that weight is at most the number of ways of choosing the nonzero rows and columns such that there is a minimum weight codeword on exactly those positions. On the other hand for any choice of a minimum weight row word and column word, there is a corresponding minimum weight word in the product code.
For slightly higher weights the number of words can also be expressed in a fairly direct way as indicated in the following example; however the relations become more complicated for higher weights. Thus we can get an upper bound on the error probability in the following form.

Theorem 10.1.2. For a product of binary (n, k, 2t) codes with weight distribution A_w, the error probability is upper bounded by

$$P(e) \le \frac{1}{2} A_{2t}^2 \binom{4t^2}{2t^2} p^{2t^2} + \cdots$$
Proof. The result follows from Theorem 3.3.1.
Example 10.1.1. Product of extended Hamming codes
If the component codes are chosen as the extended Hamming code (64, 57, 4), the product code has length N = 4096, rate K/N = 0.793, and minimum distance 16. If ML decoding is performed, the error probability can be estimated by the probability that eight errors occur within a set of nonzero coordinates of a weight 16 word. The number of weight 4 codewords in the (64, 57) code is

$$A_4 = \binom{64}{3}\Big/\,4 = 10416$$

And the number of weight 16 words in the product code is the square of this number. From Theorem 10.1.2 we get an upper bound on the error probability:

$$P(e) \le \frac{1}{2}\binom{16}{8} \cdot 10416^2\, p^8 + \cdots$$

Thus for p = 1/64 we get P(e) ≈ 0.0025. Of the 12870 subsets of 8 positions among the ones of a weight 16 word only 12 are shared with other minimum weight codewords. Thus this is a very close estimate for small p. The next term in the weight distribution of the product code is A_24, and it is possible to find the number of weight 24 codewords as the number of pairs of weight 16 words that share four positions. It may then be verified that these words contribute much less to the error probability. Thus a sufficiently powerful decoding algorithm could decode on the average one error in each row and column, and if the frame is supplied with a frame check sequence, one could obtain adequate performance even for fairly large error probabilities.
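The numbers in this example are easily reproduced. A sketch evaluating the leading term of the bound (only the first term of the union bound is kept, as in the example):

```python
from math import comb

A4 = comb(64, 3) // 4          # weight 4 words in the (64, 57, 4) code
A16 = A4 ** 2                  # weight 16 words in the product code

def frame_error_bound(p):
    return 0.5 * A16 * comb(16, 8) * p ** 8

print(A4)                          # 10416
print(frame_error_bound(1 / 64))   # about 0.0025, matching the example
```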
Example 10.1.2. Product of RS codes
In composite coding systems Reed-Solomon codes play a particularly important role. The commonly used codes are based on 8-bit symbols, q = 256. In some systems with very large frames (video), product codes of RS codes are used, and for complexity reasons the component codes are decoded only up to half the minimum distance. Improved performance may be achieved by reordering the symbols between the two codings. In common CDs, two stages of (shortened) RS codes are used, and interleaving is used between the stages to spread out the effect of concentrated surface defects. Here both RS codes have distance only 4. The first stage of decoding corrects single errors and detects double errors. After deinterleaving, the second decoder stage corrects the symbols that come from blocks with double errors. We refer to the technical standards for details of these systems.
10.2 Concatenated codes (serial encoding)
There are historical reasons for the use of the word concatenated: two encoders/decoders were placed next to each other. The term does not quite reflect the way the codes are combined. Some authors use the term serial to indicate that the output of the first stage of encoding is input to the second stage encoder. We shall use the term concatenated code to indicate a combination of an RS (outer) code and a binary (inner) code, which is how the expression was originally used.
10.2.1 Parameters of concatenated codes
In Chapter 5 we defined a class of Reed-Solomon codes over the alphabet of q symbols (q a power of a prime) and parameters (N, K, D) = (q − 1, K, q − K).

Lemma 10.2.1. The binary image of an RS code over the symbol field of q = 2^r elements is a binary (r(q − 1), rK, D′) code with D′ ≥ q − K.
Proof. The symbols in the big field are represented as binary r-vectors with addition represented as vector addition. We can find a basis for the binary image by converting a row in the original generator matrix to r binary vectors. Let the original information symbol be a 1 in position j, 0 ≤ j < K, and consider the r − 1 codewords obtained by multiplying this word by the first r − 1 powers of the primitive element used for constructing the field. The images of these vectors have a 1 in one of the positions jr to jr + r − 1. Since all elements in the big field are linear combinations of these r elements, we can get all images of the RS codewords as linear combinations of the Kr binary vectors. The minimum weight of the binary image is at least the Hamming weight of the RS codeword.
The real minimum distance is difficult to analyze, and also of minor importance to the
decoding. However, we can get an upper bound using the construction of BCH codes
(Chapter 6):

Lemma 10.2.2. The minimum distance of the binary image of an (N, K, D) RS code is
upper bounded by the minimum distance of the BCH code C_K(sub).

Proof. As discussed in Chapter 6, the BCH code is contained in the RS code. Taking
the binary image of a codeword that is already binary does not change its weight. Since
the generator polynomial of the BCH code contains all roots in the generator polyno-
mial of the RS code and some additional roots, its minimum distance may be greater
than D. For RS codes of low rates, the bound is trivial, D′ ≤ N.
Now we combine the binary images of RS codes with a second binary coding stage in
a way that is similar to the product codes. Let the information bits be represented as an
rI by K array. Rows jr − r + 1 to jr are encoded as the binary image of an (N, K)
RS code, the j-th outer code. Each column, which consists of a symbol from each RS
code, is then encoded as a codeword from an (n, Ir, d) block code, the inner code. The
parameter I is referred to as the interleaving degree.
Definition 10.2.1. A concatenated code with outer (q − 1, K) RS code, interleaving de-
gree I, and binary (n, I log q) inner code, is a binary (n(q − 1), KI log q) code, such
that K log q information symbols are encoded as a binary image of the RS code, and
for each j, symbols j from each of the I RS codes are encoded as a codeword of the
inner code.
Theorem 10.2.1. The minimum distance of the concatenated code is at least Dd.

Proof. In a nonzero word at least one RS code has D = q − K nonzero symbols. These
are encoded by the inner code which has weight at least d in each nonzero column.

While a product code has many codewords with weight equal to the lower bound, a con-
catenated code may have none or very few. If D′ is the minimum distance of C_K(sub),
there are codewords of weight D′d.
10.2.2 Performance of concatenated codes

The common decoding method for a concatenated code is to first decode the inner code.
The information symbols obtained in this way are interpreted as symbols of the RS
codewords, which are then decoded. As discussed in the case of product codes, serial
correction of errors in the inner and outer code ensures only correction of Dd/4 errors. If
more than d/2 errors occur in a codeword of the inner code, a symbol error in the outer
code may be produced, and only (D − 1)/2 symbol errors can be corrected. However, with
a random distribution of bit errors, most of them would be corrected by the inner code.
Let p be the bit error probability. For a given inner code and decoding method (which
would usually be maximum likelihood decoding), the probability of decoding error for
the inner code, p_i, can be found by one of the methods of Chapter 3.
The average number of symbol errors in a block of the RS code is λ = Np_i. Since
this is typically a small fraction of the block length and N is quite large, the Poisson
distribution

$$P(t) = e^{-\lambda}\frac{\lambda^t}{t!}$$

is a good approximation to the number of errors. We get a simple upper bound on the
tail of the Poisson distribution by using a quotient series with quotient λ/(T + 1).

Lemma 10.2.3. If errors occur independently with probability, p, the probability that
there are more than T errors in a block of N symbols is upper bounded by

$$\Phi(p, T) = \frac{e^{-Np}(Np)^{T+1}}{(T+1)!}\left(1 - \frac{Np}{T+1}\right)^{-1}$$
Thus we have

Theorem 10.2.2. The probability of decoding failure for the concatenated code with an
outer RS code correcting T errors and inner code decoding error probability, p_i, is
upper bounded by

$$P_{fl} < \Phi(p_i, T)$$
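The bound is easy to evaluate numerically. The following Python sketch (the function name is ours) computes Φ(p, T) as in Lemma 10.2.3; as a plausibility check it is applied to the parameters of Example 10.2.1 below, assuming an outer code with N = 256 symbols:

```python
from math import exp, factorial

def phi(p, T, N):
    """Upper bound on P(more than T errors among N symbols), Lemma 10.2.3."""
    lam = N * p                                     # average number of errors
    head = exp(-lam) * lam ** (T + 1) / factorial(T + 1)
    return head / (1.0 - lam / (T + 1))             # quotient-series factor

# Theorem 10.2.2 with the (assumed) parameters of Example 10.2.1:
print(phi(0.055, 32, 256))    # about 1e-5, the order of magnitude quoted there
```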
Actual decoding errors are usually quite rare. The following theorem is a useful ap-
proximation.

Theorem 10.2.3. The probability of decoding error for the concatenated code is approx-
imately

$$P_{ue} \approx \frac{P_{fl}}{T!}$$

Proof. Decoding of T = (D − 1)/2 errors maps only a small fraction of the q^{N−K} syn-
dromes on error patterns. Since there are about N^T/T! sets of error positions and q^T sets
of error values, the probability that a random syndrome corresponds to an error pattern
is approximately 1/T!.
For a typical RS code with T = 8 this factor is 40320 > 2^15. Thus in addition to cor-
recting errors left by the inner code, the RS code provides a reduction of the probability
of undetected error about equal to that of the 16 bit CRC check. It is consistent with the
definitions in Chapter 7 to consider the check symbols of the RS code as the parity field
in the data frame. However, since the outer code contributes to the error correction, the
convention in some contexts is to include it in the rate of the overall code, and in this
way it will also affect the information efficiency.
Example 10.2.1. A concatenated code
If we consider a moderately large field as F_{2^16}, we get (N, K) RS codes which easily reach the
limits of practical frame sizes. However for shorter frames, interleaving of eight bit symbols is
often preferred. If two such codes are interleaved and each symbol is expressed in the usual way
as a binary vector of length 8, the total code is a linear binary (16N, 16K) code. However, the
minimum distance of the binary code may not be much larger than N − K. With (32, 16, 8)
inner codes we get codes of length 2^13. If the rate of the outer code is 3/4, the total code has rate
3/8 and minimum distance 512 (the Varshamov-Gilbert bound for these parameters is D ≥ 1280).
However, we get an acceptable performance even with an average number of errors per frame of
460. With the bit error probability 460/8192, Theorem 3.3.1 gives an upper bound on the decoding
error probability for the inner code of 0.055. Thus the RS code has to correct an average of
14 errors, and a decoding failure caused by more than 32 errors has probability < 10^{-5}. With
T = 32 the probability of undetected error is extremely small.
10.2.3 Interleaving and inner convolutional codes
The concatenated codes used in practice often have a convolutional code as the inner
code. If the binary images of the I interleaved codes are written as an mI by N array,
the array is encoded by the convolutional code one column at a time. The purpose of
the interleaving, other than making the code longer, is to make the symbol errors in
each RS code almost independent. A precise analysis of such a code is difficult.
However, we can get a very similar code using a tail-biting version of the convolu-
tional code as the inner code for each column. The advantage from the point of view
of the analysis is that the symbol errors in a particular outer code are now mutually
independent.
In tail-biting codes (and convolutional codes), there is a length associated with an er-
ror event and a probability distribution on the lengths. What is relevant in this con-
text is that the performance of the inner code may be characterized by probabilities
p_{i1}, p_{i2}, ..., p_{iI} that 1, 2, ..., I consecutive outer code symbols in a particular col-
umn are in error. Since a single long error event is more likely than two shorter events,
we assume

$$p_{ij} > p_{i1} p_{i,j-1}$$

Clearly it may also occur that two non-consecutive symbols in the same column are in
error, but since this would be the result of two independent decoding errors, we may
treat this situation as errors in different columns. The probability that there is an error
in a column is

$$p_c = \sum_j p_{ij}$$

The average symbol error probability may be found as

$$p_s = \frac{1}{I}\sum_j j\, p_{ij}$$

Clearly the probability of decoding failure is upper bounded by Φ(p_c, T), which fol-
lows from using Theorem 10.2.2 with p_c as the probability of error for the inner code.
However, we can get a stronger bound.
However, we can get a stronger bound.
Theorem 10.2.4. For a concatenated code with an inner convolutional code, where the
symbol error probability after decoding the inner code is P
s
, the probability of frame
loss is upper bounded as
P

< I ( p
s
, T)
Proof. For each RS code the probability of decoding failure is ( p
s
, T). Thus for I
interleaved codes the average number of RS codewords that are not decoded I times
this number. The errors in different RS words within a frame are not independent, but
the theorem is true for any distribution of the words.
The actual performance is a little better since longer error events in the inner code tends
to cause more decoding failures in certain frames.
Even though several of the outer codewords cannot be decoded, there is often at least
one decoded RS word. Since this word has a very high probability of being correct,
it would be possible to iterate the decoding of the inner and outer codes. In the posi-
tions where errors have been corrected in the RS words, the decoding of the inner code
should be repeated with the corrected symbols. However, it is difficult to analyze the
improvement in performance of such a procedure.
10.3 Problems
Problem 10.3.1 Consider a product of (7, 4, 3) Hamming codes.
1) What are the parameters of the code?
2) How many errors can be corrected if the rows and columns are decoded serially?
3) How many minimum weight words are there in the product code?
4) Describe a parity check matrix for the product code (without necessarily writing it out).
5) How can three errors be decoded if they are
a) in three different rows or columns?
b) in only two rows and columns?
Problem 10.3.2 Consider a product of (16, 12) RS codes over F_16.
1) What are the parameters of the code?
2) How many errors can be corrected if we assume minimum distance decoding?
3) How many errors can be decoded if rows and columns are decoded serially?
Problem 10.3.3 Consider a product of the binary (5, 4) and (3, 2) even parity codes.
1) What are the parameters of the code?
2) Prove that the code is cyclic if the positions are taken in the proper order.
3) Find the generator polynomial of the cyclic code.
Problem 10.3.4 Consider the binary image of a (15, 10) RS code over F_16.
1) What are the parameters of the code?
2) Change the code to a concatenated code with inner code (5, 4, 2). What are the parameters?
3) Change the code to interleaving degree 2 using the (9, 8, 2) code as inner code. What are the
parameters?
Problem 10.3.5 Consider a (30, 20) RS code.
1) If the symbol error probability is 1/30, what is the probability of decoding failure?
2) Under the same assumptions, what is the probability of decoding error?
3) The code is used as the outer code in a concatenated code with I = 2 and inner (14, 10, 3)
code. If the bit error probability is 0.01, give an upper bound on the symbol error probability.
4) Give an upper bound on the probability of decoding failure for the concatenated code.
Problem 10.3.6 Let F_16 be expressed as in Chapter 2, Example 2.2.3.
1) Find the coefficients of the generator polynomial for an RS code of length 15 correcting two
errors.
2) Which binary code is contained in the RS code?
3) Let the generator matrix of a (7,4) Hamming code be as in Example 1.1.3. What are the
codewords representing the coefficients found in 1)?
4) Consider the concatenated code obtained by encoding the symbols of the RS code from ques-
tion 1) using the code from 3). What are the parameters of the code? (You may use the result
of 2) to find the exact minimum distance).
5) Find a codeword in the concatenated code by taking the generator polynomial of the RS code
and encoding the symbols as in 4). Introduce seven bit errors in the nonzero symbols, and
decode the word.
Problem 10.3.7 (project) Use the interleaved RS code from Problem 10.3.5 and use the convo-
lutional code from Problem 8.8.1 as the inner code. Compare decoding results for a single inner
code and for a tail-biting code in each column.
Problem 10.3.8 (project) Write a decoding program for a (255, 239) RS code. Use this decoder
for a concatenated code with the (2, 1, 6) convolutional code from Problem 8.8.9 as inner code.
Perform a simulation with 2% bit errors.
Chapter 11
Decoding Reed-Solomon and BCH-codes with the Euclidian algorithm
In this chapter we present the extended Euclidian algorithm for polynomials with coef-
ficients in a (finite) field and then show how this can be used for the decoding of cyclic
codes. Many practical decoders of Reed-Solomon codes use this method.
11.1 The Euclidian algorithm
Let F be a field and let a(x), b(x) ∈ F[x] and suppose that deg(a(x)) ≥ deg(b(x)).
The Euclidian algorithm is used to determine a greatest common divisor, d(x), of a(x)
and b(x), and the extended version also produces two polynomials f(x) and g(x) such
that

$$f(x)a(x) + g(x)b(x) = d(x)$$

The input to the algorithm is a(x) and b(x).
The algorithm operates with four sequences of polynomials r_i(x), q_i(x), f_i(x) and
g_i(x).
The initialization is

$$r_{-1}(x) = a(x), \quad f_{-1}(x) = 1, \quad g_{-1}(x) = 0$$

and

$$r_0(x) = b(x), \quad f_0(x) = 0, \quad g_0(x) = 1$$

For i ≥ 1 the algorithm performs polynomial division of r_{i-2} by r_{i-1} to obtain the
quotient q_i and the new remainder r_i (see Section 2.2):

$$r_{i-2}(x) = q_i(x)r_{i-1}(x) + r_i(x), \quad \text{where } \deg(r_i(x)) < \deg(r_{i-1}(x))$$
and then we update f and g as

$$f_i(x) = f_{i-2}(x) - q_i(x)f_{i-1}(x)$$

and

$$g_i(x) = g_{i-2}(x) - q_i(x)g_{i-1}(x)$$

Since the degrees of the r_i(x) decrease there is a last step, n say, where r_n(x) = 0. We
then have that gcd(a(x), b(x)) = r_{n-1}(x) and f_{n-1}(x)a(x) + g_{n-1}(x)b(x) = r_{n-1}(x). In the
application to decoding, however, we are actually not interested in the gcd as such, but
in some of the intermediate results.
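The algorithm is easily implemented. The following Python sketch (function names are ours) stores a binary polynomial as an integer bit mask, so that x^6 + x^4 + x^2 + x + 1 is 0b1010111, and reproduces the table of Example 11.1.1 below:

```python
def pdeg(p):                       # degree of a GF(2)[x] bit mask
    return p.bit_length() - 1

def pmul(a, b):                    # carry-less product in GF(2)[x]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):                 # quotient and remainder in GF(2)[x]
    q = 0
    while a and pdeg(a) >= pdeg(b):
        s = pdeg(a) - pdeg(b)
        q ^= 1 << s
        a ^= b << s
    return q, a

def extended_euclid(a, b):
    """Rows (f_i, g_i, r_i, q_i) for i = -1, 0, 1, ... until r_n = 0."""
    rows = [(1, 0, a, None), (0, 1, b, None)]
    while rows[-1][2]:
        f2, g2, r2, _ = rows[-2]
        f1, g1, r1, _ = rows[-1]
        q, r = pdivmod(r2, r1)
        rows.append((f2 ^ pmul(q, f1), g2 ^ pmul(q, g1), r, q))
    return rows

for row in extended_euclid(0b100000000, 0b1010111):   # a = x^8, b as above
    print(['-' if v is None else bin(v) for v in row])
```

The last row with a nonzero remainder gives the gcd (here 1) together with the pair f, g of the extended algorithm.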
Example 11.1.1. Let a(x) = x^8 and b(x) = x^6 + x^4 + x^2 + x + 1 be binary polynomials.
The algorithm gives:

  i   f_i                     g_i                          r_i                        q_i
 -1   1                       0                            x^8                        -
  0   0                       1                            x^6 + x^4 + x^2 + x + 1    -
  1   1                       x^2 + 1                      x^3 + x + 1                x^2 + 1
  2   x^3 + 1                 x^5 + x^3 + x^2              x^2                        x^3 + 1
  3   x^4 + x + 1             x^6 + x^4 + x^3 + x^2 + 1    x + 1                      x
  4   x^5 + x^4 + x^3 + x^2   x^7 + x^6 + x^3 + x + 1      1                          x + 1
  5   -                       -                            0                          x + 1

From this we see that gcd(x^8, x^6 + x^4 + x^2 + x + 1) = 1 and that

(x^5 + x^4 + x^3 + x^2)x^8 + (x^7 + x^6 + x^3 + x + 1)(x^6 + x^4 + x^2 + x + 1) = 1.
That the algorithm works is based on the following:

Lemma 11.1.1. For all i ≥ 1:

1. gcd(r_{i-1}(x), r_i(x)) = gcd(r_i(x), r_{i+1}(x))

2. f_i(x)a(x) + g_i(x)b(x) = r_i(x)

3. deg(g_i(x)) + deg(r_{i-1}(x)) = deg(a(x))
The statements can be proven by induction. The first two are easy; we will do the
induction step in 3.
Suppose deg(g_i(x)) + deg(r_{i-1}(x)) = deg(a(x)). Then

deg(g_{i+1}(x)) + deg(r_i(x))
  = deg(g_{i+1}(x)) + (deg(r_{i-1}(x)) − deg(q_{i+1}(x)))
  = deg(g_{i+1}(x)) + ((deg(a(x)) − deg(g_i(x))) − deg(q_{i+1}(x)))
  = deg(a(x)) + (deg(g_{i+1}(x)) − deg(g_i(x)) − deg(q_{i+1}(x)))
  = deg(a(x)).
11.2 Decoding Reed-Solomon and BCH codes
Let C be a cyclic (n, k) code over F_q and suppose that the generator polynomial has
among its zeroes the elements α, α^2, ..., α^{2t} where α ∈ F_{q^m} is an element of order n.
For the RS codes we have n | q − 1 and t = ⌊(n − k)/2⌋, and for the binary BCH codes we
have n | 2^m − 1 and d_min ≥ 2t + 1.
We know from Chapters 5 and 6 that if Q(x) = q_0 + q_1x + q_2x^2 + ··· + q_tx^t satisfies

$$\begin{pmatrix} S_1 & S_2 & \cdots & S_{t+1} \\ S_2 & S_3 & \cdots & S_{t+2} \\ \vdots & \vdots & & \vdots \\ S_t & S_{t+1} & \cdots & S_{2t} \end{pmatrix}
\begin{pmatrix} q_0 \\ q_1 \\ q_2 \\ \vdots \\ q_t \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{11.1}$$

where the syndromes S_i are defined as S_i = r(α^i) when r(x) is the received word,
then the error locations (i.e. the powers of α that indicate the positions of the errors)
are among the zeroes of Q(x).
If we define

$$S(x) = \sum_{i=1}^{2t} S_i x^{2t-i}$$

and calculate S(x)Q(x), then (11.1) gives that the terms of degree t, t + 1, ..., 2t − 1
in S(x)Q(x) are all zero and therefore deg(S(x)Q(x) mod x^{2t}) < t.
On the other hand if Q(x), with deg(Q(x)) ≤ t, satisfies

$$\deg\left(S(x)Q(x) \bmod x^{2t}\right) < t$$

then we have a solution to (11.1).
Theorem 11.2.1. If the Euclidian algorithm is used on input a(x) = x^{2t} and b(x) =
S(x) and j is chosen s.t. deg(r_{j-1}(x)) ≥ t and deg(r_j(x)) < t, then Q(x) = g_j(x).

Proof. It follows from Lemma 11.1.1 3 that deg(g_j(x)) = 2t − deg(r_{j-1}(x)) ≤ t. By
the definition of j and from Lemma 11.1.1 2 we have that f_j(x)x^{2t} + g_j(x)S(x) =
r_j(x). Since deg(r_j(x)) < t we then get that deg(g_j(x)S(x) mod x^{2t}) < t and there-
fore the theorem is proved.
Example 11.2.1. A (15, 5, 7) binary code
Let α be a primitive element of F_16 where α^4 + α^3 + 1 = 0. Let C be the cyclic code with genera-
tor polynomial

g(x) = m_α(x)m_{α^3}(x)m_{α^5}(x) = (x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)
     = x^10 + x^9 + x^8 + x^6 + x^5 + x^2 + 1.

So C is a (15, 5, 7) code.
Let the received word be r(x) = x^6 + x^5 + x^2 + 1. The syndromes are

S_1 = r(α) = 1 + α^2 + α^5 + α^6 = 1
S_2 = S_1^2 = 1
S_3 = r(α^3) = 1 + α^6 + α^15 + α^3 = α^7
S_4 = S_2^2 = 1
S_5 = r(α^5) = 1 + α^10 + α^10 + 1 = 0
S_6 = S_3^2 = α^14

We therefore have S(x) = x^5 + x^4 + α^7x^3 + x^2 + α^14.
The Euclidian algorithm on x^6 and S(x) gives:

  i   g_i                               r_i                                        q_i
 -1   0                                 x^6                                        -
  0   1                                 x^5 + x^4 + α^7x^3 + x^2 + α^14            -
  1   x + 1                             α^13x^4 + α^13x^3 + x^2 + α^14x + α^14     x + 1
  2   α^2x^2 + α^2x + 1                 α^12x^3 + α^12x^2 + αx + α^14              α^2x
  3   α^3x^3 + α^3x^2 + α^12x + 1       α^9x^2 + α^11x + α^14                      αx

From this we see that j = 3 and g_3(x) = α^3x^3 + α^3x^2 + α^12x + 1, which has α^8, α^9
and α^10 as zeroes, so the error vector is x^10 + x^9 + x^8 and the codeword is therefore
x^10 + x^9 + x^8 + x^6 + x^5 + x^2 + 1 (= g(x)).
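The computations in this example are easy to check by machine. The following Python sketch (ours) builds log/antilog tables for F_16 from the primitive polynomial α^4 + α^3 + 1 used above, recomputes the syndromes of r(x), and verifies the zeroes of the error locator g_3(x):

```python
MOD = 0b11001                                    # z^4 + z^3 + 1
EXP = [1] * 15
for i in range(1, 15):
    e = EXP[i - 1] << 1                           # multiply by alpha
    EXP[i] = e ^ MOD if e & 0b10000 else e
LOG = {e: i for i, e in enumerate(EXP)}

def gmul(a, b):                                   # product in F_16
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]

def peval(p, x):                                  # Horner; p = [p_0, p_1, ...]
    y = 0
    for c in reversed(p):
        y = gmul(y, x) ^ c
    return y

r = [1, 0, 1, 0, 0, 1, 1]                         # r(x) = 1 + x^2 + x^5 + x^6
S = [peval(r, EXP[i]) for i in range(1, 7)]       # S_1, ..., S_6
print([LOG[s] if s else None for s in S])         # [0, 0, 7, 0, None, 14]

g3 = [1, EXP[12], EXP[3], EXP[3]]                 # g_3(x) from the table
print([i for i in range(15) if peval(g3, EXP[i]) == 0])    # [8, 9, 10]
```

The printed exponents agree with S_1 = 1, S_3 = α^7, S_5 = 0, S_6 = α^14 and with the zeroes α^8, α^9, α^10 found above.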
Example 11.2.2. A (15, 9, 7) Reed-Solomon code over F_16
Let α be a primitive element of F_16 where α^4 + α + 1 = 0 and let C be the (15, 9, 7)
Reed-Solomon code over F_16 obtained by evaluation of polynomials of degree at most 8 in
α^0, α, ..., α^14. Then C has generator polynomial

g(x) = (x − α)(x − α^2)(x − α^3)(x − α^4)(x − α^5)(x − α^6)
     = x^6 + α^10x^5 + α^14x^4 + α^4x^3 + α^6x^2 + α^9x + α^6.

Let r(x) = x^8 + α^14x^6 + α^4x^5 + α^9x^3 + α^6x^2 + α be a received word.
The syndromes are

S_1 = r(α) = α^8 + α^5 + α^9 + α^12 + α^8 + α = 1
S_2 = r(α^2) = α + α^11 + α^14 + 1 + α^10 + α = 1
S_3 = r(α^3) = α^9 + α^2 + α^4 + α^3 + α^12 + α = α^3
S_4 = r(α^4) = α^2 + α^8 + α^9 + α^6 + α^14 + α = α^6
S_5 = r(α^5) = α^10 + α^14 + α^14 + α^9 + α + α = α^13
S_6 = r(α^6) = α^3 + α^5 + α^4 + α^12 + α^3 + α = α^3

We therefore have S(x) = x^5 + x^4 + α^3x^3 + α^6x^2 + α^13x + α^3.
The Euclidian algorithm on x^6 and S(x) gives:

  i   g_i                              r_i                                          q_i
 -1   0                                x^6                                          -
  0   1                                x^5 + x^4 + α^3x^3 + α^6x^2 + α^13x + α^3    -
  1   x + 1                            α^14x^4 + α^2x^3 + x^2 + α^8x + α^3          x + 1
  2   αx^2 + α^4x                      α^11x^3 + α^10x^2 + α^7x                     αx + 1
  3   α^4x^3 + α^3x^2 + α^9x + 1       α^7x^2 + αx + α^3                            α^3x + α^3

The polynomial g_3(x) has as zeroes 1, α^4 and α^7, so the error polynomial is e(x) = e_1 + e_2x^4 +
e_3x^7.
To find the error values one can solve the system of equations:

S_1 = 1 = e(α) = e_1 + e_2α^4 + e_3α^7          (11.2)
S_2 = 1 = e(α^2) = e_1 + e_2α^8 + e_3α^14       (11.3)
S_3 = α^3 = e(α^3) = e_1 + e_2α^12 + e_3α^6     (11.4)

The solution is e_1 = α, e_2 = α^6 and e_3 = α^10 so we get e(x) = α^10x^7 + α^6x^4 + α and therefore

c(x) = r(x) + e(x) = x^8 + α^10x^7 + α^14x^6 + α^4x^5 + α^6x^4 + α^9x^3 + α^6x^2 (= x^2g(x)).
The error values can be found directly using a formula (due to D. Forney) that we will
now derive. Before doing that we will introduce a useful concept.

Definition 11.2.1. Let a(x) = a_nx^n + ··· + a_1x + a_0 be a polynomial in F_q[x].
Then a′(x) = na_nx^{n-1} + ··· + 2a_2x + a_1, where the integers i in the coefficients are
the sum of i 1s in the field.

With this definition it is easy to see that

(a(x) + b(x))′ = a′(x) + b′(x)  and  (a(x)b(x))′ = a′(x)b(x) + a(x)b′(x).
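In characteristic 2 the definition has a particularly simple effect: the coefficient i·a_i vanishes for even i and reduces to a_i for odd i. A small Python sketch (ours), with a polynomial given as its coefficient list, lowest degree first:

```python
def formal_derivative_char2(p):
    """a'(x) over a field of characteristic 2 (Definition 11.2.1):
    keep the coefficients of odd powers, shift degrees down by one."""
    return [c if i % 2 == 1 else 0 for i, c in enumerate(p)][1:]

# g_3(x) = 1 + a^9 x + a^3 x^2 + a^4 x^3 from Example 11.2.2 gives
# g_3'(x) = a^9 + a^4 x^2, as used in Example 11.2.3 below:
print(formal_derivative_char2(['1', 'a^9', 'a^3', 'a^4']))   # ['a^9', 0, 'a^4']
```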
Let the error positions be x_{i_1}, x_{i_2}, ..., x_{i_t} with the corresponding error values e_{i_1}, e_{i_2},
..., e_{i_t}. Using the notation from Theorem 11.2.1 we will prove

Theorem 11.2.2.

$$e_{i_S} = x_{i_S}^{-(2t+1)} \frac{r_j(x_{i_S})}{g'_j(x_{i_S})} \tag{11.5}$$
Proof. From (5.6) we have that

$$e_{i_S} = \frac{\sum_{r=1}^{t} P_r^{(S)} S_r}{P^{(S)}(x_{i_S})}$$

where

$$P^{(S)}(x) = \sum_{r=1}^{t} P_r^{(S)} x^r = (-1)^{S-1}\, x \prod_{l=1,\, l \neq S}^{t} (x - x_{i_l})
\quad\text{and}\quad S_r = \sum_{j=1}^{t} e_{i_j} x_{i_j}^r.$$

From Theorem 11.2.1 we get

$$Q(x) = g_j(x) = \prod_{l=1}^{t} (x - x_{i_l})$$

so

$$g'_j(x) = \sum_{S=1}^{t} \prod_{l=1,\, l \neq S}^{t} (x - x_{i_l})$$

and hence

$$g'_j(x_{i_S}) = \prod_{l=1,\, l \neq S}^{t} (x_{i_S} - x_{i_l}) = x_{i_S}^{-1}(-1)^{S-1} P^{(S)}(x_{i_S})$$

From Theorem 11.2.1 we see that

$$r_j(x) = g_j(x)S(x) \bmod x^{2t}$$

and therefore

$$r_j(x) = \prod_{l=1}^{t} (x - x_{i_l}) \sum_{i=1}^{2t} S_i x^{2t-i} \bmod x^{2t}
= (-1)^{S-1} P^{(S)}(x)\, x^{-1} (x - x_{i_S}) \sum_{i=1}^{2t} S_i x^{2t-i} \bmod x^{2t}$$
$$= (-1)^{S-1} (x - x_{i_S}) \sum_{i=1}^{2t} \sum_{r=1}^{t} P_r^{(S)} x^{r-1} S_i x^{2t-i} \bmod x^{2t}$$

and hence

$$r_j(x_{i_S}) = (-1)^{S-1} x_{i_S}^{2t} \sum_{r=1}^{t} P_r^{(S)} S_r$$

and the result follows.
Example 11.2.3. The (15, 9, 7) Reed-Solomon code over F_16 (continued)
We had r_3(x) = α^7x^2 + αx + α^3 and g_3(x) = α^4x^3 + α^3x^2 + α^9x + 1, so g′_3(x) = α^4x^2 + α^9.
We use formula (11.5) and get:

e_1 = (α^7 + α + α^3)/(α^4 + α^9) = 1/α^14 = α

e_2 = (α^4)^{-7} (1 + α^5 + α^3)/(α^12 + α^9) = α^2 · α^12/α^8 = α^6

e_3 = (α^7)^{-7} (α^6 + α^8 + α^3)/(α^3 + α^9) = α^11 · 1/α = α^10

in accordance with the result in Example 11.2.2.
11.3 Problems
Problem 11.3.1 Let α be a primitive element of F_16 with α^4 + α + 1 = 0.
Let C be the (15, 9) Reed-Solomon code over F_16 obtained by evaluation of polynomials from
F_16[x] of degree at most 8 in α^0, α, α^2, ..., α^14. The generator polynomial of this code is
g(x) = (x − α)(x − α^2)···(x − α^6).
Decode the received words below using the Euclidian algorithm and formula (11.5).
1) r(x) = x + α^2x^2 + x^3
2) r(x) = α^2x^3 + α^7x^2 + α^11x + α^6

Problem 11.3.2 We use the code C_sub from above and receive
1) 1 + x^2 + x^5 + x^8 and
2) 1 + x + x^2 + x^3 + x^7 + x^11
Decode these two words using the Euclidian Algorithm.
Problem 11.3.3 Let α be a primitive element of F_16 with α^4 + α + 1 = 0.
Let C be the (15, 7) Reed-Solomon code over F_16 obtained by evaluation of polynomials from
F_16[x] of degree at most 6 in α^0, α, α^2, ..., α^14. The generator polynomial of this code is
g(x) = (x − α)(x − α^2)···(x − α^8).
Decode the received word below using the Euclidian algorithm and formula (11.5).
r(x) = 1 + x + α^2x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8 + α^3x^9 + x^10 + x^11 + x^12 + x^13 + x^14
Chapter 12
List decoding of Reed-Solomon codes
In this chapter we will extend the decoding method presented in Chapter 5 for Reed-
Solomon codes in a way that allows correction of errors of weight greater than half the
minimum distance. In this case the decoder gives a list of closest codewords, so the
method is called list decoding.
12.1 A list decoding algorithm
Recall that if X = {x_1, x_2, ..., x_n} ⊆ F_q and k ≤ n, then the Reed-Solomon code
consists of all the words of the form

(f(x_1), f(x_2), ..., f(x_n))

where f(x) ∈ F_q[x] has degree < k. The parameters are (n, k, n − k + 1).
Let r ∈ F_q^n be a received word, and suppose r is the sum of a codeword c and an error
vector e of weight at most τ. The idea here is an extension of the method presented in
Chapter 5, namely to determine a bivariate polynomial

Q(x, y) = Q_0(x) + Q_1(x)y + Q_2(x)y^2 + ··· + Q_l(x)y^l

such that:

1. Q(x_i, r_i) = 0, i = 1, 2, ..., n.
2. deg(Q_j(x)) ≤ n − 1 − τ − j(k − 1), j = 0, 1, ..., l.
3. Q(x, y) ≠ 0.
We then have

Lemma 12.1.1. If Q(x, y) satisfies the above conditions and c = (f(x_1), f(x_2), ...,
f(x_n)) with deg(f(x)) < k, then (y − f(x)) | Q(x, y).

Proof. The polynomial Q(x, f(x)) has degree at most n − 1 − τ, but since r_i = f(x_i)
except in at most τ cases we have that Q(x_i, f(x_i)) = 0 in at least n − τ cases and
therefore Q(x, f(x)) = 0. If we consider the polynomial Q(x, y) as a polynomial in
y over F_q[x], we then conclude that (y − f(x)) divides Q(x, y).
This means that we can find all the codewords that are within distance at most τ from
the received word by finding factors of the polynomial Q(x, y) of the form (y − f(x))
with deg(f(x)) < k. If τ is greater than half the minimum distance it is possible that
we get more than one codeword (a list), but since the y-degree of Q(x, y) is at most l,
there are at most l codewords on the list. Not all factors may correspond to a codeword
within distance τ to the received word.
It is not obvious under which conditions, on the numbers τ and l, a polynomial Q(x, y)
that satisfies the conditions actually exists, but the following analysis addresses this
question.
The first condition is a homogeneous system of n linear equations, so if the number of
unknowns is greater than n this system indeed has a nonzero solution. The number of
unknowns is (n − τ) + (n − τ − (k − 1)) + (n − τ − 2(k − 1)) + ··· + (n − τ − l(k − 1)).
This number equals (l + 1)(n − τ) − (1/2)l(l + 1)(k − 1), so the condition becomes

$$(l+1)(n-\tau) - \tfrac{1}{2}l(l+1)(k-1) > n \tag{12.1}$$

or equivalently

$$\tau < n\frac{l}{l+1} - \frac{l}{2}(k-1) \tag{12.2}$$

and in order that condition (12.2) gives a nonnegative degree we must also have that

$$(n-\tau) - l(k-1) \geq 0 \tag{12.3}$$

or equivalently

$$\tau \leq n - l(k-1) \tag{12.4}$$
If l = 1 condition (12.2) becomes τ < (n − k + 1)/2, which is half the minimum distance.
If l = 2 we get τ < n − 2(k − 1) if k/n ≥ 2/3 + 1/n, and this is smaller than half the
minimum distance! If k/n < 2/3 + 1/n then we get τ < (2/3)n − (k − 1), and this is smaller
than half the minimum distance if k/n ≥ 1/3 + 1/n. So we only get an improvement on half
the minimum distance if k/n < 1/3 + 1/n.
This illustrates the general situation: In order to get an improvement on half the mini-
mum distance the parameters should satisfy

$$\frac{k}{n} < \frac{1}{l+1} + \frac{1}{n}$$

and in this case

$$\tau < n\frac{l}{l+1} - \frac{l}{2}(k-1)$$
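The bookkeeping in (12.2) and (12.4) is conveniently checked by machine. A small Python sketch (the function name is ours) returning the largest τ that satisfies both conditions:

```python
from math import ceil

def sudan_radius(n, k, l):
    """Largest tau with tau < n*l/(l+1) - (l/2)(k-1)   (12.2)
    and tau <= n - l*(k-1)                              (12.4)."""
    from_12_2 = ceil(n * l / (l + 1) - l * (k - 1) / 2) - 1   # strict <
    from_12_4 = n - l * (k - 1)
    return min(from_12_2, from_12_4)

for l in (1, 2, 3):
    print(l, sudan_radius(15, 3, l))    # 6, 7 and 8 errors for the code below
```

With l = 2 this gives the seven correctable errors of the following example.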
Example 12.1.1. List of 2 decoding of a (15, 3) Reed-Solomon code over F_16.
Let α be a primitive element of F_16 where α^4 + α^3 + 1 = 0 and consider the (15, 3) Reed-
Solomon code obtained by evaluating polynomials of degree at most 2 in the powers of α. The
code has minimum distance 13 and is thus 6-error correcting. However using the above with
l = 2 we see that it is possible to decode up to seven errors with list size at most 2.
If

w = (0, 0, 0, 0, 0, 0, 0, 0, α^6, α^2, α^5, α^14, α, α^7, α^11)

is received one finds Q(x, y) = (1 + x)y + y^2 = (y − 0)(y − (1 + x)) and the corresponding two
codewords are then

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) and
(0, α^12, α^9, α^4, α^3, α^10, α^8, α^13, α^6, α^2, α^5, α^14, α, α^7, α^11)
The list of l decoding algorithm can be presented as follows.

Algorithm 12.1.1. List decoding of RS codes (Sudan)
Input: A received word r = (r_1, r_2, ..., r_n), and a natural number τ

1. Solve the system of linear equations

$$\sum_{j=0}^{l}
\begin{pmatrix} r_1^j & 0 & \cdots & 0 \\ 0 & r_2^j & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & r_n^j \end{pmatrix}
\begin{pmatrix} 1 & x_1 & \cdots & x_1^{l_j} \\ 1 & x_2 & \cdots & x_2^{l_j} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{l_j} \end{pmatrix}
\begin{pmatrix} Q_{j,0} \\ Q_{j,1} \\ Q_{j,2} \\ \vdots \\ Q_{j,l_j} \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

Here l_j = n − 1 − τ − j(k − 1).

2. Put

$$Q_j(x) = \sum_{r=0}^{l_j} Q_{j,r} x^r \quad\text{and}\quad Q(x, y) = \sum_{j=0}^{l} Q_j(x)y^j.$$

3. Find all factors of Q(x, y) of the form (y − f(x)) with deg(f(x)) < k.

Output: A list of the factors f(x) that satisfy

$$d\big((f(x_1), f(x_2), \ldots, f(x_n)),\, (r_1, r_2, \ldots, r_n)\big) \leq \tau$$

We will return to the factorization problem in Section 12.3. It follows from the above
analysis that if τ < n·l/(l + 1) − (l/2)(k − 1), then the sent codeword is on the resulting list.
12.2 An extended list decoding algorithm
As we have seen the method described in the previous section only works for low rates
of the codes. In this section we discuss an improvement that works for all rates. In
order to explain this, we first define what we mean by the multiplicity of a zero of a
bivariate polynomial.
Definition 12.2.1. Let Q(x, y) = Σ q_{i,j} x^i y^j be a polynomial in F_q[x, y].
Let (a, b) ∈ F_q^2 and let Q′(x, y) = Q(x + a, y + b) = Σ q′_{i,j} x^i y^j.
If q′_{i,j} = 0 for i + j < s and s is the largest such number, then (a, b) is said to be a
zero of Q(x, y) of multiplicity s.

The polynomial Q(x, y) = 1 + x^2 + y^2 + x^2y^2 ∈ F_2[x, y] has (1, 1) as a zero of
multiplicity 4 since Q(x + 1, y + 1) = x^2y^2.
We note that if Q(x, y) has (a, b) as a zero of multiplicity s, this corresponds to
$\binom{s+1}{2}$ homogeneous linear equations in the coefficients.
Let C be a Reed-Solomon code and let r be a received word, that is the sum of a
codeword c and an error vector of weight at most τ. We then determine a bivariate
polynomial

Q(x, y) = Q_0(x) + Q_1(x)y + Q_2(x)y^2 + ··· + Q_l(x)y^l

such that:

1. (x_i, r_i), i = 1, 2, ..., n are zeroes of Q(x, y) of multiplicity s.
2. deg(Q_j(x)) ≤ s(n − τ) − 1 − j(k − 1), j = 0, 1, ..., l.
3. Q(x, y) ≠ 0.
We then have

Lemma 12.2.1. If Q(x, y) satisfies the above conditions and c = (f(x_1), ..., f(x_n))
with deg(f(x)) < k, then (y − f(x)) | Q(x, y).

Proof. We first prove that if f(x_i) = r_i, then (x − x_i)^s | Q(x, f(x)). To see this let
p(x) = f(x + x_i) − r_i; we then have p(0) = 0 and therefore x | p(x).
If h(x) = Q(x + x_i, p(x) + r_i) it follows from condition 1 that h(x) has no terms of
degree smaller than s, so x^s | h(x) and therefore (x − x_i)^s | h(x − x_i). Since h(x − x_i) =
Q(x, f(x)) the claim is proved.
Now the proof of the lemma follows by noting that the polynomial Q(x, f(x)) has
degree at most s(n − τ) − 1. But (x − x_i)^s | Q(x, f(x)) for at least n − τ of the x_i's and
therefore Q(x, f(x)) is divisible by a polynomial of degree at least s(n − τ) and hence
Q(x, f(x)) = 0.
This implies that we can find all the codewords of distance at most τ from the received
word by finding factors of Q(x, y) of the form (y − f(x)) with deg(f(x)) < k.
In order to ensure the existence of a polynomial Q(x, y) that satisfies the three condi-
tions, we first note that condition 2 only gives nonnegative degrees if

$$s(n-\tau) - l(k-1) \geq 0 \tag{12.5}$$

or equivalently

$$\tau \leq n - \frac{l(k-1)}{s} \tag{12.6}$$

The first condition is a system of $n\binom{s+1}{2}$ homogeneous linear equations, so if the num-
ber of unknowns is greater than this number the system has a nonzero solution. The
condition is

$$(l+1)s(n-\tau) - \tfrac{1}{2}l(l+1)(k-1) > n\binom{s+1}{2} \tag{12.7}$$

or equivalently

$$\tau < \frac{n(2l-s+1)}{2(l+1)} - \frac{l(k-1)}{2s} \tag{12.8}$$

A detailed analysis of the numbers involved shows that if s < l, then we get an im-
provement on half the minimum distance if

$$\frac{k}{n} \leq \frac{1}{n} + \frac{s}{l+1} \tag{12.9}$$
Example 12.2.1. List of 4 decoding of a (63, 31) Reed-Solomon code over F_64.
The code has minimum distance 33, but with s = 3 and l = 4 we can correct 17 errors.
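Again the arithmetic is easy to check. A Python sketch (ours) of the radius computation in (12.8), for parameters where (12.6) is not the binding constraint:

```python
from math import ceil

def gs_radius(n, k, l, s):
    """Largest tau with tau < n(2l-s+1)/(2(l+1)) - l(k-1)/(2s), cf. (12.8)."""
    bound = n * (2 * l - s + 1) / (2 * (l + 1)) - l * (k - 1) / (2 * s)
    return ceil(bound) - 1               # strict inequality in (12.8)

print(gs_radius(63, 31, l=4, s=3))       # 17, as stated in Example 12.2.1
print((33 - 1) // 2)                     # 16 = half the minimum distance
```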
We can now formulate the algorithm.

Algorithm 12.2.1. List decoding of RS codes (Guruswami-Sudan)
Input: A received word (r_1, r_2, ..., r_n) and natural numbers τ and s

1. Solve for Q_{a,b} the system of linear equations, h + r < s and i = 1, 2, ..., n:

$$\sum_{a \geq h} \sum_{b \geq r} \binom{a}{h}\binom{b}{r} Q_{a,b}\, x_i^{a-h} r_i^{b-r} = 0 \tag{12.10}$$

with Q_{a,b} = 0 if b > l or a > l_b, where l_b = s(n − τ) − 1 − b(k − 1).

2. Put

$$Q_j(x) = \sum_{r=0}^{l_j} Q_{j,r} x^r \quad\text{and}\quad Q(x, y) = \sum_{j=0}^{l} Q_j(x)y^j$$

3. Find all factors of Q(x, y) of the form (y − f(x)) with deg(f(x)) < k
[Figure 12.1 shows curves of τ/n against k/n for l = 1, 2, 3, 4.]

Figure 12.1: Upper bounds on the fractional number of errors, τ/n, that
can be corrected in an RS code for given rate and list size l = 1, 2, 3, 4.
For each list size it follows from (12.8) and (12.9) that the bound is a
sequence of straight line segments connecting

(k/n, τ/n) = (s(s + 1)/(l(l + 1)), (l − s)/(l + 1))

for s = 0, 1, 2, ..., l.
Output: A list of factors f(x) that satisfy

$$d\big((f(x_1), f(x_2), \ldots, f(x_n)),\, (r_1, r_2, \ldots, r_n)\big) \leq \tau$$

Here equation (12.10) is a simple reformulation of the condition that (x_i, r_i), i =
1, 2, ..., n are zeroes of multiplicity s of Q(x, y).
It follows from the above analysis that if

$$\tau < \frac{n(2l-s+1)}{2(l+1)} - \frac{l(k-1)}{2s}$$

then the sent codeword is on the output list.
12.3 Factorization of Q(x, y)
We shall briefly discuss the problem of finding factors of Q(x, y) of the form y − f(x)
with deg(f(x)) < k. Let h(x) ∈ F_q[x] be an irreducible polynomial of degree k.
Let E = F_{q^k} be constructed using this h(x). For a polynomial p(x) ∈ F_q[x] let
[p(x)] denote the element in E corresponding to p(x) mod h(x). Let φ be the map
φ: F_q[x, y] → E[y] given by

$$\varphi\Big(\sum_i p_i(x)y^i\Big) = \sum_i [p_i(x)]\,y^i \tag{12.11}$$

So the effect of the mapping is to reduce all the p_i(x) modulo h(x), and therefore
φ(Q(x, y)) can be considered as an element of F_{q^k}[y]. It is not hard to see that

φ(Q_1Q_2) = φ(Q_1)φ(Q_2)  and  φ(Q_1 + Q_2) = φ(Q_1) + φ(Q_2);

based on this observation one can prove

Lemma 12.3.1. If (y − f(x)) | Q(x, y), then y − [f] is a factor of φ(Q(x, y)).

This means that the factorization problem is reduced to factoring univariate polynomi-
als over F_{q^k} and there are efficient algorithms (in most symbolic computing languages)
for doing this. Actually some of these packages can factor Q(x, y) directly.
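To illustrate Lemma 12.3.1 in the simplest case q = 2, the following Python sketch (ours) stores polynomials in F_2[x] as integer bit masks, reduces the coefficients Q_i(x) modulo h(x), and finds the roots of φ(Q) in E = F_{2^k} by exhaustive search; each root is the bit mask of a candidate f(x) of degree less than k. The toy polynomial below is Q(x, y) = (y + x + 1)(y + x^2), so the expected roots are [x + 1] and [x^2]:

```python
def clmul(a, b):                          # product in GF(2)[x] on bit masks
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, m):                           # remainder of a modulo m in GF(2)[x]
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def y_roots(Q, h):
    """Roots in E = F_{2^k} of phi(Q), where Q = [Q_0(x), Q_1(x), ...]."""
    k = h.bit_length() - 1
    roots = []
    for e in range(1 << k):
        acc = 0
        for Qi in reversed(Q):            # Horner in y, arithmetic in E
            acc = pmod(clmul(acc, e), h) ^ pmod(Qi, h)
        if acc == 0:
            roots.append(e)
    return roots

# Q(x, y) = (x^3 + x^2) + (x^2 + x + 1)y + y^2 and h(x) = x^3 + x + 1:
print([bin(e) for e in y_roots([0b1100, 0b111, 0b1], 0b1011)])
# ['0b11', '0b100'], i.e. f(x) = x + 1 and f(x) = x^2
```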
Example 12.3.1. We consider the (15, 7, 9) Reed-Solomon code over F_16 where α is a primitive
element of F_16 satisfying α^4 + α + 1 = 0 and x_i = α^{i−1}, i = 1, 2, ..., 15.
With s = 4 and l = 6 we can list decode five errors.
Let the received word be r = (0, 0, α^11, 0, α^12, α^11, 0, 0, 0, 0, 0, 0, α^3, 0, α^7).
If we use the method of Section 12.2 we get
Q(x, y) =
(α^5 + α^4x + α^2x^2 + x^3 + α^7x^4 + α^3x^5 + α^6x^6 + α^3x^7 + α^7x^8 + α^6x^9 + α^10x^10
 + α^13x^11 + α^9x^12 + α^10x^13 + α^3x^14 + α^11x^15 + x^17 + α^14x^18 + α^2x^19 + α^4x^20
 + α^10x^21 + α^12x^22 + α^6x^24 + α^2x^25 + α^2x^26 + x^27 + α^3x^28 + α^8x^29 + α^9x^31
 + α^13x^32 + x^33)y
+ (α^4x + x^2 + α^11x^3 + α^8x^4 + x^5 + α^14x^6 + α^3x^7 + α^14x^8 + α^5x^9 + α^7x^10
 + α^11x^11 + α^6x^12 + x^13 + α^9x^14 + α^11x^15 + x^16 + α^13x^17 + α^11x^18 + α^11x^19
 + α^14x^20 + α^10x^21 + α^5x^22 + α^13x^23 + x^24 + α^4x^25 + α^3x^26 + α^9x^27)y^2
+ (α^9 + α^5x + α^9x^2 + α^3x^3 + α^10x^4 + α^4x^6 + α^9x^15 + α^5x^16 + α^9x^17 + α^3x^18
 + α^10x^19 + α^4x^21)y^3
+ (α^2 + α^3x + α^5x^2 + α^10x^3 + x^4 + α^10x^5 + α^14x^7 + α^4x^8 + α^5x^9 + α^12x^10
 + α^12x^11 + α^12x^12 + x^13 + x^14)y^4
+ (α^12 + α^14x + α^13x^2 + α^4x^3 + α^10x^4 + α^9x^5 + α^11x^6 + α^4x^7 + α^8x^8)y^5
+ (x + α^8x^2)y^6
The polynomial h(x) = 1 + α^11x + α^6x^2 + α^8x^3 + α^11x^4 + α^12x^5 + α^13x^6 + x^7 is irreducible
over F_16
and using that we get

φ(Q(x, y)) = (α^8 + α^14x + α^9x^2 + α^10x^3 + x^4 + α^5x^5 + α^12x^6)y
+ (α^8 + x + α^11x^2 + α^4x^3 + α^7x^4 + α^6x^5 + x^6)y^2
+ (α^8 + α^10x + α^8x^2 + x^3 + α^2x^4 + x^5)y^3
+ (α^4 + α^14x + α^8x^2 + α^11x^3 + α^9x^4 + α^9x^5 + x^6)y^4
+ (α^5 + α^2x + x^2 + α^9x^3 + x^4 + α^3x^5 + x^6)y^5
+ y^6
The factorization of this polynomial into irreducible factors over F_{16^7} gives

φ(Q(x, y)) = y · ((α^4 + α^10x + α^11x^2 + α^9x^3 + α^4x^4 + α^9x^5 + x^6) + y)
· ((1 + α^14x + α^3x^2 + α^5x^3 + α^7x^4 + α^3x^5 + x^6)
 + (α^13 + α^10x + α^7x^2 + x^3 + α^10x^4 + α^7x^5 + α^4x^6)y
 + (α^8 + α^8x + α^12x^2 + α^3x^3 + α^13x^4 + α^6x^6)y^2
 + (α^8 + α^4x + α^12x^2 + x^4 + x^5)y^3 + y^4)
The factor (α^4 + α^10x + α^11x^2 + α^9x^3 + α^4x^4 + α^9x^5 + x^6) + y corresponds to a word with
distance 14 from the received word r and the only other degree 1 factor is y, so in this case we
only get one word on the list, namely the all 0s word, and therefore the decoding was not only
correct but also unique. Actually this is the typical case.
If we consider the same situation as above and receive

r = (1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0)

we get

Q(x, y) = (α^14 + α^14x^10 + α^14x^20)y^2 + (α^14 + α^14x^10)y^4 + α^14y^6
and

φ(Q(x, y)) = (α^11 + α^6x^2 + α^2x^3 + α^10x^4 + α^8x^5 + α^2x^6)y^2
+ (α^8 + α^6x + α^5x^2 + α^8x^4 + α^10x^5 + x^6)y^4 + y^6
The factorization gives:

φ(Q(x, y)) = ((α^5 + α^10x^5) + y)^2 ((α^10 + α^5x^5) + y)^2 y^2
This in turn corresponds to three codewords, all with distance 5 to r, namely

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(1, α^10, 0, 1, α^10, 0, 1, α^10, 0, 1, α^10, 0, 1, α^10, 0)

and

(1, 0, α^5, 1, 0, α^5, 1, 0, α^5, 1, 0, α^5, 1, 0, α^5)
12.4 Problems
Problem 12.4.1
1) How many errors can be corrected with a list size 2 in a (31,10) Reed-Solomon code over F_32?
2) With list size 3?
Problem 12.4.2 A (31, 5) Reed-Solomon code over F_32 is list decoded. What list size should
be used in order to correct as many errors as possible?
Problem 12.4.3 Consider the (10,3,8) Reed-Solomon code over F_11 with x_i = 2^{i−1}, i =
1, 2, ..., 10.
1) How many errors can be corrected with list size 2?
2) If (0, 0, 6, 9, 1, 6, 0, 0, 0, 0) is received and we decode with list size 2, show that Q(x, y) =
y^2 − y(x^2 − 3x + 2).
3) What are the possible sent codewords?
Problem 12.4.4 Consider a (31,15) Reed-Solomon code over F_32. If this is list decoded using
the extended algorithm, how many errors can be corrected using different values of l and s?
Chapter 13
Iterative decoding
In Chapter 10 we discussed several ways of building a frame structure from smaller
component codes. Here we discuss decoding of composite codes based on iterating the
decoding of the components. Our starting point is a type of long codes where the parts
are rather loosely connected, and in this way some decoding decisions can be made
based on local information. In the development of coding theory this was one of the
early approaches, although it has only recently become a subject of wide interest.
13.1 Low density parity check codes
As discussed in Chapter 1, a block code is defined by its parity check matrix, H, and
every row in H indicates a parity check that all words satisfy. A code has many equiv-
alent parity check matrices, but for codes under consideration here we concentrate on
a particular matrix with a useful property. Sometimes it is useful to specify a matrix
with more than N − k rows to get a desirable structure. Thus in this chapter we shall
not require that the rows of H are linearly independent.

Definition 13.1.1. A low-density parity check (LDPC) code with parameters (N, i, j)
has block length N and a parity check matrix with j 1s in each row and i 1s in each
column. N is usually much larger than i and j.

Lemma 13.1.1. The rate of an (N, i, j) LDPC satisfies

$$R \geq 1 - \frac{i}{j}$$

Proof. If H has b rows, the total number of 1s is bj = Ni. These rows may not all be
linearly independent, and thus the dimension k of the code satisfies

$$k \geq N - b = N - \frac{Ni}{j}$$
Let the parity check matrix be indexed as H = [h_uv]. It is convenient to refer to the
set of row indices u in column v such that h_uv = 1 as I_v, and J_u is similarly the set of
column indices v such that h_uv = 1. We can say that position v is checked
by the rows I_v and that parity check u includes symbols J_u. Since we assume that H
is sparse, we will require that for u′ ≠ u″ the intersection of J_{u′} and J_{u″} is at most one
element. Similarly for v′ ≠ v″ the intersection of I_{v′} and I_{v″} is at most one element.
For large N it is more practical to store the sets I_v and J_u than the entire matrix H.
More general LDPC codes can be defined by specifying a distribution of weights in the
rows or columns or by assigning the elements of H independently with a low probabil-
ity of 1s. We shall discuss only the classical structure with fixed weights.
Example 13.1.1. A cyclic code as LDPC code
In Problem 6.5.6 we have considered a (21, 12, 5) cyclic code. By including a factor x + 1 in
the generator polynomial we get a (21, 11, 6) code generated by

g(x) = x^10 + x^7 + x^6 + x^4 + x^2 + 1

The parity polynomial is

h(x) = x^11 + x^8 + x^7 + x^2 + 1

and we obtain the parity check matrix as a square matrix consisting of all cyclic shifts of the first
row

[101000011001000000000]

In this way we get a (21, 5, 5) LDPC code. For any pair of distinct rows or columns in the parity
check matrix there is exactly one position where both have a 1.
Example 13.1.2. Decoding erasures with LDPC codes
One may get a first impression of iterative decoding by considering decoding of erasures only
(thus the assumption is that no actual errors occur). If we consider the code from Example 13.1.1
with bits 8 to 15 (bit 0 is the leftmost bit) of the received word erased, decoding simply consists
in solving the system of linear equations Hr^T = 0. We consider one parity check equation at
a time. In the first 14 equations there is more than one erasure, and we leave them unchanged
for the time being. However, in the next two and the last three equations there is only one era-
sure, and we can fill in the symbol values. Now we can return to the first equations and find the
remaining symbols. Thus the calculation proceeds until all erasures are corrected or all remain-
ing equations contain more than one erasure. The last situation causes a decoding failure, even
though the remaining system of equations might have a unique solution.
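The erasure-filling procedure of this example is easily expressed as a small program. A Python sketch (ours, using numpy), which repeatedly completes any parity check containing exactly one erased position:

```python
import numpy as np

def peel_erasures(H, r, erased):
    """Fill erasures by reusing each parity check with a single unknown."""
    r = r.copy()
    erased = set(erased)
    progress = True
    while erased and progress:
        progress = False
        for row in H:
            unknown = [v for v in np.flatnonzero(row) if v in erased]
            if len(unknown) == 1:            # one erasure: solve for it
                v = unknown[0]
                others = [u for u in np.flatnonzero(row) if u != v]
                r[v] = r[others].sum() % 2
                erased.discard(v)
                progress = True
    return r, erased                          # a nonempty set means failure
```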
13.2 Iterative decoding of LDPC codes
We can decode an LDPC code in the following way: Consider the i parity checks in I_v.
Each of these involves j − 1 other received symbols, which we assume to be distinct. If
pj < 1, where p is the probability of bit error, most sets of j symbols will not include
errors, and most parity checks are satisfied. Thus we can decode each position by a
majority decision: For each position, the symbol in position v is changed if a majority
of the parity checks with indices I_v are not satisfied.

Lemma 13.2.1. If less than i/2 errors occur among the i(j − 1) + 1 symbols involved in
the majority decision, the position is correctly decoded.
Proof. If position v is correct, less than i/2 parity checks can have errors, and thus most
of them are satisfied. If there is an error in position v, less than i/2 parity checks contain
an additional error, and thus most of them fail.

The code may be decoded by a single application of this procedure, or we can choose to
repeat the decoding with the modified vector as input and iterate the decoding a certain
number of times. This iterative approach is called bit-flipping.

Algorithm 13.2.1. Bit-flipping
Input: The received vector r.

1. Set q = r.
2. Calculate s = Hq^T.
3. If for some v, Σ_{u∈I_v} s_u > i/2, set q_v = q_v + 1 mod 2.
4. Repeat from 2 until q is unchanged.

Output: The decoded word, q
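A direct Python rendering of the algorithm (ours, using numpy and a dense 0/1 matrix for H, which is convenient for the short codes of the examples; for large N one would store the sets I_v and J_u instead):

```python
import numpy as np

def bit_flip(H, r, max_iter=50):
    """Bit-flipping (Algorithm 13.2.1) for a column weight i LDPC code."""
    q = r.copy()
    i_weight = H.sum(axis=0).max()        # the column weight i
    for _ in range(max_iter):
        s = H @ q % 2                      # generalized syndrome
        fails = H.T @ s                    # failed checks per position
        flip = fails > i_weight / 2        # majority decision
        if not flip.any():
            break
        q = (q + flip) % 2
    return q

# The (21, 5, 5) LDPC code of Example 13.1.1 with two bit errors:
row = np.array([1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0])
H = np.array([np.roll(row, k) for k in range(21)])
r = np.zeros(21, dtype=int)
r[3] = r[17] = 1
print(bit_flip(H, r))                      # recovers the all-zero codeword
```

Since any two columns of this H share at most one parity check, the two error positions each fail four of their five checks while every other position fails at most two, so both errors are flipped in a single iteration.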
For a received vector r, we refer to Hr^T as a generalized syndrome (generalized be-
cause it may have more than N − k bits).

Lemma 13.2.2. The weight of the generalized syndrome is non-increasing when the
bit-flipping algorithm is applied.

Proof. A symbol is changed only if the number of 1s in the syndrome can be re-
duced.

Theorem 13.2.1. The iterative algorithm stops after a finite number of iterations with
either a correct codeword or a vector where most of the parity checks are satisfied for
all symbols.

Proof. The theorem follows from Lemma 13.2.2.
In the derivation of the following algorithm we need some important structural proper-
ties, which are often described in graph theory terms: Let a graph consist of two types
of nodes, symbol nodes and parity nodes. A node representing a symbol, c_v, v ∈ J_u,
is connected by an edge to the parity node representing row u. Thus in an (N, i, j)
LDPC code each symbol node is connected to i parity nodes, and these are connected
to j symbol nodes. In this way the graph is an image of the parity check matrix with
each edge representing a 1 in the matrix. For a graph consisting of connected symbol
and parity nodes, the code defined by the graph consists of all vectors of symbol node
values that satisfy the parity checks to which they are connected.
The situation is particularly simple if the graph is a tree, i.e. it is connected, but if
any one edge is removed, the graph is no longer connected; in particular a tree has no
circuits. Thus for each edge we can talk about the subgraph on either side of it.
Definition 13.2.1. In an LDPC code, a code defined by a subgraph consisting of a subset
of the parity nodes and all symbol nodes connected to them is called a tree code if the
corresponding graph is a tree.

Lemma 13.2.3. In an (N, i, j) LDPC code with j > 2, a tree code is a linear
(qj − q + 1, qj − 2q + 1, 2) code.

Proof. If there are q parity checks in the code, there are qj branches connected to sym-
bol nodes. However q − 1 of these symbols are shared between two parity checks when
they are connected to form a tree. In a tree some nodes called leaves are connected to
only one other node. In this case the leaves are symbol nodes. For j > 2, there is at
least one parity node that is connected to at least two leaves. Thus all parity checks are
satisfied if these are the only nonzero symbols.

Thus a tree code in itself has poor minimum distance, and in particular the leaves are
poorly protected.

Example 13.2.1. Tree codes in LDPC codes.
In the code considered in Example 13.1.1, i = j = 5. The five parity checks on a particular
symbol define a tree code consisting of all 21 symbols. By making a majority decision about
each symbol, 2 errors can be corrected.
In order to analyze the following algorithm, we state the decoding problem in a more
general way: Assume that there is an integer weight a_v(c_v) associated with symbol v,
i.e. the weight has value a_v(0) if c_v = 0 and a_v(1) if c_v = 1. For a codeword c we get
the weight

$$A(c) = \sum_v a_v(c_v)$$

Find a codeword c such that the weight is minimized

$$A = \min_{c \in C} A(c)$$

Thus in particular the usual maximum likelihood decoding problem is obtained by let-
ting a_v(x) = r_v + x mod 2.
For each edge in the tree connecting symbol v and parity check u, we define a message
being passed in each direction, each in the form of a pair of integers.

Definition 13.2.2. The message m_s(v, u, x) from symbol v to parity check u indicates
the minimum value of the weight function over codewords in the subtree that includes
the symbol, but not the parity check, conditioned on c_v = x. Similarly, the message
m_p(v, u, x) from parity check u to symbol v indicates the minimum value of the weight
function over codewords in the subtree that includes the parity check but not the sym-
bol, conditioned on c_v = x.

Lemma 13.2.4. The minimal weight may be found from the messages on any branch as

$$A = \min_x \left[ m_s(v, u, x) + m_p(v, u, x) \right]$$
Proof. Each edge separates the tree into two subtrees, and the minimal weight is ob-
tained by adding the contributions for the subtrees with one of the choices for c_v = x.

Theorem 13.2.2. The messages defined in Definition 13.2.2 can be calculated recur-
sively starting from the leaves of the tree. When all incoming messages to a node are
known, the outgoing messages are calculated as

1. Symbol node v,

$$m_s(v, u', x) = a_v(x) + \sum_{u \in I_v,\, u \neq u'} m_p(v, u, x)$$

2. Parity node u,

$$m_p(v', u, x') = \sum_{v \in J_u,\, v \neq v'} \min_x m_s(v, u, x)$$

when x' satisfies the parity check. For the other value of x', change one of the
m_s such that the increase is minimal.

Proof. For a symbol node, the minimal value is the sum of the incoming values from
subtrees connected to it plus the weight of the symbol itself. All of these values are
conditioned on the same symbol value. For a parity node we first find the sum of the
minimal incoming messages. The associated symbol values determine the possible
value on the output edge. In order to get the message for the other symbol value, we
have to change at least one input symbol. The smallest increase in the weight is ob-
tained by changing only one message. In the first step we can calculate the messages
from those symbol nodes that are leaves in the tree, since the only input here are the
symbol weights a_v(c_v). After each step, the new set of nodes that have all input mes-
sages defined may be obtained as the leaves of the tree that remain when the leaves
from the previous stage have been removed.
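The two update rules translate directly into code. A Python sketch (ours), with every message represented as the pair (m(0), m(1)) of Definition 13.2.2:

```python
def symbol_update(a, incoming):
    """Rule 1 of Theorem 13.2.2: a = (a_v(0), a_v(1)); incoming holds the
    messages m_p(v, u, x) from all parity checks except the target u'."""
    return tuple(a[x] + sum(m[x] for m in incoming) for x in (0, 1))

def parity_update(incoming):
    """Rule 2 of Theorem 13.2.2: incoming holds m_s(v, u, x) from all
    symbols except the target v'. The cheapest assignment fixes the value
    x' satisfying the check; the other value costs one extra change."""
    base = sum(min(m) for m in incoming)
    sat = sum(m.index(min(m)) for m in incoming) % 2   # parity of argmins
    flip = min(abs(m[0] - m[1]) for m in incoming)     # cheapest change
    out = [0, 0]
    out[sat] = base
    out[1 - sat] = base + flip
    return tuple(out)

# A check whose other symbols currently prefer the values (0, 0, 1) tells
# the remaining symbol that the value 1 is free and the value 0 costs 1:
print(parity_update([(0, 1), (0, 1), (1, 0)]))          # (1, 0)
```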
The minimal weight calculated from the messages is a lower bound on the number of
errors in the tree. Even though not all errors can be corrected with certainty, decoding
the tree code serves as a way of estimating the number of errors.
The messages can be simplified by always taking the difference between the two values
rather than transmitting a pair of integers. This change has no effect on the result of the
decoding, but we lose information about the number of errors that is corrected. In this
case the initial values of the weights are a_v = a_v(1) − a_v(0), i.e. ±1. The numerical
value of the message difference indicates the reliability of the symbol. If all symbol
weights are initiated to a_v(c_v) and the outgoing messages are initially set to this value,
we can use the updating from Theorem 13.2.2. By iterating this calculation a finite
number of times, we arrive at the same result. The reason is that the message from
a subtree does not depend on the incoming message, and thus the correct value will
propagate from the leaves as before.
If the symbol weights are initiated in the same way in a general graph, and the same
updating rules are used, the messages in a given edge after s steps will depend on only
nodes that can be reached in s steps from this branch. If this part of the graph is a tree,
the messages may be interpreted as for the tree code.

Algorithm 13.2.2. Iterative decoding by message-passing
Input: The received vector r

1. Initially the received symbols are assigned weights a_v(c_v) = r_v + c_v mod 2.
2. The messages are calculated as in Theorem 13.2.2.
3. The algorithm continues until the weights are unchanged or a preset limit on the
   number of iterations has been reached.
4. For each symbol, the value that gives the minimum number of errors is chosen.

Output: The decoded word q
The application of Algorithm 13.2.2 is only justified even in some approximate sense
as long as the number of nodes that can be reached from a given starting node is less
than N. If the difference messages a_v are used, they may be interpreted as reliabilities
of the symbols and used in additional iterations of the algorithm. It can be observed
experimentally that the performance of the decoding is improved, but an exact analysis
of this form of iterative decoding is difficult. However the following result indicates
that it is sometimes possible to combine the local results of decoding the tree codes
into a global result:

Theorem 13.2.3. Assume that several tree codes in an LDPC code are decoded, and that
all N − k independent parity checks are included in the codes. If T is the largest num-
ber of errors detected in the decoding of the tree codes, any error pattern of weight T
which is consistent with the decoding results of the tree codes is an ML decision.

Proof. As discussed earlier, the minimal weight obtained by decoding a tree code is a
lower bound on the number of errors that has occurred, but there may be several possi-
ble error patterns of the same weight. After decoding each code we can make a list of
these patterns. Since the tree codes include all parity checks, an error pattern which is
shared between all lists has the right syndrome. Since each tree code is decoded ML,
the number of errors cannot be less than T. Thus any error pattern with T errors is ML,
but there may not be such an error pattern.
Example 13.2.2. A (15, 3, 3) LDPC code
The code considered in this example is the dual of the (15, 11, 3) Hamming code. We can get
a generator matrix by removing the bottom row and the right (all 0s) column from the matrix in
Example 1.2.3. The parity checks are weight 3 codewords of the Hamming code. There are 35
such words, but we shall use a particular set of 15 words (of course only 11 of these are linearly
independent). Thus the code is described as a (15, 3, 3) LDPC code with the following parity
check matrix:

1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 1 0 1 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1
0 0 1 0 0 0 0 0 0 0 0 0 1 1 0
0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
0 0 0 0 1 0 0 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 1 0 0 1 0 0 1 0 0
0 0 0 0 0 0 1 0 0 0 1 1 0 0 0
The decoding is based on tree codes of the following form: Start with a particular parity check
and the three symbols connected to it. Add the six additional parity checks connected to these
symbols. These six parity nodes are connected to 12 other symbols, which in this case are exactly
the 12 remaining code symbols. Starting from any particular parity check, it is easy to check that
the tree code has this form. It is less obvious that the matrix gives such tree codes for all starting
nodes.
The minimum weight of the tree code is 2 (by Lemma 13.2.3), but a codeword which is nonzero
in one of the three central symbols has weight at least 6. Thus we can correct two errors by
message passing in such a tree, but not always three errors. However, we can detect the presence
of three errors, and by combining the decoding results for two trees we can decode the error
patterns. Consider the tree codes defined by the first seven rows of the parity check matrix, and
the received vector

100010101000101

which causes parity checks 1, 5, 6, and 7 to fail. Starting from the leaves, the messages from
parity checks 2, 3, and 4 become (1, 0), (1, 0), and (0, 1). From 5, 6, and 7 we get (1, 0). These
contributions are added in symbols 1, 2, and 3 to give (3, 0), (1, 2), and (2, 1). Parity check
1 is satisfied if two of these symbols are 1s. Taking the first three symbols to be (1, 0, 1) the
weight indicates that two errors have occurred. If the calculation of messages is continued using
Theorem 13.2.2, the messages from parity check 1 become (3, 2), (1, 2), and (2, 1). Adding the
messages to the first three symbols and the weight of the symbol itself, we get (6, 2), (2, 4), (4,
2). This again confirms that two errors have occurred, and that the values of the decoded symbols
are (1, 0, 1). The first symbol is slightly more reliable than the other two.
13.3 Decoding product codes
The decoding discussed in the previous section focuses on single symbols and par-
ity checks. For this reason little use is made of the known results on constructing
and decoding good codes. However, it is also possible to combine decoding of sim-
ple component codes and iterative decoding. In this section we return to the topic of
decoding product block codes, and the following section discusses iterative decoding
using convolutional codes.
The basic properties of product codes were introduced in Section 10.1, and we noted
that it was desirable to correct a number of errors significantly greater than D/2.
We introduce a graph terminology similar to the one in the previous section by associat-
ing a node with each symbol and a parity node with each row and column. Each symbol
node is connected to the parity node for the respective row and column. Thus the parity
node represents not just a single parity check, but a decoding of the component code.

Definition 13.3.1. In a product code, a code defined by a subgraph consisting of the
parity node for a row and all column nodes and symbol nodes is called a row tree
code. Similarly a column tree code is defined by one column node, all row nodes and
all symbol nodes.

It may be readily checked that the graphs in question are in fact trees.
In order to decode the first row tree code we decode each column twice. We decode
column j assuming that the symbol in the first row is 0; the number of errors corrected
in this case is a_j(0). Similarly a_j(1) is the number of errors corrected when the first
symbol is assumed to be 1. Using these weights we find a codeword in row 1, c_1, such
that the sum of the weights is minimized:

$$A = \min_{c \in C} \sum_j a_j(c_{1j})$$
Lemma 13.3.1. For any error pattern of weight less than D/2 the first row is correctly
decoded.

Proof. Any codeword in the row tree code that has a nonzero symbol in the first row
has weight at least d in that row, and total weight at least D = d^2. Thus if the number
of errors is less than D/2, any codeword with a different bit in the first row would have
distance greater than D/2 to the received vector. However, since there are codewords
of weight d which are nonzero only in a single column (and outside row 1), the error
pattern is not necessarily correct in other rows.
We can now prove a result similar to Theorem 13.2.3.
Theorem 13.3.1. Let a row tree code and a column tree code in a product code be de-
coded, and the total number of errors be T in both cases. Any error pattern of weight
T which is consistent with both results is maximum likelihood.
Proof. All parity nodes are included in the two codes, and by the same reasoning as
in the proof of Theorem 13.2.3 any error pattern that is consistent with the results of
decoding the two codes is maximum likelihood.
Example 13.3.1. Decoding of product code
We consider a product of two extended Hamming codes defined by the parity check matrix given
in Example 1.2.3. Thus the parameters of the code are (256, 121, 16). As a first step of the de-
coding we calculate the syndromes of all rows and columns. The syndrome of a single Hamming
code may be expressed as xp, where p is 0 for an even number of errors and 1 for error patterns
of odd weight, and x is the mod 2 sum of the error positions. In order to simplify the example
we consider a case where the rows and columns have the same sequence of syndromes (the error
pattern otherwise has a typical distribution on rows and columns):

(01, 20, 31, 20, 51, 41, 150, 71, 91, 00, 00, 00, 00, 00, 00, 31)

Since there is at least one error in the columns with odd syndromes and two errors in the columns
with nonzero even syndromes, the syndromes indicate at least 13 errors. We shall base the decoding on two
tree codes, the row 0 code with messages from all columns and the corresponding column 0 code.
The messages may be described in the following simplified way when only a single iteration is
used:

00: 0 errors if the symbol is correct, four errors otherwise
x0: two errors whether the symbol is correct or in error
01: one error if the symbol is in error, three otherwise
x1: three errors if the symbol is in error, one otherwise

Given these messages we search for the minimum weight codeword in row 0, which has syn-
drome 31. We shall not give a formal decoding algorithm for the extended Hamming code with
weights associated with the symbols, but based on the decoding of the row without weights, we
shall find a nearby codeword which minimizes the sum of the weights. In addition to the zero
codeword we consider low weight codewords with 1s among the positions 1, 2, 4, and 7 where
the columns indicate a single error in row 0 or a double error. Adding a 1 in position 3 to give
the correct syndrome we get the potential error patterns

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1
1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0

Of these the second pattern gives at least 17 errors, whereas the two other patterns could give 15
errors. While these results give codewords in the tree codes, they may not produce codewords in
the product code. The third error pattern would place the double errors in columns 2 and 4 in rows
0 and 2, but this is not consistent with the syndrome in row 2. By the symmetry of the syndromes,
the remaining solution is also the unique minimum weight codeword in the column 0 tree code.
The single errors in rows/columns 3, 5, and 9 are in immediate agreement. The remaining
single and double errors in rows/columns 4, 6, 7, and 8 are seen to agree if placed in positions
(4, 4), (4, 6), (6, 4), (7, 7), (7, 8), and (8, 7). Since the two tree codes agree, the decoding is ML.
13.4 Parallel concatenation of convolutional codes
(turbo codes)
In parallel encoding the same information symbols are encoded by two systematic en-
coders, and the transmitted frame consists of the information symbols and two sets of
parity symbols. Usually the information symbols are permuted in some way before
146 Iterative decoding
the second encoding, and the permutation is referred to as interleaving. Product codes
may be seen as a type of parallel encoding if the part of the word representing parity
checks on parity symbols is removed. Leaving out these symbols reduces the minimum
distance, but if the component codes have moderate rate, the overall rate is increased.
The minimum distance of product codes could be increased, or at least the number of
low weight words could be decreased, if different permutations were applied to the
rows before the encoding of the columns. Unfortunately at this time no analysis of the
interaction of interleavers and encoders is available, and we do not discuss the choice
of interleavers.
Parallel encoding and decoding by iteration between the two coding stages can be used
with a variety of coding methods. This section presents the construction that is best
known, parallel concatenation of two (possibly identical) convolutional codes. In typi-
cal constructions a frame consists of about 10000 information symbols, and two sets of
parity symbols produced by 16-state encoders with systematic rational encoding ma-
trices as discussed briey in Section 8.6. Before the second encoding, the information
symbols are permuted in the input sequence in such a way that symbols which were
close to each other in the rst encoding will be spread out in the second encoding.
The permutation increases the minimum distance of the combined code and it serves
to send dependent messages from one decoder to different segments of the other code.
The decoder uses a modied Viterbi decoding algorithm which accepts weights for the
input symbols and produces a message for each decoded information symbol. The rst
step is to allow more general cost functions. The state transitions and the correspond-
ing transmitted blocks are given by the matrix as in Chapter 9. In Chapter 9 the
costs were simply the number of differences between the received block, r
j
, and
i

i
,
but the algorithm can be used more generally with e
i

i
being a function of j , the state
transition (i

, i ), and the received symbols r


j
. As long as the costs are added and do not
depend on past segments of the chosen chain, the optimality of the solution is proved
as in Section 9.2 .
As discussed in Section 13.2, the updated weights of each information symbol, a
j
=
a
j
(1) a
j
(0), indicate the difference between the number of errors corrected when
information symbol j is assumed to be a 1 and a 0. Thus, in addition to the minimum
weight sequence, we also want, for each instance of time, j

, the closest codeword with


the opposite value of the information symbol. We could nd this sequence by forcing
the Viterbi algorithm to choose a particular information symbol in this step and then
continuing the search. However we can get these weights with less computational ef-
fort by combining the result of the modied Viterbi decoding with a similar decoding
performed in the reverse direction through the sequence. Since the reverse algorithm
is also optimal, the sequences will eventually have the same accumulated costs, but the
intermediate values are different. Thus at time j

the forward algorithm has calculated


the minima of the accumulated costs for sequences starting at time 0 and reaching each
of the states at time j

, and the reverse algorithm has calculated the accumulated costs


from time RN back to time j

. Thus we can nd the cost of the alternative sequence


as the minimum of the sum over the states that have the required information symbol.
13.4 Parallel concatenation of convolutional codes (turbo codes) 147
The messages from code 1 about information symbol u
j
, m
1
( j )) is now calculated as
a
j
m
2
( j ), where m
2
is the message received from code 2.
Algorithm 13.4.1. Turbo decoding
Input: The received sequence
1. Decode code 1 using the modied Viterbi algorithm with weights
e
i

i
( j ) = W(
i

i
+r
j
)
if the information symbol u
i

i
= 0,
e
i

i
( j ) = W(
i

i
+r
j
) +m
2
( j )
if the information symbol u
i

i
= 1.
2. Repeat the decoding in the reverse direction to obtain the accumulated weights

i
( j ).
3. For each j

, nd the new weight of the information symbol at time j

, a
j
as the
difference between the weight of the ML sequence and the best sequence with the
opposite value of the information symbol at time j

.
4. Generate a message for each information symbol as m
1
( j

) = a
j
m
2
( j

).
5. Permute the information symbols and decode code 2 using these messages.
6. Iterate between the two decoding steps a prescribed number of times.
Output: The decoded information sequence
The rationale for Algorithm 13.4.1 is that decoding decisions in a convolutional code
usually depend on only a short segment of the received sequence. Thus the weight
of an information symbol may be interpreted as the difference in weight between two
decoded sequences that differ only on a segment close to the symbol. The purpose of
the interleaver is to ensure that the messages contributing to the decision come from
different parts of the frame and are based on independent decoding decisions. If the
frame is large and only a limited number of iterations are used, the decoding will in this
way approximately have the properties of decoding a tree code. However, the number
of iterations is often about 20, and tree code approximation is no longer valid. As for
message passing decoding of LDPC codes we may interpret the later iterations as at-
tempts to reconcile the decoding results for the two convolutional codes by assigning
reliabilities of the information symbols. However, a precise analysis of the algorithm
has not been made.
Example 13.4.1. Turbo decoding.
In this example we illustrate the decoding principle by considering a very small parallel encod-
ing. A systematic convolutional code of rate
1
2
and memory 1 is generated by g = (11.01). A
148 Iterative decoding
tail-biting version of length 5 is used here. If the information bits are reordered to the sequence
(1, 3, 5, 2, 4), and encoded again with the same code, we may represent the total rate
1
3
code by
the generator matrix
_
_
_
_
_
_
1 0 0 0 0 1 1 0 0 0 1 1 0 0 0
0 1 0 0 0 0 1 1 0 0 0 0 1 1 0
0 0 1 0 0 0 0 1 1 0 1 0 0 0 1
0 0 0 1 0 0 0 0 1 1 0 1 1 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 1 1
_

_
Thus the code is a (15, 5, 5) block code. The codeword is (11000, 10100, 11110) and the re-
ceived sequence (01000, 10000, 01110). We iterate between decoding the two tail-biting codes.
The rst decoding operates on the received sequence 01.10.00.00.00. Starting and ending in
state 0 we get the costs
_
0 1 2 2 2 2
1 1 2 3
_
There are two equally good decisions, and we choose the zero word. Decoding the same se-
quence in the reverse direction we get the costs
_
2 1 0 0 0 0
1 1 1 1
_
Thus for the information symbols (0, 0, 0, 0, 0) the costs are (2, 2, 2, 2, 2), while a 1 in each of
the rst four positions gives weights that are the sums of the costs in state 1: (2, 2, 3, 4, 3). We
have added the weight of the last information bit without a detailed calculation. This result shows
that the information sequence (1, 1, 0, 0, 0) has the same number of errors as the zero sequence,
whereas more errors would have to be assumed to make the other information bits 1s. Taking
the difference between the two sets of weights we get
a( j ) = (0, 0, 1, 2, 1)
The messages are obtained by subtracting the contribution from the current information symbol:

1
( j ) = (1, 1, 0, 1, 0)
Now the second code is decoded with the received sequence 00.01.11.01.00. In the decoding,
the messages from the rst decoding stages are added to the cost of the transitions where the
information symbol is a 1. However the messages have to be permuted in the same way as the
information symbols.
Again starting in state 0 the costs become
_
0 0 1 3 2 2
1 2 2 4
_
and in the reverse direction
_
2 2 1 1 0 0
1 2 0 1
_
This gives messages

2
( j ) = (0, 0, 3, 2, 2)
Thus, when the decoding of the rst code is repeated, the message 3 decides the second bit in
favor of a 1, and the decoded information sequence becomes (1, 1, 0, 0, 0).
13.5 Problems 149
13.5 Problems
Problem 13.5.1
1) Perform a systematic encoding of the code from Example 13.1.1 when the information sym-
bols are (11011011000). Use the parity polynomial to determine the rst parity symbol and
proceed to nd the others one at a time. Compare this process to the erasure correction de-
scribed in Example 13.1.2.
2) Decode the received vector (010110100110110110110) using majority decisions.
Problem 13.5.2
1) Use the structure of a cube to construct a parity check matrix in the following way: Associate
a binary symbol with each edge, and let each corner represent a parity check on the three
adjoining edges. Write out the matrix.
2) Find the parameters of the code.
3) Find a tree code with as many symbols as possible.
4) Show how an error can be corrected by combining two tree codes.
Problem 13.5.3 1) Decode the code from Example 13.1.1 using bit ipping when the received
vector is (010110100110111110110).
2) How many errors are corrected?
3) Is the result the same when the symbols are decoded in a different order?
4) Repeat the decoding using message passing.
Problem 13.5.4 1) Use the parity check matrix from Example 13.1.1 to decode the vector
(111010001110101) using message passing in a tree code.
2) Continue the decoding for two more iterations. Is the decision ML ?
Problem 13.5.5 1) Consider a product code where the component codes have generator and
parity check matrices
_
_
_
_
0 1 0 1 0 1 0 1
0 0 1 1 0 0 1 1
0 0 0 0 1 1 1 1
1 1 1 1 1 1 1 1
_

_
What are the parameters of the code?
2) Decode the received word
_
_
_
_
_
_
_
_
_
_
_
_
0 1 1 0 0 1 1 1
1 1 0 0 0 0 1 1
0 0 0 1 1 1 0 0
0 0 1 1 1 1 0 0
0 0 1 1 0 1 0 0
0 0 1 1 1 1 0 0
0 1 0 0 0 0 1 1
1 1 0 0 0 0 1 1
_

_
Chapter 14
Algebraic geometry codes
In this chapter we briey describe a class of algebraic block codes, which is a recent
generalization of Reed-Solomon codes.
Where Reed-Solomon codes over F
q
can have length at most q, these so called alge-
braic geometry codes can be signicantly longer. We will not give the general theory
here but we present a special class of codes that seems to have the best chance to even-
tually be used in practice.
14.1 Hermitian codes
Acodeword in an (n, k) Reed-Solomon code is obtained by evaluating certain functions
(polynomials fromF
q
[x] of degree smaller than k) in certain elements (x
1
, x
2
, . . . , x
n
)
from F
q
. In particular we can get a basis for the code by evaluating the monomials
1, x, . . . , x
k1
in the n elements.
The algebraic geometry codes are obtained by generalizing this idea in the sense that
one chooses points from some geometric object (i.e. curves or higher dimensional
surfaces) and then chooses a suitable set of functions. The evaluation of any of these
functions in the chosen points gives a codeword. It turns out that with the proper choice
of points and functions one gets codes (for large q and n) that are better than previously
known codes.
In order to describe the codes we need some facts on polynomials in two variables. Let
F be a eld, then F[x, y] denotes the set of polynomials in the variables x and y with
coefcients from F, i.e. expressions of the form
a
m,n
x
m
y
n
+ +a
i, j
x
i
y
j
+ +a
0,1
y +a
1,0
x +a
0,0
where a
i, j
F. The degree is the maximum of the numbers i + j with a
i, j
= 0. Like
polynomials in one variable a polynomial in two variables is irreducible if it cannot be
written as a product of polynomials of positive degree.
We state without proof a theorem that plays the same role as Theorem 2.2.2 did for
Reed-Solomon codes.
152 Algebraic geometry codes
Theorem 14.1.1. (Bezout) Let f (x, y), g(x, y) F[x, y] where f (x, y) is irreducible
and has degree m, and g(x, y) has degree n. If g(x, y) is not a multiple of f (x, y),
then the two polynomials have at most mn common zeroes in F
2
.
In the following we will describe some algebraic geometric codes, the Hermitian codes.
The construction consists of choosing points on a specic curve and then choosing a
suitable set of functions whose values in these points give a codeword. This illustrates
the general construction.
We choose points P
1
= (x
1
, y
1
), P
2
= (x
2
, y
2
), . . . , P
n
= (x
n
, y
n
) as all the different
elements of
2
F
q
2 that satisfy the equation
x
q+1
y
q
y = 0
It can be shown (see Problem 14.3.3) that the polynomial x
q+1
y
q
y is irreducible.
Note that the points have coordinates in F
q
2 . With the notation as above we have:
Lemma 14.1.1. n = q
3
.
The points are (0, ) with F
q
2 ,
q
+ = 0
and
_

i(q+1)+j (q1)
,
i
_
, i = 0, 1, . . . , q 2, j = 0, 1, . . . , q,
where
i
q
+
i
=
i(q+1)
,
i
F
q
2 and is a primitive element of F
q
2 .
Proof. We rst note that if F
q
2 , then
q
+ F
q
. Also
q
1
+
1
=
q
2
+
2

q
1

q
2
+
1

2
= 0 and therefore (
1

2
)
q
+ (
1

2
) = 0. The polynomial
x
q
+x has exactly q zeroes in F
q
2 (see Problem 14.3.4), so
q
+ = 0 in q cases and
for each i {0, 1, . . . , q 2},
q
+ =
i(q+1)
in q cases, since
q+1
is a primitive
element of F
q
. In the rst case we get x
q+1
= 0 and hence x = 0 and in the second
case we get x
q+1
=
i(q+1)
with the q + 1 solutions
i+j (q1)
, j = 0, 1, . . . , q.
Therefore the number of points n = q +(q
2
q)(q +1) = q
3
.
In order to describe the set of functions we will use in the construction of the codes,
we introduce the following:
Denition 14.1.1. For x
a
y
b
F
q
2 [x, y] let
(x
a
y
b
) = aq +b(q +1)
and for f (x, y) =

i, j
f
i, j
x
i
y
j
F
q
2 [x, y],
( f (x, y)) = max
f
i, j
=0

_
x
i
y
j
_
The function : F
q
2[x, y] is called the order function.
We note that ( f (x, y)g(x, y)) = ( f (x, y)) +(g(x, y)).
With this we dene the Hermitian codes.
14.1 Hermitian codes 153
Denition 14.1.2. Let s be a natural number s < n q
2
+q. A Hermitian code H(s)
over F
q
2 consists of the codewords
( f (P
1
), f (P
2
), . . . , f (P
n
))
where f (x, y) F
q
2 [x, y] with ( f (x, y)) s and deg
x
( f (x, y)) < q +1
From the construction it is obvious that the length of the code is n, and it is also clear
that the code is linear.
To determine the dimension we rst note that the evaluation of the monomials
M(s) = {x
a
y
b
where 0
_
x
a
y
b
_
s, 0 a < q +1}
in P
1
, P
2
, . . . , P
n
gives a basis for the code. To see this we rst prove
Lemma 14.1.2. The elements of M(s) have different orders.
Proof. If

_
x
a
1
y
b
1
_
=
_
x
a
2
y
b
2
_
we have a
1
q +b
1
(q +1) = a
2
q +b
2
(q +1) and therefore
(a
1
a
2
+b
1
b
2
)(q +1) = (a
1
a
2
). But the absolute value of the right- hand side
is smaller than q +1 and therefore we must have a
1
= a
2
and hence b
1
= b
2
.
This in turn also implies that we have a natural ordering of these monomials namely
by their value. Let this be
0
,
1
, . . . .
Lemma 14.1.3. The vectors obtained by evaluating the monomials in M(s) in the points
P
1
, P
2
, . . . , P
n
are linearly independent.
Proof. Suppose that
0

0
(P
j
) + +
v

v
(P
j
) = 0 for j = 1, 2, . . . , n.
This implies that the polynomial (x, y) =
0

0
(x, y) + +
v

v
(x, y) has n ze-
roes. Since deg
x
((x, y)) < q +1 it is not a multiple of x
q+1
y
q
y and the degree
of (x, y) is at most
s+q
q+1
and since P
1
, P
2
, . . . , P
n
are points on the curve with the
equation x
q+1
y
q
y = 0 which has degree q +1 it follows from Theorem 14.1.1
that the number of zeroes is at most s +q. But s +q < n and therefore
0

0
(x, y) +
+
v

v
(x, y) must be the zero polynomial so
0
=
1
= =
v
= 0.
Let (s) denote the number of elements in M(s).
We then have
Theorem 14.1.2. If s > q
2
q 2, then (s) = s +1
q
2
q
2
and for s > q
2
q 1
there exists (a, b) with 0 a q, 0 b such that aq +b(q +1) = s.
Proof. By induction on s. If s = q
2
q 1 the number of solutions to the inequality
that has a = j is q 1 j so the total number is
q
2
q
2
corresponding to the statement
154 Algebraic geometry codes
of the theorem. If s = q
2
q, then (q 1)q = s. Suppose we have proven the
statement for all s

where q
2
q s

< s. Then if a
1
q +b
1
(q +1) = s 1 we get
(a
1
1)q +(b
1
+1)(q +1) = s. If a
1
= 0 we get s = q q +(b
1
+1 q).
This combined with the observation above immediately gives
Theorem 14.1.3. The dimension k(s) of the code H(s) is
k(s) =
_
(s) if 0 s q
2
q 2
s +1
q
2
q
2
if q
2
q 2 < s < n q
2
+q
The number g =
1
2
(q
2
q) is called the genus of the curve x
q+1
y
q
y = 0. It can
be proved that this is the number of natural numbers that do not appear as orders. We
also have
Theorem 14.1.4. H(s)

= H(n +q
2
q 2 s).
Proof. We will only prove the theorem in the case where s > q
2
q 2. From the
above it is easy to see that k(s) + k(n + q
2
q 2 s) = n, so it sufces to prove
that any element of H(s) is orthogonal to any element of H(n +q
2
q 2 s).
This is the same as showing that if (a
1
, b
1
) and (a
2
, b
2
) satisfy a
1
q + b
1
(q + 1)
s, a
1
< q +1 and a
2
q +b
2
(q +1) n +q
2
q 2 s, a
2
< q +1 then,
n

i=0
x
a
1
+a
2
y
b
1
+b
2
(P
i
) = 0
By Lemma 14.1.1 the left- hand side of this equation is the same as

q
+=0
0
a
1
+a
2

b
1
+b
2
+
q2

i=0

q
+=
i(q+1)

b
1
+b
2
q

j =0
(
i(q+1)+j (q1)
)
a
1
+a
2
If a
1
+a
2
= 0 this is

b
1
+b
2
= 0
If a
1
+a
2
= 0 we get
q2

i=0

j (q1)(a
1
+a
2
)

q
+=
i(q+1)

b
1
+b
2
q

j =0

j (q1)(a
1
+a
2
)
Since
q

j =0

j (q1)(a
1
+a
2
)
=

(q
2
1)(a
1
+a
2
)
1

(q1)(a
1
+a
2
)
1
the sum is 0 if
(q1)(a
1
+a
2
)
= 1.
The case
(q1)(a
1
+a
2
)
= 1 is treated in Problem 14.3.5.
14.2 Decoding Hermitian codes 155
Note that it is now easy to get a parity check matrix for the code H(s). The rows
of that are the evaluation of the monomials in M(n + q
2
q 2 s) in the points
P
1
, P
2
, . . . , P
n
.
Our nal task is to estimate the minimum distance of the Hermitian codes. Here we
have the following
Theorem 14.1.5. Let d(s) denote the minimum distance of the code H(s).
If q
2
q 2 < s < n q
2
+q, then d(s) n s
We will only prove this in the case where s = h(q +1) for some nonnegative integer h.
We can show that a codeword ( f (P
1
), . . . , f (P
n
)) in H(s) has weight at least n s by
showing that the polynomial f (x, y) has at most s zeroes among the points P
1
, . . . , P
n
.
Since aq + b(q + 1) h(q + 1) we get (a + b)(q + 1) h(q + 1) + q and hence
a + b h +
q
q+1
and therefore a +b h, so the degree of f (x, y) is at most h. We
also have that deg
x
( f (x, y)) < q +1 and therefore by Theorem 14.1.1 f (x, y) has at
most h(q +1) = s zeroes, since P
1
, P
2
, . . . , P
n
are on a curve of degree q+1.
A proof of the theorem for other values of s is more difcult, actually one can prove
that d(s) = n s.
Combining Theorem 14.1.3 and Theorem 14.1.5 we get
Corollary 14.1.1. d(s) n k(s) +1 g for the codes H(s).
Recall that for Reed-Solomon codes we have d = n k +1 so the algebraic geometry
codes have smaller minimum distance, but are longer.
Example 14.1.1. Hermitian codes over F
16
With q = 4 we get (64, s +1 6, 64 s) codes over F
16
for s = 10, 11, . . . , 50. If we take the
(64, 32, 27) code over F
16
and concatenate it with the (8,4,4) extended Hamming code we get a
(512,128,108) binary code.
14.2 Decoding Hermitian codes
In this section we will present a mixture of the methods from Section 5.2 and Section
5.4 to give a decoding algorithm for Hermitian codes.
We consider the code H(s) as presented in Section 14.1 and receive a word r =
(r
1
, r
2
, . . . , r
n
) which is the sum of a codeword c = ( f (P
1
), f (P
2
), . . . , f (P
n
)) and
an error vector e of weight . The idea is to determine a polynomial in three variables
Q(x, y, z) = Q
0
(x, y) +z Q
1
(x, y) F
q
2 [x, y, z]\{0}
1. Q(x
i
, y
i
, r
i
) = 0, i = 1, . . . , n.
2. (Q
0
) s + + g.
3. (Q
1
) + g.
156 Algebraic geometry codes
Theorem 14.2.1. If errors occurred, then there is at least one nonzero polynomial
Q(x, y, z) which satises the three conditions.
Proof. If there are errors, there exists a polynomial Q
1
(x, y) =

i=0

i

i
with
the error positions as zeroes since we get a system of homogenous linear equations
in + 1 unknowns. If f (x, y) is the polynomial that gives the transmitted code-
word, and r(x, y) is the polynomial corresponding to the received word we then have
f (P
j
)Q
1
(P
j
) = r(P
j
)Q
1
(P
j
) for j = 1, 2, . . . , n since f (P
j
) = r(P
j
) if P
j
is
not an error position and Q
1
(P
j
) = 0 if P
j
is an error position. This means that
Q(x, y, z) = f (x, y)Q
1
(x, y) z Q
1
(x, y) satises all three conditions.
Theorem 14.2.2. If the number of errors is less than
dg
2
, then the error postions are
zeroes of Q
1
(x, y).
Proof. If <
dg
2
=
nsg
2
we get s++g < n. The polynomial Q(x, y, f (x, y)),
where the transmitted codeword is generated by f (x, y) has at least n zeroes among
the points P
1
, P
2
, . . . , P
n
. The degree of Q(x, y, f (x, y)) is smaller than or equal to
s++g
q+1
so by Theorem 14.1.1 it must either be the zero polynomial or a multiple of the
polynomial x
q+1
y
q
y. In both cases we get Q
0
(P
j
) + f (P
j
)Q
1
(P
j
) = 0 for
j = 1, 2, . . . , n. Since we have Q
0
(P
j
) + r(P
j
)Q
1
(P
j
) = 0 for j = 1, 2, . . . , n,
we get by subtraction (r(P
j
) f (P
j
))Q
1
(P
j
) = 0 for j = 1, 2, . . . , n and hence
Q
1
(P
j
) = 0 if P
j
is an error position.
From the above we can now give a decoding algorithm for the codes H(s). We dene
l
0
= s +
dg
2
and l
1
=
dg
2
.
The algorithm now can be presented as follows:
Algorithm 14.2.1.
Input: A received word r = (r
1
, r
2
, . . . , r
n
)
1. Solve the system of linear equations
_
_
_
_
_
_
_

0
(P
1
)
1
(P
1
) . . .
l
0
(P
1
) r
1

0
(P
1
) r
1

1
(P
1
) . . . r
1

l
1
(P
1
)

0
(P
2
)
1
(P
2
) . . .
l
0
(P
2
) r
2

0
(P
2
) r
2

1
(P
2
) . . . r
2

l
1
(P
2
)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

0
(P
n
)
1
(P
n
) . . .
l
0
(P
n
) r
n

0
(P
n
) r
n

1
(P
n
) . . . r
n

l
1
(P
n
)
_

_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
Q
0,0
Q
0,1
Q
0,2
.
.
.
Q
0,l
0
Q
1,0
Q
1,1
.
.
.
Q
1,l
1
_

_
=
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0
0
0
.
.
.
0
0
0
.
.
.
0
_

_
2. Put Q
1
(x, y) =

l
1
j =0
Q
1, j

j
.
3. Find the zeroes of Q
1
(x, y) among the points P
1
, P
2
, . . . , P
n
.
4. Calculate the rst l
1
+g 1 syndromes using the parity check matrix of the code
and the received word.
14.3 Problems 157
5. Solve the system of linear equations using the parity check matrix and the (pos-
sible) error positions and the calculated syndromes to nd the error values.
output: The error vector
Notice in the systemabove that each rowof the matrix corresponds to a triple (x
i
, y
i
,r
i
).
We have already seen that if the number of errors is smaller than
dg
2
, then the output
of the algorithm is the sent word.
Example 14.2.1. Decoding of the code H(27) over F
16
.
The code has parameters (64, 22, 37) and the above algorithm corrects 15 errors.
It is possible to present another version of the above algorithm as we did in Section
5.4 and generalize it so that all error patterns of weight smaller than half the minimum
distance can be decoded; we refer to the special literature for that.
14.3 Problems
Problem 14.3.1
1) With s = 27 what are the parameters of the code H(s) over F
16
?
2) Find a generator matrix for the code.
3) What are the parameters of the dual code?
4) How can you nd a parity check matrix of the code?
Problem14.3.2 (Project) Write a program that implements the decoding algorithm for the codes
H(s) over F
16
.
Try to decode random error patterns of weight up to half the minimum distance.
Problem 14.3.3 Show that the polynomial x
q+1
y
q
y is irreducible in F
q
2 [x, y].
Hint: write x
q+1
y
q
y as (a(x) + yb(x, y)) (c(x) + yd(x, y)).
Problem 14.3.4 Show that the polynomial x
q
+ x has exactly q zeroes in F
q
2.
Problem 14.3.5 Complete the proof of Theorem 14.1.4 by considering the case when

(q1)(a
1
+a
2
)
= 1.
Appendix A
Communication channels
In some communication channels the output is accurately described as the sum of (a
scaled version of) the input and an independent noise term, which follows a Gaussian
distribution. This description applies to deep space communication where several cod-
ing techniques were rst applied. The appendix describes how the codes studied in this
book are applied to such channels, and how some of the decoding techniques can be
modied to work in this context.
A.1 Gaussian channels
The concept of mutual information as introduced in Chapter 4 can be extended to real
variables. We dene the mutual information by a modication of (4.2)
I (X; Y) = E
_
log
p(y|x)
p(y)
_
(A.1)
where p is the density function. Note that for a ratio between two probabilities, we
may use the ratio of the density function. Again the capacity is the maximum of this
quantity with respect to an acceptable input distribution. We shall consider only the
special case that is of practical importance:
Denition A.1.1. A discrete time memoryless additive Gaussian channel is dened by
variables X and Y, where X has zero mean and standard deviation , and Y = X +N
where the noise variable N is normal with zero mean and standard deviation and
independent of X.
We note without proof that the mutual information is maximized by choosing Y to
be Gaussian, and that this implies X Gaussian. With this choice the capacity may be
calculated.
Theorem A.1.1. The capacity of the discrete memoryless Gaussian channel is
C =
1
2
log
_
1 +

2

2
_
(A.2)
160 Communication channels
Proof. For a normal variable with standard deviation ,
E
_
log( p(x))
_
=
1
2
log
_
2
2
_
+ E
_

x
2
2
2
log e
_
(A.3)
It follows from the denition of the variance that the last term is
1
2
log e. Applying
this expression to the output, Y, and Y conditioned on X, which is the noise distribu-
tion, gives the theorem.
In many communication systems the modulation system gives rise to two identical and
mutually independent channels (quadrature modulation). It is sometimes convenient
to represent these as a single complex channel where the noise is a complex Gaussian
variable. We may nd the capacity as twice the value given by (A.2)
A.2 Gaussian channels with quantized input
and output
As we mentioned in the previous section, the mutual information of the Gaussian chan-
nel is maximized by using a normal distribution as the input. However, for practical
communication it is convenient to use a discrete input distribution with the smallest
possible number of levels. Clearly we have to use more than A input symbols to reach
a capacity of log A. Similarly, the receiver is simplied if we can work with a quan-
tized output. The easiest situation would be to distinguish the same number of levels
as used for the input, but better performance is obtained with ner output quantization.
Example A.2.1. A quantized Gaussian channel
The Gaussian channel may be converted to a binary symmetric channel by using inputs 1 and
observing the sign of the output. Introducing the function Q() as the probability that a normal
variable with 0 mean and variance 1 exceeds , we have p = Q
_
1

_
for the discrete channel.
For the Gaussian channel, = = 1 gives C =
1
2
. However Q(1) = 0.1587, and the capac-
ity of a BSC with p = 0.1587 is only 0.3689. Thus the quantization introduces a signicant
degradation. However most of the degradation is due to the output quantization.
Let the input alphabet be real numbers {x
1
, x
2
, . . . , x
r
}, and let the output be compared
to a set of thresholds {s
1
, s
2
, . . . , s
m
}. Thus the output alphabet is {y
1
, y
2
, . . . , y
m+1
}
where the discrete output y
j
indicates that the real output, x + n, satises s
j 1
<
x +n < s
j
, 1 < j < m +1, y
1
corresponds to x +n < s
1
, and y
m+1
to x +n > s
m
.
The transition probabilities may be found as
P
_
y
j
|x
i
_
= Q
_
x
i
s
j 1

_
Q
_
x
i
s
j

_
(A.4)
where is the standard deviation of the noise.
A.3 ML Decoding 161
Example A.2.2. (Example continued)
With the same channel as before, we keep the binary input but use eight output symbols dened
by thresholds at 0,
1
2
, 1, and
3
2
. The transition matrix becomes
Q =
_
0.3085 0.1915 0.1915 0.1499 0.0918 0.0441 0.0165 0.0062
0.0062 0.0165 0.0441 0.0918 0.1499 0.1915 0.1915 0.3085
_
The channel capacity of this discrete channel can then be calculated to be C = 0.4773.
As illustrated in the example, Gaussian channels can be converted to discrete channels
by using a simple discrete input alphabet and performing a suitable quantization of
the output. The capacity of the discrete channel is upper bounded by the value (A.2)
for the Gaussian channel with the same input variance and noise. In particular binary
codes can be used on channels with capacity less than 1, but to get the best performance
4 16 output levels are often used (soft decision).
Example A.2.3. A complex Gaussian channel
A commonly used system of transmitted points for a complex Gaussian channel is the set
{1 i, 1 3i, 3 i, 3 3i } which is known as 16QAM (quadrature amplitude mod-
ulation). At the output the plane may be divided into decision regions by using the thresholds
0 and 2 for both the real and the imaginary part. If the input symbols are equally likely, the
variance of the input is 5. The 16 input points are conveniently mapped to binary input symbols
by coding the four levels as 00, 01, 11, 10. In this way an error that causes the received signal
to be interpreted as a neighboring symbol in the real or imaginary direction will be mapped to
a single bit error. Thus there is no great loss in using a binary code on this channel. Another
alternative is a Reed-Solomon code.
A.3 ML Decoding
When equally likely codewords from a discrete alphabet are used in the transmission,
the receiver should select the codeword that maximizes P[r|c
i
]. This is the Maximum
Likelihood decision. For a memoryless channel this probability can be expressed as a
product over the symbols, and the calculations are simplied by maximizing the loga-
rithm:

j
log P
_
r
j
|c
i j
_
(A.5)
In the binary case we can simplify the expression by subtracting the logarithm condi-
tioned on the opposite transmitted symbol. This does not change the decision. Using
the expression for the Gaussian distribution we get

j
_

_
r
j
c
i j
2
_
2
+
_
r
j
+c
i j
2
_
2
_
(A.6)
This result shows that the receiver should maximize the correlation

j
r
j
c
i j
(A.7)
162 Communication channels
or equivalently minimize the error

j
c
i j
_
c
i j
r
j
_
(A.8)
Thus when the channel output is quantized using A 1 equally spaced thresholds, we
can simply use the integers {0, 1, . . . , A 1} as weights for the symbols.
Using such a set of weight, the Viterbi algorithm for convolutional codes, iterative de-
coding of block codes and certain other techniques can be modied with these so-called
soft decisions.
Appendix B
Solutions to selected problems
B.1 Solutions to problems in Chapter 1
Problem 1.5.1
1) No, the sum of the last two words is not in there. Or 5 is not a power or of 2.
2) If we add (010110), (011001) and (100101) we get a linear code.
3) There are many possibilities, e.g. (101010), (010110), (001111).
Problem 1.5.2
1) Using row operations we get
G =
_
_
1 0 0 1 0 1
0 1 0 1 1 0
0 0 1 1 1 1
_
_
2) Since H
C
= G
C
this is the G that we have.
3) No, G(111111)
T
= (100) = (000).
Problem 1.5.3
1) The dimension of the code is 4 and therefore the dimension of the dual is 124 = 8.
To get the minimum distance of the code (= minimum weight) at this point we list
the 2
4
= 16 codewords and nd d
min
= 6.
For the dual code we have H
C
= G
C
and since columns 1, 4 and 5 in G sum to
zero and there is no zero column and no two equal columns, the minimum distance
of C

is 3.
2) The original code can correct two errors and the dual code can correct one error.
164 Solutions to selected problems
Problem 1.5.4
1) h
j
.
2) h
1
+h
4
+h
5
.
3) (1, 1, 1, 1, 1) is a codeword if and only if H(11111)
T
= h
1
+h
2
+h
3
+h
4
+h
5
= 0.
Problem 1.5.5 This is Lemma 1.2.3.
Problem 1.5.6
Since 1 +
_
14
1
_
+
_
14
2
_
+
_
14
3
_
= 470 and 470 < 2
9
but 470 > 2
8
the GV-bound says that
there exists a (15, 6, 5) code.
Actually there is a (15, 7, 5) code as we will see.
Problem 1.5.7
1) Parameters of the binary Hamming codes (See Denition 1.2.5).
m 3 4 5 8
n = 2
m
1 7 15 31 255
k = 2
m
1 m 4 11 26 247
d 3 3 3 3
2) Parameters of the extended binary Hamming codes (See Dention 1.2.6).
m 3 4 5 8
n = 2
m
8 16 32 256
k = 2
m
1 m 4 11 26 247
d 4 4 4 4
Problem 1.5.8
1) The code is a (7, 4, 3) code, so the dual code has dimension 3. By listing all the
2
3
= 8 codewords we see that the minimum weight (= the minimum distance) is 4.
2) By looking at all the eight codewords we see that all the distances are 4.
3) A generator matrix for the dual Hamming code (2
m
1, m) has as columns all the
nonzero binary m-vectors.
The minimum distance (= the minimum weight) of this code is 2
m1
and any two
codewords have this distance.
There are many ways of seeing that; here is a proof by induction on m.
For m = 1 the statement is trivial since the only words are 0 and 1. Suppose the
statement is true for m = i ; we will prove that it is also true for m = i +1.
We have
G
i+1
=
_
G
i
G
i
0
0 . . . 0 1 . . . 1 1
_
and from this the statement follows easily.
B.1 Solutions to problems in Chapter 1 165
Problem 1.5.9
1) B(2) has generator matrix
G =
_
_
1 0 1 0
0 1 1 0
1 1 1 0
_
_
From this the result is trivial.
2) This follows directly from Problem 1.5.8.
3) Since the distance between any two words is
n
2
this is obvious (except that the all 1s
word and the all 1s word are not orthogonal.
Problem 1.5.10
1) We get
n

= n 1
k

= k (since d 2)
d

=
_
d if d is even
d 1 if d is odd
d is even if all minimum weight words have a zero in the deleted position.
2) We get
n

= n 1
k

= k 1
d

= d
Problem 1.5.11
We get
k
ext
= k and d
ext ,min
=
_
d if d is even
d +1 otherwise
The parity check matrix for the extended code is obtained by adding a column of zeroes
to H and the a row of 1s, i.e.
H
ext
=
_
_
_
_
_
H
c
0
.
.
.
0
1 . . . 1
_

_
166 Solutions to selected problems
Problem 1.5.12
1) c
j
= 0, if not satised for all the 2
k
codewords, gives one new equation so the
dimension drops one (i.e. there are 2
k1
such codewords).
2)

cC
w(c) = total number of 1s n 2
k1
by 1).
3) Since

cC
w(c) (2
k
1)d
min
the result follows.
Problem 1.5.15
No, since that would give a word of weight 4 in the code.
Problem 1.5.16 Just avoid the sum of the two rst columns in H.
Problem 1.5.17
The Hamming bound gives k 8.
Problem 1.5.18 1 +28z
2
+70z
4
+28z
6
+ z
8
.
B.2 Solutions to problems in Chapter 2
Problem 2.4.1
1) All elements have an additive inverse, therefore the sum of all elements equals 0.
2) The elements 16 and 1 are their own multiplicative inverses and all other elements
have another element as an multiplicative inverse, therefore the product of all non-
zero elements 1 1 16 = 16
3) ord(2) = 8.
4) The possible orders are 1, 2, 4, 8, 16.
5) ord(1) = 1, ord (16) = 2, ord (4) = 4, ord (2) = 8, ord (3) = 16.
6) eight primitive elements.
7) No solutions.
8) 2 and 14.
Problem 2.4.2
1) If a = 0 we get a
1
(ab) = b = 0.
2) 2 2 = 0 so by 1) it can not be a eld!
B.2 Solutions to problems in Chapter 2 167
Problem 2.4.3
If a
i
= 1 we get 1 and else 0, since a
q1
= 1.
Problem 2.4.4
x
3
+ x +1 and x
3
+ x
2
+1.
Problem 2.4.5
binary 3-tuple power of polynomial
000 0
100
0
1
010 x
001
2
x
2
110
3
x +1
011
4
x
2
+ x
111
5
x
2
+ x +1
101
6
x
2
+1
Problem 2.4.6 x
4
+ x +1.
Problem 2.4.7
1) All elements have an additive inverse, therefore

= 0.
2) All elements have a multiplicative inverse exept 1 which is its own inverse, therefore
the product 1 1 1 = 1.
3) From Example 2.2.3 we get that
i
is primitive iff gcd(i, 15) = 1.
Problem 2.4.8
1) z
4
+z
3
+1 has zeroes
7
,
11
,
13
and
14
.
2) z
4
+z
2
+z has only 0.
Problem 2.4.9
m = lcm(2, 3) = 6.
Problem 2.4.10
1, 31 and 1, 3, 7, 9, 21, 63.
Problem 2.4.11
x
9
1 = (x +1)(x
2
+ x +1)(x
6
+ x
3
+1).
Problem 2.4.12
x
73
1 = (x +1)(x
9
+x
7
+x
4
+x
3
+1)(x
9
+x
4
+x
2
+x +1)(x
9
+x
8
+1)(x
9
+x
6
+x
5
+
x
2
+1)(x
9
+x
8
+x
6
+x
3
+1)(x
9
+x
6
+x
3
+x +1)(x
9
+x +1)(x
9
+x
8
+x
7
+x
5
+1).
168 Solutions to selected problems
Problem 2.4.13 x
85
1 = (x +1)(x
8
+x
6
+x
5
+x
4
+x
2
+x +1)(x
8
+x
7
+x
5
+x
4
+
x
3
+x
2
+1)(x
8
+x
7
+x
6
+x
4
+x
2
+x +1)(x
8
+x
7
+x
3
+x +1)(x
8
+x
5
+x
4
+x
3
+
x
2
+x +1)(x
8
+x
7
+x
6
+x
5
+x
4
+x
3
+1)(x
8
+x
5
+x
4
+x
3
+1)(x
4
+x
3
+x
2
+
x +1)(x
8
+x
7
+x
6
+x
4
+x
3
+x
2
+1)(x
9
+x
7
+x
5
+x
2
+1)(x
9
+x
6
+x
3
+x +1).
Problem 2.4.14 x
18
1 = (x
9
1)
2
.
Problem 2.4.15 Yes it is on the list, see Appendix C, m = 8, i = 5.
Problem 2.4.16
3) x +1.
4) x, x
2
, x
4
, x
8
.
B.3 Solutions to problems in Chapter 3
Problem 3.4.1
1) = np = 16 0.01 = 0.16 and =

np(1 p) = 0.401
2) From (3.2) we get 0.852, 0.138, 0.010, 0.0005, 0.00002.
3) From (3.3) we get 0.852, 0.134, 0.011, 0.00058, 0.00002. Thus the agreement is
quite good for these parameters.
4) From (3.4) we get P
fail
= 1 0.852 0.138 = 0.011 since t = 1 here.
5) So it is the probability of getting three errors i.e. 0.0005.
Problem 3.4.2
1) The weight distribution shows d = 5, thus t = 2.
2) P
fail
= P(> 2) = 0.0004.
3) 3 bit errors cause a decoding error if the received vector is at distance 2 from a
codeword of weight 5. Thus there are 18
_
5
3
_
= 180 such error patterns (exact result).
The total number of weight 3 words is
_
15
3
_
= 455. The number could also be found
using (3.6) with j = 3, w = 5, s = 2, i = 0.
4) P
err
= 180 0.01
3
0.99
12
= 0.00016. The next term for j = 4 gives 90+450 error
patterns, but the total probability is much smaller.
Problem 3.4.3
1) The minimum distance is 6, so the code corrects two errors.
2) Assuming that only two errors are corrected we have
P
f ail
P(3) =
_
16
3
_
p
3
(1 p)
13
= 0.0005.
3) From (3.5) we get T(4, 2, 6) = 15 and T(6, 2, 6) = 6 10 = 60.
B.4 Solutions to problems in Chapter 4 169
Problem 3.4.4
1) np = 2.56
2) P(9) = 0.00094 , P(10) = 0.00024 , P(11) = 0.00005 so P(> 8) = 0.00125.
3) We get about 0.00132, all terms are slightly larger.
B.4 Solutions to problems in Chapter 4
Problem 4.3.1
1) The entropy of the source is H = 1.5 bits.
2) H
max
= log(3) = 1.585 bits.
3) C = {0, 10, 11}.
4)
_
1
2
,
1
4
,
1
4
_
for [0, 1, 1]. V is not memoryless. There is a one-to-one mapping be-
tween the binary and the ternary sequences.
Problem 4.3.2
1) When the input distribution is
_
1
2
,
1
2
_
, the output distribution becomes P(Y) =
_
5
16
,
3
16
,
3
16
,
5
16
_
.
2) I = H(Y) H
_
Y
X
_
= 1 + H
_
3
8
_

7
4
= 0.204 bits.
3) C = I ; this is the maximizing distribution .
Problem 4.3.3
1) C = 1 H(0.05) = 0.714 bits.
2) The average number of errors is t
a
= 256 0.05 = 12.8.
3) The Hamming bound gives : log
2
_
_
256
t
_
_
< 256 160 thus t = 19.
4) P(e) = P[> 19 errors] =
_
256
20
_
p
20
(1 p)
236
+
_
256
21
_
p
21
(1 p)
235
+ = 0.03.
5) The Gilbert bound for the same parameters is d
g
= 21 by a similar calculation.
(There is a code with d = 26.)
170 Solutions to selected problems
B.5 Solutions to problems in Chapter 5
Problem 5.5.1
1) There are (4, 1, 4), (4, 3, 2) and (4, 4, 1) binary codes.
2) A (4, 2, 3) ternary code could have
H =
_
1 0 1 1
0 1 1 1
_
Problem 5.5.2
1)
G =
_
_
_
_
1 1 1 1 1 1
1 3 2 6 4 5
1 2 4 1 2 4
1 6 1 6 1 6
_

_
2) Using the polynomials
2(x 3)(x 2)(x 6)
(x 1)(x 2)(x 6)
2(x 1)(x 3)(x 6)
2(x 1)(x 3)(x 2)
we get
G =
_
_
_
_
1 0 0 0 6 2
0 1 0 0 6 2
0 0 1 0 2 1
0 0 0 1 5 6
_

_
3) d
min
= n k +1 = 6 4 +1 = 3.
4) Addition of the second and fourth row of G from 1) gives (2, 2, 3, 5, 5, 4) so r =
(2, 4, 3, 5, 5, 4).
5) l
0
= 4 and l
1
= 1 so we have seven coefencients. (and six equations).
6) Q
0
(x) = 4x
4
+3x
3
+4x
2
+3x, Q
1
(x) = x +4 so
Q
0
(c)
Q
1
(x)
= x + x
3
.
7) H =
_
1 3 2 6 4 5
1 2 4 1 2 4
_
8) A single error would give a syndrome that is a column of H.
B.6 Solutions to problems in Chapter 6 171
Problem 5.5.3
1) (1, ,
2
,
3
,
4
,
5
,
6
) +(1,
2
,
4
,
6
, ,
3
,
5
) = (0,
5
, ,
4
,
2
,
2
, ).
2) (8, 3, 6).
3) With r = (0,
5
, , 0, 0,
2
, ) one gets Q
0
(x) = x
4
+
2
x
3
+
2
x
2
+ x and
Q
1
(x) = x
2
+
6
x +1 and f (x) = x + x
2
.
4)
H =
_
_
_
_
1
2

6
1
2

6

3

5
1
3

5

4
1
4

5

3
_

_
5) One error would give:
S
2
S
1
=
S
3
S
2
=
S
4
S
3
.
Problem 5.5.4
1) Q
1
(x) = x
2
+
6
x +1 = (x
3
)(x
4
).
2) yes.
3)
4
and
2
.
B.6 Solutions to problems in Chapter 6
Problem 6.5.1
1) x
9
1 = (x
3
1)(x
6
+ x
3
+1).
2) k = 3.
3)
G =
_
_
1 0 0 1 0 0 1 0 0
0 1 0 0 1 0 0 1 0
0 0 1 0 0 1 0 0 1
_
_
4) Yes, since x
8
+ x
6
+ x
5
+ x
3
+ x
2
+1 = (x
2
+1)(x
6
+ x
3
+1).
5) d
min
= 3 (List the eight codewords).
Problem 6.5.2
1) k = 15 5 = 10.
2) No g(x) does not divide.
172 Solutions to selected problems
3) The possible generator polynomials are
(x
3
+1)(x
4
+ x +1)
(x
3
+1)(x
4
+ x
3
+1)
(x
3
+1)(x
4
+ x
3
+ x
2
+ x +1)
4) 2
5
but two of them are trivial so they are 30.
Problem 6.5.3
1) h(x) =
x
15
1
x
4
+ x +1
= x
11
+ x
8
+ x
7
+ x
5
+ x
3
+ x
2
+ x +1.
so
H =
_
_
_
_
1 1 1 1 0 1 0 1 1 0 0 1 0 0 0
0 1 1 1 1 0 1 0 1 1 0 0 1 0 0
0 0 1 1 1 1 0 1 0 1 1 0 0 1 0
0 0 0 1 1 1 1 0 1 0 1 1 0 0 1
_

_
2) d
min
= 3.
3) C is the (15, 11, 3) Hamming code.
4) The dual code has dimension 4.
Problem 6.5.4
g(x)|c(x) so 0 = c(1) = c
0
+c
1
+ +c
n1
c has even weight.
Problem 6.5.5
Since C has a word of odd weight x +1 does not divide g(x) and therefore g(x)

x
n
1
x+1
=
1 + x + + x
n1
so (1111 . . . 1) is a codeword.
Problem 6.5.6
1) g(x) has among its zeroes ,
2
,
3
,
4
so d
min
5.
2) g

(x) = (x 1)m

(x)m

7(x)m

3 (x) so,
3) d

min
6.
Problem 6.5.7
g(x) has among its zeroes
2
,
1
, 1, ,
2
so d
min
6, but by the Hamming bound
there is no (63, 50, 7) code so it is exactly 6.
Problem 6.5.8
With g(x) = m

(x)m

3(x)m

5(x), primitive in F
2
5 we get a (31, 16, 7) code and
this is best possible.
B.7 Solutions to problems in Chapter 7 173
B.7 Solutions to problems in Chapter 7
Problem 7.4.1
1) The transmission efciency of the frame is
30
32
=
15
16
since the header and the parity
eld do not hold data.
2) C = 0.955 and R =
11
16
gives the information efciency 0.72.
3) This follows from the linearity of the code.
4) The probability that a word
a) Is correct is (1 p)
16
= 0.923,
b) contains 1 error is 16p(1 p)
15
= 0.074,
c) contains 2 errors or another even number, but is not a codeword is approximately
_
16
2
_
p
2
(1 p)
14
= 0.0028,
d) contains an odd number of errors > 1 is approximately
_
16
3
_
p
3
(1 p)
13
=
6.6 10
5
,
e) is a codeword different from the one sent. At least four errors, but only 140 of
these are codewords, i.e. 140 p
4
(1 p)
12
= 8 10
8
.
5) The probability that
a) The frame is detected in error (parity word not satised) or there are two or more
double errors is the probability of either two double errors or a triple error (caus-
ing decoding error)
_
31
2
_
.0028
2
0.923
29
+31 6.6 10
5
0.923
30
= 5.4 10
4
.
b) The frame is accepted with errors. This requires at least ve errors. If a decoding
error and a double error occur, the parity check will cause a wrong correction.
However, there will only be two corrections if the positions of the double error
are among the four positions in error in the other word. Thus we can keep this
probability at about 31 30 0.0028 6.6 10
5

6
120
= 8.6 10
6
.
B.8 Solutions to problems in Chapter 8
Problem 8.8.1
1) 11.10.10.10.00.00.10.11.00.00.00
2) The code is a (16, 8) code. Each rowof G is a shift of g an even number of positions.
3) The information 111 gives two zero pairs. There is no longer sequence.
4) Yes.
5) M = 3.
174 Solutions to selected problems
6) d is even and at most 6. The information 11101110 gives a codeword of weight 4,
so d = 4.
7) For all long codes the minimum distance is 6.
Problem 8.8.2
1) Yes.
2) M = 2.
3) 111.010.011.001.101.110.000.000
Problem 8.8.3
1) As noted in Section 8.3 the sequence h for a code of rate
1
2
is found by reversing g.
Since g is symmetric here, h = g.
2) The nonzero part of the syndrome for a single error is (1011) for the rst position in
a pair and (1101) for the second position.
3) Adding the two syndromes from the previous question we nd that the error
11.00.00. . . . gives the syndrome (0110. . . ).
4) From the previous questions it follows that at least three errors are needed to give
the syndrome. One possibility is 10.11.
Problem 8.8.4
1) The reversed sequence is the same as g.
2) There are two different columns (1101) and (1011).
3) Two errors in a pair 00.11.00. . . . gives the right syndrome.
4) There have to be three errors.
Problem 8.8.5
Code from 8.8.1:
1) All row distances are 6, which is also the free distance.
2) 2.
3) Since g = 11.10.01.11 is a codeword, 11.10.00.00 and 00.00.01.11 have the same
syndrome, and at most one of them can be correctly decoded. The error pattern
11.00.00.11 would be decoded as 00.10.01.00
B.9 Solutions to problems in Chapter 9 175
Code from 8.8.2:
1) All row distances are 7, which is also the free distance.
2) 3.
3) 111.100.000.000. . . . would be decoded as 000.001.110.000
Problem 8.8.6
1) Generator of punctured code
11.1.01.1
1.10.0.11
11.1.01.1
2) It is enough to nd one row. Multiplying by (1011) gives 11.10.10.10.11.10.11
Eliminating the punctured 0s give 111.101.111.110 and memory 3.
Problem 8.8.7
1) All row distances are 6 for the rst code and 7 for the second. That is also the free
distance.
2) 2 and 3.
3) Take error patterns that are part of codewords (11.10.00. . . . ) and (111.100.000. . . . )
Problem 8.8.8
G = (1 + D + D
3
, 1 + D
2
+ D
3
)
H = (1 + D
2
+ D
3
, 1 + D + D
3
)
G = (1 + D + D
2
, 1 + D
2
, 1 + D)
B.9 Solutions to problems in Chapter 9
Problem 9.3.1
1) 2 , g and the sum of the rst 3 rows.
2) 8.
3) 00.01.00.00.10. etc.
176 Solutions to selected problems
4) A next state table indicating the next state for each input symbol.
Present state 000/0 001/1 010/2 011/3 100/4 101/5 110/6 111/7
Next state on 0 0 2 4 6 0 2 4 6
Next state on 1 1 3 5 7 1 3 5 7
The outputs for the transitions listed above are
Present state 0 1 2 3 4 5 6 7
Output on 0 00 10 01 11 11 01 10 00
Output on 1 11 01 10 00 00 10 01 11
5) 11.10.01.11, 11.01.11.10.11, 11.10.10.01.01.11, 11.01.00.00.10.11
6)
2x
6
x
10
15x
2
+2x
6
= 2x
6
+10x
8
+
7) none of weight 7 (all weights are even) two of weight 6 (found in 5)), ten of weight 8.
8) a and b are decoded since they have weight <
d
f ree
2
, c and f are not uniquely de-
coded since the three 1s are among the rst six 1s in g. d is decoded as a double
error. e is decoded since it has distance > 3 to all nonzero codewords.
9) Since Z = 0.2 , we get P
e
< 1.6 10
4
.
10) about 0.05.
Problem 9.3.2
1) As in Example 9.1.4
0
= y
2

3
.
2
=
3
.
1
= y
2
+x
2

3
.
3
= xy
3
/(1xy xy).
And the result
0
= xy
5
/(1 xy x
3
y) which agrees with the example.
2) By long division the rst terms are found as
0
= xy
5
+x
2
y
6
+x
4
y
6
+x
3
y
7
+2x
5
y
7
+
x
7
y
7
+ (words of weight > 7). Thus there is one word with four 0s and six 1s.
B.10 Solutions to problems in Chapter 10
Problem 10.3.1
1) (49, 16, 9).
2) 3. At least two by Lemma 10.1.1, less than four from the comments in the proof.
3) 49 since there are seven words of weight three in the Hamming code.
4) There are three rows in the parity check matrix of the Hamming code. These are
copied to give parity checks on the seven rows and seven columns, however, the last
three sets of checks are linear combinations of other rows. Thus we get 33 rows,
which is the right number.
B.10 Solutions to problems in Chapter 10 177
5) a) They will be corrected as single errors.
b) One error is corrected by the column code, in the other column two errors cause a
decoding error. The three positions in this column are corrected by the row codes.
Problem 10.3.2
1) (256, 144, 25).
2) 12.
3) 6.
Problem 10.3.3
1) (15, 8, 4).
2) Write the codewords as 5 by 3 arrays. Any cyclic shift of rows or columns gives a
codeword. By taking a shift of both rows and columns a symbol cycles through all
15 positions. It is important for this argument that n
1
, n
2
are relatively prime.
3) The generator polynomial has degree 7, and since the weights are even, one factor is
x +1. The possible irreducible factors are x
2
+x +1 and 3 polynomials of degree 4.
Thus the degree 2 polynomial has to be a factor. The last factor is x
4
+x
3
+x
2
+x +1.
This is most easily veried by considering a particular codeword.
Problem 10.3.4
1) (60, 40, 6).
2) (75, 40, 12).
3) (135, 80, 12).
Problem 10.3.5
1) From the binomial distribution P
f
= 4 10
4
. The Poisson approximation with
mean value 1 gives 6 10
4
.
2) From Theorem 10.2.3. P
ue

P
f
120
= 3.3 10
6
.
3) The probability of more than one error in a (14, 10, 3) codeword is 0.0084.
4) For one RS code we nd 1.75 10
7
(by the same calculation as in 10.3.1) . P
f
for
the concatenated code is at most this number multiplied by I = 2, i.e. 3.5 10
7
(Theorem 10.2.4).
178 Solutions to selected problems
B.11 Solutions to problems in Chapter 11
Problem 11.3.1
1) r(x) = x +
2
x
2
+ x
3
, so we know already that the error locations are ,
2
,
3
with the corresponding error values ,
2
, 1
S
1
= r() =
2
+
4
+
3
=
12
S
2
= r(
2
) =
3
+
6
+
6
=
3
S
3
= r(
3
) =
4
+
8
+
9
=
6
S
4
= r(
4
) =
5
+
10
+
12
=
11
S
5
= r(
5
) =
6
+
12
+1 =
S
6
= r(
6
) =
7
+
14
+
3
=
9
So
S(x) =
12
x
5
+
3
x
4
+
6
x
3
+
11
x
2
+x +
9
.
The Euclidian algorithm on x
6
an S(x) gives:
i g
i
r
i
q
i
1 0 x
6

0 1 S(x)
1
3
x +
9

8
x
4
+
3
x
3
+
8
x
2
+
3
x +
3

3
x +
9
2
7
x
2
+
2
x +
10

9
x
3
+
5
x
2
+
4

4
x +
11
3
6
x
3
+
2
x
2
+
4
x +
12

13
x
2
+
6

14
x +
13
so the error locator is g
3
(x) =
6
x
3
+
2
x
3
+
4
x +
12
which indeed has ,
2
and

3
as zeroes. Since g

3
=
6
x
2
+
4
, formula (11.5) gives e
1
=
7


15
+
6

8
+
4
= ,
e
2
=
14


17
+
6

10
+
4
=
2
, e
3
=
21


19
+
6

12
+
4
= 1.
2) r(x) =
2
x
3
+
7
x
2
+
11
x +
6
.
S
1
= r() =
5
+
9
+
12
+
6
=
12
S
2
= r(
2
) =
8
+
11
+
13
+
6
=
9
S
3
= r(
3
) =
11
+
13
+
14
+
6
=
5
S
4
= r(
4
) =
14
+1 +1 +
6
=
8
S
5
= r(
5
) =
2
+
2
+ +
6
=
11
S
6
= r(
6
) =
5
+
4
+
2
+
6
=
13
B.11 Solutions to problems in Chapter 11 179
So
S(x) =
12
x
5
+
9
x
4
+
5
x
3
+
8
x
2
+
11
x +
13
.
The Euclidian algorithm on x
6
an S(x) gives:
i g
i
r
i
q
i
1 0 x
6

0 1 S(x)
1
3
x +1
12
x
4
+
3
x
3
+
6
x
2
+
6
x +
13

3
x +1
2
3
x
2
+
9
x + x
3
+
11
x
2
+
2
x +
14
x +
4
3 x
3
+
11
x
2
+
11
x +
3

12
x
2
+
4
x +
12
x +
13
But g
3
does not have any zeroes in F
16
, so there must have occured more than three
errors.
Problem 11.3.2
1) r(x) = x
8
+ x
5
+ x
2
+1.
S
1
= r() =
8
+
5
+
2
+1 =
5
S
2
= r(
2
) =
10
S
3
= r(
3
) = 1 +
6
+1 +
9
=
5
S
4
= r(
4
) =
5
S
5
= r(
5
) =
10
+
10
+
10
+1 =
5
S
6
= r(
6
) =
10
So
S(x) =
5
x
5
+
10
x
4
+
5
x
3
+
5
x
2
+
5
x +
10
.
The Euclidian algorithm on x
6
an S(x) gives:
i g
i
r
i
q
i
1 0 x
6

0 1 S(x)
1
10
x +1
5
x
4
+
10
x
3
+
10
x
2
+
10

10
x +1
2
10
x
2
+ x +1 x
3
+
5
x
2
+ x +
10
x
3 x
3
+
5
x
2
+ x +1 x
2
+ x +
10

5
x
The zeroes of g
3
(x) are ,
4
and
10
and therefore e(x) = x
10
+ x
4
+ x.
180 Solutions to selected problems
2) r(x) = x
11
+ x
7
+ x
3
+ x
2
+ x +1.
S
1
= r() =
11
+
7
+
3
+
2
+ +1 =
9
S
2
= r(
2
) =
3
S
3
= r(
3
) =
3
+
6
+
9
+
6
+
3
+1 =
7
S
4
= r(
4
) =
6
S
5
= r(
5
) =
10
+
5
+1 +
10
+
5
+1 = 0
S
6
= r(
6
) =
14
So
S(x) =
9
x
5
+
3
x
4
+
7
x
3
+
6
x
2
+
14
.
The Euclidian algorithm on x
6
an S(x) gives:
i g
i
r
i
q
i
1 0 x
6

0 1 S(x)
1
6
x +1
8
x
4
+
2
x
3
+
6
x
2
+
5
x +
14

6
x +1
2
7
x
2
+x +1 x +
14
x
Here g
2
(x) has
10
and
13
as zeroes, so e(x) = x
10
+ x
13
and we decode into
x
13
+ x
11
+ x
10
+ x
7
+ x
3
+ x
2
+ x +1.
B.12 Solutions to problems in Chapter 12
Problem 12.4.1
n = 31, k = 10 so d
min
= 22.
1) With list size 2, we get < 31
2
3

2
2
9 = 20
2
3
9, so 11.
2) With list size 3, we get < 31
3
4

3
2
9 =
39
4
, so the answer is 11!
Problem 12.4.2
n = 31, k = 10.
l = 1 gives 13.
l = 2 gives 16.
l = 3 gives 18.
l = 4 gives 16.
so the optimal list size is 3.
B.12 Solutions to problems in Chapter 12 181
Problem 12.4.3
1) n = 10, k = 3 so with l = 2 we can correct 4 errors.
2) The system of equations is
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1
1 2 4 8 5 10
1 4 5 9 3 1
1 8 9 6 4 10
1 5 3 4 9 1
1 10 1 10 1 10
1 9 4 3 5 1
1 7 5 2 3 10
1 6 3 7 8 10
_

_
_
_
_
_
_
_
_
_
Q
00
Q
01
Q
02
Q
03
Q
04
Q
05
_

_
+
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0
0 0
6
9
1
6
0
0
0 0
0
_

_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1
1 2 4 8
1 4 5 9
1 8 9 6
1 5 3 4
1 10 1 10
1 9 4 3
1 7 5 2
1 3 9 5
1 6 3 7
_

_
_
_
_
_
Q
10
Q
11
Q
12
Q
13
_

_
+
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0
0 0
3
4
1
3
0
0
0 0
0
_

_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1
1 2
1 4
1 8
1 5
1 10
1 9
1 7
1 3
1 6
_

_
_
Q
20
Q
21
_
=
_
_
_
_
_
0
0
.
.
.
0
_

_
and it is easy to check that 0, 0, 0, 0, 0, 0, 2, 3, 1, 0, 1, 0 solves this system.
3) Q(x, y) = y(y (x
2
3x +2)) so the possible codewords are
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0) and (0, 0, 4, 8, 1, 6, 1, 8, 2, 9).
182 Solutions to selected problems
B.13 Solutions to problems in Chapter 13
Problem 13.5.1 The parity bits are (00011001001).
The encoding can be interpreted as correcting erasures on the last n k positions.
The decoded word is (110110110110110110110)
Problem 13.5.2
The code has n = 12 symbols and 8 parity checks, 7 of these independent. So it is a
(12, 5, 4) code.
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 0 0 0 0 0 0 0 0 0
1 0 0 1 1 0 0 0 0 0 0 0
0 1 0 0 0 1 1 0 0 0 0 0
0 0 1 0 0 0 0 1 1 0 0 0
0 0 0 1 0 0 0 1 0 1 0 0
0 0 0 0 1 0 1 0 0 0 1 0
0 0 0 0 0 1 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0 1 1 1
_

_
A tree code can have four parity checks and nine symbols. A single error is identied
by the two parity checks that fail. They can be in the same tree code, or there may be
one in each code.
Problem 13.5.3
Three errors are corrected, but the result depends on the order in which the positions
are considered.
With message passing the weights indicate that three errors have occurred, but there is
not a unique result.
B.14 Solutions to problems in Chapter 14
Problem 14.3.1
1) (64, 22, 37).
2) Let α be a primitive element of F_16 with α^4 + α + 1 = 0; then the 64 points on the
curve x^5 + y^4 + y = 0 are
(0, 0), (0, 1), (0, α^5), (0, α^10);
(1, α), (α^3, α), (α^6, α), (α^12, α), (α^9, α);
(1, α^2), (α^3, α^2), (α^6, α^2), (α^12, α^2), (α^9, α^2);
(1, α^4), (α^3, α^4), (α^6, α^4), (α^12, α^4), (α^9, α^4);
(1, α^8), (α^3, α^8), (α^6, α^8), (α^12, α^8), (α^9, α^8);
(α, α^6), (α^4, α^6), (α^7, α^6), (α^10, α^6), (α^13, α^6);
(α, α^7), (α^4, α^7), (α^7, α^7), (α^10, α^7), (α^13, α^7);
(α, α^9), (α^4, α^9), (α^7, α^9), (α^10, α^9), (α^13, α^9);
(α, α^13), (α^4, α^13), (α^7, α^13), (α^10, α^13), (α^13, α^13);
(α^2, α^3), (α^5, α^3), (α^8, α^3), (α^11, α^3), (α^14, α^3);
(α^2, α^11), (α^5, α^11), (α^8, α^11), (α^11, α^11), (α^14, α^11);
(α^2, α^12), (α^5, α^12), (α^8, α^12), (α^11, α^12), (α^14, α^12);
(α^2, α^14), (α^5, α^14), (α^8, α^14), (α^11, α^14), (α^14, α^14)
and the 22 polynomials in which one evaluates these points are:
1, x, y, x^2, xy, y^2, x^3, x^2 y, xy^2, y^3, x^4, x^3 y,
x^2 y^2, xy^3, y^4, x^4 y, x^3 y^2, x^2 y^3, xy^4, y^5, x^4 y^2, x^3 y^3.
This gives a 22 × 64 generator matrix for the code.
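A short script can reproduce the point count and the size of the generator matrix. The sketch below is our own; it encodes F_16 by α^4 = α + 1 and assumes the 22 monomials are exactly the x^a y^b with a ≤ 4 and pole order 4a + 5b ≤ 27, which matches the list above:

# Count the points of x^5 + y^4 + y = 0 over F_16 and build the generator matrix.
exp = [1] * 15
for j in range(1, 15):
    e = exp[j - 1] << 1
    if e & 16:
        e ^= 0b10011                  # alpha^4 = alpha + 1
    exp[j] = e
log = {e: j for j, e in enumerate(exp)}

def mul(a, b):
    return 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]

def power(a, n):
    v = 1
    for _ in range(n):
        v = mul(v, a)
    return v

F16 = [0] + exp
points = [(x, y) for x in F16 for y in F16
          if power(x, 5) ^ power(y, 4) ^ y == 0]
print(len(points))                    # -> 64

monos = [(a, b) for a in range(5) for b in range(6) if 4 * a + 5 * b <= 27]
G = [[mul(power(x, a), power(y, b)) for (x, y) in points] for (a, b) in monos]
print(len(G), 'x', len(G[0]))         # -> 22 x 64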
3) (64, 42, 17).
4) Since H(22)^⊥ = H(42) we can copy the previous method.
Problem 14.3.3
x^{q+1} - y^q - y = (a(x) + y b(x, y))(c(x) + y d(x, y))
= a(x)c(x) + y(b(x, y)c(x) + d(x, y)a(x)) + y^2 b(x, y)d(x, y),
and therefore a(x)c(x) = x^{q+1}, so a(x) = x^{q+1-i}, c(x) = x^i, but this implies
b(x, y)x^i + d(x, y)x^{q+1-i} + y b(x, y)d(x, y) = -y^{q-1} - 1
and hence i = 0 and b(x, y) = 1, or i = q + 1 and d(x, y) = 1. In the first case
we get d(x, y) = y^{q-2} and in the second case b(x, y) = y^{q-1}. If we insert these in
the original expression we get a contradiction.
Problem 14.3.4
The zeroes of x^q + x in F_{q^2} for q even are 0 and α^{i(q+1)}, i = 0, 1, ..., q - 2, where
α is a primitive element of F_{q^2}, and for q odd the zeroes are
0 and α^{((q+1)/2) j}, j = 1, 3, ..., 2(q - 1) - 1, where α is a primitive element of F_{q^2}.
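For q = 4 the statement is easy to check numerically (a sketch with our own field representation, α^4 = α + 1):

# Zeroes of x^4 + x in F_16: expected 0 and alpha^(5i), i = 0, 1, 2.
exp = [1] * 15
for j in range(1, 15):
    e = exp[j - 1] << 1
    if e & 16:
        e ^= 0b10011
    exp[j] = e
log = {e: j for j, e in enumerate(exp)}

def mul(a, b):
    return 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]

zeroes = [v for v in [0] + exp if mul(mul(v, v), mul(v, v)) ^ v == 0]   # v^4 + v = 0
print(['0' if v == 0 else 'alpha^%d' % log[v] for v in zeroes])
# -> ['0', 'alpha^0', 'alpha^5', 'alpha^10'], i.e. 0 and alpha^(i(q+1)), i = 0, 1, 2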
Problem 14.3.5
If α^{(q-1)(a_1+a_2)} = 1 we have (q^2 - 1) | (q - 1)(a_1 + a_2) and hence
a_1 + a_2 = h(q + 1) with h ≥ 1. So the relevant sum in this case is

  Σ_{i=0}^{q-2}  Σ_{β^q+β=α^{i(q+1)}}  β^{b_1+b_2}  Σ_{j=0}^{q}  α^{2ih(q+1)}
  = Σ_β β^{b_1+b_2} (β^q + β)^{2h}
  = Σ_β β^{b_1+b_2} Σ_{l=0}^{2h} C(2h, l) β^{lq} β^{2h-l} = 0
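The final equality rests on the power-sum fact that Σ_β β^m over F_{q^2} vanishes unless (q^2 - 1) | m. A numeric check of that fact for q = 4 (our own sketch):

# Power sums over F_16: the sum of beta^m over all beta is 0 unless 15 | m.
exp = [1] * 15
for j in range(1, 15):
    e = exp[j - 1] << 1
    if e & 16:
        e ^= 0b10011
    exp[j] = e

def power_sum(m):
    s = 0
    for j in range(15):              # beta = alpha^j; beta = 0 adds nothing for m > 0
        s ^= exp[(j * m) % 15]
    return s

print([m for m in range(1, 31) if power_sum(m) != 0])    # -> [15, 30]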
Appendix C
Table of minimal polynomials
This appendix contains a table of minimal polynomials of α^i, where α is a primitive
element of F_{2^m}.
The table is used in the following way.
1. Choose the field by choosing m, the power of 2.
2. Choose the power of α, by choosing i.
3. Read the powers of x from the table.
Example C.0.1. We find the minimal polynomial of α^3 in F_8.
1. Since F_8 = F_{2^3} we have m = 3.
2. We have α^3, i.e. i = 3.
3. We read from the table the powers of x as (0, 2, 3), which means that the polynomial is
x^3 + x^2 + 1.
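A table entry can also be checked mechanically. The sketch below (our own code, not part of the book) builds F_8 from the m = 3, i = 1 entry (0, 1, 3), i.e. α^3 + α + 1 = 0, and verifies that α^3 is a root of x^3 + x^2 + 1:

# Check Example C.0.1 in F_8 with alpha^3 = alpha + 1 (entry (0, 1, 3)).
exp = [1] * 7
for j in range(1, 7):
    e = exp[j - 1] << 1
    if e & 0b1000:
        e ^= 0b1011                  # alpha^3 = alpha + 1
    exp[j] = e

value = 0
for d in (0, 2, 3):                  # powers of x in the table entry for i = 3
    value ^= exp[(3 * d) % 7]        # (alpha^3)^d
print(value)                         # -> 0, so x^3 + x^2 + 1 has alpha^3 as a root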
m = 2
Power of α    Powers of x
i = 1 (0, 1, 2)
m = 3
Power of α    Powers of x        Power of α    Powers of x
i = 1 (0, 1, 3) i = 3 (0, 2, 3)
m = 4
i = 1 (0, 1, 4) i = 3 (0, 1, 2, 3, 4) i = 5 (0, 1, 2) i = 7 (0, 3, 4)
m = 5
i = 1 (0, 2, 5) i = 3 (0, 2, 3, 4, 5) i = 5 (0, 1, 2, 4, 5)
i = 7 (0, 1, 2, 3, 5) i = 11 (0, 1, 3, 4, 5) i = 15 (0, 3, 5)
m = 6
i = 1 (0, 1, 6) i = 3 (0, 1, 2, 4, 6) i = 5 (0, 1, 2, 5, 6)
i = 7 (0, 3, 6) i = 9 (0, 2, 3) i = 11 (0, 2, 3, 5, 6)
i = 13 (0, 1, 3, 4, 6) i = 15 (0, 2, 4, 5, 6) i = 21 (0, 1, 2)
i = 23 (0, 1, 4, 5, 6) i = 27 (0, 1, 3) i = 31 (0, 5, 6)
m = 7
i =1 (0,3,7) i =3 (0,1,2,3,7) i =5 (0,2,3,4,7)
i =7 (0,1,2,4,5,6,7) i =9 (0,1,2,3,4,5,7) i =11 (0,2,4,6,7)
i =13 (0,1,7) i =15 (0,1,2,3,5,6,7) i =19 (0,1,3,6,7)
i =21 (0,2,5,6,7) i =23 (0,6,7) i =27 (0,1,4,6,7)
i =29 (0,1,3,5,7) i =31 (0,4,5,6,7) i =43 (0,1,2,5,7)
i =47 (0,3,4,5,7) i =55 (0,2,3,4,5,6,7) i =63 (0,4,7)
m = 8
i =1 (0,2,3,4,8) i =3 (0,1,2,4,5,6,8) i =5 (0,1,4,5,6,7,8)
i =7 (0,3,5,6,8) i =9 (0,2,3,4,5,7,8) i =11 (0,1,2,5,6,7,8)
i =13 (0,1,3,5,8) i =15 (0,1,2,4,6,7,8) i =17 (0,1,4)
i =19 (0,2,5,6,8) i =21 (0,1,3,7,8) i =23 (0,1,5,6,8)
i =25 (0,1,3,4,8) i =27 (0,1,2,3,4,5,8) i =29 (0,2,3,7,8)
i =31 (0,2,3,5,8) i =37 (0,1,2,3,4,6,8) i =39 (0,3,4,5,6,7,8)
i =43 (0,1,6,7,8) i =45 (0,3,4,5,8) i =47 (0,3,5,7,8)
i =51 (0,1,2,3,4) i =53 (0,1,2,7,8) i =55 (0,4,5,7,8)
i =59 (0,2,3,6,8) i =61 (0,1,2,3,6,7,8) i =63 (0,2,3,4,6,7,8)
i =85 (0,1,2) i =87 (0,1,5,7,8) i =91 (0,2,4,5,6,7,8)
i =95 (0,1,2,3,4,7,8) i =111 (0,1,3,4,5,6,8) i =119 (0,3,4)
i =127 (0,4,5,6,8)
m = 9
i = 1 (0, 4, 9) i = 3 (0, 3, 4, 6, 9) i = 5 (0, 4, 5, 8, 9)
i = 7 (0, 3, 4, 7, 9) i = 21 (0, 1, 2, 4, 9) i = 35 (0, 8, 9)
i = 63 (0, 2, 5, 6, 9) i = 77 (0, 3, 6, 8, 9) i = 91 (0, 1, 3, 6, 9)
i = 119 (0, 1, 9) i = 175 (0, 5, 7, 8, 9)
m = 10
i = 1 (0, 3, 10)
m = 11
i = 1 (0, 2, 11)
m = 12
i = 1 (0, 1, 4, 6, 12)
m = 13
i = 1 (0, 1, 3, 4, 13)
m = 14
i = 1 (0, 1, 6, 10, 14)
m = 15
i = 1 (0, 1, 15)
m = 16
i = 1 (0, 1, 3, 12, 16)
Bibliography
Readers looking for additional information can use the following directions and
starting points.
Textbooks
The following two textbooks give an introduction to codes with some emphasis on
applications in communication systems
Shu Lin and Daniel J. Costello, Jr. Error Control Coding: Fundamentals and
Applications. Prentice-Hall, Englewood Cliffs, NJ, USA, 1983, 2nd ed. 2004.
ISBN 0-13-283796-X.
R. Blahut. Algebraic Codes for Data Transmission. Cambridge University
Press, 2003.
An introduction to convolutional codes with in-depth treatment of linear encoders may
be found in
Rolf Johannesson and Kamil Sh. Zigangirov. Fundamentals of Convolutional
Coding. IEEE Press, New York, NY, USA, 1999. ISBN 0-7803-3483-3.
Reference books
Constructions and combinatorial properties of error-correcting codes are treated in
great detail in
F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes,
volume 16 of North-Holland Mathematical Library. North-Holland, 1983,
reprinted 1998.
An extensive set of reference articles, somewhat uneven in the treatment of the different
problems.
V. S. Pless and W. Huffman, editors. Handbook of Coding Theory. Elsevier,
1999.
The following classical textbook is still a valuable reference on cyclic codes.
W. W. Peterson and E. J. Weldon. Error Correcting Codes. John Wiley & Sons,
New York, NY.
Professional journals
The most important journal covering original research on coding theory.
IEEE Transactions on Information Theory.
The following special issue is particularly important as an entry point into the literature
Information Theory 1948-98 Special Commemorative Issue. IEEE Transactions
on Information Theory, 44(6), October 1998.
This issue contains a number of surveys with extensive lists of references.
Developing areas and applications are often treated in special issues of this and other
journals. Some recent issues are
Special issue on graphs and iterative algorithms. IEEE Transactions on Information
Theory, February 2001.
Signal Processing for High-Density Storage Channels. IEEE Journal on Selected
Areas in Communications, April 2001.
Important mathematics journals are
North Holland. Discrete Mathematics.
Academic Press. Finite Fields and their Applications.
Kluwer Acad. Publ. Designs, Codes and Cryptography.
Springer. Applicable Algebra in Engineering, Communication and Computing.
Applications standards
The details of codes used in particular systems may be found in documents published
by various standards organizations. However, the information will often have to be
extracted from large amounts of related information.
The Consultative Committee for Space Data Systems is an international body that sets
common standards for the major space agencies. The documents are available at
www.ccsds.org
Standards for commercial communication systems including mobile communication
may be obtained from European Telecommunication Standards Institute at
www.etsi.org
Implementations
Details of coding and decoding algorithms are covered in some of the books and
journals mentioned above. However, practical decoders often use special circuits, and
many are based on programmable logic arrays. Some information may be obtained from
major suppliers of such circuits at
www.xilinx.com or www.altera.com
Tables of codes
Some researchers maintain web sites which are updated with information about the best
known codes etc. A useful site is
www.win.tue.nl/~aeb/voorlincod.html
which gives parameters of best known codes.
Internet resources
Much information can be obtained by using standard search engines such as Google.com.
The entry "Reed-Solomon codes" gives many useful references. However, it should be
noted that the information cannot always be expected to be correct or up-to-date. More
information about coding in space systems can be found at
www331.jpl.nasa.gov/public/JPLtcodes.html
Finally, the authors plan to make sample problem sheets in Maple® and Matlab®, as
well as corrections to this book, available at
www.mat.dtu.dk/publications/books
Index
A, 11
A(z), 11
A_w, 11
algebraic geometry codes, 151
α, 20
B, 11
B(z), 11
B_f, 74
B_w, 11
basis, 2
BCH-bound, 66
BCH-codes, 69
BEC, 44
β, 27
binary, 1
channel, 74
code, 1
erasure channel, 44
field, 1
Hamming code, 7
symmetric channel, 34, 43
binomial distribution, 33
biorthogonal code, 7
bit-flipping, 139
block codes, 1
body, 74
bounded distance decoding, 34
BSC, 34, 43
C(Y | X), 43
C^⊥, 4
C_s, 68
C_s (sub), 68
C, 1
c, 4
capacity of a discrete channel, 43
catastrophic error-propagation, 85
catastrophic encoding, 85
chain of state transitions, 99
channel symbols, 73
circuit, 99
circulant matrix, 86
code
algebraic geometry, 151
binary, 1
biorthogonal, 7
block, 1
column tree, 144
composite, 109
concatenated, 113
convolutional, 83
cyclic, 63
dual regular convolution, 89
Golay, 71
Hamming, 7
Hermitian, 152
length, 1
low-density parity check, 137
polynomial, 63
product, 109
punctured, 90
punctured dual, 91
quasi-cyclic, 86
Reed-Solomon, 50
regular unit memory, 93
row tree, 144
shortened, 90
tail-biting, 86
tree, 140
unit memory, 93
codeword, 1
number of s, 2
column error probability, 115
column tree code, 144
composite code, 109
concatenated code, 80, 112, 113
construction of a eld, 23
convolutional codes, 83
convolutional encoder, 88
coset, 8
cyclotomic, 27
representatives, 27
coset leader, 8
coset representatives, 27
CRC, 76
curve, 152
cyclic binary code, 76
cyclic code, 63
cyclic redundancy check, 76
cyclotomic coset, 27
D, 91
d, 5
d_H, 5
d_f, 89
d_min, 5
data field, 73
data frame, 73
decoder, 8
maximum likelihood, 8
minimum distance, 8
syndrome, 8
decoding
bounded distance, 34
list, 127
maximum likelihood, 8
minimum distance, 34
serial, 110
decoding error, 34
decoding failure, 34
definition of a field, 19
degree, 22, 151
interleaving, 112
dimension, 2
discrete memoryless channel, 42
discrete memoryless source, 41
distance
extended row, 99
free, 89
Hamming, 5
minimum, 5
row, 89
distribution
binomial, 33
Poisson, 34
dual code, 4
dual regular code, 89
dynamic programming, 102
E, 42
E(R), 45
e, 4
encoder, 97
convolutional, 88
regular convolutional, 88
encoding
catastrophic, 85
non-systematic, 50
parallel, 145
systematic, 3
encoding rule
block code, 3
encoding rules
non-systematic, 92
entropy, 41
equidistant, 14
erase symbol, 44
error locator polynomial, 52
error pattern, 4
error vector, 4
η, 74
event error probability, 106
extended binary Hamming code, 7
extended row distance, 99
field
binary, 1
definition, 19
finite, 20
finite field, 20
finite state encoder, 98
finite state machine, 97
frame check sequence, 76
frame input, 83
free distance, 89
fundamental error event, 105
G(D), 92
G, 3
g, 84, 85
γ, 74
generating vector, 84
generator matrix
block code, 3
convolutional code, 85
generator polynomial, 64
genus, 154
Golay code, 71
H, 3, 87
H(D), 92
H(X | Y), 43
H(X), 42, 43
H(Y | X), 43
H(Y), 43
H(p), 43
H_f, 74
h, 3, 87
Hamming bound, 10
Hamming code, 7
extended, 7
Hamming distance, 5
Hamming weight, 4
header, 73
Hermitian codes, 152
I (X; Y), 43
information, 3
information channel, 41
information efficiency, 74
information loss, 74
information symbols, 83
information vector, 83
interleaving, 146
interleaving degree, 112
irreducible, 23, 151
K, 84
K_f, 74
k, 2
L, 85
l, 35
l_0, 52
l_1, 52
LDPC, 137
length, 1
linear block code, 2
linear encoding rule, 91
list decoding, 127
low-density parity check code, 137
M, 2, 85
m, 7, 27
m +1, 85
majority decision, 138
matrix
circulant, 86
maximum likelihood decoding, 8
maximum likelihood decoder, 8
memory, 85
message, 140
metric space, 5
minimal polynomials, 25
minimum distance, 5
minimum distance decoder, 8
minimum distance decoding, 34
μ, 34
μ-vector, 103
multiplicity, 130
mutual information, 43
N, 83
n, 1
non-catastrophic, 84, 99
non-systematic encoding, 50
non-systematic encoding rules, 92
order, 20
order function, 152
orthogonal, 2
P, 33
P(e), 45, 77
P( f ), 77
P( f e), 76
P(ue), 76
P_fail, 34
P_ℓ, 115
P_err, 34
p, 33
p_c, 115
p_s, 115
parallel encoding, 145
parameters
block code, 7
convolutional code, 85
parity check, 3
parity check matrix
block code, 3
parity check symbols, 3
parity field, 73
perfect code, 10
π, 98
Plotkin bound, 15
Poisson distribution, 34
polynomial
degree, 22
irreducible, 23
minimal, 25
Polytrev's bound, 38
primitive element, 20
probability, 33
of correct frame, 77
of frame error, 76
of undetected error, 76
event error, 106
of decoding failure, 77
probability of frame loss, 115
product code, 109
ψ, 113
punctured code, 90
punctured dual code, 91
puncturing, 15
Q(), 41
Q(x), 51
Q(x, y), 51
Q_0(x), 52
Q_1(x), 52
q, 2
quality, 76
quasi-cyclic code, 86
R, 4, 83
r, 4
rate, 4
received word, 4
Reed-Solomon code, 50
regular convolutional encoder, 88
regular set, 98
regular unit memory code, 93
ρ, 100
row distance, 89
row tree code, 144
S, 57
s, 4
serial decoding, 110
serial encoding, 112
shortened code, 90
shortening, 15
σ, 34
source symbols, 73
subeld subcode, 68
symbol error probability, 115
syndrome, 4
syndrome decoder, 8
systematic encoding, 3
t, 5
tail-biting code, 86
termination, 89
ternary field, 20
transfer frame, 74
transmission efciency, 74
transmission overhead, 74
tree, 139
tree code, 140
triangle inequality, 5
u, 83, 89
u, 3
UMC, 93
unit memory code, 93
V, 2
v(u), 83
Varshamov-Gilbert bound, 6
vector space, 2
w_H, 4
w_min, 5
weight, 5
weight distribution, 11
weight enumerator, 11
Z, 38