Lecture 3: Linearity Testing
Lecturer: Prahladh Harsha Scribe: Joshua A. Grochow
In today’s lecture we’ll go over some preliminaries to the coding theory we will need
throughout the course. In particular, we will go over the Walsh–Hadamard code and the
Blum–Luby–Rubinfeld linearity test. In the following lecture we will use these tools to show
that NP ⊆ PCP[poly, O(1)] (recall that our goal is ultimately to bring the polynomial
amount of randomness down to a logarithmic amount). A common feature throughout the
course is that we will use codes to construct PCPs and vice versa, construct useful codes
from PCPs.
3.1 Coding Theory – Preliminaries
Coding theory is the study of how to communicate reliably over a noisy channel. The most
common setting is as follows: A message m is put through an encoder E, yielding a value
E(m), also called the codeword (typically much longer than m). The codeword E(m) is
then sent through the noisy channel and arrives at the other end with some noise η
introduced by the channel, as E(m) + η. The decoder D then takes E(m) + η as input and,
as long as the noise is not too great, ideally outputs m. (Note that the decoder is expected
to output m when applied to the uncorrupted codeword E(m).)
Formally, a code is specified by an encoding function C : {0,1}^k → {0,1}^n; the outputs
of C are called codewords. The rate of the code C is k/n, i.e. the ratio of input bits to
codeword bits. Heuristically, this is the amount of information of the input message per bit
of the codeword.
An important notion in coding theory is that of the distance between two codewords;
here (and typically) we use either the Hamming distance ∆(x, y) = #{i | x_i ≠ y_i} or the
normalized (Hamming) distance δ(x, y) = ∆(x, y)/n (where n = |x| = |y|).
The distance d of a code C is the minimum distance between two distinct codewords,
i.e., min_{x≠y} {∆(C(x), C(y))}. For a code of distance d, if the number of bits flipped is
strictly less than d/2 (i.e., weight(η) < d/2), then E(m) + η can be uniquely decoded to m.
A code C : {0,1}^k → {0,1}^n with distance d is called an (n, k, d)_2 code, where n is the size
of the codewords, k the size of the input, d the distance of the code, and 2 the size of the
alphabet {0,1}.
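To make these definitions concrete, here is a minimal Python sketch (all names are ours, not from the notes) that computes the Hamming distance and brute-forces the minimum distance of a toy 3-fold repetition code:

```python
from itertools import product

def hamming(x, y):
    """Hamming distance: the number of positions where x and y differ."""
    assert len(x) == len(y)
    return sum(xi != yi for xi, yi in zip(x, y))

def encode_repetition(m, reps=3):
    """Toy encoder: repeat each message bit `reps` times."""
    return [b for b in m for _ in range(reps)]

def code_distance(k, enc):
    """Minimum distance over all pairs of distinct codewords (brute force)."""
    words = [enc(list(m)) for m in product([0, 1], repeat=k)]
    return min(hamming(u, v)
               for i, u in enumerate(words) for v in words[i + 1:])

# The 3-fold repetition code is a (3k, k, 3)_2 code: any single flipped bit
# (weight(eta) < d/2 = 1.5) is uniquely correctable by majority vote.
print(code_distance(2, encode_repetition))  # 3
```

The brute-force search is only feasible for tiny k, but it makes the rate/distance tradeoff tangible: this code has good distance 3 but poor rate 1/3.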
A code C is called linear if for all x and y, if x and y are codewords then so is x + y
(where addition is performed bitwise modulo 2 – i.e. XOR)¹. To indicate that an (n, k, d)_2
code is also linear, we use the notation [n, k, d]_2 code (with square brackets). The word
“code” is commonly used to refer both to the encoding function C (as stated above) and
to the set of all codewords Im(C).
Typically, the most significant tradeoff in coding theory is that between the rate and
distance of a code. For example, given a particular rate, we might ask what is the best
distance that can be achieved.

¹Note that if we work over a larger alphabet than binary – e.g. over a larger finite field – we require the
additional constraint that if x is a codeword then so are all scalar multiples of x (i.e., αx).
3.1.1 Algorithmic Questions and Sublinear Time Algorithms
There are three main algorithmic questions that arise in coding theory:
1. Complexity of encoding;
2. Error detection: given r ∈ {0,1}^n, decide if r is a codeword; and
3. Error correction: given r ∈ {0,1}^n, find x ∈ {0,1}^k minimizing ∆(r, C(x)).
In this course, we will be interested in these questions in the context of sublinear time
algorithms. We need to be specific about what we mean by sublinear time. Note that a
sublinear time algorithm A to compute a function f : {0,1}^k → {0,1}^n doesn’t have
enough time to read its input or write its output. We get around this by accessing the
input and writing the output via oracle access. That is, an oracle machine A is a sublinear
time algorithm for f if A^x(j) = f(x)_j, where f(x)_j is the jth bit of f(x). Note that j
need only be log n bits, so it can be read entirely in sublinear time. More formally,
• The input is represented implicitly by an oracle. Whenever the sublinear time algorithm
wants to access the jth bit of the input string x (for some j ∈ [k]), it queries
the input x-oracle for the jth bit and obtains x_j.
Figure 1: Sublinear Time Algorithms
• The output is not explicitly written by the algorithm; instead it is only implicitly
given by the algorithm. Formally, on being queried for index i of the output string
f(x) (for some i ∈ [n]), the algorithm outputs the bit f(x)_i. Thus, the algorithm itself
behaves as an oracle for the string f(x), which in turn has oracle access to the input
oracle x.
• Since the algorithm does not read the entire input x, we cannot expect it to compute
the output f(x) exactly. We instead relax our guarantee on the output as follows:
On input x ∈ {0,1}^k, the algorithm must compute f(x′) exactly for some x′ ∈ {0,1}^k
that is close to the actual input x. In other words, the algorithm computes f
on some approximation to the input instead of the input itself.
Property testing, for those familiar with it, is typically done in this framework. Figure 1
gives a pictorial description of a sublinear time algorithm with the above-mentioned
relaxations.
Now we’ll consider the algorithmic questions of coding theory in the sublinear time
framework. Let’s consider the questions above one by one:
1. For a code to have good error-correcting properties, most bits of the codeword need
to depend on most message bits. Taking this into account, it does not seem reasonable
to expect a “good” code to have sublinear time encoding. (However, there has been
some recent work in the area of locally encodable codes, obtained by relaxing some of
the error-correction properties of the code.)
2. Error detection: in this context, error detection is known as local testing. That is,
given r ∈ {0,1}^n, test whether r is a codeword or r is far from every codeword (similar
to the gap problems we saw earlier, and also to property testing).
3. Error correction is known as local decoding in this context. Given j, the decoder
queries some bits of a noisy codeword C(x) + η and outputs the jth bit of x. More
formally, we say a code is locally decodable if there exists a local decoder Dec such
that for all x and r where δ(r, C(x)) < ε, the decoder satisfies

∀i ∈ {1, . . . , k}, Pr[Dec^r(i) = x_i] ≥ 3/4.

If there is such a decoder for a given code C and it makes fewer than q queries, then
we say C is (q, ε)-locally decodable.
Obviously, sublinear time algorithms in general, and local decoding and local testing
algorithms in particular, will be randomized algorithms.
Note that local decoding is only required to work when the input is suﬃciently close to a
codeword, whereas local testability determines whether a given string is close to a codeword
or far from any codeword. Thus the local decodability of a code says nothing about its local
testability.
3.2 The Walsh–Hadamard Code and Linearity Testing
Now onto our first code, the Walsh–Hadamard code. This will be the main tool in proving
that NP has exponential size PCPs, i.e. NP ⊆ PCP[poly, O(1)]. There are two dual views
of the Walsh–Hadamard code: on the one hand, WH(x) is the evaluation of every linear
function at x; on the other hand, it consists of the dot product of x with every a ∈ {0,1}^k.
A function f : {0,1}^k → {0,1} is linear if ∀x, y, f(x + y) = f(x) + f(y).
For example, for any a ∈ {0,1}^k, the function ℓ_a(x) = Σ_i a_i x_i mod 2 (i.e. the dot
product of a and x as vectors over GF(2)) is a linear function. In fact, L_k = {ℓ_a | a ∈ {0,1}^k}
is the set of all linear boolean functions. (Proof: observe that the space of linear functions
is a vector space, and has the same dimension as L_k.)
The Walsh–Hadamard code WH : {0,1}^k → {0,1}^{2^k} is then defined as WH(x) = ℓ_x,
i.e. WH(x) is the truth table of ℓ_x. More formally, for any a ∈ {0,1}^k, the ath bit of the
Walsh–Hadamard codeword WH(x) is WH(x)_a = ℓ_x(a).
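A direct (exponential-size) implementation of this encoding can be sketched in a few lines of Python (function names are ours, not from the notes):

```python
from itertools import product

def wh_encode(x):
    """Walsh-Hadamard encoding: for each a in {0,1}^k (in lexicographic
    order), emit the bit <x, a> mod 2. The output has length 2^k."""
    return [sum(xi & ai for xi, ai in zip(x, a)) % 2
            for a in product([0, 1], repeat=len(x))]

print(wh_encode([1, 0]))  # [0, 0, 1, 1]: the truth table of l_x for x = 10
```

Since the codeword has length 2^k, one would never materialize it in practice; the point of the definition is that any single bit WH(x)_a is easy to compute, which is exactly what the local algorithms below exploit.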
Since any two distinct linear functions disagree on exactly half the set of inputs (i.e.,
Pr_a[ℓ_x(a) ≠ ℓ_y(a)] = 1/2 for x ≠ y), the fractional distance of the Walsh–Hadamard code
is 1/2. Thus the Walsh–Hadamard code is a [2^k, k, 2^{k−1}]_2 code. It has very good
distance, but poor rate.
It is useful to note that there are two dual views of the Walsh–Hadamard code, based on
the fact that WH(x)_a = ℓ_x(a) = ℓ_a(x). Thus, WH(x) is both the evaluation of the linear
function ℓ_x at every point and the evaluation of every linear function at the point x.
Note that the WH code has the special property that the input bits x_i are in fact a subset
of the codeword bits, since x_i = ℓ_x(e_i). Codes with this property are called systematic codes.
3.2.1 Local Decodability of the Walsh–Hadamard Code
Decoding the Walsh–Hadamard code is very simple. Given a garbled codeword f : {0,1}^k →
{0,1}, which is δ-close to some Walsh–Hadamard codeword, the decoder Dec works as
follows (the decoder Dec has oracle access to the function f):
Dec^f: “On input z,
1. Choose r ∈_R {0,1}^k.
2. Query f(z + r) and f(r).
3. Output f(z + r) − f(r).”
Claim 3.1. If f : {0,1}^k → {0,1} is δ-close to WH(x), then

Pr[Dec^f(z) = ℓ_x(z)] ≥ 1 − 2δ, ∀z ∈ {0,1}^k.
Proof. Since f is δ-close to ℓ_x, we have that for a random r, the probability that f(z + r) =
ℓ_x(z + r) is at least 1 − δ, and so is the probability that f(r) = ℓ_x(r). If both of these
conditions hold, then Dec^f(z) = ℓ_x(z), by the linearity of ℓ_x. Thus Pr_r[Dec^f(z) = ℓ_x(z)] ≥
1 − 2δ.
Since the decoder only makes two queries, the Walsh–Hadamard code is 2-locally decodable.
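The decoder and Claim 3.1 can be simulated directly. In this sketch (helper names are ours), we corrupt one bit of a length-16 codeword, so δ = 1/16 and each decoding attempt succeeds with probability at least 1 − 2δ = 7/8; a majority vote over independent runs then recovers the bit:

```python
import random

def wh_bit(x, a):
    """The a-th bit of WH(x): the inner product <x, a> mod 2."""
    return sum(xi & ai for xi, ai in zip(x, a)) % 2

def corrupted_oracle(x, flipped):
    """Oracle for WH(x) with the bits at positions in `flipped` inverted."""
    flipped = set(flipped)
    return lambda a: wh_bit(x, a) ^ (a in flipped)

def local_decode(f, z, k):
    """Two-query decoder: output f(z + r) - f(r) for random r (over GF(2))."""
    r = tuple(random.randrange(2) for _ in range(k))
    zr = tuple(zi ^ ri for zi, ri in zip(z, r))
    return f(zr) ^ f(r)

k, x, z = 4, (1, 0, 1, 1), (1, 1, 0, 0)
f = corrupted_oracle(x, [(0, 0, 1, 1)])   # delta = 1/16
votes = [local_decode(f, z, k) for _ in range(501)]
print(max(set(votes), key=votes.count) == wh_bit(x, z))  # True
```

Note that each run queries f at the *pair* (z + r, f(r)) rather than at z itself: the fixed position z might be one of the corrupted bits, but both query points are individually uniform, which is what drives the 1 − 2δ union bound.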
3.2.2 Local Testability of the Walsh–Hadamard Code: Linearity Testing
To locally test the WH code, we wish to test whether a given truth table is the truth table
of a linear function. The problem of local testing of the WH code is more commonly called
linearity testing. Formally, given f : {0,1}^k → {0,1}, we want to test whether it is a linear
function (WH codeword) or far from linear.
The test is, as in the WH decoder, quite simple: pick y and z uniformly at random from
{0,1}^k and check that f(z) + f(y) = f(z + y). This test was proposed and first analyzed
by Blum, Luby and Rubinfeld [BLR93].
BLRTest^f: “
1. Choose y, z ∈_R {0,1}^k independently.
2. Query f(y), f(z), and f(y + z).
3. Accept if f(y) + f(z) = f(y + z).”
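A sketch of the test in Python (again with our own names); a truly linear function passes every trial, while Theorem 3.2 below bounds how often a far-from-linear function can pass:

```python
import random

def blr_test(f, k, trials=300):
    """Run the BLR test `trials` times and return the fraction of passes."""
    passed = 0
    for _ in range(trials):
        y = tuple(random.randrange(2) for _ in range(k))
        z = tuple(random.randrange(2) for _ in range(k))
        yz = tuple(a ^ b for a, b in zip(y, z))   # y + z over GF(2)
        passed += (f(y) ^ f(z)) == f(yz)
    return passed / trials

linear = lambda v: (v[0] + v[2]) % 2   # l_a for a = 1010
print(blr_test(linear, 4))             # 1.0: linear functions always pass
```

Note the test makes only three queries per trial, regardless of k, which is exactly the kind of constant-query check the PCP construction in the next lecture needs.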
Obviously, if f is linear, this test will pass with probability 1. The question is, can there
be functions that are far from linear but still pass this test with high probability? No, as
shown in the following theorem.

Theorem 3.2. If f is δ-far from linear, then

Pr[BLRTest^f accepts] ≤ 1 − δ.

The above theorem is tight, since a random function is 1/2-far from linear and passes
the BLRTest with probability exactly 1/2.
The original proof of Blum, Luby and Rubinfeld [BLR93] is a combinatorial proof of
a weaker version of the above theorem, but we will give an algebraic proof, as similar
techniques will arise later in the course. This algebraic proof is due to Bellare, Coppersmith,
Håstad, Kiwi and Sudan [BCH+96]. Before proceeding to the proof, we will first need to
equip ourselves with some basics of Boolean Fourier analysis.
3.2.3 Fourier Analysis
First, rather than working over {0,1} as the output of a linear function, it will be convenient
to treat the output space as {+1, −1} (the square roots of unity) under multiplication. Thus
the Boolean 0 corresponds to +1 in this setting, and the Boolean 1 corresponds to −1. Linearity
now takes the form: f : {0,1}^k → {+1, −1} is linear if f(x + y) = f(x)f(y).
Consider the family of functions F = {f : {0,1}^k → ℝ} and equip F with an addition
(f + g)(x) = f(x) + g(x). It is clear that F is a vector space over the reals. Furthermore,
the characteristic functions {δ_a | a ∈ {0,1}^k} are a basis, where δ_a(x) = 1 if x = a and 0
otherwise. Thus F has dimension 2^k.
We will now show that the linear functions of the form χ_a(x) = (−1)^{ℓ_a(x)} = (−1)^{Σ a_i x_i (mod 2)}
also form a basis for F. For this, it is fruitful to define an inner product on F as follows:

⟨f, g⟩ = Exp_{x ∈ {0,1}^k}[f(x)g(x)] = (1/2^k) Σ_{x ∈ {0,1}^k} f(x)g(x).

(It is an easy exercise to check that this is in fact an inner product.) Note that there are
already 2^k = dim F functions of this form, so all we need to do is show that they are linearly
independent.
We begin by examining a few basic properties of the functions χ_a. First, note that

Property 1. χ_a(x + y) = χ_a(x)χ_a(y)

i.e. χ_a is linear, as mentioned previously. Second,

Property 2. χ_{a+b}(x) = χ_{a−b}(x) = χ_a(x)χ_b(x)

i.e. χ_a(x) is also linear in a; this should come as no surprise, because of the duality of
ℓ_a(x) as both a linear function of a and of x.
The first property we will need that is not entirely obvious is that

Property 3. Exp_x[χ_a(x)] = 1 if a = 0, and 0 otherwise.
Proof. If a = 0, this clearly holds. If a ≠ 0, then by permuting the indices we may assume
that a_1 ≠ 0 without loss of generality. Then we have

2^k · Exp_x[χ_a(x)] = Σ_x (−1)^{Σ a_i x_i} = Σ_{x : x_1 = 1} (−1)^{Σ a_i x_i} + Σ_{x : x_1 = 0} (−1)^{Σ a_i x_i}.

Then, since (−1)^{ℓ_a(0y)} = −(−1)^{ℓ_a(1y)}, these two sums exactly cancel out.
We then have,

Property 4. ⟨χ_a, χ_b⟩ = 1 if a = b, and 0 otherwise.

This follows from the above via ⟨χ_a, χ_b⟩ = Exp_x[χ_a(x)χ_b(x)] = Exp_x[χ_{a−b}(x)], and then
applying the above fact.
Thus, the χ_a form not only a basis for F, but an orthonormal basis for F. Since
the χ_a form an orthonormal basis, for any f ∈ F we may write f = Σ_a f̂_a χ_a, where
f̂_a = ⟨f, χ_a⟩ ∈ ℝ. These f̂_a are called the Fourier coefficients of f.
Observe that if the normalized distance δ(f, χ_a) = ε, then f̂_a = 1 · (1 − ε) + (−1) · ε = 1 − 2ε,
so the Fourier coefficients capture the normalized distance from linear functions.
Now we come to one of the most basic useful facts in Fourier analysis:

Property 5 (Parseval’s identity). ⟨f, f⟩ = Σ_a f̂_a²
Proof. Writing f in terms of the basis χ_a, we get:

⟨f, f⟩ = ⟨Σ_a f̂_a χ_a, Σ_b f̂_b χ_b⟩
       = Σ_{a,b} f̂_a f̂_b ⟨χ_a, χ_b⟩   (by bilinearity of ⟨·, ·⟩)
       = Σ_a f̂_a²

where the last line follows from the previous because the χ_a form an orthonormal basis.
Corollary 3.3. In particular, if f is a Boolean function, i.e. ±1-valued, then Σ_a f̂_a² = 1.
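These facts are easy to check numerically. The sketch below (our own code, not from the notes) computes all Fourier coefficients of a ±1-valued function that is ε = 1/8 far from χ_(1,0,0), and verifies both f̂_a = 1 − 2ε and Parseval’s identity:

```python
from itertools import product

def chi(a, x):
    """The character chi_a(x) = (-1)^{<a,x> mod 2}."""
    return (-1) ** (sum(ai & xi for ai, xi in zip(a, x)) % 2)

def fourier_coeffs(f, k):
    """f_hat[a] = <f, chi_a> = E_x[f(x) chi_a(x)], computed exactly."""
    pts = list(product([0, 1], repeat=k))
    return {a: sum(f(x) * chi(a, x) for x in pts) / 2 ** k for a in pts}

k, b = 3, (1, 0, 0)
# Flip chi_b on a single input: normalized distance eps = 1/8.
f = lambda x: -chi(b, x) if x == (1, 1, 1) else chi(b, x)
coeffs = fourier_coeffs(f, k)
print(coeffs[b])                               # 1 - 2*eps = 0.75
print(sum(c ** 2 for c in coeffs.values()))    # Parseval for a +-1 function: 1.0
```

The exhaustive sum over all 2^k points is of course exponential; it is meant only to make the definitions concrete, not as an efficient transform.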
3.2.4 Proof of Soundness of BLRTest
Finally, we come to the proof of the soundness of the Blum–Luby–Rubinfeld linearity test:
Proof of Theorem 3.2. Suppose f is δ-far from any linear function. Note that we can rewrite
the linearity condition f(x)f(y) = f(x + y) as f(x)f(y)f(x + y) = 1, since f is ±1-valued.
Then

Pr_{x,y}[BLRTest accepts f] = Pr_{x,y}[f(x)f(y)f(x + y) = 1].
Note that for any random variable Z with values in {+1, −1}, Pr[Z = 1] = Exp[(1 + Z)/2],
since if Z = 1 then (1 + Z)/2 = 1, and if Z = −1 then (1 + Z)/2 = 0, so (1 + Z)/2 is an
indicator variable for the event Z = 1. Thus we have

Pr_{x,y}[BLRTest accepts f]
  = Exp_{x,y}[(1 + f(x)f(y)f(x + y))/2]
  = 1/2 + (1/2) Exp_{x,y}[f(x)f(y)f(x + y)]
Now, writing out f in terms of its Fourier coefficients, we get

Pr_{x,y}[BLRTest accepts f]
  = 1/2 + (1/2) Exp_{x,y}[ Σ_{a,b,c} f̂_a f̂_b f̂_c χ_a(x) χ_b(y) χ_c(x + y) ]
  = 1/2 + (1/2) Exp_{x,y}[ Σ_{a,b,c} f̂_a f̂_b f̂_c χ_a(x) χ_b(y) χ_c(x) χ_c(y) ]
Then, we apply linearity of expectation and the fact that x and y are independent to get:

Pr_{x,y}[BLR accepts f]
  = 1/2 + (1/2) Σ_{a,b,c} f̂_a f̂_b f̂_c Exp_x[χ_a(x)χ_c(x)] Exp_y[χ_b(y)χ_c(y)]
  = 1/2 + (1/2) Σ_a f̂_a³
  ≤ 1/2 + (1/2) max_a(f̂_a) Σ_a f̂_a²
  = 1/2 + (1/2) max_a(f̂_a)   (by Parseval’s identity)
  ≤ 1 − δ

where the last line follows from the fact that f is δ-far from linear, so its largest Fourier
coefficient can be at most 1 − 2δ, as noted previously.
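For small k the theorem can also be verified exhaustively. The sketch below (our own code) computes the exact BLR acceptance probability of a function together with its distance δ to the nearest linear function, and checks the bound Pr[accept] ≤ 1 − δ:

```python
from itertools import product

def accept_prob(f, k):
    """Exact acceptance probability: the fraction of pairs (y, z) with
    f(y) + f(z) = f(y + z) over GF(2)."""
    pts = list(product([0, 1], repeat=k))
    ok = sum((f(y) ^ f(z)) == f(tuple(a ^ b for a, b in zip(y, z)))
             for y in pts for z in pts)
    return ok / len(pts) ** 2

def dist_to_linear(f, k):
    """Normalized distance from f to the nearest linear function l_a."""
    pts = list(product([0, 1], repeat=k))
    ip = lambda a, x: sum(ai & xi for ai, xi in zip(a, x)) % 2
    return min(sum(f(x) != ip(a, x) for x in pts) for a in pts) / len(pts)

k = 3
f = lambda x: (x[0] ^ x[1]) ^ (x == (1, 1, 1))   # a linear function, perturbed
delta = dist_to_linear(f, k)                     # 1/8
print(accept_prob(f, k) <= 1 - delta)            # True
```

For this particular f the acceptance probability works out to 46/64 ≈ 0.72, comfortably below the bound 1 − δ = 7/8, matching the theorem.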
References

[BCH+96] Mihir Bellare, Don Coppersmith, Johan Håstad, Marcos A. Kiwi, and
Madhu Sudan. Linearity testing in characteristic two. IEEE Transactions on
Information Theory, 42(6):1781–1795, November 1996. (Preliminary version in
36th FOCS, 1995). doi:10.1109/18.556674.

[BLR93] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting
with applications to numerical problems. J. Computer and System Sciences,
47(3):549–595, December 1993. (Preliminary version in 22nd STOC, 1990).
doi:10.1016/0022-0000(93)90044-W.