Neural Networks 11 (1998) 1049–1058

Contributed article

Constructive function-approximation by three-layer artificial neural networks
Shin Suzuki*
Information Science Research Laboratory, NTT Basic Research Laboratories, 3-1 Morinosato-Wakamiya, Atsugi-Shi, Kanagawa Pref., 243-0198 Japan
Received 3 February 1995; revised 11 May 1998; accepted 11 May 1998

Abstract

Constructive theorems of three-layer artificial neural networks with (1) trigonometric, (2) piecewise linear, and (3) sigmoidal hidden-layer units are proved in this paper. These networks approximate $2\pi$-periodic $p$th-order Lebesgue-integrable functions ($L^p_{2\pi}$) on $\mathbb{R}^m$ to $\mathbb{R}^n$ for $p \ge 1$ with $L^p_{2\pi}$-norm. (In the case of (1), the networks also approximate $2\pi$-periodic continuous functions ($C_{2\pi}$) with $C_{2\pi}$-norm.) These theorems provide explicit equational representations of these approximating networks, specifications for their numbers of hidden-layer units, and explicit formulations of their approximation-error estimations. The function-approximating networks and the estimations of their approximation errors can practically and easily be calculated from the results. The theorems can easily be applied to the approximation of a non-periodic function defined in a bounded set on $\mathbb{R}^m$ to $\mathbb{R}^n$. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: Three-layer artificial neural network; Function approximation; Approximating network construction; Hidden-layer unit number specification; Approximation-error estimation; Jackson's theorem

1. Introduction

Function approximation is a key issue in evaluating the computational ability of multilayer artificial neural networks (Hecht-Nielsen, 1990). The main results of extensive studies on this subject are that three-layer artificial neural networks can, with a sufficient number of hidden-layer units, approximate continuous functions (Cybenko, 1989; Funahashi, 1989) and Lebesgue-integrable functions (Hecht-Nielsen, 1989; Hornik et al., 1989; Hornik, 1991) to any degree of accuracy. Most of these studies, however, only showed the existence of approximating networks by non-constructive methods. These results therefore contribute almost nothing to answering such important questions as how we can construct these approximating networks and how many hidden-layer units we need to approximate specific functions within some specified error. To solve these problems, constructive approximations by artificial neural networks have been studied (Mhasker and Micchelli, 1992; Mhasker, 1993). These studies describe approximations to continuous functions with approximation-error estimations. They, however, are not simple enough for deriving explicit equational representations of approximating networks that can be calculated practically. It is also difficult to estimate their actual approximation-error values, because the formulations of these estimations contain inexplicit constants.

In this paper, we prove constructive approximation theorems of three-layer artificial neural networks with three kinds of hidden-layer units: (1) trigonometric, (2) piecewise linear, and (3) sigmoidal hidden-layer units, which approximate $2\pi$-periodic $p$th-order Lebesgue-integrable functions ($L^p_{2\pi}$) on $\mathbb{R}^m$ to $\mathbb{R}^n$ for $p \ge 1$, i.e. each of their coordinate functions is an $L^p_{2\pi}$-function on $\mathbb{R}^m$ to $\mathbb{R}$, with $L^p_{2\pi}$-norm (Suzuki, 1993). (In the case of (1), the networks also approximate $2\pi$-periodic continuous functions ($C_{2\pi}$) on $\mathbb{R}^m$ to $\mathbb{R}^n$ with $C_{2\pi}$-norm.) These theorems provide explicit equational representations of the three kinds of approximating networks, specifications for their numbers of hidden-layer units, and explicit formulations of their approximation-error estimations, which can be calculated practically and easily. For simplicity, the approximations to $2\pi$-periodic functions on $\mathbb{R}^m$ to $\mathbb{R}^n$ are discussed in this paper, but the results can easily be applied to the approximation of a non-periodic function defined in a bounded set on $\mathbb{R}^m$ to $\mathbb{R}^n$, as discussed later.

* Tel.: +81 462 503574; Fax: +81 462 504721; E-mail: shin@idea.brl.ntt.jp

0893-6080/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved.
PII: S0893-6080(98)00068-9

2. Preliminaries
Let $\mathbb{N}$ and $\mathbb{R}$ be the sets of natural and real numbers and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. Let $0 = (0, \dots, 0)$ and $1_i = (0, \dots, 0, 1, 0, \dots, 0) \in \mathbb{N}_0^m$, where the 1 is the $i$th component. Let $|r| = \sum_{i=1}^m |r_i|$ for $r = (r_i)_{i=1}^m \in \mathbb{N}_0^m$, $\|t\| = \bigl(\sum_{i=1}^m t_i^2\bigr)^{1/2}$ for $t = (t_i)_{i=1}^m \in \mathbb{R}^m$, and $rt = \sum_{i=1}^m r_i t_i$. For $p \ge 1$, we denote by $L^p_{2\pi}(\mathbb{R}^m)$ the space of $2\pi$-periodic (in each coordinate of the domain $\mathbb{R}^m$) $p$th-order Lebesgue-integrable functions on $\mathbb{R}^m$ to $\mathbb{R}$ with $\|f\|_{L^p_{2\pi}} = \{(2\pi)^{-m} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} |f(x)|^p\,dx\}^{1/p}$, and by $C_{2\pi}(\mathbb{R}^m)$ the space of $2\pi$-periodic continuous functions on $\mathbb{R}^m$ to $\mathbb{R}$ with $\|f\|_{C_{2\pi}} = \sup\{|f(x)|;\ |x_i| \le \pi\}$. Let $W = L^p_{2\pi}(\mathbb{R}^m)$ or $C_{2\pi}(\mathbb{R}^m)$ throughout this paper. For $f, g \in W$, we define $\langle f, g\rangle = (2\pi)^{-m} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} f(t)g(t)\,dt$, $\hat{f}(r) = \langle f, e^{-irt}\rangle$, and the convolution of $f$ and $g$, $f * g(x) = (2\pi)^{-m} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} f(t)g(x - t)\,dt$. We say $h$ is a higher order infinity than $y$ if $h, y \in \mathbb{R}$ and $h = h(y) \to \infty$ and $y/h(y) \to 0$ as $y \to \infty$. We introduce the modulus of continuity and the Lipschitz condition for a multivariable function, which measure the variation and the smoothness of a function.

Definition 1 (Modulus of continuity and Lipschitz condition). Let $f \in W$ and $\delta \ge 0$. The modulus of continuity of $f$ in $W$ is $\omega_W(f, \delta) = \sup\{\|f(\cdot + t) - f(\cdot)\|_W;\ t \in \mathbb{R}^m,\ \|t\| \le \delta\}$, where $\|f(\cdot + t) - f(\cdot)\|_W = \{(2\pi)^{-m} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} |f(x + t) - f(x)|^p\,dx\}^{1/p}$ when $W = L^p_{2\pi}(\mathbb{R}^m)$, and $\sup\{|f(x + t) - f(x)|;\ |x_i| \le \pi\}$ when $W = C_{2\pi}(\mathbb{R}^m)$. $f$ satisfies a Lipschitz condition with constant $M > 0$ and exponent $\nu > 0$ in $W$ when $\|f(\cdot + t) - f(\cdot)\|_W \le M\|t\|^{\nu}$ for $t \in \mathbb{R}^m$.

3. Main results

We denote $B_l = \bigl(\frac{2}{l+2}\bigr)^m$ and $b_{l,r} = \prod_{i=1}^m \sin\frac{(r_i + 1)\pi}{l + 2}$ for $l \in \mathbb{N}$ and $r = (r_i)_{i=1}^m \in \mathbb{N}_0^m$. We say $f = (f_i)_{i=1}^n$ is a $W$-function on $\mathbb{R}^m$ to $\mathbb{R}^n$ if each $f_i \in W$, when $W = L^p_{2\pi}(\mathbb{R}^m)$ for $p \ge 1$ or $C_{2\pi}(\mathbb{R}^m)$, and $g = (g_i)_{i=1}^n$ approximates $f$ with $W$-norm if each $g_i$ approximates $f_i$ with $W$-norm.

Theorem 1 (Constructive approximation by trigonometric hidden-layer networks). Let $f = (f_i)_{i=1}^n$ be a $W$-function on $\mathbb{R}^m$ to $\mathbb{R}^n$. For $l \in \mathbb{N}$, a three-layer network $TN_l[f] = (TN_l[f_i])_{i=1}^n$ on $\mathbb{R}^m$ to $\mathbb{R}^n$, which approximates $f$ with $W$-norm and has $(2l+1)^m - 1$ trigonometric hidden-layer units, is constructed by

\[ TN_l[f_i](x) = v_l[f_i] + \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} \{ a_{l,p,q}[f_i]\cos((p-q)x) + b_{l,p,q}[f_i]\sin((p-q)x) \}, \tag{1} \]

where $a_{l,p,q}[f_i] = 2B_l b_{l,p} b_{l,q} \langle f_i, \cos((p-q)t)\rangle$, $b_{l,p,q}[f_i] = 2B_l b_{l,p} b_{l,q} \langle f_i, \sin((p-q)t)\rangle$, and $v_l[f_i] = \langle f_i, 1\rangle$. (Note: this summation is over combinations of $p = (p_u)_{u=1}^m$ and $q = (q_v)_{v=1}^m \in \mathbb{N}_0^m$ such that $p \ne q$, $0 \le p_u, q_v \le l$. That is, if it is added in the case of $(p, q)$, it is not added in the case of $(q, p)$. This notation is used throughout this paper.)

The approximation error of each coordinate with $W$-norm is estimated by

\[ \|f_i - TN_l[f_i]\|_W \le \left(1 + \frac{\pi^2}{2}\sqrt{m}\right)\omega_W\bigl(f_i, (l+2)^{-1}\bigr). \tag{2} \]
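As a concrete illustration of Theorem 1 (not part of the original paper), the sketch below assembles $TN_l[f]$ of Eq. (1) for a scalar target function. All names are ours, and the Fourier coefficients $\langle f, 1\rangle$, $\langle f, \cos((p-q)t)\rangle$, and $\langle f, \sin((p-q)t)\rangle$ are approximated by a uniform-grid quadrature rather than exact integration, so the computed coefficients carry a small quadrature error.

```python
import itertools
import numpy as np

def inner_product(f, g, m, n_grid=64):
    """<f, g> = (2*pi)^(-m) * integral of f(t)g(t) over [-pi, pi]^m,
    approximated by a uniform-grid Riemann sum."""
    axis = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    t = np.meshgrid(*([axis] * m), indexing="ij")
    return float(np.mean(f(t) * g(t)))

def build_TN(f, l, m):
    """Assemble TN_l[f] of Eq. (1): the constant v_l[f] = <f, 1> plus one
    cos and one sin unit per distinct frequency r = p - q."""
    B_l = (2.0 / (l + 2)) ** m
    b = lambda r: np.prod([np.sin((ri + 1) * np.pi / (l + 2)) for ri in r])
    v = inner_product(f, lambda t: np.ones_like(t[0]), m)
    coeffs = {}
    points = list(itertools.product(range(l + 1), repeat=m))
    for p, q in itertools.product(points, repeat=2):
        r = tuple(pu - qu for pu, qu in zip(p, q))
        nz = next((ri for ri in r if ri != 0), 0)
        if nz <= 0:              # keep each combination {p, q} exactly once
            continue
        w = 2.0 * B_l * b(p) * b(q)
        rt = lambda t, r=r: sum(ri * ti for ri, ti in zip(r, t))
        a = w * inner_product(f, lambda t: np.cos(rt(t)), m)
        c = w * inner_product(f, lambda t: np.sin(rt(t)), m)
        A, C = coeffs.get(r, (0.0, 0.0))
        coeffs[r] = (A + a, C + c)   # units with equal frequency are shared
    return v, coeffs

def eval_TN(v, coeffs, x):
    """Evaluate the trigonometric hidden-layer network at a point x."""
    out = v
    for r, (A, C) in coeffs.items():
        z = sum(ri * xi for ri, xi in zip(r, x))
        out += A * np.cos(z) + C * np.sin(z)
    return out

# usage with the two-dimensional Gaussian of Section 5 (our choice of example)
f = lambda t: np.exp(-(t[0] ** 2 + t[1] ** 2))
v, coeffs = build_TN(f, l=2, m=2)
assert 1 + 2 * len(coeffs) == (2 * 2 + 1) ** 2   # (2l+1)^m - 1 = 24 hidden units
print(eval_TN(v, coeffs, (0.0, 0.0)))            # below the peak f(0, 0) = 1
```

Hidden-layer units sharing the same frequency $p - q$ are merged across combinations, which is why the unit count collapses to $(2l+1)^m - 1$ in the theorem.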
Theorem 2 (Constructive approximation by piecewise linear hidden-layer networks). Let $f = (f_i)_{i=1}^n$ be an $L^p_{2\pi}$-function on $\mathbb{R}^m$ to $\mathbb{R}^n$. For independent $l$ and $j \in \mathbb{N}$, a three-layer network $PN_{l,j}[f] = (PN_{l,j}[f_i])_{i=1}^n$ on $\mathbb{R}^m$ to $\mathbb{R}^n$, which approximates $f$ with $L^p_{2\pi}$-norm and has $2mjl(l+1)(2l+1)^{m-1}$ piecewise linear hidden-layer units $PL_{j,k}$, is constructed by

\[ PN_{l,j}[f_i](x) = v_{l,j}[f_i] + \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} \ \sum_{k=0}^{4|p-q|j - 1} a_{l,j,p,q,k}[f_i]\,PL_{j,k}((p-q)x), \tag{3} \]

where

\[ PL_{j,k}(rx) = \begin{cases} 0 & rx \le -|r|\pi + \dfrac{k\pi}{2j}, \\[4pt] \dfrac{2j}{\pi}\,rx + 2|r|j - k & -|r|\pi + \dfrac{k\pi}{2j} < rx < -|r|\pi + \dfrac{(k+1)\pi}{2j}, \\[4pt] 1 & rx \ge -|r|\pi + \dfrac{(k+1)\pi}{2j} \end{cases} \quad (\text{cf. Fig. 1}), \]

\[ v_{l,j}[f_i] = \langle f_i, 1\rangle + 2B_l \sin\frac{\pi}{4j} \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} (-1)^{|p-q|} b_{l,p} b_{l,q} \langle f_i, \cos((p-q)t)\rangle, \]

and

\[ a_{l,j,p,q,k}[f_i] = 4(-1)^{|p-q|} B_l b_{l,p} b_{l,q} \sin\frac{\pi}{4j} \left\{ \langle f_i, \sin((p-q)t)\rangle \cos\frac{(2k+1)\pi}{4j} - \langle f_i, \cos((p-q)t)\rangle \sin\frac{(2k+1)\pi}{4j} \right\}. \]

Fig. 1. The input–output functions of the piecewise linear hidden-layer unit $PL_{j,k}$ and the sigmoidal hidden-layer unit $SG_{j,k}$.

The approximation error of each coordinate with $L^p_{2\pi}$-norm is estimated by

\[ \|f_i - PN_{l,j}[f_i]\|_{L^p_{2\pi}} \le \left(1 + \frac{\pi^2}{2}\sqrt{m}\right)\omega_{L^p_{2\pi}}\bigl(f_i, (l+2)^{-1}\bigr) + 2\|f_i\|_{L^p_{2\pi}} \left\{ \left(\frac{8(l+2)}{\pi^2}\right)^m - 1 \right\} \left\{ \frac{\pi\sqrt{m}}{(2\pi)^{m-1} j}\left(\frac{4j}{\pi} - \cot\frac{\pi}{4j}\right) \right\}^{1/p}. \tag{4} \]

Theorem 3 (Constructive approximation by sigmoidal hidden-layer networks). Let $f = (f_i)_{i=1}^n$ be an $L^p_{2\pi}$-function on $\mathbb{R}^m$ to $\mathbb{R}^n$. For independent $l$ and $j \in \mathbb{N}$, a three-layer network $SN_{l,j}[f] = (SN_{l,j}[f_i])_{i=1}^n$ on $\mathbb{R}^m$ to $\mathbb{R}^n$, which approximates $f$ with $L^p_{2\pi}$-norm and has $2mjl(l+1)(2l+1)^{m-1}$ sigmoidal hidden-layer units $SG_{j,k}$, is constructed by

\[ SN_{l,j}[f_i](x) = v_{l,j}[f_i] + \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} \ \sum_{k=0}^{4|p-q|j - 1} a_{l,j,p,q,k}[f_i]\,SG_{j,k}((p-q)x), \tag{5} \]

where

\[ SG_{j,k}(rx) = \left\{ 1 + \exp\left( -\left( \frac{8j}{\pi}\,rx + 8|r|j - 4k - 2 \right) \right) \right\}^{-1} \]

(cf. Fig. 1), and $v_{l,j}[f_i]$ and $a_{l,j,p,q,k}[f_i]$ are the same definitions as in Theorem 2.

The approximation error of each coordinate with $L^p_{2\pi}$-norm is estimated by

\[ \|f_i - SN_{l,j}[f_i]\|_{L^p_{2\pi}} \le \left(1 + \frac{\pi^2}{2}\sqrt{m}\right)\omega_{L^p_{2\pi}}\bigl(f_i, (l+2)^{-1}\bigr) + 2\|f_i\|_{L^p_{2\pi}} \left\{ \left(\frac{8(l+2)}{\pi^2}\right)^m - 1 \right\} \left\{ \frac{\pi\sqrt{m}}{(2\pi)^{m-1} j}\left(\log 2 - \frac{1}{2} + \frac{4j}{\pi} - \cot\frac{\pi}{4j}\right) \right\}^{1/p}. \tag{6} \]
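To make the two unit families of Theorems 2 and 3 concrete, the following minimal sketch (ours; the function names are hypothetical) implements $PL_{j,k}$ and $SG_{j,k}$ for a frequency vector $r = p - q$. Both rise from 0 to 1 across the $k$th subinterval of width $\pi/2j$ in $rx$, the sigmoid being a smooth surrogate for the ramp.

```python
import numpy as np

def PL(j, k, r, x):
    """Piecewise linear unit PL_{j,k}((p-q)x) of Theorem 2.
    r is the integer frequency vector p - q; x is a point in R^m."""
    rx = float(np.dot(r, x))
    r1 = int(np.sum(np.abs(r)))                    # |r| = sum_i |r_i|
    lo = -r1 * np.pi + k * np.pi / (2 * j)         # ramp start
    hi = -r1 * np.pi + (k + 1) * np.pi / (2 * j)   # ramp end
    if rx <= lo:
        return 0.0
    if rx >= hi:
        return 1.0
    return 2 * j / np.pi * rx + 2 * r1 * j - k     # linear ramp from 0 to 1

def SG(j, k, r, x):
    """Sigmoidal unit SG_{j,k}((p-q)x) of Theorem 3; its argument is an
    affine map of rx centered on the same subinterval as PL_{j,k}."""
    rx = float(np.dot(r, x))
    r1 = int(np.sum(np.abs(r)))
    return 1.0 / (1.0 + np.exp(-(8 * j / np.pi * rx + 8 * r1 * j - 4 * k - 2)))
```

At the midpoint of the $k$th subinterval both units output $1/2$, consistent with Theorem 3 reusing the coefficients $v_{l,j}[f_i]$ and $a_{l,j,p,q,k}[f_i]$ of Theorem 2 unchanged.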
The Lipschitz condition simplifies these error estimations, as the following corollary shows.

Corollary 1 (Error estimations under Lipschitz condition). In Theorems 1, 2, and 3, if $f_i$ of $f$ satisfies a Lipschitz condition with constant $M_i$ and exponent $\nu_i$ in $W = L^p_{2\pi}(\mathbb{R}^m)$ or $C_{2\pi}(\mathbb{R}^m)$, then $\omega_W(f_i, (l+2)^{-1})$ in Eqs. (2), (4) and (6) can be replaced by $M_i(l+2)^{-\nu_i}$.

The next corollary shows the asymptotic behaviors of the approximations by networks $TN_l$, $PN_{l,j}$, and $SN_{l,j}$ when $l$ increases. This assures that they can approximate functions with any degree of accuracy if their numbers of hidden-layer units increase under the following conditions.

Corollary 2 (Asymptotic behaviors of the approximations). (i) $\|f_i - TN_l[f_i]\|_W \to 0$ as $l \to \infty$. (ii) $\|f_i - PN_{l,j}[f_i]\|_{L^p_{2\pi}} \to 0$ as $l \to \infty$, if $j$ is a higher order infinity than $l^{mp/2}$. (iii) $\|f_i - SN_{l,j}[f_i]\|_{L^p_{2\pi}} \to 0$ as $l \to \infty$, if $j$ is a higher order infinity than $l^{mp}$.

Remark 1. Only the multidimensional Fourier coefficients of an approximated function, such as $\langle f_i, 1\rangle$, $\langle f_i, \cos((p-q)t)\rangle$, and $\langle f_i, \sin((p-q)t)\rangle$ for $p$ and $q \in \mathbb{N}_0^m$, are needed to construct the approximating networks, and the modulus of continuity of the function is needed to estimate their approximation errors in Theorems 1, 2, and 3.

Remark 2. For any $l \in \mathbb{N}$, if $j$ is large enough for $l$, the functions constructed by networks $TN_l$, $PN_{l,j}$, and $SN_{l,j}$ become almost the same, and their approximation errors become almost the same value, which can be estimated mainly by the same formulation based on the modulus of continuity in terms of $l$, i.e. the right side of Eq. (2) and the first terms of the right sides of Eqs. (4) and (6), while the second terms of the right sides of Eqs. (4) and (6) are negligible. In fact, for any $l \in \mathbb{N}$, as $j$ increases, the functions constructed by $PN_{l,j}$ and $SN_{l,j}$ approach the function constructed by $TN_l$, and the approximation errors of $PN_{l,j}$ and $SN_{l,j}$ approach the error of $TN_l$. This is because Theorems 2 and 3 are derived using the proof of Theorem 1, as we can see later in their proofs.

Remark 3. We try to formulate the error estimations in Theorems 1, 2, and 3 in terms of the number of hidden-layer units $U$. In the case of $TN_l$, the error is approximately estimated by $(1 + (\pi^2/2)\sqrt{m})\,\omega_W(f_i, 2U^{-1/m})$ for $U = (2l+1)^m - 1$. In the case of $PN_{l,j}$ and $SN_{l,j}$, according to Remark 2, when $j$ is large enough for $l$, the errors are approximately estimated by $(1 + (\pi^2/2)\sqrt{m})\,\omega_W(f_i, 2(mj/2U)^{1/(m+1)})$ for $U = 2mjl(l+1)(2l+1)^{m-1}$, which is independent of $j$.

Remark 4. These results also apply to an approximation to a non-periodic function defined in a bounded set on $\mathbb{R}^m$. In fact, such a function is easily transformed into a $2\pi$-periodic function by an affine transformation and an extension of the domain. For simplicity, only the approximation to a $2\pi$-periodic function on $\mathbb{R}^m$ is discussed in this paper.

4. Proof of main results

First we prove a constructive approximation theorem by multidimensional trigonometric polynomials, which is a constructive multidimensional extension of Jackson's theorem on approximations by trigonometric polynomials. Theorem 1 is directly implied by this theorem. Then we prove Theorems 2 and 3 using the theorems proved in the first step and approximation theorems for multidimensional trigonometric functions approximated by piecewise linear and sigmoidal hidden-layer networks.

4.1. Approximation by trigonometric hidden-layer networks

Jackson's theorems are approximation theorems using certain functions which permit estimation of their approximation errors (Feinerman and Newman, 1974; Davis, 1975; Takenouchi and Nishishiraho, 1986). In the case of a multivariable-function approximation, a multidimensional extension of Jackson's theorem about an approximation by polynomials has been proved (Feinerman and Newman, 1974). In this paper, a multidimensional extension of Jackson's theorem for an approximation by trigonometric polynomials is constructively proved. This provides an explicit equational representation of an approximating multidimensional trigonometric polynomial and an explicit formulation of its approximation-error estimation for the order of the polynomial. Theorem 1 is derived from this theorem, because the polynomial is just a network with trigonometric hidden-layer units.

4.1.1. Properties of the modulus of continuity

First we show the following properties of the modulus of continuity of a multivariable function.

Proposition 1. Let $f \in W$ and $\delta \ge 0$. (i) $\omega_W(f, 0) = 0$. (ii) $0 \le \omega_W(f, \delta) \le 2\|f\|_W$. (iii) $\omega_W(f, \delta_1) \le \omega_W(f, \delta_2)$ for $\delta_1 \le \delta_2$. (iv) $\omega_W(f, \delta_1 + \delta_2) \le \omega_W(f, \delta_1) + \omega_W(f, \delta_2)$ for $\delta_1, \delta_2 \ge 0$. (v) $\lim_{\delta \to 0+0} \omega_W(f, \delta) = 0$. (vi) $\omega_W(f, t\delta) \le (1 + t)\,\omega_W(f, \delta)$ for $t \ge 0$. (vii) If $\omega_{L^p_{2\pi}}(f, \delta)$ and $\omega_{C_{2\pi}}(f, \delta)$ exist, then $\omega_{L^p_{2\pi}}(f, \delta) \le \omega_{C_{2\pi}}(f, \delta)$. (viii) If $f$ satisfies a Lipschitz condition with constant $M$ and exponent $\nu$ in $W$, then $\omega_W(f, \delta) \le M\delta^{\nu}$.

Proof. (i), (ii), (iii), (iv), (vii), and (viii): These follow from the definition of the modulus of continuity. (v): We prove that $\|f(\cdot + t) - f(\cdot)\|_W \to 0$ as $\|t\| \to 0+0$. When $W = C_{2\pi}(\mathbb{R}^m)$, $f$ is uniformly continuous; hence the relation holds. When $W = L^p_{2\pi}(\mathbb{R}^m)$, $C_{2\pi}(\mathbb{R}^m)$ is a dense subset of $W$. Then for $f \in W$ and any $\varepsilon > 0$ there exists $g \in C_{2\pi}(\mathbb{R}^m)$ such that $\|f(\cdot) - g(\cdot)\|_W < \varepsilon/3$. As $g$ is uniformly continuous, there exists $\delta \ge 0$ such that $\|g(\cdot + t) - g(\cdot)\|_W < \varepsilon/3$ for $\|t\| < \delta$. Hence, for $\|t\| < \delta$, $\|f(\cdot + t) - f(\cdot)\|_W < \varepsilon$. (vi): From (iv), $\omega_W(f, n\delta) \le n\,\omega_W(f, \delta)$ for $n \in \mathbb{N}$. For any $t \ge 0$, we denote by $[t]$ the largest integer $\le t$. Then $\omega_W(f, t\delta) \le \omega_W(f, (1 + [t])\delta) \le (1 + [t])\,\omega_W(f, \delta) \le (1 + t)\,\omega_W(f, \delta)$. Q.E.D.

4.1.2. Multivariable function approximation by a convolution operator

An approximation by a convolution operator is an important method of function approximation.

Theorem 4 (Approximation by convolution operator). Let $f \in W$, $\varepsilon > 0$, and $k_l \in L^1_{2\pi}(\mathbb{R}^m)$ be non-negative for $l \in \mathbb{N}$. Then the convolution $k_l * f$ approximates $f$ such that

\[ \|k_l * f - f\|_W \le \bigl|\hat{k}_l(0) - 1\bigr|\,\|f\|_W + \left[ \hat{k}_l(0) + \frac{\pi}{\sqrt{2}}\,\varepsilon^{-1}\hat{k}_l(0)^{1/2} \right] \omega_W(f, \varepsilon\gamma_l), \tag{7} \]

where

\[ \gamma_l = \left\{ m\hat{k}_l(0) - \sum_{i=1}^m \mathrm{Re}\,\hat{k}_l(1_i) \right\}^{1/2}. \tag{8} \]

Proof. $t \le \pi\sin(t/2)$ $(0 \le t \le \pi)$ and $\pi\sin(t/2) \le t$ $(-\pi \le t \le 0)$; then $t^2 \le \pi^2\sin^2(t/2)$ $(-\pi \le t \le \pi)$. Then $(2\pi)^{-m}\int_{-\pi}^{\pi}\cdots\int_{-\pi}^{\pi}\|t\|^2 k_l(t)\,dt \le (\pi^2/2)\gamma_l^2$ and $(2\pi)^{-m}\int_{-\pi}^{\pi}\cdots\int_{-\pi}^{\pi}\|t\| k_l(t)\,dt \le (\pi/\sqrt{2})\,\hat{k}_l(0)^{1/2}\gamma_l$ from the Schwarz inequality. Hence $\|k_l * f - \hat{k}_l(0)f\|_W \le [\hat{k}_l(0) + (\pi/\sqrt{2})\,\delta^{-1}\gamma_l\,\hat{k}_l(0)^{1/2}]\,\omega_W(f, \delta)$ from $\|f(\cdot - t) - f(\cdot)\|_W \le (1 + \|t\|\delta^{-1})\,\omega_W(f, \delta)$ according to (vi) of Proposition 1, Fubini's theorem, and Hölder's inequality. If $\gamma_l \ne 0$, put $\delta = \varepsilon\gamma_l > 0$; then $\|k_l * f - \hat{k}_l(0)f\|_W \le [\hat{k}_l(0) + (\pi/\sqrt{2})\,\varepsilon^{-1}\hat{k}_l(0)^{1/2}]\,\omega_W(f, \varepsilon\gamma_l)$. If $\gamma_l = 0$, $\|k_l * f - \hat{k}_l(0)f\|_W \le \hat{k}_l(0)\,\omega_W(f, \delta)$. Let $\delta \to 0+0$; then $\|k_l * f - \hat{k}_l(0)f\|_W = 0$ from (v) of Proposition 1. Hence, we obtain Eq. (7). Q.E.D.

4.1.3. Multidimensional Fejér–Korovkin kernel

We introduce the following multidimensional Fejér–Korovkin kernel.

Definition 2 (Multidimensional Fejér–Korovkin kernel). Let $r = (r_i)_{i=1}^m \in \mathbb{N}_0^m$ and $l \in \mathbb{N}$. The $m$-dimensional Fejér–Korovkin kernel $K_l$ is defined by

\[ K_l(t) = B_l\left| \sum_{0 \le r_i \le l} b_{l,r}\,e^{irt} \right|^2, \quad \text{where } b_{l,r} = \prod_{i=1}^m \sin\frac{(r_i + 1)\pi}{l + 2} \text{ and } B_l = \left\{ \sum_{0 \le r_i \le l} (b_{l,r})^2 \right\}^{-1}. \]
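As a quick numerical sanity check on Definition 2 (our sketch, not the author's code), the kernel can be evaluated directly for $m = 1$ and its low-order Fourier coefficients compared with Proposition 2 below, which gives $\hat{K}_l(0) = 1$ and $\hat{K}_l(1_i) = \cos(\pi/(l+2))$.

```python
import numpy as np

def fejer_korovkin(l, t):
    """m = 1 Fejer-Korovkin kernel K_l(t) = B_l * |sum_r b_{l,r} e^{irt}|^2."""
    r = np.arange(l + 1)
    b = np.sin((r + 1) * np.pi / (l + 2))
    B = 1.0 / np.sum(b ** 2)                      # equals 2/(l+2) for m = 1
    z = np.sum(b[:, None] * np.exp(1j * np.outer(r, t)), axis=0)
    return B * np.abs(z) ** 2

l = 5
t = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
K = fejer_korovkin(l, t)
K0 = np.mean(K)                # <K_l, 1>     = K_l-hat(0)
K1 = np.mean(K * np.cos(t))    # <K_l, cos t> = Re K_l-hat(1)
print(np.isclose(K0, 1.0), np.isclose(K1, np.cos(np.pi / (l + 2))))
```

Both checks print True: the uniform-grid average integrates trigonometric polynomials of this degree exactly up to floating-point error.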
Proposition 2.

(i) $B_l = \left(\dfrac{2}{l+2}\right)^m$.

(ii) $K_l(t) = 1 + 2B_l \displaystyle\sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} b_{l,p}\,b_{l,q}\cos((p-q)t)$.

(iii) $\hat{K}_l(r) = B_l \displaystyle\sum_{\substack{p - q = r,\ p, q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} b_{l,p}\,b_{l,q}$ $(r \in \mathbb{N}_0^m)$; thus $\hat{K}_l(0) = 1$.

(iv) $\hat{K}_l(1_i) = \cos\dfrac{\pi}{l+2}$.

Proof. (i):

\[ B_l = \left( \sum_{0 \le r \le l} \sin^2\frac{(r+1)\pi}{l+2} \right)^{-m} = 2^m\left( l + 1 - \sum_{0 \le r \le l} \cos\frac{2(r+1)\pi}{l+2} \right)^{-m} = \left(\frac{2}{l+2}\right)^m. \]

(ii):

\[ K_l(t) = B_l \sum_{0 \le p_u \le l} b_{l,p}\,e^{ipt} \sum_{0 \le q_v \le l} b_{l,q}\,e^{-iqt} = 1 + 2B_l \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} b_{l,p}\,b_{l,q}\cos((p-q)t). \]

(iii): $\langle 1, e^{-irt}\rangle = 0$ $(r \ne 0)$, $\langle\cos at, \cos bt\rangle = 1/2$ $(a = \pm b)$, $0$ (otherwise), and $\langle\cos at, \sin bt\rangle = 0$. Hence, it is derived from (ii).

(iv):

\[ \cos\frac{\pi}{l+2} \sum_{0 \le q_i \le l} \sin^2\frac{(q_i+1)\pi}{l+2} = \sum_{0 \le q_i \le l-1} \sin\frac{(q_i+1)\pi}{l+2}\sin\frac{(q_i+2)\pi}{l+2} \]

from

\[ \cos\frac{\pi}{l+2}\sin^2\frac{(q_i+1)\pi}{l+2} = \frac{1}{2}\left\{ \sin\frac{q_i\pi}{l+2}\sin\frac{(q_i+1)\pi}{l+2} + \sin\frac{(q_i+1)\pi}{l+2}\sin\frac{(q_i+2)\pi}{l+2} \right\}. \]

Hence, from (iii),

\[ \hat{K}_l(1_i) = B_l \sum_{\substack{0 \le q_1, \dots, q_{i-1}, q_{i+1}, \dots, q_m \le l \\ 0 \le q_i \le l-1}} b_{l,q}\,b_{l,q+1_i} = \frac{\displaystyle\sum_{0 \le q_i \le l-1} \sin\frac{(q_i+1)\pi}{l+2}\sin\frac{(q_i+2)\pi}{l+2}}{\displaystyle\sum_{0 \le q_i \le l} \sin^2\frac{(q_i+1)\pi}{l+2}} = \cos\frac{\pi}{l+2}. \quad \text{Q.E.D.} \]

4.1.4. Approximation by multidimensional trigonometric polynomials

The convolution of the multidimensional Fejér–Korovkin kernel and a target function gives an approximating multidimensional trigonometric polynomial. A multidimensional extension of Jackson's theorem for an approximation by trigonometric polynomials is constructively proved from the convolution.

Theorem 5 (Constructive approximation by multidimensional trigonometric polynomials). Let $f \in W$ and $K_l$ be the $m$-dimensional Fejér–Korovkin kernel. The convolution $K_l * f$ is an $m$-dimensional trigonometric polynomial which approximates $f$ such that

\[ K_l * f(x) = \langle f, 1\rangle + 2B_l \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} b_{l,p}\,b_{l,q}\left\{ \langle f, \cos((p-q)t)\rangle\cos((p-q)x) + \langle f, \sin((p-q)t)\rangle\sin((p-q)x) \right\}. \tag{9} \]

The approximation error is estimated by

\[ \|f - K_l * f\|_W \le \left(1 + \frac{\pi^2}{2}\sqrt{m}\right)\omega_W\bigl(f, (l+2)^{-1}\bigr). \tag{10} \]

If $f$ satisfies a Lipschitz condition with constant $M$ and exponent $\nu$ in $W$, then $\omega_W(f, (l+2)^{-1})$ of Eq. (10) can be replaced by $M(l+2)^{-\nu}$.

Proof. Eq. (9) is derived from (ii) of Proposition 2. We apply $K_l$ to Theorem 4. From Eq. (8) of Theorem 4 and (iii) and (iv) of Proposition 2, $\gamma_l = \sqrt{2m}\,\sin(\pi/(2(l+2))) < \pi\sqrt{m}/(\sqrt{2}(l+2))$. Put $\varepsilon = \{\gamma_l(l+2)\}^{-1} > 0$; then $\varepsilon\gamma_l = (l+2)^{-1}$ and $\varepsilon^{-1} < \pi\sqrt{m}/\sqrt{2}$. Then we obtain Eq. (10) from Eq. (7) of Theorem 4 and (iii) of Proposition 2, and the Lipschitz case from (viii) of Proposition 1. Q.E.D.

Corollary 3 (Multidimensional extension of Jackson's theorem). For $f \in W$ there exists an $l$-order $m$-dimensional trigonometric polynomial $T_l$ for $l \in \mathbb{N}$ such that $\|f - T_l\|_W \le (1 + (\pi^2/2)\sqrt{m})\,\omega_W(f, (l+2)^{-1})$. If $f$ satisfies a Lipschitz condition with constant $M$ and exponent $\nu$ in $W$, then $\omega_W(f, (l+2)^{-1})$ can be replaced by $M(l+2)^{-\nu}$.

Proof. The corollary is derived from Theorem 5. Q.E.D.

4.1.5. Proof of Theorem 1

Network construction: We denote $K_l * f_i$ by $TN_l[f_i]$ and obtain Eq. (1) from Eq. (9) of Theorem 5. Because $TN_l[f_i]$ is a linear combination of trigonometric functions, it is just a network on $\mathbb{R}^m$ to $\mathbb{R}$ with trigonometric hidden-layer units. Then $TN_l[f]$ is a network on $\mathbb{R}^m$ to $\mathbb{R}^n$ with trigonometric hidden-layer units which approximates $f$ with $W$-norm.
Hidden-layer unit number: Each $TN_l[f_i]$ has common hidden-layer units based on $\cos((p-q)x)$ and $\sin((p-q)x)$. Then the number is given by

\[ \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} 2 = \frac{1}{2}\sum_{\substack{-l \le r_i \le l \\ r \ne 0}} 2 = (2l+1)^m - 1. \]

Error estimation: Eq. (2) is derived from Eq. (10) of Theorem 5. Q.E.D.

4.2. Approximations by piecewise linear and sigmoidal hidden-layer networks

First we show a constructive approximation to a multidimensional trigonometric function by networks with piecewise linear and sigmoidal hidden-layer units. Applying this to the polynomial obtained in Theorem 5, we derive Theorems 2 and 3.

4.2.1. Multidimensional trigonometric-function approximation by two kinds of networks

We construct networks with piecewise linear and sigmoidal hidden-layer units which approximate multidimensional trigonometric functions, specify their numbers of hidden-layer units, and estimate their approximation errors.

Theorem 6. Let $r \in \mathbb{N}_0^m$ and $PL_{j,k}$ be the function defined in Theorem 2. For $j \in \mathbb{N}$, three-layer networks $PS_j(rx)$ and $PC_j(rx)$ (cf. Fig. 2), which respectively approximate $\sin(rx)$ and $\cos(rx)$ and have $4|r|j$ piecewise linear hidden-layer units based on $PL_{j,k}$, are constructed by

\[ PS_j(rx) = 2(-1)^{|r|}\sin\frac{\pi}{4j}\sum_{k=0}^{4|r|j - 1}\cos\frac{(2k+1)\pi}{4j}\,PL_{j,k}(rx) \tag{11} \]

and

\[ PC_j(rx) = (-1)^{|r|} - 2(-1)^{|r|}\sin\frac{\pi}{4j}\sum_{k=0}^{4|r|j - 1}\sin\frac{(2k+1)\pi}{4j}\,PL_{j,k}(rx). \tag{12} \]

The approximation errors with $L^p_{2\pi}$-norm are estimated by

\[ \|\sin rx - PS_j(rx)\|_{L^p_{2\pi}} = \|\cos rx - PC_j(rx)\|_{L^p_{2\pi}} \le \left\{ \frac{\pi\sqrt{m}}{(2\pi)^{m-1} j}\left(\frac{4j}{\pi} - \cot\frac{\pi}{4j}\right) \right\}^{1/p}. \tag{13} \]

Proof. We construct

\[ PS_j(rx) = \sum_{k=0}^{4|r|j - 1}\left\{ \sin\left(-|r|\pi + \frac{(k+1)\pi}{2j}\right) - \sin\left(-|r|\pi + \frac{k\pi}{2j}\right) \right\} PL_{j,k}(rx) = 2(-1)^{|r|}\sin\frac{\pi}{4j}\sum_{k=0}^{4|r|j - 1}\cos\frac{(2k+1)\pi}{4j}\,PL_{j,k}(rx), \]

which approximates $\sin(rx)$, where $j$ is a partition number of a quarter period of $\sin|r|x$. Then it is a network with $4|r|j$ hidden-layer units based on $PL_{j,k}$. From $\int_0^{\pi} PS_j(x)\,dx = \frac{\pi}{4j}\sum_{k=0}^{2j-1}\bigl\{\sin\frac{k\pi}{2j} + \sin\frac{(k+1)\pi}{2j}\bigr\} = \frac{\pi}{2j}\sum_{k=1}^{2j-1}\sin\frac{k\pi}{2j} = \frac{\pi}{2j}\cot\frac{\pi}{4j}$, then $\|\sin rx - PS_j(rx)\|_{L^1_{2\pi}} \le \frac{2\pi\sqrt{m}}{(2\pi)^m}\int_{-\pi}^{\pi}|\sin|r|x - PS_j(|r|x)|\,dx = \frac{\sqrt{m}}{(2\pi)^{m-1}}\int_{-\pi}^{\pi}|\sin x - PS_j(x)|\,dx = \frac{\pi\sqrt{m}}{(2\pi)^{m-1} j}\left(\frac{4j}{\pi} - \cot\frac{\pi}{4j}\right)$. Because $0 \le |\sin rx - PS_j(rx)| \le 1$ according to the construction of $PS_j(rx)$, Eq. (13) can be derived from $\|\sin rx - PS_j(rx)\|_{L^p_{2\pi}} \le \|\sin rx - PS_j(rx)\|_{L^1_{2\pi}}^{1/p}$ for $p \ge 1$. We can construct $PC_j(rx)$ in the same manner. Q.E.D.

Fig. 2. $\sin 2x$ and the functions derived from the approximating networks with piecewise linear hidden-layer units $PS_j(2x)$ at $j = 1$ and $2$.
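The construction of Theorem 6 is easy to check numerically. The sketch below (ours, with hypothetical names) assembles $PS_j(rx)$ from Eq. (11) for $m = 1$ and compares its $L^1_{2\pi}$ distance from $\sin rx$ with the right side of Eq. (13). Since clipping the affine expression to $[0, 1]$ reproduces $PL_{j,k}$ exactly, $PS_j$ turns out to be the piecewise linear interpolant of the sine at spacing $\pi/2j$.

```python
import numpy as np

def PL_ramp(j, k, r_abs, rx):
    """PL_{j,k} of Theorem 2 for scalar rx with |r| = r_abs (vectorized):
    the linear ramp clipped to [0, 1]."""
    return np.clip(2 * j / np.pi * rx + 2 * r_abs * j - k, 0.0, 1.0)

def PS(j, r, x):
    """PS_j(rx) of Eq. (11): a piecewise linear approximation of sin(rx)."""
    r_abs = abs(r)
    out = np.zeros_like(x, dtype=float)
    s = 2 * (-1) ** r_abs * np.sin(np.pi / (4 * j))
    for k in range(4 * r_abs * j):
        out += s * np.cos((2 * k + 1) * np.pi / (4 * j)) * PL_ramp(j, k, r_abs, r * x)
    return out

j, r = 4, 2
x = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
err = np.mean(np.abs(np.sin(r * x) - PS(j, r, x)))               # L^1_{2pi}, m = 1
bound = np.pi / j * (4 * j / np.pi - 1 / np.tan(np.pi / (4 * j)))  # Eq. (13), p = m = 1
print(err <= bound, err, bound)
```

The computed error sits well below the bound; for $m = 1$ the chain of estimates in the proof is loose by roughly a factor of $2\pi$.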
Theorem 7. Let $r \in \mathbb{N}_0^m$ and $SG_{j,k}$ be the function defined in Theorem 3. For $j \in \mathbb{N}$, three-layer networks $SS_j(rx)$ and $SC_j(rx)$ (cf. Fig. 3), which respectively approximate $\sin(rx)$ and $\cos(rx)$ and have $4|r|j$ sigmoidal hidden-layer units based on $SG_{j,k}$, are constructed by

\[ SS_j(rx) = 2(-1)^{|r|}\sin\frac{\pi}{4j}\sum_{k=0}^{4|r|j - 1}\cos\frac{(2k+1)\pi}{4j}\,SG_{j,k}(rx) \tag{14} \]

and

\[ SC_j(rx) = (-1)^{|r|} - 2(-1)^{|r|}\sin\frac{\pi}{4j}\sum_{k=0}^{4|r|j - 1}\sin\frac{(2k+1)\pi}{4j}\,SG_{j,k}(rx). \tag{15} \]

The approximation errors with $L^p_{2\pi}$-norm are estimated by

\[ \|\sin rx - SS_j(rx)\|_{L^p_{2\pi}} = \|\cos rx - SC_j(rx)\|_{L^p_{2\pi}} \le \left\{ \frac{\pi\sqrt{m}}{(2\pi)^{m-1} j}\left(\log 2 - \frac{1}{2} + \frac{4j}{\pi} - \cot\frac{\pi}{4j}\right) \right\}^{1/p}. \tag{16} \]

Fig. 3. $\sin 2x$ and the functions derived from the approximating networks with sigmoidal hidden-layer units $SS_j(2x)$ at $j = 1$ and $2$.

Proof. Denote by $SS_j(rx)$ Eq. (11) of Theorem 6 with $PL_{j,k}(rx)$ replaced by $SG_{j,k}(rx)$. Then it is a network with $4|r|j$ sigmoidal hidden-layer units based on $SG_{j,k}$. Let $y_k = -\pi + k\pi/2j$ and $y = (8j/\pi)x + 8|r|j - 4k - 2$, and write $\mathrm{sg}(y) = (1 + e^{-y})^{-1}$. Then $\|PL_{j,k}(rx) - SG_{j,k}(rx)\|_{L^1_{2\pi}} \le \frac{2\pi\sqrt{m}}{(2\pi)^m}\int_{-\pi}^{\pi}|PL_{j,k}(|r|x) - SG_{j,k}(|r|x)|\,dx \le \frac{\sqrt{m}}{(2\pi)^{m-1}|r|}\int_{-\infty}^{\infty}|PL_{j,k}(x) - SG_{j,k}(x)|\,dx = \frac{\sqrt{m}}{(2\pi)^{m-1}|r|}\cdot\frac{\pi}{8j}\cdot 2\left[\int_0^2\left\{\frac{y+2}{4} - \mathrm{sg}(y)\right\}dy + \int_2^{\infty}\{1 - \mathrm{sg}(y)\}\,dy\right] = \frac{\pi\sqrt{m}}{(2\pi)^{m-1}\,4|r|j}\left(\log 2 - \frac{1}{2}\right)$. Hence, from Eqs. (11) and (13) of Theorem 6, Eq. (16) is derived in the case of $p = 1$. $0 \le |\sin rx - SS_j(rx)| \le 1$ according to the construction of $SS_j(rx)$. Then we obtain Eq. (16) from $\|\sin rx - SS_j(rx)\|_{L^p_{2\pi}} \le \|\sin rx - SS_j(rx)\|_{L^1_{2\pi}}^{1/p}$ for $p \ge 1$. We can construct $SC_j(rx)$ in the same manner. Q.E.D.

4.2.2. Proofs of Theorems 2 and 3 and Corollaries 1 and 2

Proofs of Theorems 2 and 3. Network construction: We denote, by $PN_{l,j}[f_i]$, Eq. (9) of Theorem 5 with $\cos((p-q)x)$ and $\sin((p-q)x)$ replaced respectively by $PC_j((p-q)x)$ and $PS_j((p-q)x)$ of Theorem 6, and obtain Eq. (3). Because $PN_{l,j}[f_i]$ is a linear combination of $PL_{j,k}$, it is just a network on $\mathbb{R}^m$ to $\mathbb{R}$ with piecewise linear hidden-layer units. Then $PN_{l,j}[f]$ is a network on $\mathbb{R}^m$ to $\mathbb{R}^n$ with piecewise linear hidden-layer units approximating $f$ with $L^p_{2\pi}$-norm.

Hidden-layer unit number: Each $PN_{l,j}[f_i]$ has common hidden-layer units based on $PL_{j,k}$. Let $r = p - q$ in Eq. (3); then the number is given by

\[ \frac{1}{2}\sum_{\substack{-l \le r_i \le l \\ r \ne 0}}\ \sum_{k=0}^{4|r|j - 1} 1 = 2j\sum_{-l \le r_i \le l}|r| = 2j\left\{ l(l+1)(2l+1)^{m-1} + (2l+1)\sum_{-l \le r_2, \dots, r_m \le l}(|r_2| + \dots + |r_m|) \right\} = 2mjl(l+1)(2l+1)^{m-1}. \]

Error estimation:

\[ B_l \sum_{\substack{\text{combinations of } p \ne q \in \mathbb{N}_0^m \\ 0 \le p_u, q_v \le l}} b_{l,p}\,b_{l,q} = \frac{1}{2}B_l\left\{ \left(\sum_{0 \le p_u \le l} b_{l,p}\right)^2 - \sum_{0 \le p_u \le l}(b_{l,p})^2 \right\} = \frac{1}{2}\left\{ B_l\left(\sum_{0 \le r \le l}\sin\frac{(r+1)\pi}{l+2}\right)^{2m} - 1 \right\} = \frac{1}{2}\left\{ \left(\frac{2}{l+2}\right)^m\left(\cot\frac{\pi}{2(l+2)}\right)^{2m} - 1 \right\} \le \frac{1}{2}\left\{ \left(\frac{8(l+2)}{\pi^2}\right)^m - 1 \right\}, \]

because $0.9 < \frac{\pi}{2(l+2)}\cot\frac{\pi}{2(l+2)} < 1$ for $l \in \mathbb{N}$. Hence, from Eq. (3), Eq. (9) of Theorem 5, Eq. (13) of Theorem 6, $|\langle f_i, \cos((p-q)t)\rangle| \le \|f_i\|_{L^p_{2\pi}}$, and $|\langle f_i, \sin((p-q)t)\rangle| \le \|f_i\|_{L^p_{2\pi}}$,

\[ \|K_l * f_i - PN_{l,j}[f_i]\|_{L^p_{2\pi}} \le 2\|f_i\|_{L^p_{2\pi}}\left\{ \left(\frac{8(l+2)}{\pi^2}\right)^m - 1 \right\}\left\{ \frac{\pi\sqrt{m}}{(2\pi)^{m-1} j}\left(\frac{4j}{\pi} - \cot\frac{\pi}{4j}\right) \right\}^{1/p}. \]

Thus, Eq. (4) is obtained from Eq. (10) of Theorem 5 for $W = L^p_{2\pi}$. We can prove Theorem 3 in the same manner as Theorem 2, using $SC_j$ and $SS_j$ of Theorem 7 instead of $PC_j$ and $PS_j$. Q.E.D.

Proofs of Corollaries 1 and 2. These corollaries are derived from Eqs. (2), (4) and (6) of Theorems 1, 2, and 3, and (viii) of Proposition 1. Q.E.D.

5. An example

This is an example of the approximation to a two-dimensional Gaussian function $e^{-(x^2+y^2)}$ on $[-\pi, \pi] \times [-\pi, \pi]$ (cf. Fig. 4) by the neural networks proposed in this paper. The following approximating networks are constructed using Eqs. (1), (3) and (5): (i) networks with trigonometric hidden-layer units for $l = 2, 4, 6, 8$, and $10$, which respectively have 24, 80, 168, 288, and 440 hidden-layer units; (ii) networks with piecewise linear and sigmoidal hidden-layer units for $l = 2, 4, 6, 8$, and $10$ at $j = 5$, which respectively have about $6.0 \times 10^2$, $3.6 \times 10^3$, $1.1 \times 10^4$, $2.4 \times 10^4$, and $4.6 \times 10^4$ hidden-layer units, and for the same $l$ values at $j = 60$, which respectively have about $7.2 \times 10^3$, $4.3 \times 10^4$, $1.3 \times 10^5$, $2.9 \times 10^5$, and $5.5 \times 10^5$ hidden-layer units. Their actual approximation-error values with $L^1_{2\pi}$-norm, which are the left sides of Eqs. (2), (4) and (6), are calculated by numerical integrations. Their estimated approximation-error values are calculated from the right sides of Eqs. (2), (4) and (6). $j$ is fixed at 5 and 60 for $l$ in this example; therefore, notice that the estimated error values of the networks with piecewise linear and sigmoidal hidden-layer units do not necessarily decrease monotonically as $l$ increases, because of the second terms of the right sides of Eqs. (4) and (6).

When $j = 5$, the actual error values of the three kinds of networks are about the same and decrease monotonically as $l$ increases (cf. Fig. 5). This shows that these approximations proceed in almost the same way as $l$ increases, even when $j$ is fixed at a small value for $l$ in this example. The actual error values are always bounded by the estimated error values. The estimated error values of the networks with piecewise linear and sigmoidal hidden-layer units, however, do not decrease monotonically as $l$ increases, because $j$ is fixed at a small value for $l$. When $j = 60$, the actual error values are almost equal and decrease monotonically as $l$ increases (cf. Fig. 6). This shows that these approximations proceed almost equally as $l$ increases, when $j$ is fixed at a larger value relative to $l$ in this example. In this case, the actual approximation-error values are bounded by the estimated approximation-error values, which decrease monotonically as $l$ increases. The actual error values of the three kinds of networks in both cases of $j = 5$ and $60$ in this example can be estimated by the same formulation, i.e. the right side of Eq. (2) and the first terms of the right sides of Eqs. (4) and (6), as we stated in Remark 2. As explained in the following discussion, the approximation capabilities of the three kinds of networks cannot be compared with each other using the result.

Fig. 4. The three-dimensional and contour graphs of the approximated function $e^{-(x^2+y^2)}$.
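The hidden-layer unit counts quoted above follow directly from the formulas of Theorems 1–3; the following snippet (ours) reproduces them for $m = 2$.

```python
# Hidden-layer unit counts for the two-dimensional example (m = 2).
m = 2

def trig_units(l):
    return (2 * l + 1) ** m - 1                               # Theorem 1

def pl_or_sg_units(l, j):
    return 2 * m * j * l * (l + 1) * (2 * l + 1) ** (m - 1)   # Theorems 2 and 3

for l in (2, 4, 6, 8, 10):
    print(l, trig_units(l), pl_or_sg_units(l, 5), pl_or_sg_units(l, 60))
# l = 2 gives 24, 600, 7200; l = 4 gives 80, 3600, 43200; and so on,
# matching the counts 24, 80, ..., 6.0e2, ..., 7.2e3, ... quoted in the text.
```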
The figures of the three-dimensional and contour graphs of the three kinds of approximating networks are almost equal for the same $l$ (cf. Figs 7–9). (Because these graphs of the three kinds of networks are almost equal for the same $j$, we show graphs of the piecewise linear hidden-layer networks at $j = 5$ and the sigmoidal hidden-layer networks at $j = 60$.) They are smooth and symmetric and become steeper as $l$ increases.

Fig. 5. The actual and estimated approximation error values of the approximating networks with trigonometric, piecewise linear, and sigmoidal hidden-layer units for $l$ at $j = 5$.

Fig. 6. The actual and estimated approximation error values of the approximating networks with trigonometric, piecewise linear, and sigmoidal hidden-layer units for $l$ at $j = 60$.
6. Conclusion

In this paper we proved constructive theorems for three-layer artificial neural networks with (1) trigonometric, (2) piecewise linear, and (3) sigmoidal hidden-layer units, which approximate $2\pi$-periodic $p$th-order Lebesgue-integrable functions on $\mathbb{R}^m$ to $\mathbb{R}^n$ for $p \ge 1$ with $L^p_{2\pi}$-norm, from a multidimensional extension of Jackson's theorem for an approximation by trigonometric polynomials. (In the case of (1), the networks also approximate $2\pi$-periodic continuous functions with $C_{2\pi}$-norm.) Because most of the previous results about constructive approximations by neural networks approximate continuous functions, the approximations in this paper extend the space of target functions. Theorems 1, 2, and 3 provide explicit equational representations of the approximating networks $TN_l$, $PN_{l,j}$, and $SN_{l,j}$, respectively with trigonometric, piecewise linear, and sigmoidal hidden-layer units, specifications for their numbers of hidden-layer units, and explicit formulations of their approximation-error estimations. Most previous constructive approximation methods are not simple enough for deriving explicit equational representations of approximating networks that can be calculated practically. The approximation methods presented in this paper only need the multidimensional Fourier coefficients of a target function and can practically and easily construct approximating networks. The formulations of the approximation-error estimations in most of the previous results contain inexplicit constants. Then it is not easy to calculate estimated values of approximation errors that fit the actual values practically using those formulations. The error estimations in this paper are derived from direct error evaluations using the modulus of continuity of a target function and can be represented by explicit formulations without any inexplicit constants. Then we can easily calculate the estimated values using these formulations. Networks $TN_l$, $PN_{l,j}$, and $SN_{l,j}$ can approximate functions with any degree of accuracy if their numbers of hidden-layer units increase. Corollary 2 shows that their approximation errors approach 0 if $l$ increases under some conditions on $j$. The approximation methods by networks with piecewise linear and sigmoidal hidden-layer units are based on the method by networks with trigonometric hidden-layer units. In fact, for any $l$, the functions by $PN_{l,j}$ and $SN_{l,j}$ approach the function by $TN_l$, and they become almost the same, if $j$ increases. Moreover, for any $l$, the approximation errors of $PN_{l,j}$ and $SN_{l,j}$ approach the error of $TN_l$ if $j$ increases. Then they become almost the same value, which can be estimated mainly by the same formulation in terms of $l$, i.e. the right side of Eq. (2) and the first terms of the right sides of Eqs. (4) and (6), while the second terms of the right sides of Eqs. (4) and (6) are negligible if $j$ is large enough for $l$. These results apply easily to an approximation to a non-periodic function defined in a bounded set on $\mathbb{R}^m$. The approximation example to a two-dimensional Gaussian function by our methods illustrates our results.

Fig. 7. The three-dimensional and contour graphs of the functions derived from the approximating networks with trigonometric hidden-layer units for $l = 2, 4$, and $10$.

Fig. 8. The three-dimensional and contour graphs of the functions derived from the approximating networks with piecewise linear hidden-layer units for $l = 2, 4$, and $10$ at $j = 5$.
Fig. 9. The three-dimensional and contour graphs of the functions derived from the approximating networks with sigmoidal hidden-layer units for $l = 2, 4$, and $10$ at $j = 60$.

7. Discussion

Jackson's theorem, on which our theory is basically based, is not guaranteed to give the best approximation. Consequently, Theorems 1, 2, and 3 do not necessarily deliver the most efficient methods for constructing the three kinds of approximating networks and the relation between the numbers of hidden-layer units and the approximation errors. The numbers of hidden-layer units of the three kinds of networks constructed by our methods increase exponentially in $l$ and $j$, which determine the approximation errors. The approximation capabilities of networks with the three kinds of hidden-layer units cannot be compared with each other using these theorems, because Theorems 2 and 3 are derived using the proof of Theorem 1. Thus, we still have a problem: how to derive more efficient and direct constructive approximations by artificial neural networks with different kinds of hidden-layer units.

Acknowledgements

The author would like to thank Information Science Research Laboratory Executive Manager Dr Ken-ichiro Ishii, HONDA Research Group Leader Dr Masaaki Honda, Dr Takeshi Okadome, Mr Yoshinao Shiraki, and other members of the HONDA Research Group for many useful and helpful discussions on this and related works.

References

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2 (4), 303–314.
Davis, P. J. (1975). Interpolation and Approximation. New York: Dover.
Feinerman, R. P., & Newman, D. J. (1974). Polynomial Approximation. Baltimore: Williams & Wilkins.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183–192.
Hecht-Nielsen, R. (1989). Theory of the back propagation neural network. In Proceedings of the International Joint Conference on Neural Networks, I (pp. 593–611).
Hecht-Nielsen, R. (1990). Neurocomputing. Reading, MA: Addison-Wesley.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4, 251–257.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.
Mhasker, H. N. (1993). Approximation properties of a multilayered feedforward artificial neural network. Advances in Computational Mathematics, 1, 61–80.
Mhasker, H. N., & Micchelli, C. A. (1992). Approximation by superposition of sigmoidal and radial basis functions. Advances in Applied Mathematics, 13, 350–373.
Suzuki, S. (1993). Function approximation by three-layer artificial neural networks. In Proceedings of 1993 International Symposium on Nonlinear Theory and its Applications, 4 (pp. 1269–1272).
Takenouchi, O., & Nishishiraho, T. (1986). Kinji Riron (Approximation Theory). Baifuukan (in Japanese).