
Mechanical Engineering Department, UI

TME 707 (Applied Numerical Methods)

CHAPTER 6
LEAST SQUARES AND FITTING

6.0 Introduction
The subject of least squares is quite old - so old, in fact, that it has a really well established vocabulary.
The trouble with established words is that one often forgets where they come from and begins to think
that they themselves represent some law of nature. I would like to be able to count the number of times
that, after I have suggested that someone do some type of fitting analysis on his particular problem, he
has replied saying - "well, thank heavens for that - I'll be able to use the subroutine package that the IBM
company has kindly made available to our computing centre".
A person responsible for the above statement is highly irresponsible, to my way of thinking. Anyone
who blindly uses these subroutine packages is really asking for it, since they are usually designed for a
polynomial fit, and it very often happens that what is actually needed for a particular problem is a
transcendental fit - and maybe not in the least squares sense.
Before the least squares analysis is dealt with, I shall discuss this business of fitting from a
general point of view. Let us suppose that we are presented with data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
where $y_i$ is the value at $x_i$. It is now desired to fit these data points to a curve of a particular functional form

$$Y = f(x;\ \theta_1, \theta_2, \ldots, \theta_k) \qquad (6.1)$$

For example, we may fit to

$$Y = \sin(\theta_1 x)\, e^{\theta_2 x + \theta_3 / x}$$

or

$$Y = \theta_1 \sin(\theta_2 x) + \theta_3 \sin(\theta_4 x),$$


etc., where we fit by adjusting the parameters $\theta_i$ so that the resulting $Y$ gives a reasonable representation of
the data. What guides us in choosing the functional form $f$? This is usually done either by observing the
data and making a good guess at the functional form, or else a simplified model of the physical situation may
suggest the form for $f$.

Returning to Eq. 6.1, it is now desired to fit $Y = f$ from the data. How is this fitting to be
accomplished? This is a problem, since there is an infinity of fitting criteria and there appears to be no
uniform method. Some of the fitting criteria are given below.
a) One fits by minimizing

$$S_p(\theta_1, \ldots, \theta_k) = \sum_{i=1}^{n} \left| y_i - f(x_i;\ \theta_1, \ldots, \theta_k) \right|^p \qquad (6.2)$$

with respect to $\theta_1, \ldots, \theta_k$.
b) One fits by minimizing the maximum value of

$$\left| y_i - f(x_i;\ \theta_1, \ldots, \theta_k) \right| \qquad (6.3)$$

for $i = 1, 2, \ldots, n$, with respect to the $\theta_i$'s.


c) One minimizes

$$R_p(\theta_1, \ldots, \theta_k) = \sum_{i=1}^{n} W_i \left| y_i - f(x_i;\ \theta_1, \ldots, \theta_k) \right|^p \qquad (6.4)$$

with respect to the $k$ $\theta_i$'s. Here $W_i$ is a weighting function $> 0$.


d) Instead of fitting $Y$ to $f$, one may attempt to fit $F(y)$ to $F(f)$, where $F$ has a specified
functional form, e.g., log, sin, exp, etc. One may now fit according to a), b) or c) above; for example,
using the a) criterion, we would minimize

$$\sum_{i=1}^{n} \left| F(y_i) - F\!\left( f(x_i;\ \theta_1, \ldots, \theta_k) \right) \right|^p$$

with respect to the $\theta_i$'s.


e) Instead of fitting $Y$ to $f(x;\ \theta_1, \ldots, \theta_k)$, a person may decide to fit


$$X = g(y;\ \theta_1, \ldots, \theta_k) \qquad (6.5)$$

where $g$ is a specified functional form in $y$. With the above functional form, one can once again use the
criteria a), b), c) or d) outlined above.
It should now be evident that an infinite number of fitting schemes are possible, and the least
squares polynomial fit offered by the subroutine package certainly does not exhaust the choices.
It is also understandable that each fitting scheme will yield a different approximating function.
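As a concrete illustration of the weighted criterion of Eq. 6.4, a minimal Python sketch of the misfit sum is given below. The function name misfit, the example model and the data are all made up for illustration, and the actual minimization over the $\theta_i$'s (by steepest descent or otherwise) is not shown.

```python
import numpy as np

def misfit(theta, x, y, f, w=None, p=2.0):
    """Weighted p-norm misfit of Eq. 6.4: sum_i W_i * |y_i - f(x_i; theta)|**p.

    theta : sequence of fit parameters
    f     : model function, called as f(x, *theta)
    w     : weights W_i (defaults to 1 for every point)
    p     : exponent (p = 2 gives the least squares criterion)
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(y) if w is None else np.asarray(w, float)
    return float(np.sum(w * np.abs(y - f(x, *theta)) ** p))

# Example: misfit of Y = theta1*sin(theta2*x) against made-up "measurements"
x = np.linspace(0.0, 3.0, 7)
y = 1.9 * np.sin(1.1 * x) + 0.05
model = lambda x, t1, t2: t1 * np.sin(t2 * x)
print(misfit([2.0, 1.0], x, y, model))        # least squares criterion (p = 2)
print(misfit([2.0, 1.0], x, y, model, p=1))   # p = 1 criterion
```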
A word should be said about whether we should be fitting $y$ to $f(x;\ \theta_1, \ldots, \theta_k)$ or $x$ to
$g(y;\ \theta_1, \ldots, \theta_k)$. In other words, which should be the dependent and which the independent variable?

The rule for deciding goes like this: one has the data points $(x_i, y_i)$, $i = 1, \ldots, n$. Now decide which
variable, $x$ or $y$, is the more exact in value. If $x$ is the more exact (reading more reliable), fit $y$ to
$f(x;\ \theta_1, \ldots, \theta_k)$, whereas if $y$ is more exact, fit $x$ to $g(y;\ \theta_1, \ldots, \theta_k)$. The more exact variable should
be made the independent variable, while the variable whose readings are less reliable should be made
the dependent variable.

The situation can be visualized as shown below, where $x$ and $y$ are the independent and dependent
variables respectively.
[Sketch: the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ plotted with $x$ along the horizontal axis and the measured $y_i$ along the vertical axis.]

At each exact (assumed exact) $x_i$, $y$ is measured as $y_i$. Actually, this value of $y_i$ may be thought of as a
sample from a distribution, i.e., for each $x_i$ an infinite number of possible values for $y_i$ can be obtained

- it just turned out that, for the experiment carried out, we read off $y_i$. Looked at from this point of view, $x$
is often called a variable, while $y$ is described by a distribution (often called a functional) which depends on
$x$. One of the gross assumptions usually made is that this distribution for $y$ has the same shape and form regardless of the value of $x$,
i.e., the distribution of $y$ depends on $x$ only as far as its mean value is concerned, while its variance (or
standard deviation, say) is independent of $x$. It gives one pause to realize that the criteria used in quality
control in factory production, for instance, blithely assume this. I do not want to ride on the big jet when
she goes down.
The popular fitting schemes are usually of the form of Eq. 6.4, where $W(x)$ is the weighting
function. $W(x)$ is usually chosen by the fitter so that data points which are considered relatively
unimportant or poor in accuracy are assigned a small $W(x_i)$, while the more worthy points are given a
larger $W(x_i)$.
It is then clear that the resultant fit will be more in accord with the pertinent data, while the influence of
the less important (or unreliable) data will be as small as we wish it to be; after all, we (the fitter) choose
$W(x)$.

It can be shown that if $W(x) = 1$ and $p = 1$, Eq. 6.2 represents a fit according to the median,
whereas if $p = 2$, one obtains the so-called least squares fit, or a fit according to the mean.
A fit according to Eq. 6.3, i.e., minimizing the maximum of

$$\left| y_i - f(x_i;\ \theta_1, \ldots, \theta_k) \right|$$

for $i = 1, 2, \ldots, n$ with respect to the $\theta_i$'s, is called a Chebyshev fit. As a matter of fact, if $p$ is large in
Eq. 6.2 we also obtain a Chebyshev fit. This is true since, if $p$ becomes large, the minimization of

$$\sum_{i=1}^{n} W_i \left| y_i - f(x_i;\ \theta_1, \ldots, \theta_k) \right|^p$$

becomes equivalent to minimizing the maximum term in this sum, since the maximum term will drown
out the other terms.
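To make the role of $p$ concrete, here is a small, self-contained Python sketch (the data and the brute-force grid search are made up purely for illustration) that fits a single constant $c$ to a handful of points by minimizing $\sum_i |y_i - c|^p$. For $p = 1$ the best $c$ is essentially the median of the data, for $p = 2$ it is the mean, and for large $p$ it approaches the midrange, which is the Chebyshev (minimax) answer.

```python
import numpy as np

y = np.array([1.0, 1.2, 1.3, 1.4, 5.0])         # made-up data with one outlier
c_grid = np.linspace(y.min(), y.max(), 100001)  # candidate constants

def best_constant(p):
    # Brute-force minimization of sum_i |y_i - c|**p over the grid of c values.
    cost = np.abs(y[None, :] - c_grid[:, None]) ** p
    return c_grid[np.argmin(cost.sum(axis=1))]

print("p = 1  :", best_constant(1))     # ~ median   = 1.3
print("p = 2  :", best_constant(2))     # ~ mean     = 1.98
print("p = 50 :", best_constant(50))    # ~ midrange = 3.0 (Chebyshev-like)
print("median =", np.median(y), " mean =", y.mean(),
      " midrange =", 0.5 * (y.min() + y.max()))
```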


[Sketch of the distribution $p(x) = x e^{-x}$ on $(0, \infty)$: the maximum occurs at $x_{max} = 1$, the median at $x_{median} = 1.6783\ldots$, and the mean at $x_{mean} = 2$.]

As far as the terms median and mean are concerned, they are pictorially shown in the above sketch of
the distribution $p(x) = x e^{-x}$, range $(0, \infty)$. The mean, or first moment, is given by

$$x_{mean} = \int_{0}^{\infty} x\, p(x)\, dx = \int_{0}^{\infty} x^{2} e^{-x}\, dx = 2$$
If the range of a distribution is $(a, b)$, the median is defined by

$$\int_{a}^{x_{med}} p(x)\, dx = \int_{x_{med}}^{b} p(x)\, dx$$

i.e., $x_{med}$ is the dividing point of the distribution, the point which a random value of $x$ has equal likelihood of being
greater or less than. For the example, one has
$$\int_{0}^{x_{med}} x e^{-x}\, dx = \int_{x_{med}}^{\infty} x e^{-x}\, dx$$

Using $\int x e^{-x}\, dx = -e^{-x}(1 + x)$, this becomes

$$1 - e^{-x_{med}}\left(1 + x_{med}\right) = e^{-x_{med}}\left(1 + x_{med}\right)$$

or

$$e^{-x_{med}}\left(1 + x_{med}\right) = \frac{1}{2}$$
which has the solution


xmed = 1.6783
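The transcendental equation $e^{-x_{med}}(1 + x_{med}) = \tfrac{1}{2}$ has no closed form solution, so the quoted value is obtained numerically. A minimal Python sketch using bisection is shown below; the bracketing interval [1, 3] is simply chosen by inspection of the sketch of p(x) and is an assumption of this illustration.

```python
import math

def g(x):
    # g(x) = 0 at the median of p(x) = x*exp(-x) on (0, infinity)
    return math.exp(-x) * (1.0 + x) - 0.5

lo, hi = 1.0, 3.0           # g(lo) > 0 and g(hi) < 0, so a root is bracketed
for _ in range(60):         # bisection: halve the bracket 60 times
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) > 0.0:
        lo = mid
    else:
        hi = mid

print(0.5 * (lo + hi))      # -> 1.6783..., as quoted in the text
```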

In this section, I shall now describe some of the analytical consequences of performing a least squares fit on data,
i.e., using Eq. 6.2, or Eq. 6.4 with the weighting function, with the index $p = 2$ (hence the name least
squares). Why? Simply because it is the only type of fit that permits substantial explicit analysis and even
closed form solutions at times. The other types of fits ($p \neq 2$), which may be more appropriate to a given
situation, cannot be solved in closed form.

In fact, if we wish to fit by minimizing

$$R_p(\theta_1, \ldots, \theta_k) = \sum_{i=1}^{n} W_i \left| y_i - f(x_i;\ \theta_1, \ldots, \theta_k) \right|^p$$

we must resort to minimizing the sum numerically by a method such as the steepest
descent scheme which was outlined in Section 4. If $f$ is a transcendental (or even just non-linear) function of the parameters $\theta_1, \ldots, \theta_k$,
even a least squares fit will have to be done numerically, so that for such cases
there is no particular advantage to the least squares fit, unless $f$ is a linear function of the $\theta_i$'s.
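To make the remark about numerical minimization concrete, here is a minimal steepest descent sketch for a least squares fit of a model that is non-linear in its parameter. The one-parameter model, the data, the fixed step size and the iteration count are all made up for illustration; a more careful code would choose the step size adaptively.

```python
import numpy as np

x = np.linspace(0.0, 3.0, 15)
y = np.sin(1.7 * x) + 0.02 * np.cos(5.0 * x)   # made-up data, roughly sin(1.7*x)

def S(theta):
    # Least squares misfit for the one-parameter model Y = sin(theta*x)
    return np.sum((y - np.sin(theta * x)) ** 2)

theta, step, h = 1.0, 0.01, 1.0e-6             # initial guess, step size, FD increment
for _ in range(500):
    grad = (S(theta + h) - S(theta - h)) / (2.0 * h)  # central-difference gradient
    theta -= step * grad                              # steepest descent update

print(theta, S(theta))                         # theta should end up near 1.7
```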
You may argue by saying that if we wish to fit the data to an exponential, a least squares fit can often easily
be done. For example, suppose we fit to

$$Y = A e^{Bx} \qquad (6.6a)$$

By taking logs, we get

$$\ln y = \ln A + Bx \qquad (6.6b)$$

so we can find $A$ and $B$ by fitting according to Eq. 6.6b, which is linear in $\ln A$ and $B$. It should now be
stated that fits according to Eqs. 6.6a and 6.6b are not the same. This is seen as follows.
To fit according to Eq. 6.6a, we minimize

$$S(A, B) = \sum_i W_i \left( y_i - A e^{B x_i} \right)^2 \qquad (6.7a)$$

whereas, according to Eq. 6.6b, we fit by minimizing


$$\tilde{S}(A, B) = \sum_i W_i \left( \ln y_i - \ln A - B x_i \right)^2 = \sum_i W_i \left[ \ln\!\left( \frac{y_i}{A e^{B x_i}} \right) \right]^2 \qquad (6.7b)$$

with respect to $\ln A$ and $B$. Examining Eqs. 6.7a and 6.7b, it is seen at once that the functions $S$ and $\tilde{S}$
are quite different and hence will certainly give different values for $A$ and $B$. Hence, if we find $A$ and $B$
according to Eqs. 6.6b and 6.7b, we have not fitted to an exponential at all; rather, we have fitted $\ln y$ to
$\ln A + Bx$, which is different.
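The following Python sketch illustrates the point numerically: it fits noisy exponential data once by minimizing Eq. 6.7a directly (here with a simple grid search over A and B, purely for illustration) and once via the log-linear fit of Eq. 6.7b, and prints both answers. The data, the grid ranges and the noise level are made up, and unit weights are used; in general the two answers disagree.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(0.8 * x) + rng.normal(0.0, 0.3, x.size)   # noisy exponential data

# Fit 1: minimize S(A,B) = sum (y_i - A*exp(B*x_i))^2  (Eq. 6.7a) by grid search
A_grid = np.linspace(1.0, 3.0, 201)
B_grid = np.linspace(0.3, 1.3, 201)
best = (np.inf, None, None)
for A in A_grid:
    for B in B_grid:
        s = np.sum((y - A * np.exp(B * x)) ** 2)
        if s < best[0]:
            best = (s, A, B)
print("direct fit (6.7a):     A = %.3f, B = %.3f" % (best[1], best[2]))

# Fit 2: least squares on ln y = ln A + B*x  (Eq. 6.7b), closed form via polyfit
B2, lnA2 = np.polyfit(x, np.log(y), 1)
print("log-linear fit (6.7b): A = %.3f, B = %.3f" % (np.exp(lnA2), B2))
```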

6.1 Straight Line Fit

As a first example of fitting, let us do a straight line fit to the data, with a weighting function $W$ which has
the assigned value $W_i$ at data point $i$. We also assume that the more reliable variable is
$x$, so we fit to

$$y = ax + b \qquad (6.8)$$

where $a$ and $b$ are found by minimizing

$$S(a, b) = \sum_{i=1}^{n} W_i \left( y_i - a x_i - b \right)^2 \qquad (6.9)$$

Taking $\dfrac{\partial S}{\partial a} = \dfrac{\partial S}{\partial b} = 0$, we obtain

$$\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} W_i \left( y_i - a x_i - b \right) x_i = 0$$

$$\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} W_i \left( y_i - a x_i - b \right) = 0 \qquad (6.10)$$

For brevity, let us define


$$\overline{W} = \frac{1}{n} \sum_{i=1}^{n} W_i, \qquad \overline{Wx} = \frac{1}{n} \sum_{i=1}^{n} W_i x_i \qquad (6.11)$$

and similarly for $\overline{Wy}$, $\overline{Wx^2}$, $\overline{Wxy}$, $\overline{Wy^2}$, etc.
Then, using the notation defined by Eq. 6.11, the relations of Eq. 6.10 become

$$a\,\overline{Wx^2} + b\,\overline{Wx} = \overline{Wxy}$$

$$a\,\overline{Wx} + b\,\overline{W} = \overline{Wy} \qquad (6.12)$$

from which we obtain closed form results for $a$ and $b$ as

$$a = \frac{\overline{Wxy}\;\overline{W} - \overline{Wx}\;\overline{Wy}}{\overline{Wx^2}\;\overline{W} - \left(\overline{Wx}\right)^2}, \qquad
b = \frac{\overline{Wx^2}\;\overline{Wy} - \overline{Wx}\;\overline{Wxy}}{\overline{Wx^2}\;\overline{W} - \left(\overline{Wx}\right)^2} \qquad (6.13)$$
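As a sanity check on Eq. 6.13, here is a short, self-contained Python sketch of the weighted straight line fit; the function name fit_y_on_x and the test data are made up for illustration. With all weights equal to one the result should agree with any standard least squares routine.

```python
import numpy as np

def fit_y_on_x(x, y, w=None):
    """Weighted straight-line fit y = a*x + b via the closed form of Eq. 6.13."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    W, Wx, Wy = w.mean(), (w * x).mean(), (w * y).mean()
    Wxx, Wxy = (w * x * x).mean(), (w * x * y).mean()
    denom = Wxx * W - Wx ** 2
    a = (Wxy * W - Wx * Wy) / denom
    b = (Wxx * Wy - Wx * Wxy) / denom
    return a, b

# Quick check against numpy's unweighted least squares
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(fit_y_on_x(x, y))       # (a, b) from Eq. 6.13
print(np.polyfit(x, y, 1))    # numpy gives [a, b]; should agree
```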

If we choose $y$ as the more reliable variable and fit according to

$$x = \alpha y + \beta \qquad (6.14)$$

where $\alpha$ and $\beta$ are adjusted to minimize

$$S_1(\alpha, \beta) = \sum_{i=1}^{n} W_i \left( x_i - \alpha y_i - \beta \right)^2 \qquad (6.15)$$

we obtain $\alpha$ and $\beta$ by demanding $\dfrac{\partial S_1}{\partial \alpha} = \dfrac{\partial S_1}{\partial \beta} = 0$. Results very similar to Eq. 6.13 are obtained. In fact, if
$x$ and $y$ are interchanged in Eq. 6.13, we can read $\alpha$ for $a$ and $\beta$ for $b$, so that we get at once


$$\alpha = \frac{\overline{Wxy}\;\overline{W} - \overline{Wx}\;\overline{Wy}}{\overline{Wy^2}\;\overline{W} - \left(\overline{Wy}\right)^2}, \qquad
\beta = \frac{\overline{Wy^2}\;\overline{Wx} - \overline{Wy}\;\overline{Wxy}}{\overline{Wy^2}\;\overline{W} - \left(\overline{Wy}\right)^2} \qquad (6.16)$$

If we now wish to write this as $y = ax + b$, our new $a$ and $b$ will be given by

$$\text{new } a = \frac{1}{\alpha}, \qquad \text{new } b = -\frac{\beta}{\alpha} \qquad (6.17)$$

Naturally, the new $a$ and $b$ thus evaluated will not be equal to those found by Eq. 6.13.
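A short companion sketch (again with made-up data and a hypothetical function name): it fits $x = \alpha y + \beta$ by the closed form of Eq. 6.16 and converts the result to the new $a$ and $b$ of Eq. 6.17, so it can be compared directly with the fit of Eq. 6.13 on the same data; in general the two straight lines differ.

```python
import numpy as np

def fit_x_on_y(x, y, w=None):
    """Fit x = alpha*y + beta (Eq. 6.16), then return (new a, new b) of Eq. 6.17."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    W, Wx, Wy = w.mean(), (w * x).mean(), (w * y).mean()
    Wyy, Wxy = (w * y * y).mean(), (w * x * y).mean()
    denom = Wyy * W - Wy ** 2
    alpha = (Wxy * W - Wx * Wy) / denom
    beta = (Wyy * Wx - Wy * Wxy) / denom
    return 1.0 / alpha, -beta / alpha     # new a and new b (Eq. 6.17)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(fit_x_on_y(x, y))   # generally differs from the Eq. 6.13 result
```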

Suppose both $x$ and $y$ are somewhat unreliable. In this case, we can (arbitrarily, mind you) fit to

$$a_1 x + b_1 y = 1 \qquad (6.18)$$

by minimizing

$$S(a_1, b_1) = \sum_i W_i\, d_i^{\,2} \qquad (6.19)$$

where $d_i$ is the perpendicular distance of the data point $(x_i, y_i)$ to the line $a_1 x + b_1 y = 1$.


[Sketch: the perpendicular distance $d_i$ from the data point $(x_i, y_i)$ to the line $a_1 x + b_1 y = 1$.]

It is easy to show, using high school algebra, that

$$d_i = \frac{\left| a_1 x_i + b_1 y_i - 1 \right|}{\sqrt{a_1^2 + b_1^2}} \qquad (6.20)$$

Hence,

$$S = \frac{1}{a_1^2 + b_1^2} \sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right)^2 \qquad (6.21)$$

Taking $\partial S / \partial a_1 = 0$,

$$\frac{\partial S}{\partial a_1} = -\frac{2 a_1}{\left( a_1^2 + b_1^2 \right)^2} \sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right)^2 + \frac{2}{a_1^2 + b_1^2} \sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right) x_i = 0$$

which, on multiplying through by $\tfrac{1}{2}\left( a_1^2 + b_1^2 \right)^2$, gives

$$\sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right) \left( - a_1^2 x_i - a_1 b_1 y_i + a_1 + a_1^2 x_i + b_1^2 x_i \right) = 0$$

or

$$\sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right) \left( a_1 + b_1 \left[ b_1 x_i - a_1 y_i \right] \right) = 0 \qquad (6.22)$$

Taking $\partial S / \partial b_1 = 0$, we obtain the result of Eq. 6.22 with $x$ and $y$, and $a_1$ and $b_1$, interchanged, i.e.,

$$\sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right) \left( b_1 + a_1 \left[ a_1 y_i - b_1 x_i \right] \right) = 0 \qquad (6.23)$$

From Eqs. 6.22 and 6.23, one can immediately write


$$\sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right) = 0$$

$$\sum_{i=1}^{n} W_i \left( a_1 x_i + b_1 y_i - 1 \right) \left( a_1 y_i - b_1 x_i \right) = 0 \qquad (6.24)$$

From the first relation of Eq. 6.24, we obtain

$$a_1\,\overline{Wx} + b_1\,\overline{Wy} = \overline{W}$$

so that $a_1$ can be expressed in terms of $b_1$ as

$$a_1 = \frac{1}{\overline{Wx}} \left( \overline{W} - b_1\,\overline{Wy} \right) \qquad (6.25)$$

Substituting Eq. 6.25 for $a_1$ into the second relation of Eq. 6.24, a quadratic equation for $b_1$ results, which
reads

$$A\, b_1^2 + B\, b_1 + C = 0$$

where

$$A = \overline{Wxy} \left[ \left( \overline{Wy} \right)^2 - \left( \overline{Wx} \right)^2 \right] + \overline{Wx}\;\overline{Wy} \left[ \overline{Wx^2} - \overline{Wy^2} \right]$$

$$B = \overline{W}\;\overline{Wx} \left[ \overline{Wy^2} - \overline{Wx^2} \right] + \overline{Wx} \left[ \left( \overline{Wy} \right)^2 + \left( \overline{Wx} \right)^2 \right] - 2\,\overline{W}\;\overline{Wy}\;\overline{Wxy}$$

$$C = \overline{W} \left[ \overline{W}\;\overline{Wxy} - \overline{Wx}\;\overline{Wy} \right] \qquad (6.26)$$

If we eliminate $b_1$ instead, we obtain a quadratic equation for $a_1$ identical to Eq. 6.26, but with $x$ and $y$
interchanged, of course.

It may now be wondered: since we obtain a quadratic equation for $b_1$ (as in Eq. 6.26), evidently
two roots are obtained, resulting in two solutions. Which one should we take as the best straight line?
However, a little reflection enables us to see that Eq. 6.26 is the result of taking

$$\frac{\partial S}{\partial a_1} = \frac{\partial S}{\partial b_1} = 0$$

and the result need not be a minimum. In fact, the two solutions of Eq. 6.26 can be shown to yield
straight lines that are perpendicular to one another. This is seen as follows.

The slope of the line $a_1 x + b_1 y = 1$ is $R$, where

$$R = -\frac{a_1}{b_1}$$

or, using Eq. 6.25, we obtain

$$R = -\frac{\overline{W} - b_1\,\overline{Wy}}{\overline{Wx}\; b_1} = \frac{\overline{Wy}}{\overline{Wx}} - \frac{\overline{W}}{\overline{Wx}\; b_1}$$

and, using Eq. 6.26 to write $b_1 = \dfrac{-B \pm \sqrt{B^2 - 4AC}}{2A}$, we have

$$R = \frac{\overline{Wy}}{\overline{Wx}} - \frac{2 A\,\overline{W}}{\overline{Wx} \left( -B \pm \sqrt{B^2 - 4AC} \right)} \qquad (6.27)$$

Denoting the two slopes by $R_1$ and $R_2$,

$$R_1 = \frac{\overline{Wy}}{\overline{Wx}} - \frac{2 A\,\overline{W}}{\overline{Wx} \left( -B + \sqrt{B^2 - 4AC} \right)}, \qquad
R_2 = \frac{\overline{Wy}}{\overline{Wx}} - \frac{2 A\,\overline{W}}{\overline{Wx} \left( -B - \sqrt{B^2 - 4AC} \right)} \qquad (6.28)$$

From Eq. 6.28,

$$R_1 R_2 = \frac{1}{\left( \overline{Wx} \right)^2} \left[ \left( \overline{Wy} \right)^2
- 2 A\,\overline{W}\;\overline{Wy} \left( \frac{1}{-B + \sqrt{B^2 - 4AC}} + \frac{1}{-B - \sqrt{B^2 - 4AC}} \right)
+ \frac{4 A^2 \left( \overline{W} \right)^2}{4AC} \right]$$

$$R_1 R_2 = \frac{C \left( \overline{Wy} \right)^2 + B\,\overline{W}\;\overline{Wy} + A \left( \overline{W} \right)^2}{C \left( \overline{Wx} \right)^2}$$

Finally, using the definitions of $A$, $B$ and $C$ given by Eq. 6.26, we obtain the result

$$R_1 R_2 = -1 \qquad (6.29)$$


Hence, one line is the best fit to the data, while the other line is (in a way) the worst fit. Actually, the best fit
corresponds to a minimum of $S$, whereas the other solution is a maximum of $S$ or a saddle point. In
practice, a glance at the data enables one to decide which solution is the desired one.
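The recipe above (solve the quadratic of Eq. 6.26 for $b_1$, recover $a_1$ from Eq. 6.25, and keep whichever root actually minimizes $S$) is easy to mechanize. Below is a minimal Python sketch under those assumptions; the function name perpendicular_fit and the test data are made up, and equal weights are used.

```python
import numpy as np

def perpendicular_fit(x, y, w=None):
    """Fit a1*x + b1*y = 1 by minimizing the weighted sum of squared
    perpendicular distances (Eqs. 6.18-6.26); returns (a1, b1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    W, Wx, Wy = w.mean(), (w * x).mean(), (w * y).mean()
    Wxx, Wyy, Wxy = (w * x * x).mean(), (w * y * y).mean(), (w * x * y).mean()

    # Quadratic A*b1**2 + B*b1 + C = 0 of Eq. 6.26
    A = Wxy * (Wy**2 - Wx**2) + Wx * Wy * (Wxx - Wyy)
    B = W * Wx * (Wyy - Wxx) + Wx * (Wy**2 + Wx**2) - 2.0 * W * Wy * Wxy
    C = W * (W * Wxy - Wx * Wy)

    def S(a1, b1):
        # Objective of Eq. 6.21: weighted sum of squared perpendicular distances
        return np.sum(w * (a1 * x + b1 * y - 1.0) ** 2) / (a1**2 + b1**2)

    roots = np.roots([A, B, C])                          # the two b1 candidates
    cands = [((W - b1 * Wy) / Wx, b1) for b1 in roots]   # a1 from Eq. 6.25
    return min(cands, key=lambda ab: S(*ab))             # keep the true minimum

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(perpendicular_fit(x, y))
```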
As far as the errors are concerned, this is hard to tell; you may need to consult textbooks on
regression analysis for a fairly rigorous treatment of errors. At this point, suffice it to say that if the
weights $W_i$ are all equal to unity and we fit according to Eq. 6.9, obtaining $a$ and $b$ as per Eq. 6.13, then a
crude estimate for the error in $y$ can be obtained from the resulting $S(a, b)$. In fact, defining

$$\sigma^2 = \frac{1}{n - 2}\, S(a, b) \qquad (6.30)$$

where $S(a, b)$ is the resulting sum at the minimum, the error is given by

$$\text{Error} \approx \sqrt{\frac{\sigma^2}{n}} \qquad (6.31)$$

In addition, the errors in $a$ and $b$ are also crudely given by

$$a_{err} \approx \frac{\sigma}{\sqrt{n \left[ \overline{x^2} - \left( \overline{x} \right)^2 \right]}}, \qquad
b_{err} \approx \sigma \sqrt{\frac{\overline{x^2}}{n \left[ \overline{x^2} - \left( \overline{x} \right)^2 \right]}}$$

where $\overline{x} = \frac{1}{n}\sum_i x_i$ and $\overline{x^2} = \frac{1}{n}\sum_i x_i^2$, $\overline{W} = 1$ of course, and $n$ is large.
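A short sketch of these crude error estimates (Eqs. 6.30 and 6.31 together with the expressions above), assuming unit weights; the data are the same made-up numbers used in the earlier sketches and the function name is hypothetical.

```python
import numpy as np

def fit_with_errors(x, y):
    """Unweighted fit y = a*x + b plus the crude error estimates of the text."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    a, b = np.polyfit(x, y, 1)                   # same as Eq. 6.13 with W = 1
    S = np.sum((y - a * x - b) ** 2)             # residual sum at the minimum
    sigma2 = S / (n - 2)                         # Eq. 6.30
    spread = np.mean(x**2) - np.mean(x) ** 2     # x2bar - (xbar)^2
    a_err = np.sqrt(sigma2 / (n * spread))
    b_err = np.sqrt(sigma2 * np.mean(x**2) / (n * spread))
    return a, b, np.sqrt(sigma2 / n), a_err, b_err

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(fit_with_errors(x, y))
```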

Problem

1. Suppose we have the following data, for which we desire a straight line least squares fit:

x: 0.25, 0.75, 1.0, 1.5, 2.0, 2.5, 2.8, 3.5, 4.0, 4.75, 5.25, 5.75, 6.25, 6.75, 7.25

y: 1.6, 1.6, 2.0, 2.2, 2.55, 2.7, 3.0, 3.25, 3.75, 4.0, 4.25, 4.75, 4.8, 5.25, 5.4


a) With x as the more reliable variable and $W_i = 1$, fit according to Eq. 6.9.

b) With y as the more reliable variable and $W_i = 1$, fit according to Eq. 6.15.

c) With x as the more reliable variable and $W(x) = \dfrac{1}{1 + x^2}$, fit according to Eq. 6.9.

d) With both x and y unreliable and $W_i = 1$, fit according to Eq. 6.19.

In all four cases above, print out the result as y = ax + b, i.e., print the resulting a and b.
