An LR-type fuzzy number A = (m, α, β)_LR has the membership function

A(x) = L((m − x)/α),  x ≤ m,
A(x) = R((x − m)/β),  x > m,

where m is the center and α > 0, β > 0 are the left and right spreads; for example, the symmetrical triangular fuzzy number has L(x) = R(x) = max(0, 1 − x).
Another example is the exponential fuzzy number, with membership function

A(x) = exp(−((m − x)/s)^n),  x ≤ m,
A(x) = exp(−((x − m)/s)^n),  x > m,

where s is the spread and n > 0 is a shape parameter.
Definition 2.2 (Dubois (1980))
Let A = (m_a, α_a, β_a)_LR and B = (m_b, α_b, β_b)_LR be two LR-type fuzzy numbers. Then, by the extension principle, the following operations are defined:
1. A + B = (m_a + m_b, α_a + α_b, β_a + β_b)_LR
2. λA = λ(m_a, α_a, β_a)_LR = (λm_a, λα_a, λβ_a)_LR when λ > 0
3. λA = λ(m_a, α_a, β_a)_LR = (λm_a, −λβ_a, −λα_a)_RL when λ < 0
4. −B = (−m_b, β_b, α_b)_RL
5. A − B = (m_a, α_a, β_a)_LR + (−m_b, β_b, α_b)_RL = (m_a − m_b, α_a + β_b, β_a + α_b)_LR
Definition 2.3 (a Euclidean distance formula)
Let A = (m_a, α_a)_LR and B = (m_b, α_b)_LR be two symmetrical fuzzy numbers; then the distance between A and B is defined as

D = sqrt((m_a − m_b)² + (α_a − α_b)²).  (2.1)

Let A = (m_a, α_a, β_a)_LR and B = (m_b, α_b, β_b)_LR be two LR-type fuzzy numbers; then the distance between A and B is defined as

D = sqrt(w_m(m_a − m_b)² + w_α(α_a − α_b)² + w_β(β_a − β_b)²),  (2.2)
where w_m > 0, w_α > 0 and w_β > 0 are weights. Under the structure of Model I, the least-squares criterion based on (2.1) is

D² = Σ_{i=1}^n [ (c_i − (a_0 + a_1·x_i1 + ... + a_p·x_ip))² + (s_i − (r_0 + r_1·x_i1 + ... + r_p·x_ip))² ].  (2.3)
Let ||v|| denote the length of a vector v; then, using vector and matrix expressions, D² can be rewritten as

D² = ||C − Xa||² + ||S − Xr||²,

where X is an n × (p+1) design matrix, a = (a_0, a_1, ..., a_p)', r = (r_0, r_1, ..., r_p)', C = (c_1, c_2, ..., c_n)', and S = (s_1, s_2, ..., s_n)'.
Let ∂D²/∂a = 0 and ∂D²/∂r = 0; then the solutions of a and r which minimize D² are as follows:

a = (X'X)^(-1)X'C,  r = (X'X)^(-1)X'S.  (2.4)
The above method applies regression to the centers and the spreads separately, and the resulting estimates are not related to the membership functions. Nevertheless, in the real data analyses reported later, this method provided better results in the estimation of the fuzzy parameter values.
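The estimates in (2.4) amount to two ordinary least-squares fits, one on the centers and one on the spreads. A minimal sketch in Python/NumPy (the function and variable names are ours, not from the paper):

```python
import numpy as np

def model_i_fit(X, C, S):
    """Model I: fit centers and spreads by two separate OLS problems (2.4).

    X : (n, p+1) design matrix whose first column is all ones
    C : (n,) observed centers, S : (n,) observed spreads
    Returns (a, r), the center and spread coefficient vectors.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    a = XtX_inv @ X.T @ C   # a = (X'X)^(-1) X'C
    r = XtX_inv @ X.T @ S   # r = (X'X)^(-1) X'S
    return a, r

# Tiny synthetic example: one predictor, intercept column prepended.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
C = np.array([2.1, 4.0, 6.2, 7.9])   # centers roughly 2x
S = np.array([0.5, 0.9, 1.6, 2.1])   # spreads roughly 0.5x
a, r = model_i_fit(X, C, S)
```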
2.3 Symmetrical Doubly Linear Adaptive Fuzzy Regression Model
Under the structure of Model I, if we use the Euclidean distance formula and the least-squares method to run linear regressions on the centers and spreads separately, the resulting estimates of centers and spreads are unrelated. D'Urso and Gastaldi (2000), however, argued that the dynamics of the spreads depend to some extent on the magnitude of the (estimated) centers. They therefore proposed the doubly linear adaptive fuzzy regression model (call it Model II) to obtain the parameter estimates.
They considered symmetrical fuzzy numbers with triangular membership functions, where a fuzzy number y_i = (c_i, s_i) is completely identified by the two parameters c (center) and s (left and right spread). Model II is defined as follows:
C = C* + ε_c,  C* = Xa,  (2.5)
S = S* + ε_s,  S* = C*b + 1d,  (2.6)
where X is an n × (p+1) matrix containing the input variables (data matrix); a = (a_0, a_1, ..., a_p)' is a column vector containing the regression parameters of the first model (referred to as the core regression model); C = (c_1, c_2, ..., c_n)' and C* = Xa are the vector of the observed centers and the vector of the interpolated centers, respectively, both of dimension n × 1; S = (s_1, s_2, ..., s_n)' and S* are the vector of the assigned spreads and the vector of the interpolated spreads, respectively, both of dimension n × 1; 1 is an n × 1 vector of all 1's; b and d are the regression parameters of the second regression model (referred to as the spread regression model); and ε_c, ε_s are residual vectors.
Apparently, the above model is based on two linear models: the first interpolates the centers of the fuzzy observations, and the second yields the spreads by building another linear model over the first one. Observe that the predictive variables X enter Eq. (2.6) through the interpolated centers. The model is hence capable of taking into account possible linear relations between the size of the spreads and the magnitude of the estimated centers. This is often the case in real-world applications, where dependence between centers and spreads is likely (for instance, the uncertainty or fuzziness associated with a measurement could depend on its magnitude).
D'Urso and Gastaldi used the Euclidean distance formula (2.1) and the least-squares method to obtain the estimates of a, b and d that minimize

D² = ||C − C*||² + ||S − S*||²
   = C'C − 2a'X'C + (1 + b²)a'X'Xa + S'S − 2b·S'Xa − 2d·S'1 + 2bd·1'Xa + nd².
Setting ∂D²/∂a = 0, ∂D²/∂b = 0 and ∂D²/∂d = 0, they obtained the following equations (up to constant factors):

X'C − (1 + b²)X'Xa + b·X'S − bd·X'1 = 0,
a'X'S − b·a'X'Xa − d·a'X'1 = 0,
1'S − b·1'Xa − nd = 0.  (2.7)
Based on the equations in (2.7), they obtained the following least-squares iterative solutions for a, b and d:

a = (1/(1 + b²))·(X'X)^(-1)X'(C + Sb − 1bd),
b = (a'X'Xa)^(-1)·(a'X'S − d·a'X'1),
d = (1/n)·(1'S − b·1'Xa).  (2.8)
The derivation of the recursive solutions of a, b and d: from the first equation of (2.7) we easily obtain a = (1/(1 + b²))·(X'X)^(-1)X'(C + Sb − 1bd); substituting it into the second and third equations of (2.7) gives

C'S̃ − b·C'C̃ + b·S'S̃ − b²·C'S̃ + nb²dC̄ − 2nbdS̄ + nbd² − ndC̄ = 0  (2.9)

and

bC̄ + d − S̄ = 0,  (2.10)

where C̃ = X(X'X)^(-1)X'C, S̃ = X(X'X)^(-1)X'S, C̄ = (1/n)1'C, and S̄ = (1/n)1'S.
From (2.10) we obtained d = S̄ − bC̄; substituting it back into (2.9), we obtained a simplified quadratic equation in b:

M_1b² + M_2b + M_3 = 0,

where M_1 = C'S̃ − nC̄S̄, M_2 = C'C̃ − S'S̃ − nC̄² + nS̄², and M_3 = nC̄S̄ − S'C̃.
By solving the quadratic equation in b, we obtained

b = (−M_2 ± sqrt(M_2² − 4M_1M_3)) / (2M_1),

and the corresponding solutions

d = S̄ − bC̄,  a = (1/(1 + b²))·(X'X)^(-1)X'(C + b(S − 1d)).
The least-squares estimates are obtained by substituting these two sets of a, b and d into D² and keeping the set for which D² is smaller. Based on the equations for a, b and d, we can conclude that, no matter what membership function the response fuzzy number y_i = (c_i, s_i) has, the parameter estimates are the same. Therefore, these least-squares estimates do not take other possible shapes of fuzzy numbers into account.
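The closed-form Model II solution reduces to solving the quadratic M_1b² + M_2b + M_3 = 0 and keeping the root with the smaller D². A hedged Python/NumPy sketch (names are ours):

```python
import numpy as np

def model_ii_fit(X, C, S):
    """Model II under the Euclidean distance: solve M1*b^2 + M2*b + M3 = 0,
    set d = Sbar - b*Cbar and a = (X'X)^(-1) X'(C + b(S - 1d))/(1 + b^2),
    and keep the candidate with the smaller D^2."""
    n = len(C)
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    Ct, St = H @ C, H @ S                         # C-tilde, S-tilde
    Cbar, Sbar = C.mean(), S.mean()
    M1 = C @ St - n * Cbar * Sbar
    M2 = C @ Ct - S @ St - n * Cbar**2 + n * Sbar**2
    M3 = n * Cbar * Sbar - S @ Ct                 # note M3 = -M1
    best = None
    for b in np.real(np.roots([M1, M2, M3])):     # the two candidate roots
        d = Sbar - b * Cbar
        a = np.linalg.solve(X.T @ X, X.T @ (C + b * (S - d))) / (1 + b**2)
        D2 = np.sum((C - X @ a)**2) + np.sum((S - b * (X @ a) - d)**2)
        if best is None or D2 < best[0]:
            best = (D2, a, b, d)
    return best[1], best[2], best[3]

# Exact-fit example: spreads follow S = 0.5*C + 1, centers follow C = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
C = 2.0 * x
S = 0.5 * C + 1.0
a, b, d = model_ii_fit(X, C, S)
```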
3. LR type of Fuzzy Linear Regression
3.1 Nonsymmetrical Doubly Linear Adaptive Fuzzy Regression Model
When we have numerical (crisp) explanatory variables X_j (j = 1, 2, ..., k) and an LR fuzzy dependent variable Y = (c, p, q) (where c is the center and p and q are, respectively, the left and right spreads), a model capable of incorporating the possible influence of the magnitude of the centers on the spreads can be considered (D'Urso and Gastaldi, 2000, 2001, 2002). For fuzzy response numbers y_i = [c_i − p_i, c_i + q_i] that are nonsymmetrical with triangular membership functions, D'Urso (2003) proposed a fuzzy regression model (call it Model III), expressed in matrix form as:
C = C* + ε,  C* = Xa,  (3.1)
P = P* + λ,  P* = C*b + 1d,  (3.2)
q = q* + ρ,  q* = C*g + 1h,  (3.3)
where X is an n × (k+1) matrix containing the vector 1 concatenated with the k crisp input variables; C, C* are n × 1 vectors of the observed centers and interpolated centers, respectively; P, P* are n × 1 vectors of observed left spreads and interpolated left spreads, respectively; q, q* are n × 1 vectors of observed right spreads and interpolated right spreads, respectively; a is a (k+1) × 1 vector of regression parameters for the regression model for C; b, d, g, h are regression parameters for the regression models for P and q; 1 is an n × 1 vector of all 1's; and ε, λ, ρ are n × 1 vectors of residuals.
This model is based on three sub-models. The first one interpolates the centers of the
fuzzy data, the other two sub-models are built over the first one and yield the spreads.
This formulation allows the model to consider possible relations between the size of the
spreads and the magnitude of the estimated centers, as it is often necessary in real case
studies. Model III can be called a nonsymmetrical doubly linear adaptive fuzzy regression
model.
D'Urso used the Euclidean distance formula (2.2) and the least-squares method to obtain the estimates of a, b, d, g, h that minimize

D² = w_c||C − C*||² + w_p||P − P*||² + w_q||q − q*||²
   = w_c(C'C − 2a'X'C + a'X'Xa)
   + w_p(P'P − 2b·P'Xa − 2d·P'1 + b²·a'X'Xa + 2bd·1'Xa + nd²)
   + w_q(q'q − 2g·q'Xa − 2h·q'1 + g²·a'X'Xa + 2gh·1'Xa + nh²),  (3.4)

where w_c, w_p, w_q are arbitrary positive weights.
Recursive solutions to the above system are found by equating to zero the partial derivatives with respect to the parameters a, b, d, g, h; for example,

a = (1/(w_c + w_p·b² + w_q·g²))·(X'X)^(-1)X'[w_c·C + w_p·b(P − 1d) + w_q·g(q − 1h)],

with analogous expressions for b, d, g and h.

3.2 Yang and Ko's Distance
Yang and Ko (1996) defined the following distance between LR-type fuzzy numbers:

d_LR²(A, B) = (m_a − m_b)² + [(m_a − lα_a) − (m_b − lα_b)]² + [(m_a + rβ_a) − (m_b + rβ_b)]²,  (3.6)

where l = ∫_0^1 L^(-1)(ω) dω and r = ∫_0^1 R^(-1)(ω) dω.
Yang and Ko (1996) also proved that (F_LR(ℝ), d_LR) is a complete metric space. If A and B are symmetrical LR-type fuzzy numbers, then l = r and

d_LR²(A, B) = 3(m_a − m_b)² + 2l²(α_a − α_b)².

If A and B are symmetrical triangular fuzzy numbers, then l = ∫_0^1 L^(-1)(x) dx = ∫_0^1 (1 − x) dx = 1/2. If A and B are exponential-type fuzzy numbers, then l = ∫_0^1 L^(-1)(x) dx = ∫_0^1 (−ln x)^(1/m) dx = Γ(1 + 1/m). Compared with the distance formulas (2.1) and (2.2), the distance formula (3.6) avoids the subjective choice of the weights (w_m > 0, w_α > 0, and w_β > 0).
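The constant l = Γ(1 + 1/m) for the exponential membership function can be checked numerically against its defining integral; a small sketch (function names are ours):

```python
import math

def l_exponential(m):
    """Yang-Ko constant for an exponential-type membership L(x) = exp(-x^m):
    l = integral_0^1 L^(-1)(x) dx = integral_0^1 (-ln x)^(1/m) dx = Gamma(1 + 1/m)."""
    return math.gamma(1.0 + 1.0 / m)

def l_numeric(m, steps=100000):
    """Midpoint-rule check of the defining integral."""
    h = 1.0 / steps
    return h * sum((-math.log((i + 0.5) * h)) ** (1.0 / m) for i in range(steps))

l2 = l_exponential(2)   # Gamma(3/2) = sqrt(pi)/2
```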
Under the structure of Model I, using the distance (3.6), let ∂D²/∂a = 0 and ∂D²/∂r = 0; then

0 = ∂D²/∂a = 6X'Xa − 6X'C,
0 = ∂D²/∂r = 4l²X'Xr − 4l²X'S,

and the solutions of a and r which minimize D² are as follows:

a = (X'X)^(-1)X'C,  r = (X'X)^(-1)X'S.  (3.7)
Therefore, under the structure of Model I, whether we use the distance formula (2.1) or (3.6), we obtain the same least-squares estimates, and they are not related to the membership functions.
Next, let us consider Model II (D'Urso and Gastaldi (2000), the doubly linear adaptive fuzzy regression model); the sum of squared errors D² can be expressed in vector form as

D² = ||C − Xa||² + ||(C − lS) − [Xa − l(bXa + 1d)]||² + ||(C + lS) − [Xa + l(bXa + 1d)]||²
   = 3a'X'Xa − 6a'X'C + 3C'C + 2l²b²·a'X'Xa − 4l²b·a'X'S + 4l²bd·a'X'1
   + 2l²S'S − 4l²d·1'S + 2l²nd².
Setting ∂D²/∂a = 0, ∂D²/∂b = 0 and ∂D²/∂d = 0, after lengthy, tedious and complicated calculations (see Appendix A.1) we obtained the following least-squares estimates of a, b and d:

b = (−K_2 ± sqrt(K_2² − 4K_1K_3)) / (2K_1),
d = S̄ − bC̄,
a = (1/(3 + 2l²b²))·(X'X)^(-1)(3X'C + 2l²b·X'S − 2l²bd·X'1),  (3.8)
where K_1 = 2l²(C'S̃ − nC̄S̄), K_2 = 3(C'C̃ − nC̄²) − 2l²(S'S̃ − nS̄²), K_3 = 3(nC̄S̄ − S'C̃), and C̃ = X(X'X)^(-1)X'C, S̃ = X(X'X)^(-1)X'S, C̄ = (1/n)1'C, S̄ = (1/n)1'S.
The least-squares estimates are obtained by substituting these two sets of a, b and d into D² and keeping the set for which D² is smaller. Based on the equations for a, b and d, we can conclude that these least-squares estimates do depend on the membership function of the response fuzzy number y_i = (c_i, s_i)_LR.
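The Yang-Ko version of Model II differs from the Euclidean one only through the weights 3 and 2l² in D², which show up in K_1, K_2, K_3 and in the formula for a. A hedged sketch (names are ours):

```python
import numpy as np

def model_ii_fit_yk(X, C, S, l):
    """Model II under Yang and Ko's distance: solve K1*b^2 + K2*b + K3 = 0,
    set d = Sbar - b*Cbar, a = (X'X)^(-1)(3X'C + 2l^2 b(X'S - d X'1))/(3 + 2l^2 b^2),
    keeping the root whose D^2 = 3||C-Xa||^2 + 2l^2||S-b*Xa-1d||^2 is smaller."""
    n = len(C)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    Ct, St = H @ C, H @ S
    Cbar, Sbar = C.mean(), S.mean()
    Xt1 = X.T @ np.ones(n)
    K1 = 2 * l**2 * (C @ St - n * Cbar * Sbar)
    K2 = 3 * (C @ Ct - n * Cbar**2) - 2 * l**2 * (S @ St - n * Sbar**2)
    K3 = 3 * (n * Cbar * Sbar - S @ Ct)
    best = None
    for b in np.real(np.roots([K1, K2, K3])):
        d = Sbar - b * Cbar
        a = np.linalg.solve(X.T @ X, 3 * (X.T @ C) + 2 * l**2 * b * (X.T @ S - d * Xt1))
        a = a / (3 + 2 * l**2 * b**2)
        D2 = 3 * np.sum((C - X @ a)**2) + 2 * l**2 * np.sum((S - b * (X @ a) - d)**2)
        if best is None or D2 < best[0]:
            best = (D2, a, b, d)
    return best[1], best[2], best[3]

# Same exact-fit data as before; the fit recovers the same linear relation.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
C = 2.0 * x
S = 0.5 * C + 1.0
a, b, d = model_ii_fit_yk(X, C, S, l=0.5)
```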
Under the structure of Model III (D'Urso (2003)), and considering nonsymmetrical LR-type response fuzzy numbers, the sum of squared errors D² can be expressed in vector form as

D² = ||C − Xa||² + ||(C − lP) − [Xa − l(bXa + 1d)]||² + ||(C + rq) − [Xa + r(gXa + 1h)]||²
   = 3C'C + (3 − 2lb + l²b² + 2rg + r²g²)a'X'Xa − (6 − 2lb + 2rg)a'X'C
   + (2l − 2l²b)a'X'P − (2r + 2r²g)a'X'q + (2l²bd − 2ld + 2r²gh + 2rh)a'X'1
   − 2l·C'P + 2r·C'q + (2ld − 2rh)1'C − 2l²d·1'P − 2r²h·1'q + l²P'P + r²q'q + nl²d² + nr²h².
Setting ∂D²/∂a = 0, ∂D²/∂b = 0, ∂D²/∂d = 0, ∂D²/∂g = 0 and ∂D²/∂h = 0, after lengthy, tedious and complicated calculations we obtained the following equations:

(6 − 4lb + 2l²b² + 4rg + 2r²g²)X'Xa − (6 − 2lb + 2rg)X'C + (2l − 2l²b)X'P − (2r + 2r²g)X'q + (2l²bd − 2ld + 2r²gh + 2rh)X'1 = 0
2l·a'X'C − 2l²·a'X'P − 2l·a'X'Xa + 2l²b·a'X'Xa + 2l²d·a'X'1 = 0
2l·1'C − 2l²·1'P − 2l·1'Xa + 2l²b·1'Xa + 2nl²d = 0
−2r·a'X'C − 2r²·a'X'q + 2r·a'X'Xa + 2r²g·a'X'Xa + 2r²h·a'X'1 = 0
−2r·1'C − 2r²·1'q + 2r·1'Xa + 2r²g·1'Xa + 2nr²h = 0

Since these equations are too complicated to solve in general for a, b, d, g, h, we simply list the following recursive equations and use mathematical software to find possible solutions.
a = (1/(3 − 2lb + l²b² + 2rg + r²g²))·(X'X)^(-1)[(3 − lb + rg)X'C − (l − l²b)X'P + (r + r²g)X'q + (ld − l²bd − rh − r²gh)X'1]
b = (1/(l·a'X'Xa))·(a'X'Xa − a'X'C + l·a'X'P − ld·a'X'1)
g = (1/(r·a'X'Xa))·(a'X'C − a'X'Xa + r·a'X'q − rh·a'X'1)
d = (1/(nl))·(1'Xa − 1'C + l·1'P − lb·1'Xa)
h = (1/(nr))·(1'C − 1'Xa + r·1'q − rg·1'Xa)
From the above equations, it is obvious that the least-squares estimates are related to the membership function of the response fuzzy number y_i = (c_i, s_i)_LR.
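The recursive equations above suggest a simple fixed-point scheme: start from initial guesses and cycle through the update formulas until the parameters stabilize. A sketch under our own naming; convergence is not guaranteed in general:

```python
import numpy as np

def model_iii_fit(X, C, P, q, l, r, iters=500):
    """Cycle through the recursive equations for (a, b, d, g, h) of Model III
    under Yang and Ko's distance; l, r are the membership-shape constants."""
    n = len(C)
    XtX = X.T @ X
    one = np.ones(n)
    Xt1 = X.T @ one
    b = d = g = h = 0.0
    for _ in range(iters):
        denom = 3 - 2*l*b + (l*b)**2 + 2*r*g + (r*g)**2
        rhs = ((3 - l*b + r*g) * (X.T @ C) - (l - l**2*b) * (X.T @ P)
               + (r + r**2*g) * (X.T @ q)
               + (l*d - l**2*b*d - r*h - r**2*g*h) * Xt1)
        a = np.linalg.solve(XtX, rhs) / denom
        Xa = X @ a
        A1, A1v = Xa @ Xa, Xa @ one        # a'X'Xa and a'X'1
        b = (A1 - Xa @ C + l * (Xa @ P) - l*d*A1v) / (l * A1)
        g = (Xa @ C - A1 + r * (Xa @ q) - r*h*A1v) / (r * A1)
        d = (A1v - one @ C + l * (one @ P) - l*b*A1v) / (n * l)
        h = (one @ C - A1v + r * (one @ q) - r*g*A1v) / (n * r)
    return a, b, d, g, h

# Exact-fit check: spreads are exact linear functions of the centers.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
C = 2.0 * x
P = 0.3 * C + 0.5
q = 0.2 * C + 1.0
a, b, d, g, h = model_iii_fit(X, C, P, q, l=0.5, r=0.5)
```

Each update solves one of the five stationarity equations exactly given the other parameters, so D² decreases monotonically along the iteration.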
4. Diagnostic of Outliers and Influences
4.1 Diagnostic of Outliers and Influences in Linear Regression Model
Although a residual analysis is useful in assessing model fit, departures from the
regression model are often hidden by the fitting process. For example, there may be
outliers in either the response or explanatory variables that can have a considerable
effect on the analysis. Observations that significantly affect inferences drawn from the
data are said to be influential. Methods for assessing influence are typically based on the
change in the vector of parameter estimates when observations are deleted.
The leverage h_jj = x_j'(X'X)^(-1)x_j is associated with the j-th data point and measures, in the space of the explanatory variables, how far the j-th observation lies from the other n − 1 observations. For a data point with high leverage, h_jj approaches 1 (0 ≤ h_jj ≤ 1), indicating a possible outlier. The residuals e_i = y_i − ŷ_i are used to detect possible outliers in the response variable y, where ŷ_i is the i-th predicted y value. A large value of e_i indicates that the i-th data point could be an outlier. One may also use e_(i) = y_i − ŷ_(i) = e_i/(1 − h_ii) to detect possible outliers, where ŷ_(i) is the predicted y value when the i-th observation is dropped from the analysis. A large value of e_(i) also indicates that the i-th data point could be an outlier.
In traditional linear regression analysis, one may use the Cook distance

CD_i = ||Ŷ − Ŷ_(i)||²/(ks²) = e_i²·h_ii / (ks²(1 − h_ii)²),

where Ŷ_(i) is the predicted Y vector when the i-th observation is dropped from the analysis, k is the number of parameters, and s² = Σ_{i=1}^n e_i²/(n − k) is the mean square error. A large value of CD_i indicates that the i-th data point could be an influential observation. One advantage of using the Cook distance is that the value of CD_i is unaffected by the measurement units used for the explanatory and response variables.
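The classical diagnostics above are straightforward to compute; a short sketch (names are ours):

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverages h_ii, residuals e_i, deleted residuals e_(i) = e_i/(1-h_ii),
    and Cook distances CD_i = e_i^2 * h_ii / (k * s^2 * (1-h_ii)^2)."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
    h = np.diag(H)                              # leverages
    e = y - H @ y                               # residuals
    e_del = e / (1 - h)                         # leave-one-out residuals
    s2 = e @ e / (n - k)                        # mean square error
    cd = e**2 * h / (k * s2 * (1 - h)**2)       # Cook distances
    return h, e, e_del, cd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.8])
h, e, e_del, cd = regression_diagnostics(X, y)
```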
4.2 Diagnostic of Outliers and Influences in Fuzzy Linear Regression Model
In this section, we consider Model I (see (2.3)) and derive the corresponding formulas for e_i, e_(i), and CD_i to detect possible outliers and influential data points. For Model II (see (2.5) and (2.6)) and Model III (see (3.1), (3.2), and (3.3)), we were not able to derive any such formulas.
Based on the Euclidean distance, we obtained (see the derivations in Appendix A.2)

e_i² = (c_i − x_i'a)² + (s_i − x_i'r)² = (e_i^c)² + (e_i^s)²,  (4.1)
e_(i)² = (c_i − x_i'a_(i))² + (s_i − x_i'r_(i))² = (e_i/(1 − h_ii))²,  (4.2)

where e_i^c = c_i − x_i'a is the residual of the center of a fuzzy number and e_i^s = s_i − x_i'r is the residual of the spread of a fuzzy number; a and r are defined in (2.4).
Similarly, based on Yang and Ko's distance, we obtained (see the derivations in Appendix A.2)

e_i² = d_LR²(y_i, ŷ_i) = 3(e_i^c)² + 2l²(e_i^s)²,  (4.3)
e_(i)² = d_LR²(y_i, ŷ_(i)) = 3(e_i^c/(1 − h_ii))² + 2l²(e_i^s/(1 − h_ii))² = (e_i/(1 − h_ii))².  (4.4)
From (4.2) and (4.4), the relation between e_i and e_(i) is the same as in the general linear regression model. That is, a large value of e_(i) indicates that the i-th data point could be an outlier.
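Formulas (4.1)-(4.4) combine the center and spread residuals of Model I with the ordinary leverages. A hedged sketch (names are ours; l is the Yang-Ko constant):

```python
import numpy as np

def fuzzy_diagnostics(X, C, S, l=None):
    """Model I outlier diagnostics. With l=None use the Euclidean distance
    (4.1)-(4.2); otherwise use Yang and Ko's distance (4.3)-(4.4).
    Returns (e2, e2_del, cd) = (e_i^2, e_(i)^2, CD_i)."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    ec = C - H @ C                           # center residuals e_i^c
    es = S - H @ S                           # spread residuals e_i^s
    if l is None:
        e2 = ec**2 + es**2                   # (4.1)
    else:
        e2 = 3 * ec**2 + 2 * l**2 * es**2    # (4.3)
    e2_del = e2 / (1 - h)**2                 # (4.2) / (4.4)
    s2 = e2.sum() / (n - k)
    cd = e2 * h / (k * s2 * (1 - h)**2)      # (4.6) / (4.7)
    return e2, e2_del, cd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
C = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
S = np.array([1.0, 1.4, 2.1, 2.4, 3.1])
e2, e2_del, cd = fuzzy_diagnostics(X, C, S)              # Euclidean
e2_yk, e2_yk_del, cd_yk = fuzzy_diagnostics(X, C, S, l=0.5)  # Yang-Ko
```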
In order to derive a formula similar to the Cook distance in the fuzzy environment, we need to define a new type of distance between fuzzy vectors. Let F_LR(ℝ) denote the set of all LR-type fuzzy numbers, and let F̃_LR(ℝ^k) = {X̃ = (X_1, X_2, ..., X_k)' : X_i ∈ F_LR(ℝ)} be the set of all fuzzy k-dimensional vectors. Based on the distance defined on F_LR(ℝ), we can define a new distance on F̃_LR(ℝ^k).
Lemma 4.1 Let d : F_LR(ℝ) × F_LR(ℝ) → ℝ be a metric. For any two fuzzy vectors X̃ = (X_1, X_2, ..., X_k)', Ỹ = (Y_1, Y_2, ..., Y_k)' ∈ F̃_LR(ℝ^k), define

d̃_LR(X̃, Ỹ) = sqrt( Σ_{i=1}^k d²(X_i, Y_i) ).  (4.5)

Then d̃_LR is a metric on F̃_LR(ℝ^k). If d is a complete metric, then so is d̃_LR (see the proof in Appendix A.3).
When d is the simple (Euclidean) metric, define the Cook distance CD_i as follows:

CD_i = d̃_LR²(Ŷ, Ŷ_(i))/(ks²) = ( ||X(a − a_(i))||² + ||X(r − r_(i))||² )/(ks²);

then we obtained (see the derivation in Appendix A.4)

CD_i = (1/(ks²))·e_i²·h_ii/(1 − h_ii)²,  (4.6)

where s² = Σ_{i=1}^n e_i²/(n − k) and e_i² = (e_i^c)² + (e_i^s)².
When d is Yang and Ko's metric, define the Cook distance CD_i as follows:

CD_i = d̃_LR²(Ŷ, Ŷ_(i))/(ks²)
     = (1/(ks²))·{ ||X(a − a_(i))||² + ||X[(a − lr) − (a_(i) − lr_(i))]||² + ||X[(a + lr) − (a_(i) + lr_(i))]||² };

then we obtained (see Appendix A.4)

CD_i = (1/(ks²))·e_i²·h_ii/(1 − h_ii)²,  (4.7)

where s² = Σ_{i=1}^n e_i²/(n − k) and e_i² = 3(e_i^c)² + 2l²(e_i^s)².
Although formulas (4.6) and (4.7) look the same, the values of e_i² and s² are different. In general, s² in (4.7) is larger than s² in (4.6); therefore the Cook distance calculated in (4.6) is larger than the Cook distance calculated in (4.7). From (4.6) and (4.7) we know that CD_i is affected by the leverage value h_ii and the residual e_i, the same as in traditional regression analysis.
Since we were not able to derive formulas similar to (4.1)-(4.4) for Models II and III, the best we can do is to delete one data point at a time and recalculate the values of e_(i), CD_i, etc.
5. Data Analysis
In this section, we use Tanaka's data (1987, see Table 1) to illustrate the theoretical results obtained in the previous sections. The data set contains three independent variables, one fuzzy response variable and ten data points. We consider only exponential fuzzy response values. The advantage of using the exponential membership function is that we only need to choose an appropriate value of m (note: m is the shape parameter of the exponential membership function) to reflect the distribution of the response variable. If the values of the response variable tend to fall outside the interval of the existing data, then we choose a smaller m value; otherwise, we choose a larger m value to describe the membership function. Since we were not able to derive the least-squares estimates for Model III and we only consider the exponential membership function, we use Models I and II in the data analysis.
Tables 2-11 show the results of using the Euclidean distance, Yang and Ko's distance, and different m values. Each table contains the least-squares estimates, the sum of squared residuals, the leverage values h_ii, the values of e_i² and e_(i)², and the Cook distance CD_i. Since, under the Euclidean distance formula, the m value does not affect the results of Models I and II, we give only the results for m = 2 (see Tables 2 and 3).
Table 1: Tanaka's Data (1987)

Case #   x_i1   x_i2   x_i3   Fuzzy Response Y_i = (c_i, r_i)
1          3      5      9    (96,42)
2         14      8      3    (120,47)
3          7      1      4    (52,33)
4         11      7      3    (106,45)
5          7     12     15    (189,79)
6          8     15     10    (194,65)
7          3      9      6    (107,42)
8         12     15     11    (216,78)
9         10      5      8    (108,52)
10         9      7      4    (103,44)
Table 2: Model I, m=2, Least-Squares Estimates Under Euclidean Distance
Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 14.69 41.25 0.25
2 (120,47) (122.48, 49.13) 0.43 10.67 32.44 0.21
3 (52,33) (49.36, 32.11) 0.41 7.75 21.90 0.13
4 (106,45) (104.82, 43.01) 0.26 5.35 9.75 0.04
5 (189,79) (191.79, 76.71) 0.55* 13.06 63.57* 0.52*
6 (194,65) (193.64, 67.67) 0.39 7.25 19.38 0.11
7 (107,42) (109.77, 40.85) 0.60* 9.08 55.55* 0.50*
8 (216,78) (211.65, 77.08) 0.42 19.73 58.34* 0.37
9 (108,52) (110.89, 53.24) 0.37 9.91 25.12 0.14
10 (103,44) (103.36, 42.58) 0.18 2.14 3.22 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 99.63
Table 3: Model II, m=2, Least-Squares Estimates Under Euclidean Distance
Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.86, 42.38) 0.40 4.71 11.08 0.36
2 (120,47) (122.04, 50.63) 0.43 17.34 31.44 0.38
3 (52,33) (50.11, 29.56) 0.41 15.41 92.87* 0.66*
4 (106,45) (104.13, 45.38) 0.26 3.65 6.55 0.35
5 (189,79) (193.31, 71.51) 0.55* 74.67* 90.38* 0.38
6 (194,65) (192.58, 71.30) 0.39 41.67 118.34* 0.63*
7 (107,42) (108.12, 46.55) 0.60* 21.97 38.00 0.39
8 (216,78) (211.71, 76.90) 0.42 19.64 78.55* 0.61*
9 (108,52) (112.48, 47.86) 0.37 37.44 82.31* 0.51
10 (103,44) (102.67, 44.96) 0.18 1.02 5.34 0.34
a = (3.14, 3.43, 7.62, 5.40)',  b = 0.29,  d = 14.88,  Σ e_i² = 236.98
Table 4: Model I, m=1.2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 35.61 100.02 0.24
2 (120,47) (122.48, 49.13) 0.43 26.43 80.41 0.20
3 (52,33) ( 49.36, 32.11) 0.41 22.26 62.94 0.15
4 (106,45) (104.82, 43.01) 0.26 11.17 31.59 0.03
5 (189,79) (191.79, 76.71) 0.55* 32.71 159.18* 0.51*
6 (194,65) (193.64, 67.67) 0.39 12.99 34.72 0.08
7 (107,42) (109.77, 40.85) 0.60* 25.61 156.77* 0.55*
8 (216,78) (211.65, 77.08) 0.42 58.16 171.95* 0.42
9 (108,52) (110.89, 53.24) 0.37 27.84 70.53 0.16
10 (103,44) (103.36, 42.58) 0.18 3.95 5.94 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 256.73
Table 5: Model II, m=1.2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.91, 37.94) 0.40 32.79 60.26 0.02
2 (120,47) (122.32, 49.77) 0.43 29.66 80.87 0.04
3 (52,33) (52.54, 19.65) 0.41 316.42* 360.47* 0.08
4 (106,45) (105.01, 42.30) 0.26 15.89 24.84 0.01
5 (189,79) (191.09, 79.46) 0.55* 13.48 63.15 0.04
6 (194,65) (190.68, 79.29) 0.39 394.20* 536.55* 0.07
7 (107,42) (108.98, 44.01) 0.60* 18.96 104.09 0.07
8 (216,78) (209.07, 87.22) 0.42 294.63* 498.27* 0.13*
9 (108,52) (112.82, 45.67) 0.37 140.58 207.55 0.05
10 (103,44) (103.52, 41.69) 0.18 10.51 13.72 0.002
a = (1.28, 3.30, 7.41, 5.19)',  b = 0.43,  d = -3.03,  Σ e_i² = 1267.13
Table 6: Model I, m=2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 34.24 96.18 0.23
2 (120,47) (122.48, 49.13) 0.43 25.53 77.68 0.20
3 (52,33) (49.36, 32.11) 0.41 22.10 62.49 0.15
4 (106,45) (104.82, 43.01) 0.26 10.38 18.91 0.03
5 (189,79) (191.79, 76.71) 0.55* 31.66 154.08* 0.51*
6 (194,65) (193.64, 67.67) 0.39 11.57 30.93 0.07
7 (107,42) (109.77, 40.85) 0.60* 25.35 155.17* 0.55*
8 (216,78) (211.65, 77.08) 0.42 57.99 171.46* 0.43
9 (108,52) (110.89, 53.24) 0.37 27.53 69.75 0.16
10 (103,44) (103.36, 42.58) 0.18 3.55 5.34 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 249.91
Table 7: Model II, m=2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.76, 37.78) 0.40 32.73 61.86 0.02
2 (120,47) (122.33, 49.76) 0.43 28.30 77.95 0.04
3 (52,33) (52.28, 19.29) 0.41 295.52* 336.34* 0.09
4 (106,45) (105.00, 42.22) 0.26 15.12 23.74 0.01
5 (189,79) (191.12, 79.67) 0.55* 14.15 67.62 0.06
6 (194,65) (190.93, 79.59) 0.39 362.74* 493.27* 0.07
7 (107,42) (109.07, 43.99) 0.60* 19.09 107.17 0.08
8 (216,78) (209.27, 87.57) 0.42 279.81* 476.79* 0.13*
9 (108,52) (112.65, 45.54) 0.37 130.24 194.90 0.05
10 (103,44) (103.59, 41.60) 0.18 10.06 13.27 0.002
a = (1.11, 3.29, 7.45, 5.17)',  b = 0.43,  d = -3.45,  Σ e_i² = 1187.75
Table 8: Model I, m=3, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 34.41 96.65 0.23
2 (120,47) (122.48, 49.13) 0.43 25.64 78.01 0.20
3 (52,33) (49.36, 32.11) 0.41 22.12 62.54 0.15
4 (106,45) (104.82, 43.01) 0.26 10.48 19.08 0.03
5 (189,79) (191.79, 76.71) 0.55* 31.79 154.70* 0.51*
6 (194,65) (193.64, 67.67) 0.39 11.74 31.39 0.07
7 (107,42) (109.77, 40.85) 0.60* 25.38 155.36* 0.55*
8 (216,78) (211.65, 77.08) 0.42 58.01 171.52* 0.43
9 (108,52) (110.89, 53.24) 0.37 27.57 69.85 0.16
10 (103,44) (103.36, 42.58) 0.18 3.60 5.41 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 250.74
Table 9: Model II, m=3, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.78, 37.79) 0.40 32.73 61.65 0.02
2 (120,47) (122.33, 49.76) 0.43 28.46 78.30 0.04
3 (52,33) (52.31, 19.34) 0.41 298.04* 339.26* 0.08
4 (106,45) (105.00, 42.23) 0.26 15.21 23.87 0.01
5 (189,79) (191.11, 79.64) 0.55* 14.06 67.06 0.05
6 (194,65) (190.90, 79.55) 0.39 366.55* 498.51* 0.07
7 (107,42) (109.06, 43.99) 0.60* 19.08 102.80 0.08
8 (216,78) (209.24, 87.52) 0.42 281.59* 479.11* 0.13*
9 (108,52) (112.67, 45.56) 0.37 131.49 196.43 0.05
10 (103,44) (103.59, 41.61) 0.18 10.11 13.32 0.002
a = (1.13, 3.29, 7.44, 5.17)',  b = 0.43,  d = -3.39,  Σ e_i² = 1197.32
Table 10: Model I, m=10, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 35.88 100.80 0.24
2 (120,47) (122.48, 49.13) 0.43 26.62 80.97 0.20
3 (52,33) (49.36, 32.11) 0.41 22.29 63.03 0.15
4 (106,45) (104.82, 43.01) 0.26 11.33 20.64 0.03
5 (189,79) (191.79, 76.71) 0.55* 32.92 160.21* 0.51*
6 (194,65) (193.64, 67.67) 0.39 13.28 35.49 0.08
7 (107,42) (109.77, 40.85) 0.60* 25.67 157.10* 0.54*
8 (216,78) (211.65, 77.08) 0.42 58.19 172.06* 0.42
9 (108,52) (110.89, 53.24) 0.37 27.90 70.69 0.15
10 (103,44) (103.36, 42.58) 0.18 4.02 6.06 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 258.10
Table 11: Model II, m=10, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.93, 37.97) 0.40 32.80 59.96 0.01
2 (120,47) (122.31, 49.77) 0.43 29.93 81.46 0.04
3 (52,33) (52.59, 19.71) 0.41 320.68* 365.36* 0.08
4 (106,45) (105.01, 42.31) 0.26 16.05 25.07 0.01
5 (189,79) (191.09, 79.43) 0.55* 13.36 62.29 0.04
6 (194,65) (190.63, 79.23) 0.39 400.57* 545.32* 0.07
7 (107,42) (108.96, 44.02) 0.60* 18.93 103.48 0.07
8 (216,78) (209.03, 87.16) 0.42 297.67* 502.71* 0.13*
9 (108,52) (112.86, 45.70) 0.37 142.67 210.11 0.05
10 (103,44) (103.59, 41.70) 0.18 10.61 13.82 0.002
a = (1.31, 3.30, 7.40, 5.19)',  b = 0.43,  d = -2.96,  Σ e_i² = 1283.29
5.1 Discussion
From Tables 2 and 3, the estimates of center and spread under Model I are better than those under Model II. In theory, if we use Yang and Ko's distance, the estimates of center and spread under Model II should be affected by the value of m; but, based on Tables 5, 7, 9 and 11, we found that different m values do not affect the estimates very much.
In theory, the distance formula and the m values do not affect the estimates of the Model I parameters, but they do affect the parameter estimates in Model II. Based on Tables 3 and 5, the choice of distance formula has a larger effect on the parameter estimates in Model II.
Cases #5 and #7 have larger leverage values h_ii, so they are possible outliers in the predictors. In Model I, based on the values of e_i, there seem to be no outliers in the response variable. However, based on the values of e_(i) in Tables 2, 4, 6 and 8, cases #5, 7 and 8 are possible outliers in the response variable. In Model II under the Euclidean distance, Table 3 shows that cases #3, 5, 6, 8 and 9 are the five possible outliers in the response variable; but under Yang and Ko's distance, Tables 5, 7, 9 and 11 show that only cases #3, 6 and 8 are possible outliers in the response variable.
Under Model I, based on Tables 2, 4, 6, 8 and 10, cases #5 and 7 have larger CD_i values and are influential observations. Under Model II and the Euclidean distance, Table 3 shows that cases #3, 6 and 8 have larger CD_i values; but under Model II with Yang and Ko's distance, only case #8 has a large CD_i value and is an influential point (see Tables 5, 7, 9 and 11).
If we use the exponential membership function for our fuzzy numbers and Yang and Ko's distance, how do we best choose the m value for fuzzy linear regression under Model II? The simplest rule is to choose the m value for which the residual sum of squares, Σ e_i², is smallest. Based on Tables 5, 7, 9 and 11, the best choice is m = 2.
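The selection rule above can be automated: evaluate the residual sum of squares for each candidate m (which enters only through l = Γ(1 + 1/m)) and keep the smallest. A sketch with a hypothetical objective function standing in for a Model II fit:

```python
import math

def choose_m(candidates, sse_for_l):
    """Pick the shape parameter m minimizing the residual sum of squares.
    `sse_for_l` maps the Yang-Ko constant l = Gamma(1 + 1/m) to the model's
    residual sum of squares (e.g., a wrapper around a Model II fit)."""
    best_m, best_sse = None, float("inf")
    for m in candidates:
        l = math.gamma(1.0 + 1.0 / m)
        sse = sse_for_l(l)
        if sse < best_sse:
            best_m, best_sse = m, sse
    return best_m, best_sse

# Toy stand-in objective whose minimum sits at l = Gamma(1 + 1/2):
target = math.gamma(1.5)
best_m, best_sse = choose_m([1.2, 2, 3, 10], lambda l: (l - target) ** 2)
```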
APPENDIX
A.1: The derivation of a, b, and d in (3.8)
D² = ||C − Xa||² + ||(C − lS) − [Xa − l(bXa + 1d)]||² + ||(C + lS) − [Xa + l(bXa + 1d)]||²
   = 3a'X'Xa − 6a'X'C + 3C'C + 2l²b²·a'X'Xa − 4l²b·a'X'S + 4l²bd·a'X'1
   + 2l²S'S − 4l²d·1'S + 2l²nd²
Setting ∂D²/∂a = 0, ∂D²/∂b = 0 and ∂D²/∂d = 0, we obtained (up to constant factors)

0 = ∂D²/∂a = 3X'Xa − 3X'C + 2l²b²·X'Xa − 2l²b·X'S + 2l²bd·X'1  (A.1.1)
0 = ∂D²/∂b = a'X'S − b·a'X'Xa − d·a'X'1  (A.1.2)
0 = ∂D²/∂d = 1'S − b·1'Xa − nd  (A.1.3)
From (A.1.1), we obtained a = (1/(3 + 2l²b²))·(X'X)^(-1)(3X'C + 2l²b·X'S − 2l²bd·X'1); substituting a into (A.1.2) and (A.1.3), we obtained

0 = 9C'S̃ − 9b·C'C̃ + 6l²b·S'S̃ − 6l²b²·C'S̃ + 6l²nb²dC̄ − 12l²nbdS̄ + 6l²nbd² − 9ndC̄  (A.1.4)
0 = S̄ − bC̄ − d  (A.1.5)

where C̃ = X(X'X)^(-1)X'C, S̃ = X(X'X)^(-1)X'S, C̄ = (1/n)1'C, S̄ = (1/n)1'S.
From (A.1.5), we obtained d = S̄ − bC̄, and substituting it into (A.1.4) we obtained a quadratic equation in b, K_1b² + K_2b + K_3 = 0. The solution is

b = (−K_2 ± sqrt(K_2² − 4K_1K_3)) / (2K_1).
A.2: The derivation of (4.2), (4.3), and (4.4)
I. Based on the Euclidean distance formula, we have

e_(i)² = (c_i − x_i'a_(i))² + (s_i − x_i'r_(i))².

Since a − a_(i) = (e_i^c/(1 − h_ii))·(X'X)^(-1)x_i and r − r_(i) = (e_i^s/(1 − h_ii))·(X'X)^(-1)x_i, we have

e_(i)² = (c_i − x_i'a + x_i'(X'X)^(-1)x_i·e_i^c/(1 − h_ii))² + (s_i − x_i'r + x_i'(X'X)^(-1)x_i·e_i^s/(1 − h_ii))²
       = (e_i^c/(1 − h_ii))² + (e_i^s/(1 − h_ii))²
       = (e_i/(1 − h_ii))².
II. Based on Yang and Ko's distance formula, we have

e_i² = (c_i − x_i'a)² + [(c_i − ls_i) − (x_i'a − l·x_i'r)]² + [(c_i + ls_i) − (x_i'a + l·x_i'r)]²
     = 3(c_i − x_i'a)² + 2[l(s_i − x_i'r)]²
     = 3(e_i^c)² + 2l²(e_i^s)²,

e_(i)² = (c_i − x_i'a_(i))² + [(c_i − ls_i) − (x_i'a_(i) − l·x_i'r_(i))]² + [(c_i + ls_i) − (x_i'a_(i) + l·x_i'r_(i))]²
       = 3(c_i − x_i'a_(i))² + 2[l(s_i − x_i'r_(i))]²
       = 3(e_i^c/(1 − h_ii))² + 2l²(e_i^s/(1 − h_ii))²
       = (e_i/(1 − h_ii))².
A.3: Proof of Lemma 4.1
In order to prove that d̃_LR is a metric, we need to prove the following three properties:
1. For all X̃, Ỹ ∈ F̃_LR(ℝ^k), d̃_LR(X̃, Ỹ) ≥ 0, and d̃_LR(X̃, Ỹ) = 0 implies X̃ = Ỹ.
2. For all X̃, Ỹ ∈ F̃_LR(ℝ^k), d̃_LR(X̃, Ỹ) = d̃_LR(Ỹ, X̃).
3. For all X̃, Ỹ, Z̃ ∈ F̃_LR(ℝ^k), d̃_LR(X̃, Ỹ) ≤ d̃_LR(X̃, Z̃) + d̃_LR(Z̃, Ỹ).

Since d is a metric, it is easy to show that properties 1 and 2 are satisfied. We show that property 3 is satisfied:

d̃_LR²(X̃, Ỹ) = Σ_{i=1}^k d²(X_i, Y_i)
 ≤ Σ_{i=1}^k [d(X_i, Z_i) + d(Z_i, Y_i)]²
 = Σ_{i=1}^k d²(X_i, Z_i) + Σ_{i=1}^k d²(Z_i, Y_i) + 2·Σ_{i=1}^k d(X_i, Z_i)·d(Z_i, Y_i)
 ≤ d̃_LR²(X̃, Z̃) + d̃_LR²(Z̃, Ỹ) + 2·sqrt(Σ_{i=1}^k d²(X_i, Z_i))·sqrt(Σ_{i=1}^k d²(Z_i, Y_i))
 = [d̃_LR(X̃, Z̃) + d̃_LR(Z̃, Ỹ)]²,

where the last inequality is the Cauchy-Schwarz inequality. Therefore, d̃_LR(X̃, Ỹ) ≤ d̃_LR(X̃, Z̃) + d̃_LR(Z̃, Ỹ).
Assume that (F_LR(ℝ), d_LR) is a complete metric space. Let {X̃^m}_{m≥1} be a Cauchy sequence in F̃_LR(ℝ^k); i.e., for every ε > 0 there is an l such that, for all m, m' > l, d̃_LR(X̃^m, X̃^{m'}) < ε. Then, for m, m' > l,

d(X_j^m, X_j^{m'}) ≤ sqrt(Σ_{i=1}^k d²(X_i^m, X_i^{m'})) = d̃_LR(X̃^m, X̃^{m'}) < ε.

Hence, for each 1 ≤ j ≤ k, {X_j^m}_{m≥1} is a Cauchy sequence in F_LR(ℝ). Therefore, there exists X_j ∈ F_LR(ℝ) with X_j^m → X_j. Let X̃ = (X_1, X_2, ..., X_k)'. Since X_j^m → X_j, for every ε > 0 there is an n_j such that, for m > n_j, d(X_j^m, X_j) < ε/sqrt(k), j = 1, 2, ..., k. Let n = max{n_1, n_2, ..., n_k}. Then, for m > n,

d̃_LR(X̃^m, X̃) = sqrt(Σ_{i=1}^k d²(X_i^m, X_i)) < ε.

That is, X̃^m → X̃.
A.4: The derivation of equations (4.6) and (4.7)
Under the Euclidean distance:

CD_i = (1/(ks²))·d̃_LR²(Ŷ, Ŷ_(i)) = (1/(ks²))·Σ_{j=1}^n d²(ŷ_j, ŷ_(i)j)
     = (1/(ks²))·Σ_{j=1}^n [ (x_j'a − x_j'a_(i))² + (x_j'r − x_j'r_(i))² ]
     = (1/(ks²))·[ (e_i^c)²·h_ii/(1 − h_ii)² + (e_i^s)²·h_ii/(1 − h_ii)² ]
     = (1/(ks²))·e_i²·h_ii/(1 − h_ii)²,

using ||X(a − a_(i))||² = (e_i^c/(1 − h_ii))²·x_i'(X'X)^(-1)x_i = (e_i^c)²·h_ii/(1 − h_ii)², and similarly for r − r_(i).
Under Yang and Ko's distance:

CD_i = (1/(ks²))·d̃_LR²(Ŷ, Ŷ_(i))
     = (1/(ks²))·Σ_{j=1}^n { (x_j'a − x_j'a_(i))² + [(x_j'a − l·x_j'r) − (x_j'a_(i) − l·x_j'r_(i))]²
       + [(x_j'a + l·x_j'r) − (x_j'a_(i) + l·x_j'r_(i))]² }
     = (1/(ks²))·[ 3(e_i^c)²·h_ii/(1 − h_ii)² + 2l²(e_i^s)²·h_ii/(1 − h_ii)² ]
     = (1/(ks²))·e_i²·h_ii/(1 − h_ii)².
REFERENCES
1. Dubois, D. and Prade, H. (1980), Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York.
2. Zimmermann, H. J. (1996), Fuzzy Set Theory and Its Applications, Kluwer Academic Press, Dordrecht.
3. Draper, N. R. and Smith, H. (1980), Applied Regression Analysis, Wiley, New York.
4. D'Urso, P. and Gastaldi, T. (2000), A Least-squares Approach to Fuzzy Linear Regression Analysis, Computational Statistics and Data Analysis 34, 427-440.
5. D'Urso, P. (2003), Linear Regression Analysis for Fuzzy/Crisp Input and Fuzzy/Crisp Output Data, Computational Statistics and Data Analysis 42, 47-72.
6. Tanaka, H. (1987), Fuzzy Data Analysis by Possibilistic Linear Models, Fuzzy Sets and Systems 24, 363-375.
7. Tanaka, H., Uejima, S. and Asai, K. (1982), Fuzzy Linear Regression Model, IEEE Trans. Systems Man Cybernet 12, 903-907.
8. Xu, R. and Li, C. (2001), Multidimensional Least-squares Fitting With a Fuzzy Model, Fuzzy Sets and Systems 119, 215-223.
9. Yang, M. S. and Ko, C. H. (1996), On a Class of c-numbers Clustering Procedures for Fuzzy Data, Fuzzy Sets and Systems 84, 49-60.
10. Yang, M. S. and Liu, H. H. (2003), Fuzzy Least-squares Algorithms for Interactive Fuzzy Linear Regression Models, Fuzzy Sets and Systems 135, 305-316.
11. Peña, D. (2005), A New Statistic for Influence in Linear Regression, Technometrics 47(1), 1-12.