An LR-type fuzzy number A = (m, α, β)_LR has the membership function

A(x) = L((m − x)/α),  x ≤ m,
A(x) = R((x − m)/β),  x > m,

where m is the center and α > 0, β > 0 are the left and right spreads; for example, the symmetrical triangular fuzzy number has L(x) = R(x) = max(0, 1 − x).
Another example is the exponential fuzzy number, with membership function

A(x) = exp(−((m − x)/s)^n),  x ≤ m,
A(x) = exp(−((x − m)/s)^n),  x > m,

where s is the spread and n > 0 is a shape parameter.
Definition 2.2 (Dubois (1980))
Let A = (m_a, α_a, β_a)_LR and B = (m_b, α_b, β_b)_LR be two LR-type fuzzy numbers. Then, by the extension principle, the following operations are defined:
1. A + B = (m_a + m_b, α_a + α_b, β_a + β_b)_LR
2. λA = λ(m_a, α_a, β_a)_LR = (λm_a, λα_a, λβ_a)_LR when λ > 0
3. λA = λ(m_a, α_a, β_a)_LR = (λm_a, −λβ_a, −λα_a)_RL when λ < 0
4. −B = (−m_b, β_b, α_b)_RL
5. A − B = (m_a, α_a, β_a)_LR + (−m_b, β_b, α_b)_RL = (m_a − m_b, α_a + β_b, β_a + α_b)_LR
Definition 2.3 (a Euclidean distance formula)
Let A = (m_a, α_a)_LR and B = (m_b, α_b)_LR be two symmetrical fuzzy numbers; then the distance between A and B is defined as

D = sqrt((m_a − m_b)² + (α_a − α_b)²).  (2.1)

Let A = (m_a, α_a, β_a)_LR and B = (m_b, α_b, β_b)_LR be two LR-type fuzzy numbers; then the distance between A and B is defined as

D = sqrt(w_m(m_a − m_b)² + w_α(α_a − α_b)² + w_β(β_a − β_b)²),  (2.2)
where w_m > 0, w_α > 0 and w_β > 0 are weights. Under the structure of Model I, the least-squares criterion based on (2.1) is

D² = Σ_{i=1}^n [ (c_i − (a_0 + a_1·x_i1 + ... + a_p·x_ip))² + (s_i − (r_0 + r_1·x_i1 + ... + r_p·x_ip))² ].  (2.3)
Let ||v|| denote the length of a vector v; then, using vector and matrix expressions, D² can be rewritten as

D² = ||C − Xa||² + ||S − Xr||²,

where X is an n × (p+1) design matrix, a = (a_0, a_1, ..., a_p)', r = (r_0, r_1, ..., r_p)', C = (c_1, c_2, ..., c_n)', and S = (s_1, s_2, ..., s_n)'.
Let ∂D²/∂a = 0 and ∂D²/∂r = 0; then the solutions of a and r which minimize D² are as follows:

a = (X'X)^(-1)X'C,  r = (X'X)^(-1)X'S.  (2.4)
The above method applies regression to the centers and the spreads separately, and the resulting estimates are not related to the membership functions. Nevertheless, in the real data analyses reported later, this method provided better results in the estimation of the fuzzy parameter values.
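The estimates in (2.4) amount to two ordinary least-squares fits, one on the centers and one on the spreads. A minimal sketch in Python/NumPy (the function and variable names are ours, not from the paper):

```python
import numpy as np

def model_i_fit(X, C, S):
    """Model I: fit centers and spreads by two separate OLS problems (2.4).

    X : (n, p+1) design matrix whose first column is all ones
    C : (n,) observed centers, S : (n,) observed spreads
    Returns (a, r), the center and spread coefficient vectors.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    a = XtX_inv @ X.T @ C   # a = (X'X)^(-1) X'C
    r = XtX_inv @ X.T @ S   # r = (X'X)^(-1) X'S
    return a, r

# Tiny synthetic example: one predictor, intercept column prepended.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
C = np.array([2.1, 4.0, 6.2, 7.9])   # centers roughly 2x
S = np.array([0.5, 0.9, 1.6, 2.1])   # spreads roughly 0.5x
a, r = model_i_fit(X, C, S)
```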
2.3 Symmetrical Doubly Linear Adaptive Fuzzy Regression Model
Under the structure of Model I, if we use the Euclidean distance formula and the least-squares method to run linear regressions on the centers and spreads separately, the resulting estimates of centers and spreads are unrelated. D'Urso and Gastaldi (2000), however, argued that the dynamics of the spreads depend to some extent on the magnitude of the (estimated) centers. They therefore proposed the doubly linear adaptive fuzzy regression model (call it Model II) to obtain the parameter estimates.
They considered symmetrical fuzzy numbers with triangular membership functions, where a fuzzy number y_i = (c_i, s_i) is completely identified by the two parameters c (center) and s (left and right spread). Model II is defined as follows:
C = C* + ε_c,  C* = Xa,  (2.5)
S = S* + ε_s,  S* = C*b + 1d,  (2.6)
where X is an n × (p+1) matrix containing the input variables (data matrix); a = (a_0, a_1, ..., a_p)' is a column vector containing the regression parameters of the first model (referred to as the core regression model); C = (c_1, c_2, ..., c_n)' and C* = Xa are the vector of the observed centers and the vector of the interpolated centers, respectively, both of dimension n × 1; S = (s_1, s_2, ..., s_n)' and S* are the vector of the assigned spreads and the vector of the interpolated spreads, respectively, both of dimension n × 1; 1 is an n × 1 vector of all 1's; b and d are the regression parameters of the second regression model (referred to as the spread regression model); and ε_c, ε_s are residual vectors.
Apparently, the above model is based on two linear models: the first interpolates the centers of the fuzzy observations, and the second yields the spreads by building another linear model over the first one. Observe that the predictive variables X enter Eq. (2.6) through the interpolated centers. The model is hence capable of taking into account possible linear relations between the size of the spreads and the magnitude of the estimated centers. This is often the case in real-world applications, where dependence between centers and spreads is likely (for instance, the uncertainty or fuzziness associated with a measurement could depend on its magnitude).
D'Urso and Gastaldi used the Euclidean distance formula (2.1) and the least-squares method to obtain the estimates of a, b and d that minimize

D² = ||C − C*||² + ||S − S*||²
   = C'C − 2a'X'C + (1 + b²)a'X'Xa + S'S − 2b·S'Xa − 2d·S'1 + 2bd·1'Xa + nd².
Setting ∂D²/∂a = 0, ∂D²/∂b = 0 and ∂D²/∂d = 0, they obtained the following equations (up to constant factors):

X'C − (1 + b²)X'Xa + b·X'S − bd·X'1 = 0,
a'X'S − b·a'X'Xa − d·a'X'1 = 0,
1'S − b·1'Xa − nd = 0.  (2.7)
Based on the equations in (2.7), they obtained the following least-squares iterative solutions for a, b and d:

a = (1/(1 + b²))·(X'X)^(-1)X'(C + Sb − 1bd),
b = (a'X'Xa)^(-1)·(a'X'S − d·a'X'1),
d = (1/n)·(1'S − b·1'Xa).  (2.8)
The derivation of the recursive solutions of a, b and d: from the first equation of (2.7) we easily obtain a = (1/(1 + b²))·(X'X)^(-1)X'(C + Sb − 1bd); substituting it into the second and third equations of (2.7) gives

C'S̃ − b·C'C̃ + b·S'S̃ − b²·C'S̃ + nb²dC̄ − 2nbdS̄ + nbd² − ndC̄ = 0  (2.9)

and

bC̄ + d − S̄ = 0,  (2.10)

where C̃ = X(X'X)^(-1)X'C, S̃ = X(X'X)^(-1)X'S, C̄ = (1/n)1'C, and S̄ = (1/n)1'S.
From (2.10) we obtained d = S̄ − bC̄; substituting it back into (2.9), we obtained a simplified quadratic equation in b:

M_1b² + M_2b + M_3 = 0,

where M_1 = C'S̃ − nC̄S̄, M_2 = C'C̃ − S'S̃ − nC̄² + nS̄², and M_3 = nC̄S̄ − S'C̃.
By solving the quadratic equation in b, we obtained

b = (−M_2 ± sqrt(M_2² − 4M_1M_3)) / (2M_1),

and the corresponding solutions

d = S̄ − bC̄,  a = (1/(1 + b²))·(X'X)^(-1)X'(C + b(S − 1d)).
The least-squares estimates are obtained by substituting these two sets of a, b and d into D² and keeping the set for which D² is smaller. Based on the equations for a, b and d, we can conclude that, no matter what membership function the response fuzzy number y_i = (c_i, s_i) has, the parameter estimates are the same. Therefore, these least-squares estimates do not take other possible shapes of fuzzy numbers into account.
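The closed-form Model II solution reduces to solving the quadratic M_1b² + M_2b + M_3 = 0 and keeping the root with the smaller D². A hedged Python/NumPy sketch (names are ours):

```python
import numpy as np

def model_ii_fit(X, C, S):
    """Model II under the Euclidean distance: solve M1*b^2 + M2*b + M3 = 0,
    set d = Sbar - b*Cbar and a = (X'X)^(-1) X'(C + b(S - 1d))/(1 + b^2),
    and keep the candidate with the smaller D^2."""
    n = len(C)
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    Ct, St = H @ C, H @ S                         # C-tilde, S-tilde
    Cbar, Sbar = C.mean(), S.mean()
    M1 = C @ St - n * Cbar * Sbar
    M2 = C @ Ct - S @ St - n * Cbar**2 + n * Sbar**2
    M3 = n * Cbar * Sbar - S @ Ct                 # note M3 = -M1
    best = None
    for b in np.real(np.roots([M1, M2, M3])):     # the two candidate roots
        d = Sbar - b * Cbar
        a = np.linalg.solve(X.T @ X, X.T @ (C + b * (S - d))) / (1 + b**2)
        D2 = np.sum((C - X @ a)**2) + np.sum((S - b * (X @ a) - d)**2)
        if best is None or D2 < best[0]:
            best = (D2, a, b, d)
    return best[1], best[2], best[3]

# Exact-fit example: spreads follow S = 0.5*C + 1, centers follow C = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
C = 2.0 * x
S = 0.5 * C + 1.0
a, b, d = model_ii_fit(X, C, S)
```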
3. LR type of Fuzzy Linear Regression
3.1 Nonsymmetrical Doubly Linear Adaptive Fuzzy Regression Model
When we have numerical (crisp) explanatory variables X_j (j = 1, 2, ..., k) and an LR fuzzy dependent variable Y = (c, p, q) (where c is the center and p and q are, respectively, the left and right spreads), a model capable of incorporating the possible influence of the magnitude of the centers on the spreads can be considered (D'Urso and Gastaldi, 2000, 2001, 2002). For fuzzy response numbers y_i = [c_i − p_i, c_i + q_i] that are nonsymmetrical with triangular membership functions, D'Urso (2003) proposed a fuzzy regression model (call it Model III), expressed in matrix form as:
C = C* + ε,  C* = Xa,  (3.1)
P = P* + λ,  P* = C*b + 1d,  (3.2)
q = q* + ρ,  q* = C*g + 1h,  (3.3)
where X is an n × (k+1) matrix containing the vector 1 concatenated with the k crisp input variables; C, C* are n × 1 vectors of the observed centers and interpolated centers, respectively; P, P* are n × 1 vectors of observed left spreads and interpolated left spreads, respectively; q, q* are n × 1 vectors of observed right spreads and interpolated right spreads, respectively; a is a (k+1) × 1 vector of regression parameters for the regression model for C; b, d, g, h are regression parameters for the regression models for P and q; 1 is an n × 1 vector of all 1's; and ε, λ, ρ are n × 1 vectors of residuals.
This model is based on three sub-models. The first one interpolates the centers of the
fuzzy data, the other two sub-models are built over the first one and yield the spreads.
This formulation allows the model to consider possible relations between the size of the
spreads and the magnitude of the estimated centers, as it is often necessary in real case
studies. Model III can be called a nonsymmetrical doubly linear adaptive fuzzy regression
model.
D'Urso used the Euclidean distance formula (2.2) and the least-squares method to obtain the estimates of a, b, d, g, h that minimize

D² = w_c||C − C*||² + w_p||P − P*||² + w_q||q − q*||²
   = w_c(C'C − 2a'X'C + a'X'Xa)
   + w_p(P'P − 2b·P'Xa − 2d·P'1 + b²·a'X'Xa + 2bd·1'Xa + nd²)
   + w_q(q'q − 2g·q'Xa − 2h·q'1 + g²·a'X'Xa + 2gh·1'Xa + nh²),  (3.4)

where w_c, w_p, w_q are arbitrary positive weights.
Recursive solutions to the above system are found by equating to zero the partial derivatives with respect to the parameters a, b, d, g, h; for example,

a = (1/(w_c + w_p·b² + w_q·g²))·(X'X)^(-1)X'[w_c·C + w_p·b(P − 1d) + w_q·g(q − 1h)],

with analogous expressions for b, d, g and h.

3.2 Yang and Ko's Distance
Yang and Ko (1996) defined the following distance between LR-type fuzzy numbers:

d_LR²(A, B) = (m_a − m_b)² + [(m_a − lα_a) − (m_b − lα_b)]² + [(m_a + rβ_a) − (m_b + rβ_b)]²,  (3.6)

where l = ∫_0^1 L^(-1)(ω) dω and r = ∫_0^1 R^(-1)(ω) dω.
Yang and Ko (1996) also proved that (F_LR(ℝ), d_LR) is a complete metric space. If A and B are symmetrical LR-type fuzzy numbers, then l = r and

d_LR²(A, B) = 3(m_a − m_b)² + 2l²(α_a − α_b)².

If A and B are symmetrical triangular fuzzy numbers, then l = ∫_0^1 L^(-1)(x) dx = ∫_0^1 (1 − x) dx = 1/2. If A and B are exponential-type fuzzy numbers, then l = ∫_0^1 L^(-1)(x) dx = ∫_0^1 (−ln x)^(1/m) dx = Γ(1 + 1/m). Compared with the distance formulas (2.1) and (2.2), the distance formula (3.6) avoids the subjective choice of the weights (w_m > 0, w_α > 0, and w_β > 0).
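The constant l = Γ(1 + 1/m) for the exponential membership function can be checked numerically against its defining integral; a small sketch (function names are ours):

```python
import math

def l_exponential(m):
    """Yang-Ko constant for an exponential-type membership L(x) = exp(-x^m):
    l = integral_0^1 L^(-1)(x) dx = integral_0^1 (-ln x)^(1/m) dx = Gamma(1 + 1/m)."""
    return math.gamma(1.0 + 1.0 / m)

def l_numeric(m, steps=100000):
    """Midpoint-rule check of the defining integral."""
    h = 1.0 / steps
    return h * sum((-math.log((i + 0.5) * h)) ** (1.0 / m) for i in range(steps))

l2 = l_exponential(2)   # Gamma(3/2) = sqrt(pi)/2
```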
Under the structure of Model I, using the distance (3.6), let ∂D²/∂a = 0 and ∂D²/∂r = 0; then

0 = ∂D²/∂a = 6X'Xa − 6X'C,
0 = ∂D²/∂r = 4l²X'Xr − 4l²X'S,

and the solutions of a and r which minimize D² are as follows:

a = (X'X)^(-1)X'C,  r = (X'X)^(-1)X'S.  (3.7)
Therefore, under the structure of Model I, whether we use the distance formula (2.1) or (3.6), we obtain the same least-squares estimates, and they are not related to the membership functions.
Next, let us consider Model II (D'Urso and Gastaldi (2000), the doubly linear adaptive fuzzy regression model); the sum of squared errors D² can be expressed in vector form as

D² = ||C − Xa||² + ||(C − lS) − [Xa − l(bXa + 1d)]||² + ||(C + lS) − [Xa + l(bXa + 1d)]||²
   = 3a'X'Xa − 6a'X'C + 3C'C + 2l²b²·a'X'Xa − 4l²b·a'X'S + 4l²bd·a'X'1
   + 2l²S'S − 4l²d·1'S + 2l²nd².
Setting ∂D²/∂a = 0, ∂D²/∂b = 0 and ∂D²/∂d = 0, after lengthy, tedious and complicated calculations (see Appendix A.1) we obtained the following least-squares estimates of a, b and d:

b = (−K_2 ± sqrt(K_2² − 4K_1K_3)) / (2K_1),
d = S̄ − bC̄,
a = (1/(3 + 2l²b²))·(X'X)^(-1)(3X'C + 2l²b·X'S − 2l²bd·X'1),  (3.8)
where K_1 = 2l²(C'S̃ − nC̄S̄), K_2 = 3(C'C̃ − nC̄²) − 2l²(S'S̃ − nS̄²), K_3 = 3(nC̄S̄ − S'C̃), and C̃ = X(X'X)^(-1)X'C, S̃ = X(X'X)^(-1)X'S, C̄ = (1/n)1'C, S̄ = (1/n)1'S.
The least-squares estimates are obtained by substituting these two sets of a, b and d into D² and keeping the set for which D² is smaller. Based on the equations for a, b and d, we can conclude that these least-squares estimates do depend on the membership function of the response fuzzy number y_i = (c_i, s_i)_LR.
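The Yang-Ko version of Model II differs from the Euclidean one only through the weights 3 and 2l² in D², which show up in K_1, K_2, K_3 and in the formula for a. A hedged sketch (names are ours):

```python
import numpy as np

def model_ii_fit_yk(X, C, S, l):
    """Model II under Yang and Ko's distance: solve K1*b^2 + K2*b + K3 = 0,
    set d = Sbar - b*Cbar, a = (X'X)^(-1)(3X'C + 2l^2 b(X'S - d X'1))/(3 + 2l^2 b^2),
    keeping the root whose D^2 = 3||C-Xa||^2 + 2l^2||S-b*Xa-1d||^2 is smaller."""
    n = len(C)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    Ct, St = H @ C, H @ S
    Cbar, Sbar = C.mean(), S.mean()
    Xt1 = X.T @ np.ones(n)
    K1 = 2 * l**2 * (C @ St - n * Cbar * Sbar)
    K2 = 3 * (C @ Ct - n * Cbar**2) - 2 * l**2 * (S @ St - n * Sbar**2)
    K3 = 3 * (n * Cbar * Sbar - S @ Ct)
    best = None
    for b in np.real(np.roots([K1, K2, K3])):
        d = Sbar - b * Cbar
        a = np.linalg.solve(X.T @ X, 3 * (X.T @ C) + 2 * l**2 * b * (X.T @ S - d * Xt1))
        a = a / (3 + 2 * l**2 * b**2)
        D2 = 3 * np.sum((C - X @ a)**2) + 2 * l**2 * np.sum((S - b * (X @ a) - d)**2)
        if best is None or D2 < best[0]:
            best = (D2, a, b, d)
    return best[1], best[2], best[3]

# Same exact-fit data as before; the fit recovers the same linear relation.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
C = 2.0 * x
S = 0.5 * C + 1.0
a, b, d = model_ii_fit_yk(X, C, S, l=0.5)
```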
Under the structure of Model III (D'Urso (2003)), and considering nonsymmetrical LR-type response fuzzy numbers, the sum of squared errors D² can be expressed in vector form as

D² = ||C − Xa||² + ||(C − lP) − [Xa − l(bXa + 1d)]||² + ||(C + rq) − [Xa + r(gXa + 1h)]||²
   = 3C'C + (3 − 2lb + l²b² + 2rg + r²g²)a'X'Xa − (6 − 2lb + 2rg)a'X'C
   + (2l − 2l²b)a'X'P − (2r + 2r²g)a'X'q + (2l²bd − 2ld + 2r²gh + 2rh)a'X'1
   − 2l·C'P + 2r·C'q + (2ld − 2rh)1'C − 2l²d·1'P − 2r²h·1'q + l²P'P + r²q'q + nl²d² + nr²h².
Setting ∂D²/∂a = 0, ∂D²/∂b = 0, ∂D²/∂d = 0, ∂D²/∂g = 0 and ∂D²/∂h = 0, after lengthy, tedious and complicated calculations we obtained the following equations:

(6 − 4lb + 2l²b² + 4rg + 2r²g²)X'Xa − (6 − 2lb + 2rg)X'C + (2l − 2l²b)X'P − (2r + 2r²g)X'q + (2l²bd − 2ld + 2r²gh + 2rh)X'1 = 0
2l·a'X'C − 2l²·a'X'P − 2l·a'X'Xa + 2l²b·a'X'Xa + 2l²d·a'X'1 = 0
2l·1'C − 2l²·1'P − 2l·1'Xa + 2l²b·1'Xa + 2nl²d = 0
−2r·a'X'C − 2r²·a'X'q + 2r·a'X'Xa + 2r²g·a'X'Xa + 2r²h·a'X'1 = 0
−2r·1'C − 2r²·1'q + 2r·1'Xa + 2r²g·1'Xa + 2nr²h = 0

Since these equations are too complicated to solve in general for a, b, d, g, h, we simply list the following recursive equations and use mathematical software to find possible solutions.
a = (1/(3 − 2lb + l²b² + 2rg + r²g²))·(X'X)^(-1)[(3 − lb + rg)X'C − (l − l²b)X'P + (r + r²g)X'q + (ld − l²bd − rh − r²gh)X'1]
b = (1/(l·a'X'Xa))·(a'X'Xa − a'X'C + l·a'X'P − ld·a'X'1)
g = (1/(r·a'X'Xa))·(a'X'C − a'X'Xa + r·a'X'q − rh·a'X'1)
d = (1/(nl))·(1'Xa − 1'C + l·1'P − lb·1'Xa)
h = (1/(nr))·(1'C − 1'Xa + r·1'q − rg·1'Xa)
From the above equations, it is obvious that the least-squares estimates are related to the membership function of the response fuzzy number y_i = (c_i, s_i)_LR.
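The recursive equations above suggest a simple fixed-point scheme: start from initial guesses and cycle through the update formulas until the parameters stabilize. A sketch under our own naming; convergence is not guaranteed in general:

```python
import numpy as np

def model_iii_fit(X, C, P, q, l, r, iters=500):
    """Cycle through the recursive equations for (a, b, d, g, h) of Model III
    under Yang and Ko's distance; l, r are the membership-shape constants."""
    n = len(C)
    XtX = X.T @ X
    one = np.ones(n)
    Xt1 = X.T @ one
    b = d = g = h = 0.0
    for _ in range(iters):
        denom = 3 - 2*l*b + (l*b)**2 + 2*r*g + (r*g)**2
        rhs = ((3 - l*b + r*g) * (X.T @ C) - (l - l**2*b) * (X.T @ P)
               + (r + r**2*g) * (X.T @ q)
               + (l*d - l**2*b*d - r*h - r**2*g*h) * Xt1)
        a = np.linalg.solve(XtX, rhs) / denom
        Xa = X @ a
        A1, A1v = Xa @ Xa, Xa @ one        # a'X'Xa and a'X'1
        b = (A1 - Xa @ C + l * (Xa @ P) - l*d*A1v) / (l * A1)
        g = (Xa @ C - A1 + r * (Xa @ q) - r*h*A1v) / (r * A1)
        d = (A1v - one @ C + l * (one @ P) - l*b*A1v) / (n * l)
        h = (one @ C - A1v + r * (one @ q) - r*g*A1v) / (n * r)
    return a, b, d, g, h

# Exact-fit check: spreads are exact linear functions of the centers.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
C = 2.0 * x
P = 0.3 * C + 0.5
q = 0.2 * C + 1.0
a, b, d, g, h = model_iii_fit(X, C, P, q, l=0.5, r=0.5)
```

Each update solves one of the five stationarity equations exactly given the other parameters, so D² decreases monotonically along the iteration.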
4. Diagnostic of Outliers and Influences
4.1 Diagnostic of Outliers and Influences in Linear Regression Model
Although a residual analysis is useful in assessing model fit, departures from the
regression model are often hidden by the fitting process. For example, there may be
outliers in either the response or explanatory variables that can have a considerable
effect on the analysis. Observations that significantly affect inferences drawn from the
data are said to be influential. Methods for assessing influence are typically based on the
change in the vector of parameter estimates when observations are deleted.
The leverage h_jj = x_j'(X'X)^(-1)x_j is associated with the j-th data point and measures, in the space of the explanatory variables, how far the j-th observation lies from the other n − 1 observations. For a data point with high leverage, h_jj approaches 1 (0 ≤ h_jj ≤ 1), indicating a possible outlier. The residuals e_i = y_i − ŷ_i are used to detect possible outliers in the response variable y, where ŷ_i is the i-th predicted y value. A large value of e_i indicates that the i-th data point could be an outlier. One may also use e_(i) = y_i − ŷ_(i) = e_i/(1 − h_ii) to detect possible outliers, where ŷ_(i) is the predicted y value when the i-th observation is dropped from the analysis. A large value of e_(i) also indicates that the i-th data point could be an outlier.
In traditional linear regression analysis, one may use the Cook distance

CD_i = ||Ŷ − Ŷ_(i)||²/(ks²) = e_i²·h_ii / (ks²(1 − h_ii)²),

where Ŷ_(i) is the predicted Y vector when the i-th observation is dropped from the analysis, k is the number of parameters, and s² = Σ_{i=1}^n e_i²/(n − k) is the mean square error. A large value of CD_i indicates that the i-th data point could be an influential observation. One advantage of using the Cook distance is that the value of CD_i is unaffected by the measurement units used for the explanatory and response variables.
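The classical diagnostics above are straightforward to compute; a short sketch (names are ours):

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverages h_ii, residuals e_i, deleted residuals e_(i) = e_i/(1-h_ii),
    and Cook distances CD_i = e_i^2 * h_ii / (k * s^2 * (1-h_ii)^2)."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
    h = np.diag(H)                              # leverages
    e = y - H @ y                               # residuals
    e_del = e / (1 - h)                         # leave-one-out residuals
    s2 = e @ e / (n - k)                        # mean square error
    cd = e**2 * h / (k * s2 * (1 - h)**2)       # Cook distances
    return h, e, e_del, cd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.8])
h, e, e_del, cd = regression_diagnostics(X, y)
```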
4.2 Diagnostic of Outliers and Influences in Fuzzy Linear Regression Model
In this section, we consider Model I (see (2.3)) and derive the corresponding formulas for e_i, e_(i), and CD_i to detect possible outliers and influential data points. For Model II (see (2.5) and (2.6)) and Model III (see (3.1), (3.2), and (3.3)), we were not able to derive any such formulas.
Based on the Euclidean distance, we obtained (see the derivations in Appendix A.2)

e_i² = (c_i − x_i'a)² + (s_i − x_i'r)² = (e_i^c)² + (e_i^s)²,  (4.1)
e_(i)² = (c_i − x_i'a_(i))² + (s_i − x_i'r_(i))² = (e_i/(1 − h_ii))²,  (4.2)

where e_i^c = c_i − x_i'a is the residual of the center of a fuzzy number and e_i^s = s_i − x_i'r is the residual of the spread of a fuzzy number; a and r are defined in (2.4).
Similarly, based on Yang and Ko's distance, we obtained (see the derivations in Appendix A.2)

e_i² = d_LR²(y_i, ŷ_i) = 3(e_i^c)² + 2l²(e_i^s)²,  (4.3)
e_(i)² = d_LR²(y_i, ŷ_(i)) = 3(e_i^c/(1 − h_ii))² + 2l²(e_i^s/(1 − h_ii))² = (e_i/(1 − h_ii))².  (4.4)
From (4.2) and (4.4), the relation between e_i and e_(i) is the same as in the general linear regression model. That is, a large value of e_(i) indicates that the i-th data point could be an outlier.
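Formulas (4.1)-(4.4) combine the center and spread residuals of Model I with the ordinary leverages. A hedged sketch (names are ours; l is the Yang-Ko constant):

```python
import numpy as np

def fuzzy_diagnostics(X, C, S, l=None):
    """Model I outlier diagnostics. With l=None use the Euclidean distance
    (4.1)-(4.2); otherwise use Yang and Ko's distance (4.3)-(4.4).
    Returns (e2, e2_del, cd) = (e_i^2, e_(i)^2, CD_i)."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    ec = C - H @ C                           # center residuals e_i^c
    es = S - H @ S                           # spread residuals e_i^s
    if l is None:
        e2 = ec**2 + es**2                   # (4.1)
    else:
        e2 = 3 * ec**2 + 2 * l**2 * es**2    # (4.3)
    e2_del = e2 / (1 - h)**2                 # (4.2) / (4.4)
    s2 = e2.sum() / (n - k)
    cd = e2 * h / (k * s2 * (1 - h)**2)      # (4.6) / (4.7)
    return e2, e2_del, cd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
C = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
S = np.array([1.0, 1.4, 2.1, 2.4, 3.1])
e2, e2_del, cd = fuzzy_diagnostics(X, C, S)              # Euclidean
e2_yk, e2_yk_del, cd_yk = fuzzy_diagnostics(X, C, S, l=0.5)  # Yang-Ko
```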
In order to derive a formula similar to the Cook distance in the fuzzy environment, we need to define a new type of distance between fuzzy vectors. Let F_LR(ℝ) denote the set of all LR-type fuzzy numbers, and let F̃_LR(ℝ^k) = {X̃ = (X_1, X_2, ..., X_k)' : X_i ∈ F_LR(ℝ)} be the set of all fuzzy k-dimensional vectors. Based on the distance defined on F_LR(ℝ), we can define a new distance on F̃_LR(ℝ^k).
Lemma 4.1 Let d : F_LR(ℝ) × F_LR(ℝ) → ℝ be a metric. For any two fuzzy vectors X̃ = (X_1, X_2, ..., X_k)', Ỹ = (Y_1, Y_2, ..., Y_k)' ∈ F̃_LR(ℝ^k), define

d̃_LR(X̃, Ỹ) = sqrt( Σ_{i=1}^k d²(X_i, Y_i) ).  (4.5)

Then d̃_LR is a metric on F̃_LR(ℝ^k). If d is a complete metric, then so is d̃_LR (see the proof in Appendix A.3).
When d is the simple (Euclidean) metric, define the Cook distance CD_i as follows:

CD_i = d̃_LR²(Ŷ, Ŷ_(i))/(ks²) = ( ||X(a − a_(i))||² + ||X(r − r_(i))||² )/(ks²);

then we obtained (see the derivation in Appendix A.4)

CD_i = (1/(ks²))·e_i²·h_ii/(1 − h_ii)²,  (4.6)

where s² = Σ_{i=1}^n e_i²/(n − k) and e_i² = (e_i^c)² + (e_i^s)².
When d is Yang and Ko's metric, define the Cook distance CD_i as follows:

CD_i = d̃_LR²(Ŷ, Ŷ_(i))/(ks²)
     = (1/(ks²))·{ ||X(a − a_(i))||² + ||X[(a − lr) − (a_(i) − lr_(i))]||² + ||X[(a + lr) − (a_(i) + lr_(i))]||² };

then we obtained (see Appendix A.4)

CD_i = (1/(ks²))·e_i²·h_ii/(1 − h_ii)²,  (4.7)

where s² = Σ_{i=1}^n e_i²/(n − k) and e_i² = 3(e_i^c)² + 2l²(e_i^s)².
Although formulas (4.6) and (4.7) look the same, the values of e_i² and s² are different. In general, s² in (4.7) is larger than s² in (4.6); therefore the Cook distance calculated in (4.6) is larger than the Cook distance calculated in (4.7). From (4.6) and (4.7) we know that CD_i is affected by the leverage value h_ii and the residual e_i, the same as in traditional regression analysis.
Since we were not able to derive formulas similar to (4.1)-(4.4) for Models II and III, the best we can do is to delete one data point at a time and recalculate the values of e_(i), CD_i, etc.
5. Data Analysis
In this section, we use Tanaka's data (1987, see Table 1) to illustrate the theoretical results obtained in the previous sections. The data set contains three independent variables, one fuzzy response variable and ten data points. We consider only exponential fuzzy response values. The advantage of using the exponential membership function is that we only need to choose an appropriate value of m (note: m is the shape parameter of the exponential membership function) to reflect the distribution of the response variable. If the values of the response variable tend to fall outside the interval of the existing data, then we choose a smaller m value; otherwise, we choose a larger m value to describe the membership function. Since we were not able to derive the least-squares estimates for Model III and we only consider the exponential membership function, we use Models I and II in the data analysis.
Tables 2-11 show the results of using the Euclidean distance, Yang and Ko's distance, and different m values. Each table contains the least-squares estimates, the sum of squared residuals, the leverage values h_ii, the values of e_i² and e_(i)², and the Cook distance CD_i. Since, under the Euclidean distance formula, the m value does not affect the results of Models I and II, we give only the results for m = 2 (see Tables 2 and 3).
Table 1: Tanaka's Data (1987)

Case #   x_i1   x_i2   x_i3   Fuzzy Response Y_i = (c_i, r_i)
1          3      5      9    (96,42)
2         14      8      3    (120,47)
3          7      1      4    (52,33)
4         11      7      3    (106,45)
5          7     12     15    (189,79)
6          8     15     10    (194,65)
7          3      9      6    (107,42)
8         12     15     11    (216,78)
9         10      5      8    (108,52)
10         9      7      4    (103,44)
Table 2: Model I, m=2, Least-Squares Estimates Under Euclidean Distance
Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 14.69 41.25 0.25
2 (120,47) (122.48, 49.13) 0.43 10.67 32.44 0.21
3 (52,33) (49.36, 32.11) 0.41 7.75 21.90 0.13
4 (106,45) (104.82, 43.01) 0.26 5.35 9.75 0.04
5 (189,79) (191.79, 76.71) 0.55* 13.06 63.57* 0.52*
6 (194,65) (193.64, 67.67) 0.39 7.25 19.38 0.11
7 (107,42) (109.77, 40.85) 0.60* 9.08 55.55* 0.50*
8 (216,78) (211.65, 77.08) 0.42 19.73 58.34* 0.37
9 (108,52) (110.89, 53.24) 0.37 9.91 25.12 0.14
10 (103,44) (103.36, 42.58) 0.18 2.14 3.22 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 99.63
Table 3: Model II, m=2, Least-Squares Estimates Under Euclidean Distance
Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.86, 42.38) 0.40 4.71 11.08 0.36
2 (120,47) (122.04, 50.63) 0.43 17.34 31.44 0.38
3 (52,33) (50.11, 29.56) 0.41 15.41 92.87* 0.66*
4 (106,45) (104.13, 45.38) 0.26 3.65 6.55 0.35
5 (189,79) (193.31, 71.51) 0.55* 74.67* 90.38* 0.38
6 (194,65) (192.58, 71.30) 0.39 41.67 118.34* 0.63*
7 (107,42) (108.12, 46.55) 0.60* 21.97 38.00 0.39
8 (216,78) (211.71, 76.90) 0.42 19.64 78.55* 0.61*
9 (108,52) (112.48, 47.86) 0.37 37.44 82.31* 0.51
10 (103,44) (102.67, 44.96) 0.18 1.02 5.34 0.34
a = (3.14, 3.43, 7.62, 5.40)',  b = 0.29,  d = 14.88,  Σ e_i² = 236.98
Table 4: Model I, m=1.2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 35.61 100.02 0.24
2 (120,47) (122.48, 49.13) 0.43 26.43 80.41 0.20
3 (52,33) ( 49.36, 32.11) 0.41 22.26 62.94 0.15
4 (106,45) (104.82, 43.01) 0.26 11.17 31.59 0.03
5 (189,79) (191.79, 76.71) 0.55* 32.71 159.18* 0.51*
6 (194,65) (193.64, 67.67) 0.39 12.99 34.72 0.08
7 (107,42) (109.77, 40.85) 0.60* 25.61 156.77* 0.55*
8 (216,78) (211.65, 77.08) 0.42 58.16 171.95* 0.42
9 (108,52) (110.89, 53.24) 0.37 27.84 70.53 0.16
10 (103,44) (103.36, 42.58) 0.18 3.95 5.94 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 256.73
Table 5: Model II, m=1.2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.91, 37.94) 0.40 32.79 60.26 0.02
2 (120,47) (122.32, 49.77) 0.43 29.66 80.87 0.04
3 (52,33) (52.54, 19.65) 0.41 316.42* 360.47* 0.08
4 (106,45) (105.01, 42.30) 0.26 15.89 24.84 0.01
5 (189,79) (191.09, 79.46) 0.55* 13.48 63.15 0.04
6 (194,65) (190.68, 79.29) 0.39 394.20* 536.55* 0.07
7 (107,42) (108.98, 44.01) 0.60* 18.96 104.09 0.07
8 (216,78) (209.07, 87.22) 0.42 294.63* 498.27* 0.13*
9 (108,52) (112.82, 45.67) 0.37 140.58 207.55 0.05
10 (103,44) (103.52, 41.69) 0.18 10.51 13.72 0.002
a = (1.28, 3.30, 7.41, 5.19)',  b = 0.43,  d = -3.03,  Σ e_i² = 1267.13
Table 6: Model I, m=2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 34.24 96.18 0.23
2 (120,47) (122.48, 49.13) 0.43 25.53 77.68 0.20
3 (52,33) (49.36, 32.11) 0.41 22.10 62.49 0.15
4 (106,45) (104.82, 43.01) 0.26 10.38 18.91 0.03
5 (189,79) (191.79, 76.71) 0.55* 31.66 154.08* 0.51*
6 (194,65) (193.64, 67.67) 0.39 11.57 30.93 0.07
7 (107,42) (109.77, 40.85) 0.60* 25.35 155.17* 0.55*
8 (216,78) (211.65, 77.08) 0.42 57.99 171.46* 0.43
9 (108,52) (110.89, 53.24) 0.37 27.53 69.75 0.16
10 (103,44) (103.36, 42.58) 0.18 3.55 5.34 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 249.91
Table 7: Model II, m=2, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.76, 37.78) 0.40 32.73 61.86 0.02
2 (120,47) (122.33, 49.76) 0.43 28.30 77.95 0.04
3 (52,33) (52.28, 19.29) 0.41 295.52* 336.34* 0.09
4 (106,45) (105.00, 42.22) 0.26 15.12 23.74 0.01
5 (189,79) (191.12, 79.67) 0.55* 14.15 67.62 0.06
6 (194,65) (190.93, 79.59) 0.39 362.74* 493.27* 0.07
7 (107,42) (109.07, 43.99) 0.60* 19.09 107.17 0.08
8 (216,78) (209.27, 87.57) 0.42 279.81* 476.79* 0.13*
9 (108,52) (112.65, 45.54) 0.37 130.24 194.90 0.05
10 (103,44) (103.59, 41.60) 0.18 10.06 13.27 0.002
a = (1.11, 3.29, 7.45, 5.17)',  b = 0.43,  d = -3.45,  Σ e_i² = 1187.75
Table 8: Model I, m=3, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 34.41 96.65 0.23
2 (120,47) (122.48, 49.13) 0.43 25.64 78.01 0.20
3 (52,33) (49.36, 32.11) 0.41 22.12 62.54 0.15
4 (106,45) (104.82, 43.01) 0.26 10.48 19.08 0.03
5 (189,79) (191.79, 76.71) 0.55* 31.79 154.70* 0.51*
6 (194,65) (193.64, 67.67) 0.39 11.74 31.39 0.07
7 (107,42) (109.77, 40.85) 0.60* 25.38 155.36* 0.55*
8 (216,78) (211.65, 77.08) 0.42 58.01 171.52* 0.43
9 (108,52) (110.89, 53.24) 0.37 27.57 69.85 0.16
10 (103,44) (103.36, 42.58) 0.18 3.60 5.41 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 250.74
Table 9: Model II, m=3, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.78, 37.79) 0.40 32.73 61.65 0.02
2 (120,47) (122.33, 49.76) 0.43 28.46 78.30 0.04
3 (52,33) (52.31, 19.34) 0.41 298.04* 339.26* 0.08
4 (106,45) (105.00, 42.23) 0.26 15.21 23.87 0.01
5 (189,79) (191.11, 79.64) 0.55* 14.06 67.06 0.05
6 (194,65) (190.90, 79.55) 0.39 366.55* 498.51* 0.07
7 (107,42) (109.06, 43.99) 0.60* 19.08 102.80 0.08
8 (216,78) (209.24, 87.52) 0.42 281.59* 479.11* 0.13*
9 (108,52) (112.67, 45.56) 0.37 131.49 196.43 0.05
10 (103,44) (103.59, 41.61) 0.18 10.11 13.32 0.002
a = (1.13, 3.29, 7.44, 5.17)',  b = 0.43,  d = -3.39,  Σ e_i² = 1197.32
Table 10: Model I, m=10, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (93.20, 44.62) 0.40 35.88 100.80 0.24
2 (120,47) (122.48, 49.13) 0.43 26.62 80.97 0.20
3 (52,33) (49.36, 32.11) 0.41 22.29 63.03 0.15
4 (106,45) (104.82, 43.01) 0.26 11.33 20.64 0.03
5 (189,79) (191.79, 76.71) 0.55* 32.92 160.21* 0.51*
6 (194,65) (193.64, 67.67) 0.39 13.28 35.49 0.08
7 (107,42) (109.77, 40.85) 0.60* 25.67 157.10* 0.54*
8 (216,78) (211.65, 77.08) 0.42 58.19 172.06* 0.42
9 (108,52) (110.89, 53.24) 0.37 27.90 70.69 0.15
10 (103,44) (103.36, 42.58) 0.18 4.02 6.06 0.01
a = (1.39, 3.25, 7.92, 5.03)',  r = (8.01, 1.64, 1.20, 2.85)',  Σ e_i² = 258.10
Table 11: Model II, m=10, Least-Squares Estimates Under Yang and Ko's Distance

Case #  (c_i, s_i)  (ĉ_i, ŝ_i)  h_ii  e_i²  e_(i)²  CD_i
1 (96,42) (94.93, 37.97) 0.40 32.80 59.96 0.01
2 (120,47) (122.31, 49.77) 0.43 29.93 81.46 0.04
3 (52,33) (52.59, 19.71) 0.41 320.68* 365.36* 0.08
4 (106,45) (105.01, 42.31) 0.26 16.05 25.07 0.01
5 (189,79) (191.09, 79.43) 0.55* 13.36 62.29 0.04
6 (194,65) (190.63, 79.23) 0.39 400.57* 545.32* 0.07
7 (107,42) (108.96, 44.02) 0.60* 18.93 103.48 0.07
8 (216,78) (209.03, 87.16) 0.42 297.67* 502.71* 0.13*
9 (108,52) (112.86, 45.70) 0.37 142.67 210.11 0.05
10 (103,44) (103.59, 41.70) 0.18 10.61 13.82 0.002
a = (1.31, 3.30, 7.40, 5.19)',  b = 0.43,  d = -2.96,  Σ e_i² = 1283.29
5.1 Discussion
From Tables 2 and 3, the estimates of center and spread under Model I are better than those under Model II. In theory, if we use Yang and Ko's distance, the estimates of center and spread under Model II should be affected by the value of m; but, based on Tables 5, 7, 9 and 11, we found that different m values do not affect the estimates very much.
In theory, the distance formula and the m values do not affect the estimates of the Model I parameters, but they do affect the parameter estimates in Model II. Based on Tables 3 and 5, the choice of distance formula has a larger effect on the parameter estimates in Model II.
Cases #5 and #7 have larger leverage values h_ii, so they are possible outliers in the predictors. In Model I, based on the values of e_i, there seem to be no outliers in the response variable. However, based on the values of e_(i) in Tables 2, 4, 6 and 8, cases #5, 7 and 8 are possible outliers in the response variable. In Model II under the Euclidean distance, Table 3 shows that cases #3, 5, 6, 8 and 9 are the five possible outliers in the response variable; but under Yang and Ko's distance, Tables 5, 7, 9 and 11 show that only cases #3, 6 and 8 are possible outliers in the response variable.
Under Model I, based on Tables 2, 4, 6, 8 and 10, cases #5 and 7 have larger CD_i values and are influential observations. Under Model II and the Euclidean distance, Table 3 shows that cases #3, 6 and 8 have larger CD_i values; but under Model II with Yang and Ko's distance, only case #8 has a large CD_i value and is an influential point (see Tables 5, 7, 9 and 11).
If we use the exponential membership function for our fuzzy numbers and Yang and Ko's distance, how do we best choose the m value for fuzzy linear regression under Model II? The simplest rule is to choose the m value for which the residual sum of squares, Σ e_i², is smallest. Based on Tables 5, 7, 9 and 11, the best choice is m = 2.
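The selection rule above can be automated: evaluate the residual sum of squares for each candidate m (which enters only through l = Γ(1 + 1/m)) and keep the smallest. A sketch with a hypothetical objective function standing in for a Model II fit:

```python
import math

def choose_m(candidates, sse_for_l):
    """Pick the shape parameter m minimizing the residual sum of squares.
    `sse_for_l` maps the Yang-Ko constant l = Gamma(1 + 1/m) to the model's
    residual sum of squares (e.g., a wrapper around a Model II fit)."""
    best_m, best_sse = None, float("inf")
    for m in candidates:
        l = math.gamma(1.0 + 1.0 / m)
        sse = sse_for_l(l)
        if sse < best_sse:
            best_m, best_sse = m, sse
    return best_m, best_sse

# Toy stand-in objective whose minimum sits at l = Gamma(1 + 1/2):
target = math.gamma(1.5)
best_m, best_sse = choose_m([1.2, 2, 3, 10], lambda l: (l - target) ** 2)
```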
APPENDIX
A.1: The derivation of a, b, and d in (3.8)
D² = ||C − Xa||² + ||(C − lS) − [Xa − l(bXa + 1d)]||² + ||(C + lS) − [Xa + l(bXa + 1d)]||²
   = 3a'X'Xa − 6a'X'C + 3C'C + 2l²b²·a'X'Xa − 4l²b·a'X'S + 4l²bd·a'X'1
   + 2l²S'S − 4l²d·1'S + 2l²nd²
Setting ∂D²/∂a = 0, ∂D²/∂b = 0 and ∂D²/∂d = 0, we obtained (up to constant factors)

0 = ∂D²/∂a = 3X'Xa − 3X'C + 2l²b²·X'Xa − 2l²b·X'S + 2l²bd·X'1  (A.1.1)
0 = ∂D²/∂b = a'X'S − b·a'X'Xa − d·a'X'1  (A.1.2)
0 = ∂D²/∂d = 1'S − b·1'Xa − nd  (A.1.3)
From (A.1.1), we obtained a = (1/(3 + 2l²b²))·(X'X)^(-1)(3X'C + 2l²b·X'S − 2l²bd·X'1); substituting a into (A.1.2) and (A.1.3), we obtained

0 = 9C'S̃ − 9b·C'C̃ + 6l²b·S'S̃ − 6l²b²·C'S̃ + 6l²nb²dC̄ − 12l²nbdS̄ + 6l²nbd² − 9ndC̄  (A.1.4)
0 = S̄ − bC̄ − d  (A.1.5)

where C̃ = X(X'X)^(-1)X'C, S̃ = X(X'X)^(-1)X'S, C̄ = (1/n)1'C, S̄ = (1/n)1'S.
From (A.1.5), we obtained d = S̄ − bC̄, and substituting it into (A.1.4) we obtained a quadratic equation in b, K_1b² + K_2b + K_3 = 0. The solution is

b = (−K_2 ± sqrt(K_2² − 4K_1K_3)) / (2K_1).
A.2: The derivation of (4.2), (4.3), and (4.4)
I. Based on the Euclidean distance formula, we have

e_(i)² = (c_i − x_i'a_(i))² + (s_i − x_i'r_(i))².

Since a − a_(i) = (e_i^c/(1 − h_ii))·(X'X)^(-1)x_i and r − r_(i) = (e_i^s/(1 − h_ii))·(X'X)^(-1)x_i, we have

e_(i)² = (c_i − x_i'a + x_i'(X'X)^(-1)x_i·e_i^c/(1 − h_ii))² + (s_i − x_i'r + x_i'(X'X)^(-1)x_i·e_i^s/(1 − h_ii))²
       = (e_i^c/(1 − h_ii))² + (e_i^s/(1 − h_ii))²
       = (e_i/(1 − h_ii))².
II. Based on Yang and Ko's distance formula, we have

e_i² = (c_i − x_i'a)² + [(c_i − ls_i) − (x_i'a − l·x_i'r)]² + [(c_i + ls_i) − (x_i'a + l·x_i'r)]²
     = 3(c_i − x_i'a)² + 2[l(s_i − x_i'r)]²
     = 3(e_i^c)² + 2l²(e_i^s)²,

e_(i)² = (c_i − x_i'a_(i))² + [(c_i − ls_i) − (x_i'a_(i) − l·x_i'r_(i))]² + [(c_i + ls_i) − (x_i'a_(i) + l·x_i'r_(i))]²
       = 3(c_i − x_i'a_(i))² + 2[l(s_i − x_i'r_(i))]²
       = 3(e_i^c/(1 − h_ii))² + 2l²(e_i^s/(1 − h_ii))²
       = (e_i/(1 − h_ii))².
A.3: Proof of Lemma 4.1
In order to prove that d̃_LR is a metric, we need to prove the following three properties:
1. For all X̃, Ỹ ∈ F̃_LR(ℝ^k), d̃_LR(X̃, Ỹ) ≥ 0, and d̃_LR(X̃, Ỹ) = 0 implies X̃ = Ỹ.
2. For all X̃, Ỹ ∈ F̃_LR(ℝ^k), d̃_LR(X̃, Ỹ) = d̃_LR(Ỹ, X̃).
3. For all X̃, Ỹ, Z̃ ∈ F̃_LR(ℝ^k), d̃_LR(X̃, Ỹ) ≤ d̃_LR(X̃, Z̃) + d̃_LR(Z̃, Ỹ).

Since d is a metric, it is easy to show that properties 1 and 2 are satisfied. We show that property 3 is satisfied:

d̃_LR²(X̃, Ỹ) = Σ_{i=1}^k d²(X_i, Y_i)
 ≤ Σ_{i=1}^k [d(X_i, Z_i) + d(Z_i, Y_i)]²
 = Σ_{i=1}^k d²(X_i, Z_i) + Σ_{i=1}^k d²(Z_i, Y_i) + 2·Σ_{i=1}^k d(X_i, Z_i)·d(Z_i, Y_i)
 ≤ d̃_LR²(X̃, Z̃) + d̃_LR²(Z̃, Ỹ) + 2·sqrt(Σ_{i=1}^k d²(X_i, Z_i))·sqrt(Σ_{i=1}^k d²(Z_i, Y_i))
 = [d̃_LR(X̃, Z̃) + d̃_LR(Z̃, Ỹ)]²,

where the last inequality is the Cauchy-Schwarz inequality. Therefore, d̃_LR(X̃, Ỹ) ≤ d̃_LR(X̃, Z̃) + d̃_LR(Z̃, Ỹ).
Assume that (F_LR(ℝ), d_LR) is a complete metric space. Let {X̃^m}_{m≥1} be a Cauchy sequence in F̃_LR(ℝ^k); i.e., for every ε > 0 there is an l such that, for all m, m' > l, d̃_LR(X̃^m, X̃^{m'}) < ε. Then, for m, m' > l,

d(X_j^m, X_j^{m'}) ≤ sqrt(Σ_{i=1}^k d²(X_i^m, X_i^{m'})) = d̃_LR(X̃^m, X̃^{m'}) < ε.

Hence, for each 1 ≤ j ≤ k, {X_j^m}_{m≥1} is a Cauchy sequence in F_LR(ℝ). Therefore, there exists X_j ∈ F_LR(ℝ) with X_j^m → X_j. Let X̃ = (X_1, X_2, ..., X_k)'. Since X_j^m → X_j, for every ε > 0 there is an n_j such that, for m > n_j, d(X_j^m, X_j) < ε/sqrt(k), j = 1, 2, ..., k. Let n = max{n_1, n_2, ..., n_k}. Then, for m > n,

d̃_LR(X̃^m, X̃) = sqrt(Σ_{i=1}^k d²(X_i^m, X_i)) < ε.

That is, X̃^m → X̃.
A.4: The derivation of equations (4.6) and (4.7)
Under the Euclidean distance:

CD_i = (1/(ks²))·d̃_LR²(Ŷ, Ŷ_(i)) = (1/(ks²))·Σ_{j=1}^n d²(ŷ_j, ŷ_(i)j)
     = (1/(ks²))·Σ_{j=1}^n [ (x_j'a − x_j'a_(i))² + (x_j'r − x_j'r_(i))² ]
     = (1/(ks²))·[ (e_i^c)²·h_ii/(1 − h_ii)² + (e_i^s)²·h_ii/(1 − h_ii)² ]
     = (1/(ks²))·e_i²·h_ii/(1 − h_ii)²,

using ||X(a − a_(i))||² = (e_i^c/(1 − h_ii))²·x_i'(X'X)^(-1)x_i = (e_i^c)²·h_ii/(1 − h_ii)², and similarly for r − r_(i).
Under Yang and Ko's distance:

CD_i = (1/(ks²))·d̃_LR²(Ŷ, Ŷ_(i))
     = (1/(ks²))·Σ_{j=1}^n { (x_j'a − x_j'a_(i))² + [(x_j'a − l·x_j'r) − (x_j'a_(i) − l·x_j'r_(i))]²
       + [(x_j'a + l·x_j'r) − (x_j'a_(i) + l·x_j'r_(i))]² }
     = (1/(ks²))·[ 3(e_i^c)²·h_ii/(1 − h_ii)² + 2l²(e_i^s)²·h_ii/(1 − h_ii)² ]
     = (1/(ks²))·e_i²·h_ii/(1 − h_ii)².
REFERENCES
1. Dubois, D. and Prade, H. (1980), Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York.
2. Zimmermann, H. J. (1996), Fuzzy Set Theory and Its Applications, Kluwer Academic Press, Dordrecht.
3. Draper, N. R. and Smith, H. (1980), Applied Regression Analysis, Wiley, New York.
4. D'Urso, P. and Gastaldi, T. (2000), A Least-squares Approach to Fuzzy Linear Regression Analysis, Computational Statistics and Data Analysis 34, 427-440.
5. D'Urso, P. (2003), Linear Regression Analysis for Fuzzy/Crisp Input and Fuzzy/Crisp Output Data, Computational Statistics and Data Analysis 42, 47-72.
6. Tanaka, H. (1987), Fuzzy Data Analysis by Possibilistic Linear Models, Fuzzy Sets and Systems 24, 363-375.
7. Tanaka, H., Uejima, S. and Asai, K. (1982), Fuzzy Linear Regression Model, IEEE Trans. Systems Man Cybernet 12, 903-907.
8. Xu, R. and Li, C. (2001), Multidimensional Least-squares Fitting With a Fuzzy Model, Fuzzy Sets and Systems 119, 215-223.
9. Yang, M. S. and Ko, C. H. (1996), On a Class of c-numbers Clustering Procedures for Fuzzy Data, Fuzzy Sets and Systems 84, 49-60.
10. Yang, M. S. and Liu, H. H. (2003), Fuzzy Least-squares Algorithms for Interactive Fuzzy Linear Regression Models, Fuzzy Sets and Systems 135, 305-316.
11. Peña, D. (2005), A New Statistic for Influence in Linear Regression, Technometrics 47(1), 1-12.