
Intelligent Data Analysis 7 (2003) 59–73

IOS Press

Integrating rough set theory and fuzzy neural network to discover fuzzy rules

Shi-tong Wang^a, Dong-jun Yu^b and Jing-yu Yang^b

^a Department of Computer Science, School of Information, Southern Yangtse University, Jiangsu 214036, P.R. China
^b Department of Computer Science, Nanjing University of Science & Technology, Nanjing, Jiangsu 210094, P.R. China

Received 15 April 2002


Revised 15 June 2002
Accepted 25 June 2002

Abstract. Most fuzzy systems use the complete combination rule set based on partitions to discover fuzzy rules, often resulting in poor generalization and high computational complexity. To a large extent, this is because such fuzzy systems do not exploit the domain knowledge contained in the data. In this paper, based on rough set theory, a new generalized incremental rule extraction algorithm (GIREA) is presented to extract rough domain knowledge, namely certain and possible rules. A fuzzy neural network (FNN) is then used to refine the obtained rules and produce the final fuzzy rule set. Our approach and experimental results demonstrate its superiority in both rule length and the number of fuzzy rules.

Keywords: Rough set, fuzzy set, neural networks, incremental rule extraction

1. Introduction

In the real world, almost every problem ultimately requires processing data characterized by uncertainty and imprecision. To date, many researchers have developed a variety of approaches, such as neural networks [1], fuzzy systems [2], rough set theory [3] and genetic algorithms. Each approach has its own advantages and disadvantages, and a single approach is often not enough to build a flexible and robust information processing system. There is already a trend to integrate different computing paradigms, such as neural networks, fuzzy systems, rough set theory and genetic algorithms, into more efficient hybrid systems such as neuro-fuzzy systems [4].
Typically, a fuzzy neural network (FNN) combines the advantages of neural networks (NN) and fuzzy systems. In other words, an FNN can be used to construct a knowledge-based NN: human domain knowledge can be incorporated into the NN, making the FNN better suited to the problem to be solved. Problems remain, however. In some circumstances, people cannot even derive appropriate rules for a given system. Of course, one can divide every input dimension into several fuzzy subsets and then combine the fuzzy subsets of all input dimensions to construct the complete rule set. However, such an FNN contains no domain knowledge, i.e., it may not fit the given system at the outset. In recent years, rough set theory has attracted more and more attention and has been used in various applications, owing to its excellent capability of extracting knowledge from data.

In this paper, we first apply rough set theory to extract certain and possible rules, which are then used to determine the initial structure of the FNN, so that the FNN starts working with this type of useful knowledge.
In fuzzy rule extraction, two important problems are worth studying. One is how to extract a rule set from data; the other is how to refine/simplify the obtained rule set. Several approaches [1] can be applied to extract rules from data, such as fuzzy rule extraction based on product-space clustering, on ellipsoidal covariance learning, and on direct matching. The fuzzy rule simplification approach [12] based on similarity measures can effectively reduce the number of fuzzy rules by merging similar fuzzy sets within the rules. This paper addresses the above two problems from a different angle. The main contribution of our approach is the effective integration of rough set theory and the FNN to discover fuzzy rules from data. Concisely, the approach first extracts certain and possible rules from data in an incremental mode using the new generalized incremental rule extraction algorithm GIREA, and then applies the FNN to refine/simplify the extracted fuzzy rules.
This paper is organized as follows. Section 2 gives a brief description of fuzzy systems and the FNN. Section 3 introduces basic concepts of rough set theory. Section 4 presents the new generalized incremental rule extraction algorithm GIREA. Section 5 deals with the method of mapping the fuzzy rule set to the corresponding FNN. Simulation results are demonstrated in Section 6. Section 7 concludes the paper.

2. Fuzzy system and its fuzzy neural network

Generally speaking, a fuzzy system consists of a set of fuzzy rules as follows [5]:

Rule 1: if $x_1$ is $A_1^1$ and $x_2$ is $A_2^1$ and ... and $x_n$ is $A_n^1$, then $y$ is $B^1$
Rule 2: if $x_1$ is $A_1^2$ and $x_2$ is $A_2^2$ and ... and $x_n$ is $A_n^2$, then $y$ is $B^2$
...
Rule N: if $x_1$ is $A_1^N$ and $x_2$ is $A_2^N$ and ... and $x_n$ is $A_n^N$, then $y$ is $B^N$

Fact: $x_1$ is $A'_1$ and $x_2$ is $A'_2$ and ... and $x_n$ is $A'_n$

Conclusion: $y$ is $B'$.
With max-product inference and centroid defuzzification, the final output of this fuzzy system can be written as:

$$\hat{y} = \frac{\int \mu_{B'}(y)\, y\, dy}{\int \mu_{B'}(y)\, dy} \qquad (1)$$

where

$$\mu_{B'}(y) = \bigvee_{j=1}^{N} \bigvee_{x_1, x_2, \ldots, x_n} \left[ \prod_{i=1}^{n} \mu_{A'_i}(x_i) \cdot \prod_{i=1}^{n} \mu_{A_i^j}(x_i) \cdot \mu_{B^j}(y) \right].$$

Wang [6] has proved that the fuzzy system of Eq. (1) is a universal approximator.

Fig. 1. The FNN implementation of the fuzzy system.

In practice, one can often consider the output fuzzy sets $B^j$ to be singletons $\beta^j$, i.e.,

$$\mu_{B^j}(y) = \begin{cases} 1, & \text{if } y = \beta^j \\ 0, & \text{otherwise} \end{cases}, \qquad j = 1, 2, \ldots, N \qquad (2)$$

thus we have

$$\mu_{B'}(y) = \begin{cases} \displaystyle\prod_{i=1}^{n} \mu_{A_i^j}(x_i), & \text{if } y = \beta^j \\ 0, & \text{otherwise} \end{cases}, \qquad j = 1, 2, \ldots, N \qquad (3)$$
then the final output can be rewritten as:

$$\hat{y} = \frac{\displaystyle\sum_{j=1}^{N} \beta^j \prod_{i=1}^{n} \mu_{A_i^j}(x_i)}{\displaystyle\sum_{j=1}^{N} \prod_{i=1}^{n} \mu_{A_i^j}(x_i)} \qquad (4)$$

The I/O relationship of the fuzzy system defined in Eq. (4) can be implemented by a corresponding FNN. The FNN consists of four layers: the input layer, fuzzification layer, inference layer and defuzzification layer, as shown in Fig. 1.
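To make Eq. (4) concrete, the following is a minimal Python sketch of the singleton fuzzy system, assuming Gaussian membership functions; the rule centers, widths and singleton outputs below are illustrative, not taken from the paper.

```python
# A minimal sketch of the singleton fuzzy system of Eq. (4),
# assuming Gaussian membership functions.
import math

def gaussian(x, center, sigma):
    """Membership grade of x in a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fnn_output(x, rules):
    """Weighted-average defuzzification of Eq. (4).

    Each rule is (antecedents, beta), where antecedents is a list of
    (center, sigma) pairs, one per input dimension, and beta is the
    rule's singleton output.
    """
    num, den = 0.0, 0.0
    for antecedents, beta in rules:
        # Product inference: firing strength = product of memberships.
        w = 1.0
        for xi, (c, s) in zip(x, antecedents):
            w *= gaussian(xi, c, s)
        num += beta * w
        den += w
    return num / den if den > 0 else 0.0

# Two illustrative rules over a 2-dimensional input.
rules = [([(0.0, 1.0), (0.0, 1.0)], 1.0),
         ([(2.0, 1.0), (2.0, 1.0)], -1.0)]
print(fnn_output([0.5, 0.5], rules))
```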
Generally speaking, an FNN can be utilized in two modes: the series-parallel mode and the parallel mode [13,14]; see Figs 2(a) and (b), where TDL denotes time-delay logic, RS the real system and FNN the fuzzy neural network; $u_k$ is the excitation (input) signal, $y_k$ and $\hat{y}_k$ are the outputs of the RS and the FNN, respectively, and $e_k$ is the difference between $y_k$ and $\hat{y}_k$. Figure 2(a) shows the series-parallel mode and Fig. 2(b) the parallel mode. When the FNN works in series-parallel mode, all the delayed output data (used as the input data of the FNN) are observations of the real system. In this circumstance high observation precision is needed: too much observation noise will greatly degrade the performance of the FNN. In parallel mode, by contrast, the delayed output data (used as the input data of the FNN) are independent of the observations of the real system and relate only to the FNN itself.

Fig. 2. Two modes in which the FNN can be applied: (a) series-parallel mode; (b) parallel mode.

No matter which mode is used, once the FNN approximates the real system well enough, it can be applied independently. The FNN has been widely used, but the question described in Section 1 remains: when there is no prior domain knowledge, how can one obtain appropriate rules to construct the FNN so as to reduce its search space and training time? The rest of this paper addresses this problem.
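The difference between the two modes can be summarized in a few lines of Python. This is a sketch under the assumption of a one-step model `fnn(y_delayed, u)`; `identify` and the stand-in model are illustrative, not part of the paper.

```python
# Sketch: series-parallel vs. parallel operation of an identification model.
def identify(real_outputs, inputs, fnn, parallel=False):
    """In series-parallel mode the delayed inputs come from the real
    system; in parallel mode they come from the FNN's own predictions."""
    preds = [real_outputs[0]]              # shared initial condition
    for k in range(1, len(inputs)):
        delayed = preds[k - 1] if parallel else real_outputs[k - 1]
        preds.append(fnn(delayed, inputs[k]))
    return preds

# e.g. with a trivial stand-in model:
print(identify([0.0, 0.5, 0.8, 0.9], [0.0, 1.0, 1.0, 1.0],
               lambda y_prev, u: 0.5 * y_prev + 0.4 * u, parallel=True))
```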

3. Rough set, decision matrix and rule extraction

3.1. Basic concepts of rough sets

Here we introduce only the concepts needed in this paper; for details, please refer to [3].
An information system is $K = (U, C \cup D)$, where $U$ denotes the domain of discourse (the universe), $C$ denotes a non-empty set of condition attributes, and $D$ denotes a non-empty set of decision attributes. Let $A = C \cup D$; an attribute $a \in A$ can be regarded as a function from the universe $U$ to its value set $Val_a$.
An information system may be represented in the form of an attribute-value table, in which rows are labeled by the objects of the universe and columns by the attributes.
For every subset of attributes $B \subseteq C$, an equivalence relation $I_B$ on $U$ can be defined as:

$$I_B = \{(x, y) \in U \times U : \text{for every } a \in B,\ a(x) = a(y)\} \qquad (5)$$

thus the equivalence class of an object $x \in U$ with respect to $I_B$ can be defined as:

$$[x]_B = \{y \in U : y\, I_B\, x\} \qquad (6)$$

An equivalence class is also called an indiscernible class, because any two objects in an equivalence class are indiscernible.
Lower and upper approximations are two further important concepts in rough set theory. Given subsets $X \subseteq U$ and $B \subseteq C$, $X$'s B-lower and B-upper approximations are defined as $\underline{B}X = \{x \in U : [x]_B \subseteq X\}$ and $\overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}$, respectively. The boundary set $BN_B(X)$ is defined as $BN_B(X) = \overline{B}X - \underline{B}X$. If $BN_B(X) \neq \emptyset$, i.e. $\underline{B}X \neq \overline{B}X$, then $X$ is B-rough; otherwise $X$ is B-exact.
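These definitions translate directly into code. The following Python sketch computes equivalence classes and B-lower/B-upper approximations over a small attribute-value table; the encoding (Yes = 0, No = 1, Normal = 0, High = 1, Very High = 2) follows Section 3.2, and the function names are ours.

```python
# Sketch: equivalence classes and B-lower/B-upper approximations.
def equivalence_classes(universe, B):
    """Partition the universe by equality on the attributes in B."""
    classes = {}
    for obj_id, obj in universe.items():
        key = tuple(obj[a] for a in B)
        classes.setdefault(key, set()).add(obj_id)
    return list(classes.values())

def approximations(universe, B, X):
    """Return (lower, upper) B-approximations of the concept X."""
    lower, upper = set(), set()
    for cls in equivalence_classes(universe, B):
        if cls <= X:
            lower |= cls          # [x]_B lies entirely inside X
        if cls & X:
            upper |= cls          # [x]_B meets X
    return lower, upper

# Table 1 encoded with H: Yes=0/No=1 and T: Normal=0/High=1/Very High=2.
U = {1: {'H': 0, 'T': 0}, 2: {'H': 0, 'T': 1}, 3: {'H': 0, 'T': 2},
     4: {'H': 1, 'T': 0}, 5: {'H': 1, 'T': 1}, 6: {'H': 1, 'T': 2}}
X_flu = {2, 3, 6}
print(approximations(U, ['H', 'T'], X_flu))  # consistent table: lower == upper
```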

3.2. Rule extraction using decision matrix

The decision matrix is a generalized construct within rough set theory. The concept of the decision matrix derives from discernibility matrices [8]; it can be used to compute decision rules and reducts of an information system. It provides a way to generate the simplest set of rules while preserving all classification information [9].

Table 1
Consistent information table

Object     Headache   Temperature   Flu
Object1    Yes        Normal        No
Object2    Yes        High          Yes
Object3    Yes        Very High     Yes
Object4    No         Normal        No
Object5    No         High          No
Object6    No         Very High     Yes

Table 2
Decision matrix for class 0 (flu infected)

i \ j         1: Obj1       2: Obj4       3: Obj5
1: Obj2       (T,1)         (T,1)(H,0)    (H,0)
2: Obj3       (T,2)         (T,2)(H,0)    (T,2)(H,0)
3: Obj6       (H,1)(T,2)    (T,2)         (T,2)

3.2.1. Rule extraction from a consistent information table

Let us introduce the decision matrix first. For an information system $K = (U, C \cup D)$, suppose $U$ is divided into $m$ classes $(c_1, c_2, \ldots, c_m)$ by the equivalence relation defined on $D$. Given any class $c \in (c_1, c_2, \ldots, c_m)$, the objects belonging to and not belonging to this class are numbered with subscripts $i$ $(i = 1, 2, \ldots, \gamma)$ and $j$ $(j = 1, 2, \ldots, \rho)$, respectively. The decision matrix $M(K) = (M_{ij})$ of the information system $K$ is defined as a $\gamma \times \rho$ matrix whose entry at position $(i, j)$ is a set of attribute-value pairs:

$$M_{ij} = \{(a, a(i)) : a(i) \neq a(j)\}, \qquad (i = 1, 2, \ldots, \gamma;\ j = 1, 2, \ldots, \rho), \qquad (7)$$

where $a(i)$ is the value of attribute $a$ on object $i$.
For a given object $i$ $(i = 1, 2, \ldots, \gamma)$ belonging to class $c \in (c_1, c_2, \ldots, c_m)$, we can compute its minimal-length decision rule

$$|B_i| = \bigwedge_j \bigvee M_{ij}, \qquad (8)$$

where $\wedge$ and $\vee$ are generalized conjunction and disjunction operators, respectively. So for the given class $c \in (c_1, c_2, \ldots, c_m)$, its decision rule set can be represented as

$$RUL = \bigcup_i |B_i|, \qquad (i = 1, 2, \ldots, \gamma) \qquad (9)$$
Consider Table 1. Let H represent Headache, and let T and F represent Temperature and Flu, respectively.
$VAL_H = \{0, 1\}$ encodes $VAL_{Headache} = \{\text{Yes}, \text{No}\}$.
$VAL_T = \{0, 1, 2\}$ encodes $VAL_{Temperature} = \{\text{Normal}, \text{High}, \text{Very High}\}$.
$VAL_F = \{0, 1\}$ encodes $VAL_{Flu} = \{\text{Yes}, \text{No}\}$.
Tables 2 and 3 show the decision matrices for class 0 (flu infected) and class 1 (not infected), respectively.
Let $|B_i^0|$ $(i = 1, 2, 3)$ denote the $i$-th minimal-length rule in the decision matrix of class 0. So,

$$|B_1^0| = (T,1) \wedge ((T,1) \vee (H,0)) \wedge (H,0) = (T,1) \wedge (H,0)$$

Table 3
Decision matrix for class 1 (flu not infected)

i \ j         1: Obj2       2: Obj3       3: Obj6
1: Obj1       (T,0)         (T,0)         (H,0)(T,0)
2: Obj4       (T,0)(H,1)    (T,0)(H,1)    (T,0)
3: Obj5       (H,1)         (H,1)(T,1)    (T,1)

Table 4
Inconsistent information table

Object     Headache   Temperature   Flu
Object1    Yes        Normal        No
Object2    Yes        High          Yes
Object3    Yes        Very High     Yes
Object4    No         Normal        No
Object5    No         High          No
Object6    No         Very High     Yes
Object7    No         High          Yes
Object8    No         Very High     No

$$|B_2^0| = (T,2) \wedge ((T,2) \vee (H,0)) \wedge ((T,2) \vee (H,0)) = (T,2)$$

$$|B_3^0| = ((H,1) \vee (T,2)) \wedge (T,2) \wedge (T,2) = (T,2)$$

Similarly, the $i$-th minimal-length rule in the decision matrix of class 1 can be computed as follows:

$$|B_1^1| = (T,0) \wedge (T,0) \wedge ((H,0) \vee (T,0)) = (T,0)$$

$$|B_2^1| = ((T,0) \vee (H,1)) \wedge ((T,0) \vee (H,1)) \wedge (T,0) = (T,0)$$

$$|B_3^1| = (H,1) \wedge ((H,1) \vee (T,1)) \wedge (T,1) = (T,1) \wedge (H,1)$$

The final minimal-length decision rule sets for class 0 and class 1 can be represented as

$$RUL^0 = (T,2) \vee ((T,1) \wedge (H,0))$$

$$RUL^1 = (T,0) \vee ((T,1) \wedge (H,1))$$
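As a sketch of Eqs (7)-(9): building the decision matrix is direct, and minimizing the conjunction of disjunctions in Eq. (8) amounts to finding a smallest set of the row object's attribute-value pairs that intersects every entry $M_{ij}$ (a minimal hitting set), found here by brute force. This assumes a consistent table, and the helper names are ours. Run on Table 1, it reproduces $|B_1^0| = (T,1) \wedge (H,0)$ and $|B_2^0| = |B_3^0| = (T,2)$.

```python
# Sketch: decision-matrix construction (Eq. 7) and minimal rules (Eq. 8).
from itertools import combinations

def decision_matrix(universe, attrs, in_class, out_class):
    """M[i][j] = attribute-value pairs of object i that differ from object j."""
    return {i: {j: {(a, universe[i][a]) for a in attrs
                    if universe[i][a] != universe[j][a]}
                for j in out_class}
            for i in in_class}

def minimal_rule(row):
    """Smallest set of (attribute, value) pairs intersecting every entry M_ij."""
    pairs = set().union(*row.values())
    for size in range(1, len(pairs) + 1):
        for cand in combinations(sorted(pairs), size):
            if all(set(cand) & entry for entry in row.values()):
                return set(cand)
    return pairs   # fallback (only reached for an inconsistent row)

U = {1: {'H': 0, 'T': 0}, 2: {'H': 0, 'T': 1}, 3: {'H': 0, 'T': 2},
     4: {'H': 1, 'T': 0}, 5: {'H': 1, 'T': 1}, 6: {'H': 1, 'T': 2}}
flu, no_flu = {2, 3, 6}, {1, 4, 5}
M = decision_matrix(U, ['H', 'T'], flu, no_flu)
for i in sorted(flu):
    print(i, minimal_rule(M[i]))   # 2 -> {(T,1),(H,0)}; 3, 6 -> {(T,2)}
```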

3.3. Rule extraction from an inconsistent information table using the decision matrix

In real-life applications a consistent information table often does not exist, so inconsistent information has to be coped with.
Suppose we add Object7 and Object8 to Table 1, obtaining Table 4. Table 4 is an inconsistent information table because some objects have the same condition attribute values but different corresponding decision attribute values. For example, Object5 and Object7 have the same condition attribute values but different decision attribute values.
From Table 4 we obtain two concepts, $X_1 = \{Object2, Object3, Object6, Object7\}$ and $X_2 = \{Object1, Object4, Object5, Object8\}$, representing flu infected and flu not infected, respectively. These two concepts are rough because neither of them is definable. To extract rules from an inconsistent information table, lower and upper approximations are needed: rules extracted from the lower approximation are certain rules, and rules extracted from the upper approximation are possible rules.

Table 5
Decision matrix for computing concept X1's certain rules

i \ j        1: Object1    2: Object4    3: Object5    4: Object6    5: Object7    6: Object8
1: Object2   (T,1)         (H,0)(T,1)    (H,0)         (H,0)(T,1)    (H,0)         (H,0)(T,1)
2: Object3   (T,2)         (H,0)(T,2)    (H,0)(T,2)    (H,0)         (H,0)(T,2)    (H,0)

Table 6
Decision matrix for computing concept X1's possible rules

i \ j        1: Object1    2: Object4
1: Object2   (T,1)         (T,1)(H,0)
2: Object3   (T,2)         (T,2)(H,0)
3: Object5   (H,1)(T,1)    (T,1)
4: Object6   (H,1)(T,2)    (T,2)
5: Object7   (H,1)(T,1)    (T,1)
6: Object8   (H,1)(T,2)    (T,2)

Firstly, we compute concepts $X_1$ and $X_2$'s lower and upper approximations:

$$\underline{B}X_1 = \{Object2, Object3\}$$
$$\underline{B}X_2 = \{Object1, Object4\}$$
$$\overline{B}X_1 = \{Object2, Object3, Object5, Object6, Object7, Object8\}$$
$$\overline{B}X_2 = \{Object1, Object4, Object5, Object6, Object7, Object8\}$$
Let $|B_i^0|_{certain}$ $(i = 1, 2)$ denote the $i$-th minimal-length certain rule in the decision matrix of class 0. Using the method of Section 3.2.1, we can compute the certain rules for concept $X_1$ (class 0) from Table 5 as follows:

$$|B_1^0|_{certain} = (T,1) \wedge ((H,0) \vee (T,1)) \wedge (H,0) \wedge ((H,0) \vee (T,1)) \wedge (H,0) \wedge ((H,0) \vee (T,1)) = (T,1) \wedge (H,0)$$

$$|B_2^0|_{certain} = (T,2) \wedge ((H,0) \vee (T,2)) \wedge ((H,0) \vee (T,2)) \wedge (H,0) \wedge ((H,0) \vee (T,2)) \wedge (H,0) = (T,2) \wedge (H,0)$$

thus we obtain the certain rule set for class 0:

$$RUL^0_{certain} = ((T,1) \wedge (H,0)) \vee ((T,2) \wedge (H,0))$$

For certain rules, we define the belief function $df = 1$. In other words, rules with $df = 1$ are positively believable.
Let $|B_i^0|_{possible}$ denote the $i$-th minimal-length possible rule in the decision matrix of class 0. Similarly, we can use the same method to compute the possible rules for concept $X_1$ from Table 6 as follows:

$$|B_1^0|_{possible} = (T,1) \wedge ((T,1) \vee (H,0)) = (T,1)$$
$$|B_2^0|_{possible} = (T,2) \wedge ((T,2) \vee (H,0)) = (T,2)$$
$$|B_3^0|_{possible} = ((H,1) \vee (T,1)) \wedge (T,1) = (T,1)$$
$$|B_4^0|_{possible} = ((H,1) \vee (T,2)) \wedge (T,2) = (T,2)$$
$$|B_5^0|_{possible} = ((H,1) \vee (T,1)) \wedge (T,1) = (T,1)$$
$$|B_6^0|_{possible} = ((H,1) \vee (T,2)) \wedge (T,2) = (T,2)$$

thus we obtain the possible rule set for class 0 as follows:

$$RUL^0_{possible} = (T,1) \vee (T,2) \vee (T,1) \vee (T,2) \vee (T,1) \vee (T,2) = (T,1) \vee (T,2)$$

For possible rules, we define their belief function as

$$df = 1 - \frac{card(\overline{B}X - \underline{B}X)}{card(U)}$$

where $card(\cdot)$ denotes the cardinality of a set. In other words, possible rules are believable with degree $df$, $0 < df < 1$. The rationale of this definition is intuitive: the larger the difference between $\underline{B}X$ and $\overline{B}X$, the more inexact the concept $X$, so the belief degree of the possible rules extracted from $X$ should decrease accordingly; as $\underline{B}X$ approaches $\overline{B}X$, $df$ approaches 1. For concept $X_1$ above, $card(\overline{B}X_1 - \underline{B}X_1) = 4$ and $card(U) = 8$, so $df = 1 - 4/8 = 0.5$.
Similarly, we can compute concept $X_2$'s certain and possible rules.
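In code, the belief degree follows directly from the approximation sizes; a one-line sketch, with the helper name ours:

```python
# Sketch: belief degree of possible rules for a concept X.
def belief_degree(lower, upper, universe_size):
    """df = 1 - |upper - lower| / |U|; equals 1 exactly when X is B-exact."""
    return 1 - len(upper - lower) / universe_size

# For X1 in Table 4: |upper - lower| = 4 and |U| = 8, so df = 0.5.
print(belief_degree({2, 3}, {2, 3, 5, 6, 7, 8}, 8))   # 0.5
```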

4. New Generalized Incremental Rule Extraction Algorithm (GIREA)

Suppose we have extracted certain and possible rules from an information table. When new objects are added to it, the rule set may change. In this circumstance an incremental rule extraction algorithm is required; otherwise it takes much longer to re-compute the rule set from scratch. It should be pointed out that the incremental rule extraction algorithm in [9] cannot compute certain and possible rules while also coping with inconsistent information tables. The new generalized incremental rule extraction algorithm (GIREA) presented here, a generalization of the algorithm in [9], can deal with both consistent and inconsistent information tables and extract the certain and possible rule sets at the same time. The main idea of this new algorithm can be summarized as follows.
Given a newly added object:
– Does the new object give rise to a new concept? If so, update the concept set.
– Collision detection: Object_a collides with Object_b if and only if Object_a and Object_b have the same condition attribute values but different corresponding decision attribute values. For example, Object6 and Object8 collide with each other (see Section 3.3); a sketch of this test is given below.
– Update the certain and possible rule sets according to the collision detection.
With this algorithm, when a new object is added to the information system, it is unnecessary to re-compute the rule sets from scratch; the rule sets are updated by partly modifying the original ones, which saves a great deal of time. This is especially useful when extracting rules from large databases.
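The collision test is simple to state in code; a minimal sketch, with illustrative names, assuming objects are stored as attribute-value dictionaries:

```python
# Sketch: GIREA's collision test. Two objects collide when their condition
# attributes agree but their decision values differ.
def collides(obj_a, obj_b, cond_attrs, dec_attr):
    same_conditions = all(obj_a[a] == obj_b[a] for a in cond_attrs)
    return same_conditions and obj_a[dec_attr] != obj_b[dec_attr]

def detect_collisions(table, new_obj, cond_attrs, dec_attr):
    """Return the ids of existing objects that collide with the new one."""
    return [i for i, obj in table.items()
            if collides(obj, new_obj, cond_attrs, dec_attr)]
```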
GIREA algorithm:
Condition: the rule sets and the concept set X = {X_1, X_2, ..., X_γ} have been computed from the given information system; a new object Object_new is added to the information system.

BEGIN
STEP 1.
  Determine which concept the newly added object belongs to; if it does not belong to any
  concept in the concept set X = {X_1, X_2, ..., X_γ}, create a new concept X_{γ+1} and add it
  to X, i.e. X = X ∪ {X_{γ+1}}.
STEP 2.  // Collision detection
  IF (the new object Object_new collides with an original object in the information table)
    FLAG = 1;
  ELSE
    FLAG = 0;
STEP 3.
  Get a concept X_i from X, and set X = X − {X_i}.
  IF (FLAG = 0)  // no collision
  {
    IF (Val(X_i) = Val(Object_new))
    {
      Add a new row to concept X_i's certain and possible decision matrices respectively
      (labeled k1 and k2 respectively):
        M_{k1,j} = {(a, a(k1)) | a(k1) ≠ a(j)}
        M_{k2,j} = {(a, a(k2)) | a(k2) ≠ a(j)}
      Compute the decision rule for each added row:
        |B_{k1}| = ∧_j ∨ M_{k1,j}
        |B_{k2}| = ∧_j ∨ M_{k2,j}
      Update concept X_i's certain and possible rule sets as follows:
        RUL^i_certain = RUL^i_certain ∪ |B_{k1}|
        RUL^i_possible = RUL^i_possible ∪ |B_{k2}|
    }
    ELSE
    {
      Add a new column to concept X_i's certain and possible decision matrices respectively
      (labeled k1 and k2 respectively):
        M_{i,k1} = {(a, a(i)) | a(i) ≠ a(k1)}
        M_{i,k2} = {(a, a(i)) | a(i) ≠ a(k2)}
      Update the decision rule of every row:
        |B_i|_certain = |B_i|_certain ∧ (∨ M_{i,k1})
        |B_i|_possible = |B_i|_possible ∧ (∨ M_{i,k2})
      Update concept X_i's certain and possible rule sets as follows:
        RUL^i_certain = ∪ |B_i|_certain
        RUL^i_possible = ∪ |B_i|_possible
    }
  }
  ELSE  // collision detected
  {
    IF (Object_new collides with an Object* that exists in concept X_i's lower approximation)
    {
      Delete the row containing Object* from the certain decision matrix of concept X_i
      (labeled l). Update the certain rule set as follows:
        RUL^i_certain = RUL^i_certain − |B_l|_certain
      Then add a new column to the certain decision matrix of concept X_i (labeled k).
      Update every row's decision rule as follows:
        |B_i|_certain = |B_i|_certain ∧ (∨ M_{i,k})
      Update the final certain rule set as follows:
        RUL^i_certain = ∪ |B_i|_certain
      Add a new row to the possible decision matrix of concept X_i (labeled k):
        M_{k,j} = {(a, a(k)) | a(k) ≠ a(j)}
      Compute the possible decision rule for this row:
        |B_k|_possible = ∧_j ∨ M_{k,j}
      Update the final possible rule set as follows:
        RUL^i_possible = RUL^i_possible ∪ |B_k|_possible
    }
    ELSE IF (Val(X_i) = Val(Object_new))
    {
      Add a new column to the certain decision matrix of concept X_i (labeled k):
        M_{i,k} = {(a, a(i)) | a(i) ≠ a(k)}
      Update every row's decision rule as follows:
        |B_i|_certain = |B_i|_certain ∧ (∨ M_{i,k})
      Update the final certain rule set as follows:
        RUL^i_certain = ∪ |B_i|_certain
      Delete the column containing Object* from the possible decision matrix of concept X_i
      and add a new row (Object_new) to it;
      Calculate each row's possible rule |B_i|_possible;
      Calculate RUL^i_possible as: RUL^i_possible = ∪ |B_i|_possible
    }
    ELSE
    {
      Add a new column to concept X_i's certain and possible decision matrices respectively
      (labeled k1 and k2 respectively):
        M_{i,k1} = {(a, a(i)) | a(i) ≠ a(k1)}
        M_{i,k2} = {(a, a(i)) | a(i) ≠ a(k2)}
      Update the decision rule of every row:
        |B_i|_certain = |B_i|_certain ∧ (∨ M_{i,k1})
        |B_i|_possible = |B_i|_possible ∧ (∨ M_{i,k2})
      Update concept X_i's certain and possible rule sets as follows:
        RUL^i_certain = ∪ |B_i|_certain
        RUL^i_possible = ∪ |B_i|_possible
    }
  }
STEP 4.
  IF (X ≠ ∅)
    GOTO STEP 3;
  ELSE
    STOP.
END
A question one may raise here is that when a new object is added to the universe $U$, the cardinality of $U$ changes, so the belief degrees of the possible rules must be recomputed; this affects the entire learned rule set and would seem to make the algorithm non-incremental. We analyze this as follows: according to the definition of the belief function in Section 3.3, the belief degrees of possible rules extracted from the same concept are equal. When a new object is added, recomputing each concept's belief function yields the belief degrees of all its possible rules. Moreover, the incrementality of the proposed algorithm comes from properly modifying the already existing rules; belief degree recomputation is only a small part of this modification. Compared with the computational cost of rule modification, the cost of recomputing belief degrees is rather small.

5. Mapping rules into the FNN

When certain and possible rules have been extracted from the information table, we need to map them into the corresponding FNN, just as fuzzy rules are mapped to the FNN described in Section 2.
Taking the rules extracted in Section 3.3 as an example, there are 3 certain rules and 3 possible rules in the rule set, as follows:
Certain rules:

$$RUL^0_{certain} = ((T,1) \wedge (H,0)) \vee ((T,2) \wedge (H,0))$$
$$RUL^1_{certain} = (T,0)$$

Possible rules:

$$RUL^0_{possible} = (T,1) \vee (T,2)$$
$$RUL^1_{possible} = (H,1)$$
We can describe these rules in natural language as follows:
(1) If Temperature is High and Headache is Yes, then Flu is Infected. (df1 = 1)
(2) If Temperature is Very High and Headache is Yes, then Flu is Infected. (df2 = 1)
(3) If Temperature is Normal, then Flu is Not Infected. (df3 = 1)
Rules (1), (2) and (3) are certain rules, whose belief degrees (df) are all 1, i.e., these certain rules are definitely believable.
(4) If Temperature is High, then Flu is Infected. (df4 = 0.5)
(5) If Temperature is Very High, then Flu is Infected. (df5 = 0.5)
(6) If Headache is No, then Flu is Not Infected. (df6 = 0.5)
Rules (4), (5) and (6) are possible rules, whose belief degrees (df) lie between 0 and 1, i.e., these possible rules are only partially believable.
As there are two kinds of rules (certain and possible), the inference layer of the corresponding FNN consists of two parts, as shown in Fig. 3: a certain part, which contains the certain rules, and a possible part, which contains the possible rules.
Fig. 3. Mapping rules to FNN.

Let $df_i$ be the belief degree of the $i$-th rule. The final fitness of the $i$-th rule in the FNN can be measured by $df_i \times \alpha_i$, where $\alpha_i$ is the fitness (firing strength) of the $i$-th rule in the conventional sense.
Let $x$ be the input variable for Headache and $y$ the input variable for Temperature; let $C_1$ represent flu not infected and $C_2$ flu infected. Define two fuzzy sets, "Yes" and "No", on the input dimension $x$, and three fuzzy sets "N", "H" and "V" on the input dimension $y$, where "N", "H" and "V" represent "Normal", "High" and "Very High", respectively. The six rules described above can then be mapped into the FNN as shown in Fig. 3.
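The following Python sketch illustrates how df-weighted inference over the mapped rules might look, using product inference for the firing strengths and a max over rules of the same class. The membership grades are illustrative stand-ins for the fuzzification layer's outputs, and all names are ours, not the paper's implementation.

```python
# Sketch: df-weighted rule firing (fitness = df_i * alpha_i).
def infer(rules, memberships):
    """rules: list of (antecedent pairs, df, class); memberships: dict
    mapping (variable, fuzzy set) -> grade. Returns per-class support."""
    support = {}
    for antecedent, df, cls in rules:
        alpha = 1.0
        for var, fset in antecedent:          # product inference
            alpha *= memberships[(var, fset)]
        support[cls] = max(support.get(cls, 0.0), df * alpha)
    return support

rules = [([('T', 'H'), ('Hd', 'Yes')], 1.0, 'infected'),   # certain rule (1)
         ([('T', 'H')], 0.5, 'infected'),                  # possible rule (4)
         ([('T', 'N')], 1.0, 'not infected')]              # certain rule (3)
m = {('T', 'H'): 0.8, ('T', 'N'): 0.1, ('Hd', 'Yes'): 0.9}
print(infer(rules, m))   # {'infected': 0.72, 'not infected': 0.1}
```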

6. Numerical simulations

In this section, numerical simulations demonstrate our approach's superiority over the rule extraction approach that uses only the conventional FNN [1].
Consider the nonlinear system:

$$y(t+1) = \frac{y(t)\, y(t-1)\, (y(t) + 2.5)}{1 + y^2(t) + y^2(t-1)} + u(t), \qquad u(t) = \sin\frac{2\pi t}{25} \qquad (10)$$

where $u(t)$ is the excitation (input) signal, with initial conditions $y(0) = 0.9$, $y(-1) = 0.5$.
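A short sketch that generates the training trajectory of Eq. (10) under the stated initial conditions; the function name is ours.

```python
# Sketch: simulating the benchmark system of Eq. (10).
import math

def simulate(steps, y0=0.9, y_minus1=0.5):
    y_prev, y = y_minus1, y0
    trajectory = [y_prev, y]
    for t in range(steps):
        u = math.sin(2 * math.pi * t / 25)
        y_next = y * y_prev * (y + 2.5) / (1 + y ** 2 + y_prev ** 2) + u
        trajectory.append(y_next)
        y_prev, y = y, y_next
    return trajectory

print(simulate(5))
```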

Method 1: use the conventional FNN [1].

First, we divide the input interval on each dimension into three equal sub-intervals and then define three fuzzy subsets on them (see Fig. 4). Figure 4 shows how the fuzzy sets are defined on the sub-intervals, where S, M and L represent the fuzzy sets "Small", "Middle" and "Large", respectively, and $y_{min}$ and $y_{max}$ are the minimum and maximum values that may be taken on dimension $y$.

Fig. 4. Defining fuzzy sets on the y dimension.

Table 7
Performance comparison between Method 1 and Method 2

Method      R     ARL    No. of iterations
Method 1    27    3      200
Method 2    20    2.2    89

We define the average rule length (ARL) as:

$$ARL = \frac{\displaystyle\sum_{i=1}^{R} P_i}{R} \qquad (11)$$

where $R$ is the number of rules and $P_i$ is the number of premise variables in the $i$-th rule.
Using the complete combination rule set, there are 27 ($3 \times 3 \times 3$) rules and the ARL is 3 (because each rule has 3 premise variables).
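As a quick check of Eq. (11) for Method 1's complete combination rule set:

```python
# Sketch: ARL of the complete combination rule set (3 fuzzy sets per
# dimension, 3 input dimensions).
from itertools import product

rules = list(product(['S', 'M', 'L'], repeat=3))   # 27 premise combinations
ARL = sum(len(r) for r in rules) / len(rules)
print(len(rules), ARL)   # 27 rules, ARL = 3.0
```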
Method 2: use the approach of this paper, i.e.:
– Discretize the samples (quantize continuous attribute values); a minimal sketch follows this list. To allow comparison with Method 1, each input dimension is likewise divided into 3 equal sub-intervals.
– To demonstrate the incrementality of the proposed algorithm GIREA, start with an empty information table, then gradually add the samples to it (one at a time), extracting certain and possible rules with GIREA until all samples have been processed.
– Map the rules to the FNN, and use the FNN to refine the rules obtained in the previous step.
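A minimal discretization sketch, assuming equal-width intervals with known bounds; the bounds below are illustrative.

```python
# Sketch: equal-width discretization into n_bins sub-intervals per dimension.
def discretize(value, vmin, vmax, n_bins=3):
    """Map a continuous value to an interval index 0..n_bins-1."""
    if value >= vmax:
        return n_bins - 1
    if value <= vmin:
        return 0
    return int((value - vmin) / (vmax - vmin) * n_bins)

print([discretize(v, 0.0, 3.0) for v in (0.2, 1.4, 2.9)])   # [0, 1, 2]
```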
Using Method 2, we obtained 20 rules with an average rule length of 2.2 (see Table 7).
In our experiment, to reach the same approximation level, the numbers of iterations for Method 1 and Method 2 were 200 and 89, respectively. Figure 5 shows the final identification results of Method 1 and Method 2 (each FNN runs independently after training, starting from initial state values different from those of the real system: the real system uses $y(0) = 0.9$, $y(-1) = 0.5$, while both FNNs use the same initial state values $y(0) = 0.4$, $y(-1) = 0.2$). Table 7 compares the performance of Method 1 and Method 2.
From Fig. 5 we can see that, compared with Method 1, Method 2 yields a simpler rule set and a faster learning speed. The reason is that the FNN based on our approach contains "knowledge" obtained from the sample data.
Figure 6 shows the final identification results of Method 1 and Method 2 after 20% white Gaussian noise is added. It is easy to see that the FNN based on Method 2 is more robust than the FNN based on Method 1.
Another experiment was done to demonstrate the performance superiority of the proposed GIREA over the conventional rule extraction algorithm.

Fig. 5. (a) and (b) are identification results using Method 1 and Method 2, respectively. Small dots: real system (initial state y(0) = 0.9, y(−1) = 0.5); big dots: FNN (initial state y(0) = 0.4, y(−1) = 0.2).

Fig. 6. (a) and (b) are identification results using Method 1 and Method 2, respectively, with 20% white Gaussian noise added. Small dots: real system (initial state y(0) = 0.9, y(−1) = 0.5); big dots: FNN (initial state y(0) = 0.4, y(−1) = 0.2).

Suppose there are 100 samples in the original sample set, and rules have been extracted from it using the conventional rule extraction algorithm; take the time used as the benchmark time 1. Now another 20 samples are added to the sample set. The relative time to re-extract rules with the conventional rule extraction algorithm is 1.19, while re-extraction with GIREA takes only 1.08, as shown in Table 8. The reason is that when new objects are added, the proposed GIREA updates the rule set by partly modifying the original rule set, whereas the conventional rule extraction algorithm must re-compute the rule set from scratch.

7. Conclusions

How to get rules from data without expert knowledge is a bottleneck of knowledge discovery. Our approach integrates rough set theory and the FNN to discover knowledge. The rule set obtained by GIREA has fewer rules and shorter rule lengths. Simulation results show our approach's effectiveness and its advantages over the conventional FNN. The reason is that our approach exploits the distribution characteristics of the sample data to extract a "better" rule set, so the FNN built on this rule set has a "better" topology and, accordingly, better robustness and learning speed.

Table 8
Performance comparison between the conventional rule extraction
algorithm and GIREA (times are relative to the benchmark time 1)

Algorithm                                      Time used
Conventional rule extraction algorithm         1.19
GIREA                                          1.08

Further studies should focus on the theory and practice of static- and dynamic-topology-changeable FNNs and on knowledge discovery.

Acknowledgement

This work is financially supported by the National Science Foundation of China. The authors would like to thank the anonymous reviewers for their valuable comments.

ABOUT AUTHORS

Wang Shitong: Professor in computer science
Yu Dongjun: Ph.D. candidate in computer science
Yang Jing-yu: Professor in computer science

References

[1] S.T. Wang, Fuzzy Systems and Fuzzy Neural Networks, Shanghai Science and Technology Press, 1998, 1st edition.
[2] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965), 338–353.
[3] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer, Dordrecht, The Netherlands, 1991.
[4] M. Banerjee et al., Rough fuzzy MLP: knowledge encoding and classification, IEEE Trans. Neural Networks 9(6) (1998), 1203–1216.
[5] C.T. Lin, Neural Fuzzy Systems, Prentice-Hall, USA, 1997.
[6] L.X. Wang, A Course on Fuzzy Systems, Prentice-Hall, USA, 1999.
[7] S. Wang and D. Yu, Error analysis in nonlinear system identification using fuzzy systems, Journal of Software Research 11(4) (2000), 447–452.
[8] A. Skowron and C. Rauszer, The discernibility matrices and functions in information systems, in: Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, ed., Kluwer, Dordrecht, The Netherlands, 1992, pp. 331–362.
[9] N. Shan and W. Ziarko, An incremental learning algorithm for constructing decision rules, in: Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, ed., Springer-Verlag, 1994, pp. 326–334.
[10] P. Wang, Constructive theory for fuzzy systems, Fuzzy Sets and Systems 88(2) (1997), 1040–1045.
[11] Z. Mao et al., Topology-changeable neural networks, Control Theory and Applications 16(1), 54–60.
[12] M. Setnes et al., Similarity measures in fuzzy rule base simplification, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics 28(3) (June 1998).
[13] K.S. Narendra and K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Networks 1(1) (March 1990), 4–23.
[14] J. Lu, W. Xu and Z. Han, Research on parallel identification algorithms of neural networks, Control Theory and Applications 15(5) (1998), 741–745.
