TuBao Ho
Japan Advanced Institute of
Science and Technology
Tatsunokuchi, Ishikawa
923-1292 Japan
+81-761-51-1730
bao@jaist.ac.jp
TrongDung Nguyen
Japan Advanced Institute of
Science and Technology
Tatsunokuchi, Ishikawa
923-1292 Japan
+81-761-51-1732
nguyen@jaist.ac.jp
ABSTRACT
Viewing knowledge discovery as a user-centered process that
requires effective collaboration between the user and the
discovery system, our work aims to support an active role for the
user in that process by developing synergistic visualization tools
integrated in our discovery system D2MS. These tools provide the
ability to visualize the entire process of knowledge discovery in
order to help the user with data preprocessing, selecting mining
algorithms and parameters, evaluating and comparing discovered
models, and taking control of the whole discovery process. Our
case-studies with two medical datasets on meningitis and stomach
cancer show that, with the visualization tools in D2MS, the user
gains better insight into each step of the knowledge discovery
process as well as the relationship between data and discovered
knowledge.
Keywords
model selection, knowledge discovery process,
knowledge visualization, the user's active role.
DungDuc Nguyen
Japan Advanced Institute of
Science and Technology
Tatsunokuchi, Ishikawa
923-1292 Japan
+81-761-51-1732
dungduc@jaist.ac.jp
1. INTRODUCTION
The process of knowledge discovery in databases (KDD) inherently
consists of five steps: (1) understanding the application domain,
(2) data preprocessing, (3) data mining, (4) post-processing, and
(5) applying discovered knowledge, where each step requires many
decisions to be made by the user [10]. To find implicit but
potentially useful patterns/models in large databases, one cannot
expect simply to push a large amount of data into a KDD system
without the user's participation. In other words, the KDD process
can alternatively be viewed as a process of model selection, i.e.,
one in which the user chooses the most interesting discovered
patterns/models, or the algorithms and settings for obtaining such
patterns/models, in a given application. Model selection in KDD is
a complicated human-centered and domain-centered process in which
the participation of the user plays a key role in its success.
[Figure: system diagram; legible label: Graphical User Interface]
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
SIGKDD '02, July 23-26, 2002, Edmonton, Alberta, Canada.
Copyright 2002 ACM 1-58113-567-X/02/0007...$5.00.
[Figure: system diagram; legible labels: Data, Data Mining, model base, Select & Apply]
3.2.1 Viewing rules
Each rule is displayed as a polyline that passes through the axes
containing the attribute values that occur in the antecedent of the
rule and leads to the consequent of the rule, which is displayed
in a different color. In the case of prediction rules, the ratio
associated with each class in the class attribute corresponds to the
number of instances of the class covered by the rule over the total
number of instances in the class. This view gives a first
impression of the rule quality.
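The per-class ratio described for prediction rules, and the sequence of axes a rule's polyline passes through, can be sketched as follows. This is a minimal illustration and not the D2MS implementation: the `Rule` class, the function names, and the toy records with their attribute names are all hypothetical, and a rule is assumed to "cover" an instance when the instance matches every attribute value in the antecedent.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule representation (not taken from D2MS)."""
    antecedent: dict  # attribute name -> required value
    consequent: str   # predicted class label


def covers(rule, record):
    """Assumed coverage test: every antecedent condition matches."""
    return all(record.get(attr) == value
               for attr, value in rule.antecedent.items())


def class_coverage(rule, records, class_attr):
    """For each class: covered instances of the class / all instances of it."""
    totals, covered = {}, {}
    for record in records:
        label = record[class_attr]
        totals[label] = totals.get(label, 0) + 1
        if covers(rule, record):
            covered[label] = covered.get(label, 0) + 1
    return {label: covered.get(label, 0) / totals[label] for label in totals}


def polyline(rule, axis_order, class_attr):
    """(axis, value) vertices: antecedent axes in display order, then the class axis."""
    points = [(axis, rule.antecedent[axis])
              for axis in axis_order if axis in rule.antecedent]
    points.append((class_attr, rule.consequent))
    return points


# Toy data, invented purely for illustration.
records = [
    {"size": "large", "shape": "round", "cls": "pos"},
    {"size": "large", "shape": "flat", "cls": "pos"},
    {"size": "small", "shape": "round", "cls": "pos"},
    {"size": "large", "shape": "round", "cls": "neg"},
    {"size": "small", "shape": "flat", "cls": "neg"},
]
rule = Rule(antecedent={"size": "large"}, consequent="pos")

ratios = class_coverage(rule, records, "cls")
vertices = polyline(rule, ["size", "shape"], "cls")
print(ratios)
print(vertices)
```

In this toy case the rule covers two of the three `pos` instances and one of the two `neg` instances, so a viewer could show both ratios on the class axis to indicate how cleanly the rule separates the classes.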
5. A CASE-STUDY
This section illustrates the utility of the synergistic visualization
of data and knowledge in D2MS for extracting knowledge from a
stomach cancer dataset.
6. CONCLUSION
We have presented the knowledge discovery system D2MS with
support for model selection integrated with visualization. We
emphasize the crucial role of the user's participation in the model
selection process of knowledge discovery and have developed
data, rule and tree visualizers in D2MS to support such
participation. Our basic idea is to use the right visualization
techniques in the right places and to integrate visualization into
the steps of the knowledge discovery process. D2MS with its
visualization support has been used, and has shown advantages, in
extracting knowledge from a real-world application on stomach
cancer data.
[Figure: rule display; legible fragment: "... AND middle_third = 1", "class = death within 90 days"]
REFERENCES
[1] Breiman, L., Friedman, J., Olshen, R., and Stone, C.,
Classification and Regression Trees, Belmont, CA:
Wadsworth, 1984.
[2] Card, S.K., Mackinlay, J.D., and Shneiderman, B., Readings in
Information Visualization, Morgan Kaufmann, 1999.
[3] Fayyad, U.M., Grinstein, G.G., and Wierse, A., Information
Visualization in Data Mining and Knowledge Discovery,
Morgan Kaufmann, 2002.
[4] Furnas, G.W., "The FISHEYE View: A New Look at
Structured Files", Bell Laboratories Technical
Memorandum, #81-11221-9, 1981.