DATA MINING
History of WEKA
In 1993, the University of Waikato in New Zealand
started development of the original version of Weka,
which became a mixture of Tcl/Tk, C, and Makefiles.
In 1997, the decision was made to redevelop Weka from
scratch in Java, including implementations of the modelling
algorithms.
Overview Of WEKA
Weka is a workbench that contains a collection of visualization tools and
algorithms for data analysis and predictive modelling, together with
graphical user interfaces for easy access to this functionality.
The original non-Java version of Weka was a Tcl/Tk front-end to modelling
algorithms implemented in other programming languages, with data
pre-processing utilities written in C and the build held together by Makefiles.
This original version was primarily designed as a tool for analysing data
from agricultural domains, but the more recent, fully Java-based version
(Weka 3), whose development started in 1997, is now used in many
different application areas, in particular for educational purposes and
research.
Weka supports several standard data mining tasks, more specifically, data
pre-processing, clustering, classification, regression, visualization, and
feature selection.
Features and Interfaces in WEKA
By: Tanvi Redkar
Main Features
46 data pre-processing tools
76 classification/regression algorithms
8 clustering algorithms
15 attribute/subset evaluators plus 10 search algorithms
for feature selection
3 algorithms for finding association rules
Cont.
Options to customise Weka via its Java source code are
available.
Custom extensions and plug-ins can be developed.
An excellent mailing and discussion list is available.
Cont.
3 graphical user interfaces.
WEKA INTERFACES
Command-line interface
Explorer
pre-processing, attribute selection, learning,
visualisation
Knowledge Flow
visual design of data-processing flows;
capabilities similar to the Explorer
Experimenter
testing and evaluating machine learning algorithms
Limitations of WEKA
By: Priyanka Bhagat
The main disadvantage is that most of the functionality is only applicable if all
data is held in main memory.
A few algorithms are included that are able to process data incrementally or
in batches.
However, for most of the methods the amount of available memory imposes a
limit on the data size, which restricts application to small or medium-sized
datasets.
A second disadvantage is that a Java implementation is generally somewhat
slower than an equivalent program written in C/C++.
Types of clusters
Well-separated clusters
Distribution-based clustering
Centroid-based clustering
Hierarchical clustering
Density-based clustering
Conceptual clustering
Clustering algorithm
K-means clustering
K-means is one of the simplest unsupervised learning
algorithms that solve the well-known clustering problem. The
procedure follows a simple and easy way to group a given
data set into a certain number of clusters fixed a priori.
Algorithmic steps
Select K points as the initial centroids.
Repeat:
Form K clusters by assigning each point to the closest
centroid.
Recompute the centroid of each cluster.
Until the centroids no longer change.
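The steps above (Lloyd's algorithm) can be sketched in plain Java. This is a minimal one-dimensional illustration, not Weka's own SimpleKMeans implementation; the sample points, K = 2, and the choice of the first and last points as initial centroids are assumptions made for the example:

```java
import java.util.Arrays;

public class KMeansSketch {
    // Runs the k-means loop: assign each point to its nearest centroid,
    // then recompute centroids, until the centroids stop changing.
    static double[] cluster(double[] points, double[] initial) {
        int k = initial.length;
        double[] centroids = initial.clone();
        int[] assignment = new int[points.length];
        boolean changed = true;
        while (changed) {
            // Form K clusters by assigning each point to the closest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(points[i] - centroids[c])
                            < Math.abs(points[i] - centroids[best])) {
                        best = c;
                    }
                }
                assignment[i] = best;
            }
            // Recompute each centroid as the mean of its cluster's members.
            double[] sums = new double[k];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[assignment[i]] += points[i];
                counts[assignment[i]]++;
            }
            changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave empty clusters alone
                double updated = sums[c] / counts[c];
                if (updated != centroids[c]) {
                    centroids[c] = updated;
                    changed = true;
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.5, 2.0, 8.0, 8.5, 9.0}; // toy data (assumed)
        // Select K = 2 initial centroids (here: the first and last points).
        double[] result = cluster(points, new double[]{1.0, 9.0});
        System.out.println(Arrays.toString(result)); // [1.5, 8.5]
    }
}
```

Note that the result depends on the initial centroids, which is exactly the sensitivity to initialisation discussed under the disadvantages below.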
Advantages
1) Fast, robust, and easy to understand.
2) Relatively efficient: O(t*k*n*d), where n is the number of objects, k the
number of clusters, d the dimensionality of each object, and t the number of
iterations. Normally, k, t, d << n.
3) Gives the best results when the data sets are distinct or well separated
from each other.
Disadvantages
1) The learning algorithm requires a priori specification of
the number of cluster centres.
2) The use of exclusive assignment: if there are two highly
overlapping groups of data, then k-means will not be able to
resolve that there are two clusters.
3) The learning algorithm is not invariant to non-linear
transformations, i.e. with different representations of the data we
get different results.
4) Randomly choosing the initial cluster centres may not lead to
a fruitful result.
5) Applicable only when the mean is defined, i.e. it fails for
categorical data.
6) Unable to handle noisy data and outliers.
7) The algorithm fails for non-linearly separable data sets.
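Disadvantage 5 can be made concrete with a tiny worked example. Suppose a categorical colour attribute were (hypothetically) encoded as integers, red = 0, green = 1, blue = 2; the k-means centroid update, which averages cluster members, then produces a meaningless "average colour":

```java
public class CategoricalMeanDemo {
    // The k-means centroid update: the arithmetic mean of cluster members.
    static double centroid(int a, int b) {
        return (a + b) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical integer encoding of a categorical attribute
        // (an assumption for illustration): red = 0, green = 1, blue = 2.
        int red = 0, blue = 2;
        // The "mean" of red and blue comes out as 1.0, i.e. green:
        // an artefact of the arbitrary encoding, not a real average colour.
        System.out.println(centroid(red, blue)); // prints 1.0
    }
}
```

A different encoding (say, blue = 5) would give a different "mean", which is why k-means is only applicable where the mean is genuinely defined.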