Clustering
Discover correlations and natural groupings in the data
Pattern representation
Clustering
Data abstraction
Cluster validation
Pattern Representation
Number of classes
Number of available patterns
Feature selection
Feature extraction
Pattern Proximity
Nominal attributes:
$d(x_i, x_j) = \frac{n - x}{n}$
where $n$ is the number of attributes and $x$ is the number of attributes with the same value in both instances.
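A minimal sketch of this distance in Python (the function name is illustrative):

```python
def matching_distance(xi, xj):
    """Simple matching distance for nominal attributes:
    (n - x) / n, where n is the number of attributes and
    x is the number of attributes with identical values."""
    assert len(xi) == len(xj)
    n = len(xi)
    x = sum(1 for a, b in zip(xi, xj) if a == b)
    return (n - x) / n

# Example: 2 of 4 attributes match, so d = (4 - 2) / 4 = 0.5
print(matching_distance(["sunny", "hot", "high", "FALSE"],
                        ["sunny", "mild", "high", "TRUE"]))
```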
Clustering Techniques

Clustering
  Hierarchical
    Single Link
    Complete Link
    CobWeb
  Partitional
    Square Error: K-means
    Mixture Resolution: Expectation Maximization
Technique Characteristics
Agglomerative vs Divisive
Hard vs Fuzzy
More Characteristics
Monothetic vs Polythetic
Incremental vs Non-Incremental
Hierarchical Clustering
Dendrogram
[Dendrogram figure: similarity on the vertical axis; leaves A, B, C, D, E, F, G are merged bottom-up into nested clusters]
Hierarchical Algorithms
Single-link
Complete-link
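Before looking at complete-link in detail, a small sketch contrasting the two linkage criteria (helper names are illustrative; any distance function can be plugged in):

```python
# Inter-cluster distance under the two linkage criteria:
# single-link uses the closest pair, complete-link the farthest pair.
def single_link(c1, c2, dist):
    return min(dist(a, b) for a in c1 for b in c2)

def complete_link(c1, c2, dist):
    return max(dist(a, b) for a in c1 for b in c2)

euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
c1, c2 = [(0, 0), (1, 0)], [(3, 0), (5, 0)]
print(single_link(c1, c2, euclid))    # 2.0 (closest pair)
print(complete_link(c1, c2, euclid))  # 5.0 (farthest pair)
```

Single-link tends to produce elongated, chained clusters; complete-link favors compact ones.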
Complete-Link
[Figure: two clouds of points labeled 1 and 2 with cluster centers marked *; complete-link merges the pair of clusters whose farthest members are closest]
Partitional Clustering
K-Means
Predetermined number of clusters.
Start with seed clusters of one element each (the seeds).
Assign Instances to Clusters
New Clusters
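A minimal sketch of this seed-assign-recompute loop, assuming numeric instances represented as tuples (the names and the fixed iteration count are illustrative choices, not Weka's implementation):

```python
import random

def kmeans(points, k, iters=20):
    # Seed the clusters with k randomly chosen single elements.
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assign each instance to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its new cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centroids, clusters
```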
Discussion: k-means
Clustering in Weka
CobWeb
The category utility of k clusters:

$CU(C_1, C_2, \ldots, C_k) = \frac{1}{k} \sum_{l} \Pr[C_l] \sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)$
Why divide by k?
Category Utility
Without the division by $k$ it would always be best for each instance to have its own cluster: every $\Pr[a_i = v_{ij} \mid C_l]$ is then 1, which is overfitting!
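A small sketch of category utility for nominal attributes, following the formula above (function name illustrative):

```python
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters; each cluster is a list of
    instances, each instance a tuple of nominal attribute values."""
    k = len(clusters)
    data = [x for c in clusters for x in c]
    n = len(data)
    n_attrs = len(data[0])
    # Unconditional term: sum_i sum_j Pr[a_i = v_ij]^2
    base = sum((cnt / n) ** 2
               for i in range(n_attrs)
               for cnt in Counter(x[i] for x in data).values())
    cu = 0.0
    for c in clusters:
        p_c = len(c) / n
        # Conditional term: sum_i sum_j Pr[a_i = v_ij | C_l]^2
        cond = sum((cnt / len(c)) ** 2
                   for i in range(n_attrs)
                   for cnt in Counter(x[i] for x in c).values())
        cu += p_c * (cond - base)
    # Dividing by k penalizes one-instance-per-cluster solutions.
    return cu / k
```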
[Weather data table; Play column: No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No]
Start by putting the first instance in its own cluster.
[Tree figure: instances a, b, c each in their own cluster under the root; the placement with highest category utility is chosen]
Adding Instance f
First instance not to get its own cluster:
[Tree figure: f is placed in the cluster containing e]
Add Instance g
Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
G) Overcast Cool Normal TRUE
[Tree figure: g joins the cluster containing e and f]
Add Instance h
Look at the instances:
A) Sunny Hot High FALSE
D) Rainy Mild High FALSE
H) Sunny Mild High FALSE
Rearrange: the best host and the runner-up are merged into a single cluster before h is added.
[Tree figure: the clusters containing a and d are merged, then h joins the merged cluster]
(Splitting is also possible.)
Final Hierarchy
[Tree figure: the final CobWeb hierarchy over all 14 instances]
What next?
Dendrogram Clusters
[Dendrogram figure: the hierarchy cut into clusters]
What do a, b, c, d, h, k, and l have in common?
Numerical Attributes
For numeric attributes, assume a normal distribution; category utility becomes

$CU(C_1, C_2, \ldots, C_k) = \frac{1}{k} \sum_{l} \Pr[C_l] \, \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)$
Discussion
Advantages
Disadvantages
the division by k,
an artificial minimum value for the variance of numeric attributes,
an ad hoc cutoff value.
Probabilistic Perspective
Mixture Resolution
[Figure: a mixture of two normal distributions over a numeric attribute]
Given some data, how can you determine the parameters of each cluster's distribution?
Problems
We do not know which cluster each instance belongs to. For cluster A,

$\Pr[x \mid A] = f(x; \mu_A, \sigma_A), \qquad f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
EM Algorithm
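A minimal sketch of EM for a one-dimensional, two-cluster Gaussian mixture (illustrative, not Weka's implementation; the initialization and iteration count are arbitrary choices):

```python
import math, random

def em_gaussian_mixture(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    mu = random.sample(xs, 2)   # initial means: two random instances
    sigma = [1.0, 1.0]          # initial standard deviations
    pi = [0.5, 0.5]             # mixing weights
    pdf = lambda x, m, s: (math.exp(-(x - m) ** 2 / (2 * s * s))
                           / (math.sqrt(2 * math.pi) * s))
    for _ in range(iters):
        # E-step: probability that each instance belongs to each cluster.
        w = []
        for x in xs:
            p = [pi[j] * pdf(x, mu[j], sigma[j]) for j in (0, 1)]
            z = sum(p)
            w.append([pj / z for pj in p])
        # M-step: re-estimate parameters from the weighted instances.
        for j in (0, 1):
            wj = sum(wi[j] for wi in w)
            mu[j] = sum(wi[j] * x for wi, x in zip(w, xs)) / wj
            sigma[j] = math.sqrt(sum(wi[j] * (x - mu[j]) ** 2
                                     for wi, x in zip(w, xs)) / wj) or 1e-6
            pi[j] = wj / len(xs)
    return mu, sigma, pi
```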
Extension is straightforward if we assume the attributes are independent. If attributes are dependent, they can be treated jointly using the bivariate normal distribution. Nominal attributes are handled with discrete probabilities instead of normal densities.
EM using Weka
Options
Other Clustering
Applications
Image segmentation
Data Mining:
DM Clustering Challenges
Other (General) Challenges
Shape of clusters
Minimum domain knowledge (e.g.,
knowing the number of clusters)
Noisy data
Insensitivity to instance order
Interpretability and usability
Clustering for DM
Practical Partitional Clustering Algorithms
Large-Scale Problems
CLARANS:
Similar to CLARA
Draws samples randomly while searching
More effective than PAM and CLARA
Hierarchical Methods
BIRCH Mechanism
Phase I: scan the database to build an initial in-memory CF (clustering feature) tree.
Phase II: apply a clustering algorithm to the leaf nodes of the CF tree.
Conclusion
[Weather data table; Play column: No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No]
Item Sets

1-item sets:
Outlook=sunny (5)
Outlook=overcast (4)
Outlook=rainy (5)
Temp=cool (4)
Temp=mild (6)

2-item sets:
Outlook=sunny & Temp=mild (2)
Outlook=sunny & Temp=hot (2)
Outlook=sunny & Humidity=normal (2)
Outlook=sunny & Windy=true (2)

3-item sets:
Outlook=sunny & Temp=hot & Humidity=high (2)
Outlook=sunny & Temp=hot & Play=no (2)
Outlook=sunny & Humidity=normal & Play=yes (2)
Outlook=sunny & Humidity=high & Windy=false (2)
Outlook=sunny & Humidity=high & Play=no (3)

4-item sets:
Outlook=sunny & Temp=hot & Humidity=high & Play=no (2)
Outlook=sunny & Humidity=high & Windy=false & Play=no (2)
Outlook=overcast & Temp=hot & Windy=false & Play=yes (2)
Outlook=rainy & Temp=mild & Windy=false & Play=yes (2)
Outlook=rainy & Humidity=normal & Windy=false & Play=yes (2)
Accuracy: 4/4, 4/6, 4/6, 4/7, 4/8, 4/9, 4/12
Accuracy: 2/2, 2/2, 2/2
Overall
58 association rules meet the minimum coverage and accuracy requirements, many involving humidity = normal in the antecedent or play = yes as the consequent.
Justification
Item Set 1: {Humidity = high}
Coverage(1) = Number of times humidity is high
Item Set 2: {Windy = false}
Coverage (2) = Number of times windy is false
Item Set 3: {Humidity = high, Windy = false}
Coverage (3) = Number of times humidity is high and
windy is false
Coverage(3) ≤ Coverage(1)
Coverage(3) ≤ Coverage(2)
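A small sketch making the argument concrete, with transactions as Python sets of attribute=value items (names illustrative):

```python
def coverage(item_set, transactions):
    """Number of transactions containing every item in the set."""
    return sum(1 for t in transactions if item_set <= t)

def accuracy(antecedent, consequent, transactions):
    """Accuracy (confidence) of the rule antecedent -> consequent."""
    return (coverage(antecedent | consequent, transactions)
            / coverage(antecedent, transactions))

# Adding items can only shrink coverage, never grow it.
ts = [{"humidity=high", "windy=false"}, {"humidity=high"}, {"windy=false"}]
print(coverage({"humidity=high", "windy=false"}, ts))  # 1
print(coverage({"humidity=high"}, ts))                 # 2
print(coverage({"windy=false"}, ts))                   # 2
```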
(A B C), (A B D), (A C D), (A C E)
Merge to generate 4-item sets:
(A B C D), (A C D E)
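A sketch of this merge step, assuming item sets are kept as sorted tuples (function name illustrative; full Apriori additionally prunes candidates that contain an infrequent subset):

```python
def merge_candidates(k_item_sets):
    """Join sorted k-item sets sharing their first k-1 items,
    e.g. (A,B,C) + (A,B,D) -> (A,B,C,D)."""
    out = set()
    for s in k_item_sets:
        for t in k_item_sets:
            if s[:-1] == t[:-1] and s[-1] < t[-1]:
                out.add(s + (t[-1],))
    return sorted(out)

three = [("A","B","C"), ("A","B","D"), ("A","C","D"), ("A","C","E")]
print(merge_candidates(three))  # [('A','B','C','D'), ('A','C','D','E')]
```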
Generating Rules
If windy = false and play = no then outlook = sunny and humidity = high (meets min. coverage and accuracy)
The double-consequent rule can only hold if both single-consequent rules hold:
If windy = false and play = no then outlook = sunny (meets min. coverage and accuracy)
If windy = false and play = no then humidity = high
Efficiency Improvement
Build candidate rules, then check them for accuracy.
Apriori Algorithm
Difficulties
Solution?
A partitioning-based, divide-and-conquer approach (as opposed to bottom-up candidate generation).
FP-Tree
Database (min. support = 3):

TID  Items              Frequent Items
100  F,A,C,D,G,I,M,P    F,C,A,M,P
200  A,B,C,F,L,M,O      F,C,A,B,M
300  B,F,H,J,O          F,B
400  B,C,K,S,P          C,B,P
500  A,F,C,E,L,P,M,N    F,C,A,M,P

Header table (item → head of node links): F, C, A, B, M, P

FP-tree:
Root
  F:4
    C:3
      A:3
        M:2
          P:2
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1
Computational Effort
Each tree node stores: item name, count, node link.
Each header-table entry stores: item name, head of node link.
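A minimal sketch of these structures (class and function names illustrative): each node carries the three fields above, and the header table is a dict from item to the head of its node-link chain.

```python
class FPNode:
    """One FP-tree node: item name, count, and a node link chaining
    all nodes that carry the same item (for the header table)."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}      # item -> child FPNode
        self.node_link = None   # next node carrying the same item

def insert(root, transaction, header):
    """Insert one frequency-ordered transaction, sharing common prefixes."""
    node = root
    for item in transaction:
        if item in node.children:
            node.children[item].count += 1
        else:
            child = FPNode(item, node)
            node.children[item] = child
            # Prepend the new node to this item's node-link chain.
            child.node_link = header.get(item)
            header[item] = child
        node = node.children[item]

root, header = FPNode(None, None), {}
for t in ["FCAMP", "FCABM", "FB", "CBP", "FCAMP"]:
    insert(root, t, header)
print(root.children["F"].count)  # 4, matching the tree above
```

Inserting the five frequency-ordered transactions from the previous slide into an empty root reproduces the tree shown there.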
Comments
Mining Patterns
Example
[FP-tree as above, with the nodes on P's link chain highlighted]
Following P's node links from the header table yields two prefix paths:
<F:4, C:3, A:3, M:2, P:2>  (P occurs twice on this path)
<C:1, B:1, P:1>            (P occurs once on this path)
Frequent pattern: (P:3)
Rule Generation
Example

ID  Items
10  A,C,D,E,F
20  A,B,E
30  C,E,F
40  A,C,D,F
50  C,E,F
NOTE
With minimum support 2, the item counts C:4, E:4, F:4, A:3, D:2 fix the order C, E, F, A, D for building the conditional DBs.
Transactions in frequency order: CEFAD, EA, CEF, CFAD, CEF

D-cond DB (D:2): CEFA, CFA → Output: CFAD:2
A-cond DB (A:3): CEF, E, CF → Output: A:3
EA-cond DB (EA:2): C → Output: EA:2
F-cond DB (F:4): CE:3, C:1 → Output: CF:4, CEF:3
E-cond DB (E:4): C:3 → Output: E:4, CE:3
Output: C:4
[Taxonomy figure: Clothes → {Outerwear → {Jackets, Ski Pants}, Shirts}; Footwear → {Shoes, Hiking Boots}]
Why Taxonomy?
However:
Example

Transactions:
ID  Items
10  Shirt
20  Jacket, Hiking Boots
30  Ski Pants, Hiking Boots
40  Shoes
50  Shoes
60  Jacket

Frequent item sets (with ancestors added):
Item Set                   Support
{Jacket}                   2
{Outerwear}                3
{Clothes}                  4
{Shoes}                    2
{Hiking Boots}             2
{Footwear}                 4
{Outerwear, Hiking Boots}  2
{Clothes, Hiking Boots}    2
{Outerwear, Footwear}      2
{Clothes, Footwear}        2

Rules:
Rule                       Support  Confidence
Outerwear → Hiking Boots   2        2/3
Outerwear → Footwear       2        2/3
Hiking Boots → Outerwear   2        2/2
Hiking Boots → Clothes     2        2/2
Interesting Rules
Many ways in which the interestingness of a rule can be evaluated based on its ancestors. For example:

Rule ID  Rule                   Support
1        Clothes → Footwear     10
2        Outerwear → Footwear   8
3        Jackets → Footwear     4

Item       Support
Clothes    5
Outerwear  2
Jackets    1

From rule 1 and the item supports, rule 2's expected support is 10 × (2/5) = 4; its actual support of 8 is twice that, so rule 2 is interesting. Rule 3's expected support from rule 2 is 8 × (1/2) = 4, exactly its actual support, so rule 3 adds nothing new.
Discussion
Support
Optimized rules
Maximal frequent item sets
Closed item sets
What makes for an interesting rule?
Algorithm Construction
Bottom-up, counting: Apriori-like algorithms (Apriori*, AprioriTID, DIC)
Bottom-up, intersecting: Partition
Top-down, counting: FP-Growth*
Top-down, intersecting: Eclat
(* = have discussed)
No algorithm dominates the others!
Applications
Applications to recommender systems
Recommender
Classification Approach
Product associations
User associations
Advantages
Single-Consequent Rules
Association rules: all possible item combinations as the consequent.
Associations for recommenders: in between.
Classification: one single item as the consequent.