
Data Mining:

Association Rules Techniques

December 9, 2021 1
What Is Association Mining?

 Association rule mining:
 Finding frequent patterns, associations, and correlations among
sets of items or objects in transaction databases, relational
databases, and other information repositories.
 Applications:
 Basket data analysis, cross-marketing, catalog design, loss-
leader analysis, clustering, classification, etc.
 Examples:
 Rule form: “Body ⇒ Head [support, confidence]”
 buys(x, “diapers”) ⇒ buys(x, “beers”) [0.5%, 60%]
 major(x, “CS”) ^ takes(x, “DB”) ⇒ grade(x, “A”) [1%, 75%]
Association Rule: Basic Concepts

 Given: (1) a database of transactions, where (2) each transaction is a
list of items (purchased by a customer in one visit)
 Find: all rules that correlate the presence of one set of items
with that of another set of items
 E.g., 98% of people who purchase tires and auto
accessories also get automotive services done
Rule Measures: Support and
Confidence

[Venn diagram: customers who buy diapers, customers who buy beer,
and customers who buy both]

 Find all the rules X & Y ⇒ Z with minimum confidence and support
 support, s: the probability that a transaction contains {X, Y, Z}
 confidence, c: the conditional probability that a transaction
having {X, Y} also contains Z

Transaction ID  Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

Let minimum support be 50% and minimum confidence 50%; we have
 A ⇒ C (50%, 66.6%)
 C ⇒ A (50%, 100%)
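The two measures can be computed directly from the example transactions. A minimal Python sketch (the helper names are illustrative, not from the slides):

```python
transactions = [
    {"A", "B", "C"},   # TID 2000
    {"A", "C"},        # TID 1000
    {"A", "D"},        # TID 4000
    {"B", "E", "F"},   # TID 5000
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """Conditional probability that a transaction with `lhs` also has `rhs`."""
    return support(lhs | rhs, db) / support(lhs, db)

print(support({"A", "C"}, transactions))        # 0.5
print(confidence({"A"}, {"C"}, transactions))   # 0.666...
print(confidence({"C"}, {"A"}, transactions))   # 1.0
```

This reproduces the two rules on the slide: A ⇒ C holds in 50% of transactions with 66.6% confidence, and C ⇒ A with 100% confidence.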
Association Rule Mining: A Road Map

 Boolean vs. quantitative associations (based on the
types of values handled)
 buys(x, “SQLServer”) ^ buys(x, “DMBook”) ⇒
buys(x, “DBMiner”) [0.2%, 60%]
 age(x, “30..39”) ^ income(x, “42..48K”) ⇒ buys(x,
“PC”) [1%, 75%]
 Single-dimensional vs. multi-dimensional associations
(see the examples above)
 Single-level vs. multiple-level analysis
 What brands of beer are associated with what
brands of diapers?
Mining Association Rules—An Example

Min. support 50%, min. confidence 50%

Transaction ID  Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

Frequent Itemset  Support
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For rule A ⇒ C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle:
Any subset of a frequent itemset must be frequent
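The Apriori principle can be checked on this example: the frequent itemset {A, C} meets the 50% minimum support, so each of its subsets must too. A quick sketch (not from the slides):

```python
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

def support(items):
    """Fraction of transactions containing all of the given items."""
    return sum(set(items) <= t for t in db) / len(db)

# The frequent itemset {A, C} and each of its subsets
for s in ("AC", "A", "C"):
    print(set(s), f"{support(s):.0%}")
```

Each subset's support is at least that of {A, C} (75% for {A}, 50% for {C}), since any transaction containing the whole itemset contains every subset.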
Mining Frequent Itemsets: the
Key Step
 Find the frequent itemsets: the sets of items that have
minimum support
 A subset of a frequent itemset must also be a frequent
itemset
 i.e., if {A, B} is a frequent itemset, both {A} and {B} must
be frequent itemsets
 Iteratively find frequent itemsets with cardinality from 1 to k
(k-itemsets)
 Use the frequent itemsets to generate association rules.
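The final step can be sketched as follows: given the supports of the frequent itemsets, enumerate each split of an itemset into antecedent and consequent and keep the rules that meet the confidence threshold. The function name and data layout are illustrative, not from the slides:

```python
from itertools import combinations

supports = {                      # frequent itemsets from the example slide
    frozenset("A"): 0.75,
    frozenset("B"): 0.50,
    frozenset("C"): 0.50,
    frozenset("AC"): 0.50,
}

def rules(supports, min_conf):
    """Emit (lhs, rhs, support, confidence) for rules meeting min_conf."""
    out = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue                       # a rule needs non-empty lhs and rhs
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / supports[lhs]   # support({lhs ∪ rhs}) / support(lhs)
                if conf >= min_conf:
                    out.append((set(lhs), set(itemset - lhs), sup, conf))
    return out

for lhs, rhs, sup, conf in rules(supports, 0.5):
    print(lhs, "=>", rhs, f"[{sup:.0%}, {conf:.1%}]")
```

Note that no extra database scan is needed: every antecedent of a frequent itemset is itself frequent, so its support is already in the table.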

The Apriori Algorithm
 Join Step: Ck is generated by joining Lk-1 with itself
 Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset
 Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
  Ck+1 = candidates generated from Lk;
  for each transaction t in the database do
    increment the count of all candidates in Ck+1 that are
    contained in t;
  Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;

The Apriori Algorithm — Example

Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D for C1 counts: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3

C2 (joining L1 with itself): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D for C2 counts: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

C3 (joining L2 with itself): {2 3 5}
Scan D: L3: {2 3 5}: 2
Interestingness Measurements
 Objective measures
Two popular measurements:
 support; and
 confidence
 Subjective measures (Silberschatz & Tuzhilin,
KDD95)
A rule (pattern) is interesting if
 it is unexpected (surprising to the user); and/or
 actionable (the user can do something with it)
Criticism to Support and Confidence
 Example 1 (Aggarwal & Yu, PODS98):
 Among 5000 students,
 3000 play basketball
 3750 eat cereal
 2000 both play basketball and eat cereal
 play basketball ⇒ eat cereal [40%, 66.7%] is misleading,
because the overall percentage of students eating cereal is 75%,
which is higher than 66.7%.
 play basketball ⇒ not eat cereal [20%, 33.3%] is far more
accurate, although it has lower support and confidence.

            basketball  not basketball  sum(row)
cereal      2000        1750            3750
not cereal  1000        250             1250
sum(col.)   3000        2000            5000
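A quick sketch (not from the slides) makes the problem concrete: the rule's confidence is below the baseline probability of eating cereal, so its lift is below 1 and playing basketball actually makes eating cereal less likely.

```python
n = 5000
p_basketball = 3000 / n
p_cereal = 3750 / n
p_both = 2000 / n                            # from the contingency table

confidence = p_both / p_basketball           # 0.667, above min. confidence
lift = p_both / (p_basketball * p_cereal)    # 0.889 < 1: negative correlation
print(round(confidence, 3), round(lift, 3))
```

Confidence alone ignores P(cereal); dividing by it, as lift does, exposes the negative correlation.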
Criticism to Support and Confidence
(Cont.)
 Example 2:
 X and Y: positively correlated
 X and Z: negatively related
 yet the support and confidence of X ⇒ Z dominate

X  1 1 1 1 0 0 0 0
Y  1 1 0 0 0 0 0 0
Z  0 1 1 1 1 1 1 1

Rule   Support  Confidence
X ⇒ Y  25%      50%
X ⇒ Z  37.50%   75%

 We need a measure of dependent or correlated events:
corr(A, B) = P(A, B) / (P(A) P(B))
 P(B|A)/P(B) is also called the lift of rule A ⇒ B
Other Interestingness Measures: Interest
 Interest (correlation, lift): P(A ∧ B) / (P(A) P(B))
 takes both P(A) and P(B) into consideration
 P(A ∧ B) = P(A) × P(B) if A and B are independent events
 A and B are negatively correlated if the value is less than 1;
otherwise A and B are positively correlated

X  1 1 1 1 0 0 0 0
Y  1 1 0 0 0 0 0 0
Z  0 1 1 1 1 1 1 1

Itemset  Support  Interest
X,Y      25%      2
X,Z      37.50%   0.9
Y,Z      12.50%   0.57
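The interest values in the table can be recomputed from the eight transactions. A sketch (helper names are illustrative; note the exact X,Z value is 0.857, which the table presumably shows rounded to one decimal):

```python
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]

def p(*rows):
    """Probability that all of the given indicator rows are 1 at once."""
    return sum(all(v) for v in zip(*rows)) / len(rows[0])

for name, (a, b) in {"X,Y": (X, Y), "X,Z": (X, Z), "Y,Z": (Y, Z)}.items():
    interest = p(a, b) / (p(a) * p(b))       # P(A ∧ B) / (P(A) P(B))
    print(name, f"support={p(a, b):.2%}", f"interest={interest:.2f}")
```

Interest of 2 for X,Y confirms the positive correlation; values below 1 for X,Z and Y,Z confirm the negative ones, even though X ⇒ Z has the highest support and confidence.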

Summary
 Association rule mining
 probably the most significant contribution from the
database community in KDD
 A large number of papers have been published
 Many interesting issues have been explored
 An interesting research direction:
 Association analysis in other types of data: spatial
data, multimedia data, time series data, etc.