
Apriori Algorithm Review for Finals

SE 157B, Spring Semester 2007

Professor Lee
By Gaurang Negandhi

Overview
• Definition of Apriori Algorithm
• Steps to Perform Apriori Algorithm
• Apriori Algorithm Examples
• Pseudo Code for Apriori Algorithm
• Apriori Advantages/Disadvantages
• References

Definition of Apriori Algorithm
• In computer science and data mining, Apriori is a classic algorithm for learning association rules.
• Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or records of visits to a website).
• The algorithm attempts to find itemsets that are contained in at least a minimum number C of the transactions (C is the cutoff, or support threshold).
Definition (contd.)
• Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
• The algorithm terminates when no further successful extensions are found.
• Apriori uses breadth-first search and a hash tree structure to count candidate itemsets efficiently.
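
The candidate generation step described above can be sketched in Python as follows. This is a minimal illustration, not the slides' own pseudocode: the function name `generate_candidates` and the letter-labelled input are assumptions made for the example, and the join is done by brute-force pairing rather than with a hash tree.

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Build candidate (k+1)-itemsets from a set of frequent k-itemsets.

    frequent_k: a set of frozensets, each containing exactly k items.
    """
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            # Join step: keep only pairs that differ in exactly one item.
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of the candidate must be frequent.
            if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets, made up for this illustration.
frequent_2 = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
candidates_3 = generate_candidates(frequent_2, 2)
print(sorted(sorted(c) for c in candidates_3))  # [['A', 'B', 'C']]
```

Here {A, B, D} and {B, C, D} are dropped without touching the data, because {A, D} and {C, D} are not frequent. Only the surviving candidates need to be counted against the transactions.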
Steps to Perform Apriori Algorithm

Apriori Algorithm Examples
Problem Decomposition
Transaction ID   Items Bought
1                Shoes, Shirt, Jacket
2                Shoes, Jacket
3                Shoes, Jeans
4                Shirt, Sweatshirt

If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies the minimum support.

Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%

The confidence of a rule A ⇒ B is support(A ∪ B) / support(A). If the minimum confidence is 50%, then both rules generated from this 2-itemset satisfy it:

Shoes ⇒ Jacket   Support=50%, Confidence=66.7%
Jacket ⇒ Shoes   Support=50%, Confidence=100%
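
The support and confidence figures in this example can be checked with a short Python sketch. The helper names `support` and `rules_from` are assumptions chosen for the illustration. The transactions are the four listed in the table.

```python
from itertools import combinations

transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_from(itemset, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for each rule
    that can be formed from itemset and meets min_confidence."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            # confidence(A => B) = support(A and B together) / support(A)
            confidence = support(itemset) / support(antecedent)
            if confidence >= min_confidence:
                yield antecedent, itemset - antecedent, support(itemset), confidence


rules = list(rules_from({"Shoes", "Jacket"}, min_confidence=0.5))
for lhs, rhs, sup, conf in rules:
    print(f"{set(lhs)} => {set(rhs)}  support={sup:.0%}  confidence={conf:.1%}")
```

Shoes appears in three of the four transactions and Jacket in two, which is why the two directions of the rule have different confidence even though they share the same support.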
The Apriori Algorithm: Example
Min support = 50%

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C1 (scan D)
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

C2 (scan D)
itemset   sup.
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup.
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (generated from L2)
itemset
{2 3 5}

L3 (scan D)
itemset   sup.
{2 3 5}   2
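
The level-wise run over database D can be reproduced with the Python sketch below. It is one possible reading of the procedure, not the pseudocode from the original slide: the function name `apriori` and the `min_count` parameter are assumptions, candidates are counted with a plain scan rather than a hash tree, and the 50% minimum support is taken to mean a count of 2 out of the 4 transactions.

```python
from itertools import combinations

database = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}


def apriori(transactions, min_count):
    """Return [L1, L2, ...], each a dict mapping a frequent itemset to its count."""
    transactions = [frozenset(t) for t in transactions]
    # C1: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    levels = []
    k = 1
    while candidates:
        # One scan of the database per level to count the candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune any
        # candidate that has an infrequent k-subset.
        candidates = {
            a | b
            for a in frequent
            for b in frequent
            if len(a | b) == k + 1
            and all(frozenset(s) in frequent for s in combinations(a | b, k))
        }
        k += 1
    return levels


levels = apriori(database.values(), min_count=2)
for k, level in enumerate(levels, start=1):
    print(f"L{k}:", sorted((sorted(s), n) for s, n in level.items()))
```

The loop stops after the third level because no 4-itemset can be formed from the single frequent 3-itemset {2 3 5}.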
Pseudo Code for Apriori Algorithm

Apriori Advantages/Disadvantages
• Advantages
  • Uses large itemset property
  • Easily parallelized
  • Easy to implement
• Disadvantages
  • Assumes transaction database is memory resident
  • Requires many database scans

Summary
• Association rules are a widely applied data mining approach.
• Association rules are derived from frequent itemsets.
• The Apriori algorithm is an efficient algorithm for finding all frequent itemsets.
• The Apriori algorithm implements a level-wise search using the frequent itemset property: every subset of a frequent itemset is itself frequent.
• The Apriori algorithm can be further optimized.
• There are many measures for evaluating association rules, such as support and confidence.

References
• Agrawal R, Imielinski T, Swami AN. "Mining Association Rules between Sets of Items in Large Databases." SIGMOD, June 1993, 22(2):207-16.
• Agrawal R, Srikant R. "Fast Algorithms for Mining Association Rules." VLDB, Sep 12-15 1994, Chile, 487-99. ISBN 1-55860-153-8.
• Mannila H, Toivonen H, Verkamo AI. "Efficient Algorithms for Discovering Association Rules." AAAI Workshop on Knowledge Discovery in Databases (SIGKDD), July 1994, Seattle, 181-92.
• Retrieved from "http://en.wikipedia.org/wiki/Apriori_algorithm"

