
Patterns and Pattern Discovery
Richard J Bolton, PhD
Associate Director,
Strategic Consulting & Analytics
KnowledgeBase Marketing
Richard.Bolton@kbm1.com
Copyright © 2007, SAS Institute Inc. All rights reserved.

Overview

ƒ What are patterns?


ƒ Pattern framework
ƒ Pattern discovery in practice
• Examples and case studies
ƒ Challenges in pattern discovery


What do we mean by a ‘pattern’?


ƒ Several dictionary definitions
• A reliable sample of traits, acts, tendencies, or other
observable characteristics of a person, group, or
institution. A behavior pattern, spending patterns.
ƒ Examples
• Purchasing pattern. When I buy eggs I also buy ham
• Amino acid subsequences in a gene sequence
• Disease incidence in a specific area
ƒ But not… regular (repetitive) patterns
• Repetitive temporal pattern. I buy eggs every Saturday
• Repetitive spatial pattern. Wallpaper patterns


History
ƒ Pattern discovery has a long history in
epidemiology through spatial patterns
(e.g. John Snow’s 1854 map of a cholera outbreak around a water pump)


Patterns and Interestingness


ƒ Elements of an interesting pattern
• Unexpected even to domain experts
• Plausible and explainable
• Actionable

ƒ Example: ‘Beer and diapers’


• A supermarket analyzed its baskets and discovered an
unexpectedly high affinity between purchases of beer and
diapers – a behavioral pattern of shopping


Patterns
ƒ A data-oriented definition of a pattern
ƒ Want to include the concepts that patterns
• Are localized events, distinct from some global,
normalized view of the world
• Result from a deterministic data generating mechanism
(they represent something that is not noise)
• Have varying degrees of ‘interestingness’
ƒ We can use this definition of a pattern to develop
a framework for pattern discovery


What is a pattern?

ƒ A pattern is a local structure
ƒ A pattern generates data with an anomalously high density compared
with that expected under some (global) baseline model
ƒ Baseline represents our beliefs/expectations in the system

[Diagram: spectrum of analysis from global to local]
• Global – describes all data
• Segmentation – divide data
• Patterns – local search
• Outliers


What adds value to a Pattern?


ƒ Pattern is real
• We are not seeing random noise (statistical tests)
ƒ Pattern is explainable
• Adds confidence that pattern is real
ƒ Pattern is unexpected
• It shows significant departure from a baseline that
represents our expectation of the behavior in the system
ƒ Pattern is actionable
• Can we use what we’ve found to effect positive change in
the system?
• Increase sales, reduce attrition, cross-sell, identify danger
spots, …

A Statistical Framework
ƒ Start with traditional view of data
• Data = Model + Noise
• Data = Global structural part + Random part
ƒ Refine
• Data = Global structural part + local structural parts +
random part
• Data = Baseline + patterns + noise
ƒ Unsupervised vs Supervised pattern discovery
• Unsupervised: Local structures with unusually high
densities
• Supervised: Local structures with unusually high values
of Y variable (supervisor)

Unsupervised Pattern Discovery


Unsupervised Pattern Discovery


ƒ No target variable
ƒ Analogy is with density estimation or clustering
ƒ Patterns are local areas of high density
compared to an (expected) baseline

Unsupervised methods, from global to local:
• Global (full data): Fit a single distribution to the full data
• Global (divisive): Cluster analysis. Allocates each observation to a
cluster
• Local: Pattern search. Find regions of local high density compared to a
baseline (not all observations allocated)
• Outlier: Outlier detection. Find singletons distant from the (baseline)
global distribution or clusters
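The idea of a pattern as a local region of high density relative to a baseline can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the talk: the data are simulated on the unit square, the baseline is a uniform distribution, cell counts are treated as approximately Poisson, and the names `poisson_sf` and `dense_cells` are hypothetical.

```python
import math
import random

def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), via one minus the lower sum."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def dense_cells(points, n_bins=5, alpha=0.001):
    """Flag grid cells whose counts are anomalously high under a uniform baseline."""
    counts = {}
    for x, y in points:
        cell = (min(int(x * n_bins), n_bins - 1),
                min(int(y * n_bins), n_bins - 1))
        counts[cell] = counts.get(cell, 0) + 1
    expected = len(points) / (n_bins * n_bins)  # uniform baseline per cell
    flagged = []
    for cell, observed in counts.items():
        p = poisson_sf(observed, expected)
        if p < alpha:
            flagged.append((cell, observed, expected, p))
    return sorted(flagged, key=lambda row: row[3])  # most significant first

# Simulated data: uniform background plus one injected local cluster.
random.seed(0)
background = [(random.random(), random.random()) for _ in range(500)]
cluster = [(random.uniform(0.62, 0.78), random.uniform(0.22, 0.38))
           for _ in range(60)]
patterns = dense_cells(background + cluster)
for cell, observed, expected, p in patterns:
    print(cell, observed, round(expected, 1), p)
```

Only cells that depart strongly from the baseline are reported, so most observations are not allocated to any pattern, in contrast to cluster analysis. With these simulated points the cell (3, 1), which contains the injected cluster, is ranked first. A different baseline (for example one that reflects known fault lines) would change which cells count as surprising.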

Example
ƒ Earthquakes in California
ƒ Active areas in the faults are the local deterministic data
(earthquake!) generating mechanisms
ƒ Expected baseline distribution depends on your domain knowledge
ƒ Can use graphical observation to find the more obvious patterns
ƒ Some statistical techniques exist to determine less clear
patterns – are they deterministic or just noise?

[Figure: map of earthquake locations; Latitude (36.0 to 38.5) vs Longitude (-123 to -120)]


Marketing Case Study


ƒ Visual pattern discovery – purchasing pattern
ƒ Plot shows mail order purchasing
ƒ Catalog mailings at Valentine’s, Easter, Mother’s
Day and Christmas = baseline
ƒ We discovered purchasing patterns at Father’s Day and
Halloween
ƒ Led to catalog being designed and mailed for Halloween
season → increase in sales

[Figure: mail order purchasing over time, with peaks labeled V, E, M, F, H and C]


Association Rules
ƒ Market Basket Analysis
• Association Analysis, Product Affinity Analysis,
Recommender Engines, etc.
ƒ Find ‘interesting’ rules or associations
ƒ Naïve global baseline model is independence
• Pr(Eggs and Ham) = Pr(Eggs) × Pr(Ham)
• Assess observed O(Eggs and Ham) vs expected E(Eggs and Ham |
Baseline)
• Various measures of association (some with
probabilistic meaning, some just based on frequency)
ƒ End up with ranking of ‘most associated’ products
ƒ Usually need to find some additional way of
ranking/filtering patterns according to what
makes them ‘interesting’
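A minimal sketch of scoring product pairs against the independence baseline, using lift and a binomial tail probability. The basket data, the function names `binom_sf` and `score_pairs`, and the choice to rank by p-value are all assumptions made for illustration; the talk does not specify how the two measures were combined.

```python
import math
from itertools import combinations

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def score_pairs(baskets):
    """Score every co-purchased pair against the independence baseline."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rows = []
    for (a, b), observed in pair_count.items():
        # Baseline: Pr(A and B) = Pr(A) x Pr(B)
        p_expected = (item_count[a] / n) * (item_count[b] / n)
        rows.append({
            "pair": (a, b),
            "observed": observed,
            "expected": n * p_expected,
            "lift": (observed / n) / p_expected,
            # Chance of seeing at least this many joint purchases by luck
            "p_value": binom_sf(observed, n, p_expected),
        })
    return sorted(rows, key=lambda r: r["p_value"])

# Hypothetical baskets, constructed so the counts are easy to check by hand.
baskets = (
    [["eggs", "ham"]] * 30
    + [["eggs", "milk"]] * 10
    + [["ham", "bread"]] * 10
    + [["milk", "bread"]] * 50
)
ranked = score_pairs(baskets)
for row in ranked:
    print(row["pair"], row["observed"], round(row["expected"], 1),
          round(row["lift"], 2))
```

In this toy data eggs and ham each appear in 40 of 100 baskets, so independence predicts 16 joint purchases; 30 are observed, a lift of 1.875. Ranking by statistical evidence alone still leaves the question of which top-ranked pairs are interesting, which is where additional filtering comes in.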

Associations Case Study


ƒ Mail Order Consumer Goods
ƒ Interested to find out which products were
bought together
ƒ Used naïve (independence) baseline
ƒ Measured associations with a combination of
binomial probability and lift
ƒ Client saw list and was not surprised by most
associations (which is encouraging because
you’d hope they weren’t naïve!)


Associations Case Study


ƒ Additional data
• Product group classification
ƒ Most associations were ‘same category’
ƒ By filtering to ‘cross-category’ associations we
removed the most ‘obvious’ patterns and brought
out patterns that were more interesting
ƒ We updated the baseline of expectation
ƒ Client has business rules for cross-sell in
catalogues or websites (not just based on click-through)

Supervised Pattern Discovery


Supervised Pattern Discovery


ƒ Take a ‘supervisor’ variable
• E.g. response rate, average spend
ƒ Search through independent variables (e.g.
demographic, survey data) to find subspaces where
values of the supervisor are high
Supervised methods, from global to local:
• Global (full data): Regression, e.g. linear/logistic
• Global (divisive): CART/CHAID. Data split into segments, driven by a
supervising variable. Each observation associated with a split.
‘Top down’ approach
• Local: Supervised pattern search. Find regions where supervisor variable
has locally high values. ‘Bottom up’ approach
• Outlier: Outlier detection. Unusually high (or low) values of supervisor.
May affect regression fit

Supervised Pattern Discovery


ƒ Goal different from regular use of regression/classifiers
• Does not try to model/classify entire population
• Just finds locally ‘best’ (or ‘worst’) subspaces
ƒ Uses
• Choosing select(s) on mail lists
• Finding most responsive/ highest spending subgroups
ƒ CART, CHAID are examples of tools you could use
• Top down, greedy algorithm; but just keep high (or low) scoring
local subspaces.
• Others include PRIM (Patient Rule Induction Method) and
Subgroup Analysis (MIDOS) (‘bottom up’ approaches not widely
available)
• E.g. [Age > 65, Income < $35K]: response rate = 20%
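The goal of keeping only the locally best subspaces, rather than modelling the whole population, can be sketched with a brute-force search. This is neither CART, CHAID, PRIM nor MIDOS: it simply enumerates conjunctions of up to two predefined conditions. The data, the condition names and the helper names `make_rows` and `find_subgroups` are hypothetical, with counts chosen to reproduce the 20% rate in the example above.

```python
from itertools import combinations

def find_subgroups(rows, target, conditions, min_size=30, top=3):
    """Rank conjunctions of up to two named conditions by mean target value."""
    overall = sum(r[target] for r in rows) / len(rows)
    names = list(conditions)
    candidates = [(n,) for n in names] + list(combinations(names, 2))
    results = []
    for combo in candidates:
        members = [r for r in rows if all(conditions[n](r) for n in combo)]
        if len(members) < min_size:
            continue  # too few observations to trust the rate
        rate = sum(r[target] for r in members) / len(members)
        results.append((combo, len(members), rate, rate / overall))
    results.sort(key=lambda item: item[2], reverse=True)
    return overall, results[:top]

def make_rows(age, income, n, responders):
    """Build n identical customers, the first `responders` of whom responded."""
    return [{"age": age, "income": income, "responded": int(i < responders)}
            for i in range(n)]

# Hypothetical mailing results (income in $K).
rows = (
    make_rows(age=70, income=30, n=100, responders=20)
    + make_rows(age=70, income=60, n=200, responders=10)
    + make_rows(age=40, income=30, n=200, responders=8)
    + make_rows(age=40, income=60, n=500, responders=12)
)
conditions = {
    "Age > 65": lambda r: r["age"] > 65,
    "Income < $35K": lambda r: r["income"] < 35,
}
overall, best = find_subgroups(rows, "responded", conditions)
print("overall response rate:", overall)
for combo, size, rate, lift in best:
    print(list(combo), size, rate, round(lift, 2))
```

Here the overall response rate is 5%, each single condition roughly doubles it, and the conjunction [Age > 65, Income < $35K] reaches 20% on 100 customers. The `min_size` guard is a crude stand-in for a proper significance check on each subgroup.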

Supervised Pattern Discovery


[MORI = 1; CGQS <= 88.5]: RESP = 38.8%
[MORI = 1; CGQS <= 88.5; CCIP <= 15]: RESP = 45.5%


Supervised Spatial Pattern Discovery


Searching for Interesting Patterns


ƒ The ‘flip-flop’
• Where the value of the supervisor changes (direction)
once you’ve drilled deeper into a pattern
ƒ Baseline: Naïve: Average response rate
ƒ Univariate profile
• A → More likely to respond than average
• P(Y|A) > P(Y)
ƒ Baseline adjusted to be conditional on B
• A → Less likely to respond than average, given B
• P(Y|A,B) < P(Y|B)
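A small worked example shows how both inequalities can hold at once. The counts below are invented purely for illustration, and the helper name `rate` is hypothetical.

```python
# (A, B) -> (customers, responders); all counts are hypothetical.
cells = {
    (True, True): (100, 5),
    (True, False): (100, 60),
    (False, True): (700, 70),
    (False, False): (100, 15),
}

def rate(cells, a=None, b=None):
    """Response rate over the cells matching the given values of A and/or B."""
    n = y = 0
    for (cell_a, cell_b), (count, responders) in cells.items():
        if (a is None or cell_a == a) and (b is None or cell_b == b):
            n += count
            y += responders
    return y / n

p_y = rate(cells)                            # naive baseline P(Y)
p_y_given_a = rate(cells, a=True)            # univariate profile P(Y|A)
p_y_given_b = rate(cells, b=True)            # adjusted baseline P(Y|B)
p_y_given_ab = rate(cells, a=True, b=True)   # drilled-down pattern P(Y|A,B)

flip_flop = (p_y_given_a > p_y) and (p_y_given_ab < p_y_given_b)
print(p_y, p_y_given_a, p_y_given_b, p_y_given_ab, flip_flop)
```

With these counts P(Y) = 0.15 and P(Y|A) = 0.325, so A looks better than average. Conditioning on B gives P(Y|B) = 0.09375 but P(Y|A,B) = 0.05, so A looks worse than average. The reversal arises because most of A's responders sit in the "not B" cell.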


Case Study – Attrition


ƒ Univariate profile
• P(Attrition | Product X) = Higher than average
• P(Attrition | Cancelled order) = Much higher than average
ƒ Remove customers whose first order was cancelled
ƒ New profile represents patterns of form
• P(Attrition | A, First order not cancelled)
• We found that for most patterns A, the attrition rates were
similar (in relative terms) to their univariate profile
• Exception: P(Attrition | Product X, First order not cancelled) = Lower
than average (a ‘flip-flop’)


Case Study – Attrition – Explanation


ƒ Test results required from a third party before
product X could be shipped
ƒ Approval of test difficult to obtain when customer
wanted to buy product through mail order
ƒ Test not approved → cancelled order
ƒ Customer knows that if they can’t get test
approval for this product then they can’t use this
company to buy this product → they attrite
ƒ BUT, if the test gets approved then the hard part
is done and repeat purchase is easy! The
customer is more likely to keep purchasing
ƒ Action: Educate prospects, try to make test
process easier somehow
Challenges in Pattern Discovery


Challenges in Pattern Discovery


ƒ Interestingness depends on quality/quantity of
domain knowledge
ƒ Interesting patterns are discovered only if the
baseline is specified correctly to represent the
client’s prior beliefs
ƒ Difficult to accurately estimate the ‘significance’
of a pattern
• Where statistics comes in…
• Null hypothesis is baseline
• Estimation harder the more complex the baseline
• Significance probability adjustment for multiple patterns
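As one illustration of adjusting significance probabilities when many candidate patterns are tested, the sketch below applies two standard corrections: Bonferroni and Benjamini-Hochberg. The talk does not say which adjustment was used, so both are assumptions here, and the p-values are invented.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Adjusted p-values that control the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for five candidate patterns.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
for p, b, h in zip(p_values, bonf, bh):
    print(p, round(b, 5), round(h, 5))
```

Unadjusted, four of the five patterns fall below 0.05; after either correction only the first two do. Bonferroni is the more conservative choice, and the gap between the two methods widens as the number of candidate patterns grows.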


Finding Patterns that are useful…

ƒ Some Patterns have an obvious explanation


• “[Age > 65] → High conversion rate for senior insurance.”
• Mis-specification of baseline?
ƒ Suspected and easily explained
• “Those two products show a high association because
we positioned them next to each other in the catalogue”
ƒ Surprising but explainable
• “We weren’t expecting to see mail order purchases of
candy with those volumes because we don’t send out
catalogues at that time of year; however, it is
Halloween…”


Finding Patterns that are useful…


ƒ Indication of Operational
peculiarities/Outliers/Data error
• “Male → more purchases. That’s because if we don’t
have the name we assign to the male head of
household.”
• “[Age < 64] → Non-zero purchase rate for senior
insurance.”
ƒ Surprising and unexplainable
• Twyman’s Law. “Any figure that looks different or
interesting is usually wrong”
• Check data!
ƒ Ideally Surprising, Explainable and Actionable!

Thank You
Richard J Bolton, PhD
Associate Director,
Strategic Consulting & Analytics
KnowledgeBase Marketing
Richard.Bolton@kbm1.com
