Machine Learning:
An Overview
Bill Haffey
April 27, 2012
Statistical Analysis (User Driven)
–Confirm Hypotheses
–Data Requirements
–More Assumptions
–Design importance
–General Population Predictions
Machine Learning (Data Driven)
–Generate Hypotheses
–Exploratory
–Less Data Prep
–Fewer Assumptions
–Individual Predictions
–Results Oriented
In a nutshell…
Machine Learning works by…
Machine Learning
Three classes of algorithms
Supervised vs. unsupervised
Complementary
Data Mining
–Cluster (“Differences”): Group cases that exhibit similar characteristics.
–Associate (“Patterns”): What events occur together? Given a series of actions, what action is likely to occur next?
–Predict (“Relationships”): Predict who is likely to exhibit specific behavior in the future.
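[Figure: decision tree classifying cases as Bad or Good]
Root node: Bad 52.01% (n=168), Good 47.99% (n=155)
First split: Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
–First branch: Bad 86.67% (n=143)
–Monthly salary: Bad 15.82% (n=25)
Lower splits: age (Young (< 25); Middle (25-35) vs. Old (> 35), and Young (< 25) vs. Middle (25-35); Old (> 35)) and job category (Management; Clerical vs. Professional)
Leaf nodes shown: Bad 0.00% (n=0), Good 100.00% (n=8); Bad 58.54% (n=24), Good 41.46% (n=17)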
© 2009 SPSS Inc. © 2010 IBM Corporation
Business Analytics software
Decision Trees
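The statistic reported for the Paid Weekly/Monthly split can be checked by hand. The sketch below assumes the tree reports a likelihood-ratio chi-square (the slide does not say which variant was used) and that the branch counts can be back-derived from the percentages shown: 143 Bad at 86.67% implies 165 cases in that branch, and 25 Bad at 15.82% implies 158 cases. The function name and the reconstructed table are illustrative, not part of the original material.

```python
import math

def likelihood_ratio_chi_square(table):
    """Likelihood-ratio (G) statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # a zero cell contributes nothing to the sum
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    return 2.0 * g

# Counts back-derived from the slide's percentages (an assumption).
split_table = [
    [143, 22],   # first branch: Bad, Good
    [25, 133],   # "Monthly salary" branch: Bad, Good
]
g_statistic = likelihood_ratio_chi_square(split_table)
degrees_of_freedom = (len(split_table) - 1) * (len(split_table[0]) - 1)
print(round(g_statistic, 4), degrees_of_freedom)
```

Under these assumptions the result lands close to the reported Chi-square of 179.6665 with df=1. A tree builder would compute this kind of statistic for every candidate predictor and split on the strongest one.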
Clustering
– An exploratory data analysis technique
– Reveals natural groups within a data set
– No prior knowledge about groups or
characteristics
– ‘Large’ groups interesting, but so are ‘small’
groups
– Not always an end in itself
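As one possible illustration of finding natural groups without prior labels, here is a minimal k-means sketch. K-means is only one of many clustering techniques and the slides do not name a specific algorithm; the data points, the function name, and the choice to seed centroids with the first k points are all assumptions made for the example.

```python
def k_means(points, k, iterations=20):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            labels[idx] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Hypothetical cases described by two numeric characteristics.
cases = [(1.0, 1.2), (8.0, 8.5), (0.8, 1.0), (1.1, 0.9),
         (8.2, 7.9), (7.9, 8.1), (1.2, 1.1)]
labels, centroids = k_means(cases, k=2)
print(labels)
```

The group labels are arbitrary identifiers; interpreting what each group means is left to the analyst, which is why clustering is often a first step rather than an end in itself.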
Associations
– Finds things that occur together, e.g., events in a crime incident
– Associations can exist between any of the attributes (there is no single outcome, unlike Decision Trees)
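A rough sketch of how pairwise association rules can be scored with support and confidence follows. The incident records, thresholds, and function name are hypothetical, and real association algorithms such as Apriori also handle itemsets larger than pairs.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Any attribute may appear on either side of a rule.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical incident records, each a set of attributes that occurred together.
incidents = [
    {"night", "burglary", "forced_entry"},
    {"night", "burglary"},
    {"day", "theft"},
    {"night", "burglary", "forced_entry"},
    {"day", "burglary"},
]
rules = pair_rules(incidents)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that no attribute is singled out as the target: the same item can be an antecedent in one rule and a consequent in another.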
Sequential Associations
– Discovers association rules in time-oriented data
– Finds the sequence or order of the events
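To show how order changes the picture, the sketch below counts "A happens, then B happens later" pairs across time-ordered event lists. It is a simplified stand-in for a sequence-mining algorithm, and the event names, threshold, and function name are hypothetical.

```python
from collections import Counter

def sequential_pairs(sequences, min_support=0.5):
    """Support of ordered pairs (first, later) across time-ordered sequences."""
    n = len(sequences)
    counts = Counter()
    for events in sequences:
        seen_pairs = set()  # count each ordered pair once per sequence
        for i, first in enumerate(events):
            for later in events[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical event histories, each already sorted by time.
histories = [
    ["referral", "visit", "billing"],
    ["visit", "billing"],
    ["billing", "visit"],
    ["referral", "billing"],
]
frequent = sequential_pairs(histories)
print(frequent)
```

Unlike a plain association, ("visit", "billing") and ("billing", "visit") are counted separately here, so an unusual ordering can stand out even when the events themselves are common.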
Retention Modeling
Data Collection
–Current Data: Current Employees (Education, job history, experience, …)
–Survey Data (Attitudes, non work related factors)
–Managers reports on employee satisfaction and performance
Likelihood of Success
… And Communication Skills > 7 Then Success = Medium (35, 0.78)
Retention Incentives
1. Salary Increase, prob 0.23
2. Not applicable
3. Flexible Schedule, prob 0.87
4. Performance Award, prob 0.36
5. Benefits, prob 0.54
…
Retention Modeling
Predictive Modeling
Retention Modeling
Decision Optimization
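One simple reading of the decision optimization step is to rank the retention incentives by their modeled probability and offer the top one. The sketch below assumes each "prob" is the predicted probability that the incentive retains the employee, which the slide does not state explicitly; item 2 ("Not applicable") is left out because it carries no probability.

```python
# Probabilities copied from the Retention Incentives list.
incentives = {
    "Salary Increase": 0.23,
    "Flexible Schedule": 0.87,
    "Performance Award": 0.36,
    "Benefits": 0.54,
}

# Rank incentives from most to least likely to succeed (assumed meaning of "prob").
ranked = sorted(incentives.items(), key=lambda item: item[1], reverse=True)
best_incentive, best_prob = ranked[0]
print(best_incentive, best_prob)
```

A fuller optimization would also weigh the cost of each incentive against the value of retaining the employee; that trade-off is not shown on the slide and is omitted here.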
Sequencing Algorithm
Apriori results had shown that Ben had been billed for services by 2
different providers:
–Healing Nurses HHA
–Therapy at Home HHA
Initially, this might not be of interest, or not as much as Anita…
Examining the data further using sequencing reveals a suspicious pattern!
Law Enforcement
Problem: Spiraling crime rates and limited officer resources mean better deployment decisions are required
Solution: Combine incident data with weather, city events, holiday/payday cycles, etc. to get a better picture of criminal incidents, more accurate prediction, and more effective deployment
[Figure panels: Night, Day, N3D, 3D]
What is Normal?
Baseline Activity
Including resource usage, work hours, document type…
Used to baseline activity of employees against:
–Their own past history
–The past history of their peers (job title, department, project)
Spikes in Activity
Reversals in Trends
Change in Cluster Membership
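A minimal sketch of spike detection against a baseline is shown below, using a z-score: how many standard deviations the current value sits above the historical mean. The threshold of 3, the activity counts, and the function name are assumptions for illustration; the slides do not specify how spikes are measured.

```python
from statistics import mean, stdev

def is_spike(history, current, threshold=3.0):
    """True if `current` is more than `threshold` standard deviations above the baseline."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current > baseline  # flat history: any increase stands out
    return (current - baseline) / spread > threshold

# Hypothetical daily document-access counts.
own_history = [12, 15, 11, 14, 13, 12, 16]   # the employee's own past activity
peer_history = [30, 35, 42, 38, 33]          # typical activity among peers
today = 40

spike_vs_self = is_spike(own_history, today)
spike_vs_peers = is_spike(peer_history, today)
print(spike_vs_self, spike_vs_peers)
```

In this made-up case the same count is a spike relative to the employee's own history but unremarkable relative to peers, which is why both baselines are worth checking.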
Reactive Analysis
…other Segmentation algorithms and Association algorithms are also used to group people based on behavior patterns
Proactive Analysis
Analysis of documents accessed by employees and how closely each person is associated to certain topics of interest
Most of the work done within proactive analysis is used to contribute to an individual’s risk score or to create a model to classify the likely risk for that individual.
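As a rough illustration of measuring how closely a person is associated with a topic of interest, the sketch below uses the share of a person's accessed documents that fall under that topic. This measure, the access log, and all names are hypothetical; the slides do not describe the actual scoring method, and a real risk score would combine many such inputs.

```python
def topic_association(accessed_topics, topics_of_interest):
    """Fraction of accessed documents whose topic is in the set of interest."""
    if not accessed_topics:
        return 0.0
    hits = sum(1 for topic in accessed_topics if topic in topics_of_interest)
    return hits / len(accessed_topics)

# Hypothetical access log: one topic label per document accessed.
access_log = {
    "employee_a": ["hr", "finance", "finance", "design"],
    "employee_b": ["design", "design", "hr", "design", "design"],
}
topics_of_interest = {"finance"}

scores = {
    person: topic_association(topics, topics_of_interest)
    for person, topics in access_log.items()
}
print(scores)
```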
Thanks