
“…OVER TIME, WE BELIEVE BIG DATA MAY WELL BECOME A NEW TYPE OF CORPORATE ASSET THAT WILL CUT ACROSS BUSINESS UNITS AND FUNCTION MUCH AS A POWERFUL BRAND DOES, REPRESENTING A KEY BASIS FOR COMPETITION…”
MCKINSEY QUARTERLY
Business analytics: Agenda

1. “Big data” – sources, challenges, and promise
(today)
2. Leveraging analytics for competitive advantage:
The 4 pillars of business analytics.
(later this week)
Demystifying (defining) big-data
The challenge of Big Data (by IBM)
Four main sources

1. Enterprise Systems
2. Social Media
Four main sources

1. Enterprise Systems
2. Social Media
3. Mobile
Mobile targeting – time and geography

Geo-Conquesting

https://vimeo.com/44351185
Four main sources

1. Enterprise Systems
2. Social Media
3. Mobile
4. The Internet of Things
IoT is bringing about an explosion in
connected devices and huge data sets
Internet-of-Things Projections

Some big numbers:
• 14bn Connected Devices | Bosch SI
• 50bn Connected Devices | Cisco
• 309bn IoT Supplier Revenue | Gartner
• 1.9tn IoT Economic Value Add | Gartner
• 7.1tn IoT Solutions Revenue | IDC

Some small numbers:
Peter Middleton, Gartner: “By 2020, component costs will have come down to the point that connectivity will become a standard feature, even for processors costing less than $1”

http://postscapes.com/internet-of-things-market-size
iBeacons – IOT used in the Store

iBeacon indoor positioning systems can interact directly with smartphones,
e.g. using Bluetooth Low Energy (BLE)
Targeting via iBeacon

16% higher unplanned spending (Ghose et al. 2015)
Applications beyond marketing

Monitor the driving habits of drivers
Applications managing infrastructure

– Sensors (or a drone) tell your parking app about vacant spots
– Sensor-rich garbage bins can schedule pickups!
– Smart LED streetlights only light up if a pedestrian approaches
Class exercise: smart chairs

IoT Leads to Better Measurement

Measurement has always been a foundation for progress:

– Medicine changed drastically after the microscope!
– ERP systems automate key business processes and serve as a foundation for modern management!
– Today, our key social processes are digitized:
• 46% of US singles found their romantic partner online
• Facebook is where ‘friends’ do what they like to do
– IoT has the potential to change our daily lives
Data as a disruptive force
Analytics as a “disruptive technology”
Data driven healthcare
Use of IT in agriculture is growing
Jojoba Israel: Lots of data are constantly collected
but analyzed separately

[Figure: Weather, Irrigation, Yield, and Soil data are kept in separate sources (weather station, Excel files). No integration…]
B2B in the data area?
What can business do?
(Analytics)
Data-Driven Optimization

Organizations that use data-driven decision-making are 5% more productive and 6% more profitable than their competitors. - MIT

Status Quo: What happened?
• Internal data only
• Standard reports
• Ad hoc queries
• Exception reporting
• Monthly report generation
• ‘Gut feel’ decisions

Data-Driven Optimization: What’s our best outcome?
• Blended internal & external data
• Predictive analytics
• Sophisticated modeling
• Machine learning
• Real-time analysis
• Fact-based decisions
Analytics Ladder

Courtesy: David Hardoon


4 Pillars of business analytics

Descriptive vs. predictive

• Descriptive data analytics
– Also sometimes called “exploratory data mining” or “unsupervised learning”
– Goal: Find patterns in data (such as association rules, meaningful segments / clusters, or anomalies)
– A much broader perspective of “exploratory data analytics” includes a variety of additional approaches: data visualization, descriptive statistics, correlation, data reduction, OLAP technologies, queries, and reporting

• Predictive data analytics
– Also sometimes called “supervised learning”
– Goal: Predict a target/outcome variable (such as purchase/no purchase, fraud/no fraud, creditworthy/not creditworthy, etc.), typically by building predictive models
Four Key Ideas we will cover

Descriptive (unsupervised learning → patterns):
• Clustering
• Association rules

Predictive (supervised learning → models):
• Classification
• Prediction
4 Pillars of business analytics

Clustering

Finding elements of data (clusters) that have a high degree of similarity, and grouping them together
Example: identifying customer segments (for which we can make different offers).
Main idea: organizing data into the most natural groups
Example: understanding the consumer base on your website
(based on age, gender, amount spent…)

[Scatter plot: amount spent per visit (20 to 100) against age (20 to 50); each point is marked m or f by gender; three groups are labeled Cluster 1, Cluster 2, and Cluster 3]

How about going beyond eyeballing the data in 2-3 dimensions?
– Need general-purpose techniques to deal with any-dimensional data
Example: clustering mall visitors

QUESTION: What data sources can be used?
APPROACH: Analysis of Wi-Fi usage mapped to physical space.
RESULT: Location- and behavior-based insights for tenant and mall strategies.

Image (cc) flickr/ Will


Clustering: Basic Ideas
• Organizing data points/objects (e.g., customers) into homogeneous (and, hopefully, meaningful) groups/clusters

• Desired properties of clustering result:
– High intra-similarity, i.e., any two data points / objects that are assigned into the same cluster should exhibit similarity to each other
– Low inter-similarity, i.e., any two data points / objects that are assigned into different clusters should not be very similar to each other (why?)

• Helps to gain insights into your data
– Instead of trying to look at the entire dataset (e.g., a huge number of customers), you can inspect the representative data groups/clusters (e.g., a small number of groups, into which your data can be arranged most naturally)
– Usually a useful precursor for additional, deeper analyses
– Many applications!
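
One general-purpose way to form such groups is k-means, named here only as an illustration, since the slides do not specify an algorithm. The sketch below is a minimal plain-Python version. The customers (age, amount spent per visit) and the starting centroids are hypothetical and chosen so the run is deterministic.

```python
import math

# Hypothetical customers: (age, amount spent per visit). Not real data.
customers = [
    (22, 20), (25, 25), (24, 22),     # younger, low spend
    (35, 100), (38, 95), (36, 105),   # mid-age, high spend
    (50, 30), (52, 25), (48, 28),     # older, low spend
]

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean_point(members) if members else centroids[k])
        centroids = updated
    return labels, centroids

# Assumption: start from one customer per visible group.
labels, centroids = k_means(customers, [customers[0], customers[3], customers[6]])
print(labels)
print(centroids)
```

Points in the same cluster end up close to their shared centroid (high intra-similarity) and far from the other centroids (low inter-similarity). With real data, age and spend would normally be put on a comparable scale first, otherwise the variable with the larger range dominates the distance.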
Clustering case Study:
Customer Segmentation for Regional
Airline
• Goal: break down a large data set into small, similar groups based on customer attributes.

• Customer attributes considered in this situation included:
– Travel frequency
– Average days booked in advance
– Number of flights per trip
– Percentage of round trips
– Percentage of group trips
– Booking channels
Clustering case Study:
Can you find a “title”?
Association rules

• (also known as co-occurrence grouping)
• Attempts to find associations between entities based on transactions involving them.
• Important: no examples are provided to the model; no “correct answer” exists
• Example: Amazon
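
A co-occurrence rule is usually summarized with two numbers, support and confidence. These are standard measures rather than terms used on the slide. The sketch below counts them over hypothetical shopping baskets; the items and transactions are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
transactions = [
    {"book", "bookmark"},
    {"book", "bookmark", "lamp"},
    {"book", "lamp"},
    {"bookmark"},
    {"book", "bookmark"},
]

n = len(transactions)
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)

def support(a, b):
    """Share of all transactions that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / n

def confidence(a, b):
    """Among the transactions containing a, the share that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("book", "bookmark"))      # 3 of the 5 baskets
print(confidence("bookmark", "book"))   # 3 of the 4 baskets with a bookmark
print(confidence("lamp", "book"))       # both baskets with a lamp
```

No labels are supplied anywhere: the rules come purely from which items appear together, which is why the task counts as unsupervised.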
4 Pillars of business analytics

Classification
Assigning each individual to one of several
pre-defined categories (or classes)
• Objective:
– to predict the classification when it is unknown or will occur in the future,
– based on rules derived from similar data where the classification is known

[Figure: cardiac rhythm classification]
Prediction (Regression)

Estimate (or predict) the numerical value of a specific variable based on past and current data

Stock Price

http://www.blueflag.com.au/blog/why-australians-wont-buy-1-million-cars-2011

http://mechonomic.blogspot.com/2010/07/ibm-share-price-on-decline.html
Example: prediction of mall visitors’
next step

QUESTION: What data sources can be used?
APPROACH: Analysis of Wi-Fi usage mapped to physical space.
RESULT: ?

Image (cc) flickr/ Will


Case study: Large shopping malls in China
3 coupon types: random, location, and trajectory

[Figure: four bar charts comparing the control group (C) with the Random, Location, and Trajectory coupons. Panel titles: Highest Redemption Rate; Highest Spending in Store; Least Time Spent in Store; Time Elapsed Until Redemption]

What is Different from Classical Statistics?

Assumptions in classical statistics:
“data is scarce”
“computing is difficult”

The result:
The same sample is used to make estimates AND to determine how reliable those estimates are

Do you find “confidence intervals” and “hypothesis testing” easy to explain to your non-technical colleagues?
What is Different from Classical Statistics?

Assumptions in data mining:
“data and computing are abundant”

The result:
Fit a model with one sample
Assess its performance with another sample
Use computationally intensive techniques
(examples: classification trees, neural networks)
What is Different from Classical Statistics?

Advantage of data mining:
- Can be open ended
- No need for hypothesis testing

The danger:
- Over-fitting: a model fit so closely to the available sample of data that it describes not merely structural characteristics of the data, but random peculiarities as well
Over-fitting

Let’s try (supervised learning)
Terminology we will need

• Training data: portion of the data used to fit a model
• Validation data: portion of the data used to assess how well the model fits, and also to:
– adjust some models
– select the best model from among those that have been tried
• Test data: portion of the data used only at the end of the model building and selection process to assess how well the final model might perform on additional data
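
The three-way partition above can be sketched in a few lines of plain Python. The 60/20/20 proportions, the seed, and the `split_data` helper are illustrative assumptions; the slides do not prescribe them.

```python
import random

def split_data(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into training, validation, and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # whatever remains
    return train, valid, test

rows = list(range(100))   # stand-in for 100 observations
train, valid, test = split_data(rows)
print(len(train), len(valid), len(test))
```

Shuffling before cutting matters: if the rows are sorted (say, buyers first), an unshuffled split would give the three parts very different class mixes.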

Example: Buyer/non-buyer classification

• A riding-mower manufacturer classifies families into:
a. those likely to purchase a riding mower
b. those not likely to buy one

• The question: can we derive a method to help us identify future buyers?

• The data (or the “predictor variables”):
– Income ($000s)
– Lot size (000s sq ft)
(Lawn) Mowers data

Observation   Income ($000s)   Lot size (000s sq. ft.)   Class (1 = buyer, 2 = non-buyer)
1             60               18.4                      1
2             85.5             16.8                      1
3             64.8             21.6                      1
4             61.5             20.8                      1
5             87               23.6                      1
6             110.1            19.2                      1
7             108              17.6                      1
8             82.8             22.4                      1
9             69               20                        1
10            93               20.8                      1
11            51               22                        1
12            81               20                        1
13            75               19.6                      2
14            52.8             20.8                      2
15            64.8             17.2                      2
16            43.2             20.4                      2
17            84               17.6                      2
18            49.2             17.6                      2
19            59.4             16                        2
20            66               18.4                      2
21            47.4             16.4                      2
22            33               18.8                      2
23            51               14                        2
24            63               14.8                      2
Graphical View
Decision Trees

• Classification trees – binary outcome
– Will the buyer purchase or not?
• Regression trees – continuous outcome
– How much will the buyer spend?

• Very broadly applicable technique
• Easy to explain “rules”

Key task is to algorithmically find the splits in the data that help classify (separate) buyers and non-buyers

X2 <= 21 or X2 <= 19?

Which is a better split?
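
One way to answer the question numerically is to compare the impurity of the groups each split produces. The sketch below assumes that X2 is lot size (the thresholds 19 and 21 fall in the lot-size range of the mowers data) and uses Gini impurity, a common criterion for classification trees; the slides do not say which criterion is used.

```python
# Lot sizes (000s sq ft) from the mowers data, by class.
buyer_lots = [18.4, 16.8, 21.6, 20.8, 23.6, 19.2, 17.6, 22.4, 20, 20.8, 22, 20]
nonbuyer_lots = [19.6, 20.8, 17.2, 20.4, 17.6, 17.6, 16, 18.4, 16.4, 18.8, 14, 14.8]

def gini(n_buyers, n_nonbuyers):
    """Gini impurity of a node: 0 when pure, 0.5 for a 50/50 mix of two classes."""
    total = n_buyers + n_nonbuyers
    if total == 0:
        return 0.0
    p = n_buyers / total
    return 1 - p**2 - (1 - p)**2

def weighted_gini(threshold):
    """Size-weighted impurity of the two nodes created by 'lot size <= threshold'."""
    left = (sum(x <= threshold for x in buyer_lots),
            sum(x <= threshold for x in nonbuyer_lots))
    right = (len(buyer_lots) - left[0], len(nonbuyer_lots) - left[1])
    n = len(buyer_lots) + len(nonbuyer_lots)
    return (sum(left) * gini(*left) + sum(right) * gini(*right)) / n

for threshold in (19, 21):
    print(threshold, round(weighted_gini(threshold), 3))
```

Lower weighted impurity means a cleaner separation. On this data the split at 19 scores 0.375 and the split at 21 scores 0.4, so under the Gini criterion the split at 19 is the better first cut.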


Recursive Partitioning

X2 <= 19

X1 < 84.75
Final “Pure” Split
Classification Tree

Decision
Nodes

Leaf
nodes
Why are Decision Trees Popular?

• Tells you which predictors are important
– Variable subset selection is automatic (since it is part of the split selection)
– Wine.xls uses only 2 out of 13 variables
• No hassle with outliers
– The choice of a split depends on the ordering of observation values and not on their absolute magnitudes
• No hassle with missing data
• Easy interpretation and implementation
– If-then-else rules…
Classification and Regression Trees
(CART)

• Very broadly applicable technique
• Easy to explain “rules”
• 2 key ideas
1. Recursive partitioning of the space of the independent variables
2. Pruning using validation data
Key performance metrics

• Accuracy: share of all cases (both class 0 and class 1) that the model classified correctly
$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$

• Precision: out of the cases classified as positive, how many are actually positive?
$\text{Precision} = \frac{TP}{TP + FP}$

• Recall: out of the actual positive cases, how many were predicted correctly?
$\text{Recall} = \frac{TP}{TP + FN}$
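
As a small worked example, the three metrics can be computed directly from the four confusion-matrix counts (TP = true positives, FP = false positives, TN = true negatives, FN = false negatives). The counts below are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # all correct / all cases
    precision = tp / (tp + fp)                   # correct among predicted positives
    recall = tp / (tp + fn)                      # found among actual positives
    return accuracy, precision, recall

# Hypothetical counts for 100 scored cases.
accuracy, precision, recall = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here precision (0.80) is lower than recall (about 0.89): the model finds most of the true positives but also flags some cases that are not.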
Let’s try (supervised learning)
using Azure
Welcome to Azure!

• Experiment: your “sandbox”


• Step 1: get the data
• Step 2: preprocess the data
(don’t forget to split it!)
• Step 3: choose and run a modeling algorithm
• Step 4: score and evaluate your model

Example 1: predict automotive prices

• Step 1: get the data – “automotive price data”


• Step 2: preprocess the data
– Clean missing values
– Split (75-25)
– Define features (make, body-style, wheel-base, horsepower, peak-rpm, highway-mpg, price)
• Step 3: choose and run a modeling algorithm
– Linear regression
– Train model
• Step 4: score and evaluate your model

Example 2: income classification

• Step 1: get the data – “adult income binary classification dataset”
• Step 2: preprocess the data
– Clean missing values
– Split (75-25)
• Step 3: choose and run a modeling algorithm
– Decision trees
– Train model
• Step 4: score and evaluate your model
– Confusion matrix
– Accuracy
– ROC curve

Moving from correlation to
causation
Leveraging Analytics for Competitive
Advantage

Example

Senior business leader wants to know,
“Did the website redesign increase sales? Can you run a report?”

BI Analyst runs report »

But that’s the wrong question

An (older) Example

• Amazon – shopping cart recommendations
– A marketing senior VP was against it:
• It might distract people away from checking out
– Results?

Source: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html
Search Engine Ads with Site Links

• Should a search engine add “site links” to ads, which allow advertisers to offer several destinations per ad?
• OEC: revenue, with ads constrained to the same vertical pixels on average

[Figure: variant A (left hand) vs. variant B (right hand)]

Source: Ronny Kohavi, MSFT

Pro: richer ads, users better informed about where they land
Cons: the constraint means on average 4 “A” ads vs. 3 “B” ads
Variant B is 5 ms slower (compute + higher page weight)

Search Engine Ads with Site Links

<answer>

• The above change was costly to implement. MSFT made two small changes to Bing, which took days to develop; each increased ad revenue by about $100 million annually.
• (One was delayed by 6 months because it was not given high priority, a prioritization mistake that cost $50M)
Source: Ronny Kohavi, MSFT

Our intuition is poor

Ideas tested at
• 1/3 prove to be statistically significant, positive changes
• 1/3 are flat: no significant difference
• 1/3 are statistically significant, but negative
Source: Ronny Kohavi, MSFT

Non-tech Companies where A/B testing is standard operating procedure
• Walmart
• Hertz
• Singapore Airlines
• Capital One
• (Not to mention Google, Amazon, Booking.com, Facebook, Uber, Airbnb. At Amazon, moving credit card offers to the checkout page was a $10 million effect!)
• Requires infrastructure (tools exist)
– instrumentation (to record such things as clicks, mouse hovers, and event times)
– data pipelines, and
– data scientists

Two Methodological Paradigms

Causation & Prediction

Causation: Controlled experiments are a powerful tool to evaluate ideas, and to understand fundamental forces at play
Prediction: Predictive models tell you where to look for these forces

The Goal of Causal Analysis

• Use scientific methodology to support decision making
• Cause and effect questions
– Test theory of causal relationships
– Contribute knowledge on the nature of a causal relationship
– Transparent methodology
– Reproducible procedures

Approaches to Study Causation

• Observational
The researcher looks for natural differences across cases and tries to find a single input that might have caused the variation in outcomes
• Experimental
The researcher conducts an experiment… If outcomes vary across the treatment and control groups, the difference must be due to the catalyst (Teele 2013)

A B
Diet Soda Anyone?

• Discovery: people drinking diet soda are overweight

• Is it:
– Consuming Diet Coke makes you fat?
or
– Overweight people are ordering Diet Coke because
they want to lose weight

Storks Deliver Babies (p=0.008)
R. Matthews (2000)

http://priceonomics.com/do-storks-deliver-babies/

Matthews, R. (2000). Storks deliver babies (p=0.008). Teaching Statistics, 22(2), 36-38.
The Negative Effect of Science?

Causality

• The act or process of causing something to happen or exist
• The relationship between an event or situation and a possible reason or cause
(Merriam-Webster)

A B

Causality

1 C
Figure 1.1 - Morgan S. and Winship
Cause and Effect

• To establish a cause-and-effect relationship:
– The cause must precede the effect
– The cause must be related to the effect
– There must be no other plausible alternative explanation

Correlation does not imply
causation
• The First Law of Data Science: “To
determine if a correlation is true in the real
world, it must be verified empirically”

(Dr. Michael L. Brodie, KDD 2014)


Inferring Causality

• Association
– A statistically significant correlation or regression
coefficient - the likelihood of its occurrence by
chance alone is small.
• Time order of occurrence
– The causal variable must precede the outcome
variable in time
• Eliminating other potential causes
Advantages of Experiments

• Best scientific way to prove causality
– The effect on the dependent variable is caused by changes introduced in the treatment (Kohavi, 2015)
• An effective way to obtain unbiased estimates of causal effects (Aral and Walker, 2012)
Experiment - Basic Concept

Source: Kohavi R., KDD 2015
A/B Testing

A..Z test?

• The Multi-Armed Bandit Problem

Challenges

VS.

Challenges

• Minimize the possibility that the results you get might be due to a hidden confounding factor

Challenges

• What to measure?
– Define the OEC – Overall Evaluation Criterion
• Minimize differences between the control and treatment groups
• How long should we run the experiment?
• How to measure the significance of the results?
– Statistical significance
– Economic significance
• Heterogeneous treatment effects
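
For the statistical-significance question, a common choice when the OEC is a conversion rate is a two-proportion z-test. The sketch below uses only the standard library; the test choice, the `two_proportion_z_test` helper, and the traffic numbers are illustrative assumptions, not something the slides prescribe.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between two groups."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value

# Hypothetical experiment: 10,000 users per group, 200 vs. 260 conversions.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value addresses only statistical significance. Economic significance is a separate judgment: whether a lift of 0.6 percentage points is worth the cost of shipping and maintaining the change.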

When Should We Use Experiments?
• Choice between known options
• Examples:
– 41 Shades of Blue (Google)
– Every 100ms counts (Amazon)
– Encryption notification (Kayak)
When Should We Use Experiments?

• Less suitable for:
– New experiences
• Change aversion
• Novelty effect
– Fuzzy questions/opportunities?
• What is not offered?
• Which product to develop?
– Long-term activity

When Should We Use Experiments?

• Which product should we sell?
• Add new premium service?
• Change logo

Terminology and Notations

• $d_i$ – treatment variable:
– $d_i = 1$ → the $i$th subject receives the treatment
– $d_i = 0$ → the $i$th subject does not receive the treatment
• $Y_i(d)$ – the potential outcome of the $i$th subject
– $Y_i(1)$ – potential outcome when treated
– $Y_i(0)$ – potential outcome when not treated
• The subject-level treatment effect: $\tau_i = Y_i(1) - Y_i(0)$

Terminology and Notations

            Yi(0)    Yi(1)    τi
Student 1   80       85       5
Student 2   85       85       0
Student 3   90       100      10
Student 4   65       60       -5
Student 5   60       70       10
Student 6   85       85       0
Student 7   85       100      15
Average     78.57    83.57    5

Terminology and Notations

• Observed outcomes:
– The connection between the observed outcome $Y_i$ and the underlying potential outcomes is given by the equation
$$Y_i = d_i\,Y_i(1) + (1 - d_i)\,Y_i(0)$$
– For any given subject, we observe either $Y_i(1)$ or $Y_i(0)$, not both
• The fundamental problem of causal inference: only one of $Y_i(1)$ and $Y_i(0)$ is observed, so we can never find the true causal effect.

Terminology and Notations

            Observed Yi    τi
Student 1   85             ?
Student 2   85             ?
Student 3   100            ?
Student 4   65             ?
Student 5   60             ?
Student 6   85             ?
Student 7   85             ?

Average of observed Yi(0): 73.75
Average of observed Yi(1): 90
Difference in averages: 16.25

Terminology and Notations

• Average Treatment Effect (ATE)

$$ATE = \mu_{Y(1)} - \mu_{Y(0)} = \frac{1}{N}\sum_{i=1}^{N} Y_i(1) - \frac{1}{N}\sum_{i=1}^{N} Y_i(0) = \frac{1}{N}\sum_{i=1}^{N} \left[ Y_i(1) - Y_i(0) \right]$$
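
The seven-student example can be checked numerically. The sketch below computes the true ATE from the full potential-outcomes table and then the difference-in-means estimate from the observed outcomes. Which students are treated is an assumption: students 1 to 3 are taken as treated, a choice that reproduces the group averages of 73.75 and 90 shown earlier.

```python
# Potential outcomes for the seven students in the example.
y0 = [80, 85, 90, 65, 60, 85, 85]     # Yi(0): outcome if not treated
y1 = [85, 85, 100, 60, 70, 85, 100]   # Yi(1): outcome if treated

def mean(values):
    return sum(values) / len(values)

# True ATE: computable only because the example pretends both outcomes are known.
tau = [a - b for a, b in zip(y1, y0)]
ate = mean(tau)

# Assumption: students 1-3 are treated (d_i = 1); the rest are controls.
d = [1, 1, 1, 0, 0, 0, 0]
observed = [di * a + (1 - di) * b for di, a, b in zip(d, y1, y0)]
treated_mean = mean([y for y, di in zip(observed, d) if di == 1])
control_mean = mean([y for y, di in zip(observed, d) if di == 0])

print(ate)                           # true average treatment effect
print(treated_mean - control_mean)   # difference-in-means estimate
```

With only seven subjects, the estimate (16.25) is far from the true ATE (5). The difference in means is unbiased under random assignment, but a single small sample can still land well off the mark.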

Hypothesis Testing

• Null hypothesis (for a completely randomized design):
– Yi(1) = Yi(0) for every subject i
or
– ATE = 0
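
Under the first form of the null hypothesis (no effect for any subject), every student's outcome would be the same whatever group they landed in, so the test statistic can be recomputed for every possible assignment. The sketch below runs this randomization test on the seven observed student outcomes. The technique is swapped in for illustration, and the assumption that the first three students were treated is not from the slides.

```python
from itertools import combinations

# Observed outcomes for the seven students; assumption: the first three were treated.
observed = [85, 85, 100, 65, 60, 85, 85]
treated = (0, 1, 2)

def diff_in_means(outcomes, treated_idx):
    """Mean outcome of the treated minus mean outcome of the controls."""
    t = [outcomes[i] for i in treated_idx]
    c = [y for i, y in enumerate(outcomes) if i not in treated_idx]
    return sum(t) / len(t) - sum(c) / len(c)

actual = diff_in_means(observed, treated)

# Under the null Yi(1) = Yi(0), outcomes do not depend on the assignment,
# so recompute the statistic for every way of choosing 3 treated students.
all_diffs = [diff_in_means(observed, idx)
             for idx in combinations(range(len(observed)), len(treated))]
p_value = sum(abs(x) >= abs(actual) - 1e-12 for x in all_diffs) / len(all_diffs)

print(actual, len(all_diffs), round(p_value, 3))
```

Of the 35 possible assignments, 10 give a difference at least as extreme as the observed 16.25, so the p-value is about 0.29. With seven students, even a 16-point gap is quite compatible with no effect at all.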

Hypothesis Testing

Control vs. Treatment

• H0: ATE = 0
• H1: ATE ≠ 0

Error Types

Random Assignment

• Each participant has a known (usually equal) chance of being assigned to any of the groups.
• Successful randomization - group assignment cannot be predicted in advance.
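
A minimal sketch of equal-chance assignment: shuffle the participants and cut the list in half. The `random_assignment` helper, the seed, and the participant IDs are hypothetical.

```python
import random

def random_assignment(subjects, seed=0):
    """Shuffle the subjects and split them evenly into control and treatment groups."""
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..20.
control, treatment = random_assignment(range(1, 21))
print(sorted(control))
print(sorted(treatment))
```

The fixed seed is there only to make the example reproducible. In a real experiment the assignment should not be predictable in advance, so the seed would not be chosen or known beforehand by anyone who could influence who ends up in which group.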

Random Assignment

[Figure: individuals before and after random assignment to the Control and Treatment experimental groups. Colors symbolize any differentiating attribute among the individuals.]
What if people chose their
condition?

[Figure: individuals before choosing, and the resulting self-selected Control and Treatment groups, which show systematic error. Colors symbolize any differentiating attribute among the individuals.]
Selection Bias

• Simple Example
• Sample Selection Bias
– Average height of Americans?

• Self-Selection
– caused when the sample chooses itself
– certain characteristics are over-represented
because they correlate with willingness to be
included.

Blind Experiment

Images by lc.gcumedia.com
Completely Randomized Design
(CRD)
• Random assignment of subjects to a set of treatments
• Any variable that could influence the response variable is equalized between the groups (in expectation)

The effect is only due to the treatment imposed

Heterogeneous Treatment Effects

• Does the treatment have the same effect on everyone treated?
– Female/Male?
– Age group?
– Education?
• HTE – measure the effect on subpopulations
– Predefined (known) populations
– Advanced data methods

What to Test?

• Understanding consumer behavior:
– Cognitive bias
– Rational/irrational behavior
– Social effect
– Price sensitivity
• Website design

Design Choices in Online Experiments

Type of Experiment:
• Lab/Virtual Lab

• Field Experiment

• Natural Experiment

Lab Experiment

• Conducted in a well-controlled environment
– All variables can be controlled

Lab Experiment

• Participants are aware that they are taking part in an experiment.
• They may or may not know the true aims of the experiment
• Settings don’t always resemble the “real world”
• Participants don’t resemble other populations
– Samples are generally non-random
– Small samples, at least by survey data standards
– Participants are often college undergraduates
– Participants are often WEIRD:
• Western, educated, industrialized, rich, democratic

Field Experiment

• Examine an intervention in the real environment
• The subjects are naturally undertaking certain tasks
• The subjects do not know that they are participating in an experiment
• The researchers manipulate the independent variable

Case Study: the effect of SEM

• 49% is SEM (search engine marketing) (non-mobile + mobile)
• Google is the leading SEM provider, advertising ≈95% of revenues
• What is the ROI of SEM?

Search Engine Marketing

Paid Search Effectiveness

(Blake, Nosko, & Tadelis,


• The business question:
– What is the ROI of paid search for eBay?

• Hypothesis:
– Queries with the word eBay → intent to visit ebay.com → paid search results substitute for natural ones. Ads are navigational.

• Treatment:
– Stop advertising on brand-related terms (“ebay shoes”) @ Bing

• Control:
– Google, Yahoo!

Paid Search Effectiveness

(Blake, Nosko, & Tadelis,


• Simple pre-post analysis (w/o control): 5.6% decrease in total clicks
• With control: 0.59% of clicks lost, but not statistically significant
• 99.5% substitution between paid and natural

Paid Search Effectiveness

(Blake, Nosko, & Tadelis,


• Follow-up test on Google
• No control: no other brand SEM campaigns
• Pre-post estimate shows 3% of clicks lost.

Impact

Leveraging Analytics for Competitive
Advantage

Analytics – takeaways
• For (almost) every question you can use data.
– Just be creative about data sources and models
– Look for places where patterns can occur
• Be creative in looking for data
– Don’t forget to look outside the organization!
• Ask yourself if this is a supervised task
– For example – do I have examples to provide?
– If not – don’t despair! Patterns are still possible
• Can you use an experiment to get causal understanding?
• Know how to read your results.
• Remember it is not really complicated!
