Welcome!
bsts: Bayesian structural time series
Steven L. Scott (Google)
August 10, 2015
Harvey
Chatfield
Petris et al.
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
Extensions
Regression
ARMA
Smoothing
Regression models
Airline passengers
[Figure: the AirPassengers series, 1950–1960 — raw monthly counts, and log10(AirPassengers) on the log scale, vs. Time]
[Figure: residuals plotted against fitted values]
[Figure: residuals plotted against time, and the ACF of the residuals out to lag 20]
ARMA models
ARMA(P, Q) models have the form

    y_t = Σ_{p=1}^{P} φ_p y_{t−p} + Σ_{q=0}^{Q} θ_q ε_{t−q}.
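The recursion combines past values (the AR part) and past shocks (the MA part, with θ_0 multiplying the current shock). A minimal simulation sketch in Python/NumPy (the deck's own examples use R, but this avoids any package dependency; the function name is illustrative):

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate an ARMA(P, Q) series from the recursion
    y_t = sum_p phi[p] * y_{t-p} + sum_q theta[q] * eps_{t-q},
    with theta[0] multiplying the current shock eps_t."""
    rng = np.random.default_rng(seed)
    P, Q = len(phi), len(theta) - 1
    burn = 200                      # discard the transient from zero initial conditions
    eps = rng.normal(0, sigma, n + burn)
    y = np.zeros(n + burn)
    for t in range(max(P, Q), n + burn):
        ar = sum(phi[p] * y[t - p - 1] for p in range(P))
        ma = sum(theta[q] * eps[t - q] for q in range(Q + 1))
        y[t] = ar + ma
    return y[burn:]

y = simulate_arma(phi=[0.8], theta=[1.0, 0.4], n=500)
# The lag-1 autocorrelation of an ARMA(1,1) with phi = 0.8 is well above zero.
acf1 = np.corrcoef(y[:-1], y[1:])[0, 1]
print(round(acf1, 2))
```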
Stationary vs Nonstationary
See code in stationary.R
[Figure: 1000-step simulated paths of the stationary AR(1) model y_t = 0.95 y_{t−1} + ε_t (many.ar1) and the random walk y_t = y_{t−1} + ε_t (many.random.walk)]
Variance
For the random walk,

    y_t = y_{t−1} + ε_t
        = (y_{t−2} + ε_{t−1}) + ε_t
        = ...
        = Σ_{i=0}^{t} ε_{t−i}

so Var(y_t) = t σ². The variance diverges to ∞. (By contrast, a stationary AR(1) with |φ| < 1 has bounded variance σ²/(1 − φ²).)
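The divergence is easy to see empirically: simulate many independent paths of each process (as in stationary.R) and compare cross-sectional variances. A Python/NumPy sketch:

```python
import numpy as np

# Compare the variance of a stationary AR(1) (phi = 0.95) with a random walk
# (phi = 1) by simulating many independent paths.
rng = np.random.default_rng(42)
n_paths, n_steps = 2000, 1000
eps = rng.normal(size=(n_paths, n_steps))

def paths(phi):
    y = np.zeros((n_paths, n_steps))
    for t in range(1, n_steps):
        y[:, t] = phi * y[:, t - 1] + eps[:, t]
    return y

ar1 = paths(0.95)
rw = paths(1.0)

# AR(1): the cross-sectional variance settles near sigma^2/(1 - phi^2) = 10.26.
# Random walk: Var(y_t) keeps growing linearly in t.
print(round(ar1[:, -1].var(), 1))           # roughly 10
print(round(rw[:, -1].var() / n_steps, 2))  # roughly 1
```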
Smoothing
Exponential smoothing,

    s_t = α y_t + (1 − α) s_{t−1},

turns out to be the Kalman filter for the local level model.
Holt-Winters, or double exponential smoothing, captures a trend:

    s_t = α y_t + (1 − α)(s_{t−1} + b_{t−1})
    b_t = β(s_t − s_{t−1}) + (1 − β) b_{t−1}

This is the Kalman filter for the local linear trend model.
Triple exponential smoothing can handle seasonality as well, but the formulas are getting ridiculous!
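The smoothing recursions can be coded directly; a short Python sketch (function names are illustrative):

```python
import numpy as np

def exponential_smoothing(y, alpha):
    """s_t = alpha*y_t + (1-alpha)*s_{t-1}, initialized at the first observation."""
    s = np.empty(len(y))
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    return s

def holt(y, alpha, beta):
    """Double exponential smoothing: level s and trend b."""
    s, b = np.empty(len(y)), np.empty(len(y))
    s[0], b[0] = y[0], y[1] - y[0]
    for t in range(1, len(y)):
        s[t] = alpha * y[t] + (1 - alpha) * (s[t - 1] + b[t - 1])
        b[t] = beta * (s[t] - s[t - 1]) + (1 - beta) * b[t - 1]
    return s, b

y = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
s = exponential_smoothing(y, alpha=0.5)
print(s)  # [1.0, 1.5, 1.5, 2.25, 2.375]

# On a perfectly linear series the Holt recursion tracks the data exactly.
ylin = np.arange(5.0)
s2, b2 = holt(ylin, alpha=0.3, beta=0.3)
```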
Outline
Introduction to time series modeling
Structural time series models
  Models for trend
  Modeling seasonality
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
Extensions
Observation equation

    y_t = Z_tᵀ α_t + ε_t,    ε_t ~ N(0, H_t)

Transition equation

    α_{t+1} = T_t α_t + R_t η_t,    η_t ~ N(0, Q_t)
[Diagram: the state vector α_t stacks trend, seasonal, and regression components; Z_t and T_t are assembled block-wise from the component pieces]
Example:
The basic structural model with a regression effect and S seasons can be written

    y_t = μ_t + τ_t + βᵀx_t + ε_t
          (trend: μ_t, seasonal: τ_t, regression: βᵀx_t)

    μ_t = μ_{t−1} + δ_{t−1} + u_t
    δ_t = δ_{t−1} + v_t
    τ_t = −Σ_{s=1}^{S−1} τ_{t−s} + w_t
Local level
Autoregressive models
The local level model is

    y_t = μ_t + ε_t,        ε_t ~ N(0, σ²)
    μ_t = μ_{t−1} + η_t,    η_t ~ N(0, τ²)

In the random walk model, your forecast of the future (given data to time t) is y_t.
In the constant mean model, your forecast is ȳ.
The larger the ratio σ²/τ², the closer this model is to the constant mean model.
In state space form:

    Z_t = 1,  T_t = 1,  R_t = 1,  H_t = σ²,  Q_t = τ²
[Figure: simulated local level paths — pure random walk (tau = 1, sigma = 0), constant mean plus noise (tau = 0, sigma = 1), and both components active (local.level)]
The model is

    y_t = μ_t + ε_t,                       ε_t ~ N(0, σ²)
    μ_t = μ_{t−1} + δ_{t−1} + η_{μ,t},     η_{μ,t} ~ N(0, σ_μ²)
    δ_t = δ_{t−1} + η_{δ,t},               η_{δ,t} ~ N(0, σ_δ²)

Neat fact! The posterior mean of the local linear trend model is a smoothing spline.
[Figure: simulated local linear trend paths]
Modeling seasonality
The seasonal component follows

    τ_t = −Σ_{s=1}^{S−1} τ_{t−s} + w_t

so the seasonal effects sum to zero in expectation. The state holds the current and S − 2 lagged seasonal effects. With S = 4, for example,

    T_t = [ −1 −1 −1 ]     R_t = [ 1 ]     Z_t = [ 1 ]
          [  1  0  0 ]           [ 0 ]           [ 0 ]
          [  0  1  0 ]           [ 0 ]           [ 0 ]
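Building these matrices mechanically makes the structure clearer. A Python sketch (the helper name is illustrative), which also checks that with no shocks the component just replays a fixed seasonal pattern:

```python
import numpy as np

def seasonal_matrices(S):
    """State-space matrices for the sum-to-zero seasonal component with S seasons.
    State: (tau_t, tau_{t-1}, ..., tau_{t-S+2})."""
    T = np.zeros((S - 1, S - 1))
    T[0, :] = -1.0                   # tau_t = -(sum of the previous S-1 effects)
    T[1:, :-1] = np.eye(S - 2)       # shift the remembered lags down one slot
    R = np.zeros((S - 1, 1))
    R[0, 0] = 1.0                    # only tau_t receives a new shock w_t
    Z = np.zeros(S - 1)
    Z[0] = 1.0                       # the observation loads on tau_t only
    return T, Z, R

# With w_t = 0, iterating T cycles through a fixed pattern with period S.
T, Z, R = seasonal_matrices(4)
state = np.array([2.0, -1.0, 0.5])   # tau_t, tau_{t-1}, tau_{t-2}
effects = []
for _ in range(8):
    effects.append(float(Z @ state))
    state = T @ state
print(effects)   # cycles with period 4; each full cycle sums to zero
```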
Example: modeling the air passengers data

data(AirPassengers)
y <- log10(AirPassengers)
ss <- AddLocalLinearTrend(list(),  ## No previous state specification.
                          y)       ## Peek at the data to specify default priors.
ss <- AddSeasonal(ss,              ## Adding state to ss.
                  y,               ## Peeking at the data.
                  nseasons = 12)   ## 12 "seasons"
model <- bsts(y, state.specification = ss, niter = 1000)
plot(model)
plot(model, "help")
plot(model, "comp")    ## "components"
plot(model, "resid")   ## "residuals"
[Figure: plot(model) — posterior distribution of state over the observed series, 1950–1960]
[Figure: plot(model, "comp") — posterior distributions of the trend and seasonal.12.1 components]
[Figure: plot(model, "comp"), continued — the components on their own scales]
[Figure: monthly seasonal effects, one panel per season (Season 1–Season 12), over 1950–1960]
Setting priors

AddLocalLinearTrend(state.specification = NULL,
                    y,
                    level.sigma.prior = NULL,     # SdPrior
                    slope.sigma.prior = NULL,     # SdPrior
                    initial.level.prior = NULL,   # NormalPrior
                    initial.slope.prior = NULL,   # NormalPrior
                    sdy,
                    initial.y)
SdPrior(sigma.guess,
        sample.size = .01,
        initial.value = sigma.guess,
        fixed = FALSE,
        upper.limit = Inf)
> names(model)
[1] "sigma.obs"                  "sigma.trend.level"
[3] "sigma.trend.slope"          "sigma.seasonal.12"
[5] "final.state"                "state.contributions"
[7] "one.step.prediction.errors" "has.regression"
[9] "state.specification"        "original.series"
Prediction

### Predict the next 24 periods.
pred <- predict(model, horizon = 24)

[Figure: the original series with forecasts for 1958–1963]
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
Extensions
[Figures: four slides stepping through the graphical model behind the Kalman filter, relating the states at times t−2 and t−1 to the observations y_{t−1} and y_t]
For the model

    y_t = Z_tᵀ α_t + ε_t,         ε_t ~ N(0, H_t)
    α_{t+1} = T_t α_t + R_t η_t,  η_t ~ N(0, Q_t)

let a_t and P_t denote the mean and variance of α_t given y_{1:t−1}, and let v_t = y_t − Z_tᵀ a_t be the one-step prediction error. Then

    F_t = Z_tᵀ P_t Z_t + H_t     (forecast variance)
    K_t = T_t P_t Z_t F_t⁻¹      (Kalman gain ... is a regression coefficient)
    a_{t+1} = T_t a_t + K_t v_t
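For the local level model every system matrix is the scalar 1, so the whole filter fits in a few lines. A Python sketch (not the bsts implementation; parameter values are made up):

```python
import numpy as np

def local_level_filter(y, sigma2, tau2, a0=0.0, p0=1e6):
    """Scalar Kalman filter for the local level model:
    y_t = mu_t + eps_t (Var sigma2), mu_{t+1} = mu_t + eta_t (Var tau2).
    Here Z = T = R = 1, H = sigma2, Q = tau2."""
    a, p = a0, p0
    filtered = []
    for yt in y:
        f = p + sigma2            # F_t = Z'PZ + H: forecast variance
        k = p / f                 # Kalman gain (with T = 1)
        v = yt - a                # one-step prediction error
        a = a + k * v             # a_{t+1} = T a + K v
        p = p * (1 - k) + tau2    # next-step state variance
        filtered.append(a)
    return np.array(filtered)

rng = np.random.default_rng(1)
mu = np.cumsum(rng.normal(0, 0.5, 300))   # latent level (tau = 0.5)
y = mu + rng.normal(0, 1.0, 300)          # noisy observations (sigma = 1)
est = local_level_filter(y, sigma2=1.0, tau2=0.25)
# The filtered level tracks mu much better than the raw observations do.
print(np.mean((est - mu) ** 2) < np.mean((y - mu) ** 2))
```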
Simulation smoother
[Durbin and Koopman(2002)] thought of a clever way to simulate p(α|y).
1. Simulate data with the wrong mean, but the right variance.
2. Subtract off the wrong mean, and put in the right one.
The argument goes like this:
1. For multivariate normal (α, y), Var(α|y) is independent of y.
2. Simulate fake data (α̃, ỹ) from the structural time series model. The conditional distribution of α̃ given ỹ has the same variance as that of α given y.
3. Subtract E(α̃|ỹ) from your simulated α̃'s, and add E(α|y).
[Durbin and Koopman(2012)] (Section 4.6.2) have a fast state smoother that can quickly compute E(α_t|y) (without computing each P_t).
The DK simulation smoother requires two Kalman filters (for y and ỹ) and two fast state smoothers.
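The key fact in step 1 is easy to verify numerically: for jointly Gaussian variables the conditional covariance Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ never involves the observed value, which is exactly why a draw with the right variance but the wrong mean can simply be recentered. A Python check with a random Gaussian (dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4))
S = A @ A.T + np.eye(4)            # a random positive-definite joint covariance
L = np.linalg.cholesky(S)
z = rng.normal(size=(200_000, 4)) @ L.T   # samples of (alpha, y): alpha = first 2 coords
alpha, y = z[:, :2], z[:, 2:]

S11, S12, S22 = S[:2, :2], S[:2, 2:], S[2:, 2:]
B = np.linalg.solve(S22, S12.T).T         # regression of alpha on y: E(alpha|y) = B y
resid = alpha - y @ B.T                   # alpha - E(alpha | y)
cond_var = S11 - S12 @ B.T                # theoretical Var(alpha | y)

# The empirical covariance of the residuals matches cond_var for all y at once,
# because the conditional variance never depends on the data.
print(np.allclose(np.cov(resid.T), cond_var, atol=0.05))
```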
Break time!
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
Extensions
Linear regression
The conjugate prior for the regression model y = Xβ + ε is

    β | σ ~ N(b, σ² Ω⁻¹)
    1/σ² ~ Ga(df/2, ss/2)
Posterior distribution

    β | σ, y ~ N(β̃, σ² V),    1/σ² | y ~ Ga((df + n)/2, SS/2)

where

    V⁻¹ = XᵀX + Ω
    β̃ = V(Xᵀy + Ωb)
    SS = ss + yᵀy + bᵀΩb − β̃ᵀV⁻¹β̃

The posterior mean β̃ is a precision-weighted average of the prior mean b and the least squares estimate β̂ (note Xᵀy = XᵀXβ̂).
A common default takes the prior precision Ω proportional to XᵀX/n, so the prior carries the weight of a chosen number of prior observations, and interprets ss/df as a prior guess at the residual variance: E(σ²) ≈ ss/df.
The joint posterior factors as p(β, σ²|y) = p(β|σ², y) p(σ²|y), so both pieces are available in closed form.
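The posterior quantities above are a few lines of linear algebra. A Python sketch on simulated data (the function name and parameter values are illustrative):

```python
import numpy as np

def conjugate_regression_posterior(X, y, b, Omega, df, ss):
    """Posterior for the conjugate normal-inverse-gamma regression prior:
    beta | sigma ~ N(b, sigma^2 Omega^{-1}),  1/sigma^2 ~ Ga(df/2, ss/2)."""
    n = len(y)
    V_inv = X.T @ X + Omega
    beta_tilde = np.linalg.solve(V_inv, X.T @ y + Omega @ b)
    SS = ss + y @ y + b @ Omega @ b - beta_tilde @ V_inv @ beta_tilde
    return beta_tilde, V_inv, df + n, SS

# Simulated example: true beta = (1, 2), sigma = 0.5.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, 500)

b = np.zeros(2)
Omega = np.eye(2)      # a weak prior precision
beta_tilde, V_inv, DF, SS = conjugate_regression_posterior(X, y, b, Omega, df=1.0, ss=1.0)
print(np.round(beta_tilde, 1))   # near (1.0, 2.0)
print(round(SS / DF, 2))         # posterior guess at sigma^2, near 0.25
```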
Sparse modeling
If there are many predictors, one could expect many of them to have zero coefficients.
Spike and slab priors set some coefficients to zero with positive probability.
One alternative is the double exponential (lasso) prior

    p(β) ∝ exp(−λ Σⱼ |βⱼ|)

[Figure: prior, likelihood, and posterior for a single β under a weak likelihood and a stronger likelihood]
With 100 predictors there are 2¹⁰⁰ models, which is about 10³⁰.
This argument absurdly overstates the case (because not all predictors are exchangeable), but any algorithm that claims to find the right model with this many candidates should be viewed with suspicion.
Let γⱼ = 1 if βⱼ ≠ 0 and γⱼ = 0 if βⱼ = 0, so that

    γ = (1, 0, 0, 1, ..., 1, 0, 0)

indexes which predictors are in the model.
A useful parameterization
Notation: β_γ means the elements of β where γ = 1; Ω_γ⁻¹ means the rows and columns of Ω⁻¹ where γ = 1.

Spike:  γⱼ ~ Bernoulli(πⱼ), i.e. p(γ) = Πⱼ πⱼ^γⱼ (1 − πⱼ)^(1−γⱼ)
Slab:   β_γ | γ, σ² ~ N(b_γ, σ² Ω_γ⁻¹),    1/σ² ~ Ga(df/2, ss/2)

This prior is conditionally conjugate given γ.
Prior elicitation
Useful defaults: an expected model size (which sets the πⱼ), an expected R² (which sets ss), and df = 1 (a weak prior on σ²).
Marginalizing over β_γ and σ² gives

    γ | y ~ C(y) p(γ) |Ω_γ⁻¹|^{1/2} / ( |V_γ⁻¹|^{1/2} SS_γ^{DF/2} ),    DF = df + n,

which can be sampled one γⱼ at a time by Gibbs sampling.
A regression with constant coefficients fits the state space form with a trivial constant state:

    Z_t = βᵀx_t,    T_t = 1,    R_t = 0
The MCMC alternates between the state and the regression coefficients β:
1. Simulate α ~ p(α | y, β, σ) using the simulation smoother.
2. Set y*_t = y_t − Z_tᵀ α_t, the series with the time series effects subtracted out.
3. Simulate (γ, β, σ²) given y* using the spike-and-slab regression draw.
If XᵀX were diagonal, each coefficient could be drawn independently. You'd need to know the y's that go along with these x's, so you'd have a missing data problem.
Step 1: Find the x's needed to diagonalize XᵀX.
Step 2: Repeat the following steps:
  1. Simulate the missing y's given β and σ².
  2. Simulate β and σ² given complete data.
bsts includes support for ODA, but it is still experimental at this point.
Applications
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
  Nowcasting with Google Trends
  Causal Impact
Extensions
Nowcasting
Maintaining real time estimates of infrequently observed time series.

[Figure: a weekly economic series (thousands), Jan 2004–Jul 2012 — a leading indicator of recessions]
You can restrict by type of search, time range, geo, or search category (vertical).
That's 600 public interest indices you can use to predict YOUR time series!
[Figure: predictors ordered by inclusion probability — unemployment.office, filing.for.unemployment, idaho.unemployment, sirius.internet.radio, sirius.internet; white bars are positive coefficients, black bars are negative coefficients]
[Figure: scaled values of the series and its predictors, 2004–2012, with lines shaded by inclusion probability — 1.00 unemployment.office, 0.94 filing.for.unemployment, 0.47 idaho.unemployment, 0.14 sirius.internet.radio, 0.11 sirius.internet]
plot(model, "components")

[Figure: posterior distributions of the trend, seasonal.52.1, and regression components, 2004–2012]
Did it help?

[Figure: actual vs. predicted scaled values, weekly, Jan 2004–Jul 2012]
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
  Nowcasting with Google Trends
  Causal Impact
Extensions
Causal Impact
The effect of an advertising campaign can:
1. Be hard to measure,
2. Drive native search clicks,
3. Outlast the campaign.
Example
Real Google advertiser. 6-week ad campaign. Random shift added to both axes.

[Figure: daily clicks, April–June, spanning the campaign]
Problem statement
An actor in a market does something:
- Has a sale.
- Begins (or modifies) an advertising campaign.
- Introduces (or adopts) a new product.
We have data on both the actor and the similar actors prior to the intervention.
Difference in differences
An old trick from econometrics. Only measures at two points.

Synthetic controls
A more realistic counterfactual model than DnD.
Data availability is especially problematic for marketing: you know your sales, but not your competitors' sales.
CausalImpact
Extends DnD and synthetic controls using BSTS.
Forecast the time series over the intervention period given data from the pre-treatment period; the forecast serves as the counterfactual against which the actual series is compared.
The picture
[Figure: simulated data — the observed series over the pre-intervention period, and actual vs. counterfactual forecast over the post-intervention period]
Potential outcomes
Let y_jst denote the value of Y for unit j under treatment s at time t. T is the time of the market intervention.
What we observe:
  Before T: we observe y_j0t for everyone.
  After T: we observe y_j1t for the actor and y_k0t for the potential controls k ≠ j.
If we could also observe y_j0t for the actor then y_j1t − y_j0t would be the treatment effect.
Case study
A Google advertiser ran a marketing experiment.
[Figure: clicks in the treated vs. control geos; each dot is a time point, labeled before, during, or after the campaign]
Google advertiser: treated vs. untreated regions.
[Figure: weekly actual and counterfactual clicks, week −4 through week 7, across the pre-intervention, intervention, and post-intervention periods]
Google advertiser: competitors' clicks as predictors.
[Figure: weekly actual and counterfactual clicks, week −4 through week 7]
Google advertiser: untreated regions, competitors' sales as predictors.
[Figure: weekly actual and counterfactual clicks, week −4 through week 7]
Summary

    Clicks    %     95% Interval
    84,100    20    (15, 26)%
    84,800    21    (13, 26)%
     8,000     2    (−5, 6)%

Google trends are publicly available, while competitor clicks are not.
There are many more potential controls from Google trends; spike and slab variable selection / model averaging is useful for selecting appropriate control groups.
Extensions
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
Extensions
  Normal mixtures
  Longer term forecasting
Normal mixtures
The Student T distribution is a scale mixture of normals:

    w ~ Ga(ν/2, ν/2)
    y | w ~ N(μ, σ²/w)

Given the latent weights w the model is Gaussian again, so the MCMC can draw p(α | y, θ, w) with the usual Kalman filter machinery, then draw the w's given everything else.
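The mixture representation is easy to check by simulation: Gamma-mixed normal draws should match the T's variance ν/(ν − 2)·σ². A Python sketch (ν and σ values are arbitrary):

```python
import numpy as np

# If w ~ Ga(nu/2, rate nu/2) and y | w ~ N(0, sigma^2 / w), then marginally
# y ~ T_nu(0, sigma). Compare the simulated variance to nu/(nu-2) * sigma^2.
rng = np.random.default_rng(3)
nu, sigma, n = 6.0, 2.0, 1_000_000
w = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # scale = 1/rate = 2/nu
y = rng.normal(0, sigma / np.sqrt(w))

theoretical_var = nu / (nu - 2) * sigma**2          # = 6.0 for nu = 6, sigma = 2
print(round(y.var(), 1), theoretical_var)
```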
[Figure: Retail Sales (Excluding Food Service), RSXFS / 1000, Jan 1992–Jan 2012]
A local linear trend with Student T errors:

    y_t = μ_t + ε_t,                       ε_t ~ N(0, σ²)
    μ_t = μ_{t−1} + δ_{t−1} + η_{μ,t},     η_{μ,t} ~ T_ν(0, σ_μ²)
    δ_t = δ_{t−1} + η_{δ,t},               η_{δ,t} ~ T_ν(0, σ_δ²)

If you tell the model that occasional large errors are possible, it is not surprised by occasional large errors.
[Figure: posterior distributions of the level and slope standard deviations under Gaussian and Student errors]
Because the model is aware that occasional large errors can occur, the standard deviation parameters can be smaller.
Impact on predictions
[Figure: the original data and forecasts for 2009–2014 under the Gaussian and Student models]
The extreme quantiles of the predictions under the Student model are wider than under the Gaussian model.
Similar tricks can be used to model probit, logit, and Poisson responses, and even dynamic support vector machines, by expressing these distributions as normal mixtures.
Outline
Introduction to time series modeling
Structural time series models
MCMC and the Kalman filter
Bayesian regression and spike-and-slab priors
Applications
Extensions
  Normal mixtures
  Longer term forecasting
Longer term forecasting: replace the local linear trend's random walk slope with an AR(1) slope,

    μ_t = μ_{t−1} + δ_{t−1} + η_{μ,t},     η_{μ,t} ~ N(0, σ_μ²)
    δ_t = D + φ(δ_{t−1} − D) + η_{δ,t},    η_{δ,t} ~ N(0, σ_δ²)
    y_t = μ_t + ε_t

The slope reverts to a long-run value D; with |φ| < 1 the slope is stationary, so long-horizon forecast intervals grow much more slowly than under the random walk slope.
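A sketch of why the AR(1) slope helps at long horizons (Python, with made-up parameter values): simulate many future slope paths under both specifications and compare the spread of the implied level after 60 periods.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, horizon = 5000, 60
sigma_slope, phi, D = 0.1, 0.9, 0.05

def level_spread(mean_reverting):
    """Std. dev. of the level mu after `horizon` steps, over many slope paths."""
    mu = np.zeros(n_paths)
    delta = np.full(n_paths, D)
    for _ in range(horizon):
        mu = mu + delta
        shock = rng.normal(0, sigma_slope, n_paths)
        if mean_reverting:
            delta = D + phi * (delta - D) + shock   # AR(1) slope
        else:
            delta = delta + shock                   # random walk slope
    return mu.std()

rw, ar = level_spread(False), level_spread(True)
# The random-walk slope produces a far wider forecast distribution for the level.
print(rw > ar)
```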
[Figure: the original retail sales data and long-horizon forecasts for 2009–2014 under the two trend models]
References
Chatfield, C. The Analysis of Time Series: An Introduction. Chapman & Hall/CRC.
Durbin, J. and Koopman, S.J. (2002). A simple and efficient simulation smoother for state space time series analysis. Biometrika.
Durbin, J. and Koopman, S.J. (2012). Time Series Analysis by State Space Methods, 2nd ed. Oxford University Press.
Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R. Springer.