Spring 2015
Unit 1 (v5) - 1 -
Logistics
Web site
http://seor.gmu.edu/~klaskey/SYST664
Blackboard site: http://mymason.gmu.edu
Requirements
Regular assignments (20%): can be handed in on paper or through Blackboard
Take-home midterm (30%) and final (30%)
Project (20%): apply methods to a problem of your choosing
Office hours
Official office hours are 4:30-5:30PM Wednesdays and Thursdays
I respond to questions by email and am available by appointment
Course Outline
Unit 1: A Brief Tour of Bayesian Inference and Decision Theory
Unit 2: Random Variables, Parametric Models, and Inference from Observation
Unit 3: Statistical Models with a Single Parameter
Unit 4: Monte Carlo Approximation
Unit 5: The Normal Model
Unit 6: Gibbs Sampling
Unit 7: The Multivariate Normal Model
Unit 8: Hierarchical Bayesian Models
Unit 9: Bayesian Linear Regression and Analysis of Variance
Unit 10: Metropolis-Hastings Sampling
Bayesian Inference
Bayesians use probability to quantify rational degrees of belief
Bayesians view inference as belief dynamics
Begin with prior beliefs
Use evidence to update prior beliefs to posterior beliefs
Posterior beliefs become prior beliefs for next evidence
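This update cycle can be sketched in a few lines (a minimal Python illustration with hypothetical hypotheses and likelihoods; the course software is R, but the arithmetic is the same):

```python
def update(prior, likelihood):
    """One Bayesian belief update: posterior is proportional to prior times likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypotheses; start with a uniform prior.
beliefs = [0.5, 0.5]

# Process two pieces of evidence in turn; each posterior
# becomes the prior for the next update.
for likelihood in ([0.8, 0.3], [0.9, 0.4]):
    beliefs = update(beliefs, likelihood)

print(beliefs)  # belief in hypothesis 1 has risen to 6/7
```

Because each posterior serves as the next prior, processing the evidence one piece at a time gives the same answer as processing it all at once.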
Decision Theory
Decision theory is a formal theory of decision making under
uncertainty
A decision problem consists of:
Possible actions: a ∈ A
States of the world (usually uncertain): s ∈ S
Possible consequences: c ∈ C (the consequence depends on the state and the action)
Caveat emptor:
How good the recommended decision is for you depends on the fidelity of the model to your beliefs and preferences
Kathryn Blackmond Laskey
Illustrative Example
(Highly Oversimplified)
Decision model:
Actions: aT (treat) and aN (don't treat)
States of world: sD (disease now) and sW (well now)
Consequences: cWN (well shortly, no side effects), cWS (well shortly, side effects), cDN (disease for long time, no side effects)
Probabilities and utilities:
P(sD) = 0.3, P(sW) = 0.7
u(cWN) = 100, u(cWS) = 90, u(cDN) = 0
Expected utility of each action:
Treat: EU(aT) = 0.3 × 90 + 0.7 × 90 = 90
Don't treat: EU(aN) = 0.3 × 0 + 0.7 × 100 = 70
Treating has the higher expected utility.
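The expected-utility arithmetic for the treatment example can be checked directly (a Python sketch; the dictionaries are just one convenient way to organize the states, actions, and utilities):

```python
# Probabilities of the states of the world.
p = {"sD": 0.3, "sW": 0.7}

# Utility of each (action, state) pair, read off the consequences:
# treating always gives side effects (u = 90); not treating gives
# u = 100 when well and u = 0 when diseased.
u = {("aT", "sD"): 90, ("aT", "sW"): 90,
     ("aN", "sD"): 0,  ("aN", "sW"): 100}

def expected_utility(action):
    """EU(a) = sum over states of P(s) * u(a, s)."""
    return sum(p[s] * u[(action, s)] for s in p)

print(expected_utility("aT"))  # 0.3*90 + 0.7*90  = 90
print(expected_utility("aN"))  # 0.3*0  + 0.7*100 = 70
```

Maximizing expected utility therefore recommends treating (90 > 70).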
Sensitivity Analysis:
Optimal Decision as a Function of Sickness Probability
Let p = P(sD). The expected utilities of the two actions are:
E[U | aT] = 90p + 90(1 - p) = 90
E[U | aN] = 0·p + 100(1 - p) = 100(1 - p)
Treating is optimal exactly when 90 > 100(1 - p), that is, when p > 0.1.
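The crossover probability can be confirmed by solving 90 = 100(1 - p) and checking both sides of the threshold (a Python sketch; the helper names are illustrative):

```python
def eu_treat(p):
    # 90p + 90(1 - p): treatment utility is 90 regardless of the state.
    return 90.0

def eu_no_treat(p):
    # 0*p + 100*(1 - p)
    return 100.0 * (1.0 - p)

# Algebraically, 90 = 100(1 - p) at p = 0.1: treat when p > 0.1.
p_star = (100.0 - 90.0) / 100.0
print(p_star)  # 0.1

# Sanity check on either side of the threshold.
assert eu_no_treat(0.05) > eu_treat(0.05)  # rare disease: don't treat
assert eu_treat(0.2) > eu_no_treat(0.2)    # common disease: treat
```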
Why be a Bayesian?
Arguments from theory
A coherent decision maker uses probability to represent uncertainty, uses utility
to represent value, and maximizes expected utility
If you are not coherent then someone can make "Dutch book" on you (turn you
into a "money pump")
Pragmatic arguments
Decision theory provides a useful and principled methodology for modeling
problems of inference, decision and learning from experience
Engineering tradeoffs between accuracy, complexity and cost can be analyzed
and evaluated
Both empirical data and informed engineering judgment can be explicitly
represented and incorporated into a model
Bayesian methods can handle small, moderate and large sample sizes; small,
moderate and large numbers of parameters
With other approaches it is often more difficult to understand why you got the
results you did and how to improve your model
Caution:
Uncritical application of cookbook methods can lead to disaster!
Good modelers iteratively assess, check and revise assumptions
Law of total probability: P(E) = Σi P(E | Hi) P(Hi)
We will assume:
Sensitivity: P(tP | sD) = 0.95
Specificity: P(tN | sW) = 0.85
Bayes Rule:
P(H1 | E) = P(E | H1) P(H1) / Σi P(E | Hi) P(Hi),  where P(E) > 0 and P(H2) > 0
Terminology:
P(H1), P(H2): prior probabilities of the hypotheses
P(E | H1), P(E | H2): likelihoods of the evidence under each hypothesis
P(E): marginal probability of the evidence (from the law of total probability)
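Applying Bayes rule to the diagnostic test (sensitivity 0.95, specificity 0.85, prior 0.3) looks like this (a Python sketch; the function name is illustrative):

```python
def bayes(prior, sens, spec):
    """Posterior P(disease | test result) from prior, sensitivity, specificity."""
    p_d, p_w = prior, 1.0 - prior
    # Law of total probability: P(tP) = P(tP|sD)P(sD) + P(tP|sW)P(sW)
    p_pos = sens * p_d + (1.0 - spec) * p_w
    p_neg = (1.0 - sens) * p_d + spec * p_w
    post_pos = sens * p_d / p_pos          # P(sD | tP)
    post_neg = (1.0 - sens) * p_d / p_neg  # P(sD | tN)
    return post_pos, post_neg

post_pos, post_neg = bayes(prior=0.3, sens=0.95, spec=0.85)
print(round(post_pos, 3))  # 0.731
print(round(post_neg, 3))  # 0.025
```

Note how a positive test raises the disease probability from 0.3 to about 0.73, while a negative test lowers it to about 0.025.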
If test is negative:
P(sD | tN) = (0.3 × 0.05) / (0.3 × 0.05 + 0.7 × 0.85) ≈ 0.025
EU(aN | tN) = 0.025 × 0 + 0.975 × 100 = 97.5
EU(aT | tN) = 0.025 × 90 + 0.975 × 90 = 90
Best action is not to treat
If test is positive:
P(sD | tP) = (0.3 × 0.95) / (0.3 × 0.95 + 0.7 × 0.15) ≈ 0.731
EU(aN | tP) = 0.731 × 0 + 0.269 × 100 ≈ 26.9
EU(aT | tP) = 0.731 × 90 + 0.269 × 90 = 90
Best action is to treat
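These conditional expected utilities can be recomputed from the posterior probabilities (a Python sketch using the utilities from the treatment example):

```python
# Utilities of each (action, state) pair from the example.
u = {("aT", "sD"): 90, ("aT", "sW"): 90,
     ("aN", "sD"): 0,  ("aN", "sW"): 100}

def eu(action, p_sick):
    """Expected utility of an action given the current probability of disease."""
    return p_sick * u[(action, "sD")] + (1 - p_sick) * u[(action, "sW")]

p_sick_neg = 0.015 / 0.61  # P(sD | tN), about 0.025
p_sick_pos = 0.285 / 0.39  # P(sD | tP), about 0.731

print(round(eu("aN", p_sick_neg), 1))  # 97.5 -> don't treat after a negative test
print(round(eu("aT", p_sick_neg), 1))  # 90.0
print(round(eu("aN", p_sick_pos), 1))  # 26.9 -> treat after a positive test
print(round(eu("aT", p_sick_pos), 1))  # 90.0
```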
Value of Information
Reminder of problem ingredients:
P(sD) = 0.3
P(tP | sD) = 0.95; P(tN | sW) = 0.85
u(cWN) = 100, u(cWS) = 90; u(cDN) = 0
EVPI ≥ EVSI ≥ 0
EVPI = EVSI = 0 if the test will not change your decision
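For the free test in this example, EVSI is the preposterior expected utility (best action under each result, weighted by the probability of that result) minus the no-test expected utility (a Python sketch; it reproduces the 4.6 that appears later in the unit):

```python
# Problem ingredients from the slides.
prior, sens, spec = 0.3, 0.95, 0.85
u = {("aT", "sD"): 90, ("aT", "sW"): 90,
     ("aN", "sD"): 0,  ("aN", "sW"): 100}

def eu(action, p_sick):
    return p_sick * u[(action, "sD")] + (1 - p_sick) * u[(action, "sW")]

# Without the test: act on the prior.
eu_no_test = max(eu(a, prior) for a in ("aT", "aN"))  # 90

# With the test: best action given each result, weighted by P(result).
p_pos = sens * prior + (1 - spec) * (1 - prior)  # 0.39
p_neg = 1 - p_pos                                # 0.61
post_pos = sens * prior / p_pos                  # P(sD | tP)
post_neg = (1 - sens) * prior / p_neg            # P(sD | tN)
eu_with_test = (p_pos * max(eu(a, post_pos) for a in ("aT", "aN"))
                + p_neg * max(eu(a, post_neg) for a in ("aT", "aN")))

evsi = eu_with_test - eu_no_test
print(round(evsi, 2))  # 4.6
```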
World State      Probability   Action    Utility
Sick, Positive   0.95θ         Treat     90
Sick, Negative   0.05θ         NoTreat   0
Well, Positive   0.15(1-θ)     Treat     90
Well, Negative   0.85(1-θ)     NoTreat   100
(θ denotes the prior probability of sickness, P(sD))
World State      Probability       Action    Utility
Sick, Positive   P(sD|tP)P(tP)     Treat     90
Sick, Negative   P(sD|tN)P(tN)     NoTreat   0
Well, Positive   P(sW|tP)P(tP)     Treat     90
Well, Negative   P(sW|tN)P(tN)     NoTreat   100
Strategy Regions
Let θ = P(sD). Expected utility of each strategy as a function of θ:
FollowTest: EU(aF) = 98.5 - 13θ
AlwaysTreat: EU(aT) = 90
NoTreat: EU(aN) = 100(1 - θ)
EVSI(θ) = EU(aF) - max{EU(aT), EU(aN)}
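The three strategy lines and their crossover points can be computed directly (a Python sketch; the variable names are illustrative):

```python
def eu_follow_test(theta):
    # 0.95θ·90 + 0.05θ·0 + 0.15(1-θ)·90 + 0.85(1-θ)·100 = 98.5 - 13θ
    return 98.5 - 13.0 * theta

def eu_always_treat(theta):
    return 90.0

def eu_no_treat(theta):
    return 100.0 * (1.0 - theta)

# Crossover points bounding the region where following the test is optimal:
low = 1.5 / 87.0   # FollowTest overtakes NoTreat:       98.5 - 13θ = 100(1-θ)
high = 8.5 / 13.0  # AlwaysTreat overtakes FollowTest:   98.5 - 13θ = 90
print(round(low, 3), round(high, 3))  # 0.017 0.654

strategies = {"FollowTest": eu_follow_test,
              "AlwaysTreat": eu_always_treat,
              "NoTreat": eu_no_treat}
theta = 0.3
best = max(strategies, key=lambda s: strategies[s](theta))
print(best)  # FollowTest
```

At the example's prior of 0.3, following the test is the best of the three strategies.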
[Figure: expected utility of each strategy as a function of θ, for test costs c = 0, 1, 4, and c ≈ 7.2; vertical axis runs from 88 to 100, θ from 0 to 1.]
EVSI as a Function of Prior Probability
[Figure: EVSI as a function of θ for test costs c = 0, 1, 4, and c ≈ 7.2; the range of θ over which testing with cost c = 1 is optimal is marked.]
Where NoTreat is otherwise optimal: EVSI = (98.5 - 13θ) - 100(1 - θ) = 87θ - 1.5
Where AlwaysTreat is otherwise optimal: EVSI = (98.5 - 13θ) - 90 = 8.5 - 13θ
At θ = 0.3: EVSI = 8.5 - 13(0.3) = 4.6 > 0 (testing is optimal)
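A quick check that the piecewise formulas agree with the definition EVSI(θ) = EU(aF) - max{EU(aT), EU(aN)} (a Python sketch):

```python
def evsi(theta):
    """EVSI of a free test: EU(FollowTest) minus EU of the best no-test action."""
    eu_follow = 98.5 - 13.0 * theta
    eu_best_without = max(90.0, 100.0 * (1.0 - theta))
    return eu_follow - eu_best_without

# Piecewise forms from the slides:
#   where NoTreat is otherwise optimal (small θ):  EVSI = 87θ - 1.5
#   where Treat is otherwise optimal (larger θ):   EVSI = 8.5 - 13θ
assert abs(evsi(0.05) - (87 * 0.05 - 1.5)) < 1e-9
assert abs(evsi(0.3) - (8.5 - 13 * 0.3)) < 1e-9
print(round(evsi(0.3), 1))  # 4.6
```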
Example:
Graphical model: Θ is the parent of X1, X2, X3
Θ ~ g(θ)
Xi | Θ = θ ~ Bernoulli(θ),  i = 1, …, 3
[Figure: prior probability and posterior probability as bar charts over the grid of 20 equally spaced values θ = 0.025, 0.075, …, 0.975 (alternate values labeled); vertical axes run from 0 to 0.14.]
g(θ | x) = g(θ) f(x | θ) / Σθ′ g(θ′) f(x | θ′)
         = (1/20) θ(1-θ)^4 / Σθ′ (1/20) θ′(1-θ′)^4
         = θ(1-θ)^4 / Σθ′ θ′(1-θ′)^4
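The grid-based posterior can be computed in a few lines (a Python sketch, assuming the 20-point grid with uniform prior 1/20; the likelihood θ(1-θ)^4 corresponds to one success in five Bernoulli trials):

```python
# Grid of theta values with a uniform prior.
thetas = [0.025 + 0.05 * i for i in range(20)]  # 0.025, 0.075, ..., 0.975
prior = [1.0 / 20.0] * 20

# Likelihood f(x | theta) = theta * (1 - theta)^4.
likelihood = [t * (1.0 - t) ** 4 for t in thetas]

# Posterior: normalize prior times likelihood; the uniform prior cancels.
unnorm = [g * f for g, f in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

assert abs(sum(posterior) - 1.0) < 1e-12
mode = thetas[posterior.index(max(posterior))]
print(mode)  # grid point closest to the maximum of theta*(1-theta)^4
```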
[Figure: prior probability (top) and posterior probability (bottom) as bar charts over the grid of 20 equally spaced values θ = 0.025, 0.075, …, 0.975; vertical axes run from 0 to 0.14.]
R Computing Environment
R (http://www.r-project.org) is a free, open source statistical
computing language and environment that includes:
[Figure: three bar charts of distributions over the grid θ = 0.025, 0.075, …, 0.975; vertical axes run from 0 to 0.4.]
[Figure: prior distribution, posterior distribution for 1 case in 5 samples, and posterior distribution for 10 cases in 50 samples, as bar charts over the grid θ = 0.025, 0.075, …, 0.975; vertical axes run from 0 to 0.4.]
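The qualitative message of these plots, that the posterior concentrates as the sample grows while the success fraction stays at 0.2, can be verified numerically (a Python sketch; the course itself uses R):

```python
def grid_posterior(successes, n, grid_size=20):
    """Discrete posterior over theta for a uniform grid prior and binomial data."""
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    unnorm = [t ** successes * (1 - t) ** (n - successes) for t in thetas]
    total = sum(unnorm)
    return thetas, [u / total for u in unnorm]

def spread(thetas, post):
    """Posterior standard deviation on the grid."""
    mean = sum(t * p for t, p in zip(thetas, post))
    var = sum((t - mean) ** 2 * p for t, p in zip(thetas, post))
    return var ** 0.5

t5, p5 = grid_posterior(1, 5)      # 1 case in 5 samples
t50, p50 = grid_posterior(10, 50)  # 10 cases in 50 samples

# Same success fraction (0.2), but ten times the data:
# the posterior standard deviation shrinks.
assert spread(t50, p50) < spread(t5, p5)
print(round(spread(t5, p5), 3), round(spread(t50, p50), 3))
```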
Historical Notes
People have long noticed that some events are imperfectly predictable
Mathematical probability first arose to describe regularities in games of
chance
In the twentieth century it became clear that probability theory provided a
good model for a much broader class of problems:
Physical (thermodynamics; quantum mechanics)
Social (actuarial tables; sample surveys)
Industrial (equipment failures)
The subjectivist interpretation dates from the 18th century but fell out of
favor because of the positivist orientation of Western 19th and 20th
century science
Von Mises formulated a rigorous (and much-debated) frequency theory in
the mid-twentieth century.
Hierarchy of generality:
The Frequentist
A frequentist believes:
Data are drawn from a distribution of known form but with an unknown parameter (this includes nonparametric statistics, in which the unknown parameter is the distribution itself)
Often this distribution arises from explicit randomization (when it does not, the statistician argues that the procedure was close enough to randomized that the inferences apply)
Frequentist Inference:
Inferences regard the data as random and the parameter as fixed (even though the data are known and the parameter is unknown)
For example: a sample X1, …, XN is drawn from a normal distribution with mean μ.
A 95% confidence interval is constructed. The interpretation is:
If an experiment like this were performed many times, we would expect that in 95% of the cases an interval calculated by the procedure we applied would include the true value of μ.
The Subjectivist
A subjectivist believes:
Probability is an expression of a rational agent's degrees of belief about uncertain propositions.
Rational agents may disagree. There is no one correct probability.
If the agent receives feedback, her assessed probabilities will in the limit converge to observed frequencies.
Subjectivist Inference:
Probability distributions are assigned to the unknown parameters and to the observations given the unknown parameters.
Condition on knowns; use probability to express uncertainty about unknowns.
For example: a sample X1, …, XN is drawn from a normal distribution with mean μ having prior distribution g(μ). A 95% posterior credible interval is constructed, and the result is the interval (3.7, 4.9). The interpretation is:
Given the prior distribution for μ and the observed data, the probability that μ lies between 3.7 and 4.9 is 95%.
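For a normal mean with known variance and a conjugate normal prior, the posterior is again normal, so a 95% central credible interval is the posterior mean ± 1.96 posterior standard deviations. A Python sketch with hypothetical data and prior (all numbers here are illustrative, not the (3.7, 4.9) interval above):

```python
import math

# Hypothetical data: n observations from a normal with known sd sigma.
data = [4.1, 4.6, 3.9, 4.8, 4.4]
sigma = 1.0

# Normal prior on the mean mu (conjugate), chosen for illustration.
prior_mean, prior_sd = 4.0, 2.0

n = len(data)
xbar = sum(data) / n

# Standard conjugate update: precisions add, means are precision-weighted.
prior_prec = 1.0 / prior_sd ** 2
data_prec = n / sigma ** 2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * xbar) / post_prec
post_sd = math.sqrt(1.0 / post_prec)

lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"95% credible interval for mu: ({lo:.2f}, {hi:.2f})")
```

The interval is interpreted exactly as in the example: given this prior and these data, μ lies in (lo, hi) with probability 0.95.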
Comparison: Understandability,
Subjectivity and Honest Reporting
Often the Bayesian answer is what the decision maker really wants
to hear.
Untrained people often interpret results in the Bayesian way.
Frequentists are disturbed by the dependence of the posterior
interval on the subjective prior distribution.
It is more important that stochastics provides a means of communication
among researchers whose personal beliefs about the phenomena under study
may differ. If these beliefs are allowed to contaminate the reporting of results,
how are the results of different researchers to be compared?
- H. Dinges
Comparison: Generality
Subjectivists can handle problems the frequentist approach
cannot (in particular, problems with not enough data for sound
frequentist inference).
Frequentist statisticians say this comes at a price -- when
there are not enough data the result will be highly dependent
on the prior distribution.
Subjectivists often apply frequentist techniques but with a
Bayesian interpretation
Frequentists often apply Bayesian methods if they have good
frequency properties
A reward is a prize the decision maker cares about. A lottery is a situation in which
the decision maker will receive one of the possible rewards, where the reward to be
received is governed by a probability distribution. There is a qualitative relation
of relative preference, ≽*, on lotteries, that satisfies the following conditions:
SU1. For any two lotteries L1 and L2, either L1 ≻* L2, L2 ≻* L1, or L1 ~* L2.
Furthermore, if L1, L2, and L3 are any lotteries such that L1 ≽* L2 and
L2 ≽* L3, then L1 ≽* L3.
SU2. If r1, r2, and r3 are rewards such that r1 ≻* r2 ≻* r3, then there exists a
probability p such that [r1: p; r3: (1-p)] ~* r2, where [r1: p; r3: (1-p)] is a
lottery that pays r1 with probability p and r3 with probability (1-p).
SU3. If r1 ~* r2 are rewards, then for any probability p and any reward r3,
[r1: p; r3: (1-p)] ~* [r2: p; r3: (1-p)].
SU4. If r1 ≻* r2 are rewards, then [r1: p; r2: (1-p)] ≻* [r1: q; r2: (1-q)] if and
only if p > q.
SU5. Consider three lotteries, Li = [r1: pi; r2: (1-pi)], i = 1, 2, 3, giving different
probabilities of the two rewards r1 and r2. Suppose lottery M gives entry to
lottery L2 with probability q and to lottery L3 with probability 1-q. Then L1 ~* M if and
only if p1 = q·p2 + (1-q)·p3.
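SU5 is the reduction-of-compound-lotteries condition: the two-stage lottery M yields r1 with overall probability q·p2 + (1-q)·p3. A small numeric check with hypothetical probabilities:

```python
import random

p2, p3, q = 0.6, 0.2, 0.5

# Probability of reward r1 under the compound lottery M,
# which enters L2 with probability q and L3 with probability 1 - q.
p_m = q * p2 + (1 - q) * p3
print(p_m)  # 0.4

# Monte Carlo check of the reduction: simulate the two-stage lottery.
random.seed(0)
trials = 100_000
wins = sum(
    random.random() < (p2 if random.random() < q else p3)
    for _ in range(trials)
)
assert abs(wins / trials - p_m) < 0.01
```

So by SU5 the decision maker is indifferent between M and the simple lottery [r1: 0.4; r2: 0.6], which is what lets probabilities and utilities be combined by expectation.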
Decision theory can be misused if models are sloppily built and leave out important
elements
When a group or society has not reached consensus there is no clear best choice
Explicitly modeling subjective elements of a problem provides a framework for
informed debate