Lec 2-Continue Slides

Randomization and Simple Comparative Experiments
Dr. Zou
Department of Statistics and Biostatistics

CSUEB
Jan 28, 2020
One more concept
Placebo is a null treatment that is used when the act of applying a
treatment (any treatment) has an effect. E.g., sugar pills given to patients
in a double-blind manner.
Usually in medical studies, patients who receive a placebo serve as a control
group.
Example: A fascinating landmark study of placebo surgery
Moseley et al. (2002) showed in a controlled trial involving patients
with osteoarthritis of the knee that the outcomes after arthroscopic lavage or
arthroscopic debridement were no better than those after a placebo
procedure.
Remark: For more details see "A Controlled Trial of Arthroscopic Surgery
for Osteoarthritis of the Knee" in The New England Journal of Medicine
(uploaded to Blackboard).
Confounding occurs when the effect of one factor or treatment cannot
be distinguished from that of another factor or treatment. The two factors
or treatments are said to be confounded. E.g., the factor "word processing
package" (A or B) and the factor "order in which the document is entered"
are confounded in our previous example.
More on responses
Oftentimes more than one response will be collected from a subject in an
experiment:
1 Experiments often address several questions and need a different
response for each question. Responses such as these are often called
primary responses.
2 To address a single question, in some cases we need to collect
multiple responses.
An example of multiple responses
In order to capture Parkinson's disease disability, we need responses such
as motor ($Y_{i1}$) and non-motor functions ($Y_{i2}$), cognition ($Y_{i3}$) and drug
complications ($Y_{i4}$). Thus, the response of the ith patient is the vector
$$Y_i = (Y_{i1}, Y_{i2}, Y_{i3}, Y_{i4})^T.$$
Remark: In this class, we only focus on the case of a primary response.
Surrogate responses are responses that are supposed to be related to
and predictive of the primary response. They are oftentimes quicker to
follow up, easier, and cheaper to measure. Example: the fraction of patients
still alive after five years as a surrogate for the increase in life expectancy.
Randomization
Randomization is a method for assigning treatments to experimental
units using a known, well-understood probabilistic scheme. We say an
experiment is randomized if such a scheme is used to assign its treatments.
Why do we need randomization?
1 Randomization protects against systematic errors (or confounding).
2 Randomization itself can be used to conduct statistical inferences.
Two treatments comparison example
Suppose that we wish to compare a new drug treatment with a surgical
procedure for a certain disease. The surgery is more invasive. We have 20
patients who volunteer to participate in this experiment.
What happens if we let the patients decide for themselves which treatment
to receive?
Since surgery is a more invasive procedure, patients in better health
will be more willing to choose the surgery. Thus the drug therapy
would likely have a lower effect score because it gets the weaker
patients, even if the two treatments are equally effective.
(Confounding appears here: treatment is confounded with health condition.)
Randomization Schemes
Let's see two randomization schemes for this experiment:
1 Toss a coin for every patient: heads, the drug; tails, the surgery.
2 Randomly draw 10 patients to receive the drug therapy; the remaining 10
patients receive the surgery.
What is the difference between these two randomization schemes?
Which one is better, assuming all other factors are equal?
How many different assignments does each of the two randomizations
have?
We now see how to conduct these randomizations in R.
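A minimal sketch of the two schemes in R, assuming the 20 patients are labeled 1 to 20. The seed and the variable names (scheme1, scheme2, drug_ids) are illustrative choices, not part of the lecture.

```r
set.seed(2020)   # arbitrary seed, only for reproducibility
n <- 20          # number of volunteer patients

# Scheme 1: an independent fair coin toss for every patient
scheme1 <- sample(c("drug", "surgery"), size = n, replace = TRUE)
table(scheme1)   # group sizes are random and may be unbalanced

# Scheme 2: draw exactly 10 patients for the drug; the rest get surgery
drug_ids <- sample(seq_len(n), size = 10)
scheme2 <- ifelse(seq_len(n) %in% drug_ids, "drug", "surgery")
table(scheme2)   # always 10 and 10

# Number of possible assignments under each scheme
n_assign1 <- 2^n             # every patient has two possible outcomes
n_assign2 <- choose(n, 10)   # ways to pick which 10 patients get the drug
c(n_assign1, n_assign2)
```

Scheme 2 fixes the group sizes at 10 and 10, whereas Scheme 1 can by chance produce very unequal groups.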
How does this work? Randomization against confounding.
Many features of the population of experimental units may be
associated with our response. Randomization puts approximately half
of the patients with each such feature in each treatment group.
Approximately half of the men get the drug, whereas the other half get the
surgery.
Approximately half of the patients with better health conditions get the
drug.
Approximately half of the older patients get the drug.
As a result, randomization generates more comparable groups for
comparison. Of course, if you want better control over some
features, you can resort to more complicated randomization procedures
such as stratified randomization (Show some details: male-female, ages).
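Stratified randomization can be sketched as follows. The patient list (12 men and 8 women) is made up for illustration, and the column names are hypothetical; the idea is simply to randomize separately within each stratum.

```r
set.seed(1)   # arbitrary seed for reproducibility

# Hypothetical patient list: 12 men and 8 women (made up for illustration)
patients <- data.frame(id = 1:20,
                       sex = rep(c("M", "F"), times = c(12, 8)),
                       stringsAsFactors = FALSE)
patients$treatment <- NA_character_

# Within each stratum, randomly assign half to the drug and half to surgery
for (s in unique(patients$sex)) {
  idx <- which(patients$sex == s)
  labels <- rep(c("drug", "surgery"), length.out = length(idx))
  patients$treatment[idx] <- sample(labels)   # random permutation of labels
}

tab <- table(patients$sex, patients$treatment)
tab
```

Each sex is now split exactly in half between the two treatments, instead of only approximately in half.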
A quick test
Consider the paired design we saw last time: a company is evaluating two
different word processing packages (A and B) for use by its clerical staff.
The goal is to see how quickly a test document can be entered correctly
using the two programs. Suppose that 20 test secretaries each enter the
same document twice, once with each program. How would you apply
randomization in this case to avoid the previous confounding
factor, i.e., order?
A possible method: We randomly select 10 secretaries to enter the
document in the order A first and B second; the remaining 10
secretaries enter the document in the order
B first and A second. Later, when we perform the paired t-test, the order
effect will be averaged out.
Randomization techniques are used throughout the experiment
Some examples:
If experimental units are not used simultaneously, you can randomize
the order in which they are used.
If you use more than one measuring instrument for determining the
response, you can randomize which units are measured on which
instruments.
When we anticipate that one of these might cause a change in the
response, we can often design the corresponding factor into the
experiment (e.g., Ch 13 blocking), and randomize everything else. That
is, if you expect some factors to influence the response
in a systematic manner, you should account for them in the experimental
design.
Simple Comparative Experiments
We consider experiments to compare two treatments (sometimes called
conditions). These are often called simple comparative experiments.
We also refer to the two different treatments as two levels of a factor of
interest.
An example
An engineer is studying the formulation of a Portland cement mortar. He
has added a polymer latex emulsion during mixing to determine if this
impacts the curing time and tension bond strength of the mortar. The
experimenter prepared 20 experimental samples and randomly assigned 10
samples to receive the original formulation and 10 samples to receive the
modified formulation. When the cure process was completed, the
experimenter did find a very large reduction in the cure time for the
modified mortar formulation. Then he began to address the tension bond
strength of the mortar. If the new mortar formulation has an adverse
effect on bond strength, this could impact its usefulness.
Remark: see Chapter 2 of Montgomery's book (6th edition) for more details.
The crude average tension bond strength of the modified mortar is
ȳ1 = 16.76 kgf/cm², compared with the average tension bond strength
ȳ2 = 17.04 kgf/cm² of the unmodified mortar. The average tension bond
strengths in these two samples differ by what seems to be a modest
amount. However, it is not obvious that this difference is large enough to
imply that the two formulations really are different. Perhaps this observed
difference in average strengths is the result of sampling fluctuation and the
two formulations are really identical.
1. Assumptions
Let $y_{11}, y_{12}, ..., y_{1n_1}$ represent the $n_1$ observations from the first treatment
(or the first factor level).
Let $y_{21}, y_{22}, ..., y_{2n_2}$ represent the $n_2$ observations from the second
treatment (or the second factor level).
Assumptions:
1 We will assume that these observations are independent of each
other.
2 We will also assume that the observations are normally distributed.
In short, the samples are drawn at random from two independent normal
populations.
2. A Model for the Data
$$y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \quad j = 1, 2, ..., n_i.$$
"Response = Treatment effect + Random error"
$y_{ij}$ is the jth observation from factor level i (or the ith treatment).
$\mu_i$ is the mean of the response at the ith factor level.
$\epsilon_{ij}$ are independent $N(0, \sigma_i^2)$ random errors.
3. Statistical Hypotheses are derived from research questions
A statistical hypothesis is a statement either about the parameters of a
probability distribution or the parameters of a model. The hypothesis
reflects some conjecture about the problem situation. For example, in the
Portland cement experiment, we may think that the mean tension bond
strengths of the two mortar formulations are equal. This may be stated
formally as
H0: µ1 = µ2
vs.
H1: µ1 ≠ µ2
where µ1 is the mean tension bond strength of the modified mortar and µ2
is the mean tension bond strength of the unmodified mortar.
Remark: In general, we usually set our research hypotheses as alternative
hypotheses.
3. Hypothesis testing
To test a hypothesis, we devise a procedure for taking a random sample,
computing an appropriate test statistic and its sampling distribution, and
then rejecting or failing to reject the null hypothesis H0 based on the
computed value of the test statistic. Part of this procedure is specifying
the set of values for the test statistic that leads to rejection of H0. This
set of values is called the critical region or rejection region for the test.
Informally, when the null hypothesis is correct, we do not expect to see a
surprise, e.g., an unusually large or small value of the test statistic.
How large or small a value counts as unusual is decided
by the sampling distribution of the test statistic and the significance level α.
Two kinds of errors
If the null hypothesis is rejected when it is true, a type I error has
occurred. If the null hypothesis is not rejected when it is false, a type II
error has been made.
α = P(Type I error) = P(reject H0 | H0 is true)
β = P(Type II error) = P(fail to reject H0 | H0 is false).
Sometimes it is more convenient to work with the power of the test, where
Power = 1 − β = P(reject H0 | H0 is false).
The general procedure in hypothesis testing is to specify a value of the
probability of type I error α, often called the significance level of the
test, and then design the test procedure so that the probability of type II
error β has a suitably small value.
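The meaning of α can be checked by simulation. The sketch below is illustrative only: the seed, the 10 observations per group, the population mean and standard deviation, and the 2000 replications are all assumptions. Both samples are drawn from the same normal population, so H0 is true, and we record how often the pooled t-test rejects at α = 0.05.

```r
set.seed(123)   # arbitrary seed for reproducibility
alpha <- 0.05
n_sim <- 2000   # number of simulated experiments

# Each replicate: H0 is true (same mean and variance in both groups)
reject <- replicate(n_sim, {
  y1 <- rnorm(10, mean = 17, sd = 0.3)
  y2 <- rnorm(10, mean = 17, sd = 0.3)
  t.test(y1, y2, var.equal = TRUE)$p.value < alpha
})

type1_rate <- mean(reject)   # empirical type I error rate, close to alpha
type1_rate
```

Shifting the mean of one group (so that H0 is false) turns the same loop into an estimate of the power 1 − β.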
4.1 The pooled Two-Sample t-Test
Suppose that we could assume that the variances of tension bond
strengths were identical for both mortar formulations. Then the
appropriate test statistic to use for comparing two treatment means in the
completely randomized design is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$
where $S_p^2$ is an estimate of the common variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$ computed
from
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}.$$
To test the null hypothesis H0: µ1 = µ2 vs. H1: µ1 ≠ µ2 in a
two-sided fashion, we would compare the value of $t_0$ to the t distribution
with $n_1 + n_2 - 2$ degrees of freedom.
If $|t_0| \ge t_{\alpha/2, n_1+n_2-2}$, where $t_{\alpha/2, n_1+n_2-2}$ is the upper α/2 percentage
point of the t distribution, then we would reject the null hypothesis
H0 and conclude that the mean strengths of the two formulations of
Portland cement mortar differ.
Show some details related to justification (rationale) of this approach:
1: Pivotal quantity; 2: Likelihood ratio test in Stat 640.
(Draw a plot and see this in R)
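The pooled test can be computed by hand and compared with R's built-in t.test. This is a sketch on simulated stand-in data (the lecture's raw cement measurements are not reproduced here); the seed and the simulation parameters are assumptions.

```r
set.seed(42)   # arbitrary seed for reproducibility

# Simulated stand-in data (not the lecture's cement measurements)
y1 <- rnorm(10, mean = 16.8, sd = 0.3)
y2 <- rnorm(10, mean = 17.0, sd = 0.3)
n1 <- length(y1)
n2 <- length(y2)

# Pooled variance and test statistic, following the formulas above
sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
t0  <- (mean(y1) - mean(y2)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))

# Two-sided decision at alpha = 0.05
t_crit <- qt(1 - 0.05 / 2, df = n1 + n2 - 2)
reject <- abs(t0) >= t_crit

# Built-in equivalent: var.equal = TRUE requests the pooled test
fit <- t.test(y1, y2, var.equal = TRUE)
c(by_hand = t0, built_in = unname(fit$statistic))
```

The two statistics agree, which confirms that t.test with var.equal = TRUE implements the pooled formula.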
4.2 Unpooled Two-Sample t-test
Without assuming equal variances, the appropriate test statistic to
use for comparing two treatment means is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}},$$
where µ1 − µ2 = 0 under H0 and the degrees of freedom v of $t_0$ are obtained by Welch's approximation.
The unpooled t-test is the default in t.test in R.
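For reference, the Welch (Welch-Satterthwaite) degrees of freedom can be computed by hand. This is a sketch on simulated stand-in data with deliberately unequal spreads and sample sizes; the seed and parameters are assumptions.

```r
set.seed(7)   # arbitrary seed for reproducibility

# Simulated stand-in data with unequal spreads
y1 <- rnorm(10, mean = 16.8, sd = 0.2)
y2 <- rnorm(12, mean = 17.0, sd = 0.5)

v1 <- var(y1) / length(y1)   # S1^2 / n1
v2 <- var(y2) / length(y2)   # S2^2 / n2

# Unpooled statistic under H0 (mu1 - mu2 = 0)
t0 <- (mean(y1) - mean(y2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
nu <- (v1 + v2)^2 / (v1^2 / (length(y1) - 1) + v2^2 / (length(y2) - 1))

fit <- t.test(y1, y2)   # var.equal = FALSE is the default
c(t0 = t0, nu = nu)
```

The degrees of freedom are usually not an integer and lie between min(n1, n2) − 1 and n1 + n2 − 2.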
4.3 Which test should we choose in practice?
Pooled t-test vs. unpooled t-test?
In designed experiments, when randomization takes place, the pooled
t-test oftentimes works.
For observational studies, you should always use the unpooled t-test
since there is no information about equality of variances.
You should also realize that when variances are different, the
unpooled t-test actually tests whether two distributions are the same
or not. And the inferences following the test will be quite
complicated. (Draw a plot)
Remark: I hope to convince you of the complexity of even simple tests.
Once you have a clear idea of your research problems, you will be fine.
Calculations behind the software
To illustrate the two-sample pooled t-test procedure, consider the
Portland cement data. We have the following descriptive statistics (e.g., in
R, summary(cement), var(cement)):
                   Modified Mortar    Unmodified Mortar
Sample mean        ȳ1 = 16.76         ȳ2 = 17.04
Sample variance    S1² = 0.100        S2² = 0.061
Sample size        n1 = 10            n2 = 10
Because the sample standard deviations are reasonably similar, judging by both
the boxplot and the summary statistics, it is not unreasonable to conclude that the
population standard deviations (or variances) are equal. Therefore, we
apply the pooled two-sample t-test to test the hypotheses
H0: µ1 = µ2
H1: µ1 ≠ µ2
Given α = 0.05, the critical value is $t_{0.05/2, 18}$ = qt(0.975, 18) = 2.101.
Based on the previous formula,
$$S_p^2 = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081,$$
so that $S_p = 0.284$.
Then the value of the test statistic is
$$t_0 = \frac{16.76 - 17.04}{0.284 \sqrt{\frac{1}{10} + \frac{1}{10}}} = -2.20.$$
Since $|t_0| = 2.20 > t_{0.025, 18} = 2.101$, we would reject H0 and conclude
that the mean tension bond strengths of the two formulations of Portland
cement mortar are different. One can conclude that the modified
formulation reduces the bond strength (just because we conducted a
two-sided test, this does not preclude drawing a one-sided conclusion when
the null hypothesis is rejected).
The above is called the Neyman-Pearson approach to hypothesis testing: 1)
a specified level of significance α; 2) critical values or rejection regions;
3) test statistics.
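The hand calculation can be reproduced in R from the summary statistics on the slides alone. The variable names below are illustrative.

```r
# Summary statistics quoted on the slides
ybar1 <- 16.76; ybar2 <- 17.04
s1sq  <- 0.100; s2sq  <- 0.061
n1 <- 10; n2 <- 10

df  <- n1 + n2 - 2
sp2 <- ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / df   # 0.0805, rounds to 0.081
sp  <- sqrt(sp2)                                  # about 0.284

# About -2.21 unrounded; the slide's -2.20 uses the rounded Sp = 0.284
t0 <- (ybar1 - ybar2) / (sp * sqrt(1 / n1 + 1 / n2))

t_crit <- qt(0.975, df)        # about 2.101
decision <- abs(t0) > t_crit   # TRUE means: reject H0 at alpha = 0.05
c(t0 = t0, t_crit = t_crit)
```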
The P-value approach (Fisher’s approach)
The P-value is the probability that the test statistic will take on a value
that is at least as extreme as the observed value of the statistic when the
null hypothesis H0 is true. Thus, a P-value conveys much information
about the weight of evidence against H0, and so a decision maker can
draw a conclusion at any specified level of significance.
1 The smaller the p-value, the stronger the evidence against the null
hypothesis H0 .
2 More formally, we define the P-value as the smallest level of
significance that would lead to rejection of the null hypothesis H0.
For our case, under H0, P-value = Pr(|T18| ≥ |t0|) = Pr(|T18| ≥ 2.2) =
2 Pr(T18 ≥ 2.2) = 2 * (1 − pt(2.2, 18)) = 0.041.
Thus, the null hypothesis H0: µ1 = µ2 would be rejected at any level of
significance α ≥ 0.041 in the two-sided testing fashion.
One-sided alternative hypotheses
In some problems, one may wish to reject H0 only if one mean is larger
than the other. Thus, one would specify a one-sided alternative hypothesis
H1: µ1 > µ2 and would reject H0 only if $t_0 > t_{\alpha, n_1+n_2-2}$. If one wants to
reject H0 only if µ1 is less than µ2, then the alternative hypothesis is
H1: µ1 < µ2, and one would reject H0 if $t_0 < -t_{\alpha, n_1+n_2-2}$.
Corresponding P-values can also be defined.
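As a sketch, the two-sided and one-sided P-values for the cement example can be computed with pt, using the rounded statistic t0 = −2.20 and 18 degrees of freedom from the slides. The variable names are illustrative.

```r
t0 <- -2.20   # observed statistic from the cement example (rounded)
df <- 18

p_two_sided <- 2 * pt(abs(t0), df, lower.tail = FALSE)   # H1: mu1 != mu2
p_less      <- pt(t0, df)                                # H1: mu1 <  mu2
p_greater   <- pt(t0, df, lower.tail = FALSE)            # H1: mu1 >  mu2

round(c(two_sided = p_two_sided, less = p_less, greater = p_greater), 3)
```

Because the t distribution is symmetric, the one-sided P-value in the direction of the observed difference is half of the two-sided one.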
Confidence Intervals (Whether there is a difference; how large that difference is, if possible.)
Although hypothesis testing is a useful procedure, it sometimes does not
tell the entire story. It is often preferable to provide an interval within
which the value of the parameter or parameters in question would be
expected to lie. These interval statements are called confidence intervals.
In many engineering and industrial experiments, the experimenter already
knows that the means µ1 and µ2 differ; consequently, hypothesis testing on
H0: µ1 = µ2 is of little interest. The experimenter would usually be more
interested in knowing how much the means differ. A confidence interval on
the difference in means µ1 − µ2 is used in answering this question.
It is good practice to accompany every test of a hypothesis with a
confidence interval whenever possible.
Suppose that θ is an unknown parameter. To obtain an interval estimate of θ, we
find two statistics L and U satisfying
Pr(L ≤ θ ≤ U) = 1 − α.
The interval [L, U] is called a 100(1−α)% confidence interval for the
parameter θ.
1 θ is an unknown, but fixed quantity.
2 L and U are functions of the random sample. Thus, the probability is
taken with respect to the random data.
3 The interpretation of this interval is that if, in repeated random
samplings, a large number of such intervals are constructed,
100(1 − α) percent of them will contain the true value of θ.
4 Confidence intervals (CI) can also be used to perform hypothesis
testing.
Letting $SE = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, the statistic $\frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{SE} \sim t_{n_1+n_2-2}$. Then
$$\Pr\left( -t_{\alpha/2, n_1+n_2-2} \le \frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{SE} \le t_{\alpha/2, n_1+n_2-2} \right) = 1 - \alpha.$$
Rearranging, we have
$$\Pr\left( \bar{y}_1 - \bar{y}_2 - t_{\alpha/2, n_1+n_2-2}\, SE \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + t_{\alpha/2, n_1+n_2-2}\, SE \right) = 1 - \alpha.$$
Thus, a 100(1−α)% CI for µ1 − µ2 is
$$\left[ \bar{y}_1 - \bar{y}_2 - t_{\alpha/2, n_1+n_2-2}\, SE, \ \bar{y}_1 - \bar{y}_2 + t_{\alpha/2, n_1+n_2-2}\, SE \right].$$
"Estimate ± critical value × standard error."
Remarks: 1. The above method for constructing CIs is called the pivotal
quantity method; 2. Corresponding one-sided CIs can also be constructed.
In our case of Portland cement mortar, the 95% CI estimate for the
difference in mean tension bond strength of the two formulations is
−0.55 ≤ µ1 − µ2 ≤ −0.01.
1 Hypothesis testing: Because µ1 − µ2 = 0 is not included
in this interval, the data do not support the hypothesis that µ1 = µ2
at the 5 percent level of significance.
2 With 95% confidence, the true difference between the two
formulations lies between −0.55 and −0.01 kgf/cm². Whether
this difference is of practical importance is for the engineers to
decide.
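The interval above follows from "estimate ± critical value × standard error" applied to the summary statistics on the slides. A short sketch in R (variable names are illustrative):

```r
# Summary statistics quoted on the slides
ybar1 <- 16.76; ybar2 <- 17.04
s1sq  <- 0.100; s2sq  <- 0.061
n1 <- 10; n2 <- 10
alpha <- 0.05

df <- n1 + n2 - 2
sp <- sqrt(((n1 - 1) * s1sq + (n2 - 1) * s2sq) / df)   # pooled SD
se <- sp * sqrt(1 / n1 + 1 / n2)                       # standard error

# estimate +/- critical value * standard error
ci <- (ybar1 - ybar2) + c(-1, 1) * qt(1 - alpha / 2, df) * se
round(ci, 2)   # lower and upper limits for mu1 - mu2
```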
Checking assumptions in the t-Test
In using the t-test procedure we make the assumptions that both samples
are random samples that are drawn from independent populations that
can be described by a normal distribution and that the standard
deviations or variances of both populations are equal. The assumption of
independence is critical, and if the run order is randomized (and, if
appropriate, other experimental units and materials are selected at
random), this assumption will usually be satisfied. The equal variance and
normality assumptions are easy to check using a Quantile-Quantile Plot
(qq plot).
QQ plot
The Q-Q plot, or quantile-quantile plot, is a graphical tool to help us
assess if a set of data plausibly came from some theoretical distribution
such as a Normal or exponential. It plots the data against the corresponding
theoretical quantiles of the hypothesized distribution.
[Figure: QQ plot of the two samples (red: unmodified, black: modified). x-axis: theoretical normal quantile; y-axis: data points.]
1. qqline (25th and 75th percentile points); 2. Normality (How?); 3. Equal
variance (How?); 4. Information about mean differences (How?); 5. Others
(e.g., skewness).
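A plot of this kind can be sketched in R as follows. The two samples are simulated stand-ins generated from the summary statistics on the slides (the raw cement data are not reproduced here); the seed and object names are assumptions.

```r
set.seed(10)   # arbitrary seed for reproducibility

# Simulated stand-in samples (not the lecture's cement measurements)
modified   <- rnorm(10, mean = 16.76, sd = sqrt(0.100))
unmodified <- rnorm(10, mean = 17.04, sd = sqrt(0.061))

# qqnorm() with plot.it = FALSE returns the plotting coordinates
q_mod <- qqnorm(modified, plot.it = FALSE)
q_unm <- qqnorm(unmodified, plot.it = FALSE)

plot(q_mod$x, q_mod$y, col = "black", pch = 16,
     xlim = range(q_mod$x, q_unm$x), ylim = range(q_mod$y, q_unm$y),
     xlab = "Theoretical normal quantile", ylab = "Data points",
     main = "QQ plot: red = unmodified, black = modified")
points(q_unm$x, q_unm$y, col = "red", pch = 16)

# qqline() draws the line through the 25th and 75th percentile points
qqline(modified, col = "black")
qqline(unmodified, col = "red")
```

Points close to a straight line suggest normality; roughly parallel lines suggest similar variances (the slope reflects the standard deviation); a vertical shift between the lines reflects a difference in means.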
Choice of Sample Size
Selection of an appropriate sample size is one of the most important parts
of any experimental design problem. For example, in our case the
engineers consider a difference of µ1 − µ2 = −0.5 to be
of practical importance in testing H0: µ1 = µ2 vs. H1: µ1 ≠ µ2.
Since they want to detect this change reliably, they set the power
to be 0.99 (i.e., the Type II error probability is 0.01) when the true difference in
means is −0.5 kgf/cm². What is the sample size needed for each group
given the significance level α = 0.05? Assume that previous data show
that the standard deviation of the units is 0.30.
power.t.test(delta = -0.5, sd = 0.30, power = 0.99)

     Two-sample t test power calculation

              n = 14.27349
          delta = 0.5
             sd = 0.3
      sig.level = 0.05
          power = 0.99
    alternative = two.sided

NOTE: n is number in *each* group

Rounding up, 15 samples per group are needed.
Summary of Sample size selection
Power analysis is an indispensable part of any experimental design
and of quantitative research in other disciplines.
Power needs to be considered in the experiment
design process, not after the experiment.
In order to compute the corresponding sample size per group, we need
to specify the alternative hypothesis of interest, and we oftentimes need
some estimates of the variances from
pilot studies or historical controls.
A nonparametric approach: Two-sample permutation test (randomization test)
In our example, the mean of the modified mortar is 16.76 < 17.04, which is the
mean of the unmodified mortar. We are interested in testing H0: µ1 = µ2 vs.
H1: µ1 ≠ µ2. Or, more generally, H0: no treatment effect vs.
H1: there is an effect.
1 If there is no treatment effect, then there is no difference between
units in the treatment group and those in the control group.
2 When the null hypothesis is true (there is no treatment effect), the
observed groups were simply obtained by randomly splitting the 20
subjects into two groups.
3 If we pool the 20 observations and randomly reallocate them into two
groups, then we should expect a similar test result from the two
"new" groups.
Testing procedure:
1 Record the observed mean difference as
|Tobs| = |16.76 − 17.04| = 0.28.
2 Pool the 20 observations, randomly assign 10 to the treatment group and the
other 10 to the control group. Calculate the test statistic
|T1| = |mean(1st group) − mean(2nd group)|.
3 Perform (2) for all $\frac{20!}{10!\,10!} = 184756$ possible combinations.
4 The P-value is the proportion of the values |T1|, ..., |T184756| that are
greater than or equal to |Tobs|.
Remark: see R.
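The procedure can be sketched as a Monte Carlo permutation test in R. This is not the instructor's program: the data are simulated stand-ins (the raw cement measurements are not reproduced here), and the seed, sample parameters, and object names are assumptions. Instead of enumerating all 184756 splits, it draws 20000 random re-splits.

```r
set.seed(2020)   # arbitrary seed for reproducibility

# Simulated stand-in data (not the lecture's cement measurements)
modified   <- rnorm(10, mean = 16.76, sd = 0.3)
unmodified <- rnorm(10, mean = 17.04, sd = 0.3)

pooled <- c(modified, unmodified)
t_obs  <- abs(mean(modified) - mean(unmodified))   # step 1

n_rep <- 20000
t_perm <- replicate(n_rep, {
  idx <- sample(length(pooled), size = 10)         # step 2: random re-split
  abs(mean(pooled[idx]) - mean(pooled[-idx]))
})

# Step 4 (approximate): share of re-splits at least as extreme as observed
p_value <- mean(t_perm >= t_obs)
p_value
```

With only 20 observations, full enumeration is also feasible, e.g. via combn(20, 10); the Monte Carlo version scales to larger samples.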
Remarks:
By using the randomization test, we do not have to impose strong
parametric distributional assumptions on the underlying data, e.g., a
normal distribution.
Since the total number of combinations is so large, this procedure is
oftentimes conducted using Monte Carlo simulation to approximate
the true P-value.
Using my program in R, the resulting P-value is 0.043 in 20000
replications, which is similar to the 0.042 obtained by the two-sample t-test.
We will see this at the lab session this Thursday.
Paired Comparison Design
In some simple comparative experiments, we can greatly improve the
precision by making comparisons within matched pairs of experimental
material.
For example, consider a hardness testing machine that presses a rod with a
pointed tip into a metal specimen with a known force. By measuring the
depth of the depression caused by the tip, the hardness of the specimen is
determined. Two different tips are available for this machine, and although
the precision (variability) of the measurements made by the two tips seems
to be the same, it is suspected that one tip produces different mean
hardness readings than the other.
Paired Design
Consider an alternative experimental design. Assume that each specimen
is large enough so that two hardness determinations may be made on it.
This alternative design would consist of dividing each specimen into two
parts, then randomly assigning one tip to one-half of each specimen and
the other tip to the remaining half. The order in which the tips are tested
for a particular specimen would also be randomly selected.
A Model for the Paired Data
$$y_{ij} = \mu_i + \beta_j + \epsilon_{ij}, \quad i = 1, 2, \quad j = 1, 2, ..., 10.$$
$y_{ij}$ is the observation on hardness for tip i on specimen j.
$\mu_i$ is the true mean hardness of the ith tip.
$\beta_j$ is an effect on hardness due to the jth specimen.
$\epsilon_{ij}$ are independent $N(0, \sigma_i^2)$ random errors. That is, $\sigma_1^2$ is the variance of the
hardness measurements from tip 1, and $\sigma_2^2$ is the variance of the
hardness measurements from tip 2.
Remark: The paired design is a special case of the randomized complete
block design.
Note that we can compute the jth paired difference
$$d_j = y_{1j} - y_{2j}, \quad j = 1, 2, ..., 10.$$
The expected value of this difference is
$$\mu_d = E(d_j) = E(y_{1j} - y_{2j}) = E(y_{1j}) - E(y_{2j}) = \mu_1 + \beta_j - \mu_2 - \beta_j = \mu_1 - \mu_2.$$
That is, we may make inferences about the difference in the mean
hardness readings of the two tips µ1 − µ2 by making inferences about the
mean of the differences µd. Notice that the additive effect of the
specimens βj cancels out when the observations are paired in this manner.
Testing H0: µ1 = µ2 is equivalent to testing
H0: µd = 0
H1: µd ≠ 0.
This is a one-sample t-test.
The test statistic is
$$t_0 = \frac{\bar{d}}{S_d / \sqrt{n}},$$
where $\bar{d} = \sum_{j=1}^{n} d_j / n$ and $S_d$ is the sample standard deviation of the
paired differences $d_1, d_2, ..., d_{10}$.
We reject H0 if $|t_0| \ge t_{\alpha/2, n-1}$. A P-value approach could also be used.
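A sketch of the paired analysis in R, on simulated stand-in data generated from the model above (a specimen effect shared by both tips plus independent errors). The hardness values, seed, and object names are made up for illustration.

```r
set.seed(5)   # arbitrary seed for reproducibility
n <- 10

# Simulated stand-in data: specimen effects (beta_j) shared by both tips
specimen <- rnorm(n, mean = 0, sd = 2)
tip1 <- 7.0 + specimen + rnorm(n, sd = 0.5)
tip2 <- 7.1 + specimen + rnorm(n, sd = 0.5)

d    <- tip1 - tip2                  # paired differences; beta_j cancels
dbar <- mean(d)
sd_d <- sd(d)                        # sample SD of the differences
t0   <- dbar / (sd_d / sqrt(n))
p    <- 2 * pt(abs(t0), df = n - 1, lower.tail = FALSE)

fit <- t.test(tip1, tip2, paired = TRUE)   # built-in equivalent
c(t0 = t0, p = p)
```

The same result is obtained from the one-sample call t.test(d), since the paired test is a one-sample t-test on the differences.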
Inferences About the Variances of Normal Distributions
In some experiments it is the comparison of variability in the data that is
important. In the food and beverage industry, for example, it is important
that the variability of filling equipment be small so that all packages have
close to the nominal net weight or volume of content. We now briefly
examine tests of hypotheses and confidence intervals for variances of
normal distributions. Unlike the tests on means (e.g., the t-test is a robust
test), the procedures for tests on variances are rather sensitive to the
normality assumption.
One sample variance inference/test
Suppose we wish to test the hypothesis that the variance of a normal
population equals a constant, for example, $\sigma_0^2$. Stated formally, we wish
to test
$H_0: \sigma^2 = \sigma_0^2$
$H_1: \sigma^2 \ne \sigma_0^2$
The test statistic is
$$\chi_0^2 = \frac{(n-1) S^2}{\sigma_0^2} \sim \chi^2_{n-1} \quad \text{under } H_0,$$
where $S^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)$.
The null hypothesis is rejected if $\chi_0^2 > \chi^2_{\alpha/2, n-1}$ or if $\chi_0^2 < \chi^2_{1-\alpha/2, n-1}$.
The 100(1 − α)% confidence interval on $\sigma^2$ is
$$\frac{(n-1) S^2}{\chi^2_{\alpha/2, n-1}} \le \sigma^2 \le \frac{(n-1) S^2}{\chi^2_{1-\alpha/2, n-1}}.$$
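A sketch of the one-sample variance test and confidence interval in R. The sample is simulated and the hypothesized variance sigma0_sq is a hypothetical target value; both are assumptions for illustration. The percentage points in the formulas are upper-tail points, hence lower.tail = FALSE.

```r
set.seed(11)   # arbitrary seed for reproducibility

# Simulated stand-in sample; sigma0_sq is a hypothetical target variance
y <- rnorm(20, mean = 10, sd = 1)
sigma0_sq <- 1
alpha <- 0.05
n <- length(y)

chi0 <- (n - 1) * var(y) / sigma0_sq   # test statistic

# Upper alpha/2 and upper 1 - alpha/2 percentage points of chi-square(n - 1)
upper_pt <- qchisq(alpha / 2, df = n - 1, lower.tail = FALSE)
lower_pt <- qchisq(1 - alpha / 2, df = n - 1, lower.tail = FALSE)
reject <- (chi0 > upper_pt) || (chi0 < lower_pt)

# 100(1 - alpha)% confidence interval for sigma^2
ci <- c((n - 1) * var(y) / upper_pt, (n - 1) * var(y) / lower_pt)
ci
```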
Two sample variance inference/test
Now consider testing the equality of the variances of two normal
populations. If independent random samples of size n1 and n2 are taken
from populations 1 and 2, respectively, the test statistic for
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
is the ratio of the sample variances
$$F_0 = \frac{S_1^2}{S_2^2} \sim F_{n_1-1, n_2-1} \quad \text{under } H_0.$$
The null hypothesis would be rejected if $F_0 > F_{\alpha/2, n_1-1, n_2-1}$ or if
$F_0 < F_{1-\alpha/2, n_1-1, n_2-1}$, where $F_{\alpha/2, n_1-1, n_2-1}$ and $F_{1-\alpha/2, n_1-1, n_2-1}$ denote
the upper α/2 and upper 1 − α/2 percentage points of the F distribution
with n1 − 1 and n2 − 1 degrees of freedom.
In R, you can use the code var.test(Modified, Unmodified).
The 100(1 − α)% confidence interval on $\sigma_1^2 / \sigma_2^2$ is
$$\frac{S_1^2}{S_2^2} F_{1-\alpha/2, n_2-1, n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2} F_{\alpha/2, n_2-1, n_1-1}.$$
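A sketch of the two-sample variance test by hand, compared with var.test. The samples are simulated stand-ins (not the lecture's cement data), and the seed and object names are assumptions. Note that the confidence interval uses the F distribution with the degrees of freedom in the order n2 − 1, n1 − 1.

```r
set.seed(12)   # arbitrary seed for reproducibility

# Simulated stand-in samples (not the lecture's cement measurements)
modified   <- rnorm(10, mean = 16.76, sd = 0.3)
unmodified <- rnorm(10, mean = 17.04, sd = 0.3)
n1 <- length(modified)
n2 <- length(unmodified)
alpha <- 0.05

f0 <- var(modified) / var(unmodified)   # test statistic

# Upper alpha/2 and upper 1 - alpha/2 percentage points of F(n1 - 1, n2 - 1)
upper_pt <- qf(alpha / 2, n1 - 1, n2 - 1, lower.tail = FALSE)
lower_pt <- qf(1 - alpha / 2, n1 - 1, n2 - 1, lower.tail = FALSE)
reject <- (f0 > upper_pt) || (f0 < lower_pt)

# Confidence interval for sigma1^2 / sigma2^2 (lower-tail quantiles here)
ci <- f0 * c(qf(alpha / 2, n2 - 1, n1 - 1), qf(1 - alpha / 2, n2 - 1, n1 - 1))

fit <- var.test(modified, unmodified)   # built-in equivalent
c(F0 = f0, built_in = unname(fit$statistic))
```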
