
Randomization and Simple Comparative Experiments

Dr. Zou

Department of Statistics and Biostatistics


CSUEB

Jan 28, 2020

One more concept

A placebo is a null treatment that is used when the act of applying a
treatment (any treatment) has an effect in itself. E.g., sugar pills given
to patients in a double-blind manner.

In medical studies, the patients who receive the placebo usually serve as
the control group.

Dr. Zou (csueb) Randomization and Simple Comparative Experiments Jan 28, 2020 2 / 60
Example: A fascinating landmark study of placebo surgery

Moseley et al. (2002) showed that in this controlled trial involving patients
with osteoarthritis of the knee, the outcomes after arthroscopic lavage or
arthroscopic debridement were no better than those after a placebo
procedure.

Remark: For more details see "A Controlled Trial of Arthroscopic Surgery
for Osteoarthritis of the Knee" in The New England Journal of Medicine
(uploaded to Blackboard).

Confounding occurs when the effect of one factor or treatment cannot
be distinguished from that of another factor or treatment. The two factors
or treatments are said to be confounded. E.g., the factor "word processing
package" (A or B) and the factor "order in which the document is entered"
are confounded in our previous example.

More on responses

Oftentimes more than one response will be collected from a subject in an
experiment:
1 An experiment that addresses several questions often needs a different
response for each question. Responses such as these are often called
primary responses.
2 To address a single question, in some cases we need to collect
multiple responses.

An example of multiple responses

In order to capture Parkinson's disease disability, we need responses such
as motor ($Y_{i1}$) and non-motor functions ($Y_{i2}$), cognition ($Y_{i3}$)
and drug complications ($Y_{i4}$). Thus, the response of the $i$th patient is

\[ Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \\ Y_{i4} \end{pmatrix}. \]

Remark: In this class, we only focus on the case of a primary response.

Surrogate responses are responses that are supposed to be related to
and predictive of the primary response. They are oftentimes quicker to
follow up, easier to measure, and cheaper. Example: the fraction of
patients still alive after five years (surrogate) vs. the increase in life
expectancy (primary).

Randomization

Randomization is a method for assigning treatments to experimental
units using a known, well-understood probabilistic scheme. We say an
experiment is randomized if such a scheme is used to assign its treatments.

Why do we need randomization?

1 Randomization protects against systematic errors (or confounding).

2 Randomization itself can be used to conduct statistical inferences.

Two treatments comparison example

Suppose that we wish to compare a new drug treatment with a surgical
procedure for a certain disease. The surgery is more invasive. We have 20
patients who volunteered to participate in this experiment.

What happens if we let the patients decide for themselves which treatment
to receive?

Since surgery is a more invasive procedure, patients in better health
will be more willing to take the surgery. Thus the drug therapy
would likely have a lower effect score because it gets the weaker
patients, even if the two treatments are equally effective.
(Confounding appears here: the treatment effect cannot be separated from
the effect of the patients' initial health.)

Randomization Schemes

Let's see two randomization schemes for this experiment:

1 Toss a coin for every patient: heads, the drug; tails, the surgery.

2 Randomly draw 10 patients to receive the drug therapy; the remaining 10
patients receive the surgery.
What is the difference between these two randomization schemes?
Which one is better, assuming all other factors are equal?
How many different assignments does each of the two randomization schemes
have?
We now see how to conduct these randomizations in R.
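
The two schemes above can be sketched in R as follows. This is a minimal illustration, not the course's own script: the seed, the patient labels 1 to 20, and all variable names are assumptions made for the example.

```r
set.seed(2020)  # assumed seed, only so the sketch is reproducible

n <- 20
patients <- seq_len(n)  # hypothetical patient IDs 1, ..., 20

# Scheme 1: an independent fair coin toss for every patient.
# Group sizes are random and need not be 10 and 10.
scheme1 <- sample(c("drug", "surgery"), size = n, replace = TRUE)

# Scheme 2: draw exactly 10 patients for the drug; the rest get surgery.
drug_ids <- sample(patients, size = 10)
scheme2 <- ifelse(patients %in% drug_ids, "drug", "surgery")

table(scheme1)  # group sizes vary from run to run
table(scheme2)  # always 10 and 10

# Number of possible assignments under each scheme
n_assign_scheme1 <- 2^n            # every patient has 2 options
n_assign_scheme2 <- choose(n, 10)  # ways to pick the 10 drug patients
```

Scheme 2 fixes the group sizes in advance, whereas scheme 1 can produce unbalanced groups by chance.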

How does this work? Randomization against confounding.

Many features of the population of experimental units are potentially
associated with our response. Randomization puts approximately half of
the patients with each such feature in each treatment group.
Approximately half of the men get the drug, whereas the other half get the
surgery.
Approximately half of the patients in better health get the drug.
Approximately half of the older patients get the drug.
As a result, randomization tends to generate more homogeneous groups for
comparison. Of course, if you want better control over some
features, you can resort to more complicated randomization procedures
such as stratified randomization (Show some details: male-female, ages).
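
As one possible illustration of stratified randomization, the sketch below randomizes separately within each sex so that both treatment groups receive the same number of men and of women. The patient data frame (12 men, 8 women) and all names are hypothetical and chosen only for the example.

```r
set.seed(1)  # assumed seed, only for reproducibility

# Hypothetical patient list: 12 men and 8 women
patients <- data.frame(id = 1:20,
                       sex = rep(c("M", "F"), times = c(12, 8)))
patients$treatment <- NA_character_

# Randomize within each stratum (sex) so that half of each stratum
# receives the drug and the other half receives the surgery.
for (s in unique(patients$sex)) {
  idx <- which(patients$sex == s)
  n_drug <- length(idx) %/% 2
  labels <- rep(c("drug", "surgery"),
                times = c(n_drug, length(idx) - n_drug))
  patients$treatment[idx] <- sample(labels)  # random permutation of labels
}

tab <- table(patients$sex, patients$treatment)
tab
```

The same idea extends to strata defined by age groups or by combinations of features.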

A quick test
Consider the paired design we saw last time: a company is evaluating two
different word processing packages (A and B) for use by its clerical staff.
The goal is to see how quickly a test document can be entered correctly
using the two programs. Suppose that 20 test secretaries each need to enter
the same document twice, once with each program. How would you apply
randomization in this case in order to avoid the previous confounding
factor, i.e., order?

A possible method: We randomly select 10 secretaries to enter the
document in the order A first and B second; the remaining 10
secretaries enter the document in the order B first and A second.
Later, when we perform the paired t-test, the order effect will be
averaged out.

Randomization techniques are used throughout experiments
Some examples:
If experimental units are not used simultaneously, you can randomize
the order in which they are used.

If you use more than one measuring instrument for determining the
response, you can randomize which units are measured on which
instruments.
When we anticipate that one of these might cause a change in the
response, we can often design the corresponding factor into the
experiment (e.g., Ch 13 blocking), and randomize everything else. That
is, if you expect some factors to influence the response in a systematic
manner, you should account for them in the experimental design.

Simple Comparative Experiments

We consider experiments to compare two treatments (sometimes called
conditions). These are often called simple comparative experiments.
We also refer to the two treatments as two levels of a factor of
interest.

An example
An engineer is studying the formulation of a Portland cement mortar. He
has added a polymer latex emulsion during mixing to determine if this
impacts the curing time and tension bond strength of the mortar. The
experimenter prepared 20 experimental samples and randomly assigned 10
samples to receive the original formulation and 10 samples to receive the
modified formulation. When the cure process was completed, the
experimenter did find a very large reduction in the cure time for the
modified mortar formulation. Then he began to address the tension bond
strength of the mortar. If the new mortar formulation has an adverse
effect on bond strength, this could impact its usefulness.

Remark: see Chapter 2 of Montgomery’s book 6th edition for more details.

The crude average tension bond strength of the modified mortar is
$\bar{y}_1 = 16.76$ kgf/cm$^2$, compared with the average tension bond
strength $\bar{y}_2 = 17.04$ kgf/cm$^2$ of the unmodified mortar. The
average tension bond strengths in these two samples differ by what seems
to be a modest amount. However, it is not obvious that this difference is
large enough to imply that the two formulations really are different.
Perhaps this observed difference in average strengths is the result of
sampling fluctuation and the two formulations are really identical.

1. Assumptions

Let $y_{11}, y_{12}, \ldots, y_{1n_1}$ represent the $n_1$ observations from
the first treatment (or the first factor level).
Let $y_{21}, y_{22}, \ldots, y_{2n_2}$ represent the $n_2$ observations from
the second treatment (or the second factor level).

Assumptions:
1 We will assume that these observations are independent of each
other.

2 We will also assume that the observations are normally distributed.

In short, the samples are drawn at random from two independent normal
populations.

2. A Model for the Data

\[ y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \quad j = 1, 2, \ldots, n_i. \]

"Response = Treatment effect + Random error"

$y_{ij}$ is the $j$th observation from factor level $i$ (or the $i$th treatment).
$\mu_i$ is the mean of the response at the $i$th factor level.
$\epsilon_{ij}$ are independent $N(0, \sigma_i^2)$ random errors.

3. Statistical Hypotheses are derived from research questions
A statistical hypothesis is a statement either about the parameters of a
probability distribution or the parameters of a model. The hypothesis
reflects some conjecture about the problem situation. For example, in the
Portland cement experiment, we may think that the mean tension bond
strengths of the two mortar formulations are equal. This may be stated
formally as

\[ H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \ne \mu_2, \]

where $\mu_1$ is the mean tension bond strength of the modified mortar and
$\mu_2$ is the mean tension bond strength of the unmodified mortar.

Remark: In general, we usually set our research hypotheses as alternative
hypotheses.
3. Hypotheses testing

To test a hypothesis, we devise a procedure for taking a random sample,
computing an appropriate test statistic and its sampling distribution, and
then rejecting or failing to reject the null hypothesis $H_0$ based on the
computed value of the test statistic. Part of this procedure is specifying
the set of values for the test statistic that leads to rejection of $H_0$.
This set of values is called the critical region or rejection region for
the test.

Informally, when the null hypothesis is correct, we do not expect to see a
surprise, i.e., an unusually large or small value of the test statistic.
How large or small a value must be to count as unusual is decided by the
sampling distribution of the test statistic and the significance level
$\alpha$.

Two kinds of errors

If the null hypothesis is rejected when it is true, a type I error has
occurred. If the null hypothesis is not rejected when it is false, a type II
error has been made.

\[ \alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) \]

\[ \beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) \]

Sometimes it is more convenient to work with the power of the test, where

\[ \text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false}). \]
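
The two error probabilities can be made concrete with a small simulation. The sketch below estimates how often a two-sided pooled t-test rejects $H_0$ when $H_0$ is true (type I error rate) and when it is false (power). The sample size, standard deviation, mean difference, seed, and the helper name reject_rate are all illustrative assumptions, not values prescribed by the slides.

```r
set.seed(123)  # assumed seed, only for reproducibility

# Estimate P(reject H0) for the two-sided pooled t-test by simulation.
# delta is the true difference in means between the two normal populations.
reject_rate <- function(delta, n = 10, sigma = 0.3, alpha = 0.05,
                        reps = 2000) {
  rejections <- replicate(reps, {
    y1 <- rnorm(n, mean = 0, sd = sigma)
    y2 <- rnorm(n, mean = delta, sd = sigma)
    t.test(y1, y2, var.equal = TRUE)$p.value < alpha
  })
  mean(rejections)
}

type1_hat <- reject_rate(delta = 0)    # H0 true: should be close to alpha
power_hat <- reject_rate(delta = 0.5)  # H0 false: estimates 1 - beta
c(type1 = type1_hat, power = power_hat)
```

With more replications the first estimate settles near $\alpha = 0.05$, while the second depends on the assumed difference, standard deviation, and sample size.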

The general procedure in hypothesis testing is to specify a value of the
probability of type I error α, often called the significance level of the
test, and then design the test procedure so that the probability of type II
error β has a suitably small value.

4.1 The pooled Two-Sample t-Test

Suppose that we could assume that the variances of tension bond
strengths were identical for both mortar formulations. Then the
appropriate test statistic to use for comparing two treatment means in the
completely randomized design is

\[ t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \]

where $S_p^2$ is an estimate of the common variance
$\sigma_1^2 = \sigma_2^2 = \sigma^2$ computed from

\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}. \]
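
The two formulas above translate directly into a few lines of R. The function name pooled_t and its argument names are hypothetical; the numbers passed to it are the summary statistics of the Portland cement data reported in these slides.

```r
# Pooled two-sample t statistic computed from summary statistics.
# s2_1 and s2_2 are the sample variances of the two groups.
pooled_t <- function(ybar1, ybar2, s2_1, s2_2, n1, n2) {
  df  <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df  # pooled variance
  se  <- sqrt(sp2) * sqrt(1 / n1 + 1 / n2)         # SE of ybar1 - ybar2
  list(t0 = (ybar1 - ybar2) / se, sp2 = sp2, se = se, df = df)
}

# Modified vs. unmodified mortar (summary statistics from the slides)
res <- pooled_t(ybar1 = 16.76, ybar2 = 17.04,
                s2_1 = 0.100, s2_2 = 0.061, n1 = 10, n2 = 10)
res$sp2  # 0.0805, i.e. 0.081 after rounding
res$t0   # roughly -2.2
```

Because the inputs are rounded summary statistics, the result can differ in the second decimal from a value computed on the raw data.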

To test the null hypothesis $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$
in a two-sided fashion, we would compare the value of $t_0$ to the $t$
distribution with $n_1 + n_2 - 2$ degrees of freedom.
If $|t_0| \ge t_{\alpha/2, n_1+n_2-2}$, where $t_{\alpha/2, n_1+n_2-2}$ is the
upper $\alpha/2$ percentage point of the $t$ distribution, then we would
reject the null hypothesis $H_0$ and conclude that the mean strengths of
the two formulations of Portland cement mortar differ.
Show some details related to the justification (rationale) of this
approach: 1. Pivotal quantity; 2. Likelihood ratio test in Stat 640.
(Draw a plot and see this in R)

4.2 Unpooled Two-Sample t-test

If we do not assume equal variances, the appropriate test statistic to
use for comparing two treatment means is

\[ t_0 = \frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}, \]

where the degrees of freedom $v$ of $t_0$ are obtained by Welch's
approximation.

The unpooled t-test is the default in t.test in R.
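
For reference, the unpooled statistic and its approximate degrees of freedom can be computed from summary statistics as sketched below. The degrees-of-freedom expression is the standard Welch-Satterthwaite formula; the function name welch_t is hypothetical, and the inputs are the cement summary statistics from these slides, evaluated under $H_0: \mu_1 - \mu_2 = 0$.

```r
# Unpooled (Welch) two-sample t statistic from summary statistics,
# evaluated under H0: mu1 - mu2 = 0.
welch_t <- function(ybar1, ybar2, s2_1, s2_2, n1, n2) {
  v1 <- s2_1 / n1
  v2 <- s2_2 / n2
  t0 <- (ybar1 - ybar2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  list(t0 = t0, df = df, p_value = 2 * pt(-abs(t0), df))
}

w <- welch_t(16.76, 17.04, 0.100, 0.061, 10, 10)
w$t0       # same value as the pooled statistic, since n1 = n2
w$df       # about 17, slightly below n1 + n2 - 2 = 18
w$p_value
```

With raw data vectors x and y, t.test(x, y) performs this test directly, and t.test(x, y, var.equal = TRUE) gives the pooled version.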

4.3 Which test should we choose in practice?

Pooled t-test vs. unpooled t-test?

In designed experiments, when randomization takes place, the pooled
t-test oftentimes works.
For observational studies, you should always use the unpooled t-test
since there is no information about equality of variances.
You should also realize that when variances are different, the
unpooled t-test actually tests whether two distributions are the same
or not. And the inferences following up the test will be quite
complicated. (Draw a plot)
Remark: I hope to convince you of the complexity of even simple tests.
Once you have a clear idea of your research problems, you will be fine.

Calculations behind the software

To illustrate the two-sample pooled t-test procedure, consider the
Portland cement data. We have the following descriptive statistics (e.g.,
in R, summary(cement), var(cement)):

                 Modified Mortar         Unmodified Mortar
Mean             $\bar{y}_1 = 16.76$     $\bar{y}_2 = 17.04$
Variance         $S_1^2 = 0.100$         $S_2^2 = 0.061$
Sample size      $n_1 = 10$              $n_2 = 10$

Because the sample standard deviations are reasonably similar, judging
from both the boxplot and the summary statistics, it is not unreasonable
to conclude that the population standard deviations (or variances) are
equal. Therefore, we apply the pooled two-sample t-test to test the
hypotheses

\[ H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \ne \mu_2. \]

Given $\alpha = 0.05$, the critical value is $t_{0.05/2, 18}$ = qt(0.975, 18) = 2.101.

Based on the previous formula,

\[ S_p^2 = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081. \]

Then the value of the test statistic is

\[ t_0 = \frac{16.76 - 17.04}{0.284 \sqrt{\frac{1}{10} + \frac{1}{10}}} = -2.20, \]

where $S_p = \sqrt{S_p^2} \approx 0.284$.

Since $|t_0| = 2.20 > t_{0.025, 18} = 2.101$, we would reject $H_0$ and
conclude that the mean tension bond strengths of the two formulations of
Portland cement mortar are different. One can conclude that the modified
formulation reduces the bond strength (conducting a two-sided test does
not preclude drawing a one-sided conclusion when the null hypothesis is
rejected).

The above is called the Neyman-Pearson approach to hypothesis testing: 1)
a specified level of significance $\alpha$; 2) critical values or rejection
regions; 3) test statistics.

The P-value approach (Fisher’s approach)

The P-value is the probability that the test statistic will take on a value
that is at least as extreme as the observed value of the statistic when the
null hypothesis H0 is true. Thus, a P-value conveys much information
about the weight of evidence against H0, and so a decision maker can
draw a conclusion at any specified level of significance.

1 The smaller the P-value, the stronger the evidence against the null
hypothesis $H_0$.

2 More formally, we define the P-value as the smallest level of
significance that would lead to rejection of the null hypothesis $H_0$.

For our case, under $H_0$,

\[ \text{P-value} = \Pr(|T_{18}| \ge |t_0|) = \Pr(|T_{18}| \ge 2.2) = 2 \Pr(T_{18} \ge 2.2), \]

which in R is 2 * (1 - pt(2.2, 18)) = 0.041.

Thus, the null hypothesis $H_0: \mu_1 = \mu_2$ would be rejected at any
level of significance $\alpha \ge 0.041$ in the two-sided testing fashion.

One-sided alternative hypotheses

In some problems, one may wish to reject $H_0$ only if one mean is larger
than the other. Thus, one would specify a one-sided alternative hypothesis
$H_1: \mu_1 > \mu_2$ and would reject $H_0$ only if
$t_0 > t_{\alpha, n_1+n_2-2}$. If one wants to reject $H_0$ only if $\mu_1$
is less than $\mu_2$, then the alternative hypothesis is
$H_1: \mu_1 < \mu_2$, and one would reject $H_0$ if
$t_0 < -t_{\alpha, n_1+n_2-2}$.

Corresponding P-values can also be defined.
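
A short sketch of how the two-sided and one-sided P-values could be computed in R for the cement example, using the rounded statistic $t_0 = -2.20$ with 18 degrees of freedom from the slides. The variable names are illustrative.

```r
t0 <- -2.20  # observed pooled t statistic (rounded, from the slides)
df <- 18     # n1 + n2 - 2

# Two-sided: H1: mu1 != mu2
p_two_sided <- 2 * pt(-abs(t0), df)

# One-sided alternatives
p_less    <- pt(t0, df)      # H1: mu1 < mu2, small t0 is evidence
p_greater <- 1 - pt(t0, df)  # H1: mu1 > mu2, large t0 is evidence

# Critical values for alpha = 0.05
crit_two_sided <- qt(1 - 0.05 / 2, df)  # about 2.101
crit_one_sided <- qt(1 - 0.05, df)

round(c(two_sided = p_two_sided, less = p_less, greater = p_greater), 3)
```

For a symmetric null distribution such as the $t$, the two-sided P-value is twice the smaller of the two one-sided P-values.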

Confidence Intervals (Whether there is a difference; How large that difference is if possible.)

Although hypothesis testing is a useful procedure, it sometimes does not
tell the entire story. It is often preferable to provide an interval within
which the value of the parameter or parameters in question would be
expected to lie. These interval statements are called confidence intervals.
In many engineering and industrial experiments, the experimenter already
knows that the means $\mu_1$ and $\mu_2$ differ; consequently, hypothesis
testing on $H_0: \mu_1 = \mu_2$ is of little interest. The experimenter
would usually be more interested in knowing how much the means differ. A
confidence interval on the difference in means $\mu_1 - \mu_2$ is used in
answering this question.

It is good practice to accompany every test of a hypothesis with a
confidence interval whenever possible.

Suppose that $\theta$ is an unknown parameter. To obtain an interval
estimate of $\theta$, we find two statistics $L$ and $U$ satisfying

\[ \Pr(L \le \theta \le U) = 1 - \alpha. \]

The interval $[L, U]$ is called a $100(1-\alpha)\%$ confidence interval for
the parameter $\theta$.
1 $\theta$ is an unknown, but fixed quantity.
2 $L$ and $U$ are functions of random samples. Thus, the probability is
taken with respect to the random data.
3 The interpretation of this interval is that if, in repeated random
samplings, a large number of such intervals are constructed,
$100(1 - \alpha)$ percent of them will contain the true value of $\theta$.
4 Confidence intervals (CI) can also be used to perform hypothesis
testing.

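Letting $SE = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, the statistic
$\frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{SE} \sim t_{n_1+n_2-2}$. Then

\[ \Pr\left( -t_{\alpha/2, n_1+n_2-2} \le \frac{\bar{y}_1 - \bar{y}_2 - (\mu_1 - \mu_2)}{SE} \le t_{\alpha/2, n_1+n_2-2} \right) = 1 - \alpha. \]

Rearranging, we have

\[ \Pr\left( \bar{y}_1 - \bar{y}_2 - t_{\alpha/2, n_1+n_2-2}\, SE \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + t_{\alpha/2, n_1+n_2-2}\, SE \right) = 1 - \alpha. \]

Thus, a $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is

\[ \left( \bar{y}_1 - \bar{y}_2 - t_{\alpha/2, n_1+n_2-2}\, SE,\ \ \bar{y}_1 - \bar{y}_2 + t_{\alpha/2, n_1+n_2-2}\, SE \right). \]

"Estimate ± critical value × standard error."

Remarks: 1. The above method for constructing CIs is called the pivotal
quantity method; 2. Corresponding one-sided CIs can also be constructed.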

In our case of Portland cement mortar, the 95% CI estimate for the
difference in mean tension bond strength for the two formulations is found
as follows:

\[ -0.55 \le \mu_1 - \mu_2 \le -0.01. \]

1 Hypothesis testing: Note that because $\mu_1 - \mu_2 = 0$ is not included
in this interval, the data do not support the hypothesis that
$\mu_1 = \mu_2$ at the 5 percent level of significance.
2 With 95% confidence, the true difference between the two
formulations lies between -0.55 and -0.01 kgf/cm$^2$. Whether
this difference is of practical importance depends on the engineers'
judgment.
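
The interval above can be reproduced from the summary statistics with the "estimate ± critical value × standard error" recipe. This is a sketch based on the rounded summary values in the slides; the variable names are illustrative.

```r
# Summary statistics of the cement data (from the slides)
ybar1 <- 16.76; s2_1 <- 0.100; n1 <- 10  # modified mortar
ybar2 <- 17.04; s2_2 <- 0.061; n2 <- 10  # unmodified mortar

df  <- n1 + n2 - 2
sp2 <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df  # pooled variance
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))             # standard error

alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df)  # upper alpha/2 point of t with 18 df

ci <- (ybar1 - ybar2) + c(-1, 1) * tcrit * se
round(ci, 2)  # -0.55 -0.01
```

With the raw data, t.test(x, y, var.equal = TRUE)$conf.int returns the same kind of interval directly.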

Checking assumptions in the t-Test

In using the t-test procedure we make the assumptions that both samples
are random samples that are drawn from independent populations that
can be described by a normal distribution and that the standard
deviation or variances of both populations are equal. The assumption of
independence is critical, and if the run order is randomized (and, if
appropriate, other experimental units and materials are selected at
random), this assumption will usually be satisfied. The equal variance and
normality assumptions are easy to check using a Quantile-Quantile Plot
(qq plot).

QQ plot

The Q-Q plot, or quantile-quantile plot, is a graphical tool to help us
assess if a set of data plausibly came from some theoretical distribution
such as a normal or exponential. It plots the ordered data against the
corresponding theoretical quantiles of the hypothesized distribution.
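
A minimal sketch of how such a plot could be drawn in R for two groups. The data here are simulated stand-ins generated from the reported group means and variances; they are NOT the actual cement measurements, and the seed and variable names are assumptions.

```r
set.seed(42)  # assumed seed, only for reproducibility

# Simulated stand-in data (not the real measurements)
modified   <- rnorm(10, mean = 16.76, sd = sqrt(0.100))
unmodified <- rnorm(10, mean = 17.04, sd = sqrt(0.061))

# qqnorm() pairs each observation with its theoretical normal quantile
qq_mod   <- qqnorm(modified, plot.it = FALSE)
qq_unmod <- qqnorm(unmodified, plot.it = FALSE)

plot(qq_mod$x, qq_mod$y, pch = 19, col = "black",
     ylim = range(c(modified, unmodified)),
     xlab = "Theoretical normal quantile", ylab = "Data points",
     main = "QQ Plot: Red-Unmodified, Black-Modified")
points(qq_unmod$x, qq_unmod$y, pch = 19, col = "red")

# qqline() draws the line through the 25th and 75th percentile points
qqline(modified, col = "black")
qqline(unmodified, col = "red")
```

Points that fall close to a straight line are consistent with normality; roughly parallel reference lines suggest similar standard deviations, and the vertical gap between the lines reflects the difference in location.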

[Figure: QQ plot of the two samples. Red: unmodified mortar; black: modified mortar. x-axis: theoretical normal quantile; y-axis: data points.]

Points to discuss: 1. qqline (25th and 75th percentile points); 2. Normality
(How?); 3. Equal variance (How?); 4. Information about mean differences
(How?); 5. Others (e.g., skewness).
Choice of Sample Size

Selection of an appropriate sample size is one of the most important parts
of any experimental design problem. For example, suppose that in our case
the engineers consider a difference of $\mu_1 - \mu_2 = -0.5$ to be of
practical importance in testing $H_0: \mu_1 = \mu_2$ vs.
$H_1: \mu_1 \ne \mu_2$.

Since they want to be nearly certain of detecting such a difference, they
set the power to be 0.99 (i.e., the type II error probability is 0.01) when
the true difference in means is -0.5 kgf/cm$^2$. What is the sample size
needed for each group given the significance level $\alpha = 0.05$? Assume
that previous data show that the standard deviation of the units is 0.30.

power.t.test(delta = -0.5, sd = 0.30, power = 0.99)
     Two-sample t test power calculation
              n = 14.27349
          delta = 0.5
             sd = 0.3
      sig.level = 0.05
          power = 0.99
    alternative = two.sided
NOTE: n is number in *each* group
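
The calculation above can be scripted so that the required sample size is extracted and rounded up. In this sketch the magnitude 0.5 is passed as delta, since only the size of the difference matters for a two-sided test; the variable names are illustrative.

```r
# Solve for the per-group sample size n, given the difference to detect,
# the standard deviation, the significance level, and the target power.
pw <- power.t.test(delta = 0.5, sd = 0.30, sig.level = 0.05, power = 0.99,
                   type = "two.sample", alternative = "two.sided")

pw$n                          # about 14.27 per group
n_per_group <- ceiling(pw$n)  # round up to a whole number of units
n_per_group

# Power actually achieved with the rounded-up sample size
achieved <- power.t.test(n = n_per_group, delta = 0.5, sd = 0.30,
                         sig.level = 0.05)$power
achieved
```

Rounding up guarantees that the achieved power is at least the target, here with 15 samples per group.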

Summary of Sample size selection

Power analysis is an indispensable part of any experimental design
and of quantitative research in other disciplines.
Power needs to be considered while the experiment is being designed,
not after the experiment.
In order to compute the corresponding sample size per group, we need
to specify the alternative hypotheses of interest and oftentimes need
to have some estimates of the variances of the procedures from
pilot studies and historical controls.

A nonparametric approach: Two sample Permutation test (Randomization test)

In our example, the mean of the modified mortar is 16.76, which is less
than 17.04, the mean of the unmodified mortar. We are interested in testing
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$, or more generally
$H_0$: no treatment effect vs. $H_1$: there is an effect.
1 If there is no treatment effect, then there is no difference between
the units in the treatment group and those in the control group.
2 When the null hypothesis is true (there is no treatment effect), the
observed groups were simply obtained by randomly splitting the 20
subjects into two groups.
3 If we pool the 20 observations and randomly reallocate them into two
groups, then we should expect a similar value of the test statistic from
the two "new" groups.

Testing procedures:

1 Record the observed mean difference as
$|T_{obs}| = |16.76 - 17.04| = 0.28$.

2 Pool the 20 observations, randomly assign 10 to the treatment group and
the other 10 to the control group. Calculate the test statistic
$|T_1| = |\text{mean(1st group)} - \text{mean(2nd group)}|$.

3 Perform (2) for all $\frac{20!}{10!\,10!} = 184756$ possible combinations.

4 The P-value is the proportion of the values $|T_1|, \ldots, |T_{184756}|$
that are greater than or equal to $|T_{obs}|$.
Remark: see R.
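
The steps above can be sketched as a Monte Carlo permutation test, which samples random regroupings instead of enumerating all 184756 of them. This is not the instructor's program: the function name perm_test is hypothetical, and the data are simulated stand-ins generated from the reported summary statistics, NOT the actual cement measurements.

```r
set.seed(2020)  # assumed seed, only for reproducibility

# Simulated stand-in data (not the real measurements)
modified   <- rnorm(10, mean = 16.76, sd = sqrt(0.100))
unmodified <- rnorm(10, mean = 17.04, sd = sqrt(0.061))

# Monte Carlo two-sample permutation test on |difference in means|
perm_test <- function(x, y, reps = 20000) {
  pooled <- c(x, y)
  n1 <- length(x)
  t_obs <- abs(mean(x) - mean(y))         # step 1: observed statistic
  t_perm <- replicate(reps, {
    idx <- sample(length(pooled), n1)     # step 2: random regrouping
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })
  mean(t_perm >= t_obs)                   # step 4: proportion as extreme
}

p_perm <- perm_test(modified, unmodified)
p_t    <- t.test(modified, unmodified, var.equal = TRUE)$p.value
c(permutation = p_perm, pooled_t = p_t)
```

For roughly normal data the permutation P-value and the pooled t-test P-value are usually close to each other.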

Remarks:

By using the randomization test, we do not have to impose strong
parametric distribution assumptions on the underlying data, e.g., a
normal distribution.

Since the total number of combinations is so large, this procedure is
oftentimes conducted using Monte Carlo simulation to approximate
the true P-value.

Using my program in R, the resulting P-value is 0.043 based on 20000
replications, which is similar to the 0.042 obtained by the two-sample
t-test. We will see this at the lab session this Thursday.

Paired Comparison Design

In some simple comparative experiments, we can greatly improve the
precision by making comparisons within matched pairs of experimental
material.
For example, consider a hardness testing machine that presses a rod with a
pointed tip into a metal specimen with a known force. By measuring the
depth of the depression caused by the tip, the hardness of the specimen is
determined. Two different tips are available for this machine, and although
the precision (variability) of the measurements made by the two tips seems
to be the same, it is suspected that one tip produces different mean
hardness readings than the other.

Paired Design

Consider an alternative experimental design. Assume that each specimen
is large enough so that two hardness determinations may be made on it.
This alternative design would consist of dividing each specimen into two
parts, then randomly assigning one tip to one-half of each specimen and
the other tip to the remaining half. The order in which the tips are tested
for a particular specimen would also be randomly selected.

A Model for the Paired Data

\[ y_{ij} = \mu_i + \beta_j + \epsilon_{ij}, \quad i = 1, 2, \quad j = 1, 2, \ldots, 10. \]

$y_{ij}$ is the observation on hardness for tip $i$ on specimen $j$.
$\mu_i$ is the true mean hardness of the $i$th tip.
$\beta_j$ is an effect on hardness due to the $j$th specimen.
$\epsilon_{ij}$ are independent $N(0, \sigma_i^2)$ random errors. That is,
$\sigma_1^2$ is the variance of the hardness measurements from tip 1, and
$\sigma_2^2$ is the variance of the hardness measurements from tip 2.
Remark: The paired design is a special case of the randomized complete
block design.

Note that if we compute the $j$th paired difference

\[ d_j = y_{1j} - y_{2j}, \quad j = 1, 2, \ldots, 10, \]

the expected value of this difference is

\[ \mu_d = E(d_j) = E(y_{1j} - y_{2j}) = E(y_{1j}) - E(y_{2j}) = \mu_1 + \beta_j - \mu_2 - \beta_j = \mu_1 - \mu_2. \]

That is, we may make inferences about the difference in the mean
hardness readings of the two tips $\mu_1 - \mu_2$ by making inferences about
the mean of the differences $\mu_d$. Notice that the additive effect of the
specimens $\beta_j$ cancels out when the observations are paired in this
manner.
Testing $H_0: \mu_1 = \mu_2$ is equivalent to testing

\[ H_0: \mu_d = 0 \quad \text{vs.} \quad H_1: \mu_d \ne 0. \]

This is a one-sample t-test.

The test statistic is

\[ t_0 = \frac{\bar{d}}{S_d / \sqrt{n}}, \]

where $\bar{d} = \sum_{j=1}^{n} d_j / n$ and $S_d$ is the sample standard
deviation of the paired differences $d_1, d_2, \ldots, d_{10}$.

We reject $H_0$ if $|t_0| \ge t_{\alpha/2, n-1}$. A P-value approach could
also be used.
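
The paired test can be sketched in R as follows. The hardness readings are simulated from the paired model with made-up parameter values (a common specimen effect plus independent errors); they are hypothetical and serve only to show that the hand computation on the differences matches t.test with paired = TRUE.

```r
set.seed(7)  # assumed seed, only for reproducibility

# Hypothetical readings for 10 specimens (simulated, not real data):
# both tips share the specimen effect beta_j.
specimen_effect <- rnorm(10, mean = 0, sd = 2)
tip1 <- 5 + specimen_effect + rnorm(10, sd = 0.5)
tip2 <- 5 + specimen_effect + rnorm(10, sd = 0.5)

# One-sample t-test on the paired differences
d  <- tip1 - tip2
n  <- length(d)
t0 <- mean(d) / (sd(d) / sqrt(n))
p_value <- 2 * pt(-abs(t0), df = n - 1)

# The same test using the built-in function
builtin <- t.test(tip1, tip2, paired = TRUE)
c(t0 = t0, p_value = p_value)
```

Because the specimen effect cancels in the differences, the large specimen-to-specimen variability in this sketch does not inflate the standard error of the comparison.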

Inferences About the Variances of Normal Distributions

In some experiments it is the comparison of variability in the data that is
important. In the food and beverage industry, for example, it is important
that the variability of filling equipment be small so that all packages have
close to the nominal net weight or volume of content. We now briefly
examine tests of hypotheses and confidence intervals for variances of
normal distributions. Unlike the tests on means (e.g., the t-test is a
robust test), the procedures for tests on variances are rather sensitive to
the normality assumption.

One sample variance inference/test

Suppose we wish to test the hypothesis that the variance of a normal
population equals a constant, for example, $\sigma_0^2$. Stated formally, we
wish to test

\[ H_0: \sigma^2 = \sigma_0^2 \quad \text{vs.} \quad H_1: \sigma^2 \ne \sigma_0^2. \]

The test statistic is

\[ \chi_0^2 = \frac{(n-1) S^2}{\sigma_0^2} \sim \chi^2_{n-1} \quad \text{under } H_0, \]

where $S^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n-1)$.

The null hypothesis is rejected if $\chi_0^2 > \chi^2_{\alpha/2, n-1}$ or if
$\chi_0^2 < \chi^2_{1-\alpha/2, n-1}$.

The $100(1 - \alpha)\%$ confidence interval on $\sigma^2$ is

\[ \frac{(n-1) S^2}{\chi^2_{\alpha/2, n-1}} \le \sigma^2 \le \frac{(n-1) S^2}{\chi^2_{1-\alpha/2, n-1}}. \]
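
Base R has no dedicated function for the one-sample variance test, so the test and interval can be coded directly from the formulas. In this sketch the function name var_test_one_sample is hypothetical, and the hypothesized value $\sigma_0^2 = 0.1$ is chosen purely for illustration; the sample variance 0.100 with $n = 10$ is the modified-mortar summary from the slides.

```r
# Chi-square test and confidence interval for a single normal variance,
# computed from the sample variance s2 and the sample size n.
var_test_one_sample <- function(s2, n, sigma0_sq, alpha = 0.05) {
  df    <- n - 1
  chi0  <- df * s2 / sigma0_sq
  upper <- qchisq(1 - alpha / 2, df)  # upper alpha/2 percentage point
  lower <- qchisq(alpha / 2, df)      # upper 1 - alpha/2 percentage point
  list(chi0   = chi0,
       reject = chi0 > upper || chi0 < lower,
       ci     = c(df * s2 / upper, df * s2 / lower))
}

out <- var_test_one_sample(s2 = 0.100, n = 10, sigma0_sq = 0.1)
out$chi0    # equals n - 1 = 9 here because s2 equals sigma0_sq
out$reject
out$ci      # 95% confidence interval for sigma^2
```

Note that R's qchisq takes lower-tail probabilities, so the upper $\alpha/2$ point $\chi^2_{\alpha/2, n-1}$ is qchisq(1 - alpha / 2, n - 1).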

Two sample variance inference/test

Now consider testing the equality of the variances of two normal
populations. If independent random samples of size $n_1$ and $n_2$ are taken
from populations 1 and 2, respectively, the test statistic for

\[ H_0: \sigma_1^2 = \sigma_2^2 \quad \text{vs.} \quad H_1: \sigma_1^2 \ne \sigma_2^2 \]

is the ratio of the sample variances

\[ F_0 = \frac{S_1^2}{S_2^2} \sim F_{n_1-1, n_2-1} \quad \text{under } H_0. \]

The null hypothesis would be rejected if $F_0 > F_{\alpha/2, n_1-1, n_2-1}$ or
if $F_0 < F_{1-\alpha/2, n_1-1, n_2-1}$, where $F_{\alpha/2, n_1-1, n_2-1}$ and
$F_{1-\alpha/2, n_1-1, n_2-1}$ denote the upper $\alpha/2$ and upper
$1 - \alpha/2$ (i.e., lower $\alpha/2$) percentage points of the F
distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.

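In R, you can use the code var.test(Modified, Unmodified).

The $100(1 - \alpha)\%$ confidence interval on $\sigma_1^2/\sigma_2^2$ is

\[ \frac{S_1^2}{S_2^2} F_{1-\alpha/2, n_2-1, n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2} F_{\alpha/2, n_2-1, n_1-1}. \]

The degrees of freedom appear in the order $(n_2 - 1, n_1 - 1)$ because the
interval is obtained by inverting the F ratio.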
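
When only summary statistics are available, the F test and the interval for the variance ratio can be computed by hand, as in the sketch below. It uses the cement sample variances reported in the slides; the variable names are illustrative.

```r
# Sample variances and sizes of the cement data (from the slides)
s2_1 <- 0.100; n1 <- 10  # modified mortar
s2_2 <- 0.061; n2 <- 10  # unmodified mortar
alpha <- 0.05

f0 <- s2_1 / s2_2  # observed variance ratio

# Rejection limits: F with (n1 - 1, n2 - 1) degrees of freedom
upper <- qf(1 - alpha / 2, n1 - 1, n2 - 1)  # upper alpha/2 point
lower <- qf(alpha / 2, n1 - 1, n2 - 1)      # lower alpha/2 point
reject <- f0 > upper || f0 < lower

# Confidence interval for sigma1^2 / sigma2^2: note the swapped
# degrees of freedom (n2 - 1, n1 - 1)
ci <- f0 * c(qf(alpha / 2, n2 - 1, n1 - 1),
             qf(1 - alpha / 2, n2 - 1, n1 - 1))

list(f0 = f0, reject = reject, ci = ci)
```

Here the interval contains 1, in line with treating the two variances as equal in the pooled t-test.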
