
PholC60 September 2001

DATA INTERPRETATION AND STATISTICS


Books
An easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell
at about £14.
DESCRIPTIVE STATISTICS
Data
Data are obtained by making observations of the world about us. Data are obtained from experiments or from studying
patients. Data contain information about the system or individuals under study, but in order to make judgements it is
usually necessary to process the data to extract relevant information.
Types of data:
Non-parametric data: nominal or categorical data (e.g. names, colours, etc., without any ordering);
ordinal data (rankings: 1st, 2nd, 3rd, etc.)
Parametric data: numerical/quantitative data;
measurements may be on a continuous interval scale (e.g. height, weight),
or they may be discrete values on a discontinuous scale (e.g. number of offspring).
Data, especially biological data, tend to be scattered. This form of variability may be an inherent property of the quantity
measured or it may be due to the limited accuracy of measurement. It is more difficult to draw conclusions from data that
are very scattered.
Samples and populations
To assess the properties of populations it is frequently necessary to make measurements on subsets called samples. This
may be because it is often impossible or unreasonable to carry out measurements on the entire population because it is too
large (e.g. the height of all Africans) or because the population is infinite (e.g. a subject's height measured several times:
you will not get exactly the same result each time, so you settle for a finite number of measurements, since it would not be
possible to make an infinite number of measurements).
Samples should be representative and not biased; they are usually randomly selected. From the properties of sample data, we
infer the properties of the population.
Describing data numerically
Data may be described by calculating quantities that measure:-
1. central tendency: mean, median or mode;

Mean:   $\bar{X} = \dfrac{\sum X}{n}$

Median: the $\dfrac{(n+1)}{2}$th value when the observations are ranked

Mode:   the most frequent value

2. spread or scatter or dispersion: range, variance, standard deviation, coefficient of variation


The range = Xmax - Xmin is a poor measure of dispersion because it depends entirely on the extreme values, providing no
information about the intermediate ones.
The mean of the differences between each X and the mean is
$$\frac{\sum (X - \bar{X})}{n}$$
but this is not a useful measure because it is close to zero when the distribution is symmetrical. If we square the
differences before dividing by n we get the variance:
$$\text{variance} = \frac{\sum (X - \bar{X})^2}{n}$$
If X, for example, is in cm then the variance is in cm², consequently we use the standard deviation (SD), which is the
square root of the variance:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

This formula calculates the SD of a population of n observations X. We know that a sample mean is an estimator of the
population mean, but a sample SD, calculated from the above formula, would give a biased estimate of the population
SD. This is for rather complicated reasons. Briefly, it is because a single sample is not likely to contain extreme values
and the SD calculated over n tends to underestimate the population SD. To remove this bias in the estimate of
population SD we divide by (n-1), rather than n, when calculating the sample SD. Thus for a sample:-
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
The quantity n - 1 is called the number of degrees of freedom (d.f.). Statisticians will tell you that each time you
calculate a statistic from sample data the number of degrees of freedom is reduced by 1. Thus we use n in calculating the
mean value X̄, but since we use the mean in calculating SD the d.f. becomes n - 1 and we use this instead of n.
This is the way you will nearly always calculate SD. CHECK YOUR CALCULATOR! Use this simple example. The
SD of the sample 1, 2, 3 is 1. If the population is 1, 2, 3 then the SD is 0.816.
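A minimal check of the two formulas, assuming Python is available (the library calls below are from the standard library): statistics.stdev divides by n - 1 (sample SD), while statistics.pstdev divides by n (population SD), reproducing the 1 and 0.816 quoted above.

```python
# Sample SD (n - 1) versus population SD (n) for the example in the text.
import statistics

data = [1, 2, 3]
print(statistics.stdev(data))   # sample SD: 1.0
print(statistics.pstdev(data))  # population SD: ~0.816
```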
Presenting data graphically
Graphic methods allow the visual assessment of data. For nominal data this can take the form of a bar chart or a pie
diagram. For example:

50 Using Histograms to classify data


Number of people

Scatter plot

40 50 60 70 80
Body mass (kg)
29 data points

12
NUMBER

0 8

Blue Brown 4
CLASS INTERVAL = 10
0
40-45 50-55 60-65 70-75
45-50 55-60 65-70 75-80
Body mass (kg)

brown
blue 6 0.2

FREQUENCY
NUMBER

4
0.1
2
CLASS INTERVAL = 5
Histograms 0
40-45 50-55 60-65 70-75
0

45-50 55-60 65-70 75-80


Parametric data, i.e. numerical data, may be plotted as a histogram. The quantity measured is divided into intervals or
classes of appropriate size and the number of observations within each class is plotted. We are, therefore, classifying
the data. The total area under the histogram is proportional to the total number of observations.
Rather than plotting the number of observations in each class on the vertical axis, it is common to plot the frequency.
This is the number of observations in each class divided by the total number of observations. The sum of all the
frequencies will be 1.
Instead of plotting each class as a block, a frequency polygon outlining the profile can be drawn. Such a graph is called
the frequency distribution.
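A small sketch of classifying data into intervals and converting counts to frequencies, assuming numpy is available. The body-mass values are hypothetical, chosen only to mirror the 5 kg class intervals used in the figure.

```python
# Classify data into 5 kg intervals and convert counts to frequencies.
import numpy as np

masses = np.array([48, 52, 55, 58, 61, 62, 63, 66, 67, 69, 71, 74])  # hypothetical values
edges = np.arange(40, 85, 5)                  # class intervals 40-45, 45-50, ..., 75-80
counts, _ = np.histogram(masses, bins=edges)  # number of observations per class
freqs = counts / counts.sum()                 # frequencies
print(counts, freqs, freqs.sum())             # the frequencies sum to 1
```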
To summarise:
1. Measurements are performed upon a sample taken from a population.
2. We may construct a histogram or frequency distribution of the sample data.
3. We may calculate from our sample data quantities called statistics that are estimators of population properties.
These include measures of central tendency: e.g. mean, median and mode. The scatter or spread in the data is best
described by statistics such as SD or coefficient of variation (SD/mean as a percentage).

Standard error of the mean


If many samples are selected from a population, each has its own mean value X̄. The distribution of these means is called
a sampling distribution and it is centred around the population mean μ. The width of the sampling distribution depends on
the number of items in each sample. Larger samples give narrower sampling distributions. This means that if you take a
sample of 20 items, the mean value will be closer to μ than if you take a sample of, say, 5 items.

[Figure: distributions of sample means (X̄) for samples of n = 20, n = 10 and n = 5, each centred on μ; the larger the
sample, the narrower the distribution.]

The SD of the sampling distribution is called the standard error of the mean (SE or SEM). The smaller it is, the closer a
sample mean is likely to be to μ.

Estimating population statistics
1. The sample mean X̄ provides an estimate of the population mean μ.
2. The sample SD s provides an estimate of the population SD σ.
Just how close these estimates are to the actual values depends on the number of measurements or items in the sample.

The standard error of the mean is a measure of the closeness of the sample mean to the population mean μ.
It is given by
$$SE = \frac{s}{\sqrt{n}}$$
(Remember, it's always n here, never n - 1.)
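A sketch of this idea by simulation, assuming numpy is available: the SD of many sample means approaches σ/√n. The simulated population below is hypothetical, used only to illustrate the point.

```python
# Empirical SD of sample means versus the predicted SEM = sigma / sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=70, scale=10, size=100_000)  # hypothetical population

n = 20
sample_means = [rng.choice(population, size=n).mean() for _ in range(2000)]
print(np.std(sample_means))            # SD of the sampling distribution (empirical)
print(population.std() / np.sqrt(n))   # predicted SEM
```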

Illustrating the spread of data graphically


It is usual to show the SD, or more commonly the SE, on data plots as error bars.

Box and whisker plot


This can be used instead of a dot or scatter plot to indicate the central tendency and the spread of data. It may be drawn
horizontally, as below, or vertically. The ends of the whiskers indicate the limits of the data (range), while the box
encloses the values within 2 SDs either side of the mean. The central vertical line is the mean value. Alternatively,
another common convention is that the central line is the median and the box encloses the upper and lower quartiles.

[Figure: a horizontal box and whisker plot of a measured quantity spanning roughly 7 to 9 units.]

The Normal Distribution


The Normal distribution is one of the most common frequency distributions that occur. It is bell-shaped and symmetrical
about the central value. Its shape is completely defined by the mathematical equation or formula that describes it.
$$y = y_0 \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad y_0 = \frac{1}{\sigma\sqrt{2\pi}}$$

It's not necessary for you to manipulate this rather forbidding equation, but if you are mathematical you may notice that
when x = μ, y is at its maximum. Also, when x - μ = σ, y is 1/√e, or about 0.61, of its maximum value.

[Figure: a Normal curve centred on the central value, with -2 SD, -1 SD, +1 SD and +2 SD marked on the horizontal axis
and the height falling to 0.61 of the maximum at ±1 SD.]

Thus, the Normal curve for a large population is the frequency polygon, centred at the population mean μ and with a
half-width of σ (the SD) at 61% of the maximum height. Any Normal curve is completely defined in terms of shape by
the parameters μ and σ, which determine its centre and width. Its area is equal to the number of items/observations.
A simplified form is provided by the Standard Normal Distribution (SND), where μ is set to zero and the units of
measure on the horizontal axis are SDs (i.e. x has become z = (x - μ)/σ). The area under the whole curve is 1. The area
may be divided into parts by drawing vertical lines.
We can use this property of a Normal curve to provide an additional way of describing the spread of data in a sample. It
applies to large (n > 60) samples taken from a Normally distributed population and it is called the 95% confidence
interval:
$$95\%\ \text{c.i.} = \bar{X} \pm (1.96 \times SE)$$
This doesn't apply to small samples (n < 60) since, although the population may be Normally distributed, the samples tend
to be distributed according to the so-called t distribution (a little broader than a Normal curve). An additional
complication is that, unlike the Normal distribution, the shape of the t distribution depends on the number of degrees of
freedom. Thus
$$95\%\ \text{c.i.} = \bar{X} \pm (t \times SE)$$
where the value of t is given by the t tables at d.f. = n - 1 and p = .05.
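A minimal sketch of the small-sample interval, assuming numpy and scipy are available; the sample used is the "without drug" column from the sleep example further below, and stats.t.ppf supplies the tabulated t value at d.f. = n - 1.

```python
# 95% confidence interval for a small sample using the t distribution.
import numpy as np
from scipy import stats

x = np.array([5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3])  # sample data
n = len(x)
se = x.std(ddof=1) / np.sqrt(n)          # SE = s / sqrt(n), with s calculated over n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)    # two-tailed 5% point of t at d.f. = n - 1
print(x.mean() - t_crit * se, x.mean() + t_crit * se)
```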

STATISTICAL INFERENCE
Tests of significance
Significance tests are used to determine the likelihood that two (or more) samples come from the same population. For
example, does a particular form of treatment make the patient better, or could the apparent improvement have happened by
chance? The general procedure is as follows.
1. Formulate a null hypothesis (called H0 for short). This takes the pessimistic view that differences between sample
means are due entirely to chance, i.e. both samples are derived from the same population.
2. Calculate the significance level of H0. This is the probability of obtaining differences at least as large as those
observed if the null hypothesis were true.
3. If the significance level is (by convention) below 5% (p < .05) we reject H0.
Decisions about significance depend on the area under the appropriate distribution. Tests can be two-tailed, or they can
be single-tailed (for differences in one direction only). More on this below.
Paired data, small samples
For two samples of paired data, i.e. data that are matched or that correspond in a one-to-one relation e.g. measurements
on the same individual "before" and "after" treatment, and where n < 60 and the data are from a Normal distribution, we
use a paired t test. (t is the number of SE's between the means).
This test is best performed by calculating the differences between the measurements on each individual and then
determining if the mean difference is significantly different from zero; H0 states that it is not.
$$t = \frac{\text{mean difference}}{s/\sqrt{n}}, \qquad \text{d.f.} = n - 1$$

Example:
Hours of sleep in patients after taking sleeping drug.

Patient Without drug After drug Difference


1 5.2 6.1 0.9
2 7.9 7.0 -0.9
3 3.9 8.2 4.3
4 4.7 7.6 2.9
5 5.3 6.5 1.2
6 5.4 8.4 3.0
7 4.2 6.9 2.7
8 6.1 6.7 0.6
9 3.8 7.4 3.6
10 6.3 5.8 -0.5
mean                  5.28      7.06      1.78
SD (of differences)                       1.768
SE                                        0.559
t                                         3.18
d.f.                                      9

H0: The means 5.28 h and 7.06 h are not significantly different. Alternatively, the mean difference 1.78 h is not
significantly different from zero.

Looking in the t table we find: at d.f. = 9 and p < .05 t = 2.26,


at d.f. = 9 and p < .02 t = 2.82,
at d.f. = 9 and p < .01 t = 3.25.
We can therefore reject the null hypothesis at p < .02 and conclude that the drug is effective at changing the number of
hours of sleep. Another way of putting it is that the probability that the difference in the amounts of sleep was achieved
purely by chance is less than 2%.
NOTE: This was a two-sided or two-tailed comparison. It told us that the number of sleep hours would be different, but
not specifically greater. If there were no chance that a particular treatment could reduce sleep hours, then we could use
the data in a single-tailed (one-sided) test and conclude that for t = 2.82 and d.f. = 9 the probability of H0 is < 1%
(i.e. half of 2%). The t tables give values for either case and you have to make the choice. You will nearly always use
two-tailed comparisons.
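The same paired t test can be reproduced on the sleep data above, assuming scipy is available; stats.ttest_rel performs a related-samples (paired) t test and should give t close to 3.18 with a two-tailed p between .02 and .01.

```python
# Paired t test on the sleep data from the example above.
from scipy import stats

without_drug = [5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3]
after_drug   = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]

t, p = stats.ttest_rel(after_drug, without_drug)  # paired (related-samples) t test
print(t, p)   # t ≈ 3.18 at d.f. = 9
```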
Paired data, large samples
When n > 60 the t distribution and the Normal distribution are very similar, so we calculate not t but z, the Standard
Normal Deviate (see above). Remember z = (difference in means)/SE; it does not depend on d.f.
Unpaired data
An unpaired, or two sample, t test is used to compare samples that have no correspondence, for example a set of patients
and a set of healthy controls. The number in each sample does not have to be the same. If the SD for each sample is
similar then it is necessary to calculate a pooled SD, sp. (If the SDs are rather different then other methods may be used.)

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
This is then used to compute t as
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

For example birth weights (kg) of children born to smokers and non-smokers:

Non-smokers    Heavy Smokers
3.99           3.18
3.79           2.84
3.60           2.90
3.73           3.27
3.21           3.85
3.60           3.52
4.08           3.23
3.61           2.76
3.83           3.60
3.31           3.75
4.13           3.59
3.26           3.63
3.54           2.38
3.51           2.34
2.71

X̄     3.593          3.203         d.f. = 15 + 14 - 2 = 27
SD    0.371          0.493
SE    0.096          0.13          t = 2.42 with d.f. = 27
n     15             14

In the t table the critical value at d.f. = 27 is 2.05 for p = .05 and 2.47 for p = .02; since t = 2.42 exceeds 2.05, we
reject the null hypothesis at the 5% level (.02 < p < .05).
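The same unpaired (two-sample) t test can be reproduced with scipy on the birth-weight data, assuming equal variances so that the pooled SD is used.

```python
# Unpaired two-sample t test on the birth-weight data above.
from scipy import stats

non_smokers = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]        # n = 15
heavy_smokers = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23,
                 2.76, 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]      # n = 14

t, p = stats.ttest_ind(non_smokers, heavy_smokers, equal_var=True)  # pooled-SD t test
print(t, p)   # t ≈ 2.42 at d.f. = 27
```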
Note: For large samples use the Normal table (SND) and compute z from
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Non-parametric tests of significance
When data are not normally distributed we can often still use the parametric tests described above if we can transform the
data in a way that makes them normal. This can be achieved in a variety of ways, sometimes simply by taking the
logarithm. If this cannot be done or if the data are ordinal rather than parametric, then we must resort to a
non-parametric test. For these tests the data are converted from an interval scale into ranked data. The subsequent tests then
only consider the relative magnitudes of the data, not the actual values, so some information is lost.
There are many different non-parametric tests, all with specific applications. However, there is a correspondence between
the parametric and non-parametric methods.
These tests are not difficult to use and an appropriate textbook can be consulted for the methods when necessary. As with
many of the less common statistical tests, it is advisable to seek the assistance of a statistician before embarking on
extensive usage. To illustrate a non-parametric method the Wilcoxon signed rank test will be used on the data used above
for the paired t test.
Hours of sleep in patients after taking sleeping drug.

Patient   Before   After drug   Difference   Rank
1         5.2      6.1          0.9          3.5 (tied for ranks 3 and 4)
2         7.9      7.0          -0.9         3.5 (tied for ranks 3 and 4)
3         3.9      8.2          4.3          10
4         4.7      7.6          2.9          7
5         5.3      6.5          1.2          5
6         5.4      8.4          3.0          8
7         4.2      6.9          2.7          6
8         6.1      6.7          0.6          2
9         3.8      7.4          3.6          9
10        6.3      5.8          -0.5         1

Procedure:
1. Rank the differences, excluding any that = 0; (ignore the signs).
2. Sum the ranks with positive and with negative differences:
T+ = 3.5 + 10 + 7 + 5 + 8 + 6 + 2 + 9 = 50.5
T- = 3.5 + 1 = 4.5

H0: Drug and placebo give the same results. Thus we expect T+ to be similar to T-. If they are not, we compare the
smaller of the two with the value expected by chance alone.
Let T = the smaller of T+ and T-. Thus T = 4.5.
Look up T in the Wilcoxon signed rank table at a sample size of N, where N = the number of ranked differences excluding
zeros. Thus N = 10, and we find that p < .02, so we reject H0.
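The same test can be reproduced with scipy, assuming it is available; stats.wilcoxon ranks the paired differences (dropping zeros) and reports the smaller rank sum, here 4.5.

```python
# Wilcoxon signed rank test on the sleep data above.
from scipy import stats

without_drug = [5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3]
after_drug   = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]

T, p = stats.wilcoxon(after_drug, without_drug)  # paired differences, zeros excluded
print(T, p)   # T = 4.5; p is small, in line with the table lookup above
```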
Comparing more than two samples
Suppose you were asked to compare blood pressure readings from English, Welsh and Scottish people and were asked if
they were different from one another. The t test is not appropriate for such a study. The equivalent of a t test for more
than two samples is called analysis of variance (anova for short). This procedure, which can only be applied to normally
distributed data, enables you to determine whether the variation between sample means can be accounted for by the
variation that occurs within the data as a whole (this is the null hypothesis), or whether the variation between the means
is due to significant differences between them. For one factor of analysis (such as nationality) one-way anova is
performed; for two factors of analysis (for instance nationality and sex) two-way anova is used, and so on.
Variances are calculated from "sums of squares" (i.e. $\sum (X - \bar{X})^2$; let us call it SS for short). These may be
partitioned in the following way:
$$SS_{total} = SS_{between\ groups} + SS_{within\ groups}$$
The procedure is as follows.
1. Calculate the total SS, i.e. over all the data.
2. Calculate the SS between the means of each group or sample.
3. Calculate the residual SS which is the SS within the groups.
Now calculate the ratio F of the between-group variance to the within-group variance and deduce the p value from the F
table. (Note that for only two groups the result is identical to the t test.)
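A minimal sketch of one-way anova, assuming scipy is available; the three groups of blood pressure readings below are hypothetical, standing in for the English, Welsh and Scottish samples of the example.

```python
# One-way anova: ratio of between-group to within-group variance.
from scipy import stats

english  = [118, 122, 130, 125, 119, 127]   # hypothetical readings (mmHg)
welsh    = [121, 128, 133, 126, 124, 130]
scottish = [129, 135, 131, 138, 127, 133]

F, p = stats.f_oneway(english, welsh, scottish)
print(F, p)   # large F (small p) suggests real differences between the group means
```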
Comparing observed and expected data. The χ² test.
A way of comparing data that can be grouped into categories is to place the results in a contingency table that contains
both the observed and expected data. One of the ways of testing whether the differences between observed and expected
values are significant is the χ² test. (Note: χ, or chi, pronounced as in "sky", is a Greek letter. It is not always
available to typists and printers, which is why it is sometimes written as chi.)
The restrictions on the use of this test are:
1. n > 20
2. There must be at least 5 items in any "expected" box
3. The boxes must contain actual data, not proportions
On the other hand, χ² tests are not restricted to Normally distributed data.
The χ² test can be used to detect an association between two (or more) variables measured for each individual. These
variables need not be continuous. They can be discrete or nominal (see above). For two variables we use a
2 x 2 contingency table. For example:

Does influenza vaccination reduce the chance of contracting the disease?

OBSERVED DATA:

'flu     Vaccinated   Placebo   Total
Yes          20           80      100
No          220          140      360
Total        240          220      460

Expected values are calculated assuming the null hypothesis; e.g. in the first box multiply 240 by the overall proportion
catching 'flu: 240 x 100/460 = 52.2, etc.

EXPECTED DATA:

'flu     Vaccinated   Placebo   Total
Yes         52.2         47.8     100
No         187.8        172.2     360
Total        240          220     460

$$\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$

$$\chi^2 = \frac{(20 - 52.2)^2}{52.2} + \frac{(80 - 47.8)^2}{47.8} + \frac{(220 - 187.8)^2}{187.8} + \frac{(140 - 172.2)^2}{172.2} = 53.09$$

The number of degrees of freedom is (no. of rows - 1)(no. of columns - 1) = 1. From the χ² table, χ² = 10.83 for p = .001.
53.09 greatly exceeds this, so we may reject H0 and conclude that the vaccine is effective.
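The same calculation can be reproduced with scipy, assuming it is available; stats.chi2_contingency computes the expected values from the marginal totals, and correction=False gives the uncorrected χ² of 53.09.

```python
# Chi-squared test on the 2 x 2 contingency table of observed counts.
import numpy as np
from scipy import stats

observed = np.array([[20, 80],     # 'flu: vaccinated, placebo
                     [220, 140]])  # no 'flu

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(chi2, p, dof)   # chi2 ≈ 53.1, d.f. = 1, p far below .001
print(expected)       # 52.2, 47.8, 187.8, 172.2
```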
Errors in significance testing
Rejection of H0 is sometimes termed a "positive" finding while acceptance is "negative". For example, when a patient is
tested for a particular disease and the result is significantly different from controls, the individual is termed positive for
that test. If the test was faulty it might give false positive or false negative results. These are classified as:
Type I errors or false positives Incorrect rejection of H0
Type II errors or false negatives Incorrect acceptance of H0
Statistical power
By definition, the probability of a Type I error is equal to the chosen significance level (usually 5%). We can reduce
the probability of a Type I error by setting a lower significance level, say 1%. The probability of a Type II error is a
little more complicated. If H0 is false then the distribution of sample means will be centred around a population mean
that is different from μ; let us call it μ1. We reject H0 when our sample mean lies in the tails of the sampling
distribution centred on μ. However, there is a chance that our sample could have a mean in the overlap region, i.e. there
is a β% chance that we would incorrectly accept the null hypothesis. The power of a statistical test is given by the
probability of not doing this, i.e. (100 - β)%.
Decreasing the significance level will reduce the power. Increasing sample size will increase the power.
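A rough sketch of how power grows with sample size, estimated by simulation under the assumption that numpy and scipy are available; the population means, SD and significance level below are all hypothetical.

```python
# Estimate power by repeatedly sampling from a population in which H0 is false.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu0, mu1, sigma, alpha = 0.0, 0.5, 1.0, 0.05   # hypothetical values

for n in (5, 20, 80):
    rejections = 0
    for _ in range(2000):
        sample = rng.normal(mu1, sigma, n)              # H0 (mean = mu0) is in fact false
        t, p = stats.ttest_1samp(sample, popmean=mu0)   # test the sample against mu0
        rejections += (p < alpha)
    print(n, rejections / 2000)   # the fraction of rejections (power) rises with n
```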

CORRELATION AND LINEAR REGRESSION


If we want to measure the degree of association between two variables that we suspect may be dependent on one another we
can calculate the correlation coefficient or perform linear regression.
These methods test only for a linear association, i.e. that the data are related by an expression of the type y = a + bx.
(Recall that this is the equation of a straight line with slope b and an intercept on the y axis at y = a.)
An alternative approach, and an important preliminary test, is to draw a scatter plot of the data. For example, compare IQ
and height for a sample of individuals. In another example, compare the probability of heart disease with daily fat
intake.

[Figures: a straight line y = a + bx with slope b and intercept a; scatter plots of intelligence score against height
(r = 0) and risk of heart attack against dietary fat intake (0 < r < 1).]

There doesn't seem to be much correlation between height and intelligence, but there appears to be an increased likelihood
of heart disease when more fat is consumed.
The horizontal axis (sometimes called the abscissa) is usually the independent variable, the one whose values you select
or are determined already. The vertical axis (or ordinate) is usually reserved for the dependent variable, the one that is
determined by nature.
Correlation coefficient
This is given by
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
Examples:

[Figure: scatter plots of Y against X showing perfect positive correlation (r = 1) and perfect negative correlation
(r = -1).]

It would be inappropriate to calculate the correlation coefficient of data that are non-linear (i.e. that do not follow a
straight line relationship), e.g.:

[Figure: a curved, non-linear relationship between Y and X.]

Notice:
1. r has no units.
2. The closer r is to 1, the better the correlation.
3. Correlation doesn't necessarily indicate direct causality.
Remember: The data must also be Normally distributed, (otherwise use a non-parametric test such as Spearman's rank
correlation test).

Example:
Is there a correlation between body weights of 8 healthy men and their corresponding blood plasma volumes?
Subject   weight (kg)   plasma vol. (l)
1         58            2.75
2         70            2.86
3         74            3.37
4         63.5          2.76
5         62            2.62
6         70.5          3.49
7         71            3.05
8         66            3.12

[Figure: scatter plot of plasma volume (l) against body mass (kg) for the 8 subjects.]

We find r = 0.76, which is a rather weak correlation. Clearly other factors must affect plasma volume. How much of the
observed variation is determined by body weight? This is given by r², which is called the coefficient of determination.
In our example r² = 0.58, so 58% of the variation in plasma volume is accounted for by its correlation with body weight.
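The correlation coefficient for this example can be reproduced with numpy, assuming it is available; r should come out at about 0.76 and r² at about 0.58.

```python
# Correlation coefficient and coefficient of determination for the plasma volume data.
import numpy as np

weight = np.array([58, 70, 74, 63.5, 62, 70.5, 71, 66])              # body weight (kg)
plasma = np.array([2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12])  # plasma volume (l)

r = np.corrcoef(weight, plasma)[0, 1]
print(r, r ** 2)   # r ≈ 0.76, r² ≈ 0.58
```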
Linear regression
This is an alternative way of assessing dependence, but it also provides the
equation of the straight line that best fits the data, by specifying its slope and
intercept. This line is called the regression line.
This is achieved by minimising the distances between the data points and the
fitted line:
Usually x is the independent variable (i.e. determined by the investigation)
and the vertical (y) distances are minimised. (For example we wish to know
how plasma volume is determined by body weight not the converse).
The line we obtain is then termed the regression of y upon x. Its equation is given by
$$Y = a + bX, \qquad \text{where}\quad b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \quad \text{and} \quad a = \bar{Y} - b\bar{X}$$

In our example b = 0.0436 and a = 0.0857, so that Y = 0.0857 + 0.0436X, and we can construct the line by calculating x
and y values.

[Figure: the plasma volume data with the fitted regression line of plasma volume (l) upon body mass (kg).]

The derived equation can be used to calculate values of y for a given x. Alternatively, y values may be read directly
from the straight line graph. Both of these operations should be restricted to the region encompassed by the original
data. This is called interpolation.
The estimation of y values beyond the data region is called extrapolation. Often there is no reason to assume that the
regression line will apply beyond the data limits, so extrapolation can be misleading.
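The regression of plasma volume upon body weight can be reproduced with numpy's least-squares polynomial fit, assuming numpy is available; the slope and intercept should be close to b = 0.0436 and a = 0.0857.

```python
# Least-squares regression line Y = a + bX for the plasma volume example.
import numpy as np

weight = np.array([58, 70, 74, 63.5, 62, 70.5, 71, 66])
plasma = np.array([2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12])

b, a = np.polyfit(weight, plasma, deg=1)   # slope b, intercept a
print(a, b)                                # a ≈ 0.0857, b ≈ 0.0436
print(a + b * 65)                          # interpolate at 65 kg, within the data range
```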

Areas in tail of the standard normal distribution
Proportion of area above z
Second decimal place of z
z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2207 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1563 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1094 0.1075 0.1057 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0352 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.02275 0.02222 0.02169 0.02118 0.02067 0.02018 0.01970 0.01923 0.01876 0.01831
2.1 0.01786 0.01743 0.01700 0.01659 0.01618 0.01578 0.01539 0.01500 0.01463 0.01426
2.2 0.01390 0.01355 0.01321 0.01287 0.01255 0.01222 0.01191 0.01160 0.01130 0.01101
2.3 0.01072 0.01044 0.01017 0.00990 0.00964 0.00939 0.00914 0.00889 0.00866 0.00842
2.4 0.00820 0.00798 0.00776 0.00755 0.00734 0.00714 0.00695 0.00676 0.00657 0.00639
2.5 0.00621 0.00604 0.00587 0.00570 0.00554 0.00539 0.00523 0.00508 0.00494 0.00480
2.6 0.00466 0.00453 0.00440 0.00427 0.00415 0.00402 0.00391 0.00379 0.00368 0.00357
2.7 0.00347 0.00336 0.00326 0.00317 0.00307 0.00298 0.00289 0.00280 0.00272 0.00264
2.8 0.00256 0.00248 0.00240 0.00233 0.00226 0.00219 0.00212 0.00205 0.00199 0.00193
2.9 0.00187 0.00181 0.00175 0.00169 0.00164 0.00159 0.00154 0.00149 0.00144 0.00139
3.0 0.00135 0.00131 0.00126 0.00122 0.00118 0.00114 0.00111 0.00107 0.00103 0.00100
3.1 0.00097 0.00094 0.00090 0.00087 0.00084 0.00082 0.00079 0.00076 0.00074 0.00071
3.2 0.00069 0.00066 0.00064 0.00062 0.00060 0.00058 0.00056 0.00054 0.00052 0.00050
3.3 0.00048 0.00047 0.00045 0.00043 0.00042 0.00040 0.00039 0.00038 0.00036 0.00035
3.4 0.00034 0.00032 0.00031 0.00030 0.00029 0.00028 0.00027 0.00026 0.00025 0.00024
3.5 0.00023 0.00022 0.00022 0.00021 0.00020 0.00019 0.00019 0.00018 0.00017 0.00017
3.6 0.00016 0.00015 0.00015 0.00014 0.00014 0.00013 0.00013 0.00012 0.00012 0.00011
3.7 0.00011 0.00010 0.00010 0.00010 0.00009 0.00009 0.00008 0.00008 0.00008 0.00008
3.8 0.00007 0.00007 0.00007 0.00006 0.00006 0.00006 0.00006 0.00005 0.00005 0.00005
3.9 0.00005 0.00005 0.00004 0.00004 0.00004 0.00004 0.00004 0.00004 0.00003 0.00003
4.0 0.00003 0.00003 0.00003 0.00003 0.00003 0.00003 0.00002 0.00002 0.00002 0.00002

Critical values of t
p values
one tailed 0.25 0.2 0.15 0.1 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005
two tailed 0.5 0.4 0.3 0.2 0.1 0.05 0.04 0.02 0.01 0.005 0.002 0.001
df
1 1.00 1.38 1.96 3.08 6.31 12.71 15.89 31.82 63.66 127.30 318.30 636.60
2 0.82 1.06 1.39 1.89 2.92 4.30 4.85 6.97 9.93 14.09 22.33 31.60
3 0.77 0.98 1.25 1.64 2.35 3.18 3.48 4.54 5.84 7.45 10.21 12.92
4 0.74 0.94 1.19 1.53 2.13 2.78 3.00 3.75 4.60 5.60 7.17 8.61
5 0.73 0.92 1.16 1.48 2.02 2.57 2.76 3.37 4.03 4.77 5.89 6.87
6 0.72 0.91 1.13 1.44 1.94 2.45 2.61 3.14 3.71 4.32 5.21 5.96
7 0.71 0.90 1.12 1.42 1.90 2.37 2.52 3.00 3.50 4.03 4.79 5.41
8 0.71 0.89 1.11 1.40 1.86 2.31 2.45 2.90 3.36 3.83 4.50 5.04
9 0.70 0.88 1.10 1.38 1.83 2.26 2.40 2.82 3.25 3.69 4.30 4.78
10 0.70 0.88 1.09 1.37 1.81 2.23 2.36 2.76 3.17 3.58 4.14 4.59
11 0.70 0.88 1.09 1.36 1.80 2.20 2.33 2.72 3.11 3.50 4.03 4.44
12 0.70 0.87 1.08 1.36 1.78 2.18 2.30 2.68 3.06 3.43 3.93 4.32
13 0.69 0.87 1.08 1.35 1.77 2.16 2.28 2.65 3.01 3.37 3.85 4.22
14 0.69 0.87 1.08 1.35 1.76 2.15 2.26 2.62 2.98 3.33 3.79 4.14
15 0.69 0.87 1.07 1.34 1.75 2.13 2.25 2.60 2.95 3.29 3.73 4.07
16 0.69 0.87 1.07 1.34 1.75 2.12 2.24 2.58 2.92 3.25 3.69 4.02
17 0.69 0.86 1.07 1.33 1.74 2.11 2.22 2.57 2.90 3.22 3.65 3.97
18 0.69 0.86 1.07 1.33 1.73 2.10 2.21 2.55 2.88 3.20 3.61 3.92
19 0.69 0.86 1.07 1.33 1.73 2.09 2.21 2.54 2.86 3.17 3.58 3.88
20 0.69 0.86 1.06 1.33 1.73 2.09 2.20 2.53 2.85 3.15 3.55 3.85
21 0.69 0.86 1.06 1.32 1.72 2.08 2.19 2.52 2.83 3.14 3.53 3.82
22 0.69 0.86 1.06 1.32 1.72 2.07 2.18 2.51 2.82 3.12 3.51 3.79
23 0.69 0.86 1.06 1.32 1.71 2.07 2.18 2.50 2.81 3.10 3.49 3.77
24 0.69 0.86 1.06 1.32 1.71 2.06 2.17 2.49 2.80 3.09 3.47 3.75
25 0.68 0.86 1.06 1.32 1.71 2.06 2.17 2.49 2.79 3.08 3.45 3.73
26 0.68 0.86 1.06 1.32 1.71 2.06 2.16 2.48 2.78 3.07 3.44 3.71
27 0.68 0.86 1.06 1.31 1.70 2.05 2.15 2.47 2.77 3.06 3.42 3.69
28 0.68 0.86 1.06 1.31 1.70 2.05 2.15 2.47 2.76 3.05 3.41 3.67
29 0.68 0.85 1.06 1.31 1.70 2.05 2.15 2.46 2.76 3.04 3.40 3.66
30 0.68 0.85 1.06 1.31 1.70 2.04 2.15 2.46 2.75 3.03 3.39 3.65
40 0.68 0.85 1.05 1.30 1.68 2.02 2.12 2.42 2.70 2.97 3.31 3.55
50 0.68 0.85 1.05 1.30 1.68 2.01 2.11 2.40 2.68 2.94 3.26 3.50
60 0.68 0.85 1.05 1.30 1.67 2.00 2.10 2.39 2.66 2.92 3.23 3.46
80 0.68 0.85 1.04 1.29 1.66 1.99 2.09 2.37 2.64 2.89 3.20 3.42
100 0.68 0.85 1.04 1.29 1.66 1.98 2.08 2.36 2.63 2.87 3.17 3.39
1000 0.68 0.84 1.04 1.28 1.65 1.96 2.06 2.33 2.58 2.81 3.10 3.30
inf. 0.67 0.84 1.04 1.28 1.64 1.96 2.05 2.33 2.58 2.81 3.09 3.29

χ² Distribution
p value 0.25 0.2 0.15 0.1 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005
df
1 1.32 1.64 2.07 2.71 3.84 5.02 5.41 6.63 7.88 9.14 10.83 12.12
2 2.77 3.22 3.79 4.61 5.99 7.38 7.82 9.21 10.60 11.98 13.82 15.20
3 4.11 4.64 5.32 6.25 7.81 9.35 9.84 11.34 12.84 14.32 16.27 17.73
4 5.39 5.59 6.74 7.78 9.49 11.14 11.67 13.23 14.86 16.42 18.47 20.00
5 6.63 7.29 8.12 9.24 11.07 12.83 13.33 15.09 16.75 18.39 20.51 22.11
6 7.84 8.56 9.45 10.64 12.53 14.45 15.03 16.81 18.55 20.25 22.46 24.10
7 9.04 9.80 10.75 12.02 14.07 16.01 16.62 18.48 20.28 22.04 24.32 26.02
8 10.22 11.03 12.03 13.36 15.51 17.53 18.17 20.09 21.95 23.77 26.12 27.87
9 11.39 12.24 13.29 14.68 16.92 19.02 19.63 21.67 23.59 25.46 27.83 29.67
10 12.55 13.44 14.53 15.99 18.31 20.48 21.16 23.21 25.19 27.11 29.59 31.42
11 13.70 14.63 15.77 17.29 19.68 21.92 22.62 24.72 26.76 28.73 31.26 33.14
12 14.85 15.81 16.99 18.55 21.03 23.34 24.05 26.22 28.30 30.32 32.91 34.82
13 15.93 15.58 18.90 19.81 22.36 24.74 25.47 27.69 29.82 31.88 34.53 36.48
14 17.12 18.15 19.40 21.06 23.68 26.12 26.87 29.14 31.32 33.43 36.12 38.11
15 18.25 19.31 20.60 22.31 25.00 27.49 28.26 30.58 32.80 34.95 37.70 39.72
16 19.37 20.47 21.79 23.54 26.30 28.85 29.63 32.00 34.27 36.46 39.25 41.31
17 20.49 21.61 22.98 24.77 27.59 30.19 31.00 33.41 35.72 37.95 40.79 42.88
18 21.60 22.76 24.16 25.99 28.87 31.53 32.35 34.81 37.16 39.42 42.31 44.43
19 22.72 23.90 25.33 27.20 30.14 32.85 33.69 36.19 38.58 40.88 43.82 45.97
20 23.83 25.04 26.50 28.41 31.41 34.17 35.02 37.57 40.00 42.34 45.31 47.50
21 24.93 26.17 27.66 29.62 32.67 35.48 36.34 38.93 41.40 43.78 46.80 49.01
22 26.04 27.30 28.82 30.81 33.92 36.78 37.66 40.29 42.80 45.20 48.27 50.51
23 27.14 28.43 29.98 32.01 35.17 38.08 38.97 41.64 44.18 46.62 49.73 52.00
24 28.24 29.55 31.13 33.20 36.42 39.36 40.27 42.98 45.56 48.03 51.18 53.48
25 29.34 30.68 32.28 34.38 37.65 40.65 41.57 44.31 46.93 49.44 52.62 54.95
26 30.43 31.79 33.43 35.56 38.89 41.92 42.86 45.64 48.29 50.83 54.05 56.41
27 31.53 32.91 34.57 36.74 40.11 43.19 44.14 46.96 49.64 52.22 55.48 57.86
28 32.62 34.03 35.71 37.92 41.34 44.46 45.42 48.28 50.99 53.59 56.89 59.30
29 33.71 35.14 36.85 39.09 42.56 45.72 46.69 49.59 52.34 54.97 58.30 60.73
30 34.80 36.25 37.99 40.26 43.77 46.98 47.96 50.89 53.67 56.33 59.70 62.16
40 45.62 47.27 49.24 51.81 55.76 59.34 60.44 63.69 66.77 69.70 73.40 76.09
50 56.33 53.16 60.35 63.17 67.50 71.42 72.61 76.15 79.49 82.66 86.66 89.56
60 66.98 68.97 71.34 74.40 79.08 83.30 84.58 88.38 91.95 95.34 99.61 102.7
80 88.13 90.41 93.11 96.58 101.9 106.6 108.1 112.3 116.3 120.1 124.8 128.3
100 109.1 111.7 114.7 118.5 124.3 129.6 131.1 135.8 140.2 144.3 149.4 153.2
