Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis
that sample observations result purely from chance.
For example, suppose we wanted to determine whether a coin was fair and
balanced. A null hypothesis might be that half the flips would result in Heads and half, in
Tails. The alternative hypothesis might be that the number of Heads and Tails would be
very different. Symbolically, these hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given
this result, we would be inclined to reject the null hypothesis. We would conclude, based
on the evidence, that the coin was probably not fair and balanced.
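The strength of the evidence in the coin example can be quantified with an exact binomial calculation. The following is a minimal sketch, assuming a two-sided test of H0: P = 0.5; the function name binomial_two_sided_p is an illustrative choice, not part of the original text.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: P = 0.5 (illustrative helper)."""
    # Under H0 the distribution of heads is symmetric around flips / 2,
    # so we double the tail that is at least as extreme as the observation.
    extreme = max(heads, flips - heads)
    tail = sum(comb(flips, k) for k in range(extreme, flips + 1)) / 2 ** flips
    return min(1.0, 2 * tail)


# 40 Heads in 50 flips, as in the example above.
p_value = binomial_two_sided_p(40, 50)
print(f"p-value = {p_value:.2e}")
```

A p-value this small means that a result as lopsided as 40 Heads and 10 Tails would be very unusual for a fair coin, which is why we would be inclined to reject the null hypothesis.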
Hypothesis Tests
State the hypotheses. This involves stating the null and alternative hypotheses.
The hypotheses are stated in such a way that they are mutually exclusive. That
is, if one is true, the other must be false.
Formulate an analysis plan. The analysis plan describes how to use sample
data to evaluate the null hypothesis. The evaluation often focuses around a
single test statistic.
Analyze sample data. Find the value of the test statistic (mean score,
proportion, t-score, z-score, etc.) described in the analysis plan.
Interpret results. Apply the decision rule described in the analysis plan. If the
value of the test statistic is unlikely, based on the null hypothesis, reject the null
hypothesis.
Decision Rules
The analysis plan includes decision rules for rejecting the null hypothesis. In
practice, statisticians describe these decision rules in two ways - with reference to a P-
value or with reference to a region of acceptance.
The set of values outside the region of acceptance is called the region of
rejection. If the test statistic falls within the region of rejection, the null
hypothesis is rejected. In such cases, we say that the hypothesis has been
rejected at the α level of significance.
These approaches are equivalent. Some statistics texts use the P-value
approach; others use the region of acceptance approach. In subsequent lessons, this
tutorial will present examples that illustrate each approach.
TYPES OF HYPOTHESIS
A test of a statistical hypothesis, where the region of rejection is on only one side
of the sampling distribution, is called a one-tailed test. For example, suppose the null
hypothesis states that the mean is less than or equal to 10. The alternative hypothesis
would be that the mean is greater than 10. The region of rejection would consist of a
range of numbers located on the right side of the sampling distribution; that is, a set of
numbers sufficiently greater than 10.
LEVEL OF SIGNIFICANCE
The significance level of a test is the probability that the test will reject
the null hypothesis when that hypothesis is true. Significance is a property of the
distribution of a test statistic, not of any particular draw of the statistic. The significance
level is usually denoted by the Greek symbol α (lower case alpha). Popular levels of
significance are 5% (0.05), 1% (0.01) and 0.1% (0.001). If a test of significance gives a
p-value lower than the α-level, the null hypothesis is rejected. Such results are
informally referred to as 'statistically significant'. For example, if someone argues that
"there's only one chance in a thousand this could have happened by coincidence," a
0.001 level of statistical significance is being implied. The lower the significance level,
the stronger the evidence required. Choosing a level of significance is an arbitrary task,
but for many applications, a level of 5% is chosen, for no better reason than that it is
conventional.
In some situations it is convenient to express the statistical significance as 1 − α.
In general, when interpreting a stated significance, one must be careful to note what,
precisely, is being tested statistically.
In some fields, for example nuclear and particle physics, it is common to express
statistical significance in units of "σ" (sigma), the standard deviation of a Gaussian
distribution. A statistical significance of "nσ" can be converted into a value of α via use
of the error function:
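A common form of this conversion, assuming a two-sided convention (probability in both tails of the Gaussian), is α = 1 − erf(n/√2). The sketch below evaluates it with the standard library; the one-sided variant is included as an assumption, since some fields quote the probability in a single tail instead.

```python
from math import erf, sqrt


def alpha_two_sided(n_sigma: float) -> float:
    """Probability of falling more than n_sigma from the mean, either side."""
    return 1.0 - erf(n_sigma / sqrt(2.0))


def alpha_one_sided(n_sigma: float) -> float:
    """Probability of exceeding the mean by more than n_sigma (one tail only)."""
    return alpha_two_sided(n_sigma) / 2.0


for n in (1, 2, 3, 5):
    print(f"{n} sigma: two-sided {alpha_two_sided(n):.3g}, "
          f"one-sided {alpha_one_sided(n):.3g}")
```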
The critical value(s) for a hypothesis test is a threshold to which the value of the
test statistic in a sample is compared to determine whether or not the null hypothesis is
rejected.
The critical value for any hypothesis test depends on the significance level at
which the test is carried out, and whether the test is one-sided or two-sided.
(Note: The methodology below works equally well for both one-tail and two-tail
hypothesis testing.)
Example:
A phone industry manager thinks that customer monthly cell phone bills have increased
and now average over $52 per month. The company asks you to test this claim. The
population standard deviation, σ, is known to be equal to 10 from historical data.
The Hypotheses
1. H0: μ ≤ 52
   H1: μ > 52
Study Design
2. After consulting with the manager and discussing error risk, we choose a level of
significance, α, of 0.10. Our resources allow us to sample 64 cell phone bills.
3. Since our hypothesis involves the population mean and we know the population
standard deviation, our test statistic is z and follows the normal distribution.
4. In determining the critical value, we first recognize this test as a one-tail test since the
null hypothesis involves an inequality, ≤. Therefore the rejection region is entirely on the
side of the distribution greater than the historic mean - right tail.
We want to determine a z-value for which the area to the right of that value is 0.10, our
α. We can use the cumulative normal distribution table (which gives areas to the left of
the z-value) and find the z-value whose cumulative area is 0.90, which is approximately
1.28. This is our critical value.
The Study
5. We conduct our study and find that the mean of the 64 sample cell phone bills is
53.1. We compute the test statistic, z = (xbar-μ)/(σ/√n) = (53.1-52)/(10/√64) = 0.88.
Conclusions
6. Since 0.88 is less than the critical value of 1.28, we do not reject the null hypothesis.
We report to the company that, based on our testing, there is not sufficient evidence that the
mean cell phone bill now exceeds $52 per month.
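The cell phone example can be reproduced in a few lines. This is a sketch using the numbers given in the example; it also computes the p-value, which is not part of the original worked steps, to show that the critical-value approach and the P-value approach lead to the same decision.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example.
mu0, sigma, n, alpha = 52.0, 10.0, 64, 0.10
xbar = 53.1

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Right-tail critical value: the z with cumulative area 1 - alpha to its left.
z_crit = std_normal.inv_cdf(1 - alpha)

# Test statistic for a known population standard deviation.
z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tail p-value: area to the right of the observed z.
p_value = 1 - std_normal.cdf(z)

reject = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

Comparing z with the critical value and comparing the p-value with α are two views of the same rule: z exceeds the critical value exactly when the p-value is below α.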
TEST OF SIGNIFICANCE
When a statistic is significant, it simply means that you are very sure that the
statistic is reliable. It doesn't mean the finding is important or that it has any decision-
making utility.
For example, suppose we give 1,000 people an IQ test, and we ask if there is a
significant difference between male and female scores. The mean score for males is 98
and the mean score for females is 100. We use an independent-groups t-test and find
that the difference is significant at the .001 level. The big question is, "So what?" The
difference between 98 and 100 on an IQ test is a very small difference; so small, in
fact, that it's not even important.
Then why did the t-statistic come out significant? Because there was a large
sample size. When you have a large sample size, very small differences will be
detected as significant. This means that you are very sure that the difference is real
(i.e., it didn't happen by fluke). It doesn't mean that the difference is large or important. If
we had only given the IQ test to 25 people instead of 1,000, the two-point difference
between males and females would not have been significant.
Significance is a statistical term that tells how sure you are that a difference or
relationship exists. To say that a significant difference or relationship exists only tells
half the story. We might be very sure that a relationship exists, but is it a strong,
moderate, or weak relationship? After finding a significant relationship, it is important to
evaluate its strength. Significant relationships can be strong or weak. Significant
differences can be large or small. It just depends on your sample size.
Many researchers use the word "significant" to describe a finding that may have
decision-making utility to a client. From a statistician's viewpoint, this is an incorrect use
of the word. However, the word "significant" has virtually universal meaning to the
public. Thus, many researchers use the word "significant" to describe a difference or
relationship that may be strategically important to a client (regardless of any statistical
tests). In these situations, the word "significant" is used to advise a client to take note of
a particular difference or relationship because it may be relevant to the company's
strategic plan. The word "significant" is not the exclusive domain of statisticians and
either use is correct in the business world. Thus, for the statistician, it may be wise to
adopt a policy of always referring to "statistical significance" rather than simply
"significance" when communicating with the public.
There is a raging controversy (for about the last hundred years) on whether or
not it is ever appropriate to use a one-tailed test. The rationale is that if you already
know the direction of the difference, why bother doing any statistical tests? While it is
generally safest to use two-tailed tests, there are situations where a one-tailed test
seems more appropriate. The bottom line is that it is the choice of the researcher
whether to use one-tailed or two-tailed research questions.
1. Decide on the critical alpha level you will use (i.e., the error rate you are willing to
accept).
2. Conduct the research.
3. Calculate the statistic.
4. Compare the statistic to a critical value obtained from a table.
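If your statistic is higher than the critical value from the table, your finding is
significant and you reject the null hypothesis.
If your statistic is lower than the critical value from the table, your finding is not
significant and you fail to reject the null hypothesis.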
Modern computer software can calculate exact probabilities for most test statistics. If
you have an exact probability from computer software, simply compare it to your critical
alpha level. If the exact probability is less than the critical alpha level, your finding is
significant, and if the exact probability is greater than your critical alpha level, your
finding is not significant. Using a table is not necessary when you have the exact
probability for a statistic.
In hypothesis testing, there are two types of errors. The first is type I error and
the second is type II error.
Type I error
In hypothesis testing, a type I error occurs when we reject the null
hypothesis although that hypothesis is true. The probability of a type I error is denoted
by alpha. On the normal curve, the critical region is therefore also called
the alpha region. Even though it is unlikely that the test statistic will fall into the critical
region when the null hypothesis is true, it is still possible. When this occurs, we
reject H0 when it is in fact true, and therefore make an error in doing so.
Type II errors
In hypothesis testing, a type II error occurs when we fail to reject the null hypothesis
although it is false. The probability of a type II error is denoted by beta. On the
normal curve, the acceptance region is correspondingly called the beta region.
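To make alpha and beta concrete, the sketch below reuses the setup of the cell phone example (μ0 = 52, σ = 10, n = 64, α = 0.10) and computes the probability of a type II error. The true mean of 54 is a hypothetical value chosen only for illustration; beta always depends on which alternative is assumed to be true.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 52.0, 10.0, 64, 0.10
true_mu = 54.0  # hypothetical true mean, assumed for this illustration

se = sigma / sqrt(n)  # standard error of the sample mean

# Smallest sample mean that leads to rejecting H0 in the right-tail test.
z_crit = NormalDist().inv_cdf(1 - alpha)
xbar_crit = mu0 + z_crit * se

# Type II error: the sample mean falls below the cutoff although the
# true mean is true_mu, so H0 is (wrongly) not rejected.
beta = NormalDist(true_mu, se).cdf(xbar_crit)
power = 1 - beta

print(f"cutoff = {xbar_crit:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```

Lowering α moves the cutoff to the right, which reduces type I errors but increases beta for the same sample size.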
T-TEST
Description
The t-test (or student's t-test) gives an indication of the separateness of two sets
of measurements, and is thus used to check whether two sets of measures are
essentially different (and usually that an experimental effect has been demonstrated).
The typical way of doing this is with the null hypothesis that means of the two sets of
measures are equal.
It is used when there is random assignment and only two sets of measurement to
compare.
Calculation
The value of t may be calculated using packages such as SPSS. The actual calculation
for two groups is:
t = experimental effect / variability
= difference between group means /
standard error of difference between group means
Interpretation
The resultant t-value is then looked up in a t-table to determine how probable a
t-value at least this extreme would be if the two sets of measures did not really differ,
and hence what can be claimed about the efficacy of the experimental treatment.
Effect
The t-value can also be converted to a Pearson r-value to measure effect, which
can be calculated as:
r = SQRT( t^2 / (t^2 + DF))
where DF is the degrees of freedom.
In a t-test, DF = N1 + N2 - 2.
Reporting
Reporting a t-test might look something like this:
On average, holidays in the south
(M=24.1, SE=1.5) were significantly preferred to holidays in the north
(M=20.1, SE=1.2), t(22)=2.3, p<.05, r=.44.
In this, 'M' is the mean and 'SE' the standard error of each sample. In 't(X)=Y', X is
the degrees of freedom and Y is the t-value. 'p' is the probability of obtaining a result
at least this extreme if the null hypothesis is true, and 'r' is the effect size.
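The effect size in the reported result can be checked from the t-value and the degrees of freedom. A minimal sketch of the conversion given under Effect; the function name effect_size_r is an illustrative choice.

```python
from math import sqrt


def effect_size_r(t: float, df: int) -> float:
    """Convert a t-value to a Pearson r effect size: sqrt(t^2 / (t^2 + DF))."""
    return sqrt(t ** 2 / (t ** 2 + df))


# Values from the reported result: t(22) = 2.3
r = effect_size_r(2.3, 22)
print(f"r = {r:.2f}")
```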
Discussion
The t-test was described in 1908 by William Sealy Gosset for monitoring the
brewing at Guinness in Dublin. Guinness considered the use of statistics a trade secret,
so he published his test under the pen-name 'Student' -- hence the test is now often
called the 'Student's t-test'.
The t-test is a basic test that is limited to two groups. For multiple groups, you
would have to compare each pair of groups, for example with three groups there would
be three tests (AB, AC, BC), whilst with seven groups there would need to be 21 tests.
The basic principle is to test the null hypothesis that the means of the two groups
are equal.
A significant problem with this is that we typically accept a 5% chance of a type I
error (p=0.05) in each t-test. Over multiple tests these chances accumulate and hence
reduce the validity of the results.
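How quickly the error accumulates can be sketched numerically. The calculation below assumes the individual tests are independent, which pairwise tests on shared groups are not exactly, so the figures are a rough guide rather than exact error rates.

```python
def pairwise_tests(groups: int) -> int:
    """Number of distinct pairs of groups, i.e. t-tests needed."""
    return groups * (groups - 1) // 2


def familywise_error(tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one type I error across tests (independence assumed)."""
    return 1 - (1 - alpha) ** tests


for g in (2, 3, 7):
    k = pairwise_tests(g)
    print(f"{g} groups: {k} tests, familywise error about {familywise_error(k):.3f}")
```

With seven groups and 21 tests, the chance of at least one false positive is far above the nominal 5%, which is the motivation for using a single ANOVA instead.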
Z-TEST
Description
The Z-test compares sample and population means to determine if there is a
significant difference.
It requires a simple random sample from a population with a Normal distribution
and where the population mean and standard deviation are known.
The Z-test is a statistical test of the null hypothesis that a population parameter μ is
equal to a given value μ0. We construct a z-statistic for the null hypothesis, i.e. a statistic
which, under the null hypothesis, has mean zero and approximately a standard normal
distribution. Then, for a one-tailed test with probability p, we accept the null hypothesis
if z is less than zp, where zp is the pth percentile of the standard normal distribution.
Calculation
The z measure is calculated as:
z = (x - μ) / SE
where x is the sample mean to be standardized, μ (mu) is the population
mean and SE is the standard error of the mean.
SE = σ / SQRT(n)
where σ is the population standard deviation and n is the sample size.
The z value is then looked up in a z-table. A negative z value means it is below the
population mean (the sign is ignored in the lookup table).
Discussion
The Z-test is typically used with standardized tests, checking whether the scores from
a particular sample are within or outside the standard test performance.
The z value indicates the number of standard deviation units of the sample from the
population mean.
CORRELATION
The correlation is one of the most common and most useful statistics. A
correlation is a single number that describes the degree of relationship between two
variables. Let's work through an example to show you how this statistic is computed.
Correlation Example
Let's assume that we want to look at the relationship between two variables,
height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are
affects your self esteem (incidentally, I don't think we have to worry about the direction
of causality here -- it's not likely that self esteem causes your height!). Let's say we
collect some information on twenty individuals (all male -- we know that the average
height differs for males and females so, to keep this example simple we'll just use
males). Height is measured in inches. Self esteem is measured based on the average
of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the
data for the 20 cases (don't take this too seriously -- I made this data up to illustrate
what a correlation is):
Now we're ready to compute the correlation value. The formula for the correlation is:
r = (N*Σxy - Σx*Σy) / SQRT( [N*Σx^2 - (Σx)^2] * [N*Σy^2 - (Σy)^2] )
We use the symbol r to stand for the correlation. Through the magic of
mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is
negative, we have a negative relationship; if it's positive, the relationship is positive. You
don't need to know how we came up with this formula unless you want to be a
statistician. But you probably will need to know how the formula relates to real data --
how you can use the formula to compute the correlation. Let's look at the data we need
for the formula. Here's the original data with the other necessary columns:
Person   Height (x)   Self Esteem (y)   x*y      x*x     y*y
1        68           4.1               278.8    4624    16.81
2        71           4.6               326.6    5041    21.16
3        62           3.8               235.6    3844    14.44
4        75           4.4               330      5625    19.36
5        58           3.2               185.6    3364    10.24
6        60           3.1               186      3600    9.61
7        67           3.8               254.6    4489    14.44
8        68           4.1               278.8    4624    16.81
9        71           4.3               305.3    5041    18.49
10       69           3.7               255.3    4761    13.69
11       68           3.5               238      4624    12.25
12       67           3.2               214.4    4489    10.24
13       63           3.7               233.1    3969    13.69
14       62           3.3               204.6    3844    10.89
15       60           3.4               204      3600    11.56
16       63           4                 252      3969    16
17       65           4.1               266.5    4225    16.81
18       67           3.8               254.6    4489    14.44
19       63           3.4               214.2    3969    11.56
20       61           3.6               219.6    3721    12.96
Sum =    1308         75.1              4937.6   85912   285.45
The first three columns are the same as in the table above. The next three
columns are simple computations based on the height and self esteem data. The
bottom row consists of the sum of each column. This is all the information we need to
compute the correlation. Here are the values from the bottom row of the table (where N
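is 20 people) as they are related to the symbols in the formula:
N = 20, Σxy = 4937.6, Σx = 1308, Σy = 75.1, Σx^2 = 85912, Σy^2 = 285.45
Now, when we plug these values into the formula given above, we get the
following (I show it here tediously, one step at a time):
r = (20*4937.6 - 1308*75.1) / SQRT( [20*85912 - 1308^2] * [20*285.45 - 75.1^2] )
r = (98752 - 98230.8) / SQRT( [1718240 - 1710864] * [5709 - 5640.01] )
r = 521.2 / SQRT( 7376 * 68.99 )
r = 521.2 / SQRT( 508870.24 )
r = 521.2 / 713.35
r = .73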
So, the correlation for our twenty cases is .73, which is a fairly strong positive
relationship. I guess there is a relationship between height and self esteem, at least in
this made up data!
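The same computation can be done directly from the twenty data points in the table. This is a sketch of the computational formula built from the column sums (N, Σx, Σy, Σxy, Σx^2, Σy^2); the variable names are illustrative.

```python
from math import sqrt

# Height (x) and self esteem (y) for the 20 cases in the table.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

n = len(heights)
sum_x = sum(heights)
sum_y = sum(esteem)
sum_xy = sum(x * y for x, y in zip(heights, esteem))
sum_xx = sum(x * x for x in heights)
sum_yy = sum(y * y for y in esteem)

# Pearson correlation from the column sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.2f}")
```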
Once you've computed a correlation, you can determine the probability that the
observed correlation occurred by chance. That is, you can conduct a significance test.
Most often you are interested in determining the probability that the correlation is a real
one and not a chance occurrence. In this case, you are testing the mutually
exclusive hypotheses:
H0: r = 0
Ha: r ≠ 0
The easiest way to test this hypothesis is to find a statistics book that has a table
of critical values of r. Most introductory statistics texts would have a table like this. As in
all hypotheses testing, you need to first determine the significance level. Here, I'll use
the common significance level of alpha = .05. This means that I am conducting a test
where the odds that the correlation is a chance occurrence are no more than 5 out of
100. Before I look up the critical value in a table I also have to compute the degrees of
freedom or df. The df is simply equal to N-2 or, in this example, is 20-2 = 18. Finally, I
have to decide whether I am doing a one-tailed or two-tailed test. In this example, since
I have no strong prior theory to suggest whether the relationship between height and
self esteem would be positive or negative, I'll opt for the two-tailed test. With these three
pieces of information -- the significance level (alpha = .05), degrees of freedom (df =
18), and type of test (two-tailed) -- I can now test the significance of the correlation I
found. When I look up this value in the handy little table at the back of my statistics book
I find that the critical value is .4438. This means that if my correlation is greater than
.4438 or less than -.4438 (remember, this is a two-tailed test) I can conclude that the
odds are less than 5 out of 100 that this is a chance occurrence. Since my correlation of
.73 is actually quite a bit higher, I conclude that it is not a chance finding and that the
correlation is "statistically significant" (given the parameters of the test). I can reject the
null hypothesis and accept the alternative.
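Tables of critical values of r are derived from the t distribution, so the .4438 figure can be cross-checked. The sketch below assumes the standard t-table value of 2.101 for df = 18 at the two-tailed .05 level (an assumption brought in from a t-table, not from the text) and converts in both directions.

```python
from math import sqrt

df = 18
t_crit = 2.101  # assumed: two-tailed .05 critical t for df = 18, from a t-table
r_observed = 0.73

# Critical r implied by the critical t.
r_crit = t_crit / sqrt(t_crit ** 2 + df)

# Equivalent t-value for the observed correlation.
t_observed = r_observed * sqrt(df / (1 - r_observed ** 2))

print(f"critical r = {r_crit:.4f}")
print(f"t for r = {r_observed}: {t_observed:.2f} (critical t = {t_crit})")
```

Either comparison gives the same decision: the observed r exceeds the critical r exactly when its t-value exceeds the critical t.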
All I've shown you so far is how to compute a correlation between two variables.
In most studies we have considerably more than two variables. Let's say we have a
study with 10 interval-level variables and we want to estimate the relationships among
all of them (i.e., between all possible pairs of variables). In this instance, we have 45
unique correlations to estimate (more later on how I knew that!). We could do the above
computations 45 times to obtain the correlations. Or we could use just about any
statistics program to automatically compute all 45 with a simple click of the mouse.
I used a simple statistics program to generate random data for 10 variables with 20
cases (i.e., persons) for each variable. Then, I told the program to compute the
correlations among these variables. Here's the result:
       C1      C2      C3      C4      C5      C6      C7      C8      C9      C10
C1     1.000
C2     0.274   1.000
C3    -0.134  -0.269   1.000
C4     0.201  -0.153   0.075   1.000
C5    -0.129  -0.166   0.278  -0.011   1.000
C6    -0.095   0.280  -0.348  -0.378  -0.009   1.000
C7     0.171  -0.122   0.288   0.086   0.193   0.002   1.000
C8     0.219   0.242  -0.380  -0.227  -0.551   0.324  -0.082   1.000
C9     0.518   0.238   0.002   0.082  -0.015   0.304   0.347  -0.013   1.000
C10    0.299   0.568   0.165  -0.122  -0.106  -0.169   0.243   0.014   0.352   1.000
This type of table is called a correlation matrix. It lists the variable names (C1-
C10) down the first column and across the first row. The diagonal of a correlation matrix
(i.e., the numbers that go from the upper left corner to the lower right) always consists of
ones. That's because these are the correlations between each variable and itself (and a
variable is always perfectly correlated with itself). This statistical program only shows
the lower triangle of the correlation matrix. In every correlation matrix there are two
triangles that are the values below and to the left of the diagonal (lower triangle) and
above and to the right of the diagonal (upper triangle). There is no reason to print both
triangles because the two triangles of a correlation matrix are always mirror images of
each other (the correlation of variable x with variable y is always equal to the correlation
of variable y with variable x). When a matrix has this mirror-image quality above and
below the diagonal we refer to it as a symmetric matrix. A correlation matrix is always a
symmetric matrix.
To locate the correlation for any pair of variables, find the value in the table for
the row and column intersection for those two variables. For instance, to find the
correlation between variables C5 and C2, I look for where row C2 and column C5 is (in
this case it's blank because it falls in the upper triangle area) and where row C5 and
column C2 is and, in the second case, I find that the correlation is -.166.
OK, so how did I know that there are 45 unique correlations when we have 10
variables? There's a handy simple little formula that tells how many pairs (e.g.,
correlations) there are for any number of variables:
N * (N - 1) / 2
where N is the number of variables. In the example, I had 10 variables, so I know I have
(10 * 9)/2 = 90/2 = 45 pairs.
Other Correlations
The specific type of correlation I've illustrated here is known as the Pearson
Product Moment Correlation. It is appropriate when both variables are measured at
an interval level. However, there are a wide variety of other types of correlations for
other circumstances. For instance, if you have two ordinal variables, you could use the
Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau).
When one measure is a continuous interval level one and the other is dichotomous (i.e.,
two-category) you can use the Point-Biserial Correlation.
Regression Analysis
For example, a medical researcher might want to use body weight (independent
variable) to predict the most appropriate dose for a new drug (dependent variable). The
purpose of running the regression is to find a formula that fits the relationship between
the two variables. Then you can use that formula to predict values for the dependent
variable when only the independent variable is known. A doctor could prescribe the
proper dose based on a person's body weight.
The regression line (known as the least squares line) is a plot of the expected
value of the dependent variable for all values of the independent variable. Technically, it
is the line that "minimizes the squared residuals". The regression line is the one that
best fits the data on a scatterplot.
Using the regression equation, the dependent variable may be predicted from the
independent variable. The slope of the regression line (b) is defined as the rise divided
by the run. The y intercept (a) is the point on the y axis where the regression line would
intercept the y axis. The slope and y intercept are incorporated into the regression
equation. The intercept is usually called the constant, and the slope is referred to as the
coefficient. Since the regression model is usually not a perfect predictor, there is also an
error term in the equation.
y = a + bx + e
The significance of the slope of the regression line is determined from the
t-statistic. The significance is the probability of observing a correlation coefficient at
least this large by chance if the true correlation is zero. Some researchers prefer to
report the F-ratio instead of the t-statistic. The F-ratio is equal to the t-statistic squared.
The t-statistic for the significance of the slope is essentially a test to determine if
the regression model (equation) is usable. If the slope is significantly different than zero,
then we can use the regression model to predict the dependent variable for any value of
the independent variable.
On the other hand, take an example where the slope is zero. It has no prediction
ability because for every value of the independent variable, the prediction for the
dependent variable would be the same. Knowing the value of the independent variable
would not improve our ability to predict the dependent variable. Thus, if the slope is not
significantly different than zero, don't use the model to make predictions.
The standard error of the estimate for regression measures the amount of
variability in the points around the regression line. It is the standard deviation of the data
points as they are distributed around the regression line. The standard error of the
estimate can be used to develop confidence intervals around a prediction.
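The quantities described above (slope, intercept, standard error of the estimate, t-statistic for the slope, and F-ratio) can be computed by hand for a small data set. This is a minimal sketch on hypothetical data invented for illustration; the x and y values below do not come from the text.

```python
from math import sqrt

# Hypothetical data, invented for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares estimates for y = a + bx + e.
b = sxy / sxx            # slope (the coefficient)
a = mean_y - b * mean_x  # y intercept (the constant)

# Residuals and the standard error of the estimate (n - 2 degrees of freedom).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
std_err_estimate = sqrt(sse / (n - 2))

# t-statistic for the slope, and the F-ratio from the sums of squares.
t_slope = b / (std_err_estimate / sqrt(sxx))
ss_regression = b * sxy
f_ratio = ss_regression / (sse / (n - 2))

print(f"y = {a:.2f} + {b:.2f}x, standard error of estimate = {std_err_estimate:.3f}")
print(f"t = {t_slope:.2f}, F = {f_ratio:.1f}, t^2 = {t_slope ** 2:.1f}")
```

The F-ratio is computed here from the regression and residual sums of squares rather than by squaring t, so the printed values show the F = t^2 relation instead of assuming it.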
Example
4.2 27.1
6.1 30.4
3.9 25.0
5.7 29.7
7.3 40.1
5.9 28.8
--------------------------------------------------
You might make a statement in a report like this: A simple linear regression was
performed on six months of data to determine if there was a significant relationship
between advertising expenditures and sales volume. The t-statistic for the slope was
significant at the .05 critical alpha level, t(4)=4.10, p=.015. Thus, we reject the null
hypothesis and conclude that there was a positive significant relationship between
advertising expenditures and sales volume. Furthermore, 80.7% of the variability in
sales volume could be explained by advertising expenditures.
ANOVA
In statistics, analysis of variance (ANOVA) is a collection of statistical models,
and their associated procedures, in which the observed variance in a particular variable
is partitioned into components due to different sources of variation. In its simplest form
ANOVA provides a statistical test of whether or not the means of several groups are all
equal, and therefore generalizes Student's two-sample t-test to more than two groups.
ANOVAs are helpful because they possess an advantage over a two-sample t-test.
Doing multiple two-sample t-tests would result in an increased chance of committing a
type I error. For this reason, ANOVAs are useful in comparing three or more means.
1. Fixed-effects models assume that the data came from normal populations which
may differ only in their means. (Model 1)
2. Random effects models assume that the data describe a hierarchy of different
populations whose differences are constrained by the hierarchy. (Model 2)
3. Mixed-effect models describe the situations where both fixed and random effects
are present. (Model 3)
In practice, there are several types of ANOVA, depending on the number of treatments
and the way they are applied to the subjects in the experiment:
• One-way ANOVA is used to test for differences among two or more independent
groups. Typically, however, the one-way ANOVA is used to test for differences
among at least three groups, since the two-group case can be covered by a t-test
(Gosset, 1908). When there are only two means to compare, the t-test and the
ANOVA F-test are equivalent; the relation between ANOVA and t is given by
F = t^2.
• Factorial ANOVA is used when the experimenter wants to study the effects of
two or more treatment variables. The most commonly used type of factorial
ANOVA is the 2^2 (read "two by two") design, where there are two independent
variables and each variable has two levels or distinct values. However, such use
of ANOVA for analysis of 2^k factorial designs and fractional factorial designs is
"confusing and makes little sense"; instead it is suggested to refer the value of
the effect divided by its standard error to a t-table. Factorial ANOVA can also be
multi-level such as 3^3, etc. or higher order such as 2×2×2, etc. Since the
introduction of data analytic software, the utilization of higher order designs and
analyses has become quite common.
• Repeated measures ANOVA is used when the same subjects are used for each
treatment (e.g., in a longitudinal study). Note that such within-subjects designs
can be subject to carry-over effects.
• Mixed-design ANOVA. When one wishes to test two or more independent
groups subjecting the subjects to repeated measures, one may perform a
factorial mixed-design ANOVA, in which one factor is a between-subjects
variable and the other is within-subjects variable. This is a type of mixed-effect
model.
• Multivariate analysis of variance (MANOVA) is used when there is more than
one dependent variable.
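The equivalence of the one-way ANOVA F-test and the two-sample t-test in the two-group case can be checked numerically. The sketch below uses two small hypothetical groups invented for illustration and assumes the pooled-variance (equal variances) form of the t-test.

```python
from math import sqrt

# Hypothetical measurements for two independent groups.
group_a = [4.0, 5.0, 6.0]
group_b = [7.0, 9.0, 11.0]


def mean(values):
    return sum(values) / len(values)


groups = [group_a, group_b]
all_values = group_a + group_b
grand_mean = mean(all_values)

# One-way ANOVA: partition the variation into between- and within-group parts.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Pooled-variance two-sample t-test on the same data.
pooled_var = ss_within / df_within
se_diff = sqrt(pooled_var * (1 / len(group_a) + 1 / len(group_b)))
t_stat = (mean(group_a) - mean(group_b)) / se_diff

print(f"F = {f_stat:.2f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.2f}")
```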
The first step in the chi-square test is to calculate the chi-square statistic. In order
to avoid ambiguity, the value of the test-statistic is denoted by Χ^2 rather than χ^2 (i.e.
uppercase chi instead of lowercase); this also serves as a reminder that the distribution
of the test statistic is not exactly that of a chi-square random variable. However some
authors do use the χ^2 notation for the test statistic. An exact test which does not rely on
using the approximate χ^2 distribution is Fisher's exact test: this is significantly more
accurate in evaluating the significance level of the test, especially with small numbers of
observations.
Nonparametric Statistics
General Purpose:
For many variables of interest, we simply do not know for sure that this is the
case. For example, is income distributed normally in the population? -- probably not.
The incidence rates of rare diseases are not normally distributed in the population, the
number of car accidents is also not normally distributed, and neither are very many
other variables in which a researcher might be interested.
For more information on the normal distribution, see Elementary Concepts; for
information on tests of normality, see Normality tests.
Sample size
Another factor that often limits the applicability of tests based on the assumption
that the sampling distribution is normal is the size of the sample of data available for the
analysis (sample size; n). We can assume that the sampling distribution is normal even
if we are not sure that the distribution of the variable in the population is normal, as long
as our sample is large enough (e.g., 100 or more observations). However, if our sample
is very small, then those tests can be used only if we are sure that the variable is
normally distributed, and there is no way to test this assumption if the sample is small.
Problems in measurement
Applications of tests that are based on the normality assumptions are further
limited by a lack of precise measurement. For example, let us consider a study where
grade point average (GPA) is measured as the major variable of interest. Is an A
average twice as good as a C average? Is the difference between a B and an A
average comparable to the difference between a D and a C average? In effect, the
GPA is a crude measure of scholastic accomplishment that only allows us to establish
a rank ordering of students from "good" students to "poor" students. This general
measurement issue is usually discussed in statistics textbooks in terms of types of
measurement or scale of measurement. Without going into too much detail, most
common statistical techniques such as analysis of variance (and t-tests), regression,
etc., assume that the underlying measurements are at least on an interval scale, meaning that
equally spaced intervals on the scale can be compared in a meaningful manner (e.g., B
minus A is equal to D minus C). However, as in our example, this assumption is very
often not tenable, and the data represent a rank ordering of observations
(ordinal) rather than precise measurements.
Hopefully, after this somewhat lengthy introduction, the need is evident for
statistical procedures that enable us to process data of "low quality," from small
samples, on variables about which nothing is known (concerning their distribution).
Specifically, nonparametric methods were developed to be used in cases when the
researcher knows nothing about the parameters of the variable of interest in the
population (hence the name nonparametric). In more technical terms, nonparametric
methods do not rely on the estimation of parameters (such as the mean or the standard
deviation) describing the distribution of the variable of interest in the population.
Therefore, these methods are also sometimes (and more appropriately) called
parameter-free methods or distribution-free methods.
Basically, there is at least one nonparametric equivalent for each general
type of parametric test. In general, these tests fall into the following categories:
Descriptive statistics
When one's data are not normally distributed, and the measurements at best
contain rank order information, then computing the standard descriptive statistics (e.g.,
mean, standard deviation) is sometimes not the most informative way to summarize the
data. For example, in the area of psychometrics it is well known that the rated intensity
of a stimulus (e.g., perceived brightness of a light) is often a logarithmic function of the
actual intensity of the stimulus (brightness as measured in objective units of Lux). In this
example, the simple mean rating (sum of ratings divided by the number of stimuli) is not
an adequate summary of the average actual intensity of the stimuli. (In this example,
one would probably rather compute the geometric mean.) Nonparametrics and
Distributions will compute a wide variety of measures of location (mean, median,
mode, etc.) and dispersion (variance, average deviation, quartile range, etc.) to provide
the "complete picture" of one's data.
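The point about the geometric mean can be sketched in a few lines of Python. The intensities below are made-up values, and the sketch assumes that each rating is exactly the base-10 logarithm of the actual intensity, which is an idealization of the logarithmic relationship described above.

```python
import math
import statistics

# Hypothetical actual intensities (objective units); illustration only
intensities = [1, 10, 100, 1000, 10000]

# Assumption: the rated intensity is exactly log10 of the actual intensity
ratings = [math.log10(x) for x in intensities]

arithmetic_mean = statistics.mean(intensities)
geometric_mean = statistics.geometric_mean(intensities)
median = statistics.median(intensities)

# Back-transforming the mean rating recovers the geometric mean of the
# intensities, not their arithmetic mean
back_transformed = 10 ** statistics.mean(ratings)

print(arithmetic_mean, geometric_mean, median, back_transformed)
```

Under this assumption, averaging the ratings corresponds to the geometric mean of the intensities, while the arithmetic mean is pulled far upward by the largest value.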
Nonparametric methods are most appropriate when the sample sizes are small.
When the data set is large (e.g., n > 100) it often makes little sense to use
nonparametric statistics at all. Elementary Concepts briefly discusses the idea of the
central limit theorem. In a nutshell, when the samples become very large, then the
sample means will follow the normal distribution even if the respective variable is not
normally distributed in the population, or is not measured very well. Thus, parametric
methods, which are usually much more sensitive (i.e., have more statistical power) are
in most cases appropriate for large samples. However, the tests of significance of many
of the nonparametric statistics described here are based on asymptotic (large sample)
theory; therefore, meaningful tests can often not be performed if the sample sizes
become too small. Please refer to the descriptions of the specific tests to learn more
about their power and efficiency.
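The central limit theorem argument can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 100, the number of repetitions, and the helper name skewness are all assumptions chosen for the example, not values taken from the text.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment); near 0 for symmetric data."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable

# A strongly right-skewed population: exponential with rate 1
raw_draws = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 2000 independent samples, each of size n = 100
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(100)])
    for _ in range(2000)
]

skew_raw = skewness(raw_draws)
skew_means = skewness(sample_means)
print(skew_raw, skew_means)
```

The raw draws are heavily skewed, whereas the sample means are much closer to symmetric, which is why tests that rely on normally distributed sample means become reasonable for large n.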
Nonparametric Correlations
Spearman R (Siegel & Castellan, 1988) assumes that the variables under
consideration were measured on at least an ordinal (rank order) scale, that is, that the
individual observations can be ranked into two ordered series. Spearman R can be
thought of as the regular Pearson product moment correlation coefficient, that is, in
terms of proportion of variability accounted for, except that Spearman R is computed
from ranks.
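The description of Spearman R as a Pearson correlation computed from ranks can be sketched as follows. The helper names (average_ranks, pearson, spearman_r) and the data are hypothetical; tied values are given the average of their ranks, which is a common convention but an assumption here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(x, y):
    """Spearman R: Pearson correlation applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y increases with x, but not linearly
x = [10, 20, 30, 40, 50]
y = [1, 8, 27, 64, 1000]
print(pearson(x, y), spearman_r(x, y))
```

Because only the ordering matters, Spearman R equals 1 for any strictly increasing relationship, while the Pearson coefficient on the raw values is lowered by the nonlinearity.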
Kendall tau
Gamma
Parametric tests
In a repeated-measures design, it is assumed that the data structure conforms to
compound symmetry. A regression model assumes the absence of collinearity, the
absence of autocorrelation, random residuals, linearity, etc. In structural equation
modeling, the data should be multivariate normal.
Parametric statistics is a branch of statistics that assumes data have come from a type
of probability distribution and makes inferences about the parameters of the distribution.[1]
Most well-known elementary statistical methods are parametric.[2]
Because parametric statistics require a probability distribution, they are not
distribution-free.[5]
Example
Suppose we have a sample of 99 test scores with a mean of 100 and a standard
deviation of 10. If we assume all 99 test scores are random samples from a normal
distribution, we predict there is a 1% chance that the 100th test score will be higher than
123.65 (that is, the mean plus 2.365 standard deviations), assuming that the 100th test
score comes from the same distribution as the others. The normal family of distributions
all have the same shape and are parameterized by mean and standard deviation. That
means if you know the mean and standard deviation, and that the distribution is normal,
you know the probability of any future observation. Parametric statistical methods are
used to compute the 2.365 value above, given 99 independent observations from the
same normal distribution.
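The 2.365 multiplier is close to the 99th percentile of Student's t distribution with 98 degrees of freedom (99 observations minus one). The sketch below recovers it numerically with the standard library only; the helper names, the Simpson's-rule integration, and the bisection search are illustrative choices and not the method used by any particular statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, x]; uses symmetry around 0 (steps even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 <= p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


mean, sd, n = 100, 10, 99           # values from the example in the text
multiplier = t_quantile(0.99, n - 1)
threshold = mean + multiplier * sd  # mean plus about 2.365 standard deviations
print(round(multiplier, 3), round(threshold, 2))
```

This reproduces the simple "mean plus multiplier times standard deviation" calculation from the example; a full prediction interval would also account for the uncertainty in the estimated mean, which the example does not do.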
There are two types of test data and consequently different types of analysis. As
the table below shows, parametric data has an underlying normal distribution which
allows for more conclusions to be drawn as the shape can be mathematically
described. Anything else is non-parametric.
                          Parametric                   Non-parametric
Assumed distribution      Normal                       Any
Assumed variance          Homogeneous                  Any
Typical data              Ratio or Interval            Ordinal or Nominal
Data set relationships    Independent                  Any
Usual central measure     Mean                         Median
Benefits                  Can draw more conclusions    Simplicity; less affected by outliers
Tests
                                   Parametric test                        Non-parametric test
Correlation test                   Pearson                                Spearman
Independent measures, 2 groups     Independent-measures t-test            Mann-Whitney test
Independent measures, >2 groups    One-way, independent-measures ANOVA    Kruskal-Wallis test
Repeated measures, 2 conditions    Matched-pair t-test                    Wilcoxon test
Repeated measures, >2 conditions   One-way, repeated measures ANOVA       Friedman's test