Lab 01 - Scientific Method and Statistics (New Version)

Lab 1: The Scientific Method and Statistics

The power of science comes not from scientists but from its method.
- E. O. Wilson, The Creation

When asked "What is biology?" most people respond with "It's the study of living things," which is mostly correct.
A better answer would be "It's the scientific study of living things." But what makes one inquiry scientific and
another not? The wealth of knowledge that fills your textbook is the result of the hard work of many scientists
over the course of centuries of work (and only represents a fraction of what scientists have discovered during that
period). As Dr. Wilson accurately points out above, the key to scientific inquiry is not the scientist, but the
scientific method.
There are two main approaches to scientific discovery: one in which nature is
described using observation and measurements (observational science) and one
in which a natural phenomenon is explained using the scientific method
(hypothesis-based or empirical science). Often, observational science leads to
questions which can be answered through hypothesis-based investigations.
There are six steps to the scientific process, each important and vital to making
the process work:

THE SCIENTIFIC METHOD
1. Observe: You cannot ask educated questions about natural phenomena without knowing something about the
nature of the phenomena. The key to any investigation, scientific or otherwise, is observation. Observation
leads to the collection of data and usually will lead the observer to ask a question.

2. Question: As a scientist, you should be
curious about the way the things you
observed work, came to be, interact with
other things, etc. What is it you want to
know about the world? Why did that
stop? How is this formed? Which group
is faster? The question formed will lead
to the formation of hypotheses.

3. Hypothesize: A hypothesis (pl.
hypotheses) is a proposed explanation for
a set of observations. Another way to
look at a hypothesis is to say that it is a
reasonable guess of what might be
occurring; a possible answer to your
question. The hypothesis is formed using
information gathered from observations.
To be useful, a hypothesis should be
testable using experimentation. Once a
hypothesis is proposed, several
consequences can be reasonably
expected. These expectations can be
termed predictions and they are the
expected outcomes if a hypothesis is true.

Based on a figure from Lehner
Handbook of ethological methods.
1996.

4. Design experiment and collect data: Once a hypothesis (or hypotheses) is formulated, the investigator should
attempt to verify the predictions. To test the validity of the predictions, experiments are created to allow for
controlled testing of the hypothesis. Your experiment will determine if your hypothesis is supported or not by
evidence. A good experiment is one that will provide supporting evidence if a hypothesis is correct and is
equally likely to show that a hypothesis is false, if it is not correct. Since biological systems vary a lot, it is
best to repeat your experiment several times (replication) and then use statistics to sort out the outcome.

Experiments that are appropriately set up only change one variable (called manipulated variable or
independent variable) for each run of the experiment. All other factors are controls (or controlled variables).
In other words, in order to test whether or not this one variable (as determined by the hypothesis) affects the
outcome of the experiment, you must 1) keep all other variables constant (i.e., everything but the variable in
question is the same between subjects) and 2) have controls to show that the organism is able to
function normally in this experiment.

Your experiment should produce data. Data (plural) are measurable outcomes of the experiment.
These are the numbers that will indicate whether the manipulated variable is actually causing the observed effect.
Data can be measured in height, width, length, seconds, minutes, days, volume, density, number of
something, and many other measurable quantities. Don't forget your units!

5. Analyze data: Analyze data using statistical methods (see below). Statistics typically check to see if the data
produced by one group (e.g., the control) are different from the data produced by another (e.g., the group with the
manipulated variable) in a statistically meaningful way. They take into consideration not only the average, but the
variation around that average. If the difference is not statistically significant, the outcome from the manipulated
variable did not detectably differ from the control, your hypothesis is not supported, and you must reject it.

If you accept your hypothesis, then there is evidence that your proposed explanation is correct. If the experimenter can
provide replication of these results, then the hypothesis can be considered a reasonable explanation of the
observed phenomena. If you reject your hypothesis, then your proposed explanation is probably not the cause of the observed
phenomenon and you still do not know what is causing it. As a curious scientist, the experimenter should
revise the hypothesis and/or experimental design and try again.

6. Conclude: When data are collected and analyzed, the hypothesis is either tentatively accepted or rejected. The
acceptance of a hypothesis may only be temporary, as new observations and experiments can lead to the
rejection of the hypothesis and/or an alternative explanation may also be true. Here, the outcomes of the
experiment are interpreted in a broader context.

As a side note: A theory is an idea that is supported by many lines of evidence gathered by many investigators testing many
hypotheses over many years. It leads to other predictions, or hypotheses, which when tested are
often also supported by evidence. We can only base our conclusions on the evidence at hand. In
the future, further evidence may lead us to a different conclusion. Therefore, in science, all of our
knowledge is theoretical. Famous theories include things like the following:
Gravity - that all objects have a pulling force, and larger (more massive) objects have a larger pull
Cell theory - that the basic unit of life is the cell, cells come from other cells, etc.
Germ theory - that diseases are often caused by microorganisms, rather than bad karma
Plate tectonics - that large continental plates drift across the layer below them,
causing earthquakes, etc.
Atomic theory - that atoms are the basic units of matter

Some of these have been supported by a hundred years of data now. Even so, they are still called theories. Most were
controversial when first proposed and some of them may still be controversial. All of them could, in theory (pun intended) be
overturned by a better theory that fits all the evidence more closely.
In this lab, you will be making and breaking hypotheses constantly, but will you create a theory? Not likely. If you still don't
see the difference, talk to someone about it.


APPLYING THE SCIENTIFIC METHOD
A certain observation or situation leads us to ask questions. A hypothesis is a proposed explanation or answer
to your question based on an educated/informed guess. For example, Suzie's car won't start this morning on
her way to school. Your question: "Why is Suzie's car not starting?" A hypothesis for this situation might be
that Suzie's car battery has died. Another hypothesis might be that Suzie has no gas in her car. Both of those
statements assume that you know something about the problem and have guessed the reason. Note that a hypothesis
must be written in statement form, not as a question. Also, a good hypothesis must be testable and falsifiable
(able to be proved false).

The second step is experimental procedure. In this step you design an experiment to test your hypothesis. This
experimental design will need to test the hypothesis directly and give you a clear "yes" or
"no" answer. To help with determining the "yes" or "no" answer, a control should be incorporated. In the
experimental design, controls are set up exactly the same as your experimental test, but with only one thing
different. In the non-working car example, the first hypothesis being tested is that the battery is dead. The
experimental procedure would consist of putting a new battery into Suzie's car. The control would be the
original old battery in Suzie's car. Only the battery has changed: old versus new. The same car and
battery cables must be used to ensure that only one thing has changed. Or, Suzie could take her battery to the
mechanic, who can test it and compare the results to those of a functional battery of
the same brand.

The results of that experiment will then need to be analyzed so that a conclusion can be made. If the new
battery in Suzie's car makes the car start, or the mechanic states that Suzie's old battery is below normal
working standards, then Suzie will know that the hypothesis of a non-working battery has been validated (a
"yes" answer was achieved). We can then conclude, or make a substantiated explanation of the situation, that
the battery was indeed the reason the car did not start.

But what if the new battery did not restart Suzie's car, or what if the mechanic stated that Suzie's battery was
well within working parameters? That would mean that the hypothesis was invalidated (a "no" answer was
achieved). Therefore, Suzie has not determined the cause of her car troubles. So, she will have to start over to
diagnose the problem and come up with another hypothesis, such as "the car wouldn't start because it had no
gas in the tank." Suzie will then redesign an experimental procedure to test this hypothesis using the proper
controls. Suzie can again either solve her problem, or will have to come up with another hypothesis.

The Art of Deduction, from Monty Python and the Holy Grail
BEDEMIR: Quiet, quiet. Quiet! There are ways of telling whether
she is a witch.
CROWD: Are there? What are they?
BEDEMIR: Tell me, what do you do with witches?
VILLAGER #2: Burn!
CROWD: Burn, burn them up!
BEDEMIR: And what do you burn apart from witches?
VILLAGER #1: More witches!
VILLAGER #2: Wood!
BEDEMIR: So, why do witches burn? [pause]
VILLAGER #3: B--... 'cause they're made of wood...?
BEDEMIR: Good!
CROWD: Oh yeah, yeah...
BEDEMIR: So, how do we tell whether she is made of wood?
VILLAGER #1: Build a bridge out of her.
BEDEMIR: Aah, but can you not also build bridges out of stone?
VILLAGER #2: Oh, yeah.
BEDEMIR: Does wood sink in water?
VILLAGER #1: No, no.
VILLAGER #2: It floats! It floats!
VILLAGER #1: Throw her into the pond!
CROWD: The pond!
BEDEMIR: What also floats in water?
VILLAGER #1: Bread!
VILLAGER #2: Apples!
VILLAGER #3: Very small rocks!
VILLAGER #1: Cider!
VILLAGER #2: Great gravy!
VILLAGER #1: Cherries!
VILLAGER #2: Mud!
VILLAGER #3: Churches -- churches!
VILLAGER #2: Lead -- lead!
ARTHUR: A duck.
CROWD: Oooh.
BEDEMIR: Exactly! So, logically...,
VILLAGER #1: If... she.. weighs the same as a duck, she's made of
wood.
BEDEMIR: And therefore--?
VILLAGER #1: A witch!
CROWD: A witch!
BEDEMIR: We shall use my larger scales
Introduction to Statistics
All of the above information is to show you how science
works. Science demands a lot. It requires the scientist to be
unassuming, unbiased, curious, skeptical, and clever. Just
because you may think things happen in a certain way or because of a certain reason, you cannot assume
anything without evidence. If your experiment shows that birds fly because they are lighter than air, then
this is what you assume. Of course, your experiment may not be appropriate to test this or it wasn't done
properly, but until it is redone, any conclusions must be drawn on the evidence at hand.

We use science and statistics more than you might think. And often, we are drawing the wrong
conclusions from them. I just heard on NPR that New Mexico dropped from 1st to 17th when ranking
states by the number of alcohol-related fatalities. Good for NM, right? But it is important to know how
those stats are determined. Previously this was the number of fatalities per state population (per capita).
Now it is the number of fatalities per mile driven. And as the reporter said, New Mexicans
drive a lot. The state is still in the bottom 10 for per capita deaths. The two figures should not be compared
directly.

Flipping through an old Newsweek magazine I came across these advertising slogans that use
statistics, but I wonder about the specifics. Here are the claims they make and some questions that
immediately come to my mind:

Product: Dodge
Slogan/Claim: "The world's biggest cab"
Questions:
   What is this measuring: volume, surface area, length, width? Internal or external?
   Compared to what: other trucks, cars, everything that ever existed?

Product: Crestor
Slogan/Claim: "10-mg dose of Crestor, along with diet, can lower bad cholesterol by as much as 52% (vs 7% with placebo)"
Questions:
   Are these percentages averages of the study group or maxima? In other words, perhaps only 1 person in the
   entire study saw a 52% drop and this was way above the average.
   What is the effect of diet alone?
   Did the placebo treatment have the same diet?

Product: Tempur-Pedic
Slogan/Claim: "In a recent survey, 92% of our enthusiastic owners report sleeping better and waking more refreshed."
Questions:
   How many people were surveyed? 9.2/10 is not the same as 920/1000.
   Compared to what? Their old mattress? Nothing?
   Did you only survey enthusiastic owners or were only the opinions of enthusiastic owners included in this statistic?
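
The sample-size question about the mattress survey can be made concrete. The sketch below is only a rough illustration and is not taken from the advertisement or any real survey: it assumes a simple random sample and uses the common normal approximation for the standard error of a proportion, sqrt(p(1 - p)/n), which is crude for very small samples. The function name and the two sample sizes are illustrative assumptions.

```python
import math


def proportion_standard_error(p, n):
    """Approximate standard error of a sample proportion p based on n respondents."""
    return math.sqrt(p * (1 - p) / n)


reported = 0.92  # the 92% quoted in the slogan

# ASSUMPTION: two hypothetical survey sizes, used only for comparison.
for n in (10, 1000):
    se = proportion_standard_error(reported, n)
    margin = 1.96 * se * 100  # rough 95% margin, in percentage points
    print(f"n = {n}: 92% +/- {margin:.1f} percentage points")
```

Because the standard error shrinks with the square root of the sample size, the margin for 10 respondents is ten times wider than the margin for 1,000 respondents, which is why the same "92%" can mean very different things.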

Since many of you have aspirations to join the medical field, let me point this out. It is my
experience that most medical researchers do not use proper statistical tests during their experiments. This
means that most of the studies used to show a link between cancer and your favorite pastime, or to
discover the gene that controls your diet preferences, or to develop medicines that treat the common cold
might be based on faulty premises. I've seen peer-reviewed papers published with no regard for proper
statistical procedure. You'd be surprised how many papers talk about how correlated their treatment is
with recovery based on figures worse than the last regression example above. The authors got away with
it because the reviewers don't know it either.

In addition, most patients and many physicians are overwhelmed, uninformed, or ignorant of how
statistics are used by researchers, medical supply companies, or mainstream media to sell their story. I've
seen news stories based on differences between groups that are smaller than the margin of error (meaning
there is no detectable difference). Hopefully, with this lab manual you can change some of that.

My point is to be critical and be curious. There is a lot of information out there these days. Not all
of it is valid. Do not take it at face value. Ask questions. Question your consumer products, your
friends, your doctors, your professors. Then you'll be one step closer to thinking like a scientist.
"You can claim anything with statistics,
but only to those who don't understand
statistics." - unknown
STATISTICS!
Humans consider themselves good judges of what is bigger, faster, or better
than the rest, but how much bigger is bigger? For example, imagine you want to
know if frogs from one population (Pop A) are bigger than those from another
population (Pop B). You could take a frog from each one and measure their lengths.
You might find that the frog from Pop B is much bigger than the one from Pop A.
But do these frogs really show the difference between the two populations?

What if you just happened to catch a small frog from Pop A and a large one from Pop B? What if you
caught several more and found that the frogs from both populations really had a lot of variation in lengths
and looked something like the dataset below? Some of the frogs in Pop A are bigger than some in Pop B.
Now can you tell if one population is bigger than another? Which one?

This is where statistics becomes useful. We cannot easily judge the
differences between groups accurately when there is variation (as in most
studies in biology). Even the advertisement on the left admits "Individual
results may vary." Therefore we must analyze them statistically.

We usually use statistics to understand something about a given population, be
it animal, vegetable, or mineral. The best way would be to measure every
single individual from both groups, but that is not often possible. Therefore we
collect data on a subset or sample of the group of interest (n = sample size, the
total number of individuals sampled).

We then extrapolate the data for this sample to the rest of the population. But
the sample should be unbiased, and care should be taken to control for the following:
1. Make sure the sample is a good representation of the population as a whole. For instance, if
you wanted to know if Georgians prefer the Bulldogs or the Jackets, you will not get a complete
understanding of people's preferences if you only survey people in Athens.

2. If you are using multiple groups to examine the potential effects of a specific variable, make
sure that all groups are equivalent in every way, except in the variable being tested. For
instance, it is not good science to give a placebo to people over 50 years of age and a new medicine
to people under 50 years of age, and conclude that the people given the drug are healthier. In this
case, there are two experimental variables: presence of the drug and age. The two groups are not
equivalent.
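
One common way to keep groups equivalent is to assign subjects to groups at random, so that a trait such as age is not tied to the treatment. The sketch below is an illustration under assumptions, not a procedure from this lab: the function name, the ages, and the fixed seed are all hypothetical.

```python
import random


def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # a seed only makes the example repeatable
    shuffled = list(subjects)   # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# HYPOTHETICAL ages of eight volunteers, some under 50 and some over 50.
ages = [23, 31, 38, 45, 52, 57, 64, 70]
placebo_group, drug_group = assign_groups(ages, seed=1)

print("placebo group ages:", sorted(placebo_group))
print("drug group ages:   ", sorted(drug_group))
```

With random assignment, neither group is made up of only the younger or only the older volunteers by design, so age is much less likely to masquerade as an effect of the drug.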

Once you have gone out and taken some measurements of a sample (the number of trees in a forest, the
size of seeds eaten by cotton rats, the blood pressure of patients with arthritis), your first step will be to
describe different characteristics of your sample. To do this, we use Descriptive Statistics, so here we
go!

Measures of Center
These statistics indicate which values are most common. They attempt to define what is normal for the
population.
Mean - In popular speak, the mean is the average. It is the most commonly used measure of central
tendency. The mean is computed by summing all values in your sample and dividing that sum by the
sample size:

$$\bar{X} = \frac{\sum X_i}{n}$$

where,
$X_i$ = each individual data point
$n$ = sample size (the number of data points in your sample)
$\sum$ = summation

In our frogs, Pop A has a mean length of 7.38 cm and Pop B is
9.06 cm on average.

Mode - The mode of a sample is the score that appears most often.

In Pop A, mode = 8.7 cm (i.e. 2 of the frogs were 8.7 cm long). Pop B does not have a mode.

Median - The median divides the distribution into halves; half of the scores are above the median and
half are below it when the data are arranged in numerical order.

When the sample size is an odd number, as in our frog sample, the median is the middle value (Pop A =
7.7, Pop B = 8.8).

When the sample size is an even number, the median is halfway between the middle values, e.g., for the
dataset (1 3 4 5 8 9), the median location is halfway between the 3rd and 4th scores (4 and 5), or 4.5.
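
The three measures of center can be checked with a few lines of Python. The frog dataset itself appears only in a figure that is not reproduced in this text, so the five Pop A lengths below are an assumption: they were chosen to agree with the values quoted for Pop A (mean 7.38 cm, mode 8.7 cm, median 7.7 cm). The even-sized dataset is the one given in the text.

```python
import statistics

# ASSUMPTION: five lengths (cm) picked to match the summary values quoted
# for Pop A; the lab's actual measurements are not shown in the text.
pop_a = [5.3, 6.5, 7.7, 8.7, 8.7]

mean_a = statistics.mean(pop_a)         # sum of all values / sample size
median_a = statistics.median(pop_a)     # middle value of the sorted data
modes_a = statistics.multimode(pop_a)   # most frequent value(s), as a list

# Even-sized dataset from the text: the median lies halfway between 4 and 5.
even_median = statistics.median([1, 3, 4, 5, 8, 9])

print(f"Pop A: mean = {mean_a:.2f} cm, median = {median_a} cm, mode(s) = {modes_a}")
print(f"median of (1 3 4 5 8 9) = {even_median}")
```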

Median is a useful measure of center for data
that is very skewed towards one end, like
salaries, in which the mean does not give a
good measurement of the norm (see figure at
right). If you were complaining to your boss
that you and your coworkers needed a raise,
would you complain about how low the
mean or the median is?
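
A quick numerical sketch shows why the answer matters. The salaries below are entirely hypothetical (five staff members and one highly paid executive) and are only meant to show how a single extreme value pulls the mean away from the median.

```python
import statistics

# HYPOTHETICAL annual salaries in dollars: five staff and one executive.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 250_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"mean salary:   {mean_salary:,.0f}")
print(f"median salary: {median_salary:,.0f}")
```

In this made-up office the mean is well above what any of the five staff members earn, while the median sits in the middle of their salaries, so the median is the better description of a typical paycheck.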

If the dataset is normally distributed, the data points
are equally distributed about the mean (as on right) and
mean = median = mode
[Figure: normal distribution, labeled Mean, Median, Mode]
DESCRIPTIVE STATISTICS

A biologist, a chemist, and a
statistician are out hunting. The
biologist shoots at a deer and misses
5ft to the left, the chemist takes a shot
and misses 5ft to the right, the
statistician yells "We got 'em!"
Measures of Variation
Although Measures of Center are informative as they describe how scores
are centered in the distribution, the mean, median, and mode alone do not
provide the best possible description of a sample (distribution). For
example, think of samples of two populations, X and Y:

Both X and Y have the same Mean (50) and similar Medians (50
and 48, respectively). But would you call those 2 datasets very similar?
As you can see, Measures of Center alone are not sufficient to clearly
describe a data set. Measures of variation provide additional critical
information about your data: specifically, the degree to which individual
scores are clustered about or deviate from the average value in a
distribution. In biology, everything varies, such as the seed sizes of the
two species on right. So it is important to always report some measure of
variation along with your sample mean when describing your data.
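
The X and Y comparison can be verified directly. The sketch below uses the two three-value datasets given in this lab (X = 49, 50, 51 and Y = 2, 48, 100) and, as the simplest possible measure of spread, the distance between the largest and smallest value.

```python
import statistics

x = [49, 50, 51]
y = [2, 48, 100]

for name, data in (("X", x), ("Y", y)):
    spread = max(data) - min(data)  # distance between the two extremes
    print(f"{name}: mean = {statistics.mean(data)}, "
          f"median = {statistics.median(data)}, spread = {spread}")
```

The means are identical and the medians are close, yet the spread of Y is dozens of times larger than the spread of X, which is exactly the information a measure of center cannot show.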

Range - Range is the simplest measure of variability. It describes how widely the sample is spread
from end to end. It is the difference between the highest and lowest score in a distribution:

Range = maximum value - minimum value

Although easy to compute, range is based solely on the two most extreme scores in the distribution
and thus it is susceptible to much fluctuation. For instance, in frog Pop A, the range is 3.4 cm.
However, if we caught one additional frog that measures 11.5 cm, the range would jump to 6.2 cm!
Therefore, the range is often not a reliable measurement of variability.
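
The jump in range can be reproduced with a short sketch. The five Pop A lengths are an assumption (the dataset appears only in a figure not reproduced here); they were chosen so that the range matches the 3.4 cm quoted in the text. The 11.5 cm frog is the one described above.

```python
def value_range(data):
    """Range = maximum value - minimum value."""
    return max(data) - min(data)


# ASSUMPTION: lengths (cm) consistent with the Pop A summary values in the text.
pop_a = [5.3, 6.5, 7.7, 8.7, 8.7]
with_big_frog = pop_a + [11.5]  # one additional, unusually large frog

print(f"range of Pop A:              {value_range(pop_a):.1f} cm")
print(f"range with the 11.5 cm frog: {value_range(with_big_frog):.1f} cm")
```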

Variance ($s^2$) - Variance measures the average squared distance of each data point from the mean. The
distance (deviation) of a single data point from the mean is:

$$(X_i - \bar{X})$$

However, simply summing the deviations will result in a value of 0 because values below the mean
(negative) cancel out those above the mean (positive). To get around this problem, variance is based
on squared deviations of scores about the mean:

$$(X_i - \bar{X})^2$$

Squaring the deviations removes the positive/negative signs.

If we had a much larger sample size (say 100 frogs instead of just 5), the sum of the squared deviations would be
expected to rise due simply to sampling effort (there would be more frogs to add up). To control for
this, the sum of the squared deviations is divided by the sample size minus one (n - 1). The result is roughly the
average of the squared deviations. This is the variance.

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

The variance in Pop A = 2.17 and in Pop B = 3.46. This means
that the lengths of frogs in Pop B vary more than those in Pop A.
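
The variance calculation can be followed step by step in code. As before, the Pop A lengths are an assumption chosen to be consistent with the summary values quoted in the text (the real dataset is in a figure not reproduced here). The sketch divides by n - 1, which is what reproduces the 2.17 reported for Pop A, and cross-checks the result against Python's built-in sample variance.

```python
import statistics

# ASSUMPTION: lengths (cm) consistent with the Pop A summary values in the text.
pop_a = [5.3, 6.5, 7.7, 8.7, 8.7]

n = len(pop_a)
mean_a = sum(pop_a) / n

deviations = [x - mean_a for x in pop_a]           # X_i minus the mean
sum_of_deviations = sum(deviations)                # cancels out to (about) zero
sum_of_squares = sum(d ** 2 for d in deviations)   # squaring removes the signs
variance_a = sum_of_squares / (n - 1)              # divide by n - 1

print(f"sum of deviations: {sum_of_deviations:.10f}")
print(f"variance by hand:  {variance_a:.2f}")
print(f"statistics.variance: {statistics.variance(pop_a):.2f}")
```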

X Y
49 2
50 48
51 100

Note: the symbol for variance is $s^2$. So if $s^2 = 9$, variance equals 9.
You do not need to take the square root. The same goes for other
symbols in this lab (e.g., $X^2$).
Standard deviation (s) - Standard deviation is a measure of variability expressed in the same units as
the data being measured. It is calculated by taking the square root of the variance. Variance is a
measure in squared units and has little meaning with respect to the data's units.

$$s = \sqrt{s^2}$$

The standard deviation for Pop A is 1.47 cm. Often the mean is written with the standard deviation to
show the variability of the data, for instance, frogs in Pop A had an average length of 7.4 +/- 1.5 cm.

Standard error (se) - Standard error is the square root of the quantity (variance divided by the sample size).
Because it includes the sample size, standard error shrinks as more data are collected, which standard deviation
does not. If we sampled 100 frogs, we might expect to understand the true nature of this population better than
if we just measure 5 frogs.

$$se = \sqrt{\frac{s^2}{n}}$$

The standard error of our frog Pop A is 0.66.
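
Both quantities follow directly from the variance. In the sketch below the Pop A lengths are again an assumption chosen to match the summary values in the text, and the "100 frogs" line is purely hypothetical: it reuses the same variance just to show how the standard error shrinks when the sample size grows.

```python
import math
import statistics

# ASSUMPTION: lengths (cm) consistent with the Pop A summary values in the text.
pop_a = [5.3, 6.5, 7.7, 8.7, 8.7]

variance_a = statistics.variance(pop_a)      # sample variance (divides by n - 1)
sd_a = math.sqrt(variance_a)                 # standard deviation, in cm
se_a = math.sqrt(variance_a / len(pop_a))    # standard error, in cm

# HYPOTHETICAL: same variance, but imagine 100 frogs had been measured.
se_if_100_frogs = math.sqrt(variance_a / 100)

print(f"standard deviation: {sd_a:.2f} cm")
print(f"standard error (n = 5): {se_a:.2f} cm")
print(f"standard error if n were 100: {se_if_100_frogs:.2f} cm")
```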

Data are often graphed using bars that show the mean and some measure of
variation around that mean, such as standard deviation or standard error. You
should always indicate which is used. The graph on the right is mean +/-
standard error.

WHAT ELSE CAN WE DO WITH STATISTICS??

Now that you have an understanding of how to use Descriptive Statistics to characterize
a population and analyze its distribution we can begin to use other Statistical techniques
to help us answer scientific questions.

Science experiments are usually designed to determine the effect of some variable on a
group by changing the variable for one group and comparing the effects of this change to
an unchanged group (control group). For instance we might like to know if a certain
drug actually helps people get better. We could then set up an experiment and compare
patients on the drug to patients on a placebo (a sugar pill or otherwise neutral
medication).

In addition, scientists often wonder whether populations might differ with respect to
certain characteristics and if so what factors account for those differences. Here too, we
can use Statistics to determine whether differences really exist.

So let's keep using our frog populations A and B to examine this further.

We might start by asking:
Are Pop A frogs smaller than Pop B frogs? Based on our knowledge of their habitats we might
hypothesize that Pop A frogs are smaller than Pop B frogs.

However, it will be impossible to collect ALL frogs from each pond (and probably bad for the populations'
survival), so we must rely on statistical analysis of samples collected in each pond. Whenever we rely on
samples to answer questions about populations, we must use Statistics to decipher how different they are.

Most of the questions we will be asking in this lab pertain to differences between samples. We often
hypothesize that the populations are different because of some variable (otherwise we wouldn't be
interested in them). To determine if they are different enough to be meaningful, our statistical methods
need something to compare to. The default for statistical tests is that there is no difference. We call this
the Null Hypothesis.

A null hypothesis ($H_0$) states that there is no difference among groups you are comparing or no effect
of a variable on a system. In this example, your Null Hypothesis would be "There is no difference in body
size between Pop A and Pop B."

If the factor has a big enough effect, the samples will be different statistically. We can state this as "Pop
A frogs are smaller than Pop B frogs." This is termed our Alternative Hypothesis.

You can formalize an alternative hypothesis, but the statistics are testing the Null Hypothesis. After you
collect data, statistics will allow you to either Accept (more precisely, fail to reject) or Reject this Null
Hypothesis. If you reject the null, your evidence may point to the alternative as the cause, but you did not
prove that the variable you measured was the cause. There could have been some unknown thing happening too.

When using statistics it is NOT POSSIBLE to prove anything with 100% certainty, so you can't prove
that one population is larger than the other, BUT statistics gives you the tools to DISPROVE that they are
the same (Disprove your Null Hypothesis). Think about this statement: "All male Cardinals are red." Is it
even possible to Prove that statement correct? I think not!! But you can easily Disprove that statement
when you photograph the first male Cardinal that is not red!

Framing your hypothesis in the form of a Null Hypothesis gives you the ability to statistically Accept or
Reject the hypothesis. If you reject the Null, then as a scientist you can begin to explain why you think
they are different.
In the following sections, four statistical tests will be introduced: t-test, ANOVA, regression, and chi-square
test. Using simple math (trust me, if you can add, subtract, multiply and divide, you can do
statistics! If you can't, you can use a computer and still do statistics!), each of these four statistical tests
allows us to make specific kinds of comparisons with our data, but more on this later. Each test utilizes
the data we collect and computes a number called a Calculated Test Statistic (each test has its own
CTS). A Calculated Test Statistic is a single number that quantifies (or represents) the difference among
the groups being compared based on their sample size, total value, and/or mean and variance. For each
statistical test, we would compare our CTS to a theoretical critical value (do not worry about how these
theoretical values are computed; it is wizardry and no one really knows!). Based on that comparison, you
will determine whether to accept or reject your Null Hypothesis.

Regardless of how overwhelmingly your data may show that the frog populations are different, there is
always a risk of being wrong when you Reject a Null Hypothesis. That risk is given with each
Statistical Analysis that you perform in the form of a p value.
So, let's start there. What is the p value?

Probability of significance (p value) - The p value represents the
likelihood of getting results at least as extreme as yours simply by random chance, that is, if the
null hypothesis were true and nothing biologically real were going on. So obviously,

a low p value is best!!

In most areas of science, we have agreed that a p value equal to or less
than 0.05 is statistically significant and therefore that is the level at which we can confidently
REJECT a null hypothesis. That is, if there were truly no difference, there would be a 5% chance or less of
obtaining a result like yours through random chance alone. So again, if the p value for a statistical analysis
is ≤ 0.05, you can confidently reject your Null hypothesis and shout it from the rooftop: "I reject my Null
Hypothesis, the frog populations are truly different in body size." But are you 100% sure that the populations
in these 2 ponds are really different??? No! But the risk of a false alarm is small (5% or less), and that is
enough for us to make that statement!
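
One way to see what a p value measures is to let chance do the work: pool all the frogs, split them into two groups in every possible way, and ask how often a chance split produces a difference in means at least as large as the one actually observed. This is an exact permutation test, which is not one of the four tests taught in this lab; it is used here only as an illustration. Both datasets are assumptions chosen to agree with the summary values quoted for Pop A and Pop B, since the actual measurements appear only in a figure not reproduced in this text.

```python
from itertools import combinations

# ASSUMPTION: lengths (cm) consistent with the summary values quoted in the text.
pop_a = [5.3, 6.5, 7.7, 8.7, 8.7]
pop_b = [6.8, 8.0, 8.8, 10.1, 11.6]


def mean(values):
    return sum(values) / len(values)


observed = abs(mean(pop_a) - mean(pop_b))
pooled = pop_a + pop_b
indices = range(len(pooled))

extreme = 0  # chance splits at least as different as the real samples
total = 0    # all possible ways to split 10 frogs into two groups of 5

for group in combinations(indices, len(pop_a)):
    chosen = set(group)
    first = [pooled[i] for i in chosen]
    second = [pooled[i] for i in indices if i not in chosen]
    difference = abs(mean(first) - mean(second))
    if difference >= observed - 1e-9:  # small tolerance for rounding error
        extreme += 1
    total += 1

p_value = extreme / total
print(f"{extreme} of {total} splits are at least as extreme; p = {p_value:.3f}")
```

The fraction printed at the end is the p value for these assumed data: the share of purely random groupings that look at least as different as the two ponds do.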

A higher value of the calculated test statistic results in a lower p value.
The exact relationship between a calculated test statistic and p is usually complicated and cannot
generally be calculated with a simple formula. But think about it, if the CTS represents the degree of
difference among the groups you are comparing, then the greater the number, the lower the probability
that those differences are random. In contrast, a low CTS shows us that the differences among groups are
small and likely not biologically real so your p value will be larger. The relationship also depends on the number
of degrees of freedom (df).

Degrees of freedom (df) are an integer number representing the number of independent pieces of
information that are used to estimate a statistical parameter. They are related to the sample size, the
number of classes, categories, or groups.

If our calculated test statistic is high enough and our p-value is low enough, we can conclude that there
is indeed a difference between our samples. In science, we say that the means are "significantly
different." Since the word "significant" is commonly used, care must be taken when writing in science not
to use it in any other sense. To say something is significant implies that the proper stats have been done
and a difference was found. To say two groups are different indicates that they are significantly different.

Types of data
There are two main types of data. Which type you collect determines which test you use to analyze
them:
Continuous data (quantitative) - quantitative data that can take on many different values, in theory,
any value between the lowest and highest points on the measurement scale.
e.g.: (1, 2, 3) or (4.011, 4.012, 4.013)

Use a t-test or ANOVA to analyze continuous data.

Discrete data (qualitative) - categorical data that has a limited number of values
e.g.: (yes/no) or gender (male/female) or college class (freshman/sophomore/junior/senior).

We will use a chi-square test to analyze discrete data.

In order to correctly use the following tests, a few things are assumed. If these assumptions are not met,
you should transform your data into a format that is acceptable, or choose a test that does not require these
assumptions to be met (which we will not go over in this class unless we have to.)

Assumptions of Parametric Statistics:
1. Continuous variables (or almost, i.e., there are a lot of possibilities)
2. Samples are collected randomly
3. Observations (data) are independent of each other. The members of each group are assumed to have nothing in common except the desired treatment.
4. Within-group variance is equal across groups. Use F-test to test for differences among variances.
5. Data must be normally distributed.

If these assumptions are not met, see your instructor.

STATISTICAL TESTS
All the statistical tests described here (except regression) ask the question: Is there a
difference between these two groups? They then test that question mathematically.
I. t-test of means
o Used with continuous data, one variable.
o Looks for differences in means of 2 groups.

Example of when you would use a t-test:
Question Is there a difference in height between male and female giraffes?
Variable of interest: Height (continuous data)
Groups: Male and Female (2 groups)
Comparison: Means (continuous data with variation)
Null Hypothesis: Mean height of Males = Mean Height of Females
Alternative Hypothesis: Mean height of Males ≠ Mean Height of Females

The heart of the t-test is the calculation of a statistic known as the "t value". The formula for the t value
associated with two sample means is:
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Where,
$\bar{X}_1$ = the mean of group 1
$s_1^2$ = variance of group 1
$n_1$ = sample size of group 1
$\bar{X}_2$ = the mean of group 2
$s_2^2$ = variance of group 2
$n_2$ = sample size of group 2

For the t test, the number of degrees of freedom is: $df = (n_1 - 1) + (n_2 - 1)$.

By convention, the sample with the larger mean is designated sample 1 to avoid a negative value of t, but some
statistical software does not do this, and thus produces negative values for t. In that case, simply take the
absolute value of the listed t, $|t|$.

Because of its complexity, the calculation of p is not easily done by hand. Rather, the calculated t value is
compared to a table of critical values, which lists the value that the calculated statistic must exceed in order for
p to be less than 0.05 for the appropriate number of degrees of freedom (SEE TABLE 1). If the calculated t
value is greater than the critical t value in the table, then we REJECT THE NULL HYPOTHESIS, the means are
significantly different.

Explanation of equation:
The numerator evaluates the size of the difference between the two sample means. A greater difference in the
means in the numerator produces a larger value of t. The denominator is actually the formula for the standard
error of the difference between the means. Just as was the case for the standard error of a single mean, the size
of this standard error depends on how many measurements we made (n) and how variable the measurements are
(the standard deviation, s). When the measurements are more variable (i.e. a bigger s), our samples are less
likely to be representative, our standard error is bigger, and the calculated t is smaller. When our sample size
increases (i.e. a bigger n), we are more confident that our sample is representative because the variation in
individual measurements tends to cancel out, leading to a smaller standard error and a larger value of t. Thus
you can see that the formula for t includes all of the factors that affect our ability to assess whether differences
are real or whether they have resulted from chance unrepresentative sampling: the size of the differences, the
variability in the population, and the sample size of our experiment.
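
The three effects described above can be checked numerically. The following is a minimal Python sketch of the t calculation (difference in means divided by the standard error of the difference). The function name `t_value` and all of the numbers are hypothetical, chosen only to illustrate the trends; they are not data from this lab.

```python
import math

def t_value(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t: difference in means over the standard error of that difference."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return abs(mean1 - mean2) / standard_error

# Hypothetical values: means 10 and 8, variance 4, n = 10 per group.
baseline = t_value(10.0, 4.0, 10, 8.0, 4.0, 10)
more_variable = t_value(10.0, 16.0, 10, 8.0, 16.0, 10)  # bigger s -> smaller t
larger_sample = t_value(10.0, 4.0, 40, 8.0, 4.0, 40)    # bigger n -> larger t
bigger_gap = t_value(12.0, 4.0, 10, 8.0, 4.0, 10)       # bigger difference -> larger t

for label, value in [("baseline", baseline), ("more variable", more_variable),
                     ("larger sample", larger_sample), ("bigger gap", bigger_gap)]:
    print(f"{label}: t = {value:.3f}")
```

Changing one input at a time makes it easy to see which direction each factor pushes t.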
Example of t-test:
The age (in days) at which individuals of Daphnia longispina, a crustacean, begin reproduction was measured
in two populations.
Question Do the populations begin reproduction at different ages?
Variable of interest: Age (in days)
Groups: Population I and II
Null Hypothesis: Mean age of reproduction in Pop I = Mean age in Pop II
Alternative Hypothesis: Mean age of reproduction in Pop I ≠ Mean age in Pop II

                      Population
                      I         II
Individual ages (X):  7.2       8.8
                      7.1       7.5
                      9.1       7.7
                      7.2       7.6
                      7.3       7.4
                      7.2       6.7
                      7.5       7.2
Sum (ΣX)              52.6      52.9
Sample size (n)       7         7
Mean (X̄)              7.5143    7.5571
Variance (s²)         0.5047    0.4095

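Plug data into equation for t:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{7.5143 - 7.5571}{\sqrt{\dfrac{0.5047}{7} + \dfrac{0.4095}{7}}} = \frac{-0.0428}{\sqrt{0.1306}} = \frac{-0.0428}{0.3613} = -0.1184, \qquad |t| = 0.1184 $$

$$ df = (n_1 - 1) + (n_2 - 1) = (7 - 1) + (7 - 1) = 12 $$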

Critical value from table of desired 0.05 p value and 12 degrees of
freedom = 2.179

Since our t, which equals 0.1184, is not greater than 2.179, we must
ACCEPT our Null Hypothesis, the means of the two populations are found
not to be different, and thus we cannot say that these populations reach
reproductive maturity at different ages. We have no evidence to the
contrary!
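
The hand calculation above can be double-checked with a few lines of Python. This is only a sketch using the standard library. The variable names are illustrative, and the critical value is copied from Table 1 rather than computed. Because the script does not round intermediate values, its t may differ from the hand-calculated value in the fourth decimal place.

```python
import math
import statistics

# Age at first reproduction (days) for the two Daphnia populations in the example.
pop1 = [7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5]
pop2 = [8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2]

mean1, mean2 = statistics.mean(pop1), statistics.mean(pop2)
var1, var2 = statistics.variance(pop1), statistics.variance(pop2)  # sample variance (n - 1)
n1, n2 = len(pop1), len(pop2)

# t = |difference in means| / standard error of the difference
t = abs(mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
df = (n1 - 1) + (n2 - 1)
critical_t = 2.179  # Table 1: two-tailed p = 0.05, df = 12

print(f"means: {mean1:.4f}, {mean2:.4f}")
print(f"variances: {var1:.4f}, {var2:.4f}")
print(f"t = {t:.4f}, df = {df}")
print("reject the null" if t > critical_t else "cannot reject the null")
```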

Here is how this might be graphed:
[Figure: Age at reproduction in Daphnia longispina. Error bars show standard error.]
Notice that the error bars overlap, another indication that the means are not statistically different.
II. Analysis of variance (ANOVA)
o Used with continuous data, one or more variables.
o Looks for differences in means among 3 or more groups.

Example of when you would use an ANOVA:
Question Is there a difference in body size (as measured by weight) among frogs from 4 different ponds?
Variable of interest: Weight (continuous data)
Groups: Population I, II, III, IV (more than 2 groups)
Null Hypothesis: There is no difference in weight among the four ponds
Alternative Hypothesis: There is a difference in weight among the four ponds

ANOVA will let you simultaneously compare the means of 3 or more groups.

Although an ANOVA can be performed by hand, we will not take the time to do that in this lab. Instead,
we will use a computer program to perform the messy parts. You can do that here:
http://www.physics.csbsju.edu/stats/anova.html

t-tests tell you if there is a significant difference between two groups. If there is, you can easily look at
the two means and tell which one is bigger. An ANOVA tells you if there is a difference among more
than two groups, but it does not tell you which groups differ (for example, whether treatment A is bigger
than treatment B but not C). If you want to know where among the four ponds there is a difference, you
must perform a follow-up test, called a post hoc test (such as the Tukey-Kramer test), to look for
differences among the means.

For instance, if we had measured a third population of Daphnia, we might get results like the following:

ANOVA table:
Source of variation   Sum of Squares   df   Mean squares   F
Between               154.8            2    77.40          13.12
Error                 70.8             12   5.900
Total                 225.6            14

The probability of this result, assuming the null hypothesis, is 0.001
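
The entries of a one-way ANOVA table are linked by simple arithmetic: the sums of squares add up to the total, each mean square is a sum of squares divided by its degrees of freedom, and F is the between-group mean square divided by the error mean square. The sketch below rebuilds the Between row from the Total and Error rows of the table above. The variable names are illustrative, and the p-value is not computed here.

```python
# Values read from the Total and Error rows of the ANOVA table in this example.
ss_total, ss_error = 225.6, 70.8
df_between, df_error = 2, 12

ss_between = ss_total - ss_error        # sums of squares are additive
ms_between = ss_between / df_between    # mean square = SS / df
ms_error = ss_error / df_error
f_ratio = ms_between / ms_error         # F = MS between / MS error

print(f"SS between = {ss_between:.1f}")
print(f"MS between = {ms_between:.2f}, MS error = {ms_error:.2f}")
print(f"F = {f_ratio:.2f}")
```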

Therefore we can REJECT the NULL and say that there is a difference
among these populations. The graph on the right shows the means of each
population and standard error. If the error bars overlap, the populations are
not different (also indicated by the letters).

Why not just use several t-tests?
t-tests should not be used for comparing means of more than two groups because each comparison
has its own error (probability of getting a significant result due to chance). The error adds up with
each comparison. In other words, if you had many samples and compared each possible pair at a
significance level of 0.05, each comparison would have a 5% chance of finding a difference by chance
alone. The more comparisons, the more likely this is. For example, if we had 7 different groups, there
would be 21 pairs, and we would expect about one of them (21 x 0.05 ≈ 1) to show a difference simply
by chance.
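
How quickly the error accumulates can be sketched in Python. The helper name `familywise_error` is hypothetical, and the calculation assumes, for simplicity, that the pairwise comparisons are independent of each other. That is not strictly true when the same groups are reused, so treat the result as a rough guide only.

```python
from math import comb

def familywise_error(groups, alpha=0.05):
    """Return (number of pairs, chance of at least one false positive).

    Assumes every pairwise comparison is independent, which is a simplification.
    """
    pairs = comb(groups, 2)                  # number of possible pairs of groups
    return pairs, 1 - (1 - alpha) ** pairs   # 1 - P(no false positive in any pair)

for groups in (2, 3, 4, 7):
    pairs, risk = familywise_error(groups)
    print(f"{groups} groups: {pairs} pairs, chance of at least one false positive ~ {risk:.2f}")
```

With 7 groups the chance of at least one spurious "significant" pair comes out at roughly two in three under this assumption, which is why a single ANOVA followed by a post hoc test is preferred.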

[Figure: mean and standard error of each population, with the bars labeled A, A, B.]
III. Regression/Correlation
o Use with continuous data, two variables.
o Tests the relationship of two variables.

Example of when you would use a Regression:
Question Is height related to shoe size?
Variable of interest: Height and Shoe size (continuous data)
Groups: The group of individuals you measured (only 1 group, but 2 variables)
Comparison: Each individual's measurements (continuous data with variation)
Null Hypothesis: There is no relationship between height and shoe size.
Alternative Hypothesis: Height and shoe size are related.

Regressions and correlations are very similar and for the purposes of this class, we may treat them
equally, but technically:

Regression- Tests the relationship of one variable to another by expressing one as a linear (or more
complex) function of the other. In regression, one variable is the cause and the other is the effect.
For example, people who predominantly eat fatty foods weigh more (the more fatty foods one eats,
the heavier she/he will be).

Correlation- Tests the degree to which two variables vary together. Both variables change together,
not necessarily because one causes the other, but often because they are both affected by a third
variable. A recent study found a correlation between the amount of chocolate consumed by a country
and the number of Nobel laureates it produces. Is this due to the chocolate? Maybe, but most likely
the two are affected by a third cause.

CORRELATION DOES NOT EQUAL CAUSATION!!!

A function is a mathematical relationship enabling us to predict what
values of variable Y correspond to given values of variable X. Such a relationship is written as

Y = f(X)

You may recognize this as Y = bX

In the simplest regression, Y = X. Therefore, for example, when Y = 25,
we can predict that X will also = 25. Fitting a line through this
relationship produces something like the figure on the right:

Here, the X is the independent variable (the cause, free to vary) and the
Y is the dependent variable (effect, due to the cause).

The following figure shows a functional relationship (the variables are not perfectly correlated), in which
for every increase of 7 units of X, there will be a 1 unit increase in Y.

In nature, there is variation in the relationship between each pair of X and Y values. For example:

The vertical lines connecting each datum dot to the best fit line on the right are measuring the variation.
And whenever there is variation, we must do statistics to see if what we found is random chance or a
significant relationship. The test for regression is similar to that for ANOVA, but the math involved is
beyond this class. There are two things you need to know to understand a regression: the p-value (see
above) and the r² value. The short and skimpy explanation is that the p-value, as always, tells you if
there is a significant relationship between your two variables.

The r² value tells you how good a predictor that relationship is. It measures the variation of each data
point from the best fit line (that is, how far away from the line each dot is, see figure above). If your
data is a perfect predictor, the data will line up nicely and you will get a high r².

In the figure to the left, p < 0.001, r² = 0.94. This means that the independent variable explains 94% of
the variation in the dependent variable. It is a really good predictor.

By the way, this is a positive relationship: as X increases, so does Y.

But in the figure to the right, p = 0.02, so the line is significant, but r² = 0.22. X is not as good a
predictor of Y (it only explains 22% of the variation).

This is an example of a negative relationship: as X increases, Y
decreases.

You can think of r² as a measurement of the scatter of the data. How scattered is the data? In the first
figure, it is less scattered than in the second figure.

Here is a general guideline for r² values:

Pair of data   Independent Variable (X)   Dependent Variable (Y)
1              20                         30
2              21                         33
3              27                         40
4              29                         39
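
As a rough illustration, a best fit line and its r² can be computed for the four (X, Y) pairs tabulated above. The sketch assumes that the "best fit line" is the ordinary least-squares line, and that r² is one minus the ratio of the leftover (residual) variation to the total variation in Y. Variable names are illustrative, and no p-value is computed.

```python
xs = [20, 21, 27, 29]   # independent variable (X)
ys = [30, 33, 40, 39]   # dependent variable (Y)

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept of the line Y = intercept + slope * X
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# r^2 = 1 - (variation left around the line) / (total variation in Y)
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"Y = {intercept:.2f} + {slope:.2f} X")
print(f"r^2 = {r_squared:.2f}")
```

The vertical distances from each point to the line are the residuals summed in `ss_res`; the smaller they are relative to the overall spread in Y, the closer r² gets to 1.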
IV. Chi square test (χ²)
o Used with discrete data, one variable.
o Compares observed frequencies of an experiment to expected frequencies. Either/or, yes/no, proportions.
o Since the data is discrete, there is no real variation. Also, chi square is usually not graphed because
there are only 2 numbers. These can be listed in the text or in a table.

Example of when to use a Chi Square:
You made a bet with a friend based on a coin toss. You picked heads every time, but lost best out of 10 by
8 to 2. You think the coin might have been weighted and your friend cheated.

Question Is 8:2 different than what we would expect with a random toss (5:5)?
Variable of interest: The ratio of heads to tails
Groups: Heads and Tails
Comparison: Observed frequencies vs. expected frequencies (discrete data)
Null Hypothesis: Observed frequency = Expected frequency (8:2 is not significantly different than 5:5)
Alternative Hypothesis: Observed frequency ≠ Expected frequency (8:2 is significantly different than 5:5)

With Chi Square, you can statistically compare how many times heads and tails come up on this coin
(observed frequency) to what you would expect if the coin is not rigged (expected frequency).

How to do a Chi Square test
1. Collect your data!
You flipped the coin 10 times and observed the following:
o 2 heads (these are your Observed Frequencies)
o 8 tails

2. Determine the expected frequencies
If the coin is not rigged, we would expect an even number of heads and tails, a 50:50 ratio. Since
we flipped it 10 times, our expected frequencies are
o 5 heads
o 5 tails

3. Calculate your chi square test value. Make a chi square table like the one below to set up your
calculations.
$$ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} $$

4. Compare χ² and degrees of freedom in Table 2 to find p value.

Degrees of freedom (df) for χ² = number of categories - 1

o So in our example we had two potential outcomes, Heads or Tails, so df = 2 - 1 = 1
o If the calculated χ² value is greater than the theoretical (critical) value given under p = 0.05 in
the Chi Square Distribution table (see Table 2), we reject the null hypothesis and conclude that
the coin is rigged. If our calculated chi square is less than the critical chi square, we must
accept the null and conclude that the coin is not rigged (did not behave differently than what
you expected).


CHI SQUARE TABLE

Steps:      1             2          3             4           5            6             7
            Observed      Expected   Expected      Deviation   Deviations
            frequencies   ratio      frequencies   from        squared
                                                   expected
Equation:   o                        e             o - e       (o - e)²     (o - e)²/e    Σ
Heads       2             1/2        5             -3          9            1.8           1.8
Tails       8             1/2        5             3           9            1.8           +1.8
Sum (n)     10            1          10            0                                      χ² = 3.6
                                                                                          P > 0.05

Since our p value is greater than 0.05, we must ACCEPT the NULL.
This means that there is no significant difference between our observed frequency
and the expectation of randomness. We can trust our friend's coin and
we have lost the bet.
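
The coin example can be reproduced with a short Python sketch. The dictionary names are illustrative, and the critical value is copied from Table 2. The last step is an addition that is not part of this lab's procedure: for df = 1 only, the p-value can be obtained from the complementary error function, so the table lookup can be cross-checked.

```python
import math

observed = {"heads": 2, "tails": 8}
expected_ratio = {"heads": 0.5, "tails": 0.5}   # a fair coin

n = sum(observed.values())
expected = {side: expected_ratio[side] * n for side in observed}

# chi-square = sum over categories of (observed - expected)^2 / expected
chi_square = sum((observed[s] - expected[s]) ** 2 / expected[s] for s in observed)
df = len(observed) - 1
critical = 3.84   # Table 2: p = 0.05, df = 1

# Closed form that is valid for df = 1 only (not a general chi-square p-value).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.1f}, df = {df}")
print(f"p is about {p_value:.3f}")
print("reject the null" if chi_square > critical else "cannot reject the null")
```

The computed p falls between the 0.10 and 0.05 columns of Table 2, which agrees with the "P > 0.05" entry in the chi square table.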

In the pages below, you will find a quick flow chart to determine which statistical test to use and two
tables that allow you to convert a test statistic (t and χ²) to a p value.
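
The rules given in sections I-IV for matching a test to the data can also be written as a small function. This is a hypothetical helper, not a reproduction of the lab's flow chart. It encodes only what the sections state: discrete data call for chi square, two continuous variables for regression/correlation, two groups for a t-test, and three or more groups for an ANOVA.

```python
def choose_test(data_type, groups=None, variables=1):
    """Suggest a test using only the rules stated in sections I-IV (hypothetical helper)."""
    if data_type == "discrete":
        return "chi square"
    if data_type != "continuous":
        raise ValueError("data_type must be 'continuous' or 'discrete'")
    if variables == 2:
        return "regression/correlation"
    if groups == 2:
        return "t-test"
    if groups is not None and groups >= 3:
        return "ANOVA"
    raise ValueError("comparing means needs at least 2 groups")

print(choose_test("continuous", groups=2))     # giraffe height, males vs. females
print(choose_test("continuous", groups=4))     # frog weight among four ponds
print(choose_test("continuous", variables=2))  # height vs. shoe size
print(choose_test("discrete"))                 # heads vs. tails
```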

Now you have many tools in your statistics toolbox. Go out and DO SCIENCE TO IT!

Remember, the symbol for chi square is χ². So if χ² = 9, your chi square value is 9. You do not need to
take the square root. The same goes for other symbols in this lab (e.g., s²).
Table 1. Critical Values for t-tests

df
Two-tailed p values:
Means are NOT significantly different Means ARE significantly different
1.00 0.50 0.40 0.30 0.20 0.10 0.05 0.02 0.01 0.002 0.001
1 0.000 1.000 1.376 1.963 3.078 6.314 12.71 31.82 63.66 318.3 636.6
2 0.000 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 22.32 31.59
3 0.000 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 10.21 12.92
4 0.000 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 0.000 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 5.893 6.869
6 0.000 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 0.000 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 0.000 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 0.000 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 0.000 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 0.000 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 0.000 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 0.000 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 0.000 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 0.000 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 0.000 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 0.000 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 0.000 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 0.000 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 0.000 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.552 3.850
21 0.000 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.527 3.819
22 0.000 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.505 3.792
23 0.000 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.485 3.768
24 0.000 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.467 3.745
25 0.000 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.450 3.725
26 0.000 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.435 3.707
27 0.000 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.421 3.690
28 0.000 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 0.000 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 0.000 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.385 3.646
40 0.000 0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 3.307 3.551
60 0.000 0.679 0.848 1.045 1.296 1.671 2.000 2.390 2.660 3.232 3.460
80 0.000 0.678 0.846 1.043 1.292 1.664 1.990 2.374 2.639 3.195 3.416
100 0.000 0.677 0.845 1.042 1.290 1.660 1.984 2.364 2.626 3.174 3.390
One-tailed p values:
0.50 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

$df = (n_1 - 1) + (n_2 - 1)$
Table 2. Critical Values for Chi-Square tests

Expected and observed are NOT significantly different
Expected and observed
ARE significantly different
p: 0.99 0.95 0.90 0.75 0.50 0.25 0.10 0.05 0.025 0.01
df
1 0.0002 0.003 0.015 0.10 0.45 1.32 2.70 3.84 5.02 6.63
2 0.0201 0.102 0.210 0.57 1.38 2.77 4.60 5.99 7.37 9.21
3 0.1148 0.351 0.584 1.21 2.36 4.10 6.25 7.81 9.34 11.34
4 0.2971 0.710 1.063 1.92 3.35 5.38 7.77 9.48 11.14 13.27
5 0.5543 1.145 1.610 2.67 4.35 6.62 9.23 11.07 12.83 15.08
6 0.8721 1.635 2.204 3.45 5.34 7.84 10.64 12.59 14.44 16.81
7 1.2390 2.167 2.833 4.25 6.34 9.03 12.01 14.06 16.01 18.47
8 1.6465 2.732 3.489 5.07 7.34 10.21 13.36 15.50 17.53 20.09
9 2.0879 3.325 4.168 5.89 8.34 11.38 14.68 16.91 19.02 21.66
10 2.5582 3.940 4.865 6.73 9.34 12.54 15.98 18.30 20.48 23.20

$$ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} $$

Degrees of freedom equals the
number of groups being compared
minus one
(df = n-1)

Critical values that, at the given degrees of freedom,
indicate the given p-values

1108K Lab 1: Statistics Postlab Name___________________________________

1. Let's say that one time you drank a soda before a test and did better on that test than you ever have
before. Design an experiment using the other students in this class to determine if drinking any of the
following: soda, milk, or water, improves performance on a test over the other drinks. You DO NOT
have to perform this experiment, just think through the design. DO NOT MAKE UP DATA. Provide
the following:
a. Question

b. Null Hypothesis

c. Alternate Hypothesis

d. Experimental design. Make sure to keep all variables the same except the one of interest.
Also remember that you need replication to piece apart any variation.

e. Type of data you would collect. What will you measure exactly?

f. Statistical test you need to analyze the results.

2. Why is standard error always smaller than standard deviation?

3. What does a p-value of 0.03 mean? Be specific in your interpretation without making up data.

4. Is it more likely that the data depicted on the right has a high r² or a low r²?

5. What does that mean?

6. If you increase the sample size, what happens to the critical value (the number you have to reach to
find a significant difference) of a t-test?

7. Using a t-test, determine if these two groups are statistically different. Be sure to show your t-test
work (filled in equation) and report the t-value and p-value.

You are trying to determine whether breastfeeding or bottle-feeding is the best method to speed up
the growth of a human baby. You surveyed 15 women who breastfed their babies. Their babies
gained 17 pounds on average (variance = 5) over a 3-week period. You surveyed 20 women with
similar age babies who bottle-fed their child. Their babies gained 12 pounds on average (variance =
4) over a 3-week period.
a. Question

b. Null Hypothesis

c. Alternate Hypothesis

d. What are your results and conclusions? (Do you accept or reject your hypothesis?)

8. For this part, I want you to perform a t-test on data you collect yourself. Nothing too complicated (no
need to perform an experiment), you just need to collect continuous, quantitative data of two groups
and test to see if there is a difference between their means. Analyze this data using a t-test. Provide
the following.
a. Question:

b. Null Hypothesis:

c. Alternate Hypothesis:

d. Your data: e. Complete the table
Group 1 Group 2
Mean
Median
Mode
Variance
Standard Deviation
Standard Error

f. Your calculations for the t-test

g. Your conclusions, including the t statistic, p-value, and interpretation of this p-value.

h. Attach a graph of your results using Excel. Show the means and the standard error. Label
your axes.
Group 1 Group 2
Individual Data Individual Data
