
Statistics: A crash course

Zoe Ireland
z.ireland@uq.edu.au

Topics to cover....
Basic nomenclature
Descriptive statistics
Hypothesis testing: error, power, assumptions
Types of parametric tests (t-test, ANOVA, post-hoc)
Types of non-parametric tests

Useful resources:
Statistics for the Behavioural Sciences. Gravetter & Wallnau.
Research Design and Statistics. Thomas Edwards.
SPSS Survival Manual: A Step-by-Step Guide to Data Analysis Using SPSS for Windows (Version 15). J. Pallant.
G*Power - http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/

Basic Nomenclature
Independent & dependent variables
Scales of measurement
Types of experimental design

Variables: the IV and DV


The independent variable (IV) is defined by the researcher.
The dependent variable (DV) is measured by the researcher (it depends on the independent variable).

Examples (IV → DV):
Diet type (control or high salt) → blood pressure in animals fed a control or high-salt diet
Activity levels (restricted or unrestricted) → adiposity levels with restricted or unrestricted activity
Treatment (saline or anti-depressant drug) → anxiety level following placebo or drug

Scales of Measurement for DV


The scale of measurement limits the statistical tests that can be used.

1. Nominal scale categorises observations
   Behaviour: grooming, feeding, playing
   Genotype: heterozygote or homozygote

2. Ordinal scale ranks observations
   First, second, third, etc.
   The interval between rankings doesn't have to be equal
   e.g. scores of 10, 11, 15, 21, 26 ranked 1-5

Scales of Measurement for DV


3. Interval scale is a normal number line with no true zero
   Zero is not the lowest possible value
   e.g. bodyweight after treatment expressed relative to bodyweight before treatment (difference score)

4. Ratio scale is a normal number line with a true zero
   There is no value lower than zero
   e.g. bodyweight, blood pressure, protein expression

Experimental Method
1. Independent measures - subjects assigned to a SINGLE treatment
   e.g. cardiovascular function in rats born from normal or hypertensive mothers

2. Repeated measures - subjects assigned to ALL treatments
   e.g. in vitro cardiac contractility under hypoxic, normoxic and hyperoxic conditions

3. Paired samples - subjects assigned to one treatment, compared to matched subjects in another treatment
   e.g. match rats in sex, bodyweight & age when measuring BP on a control or high-salt diet

Descriptive Statistics
Measures of central tendency
Measures of variability

Measures of Central Tendency


Describe an entire group of scores with a single value:
Median - middle score or 50th percentile
Mode - most common score
Mean - average score

18 20 21 23 24 26 27 29 30 31 31 31
Median = 26.5 Mean = 25.9 Mode = 31
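
These values can be checked with a short Python sketch (not part of the original slides; the standard statistics module is assumed available):

# Central tendency for the example scores above
import statistics

scores = [18, 20, 21, 23, 24, 26, 27, 29, 30, 31, 31, 31]

print("Mean   =", round(statistics.mean(scores), 1))   # 25.9
print("Median =", statistics.median(scores))           # 26.5
print("Mode   =", statistics.mode(scores))             # 31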

Central tendency: Which one?

The most appropriate measure of central tendency depends on the distribution of scores.

[Figure: example distributions showing the relative positions of the mode, median and mean - symmetrical, bimodal/no mode, and skewed distributions where the three measures separate.]

Almost always use the Mean


But sometimes the Median or Mode is more appropriate

When to use the Median....


Data measured on an ordinal (ranked) scale
There are a few extreme scores (skewed distributions)

[Figure: positively and negatively skewed distributions; in each case the order from the peak towards the tail is mode, median, mean.]

When to use the Mode....


Data measured on a nominal scale

[Figure: bar graph of frequency (0-30) by source of stress: papers, supervisor, funding, PhD, postdoc.]

The mode can be a more sensible measure because it is an actual value:
The average family has 2.4 children and 1.8 motor vehicles
The modal family has 2 children and 2 motor vehicles

Measures of Variability
A single number that describes how spread out the data are:
Range, IQR, semi-IQR
Standard deviation
Standard error of the mean

1. Describes the distribution - are scores spread out or clustered?
2. Describes how well a single score represents the entire sample
   Low variability - a small sample can be representative; high variability - a larger sample is needed

Measures of variability: Range


Range: difference between the lowest and highest score
  Determined by the extreme values - not sensitive to the distribution
Interquartile range (IQR): difference between the 1st and 3rd quartiles
  Ignores the top & bottom 25% of scores (very extreme scores)
  Focuses on the middle 50% of scores
Semi-IQR: half of the IQR
  Ignores the top & bottom 37.5% of scores (extreme scores)
  Focuses on the middle 25% of scores
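
As an illustration only (not from the original slides), these measures can be sketched in Python with numpy, using the example scores from earlier:

# Range, IQR and semi-IQR (illustrative sketch)
import numpy as np

scores = np.array([18, 20, 21, 23, 24, 26, 27, 29, 30, 31, 31, 31])

data_range = scores.max() - scores.min()
q1, q3 = np.percentile(scores, [25, 75])   # 1st and 3rd quartiles
iqr = q3 - q1
semi_iqr = iqr / 2                         # half of the IQR

print("Range    =", data_range)
print("IQR      =", iqr)
print("Semi-IQR =", semi_iqr)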

Measures of variability: SD
The standard deviation describes variability by considering the distance between each raw score and the mean
[Figure: distribution with the mean marked; each score lies a negative or positive distance (deviation) from the mean.]

Measures of variability: SEM


The sample mean should be representative of the population mean, but there is always some error.
The size of this error is estimated by the standard error of the mean:
SEM = standard deviation / √N

Sample size (N) is a determining factor in SEM:
A large sample better reflects the population than a small sample
Therefore, as sample size increases, the SEM decreases
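
A minimal Python sketch of SD and SEM (illustrative only, using the example scores from earlier):

# SD and SEM = SD / sqrt(N)
import numpy as np

scores = np.array([18, 20, 21, 23, 24, 26, 27, 29, 30, 31, 31, 31])

n = len(scores)
sd = scores.std(ddof=1)     # sample standard deviation
sem = sd / np.sqrt(n)       # standard error of the mean

print("SD  =", round(sd, 2))
print("SEM =", round(sem, 2))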

When to use Range, SD & SEM?


Range/IQR describes the difference between low and high scores
  A crude measure - doesn't consider the distances between scores
  Often used for ordinal data, or very small samples (e.g. N = 3-4)

SD describes the variability within your sample
SEM describes the variability between the sample mean and the population mean

What is it you are trying to show with your data?

Basics of Hypothesis Testing


Nomenclature
Alpha level
Type I error
Type II error
Power
Assumptions

The process of Hypothesis Testing


The hypothesis states that the treatment will have an effect (H1) or no effect (H0)
  Directional (one-tailed)
  Non-directional (two-tailed)

The null hypothesis (H0) states that in the population there will be no change, no difference, or no relationship
  In the experiment, the IV will not affect the DV

The alternative hypothesis (H1) states that in the population there is a change, a difference, or a relationship
  In the experiment, the IV will affect the DV

The process of Hypothesis Testing

[Figure: distribution of sample outcomes under H0 - high-probability values fall in the middle region if H0 is true; low-probability values fall in the two tails.]

Need to define high and low probability

The process of Hypothesis Testing


The alpha level, or level of significance, is used to identify sample scores that are very unlikely to occur if H0 is true - usually α = .05
These very unlikely values in the tails of the distribution are said to fall in the critical region

[Figure: distribution with the critical region (extreme 5%) shaded in the tails - reject H0 for values in the tails, accept H0 for values in the middle.]

The process of Hypothesis Testing


Collect data and summarise it using descriptive statistics:
  Central tendency (mean, median or mode)
  Variability (range, SD or SEM)

Run the statistical test, then make a decision about the hypothesis:

Reject H0 if sample values fall in the critical region
  Conclude that the treatment did have an effect

Accept H0 if sample values do not fall in the critical region
  Conclude that the treatment did not have an effect
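
The decision rule can be sketched in Python with scipy (the blood pressure values below are made up for illustration; they are not the slides' own data):

# Decision rule with an independent-samples t-test
from scipy import stats

control   = [118, 122, 119, 125, 121, 117, 123]   # hypothetical blood pressures
high_salt = [128, 131, 126, 134, 129, 127, 132]

alpha = 0.05
t_stat, p_value = stats.ttest_ind(control, high_salt)

if p_value < alpha:
    print("Reject H0: treatment appears to have an effect, p =", round(p_value, 4))
else:
    print("Fail to reject H0: no detectable treatment effect, p =", round(p_value, 4))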

BUT there are errors in hypothesis testing

Type I Error
A Type I error is when you incorrectly reject H0:
  You conclude the treatment has an effect when in fact the treatment has no effect
  This happens when the sample data fall in the critical region by chance

[Figure: distribution with critical regions in both tails; the sample mean falls in a critical region by chance, not because of the treatment.]

Type I Error
The probability of the sample data falling in the critical region by chance (a Type I error) is equal to alpha:
If α = .05, there is a 5% chance of a Type I error
If the treatment effect is significant (p < .05), the probability that the difference occurred by chance is less than 5%

[Figure: distribution divided into 'reject H0, p < .05' regions in the tails and a 'fail to reject H0, p > .05' middle 95%.]

Type II Error
A Type II error (β) is when you incorrectly accept H0:
  You conclude the treatment has no effect when in fact the treatment does have an effect
  The sample data do not fall in the critical region even though the treatment had an effect
  This usually happens when the treatment effect is small
  It may be overcome by increasing the sample size

Power in hypothesis testing


Type I and Type II error are about the potential for making an error in hypothesis testing.
The power of a hypothesis test is about the potential to reach the correct decision.
Power is defined as the probability that the statistical test will correctly detect a true treatment effect.
The more powerful a statistical test, the more readily it will detect a treatment effect when one really exists.

Power in hypothesis testing


The concepts of statistical power and Type II error are closely related:
  Type II error (β): the probability of failing to detect a treatment effect
  Power: the probability of correctly detecting a treatment effect

Power = 1 - β
If the Type II error β = 0.2, then Power = 0.8, i.e. an 80% chance of correctly detecting a treatment effect

Power in hypothesis testing


Power is related to sample size, alpha level and the size of the treatment effect:

1. Sample size - increasing the sample size will increase the power
   A larger sample better represents the population
   So a treatment effect is more likely to be detected

2. Alpha level - decreasing the alpha level will decrease the power
   As the alpha value decreases, the beta value increases
   Power = 1 - β

3. Treatment effect - increasing the size of the treatment effect will increase the power
   When the treatment effect is large, power is high
   When the treatment effect is small, power is low

Power in hypothesis testing


Because power is related to sample size, alpha level and the size of the treatment effect, you can calculate the sample size needed to achieve a statistical power > 0.8.
This allows estimation of the sample size required to detect a treatment effect (should one exist).

Values involved in the calculation:
  Power = 0.8
  Alpha value = 0.05
  Estimated treatment effect size = ?
  Sample size = ?


Power in hypothesis testing


Calculating the estimated treatment effect size (see the sketch below):
  Search the literature for a study similar to yours
  Find the mean & SD (or variance) for the treatment groups
  Use these to calculate the effect size (Cohen's d)
    Small effect size: Cohen's d = 0.20
    Medium effect size: Cohen's d = 0.50
    Large effect size: Cohen's d = 0.80
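
A minimal sketch of the Cohen's d calculation (the group means, SDs and sample sizes below are made-up stand-ins for values taken from the literature):

# Cohen's d from published group means and SDs
import math

mean_treated, sd_treated, n_treated = 95.0, 12.0, 10
mean_control, sd_control, n_control = 105.0, 14.0, 10

# Pooled standard deviation
pooled_sd = math.sqrt(((n_treated - 1) * sd_treated**2 + (n_control - 1) * sd_control**2)
                      / (n_treated + n_control - 2))

cohens_d = abs(mean_treated - mean_control) / pooled_sd
print("Cohen's d =", round(cohens_d, 2))   # about 0.77 here: a medium-to-large effect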

Use computer software to do these calculations!
G*Power is free online software: www.psycho.uni-duesseldorf.de/aap/projects/gpower/ (or just Google "G*Power")

Steps in G*Power:
1. Test family: t-test, F-test, etc.
2. Statistical test: repeated measures, independent measures, etc.
3. Input: tails, alpha, beta
4. Effect size (d): select "Determine"
5. Look in the literature - add data from a study that is as similar as possible to yours
6. Output parameters: sample size, actual power
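
If you prefer scripting to G*Power, the same calculation can be sketched in Python with statsmodels (assumed installed); the inputs mirror the values on the previous slide:

# Sample size per group for 80% power, independent-samples t-test
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.8,    # estimated Cohen's d from the literature
                                   alpha=0.05,
                                   power=0.8,
                                   alternative='two-sided')
print("Animals needed per group:", round(n_per_group))  # roughly 26 for these inputs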


Why is Power useful?


Before an experiment:
  How many animals do you need in each group? Is it feasible? Ethics applications!

During an experiment:
  Interim analysis - how many more animals are needed?

After an experiment:
  What was the treatment effect size? Why wasn't significance detected?

Assumptions of hypothesis testing


Statistical tests have assumptions that underlie their use
Should you use parametric or non-parametric statistics?
Two key assumptions apply to parametric statistics:
  Normality
  Homogeneity of variance
Additional assumptions may apply to specific tests

Assumptions: Normality
Normality: the distribution of sample means is normal

[Figure: normal curve with the proportion of scores in each standard-deviation band - 34.13%, 13.59% and 2.28%.]

Most tests are robust to violations of normality (it is frequently violated!)
  Usually okay if the distribution is symmetrical
  If the distribution is skewed, it is only okay if all groups are skewed in the same direction


Assumptions: Testing Normality


Formal tests for normality (using SPSS):
  Kolmogorov-Smirnov
  Shapiro-Wilk
Tests of Normality - Bodyweight (output given for each group of the independent variable, 3 groups)

                     Kolmogorov-Smirnov(a)          Shapiro-Wilk
Treatment            Statistic   df   Sig.          Statistic   df   Sig.
Controls             .180        13   .200*         .955        13   .677
Asphyxia             .155        13   .200*         .939        13   .450
Creatine Asphyxia    .174        9    .200*         .936        9    .544

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

A non-significant (p > .05) result indicates normality
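
Outside SPSS, the same check can be sketched in Python with scipy's Shapiro-Wilk test (the bodyweight values below are invented for illustration):

# Shapiro-Wilk normality test for each treatment group
from scipy import stats

groups = {
    "Controls":          [3.1, 3.4, 3.0, 3.6, 3.3, 3.2, 3.5],
    "Asphyxia":          [2.8, 3.0, 2.9, 3.1, 2.7, 3.0, 2.9],
    "Creatine Asphyxia": [3.0, 3.2, 2.9, 3.1, 3.0, 3.3, 3.1],
}

for name, values in groups.items():
    stat, p = stats.shapiro(values)
    verdict = "looks normal" if p > 0.05 else "normality questionable"
    print(f"{name}: W = {stat:.3f}, p = {p:.3f} ({verdict})")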

Normality can also be assessed by graphing the distribution of scores (see the plotting sketch below):

Histogram
  Frequency bar graph - is the distribution bell shaped?

Normal Q-Q plot
  Scores plotted against expected scores - is the line reasonably straight?

Box plot
  The box represents the middle 50% of scores - is the box in the middle of the horizontal line?

Detrended normal Q-Q plot
  Plots the deviation of scores from the straight line - do most points lie around zero?
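
A minimal plotting sketch with matplotlib and scipy (assumed available; the data are made up):

# Histogram and normal Q-Q plot for one group
import matplotlib.pyplot as plt
from scipy import stats

bodyweight = [3.1, 3.4, 3.0, 3.6, 3.3, 3.2, 3.5, 3.3, 3.1, 3.4]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(bodyweight, bins=5)                          # roughly bell shaped?
ax1.set_title("Histogram")
stats.probplot(bodyweight, dist="norm", plot=ax2)     # points near a straight line?
ax2.set_title("Normal Q-Q Plot")
plt.tight_layout()
plt.show()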

Assumptions: Homogeneity of Variance


Homogeneity of variance assumes the samples are taken from populations of equal variance
  The variability of scores for each of the groups is similar

Levene's test is a formal test for homogeneity of variance
  A non-significant result (p > .05) indicates the assumption is NOT violated
  SPSS can run Levene's test for t-tests and for ANOVA (a scipy sketch follows below)
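
A scipy sketch of Levene's test (illustrative data only, not the slides' own):

# Levene's test for homogeneity of variance
from scipy import stats

control   = [118, 122, 119, 125, 121, 117, 123]
high_salt = [128, 131, 126, 134, 129, 127, 132]

stat, p = stats.levene(control, high_salt)
if p > 0.05:
    print("Assumption not violated, p =", round(p, 3))
else:
    print("Variances differ significantly, p =", round(p, 3))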


Assumptions: Homogeneity of Variance


When this assumption is violated, SPSS gives alternative outputs for t-tests
Independent Samples Test - Bodyweight

Levene's Test for Equality of Variances: F = .423, Sig. = .521

t-test for Equality of Means
                              t        df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI (Lower, Upper)
Equal variances assumed       -1.190   24       .246              -26.53846    22.30309           (-72.56979, 19.49286)
Equal variances not assumed   -1.190   23.837   .246              -26.53846    22.30309           (-72.58643, 19.50950)

ANOVA is robust enough to handle violation of this assumption if sample sizes are similar (largest group no more than about 1.5 times the smallest)
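
In scipy, the "equal variances not assumed" row corresponds to Welch's correction via equal_var=False; a hedged sketch with made-up bodyweights:

# t-test with and without the equal-variance assumption
from scipy import stats

group_a = [310, 295, 330, 305, 320, 300, 315]
group_b = [340, 360, 325, 355, 345, 370, 335]

t_equal, p_equal = stats.ttest_ind(group_a, group_b)                    # equal variances assumed
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)   # Welch's correction

print("Equal variances assumed:     t = %.3f, p = %.3f" % (t_equal, p_equal))
print("Equal variances not assumed: t = %.3f, p = %.3f" % (t_welch, p_welch))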

Quick Summary
Independent and dependent variables
Scales of measurement
Types of experimental designs
Measures of central tendency and variability
Hypothesis testing:
  Type I error
  Type II error
  Power
  Assumptions

