
Statistics: A crash course

Zoe Ireland
z.ireland@uq.edu.au

Topics to cover....
Basic nomenclature
Descriptive statistics
Hypothesis testing: error, power, assumptions
Types of parametric tests (t-test, ANOVA, post-hoc)
Types of non-parametric tests

Useful resources:
Statistics for the Behavioural Sciences. Gravetter & Wallnau.
Research Design and Statistics. Thomas Edwards.
SPSS Survival Manual: A Step-by-Step Guide to Data Analysis Using SPSS for Windows (Version 15). J. Pallant.
G*Power - http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/

Basic Nomenclature
Independent & dependent variables
Scales of measurement
Types of experimental design

Variables: the IV and DV


The independent variable (IV) is defined by the researcher.
The dependent variable (DV) is measured by the researcher (it depends on the independent variable).

Examples (IV → DV):
Diet type (control or high salt) → blood pressure in animals fed a control or high-salt diet
Activity levels (restricted or unrestricted) → adiposity levels with restricted or unrestricted activity
Treatment (saline or anti-depressant drug) → anxiety level following placebo or drug

Scales of Measurement for DV


The scale of measurement limits the statistical tests that can be used.

1. Nominal scale categorises observations
   Behaviour: grooming, feeding, playing
   Genotype: heterozygote or homozygote

2. Ordinal scale ranks observations
   First, second, third, etc.
   The interval between rankings doesn't have to be equal
   e.g. scores of 10, 11, 15, 21, 26 ranked 1-5

Scales of Measurement for DV


3. Interval scale is a normal number line with no true zero
   Zero is not the lowest possible value
   e.g. bodyweight after treatment expressed relative to bodyweight before treatment (difference score)

4. Ratio scale is a normal number line with a true zero
   There is no value lower than zero
   e.g. bodyweight, blood pressure, protein expression

Experimental Method
1. Independent measures - subjects assigned to a SINGLE treatment
   e.g. cardiovascular function in rats born from normal or hypertensive mothers

2. Repeated measures - subjects assigned to ALL treatments
   e.g. in vitro cardiac contractility under hypoxic, normoxic and hyperoxic conditions

3. Paired samples - subjects assigned to one treatment, compared to matched subjects in another treatment
   e.g. match rats in sex, bodyweight & age when measuring BP on a control or high-salt diet

Descriptive Statistics
Measures of central tendency
Measures of variability

Measures of Central Tendency


Describe an entire group of scores with a single value:
Median - middle score or 50th percentile
Mode - most common score
Mean - average score

18 20 21 23 24 26 27 29 30 31 31 31
Median = 26.5 Mean = 25.9 Mode = 31
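
These values can be checked with a short Python sketch (not part of the original slides; the standard statistics module is assumed available):

# Central tendency for the example scores above
import statistics

scores = [18, 20, 21, 23, 24, 26, 27, 29, 30, 31, 31, 31]

print("Mean   =", round(statistics.mean(scores), 1))   # 25.9
print("Median =", statistics.median(scores))           # 26.5
print("Mode   =", statistics.mode(scores))             # 31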

Central tendency: Which one?

The most appropriate measure of central tendency depends on the distribution of scores.

[Figure: example distributions showing the relative positions of the mode, median and mean - symmetrical, bimodal/no mode, and skewed distributions where the three measures separate.]

Almost always use the Mean


But sometimes the Median or Mode is more appropriate

When to use the Median....


Data measured on an ordinal (ranked) scale
There are a few extreme scores (skewed distributions)

[Figure: positively and negatively skewed distributions; in each case the order from the peak towards the tail is mode, median, mean.]

When to use the Mode....


Data measured on a nominal scale

[Figure: bar graph of frequency (0-30) by source of stress: papers, supervisor, funding, PhD, postdoc.]

The mode can be a more sensible measure because it is an actual value:
The average family has 2.4 children and 1.8 motor vehicles
The modal family has 2 children and 2 motor vehicles

Measures of Variability
A single number that describes how spread out the data are:
Range, IQR, semi-IQR
Standard deviation
Standard error of the mean

1. Describes the distribution - are scores spread out or clustered?
2. Describes how well a single score represents the entire sample
   Low variability - a small sample can be representative; high variability - a larger sample is needed

Measures of variability: Range


Range: difference between the lowest and highest score
  Determined by the extreme values - not sensitive to the distribution
Interquartile range (IQR): difference between the 1st and 3rd quartiles
  Ignores the top & bottom 25% of scores (very extreme scores)
  Focuses on the middle 50% of scores
Semi-IQR: half of the IQR
  Ignores the top & bottom 37.5% of scores (extreme scores)
  Focuses on the middle 25% of scores
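
As an illustration only (not from the original slides), these measures can be sketched in Python with numpy, using the example scores from earlier:

# Range, IQR and semi-IQR (illustrative sketch)
import numpy as np

scores = np.array([18, 20, 21, 23, 24, 26, 27, 29, 30, 31, 31, 31])

data_range = scores.max() - scores.min()
q1, q3 = np.percentile(scores, [25, 75])   # 1st and 3rd quartiles
iqr = q3 - q1
semi_iqr = iqr / 2                         # half of the IQR

print("Range    =", data_range)
print("IQR      =", iqr)
print("Semi-IQR =", semi_iqr)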

Measures of variability: SD
The standard deviation describes variability by considering the distance between each raw score and the mean
[Figure: distribution with the mean marked; each score lies a negative or positive distance (deviation) from the mean.]

Measures of variability: SEM


The sample mean should be representative of the population mean, but there is always some error.
The size of this error is estimated by the standard error of the mean:
SEM = standard deviation / √N

Sample size (N) is a determining factor in SEM:
A large sample better reflects the population than a small sample
Therefore, as sample size increases, the SEM decreases
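
A minimal Python sketch of SD and SEM (illustrative only, using the example scores from earlier):

# SD and SEM = SD / sqrt(N)
import numpy as np

scores = np.array([18, 20, 21, 23, 24, 26, 27, 29, 30, 31, 31, 31])

n = len(scores)
sd = scores.std(ddof=1)     # sample standard deviation
sem = sd / np.sqrt(n)       # standard error of the mean

print("SD  =", round(sd, 2))
print("SEM =", round(sem, 2))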

When to use Range, SD & SEM?


Range/IQR describes the difference between low and high scores
  A crude measure - doesn't consider the distances between scores
  Often used for ordinal data, or very small samples (e.g. N = 3-4)

SD describes the variability within your sample
SEM describes the variability between the sample mean and the population mean

What is it you are trying to show with your data?

Basics of Hypothesis Testing


Nomenclature
Alpha level
Type I error
Type II error
Power
Assumptions

The process of Hypothesis Testing


The hypothesis states that the treatment will have an effect (H1) or no effect (H0)
  Directional (one-tailed)
  Non-directional (two-tailed)

The null hypothesis (H0) states that in the population there will be no change, no difference, or no relationship
  In the experiment, the IV will not affect the DV

The alternative hypothesis (H1) states that in the population there is a change, a difference, or a relationship
  In the experiment, the IV will affect the DV

The process of Hypothesis Testing

[Figure: distribution of sample outcomes under H0 - high-probability values fall in the middle region if H0 is true; low-probability values fall in the two tails.]

Need to define high and low probability

The process of Hypothesis Testing


The alpha level, or level of significance, is used to identify sample scores that are very unlikely to occur if H0 is true - usually α = .05
These very unlikely values in the tails of the distribution are said to fall in the critical region

[Figure: distribution with the critical region (extreme 5%) shaded in the tails - reject H0 for values in the tails, accept H0 for values in the middle.]

The process of Hypothesis Testing


Collect data and summarise it using descriptive statistics:
  Central tendency (mean, median or mode)
  Variability (range, SD or SEM)

Run the statistical test, then make a decision about the hypothesis:

Reject H0 if sample values fall in the critical region
  Conclude that the treatment did have an effect

Accept H0 if sample values do not fall in the critical region
  Conclude that the treatment did not have an effect
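
The decision rule can be sketched in Python with scipy (the blood pressure values below are made up for illustration; they are not the slides' own data):

# Decision rule with an independent-samples t-test
from scipy import stats

control   = [118, 122, 119, 125, 121, 117, 123]   # hypothetical blood pressures
high_salt = [128, 131, 126, 134, 129, 127, 132]

alpha = 0.05
t_stat, p_value = stats.ttest_ind(control, high_salt)

if p_value < alpha:
    print("Reject H0: treatment appears to have an effect, p =", round(p_value, 4))
else:
    print("Fail to reject H0: no detectable treatment effect, p =", round(p_value, 4))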

BUT there are errors in hypothesis testing

Type I Error
A Type I error is when you incorrectly reject H0:
  You conclude the treatment has an effect when in fact the treatment has no effect
  This happens when the sample data fall in the critical region by chance

[Figure: distribution with critical regions in both tails; the sample mean falls in a critical region by chance, not because of the treatment.]

Type I Error
The probability of the sample data falling in the critical region by chance (a Type I error) is equal to alpha:
If α = .05, there is a 5% chance of a Type I error
If the treatment effect is significant (p < .05), the probability that the difference occurred by chance is less than 5%

[Figure: distribution divided into 'reject H0, p < .05' regions in the tails and a 'fail to reject H0, p > .05' middle 95%.]

Type II Error
A Type II error (β) is when you incorrectly accept H0:
  You conclude the treatment has no effect when in fact the treatment does have an effect
  The sample data do not fall in the critical region even though the treatment had an effect
  This usually happens when the treatment effect is small
  It may be overcome by increasing the sample size

Power in hypothesis testing


Type I and Type II error are about the potential for making an error in hypothesis testing.
The power of a hypothesis test is about the potential to reach the correct decision.
Power is defined as the probability that the statistical test will correctly detect a true treatment effect.
The more powerful a statistical test, the more readily it will detect a treatment effect when one really exists.

Power in hypothesis testing


The concepts of statistical power and Type II error are closely related:
  Type II error (β): the probability of failing to detect a treatment effect
  Power: the probability of correctly detecting a treatment effect

Power = 1 - β
If the Type II error β = 0.2, then Power = 0.8, i.e. an 80% chance of correctly detecting a treatment effect

Power in hypothesis testing


Power is related to sample size, alpha level and the size of the treatment effect:

1. Sample size - increasing the sample size will increase the power
   A larger sample better represents the population
   So a treatment effect is more likely to be detected

2. Alpha level - decreasing the alpha level will decrease the power
   As the alpha value decreases, the beta value increases
   Power = 1 - β

3. Treatment effect - increasing the size of the treatment effect will increase the power
   When the treatment effect is large, power is high
   When the treatment effect is small, power is low

Power in hypothesis testing


Because power is related to sample size, alpha level and the size of the treatment effect, you can calculate the sample size needed to achieve a statistical power > 0.8.
This allows estimation of the sample size required to detect a treatment effect (should one exist).

Values involved in the calculation:
  Power = 0.8
  Alpha value = 0.05
  Estimated treatment effect size = ?
  Sample size = ?


Power in hypothesis testing


Calculating the estimated treatment effect size (see the sketch below):
  Search the literature for a study similar to yours
  Find the mean & SD (or variance) for the treatment groups
  Use these to calculate the effect size (Cohen's d)
    Small effect size: Cohen's d = 0.20
    Medium effect size: Cohen's d = 0.50
    Large effect size: Cohen's d = 0.80
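
A minimal sketch of the Cohen's d calculation (the group means, SDs and sample sizes below are made-up stand-ins for values taken from the literature):

# Cohen's d from published group means and SDs
import math

mean_treated, sd_treated, n_treated = 95.0, 12.0, 10
mean_control, sd_control, n_control = 105.0, 14.0, 10

# Pooled standard deviation
pooled_sd = math.sqrt(((n_treated - 1) * sd_treated**2 + (n_control - 1) * sd_control**2)
                      / (n_treated + n_control - 2))

cohens_d = abs(mean_treated - mean_control) / pooled_sd
print("Cohen's d =", round(cohens_d, 2))   # about 0.77 here: a medium-to-large effect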

Use computer software to do these calculations!
G*Power is free online software: www.psycho.uni-duesseldorf.de/aap/projects/gpower/ (or just Google "G*Power")

Steps in G*Power:
1. Test family: t-test, F-test, etc.
2. Statistical test: repeated measures, independent measures, etc.
3. Input: tails, alpha, beta
4. Effect size (d): select "Determine"
5. Look in the literature - add data from a study that is as similar as possible to yours
6. Output parameters: sample size, actual power
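
If you prefer scripting to G*Power, the same calculation can be sketched in Python with statsmodels (assumed installed); the inputs mirror the values on the previous slide:

# Sample size per group for 80% power, independent-samples t-test
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.8,    # estimated Cohen's d from the literature
                                   alpha=0.05,
                                   power=0.8,
                                   alternative='two-sided')
print("Animals needed per group:", round(n_per_group))  # roughly 26 for these inputs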


Why is Power useful?


Before an experiment:
  How many animals do you need in each group? Is it feasible? Ethics applications!

During an experiment:
  Interim analysis - how many more animals are needed?

After an experiment:
  What was the treatment effect size? Why wasn't significance detected?

Assumptions of hypothesis testing


Statistical tests have assumptions that underlie their use
Should you use parametric or non-parametric statistics?
Two key assumptions apply to parametric statistics:
  Normality
  Homogeneity of variance
Additional assumptions may apply to specific tests

Assumptions: Normality
Normality: the distribution of sample means is normal

[Figure: normal curve with the proportion of scores in each standard-deviation band - 34.13%, 13.59% and 2.28%.]

Most tests are robust to violations of normality (it is frequently violated!)
  Usually okay if the distribution is symmetrical
  If the distribution is skewed, it is only okay if all groups are skewed in the same direction


Assumptions: Testing Normality


Formal tests for normality (using SPSS):
  Kolmogorov-Smirnov
  Shapiro-Wilk
Tests of Normality - Bodyweight (output given for each group of the independent variable, 3 groups)

                     Kolmogorov-Smirnov(a)          Shapiro-Wilk
Treatment            Statistic   df   Sig.          Statistic   df   Sig.
Controls             .180        13   .200*         .955        13   .677
Asphyxia             .155        13   .200*         .939        13   .450
Creatine Asphyxia    .174        9    .200*         .936        9    .544

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

A non-significant (p > .05) result indicates normality
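
Outside SPSS, the same check can be sketched in Python with scipy's Shapiro-Wilk test (the bodyweight values below are invented for illustration):

# Shapiro-Wilk normality test for each treatment group
from scipy import stats

groups = {
    "Controls":          [3.1, 3.4, 3.0, 3.6, 3.3, 3.2, 3.5],
    "Asphyxia":          [2.8, 3.0, 2.9, 3.1, 2.7, 3.0, 2.9],
    "Creatine Asphyxia": [3.0, 3.2, 2.9, 3.1, 3.0, 3.3, 3.1],
}

for name, values in groups.items():
    stat, p = stats.shapiro(values)
    verdict = "looks normal" if p > 0.05 else "normality questionable"
    print(f"{name}: W = {stat:.3f}, p = {p:.3f} ({verdict})")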

Normality can also be assessed by graphing the distribution of scores (see the plotting sketch below):

Histogram
  Frequency bar graph - is the distribution bell shaped?

Normal Q-Q plot
  Scores plotted against expected scores - is the line reasonably straight?

Box plot
  The box represents the middle 50% of scores - is the box in the middle of the horizontal line?

Detrended normal Q-Q plot
  Plots the deviation of scores from the straight line - do most points lie around zero?
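
A minimal plotting sketch with matplotlib and scipy (assumed available; the data are made up):

# Histogram and normal Q-Q plot for one group
import matplotlib.pyplot as plt
from scipy import stats

bodyweight = [3.1, 3.4, 3.0, 3.6, 3.3, 3.2, 3.5, 3.3, 3.1, 3.4]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(bodyweight, bins=5)                          # roughly bell shaped?
ax1.set_title("Histogram")
stats.probplot(bodyweight, dist="norm", plot=ax2)     # points near a straight line?
ax2.set_title("Normal Q-Q Plot")
plt.tight_layout()
plt.show()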

Assumptions: Homogeneity of Variance


Homogeneity of variance assumes the samples are taken from populations of equal variance
  The variability of scores for each of the groups is similar

Levene's test is a formal test for homogeneity of variance
  A non-significant result (p > .05) indicates the assumption is NOT violated
  SPSS can run Levene's test for t-tests and for ANOVA (a scipy sketch follows below)
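
A scipy sketch of Levene's test (illustrative data only, not the slides' own):

# Levene's test for homogeneity of variance
from scipy import stats

control   = [118, 122, 119, 125, 121, 117, 123]
high_salt = [128, 131, 126, 134, 129, 127, 132]

stat, p = stats.levene(control, high_salt)
if p > 0.05:
    print("Assumption not violated, p =", round(p, 3))
else:
    print("Variances differ significantly, p =", round(p, 3))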


Assumptions: Homogeneity of Variance


When this assumption is violated, SPSS gives alternative outputs for t-tests
Independent Samples Test - Bodyweight

Levene's Test for Equality of Variances: F = .423, Sig. = .521

t-test for Equality of Means
                              t        df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI (Lower, Upper)
Equal variances assumed       -1.190   24       .246              -26.53846    22.30309           (-72.56979, 19.49286)
Equal variances not assumed   -1.190   23.837   .246              -26.53846    22.30309           (-72.58643, 19.50950)

ANOVA is robust enough to handle violation of this assumption if sample sizes are similar (largest group no more than about 1.5 times the smallest)
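
In scipy, the "equal variances not assumed" row corresponds to Welch's correction via equal_var=False; a hedged sketch with made-up bodyweights:

# t-test with and without the equal-variance assumption
from scipy import stats

group_a = [310, 295, 330, 305, 320, 300, 315]
group_b = [340, 360, 325, 355, 345, 370, 335]

t_equal, p_equal = stats.ttest_ind(group_a, group_b)                    # equal variances assumed
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)   # Welch's correction

print("Equal variances assumed:     t = %.3f, p = %.3f" % (t_equal, p_equal))
print("Equal variances not assumed: t = %.3f, p = %.3f" % (t_welch, p_welch))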

Quick Summary
Independent and dependent variables
Scales of measurement
Types of experimental designs
Measures of central tendency and variability
Hypothesis testing:
  Type I error
  Type II error
  Power
  Assumptions

