
TYPES OF DATA & DESCRIPTIVE STATISTICS
OVERVIEW

 Operational definitions

 Types of variable and types of data

 Descriptive statistics

 Central tendency: Mean, median, and mode

 Dispersion: Range, interquartile range, and standard deviation


OPERATIONAL DEFINITIONS

 Cannot measure/ observe psychological constructs directly


 Watching EastEnders is depressing
 But what do we mean by depressing?

 So, must define variables in terms of measures that we use


 An operational definition
 Depression = viewers crying, score on a diagnostic questionnaire,
number of chocolates eaten per episode etc.

The way in which we choose to operationalise a construct will determine the type of variable and data that we have in our study
TYPES OF VARIABLE

 Categorical variables
 Involves placing observations into categories
 Do our participants cry during EastEnders – YES or NO?

 Measured/ continuous variables


 Each observation is assigned a value on a scale
 Scores on the Beck Depression Inventory after watching EastEnders

Many psychological constructs can be treated as either categorical or measured variables. As the researcher, we choose which
TYPES OF DATA

Definition       Variable      Data

Operationalise   Measured      Ordinal, Interval/Ratio
                 Categorical   Nominal
TYPES OF DATA

 Nominal data
 Names and categorises observations
 1 = did cry, 2 = did not cry

 Ordinal data
 Orders observations using some kind of scale
 Rank these programmes from most to least depressing

 Interval data
 Measures observations in terms of units of numerical difference
 Score on a depression scale

 Ratio data
 Same as interval level, but with a true, non-arbitrary zero-point
 Number of comfort chocolates eaten
TYPES OF DATA

Interval/ ratio
 Most informative
 Mary (20-mins) vs. Bruce (10)

Ordinal
 Less informative
 Mary cried > Bruce

Nominal
 Least informative
 Mary and Bruce both cried

Data can be reduced but not raised through levels. However, information is lost as we reduce.
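
The reduction from interval/ ratio data down to ordinal and then nominal data can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the crying times are the Mary and Bruce values from the slide.

```python
# Crying time in minutes (ratio data), taken from the Mary/Bruce example.
crying_minutes = {"Mary": 20, "Bruce": 10}

# Ratio -> ordinal: keep only the order of the scores (1 = cried longest).
ranked = sorted(crying_minutes, key=crying_minutes.get, reverse=True)
ordinal_rank = {name: position for position, name in enumerate(ranked, start=1)}

# Ratio -> nominal: keep only whether each person cried at all.
nominal_cried = {name: minutes > 0 for name, minutes in crying_minutes.items()}

print(ordinal_rank)   # Mary is ranked above Bruce, but by how much is lost
print(nominal_cried)  # both cried; the order is lost as well
```

Going the other way is not possible: the minutes cannot be recovered from the ranks, and the ranks cannot be recovered from the yes/no categories.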
AN EXAMPLE

Construct
 Alertness while driving

Operational definition
 RT in a driving simulator

Type of variable
 Measured

Type of data
 Interval/ ratio

How else might we operationalise this variable?
DESCRIPTIVE STATISTICS

 Statistics that we use to summarise our data


 Describe to the reader what you have found
 Indicate the pattern of results

 Central tendency
 Indicates the average value in the data set
 Mean, median, mode

 Dispersion
 Indicates the spread of scores – how are they distributed?
 Range, interquartile range, standard deviation

The statistics that we use to summarise our results depend on how we have operationalised – what type of variable and data do we have?
CENTRAL TENDENCY: MEAN

$$\bar{x} = \frac{\sum x}{N}$$

where $\bar{x}$ is the mean of the sample, $\sum$ means "sum of", $x$ is each score, and $N$ is the number of values in the data set
CENTRAL TENDENCY: MEAN

 Advantages
 Sensitive: Takes value of each data point into account

 Disadvantages
 Can be distorted by rogue data points (extreme values)

 Type of variable
 Measured

 Type of data
 Interval/ ratio
AN EXAMPLE: MEAN

 Team score on a pub quiz over a 10-week period


 80, 65, 53, 44, 39, 51, 77, 35, 56, 61

$$\bar{x} = \frac{\sum x}{N} = \frac{80 + 65 + 53 + 44 + 39 + 51 + 77 + 35 + 56 + 61}{10} = \frac{561}{10} = 56.1$$
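
The same calculation can be checked in Python. This is a minimal sketch using the ten quiz scores above; the variable names are illustrative, and `statistics.mean` from the standard library is used only as a cross-check.

```python
from statistics import mean

# Team scores on the pub quiz over the 10-week period.
quiz_scores = [80, 65, 53, 44, 39, 51, 77, 35, 56, 61]

total = sum(quiz_scores)   # sum of x
n = len(quiz_scores)       # N, the number of values
mean_score = total / n     # mean = sum of x divided by N

print(total, n, mean_score)
print(mean(quiz_scores))   # standard-library cross-check
```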
CENTRAL TENDENCY: MEDIAN & MODE

 Median
 Central value when the data set is arranged sequentially

 Mode
 The most commonly occurring value in the data set
CENTRAL TENDENCY: MEDIAN

 Advantages
 Not distorted by rogue data points (extreme values)

 Disadvantages
 Less sensitive: Does not take value of each data point into account

 Type of variable
 Measured

 Type of data
 Ordinal (interval/ ratio when mean is likely to be biased)
AN EXAMPLE: MEDIAN

 Finishing position in a pub quiz over a 9-week period

 3rd, 4th, 1st, 1st, 5th, 3rd, 3rd, 2nd, 6th

 Arrange the data set sequentially

 1, 1, 2, 3, 3, 3, 4, 5, 6

Median = 3
 Median position = (N + 1) / 2
 = (9 + 1) / 2 = the 5th value, which is 3
AN EXAMPLE: MEDIAN

 Finishing position in a pub quiz over a 10-week period

 3rd, 4th, 1st, 1st, 5th, 3rd, 3rd, 2nd, 6th, 4th

 Arrange the data set sequentially

 1, 1, 2, 3, 3, 3, 4, 4, 5, 6

Median = (3 + 3) / 2 = 3
 Median position = (N + 1) / 2
 = (10 + 1) / 2 = 5.5, i.e. midway between the 5th value (3) and the 6th value (3), which is 3
AN EXAMPLE: MEDIAN

 Finishing position in a pub quiz over a 10-week period

 3rd, 4th, 1st, 1st, 5th, 4th, 3rd, 2nd, 6th, 4th

 Arrange the data set sequentially

 1, 1, 2, 3, 3, 4, 4, 4, 5, 6

Median = (3 + 4) / 2 = 3.5
 Median position = (N + 1) / 2
 = (10 + 1) / 2 = 5.5, i.e. midway between the 5th value (3) and the 6th value (4), which is 3.5
CENTRAL TENDENCY: MODE

 Advantages
 Not distorted by rogue data points (extreme values)

 Disadvantages
 Less sensitive: Does not take value of each data point into account

 Type of variable
 Categorical

 Type of data
 Nominal
AN EXAMPLE: MODE

 How drunk we got during the quiz over a 10-week period


 1 = extremely, 2 = moderately, 3 = slightly, 4 = not at all
 1, 2, 2, 2, 3, 1, 4, 2, 3, 3

 Mode
 2 (moderately drunk)

 Modal frequency
 4 (we were moderately drunk 4 times out of 10)

The statistics that we use to summarise our results depend on how we have operationalised – what type of variable and data do we have?
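
For nominal codes like the drunkenness ratings in the mode example, the mode and modal frequency can be found by counting how often each category occurs. A small sketch using `collections.Counter` from the Python standard library (the variable names are illustrative):

```python
from collections import Counter

# Category codes from the example: how drunk we got each week.
labels = {1: "extremely", 2: "moderately", 3: "slightly", 4: "not at all"}
ratings = [1, 2, 2, 2, 3, 1, 4, 2, 3, 3]

counts = Counter(ratings)                             # how often each code occurs
mode_value, modal_frequency = counts.most_common(1)[0]

print(mode_value, labels[mode_value], modal_frequency)
```

Note that the code numbers are only names for categories here, so taking their mean would not be meaningful.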
DISPERSION

 A mean by itself is useless!


 Also need to know the distribution of scores around the mean

[Figure: two groups, each labelled "Mean age = 35 years"]
DISPERSION

 A mean by itself is useless!


 Means can be very similar but spread of scores may differ greatly

Low variability (scores clustered around mean)

High variability (scores widely spread)
DISPERSION: THE RANGE

 Range = (highest score – lowest score) + 1


 Larger values = greater variability/ spread

(36-34)+1 = 3

(69-1)+1 = 69
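
The two range calculations can be reproduced with a short Python sketch. It uses the formula given on this slide (highest minus lowest, plus 1). The extreme ages 34, 36, 1 and 69 come from the slide; the middle value of 35 in the first group is an assumption made here so that both groups have a mean of 35.

```python
def inclusive_range(scores):
    """Range as defined on the slide: (highest score - lowest score) + 1."""
    return (max(scores) - min(scores)) + 1

narrow_ages = [34, 35, 36]   # middle value 35 is assumed for illustration
wide_ages = [1, 35, 69]

print(inclusive_range(narrow_ages))  # small range: scores clustered together
print(inclusive_range(wide_ages))    # large range: scores widely spread
```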
DISPERSION: THE RANGE

 Interquartile range
 Spread of the middle 50% of scores
 i.e. difference between the upper quartile and the lower quartile

 Semi-interquartile range
 Interquartile range / 2

Both of these values tell us something about the central grouping of scores. However, they ignore extreme values
DISPERSION: THE RANGE

 Child’s score on weekly spelling test over a 23-week period

 Arrange the data set sequentially


 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9

Lower quartile = 3     Median = 5     Upper quartile = 7

 Interquartile range = upper quartile – lower quartile = 4

 Semi-interquartile range = 4 / 2 = 2
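
The quartiles for the spelling test scores can be checked with `statistics.quantiles` from the Python standard library (Python 3.8 or later). This is a sketch under one assumption: the slide does not say which quartile convention it uses, and different conventions can give slightly different answers. The library's default method agrees with the slide for this data set.

```python
from statistics import quantiles

# Child's weekly spelling test scores over the 23-week period, already sorted.
spelling_scores = [1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5,
                   5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9]

# n=4 splits the data into quarters and returns the three cut points.
lower_quartile, median_score, upper_quartile = quantiles(spelling_scores, n=4)

interquartile_range = upper_quartile - lower_quartile
semi_interquartile_range = interquartile_range / 2

print(lower_quartile, median_score, upper_quartile)
print(interquartile_range, semi_interquartile_range)
```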
DISPERSION: DEVIATION

 Deviation
 Difference between an individual score and the mean

 Mean deviation
 The average difference between each individual score and the mean

 The standard deviation


 Estimate of the mean deviation for the population
 (More on populations and samples later)

In your practical reports we will ask you to present the mean value
and the standard deviation for each condition of your experiment
DISPERSION: DEVIATION

Mean value = 35 years

$$d = x - \bar{x}$$

Oscar’s deviation = 1 – 35 = -34
Boris’ deviation = 35 – 35 = 0
Percy’s deviation = 69 – 35 = 34
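
As a quick check, the three deviations can be computed in Python by subtracting the mean from each score. The dictionary layout and names below are illustrative only.

```python
# Ages from the example: each deviation is the score minus the mean.
ages = {"Oscar": 1, "Boris": 35, "Percy": 69}

mean_age = sum(ages.values()) / len(ages)
deviations = {name: age - mean_age for name, age in ages.items()}

for name, deviation in deviations.items():
    print(name, deviation)
```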
DISPERSION: MEAN DEVIATION

Mean age = 35 years

$$\text{Mean deviation} = \frac{\sum |d|}{N}, \quad d = x - \bar{x}$$

Mean deviation = (34 + 0 + 34) / 3 = 22.67

Remove any minus signs when calculating the mean deviation, otherwise it will always be equal to zero!
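
The point about minus signs can be seen directly in a short Python sketch using the same three ages (variable names are illustrative): the signed deviations cancel out, while the absolute deviations give the mean deviation.

```python
ages = [1, 35, 69]
mean_age = sum(ages) / len(ages)

# Keeping the minus signs: positive and negative deviations cancel.
signed_total = sum(age - mean_age for age in ages)

# Removing the minus signs (absolute values) before averaging.
mean_deviation = sum(abs(age - mean_age) for age in ages) / len(ages)

print(signed_total)
print(round(mean_deviation, 2))
```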
DISPERSION: STANDARD DEVIATION

Mean age = 35 years

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$

Standard deviation = √(((-34)² + 0² + 34²) / 2) = √(2312 / 2) = √1156 = 34
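
The formula can be followed step by step in Python. The sketch below uses the same three ages and cross-checks the result against `statistics.stdev` from the standard library, which also divides by N - 1; the variable names are illustrative.

```python
import math
from statistics import stdev

ages = [1, 35, 69]
n = len(ages)
mean_age = sum(ages) / n

# Square each deviation so that negative and positive values do not cancel.
sum_of_squares = sum((age - mean_age) ** 2 for age in ages)

# Divide by N - 1, then take the square root.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(sum_of_squares, sample_sd)
print(stdev(ages))  # standard-library cross-check
```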
DISPERSION: STANDARD DEVIATION

 DON’T PANIC!

 SPSS can calculate these things for you

 You don’t need to memorise the formulas

 Just remember which stat to use when and what they mean

 See the SPSS video demos on Moodle


SUMMARY: KEY TERMINOLOGY

 Operational definition
 Categorical variable
 Measured variable
 Nominal data
 Ordinal data
 Interval/ ratio data
 Central tendency
 Mean
 Median
 Mode
 Dispersion
 Range
 Standard deviation
READING

 Coolican, H. (2014). Research methods and statistics in psychology (Chapter 13). London: Hodder Stoughton.

 Look out for the new SPSS videos on Moodle!
