Statistics 4040

Khodabocus Aihjaaz
REVISION NOTES FOR STATISTICS

AVERAGE
Mean
There are three types of average: mean, mode and median. The range, also covered
below, measures spread rather than average. The mean is what most people mean when
they say 'average'. It is found by adding up all of the numbers and dividing by how
many numbers there are. So the mean of 3, 5, 7, 3 and 5 is 23/5 = 4.6.
When you are given data which has been grouped, the mean is Σfx / Σf, where f is the
frequency and x is the midpoint of the group (Σ means 'the sum of').
Example: Work out an estimate for the mean height.

Height (cm)   Number of people (f)   Midpoint (x)   fx (f multiplied by x)
101-120       1                      110.5          110.5
121-130       3                      125.5          376.5
131-140       5                      135.5          677.5
141-150       7                      145.5          1018.5
151-160       4                      155.5          622
161-170       2                      165.5          331
171-190       1                      180.5          180.5
Σfx = 3316.5
Σf = 23
mean = 3316.5/23 = 144 cm (3 s.f.)
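The grouped-mean estimate above can be checked with a short Python sketch. The function name `estimate_grouped_mean` is hypothetical, chosen for illustration; the class boundaries and frequencies are copied from the table.

```python
# Class boundaries and frequencies copied from the height table above.
classes = [(101, 120), (121, 130), (131, 140), (141, 150),
           (151, 160), (161, 170), (171, 190)]
frequencies = [1, 3, 5, 7, 4, 2, 1]


def estimate_grouped_mean(classes, frequencies):
    """Estimate the mean of grouped data as (sum of f*x) / (sum of f)."""
    # x is the midpoint of each class.
    midpoints = [(low + high) / 2 for low, high in classes]
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_f = sum(frequencies)
    return sum_fx / sum_f


mean_height = estimate_grouped_mean(classes, frequencies)
print(round(mean_height, 1))  # 144.2
```

The result is only an estimate because each person is treated as if their height were exactly the midpoint of their group.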
Mode
The mode is the number in a set of numbers which occurs most often. So the modal value
of 5, 6, 3, 4, 5, 2, 5 and 3 is 5, because there are more 5s than any other number.
Range
The range is the largest number in a set minus the smallest number. So the range of 5,
7, 9 and 14 is (14 - 5) = 9.
The Median Value

The median of a group of numbers is the number in the middle, when the numbers are
in order of magnitude. For example, if the set of numbers is 4, 1, 6, 2, 6, 7, 8, the
median is 6:
1, 2, 4, 6, 6, 7, 8 (6 is the middle value when the numbers are in order)
If you have n numbers in a group, the median is the (n + 1)/2 th value. For example,
there are 7 numbers in the example above, so replace n by 7 and the median is the (7 +
1)/2 th value = 4th value. The 4th value is 6.
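As a quick check of the worked examples in this section, here is a minimal Python sketch using the standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# Mean: add the numbers up and divide by how many there are.
mean_value = statistics.mean([3, 5, 7, 3, 5])            # 4.6

# Mode: the value that occurs most often.
modal_value = statistics.mode([5, 6, 3, 4, 5, 2, 5, 3])  # 5

# Range: largest value minus smallest value.
range_data = [5, 7, 9, 14]
range_value = max(range_data) - min(range_data)          # 9

# Median: the middle value once the numbers are in order.
median_data = [4, 1, 6, 2, 6, 7, 8]
median_value = statistics.median(median_data)            # 6

# The (n + 1)/2 rule gives the position of the median in the sorted list.
median_position = (len(median_data) + 1) // 2            # 4th value
assert sorted(median_data)[median_position - 1] == median_value
```

Note that the `(n + 1) // 2` line only works as written when n is odd, as it is here.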
Cumulative Frequency
This is the running total of the frequencies. On a graph, it can be represented by a
cumulative frequency polygon, where straight lines join up the points, or a cumulative
frequency curve.
Example:
Frequency   Cumulative frequency
4           4
6           10 (4 + 6)
3           13 (4 + 6 + 3)
2           15 (4 + 6 + 3 + 2)
6           21 (4 + 6 + 3 + 2 + 6)
4           25 (4 + 6 + 3 + 2 + 6 + 4)
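The running total in the table above can be reproduced with `itertools.accumulate` from the Python standard library. This is a small sketch; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies copied from the table above.
frequencies = [4, 6, 3, 2, 6, 4]

# Each entry is the sum of all frequencies up to and including that row.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 10, 13, 15, 21, 25]
```

The last entry is always the total frequency, which is the value of n used when reading values off a cumulative frequency curve.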
To find the median from a cumulative frequency curve, take n to be the total cumulative
frequency (25 in the above example). The median is the (n + 1)/2 th value, so here it is the
13th value. To find this, locate 13 on the y-axis of the cumulative frequency curve (which
should be labelled 'cumulative frequency') and read across to the curve. The corresponding
x value is an estimate of the median.
Quartiles
If we divide a cumulative frequency curve into quarters, the value at the lower quarter is
referred to as the lower quartile, the value at the middle gives the median and the value at
the upper quarter is the upper quartile.
A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these
numbers is 19.625. However, the extremes in this set (8 and 50) distort this value. The
interquartile range measures the spread of the middle 50% of the values
and is useful since it ignores the extreme values.
The lower quartile is the (n + 1)/4 th value (n is the cumulative frequency, i.e. 157 in this case)
and the upper quartile is the 3(n + 1)/4 th value. The difference between these two is the
interquartile range (IQR).
In the above example, the upper quartile is the 118.5th value and the lower quartile is the
39.5th value. If we draw a cumulative frequency curve, we see that the lower quartile,
therefore, is about 17 and the upper quartile is about 37. Therefore the IQR is 20 (bear in
mind that this is a rough sketch; if you plot the values on graph paper you will get a more
accurate value).
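The quartile position formulas can be sketched in Python as follows. The function names are hypothetical. Reading a value at a fractional position by linear interpolation between neighbouring values is an assumption made for this sketch; the notes read the values off a curve instead.

```python
def quartile_positions(n):
    """Return the (lower, upper) quartile positions for n values."""
    return (n + 1) / 4, 3 * (n + 1) / 4


def value_at(sorted_values, position):
    """Read off the value at a 1-based, possibly fractional, position.

    Fractional positions are handled by linear interpolation between
    neighbours; this is an assumption, not something the notes specify.
    """
    whole = int(position)
    fraction = position - whole
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


# Positions for the cumulative frequency example with n = 157.
print(quartile_positions(157))  # (39.5, 118.5)

# IQR of the short list of numbers, which ignores the extremes 8 and 50.
values = sorted([8, 14, 15, 16, 17, 18, 19, 50])
lq_pos, uq_pos = quartile_positions(len(values))
iqr = value_at(values, uq_pos) - value_at(values, lq_pos)
print(iqr)  # 4.5
```

Other textbooks and software use slightly different quartile conventions, so small differences from this result are to be expected.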
Histograms
Histograms are similar to bar charts apart from the consideration of areas. In a bar
chart, all of the bars are the same width and the only thing that matters is the
height of the bar. In a histogram, the area is the important thing.
Example: Draw a histogram for the following information.

Height (feet)   Frequency (number of pupils)   Relative frequency
0-2             0                              0
2-4             1                              1
4-5             4                              8
5-6             8                              16
6-8             2                              2
(Ignore relative frequency for now). It is difficult to draw a bar chart for this
information, because the class divisions for the height are not the same. The height
is grouped 0-2, 2-4 etc, but not all of the groups are the same size. For example the
4-5 group is smaller than the 0-2 group.
When drawing a histogram, the y-axis is labelled 'relative frequency' or 'frequency
density'. You must work out the relative frequency before you can draw a
histogram. To do this, first choose a standard width for the groups. Some
of the heights are grouped into 2s (0-2, 2-4, 6-8) and some into 1s (4-5, 5-6). Most
are 2s, so we shall call the standard width 2. To make the areas match, we must
double the frequencies of the groups which have a class width of 1 (since 1 is half of
2). Therefore the figures for the 4-5 and 5-6 groups must be doubled. If any of
the class widths were 4 (for example if there were an 8-12 group), its frequency
would be halved. This is because the width of this bar would be twice the standard
width of 2, so its area would be twice as large as it should be unless we halve the
frequency.
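The scaling described above amounts to multiplying each frequency by (standard width ÷ class width). A minimal Python sketch, with the hypothetical function name `bar_heights`, reproduces the 'relative frequency' column of the table:

```python
# Class boundaries and frequencies copied from the table above.
classes = [(0, 2), (2, 4), (4, 5), (5, 6), (6, 8)]
frequencies = [0, 1, 4, 8, 2]
STANDARD_WIDTH = 2  # most of the groups are 2 feet wide


def bar_heights(classes, frequencies, standard_width):
    """Scale each frequency so that bar area, not height, shows frequency."""
    heights = []
    for (low, high), f in zip(classes, frequencies):
        width = high - low
        # Narrow classes are scaled up, wide classes are scaled down.
        heights.append(f * standard_width / width)
    return heights


print(bar_heights(classes, frequencies, STANDARD_WIDTH))
# [0.0, 1.0, 8.0, 16.0, 2.0]
```

With this rule a group of width 4, such as 8-12, would have its frequency halved, matching the reasoning in the text.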
PROBABILITY
Introduction
Probability is the likelihood or chance of an event occurring.
Probability = (the number of ways of achieving success) / (the total number of possible outcomes),
provided all of the outcomes are equally likely.
For example, the probability of flipping a coin and it being heads is ½, because there is 1
way of getting a head and the total number of possible outcomes is 2 (a head or tail).
We write P(heads) = ½ .
The probability of something which is certain to happen is 1.

The probability of something which is impossible to happen is 0.
The probability of something not happening is 1 minus the probability that it will happen.
Single Events
Example:
There are 6 beads in a bag, 3 are red, 2 are yellow and 1 is blue. What is the probability
of picking a yellow?
The probability is the number of yellows in the bag divided by the total number of beads,
i.e. 2/6 = 1/3.
Example:
There is a bag full of coloured balls, red, blue, green and orange. Balls are picked out
and replaced. John did this 1000 times and obtained the following results:
Number of blue balls picked out: 300
Number of red balls: 200
Number of green balls: 450
Number of orange balls: 50
a) What is the probability of picking a green ball?

b) If there are 100 balls in the bag, how many of them are likely to be green?
a) For every 1000 balls picked out, 450 are green. Therefore P(green) = 450/1000 =
0.45
b) The experiment suggests that 450 out of every 1000 balls are green. Therefore, out of 100
balls, about 45 are likely to be green (using ratios).
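John's experiment can be written out as a short Python sketch. The dictionary and variable names are illustrative. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Results of John's 1000 picks, copied from the example above.
counts = {"blue": 300, "red": 200, "green": 450, "orange": 50}
total_picks = sum(counts.values())

# a) Estimated probability = times green was picked / total picks.
p_green = Fraction(counts["green"], total_picks)
print(float(p_green))  # 0.45

# b) Expected number of green balls if the bag holds 100 balls.
expected_green = p_green * 100
print(expected_green)  # 45
```

Because the probability comes from an experiment, 45 is an estimate of the number of green balls rather than a guaranteed count.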
Multiple Events
When working out what the probability of two things happening is, a probability/
possibility space can be drawn. For example, if you throw two dice, what is the
probability that you will get: a) 8, b) 9, c) either 8 or 9?
a) The black blobs indicate the ways of getting 8 (a 2 and a 6, a 3 and a 5, ...). There
are 5 different ways. The probability space shows us that when throwing 2 dice, there
are 36 different possibilities (36 squares). With 5 of these possibilities, you will get 8.
Therefore P(8) = 5/36 .

b) The red blobs indicate the ways of getting 9. There are four ways, therefore P(9) =
4/36 = 1/9.
c) You will get an 8 or 9 in any of the 'blobbed' squares. There are 9 altogether, so P(8
or 9) = 9/36 = 1/4 .
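The probability space for two dice can also be listed by computer. The sketch below enumerates all 36 equally likely outcomes and counts the ones with the required total. The function name `probability_of_totals` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Every (first die, second die) pair: the 36 squares of the probability space.
outcomes = list(product(range(1, 7), repeat=2))


def probability_of_totals(totals):
    """Probability that the two dice add up to one of the given totals."""
    favourable = [pair for pair in outcomes if sum(pair) in totals]
    return Fraction(len(favourable), len(outcomes))


p8 = probability_of_totals({8})
p9 = probability_of_totals({9})
p8_or_9 = probability_of_totals({8, 9})
print(p8, p9, p8_or_9)  # 5/36 1/9 1/4
```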
Another way of representing 2 or more events is on a probability tree.
Example:
There are 3 balls in a bag: red, yellow and blue. One ball is picked out, and not
replaced, and then another ball is picked out.
The first ball can be red, yellow or blue. The probability is 1/3 for each of these. If a red
ball is picked out, there will be two balls left, a yellow and blue. The probability the
second ball will be yellow is 1/2 and the probability the second ball will be blue is 1/2.
The same logic can be applied to the cases of when a yellow or blue ball is picked out
first.
In this example, the question states that the ball is not replaced. If it were replaced, the
probability of picking a red ball (etc.) the second time would be the same as the first (i.e.
1/3).
The AND and OR rules

In the above example, the probability of picking a red first is 1/3 and a yellow second is
1/2. The probability that a red AND then a yellow will be picked is 1/3 × 1/2 = 1/6 (this
is shown at the end of the branch).
The probability of picking a red OR yellow first is 1/3 + 1/3 = 2/3.
When the word 'and' is used we multiply the probabilities along the branches. When 'or' is
used for outcomes that cannot both happen at once, we add. On a probability tree, when
moving from left to right we multiply and when moving down we add.
Example:
What is the probability of getting a yellow and a red in any order?
This is the same as: what is the probability of getting a yellow AND a red OR a red AND
a yellow.
P(yellow and red) = 1/3 × 1/2 = 1/6
P(red and yellow) = 1/3 × 1/2 = 1/6
P(yellow and red or red and yellow) = 1/6 + 1/6 = 1/3
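The tree for picking two balls without replacement can be checked by listing every ordered pair of picks. This is a minimal sketch; the helper `probability` is hypothetical, and it relies on the six ordered pairs being equally likely.

```python
from fractions import Fraction
from itertools import permutations

balls = ["red", "yellow", "blue"]

# Ordered pairs of different balls: the six end points of the tree.
ordered_picks = list(permutations(balls, 2))


def probability(event):
    """Fraction of the equally likely ordered picks that satisfy the event."""
    favourable = sum(1 for pick in ordered_picks if event(pick))
    return Fraction(favourable, len(ordered_picks))


# AND rule: red first and then yellow, 1/3 x 1/2.
p_red_then_yellow = probability(lambda pick: pick == ("red", "yellow"))

# OR rule: red first or yellow first, 1/3 + 1/3.
p_red_or_yellow_first = probability(lambda pick: pick[0] in ("red", "yellow"))

# Yellow and red in any order, 1/6 + 1/6.
p_yellow_and_red = probability(lambda pick: set(pick) == {"red", "yellow"})

print(p_red_then_yellow, p_red_or_yellow_first, p_yellow_and_red)  # 1/6 2/3 1/3
```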
There are a number of ways of representing data diagrammatically. (See also Histograms.)
Scatter Graphs
These are used to compare two sets of data. A line of best fit is drawn, which should
pass as close as possible to as many points as it can, with roughly the same number of
points above and below it.
The less scatter there is about the best-fit line, the stronger the relationship is between
the two quantities. If the points are close to the best-fit line, we say that there is a
strong correlation. If the points are loosely scattered, there is a weak correlation. There
is no correlation if there is no trend in the results.
Bar Chart
A bar chart is a chart where the height of bars represents the frequency. The data is
'discrete' (discontinuous, unlike histograms where the data is continuous). The bars
should be separated by small gaps.
Pie Chart
A pie chart is a circle which is divided into a number of parts.
The pie chart above shows the TV viewing figures for the following TV programmes:
Eastenders, 15 million
Casualty, 10 million
Peak Practice, 5 million
The Bill, 8 million
The total number of viewers for the four programmes is 38 million. To work out the angle
that 'Eastenders' will have in the pie chart, we divide 15 by 38 and multiply by 360
(degrees). This is 142 degrees to the nearest degree. So 142 degrees of the circle
represents Eastenders. Similarly, 95 degrees of the circle is Casualty, 47 degrees is Peak
Practice and the remaining 76 degrees is The Bill.
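The angle calculation is the same for every sector: (value ÷ total) × 360. A short Python sketch using the viewing figures above, with illustrative variable names:

```python
# Viewing figures in millions, copied from the example above.
viewers_millions = {
    "Eastenders": 15,
    "Casualty": 10,
    "Peak Practice": 5,
    "The Bill": 8,
}
total = sum(viewers_millions.values())  # 38

# Each sector's angle is its share of the total, multiplied by 360 degrees.
angles = {name: round(value / total * 360)
          for name, value in viewers_millions.items()}
print(angles)
# {'Eastenders': 142, 'Casualty': 95, 'Peak Practice': 47, 'The Bill': 76}
```

Here the rounded angles happen to add up to exactly 360. With other data, rounding each angle separately can leave the total a degree or so out, so it is worth checking the sum.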
Sampling
When examining a particular population it is usually advisable to choose a small sample in
such a way that every group in the population is represented. This is not easy and requires
careful thought about sample size and composition. Often questionnaires are devised to
identify the required information. These need to be foolproof, so questions need to cover
all alternatives and give little scope for variation.
Example question:
A bus company attempted to estimate the number of people who travel on local buses in a
certain town. They telephoned 100 people in the town one evening and asked 'Have you
travelled by bus in the last week?'
Nineteen people said 'Yes'. The bus company concluded that 19% of the town's population
travel on local buses.
Give 3 criticisms of this method of estimation.
In answering this question, there is no single set of three correct answers. As long as what
you say is plausible and sensible, you should get the marks. For example, you might say:
A sample of 100 people may be too small to represent the population of a large town.
People who travel on local buses less often, say once a fortnight, may have answered 'No'
even though they do travel on local buses.
Anybody travelling by bus on the evening of the survey would have been out and so could
not have answered the telephone.
Standard Deviation
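σ (lower case sigma) means 'standard deviation'.
Σ (capital sigma) means 'the sum of'.
x̄ (x bar) means 'the mean'.
The formula for the standard deviation is σ = √( Σ(x − x̄)² / n ), where n is the number of
values.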
The standard deviation measures the spread of the data about the mean value. It is
useful in comparing sets of data which may have the same mean but a different range.
For example, the mean of the following two is the same: 15, 15, 15, 14, 16 and 2, 7,
14, 22, 30. However, the second is clearly more spread out. If a set has a low standard
deviation, the values are not spread out too much.
Example:
Find the standard deviation of 4, 9, 11, 12, 17, 5, 8, 12, 14
First work out the mean: 10.222
Now, subtract the mean individually from each of the numbers in the question and
square the result. This is equivalent to the (x − x̄)² step. x refers to the values in the
question.
x           4      9      11     12     17     5      8      12     14
(x − x̄)²    38.7   1.49   0.60   3.16   45.9   27.3   4.94   3.16   14.3
Now add up these results (this is the 'sigma' in the formula): 139.55
Divide by n. n is the number of values, so in this case is 9: 15.51
And finally, square root this: 3.94
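The steps of this worked example translate directly into Python. This is a minimal sketch with the hypothetical function name `population_sd`. It divides by n, as the notes do, rather than by n − 1.

```python
import math

# Values copied from the worked example above.
values = [4, 9, 11, 12, 17, 5, 8, 12, 14]


def population_sd(values):
    """Standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n                                  # x bar
    squared_deviations = [(x - mean) ** 2 for x in values]  # (x - x bar)^2
    return math.sqrt(sum(squared_deviations) / n)           # divide by n, then root


sd = population_sd(values)
print(round(sd, 2))  # 3.94
```

The standard library function `statistics.pstdev` performs the same divide-by-n calculation, so it can be used as an independent check.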
The standard deviation can usually be calculated much more easily with a calculator and
this is usually acceptable in exams. With some calculators, you go into the standard
deviation mode (often mode '.'). Then type in the first value, press 'data', type in the
second value, press 'data'. Do this until you have typed in all the values, then press the
standard deviation button (it will probably have a lower case sigma on it). Check your
calculator's manual to see how to calculate it on yours.
NB: If each number in a set (e.g. 1, 5, 2, 7, 3, 5 and 3) is
increased by the same amount (e.g. to 3, 7, 4, 9, 5, 7 and 5), the standard deviation
stays the same and the mean increases by the amount that was added to each number
(2 in this case).
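This property is easy to confirm numerically. The sketch below uses `statistics.pstdev` (the divide-by-n standard deviation) from the Python standard library. The comparison uses `math.isclose` to allow for floating-point rounding.

```python
import math
import statistics

original = [1, 5, 2, 7, 3, 5, 3]
shifted = [x + 2 for x in original]  # 3, 7, 4, 9, 5, 7, 5

# Adding the same amount to every value does not change the spread.
same_spread = math.isclose(statistics.pstdev(original),
                           statistics.pstdev(shifted))

# The mean moves by exactly the amount added.
mean_shift = statistics.mean(shifted) - statistics.mean(original)

print(same_spread)  # True
```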
When dealing with data such as the following:

x    f
4    9
5    14
6    22
7    11
8    17
the formula for standard deviation becomes: σ = √( Σf(x − x̄)² / Σf ), where f is the
frequency of each value x and x̄ = Σfx / Σf.
Try working out the standard deviation of the above data. You should get an answer of
1.32.
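The answer of 1.32 can be checked with a short Python sketch, assuming the frequency-weighted population formula σ = √( Σf(x − x̄)² / Σf ). The function name `frequency_table_sd` is hypothetical.

```python
import math

# Values and frequencies copied from the table above.
xs = [4, 5, 6, 7, 8]
fs = [9, 14, 22, 11, 17]


def frequency_table_sd(values, frequencies):
    """Standard deviation of data given as a value/frequency table."""
    total_f = sum(frequencies)
    # The mean weights each value by how often it occurs.
    mean = sum(f * x for x, f in zip(values, frequencies)) / total_f
    # Each squared deviation is also weighted by its frequency.
    variance = sum(f * (x - mean) ** 2
                   for x, f in zip(values, frequencies)) / total_f
    return math.sqrt(variance)


table_sd = frequency_table_sd(xs, fs)
print(round(table_sd, 2))  # 1.32
```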
