Lecturer in Education M.C.T. Training College Malappuram www.sathitech.blogspot.com www.mctinfotech.blog.com sathikalanilayam@gmail.com Mob : 09562253564
Satheesh
Statistics in Education
Statistics - Definition
Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data. - Croxton & Cowden
Statistics
The term Statistics seems to have been derived from the Latin word status, the Italian word statista or the German word statistik, each of which means a political state.
Presentation (Tabulation)
Analysis
Interpretation
NATURE OF DATA
Continuous
Discrete
The marks obtained by 25 students of a class in Mathematics, out of 10 marks, are as follows. Construct a discrete frequency distribution.

1, 7, 6, 5, 9, 10, 5, 6, 8, 2, 7, 8, 3, 8, 3, 1, 4, 4, 5, 6, 4, 3, 2, 6, 7

MARKS    TALLY    No. OF STUDENTS
1        II       2
2        II       2
3        III      3
4        III      3
5        III      3
6        IIII     4
7        III      3
8        III      3
9        I        1
10       I        1
TOTAL             25
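The tallying step above can be sketched in Python. This is only an illustration; the variable names are assumptions, and `collections.Counter` simply stands in for counting by hand.

```python
from collections import Counter

# Marks of the 25 students, as listed in the example.
marks = [1, 7, 6, 5, 9, 10, 5, 6, 8, 2,
         7, 8, 3, 8, 3,
         1, 4, 4, 5, 6, 4, 3, 2, 6, 7]

# Count how many students obtained each mark.
frequency = Counter(marks)

# Print one row per mark: mark, tally strokes, frequency.
for mark in range(1, 11):
    print(mark, "I" * frequency[mark], frequency[mark])
print("TOTAL", sum(frequency.values()))
```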
Class      Tally           Frequency
20 - 29    III             3
30 - 39    IIII            5
40 - 49    IIII IIII II    12
...                        10
...                        7
50 - 59    III             3
TOTAL                      40
TYPES OF CLASSES
Inclusive Classes: 0 - 9, 10 - 19, 20 - 29, 30 - 39, 40 - 49
Lower limit: Included
Upper limit: Included
Exclusive Classes: 0 - 10, 10 - 20, 20 - 30, 30 - 40, 40 - 50
Lower limit: Included
Upper limit: Excluded
CUMULATIVE FREQUENCY DISTRIBUTION
Less Than Cumulative Frequency Distribution
Greater Than Cumulative Frequency Distribution
Class      Frequency
0 - 5      4
5 - 10     7
10 - 15    12
15 - 20    5
20 - 25    2

Answer
Class      Frequency    <CF
0 - 5      4            4
5 - 10     7            (4+7) = 11
10 - 15    12           (4+7+12) = 23
15 - 20    5            (4+7+12+5) = 28
20 - 25    2            (4+7+12+5+2) = 30
Class      Frequency
0 - 5      4
5 - 10     7
10 - 15    12
15 - 20    5
20 - 25    2

Answer
Class      Frequency    >CF
0 - 5      4            (4+7+12+5+2) = 30
5 - 10     7            (7+12+5+2) = 26
10 - 15    12           (12+5+2) = 19
15 - 20    5            (5+2) = 7
20 - 25    2            2
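Both cumulative columns are running totals, taken from the top for <CF and from the bottom for >CF. A minimal sketch, assuming the five classes of the example (list names are illustrative):

```python
from itertools import accumulate

classes = ["0-5", "5-10", "10-15", "15-20", "20-25"]
frequencies = [4, 7, 12, 5, 2]

# Less than CF: running total from the first class downwards.
less_than_cf = list(accumulate(frequencies))

# Greater than CF: running total from the last class upwards.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for row in zip(classes, frequencies, less_than_cf, greater_than_cf):
    print(*row)
```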
Number of classes (Sturges' rule): k = 1 + 3.322 log N
k is the number of classes
N is the total number of observations (log is taken to base 10)
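As a rough illustration, the rule can be evaluated as below. The usual statement of Sturges' rule uses the constant 3.322 with a base-10 logarithm; rounding the result up to a whole number of classes is an assumption made here, not something the slide specifies.

```python
import math

def sturges_classes(n_observations: int) -> int:
    """Suggested number of classes for n_observations (Sturges' rule)."""
    k = 1 + 3.322 * math.log10(n_observations)
    # Assumption: round up so that no observation is left without a class.
    return math.ceil(k)

print(sturges_classes(50))
print(sturges_classes(100))
```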
Histogram
A histogram is a graphical representation of a continuous (grouped) frequency distribution. It is a graph made of vertical rectangles with no space between them. The class intervals are taken along the horizontal axis (X axis) and the respective class frequencies along the vertical axis (Y axis), using suitable scales.
For each class a rectangle is drawn with base equal to the width of the class and height proportional to the class frequency.
The area of each rectangle will be proportional or equal to the frequency of the respective class.
The total area of the histogram will be proportional or equal to the total frequency of the distribution.
Histogram
Class      Frequency
0 - 10     4
10 - 20    10
20 - 30    21
30 - 40    9
40 - 50    4
50 - 60    2
Total      50
Figure: histogram of the above distribution (X axis: class limits 10, 20, 30, 40, 50, 60)
Bar Diagram
It is a graphical representation of data which can be divided into different categories. These diagrams are generally drawn in the shape of horizontal or vertical bars.
The bars should be of equal breadth, and the height of each bar should be proportional to the magnitude of the quantity it represents. Leave equal space between the bars.
Category        No. of Students
Distinction     20
First class     40
Second class    50
Third class     45
Failure         25
Total           180
Frequency Polygon
It is a graphical representation of a continuous frequency distribution. It can be constructed by first drawing a Histogram or by directly plotting the points. To draw a Frequency Polygon from a Histogram, join the mid-points of the tops of the rectangles of the Histogram using straight lines.
A Frequency Polygon can also be drawn by joining consecutive points, plotted by taking the mid-points of the classes on the X-axis and the corresponding frequencies on the Y-axis. The end points are extended at each end to join the X-axis. The total area under the Frequency Polygon is equal to or proportional to (numerically) the total frequency of the given distribution.
Class      Frequency
0 - 10     4
10 - 20    10
20 - 30    21
30 - 40    9
40 - 50    4
50 - 60    2
Total      50
Figures: First Method, Second Method, Third Method
Frequency Curve
It is a graphical representation of a continuous frequency distribution. It can be constructed by first drawing a Histogram or by directly plotting the points. To draw a Frequency Curve from a Histogram, join the mid-points of the tops of the rectangles of the Histogram using a smooth free-hand curve.
A Frequency Curve can also be drawn by joining consecutive points, plotted by taking the mid-points of the classes on the X-axis and the corresponding frequencies on the Y-axis. The end points are extended at each end to join the X-axis. The total area under the Frequency Curve is equal to or proportional to (numerically) the total frequency of the given distribution.
Class      Frequency
0 - 10     4
10 - 20    10
20 - 30    21
30 - 40    9
40 - 50    4
50 - 60    2
Total      50
Figures: First Method, Second Method, Third Method
Construct the Less than Cumulative Frequency Curve for the following frequency distribution
Class      Frequency    <CF
0 - 10     5            5
10 - 20    12           17
20 - 30    28           45
30 - 40    40           85
40 - 50    21           106
50 - 60    10           116
60 - 70    4            120
Construct the Greater than Cumulative Frequency Curve for the following frequency distribution
Class      Frequency    >CF
0 - 10     5            120
10 - 20    12           115
20 - 30    28           103
30 - 40    40           75
40 - 50    21           35
50 - 60    10           14
60 - 70    4            4
Pie Diagram
A Pie diagram consists of a circle whose area is proportional to the magnitude of the variable it represents. The component parts of the variable are represented by sectors of the circle. The area of each sector is proportional to the frequency of the corresponding component part. If A1 and A2 are the total magnitudes of two variables, to represent the data by means of Pie diagrams, draw two circles with radii r1 and r2 chosen so that their areas are in the ratio A1 : A2, that is,
$$\frac{r_1}{r_2} = \sqrt{\frac{A_1}{A_2}}$$
Category        No. of Students    Angle of sector
Distinction     20                 40°
First class     40                 80°
Second class    50                 100°
Third class     45                 90°
Failure         25                 50°
Total           180                360°
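Each sector angle is the category's share of the total, scaled to 360°. A small sketch using the category counts of the example (the dictionary name is illustrative):

```python
students = {
    "Distinction": 20,
    "First class": 40,
    "Second class": 50,
    "Third class": 45,
    "Failure": 25,
}

total = sum(students.values())

# Angle of a sector = (frequency of the category / total frequency) x 360 degrees.
angles = {category: count * 360 / total for category, count in students.items()}

for category, angle in angles.items():
    print(category, angle)
```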
Assignment
Diagrammatic and Graphic representation of Data - Merits and Limitations
ARITHMETIC MEAN
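Case I: Ungrouped Data (Discrete data)
Let x1, x2, x3, ..., xN be N observations. Then
$$\text{A.M.}\ (\bar{X}) = \frac{\text{Sum of the observations}}{\text{Total number of observations}} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum x}{N}$$
For a discrete frequency distribution in which the observations x1, x2, x3, ..., xn occur with frequencies f1, f2, f3, ..., fn,
$$\text{A.M.} = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \dots + f_nx_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum fx}{\sum f}$$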
x        f         fx
5        3         15
6        8         48
7        12        84
8        10        80
9        7         63
TOTAL    N = 40    Σfx = 290

$$\text{A.M.} = \frac{\sum fx}{\sum f} = \frac{290}{40} = 7.25$$
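The same calculation, written out as a short Python sketch (variable names are illustrative):

```python
x = [5, 6, 7, 8, 9]       # observations
f = [3, 8, 12, 10, 7]     # frequencies

n = sum(f)                                       # total frequency
sum_fx = sum(fi * xi for fi, xi in zip(f, x))    # sum of f * x
arithmetic_mean = sum_fx / n

print(n, sum_fx, arithmetic_mean)
```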
Home work
Observations    Frequency
15              5
16              10
17              14
18              12
19              9
TOTAL           50
Two Methods
Direct Method
Assumed Mean Method

Direct Method
$$\text{A.M.} = \frac{\sum fx}{\sum f}$$
where x is the mid-value of each class.
Calculate A.M
Class      f
0 - 10     3
10 - 20    12
20 - 30    20
30 - 40    10
40 - 50    5
TOTAL      50

Class      f         mid-value (x)    fx
0 - 10     3         5                15
10 - 20    12        15               180
20 - 30    20        25               500
30 - 40    10        35               350
40 - 50    5         45               225
TOTAL      N = 50                     Σfx = 1270

$$\text{A.M.} = \frac{\sum fx}{N} = \frac{1270}{50} = 25.4$$
Home work
Class      f
0 - 9      3
10 - 19    10
20 - 29    13
30 - 39    9
40 - 49    5
TOTAL      40
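Calculate A.M (Assumed Mean Method)
Class      f
0 - 10     3
10 - 20    12
20 - 30    20
30 - 40    10
40 - 50    5
TOTAL      50

Answer
Class      f         mid-value (x)    d      fd
0 - 10     3         5                -2     -6
10 - 20    12        15               -1     -12
20 - 30    20        25 (= A)         0      0
30 - 40    10        35               1      10
40 - 50    5         45               2      10
TOTAL      N = 50                            Σfd = 2

Here the assumed mean is A = 25, the class interval is c = 10, and d = (x - A)/c.
$$\text{A.M.} = A + \frac{\sum fd}{N} \times c = 25 + \frac{2}{50} \times 10 = 25.4$$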
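A minimal sketch of the Assumed Mean Method for this example, assuming A = 25 and c = 10 as in the table (names are illustrative). It gives the same result as the Direct Method.

```python
mid_values = [5, 15, 25, 35, 45]
f = [3, 12, 20, 10, 5]

A = 25   # assumed mean (mid-value of the class 20 - 30)
c = 10   # class interval

# Step deviations d = (x - A) / c are whole numbers for equal class widths.
d = [(x - A) // c for x in mid_values]

n = sum(f)
sum_fd = sum(fi * di for fi, di in zip(f, d))
arithmetic_mean = A + sum_fd / n * c

print(d, sum_fd, arithmetic_mean)
```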
MEDIAN
Median is defined as the middle most observation when the observations are arranged in ascending or descending order of magnitude.
CALCULATION OF MEDIAN
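Discrete Data & Discrete Frequency
Calculate Median: 8, 12, 16, 10, 9, 6, 17, 20, 25
Data in ascending order of magnitude: 6, 8, 9, 10, 12, 16, 17, 20, 25
Here N = 9. Then Median = ((N + 1)/2)th observation = 5th observation = 12

Calculate Median
Observation    Frequency
5              3
6              8
7              12
8              10
9              8
Total          41
Median = ((N + 1)/2)th observation = ((41 + 1)/2)th observation = 21st observation
The cumulative frequencies are 3, 11, 23, 33, 41, so the 21st observation is 7. Median = 7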
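Continuous Distribution
$$\text{Median} = l_m + \left(\frac{\frac{N}{2} - cf_m}{f_m}\right) \times c$$
lm: Actual lower limit of the Median Class (Median Class: the class in which the (N/2)th observation falls)
N: Total frequency
cfm: Cumulative frequency up to (but not including) the Median Class
fm: Frequency of the Median Class
c: Class interval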
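Calculate Median
Class      Frequency
0 - 5      5
5 - 10     10
10 - 15    15
15 - 20    12
20 - 25    8
Total      50

Answer
Class      Frequency    <CF
0 - 5      5            5
5 - 10     10           15
10 - 15    15           30    (Median Class)
15 - 20    12           42
20 - 25    8            50
Total      50

Here N/2 = 25, lm = 10, cfm = 15, fm = 15 and c = 5.
$$\text{Median} = l_m + \left(\frac{\frac{N}{2} - cf_m}{f_m}\right) \times c = 10 + \left(\frac{25 - 15}{15}\right) \times 5 = 13.33$$
Figure: graphical location of the Median (axis labels: N/2, Median)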
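The grouped-data median can be sketched as a small function. The function and argument names are illustrative, and equal class widths are assumed.

```python
from itertools import accumulate

def grouped_median(lower_limits, frequencies, width):
    """Median of a continuous frequency distribution with equal class widths."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    # Median class: the first class whose cumulative frequency reaches N/2.
    m = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)
    # Cumulative frequency of the classes before the median class.
    cf_before = cumulative[m - 1] if m > 0 else 0
    return lower_limits[m] + (n / 2 - cf_before) / frequencies[m] * width

median = grouped_median([0, 5, 10, 15, 20], [5, 10, 15, 12, 8], 5)
print(round(median, 2))
```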
Median Merits
It is rigidly defined.
It is easy to understand.
It is simple to calculate.
It can be located by mere inspection.
It is not affected by extreme values.
It can be calculated for a distribution having open-end classes.
It can be determined graphically.
Median Demerits
It is not based on all observations.
Median is a non-algebraic measure and hence not suitable for further algebraic treatment.
It cannot be used for computing other statistical measures such as Standard Deviation, Coefficient of Correlation etc.
When there are wide variations between the values of different scores, the Median may not be representative of the distribution.
MODE
Mode is the value of the variable which occurs most frequently.
In certain cases there may be two or three Modes in a distribution. When there are two Modes we call it a Bi-Modal Distribution. If there are three Modes, we call it a Tri-Modal Distribution.
Calculation of Mode
Discrete Distribution
Calculate Mode
Observation    Frequency
5              3
6              8
7              12
8              10
9              8
Total          41
The observation with the highest frequency (12) is 7. Mode = 7
Continuous Distribution
$$\text{Mode} = l_m + \left(\frac{f_2}{f_1 + f_2}\right) \times c$$
lm: Actual lower limit of the Modal Class (Modal Class: the class having maximum frequency)
f1: Frequency of the class just below the Modal Class
f2: Frequency of the class just above the Modal Class
c: Class interval
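Calculate Mode
Class      Frequency
80 - 84    4
75 - 79    8
70 - 74    8     (f2)
65 - 69    12    (Modal Class)
60 - 64    9     (f1)
55 - 59    7
50 - 54    5
45 - 49    3

Here lm = 64.5, f1 = 9, f2 = 8, c = 5
$$\text{Mode} = l_m + \left(\frac{f_2}{f_1 + f_2}\right) \times c = 64.5 + \left(\frac{8}{9 + 8}\right) \times 5 = 66.9$$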
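The arithmetic of this example can be checked with a short sketch. It uses the slide's formula, Mode = lm + (f2 / (f1 + f2)) × c, with the values read from the table; the variable names are illustrative.

```python
lm = 64.5   # actual lower limit of the modal class 65 - 69
f1 = 9      # frequency of the class just below the modal class (60 - 64)
f2 = 8      # frequency of the class just above the modal class (70 - 74)
c = 5       # class interval

mode = lm + f2 / (f1 + f2) * c
print(round(mode, 1))
```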
Mode Merits
It is easy to locate.
It is not affected by extreme values.
The Mode can be calculated for a distribution having open-end classes, if the open-end classes have low frequencies.
It is useful in business matters.
Mode Demerits
It is not based on all observations.
It is not capable of further algebraic treatment.
A slight change in the distribution may extensively disturb the Mode.
As there may be 2 or 3 modal values, it can become impossible to set a definite value for the Mode.
Measures?
Consider the Marks of two Groups
Group 1: 8, 12, 11, 12, 10, 8, 9, 11, 12, 10, 8, 10, 9, 10, 12, 8, 10, 9, 10, 11. Mean = 10
Group 2: 15, 2, 8, 12, 4, 17, 20, 6, 2, 18, 16, 0, 3, 9, 6, 10, 15, 17, 9, 11. Mean = 10
If we compare the two groups merely on the basis of the Arithmetic Mean, we may be misled into an incorrect judgment: both means are 10, yet the scores of Group 2 are far more widely scattered.
MEASURES OF DISPERSION
(MEASURES OF VARIABILITY)
The statistical measures used to determine the extent of dispersion of the scores from the central value (Arithmetic Mean) of the distribution are known as Measures of Dispersion. They measure the spread of the observations around the central value of the distribution.
Standard Deviation
Quartile Deviation
Standard Deviation
Standard Deviation is the square root of the average of the squares of the deviations of the scores taken from the mean. S.D is denoted by the symbol σ (sigma).
$$\sigma = \sqrt{\frac{\sum (x - \bar{X})^2}{N}}$$
Consider the Marks of two Groups
Group 1: 8, 12, 11, 12, 10, 8, 9, 11, 12, 10, 8, 10, 9, 10, 12, 8, 10, 9, 10, 11. Mean = 10, S.D = 1.38
Group 2: 15, 2, 8, 12, 4, 17, 20, 6, 2, 18, 16, 0, 3, 9, 6, 10, 15, 17, 9, 11. Mean = 10, S.D = 5.93
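The two values can be reproduced with Python's standard library. `statistics.pstdev` divides by N, which matches the definition of Standard Deviation given above; the list names are illustrative.

```python
from statistics import mean, pstdev

group_1 = [8, 12, 11, 12, 10, 8, 9, 11, 12, 10,
           8, 10, 9, 10, 12, 8, 10, 9, 10, 11]
group_2 = [15, 2, 8, 12, 4, 17, 20, 6, 2, 18,
           16, 0, 3, 9, 6, 10, 15, 17, 9, 11]

for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    # Same mean, very different spread.
    print(name, mean(scores), round(pstdev(scores), 2))
```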
Ungrouped Distribution
Calculate SD
Score      Frequency
20 - 24    5
25 - 29    10
30 - 34    25
35 - 39    30
40 - 44    20
45 - 49    10
           N = 100
For a large distribution, the Short-cut method (Assumed Mean Method) can be used to calculate the Standard Deviation.

Answer
Class      f         x     d     d²    fd          fd²
45 - 49    2         47    5     25    10          50
40 - 44    3         42    4     16    12          48
35 - 39    2         37    3     9     6           18
30 - 34    6         32    2     4     12          24
25 - 29    8         27    1     1     8           8
20 - 24    8         22    0     0     0           0
15 - 19    7         17    -1    1     -7          7
10 - 14    5         12    -2    4     -10         20
5 - 9      9         7     -3    9     -27         81
           N = 50                      Σfd = 4     Σfd² = 256
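The final step is not shown in the table. A sketch of it follows, assuming the standard step-deviation formula S.D = c × √(Σfd²/N − (Σfd/N)²), with assumed mean A = 22 and class interval c = 5 read off the d column. The formula is an assumption here, since it does not appear in the text above.

```python
import math

mid_values = [47, 42, 37, 32, 27, 22, 17, 12, 7]
f = [2, 3, 2, 6, 8, 8, 7, 5, 9]

A = 22   # assumed mean: the mid-value whose d is 0 in the table
c = 5    # class interval

d = [(x - A) // c for x in mid_values]            # step deviations
n = sum(f)
sum_fd = sum(fi * di for fi, di in zip(f, d))
sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))

# Standard step-deviation formula (assumed, see the note above).
sd = c * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
print(sum_fd, sum_fd2, round(sd, 2))
```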
MEAN DEVIATION
(AVERAGE DEVIATION) Mean Deviation is the average of the deviations of the scores taken from the Mean.
It is calculated by taking the deviation of each score from the mean and finding the average of these deviations. Deviations may be negative or positive, so take the absolute value of each deviation.
Answer
Mean = 15
Score (x)      8    10    12    14    16    18    20    22
|x - Mean|     7    5     3     1     1     3     5     7
Discrete Distribution
Answer
Score (x)    f          fx            |x - A.M|    f|x - A.M|
22           5          110           14           70
27           10         270           9            90
32           25         800           4            100
37           30         1110          1            30
42           20         840           6            120
47           10         470           11           110
             N = 100    Σfx = 3600                 Σf|x - A.M| = 520

A.M = Σfx / N = 3600 / 100 = 36
Continuous Distribution
Answer
Class      Score (x)    f          fx            |x - A.M|    f|x - A.M|
20 - 24    22           5          110           14           70
25 - 29    27           10         270           9            90
30 - 34    32           25         800           4            100
35 - 39    37           30         1110          1            30
40 - 44    42           20         840           6            120
45 - 49    47           10         470           11           110
                        N = 100    Σfx = 3600                 Σf|x - A.M| = 520

A.M = Σfx / N = 3600 / 100 = 36
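The remaining step is to divide the total of the weighted absolute deviations by N. A short sketch, assuming the usual formula Mean Deviation = Σf|x − A.M| / N (names are illustrative):

```python
mid_values = [22, 27, 32, 37, 42, 47]
f = [5, 10, 25, 30, 20, 10]

n = sum(f)
arithmetic_mean = sum(fi * xi for fi, xi in zip(f, mid_values)) / n

# Absolute deviation of each mid-value from the mean, weighted by frequency.
abs_dev = [abs(x - arithmetic_mean) for x in mid_values]
sum_f_abs_dev = sum(fi * di for fi, di in zip(f, abs_dev))

mean_deviation = sum_f_abs_dev / n
print(arithmetic_mean, sum_f_abs_dev, mean_deviation)
```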
Quartile: Any of the three points that divide an ordered distribution into four parts, each containing one quarter of the scores.
Lower Quartile (First Quartile) Q1: the first point of division of observations which have been grouped into four equal-sized sets based on their statistical rank.
Second Quartile Q2: the second point of division of observations which have been grouped into four equal-sized sets based on their statistical rank. The Second Quartile is the Median.
Upper Quartile (Third Quartile) Q3: the third point of division of observations which have been grouped into four equal-sized sets based on their statistical rank.
Continuous Distribution
Class      Frequency
30 - 35    10
35 - 40    16
40 - 45    18
45 - 50    27
50 - 55    18
55 - 60    8
60 - 65    3

Answer
Class      Frequency    <CF
30 - 35    10           10
35 - 40    16           26     (Q1 Class)
40 - 45    18           44
45 - 50    27           71
50 - 55    18           89     (Q3 Class)
55 - 60    8            97
60 - 65    3            100
.68
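A sketch of how the quartiles of this table could be computed. It assumes the quartile formula has the same form as the Median formula, with N/4 or 3N/4 in place of N/2, and takes Quartile Deviation as (Q3 − Q1)/2. Both are standard definitions, but they are assumptions here because the formulas are not given above. Function and variable names are illustrative.

```python
from itertools import accumulate

def grouped_quantile(lower_limits, frequencies, width, fraction):
    """Point below which `fraction` of the scores lie (equal class widths)."""
    n = sum(frequencies)
    target = fraction * n
    cumulative = list(accumulate(frequencies))
    # Class in which the target observation falls.
    k = next(i for i, cf in enumerate(cumulative) if cf >= target)
    cf_before = cumulative[k - 1] if k > 0 else 0
    return lower_limits[k] + (target - cf_before) / frequencies[k] * width

lower_limits = [30, 35, 40, 45, 50, 55, 60]
frequencies = [10, 16, 18, 27, 18, 8, 3]

q1 = grouped_quantile(lower_limits, frequencies, 5, 0.25)
q3 = grouped_quantile(lower_limits, frequencies, 5, 0.75)
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)
```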
RANGE
Range is the difference between the highest and lowest scores in a Distribution.
Discrete Distribution
Observation    Frequency
5              3
6              8
7              12
8              10
9              8
Total          41
Range (R) = H - L = 9 - 5 = 4
Continuous Distribution
In a continuous distribution, Range is the difference between the upper limit of the highest class and the lower limit of the lowest class.
Class      Frequency
10 - 20    12
20 - 30    20
30 - 40    10
40 - 50    5
Range (R) = H - L = 50 - 10 = 40
CORRELATION
Correlation may be defined as the relationship between two variables. There are three types of correlation:
Positive correlation
Negative correlation
Zero correlation
COEFFICIENT OF CORRELATION
The ratio indicating the degree of relationship between two related variables is called the coefficient of correlation.
It indicates the nature of the relationship between two variables.
It helps to predict the value of one variable given the value of another related variable.
It helps to ascertain the traits and capacities of pupils.
Properties of Correlation
For a perfect positive correlation, the Coefficient of Correlation is +1, and for a perfect negative correlation it is -1.
Perfect positive or negative correlation is possible only in Physical Science. In a Social Science like Education, the correlation between two variables will lie between -1 and +1.
Positive correlation varies from 0 to +1 and negative correlation varies from 0 to -1.
Zero correlation indicates that there is no consistent relationship between the two variables.
Rank Correlation
Spearman was the first to measure the extent of correlation between two sets of scores by the method of Rank Difference.
Answer
Name of Student    Score in Maths    Score in Physics    Rank in Maths (R1)    Rank in Physics (R2)    Rank Difference (D)    D²
Nikhil             45                68                  4                     3                       1                      1
Santhosh           53                76                  2                     1                       1                      1
John               67                70                  1                     2                       1                      1
Jenna              40                64                  5                     5                       0                      0
Gopal              35                54                  6                     6                       0                      0
Mohammed           50                66                  3                     4                       1                      1
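The ranks and rank differences in the table can be reproduced as below. The last line applies the standard Spearman formula ρ = 1 − 6ΣD² / (N(N² − 1)). That formula is an assumption here, since it does not appear in the text above, and the helper `ranks_descending` is a hypothetical name that assumes no tied scores.

```python
def ranks_descending(scores):
    """Rank 1 for the highest score; assumes there are no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

maths = [45, 53, 67, 40, 35, 50]
physics = [68, 76, 70, 64, 54, 66]

r1 = ranks_descending(maths)
r2 = ranks_descending(physics)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

n = len(maths)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(r1, r2, sum_d2, round(rho, 2))
```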
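Answer
Height of Father (h1)    Height of Son (h2)    Deviation from Mean (x)    Deviation from Mean (y)    x²    y²    xy
65                       67                    -3                         -2                         9     4     6
66                       68                    -2                         -1                         4     1     2
67                       65                    -1                         -4                         1     16    4
67                       68                    -1                         -1                         1     1     1
68                       72                    0                          3                          0     9     0
69                       72                    1                          3                          1     9     3
70                       69                    2                          0                          4     0     0
72                       71                    4                          2                          16    4     8
Σh1 = 544                Σh2 = 552                                                                   Σx² = 36    Σy² = 44    Σxy = 24

Mean of h1 = 544/8 = 68 and Mean of h2 = 552/8 = 69.
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} \approx 0.60$$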
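$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}$$
x, y: the first set of scores and the second set of scores
N: Number of scores in a set

Answer
Student    x     y     x²     y²     xy
A          8     9     64     81     72
B          6     7     36     49     42
C          4     3     16     9      12
D          7     6     49     36     42
E          3     5     9      25     15
F          6     6     36     36     36
G          5     5     25     25     25
H          4     5     16     25     20
I          5     4     25     16     20
J          6     5     36     25     30
Total      Σx = 54    Σy = 55    Σx² = 312    Σy² = 327    Σxy = 314

$$r = \frac{10 \times 314 - 54 \times 55}{\sqrt{(10 \times 312 - 54^2)(10 \times 327 - 55^2)}} = \frac{170}{\sqrt{204 \times 245}} = 0.76$$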
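A sketch of the raw-score calculation. The scores x and y are not listed directly above; here they are assumed to be the square roots of the x² and y² columns, which is consistent with the xy column. The formula used is the usual raw-score form of the product-moment coefficient.

```python
import math

# Assumed scores: square roots of the tabulated x² and y² values.
x = [8, 6, 4, 7, 3, 6, 5, 4, 5, 6]
y = [9, 7, 3, 6, 5, 6, 5, 5, 4, 5]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x2, sum_y2, sum_xy, round(r, 2))
```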
These special features of the Normal Distribution are seen in the dispersion of scores for natural phenomena such as intelligence, height and weight in a population. The same is found to be true, to a great extent, of the achievement scores of a well conducted examination, if the number taking the examination is sufficiently large. Hence the properties of the Normal Distribution and the Normal Distribution curve are of great importance in the study of groups and their characteristics with respect to given variables.
All three Measures of Central Tendency, viz. Mean, Median and Mode, of a normal curve coincide; that is, they are all equal.
The first and third quartiles are equidistant from the median.
The ordinate at the mean is the highest.
The heights of the other ordinates at various sigma distances from the mean are in fixed relationship with the height of the mean ordinate.
The curve gradually comes nearer to the base line, but it never meets the base line.
For practical purposes, the curve may be taken to end at -3σ and +3σ from the mean, because this region covers almost 100% of the cases.
Between -1σ and +1σ, there are 68.26% of the frequencies.
Between -2σ and +2σ, there are 95.44% of the frequencies.
Between -3σ and +3σ, there are 99.73% of the frequencies.
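The three percentages can be checked numerically. For a normal curve, the proportion of cases within k sigma of the mean equals erf(k/√2). The sketch below uses only Python's standard library, and the function name is illustrative. The computed values may differ from the table values in the second decimal place because of rounding.

```python
import math

def area_within(k_sigma: float) -> float:
    """Proportion of a normal distribution lying within k_sigma of the mean."""
    return math.erf(k_sigma / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k) * 100:.2f}%")
```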