By G. Raja Sekhar
INTRODUCTION
Data are collected as the sample is drawn and the interviews are carried
out. The collected data then go for processing. After the data have been
processed, they must be analysed.
Analysis refers to computation of certain indices or
measures along with searching for patterns of relationship
that exist among the data groups.
PROCESSING STAGE
The processing stage includes the editing, coding, classification and
tabulation of the collected data so that they are ready for analysis.
The collected data must be arranged. Some of the received data are useful
and others are not, so in this step the received data must be
Edited,
Coded,
Classified and
Tabulated.
EDITING
Editing is the careful scrutiny of all collected data to make sure they are
complete, error-free and readable.
CODING
Coding is the assignment of codes (usually numbers) to each category of
answers.
CLASSIFICATION
TABULATION
Tabulation is the process of summarizing data and displaying them in
appropriate tables so that further analysis is facilitated.
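The coding and tabulation steps can be sketched in a few lines of Python. This is only an illustration: the answers and the codebook below are hypothetical data, not taken from the source.

```python
from collections import Counter

# Hypothetical survey answers (illustrative data, not from the source).
answers = ["yes", "no", "yes", "undecided", "yes", "no"]

# Coding: assign a numeric code to each category of answer.
codebook = {"yes": 1, "no": 2, "undecided": 3}
coded = [codebook[a] for a in answers]

# Tabulation: count how often each code occurs.
table = Counter(coded)

for code in sorted(table):
    print(code, table[code])
```

The resulting counts are the kind of summary table on which further analysis is based.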
ANALYSIS OF DATA
Whenever a mass of data is collected, statistics comes into play: it
provides the procedures that support both the processing and the analysis
of the data.
STATISTICS IN RESEARCH
The role of statistics in research is to function as a tool in designing research,
analysing its data and drawing conclusions therefrom.
To achieve the objective of the research, we have to go a step further and
develop certain indices or measures to summarise the collected/classified
data.
Only after this can we adopt the process of generalisation from small groups
(i.e., samples) to the population. In fact, there are two major areas of
statistics, viz., descriptive statistics and inferential statistics.
DESCRIPTIVE STATISTICS
Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way.
Descriptive statistics do not, however, allow us to draw conclusions beyond
the data we have analysed or reach conclusions regarding any hypotheses we
might have made.
Descriptive statistics are very important because if we simply presented
our raw data it would be hard to visualize what the data was showing,
especially if there was a lot of it.
ARITHMETIC MEAN
Arithmetic mean is defined as the sum of the items divided by the number of
items in a series.
Arithmetic mean is the most widely used and practical measure of central
tendency. It is further divided into
Simple arithmetic mean
Weighted arithmetic mean
$\bar{X} = \frac{\sum X}{n}$
Where
$\bar{X}$ = arithmetic mean
$\sum X$ = total of the items in a series
n = number of items
=10
Indirect Method: the indirect method is used when the number of items is
very large; to simplify the data, we take the deviations from an assumed
mean. The following formula will be used for it
$\bar{X} = \frac{\sum fX}{N}$
Where
$\bar{X}$ = arithmetic mean
f = frequencies
X = variable or mid points of the class intervals
N = total number of frequencies in the series
EXAMPLE
X     No. of Students (f)    fX
20     8                     160
30    12                     360
40    20                     800
50    10                     500
60     6                     360
70     4                     280
      N = 60                 $\sum fX$ = 2460
$\bar{X} = \frac{\sum fX}{N} = \frac{2460}{60} = 41$
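The calculation in this example can be checked with a short Python sketch. The frequencies 8, 6 and 4 are not printed legibly in the example; they are taken here as the values implied by dividing each fX entry by X.

```python
# Values and frequencies from the example; 8, 6 and 4 are implied by fX / X.
x = [20, 30, 40, 50, 60, 70]
f = [8, 12, 20, 10, 6, 4]

n = sum(f)                                      # total number of frequencies
sum_fx = sum(fi * xi for fi, xi in zip(f, x))   # sum of f * X
mean = sum_fx / n
print(n, sum_fx, mean)  # 60 2460 41.0
```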
INCLUSIVE SERIES
The difference between the upper limit of one interval and the lower limit of
the next interval is noted; half of that difference is then deducted from the
lower limit of every interval and added to the upper limit of every interval.
Example:
Class interval    Frequency
4-6                1
7-9                3
10-12              7
13-15             15
16-18             11
19-21              3
22-24              2

Class Interval    Frequency (f)    Mid-point (m)    fm
3.5-6.5            1                5                 5
6.5-9.5            3                8                24
9.5-12.5           7               11                77
12.5-15.5         15               14               210
15.5-18.5         11               17               187
18.5-21.5          3               20                60
21.5-24.5          2               23                46
Total             42                                609
$\bar{X} = \frac{\sum fm}{\sum f} = \frac{609}{42} = 14.5$
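The conversion of an inclusive series and the resulting mean can be sketched as follows. The class limits and frequencies are those of the example; the variable names are illustrative only.

```python
# Inclusive class limits and frequencies from the example.
classes = [(4, 6), (7, 9), (10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]
freqs = [1, 3, 7, 15, 11, 3, 2]

# Half the gap between one upper limit and the next lower limit.
half_gap = (classes[1][0] - classes[0][1]) / 2

# Deduct it from every lower limit and add it to every upper limit.
exclusive = [(lo - half_gap, hi + half_gap) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in exclusive]

total_f = sum(freqs)
total_fm = sum(f * m for f, m in zip(freqs, midpoints))
mean = total_fm / total_f
print(exclusive[0], total_f, total_fm, mean)  # (3.5, 6.5) 42 609.0 14.5
```

Note that the mid-points are the same before and after the adjustment, so the mean itself does not change; the adjustment matters for measures that use the class boundaries.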
Open end intervals: Open end intervals are those in which the lower limit of
the first class and the upper limit of the last class are not known. In such
a case, we cannot find the arithmetic mean unless we make an assumption about
the unknown limits. The assumption would naturally depend upon the class
interval.
Unequal intervals: if the class intervals are not equal, regroup them into
equal class intervals, then solve the problem.
Example:
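X (unequal class intervals): 0-2, 2-5, 5-6, 6-8, 8-10, 10-20, 20-21, 21-25
Regrouped into equal class intervals:
X        f     m       fm
0-5       3     2.5      7.5
5-10     11     7.5     82.5
10-15     3    12.5     37.5
15-20     3    17.5     52.5
20-25     5    22.5    112.5
Total    25            292.5
$\bar{X} = \frac{\sum fm}{\sum f} = \frac{292.5}{25} = 11.7$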
For the weighted arithmetic mean:
First of all find the product of the items with their respective weights,
that is WX
Take the total of WX as $\sum WX$
Divide the value of $\sum WX$ by $\sum W$
EXAMPLE
A train runs 25 km at a speed of 30 kmph and another 50 km at a speed of 40
kmph. Due to repairs of the track it travels at a speed of 10 kmph for 6
minutes, and finally covers the remaining distance of 24 km at a speed of
60 kmph. What is the average speed in kmph?
Solution: Time taken in covering 25 km at a speed of 30 kmph = 50 minutes,
and so on. The times are therefore taken as the weights.
Speed in kmph (X)    Time in minutes (W)    WX
30                   50                     1500
40                   75                     3000
10                    6                       60
60                   24                     1440
Total                155                    6000
$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{6000}{155} \approx 38.71$ kmph
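The weighted mean in the train example can be reproduced with a brief Python sketch. The weights are the minutes spent at each speed, obtained as distance divided by speed (25/30 h = 50 min, 50/40 h = 75 min, 6 min as given, 24/60 h = 24 min); the variable names are illustrative.

```python
# Speeds (km/h) and the minutes spent at each speed, from the example.
speeds = [30, 40, 10, 60]
minutes = [50, 75, 6, 24]

sum_wx = sum(w * x for w, x in zip(minutes, speeds))  # sum of W * X
sum_w = sum(minutes)                                   # sum of W
weighted_mean = sum_wx / sum_w
print(sum_wx, sum_w, round(weighted_mean, 2))  # 6000 155 38.71
```

As a cross-check, the train covers 25 + 50 + 1 + 24 = 100 km in 155 minutes, which gives the same average speed.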
MEDIAN
When the observations are arranged in ascending or descending order of
magnitude, then the middle value is known as median of these observations.
Let $x_1, x_2, \ldots, x_n$ be n observations arranged in ascending order of
magnitude. The median is defined as the middlemost term, that is, the value
of x at the position $\frac{n+1}{2}$.
We can write
Me = $x_{(n+1)/2}$ (when n is odd)
Me = $\frac{x_{n/2} + x_{n/2+1}}{2}$ (when n is even)
Solution:
X = 384, 391, 407, 522, 591, 672, 753, 777, 1490
Median = $\frac{n+1}{2}$th item = $\frac{9+1}{2}$th item = 5th item = 591
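Solution:
X = 222, 384, 391, 407, 522, 591, 672, 753, 777, 1490
Median = $\frac{10+1}{2}$th item = 5.5th item, that is, the mean of the 5th
and 6th items = $\frac{522 + 591}{2}$ = 556.5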
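Both cases (odd and even n) can be handled by one small function. The following is a minimal sketch using the two series from the solutions above; the function name is illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [384, 391, 407, 522, 591, 672, 753, 777, 1490]
even = [222] + odd
print(median(odd))   # 591
print(median(even))  # 556.5
```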
Find the $\frac{N+1}{2}$th item.
Look in the cumulative frequency table for the value just greater than the
one found in step 3; the corresponding value of the variable is the median.
EXAMPLE
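Find the median from the following data
Income            100    150    80    200    250    180
No. of Persons     12     13     8     10      3     15
Arranged in ascending order of income:
Income    No. of Persons (f)    Cf
80         8                     8
100       12                    20
150       13                    33
180       15                    48
200       10                    58
250        3                    61
Median = $\frac{N+1}{2}$th item = $\frac{61+1}{2}$th item = 31st item
The cumulative frequency just greater than 31 is 33, so the median = 150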
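The cumulative-frequency lookup can be sketched in Python as below. The frequencies 8 and 3 are not legible in the example and are taken here as the values implied by the cumulative frequency column (8, 20, 33, 48, 58, 61).

```python
from itertools import accumulate

# Incomes in ascending order with the number of persons at each income.
# The frequencies 8 and 3 are implied by the cumulative-frequency column.
incomes = [80, 100, 150, 180, 200, 250]
persons = [8, 12, 13, 15, 10, 3]

cum_freq = list(accumulate(persons))
position = (cum_freq[-1] + 1) / 2  # (N + 1) / 2

# First income whose cumulative frequency reaches the median position.
median = next(x for x, cf in zip(incomes, cum_freq) if cf >= position)
print(cum_freq, position, median)  # [8, 20, 33, 48, 58, 61] 31.0 150
```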
MERITS OF MEDIAN
It is very simple to understand.
Its calculation is very easy and simple.
It is not affected by extreme items.
It can be located graphically very easily.
It is a suitable average for open-ended class intervals.
It deals with quality more than quantity, so it suits data that can only be
ranked.
DEMERITS OF MEDIAN
It needs extra labour, compared with other averages, to arrange the data in
ascending or descending order.
It does not use all the observations in its calculation, which limits how
well it represents them.
For a series with an even number of items it cannot be located exactly; it
is taken as the mean of the two middle items.
It is difficult to determine when the series contains very few or very many
items.
Unlike other averages, it has no further mathematical applicability.
MODE
The mode is defined as the size of the variable which occurs most frequently,
or the point of maximum frequency, or the point of greatest density. It is
also an important measure of central tendency.
According to Kenney and Keeping, "The value of the variable which occurs
most frequently in a distribution is called the mode."
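$Z = L_1 + \frac{F_1 - F_0}{2F_1 - F_0 - F_2} \times I$
Where
Z = mode
$L_1$ = lower limit of the modal class
$F_1$ = frequency of the modal class (the highest frequency)
$F_0$ = frequency of the class preceding the modal class
$F_2$ = frequency of the class succeeding the modal class
I = width of the class interval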
EXAMPLE
X: 19, 21, 20, 19, 19, 19, 25, 3, 1, 9, 2, 8, 5, 8
Solution
19 occurs most frequently (four times), so it is the mode.
Therefore Z = 19
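Counting the occurrences by machine gives the same answer. A minimal sketch using the standard library `collections.Counter`:

```python
from collections import Counter

x = [19, 21, 20, 19, 19, 19, 25, 3, 1, 9, 2, 8, 5, 8]

# most_common(1) returns the (value, count) pair with the highest count.
value, count = Counter(x).most_common(1)[0]
print(value, count)  # 19 4
```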
GROUPING MODE
When all values appear the same number of times the idea of a mode is not
useful. But you could group them to see if one group has more than the
others.
Example: {4, 7, 11, 16, 20, 22, 25, 26, 33}
In groups of 10, the "20s" appear most often, so we could choose 25 as the
mode.
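The grouping idea can be sketched as follows, assuming groups of width 10 that start at 0 and taking the mid-point of the fullest group as the representative value, as in the example.

```python
from collections import Counter

data = [4, 7, 11, 16, 20, 22, 25, 26, 33]
width = 10

# Label each value by the lower bound of its group: 0, 10, 20, 30.
groups = Counter((v // width) * width for v in data)

start, count = groups.most_common(1)[0]
midpoint = start + width / 2
print(start, count, midpoint)  # 20 4 25.0
```

A different group width or starting point can change which group is fullest, so the result depends on that choice.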
EXAMPLE
Calculate the mode for the following distribution
Gross Profit as % of sales    No. of companies
0-7                           19
7-14                          25
14-21                         36
21-28                         72
28-35                         51
35-42                         43
42-49                         28
Solution:
Here, the largest frequency is 72. It lies in the class 21-28, so the modal
class is 21-28 and the lower limit of the modal class is 21. Thus
0-7       19
7-14      25
14-21     36    $F_0$
21-28     72    $F_1$
28-35     51    $F_2$
35-42     43
42-49     28
$Z = 21 + \frac{72 - 36}{2 \times 72 - 36 - 51} \times 7 = 21 + \frac{36}{57} \times 7 \approx 25.42$
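The grouped-data calculation can be checked with a short sketch. It assumes the usual interpolation formula Z = L1 + (F1 - F0) / (2 F1 - F0 - F2) x I, which matches the symbols L1, F1, F0, F2 and I defined earlier; the function name is illustrative, and the sketch assumes the modal class is neither the first nor the last class.

```python
def grouped_mode(lower, f1, f0, f2, width):
    """Z = L1 + (F1 - F0) / (2*F1 - F0 - F2) * I for the modal class."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Lower class limits and frequencies from the example (class width 7).
lower_limits = [0, 7, 14, 21, 28, 35, 42]
freqs = [19, 25, 36, 72, 51, 43, 28]
width = 7

k = freqs.index(max(freqs))  # position of the modal class (assumed interior)
z = grouped_mode(lower_limits[k], freqs[k], freqs[k - 1], freqs[k + 1], width)
print(lower_limits[k], round(z, 2))  # 21 25.42
```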
MERITS OF MODE
It is very simple to understand and easy to calculate because it is a
positional average.
It is based on quality rather than quantity.
It is least affected by extreme values.
Where there is a large concentration of items around one value, that value is
a good representative of the items.
The modal value can be shown graphically.
DEMERITS OF MODE
It is not a suitable measure of central tendency where the number of items is
very small.
It has no further mathematical applicability.
Given the modes of two or more series, it is not possible to calculate the
modal value of the combined series.
It is not possible to find the sum of the items by multiplying the modal
value by the number of items, as can be done with the mean.
Type of Variable               Best Measure of Central Tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
MEASURES OF SPREAD
A measure of spread, sometimes also called a measure of dispersion, is used to
describe the variability in a sample or population. It is usually used in conjunction
with a measure of central tendency, such as the mean or median, to provide an
overall description of a set of data.
Measures of spread are ways of summarizing a group of data by describing
how spread out the scores are. For example, the mean score of our 100 students
may be 65 out of 100. However, not all students will have scored 65 marks. Rather,
their scores will be spread out. Some will be lower and others higher. Measures of
spread help us to summarize how spread out these scores are. To describe this
spread, a number of statistics are available to us, including the range, quartiles,
absolute deviation, variance and standard deviation.
RANGE
The simplest possible measure of dispersion is the range, which is the
difference between the greatest and least values of the variable.
Range may be expressed in these forms
Simple range
Interquartile range
Percentile range and
Decile range
SIMPLE RANGE
It is the difference between the value of the smallest item and the value of
the largest item included in a distribution.
Example: in the series 8, 9, 14, 10, 12, 7; range = 14 - 7 = 7
Coefficient of dispersion: the relative measure of the range is called the
coefficient of dispersion and is obtained by dividing the range by the sum of
the extreme values
Coefficient of dispersion = $\frac{L - S}{L + S}$, where L is the largest
item and S is the smallest item
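For the series in the example, the range and its coefficient of dispersion can be computed as below. This is a minimal sketch; the variable names are illustrative.

```python
series = [8, 9, 14, 10, 12, 7]

largest, smallest = max(series), min(series)
value_range = largest - smallest

# Range divided by the sum of the extreme values.
coefficient = value_range / (largest + smallest)
print(value_range, round(coefficient, 2))  # 7 0.33
```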
INTERQUARTILE RANGE
The interquartile range is another range used as a measure of spread.
The difference between the upper and lower quartiles (Q3 - Q1), which is
called the interquartile range, also indicates the dispersion of a data set.
The interquartile range spans 50% of a data set, and eliminates the influence
of outliers because, in effect, the highest and lowest quarters are removed.
Interquartile range = upper quartile (Q3) - lower quartile (Q1)
EXAMPLE
A year ago, Angela began working at a computer store. Her supervisor
asked her to keep a record of the number of sales she made each month.
The following data set is a list of her sales for the last 12 months:
34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.
find:
The range
The upper and lower quartiles
The interquartile range
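One way to work this example is sketched below. It assumes a common textbook convention in which the lower and upper quartiles are the medians of the lower and upper halves of the sorted data; other quartile conventions exist and can give slightly different values.

```python
from statistics import median

sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
ordered = sorted(sales)
half = len(ordered) // 2

# Assumed convention: quartiles are the medians of the lower and upper halves.
q1 = median(ordered[:half])
q3 = median(ordered[-half:])

data_range = max(ordered) - min(ordered)
iqr = q3 - q1
print(data_range, q1, q3, iqr)  # 56 17.0 42.0 25.0
```

Under this convention the range is 56, the quartiles are 17 and 42, and the interquartile range is 25.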
MERITS OF RANGE
Range is a very easy and simple measure to understand and calculate.
Therefore, even a layman can understand it without any difficulty.
It is rigidly defined to some extent.
The disadvantage of using range is that it does not measure the spread of
the majority of values in a data set; it only measures the spread between
highest and lowest values. As a result, other measures are required in order
to give a better picture of the data spread. The range is an informative tool
used as a supplement to other measures such as the standard deviation or
semi-interquartile range, but it should rarely be used as the only measure of
spread.