Measures of central tendency: mean, median, mode; Measures of dispersion: range, mean deviation, quartile deviation and standard deviation; Moments, skewness and kurtosis; Linear correlation, Karl Pearson's coefficient of correlation, rank correlation and linear regression.

MEASURES OF CENTRAL TENDENCY
An average is a single significant figure which sums up the characteristics of a group of figures. An average is called a measure of central tendency since it determines the central value around which the items in a series tend to cluster. Important measures of central tendency are:

(i) Mean
The arithmetic mean is obtained by dividing the sum of the values of the items by the number of items.
Mean $= \sum x / n$ OR $\sum f x / n$
Merits:
• It is rigidly defined
• Easy to calculate
• Simple to understand
• It is based on all observations in a series
• It is less affected by sampling fluctuations
• It is amenable to further algebraic treatment
Demerits:
• Affected by extreme values
• Mean cannot be calculated from frequency tables with open-end classes
• Sometimes the mean will be a value not found in the series, and it may be an absurd value
• Cannot be applied to qualitative data

(ii) Median
The median is the middlemost item in an arranged series. It is called a positional average.
Median = value of the $\left(\frac{n+1}{2}\right)$th item
OR
Median $= L + \frac{(N/2 - m)\,C}{f}$
where $L$ is the lower limit of the median class, $N$ the total frequency, $m$ the cumulative frequency of the class preceding the median class, $f$ the frequency of the median class and $C$ the class interval.
Merits:
• Can be easily calculated
• Simple to understand
• Not affected by extreme values
• Can be correctly estimated from frequency tables with open-end classes
• Can be estimated graphically
• Most appropriate average to deal with qualitative data
Demerits:
• Not based on all the items in a series
• Not capable of further algebraic treatment
• More affected by sampling fluctuations
• If estimated from a small number of items, the median need not be a good representative of the given data

(iii) Mode
The mode is the most frequently occurring item in a series.
Merits:
• Can be easily calculated and readily understood
• Not affected by extreme values
• It can be calculated from frequency tables with open-end classes
• The mode is the most popular value in a series and it gives the true representative value
Demerits:
• Mode is not rigidly defined
• Not based on all items in a series
• Can't be used for further algebraic treatment
• Much affected by sampling fluctuations
• Some series may have more than one mode and some others may have no mode at all
Mo = value of the item with the highest frequency
OR
Mo $= L_1 + \frac{C f_2}{f_1 + f_2}$
where $L_1$ is the lower limit of the modal class, $f_1$ and $f_2$ are the frequencies of the classes preceding and succeeding the modal class, and $C$ is the class interval.

MEASURES OF DISPERSION
Dispersion may be defined as the spread of the items in a series around its average. A measure of dispersion can be expressed either in absolute form or in relative form. Absolute measures of dispersion are expressed in the same unit in which the data are collected.
Absolute measures of dispersion include:

(i) Range
Range is the difference between the largest and smallest values in a series.
Range $= L - S$
$L$ = largest item, $S$ = smallest item
Coefficient of range $= \frac{L - S}{L + S}$

(ii) Quartile deviation
Quartile deviation is defined as half of the distance between the third and the first quartiles.
QD $= \frac{Q_3 - Q_1}{2}$
$Q_1$ = value of the $\left(\frac{n+1}{4}\right)$th item
OR
$Q_1 = L_1 + \frac{N/4 - m_1}{F_1} \times C$
$Q_3$ = value of the $\left(\frac{3(n+1)}{4}\right)$th item
OR
$Q_3 = L_3 + \frac{3N/4 - m_3}{F_3} \times C$
where $L_1$, $m_1$ and $F_1$ are the lower limit of the first-quartile class, the cumulative frequency of the class preceding it and its frequency; $L_3$, $m_3$ and $F_3$ are the same quantities for the third-quartile class; and $C$ is the class interval.

(iii) Mean deviation
Mean deviation is defined as the arithmetic mean of the absolute deviations of items from an average.
MD $= \frac{\sum |d|}{n}$
where $d$ is the deviation of an item from the chosen average.

(iv) Standard deviation
Standard deviation is the positive square root of the arithmetic mean of the squares of deviations from the arithmetic mean.
SD $= \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
OR
SD $= \sqrt{\frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2}$
Combined SD $= \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$
where $n_1$, $n_2$ are the sizes and $\sigma_1$, $\sigma_2$ the standard deviations of the two groups, and $d_1$, $d_2$ are the deviations of the two group means from the combined mean.

A relative measure of dispersion is the ratio of a measure of dispersion to an appropriate average from which deviations are measured.
(i) Coefficient of range $= \frac{L - S}{L + S}$
(ii) Coefficient of quartile deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
(iii) Coefficient of mean deviation $= \frac{MD(M)}{M}$, where $M$ is the average from which the deviations are measured
(iv) Coefficient of variation $= \frac{SD \times 100}{\text{Mean}}$

MOMENTS
Moments are the arithmetic means of the powers of deviations in a series, taken either from its mean or from any arbitrary origin.
Central moments
If moments are estimated by taking deviations of items from the arithmetic mean $M$, they are called central moments.
$\mu_1 = \sum (x - M) / n$ or $\sum f (x - M) / n$
$\mu_2 = \sum (x - M)^2 / n$ or $\sum f (x - M)^2 / n$
$\mu_3 = \sum (x - M)^3 / n$ or $\sum f (x - M)^3 / n$
$\mu_4 = \sum (x - M)^4 / n$ or $\sum f (x - M)^4 / n$
Raw moments
If moments are estimated by taking deviations of items from an arbitrary origin $a$, they are called raw moments.
$\mu'_1 = \sum (x - a) / n$ or $\sum f (x - a) / n$
$\mu'_2 = \sum (x - a)^2 / n$ or $\sum f (x - a)^2 / n$
$\mu'_3 = \sum (x - a)^3 / n$ or $\sum f (x - a)^3 / n$
$\mu'_4 = \sum (x - a)^4 / n$ or $\sum f (x - a)^4 / n$

SKEWNESS
Lack of symmetry is called skewness. If a distribution is not symmetrical, it is called a skewed distribution.
Positively skewed distribution:
If the frequency curve has a longer tail to the right, the distribution is known as a positively skewed distribution, and Mean > Median > Mode.
Negatively skewed distribution:
If the frequency curve has a longer tail to the left, the distribution is known as a negatively skewed distribution, and Mean < Median < Mode.
Measure of skewness
Karl Pearson's coefficient of skewness $= \frac{\text{Mean} - \text{Mode}}{SD}$
Bowley's coefficient of skewness $= \frac{(Q_1 + Q_3) - 2\,\text{Median}}{Q_3 - Q_1}$

KURTOSIS
Kurtosis measures the degree of peakedness or flatness of a frequency distribution, usually taken relative to a normal distribution.
When a frequency curve is more peaked than the normal curve, it is called leptokurtic. When it is more flat-topped than the normal curve, it is called platykurtic. The normal curve, which is smooth, continuous and bell shaped, is called mesokurtic.
Measure of kurtosis
$\beta_2 = \mu_4 / \mu_2^2$
When $\beta_2 = 3$, the curve is mesokurtic. When $\beta_2 < 3$, the curve is platykurtic. When $\beta_2 > 3$, the curve is leptokurtic.
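
As a rough illustration of the ungrouped-data formulas in this section, the following Python sketch computes the main measures of central tendency, dispersion, skewness and kurtosis for a small series. The function names (`describe`, `item_at`, `central_moment`) and the data are hypothetical, chosen only for this example. The sketch divides by n throughout, as the formulas here do, takes the mean deviation about the arithmetic mean, and linearly interpolates fractional item positions, which is an assumption the notes do not state.

```python
import math
from collections import Counter


def central_moment(x, r):
    """r-th central moment: mean of the r-th powers of deviations from the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** r for v in x) / n


def item_at(sorted_x, pos):
    """Value of the pos-th item (1-based) in an arranged series.

    Fractional positions are linearly interpolated (an assumption of
    this sketch; the notes only say "value of the k-th item").
    """
    lo = min(max(int(math.floor(pos)), 1), len(sorted_x))
    hi = min(lo + 1, len(sorted_x))
    frac = max(pos - lo, 0.0)
    return sorted_x[lo - 1] + frac * (sorted_x[hi - 1] - sorted_x[lo - 1])


def describe(x):
    s = sorted(x)
    n = len(s)
    mean = sum(s) / n
    median = item_at(s, (n + 1) / 2)
    # Most frequent value; ties go to the first value encountered.
    mode = Counter(s).most_common(1)[0][0]
    q1 = item_at(s, (n + 1) / 4)
    q3 = item_at(s, 3 * (n + 1) / 4)
    largest, smallest = s[-1], s[0]
    # SD = sqrt(sum(x^2)/n - (sum(x)/n)^2)
    sd = math.sqrt(sum(v * v for v in s) / n - mean ** 2)
    # Mean deviation, here taken about the arithmetic mean.
    md = sum(abs(v - mean) for v in s) / n
    mu2, mu4 = central_moment(s, 2), central_moment(s, 4)
    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "range": largest - smallest,
        "coeff_range": (largest - smallest) / (largest + smallest),
        "quartile_deviation": (q3 - q1) / 2,
        "coeff_quartile_deviation": (q3 - q1) / (q3 + q1),
        "mean_deviation": md,
        "sd": sd,
        "coeff_variation": sd * 100 / mean,
        "pearson_skewness": (mean - mode) / sd,
        "bowley_skewness": ((q1 + q3) - 2 * median) / (q3 - q1),
        "beta2": mu4 / mu2 ** 2,
    }


data = [2, 4, 4, 5, 7, 9, 11]  # made-up values for illustration
summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For this series the mean exceeds the median, which exceeds the mode, so both skewness coefficients come out positive, and beta2 is below 3, which the rule above would read as platykurtic.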
CORRELATION
The relationship between two or more variables in a given series is called correlation, and the numerical measure of the degree of relationship is called the correlation coefficient.
Positive and negative correlation
Positive correlation:
Correlation in the same direction is called positive correlation: if one variable increases the other also increases, and if one variable decreases the other also decreases.
Negative correlation:
Correlation in opposite directions is called negative correlation: if one variable increases the other decreases, and vice versa.
Perfect correlation
If any change in the value of one variable changes the value of the other variable in a fixed proportion, the correlation between them is said to be perfect. It is indicated numerically as +1 or -1.
Perfect positive correlation:
If the values of both variables move in the same direction in a fixed proportion, the correlation is called perfect positive correlation. It is indicated numerically as +1.
Perfect negative correlation:
If the values of both variables move in opposite directions in a fixed proportion, the correlation is called perfect negative correlation. It is indicated numerically as -1.
Linear and non-linear correlation
Linear:
Correlation is said to be linear if the ratio of change between the two variables is constant.
Non-linear:
Correlation is said to be non-linear if the ratio of change is not constant.
Simple, partial and multiple correlations
In simple correlation, the relationship between two variables is considered.
In partial correlation, we study the relationship between one variable and one of the other variables, holding the remaining variables constant.
In multiple correlation, the relationship between more than two variables is studied simultaneously.
Methods of studying correlation
1. Scatter diagram
A scatter diagram is a visual device to study the nature of the relationship between two variables.
2. Correlation graph
It is a graphical representation of the relationship between two variables.

REGRESSION ANALYSIS
Regression analysis means the estimation or prediction of the unknown value of one variable (the dependent variable) from the known value of the other (the independent variable).
Dependent and independent variables
The variable whose value is influenced or is to be predicted is called the dependent variable, and the variable which influences the values or is used for prediction is called the independent variable.
Simple and multiple regressions
When there are only two variables, the regression equation so obtained is called simple regression. Regression analysis which studies the relationship between more than two variables at a time is called multiple regression.
Linear and non-linear regression
Regression is said to be linear if a unit change in the value of the independent variable always leads to a constant change in the value of the dependent variable.
Regression is non-linear if the ratio of the change in the value of the independent variable to the resultant change in the value of the dependent variable is not fixed.
Regression lines
A regression line is a graphic technique to show the functional relationship between the dependent and independent variables. There are two regression lines:
• Regression line y on x
• Regression line x on y
Regression equations
Regression equation y on x
$Y = a + bx$, with $b = b_{yx}$
$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
Regression equation x on y
$X = a + by$, with $b = b_{xy}$
$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$
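
The two regression coefficients given in this section (y on x and x on y) depend only on a handful of sums, which the following Python sketch makes explicit. The function names and the data are hypothetical. The notes do not give a formula for the intercept a, so the sketch assumes the usual least-squares result that each regression line passes through the point of means.

```python
def regression_coefficients(x, y):
    """Return (b_yx, b_xy) from the sum formulas for paired data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # slope of y on x
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # slope of x on y
    return b_yx, b_xy


def line_y_on_x(x, y):
    """Return (a, b) for Y = a + b*x.

    Assumption: the intercept is taken as a = mean(y) - b * mean(x),
    i.e. the line passes through the point of means.
    """
    b_yx, _ = regression_coefficients(x, y)
    a = sum(y) / len(y) - b_yx * sum(x) / len(x)
    return a, b_yx


# Made-up paired observations for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(xs, ys)
a, b = line_y_on_x(xs, ys)
print(f"b_yx = {b_yx:.2f}, b_xy = {b_xy:.2f}")
print(f"Y = {a:.2f} + {b:.2f} x")
```

Both coefficients share the same numerator, so they always carry the same sign; only the denominators differ.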
F test (Fisher)
It is a statistical technique to test a null hypothesis for possible rejection.
If the calculated F is greater than the tabulated F, reject the null hypothesis.
Uses
• Test for equality of several population means
• Test for equality of population variances
• Testing the significance of an observed sample correlation ratio
• Testing the significance of an observed sample multiple correlation
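
One of the uses listed above, the test for equality of two population variances, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the data are hypothetical, the larger sample variance is placed in the numerator, sample variances use the n - 1 divisor, and the tabulated F is passed in by the caller after being read from an F table for the reported degrees of freedom.

```python
def sample_variance(x):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def variance_ratio_f(x, y):
    """Return (F, (df_numerator, df_denominator)).

    The larger sample variance goes in the numerator, so F >= 1.
    """
    var_x, var_y = sample_variance(x), sample_variance(y)
    if var_x >= var_y:
        return var_x / var_y, (len(x) - 1, len(y) - 1)
    return var_y / var_x, (len(y) - 1, len(x) - 1)


def reject_null(f_calculated, f_tabulated):
    """Decision rule: reject the null hypothesis if calculated F > tabulated F."""
    return f_calculated > f_tabulated


# Made-up samples for illustration.
sample_a = [10, 12, 14, 16, 18]
sample_b = [11, 12, 13, 14, 15]

f_value, dof = variance_ratio_f(sample_a, sample_b)
# Assumed table value: upper 5% point of F with (4, 4) degrees of freedom.
f_table = 6.39
print(f"F = {f_value:.2f} with df = {dof}; reject null: {reject_null(f_value, f_table)}")
```

Here the calculated F does not exceed the assumed table value, so the null hypothesis of equal variances would not be rejected.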