
A Talk on Data Processing and Analysis of Data
(Research Methodology)

Introduction
Collected data have to be processed and
analyzed in accordance with the research
plan.
This is essential for a scientific study and
for making the planned comparisons.
Processing implies:

Editing
Coding
Classification
Tabulation

Analysis implies:
Computation of certain measures
Searching for patterns of relationship
that exist among data groups.

Processing Operations
1. Editing
The process of examining the collected
raw data to detect errors and omissions
and to correct them where possible.
It involves scrutiny of the completed
questionnaires and/or schedules.
There are two kinds of editing:

Field editing
Central editing

Field editing
Consists of the investigator reviewing the
reporting forms to complete (write out in full)
what was recorded in abbreviated form
at the time of recording the response.
This editing should be done as soon
as possible after the interview.
While doing field editing, the investigator
should not correct errors or omissions
by simply guessing a suitable answer.

Central editing
Takes place when all forms or schedules
have been completed and returned to
the office.
All the forms should be edited by a
single editor in a small study, or by a team
of editors in a large inquiry.
Corrections are allowed at this stage.

Points for editors to keep in view while
performing their work:
a) Editors should be familiar with the instructions
given to the interviewers and coders.
b) An entry should be crossed out with a single
line so that it remains legible.
c) Entries should be made in a distinctive
color and in a standardized form.
d) Editors should initial all answers that they
change or supply.
e) The editor's initials and the date of editing should
be placed on each completed form or schedule.

2. Coding
Refers to the process of assigning
numerals or other symbols to answers
so that the responses can be put into
a limited number of categories.
Necessary for efficient analysis.
Coding decisions are usually taken at the
design stage of the questionnaire.
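
As a small illustration of coding, the sketch below assigns numerals to the answers of one question. The question, the categories, the numerals and the name code_response are all assumptions made for this example; they are not taken from the talk.

```python
# Hypothetical codebook: categories and numerals are assumptions
# made for illustration only.
CODEBOOK = {"yes": 1, "no": 2, "don't know": 3}
NO_RESPONSE = 9  # assumed code for blank or unrecognised answers


def code_response(answer: str) -> int:
    """Return the numeral assigned to a raw answer."""
    return CODEBOOK.get(answer.strip().lower(), NO_RESPONSE)


raw_answers = ["Yes", "no", " YES ", "", "Don't know"]
coded = [code_response(a) for a in raw_answers]
print(coded)  # [1, 2, 1, 9, 3]
```

Fixing the codebook before data collection, as the slide suggests, means every answer falls into one of a limited set of categories.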

3. Classification
Individual data should be reduced to
homogeneous groups to get
meaningful relationships.

Classification is the process of
arranging data in groups or classes on
the basis of some common
characteristics.

Broadly, there are two types of
classification, based on the nature of
the phenomenon involved:
a) Classification according to attributes.
b) Classification according to class-interval.

Classification according to attributes:

Data are classified on the basis of
common characteristics, which may be either
descriptive or numerical.
Descriptive characteristics refer to
qualitative phenomena, which cannot
be measured quantitatively.
Data obtained this way are known as
statistics of attributes.

This classification can be either simple
or manifold.
In simple classification, we consider only
one attribute and make two classes: one
possessing the attribute and
the other devoid of it.
In manifold classification, more than one
attribute is considered and the data are
divided into a number of classes.

Classification according to class-interval:

Data relating to income, production, age,
etc. are known as statistics of variables
and are classified on the basis of class
intervals.
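
Classification by class-interval can be sketched as follows. The ages, the interval width of 10 and the helper name class_interval are assumptions chosen for illustration; each class includes its lower limit and excludes its upper limit.

```python
from collections import Counter


def class_interval(value, width, start=0):
    """Return the (lower, upper) class containing value; upper limit excluded."""
    index = int((value - start) // width)
    lower = start + index * width
    return (lower, lower + width)


# Illustrative ages (not from the talk).
ages = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44]

# Frequency of each class: the number of items falling into it.
frequency = Counter(class_interval(a, width=10, start=20) for a in ages)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower} and under {upper}: {count}")
```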

4. Tabulation
Tabulation refers to the process of
summarizing the raw data and
displaying them in compact form.
It is essential because it:

Conserves space and reduces
explanatory statements to a minimum.
Facilitates the process of comparison.


Elements/Types of Analysis
In the case of survey or experimental
data, analysis involves:
Estimating the values of unknown
parameters of the population.
Testing hypotheses for drawing
inferences.

Categories of analysis:
a) Descriptive
b) Inferential

Correlation analysis:
Studies the joint variation of two or
more variables to determine the
amount of correlation between them.

Causal analysis:
Studies how one or more variables affect
changes in another variable.

Multivariate analysis:
All statistical methods which
simultaneously analyze more than two
variables on a sample of observations.
It involves:
a) Multiple regression analysis
b) Multiple discriminant analysis
c) Multivariate analysis of variance
d) Canonical analysis

STATISTICS IN RESEARCH
Statistics functions as a tool
in designing research, analyzing its data
and drawing conclusions therefrom.
The important statistical measures used
to summarize survey/research data are:
1) Measures of central tendency or statistical
averages
2) Measures of dispersion
3) Measures of asymmetry (skewness)
4) Measures of relationship
5) Other measures

Measures of Central Tendency
They indicate the point about which items have a
tendency to cluster.
Mean, median and mode are the most popular
averages.
Mean, also known as the arithmetic average, is the sum of
the values of all items divided by the number of items.
Median is the value of the middle item of a
series when it is arranged in ascending or
descending order.
Mode is the most frequently
occurring value in a series.
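
The three averages can be computed with Python's standard statistics module. The scores below are made up for illustration.

```python
import statistics

# Illustrative scores (not from the talk).
scores = [4, 7, 7, 9, 10, 12, 7, 8]

mean = statistics.mean(scores)      # sum of items / number of items
median = statistics.median(scores)  # middle of the ordered series
mode = statistics.mode(scores)      # most frequently occurring value

# For these scores: mean is 8, median is 7.5, mode is 7.
print(mean, median, mode)
```

With an even number of items, as here, the median is taken as the average of the two middle items.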

Measures of Dispersion
They give an idea of the
scatter of the values of items of a
variable in the series around the
average.
Important measures of dispersion are:
a) Range
b) Mean deviation
c) Standard deviation

Range
The simplest possible measure of dispersion.
It is defined as the difference between the
values of the extreme items of a series (highest value minus lowest value).

Mean deviation
It is the average of the absolute differences between the values of
items and some average of the series.

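Standard deviation
Most widely used measure of dispersion.
Denoted by the symbol σ (sigma).

Standard deviation is defined as the
square root of the average of the squares of
deviations, the deviations being taken from
the arithmetic mean:

σ = sqrt( Σ (Xi - X̄)² / n )

where Xi is the value of the i-th item, X̄ is the
arithmetic mean of the series and n is the number of items.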
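
A minimal sketch of the three measures of dispersion, using made-up values. The standard deviation here divides by n, following the definition "square root of the average of squares of deviations"; sample-based formulas that divide by n - 1 would give a slightly larger figure. The mean deviation is taken about the arithmetic mean, which is one possible choice of average.

```python
import math


def value_range(values):
    """Difference between the extreme items of the series."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average absolute deviation of the items from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divisor n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


# Illustrative series (not from the talk); its mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(value_range(values))         # 7
print(mean_deviation(values))      # 1.5
print(standard_deviation(values))  # 2.0
```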

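Measures of Asymmetry
When the distribution of the items in
a series is perfectly
symmetrical, the curve is bell-shaped and the
mean, median and mode coincide. Technically, such a curve
is described as a normal curve.

If the curve is distorted to one side, the distribution is said to be
asymmetrical, which indicates the presence of skewness.

A commonly used measure is:

Skewness = X̄ - Z
Coefficient of skewness = (X̄ - Z) / σ

where X̄ is the mean, Z is the mode and σ is the
standard deviation of the series.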
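
As a sketch of how skewness might be measured, the function below uses the mode-based coefficient (mean - mode) / σ. This formula is an assumption about which measure the slide intended, and the data are made up.

```python
import statistics


def coefficient_of_skewness(values):
    """Mode-based coefficient of skewness: (mean - mode) / sigma."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)
    sigma = statistics.pstdev(values)  # standard deviation with divisor n
    return (mean - mode) / sigma


# A series with a long right tail: mean 4 is greater than mode 3.
right_tailed = [2, 3, 3, 3, 4, 5, 8]
# A symmetrical series: mean and mode are both 3.
symmetrical = [1, 2, 3, 3, 4, 5]

print(coefficient_of_skewness(right_tailed))  # positive (about 0.54)
print(coefficient_of_skewness(symmetrical))   # zero
```

A positive value indicates a tail stretching to the right, a negative value a tail to the left, and zero a symmetrical series.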

Measures of Relationship
For a bivariate or multivariate
population, we need to know how
the two or more variables in
the data relate to one another.
Association (correlation) is studied
using correlation techniques, and
cause-and-effect relationships using the
technique of regression.

In the case of a bivariate population:

Correlation can be studied through:
a) Cross tabulation
b) Charles Spearman's coefficient of correlation
c) Karl Pearson's coefficient of correlation

Cause-and-effect relationships can be
studied through the simple regression
technique.

1. Cross tabulation:

Useful when the data are in nominal form.
Classify each variable into two or more
categories and then cross-classify the
variables in these categories.
The relationship between the variables can be:

Symmetrical
Reciprocal
Asymmetrical

In a symmetrical relationship, the two
variables vary together, but neither is assumed to be due to the other.
In a reciprocal relationship, the two variables
mutually influence or reinforce each other.
In an asymmetrical relationship, one variable
(the independent variable) is responsible for
another variable (the dependent variable).
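
A cross tabulation of two nominal variables can be sketched with a simple counter. The respondents, the two variables and their categories are hypothetical.

```python
from collections import Counter

# Hypothetical nominal data: (group, answer) for each respondent.
respondents = [
    ("male", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "no"), ("female", "yes"), ("male", "yes"),
]

# Count how many respondents fall into each combination of categories.
table = Counter(respondents)

groups = sorted({group for group, _ in respondents})
answers = sorted({answer for _, answer in respondents})

# Print one row per group and one column per answer.
print("group".ljust(8) + "".join(a.ljust(6) for a in answers))
for group in groups:
    cells = "".join(str(table[(group, a)]).ljust(6) for a in answers)
    print(group.ljust(8) + cells)
```

The table shows only that the variables vary together; deciding whether the relationship is symmetrical, reciprocal or asymmetrical requires knowledge of the subject matter.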

2. Charles Spearman's coefficient of correlation:

This technique deals with ordinal data, where
ranks are given to the different values of the
variables.
The objective is to determine the extent to
which the two sets of rankings are similar or
dissimilar.
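
A minimal sketch of the rank correlation, assuming the usual formula r_s = 1 - 6 Σd² / (n (n² - 1)), where d is the difference between the two ranks of an item and n is the number of items. The formula in this form assumes there are no tied ranks; the two judges' rankings are made up.

```python
def spearman_rank_correlation(ranks_a, ranks_b):
    """Spearman's coefficient for two rankings of the same items (no ties)."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

print(spearman_rank_correlation(judge_a, judge_b))  # 0.8
```

A value of +1 means the two rankings are identical and -1 means they are exactly reversed.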

3. Karl Pearson's coefficient of correlation:

Most widely used method of measuring
the degree of relationship between two
variables.
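
Pearson's coefficient can be sketched from its product-moment definition, r = Σ(X - X̄)(Y - Ȳ) / sqrt( Σ(X - X̄)² Σ(Y - Ȳ)² ). The paired observations are illustrative only.

```python
import math


def pearson_correlation(xs, ys):
    """Product-moment coefficient of correlation between paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Illustrative paired observations (not from the talk).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_correlation(x, y), 4))  # 0.7746
```

The coefficient lies between -1 and +1 and measures only the degree of linear relationship.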

Simple regression analysis:

Regression is the determination of a
statistical relationship between two or
more variables. In simple regression there are
two variables: one (the independent variable) is
treated as the cause of the behavior of the other
(the dependent variable).
If X is the independent variable and Y is
the dependent variable, the
regression equation of Y on X is:

Ŷ = a + bX

where a and b are constants estimated from the data.
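
The regression equation of Y on X has the form Ŷ = a + bX. A least-squares sketch for estimating a and b, using b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - b X̄, is shown below with made-up data.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) of the least-squares line y_hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope: change in Y per unit change in X
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    return a, b


# Illustrative data: X is the independent variable, Y the dependent one.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

a, b = fit_simple_regression(x_values, y_values)
print(f"Y_hat = {a:.1f} + {b:.1f} X")  # Y_hat = 2.2 + 0.6 X
```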

In the case of a multivariate population:

Correlation can be studied through:
a) Coefficient of multiple correlation
b) Coefficient of partial correlation

Cause-and-effect relationships can be
studied through multiple regression
equations.

1. Multiple Correlation and Regression

When there are two or more
independent variables, the
analysis of the relationship is
known as multiple correlation.
The equation describing such a
relationship is known as the multiple
regression equation.

With two independent
variables (X1, X2) and one dependent
variable (Y), the equation can be written
as:

Ŷ = a + b1 X1 + b2 X2
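
For two independent variables the multiple regression equation has the form Ŷ = a + b1 X1 + b2 X2. The sketch below estimates a, b1 and b2 by least squares, solving the two normal equations in deviation form. The function name and the data are assumptions; the data were constructed so that Y = 1 + 2 X1 + 3 X2 exactly, which makes the result easy to check.

```python
def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) of the least-squares plane y_hat = a + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of the deviations.
    s11 = sum(p * p for p in d1)
    s22 = sum(q * q for q in d2)
    s12 = sum(p * q for p, q in zip(d1, d2))
    s1y = sum(p * r for p, r in zip(d1, dy))
    s2y = sum(q * r for q, r in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("the independent variables are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Constructed data satisfying Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

With more than two independent variables the same normal equations are usually solved with a linear algebra routine rather than by hand.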

Partial correlation:
Measures the relationship between two variables
in such a way that the effect of other related
variables is eliminated.
In other words, the aim is to measure
the relation between a dependent
variable and a particular independent
variable while holding all other variables
constant.
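
With three variables, the first-order partial correlation between variables 1 and 2, holding variable 3 constant, can be computed from the three simple correlations as r12.3 = (r12 - r13 r23) / sqrt( (1 - r13²)(1 - r23²) ). The correlation values in the sketch are hypothetical.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simple correlations among three variables.
r12, r13, r23 = 0.8, 0.6, 0.5

print(round(partial_correlation(r12, r13, r23), 4))  # 0.7217
```

Here the partial correlation (about 0.72) is lower than the simple correlation of 0.8, because part of the observed association between variables 1 and 2 is attributable to variable 3.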

Other Measures
1. Index numbers:
Used when the series are expressed in
different units.
In such cases each series is converted
into a series of index numbers.
For example, the given figures can be
expressed as percentages of the figure for a base period.
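
A simple index-number series can be sketched by expressing every figure as a percentage of a base figure. The prices and the choice of the first period as base are assumptions for illustration.

```python
def index_numbers(series, base_position=0):
    """Express each figure as a percentage of the figure at base_position."""
    base = series[base_position]
    return [round(value / base * 100, 1) for value in series]


# Illustrative prices for four successive periods; the first is the base (= 100).
prices = [50, 55, 60, 75]

print(index_numbers(prices))  # [100.0, 110.0, 120.0, 150.0]
```

Because index numbers are pure percentages, series originally expressed in different units become directly comparable.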

2. Time-Series Analysis:

When the data collected relate to
successive periods of time for a given
phenomenon, particularly in economic
and business settings, such data are
called a time series.
Factors affecting such series are:
I. Secular trend (T): changes taking place over a long
period of time
II. Short-time oscillations: changes taking place over
short periods of time

Short-time oscillations are affected by the
following factors:
a) Cyclical fluctuations (C): fluctuations that
result from business cycles.
b) Seasonal fluctuations (S): fluctuations
of short duration that recur in a regular
sequence at specific intervals of time.
c) Irregular fluctuations (I): fluctuations that
take place in a completely unpredictable
fashion.

For analyzing time series there are
two models:
a) Multiplicative model
b) Additive model
The multiplicative model assumes that the
various components interact in a
multiplicative manner to produce the
given values of the overall time series,
and can be stated as:

Y = T × C × S × I

The additive model treats the given values of
the overall time series as the sum of the
various components, and can be stated as:

Y = T + C + S + I

where Y is the observed value of the time series.
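
The two models can be sketched as Y = T × C × S × I and Y = T + C + S + I, with T, C, S and I the trend, cyclical, seasonal and irregular components. The component values below are hypothetical. In the multiplicative model C, S and I are ratios around 1, whereas in the additive model they are amounts in the same unit as the trend.

```python
def multiplicative_model(t, c, s, i):
    """Observed value when the components interact multiplicatively."""
    return t * c * s * i


def additive_model(t, c, s, i):
    """Observed value when the components simply add up."""
    return t + c + s + i


# Hypothetical components for one period.
# Multiplicative: trend 200, cycle 5% above normal, season 10% below normal.
y_mult = multiplicative_model(200.0, 1.05, 0.90, 1.00)
# Additive: trend 200, cycle +10, season -20, irregular +3.
y_add = additive_model(200, 10, -20, 3)

print(round(y_mult, 2))  # 189.0
print(y_add)             # 193
```

Dividing an observed value by the seasonal ratio (or subtracting the seasonal amount in the additive case) removes the seasonal component, which is the usual first use of either model.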
