# Fundamentals of Data Analysis

Learning objectives
- Concepts of data analysis.
- Need for data preparation techniques.
- Various statistical techniques for data analysis.
- Factors that influence the selection of an appropriate data analysis strategy.

Data analysis
It is the process by which data is converted into useful information. Raw data from questionnaires is processed so that conclusions can be drawn from it.

The purpose of data analysis is to produce information that will help to address the problem.

## Principles of data analysis

- Avoid erroneous judgments and conclusions.
- Provide a background to help interpret and understand the analysis conducted by others.

## Preparing the data for analysis

The major data preparation techniques are:
- Data editing.
- Coding.
- Data entry.
- Statistically adjusting the data.

Data validation
It is the process of determining, to the extent possible, whether a survey's interviews or observations were conducted correctly and are free of fraud or bias.

The process of validation covers:
1. Fraud.
2. Screening.
3. Procedure.
4. Completeness.

Data editing
It is the process whereby the raw data are checked for mistakes made by either the interviewer or the respondents. The researcher can check several areas of concern:
- Asking the proper questions.
- Accurate recording of answers.
- Correct screening of respondents.
- Complete and accurate recording of open-ended questions.

Coding
It is the grouping of the various responses to the questions on the survey instrument and the assignment of values to them. Specifically, coding is the assignment of a numerical value to each individual response for each question on the survey. Typically, the codes are numerical (a number from 0 to 9) since these are easy to input.
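The coding step can be illustrated with a small sketch. The codebook, the answer labels, and the use of 9 as a missing-value code are assumptions made for this example, not part of the original material.

```python
# Hypothetical codebook: maps each allowed answer to a numeric code.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING_CODE = 9  # assumed convention for blank or unusable answers


def code_response(answer):
    """Return the numeric code for one raw answer."""
    cleaned = answer.strip().lower()
    return CODEBOOK.get(cleaned, MISSING_CODE)


raw_answers = ["Agree", " neutral ", "", "Strongly agree"]
coded = [code_response(a) for a in raw_answers]
print(coded)
```

Keeping the codebook in one place makes it easier to check later that every code entered into the data file is a legal one.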

Data entry
It is the procedure used to enter the data into the computer for subsequent data analysis. It includes those tasks involved with the direct input of the coded data into a software package that enables the research analyst to manipulate and transform the raw data into useful information. One critical step is to ensure that the data entered is correct and error free.

The data can be statistically adjusted to enhance their quality. The adjustment procedures are:
- Weighting.
- Variable re-specification.
- Dummy variables.
- Scale transformation.

Weighting: It is the procedure by which each response in the database is assigned a number according to some prespecified rule.
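One common use of such weights is a weighted mean, in which each response counts in proportion to its assigned number. The sketch below assumes made-up scores and weights; the function name `weighted_mean` is chosen for illustration only.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Illustrative data: respondents 2 and 4 belong to a group assumed to be
# under-represented in the sample, so they receive a weight of 2.
scores = [4, 2, 5, 3]
weights = [1.0, 2.0, 1.0, 2.0]

print(sum(scores) / len(scores))       # unweighted mean
print(weighted_mean(scores, weights))  # weighted mean
```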

Variable re-specification: It is a procedure in which the existing data are modified to create new variables, or a large number of variables is collapsed into fewer variables.

Dummy variables: These are used extensively for re-specifying categorical variables.
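A minimal sketch of dummy coding follows. It assumes the common convention of using one fewer dummy variable than there are categories, with one category held out as the reference; the variable names and the `is_` prefix are illustrative.

```python
def dummy_code(values, reference):
    """Re-specify a categorical variable as 0/1 dummy variables.

    The reference category gets no dummy of its own: it is the case
    in which every dummy equals 0.
    """
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


regions = ["north", "south", "east", "north"]
dummies = dummy_code(regions, reference="north")
for region, row in zip(regions, dummies):
    print(region, row)
```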

Scale transformation: It involves the manipulation of scale values to ensure comparability with other scales or otherwise make the data suitable for analysis.
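Standardization is one frequently used scale transformation: each value is expressed as its distance from the mean in standard-deviation units, so that scales with different ranges become comparable. The sketch below assumes two made-up rating scales and uses the sample standard deviation.

```python
import statistics


def standardize(values):
    """Convert values to z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # divides by n - 1
    return [(v - mean) / sd for v in values]


five_point = [1, 2, 3, 4, 5]   # answers on an assumed 1-5 scale
ten_point = [2, 4, 6, 8, 10]   # answers on an assumed 1-10 scale

z_five = standardize(five_point)
z_ten = standardize(ten_point)
print([round(z, 3) for z in z_five])
print([round(z, 3) for z in z_ten])
```

After the transformation both lists are on the same footing, which is the comparability the definition refers to.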

Frequency distribution
It reports the number of responses that each question received and is the simplest way of determining the empirical distribution of a variable. It organizes the data into classes or groups of values and shows the number of observations from the data set that fall into each class. It can be presented as a percentage breakdown of the various categories and as a bar graph known as a histogram.
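As a rough illustration, a frequency distribution with its percentage breakdown can be tabulated in a few lines. The responses are invented for the example, and the row of `#` characters is only a text stand-in for a bar graph.

```python
from collections import Counter

# Illustrative answers to a single survey question.
responses = ["yes", "no", "yes", "undecided", "yes", "no"]

counts = Counter(responses)
total = sum(counts.values())

# One row per category: count, percentage of all responses, text bar.
for category, count in counts.most_common():
    percent = 100 * count / total
    bar = "#" * count
    print(f"{category:<10} {count:>3} {percent:5.1f}% {bar}")
```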

Descriptive Statistics

## Measures of central tendency

Mean

$$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n}$$

where
- $f_i$ = the frequency of the $i$th class
- $X_i$ = the midpoint of that class
- $h$ = the number of classes
- $n$ = the total number of observations
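The grouped-data mean can be checked with a short calculation. The class midpoints and frequencies below are made up for illustration.

```python
# Each class is (midpoint X_i, frequency f_i); the values are illustrative.
classes = [(5.0, 3), (15.0, 5), (25.0, 2)]

# n is the total number of observations: the sum of the class frequencies.
n = sum(f for _, f in classes)

# Mean = sum of f_i * X_i over all classes, divided by n.
grouped_mean = sum(f * x for x, f in classes) / n
print(n, grouped_mean)
```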

Median
It is the middle value of the distribution when the distribution is ordered in either an ascending or a descending sequence.

Mode
It is the most common value in the set of responses to a question, i.e. the response most often given to the question.
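Both measures are available in Python's standard `statistics` module. The ratings below are invented for the example.

```python
import statistics

# Illustrative ratings given by seven respondents.
ratings = [2, 5, 5, 1, 3, 5, 4]

ordered = sorted(ratings)                  # the median is defined on ordered data
median_rating = statistics.median(ratings)
mode_rating = statistics.mode(ratings)     # the most frequently given rating

print(ordered)
print(median_rating, mode_rating)
```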

## Measures of dispersion

Standard deviation
It describes the average distance of the distribution values from the mean. It is calculated by:
1. subtracting the mean of the series from each value in the series,
2. squaring each result,
3. summing the squares,
4. dividing by the number of items minus 1, and
5. taking the square root of this value.

Standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

where
- $S$ = sample standard deviation
- $X_i$ = the value of the $i$th observation
- $\bar{X}$ = the sample mean
- $n$ = the sample size
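The sample standard deviation can be computed step by step as a check on the formula. The data values are illustrative; the result is compared with `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations

n = len(values)
mean = sum(values) / n

deviations = [x - mean for x in values]   # subtract the mean from each value
squared = [d ** 2 for d in deviations]    # square each result
sum_of_squares = sum(squared)             # sum the squares
s = math.sqrt(sum_of_squares / (n - 1))   # divide by n - 1, take the square root

print(s)
print(statistics.stdev(values))  # library result for comparison
```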

Variance
It is the sum of the squared deviations from the mean divided by the number of observations minus one. It has the same formula as the standard deviation with the square-root sign removed.

Range
It defines the spread of the data: the distance between the smallest and the largest values in a set of responses.
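Variance and range can be sketched in the same way. The data are illustrative, and `statistics.variance` is the sample variance, dividing by the number of observations minus one.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations

sample_variance = statistics.variance(values)  # divides by n - 1
value_range = max(values) - min(values)        # largest minus smallest value

# The standard deviation is the square root of the variance.
sd_from_variance = math.sqrt(sample_variance)

print(sample_variance, value_range, sd_from_variance)
```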

Statistical techniques
The various statistical techniques used are:
- Univariate, involving a single variable at a time.
- Bivariate, involving two variables at a time.
- Multivariate, involving three or more variables at a time.

Univariate Techniques
These are the statistical techniques appropriate for analyzing data when there is a single measurement of each element in the sample or, if there are several measurements on each element, when each variable is analyzed in isolation.

## An overview of Univariate techniques

[Figure: classification of univariate techniques. Branches: non-parametric statistics and parametric statistics. Labels under each branch: one sample, independent, dependent. Tests named in the figure: t-test, z-test, ANOVA.]
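As a rough illustration of one of the tests named above, a one-sample t statistic compares a sample mean with a hypothesized value using the sample standard deviation. The data and the hypothesized mean are assumptions made for this sketch, and only the test statistic is computed, not a p-value.

```python
import math
import statistics

sample = [4, 6, 5, 7, 8]   # illustrative measurements
hypothesized_mean = 5      # assumed value under the null hypothesis

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)        # divides by n - 1
standard_error = sample_sd / math.sqrt(n)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_stat = (sample_mean - hypothesized_mean) / standard_error
degrees_of_freedom = n - 1

print(t_stat, degrees_of_freedom)
```

The statistic would then be compared with a t distribution with n - 1 degrees of freedom.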

Multivariate Techniques
Statistical techniques suitable for analyzing data when there are two or more measurements on each element and the variables are analyzed simultaneously. Multivariate techniques are concerned with the simultaneous relationships among two or more phenomena.

[Figure: classification of multivariate techniques. Branches: dependence techniques and interdependence techniques. Other labels in the figure: multi independent variable, focus on variables, focus on objects, factor analysis.]

## Factors that influence the choice of statistical technique

The factors that influence the selection of the appropriate technique for data analysis are:
- Type of data.
- Research design.
- Assumptions underlying the test statistic and related considerations.

