SW 4 Quantitative Research Methodology

Notes
Topics
What is quantitative research

Research Process Overview
Research Proposal Components
Problem Formulation
Literature Review
Objectives
Concepts, Variables, Levels of Measurement
Research Design and Types of Research Design

Research Ethics
Sampling
Tools for Data collection
Hypothesis and Hypothesis Testing
Data Analysis
Univariate and Bivariate Data Analysis
Graphic Representation of Data
Descriptive Statistics - Measures of Central Tendency,
Dispersion and Chi-Square test
Inferential Statistics
Correlation
T-test and ANOVA
Linear Regression
Report Writing
Steps of the Research Process

Selection of Topic (Problem identification/problem statement)
Formulate Research Questions, Study objectives
Conceptualize research design
Construct measure/instrument/tool
Collect data
Clean & Analyze data
Interpret data
Inform others (Write research report, disseminate findings)
Selecting Researchable Topics

A research topic is a concept, subject or issue that can be
studied through research
Research topics in social research are often about people,
problems, programmes or phenomena
Developing research questions
They can be about local or global phenomena
Very specific or more general
Can focus on the past, present or future
Can be related to social reality or seek answers to social
problems
Can be inductive or deductive in nature
Aim at exploring new knowledge or seek to fill gaps in
existing knowledge
Sources of Research Questions

Values and Science
Personal factors- intellectual curiosity
Social, political and economic climate
Research funding
Developing Researchable Question

Literature Review
o Feasibility
-Access
-Time & money
-Expertise
Literature Review
Based on the assumption that knowledge accumulates & that we
learn from & build on what others have done.
Goals of Literature Review
To show familiarity & establish credibility of researcher & the
importance of the problem
Provide theoretical background to the study
To show the path of prior research & its linkage with the current
project
To learn & improve research methodology, measures used
To integrate & summarize what is known in an area & identify gaps
To learn from others & stimulate new ideas
To contextualize the findings of the study & integrate them into the
existing body of knowledge once data collection & analysis is
complete
Types of Reviews
Self-study Reviews- increase the reader's confidence
Context Review- Place specific project in larger context
Historical Review- trace development of an issue over time
Theoretical review- compare how different theories address
the issue
Methodological Review- different methodologies used by
earlier researchers
Integrative Reviews- Summarizing what is known at a point in
time
Sources of Research Literature

Scholarly journals
Books
Dissertations
Government documents/Policy reports
Conference papers/ Working papers,
monographs
Web search- www.scholar.google.com
Electronic databases- Sagepub, LexisNexis,
ERIC, JSTOR etc.
Conducting a Systematic Review

Define and refine a topic
Choose focused research questions
Locate research reports, books, documents,
policy reports, conference papers
Present it in a logical sequence
Fundamental Concepts of Social Research

Objectives
Concepts
Variables
Hypothesis
Assumptions
Selection of Topic
Personal experience
Curiosity
The state of knowledge in the field
Solving a problem
Personal values
Everyday life
Techniques for Narrowing a Topic

1. Published literature is an excellent source of
ideas for research questions
2. Talk over ideas with others
3. Apply to a specific context
4. Define the aim and desired outcome- type of
study
5. State specific objectives
Nature and Use of concepts

A concept is abstracted from many sense impressions or
percepts. Concepts may be subjective and may vary markedly
from individual to individual
Concepts are the foundation of all human
communication and thought
Science requires more precise communication
Problems arise in definition and communication
Scientific terms may have different meanings in other
frames of reference
With the evolution of knowledge, concepts may be
redefined
Concepts
Concepts are words or signs that share common
characteristics
Concepts can be concrete and easily measurable or complex
and difficult to measure
Each science develops its own terms or concepts for
communicating its findings
Concepts have meaning only within some frame of reference,
some theoretical system
Concepts are building blocks of a theory
Natural science concepts are often expressed in symbolic
form
Most social science concepts are expressed in words
Concepts are everywhere and used all the time
Concepts
Concepts have two parts: a symbol (word or term) and a
definition
Concepts are created from personal experiences,
creative thought or observation
Social science concepts form a specialized language
or jargon
Every field of science has its own jargon
Concepts can refer to concrete objects or abstract
phenomena
Concepts
Concrete concepts- school, age, height, income, housing,
physical space etc.
Abstract concepts- Family, beliefs, social control, intelligence,
personal space etc.
Scientific concepts are more precisely defined than concepts
used in day to day communication.
We rarely use concepts in isolation. They form
interconnected groups or clusters.
Theories contain collections of associated concepts that are
consistent and mutually reinforcing
They can be one-dimensional or complex having multiple
dimensions
Conceptual and Operational Definitions
Conceptual definitions are abstractions,
articulated in words, that facilitate
understanding, such as definitions in
dictionaries or meanings understood in day
to day conversation
Operational definitions consist of a set of
instructions on how to measure a variable
that has been conceptually defined
Operational Definitions of Concepts

Operational definitions ensure that other
scientists will understand the terms in the
same way as the researcher
An operational definition defines the phenomenon with greater
precision and may leave out some elements
of an older concept.
An operational definition makes it easier to
measure or study an abstract concept
Variables
Measurability is the main difference between
a concept and a variable.
Variables take on more than one value and
those values can be numbers or words
Social research is based on defining variables,
looking for associations among them and
trying to understand whether and how one
variable causes another
Dimensions of Variables
Variables can be one-dimensional or
multidimensional.
Examples of one-dimensional variables- age,
height, weight, birth order, marital status
Examples of multidimensional variables- stress, wealth,
attitude
A variable must have at least two categories
Types of Variables
Independent variable- The cause variable or
the one that identifies forces or conditions
that act on something else is an independent
variable
Dependent variable- The variable that is the
effect or is the result or outcome of another
variable is the dependent variable
Intervening variable- This comes between
the independent and dependent variable &
links them to each other
Variables
It is not always simple to decide whether a variable is
dependent or independent
Simple theories have one dependent and one
independent variable
Complex theories can contain dozens of
variables with multiple independent,
dependent and intervening variables
Levels of Measurements
The categories of a variable should be exhaustive &
mutually exclusive
Nominal variables- the values consist of a list of
names (religion, states, occupation). Qualitative
measurement; involves only classification
Ordinal variables- the categories have names and
these values can be rank ordered (socio economic
class, opinions). Involves classification + rank
ordering
All ordinal variables can be treated as nominal
variables but not the other way round.
Levels of Measurements
Interval variables- have all the properties of
nominal and ordinal variables. Their attributes form an
exhaustive and mutually exclusive list, and
there is a standard distance between adjacent
categories (e.g. temperature in Celsius, scores on
measurement scales)
Ratio variables- have names, the categories can be
rank ordered, the adjacent categories have a standard
distance & the scale has a true zero
point. E.g. age, income, height, weight
Types of Variables
Categorical variables- measured at nominal or
ordinal levels where variables are divided into
multiple categories
Continuous variables- have continuity in
measurement and are measured at an
interval or ratio level.
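The level of measurement limits which summary statistic is meaningful. The sketch below is only an illustration: the function name `central_tendency` and the sample data are assumptions, not part of these notes. It applies the usual rule of thumb of mode for nominal data, median for ordinal data, and mean for interval or ratio data.

```python
from statistics import mean, median_low, mode

def central_tendency(values, level):
    """Pick a measure of central tendency suited to the level of measurement."""
    if level == "nominal":
        # Categories only: the most frequent category is the sensible summary.
        return "mode", mode(values)
    if level == "ordinal":
        # Rank-ordered codes: median_low returns a value that occurs in the data.
        return "median", median_low(values)
    if level in ("interval", "ratio"):
        # Standard distances between values make the arithmetic mean meaningful.
        return "mean", mean(values)
    raise ValueError(f"unknown level of measurement: {level}")

# Hypothetical data for illustration only.
religion = ["Hindu", "Muslim", "Hindu", "Christian"]  # nominal
ses_class = [1, 2, 2, 3, 3]                           # ordinal codes: 1=low, 2=middle, 3=high
income = [10, 20, 30]                                 # ratio (arbitrary units)

print(central_tendency(religion, "nominal"))
print(central_tendency(ses_class, "ordinal"))
print(central_tendency(income, "ratio"))
```

A higher-level variable can always be summarized with a lower-level statistic (income has a median and a mode), but not the other way round.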
Units of Analysis
Populations of people
o Farms
o Communities
o Myths
o Cities
o Countries
Always collect data on the lowest unit of analysis. It is
easy to aggregate data collected on individuals but
not possible to disaggregate data collected on
groups
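The point about aggregation can be sketched in a few lines. The village names and income figures are hypothetical, and `aggregate_mean` is an illustrative helper rather than a standard function.

```python
from collections import defaultdict

# Hypothetical individual-level records: (village, monthly income).
individuals = [
    ("Village A", 4000),
    ("Village A", 6000),
    ("Village B", 3000),
    ("Village B", 5000),
    ("Village B", 7000),
]

def aggregate_mean(records):
    """Aggregate individual records to one mean value per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for group, value in records:
        totals[group][0] += value
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}

village_means = aggregate_mean(individuals)
print(village_means)
```

Both villages end up with the same mean even though the individual incomes differ, and nothing in `village_means` allows the individual values to be recovered. That is why data collected only at the group level cannot be disaggregated.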
Hypotheses
A hypothesis is a proposition to be tested or a
tentative statement of a relationship between
two variables.
A hypothesis is used to test the direction or
strength of a relationship between variables
3 types of relationships between variables-
Positive
Negative
Curvilinear
Hypothesis
Criteria of a good hypothesis
Conceptual clarity
Should have empirical referents
Specific
Should be related to available technique of testing
Should be related to a body of theory
Cause and Effect

A cause and effect relationship between two variables can
be established if the following conditions are met
The two variables covary
The covariation is not spurious
There is a logical time order
A mechanism is available to explain how the
independent variable causes changes in the
dependent variable.
Causal Hypothesis
Has at least two variables
Expresses the causal relationship between
the variables
Can be expressed as a prediction or an
expected future outcome
Logically linked to a research question or
theory
It is falsifiable
Expression of Causal Relationship

Different ways to express a causal relationship, e.g. between
attendance in college and performance in exam
Causes
Leads
Relates
Influences
Associated with
Produces
Results
If, then
Higher /lower
Reduces
Covariation
Covariation is also called association
Association is a necessary but not a sufficient
condition for causality (e.g. time spent in library and
scores in exam)
Spurious correlation- No. of firefighters and
amount of damage, gender and lung cancer,
age and readiness to adopt innovation
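The firefighter example can be made concrete with a small constructed data set. Every number here is hypothetical, and fire size is assumed to be the third variable that drives both the number of firefighters and the damage. The sketch computes Pearson's r overall and again within fires of the same size.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical fires: (fire size, firefighters sent, damage).
# Within each size the two offsets are built to be unrelated.
offsets = [(-1, -10), (-1, 10), (1, -10), (1, 10)]
fires = [(size, 10 * size + a, 100 * size + b)
         for size in (1, 2, 3) for a, b in offsets]

# Correlation across all fires, ignoring fire size.
overall = pearson_r([f[1] for f in fires], [f[2] for f in fires])

# Correlation among fires of the same size (fire size held constant).
within = {
    size: pearson_r([f[1] for f in fires if f[0] == size],
                    [f[2] for f in fires if f[0] == size])
    for size in (1, 2, 3)
}
print(round(overall, 3), within)
```

The overall correlation is very high, but it disappears once fire size is held constant. This is the pattern expected when a covariation is spurious.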
Hypothesis- a Word of Caution

The fact that two variables go together does
not mean that change in one variable causes
change in another variable.
In social science, it is often difficult to
establish causality
Many times we are not sure that the cause comes
first (e.g. conflicts with friends and low self
esteem)
Research proposal components

The research proposal should include
The problem identification
Aim of the study; rationale (why is this problem important to study)
A brief literature review to discuss what is known about this topic
The research questions (and hypothesis, if any) / objectives of the
study
Methodology including - type of research design, the sample (type of
sample, sampling frame, size of sample), methods and tool for data
collection.
Ethical concerns related to the study must be included.
PART II TOOL FOR DATA COLLECTION

Students are expected to prepare a tool for data collection which
should be included at the end of the research proposal.
All research proposals must have a list of references and in-text
citations.
Quantitative Research Process: Use of Logic
Deductive Process- Begins with abstract logical relationships
among concepts and moves towards concrete empirical
evidence
Theory -> Hypothesis -> Observations to test the hypothesis, e.g.
ability to manage multiple tasks and gender; mathematical
aptitude and gender; effectiveness of experiential learning vs.
learning through lecture method
Inductive Process- Begins with detailed observations of the
world and moves towards abstract generalizations and ideas.
Examples: Observation of cases of malnutrition in a region;
observations of different aspects associated with
malnutrition among infants/children below 6 years;
generalization based on the association of factors affecting
status of nutrition among different age groups of children
Cyclic Model of Science

- Perpetual flow of theory building and theory testing
[Figure: cyclic model linking Theories, Hypothesis, Observations and
Empirical Generalizations through logical deductions, measurement,
and statistical or verbal summarization]
Use of Research
Basic research (academic research/pure
research)- Research designed to add to our
knowledge and understanding of the social
world for the sake of contributing to
theoretical knowledge
Applied research- Research intended to be
useful in the immediate future and to
suggest action or increase effectiveness in
some area
Types of Applied Research

Action Research- focused on immediate application,
problem solving in a particular situation/community, not
so much on development of theory or making
generalizations.
Impact Assessment- main purpose is to determine
whether a given program has had the desired effect on
the target population (individual, household, institution)
and to determine whether the desired effect is
attributable to the program or to other factors. It is a fact
finding activity that describes conditions that exist at a
particular time.
Evaluation Research- purpose is to collect information to
provide feedback about a project/scheme to assess its
worth or merit. Implies judgment on effectiveness, utility
or desirability of a program or policy.
Research Design
A study design is the blueprint presenting the
overall plan of why and how the study will be
conducted.
It is a research strategy specifying the number
of cases to be studied, the number of times
data will be collected, the number of samples
that will be used and whether or not the
researcher will try to control or manipulate
the independent variable in some way.
Decisions about Research Design

Key factors to consider-
Purpose of research
Researcher's interest
General use of theory
Research Design: Purpose of Research
Exploratory Research
Descriptive Research
Explanatory Research
Applied Research
Studies may have multiple purposes but
one purpose is usually dominant.
Goals of Exploratory Research

Ground breaking research on a relatively unstudied
topic or a new area. Exploratory research addresses
the "what" question
Become familiar with the basic facts, people, and
concerns involved
Develop a well grounded mental picture of what is
occurring
Generate many ideas and develop tentative theories
and conjectures
Formulate questions and refine issues for more
systematic inquiry
Develop techniques and a sense of direction for
future research
Goals of Descriptive Research

It is designed to describe groups, activities, situations
or events. It focuses on "how" and "who" questions
Provide an accurate profile of a group
Describe a process, mechanism, or relationship
Give a verbal or numerical picture
Find information to stimulate new explanations
Present basic background information or a context
Create a set of categories or classify types
Clarify a sequence, set of stages or steps
Document information that contradicts prior belief
about a subject
Goals of Explanatory Research
Explanatory research is designed to explain why
subjects vary in one way or the other. It addresses the
"why" questions.
Determine the accuracy of a principle or theory
Find out which competing explanation is better
Advance knowledge about underlying processes
Link different issues and topics under a common
general statement
Build and elaborate a theory so that it becomes more
complete
Extend a theory or principle into new areas or issues
Provide evidence to support or refute an
explanation or prediction
Types of Research Designs

1. Research Designs based on time dimensions
Cross Sectional Study
Longitudinal Study- 3 types
-Panel
-Trend
-Cohort studies
Case Study
2. Experimental Study Designs
Time Dimension in Research

Cross sectional research
Longitudinal research- time series
research, panel study research, cohort
analysis
Case Studies

Cross Sectional Design
Data are collected for all the variables of
interest using one sample at one time
Data are collected for one sample at one point
in time even if that one point lasts for hours,
days, months or years.
Most widely used design as they are useful to
describe samples or populations on a number
of variables
Usually less expensive and simpler to
implement

Statistical analysis is used to analyse
patterns of relationships
There is at least one independent and one
dependent variable
They are sometimes used to examine causal
relationships when the time order between
the variables is easy to determine
Longitudinal Designs
Data are collected at least two different times
Types of longitudinal study- panel, trend, or
cohort
Panel study- The same sample is followed over a
period of time- one can collect current data to
determine the order of events and experiences
Panel studies allow documentation of patterns of
change and establishment of time order
sequences
Longitudinal Design: Panel Study

Disadvantages:
Higher cost in terms of time and money
The loss of subjects from a study because of
disinterest, death, illness or inability to locate
them
Changes in the methods of data collection
and measurement techniques
Panel conditioning
Effect of age on the subjects
Longitudinal Design: Trend Study

Data are collected at least twice, selecting a new sample
each time
They avoid panel attrition and panel conditioning
Save expenses of relocating sample members
Measures aggregated changes and not changes in
individuals
May be no aggregated change even though there are
changes in individual characteristics
Difficult to establish causal relationship because a
new sample is examined
Could inform about overall change in society but may
not identify the reasons for this change- thus
informing about general trends
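The difference between aggregate change and individual change can be illustrated with a tiny hypothetical panel. The respondent IDs and answers are made up. A trend study would see only the aggregate share at each time point, while a panel study can also see who changed.

```python
# Hypothetical panel: each person's answer at two points in time.
wave1 = {"p1": "yes", "p2": "yes", "p3": "no", "p4": "no"}
wave2 = {"p1": "no", "p2": "yes", "p3": "yes", "p4": "no"}

def share_yes(wave):
    """Proportion of respondents answering 'yes' in one wave."""
    return sum(1 for answer in wave.values() if answer == "yes") / len(wave)

# What a trend study can observe: the change in the aggregate share.
aggregate_change = share_yes(wave2) - share_yes(wave1)

# What only a panel study can observe: which individuals changed.
changed_individuals = [p for p in wave1 if wave1[p] != wave2[p]]

print(aggregate_change, changed_individuals)
```

Half of the respondents changed their answer, yet the aggregate share of "yes" is unchanged because the two switches cancel each other out.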
Longitudinal Design: Cohort Study

Cohort means people born within a given time
frame or experiencing a life event at
approximately the same time
People can leave the cohort but no one can join
after its formation
A cohort study is a specific kind of trend study that
studies a cohort over time.
It can examine an entire generation
Experimental Design
One group post test only design
One group pre test post test design
Classical Controlled Experiments
Solomon Four group Experiment
Quasi Experimental
One group post test only design

Only one group
No group to compare
Data collection only after intervention
Can be useful in gathering information about how the
program is functioning
Often used for client satisfaction survey to see their
perception of the programme
Numerous threats to its validity both external and
internal
Internal validity- Not sure whether the outcome is the
result of the programme
External validity- No use for generalization
One group pre test post test design

Pre test -> Programme intervention -> Post test
Threats to internal validity- History (no method of
judging effects of other events), maturation,
effect of testing, and changes in the questionnaire
Threats to external validity- History, reactive
effect (change in behaviour because of
participation in the study)
Classic Controlled Experiment

One experimental and one control group
Dependent variable is measured at least two times,
before and after the experiment
The independent variable is manipulated/controlled by
the researcher
Used for explanatory research testing causal hypotheses
Classic Controlled Experiment

Threats to internal validity- Selection and maturation
interaction
Threats to external validity- Selection treatment
interaction and maturation treatment
interaction
Solomon Four Group Experiment

Groups                    | Time 1 (pretest)           | Time 2 (post test)
Experimental              | Measure dependent variable | Measure dependent variable again (post intervention)
Control                   | Measure dependent variable | Measure dependent variable again (no intervention)
Experimental (no pretest) | -                          | Measure dependent variable (post intervention)
Control (no pretest)      | -                          | Measure dependent variable (no intervention)

It takes care of
Testing the main effects of the
experiment
Understanding the interaction effect of
testing
Combined effect of maturation and
history

Same as the classic experiment with the addition of
post test only control and experimental groups
The four groups are- experimental group with pre test,
experimental group with only post test,
control group with pre test and control group
without pre test
The stimulus is introduced in both the experimental
groups
Post test measurements are taken for all four groups
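One common way to read the four post test results is to compare group means. The sketch below is an assumption about how such a comparison might be laid out, not a procedure given in these notes. The function name, argument names and all scores are hypothetical, and a real analysis would also test whether the differences are statistically significant.

```python
def solomon_effects(pre_exp, pre_ctrl,
                    post_exp_pre, post_ctrl_pre,
                    post_exp_nopre, post_ctrl_nopre):
    """Compare group means from a Solomon four-group design."""
    baseline = (pre_exp + pre_ctrl) / 2  # average of the two pretests
    return {
        # Main effect of the stimulus, free of any pretest influence.
        "treatment_effect_no_pretest": post_exp_nopre - post_ctrl_nopre,
        # Effect of the stimulus among pretested groups.
        "treatment_effect_with_pretest": post_exp_pre - post_ctrl_pre,
        # Effect of having taken the pretest, without the stimulus.
        "testing_effect": post_ctrl_pre - post_ctrl_nopre,
        # Does pretesting change how the stimulus works?
        "testing_x_treatment_interaction":
            (post_exp_pre - post_ctrl_pre) - (post_exp_nopre - post_ctrl_nopre),
        # Change with neither pretest nor stimulus.
        "maturation_and_history": post_ctrl_nopre - baseline,
    }

# Hypothetical mean scores on the dependent variable.
effects = solomon_effects(pre_exp=48, pre_ctrl=48,
                          post_exp_pre=72, post_ctrl_pre=55,
                          post_exp_nopre=65, post_ctrl_nopre=50)
print(effects)
```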
Experimental Study
Use of random assignment to select groups
Use of matching to select groups
Experimental group receives the stimulus and
control group receives nothing
Difference in dependent variable as measured at
pre test and post test is calculated for both the
groups
Field experiments are done in real situations so
better generalizability
Laboratory research allows researcher a better
control over setting
Quasi Experimental Designs

No pre test
Post test only
Controlled group may be given alternate
treatment or placebo
Not as reliable as the other experimental
designs
Does not fully eliminate threats to internal validity, so
causality is harder to establish
Ethics in Research
Why are we talking about ethics in social sciences?
Ethics is concerned with the conduct of human
beings
Social science (SS) research is conducted with the participation of
human beings
It has an impact on human beings or on the
wider society
Context specific
Can be universal
Can be specific to a particular context
Can be specific to a particular locality
Four well known moral principles of ethics
The Principle of Non-maleficence
Research must not cause harm to the participants in particular and to people
in general
The Principle of Beneficence
Research should also make a positive contribution towards the welfare of the
society
The Principle of Autonomy
Research must respect and protect the rights and dignity of participants
The Principle of Justice
The benefits and risks of research should be fairly distributed among people
08/09/11
Ten General Ethical Principles (1)

Essentiality
Necessary to make all possible efforts to obtain and give adequate
consideration to existing literature and knowledge on the study or
the issue you research
Maximization of Public Interest and Social Justice
Research is a social activity, carried out for the benefit of
society (even if the immediate reason is to get marks)
It should be carried out with the motive of maximizing public interest and social
justice
Knowledge, Ability and Commitment to do Research
Sincere commitment to research and relevant subject
Readiness to acquire adequate knowledge, ability and
skills for a particular research
Respect and Protection of Autonomy, Rights and
Dignity of the Participants
Protect the autonomy, rights and the dignity
Participation of the individual MUST be Voluntary and
based on informed consent
Get permission of the respondent for photography and
recording
Privacy, Anonymity and Confidentiality
All information from the participants is confidential
NO IDENTIFIERS!
Use pseudonyms
Precaution and Risk Minimisation
All research carries some risk to the participants and to
society
Taking adequate precautions and minimizing risk is essential
Non Exploitation
MUST not unnecessarily consume the time of
participants
MUST not incur undue loss of resources and income
MUST not expose them to risks due to participation
Should not exploit juniors or other team members
Contribution of each member should be properly
acknowledged and recognized
Public Domain
Results should be in the public domain
Accountability and Transparency
Must be fair
Must be honest
Must be transparent
Must preserve the research records for a reasonable time
Must destroy all the records after a certain period
Totality of Responsibility
All those involved directly and indirectly in the
research should adhere to the ethics
A few other important principles

Protection and Promotion of Integrity in
Research
Researchers should not undertake any secret research
There should be no fabrication, falsification or plagiarism
Participants have the right to get help

All possible help to the participants
Help the participants in case of adverse consequence
Informed consent where the gatekeepers are involved
Obtain permission of the gatekeepers
But it does not substitute the permission of the actual
respondent
Take back your results to the community
You check whether the observation you made is correct
NEVER have a wrong statistical analysis

Think about your interpretation
Inferences should not be drawn from group level to individual level
Findings should not be generalized from a particular population to the general
population
Reference books (published by Centre for Studies in Ethics and Rights):
National Committee for Ethics in Social Science Research in Health (2006).
Ethical Guidelines for Social Science Research in Health.
Amar Jesani and Tejal Barai-Jaitly (2005). Ethics
in Health Research: A Social Science Perspective.
COMPOSITE MEASURES
QUANTITATIVE RESEARCH
METHODOLOGY
Madhura Nagchoudhuri
Composite Measures
Composite measures are used to measure variables that are
complex or multifaceted such that they cannot be measured
using a single item on a questionnaire e.g. stress, quality of life,
human development
Two types of composite measures- indexes and scales
Both indexes and scales enable representation of complex
variables with scores that allow potential for greater variance.
Scores are derived from multiple items.
Indexes and scales provide ordinal measures of variable by rank
ordering people through overall score that combines items on
the scale or index.
Composite Measures
Factors to keep in mind while selecting items
to create a scale or an index-
Face validity
Items should have adequate variance- useful in
distinguishing people from each other. People
should not come up with uniform answers.
Index- The HDI example
An example of a well established and commonly used index- Human
Development Index (HDI)
HDI provides a way of ranking countries on the issue of human
development and is used to monitor the progress of nations.
It indicates how far a country has to travel to provide essential choices to its
entire population.
Focus is on long term human development outcomes. It cannot reflect input
efforts in terms of policies or short term human development achievements
It is an average measure and masks certain disparities and inequalities
within countries.
Consists of
Educational attainment (access to knowledge)- measured by adult literacy
rate and combined gross primary, secondary and tertiary enrollment ratios
A decent standard of living- measured by GDP per capita (purchasing
power)
Long and healthy life- measured by life expectancy at birth
HDI is calculated by taking an average of deprivations in all three areas and
subtracting the average from 1
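The calculation described above can be sketched as follows. The deprivation formula used here, (maximum - actual) / (maximum - minimum), and all minimum, maximum and actual values are assumptions chosen for illustration. They are not official goalposts or real country data.

```python
def deprivation(actual, minimum, maximum):
    """Deprivation on one dimension: 0 = no deprivation, 1 = maximum deprivation."""
    return (maximum - actual) / (maximum - minimum)

def hdi(dimensions):
    """HDI = 1 - average deprivation across the dimensions.

    `dimensions` is a list of (actual, minimum, maximum) tuples.
    """
    deprivations = [deprivation(a, lo, hi) for a, lo, hi in dimensions]
    return 1 - sum(deprivations) / len(deprivations)

# Hypothetical country; the goalposts are illustrative assumptions.
country = [
    (65, 25, 85),      # life expectancy at birth, in years
    (0.6, 0.0, 1.0),   # educational attainment, already scaled 0 to 1
    (0.5, 0.0, 1.0),   # standard of living, already scaled 0 to 1
]
print(round(hdi(country), 3))
```

Because the three deprivations are simply averaged, a high score on one dimension can offset a low score on another. This is one reason the index can mask disparities.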
Types of Scales
3 types of scales most commonly used
include-
Likert scales
Semantic differential scales
Guttman scales
Likert Scale
Format frequently used in contemporary survey
questionnaires.
Respondent is presented with a series of statements to which
s/he is to respond indicating whether s/he strongly agrees,
agrees, is undecided/neutral, disagrees or strongly disagrees.
There is an unambiguous ordinality in the response categories.
Usually 3-point, 5-point or 7-point (odd no. with a midpoint)
Assumes that each item on the scale has equal intensity
Lends itself to a simple method of scaling with the possibility of
scoring being done in a uniform way, e.g. on a 5-point scale scores of 1 to 5 may
be assigned, where a score of 5 is assigned to "strongly agree"
for positive items and "strongly disagree" for negative items.
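Uniform scoring with reverse-coded negative items might look like the sketch below. It assumes a five-point scale scored 1 to 5. The response labels, the `positive` flags and the function names are illustrative choices.

```python
# Score assigned to each response on a positively worded item.
SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "undecided": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_item(response, positive=True):
    """Score one response; negatively worded items are reverse coded."""
    raw = SCORES[response.lower()]
    return raw if positive else 6 - raw  # 6 - raw maps 1<->5 and 2<->4

def total_score(responses, positive_flags):
    """Sum item scores for one respondent."""
    return sum(score_item(r, p) for r, p in zip(responses, positive_flags))

# One hypothetical respondent answering three items; the second is negatively worded.
answers = ["agree", "strongly disagree", "undecided"]
flags = [True, False, True]
print(total_score(answers, flags))
```

After reverse coding, a high total always points in the same direction on the underlying variable, whichever way an individual item is worded.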
Semantic Differential Scales

Used extensively in social science research
It is easy to construct and easy to administer
Used to measure inanimate things, animate things, behaviors
and intangible concepts similar to Likert scales.
Usually a 7 point scale
Tests people's feelings about something
Determine the relevant dimensions and have terms to represent
extremes of each.
Uses pairs of adjectives which are opposites of each other (e.g.
boring- interesting)
Respondents are expected to rate each pair of adjectives,
indicating which of the pair they favor more, e.g. boring or
interesting (very much, somewhat, neither, somewhat, very
much)
Guttman Scales
Clear difference in intensity in the way items
are structured moving from the least intense to
the most intense.
If a respondent agrees to the more intense
items (harder items) then one may assume that
s/he will agree to the less intense or easier
items.
E.g. Bogardus Social Distance Scale
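The cumulative logic of a Guttman scale can be checked mechanically. In this sketch the responses are agree/disagree flags listed from the least intense item to the most intense. The function names and sample patterns are illustrative assumptions.

```python
def is_guttman_consistent(responses):
    """True if no agreement follows a disagreement.

    `responses` holds True (agree) or False (disagree), ordered from the
    least intense item to the most intense item.
    """
    seen_disagree = False
    for agrees in responses:
        if agrees and seen_disagree:
            # Agreed to a harder item after rejecting an easier one.
            return False
        if not agrees:
            seen_disagree = True
    return True

def guttman_score(responses):
    """Number of items agreed to; meaningful when the pattern is consistent."""
    return sum(responses)

print(is_guttman_consistent([True, True, False]))   # fits the scale
print(is_guttman_consistent([True, False, True]))   # breaks the expected order
```

For a consistent pattern, the score alone tells you exactly which items the respondent agreed to. That is the property a Guttman scale relies on.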
Complex measures- word of caution

Some factors to be taken into account while
creating and applying scales-
Language- shades of meaning as they are understood
may be different. If the scale is to be used in a context
where respondents don't know English, interviewers
who are bilingual may be used. The instrument may be
translated into another language and then back translated to
ensure reliability and validity
Culturally sensitive questions- scales should be
tested in different cultural contexts to ensure their
reliability and validity in the context of that culture.
Reliability & Validity
Reliability
Deals with the indicators of dependability
A reliable indicator or measure gives the
same result every time
Three types of reliability-
1. Stability reliability- reliability across time
2. Representative reliability- across
subpopulations or groups of people
3. Equivalence reliability- consistency across
different indicators
Sources of Error
Unclear Definition of variables

Use of retrospective information
Variation in conditions for data collection
Structure of the instrument (many open ended
questions may reduce the reliability)
Testing Reliability
Reliability is determined by obtaining two or
more measures of the same thing and seeing
how closely they agree.
Four methods of testing reliability
Test retest
Alternate form
Split Half
Observer reliability
Test-Retest
Repeatedly administering the same instrument to
the same set of people on separate occasions
They should not be subjects in the actual study
If the results of repeated tests are similar, then
the reliability is high
Drawback- the first test has an influence on the
next
Measuring instruments that are strongly
affected by memory or repetition, should not be
tested for reliability using this method
Alternate Form
Different but equivalent forms of the same test
are administered to the same group of
individuals usually close in time and then
compared
Drawback- developing equivalent tests can be
time consuming
Some problems associated with test-retest are
not completely eliminated
Split Half
Items of the instrument are divided into
comparable halves
The test is administered and the scores of the
two halves are compared.
If the scores of the two halves agree closely then the test is reliable
The major problem is designing two halves that
are equivalent
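A split-half check can be sketched by correlating the two half scores. The odd/even split, the item scores and the function names are assumptions for illustration. The Spearman-Brown step is a standard adjustment that is not mentioned in these notes. It estimates full-length reliability from the half-test correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_half(item_scores):
    """Correlate odd-position and even-position item totals across respondents."""
    half_a = [sum(row[0::2]) for row in item_scores]  # items 1, 3, ...
    half_b = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    return pearson_r(half_a, half_b)

def spearman_brown(r_half):
    """Estimate full-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 5 respondents x 4 items, each scored 1 to 5.
scores = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]
r_half = split_half(scores)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```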
Observer Reliability
Comparing administration of an instrument
done by different observers or interviewers
The observers need to be thoroughly
trained
At least two people code the content of
the responses according to certain criteria
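Agreement between two coders can be quantified in more than one way. The sketch below computes simple percent agreement and Cohen's kappa. Kappa is a standard chance-corrected statistic that is not named in these notes, and the codes assigned by the two coders are hypothetical.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of responses that both coders placed in the same category."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected if both coders assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes given to six open-ended responses.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(percent_agreement(coder_a, coder_b), 3),
      round(cohens_kappa(coder_a, coder_b), 3))
```

Kappa is lower than raw agreement because some agreement would be expected even if both coders assigned codes at random.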
Validity
Validity: A measure is valid if it measures
what it is supposed to measure
Four Types of Measurement Validity
- Face validity
- Content validity
- Criterion validity
- Construct validity
Face Validity
The easiest type of validity to achieve and
most basic
It is the judgment by the scientific community
that the indicator really measures the construct
Content Validity
It is a special type of face validity
Whether it captures the entire meaning
Is the full content of the definition
represented?
E.g. Feminism, empowerment
Criterion Validity
The validity of an indicator is verified by
comparing it with another measure of
the same construct in which the researcher has
confidence
Two subtypes:
- Concurrent
- Predictive
Concurrent Criterion Validity

An indicator must be associated with a
preexisting indicator that is judged to be valid
E.g. intelligence test
Predictive Criterion Validity

Indicators predict future events that are
logically related to a construct
E.g. scores of competitive exams like
SAT or CAT and future performance of
the student
Construct Validity
It is for measures with multiple indicators
Two types
Convergent
Discriminant
Convergent Construct Validity

This applies when multiple indicators
converge or are associated with one another
E.g.- income, type of housing
Educational level, skills in writing or
computation and knowledge or awareness
Discriminant Construct Validity

Also known as divergent validity
If two constructs A and B are very different
then their measures should not be associated
e.g. belief in secularism and strong identity
with religious groups
Other Types of Validity

Internal- It means that there is no error internal
to the design of research project
External- It is the ability to generalize the
findings of a specific setting or group to a
broad setting or group
Statistical- Choice of correct statistical
procedure and meeting its assumptions fully
Relationship between Reliability & Validity
Reliability is necessary for validity but not
sufficient
A measure can be reliable but not valid,
e.g. a weighing scale that consistently shows the same wrong weight
Validity and reliability are usually
complementary concepts
Sampling
Introduction
Inferential statistical methods use sample statistics to
make predictions about population parameters.
The quality of inferences depends crucially on how well
the sample represents the population.
To ensure good sample representation,
randomization is essential.
What is randomization?
Randomization is the mechanism for ensuring that the
sample representation is adequate for inferential
methods.
Methods of Sampling
Sampling is often used in day-to-day
practical life, where the purpose is to
determine population characteristics
by observing only a finite subset of
individuals taken from the population.
Sampling methods can be classified under
two heads namely,
1. Probability Sampling Methods
2. Non-probability Sampling Methods
Probability Sampling Methods
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Multistage Sampling
1. Simple Random Sampling: A simple random
sample of n subjects from a population is one in which
each possible sample of that size has the same
probability of being selected.
How to Select a Simple Random Sample?
Before selecting a Random Sample, we need a list of
all subjects in the population. The list is called
Sampling Frame. The most common method for
selecting a simple random sample from the sampling
frame is the use of a random number table.
A random number table is a table containing a
sequence of numbers that is computer generated
according to a scheme whereby each digit is equally
likely to be any of the integers 0,1,2, ... ,9
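As a rough illustration, a software random number generator can stand in for the printed random number table. The sketch below assumes a hypothetical sampling frame of 20 subject labels and uses Python's standard `random` module; the names `sampling_frame` and `simple_random_sample` are invented for this example.

```python
import random

# Hypothetical sampling frame: a list of all 20 subjects in the population.
sampling_frame = [f"subject_{i:02d}" for i in range(1, 21)]

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct subjects; every possible sample of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, n)

sample = simple_random_sample(sampling_frame, n=5, seed=42)
print(sample)
```

Fixing the seed is only for reproducibility of the example; in practice the seed would normally be left unset.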
2. Systematic Sampling: Suppose a sample
of size n is to be selected from a
population of size N, and let k = N/n. A
systematic random sample
(a) selects a subject at random from the first k
names in the sampling frame, and
(b) selects every kth subject listed after that one.
The number k is called the sampling interval.
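The two steps of systematic sampling can be sketched as follows, assuming a hypothetical frame of N = 100 numbered subjects and taking the integer part of N/n as the interval (an assumption for the case where N/n is not a whole number).

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k subjects, then every kth subject."""
    k = len(frame) // n            # sampling interval k = N/n (integer part)
    rng = random.Random(seed)
    start = rng.randrange(k)       # index of a random subject among the first k
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))        # hypothetical frame: subjects numbered 1..100
sample = systematic_sample(frame, n=10, seed=1)
print(sample)
```

With N = 100 and n = 10 the interval is k = 10, so consecutive selected subjects are always 10 places apart in the frame.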
3. Stratified Sampling: A stratified
random sample divides the population
into separate groups, called strata, and
then selects a simple random sample
from each stratum.
The population is divided into k
homogeneous strata with stratum sizes
$N_1, N_2, \dots, N_k$ such that
$N_1 + N_2 + \cdots + N_k = N$
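A minimal sketch of stratified sampling, assuming two hypothetical strata ("urban" and "rural") and proportional allocation, i.e. the same sampling fraction in every stratum. The notes do not say how many subjects to take from each stratum, so the allocation rule here is an assumption.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of N = 100 split into k = 2 strata (N1 = 60, N2 = 40).
strata = {
    "urban": [f"u{i}" for i in range(60)],
    "rural": [f"r{i}" for i in range(40)],
}
sample = stratified_sample(strata, fraction=0.1, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```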
4. Cluster Sampling: Simple, systematic and
stratified random sampling are very expensive or
even impossible to implement in many situations,
particularly when a complete and up-to-date
sampling frame is not available.
In cluster sampling, the population is divided into a
large number of groups, called clusters. A cluster
sample is one for which the sampling units are
the subjects in a random sample of the clusters.
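Cluster sampling can be sketched as below, assuming four hypothetical villages as clusters. Only a list of clusters is needed up front; every subject in each selected cluster enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return all subjects inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of cluster names
    subjects = [s for name in chosen for s in clusters[name]]
    return chosen, subjects

# Hypothetical clusters: villages and the people living in them.
clusters = {
    "village_A": ["a1", "a2", "a3"],
    "village_B": ["b1", "b2"],
    "village_C": ["c1", "c2", "c3", "c4"],
    "village_D": ["d1", "d2"],
}
chosen, subjects = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen, subjects)
```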
5. Multistage Sampling: Multistage sampling methods
use combinations of various sampling techniques. For
example, to study various characteristics of adult
residents in Maharashtra state, one could treat districts
as clusters and select a random sample of a certain
number of them. Within each district selected, one
could take a cluster sample of villages. Within each
village selected, one could systematically sample every
10th household. Within each household selected, one
could select one adult at random.
Non-Probability Sampling Methods
Social research is often conducted in situations that do
not allow the kinds of probability sampling discussed
so far for large-scale social surveys. Suppose we want to
study homelessness. A list of all homeless
individuals is neither available nor can it be created. Moreover,
there are times when probability sampling wouldn't be
appropriate. Such situations call for non-probability
sampling.
Methods of Non-Probability Sampling
1. Purposive or Judgement Sampling;
2. Volunteer Sampling;
3. Snowball Sampling;
4. Quota Sampling; and
5. Selecting Informants
1. Purposive or Judgement Sampling

Sometimes it is appropriate to select a sample
on the basis of knowledge of a population and
the purpose of the study. This type of sampling is
called purposive or judgement sampling.
2. Volunteer Sampling
One of the most common non-probability sampling
methods is volunteer sampling. In this method
subjects volunteer themselves for the sample. A good
example of volunteer sampling is visible almost any day
on television. Some T.V. programmes ask viewers to
offer their opinions on any issue or vote for any
celebrity for his/her performance through SMS or calling
a phone number. The danger in this method is that the
sample will poorly represent the population and may
yield misleading conclusions.
3. Snowball Sampling
The snowball sampling method is appropriate when the
members of a special population or individuals with a
rare characteristic are difficult to locate, such as
persons in a village who were bitten by a snake. In this
method, the researcher collects data from the few
members of the target population who can be located, then asks those
individuals to provide the information needed to locate
other members of the target population whom they
happen to know.
4. Quota Sampling
Quota sampling begins with a matrix, or table, describing the
relevant characteristics of the target population. Depending on
your research purposes, you may need to know what proportion
of the population is male and what proportion female, as well as
what proportion of each gender falls into various age categories,
educational levels, ethnic groups, and so forth.
Once you have created such a matrix and assigned a relative
proportion to each cell in the matrix, you proceed to collect data
from people having all the characteristics of a given cell. You
then assign to all the people in a given cell a weight that is
appropriate to their portion of the total population. When all the sample
elements are so weighted, the overall data should provide a
reasonable representation of the total population.
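The notes do not give a formula for the cell weights. One common choice, shown here purely as an assumption, is to divide each cell's share of the population by its share of the sample. The quota matrix and counts below are hypothetical.

```python
def cell_weights(population_share, sample_counts):
    """Weight = population share of the cell / sample share of the cell."""
    total = sum(sample_counts.values())
    return {
        cell: population_share[cell] / (sample_counts[cell] / total)
        for cell in sample_counts
    }

# Hypothetical quota matrix: gender by age group, with population proportions.
population_share = {
    ("male", "18-39"): 0.30,
    ("male", "40+"): 0.20,
    ("female", "18-39"): 0.28,
    ("female", "40+"): 0.22,
}
# Hypothetical sample: 25 people were interviewed in every cell.
sample_counts = {cell: 25 for cell in population_share}

weights = cell_weights(population_share, sample_counts)
weighted_total = sum(weights[c] * sample_counts[c] for c in sample_counts)
for cell, w in weights.items():
    print(cell, round(w, 2))
```

Cells that are over-represented in the sample get a weight below 1, under-represented cells a weight above 1, and the weighted sample size stays equal to the actual sample size.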
Sampling Error
The sampling error of a statistic is
the error that occurs when a statistic
based on a sample estimates or
predicts the value of a population
parameter.
Other Sources of Variability/Error
1. Under-coverage
2. Response Bias
3. Non-response
4. Missing Data
Sampling Distribution
A sampling distribution is a probability
distribution that determines probabilities of
the possible values of a sample statistic.
Standard Error
The standard deviation of the sampling
distribution of the sample statistic is called
the standard error of the statistic
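To make the idea concrete, the sketch below simulates the sampling distribution of the sample mean by drawing many samples from a hypothetical population, then compares its standard deviation with the textbook approximation (population standard deviation divided by the square root of n). The population, sample size and number of draws are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population of 10,000 scores with mean about 50 and SD about 10.
population = [rng.gauss(50, 10) for _ in range(10_000)]

def sample_means(pop, n, draws, rng):
    """Means of many simple random samples: an empirical sampling distribution."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(draws)]

n = 25
means = sample_means(population, n=n, draws=2000, rng=rng)

empirical_se = statistics.stdev(means)                     # SD of the sample means
theoretical_se = statistics.pstdev(population) / n ** 0.5  # sigma / sqrt(n)
print(round(empirical_se, 2), round(theoretical_se, 2))
```

The two values should be close to each other, which is what "standard error" summarises: how much the statistic varies from sample to sample.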
Research Methods
Tools & Techniques for Data Collection
Some techniques are:
Interview
Questionnaire
Focus Groups
Observation
Methods of Interview
Self-administered questionnaires
Face-to-face interviews
Telephonic Interview
Through the Internet
Interview
Types of Interviews
Informal
Unstructured
Semi-structured
Structured
Interviewing
Unstructured Interviewing:
Get people to open up and express
themselves in their own terms at their own
pace.
Excellent for building initial rapport before
moving to more formal interviews
Often no formal written tool is used
Interview
Semi-structured Interviewing:
An Interview Guide is used. This is a written list of
questions and topics that need to be covered in a
particular order.
Structured Interviewing:
People are asked to
respond to as nearly identical a set of stimuli as
possible.
An Interview Schedule is used: a written list of
questions whose order and
structure are followed exactly in each interview.
Different Components
The Interviewer
The Interview Schedule/Interview Guide
The Researched/Respondent
The Skill of Interviewing

Assure people of anonymity and confidentiality

Explain that you simply want to know what
they think, and what their observations are
Encourage them to interrupt you during the
interview with anything they think is important.
Always ask for permission to record personal
interviews and to take notes.
Framing the Questions

Questions must be unambiguous and clear.

The vocabulary or words used must be appropriate to the
respondents.
There must be a clear purpose for every question.
Use open-ended and closed-ended questions:
- Open-ended questions: the respondent gives their own answer (used
for qualitative data, sensitive information)
- Closed-ended questions: a choice of answers is offered and the respondent
picks the most appropriate one.
Never use loaded or double-barreled questions.
Always pre-test
See if the questions elicit the information needed to test the
hypothesis or answer research questions
Sequencing the Questions

Ask general questions or questions about facts
before personal questions
Get respondents involved in the interview
Intersperse fact-based questions throughout
the interview to avoid long lists of fact-based questions
Sequencing the Questions

Ask questions about the present before
questions about the past or future.
The last questions might be to allow
respondents to provide any other
information they prefer to add and their
impressions of the interview
Skills in Interviewing
Probing
- Silent Probe
- Echo Probe
- Uh-huh Probe
- Tell Me-More Probe
- Long-Question Probe
Carrying the Interview

Ask one question at a time.

Attempt to remain as neutral as possible.
Encourage responses
Be careful about the appearance when note
taking.
Provide transition between major topics
Don't lose control of the interview
After the Interview

Verify that the tape recorder, if used, worked
throughout the interview.
Add any clarifications to your written notes
Write down any observations made during
the interview
Points to Note
Importance of Language
Pace of the Study
Being Yourself
The little things!!
Using a Tape Recorder (recording equipment
etc)
Taking Notes
Response Effects
Response effects refer to measurable differences in the
interview data that are predictable from the characteristics
of respondents, interviewers and/or the environment.
Age, sex, culture, comfort level of the respondents impact
responses
The Deference Effect
Threatening Questions
Social Desirability Effect
Accuracy of responses-(inability to recall, misleading)
Types of Interviews
Face to face
Telephone
Advantages of Face to Face Interviews
Can be used for all types of people: illiterate, bedridden,
old etc.
It is possible to clarify if the person does not understand
the meaning (noting such questions where explanation
was needed)
Use of different techniques is possible-open ended
questions, visual aids, graphs, etc.
Long interviews are better in a face to face situation
The respondent gets only one question at a time, so cannot flip
ahead to the next page to see what's coming
Possible to observe the body language
Disadvantages of Face to Face Interviews
They are intrusive and reactive
Costly in terms of time and money
Limits the sample size for a single
investigator as you have to finish the data
collection in a short time (should not exceed
a year)
Training is needed for multiple investigators
and there can be some error
Telephone Interviews
Advantages:
Have the impersonal quality of the questionnaire
Inexpensive, need less time and energy
Can reach everyone who has a phone
Less influence of the interviewer's personality
Disadvantages:
Not useful for people without a telephone connection
Cannot be a long schedule
Data can be false if investigators are not monitored
properly
Questionnaire
Include a brief explanation of the purpose of
the questionnaire.
Include clear explanation of how to
complete the questionnaire.
Include directions about where to provide
the completed questionnaire.
Note conditions of confidentiality
Advantages of Self Administered Questionnaires
Can be administered in various ways, including by mail, drop and
collect, online with collection through email, or
to a group of people sitting in a room
Single researcher can collect data from a large sample in a
short time
Relatively cheaper
No interviewer bias
Possible to include questions with long list of categories/long
battery questions
Possible to ask questions that are sensitive or difficult to answer in a
face-to-face interview
Disadvantages of Self Administered Questionnaires
No control over how people interpret the
questions
Response rate can be poor
Prone to serious sampling problems
The sequence may not be followed/cannot
avoid flipping through
Not useful for illiterate or visually impaired
population
When to Use What

No method is perfect
On average, the interview method yields about
82% fully completed schedules, as against 68%
for the questionnaire method
For short schedules and a population with
telephone connections, telephone
interviews are possible
Dillman's Total Design Method to Improve Response of Mail and Telephone Surveying
Mailed questionnaires must look professional (size and
colour of the paper, font size, layout)
Front and back covers: no questions on either cover; an
interesting title, name and return address
Question order: start with questions related to the
topic and end with questions on personal data
Formatting: careful use of font and case, spacing
Dillman's Total Design Method to Improve Response of Mail and Telephone Surveying
Length should not be more than 10 pages and 125
questions
The covering letter should be brief and specific, and guarantee
confidentiality
Inducement: some monetary incentives for responding
can also be considered
Contact and follow-up: send the questionnaire
after prior intimation and follow up after mailing.
Send a second cover letter and questionnaire to
non-respondents
Focus Groups
Interview through the group process

Focus groups typically have 6-12 members, plus a
moderator
Discussion Focused around a specific topic or theme
Homogenous group preferred
To the extent possible, participants should not know
one another
Focus groups have to be supportive and non-judgemental
Focus Group Discussion

Develop five to six questions
The major goal of facilitation is collecting useful
information to meet the goal of the meeting.
Carefully word each question before that question is
addressed by the group.
Facilitate discussion around the answers to each
question, one at a time.
After each question is answered, carefully reflect back
a summary of what you heard
Ensure even participation.
Closing the session
Observation
Observation is a data collection strategy involving the
systematic collection and examination of verbal and
non-verbal behaviours as they occur in a variety of
contexts. It includes both human activities and the
physical settings in which such activities take place.
Observation methods are also used to extend or
validate data collected by other data collection
methods.
Observation
Observation also has relevance in research studies
where the respondents are unable to communicate for
a variety of reasons, e.g. they are infants, or
they are adults who may not be able to articulate
complex emotions or certain life situations in an in-depth manner.
Even in studies with direct interviews, researchers use
observational techniques to note body language and
other gestures to get an insight into the words spoken
by the persons being interviewed.
Observation
The purpose of observational research is to
record group activities, conversation and
interaction as they happen and to ascertain
the meanings of such events to
participants.
Observation may take place either in
laboratory settings, designed by the
researcher or in field settings that are the
natural habitat of selected activities.
Types of Observational Research

Three types of observational research:
a) Descriptive observation generates a large
quantum of data as it involves the description of all
details by an observer
b) Focused observation, as the name indicates, entails
looking only at specific pertinent material relevant to
the area of study
c) Selective observation means identifying
specific areas from a more general category.
Types of Observation
Observation-in-person (participant observation and
non-participant observation)
Video recordings.
Structured or Unstructured.
In the majority of research, efforts are usually made to
observe participants in as natural a setting as possible.
Participant Observation
Usually involves field work
It is a strategic method that lets you collect any kind of
data: qualitative as well as quantitative, narrative or
numbers.
It can take the form of collecting life histories, attending rituals,
or talking about sensitive issues. It is about immersing
yourself in a culture or process and documenting it.
Can be ethically problematic if not done properly
Two Different Roles as Participant Observer

Complete Participant- Becoming a member
of the group without disclosing your role as
a researcher
Participant Observer- Can be an insider who
observes and records some aspects of life
around him/her or can be an outsider who
participates in some aspects of life and
records whatever is possible
Advantages
It makes it possible to collect different types
of data
It reduces the problem of reactivity, i.e. people
changing their behaviour because they are being
studied
It helps to formulate sensible questions
Gives an intuitive understanding of a culture
Non participant Observer

Direct observation
Video recording and then analysing
Non Reactive Measurements

Observations by the researcher on a topic of
his/her interest without making the subjects
aware that they are being studied.
The evidence of social behaviour or action is
available naturally
Researcher infers from the evidence or
behaviour without disrupting those being
studied
Varieties of Non Reactive or Unobtrusive Observation
Family portraits of different historical eras, to
study gender relations
Contents of garbage dumps in an urban area, counting
how many liquor bottles are thrown away, to
study the under-reporting of liquor
consumption
Interest in different exhibits in a museum, judged from the
worn floor tiles
Bumper stickers of cars, to study political affiliation
Recording and Documentation

Construct conceptualization and linking it to
non-reactive observations
Deciding upon a system of observation
It can be done from physical traces, archives
and observation of events and behaviour in
natural settings
Content Analysis
Content analysis is a technique for gathering and
analyzing the content of text
Content refers to words, meanings, pictures, symbols,
ideas, themes, and any message of communication
Text can be written, visual or spoken
The researcher uses objective and systematic counting
and recording procedures to produce a quantitative
description of the content
It is a non reactive technique
It involves random sampling, precise measurement and
operational definitions for abstract constructs
Topics Appropriate for Content Analysis

Themes of popular songs and religious
symbols and hymns
Trends in newspaper covers and ideological
tone of the editorials
TV coverage of people from different
backgrounds
Gender exploitation in commercials
Units of Analysis in Content Analysis
Word, theme, plot, design, newspaper article
The coding system can be built around four
characteristics:
- Frequency
- Direction
- Intensity
- Space
Data Collection from Secondary Data
Books
Reports
Published compilations
Computerized records
The researcher can search through collections of information
with a research question and variables in mind
and reassemble the information in new ways to address the
research question
Sources of Data Collection from Secondary Data
Any topic on which information has been
collected and is publicly available can be
selected
Existing statistics can also provide data for
the study, such as govt. records, census
data, Assembly proceedings, biographical
information
Methods of Data Collection, Data Processing and Codebook
What is Data?
Data refers to a collection of organized
information, usually the result of experience,
observation or experiment, other information
within a computer system, or a set of premises.
This may consist of numbers, words, or
images, particularly as measurements or
observations of a set of variables.
Data Processing
Survey data collected from the field
require certain operations before they can be used
for analysis.
The data processing requirements should be
specified at an early stage of any research
study in terms of time, cost, manpower,
materials, etc.
Processing of collected data is required for
drawing out meaningful results. Data
processing involves the various steps, from
editing of questionnaires to analysis and
report-writing.
There are different stages of data processing:
Editing and Scrutinising
Coding and Data Entry
Validation, Checking & Updating
Analysis
Editing & Scrutinising

The first stage is scrutiny or editing of filled-in
questionnaires.
Editing means checking the schedules for:
Completeness,
Accuracy and
Uniformity
Completeness
By completeness, it is meant that the filled-in
schedule is complete in all respects. The first
point to check is whether there are answers for
every question. If an interviewer forgets to ask
a question or to record an answer, it may be
possible to deduce from other data on the
questionnaire what the answer should have
been and thus fill the gap at the editing stage.
Accuracy
By accuracy, it is meant that the answers are
correctly filled-in. It is not enough to check
that questions are answered; one must try to
check whether the answers are accurate.
Answers needing arithmetic, even of the
simplest kind, should be edited carefully.
Uniformity
The editing stage gives every opportunity for
checking that interviewers have interpreted
questions and instructions uniformly.
For example, suppose a question on the occurrence of a
calamity is to be asked as follows:
"Has any calamity occurred in your
village during the past two years?" Every
investigator should confine the question to the period of
two years only, so that the reference period is
uniform.
Characteristics of Completed Questionnaire

A correctly completed questionnaire or
schedule will have, among others, the
following characteristics :
All answers are recorded in a legible and
comprehensible way.
No inconsistencies between answers.
Where information is missing, it should be
filled-in (wherever possible) in accordance
with the correct pattern.
In general, you need a thorough knowledge of
the entire questionnaire and the ability to see
which answer to a certain question can be
verified against the answers to other questions.
When you encounter a problem for which
there is no ready solution, you should always
check against other relevant questions in the
schedule or questionnaire.
Coding
Coding is translating answers into numerical values or
assigning numbers to the various categories of a
variable to be used in data analysis.
Coding is generally done while preparing the questions
and interview schedules. Fieldwork is thus done with
pre-coded questions. However, sometimes, when
questions are not pre-coded, coding is done after the
fieldwork.
Coding is done on the basis of the instructions given in
the codebook. The codebook gives a numerical code
for each variable.
Example
If Age = 9 years, then code it in the two boxes as:
0 9
Do not code it in either of the following ways:
Blank instead of zero: _ 9
Wrong box used: 9 _
The last will be read as 90 and not 9.
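When codes are prepared or checked by computer, the same rule (fill every box, pad on the left with zeros) can be sketched as below. The helper name `code_fixed_width` and the two-box field width are assumptions made for this illustration.

```python
def code_fixed_width(value, width=2):
    """Return a zero-padded code that fills every box of a fixed-width field."""
    code = str(value).zfill(width)      # pad on the left with zeros
    if len(code) > width:
        raise ValueError(f"{value} does not fit in {width} boxes")
    return code

age_code = code_fixed_width(9)          # correct: zero in the first box
wrong_box = "9".ljust(2, "0")           # what a reader sees if 9 is put in the first box
print(age_code, wrong_box)
```

Padding on the left keeps the value at 9, whereas a digit written in the wrong box is read as 90.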
Types of Questions
Different types of questions should be examined
before coding:
(i) Number (value) Questions.
(ii) Fixed Alternative Questions.
(iii) Semi-Open-ended Questions.
(iv) Open-ended Questions.
(v) Multi-coded Questions.
(i) Number (Value) Questions: These can be coded in
the same way as they were recorded on the
questionnaire at the time of interview.
Example : Age, No. of children, etc.
(ii) Fixed Alternative Questions : Questions of this type
are YES/NO, SEX, Month etc. (The no. of
alternatives decided in advance).
(iii) Semi-Open-ended Questions: These questions have
a fixed number of alternatives plus an `OTHER'
option. Example : Other contraceptive methods, sex
preferences of next child, etc.
(iv) Open-ended Questions: These questions are left
completely open for the interviewers, and no
alternatives are suggested in the questionnaire. The
reason for this may be either of the following:
(a) The alternatives are known, but there are too many to
make it practicable to list them all (Example:
Contraceptive methods).
(b) The possible replies cannot be foreseen, and as a
consequence the answers are taken down verbatim
and later classified in manageable groups (Example :
Occupational Status).
(v) Multi-Coded Questions: Multi-coded
questions belong to the group of `fixed
alternative' questions, as the number of
possible replies is fixed. However, in
multi-coded questions, the answers are not
necessarily mutually exclusive, so that two or
more answers are allowed for the same
respondent. The codes for this are developed
differently from the other types.
For this type of question, a Binary System of
codes is used, rather than a consecutive order.
The idea is that all the categories ticked can be
added together to form one code without any
loss of information, as each `sum' represents a
unique combination of answers.
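A small sketch of the binary system of codes, assuming a hypothetical multi-coded question ("Which media do you use?") with four answer categories. Each category gets a power of two, so the sum of the ticked categories identifies the combination uniquely and can be decoded again.

```python
# Hypothetical categories; each is assigned a power of two (1, 2, 4, 8).
CODES = {"newspaper": 1, "radio": 2, "television": 4, "internet": 8}

def encode(ticked):
    """Add the codes of all ticked categories into a single combined code."""
    return sum(CODES[c] for c in set(ticked))

def decode(code):
    """Recover the ticked categories from a combined code."""
    return [c for c, value in CODES.items() if code & value]

combined = encode(["radio", "internet"])   # 2 + 8
print(combined, decode(combined))
```

With consecutive codes (1, 2, 3, 4) a sum of 3 could mean either "category 3" or "categories 1 and 2"; with powers of two no such ambiguity arises.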
A detailed coding manual or set of instructions should
be prepared before the coding begins. Since the
editing and coding operations are related, the timing
of the coding depends on that of the editing. In
general, the coding should not begin until there are an
adequate number of edited questionnaires available,
and there is assurance that there will be a continuous
flow of questionnaires. Once the coding starts, there
should not be delays due to the unavailability of
edited questionnaires. There must be adequate office
space so that questionnaires can be checked as they
are returned from the field.
The unedited questionnaires should be kept separate
from the edited ones. Likewise, those that have been
coded should be stored separately from those not yet
coded. Adequate working space should be provided
for each individual coder so that there is no
overcrowding and the work can proceed satisfactorily.
All coders should be given specific training for
sufficient understanding of the job. The most effective
way to train them is to ensure that they are given enough
on-the-job practice, followed up with careful
evaluation of the work performed.
Data Validation and Updating

The data validation may be done in different
stages, which can be divided into following
parts:
Data Entry
Editing
Recoding
Tabulation
Archiving and further analysis
Data Entry: The data are entered into a
computer, for example into the SPSS
package.
Editing: The data are checked and corrected
on computer for format and structure errors to
ensure that all and only required data are
present. Also, the data are checked and
corrected for out of range and inconsistent
responses.
Recoding: The edited data are transformed
from the actual responses to a set of variables
convenient for analysis.
Tabulation: The recoded data are tabulated
according to the specifications laid down for
writing reports.
Archiving and further analysis: The different
data files with complete documentation are
organized for further research.
It is very important for any meaningful interpretation
of data that all possible errors and inconsistencies are
corrected before the analysis phase.
Thus cleaning or machine editing of data is an
extremely important function involving both the
researchers and data processors.
Essentially, computer editing is a repetition of the
manual editing and is necessary both because of
human error in the manual operation and to correct
errors introduced during coding and punching.
Machine Editing
After the office editing, a more comprehensive
checking must be carried out by the computer.
Machine editing can be divided into two main stages.
A) Format and structure checks, which involve
checking the following items:
Each part of the identification (e.g. sample area,
household, and line number) contains a valid value.
All sample households are present.
B) Range and Consistency Checks:
All codes are within the ranges specified for them in
the code book.
All skips in the questionnaire have been correctly
executed.
The information recorded is internally consistent.
Dates in the event histories flow in a sequential order
with a specified minimum elapsed time between
events.
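Range, skip and consistency checks of this kind might be sketched as follows. The codebook, variable names and skip rule (age at marriage is asked only of ever-married respondents) are hypothetical; the function only reports errors and leaves the record unchanged.

```python
# Hypothetical codebook: the set of valid codes for each variable.
CODEBOOK = {
    "sex": {1, 2},
    "ever_married": {1, 2},          # 1 = yes, 2 = no
    "age": set(range(0, 100)),
}

def check_record(record):
    """Return a list of error messages; the record itself is never modified."""
    errors = []
    # Range check: every code must be one the codebook allows.
    for var, allowed in CODEBOOK.items():
        if record.get(var) not in allowed:
            errors.append(f"{var}: code {record.get(var)!r} out of range")
    age = record.get("age")
    married_at = record.get("age_at_marriage")
    # Skip check (assumed rule): never-married respondents skip age at marriage.
    if record.get("ever_married") == 2 and married_at is not None:
        errors.append("age_at_marriage: should have been skipped")
    # Consistency check: age at marriage cannot exceed current age.
    if married_at is not None and age in CODEBOOK["age"] and married_at > age:
        errors.append("age_at_marriage: greater than current age")
    return errors

records = [
    {"id": 1, "sex": 1, "ever_married": 1, "age": 30, "age_at_marriage": 24},
    {"id": 2, "sex": 3, "ever_married": 2, "age": 25, "age_at_marriage": 40},
]
error_report = {r["id"]: check_record(r) for r in records}
print(error_report)
```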
The computer is used to locate errors and not
to make corrections.
During the format, structure and consistency
checks, error reports are produced from the
computer.
Correct values are looked up in the original
questionnaires and written into suitable update
forms along with the identification of the
record to be corrected. This work is usually
done by the office editors.
It is, therefore, very important that: (i) the error
reports from the computer are clear and easily
comprehensible to the non-data processing
staff, and (ii) the update forms for writing
down the corrections are simple to fill out. The
way corrections are to be made on
the computer should be carefully organized. Questionnaires should be easily
accessible and located on shelves clearly
labelled with the cluster/region to which they
belong.
The editing staff looking up the corrections
must be thoroughly trained on how to interpret
error listings from the computer, how to look
up appropriate corrections and how to fill out
the update forms.
The contents of update forms are key punched
and used to update the computer files. The
whole checking and correction procedure must
be repeated until no more errors are
encountered.
What is a codebook?
A codebook describes and documents the
questions asked or items collected in a survey.
The codebook will describe the subject of the
survey or data collection, the sample and how
it was constructed, and how the data were
coded, entered, and processed.
The questionnaire or survey instrument will be
included along with a description or layout of
how the data file is organized.
Dr. Madhura Nagchoudhuri & Ms. Divya K.

SW 4- Quantitative Research Methodology
Questions & Hypotheses

Survey questions should be based on hypotheses to be
tested.
Hypotheses expect/assume a relationship between
certain factors/variables, where there is an
independent and a dependent variable and the
dependent variable is affected by the independent
variable.
Hypotheses are key to data analysis as they define
what it is you want to find and guide analysis.
Elementary Quantitative Analysis

- Data: refers to the numbers or measurements that are collected
  from the subjects/respondents.
- Statistic: a number calculated on the sample data that
  quantifies/describes a characteristic of the sample.
- Descriptive statistics: used to describe the data collected
  (e.g. measures of central tendency: mean, mode, median;
  measures of dispersion: range, standard deviation).
- Inferential statistics: used to make inferences about the
  population from which the sample was drawn through the use of
  various tests (mostly used with ratio and interval level variables).
- Types of analysis: univariate, bivariate and multivariate analysis.
SPSS: What Can It and Can't It Do?

- SPSS is a Windows-based point-and-click program.
- SPSS helps organize and analyze data.
- SPSS can also help present data through graphs, charts etc.
- SPSS does not help make decisions related to analysis.
- SPSS does not interpret analysis.
Purpose of Data Analysis

The main purpose of data analysis is to understand and make sense of the
information/data the researcher has collected.
Through various types of analysis the researcher understands patterns in the data:
- How closely clustered is the data? How close is it to a central
  point? (measures of central tendency)
- How spread out is the data? (measures of dispersion)
- How frequently do certain data points occur? (frequency)
- Nature of relationships between variables: are certain variables related or
  unrelated? If related, do the variables move in the same direction
  or in opposite directions?
- Broadly, what do the answers to these questions mean in real terms while
  answering the research questions of the study?
Data Analysis
- Univariate Analysis: refers to looking at tables and statistics
  that describe one variable at a time (e.g. frequency tables for
  categorical variables; measures of central tendency and measures
  of dispersion for scale variables).
- Bivariate Analysis: refers to tables and statistics that
  describe the relationship between two variables.
- Multivariate Analysis: refers to tables and statistics that
  describe the relationship between multiple variables.
Univariate Table
Table 4.3 HELP TAKEN FOR ADMISSION PROCESS

Help in Admission Process                  No. of responses (N=38)
Self                                       18 (47.3%)
Superintendent of the transit hostel        8 (21.5%)
Trustees or volunteers                      3 (7.8%)
House master/sir of previous institute      3 (7.8%)
Hostel seniors                              2 (5.2%)
Friends                                     2 (5.2%)
Don't remember                              2 (5.2%)
Cross Tabulation (Bivariate)

Cross Tabulation between Literacy and Gender of person
Literate * Sex Crosstabulation (Count and % within Sex)

Sex       Literate: no       Literate: yes      Total
female    2095 (21.1%)       7847 (78.9%)       9942 (100.0%)
male      1517 (13.3%)       9874 (86.7%)       11391 (100.0%)
Total     3612 (16.9%)       17721 (83.1%)      21333 (100.0%)
Different Ways of Data Presentation

Descriptive presentation
Table
Graph
Map
Essential information- Clear Title, Title for Columns,
Total number of cases and percentages, Key for
different colours or figures for graphical
presentation
Graphical Representation
Different types of Graphs: The primary purpose of
graphical representation is to highlight important
features of the data.
- Line graph- displays trends in data
- Bar chart/graph- used to show nominal or ordinal
level data
- Pie chart- used to show nominal or ordinal data
- Histogram- used to represent distributions of
interval or ratio data.
Bar charts are used to display the distribution of subjects or
cases in particular categories. They are usually used for nominal or
ordinal level data.
[Figure: bar chart of Count (y-axis) by SES category (low, middle, high)]
A pie diagram, like a bar diagram, is another way of displaying the
number of subjects or cases within different subsets of categorical
data.
[Figure: pie chart of SES: high 29.0%, low 23.5%, middle 47.5%]
Measures of Central Tendency

Measures of Central Tendency: refer to descriptive
statistics that indicate the central location of a data
distribution. These statistics summarise data by describing
the most representative values in the dataset.
Mean: refers to the average, i.e. the sum of scores divided by the
number of scores.
Median: refers to the value below which 50% of the scores
fall. It is the centre-most score if the number of scores is odd
and the average of the two centre scores if the number of scores
is even.
Mode: refers to the most frequently occurring score.
Measures of Central Tendency: Mean

Mean: $\bar{X} = \sum X_i / N$, where N is the total number of scores in the sample.
The mean is sensitive to the exact value of all the scores in
the distribution.
It is very sensitive to extreme scores.
The mean is least subject to sampling variation as compared
to other measures of central tendency. If repeated samples
were taken from a population, the mean would vary
somewhat from sample to sample but it would vary a lot less
than the median or mode. This is the reason why it is used so
frequently in inferential statistics.
Measures of Central Tendency: Median

Median is less sensitive to extreme scores than
the mean.
Median is more subject to sampling variability
than the mean but less than the mode.
If the number of scores is odd, it is the middle value.
If the number of scores is even,
Median = (centre score 1 + centre score 2) / 2
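The properties listed above can be checked on a small example. The following is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical and chosen only for illustration, and the helper name `summarize` is an assumption, not something from these notes.

```python
from statistics import mean, median, mode

# Hypothetical scores, chosen only for illustration.
scores = [2, 3, 3, 5, 7]
scores_with_outlier = [2, 3, 3, 5, 70]  # same data with one extreme score


def summarize(values):
    """Return (mean, median, mode) for a list of scores."""
    return mean(values), median(values), mode(values)


base = summarize(scores)
skewed = summarize(scores_with_outlier)

# Even number of scores: the median is the average of the two centre scores.
even_median = median([2, 3, 5, 7])

print("without outlier (mean, median, mode):", base)
print("with outlier    (mean, median, mode):", skewed)
print("median of 2, 3, 5, 7:", even_median)
```

One extreme score pulls the mean from 4 to 16.6, while the median and mode stay at 3, which is the sense in which the median is less sensitive to extreme scores than the mean.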
Measures of Dispersion/Variability
Measures of Dispersion/Variability: refer to descriptive
statistics that indicate how far apart the scores are spread.
These measures quantify the extent of dispersion from a
central point. Measures of dispersion include:
- Range: refers to the difference between the highest
  and the lowest scores in the distribution, i.e.
  Range = highest score - lowest score
- Standard Deviation: indicates the typical distance of the
  scores from the mean.
- Variance: the square of the standard deviation, used
  mostly in inferential statistics.
Measures of Dispersion: Standard Deviation

Standard deviation gives the measure of dispersion relative
to the mean.
It is sensitive to each score in the distribution: if scores
are moved closer to the mean the standard deviation will
become smaller, while if they are moved further from the mean
the standard deviation will become larger.
Like the mean, standard deviation is stable with regard to
sampling fluctuations.
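To illustrate range, variance and standard deviation, here is a minimal sketch with two hypothetical sets of scores that share the same mean. It assumes the population formulas (dividing by N) via `statistics.pstdev` and `statistics.pvariance`; the notes do not say whether N or N - 1 is intended, and `statistics.stdev` would give the sample version.

```python
from statistics import mean, pstdev, pvariance


def value_range(values):
    """Range = highest score - lowest score."""
    return max(values) - min(values)


# Hypothetical scores; both sets have a mean of 10.
spread_out = [2, 6, 10, 14, 18]
clustered = [8, 9, 10, 11, 12]

for name, values in [("spread out", spread_out), ("clustered", clustered)]:
    sd = pstdev(values)        # population standard deviation (divides by N)
    var = pvariance(values)    # variance is the square of the standard deviation
    print(f"{name}: mean={mean(values)}, range={value_range(values)}, "
          f"SD={sd:.3f}, variance={var}")
```

The clustered set, whose scores sit closer to the mean, has the smaller range, variance and standard deviation.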
Topics
Univariate, Bivariate Analysis

Descriptive and Inferential Statistics
Measures of Central Tendency
Measures of Dispersion
Measures of Association
Topics
Measures of Association
Correlation
Chi square test
t-test ( for independent samples)
One way ANOVA
Outline of the Session

Coding the questionnaire; identify
levels of measurement; type of
variables.
Frequency distributions
Measures of central tendency and
dispersion.
Parametric and non parametric.
Descriptive and inferential Statistics
31 Aug 2012
Types of Statistics
Broadly, statistics are of two types:
I. Descriptive and II. Inferential
Descriptive statistical procedures summarise large
groups of numbers. They are also called summary
statistics.
Ex: Measures of Central tendency, variance, S.D.,
correlation and so on.
The second category of statistics is called inferential
statistics. Inferential statistics are the statistical
techniques used by researchers to generalize from
characteristics of a small group to a larger group not
measured by the researcher.
Ex: t-test, ANOVA, Chi-Square etc.
Parametric and Non-parametric statistics

Depending on the nature of the sample distribution there are two types of
statistics:
a) parametric and
b) non-parametric or distribution-free statistics.
The use of non-parametric statistics in social sciences has been on the
increase, since behavioural scientists and social work researchers rarely
achieve the sort of measurement which permits the meaningful use of
parametric tests.

Assumptions regarding parametric and non-parametric statistics

Parametric:
- Test hypotheses based on the assumption that the samples come from
  populations that are normally distributed.
- Parametric statistics are only for interval/ratio levels of
  measurement, though some use them on ordinal data also.
- Assume homogeneity of variance.

Non-parametric:
- Test hypotheses that do not specify normality or homogeneity of
  variance. Some researchers prefer to use these statistics when these
  two assumptions are violated.
- Non-parametric statistics are used for nominal/ordinal levels of
  measurement.
Parametric statistics and analogous non-parametric procedures

Parametric                               Non-Parametric
Pearson's r                              Spearman's rank correlation (rho)
t-test, correlated samples               Sign test
t-test, independent samples              Mann-Whitney U test
One-way ANOVA                            Kruskal-Wallis one-way ANOVA of ranks
One-way ANOVA with repeated measures     Friedman two-way ANOVA of ranks
(No similar parametric test)             Chi-square (single sample / independent samples)
The assumptions that need to be fulfilled to use:

Parametric statistical test
- The observations must be independent and the sample a random one.
- Observations must be drawn from normally distributed populations.
- At least, the level of measurement must be on an interval scale.
- Homogeneity of variance.

Non-parametric statistical test
- A non-parametric statistical test is a test whose model does not
  specify conditions about the parameters of the population from which
  the sample was drawn.
- Most non-parametric tests apply to data in an ordinal scale, and some
  to data in a nominal scale.
- Simply increasing the size of N increases the efficiency of
  non-parametric statistics.
Descriptive Stats:
Frequency Distributions
A frequency distribution is a display of the frequency
of occurrence of each value/score. It can be
presented either in a tabular form or as a graph.
Bar charts are suitable for nominal variables; for
interval/ratio variables, histograms and frequency
polygons are useful.
Measures of Central tendency- mean, mode and
median.
The measures of variation include range, standard
deviation and variance.
Descriptive Stats:
Frequency Distributions
To obtain a frequency table, measures of central tendency, and measures of
variability in SPSS:
Analyze > Descriptive Statistics > Frequencies > select
variables > Statistics > Continue > Charts (Histograms) >
OK
Descriptive stats
Explain the differences between
M, Md, Mdn, SD, range and variance.
Show the calculation of the standard deviation.
Mention z scores briefly if necessary.
Then go to Cross tabulation and chi-square
Multiple response analysis
Correlation.
Cross tabulation
Helps us explore the relationship between
variables. It goes beyond descriptive
statistics.
The chi-square test, in turn, tells us whether two
variables are related (dependent) or independent.
Chi-Square test
There are different types of Chi-square analysis:
Test for goodness of fit
Applies to the analysis of single categorical
variable and determines if differences in
frequency exist across response categories
compared to the population from which the
sample is drawn.
Test of Independence
Applies to independence or relatedness
between two categorical variables. This is a
very common method used by researchers.
Chi-Square test
Questions addressed by Chi-square test.
1. Whether attitude toward abortion is

dependent on sex of respondent.
2. Whether mental well being is dependent on
sex or age of respondent.
3. Whether access to toilet facility or piped
water is dependent on household wealth.
4. Whether possession of assets is dependent
on sex of respondent.
Chi-Square test
Degrees of freedom = (r-1)(c-1), where r is the number of rows and c
the number of columns in the table.
If the calculated value is higher than the table
value (reported in the output), then we
conclude that there is a significant
association between the two variables.
The significance level usually selected is .05 or
lower.
How to report: there is a significant relationship
between sex of respondent and the possession
of land as assets ($\chi^2$ = 34.21, df = 1, p < 0.001).
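As an illustration of the calculated value and the degrees of freedom, the sketch below applies the standard chi-square formula, the sum of (observed - expected)^2 / expected over all cells, to the counts from the Literate * Sex crosstabulation earlier in these notes. The function name `chi_square_independence` is hypothetical, and the critical value 3.841 is the usual table value for df = 1 at the .05 level; this is a sketch of the hand calculation, not the SPSS procedure.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r-1)(c-1)
    return chi2, df


# Observed counts from the Literate * Sex crosstabulation:
# rows = female, male; columns = not literate, literate
observed = [[2095, 7847],
            [1517, 9874]]

chi2, df = chi_square_independence(observed)
CRITICAL_05_DF1 = 3.841  # table value of chi-square for df = 1 at the .05 level
print(f"chi-square = {chi2:.2f}, df = {df}, "
      f"significant at .05: {chi2 > CRITICAL_05_DF1}")
```

Because the calculated value is far above the table value, the sketch would lead to the conclusion that literacy and sex are associated in that sample.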
Objectives
Explain what is meant by a chi-square goodness of fit test
Conduct a chi-square goodness of fit test
Given a two-way table, compute conditional distributions
Conduct a chi-square test for homogeneity of populations
Conduct a chi-square test for association / independence
Use technology to conduct a chi-square significance test
Chi-Square Distribution
Total area under a chi-square curve is equal to 1
It is not symmetric, it is skewed right
The shape of the chi-square distribution depends on the
degrees of freedom (just like t-distribution)
As the number of degrees of freedom increases, the chi-square
distribution becomes more nearly symmetric
The values of $\chi^2$ are nonnegative; that is, values of $\chi^2$ are
always greater than or equal to zero (0). The curve increases to a
peak and then asymptotically approaches 0
Conditions
All Chi-Square tests (Goodness of Fit, Homogeneity,
Independence):
Independent SRSs
All expected counts are greater than or equal to 1
(all $E_i \geq 1$)
No more than 20% of expected counts are less than
5
Remember it is the expected counts, not the observed
counts, that are the critical conditions
Chi-Square Test for Goodness of Fit
Chi-Square Test for Homogeneity

H0: distribution of response variable is the same for all c
populations
Ha: distributions are not the same
z-Test versus $\chi^2$ Test

We use the $\chi^2$ test to compare any number of
proportions
The results from the $\chi^2$ test for 2 proportions will be
the same as a two-sided z-test for 2 proportions
The z-test is recommended to compare two proportions
because it gives you a choice of a one-sided test and is
related to the confidence interval for $p_1 - p_2$.
Test of Association/Independence
This test assesses whether this observed association is

statistically significant. That is, is the relationship in the sample
sufficiently strong for us to conclude that it is due to a
relationship between the two variables and not merely to chance.
Correlation
A correlation is a measure of the relationship between two or more
variables.
The variables used should be at interval or ratio level.
The correlation coefficient can range from +1 (positive
correlation) to -1 (negative correlation), and 0 represents no
correlation.
Some of the common applications of correlation are:
Do people who smoke tend to have a higher incidence of
cancer?
Is there a relationship between level of literacy and fertility
rate of women?
Does child undernutrition consistently decline as maternal
education improves?
Are educational attainment and age at marriage related?
How is height related to self-esteem?
The direction of Association

One variable    Other variable    Direction of association (sign of r value)
Increase        Increase          Positive
Decrease        Decrease          Positive
Increase        Decrease          Negative
Decrease        Increase          Negative
No association                    Zero
Correlation
Correlation and causation.
Association between variables doesn't mean causation.
Sometimes variables may be spuriously correlated.
Problems relating to multicollinearity
How to report the result?
r(N = 75) = .78, p < 0.05.
Person   Height (x)   Self Esteem (y)
1        68           4.1
2        71           4.6
3        62           3.8
4        75           4.4
5        58           3.2
6        60           3.1
7        67           3.8
8        68           4.1
9        71           4.3
10       69           3.7
11       68           3.5
12       67           3.2
13       63           3.7
14       62           3.3
15       60           3.4
16       63           4.0
17       65           4.1
18       67           3.8
19       63           3.4
20       61           3.6
Scatterplot of Height and Self Esteem
Pearson's Coefficient of Correlation (r)

Person   Height (x)   Self Esteem (y)   x*y      x*x     y*y
1        68           4.1               278.8    4624    16.81
2        71           4.6               326.6    5041    21.16
3        62           3.8               235.6    3844    14.44
4        75           4.4               330      5625    19.36
5        58           3.2               185.6    3364    10.24
6        60           3.1               186      3600    9.61
7        67           3.8               254.6    4489    14.44
8        68           4.1               278.8    4624    16.81
9        71           4.3               305.3    5041    18.49
10       69           3.7               255.3    4761    13.69
11       68           3.5               238      4624    12.25
12       67           3.2               214.4    4489    10.24
13       63           3.7               233.1    3969    13.69
14       62           3.3               204.6    3844    10.89
15       60           3.4               204      3600    11.56
16       63           4                 252      3969    16
17       65           4.1               266.5    4225    16.81
18       67           3.8               254.6    4489    14.44
19       63           3.4               214.2    3969    11.56
20       61           3.6               219.6    3721    12.96
Sum =    1308         75.1              4937.6   85912   285.45
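The column sums above are exactly what the computational formula for Pearson's r needs: r = (n*Sum(xy) - Sum(x)*Sum(y)) / sqrt([n*Sum(x*x) - Sum(x)^2] * [n*Sum(y*y) - Sum(y)^2]). As a minimal sketch, the same calculation can be reproduced in Python from the 20 height and self-esteem values; the function name `pearson_r` is an assumption made for this illustration.

```python
from math import sqrt

# Height (x) and self-esteem (y) for the 20 persons in the table above.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson_r(x, y):
    """Pearson's r from the column sums (computational formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(heights, self_esteem)
print(f"r = {r:.2f}")
```

With these data the coefficient comes out at about 0.73, a fairly strong positive association between height and self-esteem in this sample.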
MULTIPLE RESPONSE/CHOICE
ANALYSIS
What are Multiple Choice Questions? How to code them?

Example question: Could you mention the purpose(s) for which
you normally went out during last week?
i. To buy groceries
ii. To visit relatives
iii. To visit friends
iv. To run errands
v. To attend social function
vi. To places of worship
vii. Any other..

Coding:
- Treat each choice as a separate question
- Assign one column for each choice
- Code Yes and No as 1 and 2, e.g.
  1. Buy groceries: Yes=1, No=2
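The coding scheme (one column per choice, Yes = 1 and No = 2) can be tabulated as in the minimal sketch below. The four respondents, the column names and the function `multiple_response_table` are all hypothetical, made up only to show how the counts and percentages are produced.

```python
# Hypothetical responses: one column per choice, coded Yes = 1, No = 2.
YES, NO = 1, 2
responses = [
    {"groceries": YES, "relatives": NO,  "friends": YES},
    {"groceries": YES, "relatives": YES, "friends": NO},
    {"groceries": NO,  "relatives": YES, "friends": NO},
    {"groceries": YES, "relatives": NO,  "friends": NO},
]


def multiple_response_table(rows):
    """Count the Yes answers per choice and express them as % of respondents."""
    n = len(rows)
    table = {}
    for choice in rows[0]:
        count = sum(1 for row in rows if row[choice] == YES)
        table[choice] = (count, round(100 * count / n, 1))
    return table


table = multiple_response_table(responses)
for choice, (count, pct) in table.items():
    print(f"{choice}: {count} ({pct}%)")
```

Because each respondent can say Yes to several choices, the percentages are computed on the number of respondents and need not add up to 100.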
Distribution of the sample elderly by places/purpose of going out (N = 243)

Place/purpose                                  No. (%)*
To attend social functions                     172 (73.2)
To buy groceries                               160 (68.1)
To visit relatives                             157 (66.8)
To places of worship                           146 (62.1)
To visit friends/neighbors                     138 (58.7)
For a stroll                                   114 (48.5)
To run errands                                  87 (37.0)
Shopping/visit mall                             71 (30.2)
To hang out in a place in neighborhood          36 (15.3)
To park                                         34 (14.5)
Attend exhibitions and events in the city       22 (9.4)

*percentages do not add to 100 due to multiple responses.
Statistical Inference and

Hypothesis Testing
Statistical Inference
The process of generalization in a prescribed manner
from a sample to its universe is known as
Statistical Inference.
Population Parameters
$\mu$: Population mean
$\sigma$: Population standard deviation
Sample Statistic
$\bar{X}$: Sample mean
s: Sample standard deviation
[Figure: a SAMPLE drawn from the Universe/Population]
HYPOTHESIS TESTING
Hypothesis testing in inferential statistics involves
making inferences about the nature of the
population on the basis of observations of a sample
drawn from the population.
What is Statistical Hypothesis?
A Hypothesis is a statement/conjecture about one or
more population parameters.
What is null hypothesis?

A null hypothesis (H0) is a hypothesis of no
relationship or no difference.
Steps in hypothesis testing?
1. State the Hypothesis
2. Set the criterion for rejecting H0
3. Compute the test statistic
4. Decide whether to reject H0
Type I and Type II Errors

(Huck et al., 1974)
The researcher normally would state a null hypothesis or an
alternate hypothesis. If a null hypothesis states that there is
no difference, an alternative hypothesis states that there is
a difference.
Ex.: Alternate Hypothesis
Teacher behaviour changes as a function of changes in

student behaviour.
Null Hypothesis
There will be no teacher behaviour changes as a

function of changes in student behaviour.
Selecting a Level of Significance

The level of significance is a probability that defines how rare or unlikely the sample data must be
before the researcher can reject the null hypothesis in favour of the alternate hypothesis. The most
common levels used are the .05 and .01 levels.
The .95 level of confidence means the same thing as the .05 level of significance.
The decision compares the value calculated from the data with the critical value from the statistical table.
Rejecting or Failing to Reject Null Hypothesis
If the critical value is larger than the calculated value, then the researcher accepts (or fails to
reject) the null hypothesis.
Possible Errors in Hypothesis Testing
There is a possibility that the researcher will make the wrong decision concerning the null
hypothesis, in either accepting or rejecting it.
A Type I Error is rejecting null hypothesis when it is true (RNT).
A Type II Error is accepting the null hypothesis when it is false (ANF).
One and Two Tailed Tests
A two tailed test is sensitive to significant differences in either direction (i.e. greater and
less); the one-tailed test is sensitive to differences in only one direction (i.e. greater or less).
1. State the Hypothesis

In inferential statistics, the term hypothesis has a very specific
meaning: a conjecture about one or more population parameters.
The hypothesis to be tested is called the null hypothesis and is
given the symbol $H_0$.
Example: We use a null hypothesis that the mean quantitative SAT
score of the population of XII standard psychology students is 455.
Thus, our null hypothesis, written in symbols, is
$H_0: \mu = 455$
OR
$H_0: \mu - 455 = 0$
where
$\mu$ = population mean
455 = hypothesized value to be tested
We test the null hypothesis ($H_0$) against the alternative
hypothesis (symbolized $H_1$), which includes the
possible outcomes not covered by the null
hypothesis. For the above example we will use the
alternative hypothesis
$H_1: \mu \neq 455$
The alternative hypothesis, often considered the
research hypothesis, can be supported only by
rejecting the null hypothesis
2. Set the Criterion for Rejecting H0

After stating the hypothesis, the next step in hypothesis
testing is determining how different the sample statistic
($\bar{X}$) must be from the hypothesized population
parameter ($\mu$) before the null hypothesis can be
rejected. For our example, suppose we randomly select
144 XII standard psychology students from the
population and find the sample mean ($\bar{X}$) to be 535. Is
this sample mean ($\bar{X} = 535$) sufficiently different from
what we hypothesize for the population mean ($\mu = 455$)
to warrant rejecting the null hypothesis? Before answering
this question, we need to consider three concepts: (i)
errors in hypothesis testing, (ii) level of significance, and
(iii) region of rejection
i. Errors in hypothesis testing

When we decide to reject or not reject the
null hypothesis, there are four possible
situations:
a. A true hypothesis is rejected.
b. A true hypothesis is not rejected.
c. A false hypothesis is not rejected.
d. A false hypothesis is rejected.
In a specific situation, we may make one of
two types of errors, as shown in the table
below:

                                  State of nature
Decision made                     Null hypothesis is true    Null hypothesis is false
Reject null hypothesis            Type I error               Correct decision
Do not reject null hypothesis     Correct decision           Type II error
Type I error is when we reject a true null

hypothesis.
Type II error is when we do not reject a false
null hypothesis
ii. Level of significance

To choose the criterion for rejecting $H_0$, the
researcher must first select what is called the level of
significance. The level of significance or alpha ($\alpha$)
level is defined as the probability of making a Type I
error when testing a null hypothesis, i.e. of rejecting
$H_0$ when it is true.
iii. Region of Rejection

The region of rejection is the area of the sampling
distribution that represents those values of the sample
mean that are improbable if the null hypothesis is true.
The critical values of the test statistic are those values
in the sampling distribution that represent the beginning
of the region of rejection.
When the alternative hypothesis is non-directional, the
region of rejection is located in both tails of the
sampling distribution. The test of the null hypothesis
against this non-directional alternative is called a two-tailed test.
[Figure: region of rejection for the sampling distribution of the mean for the null
hypothesis $H_0: \mu = 455$ and $\sigma_{\bar{X}} = 8.33$]
3. Compute the Test Statistic

In our example
$\mu = 455$, the hypothesized value for the parameter
$n = 144$, the size of the sample
$\bar{X} = 535$, the observed value for the sample statistic
$\sigma = 100$, the value of the standard deviation in the
population
First, using the concept of z scores, we determine how different $\bar{X}$ is from $\mu$, that is, the
number of standard errors (standard deviation units) the observed sample
value is from the hypothesized value. The standard error of the mean is
$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 100 / \sqrt{144} = 8.33$.
In symbols,
$z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
For this example,
$z = \dfrac{535 - 455}{8.33} = 9.60$
Calculating the z score using the above formula is called
computing the test statistic
4. Decide about H0
Suppose we had found that the sample mean ($\bar{X}$) for 144
students was not 535, but 465. Our hypotheses, sampling
distribution, and critical values (+1.96 and -1.96) remain
the same, but now the test statistic is
$z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{465 - 455}{8.33} = 1.20$
In other words, the observed sample mean ($\bar{X} = 465$) is 1.20
standard errors above the hypothesized value of the
population mean.
[Figure: theoretical sampling distribution for the hypothesis $H_0: \mu = 455$,
illustrating the value of the test statistic (z = 1.20) when $\bar{X} = 465$,
between the critical values -1.96 and +1.96]
Note that the test statistic does not exceed the critical value; it does not fall
into the region of rejection; and we should not reject the null
hypothesis
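The four steps can be traced numerically for both sample means used in the SAT example (535 and 465). This is a minimal sketch assuming a two-tailed test at the .05 level with the population standard deviation known; the function name `z_test` is hypothetical, and the critical value is taken from Python's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 when the population SD is known."""
    standard_error = sigma / sqrt(n)                 # 100 / sqrt(144) = 8.33
    z = (sample_mean - mu0) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    return z, critical, abs(z) > critical


for xbar in (535, 465):
    z, critical, reject = z_test(xbar, mu0=455, sigma=100, n=144)
    print(f"sample mean = {xbar}: z = {z:.2f}, "
          f"critical = +/-{critical:.2f}, reject H0: {reject}")
```

A sample mean of 535 gives z = 9.60, which falls in the region of rejection, while 465 gives z = 1.20, which does not.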
Region of rejection : Directional Alternative Hypothesis
In the SAT example, we tested the null hypothesis against a non-directional

alternative:
$H_0: \mu = 455$
$H_1: \mu \neq 455$
This test is called two-tailed or non-directional because the region of rejection was
located in both tails of the sampling distribution of the mean.
Suppose a direction of the results is anticipated. A directional hypothesis states that
a parameter is either greater or less than the hypothesis value.
For instance, in the SAT example we might use the alternative hypothesis that the
mean SAT level of our population is greater than 455, in symbols,
$H_0: \mu = 455$
$H_1: \mu > 455$
An alternative hypothesis can be either non-directional or directional. A directional
alternative hypothesis states that the parameter is greater than or less than the
hypothesized value. A non-directional alternative hypothesis merely states that the
parameter is different from (not equal to) the hypothesized value.
The test of the null hypothesis against a directional alternative is called a
one-tailed test; the region of rejection is located in one of the two tails of
the sampling distribution. The specific tail of the distribution is
determined by the direction of the alternative hypothesis.
Now suppose the alternative hypothesis states that the mean SAT was less
than 455. In symbols, the hypotheses are
$H_0: \mu = 455$
$H_1: \mu < 455$
Here the critical region lies in the left tail of the distribution
Hypothesis Testing when $\sigma^2$ is Unknown

For testing the hypothesis about a population mean when
$\sigma$ is not known, we estimate the standard deviation of the
population ($\sigma$) by using the standard deviation of the
sample (s). The estimated standard error of the sampling
distribution of the sample mean ($s_{\bar{X}}$) is then given by
$s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
Student's t Distributions
Does the adjustment of using s to estimate $\sigma$ have an effect on the statistical test?
Actually, it does, especially for small samples. The effect is that the normal
distribution is inappropriate as the sampling distribution of the mean. In the
beginning of the 20th century William S. Gosset found that, for small samples,
sampling distribution departed substantially from the normal distribution and that,
as sample sizes changed, the distributions changed. This gave rise to not one
distribution but a family of distributions.
The t distributions are a family of symmetrical, bell-shaped distributions that
change as the sample size changes.
Degrees of Freedom : The number of degrees of freedom is a mathematical
concept defined as the number of observations less the number of restrictions
placed on them.
[Figure: Student's t distributions for 1, 2, 5, 10, and $\infty$
degrees of freedom]
Computation of Test Statistic

When the variance in the population is known and
the normal distribution is used as the sampling
distribution, the test statistic is defined as
$z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
However, when the variance of the sample is used
as an estimate of the population variance, the test
statistic is defined as t:
$t = \dfrac{\bar{X} - \mu}{s_{\bar{X}}}$, where $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
In general,
Test Statistic = (Statistic - Parameter) / (Standard error of the Statistic)
This test statistic is then compared to the critical value. If the test statistic exceeds the critical
value in absolute value, then the null hypothesis is rejected
Point Estimates and Interval Estimates

A point estimate is a single value that represents the
best estimate of the population value. If we are
estimating the mean of a population ($\mu$), then the
sample mean ($\bar{X}$) is the best point estimate.
Interval estimation builds on point estimation to
arrive at a range of values that are tenable for the
parameter and that define an interval we are
confident contains the parameter.
Confidence Interval
When $\sigma^2$ is Known
$CI = \bar{X} \pm (z_{cv})(\sigma_{\bar{X}})$
Where
$\bar{X}$ = Sample mean
$z_{cv}$ = Critical value using the normal distribution and
$\sigma_{\bar{X}}$ = Standard error of the mean
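As a worked illustration of the interval when the population variance is known, the sketch below reuses the numbers from the SAT example (sample mean 535, population standard deviation 100, n = 144). The 95% confidence level and the function name `z_confidence_interval` are assumptions chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """CI = sample mean +/- (critical z) * (standard error of the mean)."""
    standard_error = sigma / sqrt(n)
    z_cv = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_cv * standard_error
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(535, sigma=100, n=144)
print(f"95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```

The interval runs from roughly 518.67 to 551.33. It does not contain the hypothesized value 455, which agrees with rejecting the null hypothesis for that sample mean.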
Confidence Interval
When $\sigma^2$ is Unknown
$CI = \bar{X} \pm (t_{cv})(s_{\bar{X}})$
Where
$\bar{X}$ = Sample mean
$t_{cv}$ = Critical value using the appropriate t distribution and
$s_{\bar{X}}$ = estimated standard error of the mean from the sample
Introduction to Linear Regression and

Correlation Analysis
Goals
After this, you should be able to:
Calculate and interpret the simple correlation

between two variables
Determine whether the correlation is significant
Calculate and interpret the simple linear regression
equation for a set of data
Understand the assumptions behind regression
analysis
Determine whether a regression model is
significant
Goals
(continued)
After this, you should be able to:

Calculate and interpret confidence intervals
for the regression coefficients
Recognize regression analysis applications
for purposes of prediction and description
Recognize some potential problems if
regression analysis is used incorrectly
Recognize nonlinear relationships between
two variables
Scatter Plots and Correlation

A scatter plot (or scatter diagram) is used to
show the relationship between two variables
Correlation analysis is used to measure
strength of the association (linear
relationship) between two variables
Only concerned with strength of the
relationship
No causal effect is implied
Scatter Plot Examples

[Figure: example scatter plots of y against x showing linear relationships,
curvilinear relationships, strong relationships, weak relationships, and
no relationship]
Correlation Coefficient
The population correlation coefficient $\rho$
(rho) measures the strength of the
association between the variables
The sample correlation coefficient r is an
estimate of $\rho$ and is used to measure the
strength of the linear relationship in the
sample observations
Features of $\rho$ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative
linear relationship
The closer to 1, the stronger the positive
linear relationship
The closer to 0, the weaker the linear
relationship
Examples of Approximate r Values
[Figure: example scatter plots with approximate r values:
r = -1, r = -.6, r = 0, r = +.3, r = +1]
Calculating the
Correlation Coefficient
Sample correlation coefficient:
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$
or the algebraic equivalent:
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
Calculation Example
Tree Height (y)   Trunk Diameter (x)   xy       $y^2$     $x^2$
35                8                    280      1225      64
49                9                    441      2401      81
27                7                    189      729       49
33                6                    198      1089      36
60                13                   780      3600      169
21                7                    147      441       49
45                11                   495      2025      121
51                12                   612      2601      144
$\sum$ = 321      $\sum$ = 73          $\sum$ = 3142   $\sum$ = 14111   $\sum$ = 713
Calculation Example (continued)
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
$= \dfrac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886$
[Figure: scatter plot of Tree Height, y, against Trunk Diameter, x]
r = 0.886: relatively strong positive
linear association between x and y
Introduction to Regression Analysis

Regression analysis is used to:
Predict the value of a dependent variable
based on the value of at least one
independent variable
Explain the impact of changes in an
independent variable on the dependent
variable
Dependent variable: the variable we wish to

explain
Independent variable: the variable used to
explain the dependent variable
Simple Linear Regression Model

Only one independent variable, x
Relationship between x and y is
described by a linear function
Changes in y are assumed to be
caused by changes in x
Types of Regression Models

Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
Population Linear Regression

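The population regression model:
$y = \beta_0 + \beta_1 x + \varepsilon$
where
y = dependent variable
$\beta_0$ = population y intercept
$\beta_1$ = population slope coefficient
x = independent variable
$\varepsilon$ = random error term, or residual
$\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the
random error component.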
Linear Regression Assumptions

Error values ($\varepsilon$) are statistically independent
Error values are normally distributed for any
given value of x
The probability distribution of the errors is
normal
The probability distribution of the errors has
constant variance
The underlying relationship between the x
variable and the y variable is linear
Population Linear Regression

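(continued)
[Figure: the population regression line $y = \beta_0 + \beta_1 x + \varepsilon$
with intercept $\beta_0$ and slope $\beta_1$, showing for a given $x_i$ the
observed value of y, the predicted value of y, and the random error
$\varepsilon_i$ for this x value]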
Estimated Regression Model

The sample regression line provides an estimate of
the population regression line:
$\hat{y}_i = b_0 + b_1 x$
where
$\hat{y}_i$ = estimated (or predicted) y value
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
x = independent variable
The individual random error terms $e_i$ have a mean of zero
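The notes do not show how the estimates are obtained, so the following is only a sketch using the standard least-squares formulas: slope = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x*x) - Sum(x)^2) and intercept = mean(y) - slope * mean(x). It reuses the tree height and trunk diameter data from the correlation example; the trunk diameters are the values implied by the x-squared and xy columns of that table, and the function name `least_squares_line` is hypothetical.

```python
# Tree data from the correlation calculation example.
# Trunk diameters are the values implied by the x^2 and xy columns there.
trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]     # x, independent variable
tree_height = [35, 49, 27, 33, 60, 21, 45, 51]   # y, dependent variable


def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_line(trunk_diameter, tree_height)

# Residuals e_i = observed y - predicted y; their mean should be zero.
residuals = [y - (b0 + b1 * x) for x, y in zip(trunk_diameter, tree_height)]
mean_residual = sum(residuals) / len(residuals)

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print(f"mean residual = {mean_residual:.6f}")
```

For these data the slope is about 4.54 and the intercept about -1.31, and the residuals average out to zero up to rounding error.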
Introduction
Correlation
o measures the strength of the linear relationship between two variables
Regression analysis
o determines the nature of the relationship
Is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?
Pearson's coefficient of correlation (r)

Measures the strength of the linear relationship
between one dependent and one independent
variable
curvilinear relationships need other techniques
Values lie between +1 and -1

perfect positive correlation r = +1
perfect negative correlation r = -1
no linear relationship r = 0
Pearson's coefficient of correlation
[Figure: four scatter plots illustrating r = +1, r = -1, r = 0 and r = 0.6]
Scatter plot
[Figure: scatter plot of BMD (the dependent variable, which we make inferences about) against calcium intake]
[Figures: non-normal data; normalised data; SPSS output: scatter plot; SPSS output: correlations]
Interpreting correlation
A large r does not necessarily imply:
o a strong correlation
  - the statistical significance of r increases with sample size
o cause and effect
  - e.g. a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia
  - does watching TV cause paranoid schizophrenia?
  - the correlation may be due to an indirect relationship
Interpreting correlation
Variation in the dependent variable is due to:
o its relationship with the independent variable: r²
o random factors: 1 - r²
r² is the Coefficient of Determination
o e.g. r = 0.661
o r² = 0.661 x 0.661 = 0.44
o less than half of the variation in the dependent variable is due to the independent variable
Agreement
Correlation should never be used to determine the level of agreement between repeated measures:
o measuring devices
o users
o techniques
It measures the degree of linear relationship
o You can have high correlation with poor agreement
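A small numerical illustration of high correlation with poor agreement may help here. This is a sketch under stated assumptions: the readings for `device_a` and `device_b` are made-up values (not from the slides), chosen so that device B always reads 5 units higher than device A.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical repeated measurements of the same five subjects
device_a = [10.0, 12.0, 14.0, 16.0, 18.0]
device_b = [a + 5.0 for a in device_a]  # systematic offset of 5 units

r = pearson_r(device_a, device_b)
mean_difference = sum(b - a for a, b in zip(device_a, device_b)) / len(device_a)

print(r)                # perfect linear relationship
print(mean_difference)  # yet the devices never agree
```

The correlation is perfect because the two sets of readings lie on a straight line, but the devices disagree by 5 units on every subject, which is why agreement needs to be assessed from the differences rather than from r.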
Non-parametric correlation
o Make no assumptions about the distribution of the data
o Carried out on ranks
o Spearman's ρ (rho)
  - easy to calculate
o Kendall's τ (tau)
  - has some advantages over ρ
  - distribution has better statistical properties
  - easier to identify concordant / discordant pairs
o Usually both lead to same conclusions
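To make "carried out on ranks" concrete, the sketch below computes Spearman's coefficient as Pearson's r applied to the ranks of the data. The helper names and the example data (y = x cubed, a monotonic but non-linear relationship) are illustrative assumptions, not taken from the slides.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but curved

print(pearson_r(x, y))     # below 1: the relationship is not linear
print(spearman_rho(x, y))  # 1.0: the ranks agree perfectly
```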
Role of regression
o Shows how one variable changes with another
o By determining the line of best fit
  - linear
  - curvilinear
Line of best fit
o Simplest case: linear
o Line of best fit between:
  - dependent variable Y, e.g. BMD
  - independent variable X, e.g. dietary intake of calcium
Y = a + bX
where a is the value of Y when X = 0 and b is the change in Y when X increases by 1
Role of regression
o Used to predict
  - the value of the dependent variable
  - when the value of the independent variable(s) is known
  - within the range of the known data
o Extrapolation is risky!
  - e.g. the relation between age and bone age
o Does not imply causality
[Figure: SPSS output: regression]
Multiple regression
o More than one independent variable
o e.g. BMD dependent on:
  - age
  - gender
  - calorific intake
  - use of bisphosphonates
  - exercise
  - etc.
Logistic regression
o The dependent variable is binary
  - yes / no
  - e.g. predict whether a patient with Type 1 diabetes will undergo limb amputation given history of prior ulcer, time diabetic etc.
  - result is a probability
o Can be extended to more than two categories
  - e.g. outcome after treatment: recovered, in remission, died
Summary
o Correlation
  - strength of linear relationship between two variables
  - Pearson's: parametric
  - Spearman's / Kendall's: non-parametric
  - Interpret with care!
o Regression
  - line of best fit
  - prediction
  - multiple regression
  - logistic regression
Statistics for Health Research
Regression:
Checking the Model
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Objectives of session
Recognise the need to check fit of
the model
Carry out checks of assumptions in
SPSS for simple linear regression
Understand predictive model
Understand residuals
How is the fitted line obtained?
Use method of least squares (LS)
Seek to minimise the sum of squared vertical
differences between each point and
fitted line
Results in parameter estimates or
regression coefficients of slope (b)
and intercept (a): y = a + bx
[Figure: fitted line y = a + bx, with Dependent (y) on the vertical axis, Explanatory (x) on the horizontal axis, and intercept a]
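The least squares estimates for a single explanatory variable have a closed form: b = Sxy / Sxx and a = mean(y) - b x mean(x). The sketch below applies it to a small made-up data set; the function name `fit_least_squares` and the data are illustrative assumptions and are not the LDL data used in the slides.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical illustrative data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_least_squares(xs, ys)

# Vertical differences between each point and the fitted line
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(a, 3), round(b, 3))
print(round(sum(residuals), 10))  # least squares residuals sum to zero
```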
Consider the regression of minimum LDL
cholesterol achieved on age
Select Regression, then Linear
Dependent (y): Min LDL achieved
Independent (x): Age_Base
Output from SPSS linear regression
Coefficients(a)

Model 1            Unstandardized Coefficients    Standardized Coefficients
                   B         Std. Error           Beta                         t         Sig.
(Constant)         2.024     .105                                              19.340    .000
Age at baseline    -.008     .002                 -.121                        -4.546    .000

a. Dependent Variable: Min LDL achieved
N.B. 0.008 may look very small but

represents:
The DECREASE in LDL achieved for each
increase in one unit of age i.e. ONE year
H0: slope b = 0
Test t = slope/se = -4.546 (SPSS uses the unrounded estimates; the rounded values -0.008/0.002 would give -4.0), with
p < 0.001, so statistically significant
Predicted LDL = 2.024 - 0.008 x Age
Prediction Equation from linear regression
Predicted LDL achieved = 2.024 - 0.008 x Age
So for a man aged 65 the predicted LDL
achieved = 2.024 - 0.008 x 65 = 1.504

Age    Predicted Min LDL
45     1.664
55     1.584
65     1.504
75     1.424
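The table of predicted values follows directly from the prediction equation. As a minimal sketch (the constant and function names are hypothetical; the coefficients 2.024 and -0.008 are the rounded values reported in the SPSS output):

```python
INTERCEPT = 2.024        # (Constant) from the coefficients table
SLOPE_PER_YEAR = -0.008  # Age at baseline coefficient

def predicted_min_ldl(age_years):
    """Predicted minimum LDL achieved for a given age at baseline."""
    return INTERCEPT + SLOPE_PER_YEAR * age_years

for age in (45, 55, 65, 75):
    print(age, round(predicted_min_ldl(age), 3))
```

As the slides warn, such predictions are only trustworthy within the age range of the data the model was fitted to.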
Assumptions of Regression
1. Relationship is linear
2. Outcome variable and hence
residuals or error terms are approx.
Normally distributed
Use Graphs and Scatterplot to
obtain the Lowess line of fit
1. Create Scatterplot and then
double-click to enter chart
editor
2. Choose the icon "Add fit line at
total"
3. Then select type of fit such
as Lowess
Linear assumption: Fitted

lowess smoothed line
Lowess smoothed line (red) gives a good eyeball

examination of linear assumption (green)
Definition of a residual
A residual is the difference between
the actual value and the predicted value
(fitted line), i.e. the unexplained variation
$$r_i = y_i - E(y_i)$$
Or
$$r_i = y_i - (a + b x_i)$$
Residuals
To assess the residuals in SPSS linear regression, select Plots..
o Plot the normalised or standardised residual against the normalised or standardised predicted value of LDL
o Select histogram of residuals and normal probability plot
In SPSS linear regression, select Statistics..
o Select Model fit
o Select confidence intervals for regression coefficients
o Select Durbin-Watson for serial correlation and identification of outliers
Output:
Scatterplot of residuals vs. predicted
Note
1) Mean of residuals = 0
2) Most of data lie within + or - 3 SDs of mean
Assumptions of Regression
1. Relationship is linear
2. Outcome variable and hence
residuals or error terms are approx.
Normally distributed
Output:
Histogram of standardised residuals
[Figure: plot of residuals with normal curve superimposed]
Output:
Cumulative probability plot
Look for deviation from diagonal line to indicate non-normality
Output:
Description of residuals
Descriptive statistics for residuals

Residuals Statistics(a)

                        Minimum     Maximum      Mean        Std. Deviation    N
Predicted Value         1.314867    1.843205     1.556478    .0878548          1383
Residual                -1.65389    4.0658469    .0000000    .7181448          1383
Std. Predicted Value    -2.750      3.264        .000        1.000             1383
Std. Residual           -2.302      5.660        .000        1.000             1383

Subjects with standardised residuals > 3 (worth investigation?)

Casewise Diagnostics(a)

Case Number    Std. Residual    Min LDL    Predicted Value    Residual
164            5.660            5.5840     1.518153           4.0658471
209            4.395            4.5260     1.368685           3.1573148
250            3.143            3.7875     1.529325           2.2581750
268            3.064            3.8730     1.671664           2.2013357
274            3.227            4.0953     1.777153           2.3180975
362            4.095            4.5350     1.593460           2.9415398
517            3.636            4.3240     1.711788           2.6122125
849            3.968            4.3290     1.478113           2.8508873
1047           4.207            4.4360     1.413686           3.0223141
1075           3.885            4.4040     1.613219           2.7907805
1103           3.519            3.9905     1.462584           2.5279157
1229           3.016            3.7660     1.599254           2.1667456
1290           3.975            4.2345     1.379107           2.8553933
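The casewise figures can be reproduced by hand: the residual is the observed value minus the predicted value, and the standardised residual is the residual divided by the standard error of the estimate. The sketch below uses three of the listed cases. It assumes the standard error of the estimate reported in the Model Summary output (.7184048); the function and variable names are hypothetical.

```python
# Assumption: standard error of the estimate from the SPSS Model Summary
STD_ERROR_OF_ESTIMATE = 0.7184048

# (case number, observed min LDL, predicted value) from the casewise output
cases = [
    (164, 5.5840, 1.518153),
    (209, 4.5260, 1.368685),
    (250, 3.7875, 1.529325),
]

def standardised_residual(observed, predicted,
                          std_error=STD_ERROR_OF_ESTIMATE):
    """Residual (observed minus predicted) in units of the standard error."""
    return (observed - predicted) / std_error

std_residuals = {
    case: standardised_residual(observed, predicted)
    for case, observed, predicted in cases
}

# Flag potential outliers: standardised residual beyond 3 in either direction
flagged = [case for case, z in std_residuals.items() if abs(z) > 3]

for case, z in std_residuals.items():
    print(case, round(z, 3))
print(flagged)
```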
Output:
Model fit and serial correlation
Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .121(a)    .015        .014                 .7184048                      2.034

a. Predictors: (Constant), Age at baseline
R: correlation between min LDL achieved and Age at baseline, here 0.121
R²: % variation explained, here 1.5%, not particularly high
Durbin-Watson test: serial correlation of residuals; should be approximately 2 if no serial correlation
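For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below is illustrative only: the function name and the two short residual series are made up to show how the statistic moves away from 2, and are not the LDL residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual series
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips each step: negative serial correlation
persistent = [1.0, 1.0, -1.0, -1.0]   # neighbours tend to share a sign: positive serial correlation

print(durbin_watson(alternating))  # 3.0, above 2
print(durbin_watson(persistent))   # 1.0, below 2
```

Values well below 2 point to positive serial correlation and values well above 2 to negative serial correlation; the 2.034 reported above is close to 2.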
Summary
After fitting any regression model check assumptions
Functional form: linearity is default, often not best fit, consider quadratic
Check Residuals for approx. normality
Check Residuals for outliers (> 3 SDs)
All accomplished within SPSS
Practical on Model Checking

Read in LDL Data.sav
1) Fit age squared term in min LDL model and
check fit of model compared to linear fit
(Hint: Use transform/compute to create age
squared term and fit age and age squared)
2) Fit separate linear regressions with min
Chol achieved with predictors of 1) baseline
Chol 2) APOE_lin 3) adherence
Check assumptions and interpret results
What is ANOVA?
A statistical method for testing whether the means of a dependent variable are
equal across two or more groups (i.e., the probability that any differences in
means across several groups are due solely to sampling error).
Variables in ANOVA (Analysis of Variance):
Dependent variable is metric.
Independent variable(s) is nominal with two or more levels, also called
treatment, manipulation, or factor.
One-way ANOVA: only one independent variable with two or more levels.
Two-way ANOVA: two independent variables each with two or more levels.
With ANOVA, a single metric dependent variable is tested as the outcome of a
treatment or manipulation.
With MANOVA (Multivariate Analysis of Variance), two or more metric dependent
variables are tested as the outcome of a treatment(s).
How Do We State The Null and Alternative Hypotheses?
H0: The means for all groups are the same (equal).
Ha: The means are different for at least one pair of groups.
$$H_0: \mu_1 = \mu_2 = \dots = \mu_k$$
$$H_a: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$
How do you determine which means are significantly different?
The F-statistic assesses whether you can conclude that
statistical differences are present somewhere between
the group means.
But to identify where the differences are you must use
follow-up tests called multiple comparison tests.
Many multiple comparison tests are available in SPSS.
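To show what the F-statistic measures, the sketch below computes it for a one-way ANOVA as the between-group mean square divided by the within-group mean square. The function name and the three small groups are hypothetical illustrations, not data from the slides, and the follow-up multiple comparison tests are not covered.

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA on a list of groups of observations."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of observations around their own group mean
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three treatment groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(round(f_statistic, 3))  # 13.0
```

A large F says the group means differ by more than sampling error within groups would suggest; it does not say which pairs of means differ.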
Writing the Research Report

The purpose of the written report is to
present the results of your research,
but more importantly to provide a
persuasive argument to readers of
what you have found.
Components of an Empirical
Research Paper in Economics
Title
Abstract
Table of Contents
Introduction and Literature Survey
Theoretical Analysis
Empirical Testing
Conclusions
References
Introduction
The purpose of the introduction to the
research report is to provide the rationale for
the research. This rationale should address
four issues:
What is the nature of the issue or problem the
research investigates?
Why is this worthy of investigation?
Introduction
What have previous researchers discovered
about this issue or problem?
What does your research attempt to prove?
The Written Literature Review

A literature review is a summary of the major
studies that have been published on a
research topic. Literature review is usually
included as part of the introduction in
research papers.
The Written Literature Review

The literature review should accomplish three goals:
o It should identify the major findings on a topic up to the
present;
o It should point out the principal deficiencies of these studies
or provide a sense of what is lacking in the literature; and
o It should conclude by leading into your research question, by
explaining how your research proposes to contribute to the
literature or address some shortcoming of a previous study.
The Most Frequently Asked

Question!
Students frequently ask how many sources
should be included in the literature survey.
What do you think the answer should be?
The Answer
It depends on how many major studies have been
completed on the topic.
If you only report one or two sources, readers may
suspect that you have not put enough effort into
searching the literature. You don't want to miss a
major study, since at best it will make you look
careless and at worst it may weaken the rationale for
your research.
What a Literature Survey is NOT

A list of potential sources of information
about your topic;
A list of sources that you reviewed, or even
A list of summaries of the sources you
reviewed.
Theoretical Analysis
The purpose of this part of research is to
present the theoretical analysis of the issue or
problem you are investigating. This is also
described as presenting your theoretical
model.
Empirical Testing of the Analysis

The purpose of the empirical testing part of
the research report is to provide the empirical
evidence for your research argument. The
theme of this section of the paper can be
summarized as: Given your hypothesis, how
did you test it and what were your findings?
Empirical Testing of the Analysis

This section should include:
The data used;
The empirical model and type of statistical
analysis you employed;
The results you hypothesized;
The actual results; and
Your interpretation of the results.
Conclusions
The purpose of this part of the research report is to
summarize your findings, that is, to restate your
argument and conclude whether or not it is valid. In
light of the statistical results, what can you infer
about your hypothesis? To what extent did your
empirical testing confirm your analysis?
Writing a Research Report

If research was not written up, did
it really occur?

Academic sociologists conduct research to discover
facts, truths, and explanations about the social
world.
They write research reports to convey their own and
others' research findings.
Types of Research:
Library research refers to gathering information that
others have generated.
Primary research refers to generating information
through data collection, analysis, and reporting
findings.

Sociologists' articles, papers, or research reports come in
different forms:
Literature Review: Library research that organizes facts and/or
theories others in the sociological community generated (Rarely
published)
Research Article or Book: One's own findings generated by a
primary research project that builds on previous research by the
sociological community. (Findings from basic research, most
common.)
Applied Research Report: One's findings from a primary research
project that evaluates a program without drawing much from
previous sociological research. (Findings from applied research,
rarely published.)
This class focuses on writing Research Articles.

A sociological article, paper, or report generally
covers only one important topic of interest and
conveys evidence and interpretations of evidence.
Research reports are NOT creative writing, opinion
pieces, poems, novels, letters, musings, memoirs, or
interesting to read.
A sociological article, paper, or report about primary research

generally takes a structure or form that seems difficult but is
intended to help make reading it or using it for research quick and
efficient.
A research report has seven components:
1. Abstract or Summary
2. Introduction
3. Review of Literature
4. Methods
5. Results
6. Conclusions and Discussion
7. References
Note:
Qualitative research reports will vary from what is presented here.

Applied research reports may vary from what is presented here.

1. Abstract or Summary
The abstract or summary tells the reader very briefly what the main
points and findings of the paper are.
This allows the reader to decide whether the paper is useful to them.
Get into the habit of reading only abstracts while searching for
papers that are relevant to your research.
Read the body of a paper only when you think it will be useful to
you.

[Figure: 1. Abstract or Summary, an example]

2. Introduction
The introduction tells the reader:
o what the topic of the paper is in general terms
o why the topic is important
o what to expect in the paper
Introductions should:
o funnel from general ideas to the specific topic of the paper
o justify the research that will be presented later
Introductions are sometimes folded into literature reviews

[Figure: 2. Introduction, an example]

3. Review of Literature
The literature review tells the reader what other researchers
have discovered about the paper's topic or tells the reader
about other research that is relevant to the topic. Often what
students call a "research paper" is merely a literature review.
A literature review should shape the way readers think about a
topic: it educates readers about what the community of
scholars says about a topic and its surrounding issues.
Along the way it states facts and ideas about the social world
and supports those facts and ideas with evidence of where
they came from (empiricism).

3. Review of Literature
Literature reviews have parenthetical citations running
throughout. These are part of a systematic way to document
where facts and ideas came from, allowing the skeptical reader
to look up anything that is questionable.
Parenthetical citation is our way of substantiating the claims in
our paper, without breaking our flow.
Each citation directs the reader to the references where
complete details on sources can be found. Therefore,
information such as authors' first names or titles of works does
not need to be written into the text.

3. Review of Literature
Citations consist of authors' last names and the year of publication. One
finds complete information on sources by looking up last names and
dates in alphabetized references, so there's no need to put all that
information in the text.
We have conventions that allow the reader to figure out from where
information is coming. Here are some examples of the conventions for
citing in text of the literature review:
Just pointing out where info came from:
Form: blah blah (Author Year)
Example: ...the gays are different (Lee 2004).
More than one article in the same year:
Form: blah blah (Author Yeara) and also blah blah (Author Yearb)
Example: ...are different (Lee 2004a), but are more pickled (Lee 2004b)
Where a researcher is quoted:
Form: blah, "Quote quote" (Author Year: Pages)
Example: ...reveals that "the gays are different" (Lee 2004: 340).
More than one source:
Form: blah blah (Author Year; Author Year)
Example: ...bis are more adept (Lee 2004; Seymour & Hewitt 1997).
Using the author's name in a sentence:
Form: Author (Year) says that ...
Example: Lee (2004) claims that girls will rule the world
Quoting a person and using their name:
Form: Author (Year: Pages) says, "Quote quote"
Example: Lee (2004: 341) says, "Girls are more likely to rule the world"

[Figure: 3. Review of Literature, examples of citing]

3. Review of Literature
If an idea is used, but cannot be substantiated by the
community of sociologists, the literature review clearly shows
that the author is speculating and details the logic of the
speculation.
Do NOT discuss irrelevant information.
For example, a paper on attitudes about marijuana should not
detail the multiple uses of hemp such as in clothing, rope, hemp oil and so
forth.
The literature review is written in the author's voice. The
sources of information are not extensively quoted or copied
and pasted. Instead, the author puts facts and ideas into his
or her own words while pointing out where the
information came from.
Analogously, if you were discussing the exciting things you learned in a
sociology course at a cocktail party, you would use your own words. You
would NOT pull out a book or lecture notes and quote these word for
word.

3. Review of Literature
Note: Explaining why social events occur as they do requires use (and
testing) of explanations that have worked before. THESE
EXPLANATIONS ARE CALLED THEORIES.
Most academic literature reviews have a guiding theory that is used to:
o Frame (or help us understand) facts in the literature.
o Establish expectations (or hypotheses) for the research.
o Justify speculation when no evidence to justify an idea specific to a
topic exists in the literature.
Sometimes the whole point of a research project is to:
o Determine whether a theory works
o Pit two or more theories against each other to see which works
better
You will most likely not refer to theories in your papers

3. Review of Literature
Quantitative literature reviews typically end with:
1. Focused declarations of the particular issues the research
activity is addressing: ideas about a topic that will be
tested with quantitative methods
2. Research hypotheses
Hypotheses are statements of the expected relationship(s)
between two (or more) variables
For example:
Men will have higher investment income than women.
Older Americans are more likely to oppose abortion for a
woman who doesn't want her baby because she is poor.

3. Review of Literature: examples of hypotheses
Hypothesis 1. In a new social context, girls will be more sociable than boys, getting more involved with
others (interactional commitments) and forming more emotionally close relationships (affective
commitments), across activity domains.
Hypothesis 2. Given that commitments to new relationships positively determine identity prominence,
and identity prominence positively determines behaviors, if girls are more sociable with newer
persons, their identities and behaviors will change more across activity domains.
Hypothesis 3. However, girls and boys will experience the same identity processes, meaning that girls
and boys with the same sociability in new relationships will have equal identity and behavior
changes.

4. Methods
A METHODS SECTION MUST CONTAIN:
1. Descriptions of Data (Think in terms of: Who, What, When, Where,
Why and How?)
Report:
A. The Target Population
B. The Ways Data were Collected:
1. Sampling
2. Delivery Methods
C. Response Rates
D. Sample sizes resulting from various decisions
Such as:
1. eliminating non-Christians from the sample
2. using only white respondents

4. Methods
2. Descriptions of Variables
First for dependent, then for independent variables, report:
A. Names for the variables: make them intuitive! (Do not use
GSS variable names.)
B. Word for word description of the questions. (sociology
differs from psychology and medicine)
C. Final coding scheme: the numbers you assigned to
responses.

4. Methods
3. Manipulations of the variables or data
For example:
A. recoding income from 23 uneven intervals to five equivalent
categories
B. removing non-citizens if studying voting patterns
4. Reflection on ability of data to generalize to the target
population
A. Limitations of Data (omitted cases, biases, etc.)
B. Analyses that bolster claims that the data are appropriate
5. Statistical techniques that will be used to test your hypotheses
and the statistics program used.

4. Methods
5. Results
The results section chronicles the outcome of

the statistical analyses, assessing whether your
hypotheses were correct and why or why not.

5.
Results
The results section includes:
Narrative describing most relevant findings
Professional tables showing descriptive and inferential

statistics
Tables must be numbered and have a descriptive title

There are conventions for formatting
For example:
Asterisks are used to highlight results that are statistically significant
All numbers in a column are aligned on decimals

5. Results

5.
Results
The narrative and tables are complementary.
The narrative discusses ONLY VERY IMPORTANT Results and

leaves details for tables.
As different outcomes are described in the narrative, reference is

made to where the detailed information can be found in the
tables.
The tables contain almost all statistical information so that the

author does not have to write a narrative for every detail in the
analysis.

5. Results
The narrative highlights:
Evaluations of the hypotheses. Were the
research hypotheses supported?
Statements about new discoveries or
surprises encountered in the analyses

6. Conclusions and Discussion
This section assesses how one's research findings
relate to what the community of sociologists has
accepted as facts.
Things that should be done:
1. Summarize the most salient points of your research
(tell the reader what you found out about your
topic).
2. Discuss the general significance of your topic and
findings.

6. Conclusions and Discussion
3. Discuss the shortcomings of your study and how
these might affect your findings.
4. Discuss things future researchers should investigate
about your topic to advance knowledge about it.
5. Help the reader gain the knowledge that you think
he or she ought to have about the topic. You spent a
lot of time exploring the topic, so you should share your
expertise.

7. References
The references are just as important as any other part of
your paper.
References are the empirical support for claims in a
paper that are not directly observed in the research.
They are needed for researchers to remain empirical in
their descriptions of topics.

7. References:
Link the paper to the community of scholars, permitting
readers to assess the worthiness of claims in a paper.
Make the research process much more efficient because
they make it very easy to look up sources of facts and
ideas.

7. References
Style:
Hanging indented
Alphabetical on author's last name (by increasing year within same author)
Invert only first author's name
Information within source in an order determined by type of source
Article:
Last Name, first name, first name last name, and first name last name. Year. "Article
title." Journal Name Volume(number): 1st Page-Last Page.
Lee, James Daniel. 2005. "Do Girls Change More than Boys? Gender Differences and
Similarities in the Impact of New Relationships on Identities and Behaviors."
Self and Identity 4:131-47.
Multiple authors
Kroska, Amy and Sarah K. Harkness. 2008. "Exploring the Role of Diagnosis in the
Modified Labeling Theory of Mental Illness." Social Psychology Quarterly
71:193-208.

7.
References
Book Chapter:
Last Name, first name. Year. Chapter Name. Pages in the book in Book Name, edited
by first name last name. City of Publisher: Publisher.
Bianciardi, Roberto. 1997. "Growing Up Italian in New York City." Pp. 179-213 in Adult
Narratives of Immigrant Childhoods, edited by Ana Relles. Rose Hill, PA:
Narrative Press.
Book:
Last name, first name. Year. Book Name. City of Publisher: Publisher.
Stryker, Sheldon. 1980. Symbolic Interactionism: A Social Structural Version. Menlo
Park, CA: Benjamin/Cummings.

7.
References
General Social Survey:

Davis, James Allan and Smith, Tom W.: General Social Surveys, 1972-2008. [machine-readable
data file]. Principal Investigator, James A. Davis; Director and Co-Principal Investigator, Tom W.
Smith; Co-Principal Investigator, Peter V. Marsden, NORC ed. Chicago: National Opinion
Research Center, producer, 2005; Storrs, CT: The Roper Center for Public Opinion Research,
University of Connecticut, distributor. 1 data file (53,043 logical records) and 1 codebook
(2,656 pp).
Website:
Last Name (if available), first name. Year (if available). Article or web page title. Journal or
Report Name Volume (if available). Retrieved date (http://address).
Markowitz, Robin. 1991. Canonizing the Popular. Cultural Studies Central. Retrieved
October 31, 2001 (http://culturalstudies.net/canon.htm).
Note: Do your best to replicate this style in the case of missing information. If there is no author,
use the title in that position. Always have a retrieved date and website address.

[Figure: 7. References, an example]

Some General Points
1. Make accurate sociological claims in your paper. Stake out
positions: a kind of "I think I have the answer to this issue"
position.
2. Cite facts to support your sociological claims.
3. If you can, use theories to support your sociological claims.
4. Every declaration or fact claim must be cited or overtly posed as
speculation.

Some General Points
5. Anticipate your reader's questions as you write:
A. help the reader understand why your topic is important
B. demonstrate to the reader that you adequately investigated your
topic
C. help them anticipate what you'll say next; everything you say should
seem reasonable to say
6. While writing, keep thinking "The point is to:"
(1) establish hypotheses
(2) describe how to test the hypotheses
(3) give results of tests, and
(4) discuss what the reader should believe about the world.

Some General Points
7. There is no right answer in a research paper, just approximate
representations of the truth that are closer or further away from
that truth.
The truth is:
From Community of Scholars:
What they said about your topic in the journals, books, and
other publications
From you:
What your methods and analyses revealed about the
topic.

Finally: Avoiding Plagiarism
What is it?
All knowledge in your head has either been copied
from some place or originally discovered by you.
Most knowledge was copied.
This is true in most settings. General knowledge is
copied. Most teachers' lectures are copied
knowledge.
Human culture would not exist without our keen
ability to copy!
Humans are natural copiers, but that is not what is
meant by the term "plagiarism."
The Elements of Style endorses imitation as a way for a writer to achieve
his own style:
"The use of language begins with imitation . . . The imitative life continues long
after the writer is on his own in the language, for it is almost impossible to
avoid imitating what one admires. Never imitate consciously, but do not worry
about being an imitator; take pains instead to admire what is good. Then
when you write in a way that comes naturally, you will echo the halloos that
bear repeating."
Copied from: http://www.answers.com/topic/writing-style-1

What is it?
Among other things, plagiarism refers to taking others' work
and representing it as if it were your own.
In academics this is bad because with plagiarism:
o One cannot assess students' development accurately
o The person who makes his or her livelihood by scholarly pursuit is
being robbed of credit
o It masks the lineage of ideas and facts.
Plagiarism is to academics as Enron-accounting is to
corporate America.

Lineage of Ideas:
Original sources of research are all the proof we have for some facts.
Without the paper trail of academic thought:
People could pass incorrect ideas off as facts
We would have to keep re-proving things.
The contexts that generated facts and ideas get lost.
Research becomes highly inefficient as it becomes incredibly difficult to

find full information on a topic.

To avoid plagiarism:
1. Document every source for information that is not general
knowledge; this includes facts and ideas.
2. Cite every time a fact or idea is used unless it is clear that one
citation is referring to a group of facts or ideas.
3. If you quote material, put quotation marks around the quoted
material and include a page number within the citation.
4. It is all right to paraphrase material, but you still have to cite
where the paraphrased material came from.
5. When in doubt, cite the source.
Improper citing is grounds for failure on the course paper.
