
Chapter 4

Methodology

4.1 Source of Data

The purpose of this section is to give an overview of the data used in this study, the Family Income and Expenditures Survey (FIES) 1997.

4.1.1 General Background

The Family Income and Expenditures Survey (FIES) 1997 is a nationwide survey conducted every three years by the National Statistics Office (NSO), with two visits to the same households in each survey period. The objectives of the survey are as follows:

a. to gather data on family income and family living expenditures, and related information affecting income and expenditure levels and patterns in the Philippines;

b. to determine the sources of income and income distribution, levels of living and spending patterns, and the degree of inequality among families;

c. to provide benchmark information to update weights in the estimation of the consumer price index; and

d. to provide information for the estimation of the country's poverty threshold and incidence.

4.1.2 Sampling Design and Coverage

The FIES 1997 used a stratified multistage sampling design consisting of 3,416 primary sampling units (PSUs) for the provincial estimates, with a subsample of 2,247 PSUs serving as the master sample for the regional-level estimates. (National Statistics Office [NSO], 1997-2005)

This multistage sampling design involved three stages: first, the selection of sample barangays; second, the selection of sample enumeration areas, each a physically delineated portion of a barangay; and third, the selection of sample households. The sampling frame and stratification for the three stages were based on the 1995 Census of Population (POPCEN) and the 1990 Census of Population and Housing (CPH). A sample of 41,000 households selected in this way participated in the survey. (NSO, 1997-2005)
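The three-stage selection described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the NSO's procedure: it draws units by simple random sampling without replacement at every stage, whereas the actual FIES selection probabilities are not described here. The frame layout (a nested dictionary of strata, barangays, enumeration areas and household IDs), the function name `stratified_three_stage_sample` and the sample sizes are all hypothetical.

```python
import random


def stratified_three_stage_sample(strata, n_barangays, n_eas, n_households, seed=None):
    """Hypothetical sketch: equal-probability selection at every stage.

    strata: dict stratum -> {barangay -> {enumeration area -> [household IDs]}}
    Returns a dict stratum -> list of selected household IDs.
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, barangays in strata.items():
        chosen = []
        # Stage 1: select sample barangays within the stratum
        for b in rng.sample(sorted(barangays), n_barangays):
            # Stage 2: select sample enumeration areas within each selected barangay
            for ea in rng.sample(sorted(barangays[b]), n_eas):
                # Stage 3: select sample households within each selected enumeration area
                chosen.extend(rng.sample(barangays[b][ea], n_households))
        sample[stratum] = chosen
    return sample


# Toy frame (not FIES data): 2 strata x 5 barangays x 4 EAs x 10 households
toy = {
    s: {
        f"{s}-B{i}": {
            f"{s}-B{i}-EA{j}": [f"{s}-B{i}-EA{j}-H{k}" for k in range(10)]
            for j in range(4)
        }
        for i in range(5)
    }
    for s in ("S1", "S2")
}
result = stratified_three_stage_sample(toy, n_barangays=2, n_eas=2, n_households=3, seed=1)
```

With 2 barangays, 2 enumeration areas and 3 households per selected area, each stratum in the toy frame contributes 2 × 2 × 3 = 12 households.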

4.1.3 Survey Characteristics

The FIES 1997 questionnaire contains about 800 data items, which the interviewer asks the respondent of the selected sample household. A respondent is defined as the household head, the person who manages the finances of the family, or any member of the family who can give reliable information for the questionnaire. (NSO, 1997-2005)

The items or variables gathered in the survey are as follows:

Table 2: The Variables Gathered in the FIES 1997

Part I – Identification and Other Information
    A. Identification of the Household
    B. Other Information
        1. Particulars about the Head of the Family
            a) Sex
            b) Age as of Last Birthday
            c) Marital Status
            d) Highest Grade Completed
            e) Employment Status
            f) Occupation
            g) Kind of Industry / Business
            h) Class of Worker
        2. Other Information about the Household
            a) Type of Household
            b) Number of Family Members Enumerated
            c) Number of Boarders, Helpers and Other Non-relatives
            d) Number of Family Members Who Are Employed for Pay or Profit

Part II – Expenditures and Other Disbursements
    A. Food, Alcoholic Beverages and Tobacco
        1. Particulars about the Head of the Family
            a) Cereals and Cereal Preparations
            b) Roots and Tubers
            c) Fruits and Vegetables
            d) Meat and Meat Preparations
            e) Dairy Products and Eggs
            f) Fish and Marine Products
            g) Coffee, Cocoa and Tea
            h) Non-Alcoholic Beverages
            i) Food Not Elsewhere Classified
        2. Food Regularly Consumed Outside the Home
        3. Alcoholic Beverages
        4. Tobacco
        5. Food Items, Alcoholic Beverages and Tobacco Received as Gifts
    B. Fuel, Light and Water, Transportation and Communication and Household Operation
    C. Personal Care and Effects, Clothing, Footwear and Other Wear
    D. Education, Recreation and Medical Care
    E. Furnishings and Equipment
    F. Taxes
    G. Housing, House Maintenance and Minor Repairs
    H. Miscellaneous Expenditures
    I. Other Disbursements

Part III – Income and Other Receipts
    A. Salaries and Wages from Employment
    B. Net Share of Crops, Fruits and Vegetables Produced or Livestock and Poultry Raised by Other Households
    C. Other Sources of Income
        1. Cash Receipts, Gifts, Support, Relief and Other Forms of Assistance from Abroad
        2. Cash Receipts, Support, Assistance and Relief from Domestic Source
        3. Rentals Received from Non-Agricultural Lands, Buildings, Spaces and Other Properties
        4. Interest
        5. Pension and Retirement, Workmen's Compensation and Social Security Benefits
        6. Net Winnings from Gambling, Sweepstakes and Raffle
        7. Dividends from Investment
        8. Profits from Sale of Stocks, Bonds and Real and Personal Property
        9. Back Pay and Proceeds from Insurance
        10. Inheritance
    D. Other Receipts

4.1.4 Survey Nonresponse

Two types of nonresponse occurred in the 1997 FIES. The first type, called item nonresponse, resulted from factors such as the respondent being unaware of the question, being unwilling to provide the answer, or the question being omitted during the interview. (NSO, 1997-2005)

The other type, called partial nonresponse, occurred when households were temporarily away, on vacation, not at home, demolished, or had transferred residence by the time of the second visit. This type of nonresponse amounted to only 3.6% of the total number of respondents. (NSO, 1997-2005)

The NSO devised only deductive imputation to address item nonresponse; no specific method was developed to compensate for partial nonresponse. (NSO, 1997-2005)

Hence, the researchers will focus on comparing imputation procedures for partial nonresponse. The first choice made by the researchers was the regional data set to which the imputation techniques would be applied. The National Capital Region (NCR) was chosen because it was noted as the region with the highest nonresponse rate. The data set consists of 4,130 observations on 39 categorical variables, and the remaining variables are continuous variables pertaining to the income and expenditures of the respondents. Using Nordholt's criteria for selecting which variables should be imputed, such as the importance of the variable in the survey and its percentage of nonresponse (Nordholt, 1998), the researchers chose Total Income (TOTIN) and Total Expenditure (TOTEX) as the variables of interest.


4.2 The Simulation Method

To investigate and empirically compare the statistical properties of estimates computed with values imputed by the selected imputation methods, a data set with missing observations was simulated. The simulation creates an artificial data set in which the missing observations indicate which values will be imputed.

The algorithm for this simulation procedure is as follows:

1. A matrix of random numbers was generated so that the missing data would satisfy the assumption of Missing Completely at Random (MCAR).

2. Each random number in this matrix was assigned to an observation of the FIES 1997 second-visit variables TOTIN and TOTEX.

3. The second-visit observations were sorted in ascending order of their corresponding random numbers.

4. To get the number of nonresponse observations, the number of observations in the FIES 1997 data set (4,130) was multiplied by the specified nonresponse rate. The nonresponse rates used in this study were 10%, 20% and 30%, which give 413, 826 and 1,239 nonresponse observations, respectively. Different nonresponse rates were used because the study aims to investigate the effect of varying nonresponse rates on each imputation method.

5. The observations set as nonresponse were identified and deleted. The deleted observations were flagged to distinguish the imputed values from the actual values.

6. To ensure that the data satisfy the MCAR assumption and to avoid selecting an odd sample of deleted cases (Kalton, 1983), the simulation was replicated 1,000 times.

The simulation was implemented in the Decimal Basic program SIMULATION.BAS (see Appendix for the source code). Its output files, Simulated Values for Income (SIMI) and Simulated Values for Expenditure (SIME), are matrices containing the nonresponse observations for income and expenditure; they were stored for use in applying the imputation methods.
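The simulation steps can be illustrated with a short Python sketch. It mirrors steps 1 to 6 under assumptions the source does not state: uniform random numbers from Python's `random` module stand in for the random-number matrix, the observations with the smallest random numbers after sorting are the ones set as nonresponse, and deleted values are represented by `None`. The function names are hypothetical; the study's actual implementation is SIMULATION.BAS.

```python
import random


def simulate_mcar_deletions(n_obs, rate, replications, seed=None):
    """Return one sorted list of nonresponse indices per replication."""
    rng = random.Random(seed)
    # Step 4: number of nonresponse observations = data set size x rate
    n_missing = round(n_obs * rate)
    deletions = []
    for _ in range(replications):
        # Steps 1-2: attach a uniform random number to every observation
        keyed = [(rng.random(), i) for i in range(n_obs)]
        # Step 3: sort observations in ascending order of their random number
        keyed.sort()
        # Step 5 (assumed rule): the first n_missing observations become nonresponse
        deletions.append(sorted(i for _, i in keyed[:n_missing]))
    return deletions


def apply_deletions(values, missing_idx):
    """Delete flagged second-visit values (set to None) and return the flags."""
    missing = set(missing_idx)
    incomplete = [None if i in missing else y for i, y in enumerate(values)]
    flags = [i in missing for i in range(len(values))]
    return incomplete, flags


# Step 6 used 1,000 replications; 10 are used here to keep the demo fast.
sims = {
    rate: simulate_mcar_deletions(4130, rate, replications=10, seed=1997)
    for rate in (0.10, 0.20, 0.30)
}
incomplete, flags = apply_deletions(list(range(4130)), sims[0.10][0])
```

Because every observation receives an independent uniform random number, the deleted set in each replication is a simple random sample of the observations, which is what makes the simulated missingness MCAR.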

4.3 Formation of Imputation Classes

Imputation classes are stratification classes that divide the data into more homogeneous groups. Assuming that units with the same characteristics tend to give the same response, forming imputation classes helps reduce the bias of the estimates.


The steps undertaken in the formation of the imputation classes

are as follows:

1. The researchers identified the potential matching variables, that is, the candidate variables that could be related to the variables of interest (i.e., TOTIN and TOTEX).

2. A candidate had to meet three criteria to be selected as a matching variable. First, the variable must be known. Second, it must be easy to measure. Lastly, the probability of missing observations for the variable must be small. A candidate variable that met all three criteria could be used as a matching variable.

3. For variables with many categories, the researchers reduced the number of categories. The rationale for this procedure is that too many categories can increase heterogeneity and the bias of the estimates. This was done using the Recode function of the software Statistica.

4. The matching variables were then tested for association with the variables of interest. The chi-squared test was applied first, to determine whether each matching variable is significantly associated with the variables of interest.

5. Further measures of association between the matching variables and the variables of interest followed. Their purpose was to find the best matching variable for dividing the data into imputation classes. Three measures of association were used, namely the phi coefficient, Cramér's V and the contingency coefficient. The matching variable with the greatest degree of association was chosen as the variable used to form the imputation classes.

All these tests were performed using the statistical packages Statistica and SPSS. Their results are presented in the next chapter.
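As an illustration of the association measures in steps 4 and 5, the Python sketch below builds a contingency table and computes Pearson's chi-squared statistic, the phi coefficient, Cramér's V and the contingency coefficient C from their standard formulas. It assumes the variable of interest has already been grouped into categories (the source does not say how TOTIN and TOTEX were categorized), and the toy data are invented for illustration, not taken from the FIES. A p-value for the chi-squared test would additionally need the chi-squared distribution with the returned degrees of freedom (for example via `scipy.stats.chi2_contingency`), which this standard-library sketch does not compute.

```python
import math
from collections import Counter


def crosstab(xs, ys):
    """Contingency table (list of rows) of two categorical sequences."""
    rows, cols = sorted(set(xs)), sorted(set(ys))
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in cols] for r in rows]


def association_measures(table):
    """Pearson chi-squared, phi, Cramer's V and contingency coefficient C."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller table dimension
    return {
        "chi2": chi2,
        "df": (len(table) - 1) * (len(table[0]) - 1),
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * (k - 1))),
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
    }


# Toy data: sex of the head of the family vs. a hypothetical income bracket
sex = ["M", "M", "F", "F", "M", "F", "M", "F"]
bracket = ["low", "high", "low", "low", "high", "low", "high", "high"]
table = crosstab(sex, bracket)
measures = association_measures(table)
```

For a 2 × 2 table, k - 1 = 1, so Cramér's V equals phi, as in this example; for larger tables V rescales phi so that it stays between 0 and 1, which makes matching variables with different numbers of categories easier to compare.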

4.4 Performing the Imputation Techniques

4.4.1 Overall Mean Imputation

Overall Mean Imputation (OMI) is an imputation procedure in which the missing observations are replaced with the mean of the available (responding) units for that variable. As discussed in the previous chapter, this method does not require the formation of imputation classes, which makes it the simplest procedure among the four methods in this study.

The steps in applying the Overall Mean Imputation (OMI) are as follows:

1. The overall mean of the first-visit values of the variables of interest, TOTIN and TOTEX, was computed.

The formula used to compute the overall mean is:

$$\bar{y}_{OMI} = \frac{1}{r}\sum_{i=1}^{r} y_{ri}$$

Where:

$\bar{y}_{OMI}$ is the overall mean of the first-visit TOTIN or TOTEX

$y_{ri}$ is the $i$-th responding first-visit observation of TOTIN or TOTEX

$r$ is the total number of responding units for the first-visit variable TOTIN or TOTEX

2. Using the output of the simulation, the missing observations for the second-visit variables TOTIN and TOTEX were replaced with the overall means of the first-visit TOTIN and TOTEX, respectively.

Overall Mean Imputation (OMI) was implemented in the Decimal Basic program OMI.BAS (see Appendix for the source code).
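A minimal Python sketch of the two OMI steps follows, assuming that missing second-visit values are stored as `None` and that the first-visit list holds the values of the same households. The function name and the toy values are hypothetical; the study's own implementation is OMI.BAS.

```python
def overall_mean_impute(second_visit, first_visit):
    """Replace missing (None) second-visit values with the first-visit overall mean.

    Returns the completed list and a list of flags marking imputed positions.
    """
    # Step 1: overall mean over the r responding first-visit units
    responding = [y for y in first_visit if y is not None]
    y_omi = sum(responding) / len(responding)
    # Step 2: every missing second-visit value receives the same overall mean
    completed = [y_omi if y is None else y for y in second_visit]
    flags = [y is None for y in second_visit]
    return completed, flags


first_totin = [100.0, 200.0, 300.0, 400.0]   # toy first-visit values
second_totin = [110.0, None, 290.0, None]    # toy second-visit values with gaps
completed, flags = overall_mean_impute(second_totin, first_totin)
```

Because every recipient receives the same value, OMI leaves the mean unchanged when the first-visit and second-visit means agree, but it concentrates the imputed values at a single point.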


4.4.2 Hot Deck Imputation

The Hot Deck (HD) Imputation is an imputation procedure where the missing

observations are replaced by choosing a value from the set of available units.

The steps undertaken in applying the Hot Deck (HD) Imputation are

as follows:

1. The donor and recipient files were sorted before values were allocated to the missing observations.

2. The values substituted for the missing second-visit observations were chosen at random from the donor records, which are the first-visit records within each imputation class.

3. Using the output of the simulation, the missing observations for the second-visit variables TOTIN and TOTEX were replaced with the selected donor values from the first-visit TOTIN and TOTEX.

Hot Deck (HD) Imputation was implemented in the Decimal Basic program HOT DECK.BAS (see Appendix for the source code).
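Hot deck imputation within imputation classes can be sketched in Python as below. Several details are assumptions, since the source does not specify them: donors are drawn uniformly at random, with replacement, from the responding first-visit values in the recipient's imputation class; grouping donors by class label stands in for sorting the donor and recipient files; and missing values are `None`. The function name and toy data are hypothetical; the study's implementation is HOT DECK.BAS.

```python
import random
from collections import defaultdict


def hot_deck_impute(second_visit, first_visit, classes, seed=None):
    """Random hot deck within imputation classes.

    second_visit: second-visit values, None where missing (recipients)
    first_visit:  first-visit values of the same households (donor pool)
    classes:      imputation class label of each household
    """
    rng = random.Random(seed)
    # Group the responding first-visit values by imputation class
    donors = defaultdict(list)
    for y, c in zip(first_visit, classes):
        if y is not None:
            donors[c].append(y)
    completed = []
    for y, c in zip(second_visit, classes):
        if y is None:
            # Random donor from the same class (IndexError if the class has no donors)
            completed.append(rng.choice(donors[c]))
        else:
            completed.append(y)
    return completed


classes = ["A", "A", "B", "B"]
first_totex = [10, 12, 50, 54]
second_totex = [11, None, None, 53]
completed = hot_deck_impute(second_totex, first_totex, classes, seed=1)
```

Unlike OMI, each recipient receives an actually observed value from a similar household, so the imputed values keep some of the spread of the donor distribution within each class.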
