
Why do we need to identify problems?

A clearly specified list of problems is the most suitable basis for
identifying potential solutions. Problems can be identified, both now
and in the future, as evidence that objectives are not being
achieved. However, objectives are often rather abstract, and it may
be easier for members of the public to understand a strategy based
on clearly identified problems. This problem-oriented approach to
strategy formulation is an alternative to starting with objectives, but
does still need to be checked against the full list of objectives.
What types of problem are we concerned with?
One of the easiest ways of specifying problems is by reference to a
set of objectives (Section 7). This enables the question "how do we
know we have got a problem?" to be answered more easily. For
example, the efficiency objective relates to problems of congestion
and unreliability; the safety objective to accidents. The two
concepts, objectives and problems, are two sides of the same coin.
We can start either with objectives or problems and come to the
same conclusions. The box shows the problems which are considered
in the Policy Guidebook.
Problem
Congestion-related delay
Congestion-related unreliability
Community severance
Visual intrusion
Lack of amenity
Global warming
Local air pollution
Noise
Reduction of green space

Damage to environmentally sensitive sites


Poor accessibility for those without a car and those with mobility
impairments
Disproportionate disadvantaging of particular social or geographic
groups
Number, severity and risk of accidents
Suppression of the potential for economic activity in the area
How can we decide if a problem is occurring and how serious
it is?
Problems may be identified in a number of ways:
Consultation
People can identify the problems that they encounter when travelling
and which result from other people travelling. Transport providers
can be consulted about the operational problems which they face.
This is a key element of the participation process (Section 5). People
will naturally have more reliable views about current problems than
those predicted to occur at some future date. Problem identification
through consultation is therefore of most use for current problems.
Objective analysis
Objective analysis of problems requires the adoption of an
appropriate set of indicators and targets (Section 7). When a
condition is measured or predicted to differ from a threshold, then a
problem is said to exist. A range of thresholds can be set, so that
problems may be graded by severity. Thus, for example, noise levels
which exceed, say, 65dB(A), 70dB(A) and 75dB(A) could be classed
as slight, moderate and severe noise problems. When thresholds
are defined, they can be used, with current data, to identify current
problems. Given an appropriate predictive model, a similar exercise
can be conducted for a future year. This is shown in the feedback
loop from Predict Impacts to Assess Problems in Section 6.
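The graded thresholds described above can be sketched in a few lines. This is a minimal illustration, assuming the three example noise thresholds from the text and reading "exceed" strictly; the function name, the "none" class and the site readings are hypothetical.

```python
from bisect import bisect_left

# Thresholds in dB(A) and severity labels taken from the example in the text.
NOISE_THRESHOLDS = [65.0, 70.0, 75.0]
SEVERITY_LABELS = ["none", "slight", "moderate", "severe"]


def classify_noise(level_db: float) -> str:
    """Return the severity class for a measured or predicted noise level.

    "Exceed" is read strictly: a level exactly on a threshold stays in the
    lower class. bisect_left counts the thresholds strictly below the level.
    """
    return SEVERITY_LABELS[bisect_left(NOISE_THRESHOLDS, level_db)]


# Hypothetical readings; the same function works for current or predicted data.
site_levels = {"High Street": 72.4, "Ring Road": 78.1, "Park Lane": 61.0}
problems = {site: classify_noise(db) for site, db in site_levels.items()}
```

Running the same classification on model predictions for a future year gives the list of future problems on the same scale.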
Monitoring
Regular monitoring of conditions, using similar indicators to those for
objective analysis, is another valuable way of identifying problems,
and is covered further in Section 15. As well as enabling problems,
and their severity, to be specified, a regular monitoring programme
enables trends to be observed, and those problems which are
becoming worse to be singled out for treatment.
Why is it useful to determine the severity of problems now
and in the future?
If problems are identified through consultation, the city authority is

Text edited at the Institute for Transport Studies, University of Leeds, Leeds
LS2 9JT

Selection of a sample

The sample design

The survey population

The survey frame

The survey units

The sample size

The sampling method

Sampling allows statisticians to draw conclusions about a whole by examining a
part. It enables us to estimate characteristics of a population by directly observing
a portion of the entire population. Researchers are not interested in the sample
itself, but in what can be learned from the survey and how this information can be
applied to the entire population.
It is essential that a sample survey be correctly defined and organized. If the wrong
questions are posed to the wrong people, statisticians will not receive information
that will be useful when applied to the entire population.

In the context of a national statistical agency like Statistics Canada, the following
steps are needed to select a sample and ensure that this sample will fulfill its goals.

Establish the survey's objectives


The first step in planning a useful and efficient survey is to specify the objectives
with as much detail as possible. Without objectives, the survey is unlikely to
generate usable results. Clarifying the aims of the survey is critical to its ultimate
success. The initial users and uses of the data should be identified at this stage.
The pros and cons of a census versus a sample survey or the use of administrative
records should be evaluated and a decision made as to the most appropriate
method. (At this point, we will assume that a sample survey is the best way to
proceed in order to obtain the information we need. This assumption will hold true
for the remainder of the sample selection steps, even though many of the steps
mentioned will also apply to the other methods.)

Define the target population


The target population is the total population for which the information is required.
For example, if you were to conduct a survey about the most popular types of cars
in Saskatchewan, then the target population would be every car in Saskatchewan.
The units that make up the population must be described in terms of characteristics
that clearly identify them. Specifically, the target population is defined by the
following characteristics:

Nature of data required: about persons, hospitals, schools, etc.

Geographic location: the geographic boundaries of the population have to be
determined, as well as the level of geographic detail required for the survey
estimate (by province, by city, etc.).

Reference period: the time period covered by the survey.

Other characteristics, such as socio-demographic characteristics (interest in a
particular age group, for example) or type of industry.

Decide on the data to be collected


The data requirements of the survey must be established. To ensure that the
requirements are operationally sound, the necessary data terms and definitions also
need to be determined.

Set the level of precision

As mentioned in the section on Sampling error, there is a level of uncertainty
associated with estimates coming from a sample. For example, if you are trying to
estimate the average distance between home and school for students in your class
of 25 from a sample of 5 persons, your estimate will depend on who the 5 sampled
students are. If the 5 sampled students all live close to the school, the results will
not be able to represent the class accurately. This sample-to-sample variation is
what causes the sampling error. Statisticians can estimate the sampling error
associated with a particular sampling plan, and try to minimize it.
When designing a survey, the acceptable level of uncertainty in the survey
estimates has to be established. This level depends on what the end use of the
results will be and on the size of the overall budget. The bigger the budget, the
more resources available, and thus, less chance for error. And if the end result is to
serve a specific purpose, then the acceptable level of uncertainty would be smaller
than an end result that is simply looking for general trends.
The level of uncertainty will also be determined by the sample size. Increasing the
sample size will decrease the sampling error. (If you sample 24 out of 25 students
in your class, there will not be as much sample-to-sample variation as there would
be if you only sampled 5 of the 25 students.)
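The class-of-25 example can be checked by brute force. The sketch below enumerates every possible sample of a given size and reports how far the sample mean can range; the distances are invented for illustration and are not survey data.

```python
from itertools import combinations

# Hypothetical home-to-school distances (km) for a class of 25 students.
distances = [0.5, 0.8, 1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 2.8,
             3.0, 3.3, 3.5, 4.0, 4.2, 4.5, 5.0, 5.5, 6.0, 6.5,
             7.0, 8.0, 9.0, 10.0, 12.0]


def sample_mean_range(values, n):
    """Return (lowest, highest) sample mean over every possible sample of size n."""
    means = [sum(sample) / n for sample in combinations(values, n)]
    return min(means), max(means)


class_mean = sum(distances) / len(distances)
low_5, high_5 = sample_mean_range(distances, 5)     # 53,130 possible samples
low_24, high_24 = sample_mean_range(distances, 24)  # only 25 possible samples
```

With 5 students the estimate can land far from the class mean if the sample happens to contain only the nearest or only the farthest students; with 24 students every possible sample gives nearly the same answer.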

The sample design


Once the objectives, guidelines and definitions have been worked out, the
statistician can work on the survey plan. The survey plan is divided into three parts:

Sample design: how the sample will be collected.

Estimation techniques: how the results from the sample will be extended to
the whole population.

Measures of precision: how the sampling error will be measured.

The estimation techniques and measures of precision are discussed in a later
section. For the moment, we will look at the sample design. The following steps
lead to the complete determination of the sample design:
1. Determine what the survey population will be (e.g., students, men aged 20
   to 35, newborn babies, etc.).
2. Choose the most appropriate survey frame.
3. Define the survey units.
4. Establish the sample size (e.g., a sample of 100 from a population of 1,000).
5. Select a sampling method.



The survey population


The target population must be defined early in the survey-designing process. This is
the population for which information is required. However, some members of the
population have to be excluded because of operational constraints: the high cost of
collecting data in some remote areas, the difficulty of identifying and contacting
certain components of the target population, etc. For example, because it would be
too difficult to locate and survey each car owned by every resident in
Saskatchewan, a survey population of just the cars in the major cities and towns
might be used instead. When some of the members of the target population are
excluded, we call the included population the survey population or, as it is
sometimes called, the observed population. The target population is the population
we want to observe while the survey population is the population we can
observe.
The goal of this process is to have the survey population as close as possible to the
target population. It is also very important that the users of the data be informed of
the differences between the two populations, as the results of the survey will apply
only to the survey population.
For example, a target population for a survey could be all Canadians aged 15 years
and over (on a particular reference date), while the survey population could exclude
residents of the Yukon, Nunavut and Northwest Territories, persons living on
Aboriginal reserves, full-time members of the Canadian Armed Forces and residents
of institutions. These Canadians might be excluded for various reasons: to survey
people in the territories might prove to be difficult and expensive, military
personnel may not be available for surveying if they are out on a mission, etc.
Using this example, about 2% of the target population would be excluded from the
survey population.

The survey frame


The survey frame, also called the sampling frame, is the tool used to gain access to
the population. There are two types of frames: list frames and area frames. A list
frame is just a list of names and addresses that provide direct access to 'individuals'
(e.g., a list of hospitals, a list of restaurants, a list of students at a university). Area
frames are a list of geographic areas that provide indirect access to individuals
(e.g., the neighbourhoods in a city). This type of access is called indirect because
first, a list of geographic areas must be selected and then, access to individuals
within each selected area must be worked out.

For instance, suppose that you were surveying a rural town in Quebec to see what
percentage of residents are farmers. If you were provided with an area frame, then
you would be able to locate which roads to visit, but you would still have to find out
the names and addresses of the residents on each road.
When there is no single frame that is appropriate, multiple frames can be used.
Some sampling techniques using both types of frames will be discussed later.
A good frame should be complete and up-to-date; no member of the survey
population should be excluded from the frame or duplicated on the frame
(represented more than once); and no unit that is not part of the population (e.g.,
deceased persons) should be on the frame. The frame chosen will impact the
selected survey population. For instance, if a list of telephone numbers is used to
select a sample of households, then all households without telephones are excluded
from the survey population.
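The three frame defects named above (missing units, duplicates, out-of-scope units) can be expressed as a simple audit. This is only a sketch: it assumes a trusted reference list of the survey population is available for comparison, which in practice is rarely the case, and all names and identifiers are hypothetical.

```python
from collections import Counter


def audit_frame(frame_ids, population_ids):
    """Compare a frame against a reference list of the survey population.

    Returns the units missing from the frame, the units listed more than
    once, and the units on the frame that are not part of the population.
    """
    population = set(population_ids)
    counts = Counter(frame_ids)
    on_frame = set(counts)
    return {
        "missing": sorted(population - on_frame),
        "duplicated": sorted(unit for unit, c in counts.items() if c > 1),
        "out_of_scope": sorted(on_frame - population),
    }


# Hypothetical household identifiers.
report = audit_frame(
    frame_ids=["H1", "H2", "H2", "H4", "H9"],
    population_ids=["H1", "H2", "H3", "H4"],
)
```

Here `H3` would be excluded from the survey population by the frame, `H2` could be selected twice, and `H9` does not belong on the frame at all.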

The survey units


There are three types of units that have to be accurately identified in order to avoid
problems during the selection, data collection and data analysis stages. They are as
follows:

The sampling unit is part of the frame and therefore subject to being
selected.

The respondent unit or reporting unit provides the information needed by the
survey.

The unit of reference or unit of analysis (the unit about which information is
provided) is used to analyse the survey results.

For example, in a survey about newborns in Edmonton, the sampling unit might be
a household, the reporting unit one of the parents or a legal guardian, and the unit
of reference the baby.
The sampling units may differ depending on the frame used. This is why the survey
population, survey frame and survey units are defined in conjunction with one
another.

The sample size

The level of precision needed for the survey estimates will impact the sample size.
However, it is not as easy to determine the sample size as one may think.
Generally, the actual sample size of a survey is a compromise between the level of
precision to be achieved, the survey budget and any other operational constraints,
such as time. In order to achieve a certain level of precision, the
sample size will depend, among other things, on the following factors:

The variability of the characteristics being observed: If every person in a


population had the same salary, then a sample of one person would be all
you would need to estimate the average salary of the population. If the
salaries are very different, then you would need a bigger sample in order to
produce a reliable estimate.

The population size: To a certain extent, the bigger the population, the bigger
the sample needed. But once you reach a certain level, an increase in
population no longer affects the sample size. For instance, the necessary
sample size to achieve a certain level of precision will be about the same for
a population of one million as for a population twice that size.

The sampling and estimation methods: Not all sampling and estimation
methods have the same level of efficiency. You will need a bigger sample if
your method is not the most efficient. But because of operational constraints
and the unavailability of an adequate frame, you cannot always use the most
efficient technique.
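The first two factors can be made concrete with a standard textbook calculation. The text itself gives no formula, so the sketch below is an assumption: it uses the usual sample size for estimating a mean under simple random sampling, with Cochran's finite population correction, and hypothetical salary figures.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(std_dev, margin, population_size, confidence=0.95):
    """Sample size needed to estimate a mean to within +/- margin.

    Assumes simple random sampling; std_dev is a prior guess at the
    variability of the characteristic in the population.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = (z * std_dev / margin) ** 2             # size for a very large population
    n = n0 / (1 + (n0 - 1) / population_size)    # finite population correction
    return max(1, math.ceil(n))


# Hypothetical salaries: standard deviation 20,000, wanted margin 1,000.
n_town = sample_size_for_mean(20_000, 1_000, population_size=1_000)
n_one_million = sample_size_for_mean(20_000, 1_000, population_size=1_000_000)
n_two_million = sample_size_for_mean(20_000, 1_000, population_size=2_000_000)
n_identical = sample_size_for_mean(0, 1_000, population_size=1_000_000)
```

Under these assumptions the required sample barely changes between one and two million people, a small town needs noticeably fewer respondents, and a population with no variability needs a sample of one.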

The sampling method


There are two types of sampling methods: probability sampling and non-probability
sampling. The difference between them is that in probability sampling, every unit
has a 'chance' of being selected, and that chance can be quantified. This is not true
for non-probability sampling: some units may have no chance of being selected, and
the chances of the others cannot be quantified. The next section will describe
features of both types of sampling and detail some of the methods related to each type.

POPULATIONS AND SAMPLING

Populations
Definition - a complete set of elements (persons or objects) that possess some
common characteristic defined by the sampling criteria established by the
researcher

Composed of two groups - target population & accessible population

Target population (universe)

The entire group of people or objects to which the researcher wishes to
generalize the study findings

Meet set of criteria of interest to researcher

Examples

All institutionalized elderly with Alzheimer's

All people with AIDS

All low birth weight infants

All school-age children with asthma

All pregnant teens

Accessible population

The portion of the population to which the researcher has reasonable access;
may be a subset of the target population

May be limited to region, state, city, county, or institution

Examples

All institutionalized elderly with Alzheimer's in St. Louis county nursing homes

All people with AIDS in the metropolitan St. Louis area

All low birth weight infants admitted to the neonatal ICUs in St. Louis city & county

All school-age children with asthma treated in pediatric asthma clinics in
university-affiliated medical centers in the Midwest

All pregnant teens in the state of Missouri

Samples
Terminology used to describe samples and sampling methods

Sample = the selected elements (people or objects) chosen for participation in a
study; people are referred to as subjects or participants

Sampling = the process of selecting a group of people, events, behaviors, or other
elements with which to conduct a study

Sampling frame = a list of all the elements in the population from which
the sample is drawn

Could be extremely large if population is national or international in nature

Frame is needed so that everyone in the population is identified so they will
have an equal opportunity for selection as a subject (element)

Examples

A list of all institutionalized elderly with Alzheimer's in St. Louis county
nursing homes affiliated with BJC

A list of all people with AIDS in the metropolitan St. Louis area who are members
of the St. Louis Effort for AIDS

A list of all low birth weight infants admitted to the neonatal ICUs in St. Louis
city & county in 1998

A list of all school-age children with asthma treated in pediatric asthma clinics
in university-affiliated medical centers in the Midwest

A list of all pregnant teens in the Henderson school district

Randomization = each individual in the population has an equal opportunity to be
selected for the sample

Representativeness = sample must be as much like the population in as many ways as
possible

Sample reflects the characteristics of the population, so those sample findings
can be generalized to the population

Most effective way to achieve representativeness is through randomization;
random selection or random assignment

Parameter = a numerical value or measure of a characteristic of the population;
remember P for parameter & population

Statistic = numerical value or measure of a characteristic of the sample; remember
S for sample & statistic

Precision = the accuracy with which the population parameters have been estimated;
remember that population parameters often are based on the sample statistics

Types of Sampling Methods - probability & non-probability

Probability Sampling Methods

Also called random sampling

Every element (member) of the population has a probability (greater than 0) of
being selected for the sample

Everyone in the population has equal opportunity for selection as a subject

Increases sample's representativeness of the population

Decreases sampling error and sampling bias

Types of probability sampling - see table in course materials for details

Simple random

Elements selected at random

Assign each element a number

Select elements for study by:

1. Using a table of random numbers in book

A table displaying hundreds of digits from 0 to 9 set up in such a way that
each number is equally likely to follow any other

See text for random sampling details & table of random numbers
2. Computer-generated random numbers table
3. Draw numbers from a box (hat)
4. Bingo numbers
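The "assign each element a number, then select at random" procedure corresponds to the computer-generated option in the list above. A minimal sketch using Python's standard `random` module follows; the frame of numbered subjects and the function name are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each equally likely to be chosen.

    Passing a seed makes the draw reproducible, which helps when the
    selection has to be documented or audited.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: 500 numbered subjects.
frame = [f"subject-{i:03d}" for i in range(1, 501)]
chosen = simple_random_sample(frame, 25, seed=2024)
```

`random.sample` draws without replacement, so no subject can appear twice in the sample.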
Stratified random

Population is divided into subgroups, called strata, according to some variable
or variables of importance to the study

Variables often used include: age, gender, ethnic origin, SES,


diagnosis, geographic region, institution, or type of care

Two approaches to stratification - proportional &


disproportional

Proportional

Subgroup sample sizes equal the proportions of the subgroup in the population

Example: A high school population has

15% seniors

25% juniors

25% sophomores

35% freshmen

With proportional sample the sample has the same proportions as the population

Disproportional

Subgroup sample sizes are not equal to the proportion of the subgroup in the
population

Example

Class        Population   Sample
Seniors      15%          25%
Juniors      25%          25%
Sophomores   25%          25%
Freshmen     35%          25%

With disproportional sample the sample does not have the same proportions
as the population
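The two allocation approaches can be compared numerically using the high school percentages above. The school size of 1,000, the sample size of 200 and the weighting step are assumptions added for illustration; weighting is one common way to correct a disproportional sample when estimating for the whole population.

```python
# Percentages from the high school example in the text.
population_pct = {"Seniors": 15, "Juniors": 25, "Sophomores": 25, "Freshmen": 35}
disproportional_pct = {"Seniors": 25, "Juniors": 25, "Sophomores": 25, "Freshmen": 25}


def allocate(pct_by_stratum, n):
    """Split a total sample of n across strata according to the given percentages.

    Rounding can make the parts sum to slightly more or less than n when
    the percentages do not divide evenly.
    """
    return {stratum: round(n * pct / 100) for stratum, pct in pct_by_stratum.items()}


def design_weights(population_counts, sample_counts):
    """Number of population members each sampled member stands for, per stratum."""
    return {s: population_counts[s] / sample_counts[s] for s in sample_counts}


# Hypothetical school of 1,000 students and a total sample of 200.
population_counts = allocate(population_pct, 1000)
proportional = allocate(population_pct, 200)
disproportional = allocate(disproportional_pct, 200)
weights = design_weights(population_counts, disproportional)
```

In the disproportional design each sampled senior stands for fewer students than each sampled freshman, which is why unweighted results from such a sample would over-represent seniors.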

Cluster random sampling

A random sampling process that involves stages of sampling

The population is first listed by clusters or categories

Procedure

Randomly select 1 or more clusters and take all of their elements (single stage
cluster sampling); e.g. Midwest region of the US

Or, in a second stage randomly select clusters from the first stage of clusters;
e.g. 3 states within the Midwest region

In a third stage, randomly select elements from the second stage of clusters;
e.g. 30 county health dept. nursing administrators from each state
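The three-stage procedure can be sketched as nested random draws. The regions, states and administrator identifiers below are invented placeholders, and the nested-dictionary layout is just one convenient way to hold a clustered frame.

```python
import random


def multistage_sample(regions, n_regions, n_states, n_elements, seed=None):
    """Three-stage cluster sample: regions, then states, then elements.

    `regions` maps region name -> {state name -> list of elements}.
    """
    rng = random.Random(seed)
    chosen = []
    for region in rng.sample(sorted(regions), n_regions):        # stage 1
        states = regions[region]
        for state in rng.sample(sorted(states), n_states):       # stage 2
            for element in rng.sample(states[state], n_elements):  # stage 3
                chosen.append((region, state, element))
    return chosen


# Hypothetical frame: 2 regions, 5 states each, 40 administrators per state.
frame = {
    region: {
        f"{region}-state-{s}": [f"{region}-state-{s}-admin-{i}" for i in range(40)]
        for s in range(1, 6)
    }
    for region in ("Midwest", "South")
}
sample = multistage_sample(frame, n_regions=1, n_states=3, n_elements=30, seed=7)
```

Only the selected clusters need a full list of their elements, which is the practical advantage of cluster sampling when no complete frame of individuals exists.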

Systematic

A random sampling process in which every kth element or member of the population
(e.g. every 5th) is selected for the sample after a random start is determined

Example

Population (N) = 2000, sample size (n) = 50, k = N/n, so k = 2000 / 50 = 40

Use a table of random numbers to determine the starting point for selecting
every 40th subject

With list of the 2000 subjects in the sampling frame, go to the starting point,
and select every 40th name on the list until the sample size is reached. Probably
will have to return to the beginning of the list to complete the selection of the
sample.
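The worked example (N = 2000, n = 50, k = 40) can be sketched as follows. The code assumes the random start may fall anywhere on the list and wraps around to the beginning when it runs off the end, as the note above describes; a pseudo-random generator stands in for the table of random numbers, and the subject names are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every kth element after a random start, wrapping around the list."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n                      # sampling interval, e.g. 2000 // 50 = 40
    start = random.Random(seed).randrange(N)
    return [frame[(start + i * k) % N] for i in range(n)]


# Hypothetical frame of 2000 numbered subjects.
frame = [f"subject-{i:04d}" for i in range(2000)]
sample = systematic_sample(frame, 50, seed=11)
```

Because n times k never exceeds N, the wrapped positions cannot collide, so the sample always contains n distinct subjects.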

Non-probability sampling methods

Characteristics

Not every element of the population has the opportunity for selection in
the sample

No sampling frame

Population parameters may be unknown

Non-random selection

More likely to produce a biased sample

Restricts generalization

Historically, used in most nursing studies

Types of non-probability sampling methods

Convenience - aka chunk, accidental & incidental sampling

Selection of the most readily available people or objects for a study

No way to determine representativeness

Saves time and money

Quota

Selection of sample to reflect certain characteristics of the population

Similar to stratified but does not involve random selection

Quotas for subgroups (proportions) are established

E.g. 50 males & 50 females; recruit the first 50 men and first 50
women that meet inclusion criteria

Purposive - aka judgmental or expert's choice sampling

Researcher uses personal judgement to select subjects that are considered to be
representative of the population

Handpicked subjects

Typical subjects experiencing problem being studied

Snowball

Also known as network sampling

Subjects refer the researcher to others who might be recruited as subjects

Time Frame for Studying the Sample

See design notes on longitudinal & cross-sectional studies

Longitudinal

Cross-sectional

Sample Size

General rule - as large as possible to increase the representativeness of the sample

Increased size decreases sampling error

Relatively small samples in qualitative, exploratory, case studies, experimental
and quasi-experimental studies

Descriptive studies need large samples; e.g. 10 subjects for each item on the
questionnaire or interview guide

As the number of variables studied increases, the sample size also needs to
increase in order to detect significant relationships or differences

A minimum of 30 subjects is needed for use of the central limit theorem (statistics
based on the mean)

Large samples are needed if:

There are many uncontrolled variables

Small differences are expected in the sample/population on variables of interest

The sample is divided into subgroups

Dropout rate (mortality) is expected to be high

Statistical tests used require minimum sample or subgroup size

Power Analysis

Power analysis = a procedure for estimating either the likelihood of committing a Type II
error or the sample size requirements

Background Information for Understanding Power Analysis:


Type I and Type II errors

Type I error

Based on the statistical analysis of data, the researcher wrongly rejects a true
null hypothesis; and therefore, accepts a false alternative hypothesis

Probability of committing a type I error is controlled by the researcher with the
level of significance, alpha.

Alpha is the probability that a Type I error will occur

Alpha is established by researcher; usually = .05 or .01

Alpha = .05 means there is a 5% chance of rejecting a true null hypothesis; OR
out of 100 samples, a true null hypothesis would be expected to be rejected 5
times and accepted 95 times.

Alpha = .01 means there is a 1% chance of rejecting a true null hypothesis; OR
out of 100 samples, a true null hypothesis would be expected to be rejected 1
time and accepted 99 times.

Type II error

Based on the statistical analysis of data, the researcher wrongly accepts a false
null hypothesis; and therefore, rejects a true alternate hypothesis

Probability of committing a Type II error is reduced by a power analysis

Probability of a Type II error is called beta

Power, or 1 - beta, is the probability of rejecting a false null hypothesis and
obtaining a statistically significant result

Type I & Type II Errors

Based on statistical analysis,     | In the real world, the actual  | In the real world, the actual
the researcher concludes that:     | situation is that the null     | situation is that the null
                                   | hypothesis is: True            | hypothesis is: False
-----------------------------------+--------------------------------+--------------------------------
Null true: null hypothesis is      | Correct decision: the actual   | Type II error: the actual
accepted                           | true null is accepted          | false null is accepted
-----------------------------------+--------------------------------+--------------------------------
Null false: null hypothesis is     | Type I error: the actual true  | Correct decision: the actual
rejected & alternate is accepted   | null hypothesis is rejected    | false null is rejected &
                                   |                                | alternate is accepted

Background Information for Understanding Power Analysis:


Population Effect Size - Gamma

Gamma measures how wrong the null hypothesis is; it measures how strong the
effect of the IV is on the DV; and it is used in performing a power analysis

Gamma is calculated based on population data from prior research studies, or
determined several different ways depending on the nature of the data and the
statistical tests to be performed

The textbook discusses 4 ways to estimate gamma (population effect size) based
upon:

Testing the difference between 2 means (t-test)

Testing the difference between 3 or more means (ANOVA)

Testing bivariate correlation (relationship) between 2 variables (Pearson's r)

Testing the difference in proportions between 2 groups (chi-square)

If there is no relevant research on topic to estimate the population effect size
(gamma), then use guidelines for gamma or its equivalent

Testing the difference between 2 means (t-test) - gamma for small effects = .20;
medium effects = .50; large effects = .80

Testing the difference between 3 or more means (ANOVA) - eta squared (η²) for
small effects = .01; medium effects = .06; large effects = .14

Testing bivariate correlation (relationship) between 2 variables (Pearson's r) -
gamma for small effects = .10; medium effects = .30; large effects = .50

Testing the difference in proportions between 2 groups (chi-square) - no
conventions for unknown populations

Determining Sample Size through Power Analysis

Need to have the following data:

Level of significance criterion = alpha; use .05 for most nursing studies and your
calculations

Power = 1 - beta; if beta is not known, standard power is .80, so use this when
you are determining sample size

Population effect size = gamma or its equivalent, e.g. eta squared (η²); use
recommended values for small, medium, or large effect for the statistical test you
plan to use to answer research questions or test hypothesis

Use tables on pages 455-459 of Polit & Hungler or other reference

Mathematical formulas and computer programs can also be used for calculation of sample
size
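As one example of such a formula, the sketch below estimates the sample size per group for a two-tailed comparison of 2 means from alpha, power and the effect size. It uses the normal approximation rather than the tables cited above, so its results can differ by a subject or two from table values; treating gamma for two means as the standardized mean difference is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed in each of 2 groups (two-tailed test).

    Normal approximation: n = 2 * ((z_alpha + z_beta) / effect_size) ** 2,
    where effect_size is the standardized difference between the means.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = standard_normal.inv_cdf(power)           # value matching power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Conventional small, medium and large effects for a difference between 2 means.
needed = {label: n_per_group_two_means(gamma)
          for label, gamma in [("small", 0.20), ("medium", 0.50), ("large", 0.80)]}
```

The pattern matches the notes on sample size: the smaller the expected difference, the larger the sample needed to detect it at the same alpha and power.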

Sampling Error and Sampling Bias

Sampling error = The difference between the sample statistic (e.g. sample mean)
and the population parameter (e.g. population mean) that is due to the random
fluctuations in data that occur when the sample is selected

Sampling bias

Also called systematic bias or systematic variance

The difference between sample data and population data that can be
attributed to faulty sampling of the population

Consequence of selecting subjects whose characteristics (scores) are different in
some way from the population they are supposed to represent

This usually occurs when randomization is not used

Randomization Procedures in Research

Randomization = each individual in the population has an equal opportunity to be
selected for the sample

Random selection = from all people who meet the inclusion criteria, a sample is
randomly chosen

Random assignment

The assignment of subjects to treatment conditions in a random manner.

It has no bearing on how the subjects participating in an experiment are
initially selected.

See Polit & Hungler, pg. 160-162 for random assignment to groups and
group random assignment to tx. using a random numbers table
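Random assignment can be sketched independently of how the subjects were selected. The example below shuffles an already-recruited list and deals subjects out to the treatment conditions in turn; a pseudo-random generator replaces the random numbers table, and the group names and subject identifiers are hypothetical.

```python
import random


def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in rotation.

    Group sizes differ by at most one subject.
    """
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for position, subject in enumerate(shuffled):
        assignment[groups[position % len(groups)]].append(subject)
    return assignment


# Hypothetical convenience sample of 21 recruited subjects.
subjects = [f"S{i:02d}" for i in range(1, 22)]
assignment = randomly_assign(subjects, seed=5)
```

The subjects here could come from a convenience sample: random assignment controls which condition each subject receives, not who entered the study.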


Sampling (statistics)
From Wikipedia, the free encyclopedia
In statistics, quality assurance, and survey methodology, sampling is concerned with the selection of a subset
of individuals from within a statistical population to estimate characteristics of the whole population. Acceptance
sampling is used to determine if a production lot of material meets the governing specifications. Two
advantages of sampling are that the cost is lower and data collection is faster than measuring the entire
population.
Each observation measures one or more properties (such as weight, location, color) of observable bodies
distinguished as independent objects or individuals. In survey sampling, weights can be applied to the data to
adjust for the sample design, particularly stratified sampling (blocking). Results from probability
theory and statistical theory are employed to guide practice. In business and medical research, sampling is
widely used for gathering information about a population.
The sampling process comprises several stages:

Defining the population of concern

Specifying a sampling frame, a set of items or events possible to measure

Specifying a sampling method for selecting items or events from the frame

Determining the sample size

Implementing the sampling plan

Sampling and data collecting

Development of Research
Tools and Modeling
Thomas G. Hinton
Savannah River Ecology Laboratory
P O Drawer E, Aiken, SC 29802
(803) 725-7454 office
(803) 725-3309 fax
thinton(at)uga.edu
The most powerful research tool developed by
Dr. Hinton is the Low Dose Rate Irradiation
Facility. The unique facility is ideally suited to the
study of chronic exposures to environmentally
relevant concentrations of pollutants.
The facility was originally designed to use turtles
as a model organism for studying how effects
are manifest along increasing levels of biological
complexity (i.e. from molecules to individuals
and populations) but now we are conducting the
majority of our research with Medaka fish.
Several technique papers emerged from this line
of research, including the development of a
molecular probe (Fig. 6) that facilitates the
quantification of a specific type of chromosome
aberration: reciprocal translocations. The
method, developed in collaboration with Dr. Joel
Bedford and Brant Ulsh of Colorado State
University, was the first application of
Fluorescence In Situ Hybridization (FISH) in an
ecological setting and holds great promise as a
biomarker that can couple molecular effects to
population-level impacts.

Dr. Hinton has also been involved in two
international modeling exercises that used
globally dispersed contamination as a marker for
ecological processes. The first exercise,
sponsored by the International Atomic Energy
Agency and the European Community, tested
the validity of model predictions using data
derived from the Chernobyl accident. The
models used atmospheric concentrations of
radionuclides as their starting point and then
predicted the concentrations in various human
food items, culminating in a prediction of human
dose. The blind tests were compared to data
from the Chernobyl accident. The second
exercise, BIOMOVS, was funded by a
consortium of European governments. Dr.
Hinton was among a team of researchers
examining the sources of variation among model
predictions, particularly, determining the
influence of modeler experience and
interpretation on model results.
