
Experimental design: The basic principles of experimental designs are

randomization, replication and local control. These principles make a valid test of significance
possible. Each of them is described briefly in the following subsections.
(1) Randomization. The first principle of an experimental design is randomization, which is a
random process of assigning treatments to the experimental units. The random process implies that
every possible allotment of treatments has the same probability. An experimental unit is the smallest
division of the experimental material and a treatment means an experimental condition whose effect
is to be measured and compared. The purpose of randomization is to remove bias and other
sources of extraneous variation, which are not controllable. Another advantage of randomization
(accompanied by replication) is that it forms the basis of any valid statistical test. Hence the
treatments must be assigned at random to the experimental units. Randomization is usually done by
drawing numbered cards from a well-shuffled pack of cards, or by drawing numbered balls from a
well-shaken container or by using tables of random numbers.
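As a sketch of this principle, random assignment of treatments to experimental units can be simulated in a few lines of Python; the plot names, the three treatments, and the fixed seed below are illustrative assumptions, not part of the original text.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign an equal number of units to each treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = units[:]            # copy, so the caller's list is untouched
    rng.shuffle(shuffled)          # the "well-shuffled pack of cards"
    k = len(shuffled) // len(treatments)
    return {t: shuffled[i * k:(i + 1) * k] for i, t in enumerate(treatments)}

plots = [f"plot{i}" for i in range(1, 13)]
assignment = randomize(plots, ["A", "B", "C"], seed=42)
```

Because every permutation of the shuffled list is equally likely, every possible allotment of treatments has the same probability, which is the property the text requires.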
(2) Replication. The second principle of an experimental design is replication, which is a repetition
of the basic experiment. In other words, it is a complete run for all the treatments to be tested in the
experiment. In all experiments, some variation is introduced because of the fact that the
experimental units such as individuals or plots of land in agricultural experiments cannot be
physically identical. This type of variation can be removed by using a number of experimental units.
We therefore perform the experiment more than once, i.e., we repeat the basic experiment. An
individual repetition is called a replicate. The number, the shape and the size of replicates depend
upon the nature of the experimental material. A replication is used
(i) to secure more accurate estimate of the experimental error, a term which represents the
differences that would be observed if the same treatments were applied several times to the same
experimental units;
(ii) to decrease the experimental error and thereby to increase precision, which is a measure of the
variability of the experimental error; and
(iii) to obtain a more precise estimate of the mean effect of a treatment, since the variance of a
treatment mean is s²/r (that is, its standard error is s/√r), where r denotes the number of replications.
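The precision gain from replication described in (iii) can be illustrated numerically; this is a sketch assuming a known per-unit standard deviation s, with invented values, not figures from the text.

```python
def se_of_mean(s, r):
    """Standard error of a treatment mean with r replications: s / sqrt(r)."""
    if r < 1:
        raise ValueError("need at least one replication")
    return s / r ** 0.5

# Quadrupling the number of replications halves the standard error of the mean.
se_4 = se_of_mean(8.0, 4)    # s = 8, r = 4
se_16 = se_of_mean(8.0, 16)  # s = 8, r = 16
```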
(3) Local Control. It has been observed that randomization and replication do not remove all
extraneous sources of variation. This necessitates a refinement in the experimental technique. In other
words, we need to choose a design in such a manner that all extraneous sources of variation are
brought under control. For this purpose, we make use of local control, a term referring to the amount
of balancing, blocking and grouping of the experimental units. Balancing means that the treatments
should be assigned to the experimental units in such a way that the result is a balanced
arrangement of the treatments. Blocking means that like experimental units should be collected
together to form a relatively homogeneous group. A block is also a replicate. The main purpose of
the principle of local control is to increase the efficiency of an experimental design by decreasing the
experimental error. The point to remember here is that the term local control should not be confused
with the word control. The word control in experimental design is used for a treatment which
does not itself receive any treatment, but against which we find out the effectiveness of the other
treatments through comparison.
Reliability & Validity: Reliability is the degree to which an assessment tool
produces stable and consistent results.
Types of Reliability
1. Test-retest reliability is a measure of reliability obtained by administering the same
test twice over a period of time to a group of individuals. The scores from Time 1 and
Time 2 can then be correlated in order to evaluate the test for stability over time.
Example: A test designed to assess student learning in psychology could be given
to a group of students twice, with the second administration perhaps coming a week
after the first. The obtained correlation coefficient would indicate the stability of the
scores.
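The correlation step in the example can be sketched with a small Pearson-correlation helper; the Time 1 and Time 2 score lists are invented illustrative data, not results from the text.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented scores for five students, one week apart.
time1 = [78, 85, 62, 90, 71]
time2 = [80, 83, 65, 92, 70]
retest_r = pearson(time1, time2)  # close to 1 means stable scores over time
```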
2. Parallel forms reliability is a measure of reliability obtained by administering
different versions of an assessment tool (both versions must contain items that
probe the same construct, skill, knowledge base, etc.) to the same group of
individuals. The scores from the two versions can then be correlated in order to
evaluate the consistency of results across alternate versions.
Example: If you wanted to evaluate the reliability of a critical thinking assessment,
you might create a large set of items that all pertain to critical thinking and then
randomly split the questions up into two sets, which would represent the parallel
forms.
3. Inter-rater reliability is a measure of reliability used to assess the degree to
which different judges or raters agree in their assessment decisions. Inter-rater
reliability is useful because human observers will not necessarily interpret
answers the same way; raters may disagree as to how well certain responses or
material demonstrate knowledge of the construct or skill being assessed.
Example: Inter-rater reliability might be employed when different judges are
evaluating the degree to which art portfolios meet certain standards. Inter-rater
reliability is especially useful when judgments can be considered relatively
subjective. Thus, the use of this type of reliability would probably be more likely
when evaluating artwork as opposed to math problems.

4. Internal consistency reliability is a measure of reliability used to evaluate the
degree to which different test items that probe the same construct produce
similar results.
A. Average inter-item correlation is a subtype of internal consistency
reliability. It is obtained by taking all of the items on a test that probe the
same construct (e.g., reading comprehension), determining the correlation
coefficient for each pair of items, and finally taking the average of all of
these correlation coefficients. This final step yields the average inter-item
correlation.
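The pairwise-then-average procedure just described can be sketched directly; the three item score lists are invented, and the Pearson helper is redefined so the block stands alone.

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_inter_item(items):
    """Average Pearson correlation over every pair of items.

    items: one list of scores per item, aligned by respondent."""
    rs = [pearson(a, b) for a, b in combinations(items, 2)]
    return sum(rs) / len(rs)

# Invented scores of four respondents on three items probing one construct.
item_scores = [[4, 3, 5, 2], [5, 3, 4, 2], [4, 2, 5, 3]]
aic = average_inter_item(item_scores)
```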
B. Split-half reliability is another subtype of internal consistency reliability.
The process of obtaining split-half reliability is begun by "splitting in half"
all items of a test that are intended to probe the same area of knowledge
(e.g., World War II) in order to form two "sets" of items. The entire test is
administered to a group of individuals, the total score for each "set" is
computed, and finally the split-half reliability is obtained by determining the
correlation between the two total "set" scores.
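The split-half computation can be sketched as follows; the odd/even split rule and the item scores are illustrative assumptions, and the Pearson helper is repeated so the block is self-contained.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half(item_scores):
    """Correlate total scores on the odd-numbered vs even-numbered items.

    item_scores: one list of scores per item, aligned by respondent."""
    half1 = [sum(p) for p in zip(*item_scores[0::2])]
    half2 = [sum(p) for p in zip(*item_scores[1::2])]
    return pearson(half1, half2)

# Invented scores of four respondents on four items from one knowledge area.
items = [[4, 3, 5, 2], [5, 3, 4, 2], [4, 2, 5, 3], [5, 3, 5, 2]]
r_half = split_half(items)
```

In practice the half-test correlation is often stepped up with the Spearman-Brown formula, 2r/(1+r), to estimate full-test reliability.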
Validity refers to how well a test measures what it is purported to measure.
Why is it necessary?
While reliability is necessary, it alone is not sufficient: a test can be reliable without
being valid. For example, if your scale is off by 5 lbs, it reads your weight every
day with an excess of 5 lbs. The scale is reliable because it consistently reports the
same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is
not a valid measure of your weight.
Types of Validity
1. Face Validity ascertains that the measure appears to be assessing the intended
construct under study. The stakeholders can easily assess face validity. Although this is
not a very "scientific" type of validity, it may be an essential component in enlisting
the motivation of stakeholders. If the stakeholders do not believe the measure is an
accurate assessment of the ability, they may become disengaged with the task.
Example: If a measure of art appreciation is created, all of the items should be related to
the different components and types of art. If the questions are regarding historical time
periods, with no reference to any artistic movement, stakeholders may not be motivated
to give their best effort or invest in this measure because they do not believe it is a true
assessment of art appreciation.

2. Construct Validity is used to ensure that the measure is actually measuring what it is
intended to measure (i.e., the construct) and not other variables. Using a panel of
"experts" familiar with the construct is a way in which this type of validity can be
assessed. The experts can examine the items and decide what each specific item is
intended to measure. Students can be involved in this process to obtain their feedback.
Example: A women's studies program may design a cumulative assessment of learning
throughout the major. If the questions are written with complicated wording and
phrasing, this can cause the test to inadvertently become a test of reading
comprehension, rather than a test of women's studies. It is important that the measure
is actually assessing the intended construct, rather than an extraneous factor.
3. Criterion-Related Validity is used to predict future or current performance: it
correlates test results with another criterion of interest.
Example: Suppose a physics program designed a measure to assess cumulative student
learning throughout the major. The new measure could be correlated with a
standardized measure of ability in this discipline, such as an ETS field test or the GRE
subject test. The higher the correlation between the established measure and the new
measure, the more faith stakeholders can have in the new assessment tool.
4. Formative Validity, when applied to outcomes assessment, is used to assess how
well a measure is able to provide information to help improve the program under study.
Example: When designing a rubric for history, one could assess students' knowledge
across the discipline. If the measure can provide information that students are lacking
knowledge in a certain area, for instance the Civil Rights Movement, then that
assessment tool is providing meaningful information that can be used to improve the
course or program requirements.
5. Sampling Validity (similar to content validity) ensures that the measure covers the
broad range of areas within the concept under study. Not everything can be covered,
so items need to be sampled from all of the domains. This may need to be completed
using a panel of "experts" to ensure that the content area is adequately sampled.
Additionally, a panel can help limit "expert" bias (i.e., a test reflecting what an individual
personally feels are the most important or relevant areas).

Example: When designing an assessment of learning in the theatre department, it would
not be sufficient to only cover issues related to acting. Other areas of theatre, such as
lighting, sound, and the functions of stage managers, should all be included. The assessment
should reflect the content area in its entirety.
Validity and Reliability Compared
So what is the relationship between validity and reliability? The two do not necessarily go hand-in-
hand.

At best, we have a measure that has both high validity and high reliability. It yields consistent results
in repeated application and it accurately reflects what we hope to represent.
It is possible to have a measure that has high reliability but low validity: one that is consistent in
getting bad information or consistent in missing the mark. It is also possible to have one that has
low reliability and low validity: inconsistent and not on target.
Finally, it is not possible to have a measure that has low reliability and high validity: you can't really
get at what you want or what you're interested in if your measure fluctuates wildly.
Important Experimental Designs
Experimental design refers to the framework or structure of an experiment and as such there are
several experimental designs. We can classify experimental designs into two broad categories
like informal experimental designs and formal experimental designs. Informal experimental
designs are those designs that normally use a less sophisticated form of analysis based on
differences in magnitudes, whereas formal experimental designs offer relatively more control
and use precise statistical procedures for analysis. Important experimental designs are as
follows:
Informal experimental designs:
1. Before-and-after without control design.
2. After-only with control design.
3. Before-and-after with control design.
Formal experimental designs:
1. Completely randomized design (C.R. design).
2. Randomized block design (R.B. design).
3. Latin square design (L.S. design).
4. Factorial designs.
We may briefly deal with each of the above stated informal as well as formal experimental
designs.
1. Before-and-after without control design: In such a design a single test group or area is
selected and the dependent variable is measured before the introduction of the treatment.
The treatment is then introduced and the dependent variable is measured again after the
treatment has been introduced. The effect of the treatment would be equal to the level of
the phenomenon after the treatment minus the level of the phenomenon before the
treatment. The main difficulty of such a design is that with the passage of time
considerable extraneous variation may enter into the treatment effect.
2. After-only with control design. In this design two groups or areas (test area and control
area) are selected and the treatment is introduced into the test area only. The dependent
variable is then measured in both the areas at the same time. Treatment impact is assessed
by subtracting the value of the dependent variable in the control area from its value in the
test area. The basic assumption in such a design is that the two areas are identical with
respect to their behaviour towards the phenomenon considered. If this assumption is not
true, there is the possibility of extraneous variation entering into the treatment effect.
However, data can be collected in such a design without the introduction of problems
with the passage of time. In this respect this design is superior to the before-and-after without
control design.
3. Before-and-after with control design. In this design two areas are selected and the
dependent variable is measured in both the areas for an identical time-period before the
treatment. The treatment is then introduced into the test area only, and the dependent
variable is measured in both for an identical time-period after the introduction of the
treatment. The treatment effect is determined by subtracting the change in the dependent
variable in the control area from the change in the dependent variable in the test area. This
design is superior to the above two designs for the simple reason that it avoids extraneous
variation resulting both from the passage of time and from non-comparability of the test
and control areas. But at times, due to lack of historical data, time or a comparable
control area, we should prefer to select one of the first two informal designs stated above.
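The arithmetic behind the three informal designs reduces to simple differences; the sketch below encodes each, with made-up phenomenon levels, and the third function is the familiar difference-in-differences form.

```python
def effect_before_after(before, after):
    """Design 1: single area, no control; effect = after minus before."""
    return after - before

def effect_after_only(test_after, control_after):
    """Design 2: two areas, measured after treatment only."""
    return test_after - control_after

def effect_before_after_control(test_before, test_after, ctrl_before, ctrl_after):
    """Design 3: change in the test area minus change in the control area."""
    return (test_after - test_before) - (ctrl_after - ctrl_before)

# Hypothetical levels: the control area drifts by 10 over time while the test
# area rises by 30, so design 3 attributes 20 to the treatment itself.
effect = effect_before_after_control(100, 130, 100, 110)
```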
4. Completely Randomized Design (C.R. design) involves only two principles viz., the
principle of replication and the principle of randomization of experimental designs. It is
the simplest possible design and its procedure of analysis is also easier. The essential
characteristic of this design is that subjects are randomly assigned to experimental
treatments (or vice versa). For instance, if we have 10 subjects and if we wish to test 5
under treatment A and 5 under treatment B, the randomization process gives every
possible group of 5 subjects selected from a set of 10 an equal opportunity of being
assigned to treatment A and treatment B. One-way analysis of variance (or one-way
ANOVA) is used to analyse such a design. Even unequal replications can also work in
this design. It provides the maximum number of degrees of freedom to the error. Such a
design is generally used when experimental areas happen to be homogeneous. Technically,
when all the variations due to the uncontrolled extraneous factors are included under the
heading of chance variation, we refer to the design of experiment as a C.R. design.
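A minimal C.R.-design analysis can be sketched end to end: the one-way ANOVA F statistic computed from first principles for the two treatment groups; the response values are invented for illustration.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical responses for 5 randomly assigned subjects per treatment.
treatment_a = [10, 12, 11, 13, 14]
treatment_b = [20, 22, 21, 23, 24]
f_stat = one_way_anova_f(treatment_a, treatment_b)
```

A large F relative to the F distribution with (1, 8) degrees of freedom would indicate that the treatment means differ by more than chance variation.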
Semantic Differential: A semantic differential scale is a list of opposite adjectives. It is a
method invented by C.E. Osgood (1957) in order to measure the connotative meaning of cultural objects.
Semantic differential scales are used in a variety of social science research, but they are also used in
marketing and practical, user experience research and therapy. Sometimes semantic differentials are also
known as polarities.
The Semantic Differential (SD) measures people's reactions to stimulus words and concepts in
terms of ratings on bipolar scales defined with contrasting adjectives at each end. An example of
an SD scale is:
    good  3 : 2 : 1 : 0 : 1 : 2 : 3  bad
Usually, the position marked 0 is labeled "neutral," the 1 positions are labeled "slightly," the 2
positions "quite," and the 3 positions "extremely." A scale like this one measures directionality
of a reaction (e.g., good versus bad) and also intensity (slight through extreme). Typically, a
person is presented with some concept of interest, e.g., Red China, and asked to rate it on a
number of such scales. Ratings are combined in various ways to describe and analyze the
person's feelings.
A number of basic considerations are involved in SD methodology:
(1) Bipolar adjective scales are a simple, economical means for obtaining data on people's
reactions. With adaptations, such scales can be used with adults or children, persons from all
walks of life, and persons from any culture.
(2) Ratings on bipolar adjective scales tend to be correlated, and three basic dimensions of
response account for most of the co-variation in ratings. The three dimensions, which have been
labeled Evaluation, Potency, and Activity (EPA), have been verified and replicated in an
impressive variety of studies.
(3) Some adjective scales are almost pure measures of the EPA dimensions: for example, good-
bad for Evaluation, powerful-powerless for Potency, and fast-slow for Activity. Using a few pure
scales of this sort, one can obtain, with considerable economy, reliable measures of a person's
overall response to something. Typically, a concept is rated on several pure scales associated
with a single dimension, and the results are averaged to provide a single factor score for each
dimension. Measurements of a concept on the EPA dimensions are referred to as the concept's
profile.
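The averaging that produces a factor score, and hence a concept's profile, can be sketched directly; the dimension names follow the EPA labels above, while the example ratings and adjective pairs in the comments are invented.

```python
def epa_profile(ratings):
    """Average the pure-scale ratings for each EPA dimension.

    ratings: dict mapping a dimension name to ratings on the -3..+3 scale."""
    return {dim: sum(vals) / len(vals) for dim, vals in ratings.items()}

# Invented ratings of one concept on several pure scales per dimension.
concept = {
    "Evaluation": [3, 2, 2],   # e.g. good-bad, nice-awful
    "Potency":    [-1, 0, 1],  # e.g. powerful-powerless
    "Activity":   [2, 2],      # e.g. fast-slow
}
profile = epa_profile(concept)
```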
(4) EPA measurements are appropriate when one is interested in affective responses. The EPA
system is notable for being a multivariate approach to affect measurement. It is also a generalized
approach, applicable to any concept or stimulus, and thus it permits comparisons of affective
reactions on widely disparate things. EPA ratings have been obtained for hundreds of word
concepts, for stories and poems, for social roles and stereotypes, for colors, sounds, shapes, and
for individual persons.
(5) The SD has been used as a measure of attitude in a wide variety of projects. Osgood, et al.
(1957) report exploratory studies in which the SD was used to assess attitude change as a result
of mass media programs (pp. 305-311) and as a result of messages structured in different ways
(pp. 240-241). Their chapter on attitude balance or congruity theory (pp. 189-210) [excerpted in
Chapter 13 of this volume] also presents significant applications of the SD to attitude
measurement. The SD has been used by other investigators to study attitude formation (e.g.,
Barclay and Thumin, 1963), attitudes toward organizations (e.g., Rodefeld, 1967), attitudes
toward jobs and occupations (e.g., Triandis, 1959; Beardslee and O'Dowd, 1961; Gusfield and
Schwartz, 1963), and attitudes toward minorities (e.g., Prothro and Keehn, 1957; Williams,
1964; 1966). The results in these, and many other studies, support the validity of the SD as a
technique for attitude measurement. The question of validity, and other issues in assessing
attitudes with the SD, will be treated in more detail after a general discussion of SD theory and
technique.
Sampling and non-sampling errors
Beyond the conceptual differences, many kinds of error can help explain differences in
the output of the programs that generate data on income. They are often classified into
two broad types: sampling errors and non-sampling errors.
Sampling errors occur because inferences about the entire population are based on
information obtained from only a sample of that population. Because SLID and the long-
form Census are sample surveys, their estimates are subject to this type of error. The
coefficient of variation is a measure of the extent to which the estimate could vary if a
different sample had been used. This measure gives an indication of the confidence that
can be placed in a particular estimate. This data quality measure will be used later in this
paper to help explain why some of SLID's estimates, which are based on a smaller
sample, might differ from those of the other programs generating income data. While
the Census is also subject to this type of error, reliable estimates can be made for much
smaller populations because the sampling rate is much higher for the Census (20%).
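The coefficient of variation just defined can be sketched as the standard error of an estimate relative to the estimate itself; the figures below are invented, and the simple s/sqrt(n) standard error assumes simple random sampling rather than the surveys' actual complex designs.

```python
def coefficient_of_variation(std_dev, n, estimate):
    """CV = standard error / estimate, with the SE approximated as s / sqrt(n)."""
    se = std_dev / n ** 0.5
    return se / estimate

# A smaller sample (as in a sample survey) yields a larger CV, and hence less
# confidence in the estimate, than a larger sample for the same quantity.
cv_small = coefficient_of_variation(5000.0, 400, 50000.0)
cv_large = coefficient_of_variation(5000.0, 10000, 50000.0)
```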
Non-sampling errors can be further divided into coverage errors, measurement errors
(respondent, interviewer, questionnaire, collection method, etc.), non-response errors and
processing errors. The coverage errors are generally not well measured for income and
are usually inferred from exercises of data confrontation such as this. The population
exclusions and other known coverage differences between the sources are reviewed in a
later section.
The issues of various collection methods or mixed response modes and the different
types of measurement errors that could arise are also discussed later in the paper.
Non-response can be an issue in the case of surveys. It is not always possible to contact
and convince household members to respond to a survey. Sometimes as well, even if the
household responded, there may not be valid responses to all questions. In both cases
adjustments are performed to the data, but error may result as the quality of the
adjustments often depends on the non-respondents being similar to the respondents. For
the 2005 income year, SLID had a markedly lower response rate than the Census. Still,
because of item non-response, all income components were imputed for a small share of
SLID's respondents and at least some components were imputed for a larger share; in the
case of the Census as well, income was totally imputed for some respondents and
partially imputed for others.
In administrative data, in particular the personal tax returns, the filing rates for
specific populations may depend on a variety of factors (amount owed, financial activity
during the year, personal interest, requirement for eligibility to support programs, etc.)
and this could also result in differences in the estimates generated by the programs
producing income data.
The systems and procedures used to process the data in each of the programs are
different and may have design variations that impact the data in special ways. When
such discrepancies have been identified, they are mentioned in the relevant sections. Beyond the
design variations, most processing errors in these data sources are thought to be
detected and corrected before the release of data to the public. However, due to the
complexity and to the yearly modifications of processing systems, some errors may
remain undetected and they are therefore quite difficult to quantify.
More detail on the quality and methods of individual statistical programs is accessible
through the Surveys and statistical programs by subject section on Statistics Canada's
website.
Effective Questionnaire Design: The qualities of a good questionnaire
The design of a questionnaire will depend on whether the researcher wishes to collect
exploratory information (i.e. qualitative information for the purposes of better understanding
or the generation of hypotheses on a subject) or quantitative information (to test specific
hypotheses that have previously been generated).
Exploratory questionnaires: If the data to be collected is qualitative or is not to be
statistically evaluated, it may be that no formal questionnaire is needed. For example, in
interviewing the female head of the household to find out how decisions are made within the
family when purchasing breakfast foodstuffs, a formal questionnaire may restrict the
discussion and prevent a full exploration of the woman's views and processes. Instead one
might prepare a brief guide, listing perhaps ten major open-ended questions, with
appropriate probes/prompts listed under each.
Formal standardised questionnaires: If the researcher is looking to test and quantify
hypotheses and the data is to be analysed statistically, a formal standardised questionnaire
is designed. Such questionnaires are generally characterised by:
prescribed wording and order of questions, to ensure that each respondent receives the
same stimuli
prescribed definitions or explanations for each question, to ensure interviewers handle
questions consistently and can answer respondents' requests for clarification if they occur
prescribed response format, to enable rapid completion of the questionnaire during the
interviewing process.
Given the same task and the same hypotheses, six different people will probably come up
with six different questionnaires that differ widely in their choice of questions, line of
questioning, use of open-ended questions and length. There are no hard-and-fast rules
about how to design a questionnaire, but there are a number of points that can be borne in
mind:
1. A well-designed questionnaire should meet the research objectives. This may seem
obvious, but many research surveys omit important aspects due to inadequate preparatory
work, and do not adequately probe particular issues due to poor understanding. To a certain
degree some of this is inevitable. Every survey is bound to leave some questions
unanswered and provide a need for further research, but the objective of good questionnaire
design is to minimise these problems.
2. It should obtain the most complete and accurate information possible. The questionnaire
designer needs to ensure that respondents fully understand the questions and are not likely
to refuse to answer, lie to the interviewer or try to conceal their attitudes. A good
questionnaire is organised and worded to encourage respondents to provide accurate,
unbiased and complete information.
3. A well-designed questionnaire should make it easy for respondents to give the necessary
information and for the interviewer to record the answer, and it should be arranged so that
sound analysis and interpretation are possible.
4. It should keep the interview brief and to the point and be so arranged that the
respondent(s) remain interested throughout the interview.
Preliminary decisions in questionnaire design
There are nine steps involved in the development of a questionnaire:
1. Decide the information required.
2. Define the target respondents.
3. Choose the method(s) of reaching your target respondents.
4. Decide on question content.
5. Develop the question wording.
6. Put questions into a meaningful order and format.
7. Check the length of the questionnaire.
8. Pre-test the questionnaire.
9. Develop the final survey form.
Cluster Sampling: Cluster sampling is a sampling technique used when
"natural" but relatively homogeneous groupings are evident in a statistical population.
The population within a cluster should ideally be as heterogeneous as possible, but
there should be homogeneity between cluster means. Each cluster should be a small-
scale representation of the total population. The clusters should be mutually exclusive
and collectively exhaustive. A random sampling technique is then used on any relevant
clusters to choose which clusters to include in the study. In single-stage cluster
sampling, all the elements from each of the selected clusters are used. In two-stage
cluster sampling, a random sampling technique is applied to the elements from each of
the selected clusters.
The main difference between cluster sampling and stratified sampling is that in cluster
sampling the cluster is treated as the sampling unit, so analysis is done on a population
of clusters (at least in the first stage).
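Two-stage cluster sampling as described can be sketched with the standard library; the village clusters, their sizes, and the seed are invented for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters; stage 2: randomly pick elements in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Ten hypothetical villages (clusters) of 30 households each.
villages = {f"village{i}": list(range(i * 100, i * 100 + 30)) for i in range(10)}
sample = two_stage_cluster_sample(villages, n_clusters=3, n_per_cluster=5, seed=1)
```

For single-stage cluster sampling, stage 2 would simply keep every element of each chosen cluster instead of sub-sampling.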
Census Method:
This method of data collection is also known as complete enumeration technique or 100%
enumeration technique. Under this technique each and every item or unit constituting the universe is
selected for data collection. In the Indian census, which is conducted once in every ten years, this
technique is invariably followed.
Examples: Suppose we have to study
"Sale and Profit of Sugar industry in U.P." Suppose there are 5,000 sugar mills in U.P. If the
investigator has to collect data on all these mills, the investigation will be called the census method. Suppose
we have to study
"Average income of all households in Chandigarh". Suppose there are 51,000 households in
Chandigarh. If we collect and analyze data for all these households, the investigation will be based on
the census method.
Merits
(i) Accuracy in the Result:
Under this technique of data collection, the result of the enquiry is likely to be exact and accurate.
This is because the information is collected from each and every unit of the universe without
ignoring any one.
(ii) Extensive Study: Under this technique of data collection an extensive and detailed study of the
unit is made possible. For example, in the population census a lot of information relating to the
population viz. age, sex, marital status, religion, nationality, education, occupation, employment,
income, wealth etc. is collected in addition to the number of individuals constituting a unit.
Demerits:
(i) Expensive: This technique of data collection is highly expensive. It requires a lot of manpower and
administrative personnel as well. Therefore, this type of technique cannot be afforded by small
organizations.
(ii) May not Meet with Urgency: Since the technique takes a lot of time collecting the data from
each and every item, it may not be possible to meet an urgent situation by answering a
problem under study promptly. Moreover, in the course of the long period, conditions of the
phenomenon might have radically changed, so that the result obtained from the enquiry may not
truly represent the situation. For example, in the measurement of price level changes, the price index
number arrived at after a long time does not speak of the true picture of the price level in the economy.
(iii) Inapplicability:
This technique of data collection cannot be applied where the universe is infinite or hypothetical. It
also cannot be applied where, in the course of study, the item itself is destroyable.
Primary and Secondary Sources and Triangulation
Researchers need to consider the sources on which to base their findings; triangulation means
combining several such sources or methods so that findings can be cross-checked against one
another. Primary sources are data the researcher collects first-hand, for example through:
1. interview
2. observation
3. action research
4. case studies
5. life histories
6. questionnaires
7. ethnographic research
8. longitudinal studies
Secondary sources are data that already exist, such as:
1. Previous research
2. Official statistics
3. Mass media products
4. Diaries
5. Letters
6. Government reports
7. Web information
8. Historical data and information
