Most classroom tests are developed for one or more of the following purposes:
• To establish a basis for assigning grades.
• To determine how well each student has achieved the course objectives.
Developing a good test is like target shooting. Hitting the target requires
planning; you must choose a target, select an appropriate arrow, and take careful
aim. Developing a good test also requires planning: you must determine the
purpose for the test, and carefully write appropriate test items to achieve that
purpose.
Planning the Test
Testing is not a hit-or-miss affair. An educationally sound test has to be
carefully planned. Just as a blueprint is necessary before building a house, we
draw a plan before we construct a test.
The more time you spend on constructing the test and the accompanying
scoring procedure, the less time it will take to score the test. Also, the more time
spent in planning, the more confidence you will have that your assessment of the
student was done using understandable and defensible criteria.
2. Outlining the course content
1. Implicit statements of instructional objectives:
Refer to internal mental states (knowing, understanding, appreciating) which a teacher cannot directly observe or test.
2. Explicit statements of instructional objectives:
Refer to specific student performances which a teacher can observe and test.
For example, we say that "we want our students to define the word measurement".
Here the verb "define" refers to a specific performance which is open to
observation.
Verbs in implicit statements,      Verbs in explicit statements,
or "under the skin" verbs          or "you can see it" verbs

Know                               Name
Understand                         List
Realize                            Describe
Comprehend                         Distinguish
Believe                            Compare
Think                              State
Enjoy                              Discuss
Grasp                              Illustrate
                                   Solve
                                   Draw
                                   Write
How To State Instructional Objectives
For example: "At the end of this unit, the student will be able to define the word measurement."
Bloom’s Taxonomy Of Educational Objectives
The hierarchy can be used to help formulate objectives. The levels of the
hierarchy, beginning with knowledge, become increasingly complex as we move
up to comprehension, application, analysis, synthesis, and evaluation. It is
assumed that to meet a higher level of learning in a particular topic, one must first
have proficiency at the lower levels.
Analysis is the ability to break a product apart into its requisite elements or logical
components.
Using Bloom's Taxonomy as a guide, the verbs below have been categorized
according to the intellectual activity they represent, ranked here from the highest
to the lowest level.
Test Question Examples -- The sample questions below demonstrate the use of
these verbs, moving up from the simpler to the more complex levels of the taxonomy:
• Application: Why was the Boston Tea Party a significant act for the settlers?
• Analysis: How does the American Civil War compare with the French Civil War?
• Synthesis: If you could take only 10 cultural items to a new world, what would you take?
• Evaluation: Do you agree with the main precepts of the Green Party? Why or why not?
CONSIDERING OTHER RELATED MATTERS
Along with the preparation of a table of specifications for a test, the teacher should
answer the following major questions in the light of the suggestions given for
each:
The types of items to be chosen will primarily depend upon the intellectual
processes to be called forth and the uses to be made of the test results. Both
essay and objective items can be used keeping in view their strengths and
weaknesses.
It can be a short test lasting only a few minutes, a single-period test, or a test
of longer duration if it is a midterm or the final examination. The amount of
material to be covered in the test will give an idea of the amount of time
required.
The answer to this question depends upon the types of items chosen, the amount
of available time, the amount of content coverage, and the number of
behavioural objectives to be tested.
4. As a rule of thumb, the total number of items included in the test should be
such that at least 80% of the students can complete them within the testing
time.
If the test is a mastery test, the level of difficulty should be uniformly low. If it is
a survey or a diagnostic test, the items should be of varying difficulty from
easy to very difficult.
CONSTRUCTING THE TEST
When the blueprint for the test has been prepared, the next major step is the
construction of the test. There are two types of tests commonly used by the
classroom teacher:
1. Objective tests
2. Essay (subjective) tests
Objective Test
Major types of objective type tests:
1. Supply Type: is the type of test in which the student supplies the answer. This
further has two subtypes namely:
a) Short answer Example:
b) Completion Example:
2. Selection Type: is the type of test in which the student selects the answer from a
given number of choices. This further has three subtypes namely:
a) Alternative response (true-false, right-wrong, yes-no, etc.) Example:
b) Multiple choice. Example:
• 1919
• 1939
• 1945
• 1965
c) Matching. Example:
Advantages And Disadvantages Of Objective Type Tests
Advantages:
1. Can sample a wide range of content in a short time
2. Scoring is quick, easy, and objective
3. Scores are not affected by the examiner's judgement or by the student's skill in written expression
Disadvantages:
1. Cannot measure higher intellectual abilities and skills (such as literary
composition)
2. Difficult to construct
3. Often develops bad study habits among students who resort to rote learning
of bits of individual information
4. Susceptible to guessing
Essay or subjective type tests:
An essay test is a form of questioning which may have more than one correct answer (or more
than one way of expressing the correct answer). It is also known as a constructed-
response item.
It requires students to write out information rather than select a response from a
menu. In scoring, many constructed-response items require judgment on the part
of the examiner. It can measure the highest levels of learning outcomes, such as
analysis, synthesis, and evaluation, and the integration and application of ideas can
be emphasized. In terms of preparation, essay type questions can be written in
less time than selection-type items.
Major types of essay type tests:
The essay type tests can be classified into two subtypes according to the amount of
freedom of response permitted to students:
1. Extended Response:
Example: … Please elaborate.
2. Restricted Response:
Example:
Advantages And Disadvantages Of Essay Type Tests
Advantages:
1. Can measure complex learning outcomes such as analysis, synthesis, and evaluation
2. Emphasizes the integration and application of ideas
3. Develops skills in written expression
4. Easy to construct
5. Easy to administer
Disadvantages:
A. Provides inadequate sampling of content since it comprises only a few
questions
B. Scoring is subjective and unreliable: examiners have been found to disagree
not only on the same piece of work marked by other examiners, but on their
own marked scripts re-marked after a passage of time ("An Examination of
Examinations", Hartog and Rhodes)
C. A premium is placed on how the answer is written rather than on what is written
When to use which type of test?
An objective type test is recommended under the following conditions:
9. If you are better at writing high-quality objective test items than at judging
scripts accurately. (Teacher-dependent condition)
An essay type test is recommended under the following conditions:
11. When the emphasis is on developing skills in written expression among
students
13. When there is less time available for test preparation than for marking scripts
15. If you are a better critical reader of scripts than an imaginative and creative
writer of objective test items. (Teacher-dependent condition)
General Guidelines For Writing Objective type Test Items
23. Avoid obscure language and “big words,” unless you are specifically testing
for language usage.
24. Be careful not to give the student irrelevant clues to the right response.
Writing "a(n)" rather than "a" or "an" before a blank, for example, avoids
giving a grammatical clue to the answer.
OBJECTIVE TYPE TESTS
• The short answer Item
• The completion Item
• The true false type
• The right wrong type
• The yes or no type
• The correction type true false
• The cluster type true false
• Correct answer type
• Best answer type
• Worst answer type
• Most inclusive answer type
• Most dissimilar answer type
• The matching exercise
• The rearrangement exercise
• The interpretative exercise
Guidelines For Writing Essay Type Questions
D. Review student responses to the essay question after the test has been administered.
APPRAISING CLASSROOM TESTS (ITEM ANALYSIS)
THE VALUE OF ITEM ANALYSIS
Item Analysis
Item analysis is a statistical technique used for selecting and rejecting the items of a test
on the basis of their difficulty value and discriminative power. Item analysis is a general
term that refers to the specific methods used in education to evaluate test items, typically
for the purpose of test construction and revision. The main objective of item analysis is
to select appropriate items and to gain an understanding of any existing deficiencies. Particular
attention is given to individual test items, item characteristics, the probability of answering
items correctly, the overall ability of the test taker, and the degrees or levels of knowledge
being assessed.
Item analysis is concerned basically with two characteristics of an item: difficulty
value and discriminative power.
Need of Item Analysis
Item analysis is a technique by which test items are selected and rejected. The
selection of items serves the purpose of the designer or test constructor because the
selected items have the required characteristics. The following are the main purposes a test may serve:
The different purposes require different types of test, having items of different
characteristics. A selection or entrance test includes items of high difficulty value
as well as high power of discrimination. A promotion or prognostic test has items
of moderate difficulty value. There are various techniques of item analysis in use
these days.
The Objectives of Item Analysis
The following are the main objectives of the item analysis technique:
• (1) To select good items for the final draft and reject the poor items which do not
contribute to the functioning of the test. Some items are to be modified.
• (2) Item analysis obtains the difficulty values of all the items of the preliminary draft
of the test. The items are classified as difficult, moderate, and easy items.
• (4) It also indicates the functioning of the distractors in the multiple-choice items.
Poorly functioning distractors are changed. It provides the basis for the
modifications to be made in some of the items of the preliminary draft.
• (5) The reliability and validity of a test depend on these characteristics of its items.
The functioning of a test is improved by this technique. Both indexes are
considered simultaneously in selecting and rejecting the items of a test.
• (6) It provides the basis for preparing the final draft of a test. In the final draft, items
are arranged in order of difficulty: the easiest items are given at the beginning
and the most difficult items at the end.
• (7) Item analysis is a cyclic technique. The modified items are tried out and their
item analysis is done again to obtain these indexes (difficulty and discrimination).
Empirical evidence is thus obtained for selecting the modified items for the final
draft.
Functions of Item Analysis
The main function of item analysis is to obtain the indexes of an item which indicate
its basic characteristics. There are three such characteristics:
• (1) Item difficulty value (D. V.) is the proportion of subjects answering each item
correctly.
(a) Item reliability— It is taken as the point-biserial correlation between an item and the
total test score, multiplied by the item standard deviation.
(b) Item validity— It is taken as the point biserial correlation between an item and a
criterion score multiplied by the item standard deviation.
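The two point-biserial indexes above can be computed directly from dichotomous item scores. The sketch below is an illustrative implementation, not taken from this text: the function names are assumptions, and a population standard deviation is used. Passing total test scores gives the item reliability index; passing an external criterion score gives the item validity index.

```python
import math

def point_biserial(item, scores):
    """Point-biserial correlation between a 0/1 item and a score variable.

    item: 0/1 correctness scores on one item, one entry per examinee.
    scores: total test scores (reliability) or criterion scores (validity).
    """
    n = len(item)
    n_correct = sum(item)
    p = n_correct / n
    if p in (0.0, 1.0):
        return 0.0  # the item has no variance, so no correlation is defined
    mean1 = sum(s for i, s in zip(item, scores) if i == 1) / n_correct
    mean0 = sum(s for i, s in zip(item, scores) if i == 0) / (n - n_correct)
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    return (mean1 - mean0) / sd * math.sqrt(p * (1 - p))

def item_index(item, scores):
    """Point-biserial multiplied by the item standard deviation sqrt(p*(1-p))."""
    p = sum(item) / len(item)
    return point_biserial(item, scores) * math.sqrt(p * (1 - p))
```

For example, an item answered correctly by exactly the two top scorers of four examinees yields a strongly positive point-biserial.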
If the test as a whole is to fulfil its purpose successfully, each of its items must be able
to discriminate between high and poor scorers on the test. In other words, a test fulfils
its purpose with maximum success when each item serves as a good predictor. Therefore
it is essential that each item of the test be analysed in terms of its difficulty value
and discriminative power. Item analysis serves the following purposes:
(1) To improve and modify a test for immediate use on a parallel group of subjects.
(2) To select the best items for a test with regard to its purpose after a proper try-out
on a group of subjects selected from the target population.
(3) To provide a statistical check-up of the characteristics of the test items for the
judgment of the test designer.
(4) To set up parallel forms of a test. Parallel forms should not only have
similar item content and types of items, but should also have the same
difficulty values and discriminative power. The item analysis technique
provides the empirical basis on which exactly parallel tests can be developed.
(5) To modify or reject the poor items of the test. Poor items may not serve
the purpose of the test; poorly functioning distractors are changed.
(6) Item analysis is usually done on a power test rather than a speed test. In a speed test
all the items are of the same difficulty value; the purpose of a speed test is to measure
speed and accuracy, and speed is acquired through practice. A test ceases to be a pure
power test when a time limit is imposed; such tests are speeded tests. The
speededness of a test depends on the difficulty values of its items: most
of the students should reach the last items in the time allotted for the test. Item
analysis is the study of the statistical properties of test items. The qualities usually of
interest are the difficulty of the item and its ability, or power, to differentiate between
more capable and less capable examinees. Difficulty is usually expressed as the
percent or proportion getting the item right, and discrimination as some index
comparing success by the more capable and the less capable students.
Meaning and Definition of Difficulty Value (D.V.)
The term difficulty value of an item can be explained with the help of a simple example of
the extreme ends. If an item of a test is answered correctly by every examinee, it means the
item is very easy: the difficulty value is 100 percent, or a proportion of one. This item will
not serve any purpose and there is no use including such items in a test. Such items are
generally rejected.
If an item is not answered correctly by any of the examinees, it means the item is most
difficult: the difficulty value is zero percent, or a proportion of zero. Such an item
likewise serves no purpose, and there is no use including it in a test. Such items are
usually rejected.
"The difficulty value of an item is defined as the proportion or percentage of the
examinees who have answered the item correctly." —J. P. Guilford
"The difficulty value of an item may be defined as the proportion of certain sample of
subjects who actually know the answer of item." —Frank S. Freeman
The first definition states that the difficulty value is the percentage or
proportion of examinees who answer the item correctly, but the second defines it
as the proportion of a certain sample of subjects who actually know the answer to
the item. The second statement is the more functional and dependable, because an
item can be answered correctly by guessing even when the examinee does not know
the answer. The difficulty value should depend on actually knowing the correct
answer to an item rather than merely answering it correctly.
Methods or Techniques of item Analysis
A recent review of the literature on item analysis indicates that there are at least twenty-
three different techniques of item analysis. As has been discussed, item analysis
techniques obtain the indexes for the characteristics of an item. The following two
methods of item analysis are the most popular and widely used.
obtaining the indexes for the characteristics of an item. The proportion of right
responses on the items is considered for this purpose.
There are separate techniques for obtaining the difficulty value and the discriminative power of
the items.
(a) Techniques of Difficulty Value.
a1 – Proportion of right responses on an item. Davis and Harper have also used
this technique.
THE PROCEDURE/PURPOSE OF ITEM ANALYSIS:
The review of the literature on item analysis indicates that some two dozen techniques
have been devised to obtain the difficulty value and discriminative index of
a test item. It is not possible to describe all the techniques of item analysis in this
chapter; therefore, the most popular and widely used techniques are discussed:
Frederick B. Davis's method of item analysis for prognostic tests, and Stanley's method of
item analysis for diagnostic tests.
"The item difficulty value may be defined as the proportion or percentage of a certain
sample of subjects that actually know the answer of an item." —Frank S. Freeman
The difficulty value depends on actually knowing the answer rather than merely answering
correctly (i.e., giving right responses). In an objective type test, items can be answered
correctly by guessing rather than by actually knowing the answer; an item may thus be
answered without knowing its answer. A correction for guessing is therefore applied to
obtain scores which reflect the actual correct responses.
It is important to note that in the procedure of item analysis, item-wise scoring is done,
whereas in general subject-wise scoring is done. Several formulas for 'guessing
correction' have been developed by psychometricians; some of the important formula-
corrections for guessing are discussed below.
Formula-Correction for Guessing
The following formula-corrections for guessing have been explained: (a) Guilford's
formula-correction for guessing, and …
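Guilford's correction for guessing is commonly given as S = R − W/(k − 1), where R is the number of right answers, W the number of wrong answers (omitted items excluded), and k the number of choices per item. A minimal sketch, assuming that standard form; the function name is illustrative:

```python
def corrected_score(right, wrong, n_choices):
    """Correction-for-guessing score: S = R - W / (k - 1).

    right: number of items answered correctly (R)
    wrong: number of items answered incorrectly (W); omitted items excluded
    n_choices: number of response options per item (k)
    """
    if n_choices < 2:
        raise ValueError("an item must offer at least two choices")
    return right - wrong / (n_choices - 1)

# 60 right and 20 wrong on four-option items gives S = 60 - 20/3
```

On two-option (true-false) items the formula reduces to rights minus wrongs.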
MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS:
One of the most important (if least appealing) tasks confronting faculty members
is the evaluation of student performance. This task requires considerable skill, in part
because it presents so many choices. Decisions must be made concerning the method,
format, timing, and duration of the evaluative procedures. Once designed, the evaluative
procedure must be administered and then scored, interpreted, and graded. Afterwards,
feedback must be presented to students. Accomplishing these tasks demands a broad
range of cognitive, technical, and interpersonal resources on the part of faculty. But an
even more critical task remains, one that perhaps too few faculty undertake with
sufficient skill and tenacity: investigating the quality of the evaluative procedure.
Even after an exam, how do we know whether that exam was a good one? It is
obvious that any exam can only be as good as the items it comprises, but then what
constitutes a good exam item? Our students seem to know, or at least believe they know.
But are they correct when they claim that an item was too difficult, too tricky, or too
unfair?
Item Difficulty Index (p)
The index can be computed for any dichotomously scored item, including essay items
where the instructor can convert the range of possible point values into the categories
"passing" and "failing."
The item difficulty index, symbolized p, can be computed simply by dividing the
number of test takers who answered the item correctly by the total number of students
who answered the item. As a proportion, p can range between 0.00, obtained when no
examinees answered the item correctly, and 1.00, obtained when all examinees answered
the item correctly. Notice that no test item need have only one p value. Not only may the
p value vary with each class group that takes the test, an instructor may gain insight by
computing the item difficulty level for a number of different subgroups within a class,
such as those who did well on the exam overall and those who performed more poorly.
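The computation described above can be sketched in a few lines; the 0/1 scoring convention and the function name are illustrative, not from the text:

```python
def item_difficulty(responses):
    """Item difficulty index p: proportion of examinees answering correctly.

    responses: one 0/1 entry per examinee who answered the item (1 = correct).
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)

# 4 correct answers out of 20 examinees gives p = 0.20
```

Running the same function on a subgroup's responses (for example, only the top half of the class) gives the subgroup difficulty levels mentioned above.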
Although the computation of the item difficulty index is quite straightforward, the
interpretation of this statistic is not. To illustrate, consider an item with a difficulty level
of 0.20. We do know that 20% of the examinees answered the item correctly, but we
cannot be certain why they did so. Does this item difficulty level mean that the item was
challenging for all but the best prepared of the examinees? Does it mean that the
instructor failed in his or her attempt to teach the concept assessed by the item? Does it
mean that the students failed to learn the material? Does it mean that the item was poorly
written? To answer these questions, we must rely on other item analysis procedures, both
qualitative and quantitative ones.
Item Discrimination Index (D)
Item discrimination analysis deals with the fact that often different test takers will
answer a test item in different ways. As such, it addresses questions of considerable
interest to most faculty, such as, "does the test item differentiate those who did well on
the exam overall from those who did not?" or "does the test item differentiate those who
know the material from those who do not?" In a more technical sense then, item
discrimination analysis addresses the validity of the items on a test, that is, the extent to
which the items tap the attributes they were intended to assess. As with item difficulty,
item discrimination analysis involves a family of techniques. Which one to use depends
on the type of testing situation and the nature of the items. I'm going to look at only one
of those, the item discrimination index, symbolized D. The index parallels the difficulty
index in that it can be used whenever items can be scored dichotomously, as correct or
incorrect, and hence it is most appropriate for true-false, multiple-choice, and matching
items, and for those essay items which the instructor can score as "pass" or "fail."
We test because we want to find out if students know the material, but all we learn
for certain is how they did on the exam we gave them. The item discrimination index
tests the test in the hope of keeping the correlation between knowledge and exam
performance as close as it can be in an admittedly imperfect system.
1. Divide the group of test takers into two groups, high scoring and low scoring.
Ordinarily, this is done by dividing the examinees into those scoring above and those
scoring below the median. (Alternatively, one could create groups made up of the top
and bottom quintiles or quartiles or even deciles.)
2. Compute the item difficulty levels separately for the upper (P upper) and lower
(P lower) scoring groups.
3. Subtract the two difficulty levels: D = P upper - P lower.
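The three steps above can be sketched as follows. A median split into equal halves is assumed, with ties at the median broken by sort order, which is one of several defensible choices; the function name is illustrative:

```python
def discrimination_index(item_correct, total_scores):
    """Item discrimination index D = P upper - P lower (median split).

    item_correct: 0/1 scores on one item, one entry per examinee.
    total_scores: total exam score for the same examinees, in the same order.
    """
    paired = sorted(zip(total_scores, item_correct))  # ascending by total score
    half = len(paired) // 2
    lower = [correct for _, correct in paired[:half]]
    upper = [correct for _, correct in paired[len(paired) - half:]]
    return sum(upper) / len(upper) - sum(lower) / len(lower)
```

An item passed by the entire upper half and failed by the entire lower half returns 1.00, the perfect positive discriminator described below; the reverse pattern returns -1.00.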
How is the item discrimination index interpreted? Unlike the item difficulty
level p, the item discrimination index can take on negative values and can range between
-1.00 and 1.00. Consider the following situation: suppose that overall, half of the
examinees answered a particular item correctly, and that all of the examinees who scored
above the median on the exam answered the item correctly and all of the examinees who
scored below the median answered incorrectly. In such a situation P upper = 1.00 and P
lower = 0.00. As such, the value of the item discrimination index D is 1.00 and the item is
said to be a perfect positive discriminator. Many would regard this outcome as ideal. It
suggests that those who knew the material and were well-prepared passed the item while
all others failed it.
Finally, notice that the difficulty and discrimination are not independent. If all the
students in both the upper and lower levels either pass or fail an item, there's nothing in
the data to indicate whether the item itself was good or not. Indeed, the value of the item
discrimination index will be maximized when only half of the test takers overall answer
an item correctly; that is, when p = 0.50. Once again, the ideal situation is one in which
the half who passed the item were students who all did well on the exam overall.
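The dependence of D on p described above can be checked numerically. The sketch below computes the best attainable D for a given overall difficulty p under a median split; the function is illustrative, not from the text, and assumes the most favourable case in which every correct answer comes from the upper half:

```python
def max_discrimination(p, n=100):
    """Upper bound on D for overall difficulty p under a median split of n examinees.

    Best case: every correct answer belongs to an examinee in the upper half.
    """
    n_correct = round(p * n)
    half = n // 2
    p_upper = min(n_correct, half) / half   # upper half absorbs correct answers first
    p_lower = max(n_correct - half, 0) / half  # overflow spills into the lower half
    return p_upper - p_lower

# p = 0.50 permits D = 1.0, while p = 0.20 caps D at 0.4 and p = 0.90 at 0.2
```

The bound peaks at p = 0.50 and falls off toward both extremes, which is the sense in which discrimination is maximized by items of middling difficulty.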
Does this mean that it is never appropriate to retain items on an exam that are
passed by all examinees, or by none of the examinees? Not at all. There are many
reasons to include at least some such items. Very easy items can reflect the fact that
some relatively straightforward concepts were taught well and mastered by all students.
Similarly, an instructor may choose to include some very difficult items on an exam to
challenge even the best-prepared students. The instructor should simply be aware that
neither of these types of items functions well to make discriminations among those
taking the test.