
How can I develop a good test?

 
Most classroom tests are developed for one or more of the following purposes:
• To establish a basis for assigning grades.

• To determine how well each student has achieved the course objectives.

• To diagnose student problems for remediation.

• To determine where instruction needs improvement.

Developing a good test is like target shooting. Hitting the target requires
planning; you must choose a target, select an appropriate arrow, and take careful
aim. Developing a good test also requires planning: you must determine the
purpose for the test, and carefully write appropriate test items to achieve that
purpose.

Planning the Test 
Testing is not a hit-or-miss affair. An educationally sound test has to be
carefully planned. Just as a blueprint is necessary before building a house, we
draw a plan before we construct a test.

The more time you spend on constructing the test and the accompanying
scoring procedure, the less time it will take to score the test. Also, the more time
spent in planning, the more confidence you will have that your assessment of the
student was done using understandable and defensible criteria.

The planning of a classroom test involves four major steps:

1. Developing and defining the instructional objectives of the course

2. Outlining the course content

3. Preparing a table of specifications

4. Considering other related matters

The statements of instructional objectives may be implicit or explicit. 

1. Implicit statements of instructional objectives:

Refer to internal responses of students that are not open to observation. For example, we can state that "we want our students to know the meaning of the term measurement". Here the verb "know" refers to an internal response which we cannot observe.

2. Explicit statements of instructional objectives: 

Refer to specific student performances which a teacher can observe and test. For example, we say that "we want our students to define the word measurement". Here the verb "define" refers to a specific performance which is open to observation.

The chief difference between implicit and explicit statements of instructional objectives lies in the choice of verbs. In implicit statements the actions of the verbs are hidden behaviours, while in explicit statements the actions of the verbs are observable behaviours.

Verbs in implicit statements, or     Verbs in explicit statements, or
"under-the-skin" verbs               "you-can-see-it" verbs

Know                                 Name
Understand                           List
Realize                              Describe
Comprehend                           Distinguish
Believe                              Compare
Think                                State
Enjoy                                Discuss
Grasp                                Illustrate
                                     Solve
                                     Draw
                                     Write

How To State Instructional Objectives 

Although there is no universally accepted format for stating instructional objectives, there appears to be some consensus among test specialists that an ideal statement of an instructional objective has three main elements:

1. Identification of the learning outcome

2. Description of the situation under which the learning outcome is to take place

3. Description of the standard of acceptable performance

For example:

Instructional objective: Given a list of 20 words, the student should be able to spell 15 words correctly. In this example:

"to spell the words correctly" (identification of the learning outcome)

"given a list of 20 words" (description of the situation under which the learning outcome is to take place)

"to spell 15 words correctly" (description of the standard of acceptable performance)

Bloom’s Taxonomy Of Educational Objectives 

Instructional objectives can be written to show different levels of learning. The most widely used hierarchy of learning was formulated by Bloom et al. in 1956. They classified learning behaviour into three domains, namely cognitive, affective and psychomotor. We will remain concerned in this discussion with the cognitive domain only.

The hierarchy can be used to help formulate objectives. The levels of the
hierarchy, beginning with knowledge, become increasingly complex as we move
up to comprehension, application, analysis, synthesis, and evaluation. It is
assumed that to meet a higher level of learning in a particular topic, one must first
have proficiency at the lower levels.

Knowledge is knowing and recalling specific facts, principles, theories, methods, etc.

Comprehension is the ability to explain a point.

Application is using previously known facts to solve a problem.

Analysis is the ability to break a product apart into its requisite elements or logical
components.

Synthesis is the ability to create something.

Evaluation is the ability to judge quality.

Using Bloom's Taxonomy as a guide, verbs can be categorized according to the intellectual activity they represent, for example:

• Knowledge: define, list, name, recall

• Comprehension: explain, summarize, describe

• Application: apply, solve, use, demonstrate

• Analysis: analyse, compare, distinguish

• Synthesis: design, compose, create

• Evaluation: judge, evaluate, justify

Test Question Examples — The sample questions below demonstrate the use of these verbs, from the lowest to the highest knowledge level:

• Knowledge: Who was the first president of India?

• Comprehension: Define the word measurement in your own words.

• Application: Why was the Boston Tea Party a significant act for the settlers?

• Analysis: How does the American Civil War compare with the French Civil War?

• Synthesis: If you can only take 10 cultural items to a new world, what will you take?

• Evaluation: Do you agree with the main precepts of the Green Party? Why or why not?

CONSIDERING OTHER RELATED MATTERS 

Along with the preparation of a table of specifications for a test, the teacher should
answer the following major questions in the light of the suggestions given for
each:

1. What types of items should be used?

The types of items to be chosen will primarily depend upon the intellectual
processes to be called forth and the uses to be made of the test results. Both
essay and objective items can be used keeping in view their strengths and
weaknesses.

2. What should be the time allotment for the test?

It can be a short test lasting only a few minutes, a single-period test, or a longer test if it is a midterm or the final examination. The amount of material to be covered in the test will give an idea of the amount of time required.

3. How many items should be there in the test?

The answer to this question depends upon the types of items chosen, the amount of available time, the amount of content coverage and the number of behavioural objectives to be tested. As a rule of thumb, the total number of items should be such that at least 80% of the students can complete the test within the testing time.

4. What should be the difficulty of the test?

If the test is a mastery test, the level of difficulty should be uniformly low. If it is a survey or a diagnostic test, the items should be of varying difficulty, from easy to very difficult.

CONSTRUCTING THE TEST 
When the blueprint for the test has been prepared, the next major step is the
construction of the test. There are two types of tests commonly used by the
classroom teacher:

1. Objective tests

2. Essay tests or Subjective tests.

Objective Test

An objective test is a form of questioning which has a single correct answer. Such questions are also known as selected-response items.

A scoring key for correct responses is created and can be applied by an examiner or by a computer. The scoring is easy, objective and reliable. The task is highly structured and clear, and it can measure both simple and complex learning outcomes. However, constructing good items is time consuming, and this format is ineffective for measuring some types of problem-solving outcomes.

Major types of Objective type tests: 

There are two major types of objective type tests:

1. Supply Type: the type of test in which the student supplies the answer. This has two subtypes, namely:
a) Short answer. Examples:

· Which is the most populous country of the world?

· Who invented the telephone?

b) Completion. Examples:

· The formula for water is _________

· The plural of child is____________

2. Selection Type: the type of test in which the student selects the answer from a given number of choices. This has three subtypes, namely:
a) Alternative response (true-false, right-wrong, yes-no, etc.). Examples:

• Indus is in Asia. True or False

• Is Baluchistan the largest province of Pakistan area-wise? Yes or No

b) Multiple choice. Example:

• The UN was established in the year:

  • 1919
  • 1939
  • 1945
  • 1965

c) Matching. Example: matching items ask students to pair entries from two lists, e.g., countries with their capitals.

Advantages And Disadvantages Of Objective Type Tests 

Advantages: 

1. Can measure a variety of learning outcomes

2. Can consist of a large number of questions, so sampling of material can be extensive and adequate

3. Can be scored reliably

4. Can be scored quickly

5. Can be scored by anyone who has an answer key

6. Can be scored by a machine

Disadvantages: 

1. Cannot measure higher intellectual abilities and skills (such as literary
composition)

2. Difficult to construct

3. Often develops bad study habits among students who resort to rote learning
of bits of individual information

4. Susceptible to guessing

Essay or subjective type tests: 

An essay test is a form of questioning which may have more than one correct answer (or more than one way of expressing the correct answer). Such questions are also known as constructed-response items.

It requires students to write out information rather than select a response from a menu. In scoring, many constructed-response items require judgment on the part of the examiner. Essay items can measure the highest levels of learning outcomes, such as analysis, synthesis and evaluation, and the integration and application of ideas can be emphasized. In terms of preparation, essay questions can be prepared in less time than selection-type items.

Major types of Essay type tests:

The essay type tests can be classified into two subtypes according to the amount of
freedom of response permitted to students:

1. Extended Response: Type of question in which the response of students is unrestricted. Examples:

· Discuss the importance of testing in education.

· What are the advantages and disadvantages of the computer?

· "Democracy is the worst form of government but is better than all others"; please elaborate.
2. Restricted Response: Type of question in which the response of students is restricted in scope and form. Examples:

· Give five reasons why testing is important in education.

· State four advantages and four disadvantages of the computer.

· State five points why democracy is a better form of government than all others.

Advantages And Disadvantages Of Essay Type Tests

Advantages: 

1. Can measure complex learning outcomes (like the abilities to organise, express, evaluate and create ideas, which cannot be measured by other means)

2. Develops good study habits among students (since it emphasizes higher mental abilities and problem-solving skills)

3. Develops writing skills

4. Easy to construct

5. Easy to administer

Disadvantages: 

A. Provides inadequate sampling of content, since it comprises only a few questions

B. Variation in marking. This is possibly the greatest disadvantage of a subjective test. Research has shown that experienced examiners award widely varying marks, not only on the same piece of work marked by other examiners, but on their own scripts re-marked after a passage of time (Hartog and Rhodes, "An Examination of Examinations").

C. A premium is placed on how the answer is written rather than on what is written

D. Scoring is time consuming and tedious

When to use which type of test? 

Objective type test is recommended under the following conditions:

1. When testing a large group of students

2. When reuse of the test is desired

3. When highly reliable scores are required

4. When absolutely impartial scoring is required

5. When speedy reporting of scores is required rather than speedy test preparation

6. If you are better at writing high-quality objective test items than at judging scripts correctly (teacher-dependent condition)
Essay type test is recommended under the following conditions:

1. When testing a small group of students

2. When reuse of the test is not desired

3. When the emphasis is on developing skills in written expression among students

4. When measurement of more than factual information is emphasized

5. When there is less time for test preparation than for marking scripts

6. When there is a shortage of stationery and duplicating facilities

7. If you are a better critical reader of scripts than an imaginative and creative writer of objective test items (teacher-dependent condition)

General Guidelines For Writing Objective type Test Items 

1. Avoid ambiguous or meaningless test items.

2. Avoid rambling or confusing sentence structure.

3. Avoid the exact wording of the textbook.

4. If an opinion is used, also indicate its source.

5. Avoid interdependent items.

6. Avoid trick questions.

7. Use good grammar.

8. Use items that have a definitely correct answer.

9. Avoid obscure language and "big words," unless you are specifically testing for language usage.

10. Be careful not to give the subject irrelevant clues to the right response. Using "a(n)" rather than "a" or "an" is an example of this.

11. Avoid a regular sequence in the pattern of correct responses.

In short, a test should not provide any barrier to subjects apart from demonstrating
mastery over the test content. Otherwise, scores reflect more “noise” than “true
measure.”

OBJECTIVE TYPE TESTS 

1. Supply type Items

• The short answer Item

• The completion Item 

2. The Alternative Response Item

• The true false type 

• The right wrong type 

• The yes or no type 

• The correction type true false 

• The cluster type true false 

3. The multiple-choice Items

• Correct answer type 

• Best answer type 

• Worst answer type 

• Most inclusive answer type 

• Most dissimilar answer type 

4. The Matching Exercise Items

• The matching exercise 

• The rearrangement exercise 

• The interpretative exercise 

Guidelines For Writing Essay Type Questions

1. Clearly define the intended learning outcome to be assessed by the item.
2. Avoid using essay questions for intended learning outcomes that are better assessed with other kinds of assessment.
3. Clearly define the task and situate it in a problem situation.
4. Delimit the scope of the task.
5. Clearly develop the problem or problem situation.
6. Present a reasonable task to students.
7. The task can be written as a statement or a question.
8. Specify the relative point value and the approximate time limit in clear directions, and state the criteria for grading.
9. Use several relatively short essay questions rather than one long one.
10. Avoid the use of optional questions.
11. Improve the essay question through preview and review.

Preview (before)
A. Predict student responses.

B. Write a model answer.
Preview (before)
A. Predict student responses.

B. Write a model answer.

C. Ask a knowledgeable colleague to critically review the essay question, the model answer, and the intended learning outcome for alignment.

Review (after)
D.  Review student responses to the essay question.

APPRAISING CLASSROOM TESTS (ITEM ANALYSIS)

Item Analysis

Item analysis is a statistical technique used for selecting and rejecting the items of a test on the basis of their difficulty value and discriminative power. Item analysis is a general term that refers to the specific methods used in education to evaluate test items, typically for the purpose of test construction and revision. The main objective of item analysis is to select appropriate items and to identify any existing deficiencies. Particular attention is given to individual test items, item characteristics, the probability of answering items correctly, the overall ability of the test taker, and the degrees or levels of knowledge being assessed.

Item analysis is concerned basically with two characteristics of an item: difficulty value and discriminative power.

Need of Item Analysis 

Item analysis is a technique by which test items are selected and rejected. The selection of items serves the purpose of the designer or test constructor because the selected items have the required characteristics. The following are the main purposes of a test:

• (a) Classification of students or candidates.

• (b) Selection of the candidates for the job.

• (c) Gradation: assigning grades or divisions to the students for academic purposes.

• (d) Prognosis and promotion of the candidates or students.

• (e) Establishing individual differences, and

• (f) Research for the verification of hypotheses.

Different purposes require different types of tests, with items of different characteristics. A selection or entrance test includes items of high difficulty value as well as high power of discrimination. A promotion or prognostic test has items of moderate difficulty value. There are various techniques of item analysis in use these days.

The Objectives of Item Analysis 

The following are the main objectives of the item analysis technique:

• (1) To select good items for the final draft and reject the poor items which do not contribute to the functioning of the test. Some items are to be modified.

• (2) Item analysis obtains the difficulty values of all the items of the preliminary draft of the test. The items are classified as difficult, moderate and easy items.

• (3) It provides the discriminative power (item reliability and item validity) to differentiate between capable and less capable examinees for all the items of the preliminary draft of the test. The items are classified on the basis of these indexes as showing positive, negative or no discrimination. Items with negative or no discrimination power are rejected outright.

• (4) It also indicates the functioning of the distractors in multiple-choice items. Poorly functioning distractors are changed. This provides the basis for the modifications to be made in some of the items of the preliminary draft.

• (5) The reliability and validity of a test depend on these characteristics of its items. The functioning of a test is improved by this technique. Both indexes are considered simultaneously in selecting and rejecting the items of a test.

• (6) It provides the basis for preparing the final draft of a test. In the final draft, items are arranged in order of difficulty: the easiest items are given at the beginning and the most difficult items at the end.

• (7) Item analysis is a cyclic technique. The modified items are tried out and their item analysis is done again to obtain these indexes (difficulty and discrimination). Empirical evidence is thus obtained for selecting the modified items for the final draft.

Functions of Item Analysis 

The main function of item analysis is to obtain the indexes of an item which indicate its basic characteristics. There are three such indexes:

• (1) Item difficulty value (D.V.): the proportion of subjects answering the item correctly.

• (2) Discriminative power (D.P.) of an item; this characteristic is of two types:

(a) Item reliability — taken as the point-biserial correlation between an item and the total test score, multiplied by the item standard deviation.

(b) Item validity — taken as the point-biserial correlation between an item and a criterion score, multiplied by the item standard deviation.
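To make these indexes concrete, here is a minimal Python sketch of the item reliability index as defined above: the point-biserial correlation between a dichotomously scored (0/1) item and the total test score, multiplied by the item's standard deviation. The function name and data layout are illustrative assumptions, not from the text, and the sketch assumes the item was neither passed nor failed by every examinee.

```python
import statistics

def item_reliability_index(item_scores, total_scores):
    """Item reliability index: point-biserial correlation between a
    0/1 item and the total test score, multiplied by the item's
    standard deviation. Illustrative sketch; assumes 0 < p < 1."""
    n = len(item_scores)
    p = sum(item_scores) / n              # proportion answering correctly
    q = 1 - p                             # proportion answering incorrectly
    sd_item = (p * q) ** 0.5              # SD of a dichotomous (0/1) item
    mean_total = statistics.mean(total_scores)
    sd_total = statistics.pstdev(total_scores)
    # Mean total score of the examinees who answered this item correctly.
    mean_correct = statistics.mean(
        t for i, t in zip(item_scores, total_scores) if i == 1
    )
    # Point-biserial correlation, then scale by the item's SD.
    r_pb = (mean_correct - mean_total) / sd_total * (p / q) ** 0.5
    return r_pb * sd_item
```

The item validity index is computed the same way, substituting a criterion score (e.g., a later course grade) for the total test score.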

If the test as a whole is to fulfil its purpose successfully, each of its items must be able to discriminate between high and poor students. In other words, a test fulfils its purpose with maximum success when each item serves as a good predictor. Therefore each item of the test should be analysed in terms of its difficulty value and discriminative power. Item analysis serves the following purposes:

(1) To improve and modify a test for immediate use on a parallel group of subjects.

(2) To select the best items for a test, with regard to its purpose, after a proper try-out on a group of subjects selected from the target population.

(3) To provide a statistical check on the characteristics of the test items for the judgment of the test designer.

(4) To set up parallel forms of a test. Parallel forms should not only have similar item content or types of items; they should also have the same difficulty values and discriminative powers. Item analysis provides the empirical basis on which exactly parallel tests can be developed.

(5) To modify or reject the poor items of the test. Poor items may not serve the purpose of the test, and poorly functioning distractors are changed.

(6) Item analysis is usually done for a power test rather than a speed test. In a speed test all the items are of the same difficulty value; the purpose of a speed test is to measure speed and accuracy, where speed is acquired through practice. Because a time limit is imposed, such tests are not pure power tests but speeded tests. The speededness of a test depends on the difficulty values of its items: most of the students should reach the last items within the allotted time. Item analysis is the study of the statistical properties of test items. The qualities usually of interest are the difficulty of the item and its ability or power to differentiate between more capable and less capable examinees. Difficulty is usually expressed as the percent or proportion getting the item right, and discrimination as some index comparing success by the more capable and the less capable students.

Meaning and Definition of Difficulty Value (D.V.)

The term difficulty value of an item can be explained with the help of a simple example of the extreme ends. If an item is answered correctly by every examinee, the item is very easy: the difficulty value is 100 percent, or a proportion of one. Such an item will not serve any purpose, and there is no use including it in a test; such items are generally rejected.

If an item is not answered correctly by any of the examinees, the item is most difficult: the difficulty value is zero percent, or a proportion of zero. This item likewise serves no purpose, and such items are usually rejected.

"The difficulty value of an item is defined as the proportion or percentage of the 
examinees who have answered the item correctly." —1.1). Guilford 

"The difficulty value of an item may be defined as the proportion of certain sample of 
subjects who actually know the answer of item." —Frank S. Freeman 

The first definition states that the difficulty value is the percentage or proportion of examinees who answer the item correctly; the second defines it as the proportion of a certain sample of subjects who actually know the answer of the item. The second statement is the more functional and dependable, because an item can be answered correctly by guessing even when the examinee does not know the answer. The difficulty value should depend on actually knowing the correct answer rather than on merely answering the item correctly.

In the procedure of item analysis, a "correction for guessing" formula is therefore applied to the scores rather than using raw right answers. The difficulty value can also be obtained in terms of standard scores or z-scores.

Methods or Techniques of Item Analysis

A recent review of the literature on item analysis indicates that there are at least twenty-three different techniques of item analysis. As discussed, an item analysis technique obtains the indexes for the characteristics of an item. The following two methods of item analysis are the most popular and widely used.

1) Davis method of item analysis — the basic method of item analysis. It is used for prognostic tests, for selecting and rejecting items on the basis of difficulty value and discriminative power. The right responses are considered in obtaining the indexes for the characteristics of an item: the proportion of right responses on an item is used for this purpose.

2) Stanley method of item analysis — used for diagnostic test items. The wrong responses are considered in obtaining the difficulty value and discriminative power; the wrong responses indicate the causes of students' weaknesses. The proportion of wrong responses on an item is used for this purpose.

There are separate techniques for obtaining difficulty value and discriminative power of
the items.

(a) Techniques of Difficulty Value

There are two main approaches for obtaining the difficulty value:

a1 – Proportion of right responses on an item. Davis and Haper have used this technique.

a2 – Standard scores or z-scores (normal probability curve).

(b) Techniques of Discriminative Power

b1 – Proportion of right responses on an item. Davis and Haper have used this technique.

THE PROCEDURE/ PURPOSE OF ITEM ANALYSIS: 

The review of the literature on item analysis indicates that about two dozen techniques of item analysis have been devised to obtain the difficulty value and discriminative index of a test item. It is not possible to describe all of them in this chapter; therefore the most popular and widely used techniques are discussed: Fredrick B. Davis's method of item analysis for prognostic tests, and Stanley's method of item analysis for diagnostic tests.

"The item difficulty value may be defined as the proportion or percentage of certain 
sample subjects that actually know the answer of an item. ­­Frank S. Freeman 

The difficulty value depends on actually knowing the answer rather than merely answering correctly (i.e., on right responses). In objective-type tests, items may be answered correctly by guessing rather than by actually knowing the answer; that is, an item may be answered without knowing its answer. Thus a correction for guessing is used to obtain scores that reflect actual correct responses.

It is important to note that in the procedure of item analysis, item-wise scoring is done, whereas subject-wise scoring is done in general. Several formulas have been developed by psychometricians for the "guessing correction". Some of the important formula-corrections for guessing are discussed below.

Formula-Correction for Guessing

The following two formula-corrections for guessing are noted:

(a) Guilford's formula-correction for guessing, and

(b) Horst's formula-correction for guessing.
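The text names these formulas without reproducing them. A widely cited form of the correction for guessing, commonly associated with Guilford's treatment, is S = R − W/(k − 1), where R is the number of right answers, W the number of wrong answers (omitted items are excluded), and k the number of choices per item. A minimal Python sketch, assuming this classic form:

```python
def corrected_score(rights, wrongs, n_choices):
    """Correction for guessing, assuming the classic form
    S = R - W / (k - 1); omitted items are simply not counted."""
    return rights - wrongs / (n_choices - 1)

# A 40-item, four-option test answered with 28 right, 9 wrong, 3 omitted:
print(corrected_score(28, 9, 4))  # 25.0
```

On a four-option test, each wrong answer thus cancels one-third of a right answer, which is the expected gain from blind guessing.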

MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS: 

One of the most important (if least appealing) tasks confronting faculty members
is the evaluation of student performance. This task requires considerable skill, in part
because it presents so many choices. Decisions must be made concerning the method,
format, timing, and duration of the evaluative procedures. Once designed, the evaluative
procedure must be administered and then scored, interpreted, and graded. Afterwards,
feedback must be presented to students. Accomplishing these tasks demands a broad
range of cognitive, technical, and interpersonal resources on the part of faculty. But an
even more critical task remains, one that perhaps too few faculty undertake with
sufficient skill and tenacity: investigating the quality of the evaluative procedure.

Even after an exam, how do we know whether that exam was a good one? It is
obvious that any exam can only be as good as the items it comprises, but then what
constitutes a good exam item? Our students seem to know, or at least believe they know.
But are they correct when they claim that an item was too difficult, too tricky, or too
unfair?

Lewis Aiken (1997), the author of a leading textbook on the subject of psychological and educational assessment, contends that a "postmortem" evaluation is just as necessary in classroom testing as it is in medicine. Indeed, just such a postmortem procedure for exams exists: item analysis, a group of procedures for assessing the quality of exam items. The purpose of an item analysis is to improve the quality of an exam by identifying items that are candidates for retention, revision, or removal. More specifically, not only can item analysis identify both good and deficient items, it can also clarify what concepts the examinees have and have not mastered.

Item Difficulty Index (p) 

The item difficulty statistic is an appropriate choice for achievement or aptitude tests when the items are scored dichotomously (i.e., correct vs. incorrect). Thus, it can be derived for true-false, multiple-choice, and matching items, and even for essay items, where the instructor can convert the range of possible point values into the categories "passing" and "failing."

The item difficulty index, symbolized p, can be computed simply by dividing the number of test takers who answered the item correctly by the total number of students who answered the item. As a proportion, p can range between 0.00, obtained when no examinees answered the item correctly, and 1.00, obtained when all examinees answered the item correctly. Notice that no test item has a single fixed p value: not only may the p value vary with each class group that takes the test, but an instructor may gain insight by computing the item difficulty level for a number of different subgroups within a class, such as those who did well on the exam overall and those who performed more poorly.
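A minimal Python sketch of this computation (the function name and data layout are illustrative):

```python
def item_difficulty(item_scores):
    """Item difficulty index p: the proportion of examinees who
    answered the item correctly. `item_scores` holds one 0/1 entry
    per examinee who answered the item (1 = correct)."""
    return sum(item_scores) / len(item_scores)

# 20 examinees answered the item; 13 of them got it right:
print(item_difficulty([1] * 13 + [0] * 7))  # 0.65
```

Applying the same function separately to subgroups (say, the top and bottom halves of the class) yields the subgroup difficulty levels used by the discrimination index below.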

Although the computation of the item difficulty index is quite straightforward, the
interpretation of this statistic is not. To illustrate, consider an item with a difficulty level
of 0.20. We do know that 20% of the examinees answered the item correctly, but we
cannot be certain why they did so. Does this item difficulty level mean that the item was
challenging for all but the best prepared of the examinees? Does it mean that the
instructor failed in his or her attempt to teach the concept assessed by the item? Does it
mean that the students failed to learn the material? Does it mean that the item was poorly
written? To answer these questions, we must rely on other item analysis procedures, both
qualitative and quantitative ones.

Item Discrimination Index (D) 

Item discrimination analysis deals with the fact that often different test takers will
answer a test item in different ways. As such, it addresses questions of considerable
interest to most faculty, such as, "does the test item differentiate those who did well on
the exam overall from those who did not?" or "does the test item differentiate those who
know the material from those who do not?" In a more technical sense then, item
discrimination analysis addresses the validity of the items on a test, that is, the extent to
which the items tap the attributes they were intended to assess. As with item difficulty,
item discrimination analysis involves a family of techniques. Which one to use depends
on the type of testing situation and the nature of the items. I'm going to look at only one of those, the item discrimination index, symbolized D. The index parallels the difficulty index in that it can be used whenever items can be scored dichotomously, as correct or incorrect; hence it is most appropriate for true-false, multiple-choice, and matching items, and for those essay items which the instructor can score as "pass" or "fail."

We test because we want to find out if students know the material, but all we learn
for certain is how they did on the exam we gave them. The item discrimination index
tests the test in the hope of keeping the correlation between knowledge and exam
performance as close as it can be in an admittedly imperfect system.

The item discrimination index is calculated in the following way:

1. Divide the group of test takers into two groups, high scoring and low scoring. Ordinarily, this is done by dividing the examinees into those scoring above and those scoring below the median. (Alternatively, one could create groups made up of the top and bottom quintiles, quartiles, or even deciles.)

2. Compute the item difficulty levels separately for the upper (P_upper) and lower (P_lower) scoring groups.

3. Subtract the two difficulty levels, such that D = P_upper − P_lower.
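A minimal Python sketch of these three steps, using the median split described above (names are illustrative; examinees scoring exactly at the median are assigned to the upper group here as a simplification):

```python
import statistics

def discrimination_index(item_scores, total_scores):
    """Item discrimination index D = P_upper - P_lower.
    `item_scores` are 0/1 scores on one item; `total_scores` are the
    corresponding total exam scores for the same examinees."""
    median = statistics.median(total_scores)
    upper = [i for i, t in zip(item_scores, total_scores) if t >= median]
    lower = [i for i, t in zip(item_scores, total_scores) if t < median]
    if not upper or not lower:            # degenerate split: D undefined
        return 0.0
    p_upper = sum(upper) / len(upper)     # difficulty within the upper group
    p_lower = sum(lower) / len(lower)     # difficulty within the lower group
    return p_upper - p_lower
```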

How is the item discrimination index interpreted? Unlike the item difficulty level p, the item discrimination index can take on negative values and can range between -1.00 and 1.00. Consider the following situation: suppose that overall, half of the examinees answered a particular item correctly, and that all of the examinees who scored above the median on the exam answered the item correctly while all of the examinees who scored below the median answered incorrectly. In such a situation P_upper = 1.00 and P_lower = 0.00. As such, the value of the item discrimination index D is 1.00, and the item is said to be a perfect positive discriminator. Many would regard this outcome as ideal. It suggests that those who knew the material and were well-prepared passed the item while all others failed it.

Though it's not as unlikely as winning a million-dollar lottery, finding a perfect positive discriminator on an exam is relatively rare. Most psychometricians would say that items yielding positive discrimination index values of 0.30 and above are quite good discriminators and worthy of retention for future exams.

Finally, notice that the difficulty and discrimination are not independent. If all the
students in both the upper and lower levels either pass or fail an item, there's nothing in
the data to indicate whether the item itself was good or not. Indeed, the value of the item
discrimination index will be maximized when only half of the test takers overall answer
an item correctly; that is, when p = 0.50. Once again, the ideal situation is one in which
the half who passed the item were students who all did well on the exam overall.

Does this mean that it is never appropriate to retain items on an exam that are
passed by all examinees, or by none of the examinees? Not at all. There are many
reasons to include at least some such items. Very easy items can reflect the fact that
some relatively straightforward concepts were taught well and mastered by all students.
Similarly, an instructor may choose to include some very difficult items on an exam to
challenge even the best-prepared students. The instructor should simply be aware that
neither of these types of items functions well to make discriminations among those
taking the test.
