
Lecturer: Yee Bee Choo

IPGKTHO
Item Analysis

Item Difficulty
Item Discrimination
Distractor Analysis
Item analysis is a process which examines
student responses to individual test items
(questions) in order to assess the quality of
those items and of the test as a whole.
Purpose of Item Analysis
 To select the best available items for the final form of the test.
 To identify structural or content defects in the items.
 To detect learning difficulties of the class as a whole.
 To identify the areas of weakness of students in need of remediation.
 To increase instructors' skills in test construction.
 To identify specific areas of course content which need greater emphasis or clarity.
Item Analysis information can tell us
if an item (i.e. the question) was too easy or
too difficult (item difficulty)
how well it discriminated between high and
low scorers on the test (item discrimination)
whether all of the alternatives functioned as
intended (distractor analysis)
Item Difficulty
Item difficulty or Index of Difficulty (IF) refers to
how easy or difficult an item is.
The formula used to measure item difficulty is
quite straightforward.
It involves dividing the number of students who
answered an item correctly by the number of
students who took the test. The formula is
therefore:

IF = (number of students who answered correctly) / (number of students who took the test)
Item Difficulty
For example, if twenty students took a test and
15 of them correctly answered item 1, then the
item difficulty for item 1 is 15/20 or 0.75.
Item difficulty is always reported in decimal
points and can range from 0 to 1.
An item difficulty of 0 refers to an extremely
difficult item with no students getting the item
correct and an item difficulty of 1 refers to an
easy item which all students answered correctly.
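To make the arithmetic concrete, here is a minimal Python sketch of this calculation; the function name `item_difficulty` is ours, not part of the original slides:

```python
def item_difficulty(num_correct, num_students):
    """Item difficulty (IF): proportion of students who answered correctly."""
    return num_correct / num_students

# Example from the slide: 15 of 20 students answered item 1 correctly.
print(item_difficulty(15, 20))  # 0.75
```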
Item Difficulty
The appropriate difficulty level will depend on the
purpose of the test.
According to Anastasi & Urbina (1997), if the test
is to assess mastery, then items with a difficulty
level of 0.8 can be accepted.
However, they go on to describe that if the
purpose of the test is for selection, then we
should utilise items whose difficulty values come
closest to the desired selection ratio. For
example, if we want to select 20% of candidates,
then we should choose items with a difficulty
index of 0.20.
Item Discrimination
Item discrimination is used to determine how
well an item is able to discriminate between good
and poor students.
Item discrimination values range from –1 to 1.
A value of –1 means that the item discriminates
perfectly, but in the wrong direction.
This value would tell us that the weaker students
performed better on an item than the better
students.
This is hardly what we want from an item, and if
we obtain such a value, it may indicate that there
is something not quite right with the item.
Item Discrimination
It is strongly recommended that we examine
the item to see whether it is ambiguous or
poorly written.
A discrimination value of 1 shows perfect
positive discrimination, with the better students
performing much better than the weaker ones,
as is to be expected.
Item Discrimination
The discrimination index (D) is calculated from the responses of an upper group and a lower group of students (typically the top and bottom thirds when ranked by total score):

D = (number correct in upper group − number correct in lower group) / number of students in each group
Item Discrimination
Suppose you have just conducted a twenty-item
test and obtained the following results:
Table 1: Item Discrimination
Item Discrimination
As there are twelve students in the class, 33%
of this total would be 4 students. Therefore,
the upper group and the lower group will each
consist of 4 students.
Based on their total scores, the upper group
would consist of students L, A, E, and G while
the lower group would consist of students J,
H, D and I.
Item Discrimination
We now need to look at the performance of these
students for each item in order to find the item
discrimination index of each item.
For item 1, all four students in the upper group (L, A,
E, and G) answered correctly while only student H in
the lower group answered correctly.
Using the formula described earlier, we can plug in
the numbers as follows:

D = (4 − 1) / 4 = 0.75
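The same computation in Python, as a small illustrative sketch (the function name is ours):

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """D = (correct in upper group - correct in lower group) / students per group."""
    return (upper_correct - lower_correct) / group_size

# Item 1: all 4 upper-group students correct, only 1 lower-group student correct.
print(discrimination_index(4, 1, 4))  # 0.75
```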
Item Discrimination
Two points should be noted.
First, item discrimination is especially important
in norm-referenced testing and interpretation, as
in such instances there is a need to discriminate
between good students who do well in the
measure and weaker students who perform
poorly. In criterion-referenced tests, item
discrimination does not have as important a role.
Secondly, the use of 33.3% of the total number of
students who took the test in the formula is not
rigid; any percentage between 27.5% and 35%
may be used.
Distractor Analysis
Distractor analysis is an extension of item analysis,
using techniques that are similar to item difficulty
and item discrimination.
In distractor analysis, however, we are no longer
interested in how test takers select the correct
answer, but in how effectively the distractors
function by drawing the test takers away from the
correct answer.
The number of times each distractor is selected is
noted in order to determine the effectiveness of the
distractor.
We would expect that the distractor is selected by
enough candidates for it to be a viable distractor.
Distractor Analysis
What exactly is an acceptable value?
This depends to a large extent on the difficulty of
the item itself and what we consider to be an
acceptable item difficulty value for test items. If
we assume that 0.7 is an appropriate item
difficulty value, then we should expect the
remaining 0.3 to be about evenly distributed
among the distractors.
Distractor Analysis
Let us take the following test item as an example:
In the story, he was unhappy because__________.
A. it rained all day
B. he was scolded
C. he hurt himself
D. the weather was hot
Distractor Analysis
Let us assume that 100 students took the test. If we
assume that A is the answer and the item difficulty is
0.7, then 70 students answered correctly.
What about the remaining 30 students and the
effectiveness of the three distractors?
If all 30 selected D, then distractors B and C are
useless in their role as distractors.
Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and
should be replaced.
Therefore, the ideal situation would be for each of the
three distractors to be selected by an equal share of
the students who did not get the answer correct, i.e.
in this case 10 students each.
Distractor Analysis
Therefore the effectiveness of each distractor can be quantified
as 10/100 or 0.1, where 10 is the number of students who
selected the distractor and 100 is the total number of students
who took the test.
This technique is similar to a difficulty index although the result
does not indicate the difficulty of each item, but rather the
effectiveness of the distractor.
In the first situation described above, where all 30 students
selected D, options A, B, C and D would have a difficulty index
of 0.7, 0, 0, and 0.3 respectively.
If the distractors worked equally well, then the indices would be
0.7, 0.1, 0.1, and 0.1.
Unlike in determining the difficulty of an item, the value of the
difficulty index formula for the distractors must be interpreted in
relation to the indices for the other distractors.
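As an illustrative sketch, the per-option indices for the two scenarios above can be computed in Python (the function name and data layout are our assumptions):

```python
def option_indices(selections, num_students):
    """Proportion of test takers choosing each option (key and distractors)."""
    return {option: count / num_students for option, count in selections.items()}

# Scenario 1: A is the key (70 correct) and all 30 incorrect answers went to D.
print(option_indices({"A": 70, "B": 0, "C": 0, "D": 30}, 100))
# Ideal scenario: the 30 incorrect answers spread evenly across B, C and D.
print(option_indices({"A": 70, "B": 10, "C": 10, "D": 10}, 100))
```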
Distractor Analysis
From a different perspective, the item discrimination
formula can also be used in distractor analysis.
The concept of upper groups and lower groups would
still remain, but the analysis and expectation would
differ slightly from the regular item discrimination
that we have looked at earlier.
Instead of expecting a positive value, we should
logically expect a negative value as more students
from the lower group should select distractors.
Each distractor can have its own item discrimination
value in order to analyse how the distractors work
and ultimately refine the effectiveness of the test
item itself.
Distractor Analysis
Table 2: Selection of Distractors

         Distractor A   Distractor B   Distractor C   Distractor D
Item 1        8              3              1              0
Item 2        2              8              2              0
Item 3        4              8              0              0
Item 4        1              3              8              0
Item 5        5              0              0              7
Distractor Analysis
For Item 1, the discrimination index for each distractor can be
calculated using the discrimination index formula.
From Table 2, we know that all the students in the upper group
answered this item correctly and only one student from the lower
group did so. If we assume that the three remaining students
from the lower group all selected distractor B, then the
discrimination index for item 1, distractor B will be:

D = (0 − 3) / 4 = −0.75

This negative value indicates that more students from the lower
group selected the distractor compared to students from the
upper group. This result is to be expected of a distractor, and a
value between −1 and 0 is preferred.
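A short Python sketch of the per-distractor discrimination calculation (names are ours; the figures follow the worked example):

```python
def distractor_discrimination(upper_count, lower_count, group_size):
    """Discrimination for one distractor; a value between -1 and 0 is desirable."""
    return (upper_count - lower_count) / group_size

# Item 1, distractor B: chosen by 0 upper-group and 3 lower-group students.
print(distractor_discrimination(0, 3, 4))  # -0.75
```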
Why Do Item Analysis?
Encouraging teachers to undertake an item analysis
as often as practical
Allowing accumulated data to be used to make
item analysis more reliable
Providing for a wider choice of item formats and
objectives
Facilitating the revision of items
Facilitating the physical construction and
reproduction of the test
Accumulating a large pool of items so as to allow
some items to be shared with the students for
study purposes.
Benefits of Item Analysis
1. It provides useful information for class
discussion of the test.
2. It provides data which helps students
improve their learning.
3. It provides insights and skills that lead to the
preparation of better tests in the future.
Limitations of Item Analysis
It cannot be used for essay items.
Teachers must be cautious about the damage
that may be done to the table of specifications
when items not meeting the criteria are deleted
from the test. Such items should be rewritten
or replaced.
Outline
1. Introduction
2. Where, when, how the test is administered, number of students involved and which Year and class
3. Test blueprint
4. Test format
5. Sample of test designed

Test Blueprint

Content/          Learning Objectives to be learned                                              Total   % Weight
Subject Area      Recall of   Understanding   Application   Analysis   Synthesis   Evaluation
                  facts
Writing           3 items     3 items         -             -          -           -             6       6%
Language Art 1    2 items     4 items         2 items       -          2 items     -             10      10%
Reading 1         4 items     3 items         4 items       -          4 items     -             15      15%
Reading 2         5 items     4 items         4 items       -          4 items     -             17      17%
Grammar 1         4 items     10 items        8 items       -          8 items     -             30      30%
Grammar 2         3 items     7 items         5 items       -          7 items     -             22      22%
TOTAL             21          31              23            -          25          -             100     100%
% Weight          21%         31%             23%           -          25%         -                     100%

SPM 1119 English
Paper 1 (Time: 1 hour 45 minutes)
Section A. Directed Writing (35 marks)
Section B. Continuous Writing (50 marks)

Paper 2 (Time: 2 hours 15 minutes)
Section A. 15 multiple-choice questions (15 marks)
Section B. Information Transfer (10 marks)
Section C. (i) Reading Comprehension (10 marks)
(ii) Summary (15 marks)
Section D. Literature Component.
(i) Poem. 1 poem with 4 short-answer questions
(5 marks)
(ii) Novel. 1 essay question (15 marks)
Outline
1. Introduction
2. Students’ performance in English test (Table 1 & 2)
3. Item Analysis
a) Item Difficulty (Table 3)
b) Item Discrimination (Table 4)
c) Distractor Analysis (Table 5)
4. Strengths
5. Weaknesses
6. Problems
7. Suggestions
8. Conclusion
Table 1: Students’ Performance in English Test

Student   Raw Score   Percentage Score   Grade
1
2
3
4

1. Find the highest and lowest score.
2. Find the mean, mode and median.
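These summary statistics can be computed with Python's standard statistics module; the raw scores below are hypothetical placeholders:

```python
from statistics import mean, median, mode

scores = [55, 62, 62, 70, 78, 85]  # hypothetical raw scores
print(max(scores), min(scores))    # highest and lowest score
print(mean(scores), median(scores), mode(scores))
```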
Table 2: Class Frequency Distribution

Grade   Percentage Scores   Frequency   Frequency (Percentage)
A       80-100
B       60-79
C       40-59
D       20-39
E       0-19

1. Draw a bar graph based on the table.
2. Discuss the results of students’ performance in
terms of grade and frequency percentage.
Preparing Data for Item Analysis
1. Arrange test scores from highest to lowest.
2. Get one-third of the papers from the highest
scores and the other one-third of the papers
from the lowest scores.
3. Record separately the number of times each
alternative was chosen by the students in both
groups.
4. Add up the number of correct answers to each
item made by the combined upper and lower
groups.
5. Calculate the item difficulty and item
discrimination.
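A sketch of steps 1-5 in Python; the `papers` data structure is an assumption, and the final check reuses the figures for item 1 from the table below:

```python
from collections import Counter

def split_groups(papers, fraction=1/3):
    """Steps 1-2: rank papers by total score and take the upper and lower thirds.

    `papers` is a list of (total_score, answers) pairs, where `answers` maps
    an item number to the option chosen (a hypothetical layout).
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    n = round(len(ranked) * fraction)
    return ranked[:n], ranked[-n:]

def tally_choices(group):
    """Step 3: count how often each (item, option) pair was chosen in a group."""
    return Counter((item, choice)
                   for _, answers in group
                   for item, choice in answers.items())

def difficulty_and_discrimination(upper_correct, lower_correct, group_size):
    """Steps 4-5: difficulty over both groups combined, then discrimination."""
    difficulty = (upper_correct + lower_correct) / (2 * group_size)
    discrimination = (upper_correct - lower_correct) / group_size
    return difficulty, discrimination

# Item 1 from the worked table: 14 of 20 upper and 7 of 20 lower answered correctly.
print(difficulty_and_discrimination(14, 7, 20))  # (0.525, 0.35)
```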
Item   Group     A    B    C    D    Total Correct   Difficulty Index (%)   H–L   Discrimination Index
1      H (20)    3    14   2    1
       L (20)    10   7    3    0    21              52.5                   7     0.35
2      H (20)    0    0    18   2
       L (20)    0    3    9    8    27              67.5                   9     0.45
3      H (20)    3    8    4    4
       L (20)    10   2    4    4    10              25.0                   6     0.30
4      H (20)    3    3    4    10
       L (20)    2    4    10   4    14              35.0                   6     0.30
5      H (20)    15   2    2    1
       L (20)    1    10   4    5    16              40.0                   14    0.70
Table 3: Analysis of Item Difficulty

Item   Correct Responses   Incorrect Responses   Total Responses   Item Difficulty (IF)
1
2
3
4
5

Formula: Item Difficulty
IF = (number of correct responses) / (total number of responses)
Interpreting Item Difficulty (IF)
IF values above 0.90 indicate very easy items that
should not be reused in subsequent tests. If almost
all the students can get the item correct, it is a
concept not worth testing. IF values below 0.20
indicate very difficult items that should be reviewed
for possible confusing language, removed from
subsequent tests, and/or highlighted as an area for
re-instruction. If almost all the students get the
item wrong, there is either a problem with the item
or the students did not grasp the concept.
Interpreting Item Difficulty (IF)
Range of Difficulty Index   Interpretation     Action
0 – 0.25                    Difficult          Revise or discard
0.26 – 0.75                 Right difficulty   Retain
0.76 and above              Easy               Revise or discard
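The table above translates directly into a small helper; this is an illustrative sketch, not part of the original slides:

```python
def interpret_difficulty(if_value):
    """Map an item difficulty index to the action suggested in the table above."""
    if if_value <= 0.25:
        return "Difficult: revise or discard"
    if if_value <= 0.75:
        return "Right difficulty: retain"
    return "Easy: revise or discard"

print(interpret_difficulty(0.525))  # Right difficulty: retain
```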
Table 4: Analysis of Item Discrimination

Student   Total Score   Item 1   Item 2   Item 3   Item 4   Item 5
1                       1        0        1        0        1
2                       0        1        1        0        0
3                       1        1        1        1        1
4                       0        0        0        0        0

(1 = correct response, 0 = incorrect response)
Formula: Item Discrimination
D = (number correct in upper group − number correct in lower group) / number of students in each group

Item discrimination describes the ability of an item to
distinguish between high and low scorers (the upper
and lower 33.33% of students after the scores have
been ordered in descending order).
The range is from −1 to 1.
The higher the value, the more discriminating the item. A
highly discriminating item indicates that students who had
high test scores got the item correct, whereas students
who had low test scores got the item incorrect.
Items with a discrimination value less than or near zero
should be removed from the test. Such a value indicates
that students who overall did poorly on the test did better
on the item than students who overall did well. The item
may be confusing for your better-scoring students in some way.
Interpreting Item discrimination
0.40 or higher – very good discrimination
0.30 to 0.39 – reasonably good discrimination but
possibly subject to improvement
0.20 to 0.29 – marginal/ acceptable discrimination
(subject to improvement)
0 to 0.19 – poor discrimination (to be rejected or
improved by revision)
Negative ID – low-performing students selected the
correct answer more often than high-performing ones
(to be rejected)
Interpreting Item discrimination
Index Range     Interpretation                                  Action
−1.0 to −0.50   Can discriminate but the item is questionable   Discard
−0.55 to 0.45   Non-discriminating                              Revise
0.46 to 1.0     Discriminating                                  Include
Table 5: Analysis of Distractor
         Distractor A   Distractor B   Distractor C   Distractor D
Item 1
Item 2
Item 3
Item 4
Item 5
TOTAL
Interpreting Distractor Analysis
The distractors are an important component of an item,
as they show a relationship between the total test
score and the distractor chosen by the student.
Distractor analysis is a tool to inform whether the
item was well structured or failed to perform its
purpose.
The quality of the distractors influences students'
performance on a test item. Ideally, low-scoring
students who have not mastered the subject should
choose the distractors more often, whereas high
scorers should discard them more frequently while
choosing the correct option.
Interpreting Distractor Analysis
Any distractor that has been selected by fewer than
5% of the students is considered to be a
non-functioning distractor.
Reviewing the options can reveal potential errors of
judgment and inadequate performance of
distractors. These poor distractors can be revised,
replaced or removed.
Internal Consistency Reliability
The reliability of a test refers to the extent to which the
test is likely to produce consistent scores.
The measure of reliability used is Cronbach's Alpha.
This is the general form of the more commonly reported
KR-20 and can be applied to tests composed of items
with different numbers of points given for different
response alternatives.
When coefficient alpha is applied to tests in which each
item has only one correct answer and all correct answers
are worth the same number of points, the resulting
coefficient is identical to KR-20.
High reliability indicates that the items are all measuring
the same thing, or general construct.
The higher the value, the more reliable the overall test
score.
Internal Consistency Reliability
We can estimate the proportion of true score variance that is
captured by the items by comparing the sum of item variances
with the variance of the sum scale. Specifically, we can compute:
α = (k / (k − 1)) × [1 − (Σ s_i²) / s_sum²]

This is the formula for the most common index of reliability,
namely, Cronbach's coefficient alpha (α). In this formula, the
s_i² denote the variances for the k individual items, and s_sum²
denotes the variance for the sum of all items.
If there is no true score but only error in the items (which is
esoteric and unique, and, therefore, uncorrelated across
subjects), then the variance of the sum will be the same as the
sum of variances of the individual items. Therefore, coefficient
alpha will be equal to zero.
If all items are perfectly reliable and measure the same thing
(true score), then coefficient alpha is equal to 1. (Specifically,
1 − (Σ s_i²) / s_sum² becomes equal to (k − 1)/k; multiplying this
by k/(k − 1) gives 1.)
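A minimal Python sketch of coefficient alpha, assuming scores are arranged as one list per item (the function name and data layout are ours):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores)
    sum_item_vars = sum(pvariance(scores) for scores in item_scores)
    totals = [sum(student) for student in zip(*item_scores)]  # each student's total
    return (k / (k - 1)) * (1 - sum_item_vars / pvariance(totals))

# Hypothetical 0/1 scores for 3 items answered by 5 students.
items = [[1, 0, 1, 1, 0],
         [1, 0, 1, 0, 0],
         [1, 1, 1, 1, 0]]
print(round(cronbach_alpha(items), 3))  # about 0.79 for this made-up data
```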
Internal Consistency Reliability
Cronbach’s Alpha   Internal Consistency (Reliability)
α ≥ 0.90           Excellent
0.80 ≤ α < 0.90    Very good
0.70 ≤ α < 0.80    Good (there are probably a few items which could be improved)
0.60 ≤ α < 0.70    Acceptable (there are probably some items which could be improved)
0.50 ≤ α < 0.60    Poor (suggests need for revision of the test)
α < 0.50           Questionable/Unacceptable (this test should not contribute heavily to the course grade, and it needs revision)
