Escolar Documentos
Profissional Documentos
Cultura Documentos
ITEM
RESPONSE THEORY
An evaluation of the theory test in the
Swedish driving-license test
EM No 50, 2004
ISSN 1103-2685
Marie Wiberg
Abstract
The Swedish driving-license test consists of a theory
test and a practical road test. The aim of this paper is
to evaluate which Item Response Theory (IRT) model
among the one (1PL), two (2PL) and three (3PL)
parameter logistic IRT models that is the most suitable
to use when evaluating the theory test in the Swedish
driving-license test. Further, to compare the chosen
IRT model with the indices in Classical Test Theory
(CTT). The theory test has 65 multiple-choice items and
is criterionreferenced. The evaluation of the models
were made by verifying the assumptions that IRT
models rely on, examining the expected model features
and evaluating how well the models predict actual test
results. The overall conclusion from this evaluation is
that 3PL model is preferable to use when evaluating
the theory test. By comparing the indices from CTT and
IRT it was concluded that both give valuable
information and should be included in an analysis of
the theory test in the Swedish driving-license test.
INDEX
INTRODUCTION
AIM
METHOD: SAMPLE
METHOD: CLASSICAL TEST THEORY
METHOD: ITEM RESPONSE THEORY
1. Verifying the assumptions of the model
A. Unidimensionality
B. Equal discrimination
C. Possibility of guessing the correct answer
2. Expected model features
3. Model predictions of actual test results
Estimation methods
RESULTS: CLASSICAL TEST THEORY
Modeling the items
RESULTS: ITEM RESPONSE THEORY
1. Verifying the assumptions of the model
2. Expected model features
Invariance of ability estimates
Invariance of item parameter estimates
3. Model predictions of actual test results
Goodness of fit
Normal distributed abilities
Test information functions and standard error
Rank of the test-takers
COMPARING CTT AND IRT
DISCUSSION
FURTHER RESEARCH
REFERENCES
2
3
3
4
5
5
5
5
6
6
7
8
10
10
10
12
12
13
16
16
16
17
18
20
22
24
25
Introduction
The
Swedish
National
Road
Administration,
abbreviated here as SNRA, is responsible for the
Swedish
driving-license
examination.
The
drivinglicense test consists of two parts: a practical
road test and a theory test. In this paper the emphasis
is on the theory test. The theory test is a standardized
test that is divided into five content areas and is used
to decide if the test-takers have sufficient theoretical
knowledge to be a safe driver as stated in the
curriculum (VVFS, 1996:168). The theory test is a
criterion-referenced test (Henriksson, Sundstrm, &
Wiberg, 2004) and consists of 65 multiple-choice items,
where each item has 2-6 options, and only one is
correct. The test-taker receives one point for each
correctly answered item. If the test-taker has a score
higher or equal to the cut-off score, 52 (80%), the testtaker passes the test. Each test-taker is also given five
try-out items together randomly distributed among the
regular items, but the score on those are not included
in their final total score.
(VVFS, 1999:32).
A test can be studied from different angles and the
items in the test can be evaluated according to
different theories. Two such theories will be discussed
here; Classical Test Theory (CTT) and Item Response
Theory (irt). CTT was originally the leading framework
for analyzing and developing standardized tests. Since
the beginning of the 1970s IRT has more or less
replaced the role CTT had and is now the major
theoretical framework used in this scientific field
(Crocker & Algina, 1986; Hambleton & Rogers, 1990;
Hambleton, Swaminathan, & Rogers, 1991).
CTT has dominated the area of standardized testing
and is based on the assumption that a test-taker has an
1
Aim
The aim of this study is to examine which IRT model is
the most suitable for use when evaluating the theory
test in the Swedish drivinglicense test. Further, to use
this model to compare the indices generated by
classical test theory and item response theory.
Method: Sample
A sample of 5404 test-takers who took one of the test
versions of the Swedish theory driving license test in
January 2004 was used to evaluate the test results. All
test-takers answered each of the 65 regular items in
the test. Among the test-takers 43.4% were women and
56.4% were men. Their average age was 23.6 years
(range = 18-72 years, s = 7.9 years) and 75% of the
test-takers were between 18 and 25 years.
where
of interest correctly,
the values of the point biserial correlations and the pvalues. However Figure 1 indicates that there is no
specific connection between items that are easy and
items that are difficult.
Table 1. The p-values and the point biserial correlations
(rpbis) for the 65 items*.
Item
1
p
0.88
0.79
0.79
0.62
0.81
0.62
0.63
0.95
0.88
10
0.82
11
0.81
12
0.68
13
0.94
14
0.81
15
0.96
16
0.95
rpbis
0.2
6
0.2
8
0.2
6
0.3
3
0.3
1
0.3
9
0.4
4
0.2
9
0.3
6
0.3
1
0.2
2
0.2
7
0.3
3
0.3
6
0.2
4
0.2
Item
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
p
0.82
rpbis
0.2
6
0.95 0.2
4
0.85 0.2
3
0.82 0.3
2
0.96 0.1
5
0.91 0.3
5
0.87 0.2
9
0.85 0.2
7
0.69 0.4
1
0.90 0.3
3
0.84 0.3
3
0.95 0.2
5
0.61 0.2
4
0.78 0.4
8
0.95 0.2
2
0.97 0.1
11
Item
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
P
0.91
rpbis
0.3
7
0.91 0.3
7
0.63 0.3
5
0.93 0.2
7
0.96 0.2
7
0.70 0.3
1
0.82 0.4
1
0.73 0.3
2
0.53 0.3
4
0.69 0.1
9
0.83 0.2
3
0.26 0.2
3
0.84 0.2
9
0.13 0.0
3
0.78 0.1
3
0.84 0.2
17
0.79
18
0.86
19
0.76
20
0.97
21
0.94
22
0.84
2
0.3
5
0.2
6
0.3
1
0.2
0
0.2
3
0.4
1
39
0.73
40
0.67
41
0.88
42
0.87
43
0.94
44
0.82
8
0.2
8
0.3
7
0.4
1
0.2
3
0.1
2
0.3
3
61
0.81
62
0.85
63
0.59
64
0.47
65
0.52
8
0.1
3
0.3
2
0.3
3
0.2
9
0.2
3
12
13
14
1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65
Component Number
20
10
0
0,00
,06
,13
,19
,25
,31
,38
,44
,50
item discrimination
15
1
2
0
Eigenvalue
16
17
0,0
,6
,8
18
Probability
-3,00
-2,00
ability
2
1
0
1
2
19
-1,00
,00
1,00
2,00
3,00
20
21
22
23
-10
-5
-15
-20
0
10
24
15
-20
-15
-10
-5
10
15
25
,2
,3
,1
00
,
,4
,5
,6
,7
0,00
,10
,20
,30
,40
,50
,60
,70
26
27
28
30
31
Discussion
This study has aimed to describe the theory test in the
Swedish driving license test using both classical test
theory and item response theory with most weight on
the latter. The most important goal has been to
compare the 1PL, 2PL and the 3PL models in order to
find the model which is most suitable for modeling the
items. Further to compare this model results from CTT.
The evaluation using classical test theory showed that
the internal reliability was high; 0.82. Further, the pvalues of the items ranged from 0.13 to 0.97 and the
point biserial correlation from 0.03 to 0.48. One
important conclusion was that the item discrimination
was not similar between items. Another important
result was that if we want to model the items we
should include a guessing parameter in the model since
some test-takers with low ability tend to guess the
correct answer on the most difficult items.
The evaluation using IRT was performed according to
three criteria. The first criterion was verifying the
model assumptions. The factor analysis in Figure 2
supports the assumption of unidimensionality. There
was one dominant factor that explains more of the
variation than the other factors. Even though it would
be better if the first factor would account for more
explained variance this assumption is considered
fulfilled. The item discrimination was concluded to be
unequal among the items which lead to the conclusion
that we should have that parameter in our model. In
other words, the 1PL is less suitable than the 2PL or
the 3PL models. Guessing should probably also be
included in the model since test-takers with low ability
still manage to get some of the most difficult items in
32
,2
0,0
,3
,4
Further research
There are, of course, many different scientific fields
related to licensure tests using both CTT and IRT that
can be studied in the future. For example, if the
c-values
evaluators have access to the test situation it can be
0,0
,1
,2
,3examine
,4
,5
,6
manipulated. Further,
one
can
the
effect of
speediness, i.e. if thetheoretical
estimated
guessing abilities are the same
regarding how much time they have to complete the
test. If the evaluators have access to the items before
the test is given to the test-takers the assumption of
local independence can be more carefully examined
and it can help understanding why an item has a
certain difficulty, discrimination or a certain value of
guessing. Other interesting fields concern what kind of
studies that can be made using IRT with a theory test
in a driving-license test. For example, studying test
35
References
Bechger, T., Gunter, M., Huub, H., & Bguin, A. (2003). Using
classical test theory in combination with item response theory.
Applied psychological measurement, 27(5), 319-334.
Birnbaum, A. (1968). In F. M. Lord & M. R. Novick (Eds.), Statistical
theories of mental test scores. Reading: Addison-Wesley.
Bock, R., & Mislevy, R. (1990). Item analysis and test scoring with
binary logistic models. Mooresville: Scientific Software.
Camilli, G., & Shepard, L. (1994). Methods for identifying biased test
items. Thousand Oaks, CA: Sage publications.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern
test theory. New York: Holt, Rinehart and Winston, Inc.
Hambleton, R. K. (2004). Personal communication. Ume.
Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test
theory and item response theory and their applications to test
development. Educational Measurement: issues and practice,
12(3), 535-556.
Hambleton, R. K., Robin, F., & Xing, D. (2000). Item response
models for the analysis of educational and psychological test
data. In H. Tinsley & S. Brown (Eds.), Handbook of applied
multivariate statistics and modelling. San Diego, CA:
Academic Press.
Hambleton, R. K., & Rogers, J. H. (1990). Using item response
models in educational assessments. In W. Schreiber & K.
Ingenkamp (Eds.), International developments in large-scale
assessment (pp. 155-184). England: NFER-Nelson.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory:
principles and applications. Boston: Kluwer-Nijhoff
Publishing.
Hambleton, R. K., Swaminathan, H., & Rogers, J. H. (1991).
Fundamentals of item response theory. New York: Sage
publications.
36
37
38
Appendix