
Using Speed Measures to Predict Performance in X-Ray Luggage Screening Tasks


Alan Wales, Tobias Halbherr
Center for Adaptive Security Research and Applications (CASRA)
Thurgauerstr. 39
8050 Zurich, Switzerland

Adrian Schwaninger
University of Applied Sciences Northwestern Switzerland
School of Applied Psychology, Institute Human in Complex Systems
4600 Olten, Switzerland
and
Center for Adaptive Security Research and Applications (CASRA)
Thurgauerstr. 39
8050 Zurich, Switzerland

Abstract— The visual inspection of x-ray images of luggage items at airports is a challenging task, where detection rates suffer when threat item complexity increases [1]. The relationships between threat-item types, aspects of image difficulty, and decision time are explored using a combination of Drury's Two-Component Model [2] and Signal Detection Theory [3]. Sixty-seven professional screeners completed a 2048-image battery that manipulated various image-based difficulty factors. A strong linear relationship between hit rate and decision time was found (r² = 0.64), with the hardest images showing a marked increase in decision time and a decrease in hit rate. Search time was relatively stable across the threat categories, whereas decision time increased in proportion to the decrease in detection rate. Decision time is shown to closely reflect changes in detection sensitivity caused by different threat types and levels of image difficulty.

Keywords- X-ray image interpretation; speed; accuracy; decision time; search time

I. INTRODUCTION

Response speed and performance in detecting prohibited items in x-ray images of luggage are intertwined: visual search studies have shown that systematically manipulating the allowed detection time changes response accuracy [4]. The more difficult the visual search task for a given item, the longer subjects take to make a judgement, compensating for the difficulty. This property of search tasks makes decision time a promising predictor of both item difficulty and detection performance (note that the term detection performance sometimes refers to hit rate and sometimes to sensitivity as defined in signal detection theory [3, 5]).

More recently, predictions of detection performance have been automated using image-based factor (IBF) estimations [6]. These IBFs are based on previous work [7] that identified three major determinants of image difficulty: threat item view difficulty, superposition and bag complexity. The first IBF, view difficulty, relates to the rotation of the target item away from its canonical view, with certain rotations making the object more difficult to recognise as a threat. Superposition refers to the extent to which the threat item is occluded by other objects in the bag. Finally, bag complexity refers to the amount of noise, clutter and general 'disorder' in a bag: high bag complexity should make threat items harder to detect because background noise is difficult to dissociate from parts of potential threats.

Because more difficult bags are given longer consideration to increase the probability of a correct response, we hypothesised that decision time could be a predictor of detection performance. Speed-accuracy operating curves [8, 9] have long shown that humans slow down as task difficulty increases. If image difficulty is manipulated, subjects take longer to make decisions on the harder images but do not necessarily become as accurate as on the easier images. When easy items are shown, subjects tend to indicate "threat present" straight away. Difficult items, however, require more cognitive effort, producing indecision and degraded detection performance. Only when reaction times are varied within the same image are longer reaction times bound to produce more accurate performance. Between-image variance in decision time could therefore be used as a predictor of detection performance. Within-image predictions of performance require a fixed-time experimental procedure known as a response signal experiment [4], whereas our study allowed subjects as long as they required to respond to each stimulus and was therefore an unconstrained task.

There are considerable difficulties in combining speed and detection into a single model [10]. A popular model from the human factors domain, Drury's Two-Component Model (TCM; [2]), splits the speed measures into a search and a decision component in the form of a speed-accuracy operating curve [11]. This allows detection performance to be compared with decision time, which can be viewed as a search-corrected reaction time. The time taken to search for an item should be fairly stable within subjects, but different subjects are likely to use different strategies and therefore to allocate different amounts of time to search and decision. The formulas and the application of the model are described in [12].
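To make the search/decision split concrete, the following is a deliberately simplified Python sketch; it does not implement the TCM formulas of [12]. It assumes, for illustration only, that search time (ST) can be approximated by a low nearest-rank quantile of one screener's reaction times within one condition (since ST is driven by the fastest responses), and that decision time (DT) is the mean reaction time minus ST. The function name `split_reaction_time` and its default quantile are hypothetical.

```python
import math
from statistics import fmean


def split_reaction_time(rts, search_quantile=0.05):
    """Split mean reaction time into search (ST) and decision (DT) parts.

    Hypothetical simplification, NOT the TCM fit described in [12]:
    ST is taken as a low nearest-rank quantile of the reaction times,
    and DT is the mean reaction time minus ST.
    """
    if not rts:
        raise ValueError("need at least one reaction time")
    if not 0.0 < search_quantile <= 1.0:
        raise ValueError("search_quantile must be in (0, 1]")
    ordered = sorted(rts)
    # 0-based nearest-rank index of the chosen low quantile.
    k = max(0, math.ceil(search_quantile * len(ordered)) - 1)
    st = ordered[k]
    dt = fmean(ordered) - st  # search-corrected reaction time
    return st, dt


# Hypothetical reaction times (seconds) for one screener in one condition.
st, dt = split_reaction_time([1.0, 2.0, 3.0, 4.0, 5.0], search_quantile=0.2)
print(st, dt)  # 1.0 2.0
```

Because ST in this sketch is tied to the fastest responses, adding a few slow, deliberative responses to a condition leaves ST largely unchanged while DT absorbs the extra time, which is the qualitative pattern the TCM decomposition is used to expose.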

978-1-4244-4170-9/09/$25.00 ©2009 IEEE 212


In summary, the harder settings of the three IBFs should not only lower detection performance but should also take longer to evaluate because of increased decision time. Additionally, the threat types themselves should provide a marker for decision time, as the simplest and most visually salient threats, i.e., guns, should take less time to evaluate than more difficult multi-component threats such as bombs. The interactions between the IBFs, the threat types, detection performance and decision time can also be evaluated using these counterbalanced measures.

II. METHOD

A. Participants
Sixty-seven professional x-ray screeners from a large European airport participated in the study as part of their routine training process. Because each subject was exposed to a large item set (2048 images), completing the task could take several hours and was therefore spread over multiple sessions. To control for training effects, training was suspended until the entire battery of images had been completed, after which training resumed as normal. Because the analyses were within-subjects, the training screeners had received before the task was not recorded; it may, however, partly explain why variances were wider for knowledge-based items (such as IEDs) than for easier items (such as guns).

B. Stimuli and Procedure
The test used was a variant of the Competency Assessment Test (CAT; [1]), which is described in detail in [13]. In total, 2048 x-ray luggage images were presented to participants, of which 1024 contained a threat item (Ti) and the other 1024 did not (NTi). The threat items were guns, knives, improvised explosive devices (IEDs) or 'other' miscellaneous threats, such as gas sprays, that do not fit into the other three categories. There were sixteen threat items per category, giving a total of 64 unique threat items. These threats were semi-automatically placed into positions that set each of the aforementioned IBFs (view difficulty, bag complexity and superposition) to either "high" or "low". In addition, bags were either large or small (for details see [13]). All IBFs were counterbalanced with each other, resulting in the 1024 threat images (4 threat categories × 2 view difficulties × 2 bag complexities × 2 superpositions × 2 bag sizes × 16 threat items). For the 1024 NTi bags, only bag size and bag complexity varied between images, since view difficulty and superposition are properties of the threat item. Only data from screeners who had completed the entire 2048-image battery were used for analysis. Because of airport regulations and data protection, no absolute detection values can be reported.

C. Measures
Detection performance is measured using the probability of a hit (pHit) as well as the sensitivity measure d′ (see [14]), which is the standardised hit rate minus the standardised false alarm rate. While d′ also takes incorrect responses into account (false alarms on NTi images), the probability of a hit can be argued to be more relevant to actual screener performance [7]. The bias measure c can be used to ascertain whether participants are inflating their hit rate by responding "threat present" liberally even when no threat is present; under these circumstances pHit is contraindicated as a measure.

An estimate of search and decision time is provided by the TCM, which offers a framework for comparing detection performance with the portions of reaction time attributable to searching and to deciding when evaluating each image. This model was applied to each subject for each combination of threat type and IBF, resulting in over 800 TCM fits. The model consists of individualised linear correlations, using a formula that relates the probability of a correct response to the associated reaction times for each IBF. It yields an estimate of search time (ST), which is influenced by the very fastest reaction times, and of decision time (DT), which is the mean reaction time minus this search time. Decision time can therefore be viewed as a search-corrected reaction time.

III. RESULTS
To model the data for each individual, the bag size factor was collapsed, because splitting each IBF by threat type left only 32 items (16 threats) per bag size, too few for reliable modelling. Therefore 64 items (32 threats), balanced across bag sizes, were analysed per condition. Table 1 lists the conditions; its condition numbers are those used in the Bonferroni-corrected pairwise comparisons (computed with the repeated-measures ANOVA procedure of SPSS). The "All Low" and "All High" conditions are 1 and 8 respectively, while the main effects of bag complexity (BC), superposition (SP) and view difficulty (VD) are conditions 2, 4 and 6.

Table 1. Summary of the conditions and the associated condition numbers (#) used in the pairwise comparisons.

#   Bag Complexity   Superposition   View Difficulty
1   Low              Low             Low
2   High             Low             Low
3   High             High            Low
4   Low              High            Low
5   Low              High            High
6   Low              Low             High
7   High             Low             High
8   High             High            High

A three-way within-subjects ANOVA was computed for each threat type. For guns, there were significant main effects of SP (F(1,66) = 66.91, p < 0.01) and VD (F(1,66) = 176.98, p < 0.01) but not of BC (F(1,66) = 1.67, p > 0.05). For knives, detection differed significantly for BC (F(1,66) = 138.36, p < 0.01), VD (F(1,66) = 573.16, p < 0.01) and SP (F(1,66) = 343.90, p < 0.01), and for IEDs it also differed significantly for BC (F(1,66) = 9.97, p < 0.01), VD (F(1,66) = 28.91, p < 0.01) and SP (F(1,66) = 173.96, p < 0.01). The 'other' category again showed main effects of BC (F(1,66) = 10.73, p < 0.01), VD (F(1,66) = 253.57, p < 0.01) and SP (F(1,66) = 32.98, p < 0.01). Of all the main effects, then, only bag complexity for guns did not differ between the high and low values given in the counterbalanced Latin-square design.
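The sensitivity and bias measures described under C. Measures can be computed from response counts as in the following Python sketch. It uses the standard definitions from [14], d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H and F are the hit and false alarm rates and z is the inverse of the standard normal CDF. The clamping of rates of exactly 0 or 1 to 1/(2N) and 1 − 1/(2N) is an assumption added here (one common convention); the paper does not state which correction, if any, was used. The function names are hypothetical.

```python
from statistics import NormalDist

# Inverse of the standard normal cumulative distribution function.
z = NormalDist().inv_cdf


def bounded_rate(count, n):
    """Proportion count/n clamped to [1/(2n), 1 - 1/(2n)] so z() stays finite.

    The 1/(2n) rule is an assumed convention, not taken from the paper.
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    p = count / n
    return min(max(p, 1 / (2 * n)), 1 - 1 / (2 * n))


def sdt_measures(hits, n_threat, false_alarms, n_no_threat):
    """Return (pHit, d', c) for one screener in one condition."""
    h = bounded_rate(hits, n_threat)
    f = bounded_rate(false_alarms, n_no_threat)
    d_prime = z(h) - z(f)
    c = -(z(h) + z(f)) / 2  # negative c: liberal "threat present" responding
    return h, d_prime, c


# Hypothetical counts: 32 threat and 32 no-threat images in one condition.
p_hit, d_prime, c = sdt_measures(hits=27, n_threat=32, false_alarms=5, n_no_threat=32)
print(round(p_hit, 3), round(d_prime, 2), round(c, 2))
```

A screener who raises the hit rate only by answering "threat present" more often also raises the false alarm rate, which shows up as a more negative c while d′ changes little; this is the situation in which the Measures section treats pHit as contraindicated.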
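The counts in the design can be checked with a short enumeration: the fully crossed threat-image design, the eight IBF conditions of Table 1 once bag size is collapsed, and the number of pairwise comparisons among those conditions. The Bonferroni step shown (multiplying each p value by the number of comparisons m, capped at 1, which is equivalent to testing at α/m) is the textbook correction; we assume, but the paper does not state, that it was applied over all pairs of the eight conditions within each threat type. All identifiers are illustrative.

```python
from itertools import combinations, product

THREAT_CATEGORIES = ("guns", "knives", "IEDs", "other")
LEVELS = ("low", "high")
BAG_SIZES = ("small", "large")
ITEMS_PER_CATEGORY = 16

# Fully crossed threat-image design: category x VD x BC x SP x bag size x item.
threat_images = list(product(THREAT_CATEGORIES, LEVELS, LEVELS, LEVELS,
                             BAG_SIZES, range(ITEMS_PER_CATEGORY)))

# IBF conditions once bag size is collapsed: (BC, SP, VD) as in Table 1,
# although Table 1 numbers them in a different order.
ibf_conditions = list(product(LEVELS, repeat=3))

# Threat images available per threat type and IBF condition.
threats_per_condition = len(threat_images) // (len(THREAT_CATEGORIES) * len(ibf_conditions))

# Unordered pairs of the eight conditions, compared within each threat type.
condition_pairs = list(combinations(range(1, len(ibf_conditions) + 1), 2))


def bonferroni(p_value, m):
    """Bonferroni-adjusted p value for m comparisons (capped at 1)."""
    return min(1.0, p_value * m)


print(len(threat_images), len(ibf_conditions), threats_per_condition, len(condition_pairs))
# 1024 8 32 28
print(bonferroni(0.001, len(condition_pairs)))
```

The 32 threat images per threat type and condition match the 32 threats per condition reported in the Results once bag size is collapsed.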
Follow-up pairwise comparisons were computed separately for each threat type as a univariate ANOVA, resulting in 28 comparisons per threat type. The repeated factor was IBF condition, with the eight levels given in Table 1. d′ and pHit gave closely similar results, but because pHit provided a better linear approximation to DT (Figure 1), only pHit is shown in Table 2. The large number of significant results in Table 2 matches the visual representation of these data in Figure 2. The results in Table 2 confirm that the IBFs each capture separate levels of performance, given that the values tended to differ from each other in significant and systematic ways. In addition to the main effects, the mixed conditions 2, 3, 5 and 7 also tended to differ from each other in ways that are not shown in Figure 2, although conditions 5-6, 5-7 and 6-7 tended not to differ from each other.

Table 2. Bonferroni-corrected pairwise comparisons of pHit for each threat type and condition (a).

Guns     1    2    3    4    5    6    7
2        ns
3        ++   ++
4        ++   ++   ns
5        ++   ++   ++   ++
6        ++   ++   ++   ++   ns
7        ++   ++   ++   ++   ns   ns
8        ++   ++   ++   ++   ns   ns   ns

Knives   1    2    3    4    5    6    7
2        ++
3        ++   ++
4        ++   ++   ++
5        ++   ++   ++   ++
6        ++   ++   ++   ++   --
7        ++   ++   ++   ++   ns   ++
8        ++   ++   ++   ++   ++   ++   ++

IEDs     1    2    3    4    5    6    7
2        ++
3        ++   ++
4        ++   ++   ns
5        ++   ns   --   -
6        ++   ns   --   --   --
7        ns   --   --   --   --   ns
8        ++   +    --   --   ns   ++   ++

Other    1    2    3    4    5    6    7
2        ns
3        ++   ++
4        ++   ++   ns
5        ++   ++   ++   ++
6        ++   ++   ++   ++   ns
7        ++   ++   ++   ++   ns   ns
8        ++   ++   ++   ++   ns   +    ns

a. Example: if pHit increased significantly (p < 0.01) between conditions 2 and 1 of Table 1, Table 2 would read ++, indicating that threat detection in condition 2 was significantly higher than in condition 1. + = increase, p < 0.05; ++ = increase, p < 0.01; - = decrease, p < 0.05; -- = decrease, p < 0.01; ns = not significant.

The main effects of each condition on pHit, ST and DT for each of the eight IBF manipulations are shown graphically in Figures 1 and 2. Figure 1 shows a negative linear relationship between DT and pHit (r = -0.804, p < 0.05), split by the threat types that account for part of the overall image difficulty. Figure 2 shows these probabilities of a hit and the associated decision times, split by the main effects of the IBFs. For conciseness, the easiest ("All Low") and hardest ("All High") conditions are given together with the individual IBFs. There is a clear relationship between DT, which increases with image difficulty, and pHit for each threat type. These results support the view that superposition, when the other IBFs are in their 'low' conditions, is the IBF with the strongest influence on detection performance, followed by view difficulty and then bag complexity. Correlations were used to predict detection performance, giving r(32) = -0.78 for d′ and DT, r(32) = -0.54 for d′ and ST, r(32) = -0.80 for pHit and DT, and r(32) = -0.54 for pHit and ST.

Figure 1. Scatterplot of decision time against the probability of a hit. There is a clear linear trend between the two dependent variables, mediated by the type of threat presented on screen. Decision times for guns are markedly shorter than for the other threat items.
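The correlation analysis above, and the r² = 0.64 given in the abstract (the square of r = -0.80), can be sketched with a plain Pearson correlation plus an ordinary least-squares line, which is one way DT could serve as a linear predictor of pHit. Because the paper reports no absolute detection values, the data below are invented purely to exercise the code, and the function names are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the OLS line predicting ys from xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented per-condition means (NOT the study's data): DT in seconds, pHit.
dt = [1.2, 1.8, 2.1, 2.9, 3.4]
p_hit = [0.95, 0.88, 0.90, 0.74, 0.70]

r = pearson_r(dt, p_hit)
slope, intercept = least_squares_line(dt, p_hit)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, predicted pHit at DT = 2.5 s: {intercept + slope * 2.5:.2f}")

# The reported r = -0.80 corresponds to r^2 = 0.64 (64% shared variance).
print(round((-0.80) ** 2, 2))  # 0.64
```

Fitted on condition means (4 threat types × 8 conditions in the study), such a line summarises between-image differences only; as the Introduction notes, within-image speed-accuracy trade-offs would require a response signal design.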
Figure 2. The top panel shows the probability of a hit (pHit) split by the main IBFs, while the bottom panel shows the associated ST and DT for these values. As the probability of a hit decreases, decision time increases. Notably, search time remains fairly constant across the conditions. BC = Bag Complexity, VD = View Difficulty, Sup = Superposition.

IV. DISCUSSION

The results in Table 2 confirmed that there were significant differences in detection performance between the IBF conditions in each threat category. These differences usually took the form of higher detection performance in the easier condition (for example, "all low" compared with high bag complexity). The exception was IEDs, which showed the converse for some of the interaction conditions: the combination of high bag complexity and high view difficulty produced significantly worse detection performance than the high superposition and high view difficulty conditions alone. These differences in detection performance are shown in Figure 2, where the probability of a hit decreases with successive loadings of IBFs between the two poles of difficulty ("all low" to "all high").

Although it was not hypothesised that any individual IBF would be more or less influential than the others, Table 2 and Figure 2 suggest that this sample of 67 participants found high levels of superposition more challenging than high levels of view difficulty or bag complexity for the BST. Not only was detection performance worse, but the time taken to think about the threats was also longer for superposition than for bag complexity, though not than for view difficulty. The "all high" condition showed the longest decision times, followed by view difficulty.

The scatterplot in Figure 1 showed a decreasing linear relationship between the probability of a hit and the decision component of reaction time. The easiest threat items, guns, exhibited a decision time only marginally longer than the search time (Figure 2). IEDs departed interestingly from the main trend: they showed a similar decreasing linear relationship between decision time and hit rate, but appear as a separate group because of their longer decision times per IBF compared with the other threat items. This may be due to the greater dependence of IED detection on knowledge-based factors [13]. Knives showed the widest spread of detection rates and decision times between the easiest and hardest conditions. Using the hit rate produced a small improvement in model fit compared with d′ (r = -0.80 versus r = -0.78), but it should be noted that the false alarm rate can only reflect the bag complexity IBF, since the NTi images varied only in bag size and bag complexity.

The linear relationship between detection performance and decision time is encouragingly robust to threat type and to manipulations of image difficulty. The search-time component of reaction time remained fairly stable across the threat types, while the decision component changed in magnitude. We therefore conclude that decision time is an excellent predictor of performance in x-ray luggage screening tasks and reflects changes in detection performance caused by different threat categories and levels of image difficulty.

REFERENCES
[1] Koller, S., & Schwaninger, A., "Assessing X-ray image interpretation competency of airport security screeners," Proceedings of the 2nd International Conference on Research in Air Transportation, ICRAT 2006, Belgrade, Serbia and Montenegro, June 24-28, 2006, pp. 399-402.
[2] Drury, C.G., "The speed-accuracy tradeoff in industry," Ergonomics, 37(4), 1994, pp. 747-763.
[3] Green, D.M., & Swets, J.A., "Signal detection theory and psychophysics," New York: Wiley, 1966.
[4] Ratcliff, R., "Modeling response signal and response time data," Cognitive Psychology, 53, 2006, pp. 195-237.
[5] Swets, J.A., "Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers," Lawrence Erlbaum Associates, 1995.
[6] Schwaninger, A., Michel, S., & Bolfing, A., "Towards a model for estimating image difficulty in x-ray screening," IEEE ICCST Proceedings, 39, 2005, pp. 185-188.
[7] Schwaninger, A., Michel, S., & Bolfing, A., "A statistical approach for image difficulty estimation in x-ray screening using image measurements," ACM International Conference Proceeding Series, 253, 2007, pp. 123-130.
[8] Pew, R.W., "The speed-accuracy operating characteristic," Acta Psychologica, 20, 1969, pp. 16-26.
[9] Wickelgren, W.A., "Speed-accuracy tradeoff and information processing dynamics," Acta Psychologica, 41, 1977, pp. 67-85.
[10] Ratcliff, R., & Rouder, J.N., "Modeling response times for two-choice decisions," Psychological Science, 9, 1998, pp. 347-356.
[11] Spitz, G., & Drury, C.G., "Inspection of sheet materials – test of model predictions," Human Factors, 20(5), 1978, pp. 521-528.
[12] Ghylin, K.M., Schwaninger, A., Drury, C.G., Redford, J., Lin, L., & Batta, R., "Screening enhancements: Why don't they enhance performance?" Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, New York City, NY, USA, September 22-26, 2008.
[13] Bolfing, A., Halbherr, T., & Schwaninger, A., "How image based factors and human factors contribute to threat detection performance in x-ray aviation security screening," HCI and Usability for Education and Work, Lecture Notes in Computer Science, 5298, 2008, pp. 419-438.
[14] Macmillan, N.A., & Creelman, C.D., "Detection theory: A user's guide," Cambridge: Cambridge University Press, 1991.
