Generation
firstname.lastname@list.lu
Abstract: Test authors can generate test items (semi-)automatically with different approaches. On the one hand, bottom-up approaches consist of generating items from sources such as texts or domain models. However, relating the generated items to competence models, which define required knowledge and skills on a proficiency scale, remains a challenge. On the other hand, top-down approaches use cognitive models and competence constructs to specify the knowledge and skills to be assessed. Unfortunately, at this high abstraction level it is impossible to identify which item elements can actually be generated automatically. In this paper we present a hybrid process which integrates both approaches. It aims at securing traceability between the specification levels and making it possible to influence item generation during runtime, i.e., after designing all the intermediate models. In the context of the European project EAGLE, we use this process to generate items for information literacy with a focus on text comprehension.
1 Introduction
In the European project EAGLE1 we aim to support the development of assessment items for formative assessment in the context of public administration.
EAGLE's learning approach is based on so-called open educational resources (OER). Such OERs are created by employees of the public administrations for the purpose of workplace learning and knowledge sharing. They can have different formats such as text, video, audio, presentation or interactive content. The EAGLE platform also contains several community tools such as a Wiki and blogs.
We propose a process that supports the generation of items both from the definition of construct maps in the domain of information literacy and from text-based OERs and Wiki pages. We call this process a hybrid AIG approach because it combines a top-down engineering approach, i.e., starting from a competence construct, with a bottom-up approach, i.e., processing the text-based resources available on the EAGLE platform in order to influence the generation of the items at runtime.
Section 2 elaborates on existing research work in the domain of automatic item generation. Section 3 describes the hybrid engineering workflow and the underlying models used in the EAGLE project. Section 4 provides an example of item generation in the context of text comprehension, which is a basic skill in information literacy.
1 http://www.eagle-learning.eu/
as input, linguistic features and syntactical information can be used to influence item
generation. A domain model provides knowledge excerpts such as key concepts, refer-
ences, relationships between two concepts in a clause, etc.
Bottom-up strategies can represent very different approaches with various levels of
automation and various sources to be used for AIG. While Gierl et al. [5] only use
manually created templates and variables generated from the definition of mathematical
equations (e.g., x is an integer between 1 and 10), Karamanis et al. [6] and Moser et al.
[7] generate questions from texts. Foulonneau et al., Linnebank et al., Liu et al. and
Papadopoulos et al. use semantic models in domains of medicine [8-11], history and
environment, and computer science to generate items. Finally, Zoumpatianos et al. [12] suggest generating choice questions from rules expressed in the Semantic Web Rule Language (SWRL).
Karamanis et al. [6] assess the quality of questions obtained through AIG by asking experts to identify the questions which could be delivered without editing. Nevertheless, the quality of the generated items mainly depends on linguistic features [13] and the effectiveness of distractors (i.e., wrong answer options) [8] rather than on how adequately the items measure a construct (i.e., the knowledge or skill that is being assessed). Most of these approaches are triggered by the source rather than by an educational objective or a higher-level competence construct, which describes the knowledge and skills to be tested. Without these specifications it is very difficult to argue about the validity of test-based inferences and to decide about the difficulty level of the generated items.
Today, the top-down and the bottom-up approaches are complementary but unfortunately not integrated. In order to cope with their limitations we propose a hybrid approach which integrates both. On the one hand, specification models on different levels of abstraction allow a step-wise reduction of complexity, good traceability between the different models, and a higher level of reuse. On the other hand, the process leads to a sufficiently detailed specification of item templates for automatic item generation. The use of context-specific information allows the generation of items to be influenced during runtime, i.e., when the generation process is triggered by a user.
The AIG process is embedded in an eAssessment workflow (see Fig. 2) where different actors are involved and where information (e.g., models, specifications, items, tests) is passed from one phase to the other using different tools. In EAGLE, the AIG process is embedded into the workflow of our TAO® platform [14].
Fig. 2. eAssessment workflow: AIG Process → Item Editing & Selection → Composition & Delivery → Analysis & Reporting (producing Test Results)
The first model to be designed is the cognitive model. In the context of the EAGLE project, we used the ACRL framework [21] as a basis to structure the different relevant constructs for information literacy. The ACRL framework is comprehensive and describes information literacy related competences in so-called standards. To limit the scope for this paper we decided to focus on the standard Evaluate information and its sources critically. This corresponds to a typical situation where employees of the public administration process textual information such as internal process descriptions, regulations, law texts or inquiries received from citizens by email.
The cognitive model shown in Fig. 3 was modelled using Business Process Model and Notation (BPMN) [22], a notation widely used for designing business processes. The elements of a BPMN model are well defined and allow specifying flows of cognitive actions. A cognitive model is typically composed of states, processes, and possible paths through the model. Arrows describe the sequence of cognitive processes as they are typically performed when solving a task, in this case understanding and evaluating a text.
Fig. 3. Cognitive model for critically evaluating a text and its sources
As mentioned in the previous section, a cognitive model does not represent different
proficiency levels. Nevertheless, some processes are mandatory to proceed further (e.g.,
1, 2, 3 in Fig. 3) or alternative options are possible (4, 5, 6). The operator “X” denotes an exclusive choice between options, whereas the operator “+” allows cognitive processes to take place in parallel ((Group: 1-2-3-4-5-6), 7, 8). For each proficiency level a construct
map is developed. In this example we elaborate the construct map for the high profi-
ciency level. Each knowledge or skill component is related to at least one cognitive
process and is numbered for traceability.
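The traceability between construct-map components and cognitive processes can be kept machine-checkable. The following sketch illustrates this idea; apart from component K2 and its cognitive process, which are taken from the text, the component and process identifiers are invented examples, not the project's actual data.

```python
# Illustrative traceability check between construct-map components and
# cognitive processes.  Only K2 and process 2 come from the paper; the
# remaining identifiers are hypothetical.

# Cognitive processes as numbered in the cognitive model (cf. Fig. 3).
cognitive_processes = {
    1: "Read the text",
    2: "Recall background knowledge to understand technical vocabulary",
    3: "Identify the main statements",
}

# Each knowledge/skill component of the construct map references
# at least one cognitive process for traceability.
construct_map = {
    "K1": [1],
    "K2": [2],       # K2: knowledge of specific vocabulary
    "S1": [1, 3],
}

def check_traceability(components, processes):
    """Return the components whose referenced processes are all defined."""
    return [c for c, refs in components.items()
            if refs and all(r in processes for r in refs)]

print(check_traceability(construct_map, cognitive_processes))
# every component is traceable: ['K1', 'K2', 'S1']
```

Such a check could later feed the systematic verification of the specifications (validity, completeness, consistency) discussed in the conclusion.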
Next, we present a task model for the component K2, which is linked to the cognitive
process “Recalling background knowledge to understand technical vocabulary”. Each
component has one task model. A task model has one to several task model expressions.
In our case three expressions are formulated:
All expressions are atomic. Vocabulary learning can take place on different levels of
cognition. On the lowest level of remembering a learner can recall a synonym of a
presented term and on the understanding level a learner can match different terms with
a definition by inferring similarities between the terms and the text of a definition.
A single task model expression (TME) can be implemented using different item templates. TME1 is implemented by an MCQ item with several options. TME2 consists of identifying a term from its definition: the public administration employee is invited to choose between multiple options in an MCQ and identify the correct concept to which the definition applies. For TME3 a cloze item template is used.
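The one-to-many relation between task model expressions and item templates can be sketched as a simple lookup; the template identifiers below are invented for illustration, while the TME descriptions follow the text.

```python
# Hypothetical mapping of task model expressions (TMEs) to item
# templates for component K2 (template IDs are invented examples).
tme_to_templates = {
    "TME1": ["mcq_synonym"],          # recall a synonym of a presented term
    "TME2": ["mcq_term_from_def"],    # identify a term from its definition
    "TME3": ["cloze_vocabulary"],     # fill a gap in a sentence
}

def templates_for(tme_ids):
    """Collect all item templates implementing the given expressions."""
    return [t for tme in tme_ids for t in tme_to_templates.get(tme, [])]

print(templates_for(["TME2", "TME3"]))
# ['mcq_term_from_def', 'cloze_vocabulary']
```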
Fig. 5. Item type template MCQ for TME2 compare(term definition, term)
The XML-based item template has different sections. All sections are shown by excerpts from an MCQ item template (see Fig. 5). The metadata section describes the references to the other models and also the item type. The second section contains the body of the item with the elements: stem, key, options, and auxiliary information. The advantage of our item templates is that the specification of the item body is fully QTI v2.1 compliant and the generated items can therefore be imported into any eAssessment platform that can import QTI items. The next section is dedicated to the variables which are used to define the variants of the items. Each variable refers to a process that resolves the variable during runtime, i.e., the variable is replaced with content for this element of the item.
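The template mechanics can be sketched as follows. This is a minimal Python stand-in, not the project's actual XML/QTI schema: the dictionary keys mirror the sections described above, and variables appear as placeholders in the stem and options until they are resolved at runtime.

```python
import re

# Minimal stand-in for an item template: metadata referencing the other
# models, an item body whose stem and options contain variable
# placeholders, and the variable declarations.  The real templates are
# XML/QTI; this dictionary form is purely illustrative.
template = {
    "metadata": {"construct": "K2", "task_model": "TME2", "item_type": "MCQ"},
    "body": {
        "stem": "Which term matches the definition: {termDefinition}?",
        "options": ["{key}", "{distractor1}", "{distractor2}"],
    },
    "variables": ["termDefinition", "key", "distractor1", "distractor2"],
}

def unresolved_variables(template):
    """List every {placeholder} still present in the item body."""
    text = template["body"]["stem"] + " ".join(template["body"]["options"])
    return re.findall(r"\{(\w+)\}", text)

# Before resolution, all declared variables are still placeholders.
print(sorted(unresolved_variables(template)))
# ['distractor1', 'distractor2', 'key', 'termDefinition']
```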
All resolving processes are defined in the last section of the template. In this example
the variable termDefinition used in the prompt (i.e., stem) is retrieved by executing a
semantic query to a domain model, which is also determined in the template. In
EAGLE, this domain model is extracted automatically from a resource, in this case a
text-based OER.
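A resolving process of this kind can be sketched as a query against an extracted domain model. The triple store and query function below are simplified stand-ins under the assumption of subject-predicate-object triples; the triple contents and the hasDefinition predicate are invented, not the project's actual extraction output.

```python
# Sketch of variable resolution at runtime: a template variable is bound
# by querying a domain model extracted from a text-based OER.

# Domain model as subject-predicate-object triples (hypothetical content).
domain_model = [
    ("change management", "hasDefinition",
     "a structured approach to transitioning an organisation"),
    ("stakeholder", "hasDefinition",
     "a person with an interest in an organisational change"),
]

def resolve(variable, term, model):
    """Resolve a template variable by querying the domain model."""
    if variable == "termDefinition":
        for subject, predicate, obj in model:
            if subject == term and predicate == "hasDefinition":
                return obj
    return None  # resolution failed: the item cannot be generated

print(resolve("termDefinition", "stakeholder", domain_model))
# 'a person with an interest in an organisational change'
```

Returning None mirrors the limitation discussed later: if the input resource does not contain the information needed to resolve a variable, no item is generated.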
After resolving all the variables, each item is stored in the TAO® platform. In EAGLE the test designer can either manually select items and compose a test or let the AIG process automatically compose a test based on specific criteria: the user can, for example, select from three difficulty levels and also define which types of items should be used in the test.
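Criteria-based composition amounts to filtering the item bank; the sketch below illustrates this with an invented bank and field names (the real platform stores richer item records).

```python
# Sketch of criteria-based test composition: filter generated items by
# difficulty level and allowed item types.  Items and fields are
# hypothetical examples.
item_bank = [
    {"id": "i1", "type": "MCQ",       "difficulty": "low"},
    {"id": "i2", "type": "cloze",     "difficulty": "medium"},
    {"id": "i3", "type": "MCQ",       "difficulty": "medium"},
    {"id": "i4", "type": "gap match", "difficulty": "high"},
]

def compose_test(bank, difficulty, allowed_types):
    """Select all items matching the requested criteria."""
    return [item["id"] for item in bank
            if item["difficulty"] == difficulty
            and item["type"] in allowed_types]

print(compose_test(item_bank, "medium", {"MCQ", "cloze"}))
# ['i2', 'i3']
```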
The following figure shows one page of a test composed of an MCQ, cloze, and gap
match item for the construct knowledge of specific vocabulary. The items were gener-
ated from OERs of the EAGLE platform on the topic of change management in the
public administration.
Fig. 6. Screenshot of generated items in TAO®
In this paper we have described the AIG process as well as its implementation, inspired by the frameworks proposed by Luecht and Gierl et al. [1], in order to support
formative assessment for public administrations. The different models have been cre-
ated for EAGLE on the different levels. We have proposed notations as well as tem-
plates to specify the models on the different abstraction levels. These abstraction levels
reduce the complexity for item designers and hence bridge the gap between a high level
cognitive model and the fine grained item template specifications. The example demon-
strates that the traceability between the generated items and the constructs is possible.
Unfortunately, most AIG approaches do not cover all types of items but mainly focus on generating multiple choice questions (MCQ). The use of variables in multiple item templates and the fact that all templates are QTI compliant do not limit our approach to specific item types. The limits reside in the process of resolving the variables defined in the template. If the resource that is used as input does not contain the information to resolve a variable, the item cannot be generated. Therefore, AIG strongly depends on the domain model or the knowledge excerpts that are extracted from a given resource (e.g., text, Wiki page, OER).
Specifying the models for AIG requires an interaction between the domain experts/instructors and the item bank creator to ensure that the final item bank will contain appropriate items and that the tests will be valid. It is important to decrease the effort necessary for the creation of item templates because a complex task model can lead to many item templates. This can be addressed by increasing the level of automation of the domain model extraction with semantic annotation mechanisms (e.g., [20]) and authoring guidelines (e.g., editing models such as the Text Encoding Initiative [23], Wiki infobox templates [21], and semantic Wikis), as well as by the semi-automatic creation of templates from the ontology based on ontology patterns.
Another current drawback is that we have not yet integrated the modelling of the cognitive model, construct map and task model into an IT-supported workflow. Therefore, in the future we will also choose appropriate tools which can exchange machine-readable versions of the models. This will in addition allow a systematic verification of the specifications, i.e., validity, completeness, consistency, etc.
A major challenge is to evaluate the quality of the generated items and also the AIG-
related modelling process. Therefore, we aim to conduct a comparative case study
where test design experts either follow the AIG process as elaborated in this paper or
where they follow the traditional process of item design. In addition, we will derive
quality indicators from the elements of the items (e.g. semantic similarity between dis-
tractor and key) and characteristics of the resources (e.g., reading complexity of a text)
in order to develop an automatic item features analysis tool. The tool will support the
classification of items in the item bank and hence the composition of suitable tests for
the different proficiency levels and learning needs. This will lead to an extension of the
model chain with a quality dimension, i.e., while we define the other models, we will
infer quality related information to build a quality model. Different quality mechanisms
and algorithms will be developed to derive psychometric characteristics for the items.
These quality mechanisms will certainly not replace the manual validation process performed by humans. In future experiments, the generated items should be compared with “traditional” items to evaluate the quality of our automatically generated items.
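One of the quality indicators mentioned above, the semantic similarity between a distractor and the key, can be approximated very simply; the sketch below uses a Jaccard index over word tokens as a lexical stand-in for a semantic measure, which is an assumption on our part rather than the project's actual metric.

```python
# Sketch of one item-quality indicator: lexical similarity between a
# distractor and the key, approximated with a Jaccard index over word
# tokens.  A stand-in for a true semantic similarity measure; the
# example answer options are invented.

def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index of the word sets of two answer options."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

key = "structured approach to organisational change"
distractor = "structured approach to software testing"
print(round(jaccard_similarity(key, distractor), 2))
# 0.43  (3 shared tokens out of 7 distinct tokens)
```

A distractor too similar to the key may be ambiguous, while one too dissimilar may be implausible; thresholds on such scores could support the classification of items in the item bank.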
Finally, the implemented feedback mechanism is simple, in the sense that TAO®
provides only corrective feedback and no comprehensive feedback (e.g., prompt, hint,
tutorial feedback, etc.). We have started the design of a feedback system for AIG based
on incremental user models which are derived from users’ behavior in tests [24].
Acknowledgements
This work was carried out in the context of the EAGLE project, funded by the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement N° 619347.
References
1. Gierl, M.J., Haladyna, T.M.: Automatic Item Generation - Theory and Practice. Routledge,
New York (2013)
2. Luecht, R.M.: An introduction to assessment engineering for automatic item generation. In:
Gierl, M.J., Haladyna, T.M. (eds.): Automatic Item Generation. Routledge, New York, NY
(2013)
3. Gierl, M.J., Lai, H.: Using weak and strong theory to create item models for automatic item
generation. In: Gierl, M.J., Haladyna, T.M. (eds.): Automatic Item Generation. Routledge,
New York, NY (2013)
4. Gierl, M.J., Lai, H.: Generating items under the assessment engineering framework. In: Gierl,
M.J., Haladyna, T.M. (eds.): Automatic Item Generation. Routledge, New York, NY (2013)
5. Gierl, M.J., Lai, H.: Methods for creating and evaluating the item model structure used in
automatic item generation. Centre for Research in Applied Measurement and Evaluation,
University of Alberta, Alberta (2012) 1-30
6. Karamanis, N., Ha, L.A., Mitkov, R.: Generating multiple-choice test items from medical
text: A pilot study. Fourth International Conference Natural Language Generation Sydney,
Australia (2006) 111-113
7. Moser, J.R., Gütl, C., Liu, W.: Refined distractor generation with LSA and stylometry for
automated multiple choice question generation. In: Thielscher, M., Zhang, D. (eds.): 25th Australasian Joint Conference - Advances in Artificial Intelligence (AI 2012), Vol. LNCS 7691. Springer, Sydney, Australia (2012) 95-106
8. Foulonneau, M., Grouès, G.: Common vs. expert knowledge: Making the Semantic Web an
educational model. 2nd International Workshop on Learning and Education with the Web of
Data (LiLe-2012 at WWW-2012 conference), Vol. 840, Lyon, France (2012)
9. Linnebank, F., Liem, J., Bredeweg, B.: Question generation and answering. DynaLearn,
Deliverable D3.3, EC FP7 STREP project 231526. (2010)
10. Liu, B.: SARAC: A Framework for Automatic Item Generation. Ninth IEEE International
Conference on Advanced Learning Technologies (ICALT2009), Riga, Latvia (2009) 556-
558
11. Papadopoulos, P.M., Demetriadis, S.N., Stamelos, I.G.: The Impact of Prompting in
Technology-Enhanced Learning as Moderated by Students' Motivation and Metacognitive
Skills. In: Cress, U.D.V.S.M. (ed.): 4th European Conference on Technology Enhanced
Learning (EC-TEL 2009), Vol. 5794. Springer, Nice, France (2009) 535-548
12. Zoumpatianos, K., Papasalouros, A., Kotis, K.: Automated transformation of SWRL rules into
multiple-choice questions. FLAIRS Conference 11, Palm Beach, FL, USA (2011) 570-575
13. Mitkov, R., Ha, L.A., Karamanis, N.: A computer-aided environment for generating
multiple-choice test items. Natural Language Engineering 12 (2006) 177-194
14. TAO platform. Retrieved Dec 16, 2016, from https://www.taotesting.com/
15. Foulonneau, M., Ras, E.: Automatic Item Generation - New prospectives using open
educational resources and the semantic web. International Journal of e-Assessment (IJEA) 1
(2014)
16. Leighton, J.P., Gierl, M.J.: The Learning Sciences in Educational Assessment - The role of
Cognitive Models. Cambridge University Press, New York, NY (2011)
17. Wilson, M.: Measuring progressions: Assessment structures underlying a learning
progression. Research in Science Teaching 46 (2009) 716-730
18. Lai, H., Gierl, M.J.: Generating items under the assessment engineering framework. In: Gierl,
M.J., Haladyna, T.M. (eds.): Automatic Item Generation. Routledge, New York, NY (2013)
19. Anderson, L.W., Krathwohl, D.R.: A Taxonomy for Learning, Teaching, and Assessing: a
Revision of Bloom's Taxonomy of Educational Objectives. Longman, New York (2001)
20. Sonnleitner, P.: Using the LLTM to evaluate an item-generating system for reading
comprehension. Psychology Science 50 (2008)
21. ACRL framework (2016). Retrieved Dec 6, 2016, from http://www.ala.org/acrl
22. Object Management Group: Business Process Model and Notation v2 (BPMN). Retrieved Dec 19, 2016, from http://www.bpmn.org/
23. Text Encoding Initiative (TEI): (2016)
24. Höhn, S., Ras, E.: Designing formative and adaptive feedback using incremental user models.
International Conference on Web-based Learning (ICWL 2016), Rome, Italy (2016)