
Natural Language Navigation Support in Virtual Reality

Jeroen van Luin, Anton Nijholt, Rieks op den Akker

University of Twente
Centre of Telematics and Information Technology
PO Box 217, 7500 AE Enschede, The Netherlands
{vanluin, anijholt, infrieks}@cs.utwente.nl

ABSTRACT

We describe our work on designing a natural language accessible navigation agent for a virtual reality (VR) environment. The agent is part of an agent framework, which means that it can communicate with other agents. Its navigation task consists of guiding the visitors in the environment and answering questions about this environment (a theatre building). Visitors are invited to explore this building, ask questions and get advice from the navigation agent. A 2D map has been added to the environment so that visitors can make references to the locations and objects on this map, both in natural language and by clicking with the mouse, making it a multimodal system with cross-modality references.

1. INTRODUCTION

We have designed and built a VRML version of a theatre according to the drawings of the architects [3]. In the theatre we have added multiple agents, with the hostess Karin being the main agent. She stands behind an information desk and knows a lot about the performances that take place and the performers that will perform in the real theatre. Visitors can ask Karin questions in natural language, and Karin will access the performance and performer databases and try to extract and formulate answers. When the virtual world was made accessible to the audience, the need for another agent emerged to solve problems like: "To whom do we address our questions about the environment itself?" and "To whom do we address our questions about how to continue, where to find other visitors or where to find domain-related information?". We learned from reactions of visitors that they also had problems with navigating through the virtual world. At this moment we are following different approaches to solve these problems. The approaches are related and can be integrated, as all of them are agent-oriented and are oriented towards a common framework of communicating agents. In this paper we describe how we build navigation intelligence into an agent.

2. NAVIGATION

We define navigation as the process in which people control their movements using visual clues in the environment and artificial aids, such as maps, to reach their target without getting lost [4]. Virtual worlds have many of the navigation problems that exist in the real world. However, virtual worlds are often considered more difficult to navigate than real worlds. This is because most virtual worlds are less detailed, use no or different laws of physics, and contain fewer visible clues that can be used as landmarks during navigation. As a result, visitors of virtual environments fail to get a good overview of the environment. A problem we discovered while using the virtual theatre is that moving through the world with the VRML browser's mouse or keyboard interface is difficult and unnatural. Especially moving around objects turned out to be quite a struggle. These problems, causing disorientation and failure to find new or known locations, will lead to unsatisfied and frustrated visitors who stop using the environment.

3. AGENT AND MAP BASED NAVIGATION SUPPORT

As mentioned, the problems can be roughly divided into two categories: loss of overview and failure to navigate. Both problems also exist in navigating through the real world, and many solutions have been tried to solve them. For instance, to solve the problem of loss of overview, we can use a map. To solve the problem of wayfinding in an unfamiliar area, we can use a guide. The combination of a map and a guide makes it possible to point out objects and locations on the map and to refer to them when communicating with the guide.
We have implemented the combination of these two solutions in our virtual theatre to see if they would work in VR as well. The reader is referred to [2] for observations on user preferences for navigation support. Our work is in progress, meaning that the system is there, but no effort has yet been made to add graphic sugar to the layout and the integration of the different windows that are used.

Figure 1: Floor map and agent window

In figure 1 we display the current floor map and the agent window, and in figure 2 a view on part of the virtual world. In this view the user is looking at the stairs going to the next floor of the building. The visitor can ask questions, give commands and provide information when prompted by the agent. This is done by typing natural language utterances and by moving the mouse pointer over the map to locations and objects the user is interested in. On the map the user can find the performance halls, the lounges and bars, selling points, information desks and other interesting locations and objects.¹ The current position of the visitor in the virtual environment is marked on the map as well, allowing the visitor to check his position on the map while moving in VR. When using the mouse to point at a position on the map, both the user and the navigation agent can refer to the object or location pointed at.

¹ Interestingly, one of these objects is a board in the virtual world displaying a map with chair positions in a performance hall. When the visitor clicks on a chair on this VR map, she is teleported to this chair to get a view at the stage.

Figure 2: Visitor has been brought to the stairs
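As a rough illustration of how such a cross-modality reference could be grounded, the following sketch (our own, not taken from the actual system; the object names and map regions are invented) remembers which map object the visitor last pointed at, so that a later deictic word such as "this" or "that" in the typed utterance can be resolved to that object.

    MAP_REGIONS = {                      # object -> (x_min, y_min, x_max, y_max) on the 2D map
        "information desk": (10.0, 2.0, 14.0, 5.0),
        "lounge": (18.0, 6.0, 25.0, 12.0),
    }

    class MapContext:
        def __init__(self):
            self.last_clicked_object = None

        def click(self, x, y):
            """Record which map object, if any, the visitor pointed at."""
            self.last_clicked_object = None
            for name, (x0, y0, x1, y1) in MAP_REGIONS.items():
                if x0 <= x <= x1 and y0 <= y <= y1:
                    self.last_clicked_object = name

        def resolve_deictic(self, word):
            """Ground 'this' or 'that' in an utterance to the last object pointed at."""
            return self.last_clicked_object if word in ("this", "that") else None

    ctx = MapContext()
    ctx.click(12.0, 3.0)                  # the visitor clicks on the information desk
    print(ctx.resolve_deictic("this"))    # -> information desk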
4. ABSTRACT WORLD

Our visitors are able to interpret the objects and locations they see in the virtual world. However, our navigation agent lacks such skills. To enable him to deal with the virtual world, we have made an abstract representation of the theatre. The agent has access to three databases. The first database contains information about physical objects that are present in the theatre, such as their name, location, aliases, adjectives, size, etc. The second database contains information about imaginary objects, i.e. objects without a physical body, such as performances. The third database is a general knowledge base, containing knowledge about relations between objects, such as chairs, tables and cupboards all being furniture. At this moment, the physical objects in the database have been entered by hand. When we have translated our theatre from VRML to Java3D, the database will be filled automatically with the objects in the virtual world.

Figure 3: System layout (visitor, parser with unification grammar, navigation agent, dialogue manager, and the physical objects, imaginary objects and knowledge base databases)
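The sketch below is a minimal illustration of these three knowledge sources. The attributes of a physical object follow the ones listed above (name, location, aliases, adjectives, size); the concrete entries and the storage format are invented for the example and are not those of the actual system.

    from dataclasses import dataclass, field

    @dataclass
    class PhysicalObject:
        # Attributes follow the ones listed in the text: name, location,
        # aliases, adjectives and size.
        name: str
        location: tuple
        aliases: list = field(default_factory=list)
        adjectives: list = field(default_factory=list)
        size: float = 1.0

    # Database 1: physical objects present in the theatre (currently entered by hand).
    physical_objects = [
        PhysicalObject("information desk", (12.0, 3.5), aliases=["desk"]),
        PhysicalObject("stairs", (5.0, 14.0)),
    ]

    # Database 2: imaginary objects, i.e. objects without a physical body,
    # such as performances (the entry itself is invented).
    imaginary_objects = [{"type": "performance", "name": "an evening of dance"}]

    # Database 3: general knowledge base with relations between objects,
    # e.g. chairs, tables and cupboards all being furniture.
    knowledge_base = {"is-a": {"chair": "furniture",
                               "table": "furniture",
                               "cupboard": "furniture"}}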
5. NATURAL LANGUAGE ACCESS

As mentioned, the navigation agent can be accessed using natural language. We have annotated a small corpus of example user utterances that may appear in navigation dialogues. The utterances were chosen based on a set of use cases that were made during the design of the navigation agent. The corpus contains two types of utterances. The first type are full sentences, which contain complete questions and commands. Examples of full sentences are: "What is this?", while the visitor points at an object on the map, or "Is there an entrance for wheelchairs?" or "Bring me to the information desk.". The second type of utterances are short sentences that the visitor can use when reacting to a question or remark from the navigation agent. Examples of short sentences are "No, that one." or "Karin". We use Treebank software to induce a grammar from the annotated corpus [1]. The grammar has unification rules to compose semantics based on the semantics of the underlying non-terminals and the semantics of the terminals, which can be found in a lexicon.

S → NP VP
  <S sem main> = <NP sem>
  <S sem verb> = <VP sem>

NP → Det N
  <NP sem det> = <Det sem>
  <NP sem main> = <N sem>

VP → V
  <VP sem> = <V sem>

Figure 4: Example grammar

Figure 4 shows an example grammar with semantic unification rules for a sentence that consists of a noun phrase (NP) followed by a verb phrase (VP), in which an NP consists of a determiner (Det) and a noun (N). A VP can only have a single verb (V). The Det, N and V are terminals, and their semantics can be found in a (separate) lexicon file. The next section explains how this grammar is used by a unification type parser. Using a corpus based grammar has the advantage that new sentences can easily be added to the system by adding them to the corpus and repeating the induction step.
6. FROM UTTERANCE TO ACTION

When the utterance of the user is received by the navigation agent, the utterance is sent to our unification parser Demosthenes. The parser uses the corpus based grammar to find one or more semantic representations of the utterance. For example, if the sentence "The man whistles" is parsed using the grammar in figure 4, the following representation would be generated:

stype: decl
main:  [ det:  the
         main: man ]
verb:  whistles

Figure 5: Semantic representation

where stype refers to the sentence type, in this case a declarative sentence. This simple example has only one possible parse, but when more complex sentence parts, such as prepositional phrases, are introduced, the number of possible parses will increase rapidly.
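To make the role of the unification rules concrete, the following sketch (our own simplified illustration, not the Demosthenes parser; the lexicon entries are assumed and the sentence type is simply fixed to decl by the S rule) applies the three rules of figure 4 to "The man whistles" and builds the feature structure of figure 5.

    LEXICON = {                 # terminal semantics, normally found in the lexicon file
        "the": "the",           # Det
        "man": "man",           # N
        "whistles": "whistles", # V
    }

    def np_sem(det_word, n_word):
        # NP -> Det N:   <NP sem det> = <Det sem>,   <NP sem main> = <N sem>
        return {"det": LEXICON[det_word], "main": LEXICON[n_word]}

    def vp_sem(v_word):
        # VP -> V:   <VP sem> = <V sem>
        return LEXICON[v_word]

    def s_sem(det_word, n_word, v_word):
        # S -> NP VP:   <S sem main> = <NP sem>,   <S sem verb> = <VP sem>
        return {"stype": "decl",
                "main": np_sem(det_word, n_word),
                "verb": vp_sem(v_word)}

    print(s_sem("the", "man", "whistles"))
    # {'stype': 'decl', 'main': {'det': 'the', 'main': 'man'}, 'verb': 'whistles'}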
If more than one representation has been found, it is possible to use knowledge of the domain to make an educated guess as to which representation represents the utterance best, but this has not been implemented yet. At this moment, if one or more representations are found, the first representation is returned. If no representations are found, the parser returns a representation of the longest part of the utterance that could be parsed. The chosen representation is sent to the dialogue manager, which tries to create an action object based on the information in the representation. Action objects can contain the action to undertake as well as the object(s) and location(s) used in the action. The dialogue manager uses an abstract representation of the virtual world and a knowledge base to match reference words in the utterance to objects and locations in the world. The abstract representation contains the objects and locations from the real world with their specific names, aliases and adjectives. The knowledge base contains general knowledge about objects, like chairs, tables and cupboards all being furniture.

Receive visitor utterance
Analyze input
If analysis contains enough information
    Create action object
    Execute action object
Else
    Try to get the missing information
    If unsuccessful, ask a question

Figure 6: Highest level algorithm
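A compact way to read figure 6 is as the control loop below. This is a toy sketch under our own assumptions: the analysis step is reduced to a keyword match standing in for the Demosthenes parser and the dialogue manager, and the function and slot names are invented.

    action_stack = []      # all action objects created during this visit of the user

    def analyze(utterance):
        """Toy analysis: recognise a 'bring me to <place>' command."""
        words = utterance.lower().rstrip(".?!").split()
        action = {"action": None, "location": None}
        if "bring" in words or "go" in words:
            action["action"] = "bring"
            if "to" in words and words.index("to") + 1 < len(words):
                action["location"] = " ".join(words[words.index("to") + 1:])
        return action

    def execute(action):
        print("Moving the visitor to " + action["location"])

    def handle_utterance(utterance):
        action = analyze(utterance)                    # Analyze input
        if action["action"] and action["location"]:    # enough information?
            execute(action)                            # Create and execute the action object
            action_stack.append(action)
        else:
            # Try to get the missing information; if unsuccessful, ask a question
            print("Where do you want to go to?")

    handle_utterance("Bring me to the information desk")   # executes the action
    handle_utterance("Bring me there")                      # not enough information: asks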
Whenever more than one object would fit the description in the utterance, the dialogue manager automatically assumes that the one closest to the visitor is meant.
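A minimal sketch of this heuristic, with invented object names and coordinates:

    import math

    # Pick the candidate closest to the visitor's current position.
    candidates = {"bar (ground floor)": (4.0, 2.0), "bar (first floor)": (4.0, 18.0)}
    visitor_position = (3.0, 1.0)

    closest = min(candidates,
                  key=lambda name: math.dist(visitor_position, candidates[name]))
    print(closest)    # -> bar (ground floor)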
If the information in the representation is enough to create an action object with all necessary fields filled in, and all references could be resolved, the action object is executed. The object is then stored on the action object stack, which contains all the action objects generated during this particular visit of the user. If the information in the representation is not enough to correctly determine the visitor's wish, three agents start to work together: the navigation agent, the dialogue manager and the CosmoAgent. The latter can talk to the Cosmo VRML browser using an EAI interface to find out information about the location and surroundings of the visitor and to move the visitor through the virtual world.

Together, they try to find out what information is missing and how they can get that information. They first try to use information that is already available, like the history of the dialogue. If that information isn't enough to fill in all the necessary information, the navigation agent will take over the dialogue initiative and the visitor will be asked an intelligent question. The question is intelligent in the sense that the information that could be extracted from the utterance is remembered and used in the creation of the question. For example, if the navigation agent did find out that the visitor wanted to be brought to a certain place, but couldn't find out where to, the question asked will not be "What do you want?" but "Where do you want to go to?".
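As an illustration of this behaviour, the sketch below (our own; the slot names and question texts are invented) chooses the question from the slots that are still empty in a partially filled action object, so that information that was already understood is not asked for again.

    QUESTIONS = {
        "location": "Where do you want to go to?",
        "action": "What do you want?",
    }

    def next_question(action_object):
        """Ask for the first missing slot, keeping the information already found."""
        for slot in ("action", "location"):
            if action_object.get(slot) is None:
                return QUESTIONS[slot]
        return None    # nothing missing: the action object can be executed

    # The agent understood that the visitor wants to be brought somewhere,
    # but not where to:
    partial = {"action": "bring", "location": None}
    print(next_question(partial))      # -> Where do you want to go to?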
The visitor can respond in two ways. First, the visitor can answer with a new command or question in which the missing information is entered or in which the misspelled information is corrected. However, the visitor can also respond with a short phrase, in which he only gives the missing or misspelled information. The dialogue manager will then enter the new information in the remembered action object.

Not yet implemented is the possibility for the CosmoAgent to find out what is in the eyesight of the visitor. This information can be used to allow and disambiguate references to objects that are visible in the VR world.

7. NAVIGATION DIALOGUES

To illustrate the way our multimodal navigation assistance works, we give three examples of dialogues. The first example shows the way the agent responds when confronted with an error in the input. The second example shows the way the multimodality is used, and the third example shows how the visitor can refer to objects mentioned in earlier parts of the dialogue.

1. Visitor: How do I get to the information deks?
   Agent: Where did you want to go?
   Visitor: The desk.

2. Visitor: Can you bring me to the information desk?
   Agent: I have marked the position on the map. Now I will bring you there.
   Visitor: [Clicks on an object on the map] What is this?
   Agent: That is an exposition.

3. Visitor: Where is it?
   Agent: You can find it in the lounge.
   Visitor: Let's go there.
   Agent: I will bring you there.

8. CONCLUSIONS AND FUTURE RESEARCH

The prototype navigation agent which we discussed here is only a first step in navigation support. It is certainly not our final solution for assisting the visitors of our virtual environment. In the next phase of research we need to concentrate on the communication with the other agents that are available in the virtual theatre. How can we make sure that a visitor's question reaches the appropriate agent? How can we model the history of interaction in such a way that different agents not only know about their own role in this interaction but also about that of the others? Unlike other environments, our environment allows the investigation of communication between active and passive agents that inform the visitor about the possibilities and the properties of an information-rich virtual environment.

9. REFERENCES

[1] H.W.L. ter Doest, Towards Probabilistic Unification-Based Parsing. Neslia Paniculata Publishers, Enschede, The Netherlands, 1999.

[2] K. Höök et al., Towards a framework for design and evaluation of navigation in electronic spaces. Persona Deliverable for the EC, 1998.

[3] A. Nijholt and J. Hulstijn, Multimodal interactions with agents in virtual worlds. In: Future Directions for Intelligent Information Systems and Information Science, Springer Physica-Verlag, 2000.

[4] J. Zwiers, B. van Dijk, A. Nijholt and R. op den Akker, Design issues for navigation and assistance agents in virtual environments. In: TWLT 17: Learning to Behave, pages 119-132, 2000.
