
Interaction with virtual environment using verbal/ non-verbal communication

Tomoaki Ozaki*, Kazuaki Tanaka* & Norihiro Abe*
*Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka-shi, 820-8502 Japan
ozaki@sein.mse.kyutech.ac.jp
1 Introduction
Many expert systems have been built to date. Because such a system cannot accurately detect a human's behavior, dialogs with it sometimes fail. A virtual reality system is epoch-making in that it can solve this problem by bringing the human being into the space of the computer: the human's non-verbal behavior can be input into the computer system using sensors and data gloves. Unlike a traditional interface using mouse and keyboard, it becomes possible to watch the behavior of a user and to teach the right method when that behavior is wrong. However, such a system still cannot know the intention of the user definitely, because no voice interaction facility is provided; from non-verbal behavior alone, the system cannot completely detect human intention and hesitation [2]. In communication between human beings, spoken language is an important element besides non-verbal behavior. So, in this research, using speech recognition technology and adopting voice interaction in a virtual reality system, we propose verbal/ non-verbal communication between a human being and a computer system.

2 System configuration

2.1 Hardware organization


The system consists of a computer (an SGI workstation) on which the virtual reality system is built, a microphone for the user to perform voice input, and 3-dimensional position sensors and data gloves for the user to input non-verbal behavior. The overall configuration is shown in Figure 1.


We use an interface through which both spoken language and non-verbal behavior can be input at the same time, in order to realize verbal/ non-verbal communication. With this interface, a virtual reality system can be brought closer to a real environment. For example, we can ask a question or issue an order about an object using spoken language while pointing at the object with a data glove. We selected assembly/ disassembly of a virtual machine as the field of application of verbal/ non-verbal communication, and built an assembly training system that has a user acquire the right assembly/ disassembly method: the user simulates assembly operations and issues inquiries or commands to the system, using a data glove and spoken language, in a virtual reality system in which three-dimensional object models of mechanical parts are arranged.

Figure 1. Hardware organization

2.2 Continuous speech recognition parser (JULIAN)


This research uses JULIAN, speech recognition software developed in Prof. Doshita's research laboratory at Kyoto University. JULIAN is a recognition parser that performs continuous speech recognition on the basis of a finite-state grammar (DFA). For voice input from the microphone (continuous speech with pauses between phrases), it searches for the most plausible word sequence based on the given DFA and outputs it as a character string. The DFA is built from the vocabulary and the syntax rules that the user has registered.
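As a rough illustration of how a finite-state grammar restricts recognition to registered word sequences, the following sketch walks a small state machine over word categories. The states, categories, and vocabulary here are invented for illustration and are not JULIAN's actual grammar format:

```python
# Sketch: a finite-state grammar constrains continuous speech
# recognition to word sequences the task actually uses. States,
# categories, and words are hypothetical, not JULIAN's real data.

GRAMMAR = {
    # state -> {accepted word category: next state}
    "S":  {"OBJ": "Q1"},
    "Q1": {"WO": "Q2"},
    "Q2": {"OPV-A": "Q3"},
    "Q3": {"AUX-A": "END"},
}

VOCAB = {
    "OBJ":   {"hexagon head bolt", "bearing", "key shaft"},
    "WO":    {"wo"},            # Japanese object-marking particle
    "OPV-A": {"install", "assemble"},
    "AUX-A": {"please"},
}

def accepts(words):
    """Return True if the word sequence is derivable from the grammar."""
    state = "S"
    for w in words:
        for category, nxt in GRAMMAR.get(state, {}).items():
            if w in VOCAB[category]:
                state = nxt
                break
        else:
            return False        # word fits no category usable in this state
    return state == "END"

print(accepts(["bearing", "wo", "install", "please"]))   # True
print(accepts(["install", "bearing"]))                   # False
```

A real DFA parser would additionally score competing word hypotheses against the acoustic input; this sketch shows only the grammatical constraint.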

0-7803-5731-0/99/$10.00 ©1999 IEEE


2.3 OpenInventor

To build the virtual reality system, three-dimensional surface models are used. OpenInventor [4], a three-dimensional graphics library from SGI, is used to render them.

3 Assembly training system

3.1 System configuration

This system consists of three parts: the assembly of a virtual machine, a spoken language processing unit, and a non-verbal behavior analysis unit. The main facility of each part is summarized in Figure 2, and each function is described in detail later. The spoken language processing unit performs speech recognition and passes commands and questions to the assembly of the virtual machine; the non-verbal behavior analysis unit analyzes hand position; and the assembly of the virtual machine holds the definitions of the operating procedure and of the mechanical parts, displays the virtual machine, and makes responses to the user's questions or commands.

Figure 2. System configuration

3.2 Verbal/ non-verbal interface

In this system, we use an interface through which a user can input spoken language and non-verbal behavior simultaneously. Consequently, a user has only to speak toward a microphone to issue voice input, without any keyboard action. The operation methods peculiar to this interface are as follows.

With a traditional interface, a unique name must be used to distinguish an object from the others. But when there are many identical objects, such as mechanical parts, it is difficult to designate one of them by name. With this interface it is easily done simply by pointing at or grasping the object: a user can speak to the system, for example "Install this part on that.", while pointing at the two objects with a data glove. By admitting the use of such directives, a user need not memorize the identifiers of the object parts.

A user can also order the system to perform an assembly operation. To interrupt the operation while the system is executing it, the user can have the system suspend the operation by issuing a phrase or sentence that means suspension.

3.3 Model of mechanical part

The model of the mechanical parts at the initial state of assembly is shown in Figure 3.

Figure 3. Initial status of assembly

3.4 Definitions of operating procedure

In this system, the operating procedure (the AND/OR procedure) is defined with an AND/OR graph as shown in Figure 4. Hereafter, we call the part that is to be moved after a user has selected it with the data glove or voice input the mvobject, and the partner part into which the mvobject is installed the basic part. Each node in the AND/OR graph of Figure 4 (for example START, END, and the points from 1 to 8) expresses an assembly state of the given assembly. An assembly operation along an arc of the graph (an operation) is necessary to change the state of the assembly; in an operation, the operating instruction and the object parts (mvobject and basic part) are described. All nodes of the AND/OR procedure in Figure 4 are OR nodes; in other words, a user has only to follow the graph sequentially from the upper part toward the lower part. For example, [START - 1 - 5 - END] and [START - 3 - 8 - END] each show one procedure.

Figure 4. Operating procedure based on AND/OR graph

3.5 Definitions of mechanical part

In this system, mechanical parts are defined with a scene graph [4] as shown in Figure 5. MyParts is a data node in which a part name and a part number are described: the part name corresponds to the voice input from the user, and the part number is used to describe the object part in the AND/OR procedure. The Transform node defines the part position, and the Coordinate and FaceSet nodes define the part shape.

Figure 5. Definition of mechanical part

3.6 Selection/ operation of parts

To select a part, the system provides two methods: inputting the part name orally, or grasping or pointing at the object with the data glove in the virtual reality system.

In teaching assembly, it is important to have the user simulate the assembly operation with the data glove. But if the system shows the user how to move a specified object from the initial state to the completed state of the assembly operation, we think it helpful enough for the user to realize the same operation. So we also prepare a method that allows the user to have the system operate a specified object.

3.7 Selection of parts with data glove

When the data glove takes a pointing posture toward an object, as shown in Figure 6, the system decides that the part nearest to the hand that lies in the neighborhood of a straight line extended from the forefinger has been selected, and changes its color. When the palm of the data glove closes to grasp a part, as shown in Figure 7, and the bounding box of the forefinger and the bounding box of the part interfere with each other, the system decides that the user has grasped the part and changes its color again.

Figure 6. Selection of part by pointing action

Figure 7. Selection of parts by grasping operation
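The two selection tests of Section 3.7 reduce to simple geometry: point-to-ray distance for pointing, and axis-aligned bounding-box overlap for grasping. A sketch under assumed geometry (the tolerance, coordinates, and part names are made up for illustration):

```python
# Sketch of the data-glove selection tests of Section 3.7 (assumed
# geometry, not the system's real code): pointing selects the part
# nearest to the hand that lies near the forefinger's ray; grasping
# is detected when axis-aligned bounding boxes interfere.
import math

def point_ray_distance(p, origin, direction):
    """Distance from point p to the ray origin + t*direction (t >= 0)."""
    d = math.sqrt(sum(c * c for c in direction))
    u = tuple(c / d for c in direction)
    v = tuple(pc - oc for pc, oc in zip(p, origin))
    t = max(0.0, sum(vc * uc for vc, uc in zip(v, u)))
    foot = tuple(oc + t * uc for oc, uc in zip(origin, u))
    return math.dist(p, foot)

def pick_by_pointing(parts, hand, forefinger_dir, tolerance=0.1):
    """parts: {name: center}. Return the nearest part close to the ray."""
    best, best_d = None, None
    for name, center in parts.items():
        if point_ray_distance(center, hand, forefinger_dir) > tolerance:
            continue                       # not near the pointing line
        d = math.dist(center, hand)
        if best_d is None or d < best_d:
            best, best_d = name, d
    return best

def boxes_interfere(a_min, a_max, b_min, b_max):
    """True when two axis-aligned bounding boxes overlap on every axis."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_min, a_max, b_min, b_max))

parts = {"bearing": (0.0, 0.0, 1.0), "key shaft": (0.5, 0.5, 1.0)}
print(pick_by_pointing(parts, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # bearing
print(boxes_interfere((0, 0, 0), (1, 1, 1), (0.5, 0.5, 0.5), (2, 2, 2)))  # True
```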

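The part definition of Section 3.5 can be mirrored in code. This is a loose sketch, not OpenInventor's actual API: the field names and the lookup helper are assumptions made for illustration.

```python
# Sketch of the part definition of Section 3.5: a MyParts data node
# carries a name (matched against voice input) and a number (used by
# the AND/OR procedure), alongside transform and shape data, loosely
# mirroring an OpenInventor scene graph. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class MyParts:
    name: str                                   # matched against spoken names
    number: int                                 # referenced by AND/OR arcs
    translation: tuple = (0.0, 0.0, 0.0)        # Transform node
    faces: list = field(default_factory=list)   # Coordinate/FaceSet data

parts = [MyParts("key shaft", 2), MyParts("bearing", 3)]

def find_by_voice(parts, spoken_name):
    """Resolve a spoken part name to its part number."""
    for p in parts:
        if p.name == spoken_name:
            return p.number
    return None

print(find_by_voice(parts, "key shaft"))   # 2
```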
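The AND/OR procedure of Section 3.4 can be sketched as a directed graph whose arcs carry operations; matching a user command against the arcs leaving the current state also yields the responses described in Section 5.2. A minimal sketch, with state and part assignments that are illustrative and not the paper's actual data:

```python
# Sketch of the AND/OR procedure of Section 3.4: states are nodes,
# each arc carries an operation (instruction, mvobject, basic part).
# The state/part assignments below are illustrative placeholders.

OPERATIONS = {
    # (from_state, to_state): (instruction, mvobject, basic_part)
    ("START", "1"): ("install", "key shaft", "base"),
    ("START", "3"): ("install", "bearing", "base"),
    ("1", "5"):     ("install", "bearing", "base"),
    ("3", "8"):     ("install", "key shaft", "base"),
    ("5", "END"):   ("install", "hexagon head bolt", "holder"),
    ("8", "END"):   ("install", "hexagon head bolt", "holder"),
}

def check_command(state, instruction, mvobject):
    """Match a user command against the arcs leaving the current state."""
    outgoing = [(dst, op, mv)
                for (src, dst), (op, mv, _basic) in OPERATIONS.items()
                if src == state]
    for dst, op, mv in outgoing:
        if op == instruction and mv == mvobject:
            return dst, "operation performed"       # advance to next state
    if any(op == instruction for _dst, op, _mv in outgoing):
        return state, "The part to be operated is wrong."
    return state, ("The given operation is wrong because "
                   "it is impossible to carry it out.")

print(check_command("START", "install", "bearing"))
# ('3', 'operation performed')
```

Because every node is an OR node, a valid session is simply any path of accepted operations from START to END.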

4 Spoken language process

4.1 Natural language processing

A spoken language input from a user (in Japanese) is converted into a character string by JULIAN. Next, a natural language processing program analyzes the character string obtained through speech recognition, and the semantics of the utterance is extracted. The user must register with JULIAN the words and syntax used in speech recognition, as described in 2.2; the dictionary made at that time is also available to the language processing. Examples of the word dictionary and the syntax dictionary are shown in Figure 8 and Figure 9. The word dictionary associates each category name (for example OBJ for an object of operation, OPV-A for a method of operation, AUX-A for the imperative ending of a verb, and WO for the particle) with the words to be recognized. The syntax rules are registered with the categories of the dictionary as non-terminal symbols, for example "OBJ WO OPV-A AUX-A" and its inverted form "OPV-A AUX-A OBJ WO".

Figure 8. A part of the word dictionary

Figure 9. A part of the syntax dictionary

Semantic analysis is done in top-down fashion. When a sentence such as "Assemble the hexagon head bolt." is input, the input sentence is matched against the rules in the syntax dictionary, and a category sequence as shown in Figure 10 is obtained. Because each category is registered corresponding to a function of a word, semantic interpretation of the word becomes possible. At this point it is judged whether the content of the sentence can be handled by the system. To increase the number of sentences that can be understood, one has only to add words belonging to the categories and add syntax rules. Inverted expressions and multiple expressions for the same content can also be accepted; the second rule in the syntax rules shown in Figure 9 is the inverted form of the first.

Figure 10. A result of language processing

4.2 Constructing the contents of dialog

An analysis result provided by the natural language processing exploits the knowledge of assembly and is stored in a list called a contents list. When "Assemble the hexagon head bolt." is the recognized character string, a contents list as shown in Figure 11 is made. A key word corresponding to the contents of the sentence is stored in the first line of the contents list; the system distinguishes the contents of the utterance by this key word. If some assembly operation is necessary, a method realizing the operation is put in the second line, and the object parts are entered in the 3rd and 4th lines. Because at most two parts are selected as the objects of one operation, the 3rd and the 4th lines are prepared.

Figure 11. Contents list

4.3 Estimation of a contents list

The system makes an appropriate response by matching a contents list with information about the assembly at hand. As this process depends on the contents of the utterance, it is explained in detail with typical examples in Section 5.

4.4 Flow of process

The flow of spoken language processing is as follows. First, a user issues an inquiry or command to the system using spoken language. Natural language processing is then applied to the spoken input; if the system can accept the contents, a contents list is made. Otherwise, the user must repeat the voice input. When a contents list is made, the system matches it with the information about the current assembly and makes a response.
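The construction of the contents list in Section 4.2 can be sketched as filling four slots from the categorized words of the syntax analysis. The category-to-slot mapping below is an assumption made for illustration, not the system's actual code:

```python
# Sketch of the contents list of Section 4.2: the key word in the
# first slot classifies the utterance, the operation goes in the
# second slot, and up to two object parts fill slots 3 and 4.
# The category-to-slot mapping is an assumption for illustration.

def build_contents_list(categorized_words):
    """categorized_words: list of (category, word) from syntax analysis."""
    contents = ["", "", "nothing", "nothing"]   # key, operation, obj1, obj2
    objects = [w for c, w in categorized_words if c == "OBJ"]
    if any(c in ("OPV-A", "AUX-A") for c, _ in categorized_words):
        contents[0] = "order"
        verbs = [w for c, w in categorized_words if c == "OPV-A"]
        if verbs:
            contents[1] = verbs[0]
    else:
        contents[0] = "question"
    for i, obj in enumerate(objects[:2]):
        contents[2 + i] = obj
    return contents

# "Assemble the hexagon head bolt." after analysis:
analysis = [("OBJ", "hexagon head bolt"), ("WO", "wo"),
            ("OPV-A", "assemble"), ("AUX-A", "please")]
print(build_contents_list(analysis))
# ['order', 'assemble', 'hexagon head bolt', 'nothing']
```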

5 Examples of interaction

Two interaction examples are shown below. They are typical of verbal/ non-verbal communication between a human being and a computer system.

5.1 An example in which a user asks the system about a part name

A user can ask the system about a part name while pointing at the part with a data glove. If a part is selected with the data glove, the system gets the MyParts node (Figure 5) of the selected part and teaches the user the part name described in it.

A typical interaction follows. A user points at a bearing (holder) with a data glove while saying "What is this?". As a result, the response "This is a holder." is generated. Figure 12 shows the contents list made then, and Figure 13 shows the circumstance in which the part is pointed at with the data glove.

Figure 12. The contents list of dialog

Figure 13. The circumstance the key shaft is pointed at

5.2 An example in which a user has the system operate a part

If an operating instruction and an object are given to the system, it performs the assembly. When an operation command is given by a user, the system matches the operations described in the AND/OR procedure of 3.4 against the contents list of the given utterance, and a response is made. If the operation command fits an operation in the AND/OR graph, the operation is performed. When the operating instruction is wrong, the response "The given operation is wrong because it is impossible to carry it out." is generated. When the operating instruction is right but the part to be operated is wrong, the response "The part to be operated is wrong." is generated. If there are several parts with the same name as the object designated with voice input, the system executes the given instruction on finding an instance of the operational part.

A typical interaction follows. In the assembly shown in Figure 15, assume that the user grasps a bearing (holder) with a data glove while saying "Install this part.". Figure 14 shows the contents list made then. Figure 15 shows the circumstance in which the bearing is grasped with the data glove, and Figure 16 shows the situation after the bearing has been completely installed. As the given operation command fits an operation described in the AND/OR procedure, the system performed the installation operation of the bearing.

Figure 14. The contents list of dialog

Figure 15. Grasp of a part with a data glove

Figure 16. A situation the bearing is perfectly installed

6 Conclusion

In this research, verbal/ non-verbal communication between a human and a computer system in virtual space is proposed. As a concrete example, we have applied it to the mechanical assembly domain. The system accepts questions and commands from a user in spoken language, and it succeeds in correctly interpreting the user's intention and in maintaining communication between the user and the system. Using non-verbal operations such as a pointing action, a user was able to point out an object simply. Because there is no sense of depth in the virtual space used in the system, it is often difficult for a user to grasp or point at an object using a data glove.

This paper has mainly reported the way a user instructs the system to assemble a virtual machine. We have already reported the way a system watches a user's behavior while he/she is assembling a virtual machine. But that system simply points at the erroneously operated part; no instruction is given to the user. An avatar should give this instruction with a spoken explanation while manipulating the mechanical parts, and in the same way it should answer questions from a user with both gesture and spoken language. If such bi-directional verbal/ non-verbal communication can be realized, a user has only to imitate the avatar's actions, and mutual comprehension will be promoted.

7 References

[1] Norihiro Abe and Saburo Tsuji, "A consulting system which detects and undoes erroneous operations by novices," Proc. of SPIE, pp. 352-358, Oct. 1986.
[2] Norihiro Abe, Tomohiro Amano, Kazuaki Tanaka, J.Y. Zheng, Shoujie He, and Hirokazu Taki, "A Training System for Detecting Novices' Erroneous Operation in Repairing Virtual Machines," International Conference on Virtual Reality and Tele-Existence (ICAT), pp. 224-229, 1997.
[3] Norihiro Abe, J.Y. Zheng, Kazuaki Tanaka, and Hirokazu Taki, "A Training System using Virtual Machines for Teaching Assembling/ Disassembling Operations to Novices," International Conference on Systems, Man and Cybernetics, pp. 2096-2101, 1996.
[4] J. Wernecke, The Inventor Mentor, Addison-Wesley Publishing Company, 1994.
[5] Norihiro Abe, Atsushi Wada, Kazuaki Tanaka, J.Y. Zheng, Shoujie He, and Hirokazu Taki, "Verification of Assemblability of Mechanical Parts and Visualization of Machinery of Assembly in Virtual Space," International Conference on Virtual Reality and Tele-Existence (ICAT), pp. 208-215, 1997.

