Abstract. Robots that interact with humans must perform multiple tasks
simultaneously and in real time, such as carrying objects, avoiding
collisions, and conversing with humans. This paper presents a design
framework for the control and recognition processes that meets this
requirement by taking the stochastic behavior of humans into account. The
proposed design method first describes the tasks with a petri-net. The
petri-net formulation is then converted to Markov decision processes and
handled in an optimal control framework. Two tasks, safety confirmation and
conversation, are implemented. Tasks that would normally be designed by
combining many if-then rules can be handled systematically in the proposed
framework. The proposed method was verified by simulations and experiments
using RI-MAN.
1 Introduction
Human-robot interaction has attracted growing attention in recent years
[1, 2, 3]. Robots are also expected to interact with humans in household
environments.
Yuichi Kobayashi
Tokyo University of Agriculture and Technology
e-mail: yu-koba@cc.tuat.ac.jp
Masaki Onishi
Information Technology Research Institute, AIST
e-mail: onishi@ni.aist.go.jp
Shigeyuki Hosoe
RIKEN Bio-mimetic Control Research Center
e-mail: hosoe@bmc.riken.jp
Luo Zhiwei
Kobe University
e-mail: luo@gold.kobe-u.ac.jp
566 Y. Kobayashi et al.
In environments where robots interact with humans, robots are required
to perform multiple tasks, such as conveying objects, conversing with
humans, and avoiding collisions with humans. These parallel tasks must
sometimes be executed simultaneously and in real time. In addition,
human-interacting robots must perform state recognition under uncertainty,
which is caused mainly by the arbitrary motions of humans.
The problem of real-time processing of parallel tasks has been discussed
in the field of task scheduling [4]. For a human-interacting robot, however,
the multiple tasks and their requirements are not well defined, because
there have been few attempts to formulate the parallel tasks of robots
interacting with humans. One reason is that the uncertainty caused by human
behavior is too complex to formulate in a standard task scheduling
framework. In the literature on robot control architectures, many models
have been proposed to realize reactive and adaptive behaviors of robots
[5, 6]. As applications of the petri-net [9], a selection framework for
multiple navigation behaviors [3], a hierarchical control scheme including
exception handling [7], and motion generation for a humanoid robot using a
timed petri-net [8] have been proposed. These works focused mainly on
navigation or on the motion of the robot body, and human-interacting
aspects were not considered because of the difficulty of formulation.
In this paper, human-interacting tasks such as collision avoidance and
conversation are implemented. The robot receives commands through
conversation while attending to the safety of the human (collision
avoidance). This paper proposes designing human-interacting behavior by
combining petri-net modeling, optimal control, and models of human behavior
during interaction with robots. The proposed architecture consists of a
description of parallel tasks by a petri-net and a transformation of the
petri-net form into Markov Decision Processes (MDPs). A general formulation
of the proposed design is described in Section 2. An application to parallel
tasks using the human-interacting robot RI-MAN [10] is then explained in
Section 3. Experimental and simulation results are shown in Section 4,
followed by the conclusion in Section 5.
from one place to another through a transition. When a token moves, the
corresponding transition is said to ‘fire’. A transition that is connected
to multiple input places can fire only when all of its input places hold
tokens.
The internal stages of task execution are expressed by the places of the
petri-net. The number of tasks is denoted by $n$. An objective place
$d_{i0}$ is defined to describe the desired stage of task $i$. The number of
places for task $i$ is denoted by $m_i$, and the places of task $i$ are
denoted by $d_{i0}, d_{i1}, \cdots, d_{im_i}$. In this paper it is assumed
that there is always exactly one token per task.
There are cases where a single place has multiple transitions as outputs.
In such cases, which of these transitions fires is determined by firing
probabilities. In addition, two assumptions are introduced: 1) an expected
duration is assigned to each transition, and 2) a token can stay at the
same place for a certain period. Assumption 2) is expressed by defining a
transition from a place to itself in the process of transformation to MDPs.
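The firing rule and the two assumptions above can be illustrated with a minimal sketch. The class `PetriNet`, the place names `d1` and `d0`, the transition names, and the numerical firing probabilities are all hypothetical choices made for illustration; they are not taken from the paper. The self-loop transition `stay` plays the role of assumption 2).

```python
import random


class PetriNet:
    """Minimal petri-net: places hold token counts, transitions move tokens."""

    def __init__(self, marking):
        self.marking = dict(marking)   # place name -> number of tokens
        self.transitions = {}          # name -> (inputs, outputs, firing probability)

    def add_transition(self, name, inputs, outputs, prob=1.0):
        self.transitions[name] = (list(inputs), list(outputs), prob)

    def enabled(self, name):
        # A transition can fire only when all of its input places hold tokens.
        inputs, _, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs, _ = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1

    def step(self, rng):
        """Fire one enabled transition, drawn in proportion to its firing probability."""
        names = [t for t in self.transitions if self.enabled(t)]
        if not names:
            return None
        weights = [self.transitions[t][2] for t in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        self.fire(chosen)
        return chosen


# One task with one token: place d1 has two output transitions.
net = PetriNet({"d1": 1, "d0": 0})
net.add_transition("advance", ["d1"], ["d0"], prob=0.4)  # move to the objective place
net.add_transition("stay", ["d1"], ["d1"], prob=0.6)     # token stays at the same place
fired = net.step(random.Random(0))
```

Because every transition in this example consumes one token and produces one token, the single token per task is preserved after each step.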
where $s = \{\cdots, s_k^{(i)}, \cdots\}$ and $A(s)$ denotes the action set
for state $s$. If the state transition probabilities and the expected
durations are known, the state value function $V^{*(i)}(s)$ can be
calculated by the above equation. The optimal policy for the problem can
also be derived based on $V^{*}(s)$.
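As a rough illustration of how a value function and a policy can be computed once transition probabilities and expected durations are known, the following sketch runs standard value iteration on a shortest-time problem. It assumes a Bellman recursion of the form V(s) = min_a sum_{s'} p(s, s', a) (r(s, s', a) + V(s')) with V = 0 at the objective state, which matches the structure of the Q-function used later in (5) but is not a transcription of the authors' equation. The function names, the model layout, and the numbers are hypothetical.

```python
def value_iteration(model, goal, tol=1e-9, max_iter=10_000):
    """Shortest-time value iteration.

    model[s][a] is a list of (probability, duration, next_state) triples.
    The value of the goal state is fixed to zero.
    """
    V = {s: 0.0 for s in model}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in model.items():
            if s == goal:
                continue
            best = min(
                sum(p * (r + V[s2]) for p, r, s2 in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V


def greedy_policy(model, V, goal):
    """Pick, for every non-goal state, the action with the smallest expected time."""
    policy = {}
    for s, actions in model.items():
        if s == goal:
            continue
        policy[s] = min(
            actions,
            key=lambda a: sum(p * (r + V[s2]) for p, r, s2 in actions[a]),
        )
    return policy


# Hypothetical two-place task: d0 is the objective place.
model = {
    "d1": {
        "fast": [(0.5, 1.0, "d0"), (0.5, 1.0, "d1")],  # may fail and stay at d1
        "slow": [(1.0, 3.0, "d0")],                    # always succeeds, takes longer
    },
    "d0": {},
}
V = value_iteration(model, goal="d0")
policy = greedy_policy(model, V, goal="d0")
```

In this toy model the expected time of `fast` satisfies V = 1 + 0.5 V, so it converges to 2, which is shorter than the 3 time units of `slow`.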
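$$[\theta \mid s] = \arg\min_{\theta} \sum_{i=1}^{n} w_i \, Q^{(i)}_{\pi(\bar{\theta})}\bigl(s^{(i)}_k, a(\theta)\bigr),$$
$$Q^{(i)}_{\pi(\bar{\theta})}\bigl(s^{(i)}_k, a(\theta)\bigr) \equiv \sum_{s_l} p\bigl(s^{(i)}_k, s^{(i)}_l, a(\theta)\bigr) \Bigl\{ r\bigl(s^{(i)}_k, s^{(i)}_l, a(\theta)\bigr) + V^{(i)}_{\pi(\bar{\theta})}\bigl(s^{(i)}_l\bigr) \Bigr\} \qquad (5)$$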
Behavior Design of a Human-Interactive Robot 569
and $w_i > 0$ denotes the weighting coefficient that sets priorities among
tasks. By (5), an appropriate action parameter can be selected considering
both the (approximate) optimality of the shortest-time control and the
priorities among tasks.
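The selection rule in (5) can be sketched as follows: for each candidate action parameter, the per-task Q-values are combined with the weights and the candidate with the smallest weighted sum is chosen. The data layout, the candidate names `scan` and `track`, the state labels, and all numbers are hypothetical and serve only to show the mechanics.

```python
def q_value(p, r, V, s_k, theta):
    """Q(s_k, a(theta)) = sum over s_l of p(s_k, s_l, a) * (r(s_k, s_l, a) + V(s_l))."""
    return sum(
        prob * (r[(s_k, s_l, theta)] + V[s_l])
        for s_l, prob in p[(s_k, theta)].items()
    )


def select_theta(candidates, tasks, weights, state):
    """Return the candidate that minimizes the weighted sum of per-task Q-values."""
    def cost(theta):
        return sum(
            w * q_value(t["p"], t["r"], t["V"], s_k, theta)
            for w, t, s_k in zip(weights, tasks, state)
        )
    return min(candidates, key=cost)


# Task 1 (hypothetical numbers): scanning helps, tracking does not.
task1 = {
    "p": {("s2", "scan"): {"s0": 0.5, "s2": 0.5}, ("s2", "track"): {"s2": 1.0}},
    "r": {("s2", "s0", "scan"): 1.0, ("s2", "s2", "scan"): 1.0,
          ("s2", "s2", "track"): 3.0},
    "V": {"s0": 0.0, "s2": 2.0},
}
# Task 2 (hypothetical numbers): tracking helps, scanning does not.
task2 = {
    "p": {("s1", "scan"): {"s1": 1.0}, ("s1", "track"): {"s0": 1.0}},
    "r": {("s1", "s1", "scan"): 4.0, ("s1", "s0", "track"): 1.0},
    "V": {"s0": 0.0, "s1": 2.0},
}
tasks = [task1, task2]
state = ["s2", "s1"]
candidates = ["scan", "track"]

choice_task1_only = select_theta(candidates, tasks, [1.0, 0.0], state)
choice_task2_only = select_theta(candidates, tasks, [0.0, 1.0], state)
choice_mixed = select_theta(candidates, tasks, [0.7, 1.0], state)
```

With these numbers the Q-values are 2 and 5 for task 1 and 6 and 1 for task 2, so changing the weights switches the selected candidate.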
The robot looks around and confirms whether a human is present.
When the robot judges that a human is in its vicinity, it speaks to the
human so that the human does not come any closer. The robot estimates the
probability of human existence $p_h(i_1, i_2)$, where $[i_1, i_2]$ denotes
a grid generated by dividing the 2D space around the robot. For the
judgment of human existence, two threshold values $p_{th1}$ and $p_{th2}$
($0 < p_{th1} < p_{th2} < 1$) are introduced and used as
Let $R_{vicinity}$ denote the set of grids in the vicinity of the robot and
$R_{contact}$ denote the contact set. The grids are classified into three
sets: $R_{vicinity}$, $R_{contact}$, and the rest,
$R_{rest} = R_{all} \setminus (R_{vicinity} \cup R_{contact})$. There are
four places in the task:
[Figure: petri-net of the conversation task and the collision avoidance task]
The robot faces a human and speaks to him or her. When the robot does
not receive any speech from humans, it promotes conversation by orienting
its face toward a person. There are three places in the conversation task.
• ‘Human FOUND’: there exists a grid $[i_1, i_2]$ such that
$p_h(i_1, i_2) > p_{th2}$. The token on this place moves to SPEAK when
speech is recognized.
• ‘SPEAK’: based on the result of recognizing human speech, the robot
outputs a reply through the speaker. The end of the output triggers
the transition of the token from SPEAK to FOUND.
• ‘Human LOST’: there is no grid such that $p_h(i_1, i_2) > p_{th2}$.
SPEAK is the objective place in the conversation task. The transition from
FOUND to SPEAK depends on the utterance of the human, so this process is
expressed by a stochastic transition.
Fig. 3 Grid set around robot
Fig. 4 Generation of trajectory with via points
point of a trajectory. That is, grids with high $p_h$ tend to be used as via
points. The selection probability function is denoted by $p_{select}(p_h)$,
which is close to one when $p_h$ is close to one and close to zero when
$p_h$ is close to zero. Let $q_i$ denote the $i$th via point,
$q_i = [q_{pan,i}, q_{tilt,i}]^T$. By connecting those points in
succession, a trajectory is generated as
$\theta_k = \{q_1, q_2, \cdots, q_{n_v^k}\}$, where $n_v^k$ denotes the
number of via points of candidate $\theta_k$. The head angles are ordered so
that $\arctan(q_{tilt,i}/q_{pan,i})$ becomes monotonically increasing. With
the velocity of the head angle $v_{head}$ set constant, the total time for a
periodic motion can be expressed by
$$T(a(\theta_k)) = \sum_{j=1}^{n_v^k} \frac{\| q_{j+1} - q_j \|}{v_{head}}, \qquad q_{n_v^k + 1} \equiv q_1. \qquad (9)$$
This total time corresponds to the duration, that is,
$r(s_k^{(i)}, s_l^{(i)}, a(\theta_k)) = T(a(\theta_k))$.
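The computation in (9) can be sketched in a few lines: the via points are ordered by angle and the lengths of the closed path are summed and divided by the head velocity. The function names are hypothetical, and `math.atan2` is used here as a quadrant-aware stand-in for the ordering by $\arctan(q_{tilt}/q_{pan})$ described in the text, which is an assumption about how the ordering would be implemented.

```python
import math


def order_via_points(points):
    """Order via points (q_pan, q_tilt) by increasing angle around the origin."""
    return sorted(points, key=lambda q: math.atan2(q[1], q[0]))


def cycle_time(points, v_head):
    """Total time of one periodic head motion through the ordered via points.

    The path is closed: the point after the last one is the first one again.
    """
    n = len(points)
    length = sum(
        math.dist(points[j], points[(j + 1) % n])
        for j in range(n)
    )
    return length / v_head


# Four hypothetical via points given in arbitrary order (pan, tilt in radians).
via_points = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]
ordered = order_via_points(via_points)
T = cycle_time(ordered, v_head=1.0)
```

For these four points the ordered path is a square with side length sqrt(2), so the cycle time at unit velocity is 4 sqrt(2).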
In the case of the security task, the transition probability
$p(s_k^{(1)}, s_l^{(1)}, a(\theta_k))$ is calculated as follows. First, the
grids that are visible during the sequence of head motions generated by head
trajectory $\theta_k$ are calculated and denoted as the set $R_{visible}$.
Let $R_{unknown}$ denote the set of grids where human existence is not known
(defined by (8)). If the view range realized by $\theta_k$ does not cover
any unknown grid, the token remains at the UNKNOWN place. This can be
expressed as
$$\text{if } R_{visible} \cap R_{unknown} = \emptyset, \text{ then } p\bigl(s_2^{(1)}, s_2^{(1)}, a(\theta_k)\bigr) = 1,$$
$$p\bigl(s_2^{(1)}, s_1^{(1)}, a(\theta_k)\bigr) = p\bigl(s_2^{(1)}, s_0^{(1)}, a(\theta_k)\bigr) = 0, \qquad (10)$$
where $s_0^{(1)}$, $s_1^{(1)}$ and $s_2^{(1)}$ denote SAFETY, ATTENTION and
UNKNOWN, respectively. On the other hand, when
$R_{visible} \cap R_{unknown} \neq \emptyset$, the transition probabilities
are given as follows.
where $\tau_{(i_1,i_2)}(\theta_k)$ denotes the duration for which
$[i_1, i_2]$ is included in the view range. The probability of human speech
can be expressed as a function of $\xi(\theta_k)$ as
$p_{speech}(\xi(\theta_k))$. $p_{speech}(\xi)$ is defined so that it is
close to one when $\xi$ is close to one and decreases as $\xi$ approaches
zero. The transition probability in the conversation task is expressed as
$p(s_1^{(2)}, s_0^{(2)}, a(\theta_k)) = p_{speech}(\xi(\theta_k))$.
[Figure: two bar charts with horizontal axis $[w_1, w_2]$, for the settings [1, 0], [0.7, 1] and [0, 1]]
$[w_1, w_2] = [0, 1]$. That is, the performance of the security task, which
is to look around and decrease the number of unknown grids, was sacrificed
by giving priority to the tracking behavior.
Next, in the experiment, two (real) humans walked around RI-MAN along fixed
trajectories for 420 s per trial. In Fig. 8, the gazing time is largest in
the case of $[w_1, w_2] = [0, 1]$. In Fig. 9, the neglecting time is
smallest in the case of $[w_1, w_2] = [1, 0]$. In the case of
$[w_1, w_2] = [0.7, 1]$, intermediate performance was obtained in both the
gazing time and the neglecting time. Thus, the experiment showed a tendency
similar to that of the simulation.
5 Conclusion
This paper proposed a behavior design for a human-interacting robot that is
required to execute multiple parallel tasks under uncertainties caused by
humans. MDPs were constructed based on the description of parallel tasks by
the petri-net. The control framework was formulated as a shortest-time
optimal control problem, so that the multiple-task problem could be handled
systematically. In the application to the security task and the
conversation task of RI-MAN, models of human behavior were introduced.
Simulation and experiment verified that the proposed framework makes it
possible to adjust the performance of the robot by changing the weighting
parameters.
References
1. Kanda, T., Hirano, T., Eaton, D., Ishiguro, H.: Interactive robots as social
partners and peer tutors for children: A field trial. Human Computer
Interaction 19(1-2), 61–84 (2004)
2. Shiomi, M., Kanda, T., Ishiguro, H., Hagita, N.: Interactive humanoid robots
for a science museum. IEEE Intelligent Systems 22(2), 25–32 (2007)
3. Kim, G., Chung, W., Park, S., Kim, M.: Experimental research of navigation
behavior selection using generalized stochastic petri nets for a tour-guide robot.
In: Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (2005)
4. Błażewicz, J.: Scheduling computer and manufacturing processes. Springer,
Heidelberg (1996)
5. Brooks, R.A.: A robust layered control system for a mobile robot. IEEE Journal
of Robotics and Automation RA-2, 253–262 (1986)
6. Connell, J.H.: SSS: A hybrid architecture applied to robot navigation. In: Proc.
of the 1992 IEEE Conf. on Robotics and Automation, pp. 2719–2724 (1992)
7. Lehmann, A., Mikut, R., Asfour, T.: Petri nets for task supervision in humanoid
robots. In: Proc. 37th International Symposium on Robotics, pp. 71–73 (2006)
8. Kobayashi, K., Nakatani, A., Takahashi, H., Ushio, T.: Motion planning for
humanoid robots using timed petri net and modular state net. In: Proc. of the
2002 Int. Conf. on Systems, Man & Cybernetics, pp. 334–339 (2002)
9. Haas, P.J.: Stochastic Petri Nets. Springer Series in Operations Research (2002)
10. Odashima, T., et al.: A soft human-interactive robot RI-MAN. In: Video
Proceedings of IEEE/RSJ International Conference on Intelligent Robots and
Systems (2006)
11. Ramadge, P.J.G., Wonham, W.M.: The control of discrete event systems. Proc.
IEEE 77(1), 81–98 (1989)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge
(1998)
13. Bertsekas, D.: Dynamic Programming and Optimal Control. Athena Scientific
(2005)
14. Elfes, A.: Using Occupancy Grids for Mobile Robot Perception and Navigation.
Computer 22(6), 46–57 (1989)
15. Stepan, P., Kulich, M., Preucil, L.: Robust data fusion with occupancy grid.
IEEE Trans. on Systems, Man, and Cybernetics Part C 35(1) (2005)
16. Nakashima, H., Ohnishi, N., Mukai, T.: Self-Organization of a Sound Source
Localization Robot by Perceptual Cycle. In: 9th Int. Conf. on Neural
Information Processing, vol. 2, pp. 834–838 (2002)