
Behavior Design of a Human-Interactive Robot through Parallel Tasks Optimization

Yuichi Kobayashi, Masaki Onishi, Shigeyuki Hosoe, and Zhiwei Luo

Abstract. Robots that interact with humans are required to achieve multiple simultaneous tasks, such as carrying objects, avoiding collisions and conversing with humans, in real time. This paper presents a design framework for the control and recognition processes that meets this requirement by taking the stochastic behavior of humans into account. The proposed design method first describes the tasks with a Petri net. The Petri-net formulation is then converted to Markov decision processes and treated in an optimal control framework. Two tasks, safety confirmation and conversation, are implemented. Tasks that normally tend to be designed by integrating many if-then rules can thus be dealt with in a systematic manner. The proposed method was verified by simulations and experiments using RI-MAN.

1 Introduction
Human-robot interaction has attracted growing attention recently [1, 2, 3]. Robots are also expected to interact with humans in household environments.
Yuichi Kobayashi
Tokyo University of Agriculture and Technology
e-mail: yu-koba@cc.tuat.ac.jp
Masaki Onishi
Information Technology Research Institute, AIST
e-mail: onishi@ni.aist.go.jp
Shigeyuki Hosoe
RIKEN Bio-mimetic Control Research Center
e-mail: hosoe@bmc.riken.jp
Luo Zhiwei
Kobe University
e-mail: luo@gold.kobe-u.ac.jp
566 Y. Kobayashi et al.

In environments where robots interact with humans, robots are required to perform multiple tasks, such as conveying objects, conversing with humans, and avoiding collisions with humans. These parallel tasks must sometimes be performed simultaneously and in real time. In addition, human-interacting robots must cope with state recognition that includes uncertainty, mainly caused by the arbitrary motions of humans.
The problem of real-time processing of parallel tasks has been discussed in the field of task scheduling [4]. In the case of a human-interacting robot, however, the multiple tasks and their requirements are not well defined, because there have been few attempts to formulate the parallel tasks of robots interacting with humans. One reason for this is that the uncertainty caused by human behavior is too complex to formulate in a standard task-scheduling framework. In the literature on robot control architectures, many models have been proposed to realize reactive and adaptive robot behaviors [5, 6]. As applications of the Petri net [9], a selection framework for multiple navigation behaviors [3], a hierarchical control scheme including exception handling [7], and motion generation for a humanoid robot using a timed Petri net [8] have been proposed. These works mainly focused on navigation or the motion of the robot body, and human-interacting aspects were not considered because of the difficulty of formulation.
In this paper, human-interacting tasks such as collision avoidance and conversation are implemented: the robot receives commands by conversation while attending to the safety of the human (collision avoidance). This paper proposes designing human-interacting behavior through Petri-net modeling, optimal control, and models of human behavior during interaction with robots. The proposed architecture consists of a description of parallel tasks by a Petri net and a transformation of the Petri-net form into Markov decision processes (MDPs). A general formulation of the proposed design is described in Section 2. An application to parallel tasks using the human-interacting robot RI-MAN [10] is explained in Section 3. The experimental and simulation results are shown in Section 4, followed by the conclusion in Section 5.

2 Formulation of Parallel Tasks and Optimal Control

State variables express the state of the environment, the human and the robot. They consist of continuous variables $x_c \in \mathbb{R}^{N_c}$ and discrete ones $x_d \in \mathbb{Z}^{N_d}$. Observation variables consist of continuous variables $y_c \in \mathbb{R}^{M_c}$ and discrete ones $y_d \in \mathbb{Z}^{M_d}$. The output to the actuators and the speaker consists of continuous variables $u_c \in \mathbb{R}^{L_c}$ and discrete ones $u_d \in \mathbb{Z}^{L_d}$.

2.1 Description of Parallel Tasks with Petri-net

The tasks are first expressed by a Petri net, which consists of places and transitions (see Fig. 1). A circle on a place denotes a token. The token moves from one place to another through a transition; when a token moves, the corresponding transition is said to 'fire'. A transition connected to multiple input places can fire only when all of its input places hold tokens.¹

[Fig. 1 Example of Petri-net expression for two tasks i and j]

The internal stages of task execution are expressed by places of the Petri net. The number of tasks is denoted by $n$. An objective place $d_0^i$ is defined to describe the desired stage of task $i$. The number of places of task $i$ is denoted by $m_i$ and the places of task $i$ are denoted by $d_0^i, d_1^i, \cdots, d_{m_i}^i$. In this paper it is assumed that there is always exactly one token per task.
There are cases where a single place has multiple output transitions. In such cases, which of these transitions fires is characterized by firing probabilities. In addition, two assumptions are introduced: 1) an expected duration is assigned to each transition, and 2) a token can stay at the same place for a certain period. Assumption 2) is expressed by defining a transition from a place to itself in the process of transformation to MDPs.
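As a hedged illustration of this transformation, the joint task state used below is simply one token position per task, so the MDP state space is the Cartesian product of the per-task place sets. The function name and place counts here are illustrative assumptions, not code from the paper.

```python
from itertools import product

def mdp_states(places_per_task):
    """Enumerate joint MDP states: each task holds exactly one token,
    so a state is one place index per task (Cartesian product)."""
    return list(product(*[range(m) for m in places_per_task]))

# Toy sizes loosely matching Sect. 3: four security-task places and
# three conversation-task places give 12 joint states.
states = mdp_states([4, 3])
print(len(states))
```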

2.2 Optimal Control in Markov Decision Process

The stages of the task execution of the robot can be represented as $s = \{s^{(1)}, \cdots, s^{(n)}\}$, where $s^{(i)}$ denotes the place that holds a token in task $i$. This can be regarded as the 'state' of the task execution. When an action is determined as a function of the state as $a = \pi(s)$, $\pi$ is called a policy. Note here that the state transitions are stochastic even for the same actions because of uncertainties caused by human motions.

The transitions among the states are characterized by state transition probabilities $p(s_k^{(i)}, s_l^{(i)}, a)$ and expected durations $r(s_k^{(i)}, s_l^{(i)}, a)$ for transitions from $s_k^{(i)}$ to $s_l^{(i)}$. The expected duration depends on the action; that is, a common expected duration $T(a)$ is applied to all transitions of all tasks as $r(s_k^{(1)}, s_l^{(1)}, a) = \cdots = r(s_k^{(n)}, s_l^{(n)}, a) = T(a)$.

The goal for task $i$ is to make the token of task $i$ reach the objective place within the shortest expected time. In the framework of optimal control with discrete states [13], this problem is to find a policy $\pi$ which minimizes the expected value $E_\pi\!\left[\sum_{t=0}^{\infty} r_t^{(i)}\right]$, where $r_t^{(i)}$ denotes the duration of an action at step $t$ in task $i$. The state value function is defined as the expected return:

¹ In this research, we use the basic definitions of transitions and omit inhibition and synchronization functions for simplicity.
$$V_\pi^{(i)}(s_k^{(i)}) = E_\pi\!\left[\sum_{t=0}^{\infty} r(s_k^{(i)}, s_l^{(i)}, a)\right], \quad r(s_0^{(i)}, s_0^{(i)}, a) = 0, \quad V_\pi^{(i)}(s_0^{(i)}) = 0, \quad s_0^{(i)} = d_0^i. \tag{1}$$

The optimal state value function $V_*(s)$ satisfies the Bellman equation [12]:

$$V_*^{(i)}(s_k^{(i)}) = \min_{a \in A(s)} \sum_{s_l^{(i)}} p(s_k^{(i)}, s_l^{(i)}, a)\left[ r(s_k^{(i)}, s_l^{(i)}, a) + V_*^{(i)}(s_l^{(i)}) \right], \tag{2}$$

where $s = \{\cdots, s_k^{(i)}, \cdots\}$ and $A(s)$ denotes the action set for state $s$. If the state transition probabilities and the expected durations are known, the state value function $V_*^{(i)}(s)$ can be calculated from the above equation. The optimal policy for the problem can also be derived based on $V_*(s)$.
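A minimal value-iteration sketch for solving the Bellman equation (2), assuming tabular transition probabilities and durations; the two-state toy chain below is invented purely for illustration.

```python
def value_iteration(states, actions, p, r, goal, iters=200):
    """Solve the shortest-expected-time Bellman equation (Eq. (2)) by
    fixed-point iteration; V(goal) stays 0 by definition."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            if s == goal:
                continue
            V[s] = min(
                sum(p[(s, s2, a)] * (r[(s, s2, a)] + V[s2]) for s2 in states)
                for a in actions
            )
    return V

# Toy chain: from state 1 the single action reaches the goal 0 with
# probability 0.5 and stays otherwise; each attempt takes 1 time unit,
# so the expected time to the goal is 2.
p = {(1, 0, 'a'): 0.5, (1, 1, 'a'): 0.5}
r = {(1, 0, 'a'): 1.0, (1, 1, 'a'): 1.0}
V = value_iteration([0, 1], ['a'], p, r, goal=0)
print(round(V[1], 3))  # 2.0
```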

2.3 Optimization of Action Parameters

An action $a$ is more concretely expressed as a sequence of continuous outputs $u_c$ or a successive discrete output $u_d$ for a certain period. The parameterization of an action is generally expressed by $a(\theta)$, where $\theta = [\alpha, \beta]$ in this case. In our optimal control problem, the parameter $\theta$ is determined instead of deciding the action $a$. The parameter set depends on the action, which is expressed by $\Theta(a)$. The Bellman equation with action parameterization can be rewritten as

$$V_*^{(i)}(s_k^{(i)}) = \min_{a \in A(s),\, \theta \in \Theta(a)} \sum_{s_l^{(i)}} p(s_k^{(i)}, s_l^{(i)}, a(\theta))\left[ r(s_k^{(i)}, s_l^{(i)}, a(\theta)) + V_*^{(i)}(s_l^{(i)}) \right], \tag{3}$$

where $s = \{\cdots, s_k^{(i)}, \cdots\}$. Let $\pi(\theta)$ denote that a policy is parameterized by $\theta$. Note that in this research, the parameter $\theta$ is adjusted to seek optimal control while the policy $\pi$ is fixed. The state value of task $i$ under $\pi(\theta)$ satisfies the following:

$$V_{\pi(\theta)}^{(i)}(s_k^{(i)}) = \sum_{s_l^{(i)}} p(s_k^{(i)}, s_l^{(i)}, a(\theta))\left[ r(s_k^{(i)}, s_l^{(i)}, a(\theta)) + V_{\pi(\theta)}^{(i)}(s_l^{(i)}) \right], \tag{4}$$

where $s = \{\cdots, s_k^{(i)}, \cdots\}$. In order to reduce the amount of calculation for finding the optimal $\theta$, nominal parameter sets $\bar{\theta}$ are introduced. The value function under the policy $\pi(\bar{\theta})$ can be computed off-line using (4) by fixing $\theta = \bar{\theta}$. Using the values of $V_{\pi(\bar{\theta})}^{(i)}(s)$, an approximation of the optimal multi-task parameter selection can be done by

$$[\theta|s] = \arg\min_\theta \sum_{i=1}^{n} w_i Q_{\pi(\bar{\theta})}^{(i)}(s_k^{(i)}, a(\theta)), \quad Q_{\pi(\bar{\theta})}^{(i)}(s_k^{(i)}, a(\theta)) \equiv \sum_{s_l^{(i)}} p(s_k^{(i)}, s_l^{(i)}, a(\theta))\left[ r(s_k^{(i)}, s_l^{(i)}, a(\theta)) + V_{\pi(\bar{\theta})}^{(i)}(s_l^{(i)}) \right], \tag{5}$$

where $w_i > 0$ denotes the weighting coefficient for setting priorities among tasks. By (5), an appropriate action parameter can be selected considering the (approximate) optimality of the shortest-time control and the priorities among tasks.
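The selection rule of Eq. (5) amounts to a weighted lookahead over a finite candidate set. The sketch below assumes precomputed per-task Q approximations; all function and variable names are illustrative, not from the paper.

```python
def select_parameter(thetas, q_funcs, weights, state):
    """Pick the candidate theta minimizing the weighted sum of per-task
    Q values (Eq. (5)); q_funcs[i](s_i, theta) plays the role of the
    approximate Q function of task i under the nominal policy."""
    def cost(theta):
        return sum(w * q(s_i, theta)
                   for w, q, s_i in zip(weights, q_funcs, state))
    return min(thetas, key=cost)

# Toy tasks preferring theta near 1 and near 3, respectively: the
# weights decide which task's preference wins.
q1 = lambda s, th: abs(th - 1)
q2 = lambda s, th: abs(th - 3)
print(select_parameter([1, 2, 3], [q1, q2], [1.0, 0.0], (0, 0)))  # 1
print(select_parameter([1, 2, 3], [q1, q2], [0.0, 1.0], (0, 0)))  # 3
```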

3 Implementation of Human-Interacting Parallel Tasks

RI-MAN was developed to realize human-interacting tasks. The robot has a speaker, two CCD cameras and two microphones on its head. The state variables are $x_c = [x_{angle}^T, x_{human}^T]^T$ and $x_d = [x_{com}, x_{nh}]^T$, where $x_{angle}$ denotes the joint angles of the robot, $x_{com}$ denotes the command given by a human instructor, $x_{nh}$ denotes the number of humans and $x_{human}$ denotes the positions of humans in the robot coordinate frame. The observation variables are $y_c = [y_{human}^T, y_{sound}^T, y_{tac}^T]^T$ and $y_d = y_{com}$, where $y_{human}$ denotes the position of a human (face) obtained by image processing, $y_{sound}$ denotes the orientation of a human who generates sound, obtained by sound source localization [16], $y_{com}$ denotes the command or conversation ID recognized from speech, and $y_{tac}$ denotes contact information between the robot body and humans. The output variables are $u_c = u_{angle}$ and $u_d = u_{speech}$, where $u_{angle}$ denotes the desired velocities of the joint angles and $u_{speech}$ denotes the ID of a speech output.

3.1 Definition of Parallel Tasks

Fig. 2 shows the tasks implemented in this research. Two tasks, the security (collision avoidance) task and the conversation task, are executed simultaneously.

Task 1: Security (Collision Avoidance) Task

The robot looks around itself and confirms whether a human exists or not. When the robot judges that a human exists in its vicinity, the robot speaks to the human so that the human does not approach the robot any further. The robot estimates the probability of human existence $p_h(i_1, i_2)$, where $[i_1, i_2]$ denotes a grid cell generated by dividing the 2D space around the robot. For the judgment of human existence, two threshold values $p_{th1}$ and $p_{th2}$ ($0 < p_{th1} < p_{th2} < 1$) are introduced and used as follows:

a human exists at $[i_1, i_2]$ when $p_h(i_1, i_2) > p_{th2}$, (6)
a human does not exist at $[i_1, i_2]$ when $p_h(i_1, i_2) < p_{th1}$, (7)
human existence is unknown at $[i_1, i_2]$ when $p_{th1} < p_h(i_1, i_2) < p_{th2}$. (8)

Let $R_{vicinity}$ denote the vicinity set of the robot and $R_{contact}$ the contact set. The grids are classified into three sets: $R_{vicinity}$, $R_{contact}$, and the rest, $R_{rest} = R_{all} \setminus (R_{vicinity} \cup R_{contact})$. There are four places in the task:
[Fig. 2 Expression of the tasks by Petri net: the security (collision avoidance) task and the conversation task]

• 'COLLISION': there exists a grid $[i_1, i_2] \in R_{contact}$ with probability $p_h(i_1, i_2) > p_{th2}$.
• 'ATTENTION': the token is not at 'COLLISION' and there exists a grid $[i_1, i_2] \in R_{vicinity}$ with $p_h(i_1, i_2) > p_{th2}$.
• 'UNKNOWN': the token is at neither 'COLLISION' nor 'ATTENTION', and there exists a grid $[i_1, i_2] \in R_{rest} \cup R_{vicinity}$ with $p_{th1} < p_h(i_1, i_2) < p_{th2}$.
• 'SAFETY confirmed': the token is at none of the other places; that is, all the grids in $R_{vicinity}$ have low probability, $p_h < p_{th1}$.

SAFETY is the objective place of the security task. COLLISION is judged by the tactile sensors.
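The place of the security-task token follows directly from the thresholded grid sets. A compact sketch is given below; the threshold values, grid coordinates, and the use of probabilities alone to detect COLLISION (which the paper judges from the tactile sensor) are illustrative assumptions.

```python
def security_place(ph, contact, vicinity, rest, pth1=0.2, pth2=0.8):
    """Classify the security-task place from grid occupancy
    probabilities ph (grid -> probability of human existence),
    mirroring the place definitions above."""
    if any(ph[g] > pth2 for g in contact):
        return "COLLISION"
    if any(ph[g] > pth2 for g in vicinity):
        return "ATTENTION"
    if any(pth1 < ph[g] < pth2 for g in rest | vicinity):
        return "UNKNOWN"
    return "SAFETY"

# One grid in each set; vary the probabilities to move the token.
contact, vicinity, rest = {(0, 0)}, {(1, 0)}, {(2, 0)}
print(security_place({(0, 0): 0.9, (1, 0): 0.1, (2, 0): 0.1},
                     contact, vicinity, rest))  # COLLISION
print(security_place({(0, 0): 0.1, (1, 0): 0.1, (2, 0): 0.5},
                     contact, vicinity, rest))  # UNKNOWN
```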

Task 2: Conversation Task

The robot faces a human and speaks to him or her. When the robot does not receive any speech from humans, the robot promotes conversation by orienting its face toward a person. There are three places in the conversation task:

• 'Human FOUND': there exists a grid $[i_1, i_2]$ such that $p_h(i_1, i_2) > p_{th2}$. The token on this place transits to SPEAK when a speech is recognized.
• 'SPEAK': based on the result of recognizing the human speech, the robot outputs a reply through the speaker. The end of the output triggers the transition of the token from SPEAK back to FOUND.
• 'Human LOST': there does not exist any grid such that $p_h(i_1, i_2) > p_{th2}$.

SPEAK is the objective place of the conversation task. The transition from FOUND to SPEAK depends on the utterance of the human, and this process is expressed by a stochastic transition.

3.2 Head Trajectory Generation through Optimization

The head swinging action is executed in all places of the security task and in the FOUND and LOST places of the conversation task. Let $n_{traj}$ denote the number of candidate trajectories. The head trajectory is generated by choosing one of the $n_{traj}$ candidates through optimization. To generate a candidate trajectory $\theta_k$ ($k = 1, \cdots, n_{traj}$), several grids are extracted stochastically (see Fig. 4), and the head angles (denoted by $[q_{pan}, q_{tilt}]^T$) that correspond to the positions of those grids are calculated. These head angles comprise a candidate trajectory.

[Fig. 3 Grid set around the robot]  [Fig. 4 Generation of a trajectory with via points]

$p_h(i_1, i_2)$ gives the probability that a grid is extracted as a via point of a trajectory; that is, grids with high $p_h$ tend to be used as via points. The selection probability function is denoted by $p_{select}(p_h)$, which is close to one when $p_h \simeq 1$ and close to zero when $p_h \simeq 0$. Let $q_i$ denote the $i$th via point, $q_i = [q_{pan_i}, q_{tilt_i}]^T$. By connecting those points in succession, a trajectory is generated as $\theta_k = \{q_1, q_2, \cdots, q_{n_v^k}\}$, where $n_v^k$ denotes the number of via points of candidate $\theta_k$. The head angles are ordered so that $\arctan(q_{tilt_i}/q_{pan_i})$ becomes monotonically increasing. With the velocity of the head angle $v_{head}$ set constant, the total time for one periodic motion can be expressed by

$$T(a(\theta_k)) = \sum_{j=1}^{n_v^k} \frac{\|q_{j+1} - q_j\|}{v_{head}}, \quad q_{n_v^k + 1} \equiv q_1. \tag{9}$$

This total time corresponds to the duration as $r(s_k^{(i)}, s_l^{(i)}, a(\theta_k)) = T(a(\theta_k))$.
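Eq. (9) is simply the closed-loop arc length of the via-point sequence divided by the constant head speed. A direct sketch (function name and units are assumptions):

```python
import math

def trajectory_period(via_points, v_head):
    """Total time of one periodic head sweep (Eq. (9)): sum of angular
    distances between successive via points, closing back to q1."""
    n = len(via_points)
    return sum(
        math.dist(via_points[j], via_points[(j + 1) % n])
        for j in range(n)
    ) / v_head

# Unit square of (pan, tilt) via points at head speed 2 [rad/s]:
# perimeter 4 rad gives a period of 2 s.
print(trajectory_period([(0, 0), (1, 0), (1, 1), (0, 1)], 2.0))  # 2.0
```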
In the case of the security task, the transition probability $p(s_k^{(1)}, s_l^{(1)}, a(\theta_k))$ is calculated as follows. First, the grids that are visible during the sequence of head motions generated by trajectory $\theta_k$ are calculated and denoted by the set $R_{visible}$. Let $R_{unknown}$ denote the set of grids where human existence is not known (defined by (8)). If the view range realized by $\theta_k$ does not cover any unknown grids, the token remains at the UNKNOWN place. This can be expressed as

$$\text{If } R_{visible} \cap R_{unknown} = \emptyset, \text{ then } p(s_2^{(1)}, s_2^{(1)}, a(\theta_k)) = 1, \quad p(s_2^{(1)}, s_1^{(1)}, a(\theta_k)) = p(s_2^{(1)}, s_0^{(1)}, a(\theta_k)) = 0, \tag{10}$$

where $s_0^{(1)}$, $s_1^{(1)}$ and $s_2^{(1)}$ denote SAFETY, ATTENTION and UNKNOWN, respectively. On the other hand, when $R_{visible} \cap R_{unknown} \neq \emptyset$, the transition probabilities are given by

$$p(s_2^{(1)}, s_1^{(1)}, a(\theta_k)) = \prod_{[i_1,i_2] \in R_{visible} \cap R_{vicinity}} \left(1 - p_h(i_1, i_2)\right), \tag{11}$$

$$p(s_2^{(1)}, s_2^{(1)}, a(\theta_k)) = 0, \quad p(s_2^{(1)}, s_0^{(1)}, a(\theta_k)) = 1 - p(s_2^{(1)}, s_1^{(1)}, a(\theta_k)). \tag{12}$$
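Following Eqs. (10)-(12) as literally stated, the outgoing probabilities from UNKNOWN for one candidate trajectory can be sketched as follows; the set names and the tuple return convention are illustrative assumptions.

```python
def unknown_transitions(ph, visible, vicinity, unknown):
    """Return (p_stay, p_attention, p_safety) for the UNKNOWN place of
    the security task under one head trajectory, per Eqs. (10)-(12)."""
    if not (visible & unknown):       # Eq. (10): no unknown grid is seen
        return 1.0, 0.0, 0.0
    p_att = 1.0                       # Eq. (11): product over seen vicinity grids
    for g in visible & vicinity:
        p_att *= 1.0 - ph[g]
    return 0.0, p_att, 1.0 - p_att    # Eq. (12)

# Two vicinity grids with ph = 0.5 are seen, one of them still unknown:
ph = {(0, 0): 0.5, (1, 0): 0.5}
print(unknown_transitions(ph, visible={(0, 0), (1, 0)},
                          vicinity={(0, 0), (1, 0)}, unknown={(0, 0)}))
```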

3.3 Model of Human Conversation

It is assumed that human speech occurs stochastically and that the probability of speaking depends on how much the robot is looking at the person. The motion of the robot head is periodic with period $T(a(\theta_k))$. The ratio of the duration during which the robot faces the human is defined as

$$\xi(\theta_k) = \frac{\tau_{(i_1,i_2)}(\theta_k)}{T(a(\theta_k))}, \quad \text{where a human exists in grid } [i_1, i_2], \tag{13}$$

and $\tau_{(i_1,i_2)}(\theta_k)$ denotes the duration during which $[i_1, i_2]$ is included in the view range. The probability of human speech is expressed as a function of $\xi(\theta_k)$, $p_{speech}(\xi(\theta_k))$. $p_{speech}(\xi)$ is defined so that it is close to one when $\xi \simeq 1$ and decreases as $\xi$ approaches zero. The transition probability in the conversation task is expressed as $p(s_1^{(2)}, s_0^{(2)}, a(\theta_k)) = p_{speech}(\xi(\theta_k))$.
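One possible concrete choice for $p_{speech}$ is a monotone power curve in the gaze ratio $\xi$; this functional form and its exponent are assumptions for illustration only, since the paper does not give the exact shape here.

```python
def p_speech(xi, gamma=0.7):
    """Illustrative monotone model of the utterance probability: close
    to 1 when the robot faces the human the whole period (xi ~ 1) and
    decreasing toward 0 as xi -> 0. The power form is an assumption."""
    xi = min(1.0, max(0.0, xi))
    return xi ** gamma

print(p_speech(1.0))  # 1.0
print(p_speech(0.0))  # 0.0
```

Any other monotone map on [0, 1] with the same endpoint behavior would serve equally well in the transition model above.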

4 Experiment and Simulation

The proposed framework is evaluated in simulation and experiment. In the evaluation, the stochasticity of human behavior is removed by 1) evaluating how the robot acts on the human instead of evaluating human utterances, and 2) fixing the trajectories of the humans instead of letting them walk arbitrarily.

The simulation was performed for ten trials per strategy, with 300 s per trial. Two virtual humans repeatedly walk along fixed trajectories around the robot. The gazing time, the neglected time and the average number of unknown grids are plotted in Fig. 5, Fig. 6 and Fig. 7. The gazing time denotes the duration during which the robot was looking at a human; this duration affects the probability of human utterance and thus the achievement of the conversation task. The neglected time denotes the duration during which the robot did not notice a human who was in the region $R_{vicinity}$. A long neglected time means that the security task is not sufficiently achieved, because a token at UNKNOWN should move to ATTENTION as soon as possible. The average number of unknown grids is calculated using (8); this number indirectly relates to the achievement of the security task because it becomes smaller when the robot spends more time searching than tracking. When $[w_1, w_2] = [1, 0]$, the optimization of (5) considers the security task only; as a result, the robot keeps searching. On the contrary, when $[w_1, w_2] = [0, 1]$, only the conversation task is considered in the optimization, and the robot keeps following the humans. By changing $[w_1, w_2]$, the balance between searching and tracking humans can be adjusted.
[Fig. 5 Gazing time]  [Fig. 6 Neglected time]  [Fig. 7 Unknown grids] (results over $[w_1, w_2]$ = [1,0], [1,1], [0.7,1], [0.5,1], [0,1])

From Fig. 5 to Fig. 7, it can be seen that the performance of the robot is adjusted by changing the parameters $w_1, w_2$. In Fig. 5, the gazing time was the longest in the case of $[w_1, w_2] = [0, 1]$; that is, the best performance of the conversation task was achieved by giving priority to the conversation task. On the contrary, in Fig. 7, the number of unknown grids was the largest in the case of

[Fig. 8 Gazing time in experiment]  [Fig. 9 Neglected time in experiment]

$[w_1, w_2] = [0, 1]$; that is, the performance of the security task, looking around to decrease the number of unknown grids, was sacrificed by giving priority to the tracking behavior.

Next, in the experiment, two (real) humans walked around RI-MAN along fixed trajectories for 420 s per trial. In Fig. 8, the gazing time is maximum in the case of $[w_1, w_2] = [0, 1]$. In Fig. 9, the neglected time is minimum in the case of $[w_1, w_2] = [1, 0]$. In the case of $[w_1, w_2] = [0.7, 1]$, intermediate performance was obtained in both the gazing time and the neglected time. Thus, a tendency similar to the simulation was also seen in the experiment.

5 Conclusion
This paper proposed a behavior design for a human-interacting robot that is required to execute multiple parallel tasks under uncertainties caused by humans. MDPs were constructed based on the description of the parallel tasks by a Petri net. The control problem was formulated as a shortest-time optimal control problem, so that the multiple-task problem could be dealt with in a systematic manner. In the application to the security task and the conversation task of RI-MAN, models of human behaviors were introduced. Simulations and experiments verified that the proposed framework makes it possible to adjust the performance of the robot by changing the weighting parameters.

References
1. Kanda, T., Hirano, T., Eaton, D., Ishiguro, H.: Interactive robots as social
partners and peer tutors for children: A field trial. Human Computer Interac-
tion 19(1-2), 61–84 (2004)
2. Shiomi, M., Kanda, T., Ishiguro, H., Hagita, N.: Interactive humanoid robots
for a science museum. IEEE Intelligent Systems 22(2), 25–32 (2007)
3. Kim, G., Chung, W., Park, S., Kim, M.: Experimental research of navigation behavior selection using generalized stochastic Petri nets for a tour-guide robot. In: Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (2005)
4. Błażewicz, J.: Scheduling Computer and Manufacturing Processes. Springer, Heidelberg (1996)
5. Brooks, R.A.: A robust layered control system for a mobile robot. IEEE Journal
of Robotics and Automation RA-2, 253–262 (1986)
6. Connell, J.H.: SSS: A hybrid architecture applied to robot navigation. In: Proc.
of the 1992 IEEE Conf. on Robotics and Automation, pp. 2719–2724 (1992)
7. Lehmann, A., Mikut, R., Asfour, T.: Petri nets for task supervision in humanoid
robots. In: Proc. 37th International Symposium on Robotics, pp. 71–73 (2006)
8. Kobayashi, K., Nakatani, A., Takahashi, H., Ushio, T.: Motion planning for hu-
manoid robots using timed petri net and modular state net. In: Proc. of the 2002
Int. Conf. on Systems, Man & Cybernetics, pp. 334–339 (2002)
9. Haas, P.J.: Stochastic Petri Nets. Springer Series in Operations Research (2002)
10. Odashima, T., et al.: A soft human-interactive robot RI-MAN. In: Video Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (2006)
11. Ramadge, P.J.G., Wonham, W.M.: The control of discrete event systems. Proc. IEEE 77(1), 81–98 (1989)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
13. Bertsekas, D.: Dynamic Programming and Optimal Control. Athena Scientific
(2005)
14. Elfes, A.: Using Occupancy Grids for Mobile Robot Perception and Navigation.
Computer 22(6), 46–57 (1989)
15. Stepan, P., Kulich, M., Preucil, L.: Robust data fusion with occupancy grid.
IEEE Trans. on Systems, Man, and Cybernetics Part C 35, 1 (2005)
16. Nakashima, H., Ohnishi, N., Mukai, T.: Self-Organization of a Sound Source Lo-
calization Robot by Perceptual Cycle. In: 9th Int. Conf. on Neural Information
Processing, vol. 2, pp. 834–838 (2002)
