
Learning Capture Points for Humanoid Push Recovery

John Rebula, Fabián Cañas, Jerry Pratt
The Institute for Human and Machine Cognition
Pensacola, FL 32502
Email: jrebula@alum.mit.edu

Ambarish Goswami
Honda Research Institute
Mountain View, CA 94041
Email: agoswami@honda-ri.com

Abstract— We present a method for learning Capture Points for humanoid push recovery. A Capture Point is a point on the ground to which the biped can step and stop without requiring another step. Being able to predict the location of such points is very useful for recovery from significant disturbances, such as after being pushed. While dynamic models can be used to compute Capture Points, model assumptions and modeling errors can lead to stepping in the wrong place, which can result in large velocity errors after stepping. We present a method for computing Capture Points by learning offsets to the Capture Points predicted by the Linear Inverted Pendulum Model, which assumes a point mass biped with constant Center of Mass height. We validate our method on a three dimensional humanoid robot simulation with 12 actuated lower body degrees of freedom, distributed mass, and articulated limbs. Using our learning approach, robustness to pushes is significantly improved as compared to using the Linear Inverted Pendulum Model without learning.

I. INTRODUCTION

Appropriate foot placement is critical when attempting to come to a stop during dynamic three dimensional bipedal walking. Simple models can be used to determine where to step in order to stop [9], [26], but the errors resulting from the assumptions of these models may be significant, especially in bipeds with a large number of degrees of freedom, distributed mass, and complex walking control systems. More complex models can reduce the errors resulting from these assumptions, but even with accurate models, small errors in modeling parameters may lead to significant errors in predicting the desired stepping location.

Humans are very adept at stepping in just the right place after being pushed in order to regain balance. From simple observations of toddlers it is clear that this ability improves with practice and that learning likely plays a large role in a person's ability to choose an appropriate place to step. This ability becomes almost reflexive, as can be experienced by the following simple demonstration: Stand on one leg and have someone push you in a random direction. Notice how you step to a spot that allows you to stand on one leg again. Now, still standing on one leg, have someone push you again in a random direction, but this time purposefully step to a spot on the ground that does not allow you to regain balance and stop. Notice how natural and automatic it seems when you step in a good spot, and how difficult and how much conscious effort it takes to step in a bad spot.

Since humans are so adept at stepping to an appropriate place to recover from pushes, we argue that humanoid robots need to become just as adept at this skill in order to operate in real world environments and perform tasks that humans perform.

Learning where to step in order to recover from a disturbance is a promising approach that does not require sophisticated modeling, accurate physical parameter measurements, or overly constraining limitations on the walking control system. In this paper we present a method for learning Capture Points for humanoid push recovery. Loosely speaking, a Capture Point is a point on the ground to which a biped can step and stop without requiring another step [26], [28]. The locus of all possible Capture Points is the Capture Region.

The location of the Capture Region depends on a number of factors, including the state of the robot and the kinematic reachability of the swing leg. In addition, one can define Capture Points with certain qualifications or assumptions. In this paper we assume the robot has an existing control system which can step to any reasonable desired location on the ground and which takes this desired location as an input variable. Given the robot and control system, we learn the desired stepping location that allows the robot to stop. This method should be applicable to any humanoid robot and control system that has the characteristics described in the next section.

II. PROBLEM STATEMENT

We wish to learn Capture Points, points on the ground to which the robot can step and stop in a single step. These Capture Points are a function of the robot's state, the control actions during the current stance phase, and the control actions during the next stance phase, in which the robot attempts to stop. In this paper we assume that the robot's control systems already exist and that we can modify the module, which we call the Desired Stepping Location Calculator, that determines the desired stepping location. We then use learning to improve this module, better predicting Capture Points, given the preexisting control systems.

Before stating the problem, we first define a few terms. Our goal is for the robot to come to a stop after being pushed. We define a Stopped State as follows:

• Stopped State: Robot state in which the robot's Center of Mass velocity, angular momentum about the Center of Mass, and Center of Mass linear and rotational accelerations are all less than given thresholds.
Note that the velocity and acceleration thresholds need to be non-zero for Stopped States to exist for many bipedal models. For example, for a point-foot point-mass biped, zero Center of Mass velocity and acceleration cannot be achieved during single support in the presence of infinitesimal disturbances. With small but finite thresholds, the robot will be essentially stopped, such that with a small amount of Center of Pressure regulation on the foot or a small amount of upper body motion, the robot could maintain balance without requiring any additional steps. For humanoids with non-point feet, given reasonable velocity and acceleration thresholds, being in a Stopped State will also imply that the Capture Region has a large intersection with the convex hull of the support feet.

We consider the gait of the robot to have two main phases, the Capture Step and the Stopping Stance Phase:

• Capture Step: The single support phase and swing foot placement during which the robot takes a step in an attempt to reach a Stopped State during the next support phase.
• Stopping Stance Phase: The support phase occurring after the Capture Step, during which the robot attempts to reach a Stopped State without taking any additional steps.

Note that the Stopping Stance Phase could be a double support phase followed by a single support phase, or could be an instantaneous exchange of support into a single support phase. We define the control systems that are in effect during each of these two stages of the gait as follows:

• Stepping Control System, $\pi_{step}$: The control system the robot utilizes during the Capture Step.
• Stopping Control System, $\pi_{stop}$: The control system the robot utilizes during the Stopping Stance Phase.

The Stepping Control System will determine the location that the robot steps to. We define the module of this control system that determines the desired location to step to as the Desired Step Location Calculator:

• Desired Step Location Calculator: Control system module of the Stepping Control System that calculates the desired step location in an attempt to reach a Stopped State during the Stopping Stance Phase.

Note that the Stepping Control System may query the Desired Step Location Calculator either at predefined trigger events, or continuously throughout the swing, to determine where to step to. Let us make the following assumptions:

A1 The Stepping Control System is a function of the robot's state, $\vec{x}$, some internal controller state variables, $\vec{p}_{step}$, and a desired stepping location, $\vec{x}_{step}$: $\vec{u} = \pi_{step}(\vec{x}, \vec{p}_{step}, \vec{x}_{step})$. The Stepping Control System does a reasonable job of stepping to the desired step location, maintains balance, and regulates any other important aspects of walking.

A2 The Stopping Control System is a function of the robot's state, $\vec{x}$, and perhaps some internal controller state variables, $\vec{p}_{stop}$: $\vec{u} = \pi_{stop}(\vec{x}, \vec{p}_{stop})$. The Stopping Control System does a reasonable job of stopping (terminating in a Stopped State), without requiring an additional step, from a reasonably sized basin of attraction of states.

Our problem statement is then as follows: Given a biped with a preexisting Stepping Control System and a preexisting Stopping Control System, determine the desired foot placement for the Stepping Control System, such that after the robot takes a step, the Stopping Control System will bring the robot to a Stopped State. In other words, determine a Desired Step Location Calculator $\vec{x}_{step} = \pi_{capture}(\vec{x}, \vec{p}_{step}, \vec{p}_{capture})$, where $\vec{p}_{capture}$ represents potential internal state of the Desired Step Location Calculator.

Note that we assume the Desired Step Location Calculator has access to both the state of the robot, $\vec{x}$, and any internal state of the Stepping Controller, $\vec{p}_{step}$. In this paper, we consider two implementations of the Desired Step Location Calculator:

• Single Decision: The Stepping Control System queries the Desired Step Location Calculator once per swing. This could occur at a trigger event, such as immediately after exchange of support, at the point of maximum body height, or just before final placement of the swing leg. In our simulation experiments, we query the step location when the Linear Inverted Pendulum Capture Point reaches a certain distance from a foot edge and a certain amount of time has been spent falling.
• Continuous Decision: The Stepping Control System queries the Desired Step Location Calculator every control cycle during a swing and responds smoothly to changing step locations.
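To make these definitions concrete, the following minimal sketch (in Python; the class and method names and the state representation are illustrative, not specified in the paper) shows how a Desired Step Location Calculator plugs into a Stepping Control System under assumptions A1 and A2:

```python
from dataclasses import dataclass
from typing import Tuple

Vector2 = Tuple[float, float]

@dataclass
class RobotState:
    """Placeholder for the robot state vector x."""
    com_position: Vector2   # ground projection of the Center of Mass
    com_velocity: Vector2

class DesiredStepLocationCalculator:
    """x_step = pi_capture(x, p_step, p_capture)."""
    def __init__(self):
        self.p_capture = {}  # internal state, e.g. a learned table or curve fit

    def desired_step_location(self, x: RobotState, p_step: dict) -> Vector2:
        # Nominal LIP prediction plus a learned correction (see Section IV).
        raise NotImplementedError

class SteppingControlSystem:
    """u = pi_step(x, p_step, x_step); steps to the desired location."""
    def __init__(self, calculator: DesiredStepLocationCalculator):
        self.p_step = {}     # internal controller state p_step
        self.calculator = calculator

    def control(self, x: RobotState) -> Vector2:
        # Continuous Decision variant: re-query every control cycle and
        # respond smoothly to changing step locations. Torque generation
        # by the preexisting walking controller is omitted here.
        return self.calculator.desired_step_location(x, self.p_step)
```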
In this paper we assume that a Stepping Control System and a Stopping Control System already exist for the biped, and we concentrate on using learning to determine a Desired Step Location Calculator. We do this mainly because choosing an appropriate stepping location is perhaps the most critical control action to take when attempting to stop in bipedal walking, particularly in 3D bipedal walking. Other control actions, such as moving the Center of Pressure on the feet, accelerating internal momentum (e.g., lunging the upper body or "windmilling" the arms), or altering the Center of Mass trajectory, are also useful. However, their success depends on the robot first choosing an appropriate Desired Step Location, such that these control actions can be used to bring the robot to a Stopped State.

In the experiments for this paper, the Stepping Control System and the Stopping Control System that we employ are each designed using simple physics based heuristic control rules [25]. The main strategy employed by the stopping control system is to modulate the Center of Pressure on the support foot, based on a Linear Inverted Pendulum model, in order to bring the Center of Mass to rest over the foot. In this paper, we do not use strategies that involve accelerating internal momentum or altering the Center of Mass trajectory based on velocity. However, controllers that use these strategies could also be used in the presented learning method, and should result in larger Capture Regions than controllers that do not employ them.
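The paper does not detail this Center of Pressure law beyond the description above; the sketch below is one plausible realization under Linear Inverted Pendulum assumptions (the names and the saturation scheme are our illustration, not the authors' implementation): command the Center of Pressure toward the instantaneous Capture Point, clipped to the foot.

```python
import numpy as np

def stopping_cop_command(com, comd, foot_center, foot_half_extent,
                         g=9.81, l=0.945):
    """Command the CoP toward the instantaneous LIP Capture Point,
    saturated at the foot boundary. Under the LIP dynamics
    xdd = (g/l)(x - p), a CoP p at the Capture Point brings the CoM
    to rest over the foot whenever the Capture Point lies inside the
    support polygon. All quantities are 2D ground-plane vectors."""
    capture_point = np.asarray(com) + np.asarray(comd) * np.sqrt(l / g)
    return np.clip(capture_point,
                   np.asarray(foot_center) - np.asarray(foot_half_extent),
                   np.asarray(foot_center) + np.asarray(foot_half_extent))
```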
Fig. 1. Simulation results before learning (left) and after learning (right) showing the humanoid robot reacting to a disturbance in a single step. Before learning, the robot fails to recover balance. After learning, the robot successfully comes to a stop in one step. The ball shown in the first frame of each animation applied a push from straight on at 550 Newtons for 0.05 seconds. Images are ordered left to right, top to bottom, and are spaced 0.04 seconds apart.

We state the above problem fairly generally and would like to learn Capture Regions for any state of the robot. As a stepping stone to that result, in this paper we focus on the problem of starting in a Stopped State, being pushed, and then taking a step and returning to a Stopped State. The robot starts balancing over one leg with the other leg raised off the ground. A pushing force occurs, disrupting the robot's balance and requiring the robot to take a step to stop. Setting the robot up like this initially is less general, but it allows us to focus on this particular instance of the problem, and allows us to generate a sufficient set of data to learn from with a reasonable number of trials. In addition, once this skill is learned, it can be used as the basis for stopping from more general conditions.

III. SIMULATION MODEL

The learning method is verified on a simulated 12 degree of freedom humanoid robot. Table I details the important parameters of the model.

TABLE I
SIMULATED ROBOT PARAMETERS

    Total Mass                        28.84 kg
    Mass in Body                      16.0 kg
    Mass in Each Leg                  6.42 kg
    Leg Length                        0.911 m
    Standing Center of Mass Height    0.945 m
    Foot Width                        0.1 m
    Foot Length                       0.203 m

IV. LEARNING TECHNIQUES

We present two different learning techniques, one off-line and one on-line, to solve the problem presented in the previous section. Instead of learning from scratch, both techniques employ a nominal stepping location as a starting point to learn from. The nominal stepping location is based on the Linear Inverted Pendulum Model [9] of walking and, in our simulations, typically predicts a Capture Point within 20 centimeters of an actual Capture Point. The off-line learning technique represents the learned Capture Point as an offset from the Linear Inverted Pendulum Capture Point. This offset is calculated based on curves fit to several variables of the state vector. The on-line learning technique represents the learned Capture Point in a table indexed using a simple gridding of several variables of the state space.

A. Stopping Energy Error

To evaluate the performance of the learned Desired Step Location Calculator, it will be necessary to compare the errors resulting from different stepping locations. While there is a clear distinction between a Stopped State and a state that will lead to a fall, a measure of stopping error is required to compare two falling states.

To compare results from stepping in different locations, we define a residual stopping "energy" as

$$E_{stopping} = \frac{1}{2} |\dot{\vec{x}}|^2 + \frac{1}{2} \frac{g}{l} |\vec{x}|^2, \qquad (1)$$

where $\vec{x}$ is the ground projection of the vector from the support foot to the Center of Mass, $g$ is the gravitational acceleration, and $l$ is the average Center of Mass height. Note that this quantity is similar to the conserved Orbital Energy of the Linear Inverted Pendulum Model [9], except that instead of subtracting the two terms, we add them.

$E_{stopping}$ will decrease as the Center of Mass approaches the foot and decelerates. It will reach a minimum approximately when the Center of Mass is closest to the foot. We use the minimum value as the Stopping Error, and refer to the state of the robot at that instant as the "top of stance" state. For a given push, it is reasonable to expect that the Stopping Error will decrease as the robot steps closer and closer to the Capture Region. When the robot steps to a Capture Point, the Stopping Error will be near zero, since the robot will pass through a Stopped State, with the Center of Mass approaching a balanced position over the foot.
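As a concrete illustration, Equation 1 and the Stopping Error (its minimum over the Stopping Stance Phase) can be computed as in the following sketch, assuming the stance trajectory is recorded as (position, velocity) pairs:

```python
import numpy as np

GRAVITY = 9.81      # m/s^2
COM_HEIGHT = 0.945  # m, average Center of Mass height l (Table I)

def stopping_energy(x, xd, g=GRAVITY, l=COM_HEIGHT):
    """Equation 1: E = 0.5*|xd|^2 + 0.5*(g/l)*|x|^2, where x is the ground
    projection of the support-foot-to-CoM vector and xd its velocity."""
    x = np.asarray(x)
    xd = np.asarray(xd)
    return 0.5 * xd.dot(xd) + 0.5 * (g / l) * x.dot(x)

def stopping_error(trajectory):
    """Minimum of E over a stance phase; the state at which the minimum
    occurs is the 'top of stance' state. `trajectory` is a sequence of
    (x, xd) pairs sampled during the Stopping Stance Phase."""
    return min(stopping_energy(x, xd) for x, xd in trajectory)
```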
B. Off-line Learning Technique

In the off-line learning technique, we collect data resulting from a number of different pushes on the robot and a number of different stepping locations on the ground. The pushes were simulated as impact forces applied to the robot for a set time, 0.05 sec, at varying magnitudes, 150 to 550 N at 100 N intervals, yielding force impulses between 7.5 and 27.5 N sec. The direction of the applied force was varied over the full range of $[0, 2\pi]$ at 10 even intervals. During learning, the Desired Step Location Calculator returned the Capture Point predicted by the Linear Inverted Pendulum Model plus an offset that varies along a 15 by 15 point grid spanning ±0.15 m in both x and y. For each push setting (magnitude and direction pair) and offset, the robot is started from rest on one foot and disturbed by the impact force. The Desired Step Location Calculator decides where to step, and the simulation continues until the next support phase either reaches a Stopped State or the Center of Mass falls below 0.65 m, indicating a fall. Each force setting is simulated for 225 swings, one for each offset. There are 500 force settings, yielding 112500 trials.

The centroid of the Capture Region for each force setting is then calculated from the trial data. This yields an offset vector from the nominal location estimated by the Linear Inverted Pendulum Model to the centroid of the Capture Region. We learn a Desired Step Location Calculator by fitting functions of the decision state to estimate the offset vector to the centroid of the Capture Region. It was found that the angle and magnitude of the desired offset were best correlated with the angle (in the XY plane) of the Center of Mass velocity at the decision event, so this was the only dimension of the state vector used in the calculation of the offset. The learned Desired Step Location Calculator determines the desired step by calculating the Linear Inverted Pendulum Capture Point and adding an offset vector obtained by evaluating the fit functions for the offset magnitude and angle at the decision state Center of Mass velocity angle (see the sketch at the end of this subsection).

To test the Desired Step Location Calculator, a new set of pushes was generated at a finer resolution, 150 to 550 N at 50 N intervals and 20 even angle intervals. These pushes were simulated on the robot using both the simple Linear Inverted Pendulum model to calculate Capture Points and the learned Desired Step Location Calculator.
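The sketch below illustrates this composition: the nominal Linear Inverted Pendulum Capture Point in its standard closed form [9], [26], plus a polar offset evaluated from the fitted magnitude and angle functions. The fit functions and their argument conventions are stand-ins for the curves of Figure 3.

```python
import numpy as np

def lip_capture_point(com_xy, comd_xy, g=9.81, l=0.945):
    """Nominal Capture Point of the Linear Inverted Pendulum Model:
    the CoM ground projection plus the CoM velocity scaled by sqrt(l/g)."""
    return np.asarray(com_xy) + np.asarray(comd_xy) * np.sqrt(l / g)

def learned_step_location(com_xy, comd_xy, fit_magnitude, fit_angle):
    """Off-line learned calculator: LIP Capture Point plus an offset whose
    magnitude and angle are functions of the CoM velocity angle alone."""
    velocity_angle = np.arctan2(comd_xy[1], comd_xy[0])  # angle in the XY plane
    r = fit_magnitude(velocity_angle)   # cubic fit of offset magnitude
    a = fit_angle(velocity_angle)       # cubic for angles < 0, linear otherwise
    offset = r * np.array([np.cos(a), np.sin(a)])
    return lip_capture_point(com_xy, comd_xy) + offset
```

Here `fit_magnitude` and `fit_angle` could be, for example, `numpy.polynomial.Polynomial` objects fit to the centroid offsets.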
C. On-line Learning Technique

In the on-line learning technique, the robot is pushed randomly and the Desired Step Location Calculator is used to decide where to step. Based on the results of the step, the Desired Step Location Calculator is updated. The simulation is run for 10000 random pushes, with the force magnitude varying in the range of 200 to 450 Newtons, yielding force impulses between 10 and 22.5 N sec. The force direction varied in the range $[-\frac{\pi}{4}, \frac{5\pi}{4}]$, where the angle is measured from the forward axis of the biped. We omitted the $\frac{\pi}{2}$ radian sector to the right of the robot because the controller currently has difficulty maintaining stability in that region, as it forces the swing leg (the left leg) to cross through the support leg.

1) Gridded Capture Point Table: The Desired Step Location Calculator stores learned Capture Points in a three dimensional gridded table. The three axes of this space are the Linear Inverted Pendulum Capture Point x and y, and the distance in the XY plane of the Center of Mass from the support foot. The grid size in each of the axes is 4 centimeters. When the controller queries for a Capture Point, the Desired Step Location Calculator calculates the Linear Inverted Pendulum Capture Point, $(x_{c,lip}, y_{c,lip})$, from the current state of the robot. It then uses $(x_{c,lip}, y_{c,lip})$ and the distance of the Center of Mass from the support foot in the Cartesian plane to determine which grid block the current state is in. If there is a stored Capture Point in the grid block, the stored point is returned as the desired Capture Point location. If the block is empty, then the point $(x_{c,lip}, y_{c,lip})$ is returned. In the Continuous Decision implementation, the Stepping Control System applies a simple filter to the returned point each control cycle, so step changes in the desired Capture Point location as the state crosses between grid blocks are smoothed.

The table begins the learning process empty. After a swing whose state passes through a grid entry with no stored value, all of the states that fell into that grid entry for the swing are recalled. The Linear Inverted Pendulum Capture Points of these states are averaged together, and the grid entry's stored Capture Point is initialized with this value.
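One way to realize this table is a dictionary keyed by discretized coordinates, as in the following sketch (the names are illustrative; the 4 centimeter grid size follows the text):

```python
import numpy as np

GRID = 0.04  # 4 cm grid size on each of the three axes

class GriddedCapturePointTable:
    """Learned Capture Points indexed by (x_c,lip, y_c,lip, CoM distance)."""
    def __init__(self):
        self.table = {}  # (i, j, k) grid index -> stored Capture Point (x, y)

    def grid_index(self, cp_lip, com_distance):
        return (int(np.floor(cp_lip[0] / GRID)),
                int(np.floor(cp_lip[1] / GRID)),
                int(np.floor(com_distance / GRID)))

    def query(self, cp_lip, com_distance):
        """Return the stored Capture Point for this grid block if one
        exists; otherwise fall back to the LIP prediction itself."""
        key = self.grid_index(cp_lip, com_distance)
        return self.table.get(key, tuple(cp_lip))
```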
2) Learning Update Rule: After each swing trial, if the step location succeeded in bringing the robot to a Stopped State, no update is applied to the table. If the step location chosen resulted in the robot falling, the trial is stopped when the body height goes below 0.85 m. The final velocity vector of the body as it falls away from the support foot corresponds to the appropriate direction in which to correct the Desired Step Location. Therefore the desired offset, $\vec{\delta}_{offset}$, is calculated as

$$\vec{\delta}_{offset} = E_{stopping} \, \dot{\vec{x}}_{final}, \qquad (2)$$

where the magnitude of the correction is determined by Equation 1, the energy error at the top of stance. A learning gain is then applied to maintain stability of the learning process, yielding

$$\Delta\vec{V} = \vec{\delta}_{offset} K_{learning}, \qquad (3)$$

where $K_{learning} = 0.6$.

For the Single Decision on-line learning implementation, the stored Capture Point used for the swing is updated by adding $\Delta\vec{V}$ to it.

For the Continuous Decision implementation, $\Delta\vec{V}$ is used to update the stored Capture Point of each gridded state that the swing passed through, multiplied by an additional term that depends on the time at which the state occurred during the swing. A sigmoidal function of time, ramping from 0 at the beginning of the swing to approximately 1 halfway through the swing, is evaluated at each state, multiplied by $\Delta\vec{V}$, and added to the stored Capture Point at that state. This time ramping factor was added because there are fewer possible states early in the swing, so they will be trained towards a larger range of Capture Points than states later in the swing.
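A sketch of the update of Equations 2 and 3, including the sigmoidal time weighting of the Continuous Decision case, is given below. The paper specifies the ramp only qualitatively, so the sigmoid constants here are assumptions:

```python
import numpy as np

K_LEARNING = 0.6  # learning gain from Equation 3

def sigmoid_ramp(t, swing_duration):
    """Weighting that ramps from ~0 at swing start to ~1 by mid-swing.
    Only this qualitative shape is specified; constants are assumed."""
    return 1.0 / (1.0 + np.exp(-12.0 * (t / swing_duration - 0.25)))

def update_after_swing(table, visited, fell, e_stopping, xd_final,
                       swing_duration):
    """Apply Equations 2 and 3 after a trial (Continuous Decision variant).

    visited:    (grid_key, time_in_swing) pairs the swing passed through,
                each already initialized in `table` during the swing.
    xd_final:   final CoM velocity as the robot falls; its direction is
                the direction in which to correct the stored points.
    e_stopping: Equation 1 evaluated at the top of stance.
    """
    if not fell:
        return                                        # successful trials: no update
    delta_offset = e_stopping * np.asarray(xd_final)  # Equation 2
    delta_v = K_LEARNING * delta_offset               # Equation 3
    for grid_key, t in visited:
        stored = np.asarray(table.table[grid_key])
        table.table[grid_key] = tuple(
            stored + sigmoid_ramp(t, swing_duration) * delta_v)
```

In the Single Decision variant, only the entry queried for the swing is updated, by adding $\Delta\vec{V}$ directly without the time weighting.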
Fig. 2. Diagram showing results of stepping to various locations for a given push. Pink circles show Capture Points. At other points, a red arrow shows the resulting vector from the new support foot to the Center of Mass position, and a green arrow shows the resulting Center of Mass velocity at the point where the Center of Mass comes closest to the center of the support ankle. The black and white target shows the ground projection of the Center of Mass when the Desired Step Location Calculator was queried. The blue rectangle shows the location of the current stance foot. The center of the grid of points is the location of the predicted Capture Point using the Linear Inverted Pendulum Model.

V. SIMULATION RESULTS

Learning resulted in a significant increase in the frequency of the robot coming to a stop and a significant decrease in the residual energy after a step, in both the off-line learning case and the on-line learning case.

A. Offline Learning

Figure 2 shows the results of stepping to various locations for a given push. Pink circles show Capture Points. At other points, a red arrow shows the resulting Center of Mass position and a green arrow shows the resulting Center of Mass velocity at the point where the Center of Mass comes closest to the center of the support ankle. The black and white target shows the ground projection of the Center of Mass when the Desired Step Location Calculator was queried. The blue rectangle shows the location of the current stance foot. The center of the grid of points is the location of the predicted Capture Point using the Linear Inverted Pendulum Model.

Figure 3 shows the measured and best fit stepping location offsets resulting from off-line learning. The magnitude and angle of the step location offset (vertical axes) are functions of the Center of Mass velocity angle at the decision event (horizontal axis). The offset magnitude is fit using a cubic curve. The angle is fit using a cubic curve for velocity angles less than 0, and a line for velocity angles greater than 0. Several points in each dimension were ignored as outliers when calculating the best fit curve, as the resulting fit otherwise passed near fewer of the measured points. These outliers imply that the Center of Mass velocity angle cannot completely characterize the offset vector of the Capture Point from the Linear Inverted Pendulum Capture Point. However, these curve fits provide the controller with a simple approximation that encompasses enough of the measured data to make a reasonable prediction.

Figure 4 shows the Stopping Error using Capture Points predicted from the Linear Inverted Pendulum Model and using a curve fit with off-line learning. Nine different force magnitudes from 150 to 550 Newtons for 0.05 seconds, corresponding to 7.5 to 27.5 N sec impulses, were applied from 20 different directions. The mean squared error before learning is 0.137 J²/kg², and the controller successfully comes to a Stopped State in only 1 out of 180 trials. After learning, the mean squared error is 0.0192 J²/kg², and the controller successfully comes to a Stopped State in 45 out of 180 trials.

B. On-line Learning Results

Figure 5 shows the learning results of the Single Decision and Continuous Decision on-line learning trials. The Continuous Decision success percentage is higher than that of the Single Decision case both before and after learning. Before learning, this is probably due to the reactive nature of the continuously updating controller. The Single Decision Controller is limited to the initial Linear Inverted Pendulum Capture Point calculated at the beginning of the swing, whereas the Continuous Decision Controller reacts throughout the swing to an increasingly accurate Linear Inverted Pendulum Capture Point. After learning, the Continuous Decision learning also has the advantage of a higher degree of discrimination between swings, because decisions made early in the swing can be refined at later grid states.

Figure 6 shows a representation of the gridded memory for the Continuous Decision learning case. The three rows of the graph show the memory throughout the training process, at 1000, 2000, and 10000 iterations. The plots from left to right represent slices of the memory at progressively larger Center of Mass distances from the support foot. The Center of Mass range is represented as a black ring around the center of the foot, represented as a blue rectangle. Each point represents a gridded Linear Inverted Pendulum Capture Point; the color represents the distance from the point stored at that location to the Linear Inverted Pendulum Capture Point. As more gridded states are visited, the memory fills in new blocks, and as more states are experienced in existing blocks, the stored Capture Point is refined.

VI. DISCUSSION AND FUTURE WORK

We have presented a method for learning Capture Points for humanoid push recovery. This method assumes that a Stepping Controller and a Stopping Controller already exist for the biped. We learn offsets to the Capture Point locations predicted from the Linear Inverted Pendulum Model of walking. This results in a Desired Foot Placement Controller that the Stepping Controller can query. Simulation results show that this method results in significantly more robust push recovery than without learning. There are several points of discussion and potential areas of future work.
[Figure 3 appears here: two scatter plots with best fit curves, "Capture Region Center Offset Angle From LIP Capture Point vs COM Velocity Angle" and "Capture Region Center Offset Magnitude From LIP Capture Point vs COM Velocity Angle", showing measured offsets, ignored outliers, and best fits; the vertical axes are the angle of the offset vector and the magnitude of the offset (m), and the horizontal axes are the COM velocity angle at decision time.]
Fig. 3. Measured and best fit stepping location offsets resulting from off-line learning. The left graph shows the angle of the step location offset (vertical axis) as a function of the Center of Mass velocity angle at the decision event (horizontal axis). The right graph shows the magnitude of the step location offset (vertical axis) as a function of the Center of Mass velocity angle at the decision event (horizontal axis).
[Figure 4 appears here: two surface plots, "Stopping Error using the LIP capture point calculator vs. push angle and magnitude" (left) and "Stopping Error using the Learned Function capture point calculator vs. push angle and magnitude" (right); the vertical axes are Residual Stopping Error (J/kg), and the horizontal axes are applied force magnitude (N) and applied force angle.]

Fig. 4. Stopping Error using Capture Points predicted from the Linear Inverted Pendulum Model (left) and adding a Capture Point offset predicted from the simple curve fit based on the Center of Mass velocity angle at the decision event (right). Nine different force impulses were applied from 20 directions. The mean squared Stopping Error when using the Linear Inverted Pendulum Model is 0.137 J²/kg², with a stopping success rate of 1 out of 180 trials. The mean squared error when using the curve fit to predict Capture Point offsets is 0.0192 J²/kg², with a success rate of 45 out of 180 trials.

Fig. 5. Results of on-line learning using the Single Decision Controller (left) and the Continuous Decision Controller (right) over 10000 training iterations.
A successful trial has a value of 1, and a failed trial has a value of 0, regardless of the residual Energy Error. The points plotted represent the average value of
the previous 200 trials. The filtered estimate was found by using a forward/reverse 750 trial running average of the data padded on either side with the average
of the first and last 500 iterations. The vertical lines represent training being turned on at 1000 iterations and off after 11000 iterations. The disturbance forces
used to test the controller before and after training are identical.
Fig. 6. A representation of the learned memory at 1000, 2000, and 10000 iterations for the continuous decision online learning case. The black ring represents
the range of locations of the Center of Mass for the given graph. The blue rectangle represents the foot location. Each plotted point is a Linear Inverted
Pendulum Capture Point of a gridded state. The color of the point represents the magnitude of the offset from the Linear Inverted Pendulum Capture Point
to the corresponding Capture Point in memory. Blue dots represent offset magnitudes of less than 3 centimeters, dark red represents a 90 centimeter offset.

A. What was Learned?

The intent of this study was to learn Capture Points for humanoid push recovery. While we achieved improved push recovery performance through learning, it is interesting to consider what was really learned. While the Stepping Control System does a reasonable job of stepping to the desired step location, it does have some errors. Therefore, in addition to correcting for the deviation from the Linear Inverted Pendulum Model, we also adapt to the errors resulting from the Stepping Control System. In other words, we learned desired stepping locations that result in steps to actual Capture Points, even if the actual step is away from the desired step.

B. Generalizing to More States

In this study we limited the initial setup of the robot to start standing and balancing on one foot with the other foot slightly off the ground. Then a disturbance force was applied. By limiting the initial setup to these conditions, the effective size of the space that needed to be generalized over was relatively small. Since only the magnitude and direction of the disturbance push were varied between trials, only a two dimensional subspace of the full state space was visited. This limited exploration of state space allowed us to avoid the Curse of Dimensionality and easily learn Capture Point locations using standard function approximators and low dimensional state space griddings.

This limited approach is justifiable as a first step toward general push recovery, since the learned Capture Point locations can be used as a baseline for a broader range of state space exploration. Similarly to a toddler learning to walk, the ability to recover from a push when standing can be built on to develop the ability to recover from pushes while walking.

We are now expanding our technique to learn to recover from pushes that occur at random times during walking. This effort may require more complex learning techniques that generalize well over a large range of the state space. However, by starting with the controller that is learned using the techniques presented in this paper, learning for the more general case should be sped up.

C. Improving the Stopping Controller

In this study, we used a Stopping Controller that relies on foot Center of Pressure regulation as the main strategy for stopping the robot and balancing over the foot. Our Stopping Controller did not use acceleration of internal angular momentum (lunging the body or windmilling the arms) or modification of the Center of Mass trajectory to assist in stopping. These two neglected strategies can be quite powerful, both in expanding the range of pushes that a biped can recover from and in expanding the Capture Region for a given push.

We did not use these two strategies in this study simply because we do not yet have a good stopping controller that uses them. We are currently developing such Stopping Controllers and have shown preliminary work for planar bipeds [26], [27]. Because we set up the problem formulation in this paper such that any reasonable stopping controller can be used, as our Stopping Controllers are improved, we should be able to use them in the current framework without requiring modification to the framework.

D. Application to a Real Robot

We are currently building a humanoid robot with inertial parameters that are close to those of the simulation model used for this paper. The robot should be operational in the Summer of 2008. The first experiments on this robot will be for humanoid push recovery, including reproducing the results from this paper on the real robot.
REFERENCES

[1] Chee Meng Chew and G.A. Pratt. Frontal plane algorithms for dynamic bipedal walking. Robotica, 22(1):29–39, 2004.
[2] Chee Meng Chew and Gill A. Pratt. Dynamic bipedal walking assisted by learning. Robotica, 20(5):477–491, 2002.
[3] T. Fukuda, Y. Komata, and T. Arakawa. Stabilization control of biped locomotion robot based learning with GAs having self-adaptive mutation and recurrent neural networks. Proceedings of the IEEE International Conference on Robotics and Automation, pages 217–222, 1997.
[4] K. Hitomi, T. Shibata, Y. Nakamura, and S. Ishii. On-line learning of a feedback controller for quasi-passive-dynamic walking by a stochastic policy gradient method. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3803–3808, 2005.
[5] Andreas Hofmann. A sliding controller for bipedal balancing using integrated movement of contact and non-contact limbs. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1952–1959, 2004.
[6] Andreas Hofmann. Robust Execution of Bipedal Walking Tasks From Biomechanical Principles. PhD thesis, Massachusetts Institute of Technology, 2006.
[7] Sang-Ho Hyon and Gordon Cheng. Gravity compensation and full-body balancing for humanoid robots. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pages 214–221, 2006.
[8] Sang-Ho Hyon and Gordon Cheng. Disturbance rejection for biped humanoids. Proceedings of the IEEE International Conference on Robotics and Automation, pages 2668–2675, 2007.
[9] S. Kajita, F. Kanehiro, K. Kaneko, K. Yokoi, and H. Hirukawa. The 3d linear inverted pendulum mode: a simple modeling for a biped walking pattern generation. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 239–246, 2001.
[10] Jae Won Kho, Dong Cheol Lim, and Tae Yong Kuc. Implementation of an intelligent controller for biped walking using genetic algorithm. Proceedings of the IEEE International Symposium on Industrial Electronics, pages 49–54, 2006.
[11] Jaewon Kho and Dongcheol Lim. A learning controller for repetitive gait control of biped walking robot. Proceedings of the SICE Annual Conference, pages 885–889, 2004.
[12] T. Komura, H. Leung, S. Kudoh, and J. Kuffner. A feedback controller for biped humanoids that can counteract large perturbations during gait. Proceedings of the IEEE International Conference on Robotics and Automation, pages 2001–2007, 2005.
[13] T.Y. Kuc, S.M. Baek, H.G. Lee, and J.O. Kim. Fourier series learning of biped walking motion. Proceedings of the 2002 IEEE International Symposium on Intelligent Control, pages 49–54, 2002.
[14] Qinghua Li, Atsuo Takanishi, and Ichiro Kato. Learning control for a biped walking robot with a trunk. Proceedings of the 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1771–1777, 1993.
[15] Qinghua Li, Atsuo Takanishi, and Ichiro Kato. Learning of robot biped walking with the cooperation of human. Proceedings of the IEEE International Workshop on Robot and Human Communication, pages 393–397, 1993.
[16] Thijs Mandersloot, Martijn Wisse, and Christopher G. Atkeson. Controlling velocity in bipedal walking: A dynamic programming approach. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pages 124–130, 2006.
[17] Yong Mao, Jiaxin Wang, Peifa Jia, Shi Li, Zhen Qiu, Le Zhang, and Zhuo Han. A reinforcement learning based dynamic walking control. Proceedings of the IEEE International Conference on Robotics and Automation, pages 3609–3614, 2007.
[18] W.T. Miller. Learning dynamic balance of a biped walking robot. Proceedings of the 1994 IEEE International Conference on Neural Networks, IEEE World Congress on Computational Intelligence, pages 2771–2776, 1994.
[19] T. Mori, Y. Nakamura, M. Sato, and S. Ishii. Reinforcement learning for a cpg-driven biped robot. Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI), 2004.
[20] J. Morimoto, G. Cheng, G. Atkeson, and G. Zeglin. A simple reinforcement learning algorithm for biped locomotion. Proceedings of the IEEE International Conference on Robotics and Automation, pages 3030–3035, 2004.
[21] J. Morimoto, J. Nakanishi, G. Endo, G. Cheng, C.G. Atkeson, and G. Zeglin. Poincare-map-based reinforcement learning for biped walking. Proceedings of the IEEE International Conference on Robotics and Automation, pages 2381–2386, 2005.
[22] J. Morimoto, G. Zeglin, and C.G. Atkeson. Minimax differential dynamic programming: application to a biped walking robot. SICE 2003 Annual Conference, pages 2310–2315, 2003.
[23] Jun Morimoto and Christopher G. Atkeson. Learning biped locomotion: Application of poincare-map-based reinforcement learning. IEEE Robotics and Automation Magazine, pages 41–51, 2007.
[24] Masaki Ogino, Yutaka Katoh, Masahiro Aono, Minoru Asada, and Koh Hosoda. Vision-based reinforcement learning for humanoid behavior generation with rhythmic walking parameters. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1665–1671, 2003.
[25] Jerry E. Pratt. Exploiting Inherent Robustness and Natural Dynamics in the Control of Bipedal Walking Robots. PhD thesis, Massachusetts Institute of Technology, May 2000.
[26] Jerry E. Pratt, John Carff, Sergey Drakunov, and Ambarish Goswami. Capture point: A step toward humanoid push recovery. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pages 200–207, 2006.
[27] Jerry E. Pratt and Sergey V. Drakunov. Derivation and application of a conserved orbital energy for the inverted pendulum bipedal walking model. Proceedings of the IEEE International Conference on Robotics and Automation, pages 4653–4660, 2007.
[28] Jerry E. Pratt and Russ Tedrake. Velocity based stability margins for fast bipedal walking. In Fast Motions in Robotics and Biomechanics: Optimization and Feedback Control, M. Diehl and K.D. Mombaur (editors), Lecture Notes in Information Science, Springer, No. 340, Sept. 2006.
[29] E. Schuitema, D.G.E. Hobbelen, P.P. Jonker, M. Wisse, and J.G.D. Karssen. Using a controller based on reinforcement learning for a passive dynamic walking robot. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pages 232–237, 2005.
[30] Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung. Stochastic policy gradient reinforcement learning on a simple 3d biped. Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pages 2849–2854, 2004.
[31] Shouyi Wang, Jelmer Braaksma, Robert Babuska, and Daan Hobbelen. Reinforcement learning control for biped robot walking on uneven surfaces. Proceedings of the 2006 International Joint Conference on Neural Networks, pages 4173–4178, 2006.
[32] Changjiu Zhou and Qingchun Meng. Reinforcement learning with fuzzy evaluative feedback for a biped robot. Proceedings of the IEEE International Conference on Robotics and Automation, pages 3829–3834, 2000.
