
Autonomous Navigation in Unknown Environments

Using Machine Learning


by
Charles Andrew Richter
Submitted to the Department of Aeronautics and Astronautics
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy

at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2017
© Massachusetts Institute of Technology 2017. All rights reserved.

Author: Signature redacted
Department of Aeronautics and Astronautics
May 25, 2017

Certified by: Signature redacted
Nicholas Roy
Professor of Aeronautics and Astronautics

Certified by: Signature redacted
Tomás Lozano-Pérez
Professor of Electrical Engineering and Computer Science

Certified by: Signature redacted
Sertac Karaman
Associate Professor of Aeronautics and Astronautics

Certified by: Signature redacted
Ingmar Posner
Associate Professor of Engineering Science, University of Oxford

Accepted by: Signature redacted
Youssef Marzouk
Chairman, Department Committee on Graduate Theses
Autonomous Navigation in Unknown Environments Using
Machine Learning
by
Charles Andrew Richter

Submitted to the Department of Aeronautics and Astronautics


on May 25, 2017, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy

Abstract
In this thesis, we explore the problem of high-speed autonomous navigation for a
dynamic mobile robot in unknown environments. Our objective is to navigate from
start to goal in minimum time, given no prior knowledge of the map, nor any explicit
knowledge of the environment distribution. Faced with this challenge, most practical
receding-horizon navigation methods simply restrict their action choices to the known
portions of the map, and ignore the effects that future observations will have on their
map knowledge, sacrificing performance as a result. In this thesis, we overcome
these limitations by efficiently extending the robot's reasoning into unknown parts
of the environment through supervised learning. We predict key contributors to the
navigation cost before the relevant portions of the environment have been observed,
using training examples from similar planning scenarios of interest.
Our first contribution is to develop a model of collision probability to predict the
outcomes of actions that extend beyond the perceptual horizon. We use this collision
probability model as a data-driven replacement for conventional safety constraints in a
receding-horizon planner, resulting in collision-free navigation at speeds up to twice as
fast as conventional planners. We make these predictions using a Bayesian approach,
leveraging training data for performance in familiar situations, and automatically
reverting to safe prior behavior in novel situations for which our model is untrained.
Our second contribution is to develop a model of future measurement utility, efficiently
enabling information-gathering behaviors that can extend the robot's visibility far into
unknown regions of the environment, thereby lengthening the perceptual horizon,
resulting in faster navigation even under conventional safety constraints. Our third
contribution is to adapt our collision prediction methods to operate on raw camera
images, using deep neural networks. By making predictions directly from images, we
take advantage of rich appearance-based information well beyond the range to which
dense, accurate environment geometry can be reliably estimated. Pairing this neural
network with novelty detection and a self-supervised labeling technique, we show that
we can deploy our system initially with no training, and it will continually improve
with experience and expand the set of environment types with which it is familiar.

Thesis Supervisor: Nicholas Roy
Title: Professor of Aeronautics and Astronautics

Thesis Committee Member: Tomás Lozano-Pérez


Title: Professor of Electrical Engineering and Computer Science

Thesis Committee Member: Sertac Karaman


Title: Associate Professor of Aeronautics and Astronautics

Thesis Committee Member: Ingmar Posner


Title: Associate Professor of Engineering Science, University of Oxford

Acknowledgments
It has been an honor and a great privilege to be a student at MIT, in the Robust
Robotics Group and the MIT robotics community, for the last six years. There are
many people I would like to acknowledge for making MIT such a great experience.
First and foremost, I would like to thank my advisor, Professor Nicholas Roy, for his
insightful guidance at every stage of the research process, and for pointing me toward
fascinating problems that would turn out to be productive lines of inquiry. I would
also like to thank Nick for his constant and genuine support and sense of humor,
which made the research process so much fun. I feel extremely lucky to have Nick as
my advisor and to be a part of the Robust Robotics Group.
I would also like to thank my thesis committee for asking the challenging, funda-
mental questions that guided me to define and strengthen the research in this thesis.
Professor Tomás Lozano-Pérez encouraged me to clearly state the underlying prob-
lem in its most general form in order to build practical methods on a solid theoretical
foundation. Professor Sertac Karaman encouraged me to always start with the sim-
plest instance of the problem containing the characteristics of interest. And Professor
Ingmar Posner encouraged me to carefully examine the machine learning algorithms
in this thesis to understand how and why they know what they know. These pieces of
advice, along with all of our other great discussions, have clarified my understanding
of my own work tremendously, and helped to define my approach to the research
process in general. It has been a privilege to learn from such a great committee.
I would like to thank my labmates in the Robust Robotics Group and the MIT
robotics community for being such an inspiring group of colleagues to work with.
We have shared many great memories, flying autonomous drones and driving robots
through the Stata Center late at night, and hashing out the deep questions of artificial
intelligence and robotics every day. I would also like to thank all of my MIT friends,
who made these years so special. I will always remember the great times we have
had, and I look forward to many more. Finally, I would like to thank my family for
supporting and encouraging my engineering pursuits, and everything else, all along.

Contents

1 Introduction 15
1.1 Safe Planning in Unknown Environments . . . . . . . . . . . . . . . . 17
1.2 Predicting Collisions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Predicting the Utility of Future Measurements . . . . . . . . . . . . . 27
1.4 Learning Predictive Features from Raw Data . . . . . . . . . . . . . . 30
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.6 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2 Background 35
2.1 Safe Navigation for Dynamic Mobile Robots . . . . . . . . . . . . . . 35
2.1.1 Contingency Plans and Emergency-Stop Maneuvers . . . . . . 36
2.1.2 Probabilistic Notions of Safety . . . . . . . . . . . . . . . . . . 37
2.2 Navigating Unknown Environments . . . . . . . . . . . . . . . . . . . 39
2.2.1 Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.2.2 Incremental Re-Planning in an Unfolding Map . . . . . . . . . 41
2.2.3 POMDPs for Navigation in Unknown Environments . . . . . . 43
2.3 Learning to Control Highly Dynamic Robots . . . . . . . . . . . . . . 45
2.3.1 Learning Under Uncertainty of the System Dynamics . . . . . 46
2.3.2 Learning From Demonstrations . . . . . . . . . . . . . . . . . 48
2.3.3 Safe Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4 Learning Visual Perception for Navigation . . . . . . . . . . . . . . . 50
2.4.1 Near-to-Far Self-Supervised Learning . . . . . . . . . . . . . . 51
2.4.2 Supervised Learning of Visual Models for Planning . . . . . . 52

2.4.3 Mapping Images Directly to Actions . . . . . . . . . . . . . . 55
2.4.4 Quantifying Uncertainty in Learned Visual Models . . . . . . 57

3 Predicting Collisions from Features of a Geometric Map 65


3.1 POMDP Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.1 Missing Prior Distribution Over Environments . . . . . . . . . 70
3.1.2 Approximations to the POMDP . . . . . . . . . . . . . . . . . 71
3.2 Predicting Future Collisions . . . . . . . . . . . . . . . . . . . . . . . 74
3.2.1 Learning Problem . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2.2 Advantage of the Bayesian Approach . . . . . . . . . . . . . . 77
3.2.3 Training Procedure . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.3.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 82
3.3.2 Demonstrations on Autonomous RC Car . . . . . . . . . . . . 92
3.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4 Predicting the Utility of Future Measurements 105


4.1 Measurement Utility Modeling Approach . . . . . . . . . . . . . . . . 107
4.1.1 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.1.2 Prediction and Planning with the Learned Model . . . . . . . 110
4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.2.1 Learned Models . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.2.2 Simulation Success Rates . . . . . . . . . . . . . . . . . . . . . 112
4.2.3 Simulation Time-to-Goal as a Function of Visibility . . . . . . 114
4.2.4 Experimental Demonstration on Autonomous RC Car . . . . . 116
4.3 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5 Predicting Collisions from Learned Features of Raw Sensor Data 121


5.1 Visual Navigation Problem Formulation . . . . . . . . . . . . . . . . 124
5.2 Learning to Predict Collisions . . . . . . . . . . . . . . . . . . . . . . 127
5.2.1 Labeling Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

5.2.2 Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2.3 Prior Estimate of Collision Probability . . . . . . . . . . . . . 132
5.3 Novelty Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.3.1 MNIST Examples . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3.2 Robot Navigation Examples . . . . . . . . . . . . . . . . . . . 137
5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.4.1 Visualizing Neural Network Predictions . . . . . . . . . . . . . 139
5.4.2 Neural Network Compared to Hand-Engineered Features . . . 141
5.4.3 Online Learning Simulation Results . . . . . . . . . . . . . . . 142
5.4.4 Experimental Results using LIDAR . . . . . . . . . . . . . . . 143
5.4.5 Experimental Results using Monocular VIO and RGBD . . . . 145
5.5 Generalization in Real Environments . . . . . . . . . . . . . . . . . . 146
5.5.1 Demonstrating Generalization . . . . . . . . . . . . . . . . . . 147
5.5.2 What Makes a Hallway Image Appear Novel? . . . . . . . . . 151
5.5.3 Reconstruction Examples . . . . . . . . . . . . . . . . . . . . . 152
5.6 Limitations and Future Directions for Novelty Detection Methods . . 154
5.7 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

6 Conclusion 159
6.1 Future Work: Predicting Collisions . . . . . . . . . . . . . . . . . . . 160
6.2 Future Work: Predicting Measurement Utility . . . . . . . . . . . . . 161
6.3 Future Work: Learning Predictive Features . . . . . . . . . . . . . . . 162

List of Figures

1-1 Baseline planner. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1-2 Planning with collision prediction. . . . . . . . . . . . . . . . . . . . 22
1-3 Planning with measurement utility prediction. . . . . . . . . . . . . . 28

3-1 Pre-computed action set. . . . . . . . . . . . . . . . . . . . . . . . . . 68
3-2 Collision labeling example. . . . . . . . . . . . . . . . . . . . . . . . . 80
3-3 Example hallway and forest simulation maps. . . . . . . . . . . . . . 82
3-4 Basic simulation results. . . . . . . . . . . . . . . . . . . . . . . . . . 83
3-5 Example simulation trajectories. . . . . . . . . . . . . . . . . . . . . . 84
3-6 Performance as a function of data volume. . . . . . . . . . . . . . . . 86
3-7 Performance in simulated hallways as a function of sensor configuration. 87
3-8 Effect of sensor field-of-view on the baseline planner in a hallway. . . 88
3-9 Performance in simulated forests as a function of sensor configuration. 89
3-10 Effect of sensor range on the baseline planner in a forest. . . . . . . . 90
3-11 Simulated transition between familiar and unfamiliar environment types. 91
3-12 Experimental autonomous RC car. . . . . . . . . . . . . . . . . . . . 92
3-13 Hallway-type experiment. . . . . . . . . . . . . . . . . . . . . . . . . 93
3-14 Experiment with restricted sensor range and field-of-view. . . . . . . 96
3-15 Experiment traversing multiple environment types. . . . . . . . . . . 98
3-16 Maximum speed test. . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3-17 Example of wheel slip during an experiment. . . . . . . . . . . . . . . 100

4-1 Baseline planner. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4-2 Measurement utility training procedure. . . . . . . . . . . . . . . . . 109
4-3 Example learned models of measurement utility. . . . . . . . . . . . . 111
4-4 Baseline planner trapped having not observed key regions of the map. 113
4-5 Simulation results for measurement utility prediction. . . . . . . . . . 114
4-6 Planning with measurement utility prediction. . . . . . . . . . . . . . 115
4-7 Trajectories of planner while predicting measurement utility vs. baseline. 116
4-8 Experimental trials of measurement utility prediction. . . . . . . . . . 117

5-1 Training examples for image-based collision prediction. . . . . . . . . 130
5-2 Autoencoder demonstration on MNIST and notMNIST datasets. . . . 137
5-3 Autoencoder demonstration on a hallway and a forest. . . . . . . . . 138
5-4 Histogram of autoencoder loss on hallway training images. . . . . . . 138
5-5 Neural network collision prediction examples. . . . . . . . . . . . . . 140
5-6 Neural network performance comparison to geometric features. . . . . 141
5-7 Simulation results of image-based collision prediction. . . . . . . . . . 142
5-8 Environment used for image-based collision prediction experiments. . 144
5-9 Experimental results of image-based collision prediction. . . . . . . . 144
5-10 Novel environment used for image-based collision prediction experiments. 144
5-11 Images taken from image-based collision prediction experiments. . . . 145
5-12 Histogram of autoencoder loss for experimental environments. . . . . 145
5-13 MIT hallways used for neural network generalization experiments. . . 147
5-14 Neural network generalization results. . . . . . . . . . . . . . . . . . . 148
5-15 Example of a novel segment of a hallway image. . . . . . . . . . . . . 151
5-16 Example of a familiar segment of a hallway image. . . . . . . . . . . . 152
5-17 Autoencoder weights displayed as image filters. . . . . . . . . . . . . 153
5-18 Examples of inputs and reconstructions from non-hallway environments. 154
List of Tables

3.1 Collision prediction performance in a hallway environment. . . . . . . 94


3.2 Collision prediction performance with restricted sensing. . . . . . . . 96

4.1 Measurement utility prediction performance. . . . . . . . . . . . . . . 117

Chapter 1

Introduction

In this thesis, we study the problem of autonomous navigation through an environment that is initially unknown, with the objective of reaching a goal in minimum time. Without prior knowledge of the map, a fast-moving robot must perceive its surroundings through onboard sensors and make instantaneous decisions to react to obstacles as they come into view. This problem lies at the intersection of several widely studied areas of robotics, including motion planning, perception, and exploration. Yet, the particular task of navigating as fast as possible through an unknown environment gives rise to novel and challenging questions because it requires the robot to select actions partially based on assumptions and predictions about the structure of the environment that it has not yet perceived.

As we will observe in this thesis, standard motion planning techniques must often be conservative when deployed in unknown environments, limiting performance, since every unexplored region of the map may, in the worst case, pose a hazard. At the same time, standard planners do not typically take deliberate actions to observe key unknown regions, unless those planners have been designed for the task of mapping rather than high-speed goal-directed navigation. In order to guarantee that the robot will not collide with potential obstacles that may lie beyond the boundaries of the known map, conventional planners limit the robot's speed such that it could come to a stop, if need be, before entering the unknown. However, this constraint implies a worst-case assumption that is rarely true of man-made and natural environments. Very often it is indeed possible to plan actions that commit the robot to entering unknown regions, if guided by relevant experience suggesting that such actions will be safe. Furthermore, it is also possible, given some intuitive guidance, to plan in anticipation of advantageous future sensor views and project the knowledge horizon as far ahead as possible in order to enable faster navigation. As a result, there is a substantial gap between the limited navigation speeds achievable under standard constraints and assumptions, and what is truly possible for an optimal planner. In most cases, we do not even know where the limit of possibility lies, since it depends heavily on many factors, such as the physical properties of the environment, the physical limitations of the robot, and the types of knowledge and learning the robot can use.

Nevertheless, we have strong evidence that extremely agile high-speed navigation, far exceeding today's state-of-the-art robots, is fundamentally possible. For instance, consider the goshawk, a bird of prey whose formidable hunting style features high-speed flight through dense woodland and nimble, aerobatic dives through tiny passages in the vegetation. Or, consider aerial drones racing at speeds up to 15 meters per second through cluttered buildings and obstacle courses, controlled by expert human pilots using only a first-person onboard camera view. What is perhaps most fascinating about these examples is that animals and human pilots do not have the same types of high-precision sensing and computation as modern robotic systems and instead must make decisions based on a learned or intuitive understanding of the ways their actions and perceptual capabilities work in familiar types of environments.

One of the goals of this thesis is to develop autonomous robotic planners and perception algorithms that help us to get closer to the levels of agility observed in nature and demonstrated by expert human pilots. We aim to develop a practical understanding of how fast it is possible to navigate with respect to a continually unfolding perceptual horizon, in which obstacles may appear from behind occlusion or come into range of the sensor. An intelligent planner must strike a careful balance between driving toward the goal as aggressively as possible and anticipating the risk of collision with yet-unseen obstacles and structures. This balance is a complex function of the vehicle's maneuverability at different speeds, the likelihood and placement of obstacles that may appear, and the amount of time and distance that will be available to react as those obstacles come into view.
All of the problems we will study in this thesis take the form of minimum-time
navigation to a specified goal location. We will begin by demonstrating how con-
ventional methods make overly conservative assumptions about the world, limiting
performance in order to guarantee safety. Next, we will show how we can replace
those assumptions with more accurate models of collision probability, learned from
experience in the environment types of interest, thereby enabling faster navigation
while remaining collision-free. Then, we will show that we can apply a similar learn-
ing technique to predict measurement utility, guiding robots to obtain advantageous
sensor views that increase the range at which they can anticipate obstacles. Finally,
we will show how we can learn feature representations for these learning problems
directly from raw sensor data. We briefly outline these topics next.

1.1 Safe Planning in Unknown Environments

Faced with a partially unknown environment, a mobile robot must use its onboard
sensors to perceive obstacles and incrementally update its internal map representation
with each new piece of information. As the robot moves, observing more and more of
the environment, it gradually expands the known regions of its map. However, if the
map is incomplete, it is generally not possible to plan a detailed trajectory from the
robot's location all the way to the goal, until the goal has come into view. Instead,
a common planning strategy in unknown environments, and the technique we will
use in this thesis, is the receding-horizon approach. In receding-horizon planning, the
robot selects a partial trajectory extending to some short planning horizon given the
current (partial) map knowledge, and begins to execute it while continuing to observe
its surroundings and update its map. By continually re-planning, the robot can react
to new information as it is observed and revise its trajectory.
(a) 4 m/s. (b) 1 m/s. (c) 1 m/s.

Figure 1-1: A sequence of actions chosen for a robot approaching a blind corner while guaranteeing safety with a 60° sensor field-of-view. Chosen trajectories are drawn in blue, and feasible emergency-stop maneuvers are drawn in red. Black rectangles illustrate the outline of the robot, indicating the size of the required free space around each feasible action and emergency-stop action. The safety constraint requires the robot to slow from 4 m/s in (a) to 1 m/s in (b) to preserve the ability to stop within known free space. In (c), the planner must proceed slowly until it is able to complete its turn, pivoting its field-of-view (illustrated by the black cone) and observing more free space ahead. This is the guaranteed-safe "baseline" planner we will use for comparison in this thesis.

However, avoiding collision in a receding-horizon setting can be difficult. The planner must not only ensure that its chosen actions are collision-free within the planning horizon, but must also ensure that there will exist collision-free options
beyond the end of the planned trajectory. More concretely, to guarantee safety,
a receding-horizon planner must plan trajectories that are known to be collision-
free for all time [36]. This constraint is often satisfied by ensuring the existence
of a collision-free emergency-stop maneuver or safe holding pattern that could be
executed, if need be, after the current action has been completed [6, 72, 105, 123].
Additionally, if the map is only partially known, a planner constrained to guarantee
safety must conservatively assume that every unknown map region may, in the worst
case, contain an obstacle, and therefore must confine its trajectories and emergency-

stop maneuvers to lie within observed free space. This observed free space may, in
turn, be limited by sensor range and occlusion.

Figure 1-1 illustrates this scenario for a robot approaching a blind corner while

guaranteeing safety with respect to the perceived map. In these figures, the white
regions represent observed free space, the black regions are observed obstacles, and

the gray regions are unknown. Light gray indicates the true hidden structure of the
hallway around the blind corner, which has not yet been observed. The straight black
lines indicate the robot's 60° sensor field-of-view, which approximately corresponds to
a common RGBD camera, although the general principles described here also affect
any form of geometric sensing, including 360° scanning LIDAR.

In Figure 1-1(a), for example, the blue line represents the chosen trajectory from
the robot's current state to a planning horizon of 2 m. The red line extending from
the end of the blue trajectory is a feasible emergency-stop action that would bring
the robot to a stop without collision, entirely within known free space. The black
rectangles illustrate the bounding box of the robot at the beginning and end of each
feasible emergency-stop action to show that it lies entirely within known free space.

This red emergency-stop action would be executed if, as the robot proceeds along
its chosen trajectory, all subsequent re-planning attempts fail to find a feasible tra-
jectory with its own subsequent emergency-stop action. Requiring the planner to
guarantee a feasible emergency-stop action places an implicit limit on the robot's
speed since higher speeds require greater stopping distances. Assuming constant de-
celeration due to braking, the stopping distance grows quadratically with speed, which
makes this constraint more and more restrictive as the robot moves faster. As the
robot approaches the corner, it must slow from 4 m/s to 1 m/s to preserve its ability
to complete a feasible emergency-stop maneuver before entering unknown space.
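
To make the quadratic growth concrete, under the constant-deceleration assumption the minimum stopping distance follows from elementary kinematics (the symbol $a_{\max}$ for the assumed braking deceleration is our notation for this illustration):

$$d_{\text{stop}} = \frac{v^2}{2\,a_{\max}}.$$

For example, with $a_{\max} = 4\ \mathrm{m/s^2}$, stopping from 4 m/s requires 2 m of known free space, while stopping from 8 m/s requires 8 m.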

We refer to this guaranteed-safe planning approach as the "baseline" planner and use it for comparison in this thesis. While Figure 1-1 illustrates the baseline planner with a 60° field-of-view, we will explore a variety of sensor ranges and fields-of-view.
When we perform a comparison, we will impose the same sensor limitations on the
baseline planner as on our method. The baseline planner represents state-of-the-art
planning in unknown environments without the benefit of any additional domain-
specific information or hand-engineered behaviors. It is simply designed to drive
toward the goal as fast as possible while observing a safety constraint that ensures
that it can always stop before colliding with observed obstacles or entering unknown
regions of the map. This baseline planner repeatedly selects actions according to the

following optimization problem:

$$a^*(b_t) = \operatorname*{argmin}_{a_t \in \mathcal{A}} \left\{ J_a(a_t) + h(b_t, a_t) \right\} \qquad (1.1)$$

$$\text{s.t.} \quad g(b_t, a_t) = 0. \qquad (1.2)$$

Here, we select an action a* at time t, from a set A of pre-computed motion primitives, that minimizes the cost function in equation (1.1), subject to a safety constraint
expressed in equation (1.2). This choice of action depends on the robot's belief, bt,
which contains its knowledge of its own state as well as its (partial) knowledge of the
map accumulated from sensor measurements obtained up to time t. We will discuss
the belief bt in greater detail when we introduce the problem description in Chapter 3.

The objective function, equation (1.1), is an estimate of the time required to reach
the goal if the planner selects action at, given belief bt. However, since the environment
between the robot's position and the goal is largely unknown, we can only estimate
most of the remaining time. Therefore, we split the cost into two terms that facilitate
this approximation. The first term in the cost function, Ja(at), represents the time
duration of action at. The second term, h(bt, at), represents a heuristic estimate of
the remaining time required to reach the goal after at has been completed. This
heuristic can be computed, for example, by assuming that the robot will maintain its
current speed along the shortest piecewise straight-line path to the goal, respecting
the observed obstacles but assuming that all unknown parts of the map are free and
traversable. Roughly speaking, minimizing Ja(at) encourages the robot to go fast,
while minimizing h(bt, at) encourages the robot to drive toward the goal.
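
As a concrete illustration of this style of heuristic, the sketch below computes a shortest-path distance transform over the known occupancy grid, treating unknown cells as free, and converts the result to a time estimate at the robot's current speed. The grid encoding (1 = known obstacle, anything else traversable) and the helper names are our own illustrative assumptions, not the thesis's implementation.

```python
import heapq
import math

def distance_to_goal(grid, goal):
    """Dijkstra distance transform over a 2D occupancy grid.

    Cells equal to 1 are known obstacles; unknown cells are treated as
    free, reflecting the optimistic assumption in the heuristic h.
    Returns a dict mapping (row, col) -> path length to the goal.
    """
    dist = {goal: 0.0}
    frontier = [(0.0, goal)]
    while frontier:
        d, (r, c) = heapq.heappop(frontier)
        if d > dist.get((r, c), math.inf):
            continue
        # 8-connected moves approximate piecewise straight-line paths.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0):
                    continue
                if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                    continue
                if grid[nr][nc] == 1:
                    continue
                nd = d + math.hypot(dr, dc)
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(frontier, (nd, (nr, nc)))
    return dist

def heuristic_time_to_goal(dist, end_cell, speed, cell_size=0.1):
    """h(b_t, a_t): remaining distance traversed at the current speed."""
    return dist.get(end_cell, math.inf) * cell_size / max(speed, 0.1)
```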

This heuristic is simple and efficient, but is a poor approximation in a few key
respects that we will explore in this thesis. It fails to account for the future evolu-
tion of the robot's dynamics or map knowledge, although doing so would amount to
solving an intractable continuous-valued partially observable Markov decision process
(POMDP), which is a formidable challenge. Since this heuristic attempts to minimize
time (and therefore distance) to the goal, it has a tendency to guide the robot toward
the inside of turns, as seen in Figure 1-1, which is disadvantageous both for vehicle
maneuverability and for sensor visibility. The techniques we will outline below can
be interpreted as ways to improve or augment this heuristic to make it more closely
approximate the true, optimal cost-to-goal.
One of the key components of the optimization problem in equations (1.1) and (1.2),
and the element that restrains the robot from recklessly traveling at its maximum
speed at all times, is the constraint g(bt, at). We define g(bt, at) = 0 if there exists a
feasible emergency-stop maneuver that can bring the robot to rest within the known
free space contained in bt after completing action at, and g(bt, at) = 1 otherwise. This
constraint imposes the speed limit observed in Figure 1-1.
While the constraint g(bt, at) = 0 requires that there must exist a collision-free
emergency-stop trajectory beyond the end of action at, we also perform conventional
collision checking along the length of at itself, distinct from g(bt, at). We assume that
the robot will always have some knowledge of obstacles in its immediate vicinity, and
should not consider actions known to intersect them. Since conventional collision
checking is enforced throughout this thesis, we omit it from equation (1.2), and all
subsequent optimization problem statements. However, we will state the constraint
g(bt, at) explicitly in order to clarify when it is being enforced and when it is not.
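
Putting the pieces of this section together, the baseline selection rule can be sketched as follows. This is a minimal illustration under our own assumptions: the belief object, action set, and the helper predicates collision_free and exists_safe_stop (which implements g(b_t, a_t) = 0) are stand-ins for the components described above, not this thesis's code.

```python
import math

def select_action_baseline(belief, actions, J_a, h, collision_free, exists_safe_stop):
    """Receding-horizon baseline planner, equations (1.1) and (1.2).

    Minimizes action duration plus the heuristic cost-to-goal, subject to
    conventional collision checking along the action and to the safety
    constraint: a feasible emergency-stop maneuver must exist entirely
    within known free space beyond the end of the action.
    """
    best_action, best_cost = None, math.inf
    for a in actions:
        if not collision_free(belief, a):
            continue  # conventional collision check along the action itself
        if not exists_safe_stop(belief, a):
            continue  # safety constraint g(b_t, a_t) = 0
        cost = J_a(a) + h(belief, a)
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action  # None: no safe option; execute the stored stop maneuver
```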

1.2 Predicting Collisions

Conventional planning constraints, derived from worst-case assumptions that any unknown region of the environment may contain an obstacle, imply that driving into unknown regions is dangerous. However, one of our central claims is that this assumption may be overly conservative. In many cases, the robot can drive at high speeds into the unknown if it is equipped with an accurate model of the risk associated with such actions, based on previous experience in similar situations. For instance, many buildings have straight hallways with 90° turns and intersections. If a robot observes a hallway with a turn, and has trained in similar environments, it should be able to reason with high certainty that the hallway will continue, and that there will be free space around the turn. Figure 1-2 shows our proposed solution planning a high-speed turn around a blind corner at 6 m/s, much faster than the safety-constrained baseline planner would allow. The set of emergency-stop trajectories (illustrated in black) from the chosen action (blue) all enter unknown space, indicating that the chosen action is unavoidably committing the robot to entering the unknown. Yet, this type of maneuver is often empirically collision-free for hallway environments. Our solution correctly infers that this maneuver is not excessively risky. Indeed, Figure 1-2 shows that some of the emergency-stop trajectories lie entirely within the free space around the turn (light gray) and are collision-free, although this free space has not yet been observed. Many other common environment structures make up the natural and man-made environments we frequently navigate, and these structures can be exploited by planners that are familiar with them.

Figure 1-2: Planning a trajectory (blue) that rounds a blind corner at 6 m/s using a learned model of collision probability. The true, hidden map is illustrated in light gray. The set of potential emergency-stop maneuvers (black) all enter unknown space, so this trajectory would be disallowed by a safety constraint; however, several of them are indeed collision-free with respect to the true map, indicating that the planned action is correctly predicted to be feasible. This is the planner we develop in Chapter 3.
In general, there are several types of assumptions that a robot could make about
the unknown parts of the environment, given the set of partial observations already
obtained. For instance, the robot could optimistically assume that all unobserved
regions are free and traversable, implying the existence of many feasible routes to
the goal. This assumption could lead to very reckless planning, since it would imply
that the robot could drive at full speed without risking collision. Or, like the base-
line planner discussed in Section 1.1, the robot could conservatively assume that all
unobserved regions may be occupied, implying that there are no feasible trajectories
beyond the free space that has already been observed. Bachrach [8] provides a com-
pelling illustration of these two types of assumptions and their effects on planning.

However, in almost all realistic environments, the reality lies somewhere in between
these two extremes. Therefore, in order to plan trajectories that are likely to avoid
obstacles and lead to the goal faster than the conservative assumptions of the baseline
planner would allow, the robot must perform some form of inference over the unknown
regions of the environment. The objective of this inference problem is ultimately
to determine which action choices will safely lead to the goal, and which action
choices are infeasible with respect to yet-unseen obstacles. Performing this inference
accurately will enable the planner to exploit the difference between the conservative
assumptions of the baseline planner and the true characteristics of the environment.

One way to estimate the feasibility of actions through unknown space might be
to directly predict the occupancy of the unknown parts of the environment, using
sensor measurements combined with an accurate prior distribution over maps. Using
an accurate prior, a measurement of some portion of the environment would allow the
robot to infer something about the adjacent regions. This inference problem would be
high-dimensional and computationally expensive, but in principle, it could be done.
Indeed, POMDP solvers perform this type of inference in order to compute optimal
policies by effectively searching forward through every possible action-observation
sequence to enumerate the probabilities of all possible future maps the robot might
encounter. However, it is only possible to perform this kind of inference for very small
or coarsely discretized problems, rendering these methods inapplicable for realistic
dynamical systems with continuous-valued states and observations.

Perhaps a more fundamental difficulty, however, is that we do not know the dis-
tribution over real-world environments in which we would like our robots to operate.
Accurately modeling real-world environment distributions would be extremely diffi-
cult due to the high dimensionality of building-sized occupancy grid maps, the strong
assumption of independence between map cells, and the richness of man-made and
natural spaces that resist compact parameterization. Without significant modeling
effort, or restriction of the problem domain to specific types of environments, the best
prior knowledge we can reasonably provide is to say that every map is equally likely.
Of course, this prior is very inaccurate and completely unhelpful for planning, and
prevents the agent from exploiting any intuitive knowledge of "typical" environments.

To compensate for this missing knowledge of the environment distribution, we reformulate the problem from high-dimensional map inference to direct prediction of local action feasibility (in other words, collision probability) using supervised learn-
ing. Our goal is to implicitly capture the relevant traits of our training environments
needed for collision prediction through training examples, rather than modeling the
map distribution explicitly. This approach leads to a much lower-dimensional model
that is vastly easier to learn and, unlike high-dimensional inference over maps, is effi-
cient enough to run online. Using this learned model of collision probability, we plan
in a receding-horizon fashion according to the following control law:

$$a^*(b_t) = \operatorname*{argmin}_{a_t \in \mathcal{A}} \left\{ J_a(a_t) + h(b_t, a_t) + J_c \cdot f_c(b_t, a_t) \right\}. \qquad (1.3)$$

This control law is similar to the optimization problem for the baseline planner, spec-
ified in equations (1.1) and (1.2), but differs in one key respect. In this control law,
we replace the safety constraint, g(bt, at) = 0, with an expected collision penalty,
Jc · fc(bt, at), where Jc is a cost applied to collision, and fc(bt, at) is a model of colli-
sion probability. This modification changes planning from a problem of constrained
optimization to one of minimizing cost in expectation over the unknown environ-
ment. In Chapter 3, we will derive this objective function as an approximation of
a full POMDP problem statement and describe how to learn this model of collision
probability from training data. We will show that our method results in collision-free
navigation in some cases more than twice as fast as the baseline planner in a variety
of simulation and real-world tests.
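
Relative to the baseline sketch in Section 1.1, the change implied by equation (1.3) is small: the hard constraint disappears and an expected collision penalty enters the cost. The following sketch, with the learned model f_c and the collision cost passed in as parameters, is again an illustration under our own assumed interfaces.

```python
import math

def select_action_learned(belief, actions, J_a, h, f_c, J_c, collision_free):
    """Receding-horizon control law with a learned collision model (1.3).

    The safety constraint g(b_t, a_t) = 0 is replaced by the expected
    collision penalty J_c * f_c(b_t, a_t); conventional collision checking
    along the action itself is still enforced.
    """
    best_action, best_cost = None, math.inf
    for a in actions:
        if not collision_free(belief, a):
            continue
        cost = J_a(a) + h(belief, a) + J_c * f_c(belief, a)
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action
```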

Note that we are not merely lifting a constraint and observing that the uncon-
strained system can travel faster, which would be trivially true. The hard question
is to reason about how much faster the planner should choose to go given its current
surroundings and the state-dependent limitations on its maneuvering capabilities. For
instance, the turning radius for realistic ground vehicle models increases with speed
due to tire friction limitations. Therefore, the vehicle in Figure 1-2 may be able to
negotiate its turn at 6 m/s, but it would be risky at 8 m/s, and it would certainly
crash if it attempted the turn at 10 m/s. Yet, in a long straightaway, 15 m/s would
be perfectly safe. These are complex relationships that our models must capture.

There are a number of useful points of view from which to interpret our overall
approach of collision prediction. First, we are replacing a safety constraint derived
from worst-case assumptions with a data-driven "soft constraint" that more accurately
represents the true properties of our training environments. Second, we are using a
learned function to extend the effective planning horizon beyond the extent of the
perceptual horizon. By this, we mean that our planner uses learning to augment its
partial map knowledge to make reasonable judgments about hazards that may lie
beyond what can be directly perceived, either due to sensor range or occlusion. For
example, our planner can judge an appropriate speed at which to approach and round
a corner before the occluded space around the corner has been observed, because it
has been trained on datasets containing many similar blind corners. This capability
is perhaps more fundamental than reacting to observed obstacles because no increase
in planning horizon or sensor range can enable the baseline planner to reason about
occluded regions. Third, our collision probability model serves an analogous role to an
evaluation function in games like chess, backgammon or go, in that it encapsulates the
results of many training examples gathered offline that can be used to approximate
the value of a game state beyond some finite sequence of moves, which would be
difficult or impossible to compute online.

Another way to interpret our approach is through the viewpoint of reinforcement
learning (RL), which has provided many methods of learning to act optimally in do-
mains with various forms of uncertainty. Since we learn a model of some aspect of
robot-environment interaction, and then plan in a conventional manner using that
model, our approach could be considered a special case of model-based reinforcement
learning. However, since we are only learning a single specific type of state transi-
tion from normal operation into a collision state, rather than a full model of state
transition probabilities, we do not generally refer to our method as model-based RL.
Furthermore, since our learning problem takes the form of a supervised probabilistic
classification problem, it is one of the easiest forms of learning problems in terms of
reward/label complexity and has none of the very difficult RL challenges associated
with sequential or delayed rewards [63].

In contrast with model-based RL, model-free approaches are attractive for their
simplicity and because they assume very little knowledge of the structure of the sys-
tem, but do not have the advantage of a model class to determine which data are
useful for learning and are usually less data efficient [31]. Furthermore, for on-policy
RL algorithms, the distribution of training data and resulting rewards are gener-
ated by a complex and changing function of the policy as it is learned and refined.
Without some assumed structure in the problem, the policy learned in a model-free
context is not as easily transferred across tasks and domains, and is not as transpar-
ently understandable as the models we learn in this work. In Chapter 3, we will show
how our method enables us to collect labeled training data in a simple, offline, policy-
independent batch process, requiring only a training map and the robot's dynamics.

While our method improves performance in familiar types of environments that
resemble the training data, real-world scenarios are highly varied, unpredictable, and
outside the control of the robot. Therefore, it is likely that we will query the learned
function fc(bt, at) to make a prediction in a scenario unlike any it has encountered
during training. However, when the learned model is being used for collision avoid-
ance, an incorrect prediction could be catastrophic. We need some way to recognize
when the robot is in a situation for which it is untrained, and protect against making
an uninformed or dangerous prediction. Therefore, rather than asking the learned
function to extrapolate from its training data in novel scenarios, we instead take a
Bayesian approach. We introduce a prior over collision probability that mimics the
safety constraint g(bt, at), so that fc(bt, at) can blend the influences of its training data
and prior in proportion to how much relevant training data are available for a given
prediction. In familiar scenarios, the prior will have very little influence, whereas in
completely novel scenarios, the robot will still be able to navigate in a slow but safe
manner using the prior alone, without any relevant data. Unlike many robot learning
methods, this approach enables us to safely deploy our system in any environment,
without regard for whether it has been appropriately trained. In Chapter 3, we will
demonstrate this method in simulation and real-world trials showing automatic safe
transition between familiar and unfamiliar environment types.
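
One simple way to realize this kind of blending is a Beta-Bernoulli model, sketched below under our own assumptions (the exact form of the prior and likelihood used in Chapter 3 may differ): the conservative prior is encoded as pseudo-observations, and outcomes from similar training scenarios gradually override it.

```python
def collision_probability(n_collisions, n_trials, prior_mean, prior_strength=5.0):
    """Posterior mean of a Beta-Bernoulli model of collision probability.

    prior_mean mimics the conservative safety constraint (e.g., 1.0 for an
    action whose emergency stops all enter unknown space), weighted as
    prior_strength pseudo-observations. With no relevant training data the
    estimate equals the prior; with abundant data it approaches the
    empirical collision frequency.
    """
    alpha = prior_mean * prior_strength + n_collisions
    beta = (1.0 - prior_mean) * prior_strength + (n_trials - n_collisions)
    return alpha / (alpha + beta)

print(collision_probability(0, 0, prior_mean=1.0))    # no data: 1.0 (prior)
print(collision_probability(2, 200, prior_mean=1.0))  # data-dominated: ~0.034
```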

1.3 Predicting the Utility of Future Measurements

We have just outlined a method of predicting collisions on the basis of the robot's
state, choice of action and current knowledge of the map. This method exchanges a
guarantee of safety in favor of a more accurate data-driven risk model that enables
faster navigation while remaining empirically collision-free. However, in some cases,
no amount of risk is acceptable and we wish to guarantee safety with respect to
unknown regions in the environment. Are there other ways we can use learning to
improve planning without sacrificing the safety guarantee?
We observe that the baseline planner in Figure 1-1(b) is highly constrained by
the fact that it has not observed the free space around the blind corner. However,
map knowledge at any point in time depends on the sequence of measurements that
have been obtained by the robot, which can be strongly influenced by the choice of
actions that move the sensor. We now ask the question: Can we train a planner to
take actions that yield improved views into the unknown regions of the environment?
If so, how much can such a method improve navigation performance?
Figure 1-3 demonstrates a sequence of actions for a safety-constrained robot round-
ing a blind corner, just as in Figure 1-1. However, the difference is that in this case,
the robot has been guided to take a slightly longer route, approaching from the out-
side of the turn. The consequence is that along this route, it is able to observe some
of the free space around the corner much earlier, and therefore has a considerably

27
U'-

(a) 6 i/s. (b) 4 m/s. (c)6 r/s.

Figure 1-3: An example planning sequence using our method of learning measurement
utility, which guides the robot wide around corners to project sensor visibility into
occluded space. The sensor in this figure is restricted to a 60' field-of-view, illustrated
with black lines. This is the planner we develop in Chapter 4.

longer available stopping distance by the time it reaches the turn. By anticipating
the utility of a few sensor measurements, the robot is able to round the corner at a
rapid 4 m/s, compared to 1 m/s with the baseline planner.

We might consider applying existing next-best-view or information-theoretic tech-
niques to guide the robot to obtain advantageous views of unknown space. However,
these methods are typically designed to discover as much map knowledge as possible,
and do not necessarily consider the implications that certain specific pieces of map
knowledge may have on future planning options because they do not reason directly
in terms of cost-to-goal. Therefore, using next-best-view or information-theoretic
guidance, it is not known how helpful particular measurements will actually be for
the task of reaching a goal in minimum time.

It is possible to manually assign some value to information, for the purpose of
combining information-theoretic quantities with physical quantities like cost-to-goal
into a single cost function [80, 110]. The problem with this approach is that, for
navigation, certain measurements are much more valuable than others, even if they
all reveal the same amount of new map knowledge. For high-speed navigation, the
most valuable measurements are the ones that reveal a corridor of free space in the
direction the robot would like to travel.

To add to this difficulty, as we have noted in Section 1.2, we assume that we have
no explicit knowledge of the distribution over environments, and therefore we have no
direct means of estimating the likelihood of potential future measurements. We must
instead learn to predict the usefulness of measurements from experience, using only
the robot's current map knowledge combined with the potential placements of the
sensor within that current partial map. For instance, an action that aims the sensor
toward a map frontier in the direction of the goal is probably a valuable information-
gathering action, which could enable the robot to continue moving fast and therefore
reduce its cost-to-goal. On the other hand, an action that occludes the map frontier
from view would likely be disadvantageous, requiring the robot to slow down as it
approaches the frontier.

Following this intuition, we propose a learned model that approximates the utility
of the robot's next sensor observation directly in terms of cost-to-goal. Since our
method learns from training examples, it does not require access to an explicit dis-
tribution over environments or a measurement likelihood function. Our method is in
some ways analogous to next-best-view planning, but the learned quantity reflects the
change in expected cost-to-goal that will result from the measurements taken along a
chosen action, rather than an information-based score. Therefore, we can incorporate
the learned quantity directly into our receding-horizon control law:

$$a^*(b_t) = \operatorname*{argmin}_{a_t \in \mathcal{A}} \left\{ J_a(a_t) + h(b_t, a_t) + f_h(b_t, a_t) \right\} \qquad (1.4)$$

$$\text{s.t.} \quad g(b_t, a_t) = 0. \qquad (1.5)$$

This control law differs from the baseline optimization problem in equations (1.1)
and (1.2) only in that we have added a learned measurement utility term, fh(bt, at),
that augments or refines the heuristic estimate, h(bt, at).

As we will show in Chapter 4, accounting for the next sensor observation has
several important effects. First, it encourages the robot to perform the intuitive
information-gathering maneuver of swinging wide around corners while maintaining
visibility of the map frontier. Second, we will show that by predicting future mea-
surement utility, our robot navigates up to 40% faster than it does under the myopic
guidance of the baseline planner. Finally, we will show that in some relatively benign
cases, the baseline planner becomes trapped and is unable to make progress to the
goal because it fails to foresee the need to observe key regions of the environment. Our
solution largely overcomes this problem due to its information-gathering behavior.

1.4 Learning Predictive Features from Raw Data

We have now introduced two learned functions, fc(bt, at) and fh(bt, at), that take a
belief-action pair as input. However, we have not described how these belief-action
pairs are represented for the purposes of learning and inference. In the methods
described in Sections 1.2 and 1.3, we condense belief-action pairs into a small set
of predictive features that were manually designed for their respective prediction
problems. In other words, fc(bt, at) and fh(bt, at) are shorthand for the more explicit notation, fc(φc(bt, at)) and fh(φh(bt, at)), where φc(bt, at) and φh(bt, at) compute low-dimensional sets of real-valued features.
Our feature functions represent predictive quantities that we selected, and are by no means the only features that could be used. For collision prediction, φc(bt, at) computes the following features: (1) the minimum distance to the nearest known obstacle along action at; (2) the mean range to obstacles or frontiers in the 60° cone emanating forward from the robot's pose, averaged over several points along action at; (3) the length of the straight free path ahead of the robot, averaged over several points along action at; and (4) the speed of the robot at the end of action at. For measurement utility prediction, φh(bt, at) extracts: (1) the fraction of map frontier locations that will be within the field of view of the sensor along action at; and (2) the average distance from the action to the frontier. While reasonably effective for their respective purposes, these feature functions are difficult to design and require specific intuition about the problems; a sketch of the collision features appears below.
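
As a rough illustration of what such a feature function might look like in code (the belief and action interfaces below are our own assumptions for the sketch, not the thesis's implementation), φc could be computed as follows:

```python
import math

def phi_c(belief, action):
    """Compute the four hand-engineered collision-prediction features.

    Assumes belief exposes distance-to-nearest-obstacle and raycast
    queries against the known map, and that action yields sampled states.
    """
    states = action.sampled_states()  # poses along the motion primitive
    # (1) Minimum distance to the nearest known obstacle along the action.
    min_clearance = min(belief.dist_to_nearest_obstacle(s.xy) for s in states)
    # (2) Mean range to obstacles or frontiers in a 60-degree forward cone.
    cone_ranges = [belief.raycast(s.xy, s.heading + math.radians(b))
                   for s in states for b in (-30, -15, 0, 15, 30)]
    mean_cone_range = sum(cone_ranges) / len(cone_ranges)
    # (3) Straight free-path length ahead, averaged along the action.
    free_ahead = sum(belief.raycast(s.xy, s.heading) for s in states) / len(states)
    # (4) Speed of the robot at the end of the action.
    end_speed = states[-1].speed
    return [min_clearance, mean_cone_range, free_ahead, end_speed]
```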
Hand-engineered features may be feasible for simple 2D geometry, but what if we
wish to make predictions directly from a camera image? It would be very challenging
to manually design a small set of visual features that could reliably predict collisions.
Instead, in Chapter 5, we will describe how to train a deep neural network to predict
collisions using images and actions as input. One of the primary advantages of using
a deep neural network is that it adapts its internal feature representation in response
to its training data, which enables it to work well on high-dimensional but highly
structured data such as images. By using images, this method helps to alleviate
the need for large and expensive sensors such as LIDAR for collision avoidance, and
significantly extends the effective perceptual range beyond the dense depth estimates
that can be computed from visual SLAM or RGBD point clouds.

However, like all learning approaches, neural networks may make inaccurate pre-
dictions when queried with inputs that differ significantly from their training data.
And, unlike the Bayesian method outlined in Section 1.2, neural networks do not
naturally have a means of accurately determining their own confidence, or assessing
how well they have been trained to make a certain prediction. Therefore, in Chap-
ter 5, we will describe how we can use a separate autoencoder neural network to
protect against erroneous predictions in novel situations. If our autoencoder detects
an input for which the collision predictor is untrained, our planner will switch to a
prior estimate of collision probability that is conservative but safe. This approach
is analogous to the Bayesian approach in Section 1.2, but because it utilizes deep
neural networks, it is able to operate on raw sensor data without the computationally
demanding intermediate step of inferring the environment geometry from images.
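
A minimal version of this novelty gate might look like the following sketch, written here with a small fully connected autoencoder in PyTorch (an implementation choice of ours for illustration; Chapter 5 gives the actual architecture). The reconstruction-loss threshold would be chosen from a histogram of losses on the training images (cf. Figure 5-4).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Compress an image to a small code and reconstruct it."""
    def __init__(self, n_pixels=64 * 64, n_hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_pixels), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_novel(autoencoder, image, threshold):
    """Gate the collision predictor on reconstruction error.

    A high reconstruction loss suggests the image lies far from the
    training distribution, so the planner falls back to the conservative
    prior estimate of collision probability instead of trusting f_c.
    """
    with torch.no_grad():
        loss = nn.functional.mse_loss(autoencoder(image), image)
    return loss.item() > threshold
```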

As we will show in Chapter 5, we can deploy our deep learning system in an
arbitrary environment, initially with no data. At first, all images observed by the
autoencoder-based novelty detector will appear novel, so the planner will navigate
purely based on its conservative prior. However, it will record all images, label them
using a self-supervised technique, and periodically re-train itself, becoming more and
more familiar with the types of environments it has observed. Our method of novelty
detection extends the early work of Pomerleau [94], who used autoencoders to indicate
when a vision-based controller was untrained for a particular scenario. We build upon
these techniques by enabling the robot to continually improve without additional
manual training or hand-engineered training simulators for each type of environment.
Once our system has obtained a sufficient amount of data (roughly 10,000 images
collected over several minutes of driving), most images with similar appearance to the
training environments will appear to be familiar, and the collision prediction network
will be able to make reliable predictions. At this point, the planner will be able to
drive almost entirely based on the learned model, approximately 50% faster than
what is possible under the safety constraint or conservative prior.
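
The deploy-anywhere behavior described in this section can be summarized as a loop. The sketch below is schematic: the robot, planner, and self-supervised labeling function are passed in as assumed interfaces of our own, not components defined by this thesis's code, and the retraining cadence is illustrative.

```python
def online_learning_loop(robot, planner, collision_net, novelty_detector,
                         label_fn, retrain_every=10_000):
    """Navigate conservatively when novel; learn and speed up when familiar."""
    images = []
    while not robot.at_goal():
        image = robot.camera_image()
        if novelty_detector.is_novel(image):
            action = planner.plan_with_prior(image)  # slow but safe
        else:
            action = planner.plan_with_model(collision_net, image)
        robot.execute(action)
        images.append(image)
        if len(images) % retrain_every == 0:
            labels = label_fn(images, robot.log())  # self-supervised labels
            collision_net.retrain(images, labels)
            novelty_detector.retrain(images)
```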

1.5 Contributions
We have now described the high-level problem setting for this thesis and outlined the
limitations of standard approaches such as the baseline planner. In the subsequent
chapters, we will develop three different ways that we can modify the baseline planner
to improve navigation speed. The unifying paradigm of this thesis is that in unknown
environments, there is not enough information observable through the robot's sensors
to navigate as fast as is physically possible. Instead, we must augment or replace
components of the baseline planner using supervised learning to more accurately
capture the true characteristics of the environments in which we hope to operate.
These learned models provide the planner with additional information, trained on
data from similar types of environments, that can help to close the gap between the
performance of the baseline planner and the performance that is truly possible for
our robotic systems. Toward this goal, this thesis makes the following contributions:

" A problem statement of minimum-time navigation in unknown environments as


a partially observable Markov decision process (POMDP), and approximations
that allow it to be converted to a receding-horizon control law (Chapter 3).

" A method of training a model to estimate collision probability efficiently in


simulation, using either synthetic or real maps (Chapter 3).

* An application of Bayesian inference to combine training data for collision pre-


diction with a safe but conservative prior that enables navigation in any type

32
of environment, automatically transitioning between the high performance of
the learned model in familiar environment types and the safe prior in novel or
unfamiliar environment types (Chapter 3).

" A method of learning to predict future measurement utility that can encourage
information-gathering actions leading to faster navigation by effectively extend-
ing the perceptual horizon (Chapter 4).

* A neural network formulation allowing a robot to predict collisions directly from


raw sensor data such as camera images (Chapter 5).

" A novelty-detection approach, coupled with an offline training method, which


enables a robot to navigate slowly but safely in unfamiliar environments, col-
lecting data and periodically re-training to automatically improve as it becomes
familiar with each new environment type (Chapter 5).

1.6 Publications
The work presented in Chapter 3 was originally presented in Richter et al. [101]. The work presented in Chapter 4 was originally presented in Richter and Roy [98]. Finally, the work presented in Chapter 5 will appear in Richter and Roy [99].

Chapter 2

Background

Navigation is one of the most studied problems in robotics, serving as a test domain for
motion planning, control, perception, mapping, and learning techniques. Navigation
is a multi-faceted challenge, which means that algorithms for planning and perception
are not always developed in isolation and simply combined. Rather, algorithms are
often developed jointly for the problem domain of interest, leading to a wide variety
of combinations of techniques with advantages and disadvantages that depend on the
particular demands and assumptions of a given problem.
In this chapter, we will review the areas of robot navigation literature that are
most relevant to this thesis and point out the differences and shortcomings that help
to motivate this work. In particular, we will focus on methods of enforcing safety,
reaching a destination when the environment is unknown, and machine learning for
control and perception. While these are all very broad topics, we will emphasize their
importance for highly dynamic robotic vehicles where relevant.

2.1 Safe Navigation for Dynamic Mobile Robots

The aim in most robot navigation problems is to guide a robot between start and
goal locations without colliding with obstacles in the environment, perhaps minimiz-
ing some objective such as time. This task requires the robot to have some means
of representing the environment and planning paths through free space that do not
intersect obstacles. Simmons [107], Fox et al. [35], and Brock and Khatib [24] developed
some early examples of collision avoidance for mobile robots with non-trivial
and actuator-limited dynamics by planning in the space of curvatures and velocities,
not merely positions. Although we focus on static environments in this thesis, Fiorini
and Shiller [34] developed a similar method for avoiding collision with moving
obstacles by projecting the future motion of mobile obstacles into state space. One of
the essential characteristics of planning in the space of velocities and curvatures with
finite actuator limits is that a robot must begin slowing down or initiating a turn
well in advance of an obstacle in order to have the necessary time and space to avoid
collision. To do so requires foresight that is not necessary for quasi-static kinematic
planning for robots that can stop or turn nearly instantaneously.
Unifying these notions of safety, Fraichard and Asama [37] introduced the concept
of an "Inevitable Collision State" (ICS), which articulates the conditions under which
a dynamic robot will be unable to avoid collision with the environment. To guarantee
that it can avoid collision, a planner must limit its action choices to those that do
not place the robot in an ICS. In Chapters 3 and 5, we will utilize the ICS notion
extensively in computing labels for supervised learning problems. As we shall discuss,
collision prediction can be framed as a probabilistic classification problem to decide
which actions, when taken in a given scenario, would place the robot in an inevitable
collision state with (yet unseen) obstacles. One of the primary benefits of predicting
future inevitable collision states is that it gives the robot information about hazards
that may lie beyond the explicit planning or sensing horizon, or in occluded regions
of the environment, allowing the planner extended time and space to react.
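To make the ICS labeling procedure concrete, the following sketch checks whether a
state is an inevitable collision state by forward-simulating an emergency stop. It is
a minimal sketch assuming point-mass dynamics and a single straight-line braking
maneuver (the planners in this thesis use richer dynamics and maneuver libraries);
the cell labels and parameter values are illustrative only.

    import numpy as np

    FREE, OCCUPIED, UNKNOWN = 0, 1, 2

    def leads_to_ics(pos, vel, grid, cell_size=0.1, a_max=5.0, dt=0.05):
        """Label a state as an inevitable collision state (ICS) by simulating
        a maximum-deceleration stop and requiring every cell touched along the
        way to be known free space; unknown cells are treated as unsafe."""
        pos, vel = pos.astype(float).copy(), vel.astype(float).copy()
        speed = np.linalg.norm(vel)
        while speed > a_max * dt:  # loop until the robot can stop in one step
            vel -= a_max * dt * vel / speed  # brake along the velocity direction
            pos += vel * dt
            r, c = int(pos[1] / cell_size), int(pos[0] / cell_size)
            if not (0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]):
                return True   # left the mapped region: conservatively unsafe
            if grid[r, c] != FREE:
                return True   # braking path crosses occupied or unknown space
            speed = np.linalg.norm(vel)
        return False          # the robot can come to rest in known free space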

2.1.1 Contingency Plans and Emergency-Stop Maneuvers

Unlike the work presented in this thesis, planners in general cannot anticipate future
collisions that might occur beyond their perceptual horizon, either beyond the sensor
range or into occluded regions of the environment. Therefore, they are forced to make
some assumptions about what might happen if they drive into unknown regions of
the environment. If the environment is static but unknown and only partially visible,
it is common to plan trajectories that make progress toward the goal, but never enter
(or dynamically commit the robot to entering) unknown parts of the environment
before they have been observed [16, 123]. This method ensures the existence of a safe
contingency plan that could bring the robot to a stop within the known free space of
the environment if need be, in the event that future re-planning attempts fail to find
a collision-free trajectory. We have expressed this constraint in equation (1.2).
For dynamic vehicles with significant maneuvering constraints, such as helicopters,
it may be useful to compute an emergency-maneuver library that can be used during
planning [7], and for vehicles that cannot stop, such as fixed-wing aircraft, it is also
possible to use circling loiter maneuvers as contingency plans [105]. The Closed-
Loop RRT planner for the MIT DARPA Urban Challenge entry used a similar safety
condition, ensuring that the vehicle could always stop in known free space by setting
the speed of every leaf node in the trajectory tree to zero [72].
If the evolution of a dynamic environment can only be predicted to some finite
time horizon, or if computational constraints prohibit planning a complete path to the
goal, then a planner may enforce what is known as T-safety, where a (partial) plan
of duration T is guaranteed to be collision-free with respect to the possible future
motions of dynamic obstacles [39]. The assumption is that T should be sufficiently
long for the planner to generate another safe trajectory to follow the current one.
These methods all share the common assumption that planned trajectories must
be verifiably collision-free, and so the planner is restricted to plan actions that will
keep the robot within the regions of the environment that are known to be unoccupied
at the time of planning. However, there are many cases where some uncertainty in
the environment or the future evolution of the system prohibits this strict guarantee,
and the planner must accept some risk of failure. We will discuss these cases next.

2.1.2 Probabilistic Notions of Safety

In the preceding discussion, we considered safety with respect to the particular en-
vironment in which the robot is navigating. However, if knowledge is presented in
the form of a distribution over environments, then it is possible to make statements
about planning performance in terms of that distribution. Karaman and Frazzoli [61]
showed that in an ergodic forest environment, there is a critical speed, above which
there exists no infinite-length collision-free trajectory through the environment, and
below which, there exists at least one. This work dissociates the notion of safety
from the exact geometry of space that is observable by the robot and instead opens
up the possibility of safely navigating faster than the speed that would normally be
verifiably safe, as long as the planner has some (perhaps learned) sense of what the
critical speed is for a given type of environment. This finding is particularly rele-
vant in Chapter 3, where we relax conventional safety constraints and instead learn
probabilities of collision as a function of environment characteristics, and we observe
a similar type of critical speed above which collisions will essentially always happen,
and below which, collisions never happen.

Althoff et al. [5] examined the problem of planning in uncertain and dynamic envi-
ronments using estimated collision probabilities to encourage safety while navigating
toward a goal. While their concept of a "Probabilistic Collision State" appears quite
general and is conceptually similar to the notion of learned collision probabilities we
use in this thesis, they only demonstrated its usefulness in avoiding collisions with
dynamic agents in the environment and they assumed implicit knowledge of a distri-
bution over future trajectories those agents might take. In contrast, we demonstrate
collision probability with respect to an unknown map, and, importantly, we do not
assume knowledge of the relevant probability distributions, which are often feasible
to define or learn for mobile agents in the map, but not for the map itself.

In addition to randomness in the physical environment, there has been substantial
work on planning in a known, static environment subject to noisy dynamics and
state estimation. Patil et al. [91] and Schmerling and Pavone [104], for example,
introduced methods for computing the probability of collision along a planned tra-
jectory assuming motion and sensing uncertainty. Furthermore, Prentice and Roy
[96], Van Den Berg et al. [119], Bry and Roy [25], Janson et al. [55] and Sun et al.
[114] developed algorithms for planning under the framework of Linear Quadratic
Gaussian (LQG) control, i.e., with Gaussian process and sensing uncertainty. Luders
et al. [75] and Luders et al. [76] extended the capabilities of some of these algorithms
to plan chance-constrained asymptotically optimal paths in the case where obstacle
placement may be uncertain. While this is indeed a form of environment uncertainty,
in practical terms its effect on planning is very similar to that of noisy dynamics and
state estimation since leaving a wider berth between the robot and a random obstacle
is sufficient to decrease probability of collision. In contrast, in this thesis, we consider
the case where the entire configuration of the environment is completely unknown. In
Section 3.1.1, we discuss why the completely unknown map makes many conventional
methods for planning under uncertainty so difficult.
The techniques listed above enable planning of complete trajectories from start
to goal, however in a completely unknown map it is not practical to do so in any
meaningful way, so we will employ a receding-horizon re-planning approach. There are
numerous examples of planning under uncertainty for dynamic systems in a receding-
horizon setting. For instance, Majumdar and Tedrake [79] introduced a method of
planning with pre-computed libraries of partial trajectories, along with associated
feedback policies and "funnels," which can be composed incrementally as the robot
moves through the environment. Funnels are sets of states around a trajectory that
the robot is guaranteed to remain within, for given bounded uncertainties on model
parameters and disturbances. Therefore, if a funnel is entirely collision-free with
respect to the observed environment, then the associated trajectory is guaranteed to
succeed. Funnels provide a method of guaranteeing absolute safety along a trajectory,
subject to bounded disturbances or model errors, but like most of the other planning
techniques listed above, they do not provide any method of reasoning about the
primary source of uncertainty in this thesis: The unknown map.

2.2 Navigating Unknown Environments

Most of the planning approaches outlined above have placed specific restrictions on
the types of uncertainty, for instance additive Gaussian noise acting on the dynamics
and measurement model, or bounded disturbances. These restrictions help to make
the problem analytically convenient and tractable. However, none of the methods we
have discussed thus far, with the exception of [61], have explicitly reasoned about an
unknown environment as the primary source of uncertainty.
If the environment is initially unknown at runtime, and is perceived incrementally
through the onboard sensors on the robot, then the navigation problem takes on some
elements of an exploration task. An unknown environment requires the planner to
reason explicitly about the parts of the environment that have not been perceived
yet. In particular, the robot must consider what measurements might be obtained
if it were to point its sensor at an unknown part of the map, or what the outcome
might be for an action that led into the unknown. In Chapters 3 and 4, we will
consider both of these questions in more detail for our particular task of reaching a
goal location in minimum time.
However, there exist many other approaches for navigating unknown environments
depending on the objective. The exploration literature, for example, focuses on plan-
ning for the purpose of building complete maps, which might require the robot to
point its sensor into every unknown corner of the environment. On the other hand,
goal-directed planners may seek out very specific pieces of information that are valu-
able in reaching the goal, such as whether or not a particular door is open. Partially
observable Markov decision processes (POMDPs) are the most flexible in the forms
of uncertainty and objectives they can model, but pay the price of computational
expense for their generality. We will examine each of these cases next.

2.2.1 Exploration

The exploration literature focuses on navigation for the purpose of map-building.
Yamauchi [125] introduced the idea of frontier-based navigation, in which a robot would
identify the boundaries between known and unknown regions of space, and plan paths
that allowed the robot's onboard sensors to observe the frontiers. Seeking out and
observing the frontiers of a map turns out to be a very useful heuristic for exploration
since, without any additional prior knowledge of the unknown parts of the map, it
tends to draw the robot to the areas of greatest uncertainty where most of the in-
formation is to be gained. As we will show in Chapter 4, it turns out that frontier
visibility is also a strong predictor of how useful a measurement will be for high-speed
navigation to the goal in situations where extending the knowledge horizon around
occlusions allows the robot to travel faster.
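As an illustration of the frontier concept, the following sketch extracts Yamauchi-style
frontier cells from a 2-D occupancy grid; the cell labels and grid representation are
assumptions made for this example.

    import numpy as np

    FREE, OCCUPIED, UNKNOWN = 0, 1, 2

    def frontier_cells(grid):
        """Return the free cells that border unknown space: the frontiers a
        robot would visit (or look at) to extend its map."""
        frontiers = []
        rows, cols = grid.shape
        for r in range(rows):
            for c in range(cols):
                if grid[r, c] != FREE:
                    continue
                neighborhood = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
                if np.any(neighborhood == UNKNOWN):
                    frontiers.append((r, c))
        return frontiers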
Makarenko et al. [80] and Stachniss et al. [110], among others, posed exploration
as a navigation problem in which the cost associated with an action was defined as
some combination of expected map improvement and a penalty on action length.
This type of formulation encouraged the system to produce a complete map
efficiently, with a minimum of motion. Both methods took an information-theoretic
approach to quantifying map improvement. Connolly [28], Potthast and Sukhatme
[95], and Bircher et al. [19] also proposed methods of next-best-view planning that
optimized the expected information gain or new map knowledge to be obtained from
different views into occluded parts of a cluttered space. González-Baños and Latombe
[45] extended next-best-view mapping techniques for polygonal mapping by comput-
ing the largest regions known to be obstacle-free given a partial sequence of mea-
surements, which is non-trivial for partial polygonal environment representations.
However, information gain need not be the only metric used to guide a robot for map
building. Kollar and Roy [66] proposed a reinforcement learning planning method
whose objective was to guide a robot such that the error of the resulting SLAM-
generated map was minimized, which suggests that learning can be applied to reason
more generally about information acquisition in other contexts.
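A representative form of these information-theoretic objectives (the notation is ours,
not drawn from any single cited paper) scores a candidate action $a$ by the expected
mutual information between the map $m$ and the measurement $z_a$ the action would
produce, penalized by motion cost:

\[ a^* = \arg\max_{a \in \mathcal{A}} \; \mathbb{E}\big[ I(m; z_a) \big] - \lambda\, c(a), \]

where $c(a)$ is, for example, path length, and $\lambda \geq 0$ trades map improvement
against effort.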

2.2.2 Incremental Re-Planning in an Unfolding Map

While exploration usually implies the task of mapping, there are considerable chal-
lenges associated with even finding a feasible route to a goal location when much of
the environment is unknown. The D* family of incremental replanning algorithms
address this problem by planning paths through a graph of positions representing
environment locations under the assumption that unknown regions of the map are
traversable, and then updating the graph and replanning the global path if an un-
foreseen obstacle is observed [111, 112, 64, 65]. These algorithms all cleverly re-use
computation to make each refinement to the plan more efficiently than solving the
entire planning problem from scratch. The work in this thesis borrows the assumption
that any unknown region of the map may contain a route to the goal, since any more
specific assumptions about the unknown regions of the map would require some form
of prior knowledge, which we assume that we do not have.
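The free-space assumption itself amounts to a one-line change in how the planning
costmap is built, sketched below (the constants and cost values are illustrative):

    import numpy as np

    FREE, OCCUPIED, UNKNOWN = 0, 1, 2

    def planning_costmap(grid, free_cost=1.0):
        """Costmap under the D*-style free-space assumption: UNKNOWN cells
        are charged the same traversal cost as FREE cells, so candidate paths
        may pass through unexplored space; the plan is repaired whenever a new
        observation contradicts the assumption."""
        cost = np.full(grid.shape, free_cost)
        cost[grid == OCCUPIED] = np.inf   # only known obstacles are lethal
        return cost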

In a similar vein, Likhachev and Stentz [74] developed a method of computing
policies for a robot navigating in a partially unknown environment that, rather than
using the free-space assumption of D*, instead minimized cost in expectation over the
probability that unknown regions of the map were traversable. This type of approach
might be useful for indoor navigation in the case where a floor plan is available,
but where doors may be open or closed. The types of policies computed by this
method might intentionally pass near doors that, if found to be open, would yield
favorable routes to the goal. Unlike more general forms of planning under uncertainty
that might use a variant of value iteration, for example, this method operates in an
anytime fashion using a series of deterministic A* searches.

Nabbe and Hebert [86] developed a related method for planning to obtain advan-
tageous sensor views of particular key locations in a partially unknown map using
a next-best-view approach. Unlike the next-best-view methods described in Sec-
tion 2.2.1, the purpose of this strategy was to reach a goal efficiently, rather than
simply map the entire environment. In that sense, it is quite similar in spirit to
the method of quantifying measurement utility that we will describe in Chapter 4.
However, similar to Likhachev and Stentz [74], this method required "hallucinating"
the presence or absence of obstacles in particular key locations, which is not unlike
problems where a distribution over environments is given.

While conceptually very relevant for this thesis, we focus on the more general
problem where unknown regions of the environment are completely unknown, rather
than mostly known with the exception of a few key obstacles or traversable corridors.
If we had access to a distribution over maps, or a partial floor plan with probabilities
associated with certain passages being traversable, we could apply the formalism of
the POMDP, which we discuss next.

2.2.3 POMDPs for Navigation in Unknown Environments

The most general formalism for planning under uncertainty is the POMDP [59]. A
POMDP is defined by a reward function, a set of states, actions, and observations, a
probability distribution over state transitions, and a probability distribution over ob-
servations that might be obtained from each state. Thus, we can represent navigation
in a completely unknown map such that every unknown cell or region in the map adds
an additional state variable to the state space. Problems of this form can be solved, in
principle, using simple algorithms like value iteration. However, algorithms for solv-
ing POMDPs are typically not useful for realistic robot motion planning problems
because they are computationally intractable. While it is possible to compute exact
solutions for POMDPs with discrete states, actions, and observations, it is generally
infeasible to discretize high-dimensional robotics problems of interest.
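Concretely, such a POMDP maintains a belief $b(s)$, a distribution over states, which
is updated after taking action $a$ and receiving observation $z$ by the standard Bayes
filter

\[ b'(s') = \eta\, O(z \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s), \]

where $T$ and $O$ are the transition and observation distributions and $\eta$ is a
normalizing constant; planning then optimizes expected reward over beliefs rather
than states, which is the source of the computational burden discussed below.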

Nevertheless, a number of approaches have helped to reduce the practical cost of
solving POMDPs. For instance, in an algorithm known as Point-Based Value Iteration
(PBVI), Pineau et al. [92] developed a method to approximate exact value iteration
by selecting a small set of belief points and estimating the values of those points only.
Similarly, Kurniawati et al. [70] introduced SARSOP, an algorithm that performs
value iteration for a random sampling of belief points that are intended to approximate
optimally reachable beliefs, further reducing the number of belief points needed to
approximate the optimal value function. Extending this line of approximation still
further, Silver and Veness [106] proposed solving large POMDPs using Monte-Carlo
Tree Search (MCTS). This approach has two benefits: First, Monte-Carlo sampling
helps to break the curse of dimensionality, and second, only a black-box simulator
of the POMDP is required, as opposed to the exact distributions of transition and
observation probabilities that would normally be required. Bai et al. [12] developed a
related technique, termed Monte-Carlo Value Iteration (MCVI), to solve continuous-
state POMDPs, and Bai et al. [14] extended that technique to handle POMDPs with
continuous states and observations.

While this line of research has made tremendous progress increasing the size of
POMDPs that are approximately solvable, real-world robotics problems with complex
dynamics and high-dimensional sensor measurements are still out of reach for current
solvers. Even though POMDPs have been used to model and solve a wide range of
robotic tasks, they are usually very simplified. For example, Simmons and Koenig
[108] demonstrated partially observable office navigation using a discrete, topological
approximation of the environment to vastly reduce the size of the state space. Pineau
et al. [92] demonstrated POMDP planning for a game of grid-world robot tag, which
is essentially a partially observable pursuit-evasion game in which the location of the
target can only be detected by being in the same location. However, with such a
coarsely discretized state and action space, robot tag is mostly used as a toy problem
for benchmarking POMDP solvers rather than a problem of general use.

In spite of computational difficulty, POMDPs have been used successfully in a
number of robotics domains where partial observability is a key characteristic of
the problem. Hsiao et al. [53] and Horowitz and Burdick [50], for instance, solved
grasping problems with limited sensing as POMDPs in which the robot needed to
localize an object using information-gathering actions, but in both of these cases, the
authors reported that a clever selection of state-space or action-space representation
was necessary to limit the computational complexity to a feasible level. Roy and
Thrun [103] demonstrated coastal navigation for a robot whose state could only be
estimated by noisy observations, and might collide with an obstacle or fail to reach its
goal if its state uncertainty grew too high. This type of POMDP represents perhaps
the most commonly addressed type of robot navigation under uncertainty, and is the
problem of interest in [119], [25], and [96]. Finally, since a large portion of this thesis is
devoted to collision prediction, it is worth noting that POMDPs have also been used
for aircraft collision avoidance to evade incoming hazards [116, 13].

While none of the POMDP applications mentioned thus far represent the un-
known environment scenario addressed in this thesis, Kurniawati et al. [70] described
an integrated exploration scenario in which a robot attempts to reach a goal in a
grid-world map where several cells in the map may or may not be traversable with
some probability (similar to [74]). They explained the computational difficulty of such
problems, noting that the total number of possible states of the system grows expo-
nentially with the number of unknown cells in the map, and furthermore, that the
size of the belief space (distribution over states) grows exponentially with the number
of states. This doubly exponential growth in the size of the problem is too difficult
for even point-based POMDP algorithms. The integrated exploration environment
in [70] had only 10 unknown map cells, and presented a computational challenge, but
even modestly sized maps of realistic environments may contain millions of cells.
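To make this growth concrete: with $n$ unknown binary cells, the map alone has
$2^n$ possible configurations, and a belief over them is a point in a simplex of
dimension $2^n - 1$. The 10-cell problem in [70] therefore already involves
$2^{10} = 1024$ map hypotheses; a map with just 50 unknown cells would involve
$2^{50} \approx 10^{15}$.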
It should be noted that, in addition to the severe computational complexity,
POMDPs usually require a complete description of the action and sensor model,
or at least a black-box simulator in the case of [106], which implies some form of
implicit or explicit knowledge of the distribution over environments. For real-world
problems, specifying or learning this distribution is extremely difficult. In Chapter 3
we will discuss how we use machine learning to circumvent this particular difficulty
in our navigation scenario, but next we will discuss the wide range of other learning
applications for robotic planning and control.

2.3 Learning to Control Highly Dynamic Robots

Roboticists have applied machine learning techniques to solve problems in a vast and
ever growing number of ways. One general use case for machine learning is when it is
easier to present examples of a concept than to derive a model from first principles.
For robots, which must process very complex, highly structured data encountered in
the real world, this scenario occurs very frequently. The form of machine learning most
closely aligned with robotic planning problems is reinforcement learning [115, 58, 63],
which encompasses many different techniques related to learning an unknown model
of a robot-world system and planning within that system to maximize a reward. How-
ever, other types of learning, including classification, regression, and learning from
expert demonstrations also play important roles in robot planning and control. For
instance, in Chapter 3, we will introduce a probabilistic classification problem for
collision prediction that we will use to enable a robot to anticipate hazards beyond
the regions of the environment that have been directly perceived. Then, in Chapter 4,
we will introduce a regression task to quantify the expected utility of future measure-
ments that we will use to guide a robot to obtain advantageous sensor viewpoints.
These methods can be interpreted as variants on canonical forms of reinforcement
learning. Given the extremely widespread use of machine learning in robotics, we
will focus primarily on examples of highly dynamic vehicle systems of interest in this
thesis, rather than providing a broad foundational overview.

2.3.1 Learning Under Uncertainty of the System Dynamics

In many robotics problems, the dynamics model may be unknown or too complex to
derive from first principles. In this thesis, we assume that the unknown map is the
only source of uncertainty and that the vehicle dynamics model is known. Still, these
cases are useful for illustrating some of the most prominent robot learning methods.
In some cases, the dynamics can be learned through trial-and-error to develop a
model of how the vehicle behaves in response to actions, and then that model can
be used for planning. This approach is called model-based reinforcement learning.
Alternatively, model-free reinforcement learning searches directly for a policy mapping
states to actions, or learns a state-action value function, without the intermediate
step of learning the dynamics explicitly. Model-free reinforcement learning may be
more appropriate if a good policy is already known, perhaps from special domain
knowledge or expert demonstration, and only requires incremental refinement. Model-
free approaches may also be appropriate if a good policy class or value function
representation is known to be much simpler than the underlying dynamics.
Sometimes, model-based and model-free approaches can be combined. For in-
stance, Ng et al. [87] used several reinforcement learning techniques to control an
autonomous helicopter in highly unstable inverted flight. In this work, the authors
used supervised learning to first develop an approximate model of inverted helicopter
flight based on several minutes of flight demonstration by an expert pilot. Then,
they used the learned dynamics model in simulation to learn a controller for inverted
flight via policy search. By using expert demonstrations to obtain data from inverted
flight, they avoided the time-consuming, difficult and potentially dangerous process
of trial-and-error with a real helicopter in inverted flight.

Similarly, Cutler and How [29] demonstrated reinforcement learning for drifting
maneuvers on a small RC car, where friction forces between tires and ground surfaces
are notoriously hard to model from first principles, necessitating at least some real-
world data for learning. The authors began with a policy computed from a simplified
model, then alternately tested in real-world experiments and in simulation, transfer-
ring results of the real-world tests back to the simulator to refine the policy. This
combination allowed a large portion of the controller refinement to occur efficiently
in simulation while still obtaining enough information from the real world for the
method to capture complex nonlinearities. Both Ng et al. [87] and Cutler and How
[29] demonstrated the advantages of collecting a minimal amount of real-world data as
needed while performing much of the trial-and-error controller refinement efficiently
in simulation where many more trials could be performed without the cost or risks of
testing in the real world.

Sometimes, it is not necessary to model the dynamics at all during the process
of learning a controller, if the controller can be refined directly through safe and
expedient real-world experiments. Lupashin et al. [77] demonstrated a simple learning
technique to iteratively refine a parameterized policy for multi-flip maneuvers for a
quadrotor helicopter until the maneuvers could be performed with high accuracy.
However, this model-free method has some drawbacks. First, it requires domain
knowledge in order to design the class of policies that can be expected to perform
the desired task. This scenario might only occur with highly specific maneuvers such
as quadrotor flips. Furthermore, unlike a learned dynamics model, the learned policy
cannot be generalized to any other task. In this case, it is a trade-off that may be
required to achieve such precise high-performance maneuvers.

In other cases, the dynamics of the system are simply too difficult to model for
the purposes of efficient and practical control design. One example of this scenario is
unsteady fluid flow around a solid object. Joseph et al. [57] demonstrated balancing
of an inverted hydrodynamic cart-pole system, showing that it was possible to learn
effective policies for controlling such systems using a policy search method while
completely circumventing the physical equations that would be required for a model-
based reinforcement learning approach. However, this is another example where real-
world tests need to be relatively inexpensive to perform, and in fact, due to the
difficulty of performing real-world experiments, the main contribution of Joseph et al.
[57] was to learn from very small amounts of data. Still, policies learned in this way
are unlikely to transfer to any other tasks.
On the other hand, in order for learning to transfer efficiently to a wide variety
of tasks, it is sometimes best to learn the dynamics over a wide range of operating
conditions directly. Bagnell et al. [11] provided one example of learning a general
dynamics model for high-speed off-road navigation of the "Crusher" ground vehicle
in the DARPA UGCV-Perceptor Integration (UPI) program. Since the UPI program
used a skid-steer vehicle in a variety of off-road terrain conditions, accurate vehicle
modeling presented a challenge for high-performance planning. Therefore, rather than
deriving a physics-based model from first principles, they trained a neural network
to predict vehicle motions that would result from different input commands, using
ground surface estimates as part of the prediction [22]. This technique enabled precise
control in a variety of conditions, including muddy hill slopes, dirt trails and driving
through brush, which have very different physical properties for a skid-steer vehicle.
While the methods vary, all of these examples demonstrated high-performance
control of a dynamical system operating near the limits of its physical capabilities.
These situations are bound to deviate from any simple models that can be derived
from first principles, and therefore benefit from learning to achieve their goals.

2.3.2 Learning From Demonstrations

Regardless of whether the low-level dynamics and control have been learned or de-
rived from first principles, it may still be very difficult for users to specify high-level
behaviors in the form of a simple cost function that can be optimized by a planner.
For instance, Ratliff et al. [97] observed that it was tedious and time-intensive for
expert engineers in the DARPA UPI program to specify a complex cost function that
would produce desirable navigation behavior accounting for the wide variety of haz-
ards and terrains that were encountered. Instead, they reported that it was easier to
use inverse reinforcement learning to provide demonstrations of good behavior and
learn what latent cost function was being optimized by the human demonstrator.
Likewise, Abbeel et al. [3] learned to replicate extremely challenging aerobatic
helicopter maneuvers from expert demonstrations that only a small number of human
pilots are capable of performing. In this case, aerobatic maneuvers are so specialized
that there was essentially no other way to obtain training data for the dynamics
and control along the maneuver, as well as the maneuver specification itself, besides
recording an expert demonstration.
In other cases, recording human behavior can provide labels for supervised learning
problems as a by-product without the time- and labor-intensive process of explicitly
querying an expert. Pomerleau [93], Barnes et al. [15] and Giusti et al. [44] explored
this strategy, which we will discuss in further detail in Section 2.4.

2.3.3 Safe Learning

The essence of reinforcement learning approaches is to enable a robot to explore its
state space or policy space to learn what does and does not work. For some systems,
like a small RC car in an empty test arena, there is little risk in allowing the system
to explore its state space. However, for a large gas-powered helicopter, a crash could
cause catastrophic damage. In cases such as the helicopter, expert demonstrations
provide a safe and effective starting point for learning. In other cases, simulators may
be accurate enough to learn from before deploying a policy on a real physical system.
However, in a few cases, a robotic system must learn in the real world under its own
control, without expert guidance. Therefore, the robotic system must have some way
to remain safe by obeying constraints during the process of learning. This area of
study is referred to as safe reinforcement learning [42].
In one example of learning subject to constraints, Gillula and Tomlin [43] used
reachability analysis to guarantee that an aerial vehicle with a downward-facing cam-
era would always be able to see a target (whose dynamics were initially unknown),
while simultaneously learning the dynamics of the target and thereby improving per-
formance. Similarly, Akametalu et al. [4] demonstrated safe reinforcement learning
for cart-pole and quadrotor helicopter systems, based on Gaussian process models,
by estimating the maximal safe region of state space while simultaneously learning
the dynamics. Sui et al. [113] ("SafeOpt") and Berkenkamp et al. [17] ("SafeOpt-MC")
used Gaussian processes to model objective and constraint functions, enabling them
to optimize parameters while ensuring that they would only explore regions of state
space that were believed to be safe with high probability. Demonstrating the
effectiveness of these techniques on real dynamical systems, Berkenkamp et al. [18] used
SafeOpt to safely optimize a controller for a quadrotor helicopter.
While the robot must start with some baseline controller or knowledge of its
dynamics before exploring and learning on its own, these methods articulate an im-
portant problem, which is to quantify the level of uncertainty in the learned model
so that the robot can explore and improve its confidence only in situations that are
not likely to cause failure. We will utilize this concept in Chapters 3 and 5 to make
sure that we only trust our learned models in cases where we are confident that they
have received the appropriate training.

2.4 Learning Visual Perception for Navigation

In addition to dynamics and control, one of the other major domains of learning
for robotics is in sensory perception. Learned models for visual guidance date back
to the earliest examples of autonomous vehicles. The ALVINN driving system, for
instance, learned to follow a road using a fully connected artificial neural network that
received a camera image and a LIDAR scan as input, and produced output scores
for each of 45 steering directions [93]. This neural network controller performed well
enough to guide "NAVLAB," a full-sized autonomous navigation test vehicle on a
variety of roads. Perhaps most impressively, it could be trained on simulated images
and was able to quantify its own uncertainty-both topics of current research. This
vehicle was reliable enough to drive across America, mostly autonomously, at highway
speeds. But, despite the early success driving on highly structured roadways, learning
perceptual models is far from solved. Some of the most interesting challenges arise in
cases where the perceptual cues are subtle, variable and lack the consistent appearance
of the lane lines on a road.

2.4.1 Near-to-Far Self-Supervised Learning

One of the significant benefits of using visual information for planning is that camera
images, in principle, contain information about the structure of the environment far
beyond the range of reliable dense geometric sensing that can be accomplished with
LIDAR or stereo-based point clouds. Thrun et al. [118], for instance, demonstrated
detection of dusty desert roads in the DARPA Grand Challenge to a range of at
least 70 m using camera images, which was significantly greater than the 22 m range
of road detection that could be performed by the onboard LIDAR sensors. This
technique was designed to raise the top speed of safe navigation from 25 mph to 35
mph. The approach was to first detect drivable terrain immediately in front of the
vehicle with LIDAR, identify the pixels in the associated image that correspond to
drivable terrain, and characterize the color content of those pixels with a mixture-
of-Gaussians model. Then, all other pixels in the image could be tested for their
similarity to the appearance of the drivable terrain and classified as drivable or not,
well beyond the range of LIDAR.
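The following sketch captures the essence of this near-to-far pipeline, simplified to a
single Gaussian color model rather than the mixture of Gaussians used by Thrun et
al. [118]; all names and thresholds are illustrative assumptions.

    import numpy as np

    def fit_terrain_model(drivable_pixels):
        """Fit a Gaussian to the RGB values of pixels that short-range LIDAR
        confirmed as drivable terrain (an Nx3 array)."""
        mu = drivable_pixels.mean(axis=0)
        cov = np.cov(drivable_pixels, rowvar=False) + 1e-6 * np.eye(3)
        return mu, np.linalg.inv(cov)

    def classify_far_pixels(image, mu, cov_inv, threshold=9.0):
        """Label each pixel of an HxWx3 image drivable if its squared
        Mahalanobis distance to the near-field terrain model is small
        (a threshold of 9 is roughly a 3-sigma gate)."""
        diff = image.reshape(-1, 3).astype(float) - mu
        d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
        return (d2 < threshold).reshape(image.shape[:2])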
The vision technique used by Thrun et al. [118] is an instance of near-to-far self-
supervised learning, which was developed by Wellington and Stentz [124] to predict
the characteristics of unknown surfaces under vegetation from forward-looking sensor
data. The concept of near-to-far learning is that an autonomous robot may need to
predict some property of the world that can be seen in a sensor measurement, but
may not have the information needed to compute a label for that sensor measurement
until it arrives at the location in question. But, by recording that information and
associating it with the correct previously observed sensor measurement, it can then
"close the loop" and use that measurement-label pair (or a dataset of such pairs) to
make predictions for subsequent sensor measurements.

One of the most prominent examples of near-to-far learning for robotic vision
was the DARPA Learning Applied to Ground Vehicles (LAGR) program, which was
explicitly designed to encourage vision-guided navigation of unstructured outdoor
environments [54]. Hadsell et al. [48] used a near-to-far self-supervised technique
with a convolutional neural network to learn terrain traversability from short-range
stereo point clouds. The resulting system was able to classify terrain at a range much
farther than the stereo point clouds.
For urban driving, Barnes et al. [15] utilized a weakly-supervised method of seg-
menting images into drivable routes by projecting the future motion of a manually-
driven car onto an image and labeling the area swept by the car's path as an example
of a viable path in image space. This method of weak supervision inherits the same
essential characteristics and benefits of near-to-far learning. By segmenting images
into drivable routes, a planner can then check whether its proposed plans agree with
the learned path proposals, which may be a very powerful tool both for online guid-
ance in an applied driving context, and also potentially as a source of labels to train
a driving system from real-world image data.
While we have described a number of applications for self-supervised learning,
they all share the same fundamental advantage, which is that labeled examples can
be obtained very inexpensively by cleverly associating data from multiple sensors, or
from the motion of the robot, across time. This source of large volumes of labeled
data makes it feasible to train visual models at scale. In Chapter 5, we will utilize
similar general self-supervised labeling principles to enable us to predict collisions
from camera images.

2.4.2 Supervised Learning of Visual Models for Planning

Learned visual models can be integrated into planning in a wide variety of ways,
depending on the role that vision plays in the planning process. Some methods have
been very effective by posing the learned visual model as a standalone supervised
learning problem. For instance, Michels et al. [83] learned to predict depth from
monocular images as a supervised learning problem, using both a graphical simulator
and real images labeled using LIDAR, and then trained a controller using policy
search reinforcement learning to avoid collisions based on the output of the depth
estimator. Using this combined technique, they enabled a small RC car to drive up
to 6 m/s in a variety of cluttered environments. While this collision avoidance system
was effective in several experiments, it did not represent a full decision-theoretic
framework and therefore was not able to meaningfully trade off between risk and
reward. This formulation, combined with the low-fidelity depth estimation, resulted
in occasional collisions at a rate of approximately one per minute of driving.

Bachrach [8] posed planning for a mobile robot in unknown environments as a
visual classification problem to predict the feasibility of certain types of trajectories
on the basis of a camera image. Rather than estimating depth for various regions of
an image, which may be a high dimensional real-valued quantity, estimating action
feasibility may be easier since it is a binary classification problem. This work is per-
haps the most similar to the collision-prediction approach used in this thesis, except
that it was demonstrated primarily using simulated images which can be obtained
easily compared to real image data. Nevertheless, the spirit of the approach was to
anticipate the outcomes of actions that may extend beyond the perceptual horizon
on the basis of local information, which is one of the primary principles of this thesis.

Giusti et al. [44] demonstrated another example of supervised learning for visual
guidance, by training a classifier to indicate the direction of a hiking trail for a
quadrotor to follow. They used a clever data collection method of recording images
from three cameras at different headings relative to the training trails in order to
obtain a large amount of data that was automatically labeled with 'left,' 'center,' and
'right' labels. They then trained a deep neural network to classify test images into
these three categories, and steered the quadrotor accordingly to stay centered and
moving along the direction of the trail.

In the category of supervised learning for vision, image segmentation has become
especially prevalent as a means of identifying navigable regions of the environment.
For instance, Farabet et al. [33] trained a convolutional neural network to extract
features that were then used for pixel-wise image labeling using superpixel and seg-
mentation tree methods. They demonstrated their technique on road scenes in New
York City, labeling the road surface, cars, pedestrians, bicycles, and other categories
that are essential for navigation. Badrinarayanan et al. [10] developed a similar ca-
pability with SegNet, which accomplished the same task using a single convolutional
autoencoder that contained a pixel-wise classification layer. Kendall et al. [62] ex-
tended SegNet using the dropout technique to account for model uncertainty in class
labeling. Oliveira et al. [89] achieved a similar result, but with a neural network
designed to be efficient enough to run in real time for robot navigation. While these
models can be very powerful, they may require a large amount of pixel-wise labeled
training data, which is very difficult to obtain because it requires manual labeling.

Another example of learning pixel-wise outputs from an entire image is depth es-
timation. Unlike the segmentation task, a large amount of ground truth depth data
can be easily labeled using a LIDAR or other depth sensor, making training data
much easier to obtain for this task. Eigen et al. [32] used a multi-scale deep neural
network to
estimate the depth of each pixel in an input image, and demonstrated their tech-
nique on the KITTI driving dataset. Though many other more established methods
for camera-based depth estimation exist, for instance stereo disparity techniques or
monocular SLAM, the learning-based approach offers complementary strengths to
these approaches because it is able to leverage the context of the entire image rather
than error-prone feature detection to make predictions.

While there is an enormous variety of ways in which supervised learning could be
applied to learn visual models for navigation, they tend to share the overall paradigm
of factoring the learned visual model and the planner into separate categories, where
concise, human-understandable and often task-independent information is produced
by the learned visual model and then fed to a planner. This is the approach we take
in this thesis, because isolating the learning problem in this way offers several advan-
tages compared to the more opaque end-to-end learning methods. First, supervised
learning problems may be more data efficient and easier to diagnose than end-to-end
approaches. Second, if the output of the learning algorithm is task-independent, then
we can combine it with a number of different planning architectures, which allows
flexibility in design. Finally, if we can assess the uncertainty of the learned model,
we can control when to trust it and when to revert to some other prior or baseline
policy that is known to be safe. This final point is a key advantage of the collision
prediction techniques we present in Chapters 3 and 5.
At the other end of this spectrum are end-to-end learned models, which attempt
to map directly from sensory input to action without necessarily building a human-
understandable or task-independent internal representation. In some ways, end-to-
end approaches are simpler in that they do not require the user to specify a class of
planners or even to understand the underlying characteristics of the system of interest.
However, they sacrifice the advantages listed above. We describe these methods next.

2.4.3 Mapping Images Directly to Actions

With the advent of deep learning, researchers have successfully joined the image-
processing capabilities of neural networks with the policy-learning techniques of re-
inforcement learning to develop systems that are trainable end-to-end. Whereas
controllers or reinforcement learning problems normally require a concise manually-
designed state description to work with, they can now accept an image as input rather
than a traditional state description.
One of the most prominent examples of these techniques was developed by Mnih
et al. [84], who demonstrated human-level performance on Atari 2600 video games
using a technique they refer to as "Deep Q-Network" (DQN). Remarkably, the inputs
to this system were a time-history of pixel values of the last several game screens,
requiring the DQN agent to interpret the visual appearance of the image and infer
the dynamic state of the system from the differences between sequential screens. This
method inherits the underlying mathematical structure of Q-Learning, a method of
model-free reinforcement learning in which the agent attempts to learn a Q-function,
which maps a state and action to an estimate of the total expected future discounted
reward, assuming the agent follows the optimal policy from then on [122]. Unlike
model-based reinforcement learning, or the methods outlined in Section 2.4.2, which
infer a concise and human-understandable state of the world through images (for ex-
ample semantic segmentation into known categories), DQN does not require any such
model to be learned. The advantage, in this context, is that no human supervision is
required to manually label entities and no domain knowledge is required to define a
class of models that describe the dynamics of the game. Therefore, DQN requires no
human assistance to train itself, which is a very appealing property.
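For reference, tabular Q-learning performs the temporal-difference update

\[ Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big], \]

with learning rate $\alpha$ and discount factor $\gamma$; DQN replaces the table with
a deep network trained to minimize the squared temporal-difference error over
minibatches of stored experience.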

Recently, roboticists have begun experimenting with similar deep reinforcement
learning algorithms using real robot sensors and hardware to address the question of
whether it is possible to short-circuit the individual steps of designing and training
perception and control techniques separately. For instance, Levine et al. [73] de-
veloped a method of training policies via policy search for PR2 manipulation that
mapped directly from camera images to motor commands, where the policies were
represented by deep convolutional neural networks.

End-to-end learning has also been used to map camera images directly to control
commands for navigation of mobile robots. Muller et al. [85] trained a deep convolu-
tional neural network to map from a stereo image pair to steering angles for a small
off-road RC truck, and more recently Bojarski et al. [23] trained a deep convolutional
neural network to steer a car based on a single dashboard-mounted camera. How-
ever, unlike the previous end-to-end learning examples, these networks were trained
in a supervised manner using steering examples provided by a human driver. The
reason an expert is needed to provide steering demonstration in these cases is that
the geometry of the navigation environment is a latent, but crucial variable in both of
these learning problems. Without a geometric sensor such as a LIDAR, or a simula-
tor where the environment geometry is known, there must be some form of guidance
provided to the learner that encourages it to avoid collisions and stay on the road.
Whereas a video game playing agent or a robot manipulator can repeatedly fail during
the learning process without causing serious harm, this is not the case for a full-sized
automobile where a single collision would be extremely dangerous.

While many of the recent techniques mapping images to actions use deep neural
networks, there are also examples that use other feature representations. For
instance, Ross et al. [102] developed a method based on Radon features, structure
tensor statistics, Laws' masks and optical flow from RGB images that all contain
some indirect information about the depth or distance to obstacles in the image.
They learned a reactive control policy from expert demonstrations as a function of
these features, which enabled them to fly through a forest at 1.5 m/s using only RGB
images as input.
Chen et al. [27] developed an intermediate paradigm for visual learning using
an affordance-based representation to provide an appropriate level of abstraction for
learning to drive a car on standard roads. They describe what they call the "medi-
ated perception method" using numerous vision components to parse the scene into a
human-understandable description including the geometry of the road and the posi-
tions of other cars, etc., which is closely related to some of the supervised learning and
segmentation methods we described in Section 2.4.2. And on the other hand, they
describe the reactive behavior reflex method of driving, which maps directly from
image to action, as in the end-to-end methods we have outlined above. They propose
a third paradigm that uses a deep convolutional neural network to learn affordance
indicators, which are specific geometric measurements that describe the position of
the car on the road and relative to other vehicles. The value of this affordance system
is that it gives a compact set of features that can be estimated directly from images
in a learning framework, but are general enough so that a policy or controller based
on them can be expected to work across a variety of environments without requiring
an excessive amount of data to learn.

2.4.4 Quantifying Uncertainty in Learned Visual Models

We have now described a variety of ways to process raw sensory input that offer valu-
able perceptual tools for robotics. However, failures of these processes may result in
catastrophic failure for an autonomous vehicle that relies on, for instance, a learned
model for collision avoidance. Simultaneously, some of the most powerful tools for
visual processing, such as neural networks, will readily make erroneous predictions
for inputs unlike any that they have observed during training [20]. In fact, it has
been recently shown that neural networks sometimes classify certain images of ap-
parently random noise into common categories with over 99% confidence [88]. This
dangerous situation strongly calls for a means of accurately quantifying uncertainty
and detecting when learned models should not be trusted.

The method used to quantify uncertainty depends on the learning algorithm, since
some methods are able to accurately report their own uncertainty while others are
not. Grimmett et al. [46] examined the introspective capability, or the self-assessment
of uncertainty, of various classifiers used for robotic decision making. Their analysis
suggested that a classifier's uncertainty should increase with distance between the
query and the training data, which occurs naturally with Gaussian process classifiers
(GPC) and other Bayesian non-parametric methods, but less so for other common
classifiers such as support vector machines (SVMs). Furthermore, they observed
that introspective learning methods must perform marginalization over a family of
candidate models, which, again, occurs naturally for Gaussian processes, but not for
SVMs or neural networks. If the learning algorithm is a kernel-based non-parametric
method with an appropriate choice of kernel, we can often trust its own reported
predictive variance as a measure of how far the query lies from the training data and
how certain it is about its prediction. This behavior is characteristic of Gaussian
process regression, where the predictive uncertainty grows with distance from the
training data until it has reached some prior value, and it is also characteristic of the
kernel-based models outlined by Vega-Brown et al. [121], which we use in Chapter 3.
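To see why this behavior arises, recall that for Gaussian process regression with
kernel $k$ and observation noise $\sigma_n^2$, the predictive variance at a query $x_*$ is

\[ \sigma^2(x_*) = k(x_*, x_*) - \mathbf{k}_*^{\top} \big(K + \sigma_n^2 I\big)^{-1} \mathbf{k}_*, \]

where $K$ is the kernel matrix over the training inputs and $\mathbf{k}_*$ holds the
kernel values between $x_*$ and each training input. For local kernels such as the
squared exponential, $\mathbf{k}_* \to 0$ as $x_*$ moves away from the data, and the
variance reverts to the prior value $k(x_*, x_*)$, which is exactly the introspective
behavior described above.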

One of the key ideas from Grimmett et al. [46] is that if a classifier is to make a
mistake, it should do so with high uncertainty rather than being very confident about
an incorrect prediction. By accurately reporting high uncertainty, the prediction
can be blended with prior information in order to mitigate the consequences of a
classification error, or the system can choose to ignore a prediction that has been made
with high uncertainty. From a robotics perspective, accurate uncertainty estimates
can be essential to avoiding dangerous collisions because they can caution a planner
against acting on potentially incorrect predictions.

We will use this important concept in Chapter 3 to blend between a high-performance
data-driven model in regions of high data density and a conservative but safe prior
estimate of collision probability when confronted with a novel environment type for
which we do not have relevant training data. To make this trade off in a Bayesian
manner, we use an efficient kernel-based Bernoulli-Beta model [121]. On the other
hand, if the learning algorithm is a neural network, we must devise some other way
to assess its confidence. In Chapter 5, we will use a novelty-detection approach, dis-
cussed below, to switch between a learned neural-network collision prediction model
and a safe prior.
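As a sketch of how the kernel-based Bernoulli-Beta blending behaves (the estimator
is developed fully in Chapter 3, following [121]), the posterior collision probability
at a query $x$ takes the form

\[ \hat{p}(\text{collision} \mid x) = \frac{\alpha_0 + \sum_i k(x, x_i)\, y_i}{\alpha_0 + \beta_0 + \sum_i k(x, x_i)}, \]

where $y_i \in \{0, 1\}$ are collision labels at training points $x_i$, $k$ is a similarity
kernel, and $(\alpha_0, \beta_0)$ encode the conservative prior. When the kernel sums are
large, the data dominate; far from the training data they vanish, and the estimate
reverts smoothly to the prior mean $\alpha_0 / (\alpha_0 + \beta_0)$.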

Unlike Grimmett et al. [46], who examined the uncertainty estimates provided
directly by the learning algorithms themselves, uncertainty quantification can also be
posed as an auxiliary learning problem to predict the performance or failure of the
primary system and can be used to improve reliability or avoid situations that are
prone to failure. Sofman et al. [1091, for instance, recognized the need for novelty
detection in the UPI program, since the Crusher robot was likely to cause immense
damage if it took a wrong action. They performed online novelty detection using
density estimation in a high-dimensional feature space that was used to avoid poten-
tially risky scenarios and to prompt a human operator to take over control. Gurau
et al. [471 and Daftry et al. 1301, developed methods to predict some measure of per-
formance or failure of a vision system based on prior experience. In both cases, the
authors reported improved system performance or reliability by avoiding autonomous
operation in scenarios where perception failure can be anticipated. One disadvantage
of the method developed by Daftry et al. [30] is that it is partially based on a dis-
criminatively trained neural network, meaning that the uncertainty predictor is also
susceptible to the same types of failures as the vision system it is intended to protect.

Vega-Brown et al. [120] explored the related idea of learning context-dependent
sensor measurement covariances for improved state estimation. This method was able
to predict when there was no information available along a certain dimensional axis,
for instance a long featureless hallway, so that a state estimator using that informa-
tion could compensate accordingly. Despite using different learning techniques, this
method and the methods above serve the same auxiliary function of predicting failure
in a primary learning method.

To quantify uncertainty in neural network models, researchers have examined both
introspective methods as well as auxiliary methods of predicting failure. To adapt
neural networks for introspection, some work has attempted to quantify uncertainty
about neural network predictions using Bayesian methods. MacKay [78], for example,
developed methods to learn or approximate distributions over neural network weights,
which, in principle, capture the uncertainty about the output of the prediction. How-
ever, Bayesian neural network methods pay a price in analytical and computational
complexity and have not taken hold as practical methods for quantifying uncertainty.

A somewhat more popular line of inquiry has been to quantify uncertainty by
training ensembles of networks on re-sampled versions of the training dataset [90,
69, 38, 71]. This bootstrap approach operates under the assumption that multiple
models trained on slight variations of the same training dataset will agree in regions
of high data density and perhaps disagree in regions of lower data density. The
distribution of trained models obtained in this way can then be used to approximate
a predictive distribution, i.e., the spread of model values evaluated at a
query point can be considered to represent the variance around the predicted output.
The advantage of these bootstrap methods lies in their simplicity: the identical
training machinery of deterministic neural networks, backpropagation with stochastic
gradient descent, can be reused. Similarly, Gal and Ghahramani [41] showed that
training and evaluating with dropout (termed "MC Dropout") can
be used to model prediction uncertainty. While dropout is normally used to reduce
overfitting of a learned model, evaluating the network multiple times with dropout
will produce a distribution of outputs that can be considered to represent the mean
and variance of the target quantity. In this way, dropout is conceptually similar to
bootstrap or ensemble methods.
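
To make the idea concrete, the following is a minimal sketch of MC dropout in
PyTorch; the architecture, dropout rate, and sample count are illustrative
assumptions for the sketch, not choices made in this thesis.

    import torch
    import torch.nn as nn

    class DropoutRegressor(nn.Module):
        """A small network with dropout layers that remain active at test time."""
        def __init__(self, n_in, n_hidden=64, p=0.2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Dropout(p),
                nn.Linear(n_hidden, n_hidden), nn.ReLU(), nn.Dropout(p),
                nn.Linear(n_hidden, 1))

        def forward(self, x):
            return self.net(x)

    def mc_dropout_predict(model, x, n_samples=50):
        """Evaluate the network n_samples times with dropout active; the mean
        and spread of the outputs approximate the predictive distribution."""
        model.train()  # keep dropout stochastic at evaluation time
        with torch.no_grad():
            samples = torch.stack([model(x) for _ in range(n_samples)])
        return samples.mean(dim=0), samples.std(dim=0)

Note that each such prediction costs n_samples forward passes, which is the
runtime burden discussed below.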

While bootstrap, ensemble and dropout methods are equipped to model the vari-
ance within the training data, they are not explicitly trained to return appropriately
high uncertainty very far from the training data. Rather, they rely on the residual ran-
domness in model space to produce a "fan-out" effect representing model disagreement
far from the training data. The characteristics of this model disagreement appear to
be sensitive to particular choices of model architecture, activation functions, and pa-
rameters [40]. It remains to be seen whether there is any usable connection between
model disagreement far from the training data and some prior uncertainty we may
wish to specify. As a result, uncertainty estimates far from the training data may be
unpredictable and inaccurate. However, these techniques are under active development,
and a practical consensus on when and how these methods can be expected to behave
has yet to emerge.

Nevertheless, these methods, as well as approximate Bayesian methods (e.g., [121]),
require at least tens or hundreds of forward passes to be evaluated in order to build up
an approximate distribution around the output value, severely compromising runtime
efficiency. Kahn et al. [60], for example, used a combination of dropout and bootstrap
methods to quantify uncertainty in a neural network collision prediction model for
a camera-guided quadrotor and RC car. The computational burden of this method
presumably limited them to very low-resolution images no larger than 32 × 18, a
small network of two 40-node hidden layers, and only five bootstraps
to estimate the collision probability associated with a given action during online
operation. Scaling this system up to even moderate image sizes and larger numbers
of bootstraps would likely be difficult.

Observing the unfulfilled need for an effective uncertainty quantification method
and a lack of introspection in standard neural networks, Bishop [20] approached the
problem from the point of view of novelty detection and proposed a separate kernel-
based density estimator to determine whether a query had been drawn from the
training distribution or not. In the event that the input were determined to be novel,
the learning system would revert to some pre-determined default mode or prompt an
expert for assistance. While it may sometimes be tempting to perform kernel-based
density estimation in the space of learned feature values at some intermediate layer
in the network, Bishop [20] noted that this approach would not work in general since
the nonlinear layers of a neural network do not constitute an information-preserving
transformation and a novel input could be inadvertently projected into the same
region of feature space as many of the training data points. This observation means
that density estimation for novelty detection must be performed in the input space,
which is unlikely to be effective if the inputs are pixels of an image, but may work
for certain choices of image descriptors.

In a robotics application with crucial safety considerations and image inputs,
Pomerleau [94] observed the need to detect novel driving scenarios in the ALVINN
driving system, which made steering errors when confronted with scenarios unlike
its training data. Such novel scenarios included intersections or roads with multiple
lanes. Rather than using kernel density estimation, they detected novel scenarios by
augmenting their neural network with an additional set of outputs that would be used
to reconstruct the inputs in an autoencoder-like fashion. If the reconstruction error
were high, it would signal a novel scenario where the human test driver should take
control, or that the planner should switch to a different neural network model that
had been specifically trained for that type of scenario.

Japkowicz et al. [56] outlined a method of using autoencoders for novelty detec-
tion, expanding on the work of Pomerleau [94]. They observed that novelty detection
is a type of one-class classification problem since no negative examples are assumed to
be available for training a conventional binary or multi-class classifier. This method
is conceptually similar to a one-class SVM in that a threshold must be determined in
the one-dimensional space of reconstruction errors such that inputs that are recon-
structed accurately by the autoencoder are considered to be familiar and to belong
to the same class as the training data, whereas inputs that are reconstructed inaccu-
rately are considered to be novel. By designing the autoencoder to have a bottleneck
layer or sparsity constraint, it is forced to learn a compressed representation of the
input distribution, hopefully capturing the essential characteristics that define that
class. Therefore, if an input is drawn from some other category, it is very unlikely to
accidentally conform to the structure of the input distribution, and is therefore likely
to be reconstructed poorly by the autoencoder. They advised setting the threshold
at or near the greatest reconstruction error observed in the training dataset. Under
such a threshold, all training data points would be correctly classified as familiar,
while any inputs with greater reconstruction error would be labeled as novel.
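
A minimal sketch of this scheme, with an assumed bottleneck width and layer
sizes, might look as follows; X_train stands for a familiar training set.

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        """Bottleneck autoencoder: familiar inputs reconstruct well, novel ones poorly."""
        def __init__(self, n_in, n_bottleneck=8):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(),
                                         nn.Linear(64, n_bottleneck))
            self.decoder = nn.Sequential(nn.Linear(n_bottleneck, 64), nn.ReLU(),
                                         nn.Linear(64, n_in))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def reconstruction_errors(model, x):
        """Per-input mean squared reconstruction error."""
        with torch.no_grad():
            return ((model(x) - x) ** 2).mean(dim=1)

    # After training on familiar data X_train, place the threshold at the largest
    # training reconstruction error; anything above it is flagged as novel.
    # threshold = reconstruction_errors(model, X_train).max()
    # is_novel = reconstruction_errors(model, x_query) > threshold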

In fact, the method of using the reconstruction error of an autoencoder to detect
novelty has been used in a wide variety of domains, including helicopter gearbox fault
detection, molecular biology promoter recognition, sonar target classification [56],
acoustic signals [82], network server anomalies [117], data mining [49], document
classification [81] and others. Despite its initial success, this method has not enjoyed
significantly renewed interest during the present wave of progress in deep neural
networks, and it is unclear why not, since the need for uncertainty quantification
grows as neural networks are deployed in more and more safety-critical applications.
Moreover, uncertainty quantification for neural network systems is still an open
problem. Despite the various efforts that we have presented above, there are still
no reliable, efficient, off-the-shelf techniques that can be used. Of the methods we
have described, the autoencoder-based approach is by far the most efficient since it
can be evaluated in a single pass, rather than estimating the output distribution
by evaluating each member of an ensemble or distribution of models. However, it
remains to be seen whether the autoencoder approach can scale to large and highly
varied datasets in general applications.

Chapter 3

Predicting Collisions from Features of
a Geometric Map

In this chapter, we will describe a method of predicting collision probabilities and
using them to enable a robot to safely navigate faster than would be possible under
conventional safety constraints. In Chapter 1, we described the high-level planning
task of reaching a goal in an unknown environment in minimum time, where the
knowledge horizon (the boundary between what the robot has observed and what is
still unknown) unfolds incrementally as the robot moves. In many ways, this perpet-
ually unfolding knowledge horizon is what makes the navigation problem challenging.
For a planner enforcing conventional safety constraints, the knowledge horizon dic-
tates exactly how fast the robot can travel, because it limits the available stopping
distance, and can be very restrictive in cluttered environments. Yet, reasoning be-
yond this knowledge horizon using any conventional inference method would require
knowledge of the environment distribution, which we assume that we do not have,
and it would also require a means of solving what is normally an intractable planning
task in the form of a partially observable Markov decision process (POMDP), where
the environment itself is the key partially observable quantity.
On the other hand, as we will show in this chapter, if the planner can predict
risks beyond the knowledge horizon based on training examples, without explicitly
inferring the unobserved structure of the environment, it can efficiently extend its
planning horizon into the unknown and navigate much more quickly while remaining
collision-free. We will first show how the navigation task we described in Chapter 1
can be formulated as a POMDP, which lays the foundation for decision theoretic plan-
ning. We will also discuss why our problem is uniquely challenging even compared
to other POMDPs. We will then derive our approximation to the POMDP to make
the problem tractable, which gives rise to our learned model of collision probability
as part of a receding-horizon control law that we use for online planning. Next, we
will discuss how we learn that model and how we encode safety constraints as a prior
over collision probability to help our planner remain safe in novel environment types
for which it has no relevant training data. Finally, we will present simulation results
demonstrating 100% empirical safety (i.e., remaining collision-free without an abso-
lute safety constraint) while navigating much faster than the baseline planner. We
will also present experimental results demonstrating the effectiveness of our approach
compared to the baseline planner in a variety of real-world settings.

3.1 POMDP Planning

We wish to control a dynamic vehicle through an unknown environment to a goal
position with minimum expected cost, where the expectation is taken with respect
to the (unknown) distribution of maps. While solving this problem exactly is com-
pletely intractable, the POMDP formalism is useful for defining the planning task and
motivating our approximations. The configuration of our robot is natively continu-
ous and governed by differential equations, which we will describe below. Therefore,
our problem definition includes these continuous variables, and is not intended to
be compatible with standard discrete POMDP algorithms. The following POMDP
tuple, (S, A, T, R, O, Ω), applies to an RC car in the horizontal plane, equipped with
a planar LIDAR, which we use throughout this thesis:
States S: {Q × M}. Q is the set of continuous-valued vehicle configurations: q =
[x, y, ψ, k, v], representing global x, y-position, heading, curvature of motion (which
corresponds to steering angle) and forward speed, respectively. The vehicle position
can be anywhere in the plane: [x, y] ∈ R². Other configuration variables have bounded
domains: ψ ∈ [−π, π], k ∈ [−k_max(v), k_max(v)], and v ∈ [v_min, v_max], where k_max(v),
v_min and v_max are configuration constraints defined below.


M is the set of n-cell occupancy maps: m = [m_0, m_1, ..., m_n] ∈ {0, 1}^n, where m_i
represents the ith cell in the map. For a given problem instance, the true underlying
map, m, is fixed, while the configuration, q, changes at each time step as the robot
moves. We assume that q is fully observable, while m is partially observable since
only a subset of the map can be observed from a given location. We also assume that
n is fixed and known. Except where noted otherwise, we use a map cell size of 0.1 m,
which is a sufficiently fine discretization to represent small details in our experimental
environments while not requiring excessive cost for computations involving the map.
We also note that it may be possible to implement many of the techniques in this thesis
using other map representations, for example polygonal or plane-based maps, but the
choice of occupancy grids helps us to explicitly represent the robot's knowledge of
both known and unknown regions in a simple, efficient data structure.
Actions A: A is a finite set of actions, computed and stored offline, spanning the
vehicle's maneuvering capabilities. Several examples are illustrated in Figure 3-1. All
actions reach the same planning horizon of 2 m (except where otherwise noted), but
differ in their time duration as a function of their speeds. We compute the actions in
action set A using methods developed by Howard et al. [51], as dynamically feasible
solutions to boundary-value problems with the following constrained kinematic model:

ẋ = v cos(ψ)                                                    (3.1)

ẏ = v sin(ψ)                                                    (3.2)

ψ̇ = k v                                                         (3.3)

k̇ = (k_cmd − k)/dt,  s.t. |k| ≤ k_max(v), |k̇| ≤ k̇_max          (3.4)

v̇ = (v_cmd − v)/dt,  s.t. −b_max ≤ v̇ ≤ a_max,                  (3.5)

Note that this system of differential equations is used only to generate the discrete
set of actions offline, and is not used at all in any stage of online planning.
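
For illustration, one action can be simulated by forward-integrating equations
(3.1)-(3.5) with Euler steps, as in the sketch below; the command parameterization,
duration, and step size are assumptions for the sketch, whereas the actual action
set is computed by solving boundary-value problems following Howard et al. [51].

    import numpy as np

    def k_max(v, v_min=1.0, v_max=15.0):
        """Speed-dependent curvature limit: linear from 1.0 at v_min to 0.1 at v_max."""
        frac = np.clip((v - v_min) / (v_max - v_min), 0.0, 1.0)
        return 1.0 - 0.9 * frac

    def simulate_action(q0, k_cmd, v_cmd, T=1.0, dt=0.01,
                        k_dot_max=5.0, a_max=5.0, b_max=5.0):
        """Euler-integrate the kinematic model (3.1)-(3.5); q = [x, y, psi, k, v]."""
        x, y, psi, k, v = q0
        traj = [tuple(q0)]
        for _ in range(int(T / dt)):
            k_dot = np.clip((k_cmd - k) / dt, -k_dot_max, k_dot_max)   # (3.4)
            v_dot = np.clip((v_cmd - v) / dt, -b_max, a_max)           # (3.5)
            x += v * np.cos(psi) * dt                                  # (3.1)
            y += v * np.sin(psi) * dt                                  # (3.2)
            psi += k * v * dt                                          # (3.3)
            k = np.clip(k + k_dot * dt, -k_max(v), k_max(v))
            v += v_dot * dt
            traj.append((x, y, psi, k, v))
        return traj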


Figure 3-1: Examples of pre-computed action sets from initial speeds of v = 1 m/s
(a), v = 4 m/s (b), and v = 8 m/s (c) with zero initial curvature, and from v = 4
m/s with non-zero initial curvature (d).

The variables k_cmd and v_cmd are the commanded steering (curvature) and velocity,
respectively. The dynamics are constrained by k_max(v), k̇_max, a_max, and b_max, which
are the maximum curvature, curvature rate, acceleration and braking acceleration, re-
spectively. We use k̇_max = 5 (m·s)⁻¹, and a_max = b_max = 5 m/s². For the purposes of
computing an action set, we also consider the maximum speed to be v_max = 15 m/s
and the minimum speed (except for emergency-stop actions) to be v_min = 1 m/s. The
maximum curvature k_max(v) is a function of vehicle speed, which reflects the speed-
dependent traction force required by the tires to maintain a given turning radius,
as well as other more complex vehicle dynamics effects. For our experimental RC
car platform, the maximum curvature at idle speed is approximately 1 m⁻¹, but this
maximum curvature decreases to approximately 0.1 m⁻¹ at high speeds. We define
k_max(v) as a linear function of v, which decreases from k_max(v_min) = 1 to k_max(v_max) =
0.1. While these numerical values were used for the results in this chapter, and were
obtained from a preliminary set of system identification experiments, we subsequently
refined our parameter choices slightly as we modified the suspension and weight dis-
tribution of our experimental RC car and gained more experience with its high-speed
maneuvering capabilities under strong acceleration and braking, to ensure that the
car could precisely execute the planned actions.

There are several elements of this kinematic model that play a particularly impor-
tant role in our results. First, the speed-dependent curvature limit reflects a crucial
loss of maneuverability as speed increases. This is a natural property of virtually
all realistic vehicles. The effect is that the set of feasible trajectories at high speeds
is much narrower than at low speeds. This effect can be observed in the differences
between feasible actions for different speeds in Figures 3-1(a)-3-1(c). The curvature
rate also has a similar effect. Intuitively, it takes time to pivot the steering wheels
to initiate a turn, and as the vehicle moves faster, more distance is covered during
that time. It also means that once a turn is initiated, the vehicle is committed to
continuing in a turning motion until it has the time to pivot the wheels out of the
turn. This effect can be observed in Figure 3-1(d), which depicts the actions avail-
able from a state with non-zero initial curvature. As a consequence of both of these
constraints, low speeds are generically safer than high speeds due to the velocity-
dependent maneuverability of the vehicle to avoid collision. Likewise, acceleration
and braking limits mean that the robot cannot instantaneously change speed. The
combined result of these physical limits is that if the robot needs to turn sharply to
negotiate a corner or avoid an obstacle, it must begin slowing down well in advance.

Conditional Transition Probabilities T: We define several deterministic func-
tions related to transition dynamics for convenience and later use. The state-transition
function F(s, a) : S × A → S returns the state reached by taking action a from state s
according to the dynamics given in equations (3.1)-(3.5). In F, the map portion of the
state remains fixed at its true value. The collision function C(s, a) : S × A → {0, 1}
indicates whether taking action a from state s would result in a collision. If a collision
would occur along trajectory a (i.e., if C(s, a) = 1), then F(s, a) clamps the config-
uration to its last feasible value along a and sets the velocity to zero. The function
ICS(s) : S → {0, 1} indicates whether state s is an inevitable collision state [37],
i.e., whether there exists no infinite-horizon sequence of actions {a_0, a_1, ...} from s that
will avoid collision with the environment. We now define the conditional transition
probability distribution, assuming deterministic vehicle dynamics and a fixed map:
p(s_{t+1} | s_t, a_t) = 1 for s_{t+1} = F(s_t, a_t) and 0 otherwise. We use the subscript t + 1 to
indicate the discrete "time" after completing action a_t.

Observations Ω: Each observation consists of a perfect measurement of q_t and
the partial map of cells visible to the robot from s_t. This partial map consists of
occupied cells at locations r_{i,xy}, corresponding to each LIDAR range measurement r_i,
and unoccupied cells along the ray from q_t to each r_{i,xy}.
To produce the partial map from a raw LIDAR scan, we perform raycasting op-
erations for each of the individual beams. For a single beam, we mark each map cell
along the ray from the estimated LIDAR location to the x, y-endpoint of the beam as
unoccupied, and we mark the cell at the endpoint location as occupied if the beam
hit an object in the environment, or leave it unchanged if the beam reached maxi-
mum range without hitting an object. Since we assume the robot configuration (and
hence LIDAR configuration) to be fully observable, we also assume this raycasting
operation will produce a perfect map.
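
A simplified version of this update is sketched below; it uses a fixed-step ray
march in place of an exact grid traversal and assumes the scan has already been
transformed into the map frame.

    import numpy as np

    UNKNOWN, FREE, OCCUPIED = -1, 0, 1

    def integrate_scan(grid, origin_xy, endpoints_xy, hits, cell=0.1):
        """Mark cells along each beam free; mark each hit endpoint occupied."""
        ox, oy = origin_xy
        for (ex, ey), hit in zip(endpoints_xy, hits):
            n_steps = max(int(np.hypot(ex - ox, ey - oy) / (0.5 * cell)), 1)
            for s in range(n_steps):
                t = s / n_steps
                i = int((ox + t * (ex - ox)) / cell)
                j = int((oy + t * (ey - oy)) / cell)
                if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
                    grid[i, j] = FREE
            if hit:  # max-range beams leave the endpoint cell unchanged
                i, j = int(ex / cell), int(ey / cell)
                if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
                    grid[i, j] = OCCUPIED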
Conditional Observation Probabilities O: We assume a noiseless sensor, so
p(o_{t+1} | s_{t+1}, a_t) = 1 for the partial map corresponding to the map geometry visible
from state s_{t+1}, along with a perfect measurement of q_{t+1}, and 0 otherwise.
Cost Function R: We use a minimum-time cost function and denote the time
duration of action a as J_a(a). Let S_G denote the set of goal states. R(s, a) = 0 for
s ∈ S_G. For s ∉ S_G, R(s, a) = J_a(a) if C(s, a) = 0, with an additional collision
penalty, J_c, if the action results in a collision: R(s, a) = J_a(a) + J_c if C(s, a) = 1.

3.1.1 Missing Prior Distribution Over Environments

A POMDP agent maintains a belief over its state, b(s_t) = P(q_t, m), and has a state
estimator, which computes the posterior belief that results from taking an action and
then receiving an observation, given a current belief: b(s_{t+1}) = P(s_{t+1} | b_t, a_t, o_{t+1}). If
the agent's current belief, b_t, assigns uniform probability to all possible maps with
independence between map cells (a very unrealistic distribution), then an observation
and subsequent inference over the map distribution simply eliminates those maps
that are not consistent with the observation and raises the uniform probability of
the remaining possible maps. If, on the other hand, the prior belief correctly assigns
high probability to a small set of realistic maps with common structures such as
hallways, rooms, doors, etc., and low probability to the unrealistic maps, then this
belief update may help to infer useful information about unobserved regions of the
map. For instance, it might infer that the straight parallel walls of a hallway are likely
to continue out into the unobserved regions of the map. All of this prior knowledge
about the distribution of environments enters the problem through the agent's initial
belief b_0, which we must somehow provide.

Unfortunately, as we noted in Section 1.2, modeling the distribution over real-
world environments would be extremely difficult. We have little choice but to initialize
the robot believing that all maps are equally likely and that map cells are indepen-
dent from each other. To compensate for this missing environment distribution, our
approach is to learn a much more specific distribution representing the probability of
collision associated with a given planning scenario. This learned function enables the
agent to drive at high speed as if it had an accurate prior over environments enabling
reasonable inferences about the unknown. In the following sections, we will derive
our learned model from the POMDP and describe how we can efficiently train and
use this model in place of an accurate prior over environments.

3.1.2 Approximations to the POMDP

Let V^π(s) = Σ_{t=0}^∞ R(s_t, π(s_t)) be the infinite-horizon cost associated with some policy
π mapping states s_t to actions a_t. The optimal value function, V*(s), could then, in
principle, be computed recursively using the Bellman equation. Having computed
V*(s) for all states, we could then recover the optimal action for a given belief:¹

a*(b_t) = argmin_{a_t ∈ A} { Σ_{s_t} b(s_t) R(s_t, a_t)
          + Σ_{s_t} b(s_t) Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) Σ_{o_{t+1}} P(o_{t+1} | s_{t+1}, a_t) V*(s_{t+1}) }    (3.6)

¹Since we use a discrete set of actions, there is a finite number of configurations that can be reached
from an initial configuration. Therefore, we can sum over future states rather than integrating.

The summations over s_t and o_{t+1} perform a state-estimation update, resulting in a
posterior distribution over future states s_{t+1} from all possible current states in our
current belief b_t, given our choice of action a_t. We can rewrite (3.6) as:

a*(b_t) = argmin_{a_t ∈ A} { Σ_{s_t} b(s_t) R(s_t, a_t) + Σ_{s_{t+1}} P(s_{t+1} | b_t, a_t) V*(s_{t+1}) }.    (3.7)

In this equation and those that follow, we use the slightly nonstandard notation
P(s_{t+1} | b_t, a_t) = Σ_{s_t} b(s_t) P(s_{t+1} | s_t, a_t) for brevity and to emphasize that our approxi-
mations will be computed from attributes of the belief itself, for instance the geometry
of known and unknown cells in the map estimate. We never actually sum over states.
Since our cost function R(s, a) applies separate penalties for time and collision,
we can split V*(s) into two terms, V*(s) = V*_T(s) + V*_C(s), where T and C refer
to "time" and "collision." V*_T(s) gives the optimal time-to-goal from s, regardless of
whether a collision occurs between s and the goal, and V*_C(s) gives the optimal total
future collision penalty from s. We assume that if a collision occurs, the robot can
recover and proceed to the goal. Furthermore, we assume that the optimal trajectory
from s to the goal will avoid collision if possible. But there will be some states s
for which collision is inevitable (ICS(s) = 1), since the robot's abilities to brake or
turn are limited to finite, realistic values. These assumptions imply that V*_C(s) = J_c
if ICS(s) = 1 and 0 otherwise. Therefore, we can rewrite the total cost-to-goal as:
V*(s) = V*_T(s) + J_c · ICS(s). Substituting this expression, we can rewrite (3.7) as:

a*(b_t) = argmin_{a_t ∈ A} { Σ_{s_t} b(s_t) R(s_t, a_t)
          + Σ_{s_{t+1}} P(s_{t+1} | b_t, a_t) V*_T(s_{t+1}) + J_c · Σ_{s_{t+1}} P(s_{t+1} | b_t, a_t) ICS(s_{t+1}) }.    (3.8)

The first term in equation (3.8) is the expected immediate cost of the current action,
a_t. Since we assume that we automatically discard any actions intersecting obstacles
along the length of a_t, and since there are no other immediate costs that could apply,
this term simply reduces to J_a(a_t), the time duration of a_t.
The second and third terms of (3.8) are expected values with respect to the pos-
sible future states, given a_t and b_t. However, as we have observed in Section 3.1.1,
our initial uniform belief over maps means that b_t will be very unhelpful (and indeed
misleading) if we use it to take an expectation over future states, since the true distri-
bution over maps is surely far from uniform. The lack of meaningful prior knowledge
over the distribution of environments, combined with the extremely large number of
possible future states s_{t+1}, means that we must approximate these terms.
For the expected time-to-goal, we use a simple heuristic function h(b_t, a_t):

Σ_{s_{t+1}} P(s_{t+1} | b_t, a_t) V*_T(s_{t+1}) ≈ h(b_t, a_t),    (3.9)

which we define to be the time needed to traverse the shortest piecewise straight-line
path (ignoring dynamics) from the end of action a_t to the goal, assuming the current
speed is maintained. We compute the length of this piecewise straight-line path using
Dijkstra's algorithm on a 2D graph of nodes connected to their 16 nearest neighbors,
respecting the observed obstacles in b_t, and assuming unknown space is traversable,
as sketched below. The use of a lower-dimensional search to provide global guidance
to a local planner is closely related to graduated-density methods [52]. While other
heuristics could be devised, alternatives would require more specialized domain
knowledge, while this one is both efficient and general. We instead focus our efforts
in this chapter on modeling the collision probability, which can be considered to be
part of the heuristic since it contributes to the remaining cost-to-goal beyond the
chosen action.
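
The following sketch computes the shortest-path costs backing this heuristic;
the grid encoding and the exact composition of the 16-connected neighborhood are
assumptions consistent with the description above.

    import heapq
    import numpy as np

    OCCUPIED = 1  # known-occupied cells; free and unknown cells are traversable

    # 16-connected neighborhood: the 8 surrounding cells plus 8 knight moves.
    NEIGHBORS_16 = ([(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di, dj) != (0, 0)]
                    + [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                       (1, -2), (1, 2), (2, -1), (2, 1)])

    def dijkstra_costs(grid, goal_ij):
        """Grid distance (in cells) from every traversable cell to the goal."""
        dist = np.full(grid.shape, np.inf)
        dist[goal_ij] = 0.0
        pq = [(0.0, goal_ij)]
        while pq:
            d, (i, j) = heapq.heappop(pq)
            if d > dist[i, j]:
                continue
            for di, dj in NEIGHBORS_16:
                ni, nj = i + di, j + dj
                if (0 <= ni < grid.shape[0] and 0 <= nj < grid.shape[1]
                        and grid[ni, nj] != OCCUPIED):
                    nd = d + np.hypot(di, dj)
                    if nd < dist[ni, nj]:
                        dist[ni, nj] = nd
                        heapq.heappush(pq, (nd, (ni, nj)))
        return dist

    # h(b_t, a_t) = cell_size * dist[endpoint cell of a_t] / current speed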
The third term in equation (3.8), J_c · Σ_{s_{t+1}} P(s_{t+1} | b_t, a_t) ICS(s_{t+1}), is the expected
future collision penalty given our current belief b_t and action a_t. As we noted in Sec-
tion 3.1.1, the fundamental problem is that our belief b_t does not accurately capture
which map structures are likely to be encountered in the unknown portions of the en-
vironment. It only captures what maps are possible given the measurements observed
thus far, or in other words, which cells in the map are free, occupied, or unknown.
Correctly predicting whether a hallway will continue around a blind corner, for in-
stance, is impossible based on the belief alone. We instead turn to machine learning
to approximate this important quantity from training data:

Σ_{s_{t+1}} P(s_{t+1} | b_t, a_t) ICS(s_{t+1}) ≈ f_C(φ_C(b_t, a_t)).    (3.10)

The model, f_C(φ_C(b_t, a_t)), approximates the probability that a collision will occur in
the future if we execute action a_t given belief b_t. It is computed from some features
φ_C(b_t, a_t) that summarize the relevant information contained in b_t and a_t, for example
vehicle speed, distance to observed obstacles along a_t, etc. With the approximations
of all three terms in equation (3.8), we arrive at our receding-horizon control law:

a*(b_t) = argmin_{a_t ∈ A} { J_a(a_t) + h(b_t, a_t) + J_c · f_C(φ_C(b_t, a_t)) }    (3.11)

At each time step, we select a* minimizing the right hand side of equation (3.11)
given the current belief b_t. We begin to execute a* while incorporating new sensor
data and re-planning. Next, we will describe how we collect data and build a model
to predict collision probabilities associated with a belief-action pair (b_t, a_t).
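
In pseudocode form, the control law amounts to a cost comparison over the
pre-computed action set; every argument in the sketch below is a stand-in callable
for a component defined in this chapter, and intersects_known_obstacle is a
hypothetical helper for the feasibility check against observed obstacles.

    def select_action(belief, actions, J_c, f_C, phi_C, h, J_a):
        """Receding-horizon control law (3.11): minimize action duration plus
        heuristic time-to-goal plus expected collision penalty."""
        best_action, best_cost = None, float("inf")
        for a in actions:
            if intersects_known_obstacle(belief, a):  # discard infeasible actions
                continue
            cost = J_a(a) + h(belief, a) + J_c * f_C(phi_C(belief, a))
            if cost < best_cost:
                best_action, best_cost = a, cost
        return best_action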

3.2 Predicting Future Collisions

In this section, we focus on learning to predict the probability of collision associated
with a belief-action pair. We represent each belief-action pair as a point in a multi-
dimensional space of predictive features,² φ(b, a) ∈ Φ. We assume that for a given
vehicle or dynamical system, there exists some true underlying probability of collision
that is independent of the map m and robot configuration q, given the features, φ. In
other words, P("collision" | φ, q, m) = P("collision" | φ). Under this assumption, we can
build a dataset by training in any map we wish, and the data will be equally valid in
other environments that populate the same regions of feature space. If two datasets
gathered from two different environments do not share the same output distribution
where they overlap in feature space, we assume that our features are simply not rich
enough to capture the difference. This assumption also implies that if an environment
is fundamentally different from our training environments with respect to collision
probabilities, it will populate a different region of feature space.

²In Section 1.4, we distinguished between features for collision prediction, φ_C, and features for
measurement utility prediction, φ_h. For the remainder of this chapter, we simply write φ to mean φ_C,
since we are only concerned with collision prediction and will be using subscripts for other purposes.

If the robot encounters an environment for which it has no relevant training data
nearby in feature space, it should infer that this environment is of some unfamiliar
type and react appropriately. In these cases, we require that our learning algorithm
somehow provide a safe prediction of collision probability rather than naively extrap-
olating the data from other environments. Our features must therefore be sufficiently
descriptive to predict collisions as well as differentiate between qualitatively different
environment types.
Designing good features relies on the domain knowledge of a human feature de-
signer. For this chapter, our features are four hand-engineered functions: (1) min-
imum distance to the nearest known obstacle along the action; (2) mean range to
obstacle or frontier in the 60° cone ahead of the robot, averaged over several points
along the action; (3) length of the straight free path directly ahead of the robot, aver-
aged over several points along the action; and (4) speed at the end of the action. While
we will show that this set of features worked well in our navigation experiments, our
method is extensible to arbitrary numbers of continuous- and discrete-valued features
from a variety of different sources, including features that operate on a history of past
measurements. We expect that additional features may enable more intelligent and
subtle navigation behaviors.

3.2.1 Learning Problem

The model in equation (3.10), f_C(φ(b_t, a_t)), approximates the probability that a col-
lision will occur in the future if the robot executes action a_t given belief b_t, using
features φ(b_t, a_t) as input. We approximate this probability distribution by collect-
ing a dataset of N representative labeled feature points, D = {(φ_1, y_1), ..., (φ_N, y_N)},
where the labels y_i are binary indicators ("collision", "non-collision"). In Section 3.2.3 we
will discuss how this dataset is collected.
To infer the probability of collision (or distribution over labels) associated with
some feature point φ at runtime, we use a non-parametric Bayesian inference model
developed by Vega-Brown et al. [121], which generalizes local kernel estimation to the
context of Bayesian inference for exponential family distributions, effectively resulting
in a weighted nearest-neighbor method combined with a prior. Another natural choice
of learning algorithm in this domain might be a Gaussian process (GP). However,
classification with a GP requires approximate inference techniques that would likely
be too slow to run online, whereas the inference model below is an approximate model
that allows efficient, analytical inference.
The Bayesian nature of this model will enable us to provide prior knowledge
to make safe (though perhaps conservative) predictions in environments where we
have no training data. We model collision as a Bernoulli-distributed random event
with a beta-distributed parameter θ ∼ Beta(α, β), where α and β are prior pseudo-
counts of collision and non-collision events, respectively. We write these prior pseudo-
counts as functions α(φ) and β(φ), since they may vary across the feature space and
can be defined to encode the desired feature-dependent prior probability function,
P_pr.(φ) = α(φ)/(α(φ) + β(φ)). Using the following model, we can efficiently compute
the posterior probability of collision given a query φ and dataset D:

f_C(φ) = P(y = "collision" | φ, D) = (α(φ) + Σ_{i=1}^N k(φ, φ_i) y_i) / (α(φ) + β(φ) + Σ_{i=1}^N k(φ, φ_i))    (3.12)

where k(φ, φ_i) is a kernel function measuring proximity in feature space between our
query, φ, and each φ_i in D. We use a polynomial approximation to a Gaussian
kernel with finite support and unity kernel width.³ Based on this kernel, we will
use the effective number of nearby training data points, N_eff. = Σ_{i=1}^N k(φ, φ_i), as
a measure of data density. The prior contributes N_pr. pseudo data points to each
prediction, where N_pr. = α(φ) + β(φ). The ratio N_eff./(N_eff. + N_pr.) determines the
relative influence of the training data and the prior in each prediction. For example,
in datasets with 5 × 10⁴ points in total, N_eff. tends to be 10² or greater when the
testing environment is similar to the training map, and N_eff. < 1 when the testing
environment is fundamentally different (e.g., office building vs. natural forest). For
this chapter, we used N_pr. = 5, and results are insensitive to the exact value of N_pr.
as long as N_eff. ≫ N_pr. or N_eff. ≪ N_pr.. The choice of N_pr. influences how many real
data points will be required before the real data begin to dominate the prior in the
prediction. The prior pseudo-count functions α(φ) and β(φ) must be defined jointly
to achieve the desired value of N_pr. as well as the desired probability function P_pr.(φ).
In the next section, we will define α(φ) and β(φ) to endow our planner with safe
guidance to fall back upon when no training data are available.

³To weight all features equally, even though they may have different natural magnitudes, we
rescale the data by dividing each feature by its standard deviation in the dataset.
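
Equation (3.12) reduces to a few array operations, as in the sketch below; a
truncated Gaussian stands in for the polynomial kernel approximation, and the
features are assumed to have already been rescaled by their standard deviations.

    import numpy as np

    def predict_collision(phi_q, Phi, y, alpha_fn, beta_fn):
        """Posterior collision probability from equation (3.12). Phi is the N x d
        array of training features, y the binary labels, and alpha_fn/beta_fn
        the prior pseudo-count functions."""
        d = np.linalg.norm(Phi - phi_q, axis=1)
        k = np.where(d < 3.0, np.exp(-0.5 * d ** 2), 0.0)  # finite-support kernel
        a, b = alpha_fn(phi_q), beta_fn(phi_q)
        return (a + np.dot(k, y)) / (a + b + np.sum(k))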

3.2.2 Advantage of the Bayesian Approach

Machine learning algorithms should make good predictions for query points located
near their training data in input space. However, predictions that extrapolate beyond
the domain of the training data may be arbitrarily bad. For navigation tasks, we want
our planner to recognize when it has no relevant training data, and automatically
revert to safe behaviors rather than making reckless, uninformed predictions. For
instance, if the agent moves from a well-known building into the outdoors, where it
has not trained, we still want the learning algorithm to guide it away from danger.
Fortunately, the inference model of equation (3.12) enables this capability through
the prior functions α(φ) and β(φ). If we query a feature point in a region of high
data density (N_eff. ≫ N_pr.), f_C(φ) will tend to a local weighted average of neighboring
data points and the prior will have little effect. However, if we query a point far from
the training data (N_eff. ≪ N_pr.), the prior will dominate the prediction. By specifying
priors α(φ) and β(φ) that are functions of our features, we can endow the planner
with domain knowledge and some guidance about which regions of feature space are
safe and dangerous.
In this thesis, we have designed our prior functions α(φ) and β(φ) such that
P_pr.(φ) = 0 if there exists enough free space for the robot to come safely to a stop
in a straight line from its current velocity, and P_pr.(φ) > 0 otherwise. While we do
not use the full range of possible emergency-stop actions to determine whether the
robot could safely stop or not (doing so would imply embedding a full planner into
the calculation of features, which is possible, but impractical), we can use features (3)
and (4) from Section 3.2 to develop a good approximation. These features measure
the free distance in a straight line ahead of the robot (averaged along the action),
which we express as d_free, and the robot's speed, v_F, at the end of the action. We can
use v_F combined with the robot's maximum braking acceleration, b_max, to calculate
the necessary stopping distance, d_stop = v_F²/(2 b_max). Using these quantities, we can
estimate the amount by which any given action choice would cause the robot to exceed
the available stopping distance: d_violation = max(d_stop − d_free, 0). At this point, we could
choose to define P_pr.(φ) = 0 if d_violation = 0 and P_pr.(φ) = 1 if d_violation > 0. However,
it is useful to instruct the planner that actions incurring less violation of the stopping
distance requirement are safer or preferable. In the absence of training data, when
faced with a decision between two actions that could both lead to collision, we would
like our planner to slow down to reduce the chance of collision. To do so, we can
push this constraint violation through a monotonically increasing function that maps
to the interval [0, 1] to make it a valid probability. We use a half-sigmoid function:

P_pr.(φ) = max(2/(1 + exp(−k_pr. · d_violation)) − 1, 0).    (3.13)

We use k_pr. = 1, although this parameter could be tuned to adjust the shape of the
prior. From the constraint α(φ) + β(φ) = N_pr. and the relation P_pr.(φ) = α(φ)/(α(φ) +
β(φ)), we can then recover α(φ) and β(φ).
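
Concretely, the prior pseudo-counts can be computed as in the sketch below;
the feature indexing is an assumption chosen to match features (3) and (4) of
Section 3.2.

    import numpy as np

    def prior_counts(phi, N_pr=5.0, k_pr=1.0, b_max=5.0):
        """alpha(phi) and beta(phi) from equation (3.13), assuming phi[2] = d_free
        (mean straight free path) and phi[3] = v_F (speed at the end of the action)."""
        d_free, v_F = phi[2], phi[3]
        d_stop = v_F ** 2 / (2.0 * b_max)           # required stopping distance
        d_violation = max(d_stop - d_free, 0.0)     # overshoot beyond observed free space
        P_pr = max(2.0 / (1.0 + np.exp(-k_pr * d_violation)) - 1.0, 0.0)
        alpha = N_pr * P_pr
        beta = N_pr - alpha
        return alpha, beta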

Using this prior, as N_eff. drops to zero (and for sufficiently large values of J_c),
the learning algorithm activates a conventional stopping-distance constraint by only
selecting actions for which d_violation is zero. Therefore, in novel types of planning situ-
ations, our planner seamlessly turns into one with essentially the same characteristics
as the safe baseline planner we compare against. As we have defined it, d_violation con-
siders only the straight-line stopping distance, which is an approximation of the true
constraint violation that could be computed by testing all possible emergency-stop
actions. If our predictive feature set contained an exact measurement of constraint
violation, it would be possible to exactly recover the behavior of the baseline planner
via the prior. Nevertheless, we find that our features as defined in Section 3.2 are
good predictors based on data, yield good performance when used in the prior, and
are also computationally simpler and more efficient.

3.2.3 Training Procedure

To predict collision probabilities, we collect a training dataset of the form D =
{(φ_1, y_1), ..., (φ_N, y_N)}. Labels y_i are binary indicators ("collision", "non-collision")
associated with belief-action pairs, represented as points φ_i in feature space. To ef-
ficiently collect a large dataset, we use a simulator capable of generating realistic
ficiently collect a large dataset, we use a simulator capable of generating realistic
LIDAR scans and vehicle motions within arbitrary maps. The training procedure
aims to generate scenarios that accurately emulate beliefs the planner may have at
runtime, and accurately represent the risk of actions given those beliefs. While at
runtime the planner will use a map built from a history of scans, we make the simpli-
fying assumption that a single measurement taken from configuration q_t is enough to
build a realistic map of the area around q_t, similar to one the planner might actually
observe. With this assumption, we can generate training data based on individual
sampled robot configurations, rather than sampling extended state-measurement his-
tories, which not only results in more efficient training, but also eliminates the need
for any sort of planner or random process to sample state histories.

We generate each data point by randomly sampling a feasible configuration, q_t,
within a training map, and simulating the sensor from q_t to generate a map estimate,
and hence a belief b_t. We then randomly select one of the actions, a_t, that is feasible
given the known obstacles in b_t. Recall from equation (3.10) that our learned function
models the probability that a collision will occur somewhere after completing the
immediate chosen action a_t, i.e., the probability that state s_{t+1} (with configuration
q_{t+1}) is an inevitable collision state. Therefore, to label this belief-action pair, we run
a training planner to generate a sequence of actions from configuration q_{t+1} at the
end of a_t. If there exists any feasible action sequence starting from q_{t+1} that avoids
collision with the true hidden map (to some horizon, or until the robot can come to
a stop), then y_new = 0; otherwise y_new = 1. Finally, we compute features φ_new(b_t, a_t)
of this belief-action pair and insert (φ_new, y_new) into D.
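
The following sketch summarizes this loop; every argument is a stand-in for
machinery described elsewhere (the simulator, the training planner, and feature
extraction), and end_configuration is a hypothetical accessor on an action.

    import random

    def collect_dataset(true_map, n_points, sample_config, simulate_lidar,
                        feasible_actions, training_planner, features):
        """One labeled feature point per sampled configuration."""
        D = []
        for _ in range(n_points):
            q_t = sample_config(true_map)            # random feasible configuration
            belief = simulate_lidar(true_map, q_t)   # partial map from a single scan
            a_t = random.choice(feasible_actions(belief, q_t))
            q_next = a_t.end_configuration()
            # Label 0 if the training planner finds any collision-free
            # three-action sequence from q_next in the true hidden map.
            y = 0 if training_planner(true_map, q_next, horizon=3) else 1
            D.append((features(belief, a_t), y))
        return D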

Figure 3-2 illustrates "collision" and "non-collision" training instances. In 3-2(a),
the hidden map includes a sharp hairpin turn, which is infeasible for our dynamics

Figure 3-2: Examples of "collision" (a) and "non-collision" (b) training events. One of
the immediate actions (black) is chosen for labeling. The training planner determines
whether the end of this action (purple dot) is an inevitable collision state with respect
to the hidden map (shown in light gray). Feasible partial paths are shown in blue.
Nodes successfully expanded by the training planner are green, and nodes for which
no collision-free outgoing action exists are red. In (a), all partial paths dead-end (red
nodes) before reaching the desired three-action horizon because the vehicle speed is
too great to complete the turn given curvature and curvature rate limits. In (b), the
training planner successfully finds a sequence of three actions.

model, given that the car begins toward the inside of the turn at a relatively high speed
(~ 8 m/s). Therefore, the training planner fails to find a trajectory to the desired
horizon and each partial path results in a dead-end (red dot) because the robot is
moving too fast to complete the hairpin turn. On the other hand, the hidden map
in 3-2(b) has a straight hallway rather than a sharp turn, and the training planner
succeeds in finding a feasible trajectory to the three-action horizon, aided by the car's
initial configuration toward the outside of the turn. The empirical distribution of
"collision" and "non-collision" labels collected from these sampled scenarios therefore
implicitly captures the relevant environmental characteristics of the true hidden map
distribution, and the way they interact with our dynamics model, in a much more
efficient way than modeling the complex, high-dimensional map distribution explicitly.
Our training procedure captures sensible patterns, for instance that it is safe to drive
at high speeds in wide open areas and long straight hallways, but that slower speeds
are safer when approaching a wall or navigating dense clutter.

We use a horizon of three actions (6 m) for our training planner because if a
collision is inevitable for our dynamics model, it will nearly always occur within this
horizon. We have explored other settings for this horizon and found the results to be
comparable, although if the horizon is too short, some inevitable collision states will
be mislabeled as "non-collision." In Chapter 5, we will revisit this training procedure,
but we will use the set of emergency-stop actions to test for inevitability of future
collisions, rather than running a separate training planner to a fixed horizon. These
two approaches perform the same task, but make slightly different approximations
based on the set of trajectories they consider. The training planner is able to search
a very dense set of possible trajectories because it chains together actions from a set
of motion primitives in exponentially many combinations, but this flexibility comes
with a computational cost. On the other hand, evaluating a pre-computed set of
emergency-stop actions is very efficient, but the set of trajectories is not as diverse or
flexible. We have found that both approximations work well for our purposes.

3.3 Results

We obtained simulation results from randomly generated maps as well as experimen-
tal results on a real RC car navigating at over 13 m/s in hallway, laboratory, parking
garage, and large open atrium environments at MIT. Our simulation results show that
replacing safety constraints with a learned model of collision probability can result
in faster navigation without sacrificing empirical safety. We demonstrate that these
results apply for a variety of sensor ranges and fields-of-view, extending the usefulness
of our technique to camera-based and RGBD perception systems which are consider-
ably more limited than LIDAR. The results also show that when the planner departs
a region of feature space for which it has training data, our prior keeps the robot
safe. Finally, our experiments demonstrate the effectiveness of collision prediction in
real-world environments, and show that even within a single environment, the robot
often encounters situations for which it has training data and others for which it does
not, thereby justifying our Bayesian approach and use of prior knowledge.

(a) Hallway. (b) Forest.

Figure 3-3: (a) Example Markov-chain hallway map, drawn from the same distribu-
tion as the maps for the experiments in Figures 3-6 and 3-7, and (b) an example of a
Poisson forest map drawn from the same distribution as those used for forest experi-
ments in Figure 3-9. Both map types were used in the experiments in Figure 3-4.

3.3.1 Simulation Results

In this section, we present simulation results to compare our planning approach
against the baseline planner outlined in Section 1.1, which guarantees safety using a
conventional safety constraint. This baseline planner is intended to achieve the fastest
guaranteed-safe navigation possible without the aid of special domain knowledge or
hand-coded behaviors of any kind. It represents the planning approach we would use
if we did not have a learned model of collision probability.
We trained and tested our system in hallway and forest environments that were
randomly sampled from environment-generating distributions [100]. Example envi-
ronments are shown in Figure 3-3, with start and goal locations illustrated in green
and red, respectively. We use a Markov chain to sample hallways with a width of 2.5 m
and a turn frequency of 0.4. We use a Poisson process to sample forest environments
with a tree radius of 0.75 m; however, we modify the traditional Poisson process with a
rejection-sampling technique to ensure a space of at least 1.25 m between obstacles.
This technique enables us to generate denser (and therefore more challenging) maps
while ensuring that there exist feasible routes through the forest. We use a Poisson
rate parameter large enough to saturate the map with obstacles.
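
A sketch of the modified Poisson sampler is given below; drawing a Poisson
number of candidate centers and rejecting those that violate the minimum gap is
an assumption about the implementation, chosen to match the description above.

    import numpy as np

    def sample_forest(width, height, rate, radius=0.75, min_gap=1.25, seed=0):
        """Poisson forest with rejection sampling: keep a candidate tree only if
        its trunk surface stays at least min_gap from every accepted tree."""
        rng = np.random.default_rng(seed)
        n_candidates = rng.poisson(rate * width * height)
        trees = []
        for _ in range(n_candidates):
            c = rng.uniform([0.0, 0.0], [width, height])
            if all(np.hypot(*(c - t)) >= 2 * radius + min_gap for t in trees):
                trees.append(c)
        return trees  # list of (x, y) tree centers, each of the given radius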

Basic Simulation Results in Hallway and Forest Maps

In this section, we present simulation results from hallway and forest environments.
For each environment type, we obtained 50000 training data points in a training map
and then tested in 25 random maps of the same type. In these results, we used the full


Figure 3-4: Simulation results from our planner in 25 random hallway maps and 25
random forest maps. Collision prediction models for each map type were trained in
a hallway map and a forest map, respectively, using 50000 data points in each case.
We illustrate collision rate as a function of J_c (left), speed improvement over baseline
(center) and the degree to which our planner exceeded safe stopping distance (right).
Error bars indicate standard deviation for each data point.

30 m sensor range and 360° field-of-view corresponding to a LIDAR sensor. Figure 3-4
illustrates the fraction of trials resulting in collision (left), the average navigation
speeds of our method normalized by the corresponding speeds of the baseline planner
in the same maps (center), and the average degree to which our planner exceeded the
stopping distance that would have been required to guarantee safety (right).
For small values of J_c, collisions occurred in nearly 100% of the trials in both
environments, but for J_c ≥ 0.3, our planner succeeded in 100% of trials in both
environments. In between these values, the system underwent a transition, becoming
just cautious enough to avoid collision as J_c was increased. At J_c = 0.3, our planner
exhibited greater than 50% speed improvement over the baseline planner in hallway
environments, and more than 20% speed improvement over the baseline planner in
forest environments. In both map types, our planner significantly exceeded the safe
stopping distance on average, yet remained empirically collision-free. This measure of
constraint violation is an indicator of the degree to which standard safety constraints
are overly conservative when equipped with a learned model of collision probability.
We report simulation results as a function of J_c, the cost assigned to collision,
which is empirically correlated with how much risk the user is willing to accept. While
it might be more intuitive to specify a chance constraint rather than a collision cost,
it is not straightforward or simple to apply chance constraints in our case. The reason

Figure 3-5: Example trajectories taken by the baseline planner (red) and our planner
(blue). The baseline planner typically hugs the inside of turns in order to greedily
pursue the goal as directly as possible, while our planner remains close to the middle
of the hallway, giving it greater freedom to steer wider, more gradual turns.

is that the agent has no direct control over what belief it may encounter next, and it
is possible that it may be faced with a scenario in which all action choices carry an
estimated collision probability greater than the chosen threshold. Therefore, we use
J_c in an analogous fashion, understanding that it maps empirically to some measure
of risk or success rate.

Figure 3-5 illustrates the subtle differences between trajectories taken by the base-
line planner and our planner in a hallway map. Without a model of collision proba-
bility, the baseline planner is drawn toward the inside of each hallway segment and
corner by the heuristic function attempting to minimize distance to the goal, which in
turn restricts its visibility around corners and requires it to make sharper (and hence
slower) turns. In contrast, our planner generally remains near the center of each
hallway segment, giving it a safer distance from walls and enabling it to make more
gradual high-speed turns. The more dramatic difference between these trajectories is
in the robot's speed, which is not illustrated in this figure. The paths through forest
maps taken by the baseline planner and our planner are not visually distinguishable
in significant ways, and are not shown.

In our previous work, we presented a similar result showing approximately 80%
faster navigation than baseline for J_c = 0.25 in random hallway environments, using
50000 data points [101], which is greater than the speed improvement we report in
this thesis. While both figures are valid, the difference results from a slight change in
one parameter of the dynamics model, which we modified to better reflect the true
high-speed turning abilities of our experimental RC car. We reduced the maximum
curvature at top speed from 0.25 m⁻¹ to 0.1 m⁻¹, making the vehicle less maneuver-
able at high speeds. As a result, it takes greater time and distance to react to new
observations than in Richter et al. [101].

Performance as a Function of Data Volume

Figure 3-6 shows the performance of our planner using different data/prior configu-
rations and varying amounts of training data, for different J_c values, in 25 randomly
sampled hallway environments. We report performance using data + prior, prior only,
and data only.⁴ The training dataset used for these trials was collected in a separate
hallway environment drawn from the same distribution as the test maps.
The left plot in Figure 3-6 shows that for low values of J_c, all three planners
crashed in every trial, but became safer as J_c was increased. For J_c = 0.3 and
above, all planners succeeded in reaching the goal without collision 100% of the time,
regardless of the amount of data or use of the prior. These results are consistent with
the basic simulation results presented in Figure 3-4.
The middle plot in Figure 3-6 shows that there was a gradual increase in speed
as more data were used. The reason for this behavior is that the prior is a conserva-
tive (over-)estimate of collision probability, by design. On the other hand, training
data give a more accurate, unbiased estimate of collision probability. Therefore, as
the volume of data increases and the relative weighting shifts from prior to data in the
Bayesian prediction of collision probability, the predicted probabilities generally shift
downward, encouraging the planner to select faster trajectories for the same expected
collision cost. For N = 50, a very small amount of data, performance was nearly
identical to the performance using the prior alone (N = 0). However, for N = 500
and N = 5000, the data began to contribute more influence relative to the prior.
For N = 50000, typical predictions were made with 10² ≤ N_eff. ≤ 10³, and therefore
⁴We implement the no-prior case by setting the prior values of α and β each to 0.0005 to ensure the
solution is computable in regions of feature space with no data at all. The default prediction with
no data is therefore P("collision") = α/(α + β) = 0.5, rather than undefined if α = β = 0.


Figure 3-6: Simulation results from our planner in 25 random hallway maps using
varying amounts of training data from zero to 50000 data points. We use the ab-
breviations P.O., D.P., and D.O. to refer to Prior-Only, Data+Prior, and Data-Only
configurations, respectively. Our planner achieved a 100% success rate for J_c ≥ 0.3
regardless of the amount of training data (left), but improved in speed relative to the
baseline planner as more data points were used (center). As more data were used,
our planner exceeded its safe stopping distance by a greater margin while remaining
collision-free (right). Error bars indicate standard deviation for each data point.

the data carried considerably more weight than the prior. At J_c = 0.3, our solution
navigated 52% faster than baseline using data and prior, and 57% faster than baseline
using data alone. It is reasonable to expect that as N tends toward infinity, and as
the robot obtains dense training data across all of the regions of feature space en-
countered in these environments, the difference between the performance with data
only and data + prior will vanish.
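
To make this prior-to-data weighting concrete, the following Python sketch implements a kernel-weighted Beta-Bernoulli prediction of the kind described above. The Gaussian kernel, bandwidth, and prior pseudo-counts shown here are illustrative assumptions, not the exact quantities used in our implementation.

import numpy as np

def predict_collision_prob(phi_query, Phi, y, alpha=1.0, beta=1.0,
                           bandwidth=0.5):
    """Kernel-weighted Beta-Bernoulli estimate of collision probability.

    Phi: (N, d) array of training feature vectors.
    y:   (N,) array of binary collision labels (1 = collision).
    alpha, beta: prior pseudo-counts for collision / no-collision.
    Returns (p_collision, n_eff), where n_eff is the effective number
    of nearby data points supporting the prediction.
    """
    # Gaussian kernel weights measure similarity in feature space.
    d2 = np.sum((Phi - phi_query) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    n_eff = w.sum()
    # Posterior mean with weighted counts: the prior dominates when
    # n_eff is small, and the data dominate when n_eff is large.
    p = (alpha + np.dot(w, y)) / (alpha + beta + n_eff)
    return p, n_eff

# Far from all training data, n_eff ~ 0 and the prediction reverts to
# the prior mean alpha / (alpha + beta).
Phi = np.random.rand(1000, 2)           # stand-in feature vectors
y = (Phi[:, 0] > 0.7).astype(float)     # stand-in collision labels
p, n_eff = predict_collision_prob(np.array([0.9, 0.5]), Phi, y)
print(f"P(collision) = {p:.3f}, N_eff = {n_eff:.1f}")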

Restricted Sensor Range and Field-of-View in Hallway Environments

We trained and tested our system in simulation using both 60° and 360° field-of-view
sensor models and both 5 m and 30 m range. The 360° field-of-view, 30 m sensor
model represents the capabilities of a planar LIDAR, while the 60° field-of-view, 5
m sensor model represents the capabilities of an RGBD camera such as a Microsoft
Kinect sensor. We examined all four combinations of these parameters to determine
their effects with dense training data (N = 50000).
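
As a rough illustration of how these four configurations can be emulated, the sketch below casts rays through a 2D occupancy grid with a configurable field-of-view and maximum range. The grid resolution and fixed-step ray marching are simplifying assumptions made for clarity.

import numpy as np

def simulate_scan(grid, pose, fov_deg=60.0, max_range=5.0,
                  n_rays=64, resolution=0.1):
    """Cast rays through a 2D occupancy grid and return ranges.

    grid: 2D bool array, True = occupied, indexed [row, col].
    pose: (x, y, heading) of the sensor in meters / radians.
    Each ray terminates at the first occupied cell or at max_range.
    """
    x0, y0, th = pose
    angles = th + np.radians(np.linspace(-fov_deg / 2, fov_deg / 2, n_rays))
    ranges = np.full(n_rays, max_range)
    for i, a in enumerate(angles):
        # March along the ray one grid cell at a time.
        for r in np.arange(resolution, max_range, resolution):
            col = int((x0 + r * np.cos(a)) / resolution)
            row = int((y0 + r * np.sin(a)) / resolution)
            if not (0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]):
                break
            if grid[row, col]:
                ranges[i] = r
                break
    return ranges

# Example: a 5 m x 5 m map with a wall 2 m ahead of a 60-degree sensor.
grid = np.zeros((50, 50), dtype=bool)
grid[:, 30] = True                      # vertical wall at x = 3 m
print(simulate_scan(grid, (1.0, 2.5, 0.0)).min())   # ~2.0 m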
Figure 3-7 illustrates the performance of the planner in 25 random hallway environ-
ments per sensor configuration. In hallway environments, we observed a maximum
speed improvement over the baseline planner of nearly 250% for 60° field-of-view

[Figure 3-7 plots: collision frequency, speed improvement, and absolute speed vs. cost of collision Jc; legend: 60°/5 m, 360°/5 m, 60°/30 m, 360°/30 m.]

Figure 3-7: Simulation results from four different sensor configurations in 25 random
hallway maps per configuration. Empirical collision frequency was 0% for Jc ≥ 0.3
(left). Speed improvement over baseline was greatest for 60° field-of-view (center).
Actual speed was comparable for all sensor configurations (right), while it was the
speed of baseline planners that was affected by the restricted field of view (dashed
horizontal lines, right). Error bars indicate standard deviation for each data point.

sensor models. In these trials, the performance was very similar for both 60° sensor
models regardless of range, and likewise for both 360° sensor models. The middle plot
of Figure 3-7 shows that the speed improvement above baseline was much greater for
the 60° field-of-view than it was for the 360° field-of-view.
However, surprisingly, the absolute speeds of our planners were roughly compa-
rable regardless of sensor configuration, and it was the baseline planner that was
significantly slowed by the restricted field-of-view (Figure 3-7, right). The green and
magenta dashed lines indicate the average speed of the baseline planners with 360°
field-of-view to be about 5 m/s, while the blue and red dashed lines indicate that the
average speed of the baseline planners with 60° field-of-view was just over 3 m/s.
We illustrate the effect of a limited field-of-view on the baseline planner in Fig-
ure 3-8. The limited visibility at hallway corners with the 60° field-of-view sensor
significantly limits the observable free space for the baseline planner and hence limits
the length of the emergency-stop action that can be planned. Note that we have
already observed this limitation in Section 1.3 and will examine it further in the con-
text of information-gathering behaviors in Chapter 4. With a 360° field-of-view, the
baseline planner is significantly less restricted, and therefore navigates faster.
On the other hand, our planner is not constrained to plan within visible free
space; recall that unlike the baseline planner, it is not subject to the constraint

(a) 60° field-of-view. (b) 360° field-of-view.

Figure 3-8: (a) With a 60° field-of-view sensor, the baseline planner approaches a
blind corner and slows down. However, upon rounding the corner, the planner is still
confined to plan in a very small sliver of free space, and hence continues at a slow
speed until it completes its turn. (b) On the other hand, the 360° field-of-view sensor
reveals much more of the area around the corner, enabling the robot to accelerate
as soon as it reaches the corner. This blind-corner effect illustrates why the baseline
planner is much slower with 60° field-of-view than with 360° field-of-view.

g(bt, at) = 0. Even with the restricted field-of-view, it can make a judgment of
risk based on local characteristics of the environment that have been observed. For
instance, as it approaches the corner, its distance from the sides and end of the
hallway segment may still be observable. Therefore, it is fully capable of determining
a safe speed at which to round a blind corner even if its restricted sensor has not yet
perceived the free space around the corner. The learned model is able to adapt and
compensate for each different sensor model through training data and effectively fill
in some of the information that is missing for the more restricted sensor models.

For other choices of hallway distribution, it may be possible to produce a greater
performance difference between our planners with the four different sensor configura-
tions. For instance, if the hallway distribution included many long straight segments,
our planner with a 30 m sensor range would be able to detect greater free distance
than the 5 m sensor range and exploit that knowledge to travel faster; however, straight
segments longer than approximately 8 m are infrequent in our hallway distribution.

[Figure 3-9 plots: collision frequency, improvement over baseline, and absolute speed vs. cost of collision Jc; legend: 60°/5 m, 360°/5 m, 60°/30 m, 360°/30 m.]

Figure 3-9: Simulation results from four different sensor configurations in 25 random
forest maps. Collision frequency was 0% for Jc ≥ 0.3 (left). Maximum improvement
was approximately 200% of the baseline performance for the 60° field-of-view, 5 m
range, with much less improvement for the 30 m sensor ranges (middle). Absolute
speeds were comparable across all planners using learning (right) while the baseline
planner was slowest for 5 m range sensor models (dashed horizontal lines, right).
Error bars indicate standard deviation for each data point.

Restricted Sensor Range and Field-of-View in Forest Environments

Figure 3-9 shows the corresponding results in 25 random forest environments for
each of the four sensor configurations. In contrast with hallway environments, forest
performance was much more dependent on sensor range than field-of-view. The reason
for this difference is that a robot with a sufficiently long sensor range can observe
free space far ahead along the corridors between obstacles, whereas in our hallway
environments, distant free space is usually occluded by the next turn.
Figure 3-10 illustrates the effect of sensor range on the baseline planner in the for-
est environment. With limited range, the baseline planner is forced to plan relatively
slow trajectories since it cannot fit long emergency-stop actions within its observed
free space. On the other hand, the baseline planner with a 30 m sensor range is able to
plan very long, high-speed emergency-stop actions resulting in fast navigation. The
forest environment with a 30 m sensor range is the environment in which our learned
model of collision probability offers the least improvement compared to the baseline
planner because in this case, a large portion of the free space that would be useful
for navigation can be directly observed.
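
The coupling between observed free distance and achievable speed follows from a constant-deceleration stopping model: given observable free distance d and braking deceleration a, the fastest speed from which the robot can still stop is v_max = sqrt(2 a d). A quick sketch of this relationship (the 5 m/s² braking limit is an illustrative value, and the model itself is a simplification):

import math

def max_safe_speed(free_distance_m, brake_decel=5.0):
    """Largest speed from which the robot can stop within the observed
    free distance, assuming constant deceleration."""
    return math.sqrt(2.0 * brake_decel * free_distance_m)

# A 5 m sensor range caps the safe speed near 7 m/s, while 30 m of
# observed free space permits roughly 17 m/s under the same model.
for d in (5.0, 30.0):
    print(f"free distance {d:4.1f} m -> v_max = {max_safe_speed(d):.1f} m/s")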
These examples perfectly illustrate the benefits of collision prediction based on
the local observable characteristics of the environment. Even with a very restricted

(a) 5 m range. (b) 30 m range.

Figure 3-10: (a) With a 5 m sensor range in the forest environment, the baseline
planner is unable to navigate at high speeds because its stopping distance is severely
limited. The red trajectory in the left image shows the longest emergency-stop action
that could be fit within the known free space. (b) With a 30 m sensor range, the
baseline planner is able to navigate much faster since it can take advantage of much
more known free space ahead to plan longer (hence faster) emergency-stop actions.

5 m sensor range and 60° field-of-view, our learned model of collision probability was
able to accurately determine the maximum safe speed at which to travel through
the forest environment, and suffered no significant performance penalty due to its
restricted sensor. Even though the planner may not be able to perceive obstacles at
distances greater than 5 m with this sensor configuration, it is sufficient to predict
that there will exist some feasible trajectory as long as the robot does not exceed
a certain approximate speed threshold. This approximate speed threshold, in turn,
corresponds to a certain level of expected collision cost as determined by our collision
probability model. These results are conceptually very related to the speed threshold
observed by Karaman and Frazzoli [61] in forest-type environments.

Transition From Familiar to Unfamiliar Environment Types

While results with dense training data make very little use of the prior in our Bayesian
approach, Figure 3-11 illustrates the important role of a prior when transitioning from
a familiar environment type to an unfamiliar one where no relevant training data are
available. For these simulation experiments, we required the robot to navigate hybrid

[Figure 3-11 plots: effective data density Neff (log scale) and velocity vs. time step, for Data Only (Neff << 1) and Data + Prior (Npr. = 5).]

Figure 3-11: Simulations from a hybrid hallway-forest map. One planner used hallway
data alone with no prior (red), and another planner used hallway data combined with
the prior (blue). Without a safe prior, the planner's data density dropped to zero in
the forest and the robot recklessly accelerated to full speed, resulting in a collision (red
'x'). However, when using the prior in addition to training data, the planner safely
navigated through the forest and reached the goal. The forest regions on the graphs
are shaded in gray. While it may seem counterintuitive that the speed increased in
the unfamiliar forest environment while under the guidance of the prior, the higher
speed is actually safe for the forest environment given the sparsity of obstacles.

environments consisting of hallway and forest sections, using only training data from
a hallway and no data from a forest. We compared the performance of the planners
with and without the use of the prior while traversing the unfamiliar forest region.

In the example shown in Figure 3-11, when the planner transitioned from the
familiar hallway portion of this environment to the unfamiliar forest region, without
the prior, the effective data density dropped essentially to zero and it was unable
to distinguish between safe and risky behaviors. Therefore the planner recklessly
accelerated to full speed resulting in a crash (red 'x'). However, using our Bayesian
approach, the effective number of data points dropped only as low as Npr. = 5 when
the robot entered the forest, and the planner was safely guided by the information
in the prior. In 25 trials of random hallway-forest maps, 100% succeeded using the
prior, while only 12% succeeded without the prior, emphasizing the importance of
detecting scenarios when our model has not received the appropriate training.

Figure 3-12: Experimental 1:8 scale RC car platform, equipped with a Hokuyo LI-
DAR, Microstrain IMU, and Gigabyte Brix computer.

3.3.2 Demonstrations on Autonomous RC Car

We conducted experiments on the RC car pictured in Figure 3-12 with all sensing and
computation performed onboard the vehicle. The car was equipped with a Hokuyo
UTM-30LX LIDAR, a Microstrain 3DM-GX3-25 IMU and a Gigabyte Brix Intel dual-
core i7 computer with 16GB RAM. We used an ArduPilot as the interface between the
computer and the servo inputs of the RC car. We performed state estimation by fusing
the LIDAR and IMU in an extended Kalman filter (EKF) [26]. We used the LIDAR
to provide relative pose updates to the EKF using a scan-matching algorithm [9].
We built maps online by projecting LIDAR scans into an occupancy grid map from
the robot poses estimated by the EKF, without performing any loop closures or pose
graph optimization, which made the mapping process extremely efficient.
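
A minimal version of this projection step is sketched below: each valid range return is transformed into the map frame by the EKF pose estimate, and the corresponding grid cell is marked occupied. Log-odds fusion and the tracing of free cells along each beam, which a complete implementation would include, are omitted here.

import numpy as np

def insert_scan(grid, pose, angles, ranges, resolution=0.05,
                max_range=30.0):
    """Mark the endpoint cell of each valid LIDAR return as occupied.

    grid: 2D int8 occupancy array (0 = unknown/free, 1 = occupied).
    pose: (x, y, heading) estimate from the EKF, in meters / radians.
    angles, ranges: beam angles (sensor frame) and measured ranges.
    """
    x, y, th = pose
    for a, r in zip(angles, ranges):
        if r <= 0.0 or r >= max_range:
            continue                     # discard invalid returns
        # Transform the hit point into the map frame.
        hx = x + r * np.cos(th + a)
        hy = y + r * np.sin(th + a)
        row, col = int(hy / resolution), int(hx / resolution)
        if 0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]:
            grid[row, col] = 1
    return grid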

We performed control using a combination of feed-forward and feedback techniques.
We collected data from manually controlling the RC car through many ac-
celeration, braking and steering maneuvers, and used that data to empirically fit
linear models mapping from desired curvature to steering PWM commands and from
desired acceleration and velocity to throttle PWM commands. The reference con-
figurations along each planned trajectory were converted to steering and throttle
PWM commands and issued directly to the servos. In addition to this feed-forward
control, we also performed proportional feedback control around the reference ve-
locity, and PD control around the reference heading and cross-track error of the

car's configuration. Video of our experimental demonstrations, at speeds over 13
m/s, is available at: http://groups.csail.mit.edu/rrg/videos/high-speed-nav_unknownenvironments.
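
The feed-forward maps can be fit with ordinary least squares on the logged driving data. The sketch below fits an affine curvature-to-steering-PWM model; the sample values, and the assumption that the relationship is affine over the operating range, are illustrative rather than taken from our logs.

import numpy as np

def fit_affine(x, y):
    """Least-squares fit of y = m * x + b; returns (m, b)."""
    A = np.column_stack([x, np.ones_like(x)])
    (m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return m, b

# Logged samples of commanded curvature vs. steering PWM (illustrative).
curvature = np.array([-0.25, -0.10, 0.0, 0.10, 0.25])    # 1/m
steer_pwm = np.array([1200., 1380., 1500., 1620., 1800.])
m, b = fit_affine(curvature, steer_pwm)

def steering_command(desired_curvature):
    """Feed-forward steering PWM for a desired path curvature."""
    return m * desired_curvature + b

print(steering_command(0.2))   # PWM command for 0.2 1/m of curvature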

Hallway-Type Environments

We performed experiments in a hallway-type environment, pictured in Figures 3-13(a)
and 3-13(b), using the full 30 m range and 270° field-of-view of the Hokuyo LIDAR.
This environment had a hallway width of approximately 1.5 m and six 90° turns.
We tabulated results for the baseline planner and our proposed method in Table 3.1,
showing that our planner reached the goal in less time and with higher average and
maximum speeds than the baseline planner.

(a) Environment. (b) Generated map. (c) Speed profile for both planners.

Figure 3-13: (a)-(b) Hallway-type experimental environment and (c) speed vs. time
for the baseline planner and the proposed method in this environment. The dashed
lines in (c) indicate the average speed over the "representative" period of each trial,
between the initial acceleration and the final deceleration at the goal.

Table 3.1: Performance in the environment pictured in Figure 3-13. "Representative"
speed is defined as the average speed over the interval between the initial acceleration
from rest and the final deceleration at the goal.

              Time to    Average       "Representative"   Maximum
              Goal (s)   Speed (m/s)   Speed (m/s)        Speed (m/s)
  Baseline      9.59        2.04            2.37             3.06
  Our Method    7.71        2.85            3.71             4.74

For these experiments, we used Jc = 0.25 and recorded an average speed 40%
higher than the baseline planner (higher is better). The time-to-goal was 80% of the
time-to-goal for the baseline planner (lower is better). The discrepancy between these
two metrics arises from the fact that our planner took wider turns at speed, resulting
in a longer overall distance traveled to reach the goal. The baseline planner traveled
20.2 m while our planner traveled 23.1 m.
Figure 3-13(c) shows the speed profile of both planners. In this short course, we
observe that a substantial portion of the time for both planners was spent in the
initial acceleration and final deceleration at the goal, which lowers the average speed
of both methods. However, we observe a greater difference in "representative" speeds
if we look at only the period between the initial acceleration from rest and the final
deceleration upon reaching the goal. The averages computed over these periods of
time are shown for both the baseline planner and our planner as dashed lines in
Figure 3-13(c). According to these "representative" sections of the experiment, our
planner maintained an average speed 57% greater than the baseline planner, which
is approximately consistent with our simulation results.
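
Concretely, the "representative" average can be extracted by trimming the initial acceleration and final deceleration from the recorded speed profile. The sketch below uses a simple threshold on the fraction of peak speed as the trimming rule, which is an illustrative stand-in for the interval selection described above.

import numpy as np

def representative_speed(v, frac=0.5):
    """Mean speed between the first and last samples exceeding
    `frac` of the peak speed, assuming uniform sampling in time."""
    v = np.asarray(v, dtype=float)
    above = np.flatnonzero(v >= frac * v.max())
    i, j = above[0], above[-1]
    return float(v[i:j + 1].mean())

v = [0.0, 1.5, 3.0, 3.7, 3.6, 3.8, 3.5, 2.0, 0.5]   # speed samples (m/s)
print(f"representative speed: {representative_speed(v):.2f} m/s")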
While these results show an improvement over baseline, there are two important
differences compared to our simulation results. First, the 1.5 m hallway width is sub-
stantially narrower than the 2.5 m hallways used in our simulation results. We chose
to use narrower hallways and shorter hallway segments in order to fit as many turns
as possible into the confined laboratory space we used to construct this environment.
However, the turning radius of our RC car is approximately 1 m at low speeds (and
increases with speed), which means that making a 90° turn in a 1.5 m wide corridor

is already near the kinematic limits of what is possible even at low speeds. Therefore
these environments allow less room for error than our simulation environments.
The second challenge we faced in these very narrow hallways with 90° turns was
that our RC car's turning ability deviated from the kinematic model used in the simu-
lator when accelerating. For reasons including wheel slip and possibly weight transfer
off of the front wheels under acceleration, we found that our RC car was unable to
make as tight a turn at a given speed while accelerating as it was coasting at the same
speed. While this deviation from the kinematic model is not especially noticeable in
environments requiring only gentle turns, it is very detrimental in environments such
as this one with narrow corridors and tight turns. In this environment, many of the
turns required maximum steering input, leaving no additional control authority for
feedback to correct errors. As a result, attempting to accelerate with the car's full
acceleration capability while turning would often result in the car sliding into a wall,
both for the baseline planner and our planner.
To account for these challenges in the experiments depicted in Figure 3-13, we
computed a new set of actions especially for very confined environments, taking into
account both the scale of the environment and the issue of steering under acceleration.
For this new action set, we chose a planning horizon of 1.5 m (rather than our usual
2 m) and a reduced acceleration limit of 2.5 m/s² (rather than our usual 5 m/s²). We
will discuss additional shortcomings of our vehicle model in more detail below. This
new action set was used for both the baseline planner and our proposed method. We
also trained a new model of collision probability with a new dataset using 1.5 m wide
hallways with a turn frequency of 0.8 to represent the experimental environment.

Limited Sensor Range and Field-of-View

One surprising outcome of our simulation results was that the performance of our
planning approach was relatively unaffected by limitations in sensor range or field-of-
view for the types of environments in which we tested it. We performed an experiment
in the environment pictured in Figures 3-14(a) and 3-14(b) to test this result, with
the sensor artificially limited to a 5 m range and 60° field-of-view.

(a) Generated map. (b) Environment. (c) Speed profile for both planners.

Figure 3-14: (a)-(b) Experimental environment for limited sensor range and field-of-
view test, and (c) speed vs. time for the baseline planner and the proposed method in
this environment with 5 m sensor range and 60° field-of-view. Obstacles were placed
along the sides of the hallway, requiring a slalom path to avoid collision.

As in our simulation results, the baseline planner was limited primarily by the sen-
sor range in a manner illustrated in Figure 3-10(a). This is an environment in which
a longer sensor range would reveal a large amount of free space ahead. While this
limitation prevented the baseline planner from selecting actions above approximately
5 m/s, our method was able to make reasonable judgements of the highest speeds
that were likely to be safe based on the observable characteristics of the environment.

In this experiment we used the same action set as in our simulation results, and

Table 3.2: Performance in the environment pictured in Figure 3-14.

              Time to    Average       Maximum
              Goal (s)   Speed (m/s)   Speed (m/s)
  Baseline     12.07        4.22          5.78
  Our Method    8.37        6.58         10.09

we used the same training dataset that was used in the hallway simulations with 5
m sensor range and 60° field-of-view. We used Jc = 0.25 for this trial. Table 3.2
reports the greatest improvement over the baseline planner of all our experimental
trials, with a time-to-goal of only 69% of the baseline planner, an average speed 56%
greater than the baseline planner, and a maximum speed over 10 m/s.

Transition from Familiar to Unfamiliar Environment Types

We conducted experimental trials to show that in real-world heterogeneous environ-
ments, our planner can safely navigate in unfamiliar regions where it has no relevant
training data. Figure 3-15 shows an experimental trial on the 4th floor of the Stata
Center (MIT), which is composed of straight hallways and larger unstructured re-
gions. In this case, we used the full 30 m range and 270° field-of-view of the LIDAR.
For this experiment, training data were again provided from a simulated hallway en-
vironment, which resembles the straight hallway segments, but looks very different
from the open, unstructured regions. The inset graph shows the effective data den-
sity. In the hallways, the planner used approximately 10² effective data points per
prediction. However, in the open unstructured regions, Neff. dropped to zero, and the
only information available to the planner was contributed by the prior. These open
unstructured regions are marked with purple and blue diamonds. The weight of the
prior was Npr. = 5, which is illustrated as a dashed line in the inset graph.
This experiment corresponds to the type of simulation results we presented in
Figure 3-11, in which we reported only a 12% success rate for trials conducted without
using prior knowledge. Therefore, it is possible that without using our Bayesian
approach in this experiment, the robot would have behaved recklessly and collided
while navigating through the portions of the environment for which it was untrained.

Maximum Speed Test

In order to identify potential speed limits of our algorithms and RC car platform, we
performed experiments in a parking garage with sparsely placed obstacles using the
full 30 m range and 270° field-of-view of the sensor. The map generated from this

[Figure 3-15: map with overlaid trajectory; inset plot of effective data density Neff vs. time (s), with the prior weight shown as a dashed line.]

Figure 3-15: Trajectory through the Stata Center (MIT), traversing hallway and large
unstructured regions. Our planner was trained in a hallway environment, but not in a
large unstructured environment. Neff. drops to zero in the open unstructured regions
(purple and blue diamonds) and it must rely on the prior to navigate safely.

experiment, along with the trajectory of our planner, is illustrated in Figure 3-16.
Since very little of this environment was occluded from view, and most space was
free space, our planner chose to accelerate to maximum speed using full throttle and
was able to avoid obstacles. Full throttle in this environment resulted in a maximum
speed of 13.1 m/s. The length of this experiment was limited by a speed bump in
the parking garage; however, given a longer test environment, the RC car would likely
have been able to achieve a slightly higher top speed.

Significant Shortcomings of the Kinematic Vehicle Model

While the speeds exhibited by our planner in the preceding experiments are approx-
imately consistent with our simulation results, we did not achieve the same reliable
collision-free execution that we observed in simulation. In 11 attempts in the envi-
ronment depicted in Figure 3-14, our planner successfully reached the goal in only
three of the trials. These trials were performed with different obstacle configurations

(a) Generated map. (b) Trajectory. (c) Speed profile.

Figure 3-16: (a)-(b) Map and trajectory weaving through sparse obstacles in a parking
garage. Note the regular pattern of columns, as well as the smaller obstacles placed
in the path of the robot. (c) Velocity profile reaching 13.1 m/s.

and goal locations, so they varied in difficulty and are not directly comparable with
each other. Therefore they are not averaged together in Table 3.2.

However, in every case, failure was a direct result of a deviation between the
planner's internal kinematic vehicle model (described in Section 3.1) and the true
dynamics. Specifically, we observed significant sliding behavior and wheel slip during
turns, especially at high speeds and under aggressive accelerations.

Figure 3-17 illustrates a characteristic deviation between the planner's internal
vehicle model and the actual dynamics that appeared in nearly every run in this en-
vironment. The robot configurations, and associated action choices by the planner,
are numbered 1 through 13. The action selected by the planner in configuration 1
was a mild left turn to avoid the upcoming obstacle on the right side of the hallway.
However, it was not until the robot reached configurations 3 and 4 that it had fully
initiated the turn. In configurations 3-5 there is a visible crab angle of approximately
10° between the heading of the robot configuration and the actual path traced out by
the sequence of configurations. This crab angle indicates sliding. The steering com-
mand issued to initiate the turn from configuration 1 resulted in the desired change of
heading, but the robot did not track along that chosen action as the kinematic model
predicted. The same pattern was repeated and exacerbated in configurations 6-10, in
which the planner not only had to recover from the sliding turn, but also counteract

its momentum toward the left wall and avoid an upcoming obstacle on the left side of
the hallway. This even more aggressive correction resulted in a fishtailing maneuver

Figure 3-17: State history exceeding 9 m/s (teal line) from a failed attempt in the
environment depicted in Figure 3-14. Actions selected by our planner at different
planning iterations are superimposed (blue lines). This sequence of states illustrates
a significant deviation between the actual vehicle behavior and the motions intended
by the planner due to wheel slip.

with a crab angle of approximately 20° with respect to the teal path. During this
fishtailing maneuver, the car slid into an obstacle on the left side of the hallway at
configuration 10.
Due to the deviation between the true vehicle dynamics and the kinematic model,
the actions selected by the planner were rarely executed precisely. Nevertheless, each
action choice may have been a perfectly sensible one, given the kinematic model. It is
reasonable to expect that given a much closer match between the true dynamics and
the planner's internal model, we should observe much more robust and reliable execu-
tion, as we do in simulation. Correcting the current model discrepancy is not simply
a matter of fitting model parameters more accurately, since the current kinematic
model does not account for any form of wheel slip. Furthermore, it is not necessarily
possible with different feedback design either, since feedback-based steering is also
likely to result in sliding. While detailed dynamic modeling is outside the scope of
this thesis, it will be necessary to develop a more accurate dynamics model to truly
push the limits of robust and repeatable high-speed navigation.
For the RC car experiments described in Chapters 4 and 5, we took some steps to
limit these issues with the vehicle model. First, we artificially restricted acceleration
and braking to 2.5 m/s², since heavy acceleration and braking during turns tends
to break traction and lead to skidding. Second, we redesigned the layout of heavy
components mounted to the robot, including the LIDAR and battery, in order to
lower the center of mass. Finally, we changed the robot's heading estimate to be
computed not from the body axis of the robot, but instead from the direction of its
velocity vector. This final change enabled the robot's feedback control to counteract,
rather than exacerbate, the oversteering effect that led to sliding.
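
This final change amounts to feeding back on the course angle rather than the body yaw. A one-function sketch of the substitution (the low-speed fallback threshold is an assumption of this illustration):

import math

def course_angle(vx, vy, body_yaw, min_speed=0.5):
    """Heading estimate for feedback control: the direction of the
    velocity vector when moving, falling back to the body yaw when
    nearly stationary, where the velocity direction is undefined."""
    if math.hypot(vx, vy) < min_speed:
        return body_yaw
    return math.atan2(vy, vx)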

3.4 Chapter Summary

In this chapter, we have presented a mathematical formulation of the problem of
high-speed, collision-free navigation in unknown environments. We have observed
that most planning approaches make worst-case assumptions about the occupancy
that most planning approaches make worst-case assumptions about the occupancy
of unknown regions of the environment in order to guarantee that no collisions will
occur. However, real-world environments very often do not realize these worst-case
assumptions. Instead, real-world environments often follow fairly common and pre-
dictable structures such as corridors in a building. We capitalize on this gap between
worst-case assumptions and average-case structures in real environments by learning a
data-driven model of collision probability, capturing the essential characteristics of the
true environment distribution of interest. We have shown that planning high-speed
trajectories with respect to this type of data-driven collision probability model results
in significantly faster navigation, in a variety of environment types, than a planner
constrained to guarantee safety with respect to worst-case environment assumptions.
There are many cases where it would be desirable for an autonomous robot to
navigate as fast as possible, and perhaps to take some mild risks in doing so. For
instance, when deploying small, lightweight drones for reconnaissance or search and
rescue in an emergency situation, it might be of vital importance to reach targets
as quickly as possible. Furthermore, it is reasonable to expect that as robots and
their control algorithms become more agile and robust, occasional mild contact with
the environment may not be catastrophic, and may instead be a necessary part of
maximizing performance as is the case with one of our motivating examples, the
goshawk. Advancing the limits of speed and agility is one of the goals of this thesis,
and this chapter contributes an effective technique for enabling faster navigation.
We have also shown that by using a Bayesian learning algorithm, with safety

constraints encoded as a prior over collision probabilities, our planner can detect
when it lacks the appropriate training data for its environment and seamlessly revert
to safe behaviors. Our strategy offers a simple and elegant probabilistic method of
merging the performance benefits of training experience with the reassurance of safety
constraints when faced with an unfamiliar type of environment.

There are several potential alternatives to the approach we have proposed in this
chapter, but we believe that our approach has significant advantages. First, we might
consider attempting to model distributions over small components of environments,
such as the straight segments and turns in a hallway, and then employ POMDP solu-
tion techniques to plan online. An And-Or (or "Expectimax") method, for instance,
could be used to perform a search over action-observation sequences to some horizon.
While possibly feasible for very small toy problems, this approach would be compu-
tationally intractable for our system due to the large number of actions and much
larger number of observations to be considered. For instance, the branching factor
over actions would be on the order of 10², while the branching factor over observations
of even a small 10 × 10 cell patch of map would be 2¹⁰⁰. The very high dimensionality
renders these problems infeasible even if the environment distribution were known.

While it may be possible to design very clever discretizations of the observation
space to make these problems tractable for particular environment types, these
discretizations would not transfer to other environment types. Likewise, the distri-
butions over environment structures, or at least their parametric forms, would need
to be designed by hand and provided to the planner. While it is true that we need to
design features by hand for our prediction problems, these features have the distinct
advantage of functioning well across environment types. Therefore, compared to any
form of conventional POMDP solver, our approach eliminates the need to specifically
define the environment distribution and discretization of the observation space to
make the problem solvable. Furthermore, while it is unclear whether a lightweight
POMDP solution approach could be made efficient enough to solve a single problem
instance, our approach is efficient enough to run online.

Compared to model-free reinforcement learning methods, we also observe significant
advantages. It is common for reinforcement learning problems to require a great
deal of manual design effort crafting reward functions that encourage the robot to
learn a particular task, which is known as the problem of reward-shaping. Further-
more, reinforcement learning agents often need to interact with the real world over
long sequences of actions in order to obtain meaningful reward signals from which to
learn [63]. On the other hand, by learning to predict a specific, physically meaningful
quantity, we do not need to carefully fine-tune a complex reward function and we can
be much more targeted in our data collection. We can directly compute the results of
specific action sequences, converting what might otherwise be temporally extended
training episodes into simple labels for classification. The result is a model that is
easy to train and transparently understandable.

Although we have presented results for our method under some fairly idealized
circumstances in simulation, it would be useful in future work to consider relaxing
several of the assumptions made in Section 3.1. For instance, we have assumed
perfect knowledge of the dynamics model and perfect state estimation. However, if
there were some noise in our state estimate or some uncertainty in our dynamics
model, we could incorporate such effects into our training. We do not require the
planner itself to have a model of the noise or uncertainty, we only need to be able to
simulate training examples offline, with noise, to produce labeled training data. Since
state estimator noise and model uncertainty would both tend to increase the chances
of collision, these effects would simply increase the fraction of collision labels in the
training dataset, which would then produce a more conservative collision probability
model. A model trained with state estimator noise and uncertainty in the dynamics
would probably encourage the robot to keep a greater distance from obstacles and
perhaps drive more slowly to maintain greater maneuverability.

One of the main limitations of the work in this chapter is the difficulty of manually
designing feature functions that describe complex planning scenarios. In Chapter 5,
we will return to this problem using a neural network to automatically generate
feature functions from data, extending our methods to handle raw sensor data rather
than simple geometric representations. However, adopting a neural network approach

will make our Bayesian method unviable, requiring an alternative novelty-detection
approach to achieve analogous results.
Finally, this chapter has explored the high-level strategy of identifying a key
contributing factor to the value function in a planning problem-in this case, col-
lision probability-and approximating it using supervised learning. In the realm of
POMDPs and unknown environment navigation, there are numerous other oppor-
tunities to apply this same strategy. In the next chapter, we will explore one of
these opportunities by training a regressor to predict the navigational benefit from
taking sensor measurements from various points of view within a partially known
environment. Until now, we have been willing to trade the guarantee of safety in
favor of a data-driven approach that improves navigation speed. However, there are
many cases where safety must be enforced absolutely. For instance, when navigating
a full-sized autonomous car, a collision could cause significant damage. Therefore, in
the next section we will re-impose safety constraints and show that we can still sig-
nificantly improve navigation performance by predicting measurement utility, rather
than collision probability.

Chapter 4

Predicting the Utility of Future Measurements

In this chapter, we will develop a method of learning to approximate the utility of
future sensor measurements for minimum-time navigation in unknown environments
with guaranteed safety. In the previous chapter, we enabled a planner to select actions
that could commit the robot to entering the unknown, by equipping it with a model
of risk associated with such actions in place of the conventional safety constraint.
However, robots that guarantee safety do not have the option of planning into the
unknown. Instead, as we have illustrated in Section 1.3, they may be very restricted
if they have not obtained measurements from key relevant regions of the environment.
Now, we will help to alleviate this restriction by guiding them to gain advantageous
sensor views of these unknown regions.

Receding-horizon planning subject to a safety constraint can be expressed mathe-
matically with the following optimization problem, which we presented in Section 1.1:

a*(bt) = argmin_{at ∈ A} { Ja(at) + h(bt, at) }        (4.1)

         s.t. g(bt, at) = 0.                           (4.2)

This optimization problem makes use of the same heuristic, h(bt, at), that we have
discussed previously.
(a) 4 m/s. (b) 1 m/s. (c) 1 m/s.

Figure 4-1: Sequence of actions chosen for a robot approaching a blind corner while
guaranteeing safety with a 60° sensor field-of-view. Chosen trajectories are drawn in
blue, and feasible emergency-stop maneuvers are drawn in red.

In an unknown map, it is very difficult or impossible to accurately estimate the
remaining cost-to-goal; as we discussed in Section 3.1, doing so would amount to
solving a difficult POMDP, so the heuristic is inherently a simplified approximation.
Common approximations often guide the robot along the apparent shortest path to
the goal.

In Figure 1-1 (reproduced in Figure 4-1 for convenience), we observe that this heuristic
greedily guides the robot toward the inside of the corner to minimize distance to
the goal. However, from this point of view, it is unable to see the free space ahead.
Without visibility around the corner, the local planner is forced to select an action that
slows down from 4 m/s to 1 m/s to preserve sufficient stopping distance. This common
heuristic ignores the fact that the constraints imposed by the current unknown regions
of the map may actually be lifted when future observations are taken and more free
space is added to the map. A more accurate heuristic might have guided the robot
along a slightly longer path, sacrificing some immediate progress in exchange for
improved future visibility into the unknown regions of the map. Greater visibility
might, in turn, enable faster actions in future planning steps, resulting in an overall
reduction in cost-to-goal.

Unfortunately, reasoning explicitly about this observation-action interaction again
typically implies the daunting complexity of POMDPs. To avoid this complexity while

retaining information-gathering behaviors, our approach is to augment the shortest-
path heuristic with a learned function that models the change in cost-to-goal resulting
from the next observation and subsequent action. Next, we will derive this learned
model and use it to efficiently reason about the observation-action interactions that
are ignored by common shortest-path heuristics.

4.1 Measurement Utility Modeling Approach

In a conventional POMDP setting, we might consider searching forward through all
possible sequences of actions and observations in order to select a next action that
minimizes cost in expectation. The value or utility of a measurement would then
be reflected directly in the set of expected outcomes. In a POMDP, the value of
information is indistinguishable and inseparable from other elements of the problem.
However, recalling the barriers to POMDP solution techniques described in Sec-
tion 3.1, our approach will instead be to use machine learning to somehow approxi-
mate the true optimal expected cost-to-goal from training examples. Using the results
from Section 3.1, and equation (3.9) in particular, let us define the expected cost-to-
goal under the optimal policy as:

h*(bt, at) = Σ_{st+1} P(st+1 | bt, at) V*(st+1).        (4.3)

In other words, if our heuristic were somehow perfectly accurate, then we would
have h(bt, at) = h*(bt, at), and our planner, following the optimization problem in
equations (4.1) and (4.2), would act optimally. Our goal will be to modify or augment
h(bt, at) in order to more closely resemble h*(bt, at).
While we could consider attempting to learn the entire global heuristic from data,
this quantity is not likely to be predictable due to the wide range of possible map
geometries and sizes. Instead, we observe that h(bt, at) gives reasonable guidance
much of the time, but is missing specific information about the effects of possible
map observations in the immediate future, which is more local and hence predictable

than the entire optimal cost-to-goal. Therefore, we will model the effects of possible
future observations in the form of a correction to the shortest-path heuristic. Our
learned function fh(φh(bt, at)) is intended to model the difference between the shortest-
path heuristic and this true optimal cost-to-goal:

fh(φh(bt, at)) ≈ h*(bt, at) - h(bt, at).        (4.4)

By capturing this effect from training data, we avoid the need to explicitly model the
environment distribution and search over the vast space of observations. And, while
we cannot compute h*(bt, at), even offline during training, we can locally approximate
it using a one-step look ahead technique, which we describe next.

4.1.1 Training

We train the model fh(φh(bt, at)) using a dataset of labeled belief-action pairs, which
are intended to represent realistic scenarios the robot could encounter. We generate
each data point in simulation by first randomly sampling a robot configuration within
a training map. Figure 4-2(a) shows a training map we use for hallway environments.
Since all corners in our hallway environments share the same geometry (for a given
hallway width), it is sufficient to train on a map with a single corner. To restrict
ourselves to realistic configurations the robot might encounter while approaching a
turn, we sample uniformly from the blue rectangular region, with random headings in
the range [-45°, 45°] with respect to the hallway direction, and with random speed.

We then generate a belief bo from the point of view of the sampled robot configu-
ration. To generate a realistic belief in general, it would be necessary to aggregate a
history of measurements, which in turn would require us to sample a realistic history of
states for each data point. To avoid this complication, we generate the sampled belief
by simulating a single sensor measurement with a 360° field of view from the sampled
configuration. This strategy reveals the local map around the sampled configuration
while still capturing the effects of occlusions due to walls, reasonably approximating
beliefs encountered at runtime.


Figure 4-2: (a) Training map, with blue region from which training configurations
are sampled. (b)-(d) Training sequence: (b) random robot configuration is sampled,
with belief b0 and feasible candidate action a0; (c) the sensor is simulated from the
end of a0, yielding belief b1 containing additional observed free space; (d) next action
a1 is selected to minimize cost, given the newly observed free space in belief b1. The
choice of a1 will be used to compute the label of this data point.

Having sampled a configuration and generated the initial belief, we then randomly
select a feasible action a0 the robot could execute, subject to the constraint g(b0, a0) =
0, which states that the action is collision-free and allows the robot room to stop
within known free space (Figure 4-2(b)). Next, we simulate the sensor, with the
actual field-of-view, from the end of a0 and incorporate the measurement into b0 to
form a new updated belief, b1 (Figure 4-2(c)). Then, we select the next action, a1,
from the end of a0, that makes the most progress toward the goal given the updated
belief b1, and subject to the constraint g(b1, a1) = 0 (Figure 4-2(d)). Progress is
measured with a calculation of h(b1, a1), from the end of a1.¹
The training label is then computed using the optimal choice for a1:

y(b0, a0) = min_{a1 ∈ A} { Ja(a1) + h(b1, a1) } - h(b0, a0).        (4.5)

Thus, while we cannot compute h*(bt, at) even offline during training, our training
labels still approximate the desired result of capturing the deviation from the shortest-

¹When computing h(b1, a1), we assume that the speed of the robot returns from the terminal
speed of a1 to the initial speed of a0 according to the acceleration limits of the robot, and then
maintains that speed to the goal. In all other uses of h(bt, at) in this thesis, we assume that the
speed is held constant at the initial speed of at. We make this slight modification to our heuristic
only for the purposes of training, to produce labels that are more physically accurate and capture
the time spent accelerating and decelerating.

path heuristic that is due to potential future observations, and subsequent actions
the planner could take, contingent upon those observations.
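
In pseudocode form, one training example is generated roughly as follows. The callables simulate_sensor, feasible, heuristic, action_cost, and features, and the a.end(q) interface, are stand-ins for the simulator, the constraint g(b, a) = 0, the heuristic h(b, a), the action cost Ja, and the feature map φh; they are assumptions of this sketch rather than literal interfaces from our implementation.

def make_training_example(world, sample_config, actions, simulate_sensor,
                          feasible, heuristic, action_cost, features, rng):
    """Generate one (features, label) training pair per equation (4.5)."""
    q0 = sample_config(rng)                       # random configuration
    b0 = simulate_sensor(world, q0, fov=360.0)    # initial belief
    # Random feasible first action, subject to g(b0, a0) = 0.
    candidates = [a for a in actions if feasible(b0, q0, a)]
    a0 = candidates[rng.integers(len(candidates))]
    # Simulate the actual sensor from the end of a0; update the belief.
    b1 = simulate_sensor(world, a0.end(q0), fov=60.0, prior_belief=b0)
    # Best follow-up action a1 under the updated belief b1.
    best = min(action_cost(a1) + heuristic(b1, a1)
               for a1 in actions if feasible(b1, a0.end(q0), a1))
    label = best - heuristic(b0, a0)              # equation (4.5)
    return features(b0, a0), label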
Finally, we compute a low-dimensional set of features, φh(bt, at), for each belief-
action pair that captures the useful predictive information. In this chapter, we use two
features based on the boundary between known free space and the unknown, which
we refer to as the "frontier": (1) the fraction of the map frontier in bt that will be
visible along action at, and (2) the average distance from states along the action at to
all frontier locations in the map. There are many other features that could be useful
for this learning task, but these two sufficed for our purposes in this chapter.
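
A rough sketch of these two frontier features over a discretized occupancy grid follows. Frontier extraction and the visibility test are simplified here: the visible callable, standing in for a line-of-sight check, is an assumption of this illustration, and action states are given in grid coordinates.

import numpy as np

FREE, UNKNOWN, OCCUPIED = 0, 1, 2

def frontier_cells(grid):
    """Free cells bordering at least one unknown cell (4-connected)."""
    cells = []
    for r in range(1, grid.shape[0] - 1):
        for c in range(1, grid.shape[1] - 1):
            if grid[r, c] == FREE and UNKNOWN in (
                    grid[r - 1, c], grid[r + 1, c],
                    grid[r, c - 1], grid[r, c + 1]):
                cells.append((r, c))
    return np.array(cells)

def frontier_features(grid, action_states, visible, resolution=0.1):
    """Feature 1: fraction of the frontier visible along the action.
    Feature 2: mean distance (m) from action states to the frontier."""
    fr = frontier_cells(grid)
    if len(fr) == 0:
        return 1.0, 0.0              # no frontier left to observe
    seen = sum(any(visible(grid, s, tuple(f)) for s in action_states)
               for f in fr)
    rows = np.array([s[0] for s in action_states])[:, None]
    cols = np.array([s[1] for s in action_states])[:, None]
    dists = np.hypot(fr[:, 0] - rows, fr[:, 1] - cols)
    return seen / len(fr), resolution * float(dists.mean())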

4.1.2 Prediction and Planning with the Learned Model

Much like in Section 3.2.1, our dataset of N data points has the form D = {(φ1, y1),
..., (φN, yN)}, where we have temporarily dropped the subscript from φh. Indices in
this case represent different data points (not time t), and we write φi as shorthand
for φh(b, a) for the ith data point. We use the Nadaraya-Watson estimator to make
predictions for feature vector φ at runtime:

fh(φ) = ( Σ_{i=1}^N k(φi, φ) yi ) / ( Σ_{i=1}^N k(φi, φ) )        (4.6)

This estimator is closely related to the Beta-Bernoulli method in equation (3.12),
except that it does not include a prior, and we use it to predict a real-valued quantity
rather than a probability. Our proposed heuristic function is then the sum of the
shortest-path heuristic and the learned function: h(bt, at) + fh(φh(bt, at)). Putting it
all together, our planner repeatedly re-solves the following optimization problem:

a*(bt) = argmin_{at ∈ A} { Ja(at) + h(bt, at) + fh(φh(bt, at)) }        (4.7)

         s.t. g(bt, at) = 0.                                            (4.8)

If we succeed in modeling fh(φh(bt, at)) perfectly, then from equation (4.4) we would
have h(bt, at) + fh(φh(bt, at)) = h*(bt, at), and our planner would act optimally.
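
A minimal sketch of the Nadaraya-Watson prediction in equation (4.6) and the augmented action selection of equations (4.7)-(4.8) is shown below; the Gaussian kernel and bandwidth are illustrative assumptions, and the callables stand in for the planner's cost, heuristic, feature, and constraint functions.

import numpy as np

def nadaraya_watson(phi, Phi, y, bandwidth=0.1):
    """Kernel-weighted average of labels y at query features phi
    (equation 4.6), using a Gaussian kernel."""
    w = np.exp(-0.5 * np.sum((Phi - phi) ** 2, axis=1) / bandwidth ** 2)
    return float(np.dot(w, y) / max(w.sum(), 1e-12))

def select_action(actions, belief, Phi, y,
                  action_cost, heuristic, features, feasible):
    """Minimize Ja + h + fh over feasible actions (eqs. 4.7-4.8)."""
    best, best_cost = None, float("inf")
    for a in actions:
        if not feasible(belief, a):          # enforce g(bt, at) = 0
            continue
        f_h = nadaraya_watson(np.asarray(features(belief, a)), Phi, y)
        cost = action_cost(a) + heuristic(belief, a) + f_h
        if cost < best_cost:
            best, best_cost = a, cost
    return best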

4.2 Results

To evaluate our method, we trained a separate model for each combination of sensor
field-of-view (from 45° to 120°) and hallway width (2 m, 2.5 m and 3 m). For each of
these combinations, we conducted simulation trials in 50 randomly-generated hallway
environments [100], recording the performance of our planner as well as that of the
baseline planner. We selected the hallway environment type because hallways are very
common and also challenging for high-speed navigation due to the visual occlusion at
turns. We selected the range of sensor fields-of-view to be relevant for RGBD sensors
such as the Microsoft Kinect and Asus Xtion, which produce point clouds with 57°
and 58° fields-of-view, respectively, as well as standard and wide-angle cameras.

4.2.1 Learned Models

Figure 4-3 shows examples of learned models for a 60° field-of-view sensor in a 2 m
wide hallway and a 120° field-of-view sensor in a 3 m wide hallway. Both models
illustrate the same overall trend. Intuitively, the highest cost belief-action pairs occur
when actions are located very near to the map frontier while simultaneously offering
low frontier visibility (lower left corners of plots in Figure 4-3). This scenario occurs
when the robot approaches a corner hugging the inside wall of the hallway, which
enables it to get close to the frontier while keeping the frontier mostly occluded by
the wall. In fact, this costly behavior is exactly characteristic of the baseline planner.

[Figure 4-3: heatmaps of predicted cost over feature space, plotted against the visible fraction of frontier (horizontal axis); panels (a) and (b) as captioned below.]

Figure 4-3: Learned model fh(φh(bt, at)) for (a) 60° field-of-view in 2 m wide hallway
and (b) 120° field-of-view in 3 m wide hallway.

Our models predict a reduction in cost if the planner selects actions that either
increase distance from the frontier or increase visibility of the frontier, or both. These
two features suggest approaching the corner with a wide turn, not only to increase
available stopping distance between the robot and the frontier, but also to take an
observation of more occluded space, thereby pushing the frontier farther ahead.
The only observable deviation from this trend is that in the 60° field-of-view
model, the predicted cost begins to increase slightly as the visible fraction of frontier
rises above 50%. The reason for this cost increase is that for narrow field-of-view
sensors, the robot must drive directly toward the frontier to observe a large fraction
of it, which in hallway environments is correlated with shorter stopping distances and
therefore slower speeds. We believe that additional features would refine the model,
but these features nevertheless capture the intuition that greater distance from the
frontier and observability of the frontier are both advantageous.

4.2.2 Simulation Success Rates

We observed the surprising result that for narrow fields-of-view, the baseline planner
often failed to reach the goal altogether. These failures resulted from the planner
becoming inadvertently trapped in a small region of known free space, such that no
feasible actions would yield views of additional free space. The robot was therefore
forced to stop. In these cases, a 3-point turn would be required to reorient the robot,
take observations revealing free space toward the goal, and proceed. However, we
consider these events to be failures.
This scenario is pictured in Figure 4-4(a), where the gray lines represent all actions
in action set A. Nearly all of them would cause the robot's bounding box (not
shown for these actions) to intersect a wall or an unknown map cell and are therefore
infeasible. The exceptions are highlighted in blue and red. For these actions, we then
determine whether there exists a safe emergency-stop action, illustrated in green, with
the bounding box of the robot at the end of each emergency-stop action illustrated as
a black rectangle. The red actions making progress toward the goal are also infeasible
because the bounding box of the robot at the end of the associated emergency-stop

[Figure 4-4(b) plot: success rate vs. field-of-view (degrees), for Learned and Baseline planners at hallway widths 2.0, 2.5, and 3.0 m.]
(a) Baseline trapped. (b) Success rates.

Figure 4-4: (a) Baseline planner trapped, steering away from the goal, due to in-
sufficient observed free space to the right. The only feasible actions, that are both
collision-free and have a feasible emergency-stop action, are blue. See main text for
discussion. (b) Success rates in reaching the goal as a function of hallway width and
field-of-view, for our method and the baseline planner.

action intersects both a wall and unknown space. The only feasible choices, with
emergency-stop actions lying entirely within known free space, are illustrated in blue.
However, these feasible actions neither make progress toward the goal, nor afford any
useful sensor viewpoints. Hence the planner has become trapped.

While our planner was also susceptible to this failure mode, our learned guidance
function was very effective at avoiding it for certain hallway widths and fields-of-view.
Figure 4-4(b) shows the success rate for both planners, indicating that our planner
was able to reliably reach the goal (for a given hallway width) using a field-of-view
approximately 5-10° narrower than what would be required for the baseline planner.
While these failures depend on the relative size of the hallway compared to the length
of actions, we note that using a planning horizon that is too short can lead to poor
receding-horizon planning behavior in other ways, since short actions cannot span
the robot's true range of turning maneuvers. Therefore, we show this dependency by
varying the width of the hallway instead. These results show that the baseline planner
is likely to fail under some fairly benign and common conditions. For example, using
a Microsoft Kinect sensor on an RC car robot in a 2 m wide hallway could be expected
to succeed only 20% of the time, whereas our method succeeded 100% of the time
under these conditions.

[Figure 4-5 plots (a)-(c) vs. field-of-view (degrees), for hallway widths 2.0, 2.5, and 3.0 m.]

Figure 4-5: Simulation results: (a) average time-to-goal (lower is better), (b) average
speed (higher is better), and (c) average maximum visible range. Averaged results
are normalized by the corresponding performance of the baseline planner in the same
environments. Plot color indicates hallway width.

4.2.3 Simulation Time-to-Goal as a Function of Visibility

Figure 4-5(a) illustrates the time-to-goal for our planner, normalized by the time-to-
goal for the baseline planner in the same environments. For a given hallway width
and field-of-view, we averaged this normalized time across all trials in which both
planners succeeded. Using our method, we observed times-to-goal as low as 70% of
the baseline value in 2 m wide and 2.5 m wide hallways, for narrow fields-of-view.
We observed smaller, but still substantial improvement in time-to-goal for the 3 m
wide hallway and for wider fields of view. It is intuitively sensible that the maximum
improvement should occur in cases where the observable free space is most limited.
Figure 4-5(b) illustrates the average (normalized) navigation speeds corresponding
to the time-to-goal results. In each hallway width, our solution navigated up to 1.4
times faster than the baseline planner. However, since our planner typically traveled
a slightly longer distance in order to project its sensor visibility around corners, the
increased speed did not necessarily translate to shorter times-to-goal, as was the case
with a 120° field-of-view. Although in principle, our learned function should not guide
the robot to take a longer path unless it is advantageous in terms of time-to-goal, we
have made a number of approximations in training this function, which may account
for the increased time-to-goal in the case of 2 m hallway width and 120° field-of-view.
Figure 4-5(c) quantifies the greater environment visibility achieved by our method.

(a) 6 m/s. (b) 4 m/s. (c) 6 m/s.

Figure 4-6: Example planning sequence using our method of learning measurement
utility, which guides the robot to project its sensor visibility into occluded space.
Here, the sensor field-of-view is restricted to 60°, illustrated with black lines.

We recorded the maximum visible range in each sensor measurement along each
simulated trial of our planner and compared those with the maximum visible ranges
observed by the baseline planner. In every case, our method was able to view a
greater distance ahead, resulting in the ability to plan higher-speed actions. Note
that in straight sections of the hallway, both methods see comparable distances ahead,
and the advantage is primarily in the turns. Therefore, we would expect the relative
difference in visibility during turns to be much greater than we report in Figure 4-5(c).
Figures 4-6(a)-4-6(c) (reproduced from Figures 1-3(a)-1-3(c) for convenience) il-
lustrate the planning approach developed in this chapter and show that our method
gives rise to the intuitive behavior of swinging wide around corners to obtain more
advantageous views of free space enabling greater stopping distances. Compared to
the default baseline behavior pictured in Figure 4-1, this behavior is qualitatively dif-
ferent. Figures 4-7(a) and 4-7(b) show the trajectories and resulting maps produced
by the baseline planner compared to our method in a 2.5 m wide hallway with a 60°
field-of-view. In general, our planner observes much more of the environment due to
its learned behavior. The areas of the map that are left unobserved around each turn
in Figure 4-7(a) significantly constrain the planning options for the baseline plan-
ner. Due to kinematic and sensing constraints, the robot must anticipate the need to
observe these regions well before each turn, or they will remain unobserved.

(a) Simulated trajectory and resulting map of the baseline planner.

(b) Simulated trajectory and resulting map of our planner.

Figure 4-7: (a)-(b) Simulated trajectories and resulting observed maps of the baseline
planner and our method, in a 2.5 m hallway with a 60° sensor field-of-view. Green
and red dots indicate start and goal, respectively. Dark gray indicates the hidden
map, and light gray regions are those that have not been observed.

4.2.4 Experimental Demonstration on Autonomous RC Car

We tested our planner in a hallway environment in the MIT Stata Center on the
autonomous RC car described in Section 3.3.2. Video of these experiments is available
at: http://groups.csail.mit.edu/rrg/videos/planning-for-visibility. For
these experiments, we used a 0.05 m map resolution to capture the smaller irregularities
of the real environment. We artificially limited the Hokuyo planar LIDAR field-of-
view for the purposes of map building (but not for state estimation). We picked a
section of hallway consisting of three 90° turns, with hallway widths of approximately
1.9 m, roughly matching our simulated hallway distribution. In this environment, we
conducted 10 trials each for the baseline planner and our planner, using both 60°
and 90° fields-of-view. Figure 4-8 illustrates our experimental map and autonomous
RC car in the experimental environment, and Table 4.1 summarizes the results. In
Table 4.1, the time-to-goal column averages across only the successful trials in each
category, while the mean and maximum speed columns represent averages over all
trials in order to quantify the speed differential for the 60° case where the baseline
planner never reached the goal.


Figure 4-8: (a) Experimental trajectory of our planner through a hallway in the
MIT Stata Center with numbered turns (inset) and expanded view of our planner's
trajectory through turn #3 (main image), exhibiting good visibility around the corner
as a result of its wide-turn behavior. (b) Baseline planner failed at turn #1. (c) Our
autonomous RC car in turn #2 of the experimental environment.

Table 4.1: Experimental results.

FOV    Planner     Success Rate    Time-to-Goal (s)     Mean Speed (m/s)    Maximum Speed (m/s)
                                   (Successes Only)     (All)               (All)
60°    Baseline    0/10            N/A                  2.25                4.29
60°    Learned     8/10            14.36                3.08                5.35
90°    Baseline    4/10            14.26                3.06                5.30
90°    Learned     10/10           13.68                3.43                5.55

Figure 4-8(a) shows a map and trajectory from one of our planner's successful
trials using a 60° field-of-view. The close-up view of the third turn illustrates a
characteristic example of the good visibility of free space our planner was able to
obtain, hence allowing it to maintain a greater speed around the turn.
With a 60° field-of-view, our planner succeeded in 8 of 10 trials, while the baseline
planner failed in every trial. Figure 4-8(b) shows a characteristic failure by the base-
line planner, becoming trapped as discussed in Section 4.2.2. The success rates and
relative speeds listed in Table 4.1 roughly match our simulation results for 60° and 2
m hallway width, with the difference likely resulting from the fact that the experimental
hallway was slightly narrower than 2 m.
Using a 90° field-of-view, our planner succeeded in all 10 trials, while the baseline
planner succeeded in only 4. This success rate for the baseline planner was consid-
erably worse than the simulation results, again likely due to the narrower hallway,
but our approach was robust to these differences. For those trials that did succeed,
Table 4.1 shows that our planner reached the goal in 4% less time than the baseline
planner, which is consistent with our simulation results. In this case, the primary
benefit of using our method was not the speed improvement, but rather the ability
to reach the goal successfully without becoming trapped.

4.3 Chapter Summary

In this chapter, we have introduced a supervised learning method to approximate
the utility of future measurements, without requiring any explicit distribution over
environments or measurement model at planning time. Our method exhibits the
intuitive information-gathering behavior of approaching corners from the outside of
the turn, where visibility is greatest, and keeping the sensor field-of-view pointed at
the map frontier in the desired direction of motion.
However, some of our results are surprising in several respects. First, we did not
anticipate the failure mode of robots becoming trapped in a small area of known free
space, unable to make progress toward the goal. Given how often this failure mode
occurs under common conditions, it warrants a solution such as the one we provide
in this chapter. Second, we expected a more substantial benefit using our refined
heuristic across a wider range of fields-of-view. Instead, there is little benefit for
fields-of-view greater than 80°, as the resulting speed improvement does not outweigh
the extra distance traveled to obtain better visibility. Ultimately navigation speed is
limited by stopping distance, which grows quadratically with the speed of the robot.
Therefore, to increase speed, the length of known free space must grow quadratically
as well. However, our learned guidance function was able to produce only a modest
increase in sensor visibility.
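
To make this quadratic scaling concrete, consider a constant-deceleration stopping model (an illustrative assumption; the numbers here are not taken from our experiments): with maximum deceleration a_max, the stopping distance is d_stop(v) = v^2 / (2 a_max). Taking a_max = 5 m/s^2, doubling speed from 4 m/s to 8 m/s quadruples the required stopping distance from 1.6 m to 6.4 m, so the extent of known free space ahead must grow fourfold to support a twofold increase in speed.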
We believe that there is substantial benefit to be gained from this type of in-
formation gathering method, but that it must be tried in more applications and
experimental platforms to understand where it is most helpful. The turning radius of
our vehicle model is actually quite restrictive for the scale of environments we tested
in, which significantly limits the ability for the robot to deliberately point the sensor
in different directions. It is quite possible that a different robot, with a different set
of constraints, could obtain a significant advantage from information-gathering ma-
neuvers. For example, a quadrotor helicopter could yaw to the side, independently
from its direction of motion, to point its sensor around a corner.
We have now concluded our discussion of two supervised learning problems that
advance the capabilities of high-speed navigation in unknown environments. Our first
learning problem was to predict collisions so that a robot could safely travel faster
than standard safety constraints would allow, and our second learning problem was to
predict measurement utility, so that a safety-constrained planner could take actions
to observe key unknown regions of the environment. However, our solutions to both
of these problems have depended on special features based on intuition and domain
knowledge, as well as a detailed geometric representation of the environment. In the
next chapter, we will demonstrate that we can also solve prediction problems such as
these using features automatically learned from raw sensor data.

Chapter 5

Predicting Collisions from Learned


Features of Raw Sensor Data

In this chapter, we will extend our machine learning methods to make predictions
from raw sensor data. In particular, we will focus on predicting collisions from camera
images. Until this point, we have relied heavily on hand-engineered features of the
robot's geometric map, which extract useful information like distances to the nearest
obstacle or frontier. Designing these features is task-dependent and requires expert
intuition about each prediction problem, which makes it difficult to apply our learning
techniques in general. Furthermore, the feature design problem is much harder for raw
sensor data like camera images, which is why the major recent advances in computer
vision tasks like object recognition have relied upon algorithms that learn or adapt
their own feature representation in response to their training data, most notably deep
learning. Therefore, in this chapter, we will explore the use of deep neural networks to
learn predictive feature representations for camera images in mobile robot navigation.
In many ways, cameras are ideal sensors for mobile robot applications since they
provide rich appearance-based information that is much more informative than purely
geometric LIDAR scans for many different types of navigation and decision making
problems. They are also much smaller, lighter and cheaper than LIDAR, and work
in a wide variety of indoor and outdoor lighting conditions, unlike most infrared
RGBD sensors. However, cameras pose a substantial computational challenge to
extract information in usable forms. For example, the range of accurate, dense depth
estimation from visual SLAM may be limited (like RGBD) to only a few meters.
Therefore, it is reasonable to assume that our camera-equipped robots will be able to
build geometric maps from images online, but that those maps will be very limited in
range compared to LIDAR. Range limitations, in turn, restrict the speed at which a
conventional planner such as our baseline planner can navigate, and impede its ability
to anticipate more distant structures in the environment.

Nevertheless, camera images contain information about the context and struc-
ture of the world far beyond the limited range of accurate geometric inference. For
instance, if a robot observes a straight empty hallway ahead, it might reasonably
learn from experience and visual appearance that it can safely travel at high speed
down the hallway, even if it cannot infer the exact geometry of the entire length of
the hallway. If we can extract this type of appearance-based information by other
means, we can improve navigation performance. Indeed, as we showed in Chapter 3,
the speeds achievable by the baseline planner were significantly impacted by sensor
range limitations, while our learning-based approach was able to correctly judge that
higher speeds could be safe depending on the local, observable characteristics of the
environment. Our challenge now is to extract these types of cues from camera images.

Deep learning has been shown to be well suited to extracting useful information
for robot navigation from raw sensor data [48, 93]. However, unlike visual SLAM
algorithms that infer geometry using domain-invariant principles, neural networks
adapt their representation to domain-specific training datasets. Therefore, they may
make unreliable predictions when queried with inputs far from the distribution of the
training data [20]. Yet, we would still like to deploy robots using neural networks
"in the wild," where safety is critical but real-world data will almost certainly differ
from the training dataset, violating the common assumption in machine learning that
all training and testing data are independent and identically distributed. Moreover,
conventional neural networks provide no principled measure of how certain they are,
or how well-trained they are to make a given prediction.

Some work has attempted to quantify uncertainty about neural network pre-
dictions by training ensembles of networks on re-sampled versions of the training
dataset [90, 69, 38, 71] or training a distribution over models in the Bayesian set-
ting [78, 21]. Recently, Gal and Ghahramani [41] showed that training and evaluating
with dropout (termed "MC Dropout") could be used to model prediction uncertainty.
While these methods are equipped to model the variance within the training data,
they are not explicitly trained to return appropriately high uncertainty very far from
the training data where there is no training signal at all. As a result, uncertainty
estimates far from the training data may be unpredictable and inaccurate. Finally,
these methods require tens or hundreds of network evaluations in order to build up an
approximate distribution around the output value, compromising runtime efficiency.

Given these limitations, we believe that determining the trustworthiness of a neu-
ral network may be more effectively addressed as a novelty-detection problem. In
fact, Pomerleau [94] utilized a form of novelty detection when using a neural network
to steer an autonomous car, during the very early days of camera-guided robots.
The system described in that work learned not only to steer the vehicle based on
camera and LIDAR input, but also learned to reconstruct its input images with an
autoencoder-like set of outputs from its neural network. The difference between the
input image and its reconstructed output was used to judge whether the steering
system was appropriately trained for the given task.

In this chapter, we extend our results on collision prediction, described in Chap-
ter 3, to make predictions directly from raw sensor data such as camera images rather
than hand-engineered features of map geometry. We demonstrate safe, high-speed vi-
sual navigation of a mobile robot, guided by a neural network in environments that
may or may not resemble the training environments. We use a feedforward neural
network to predict collisions based on images observed by the robot, and we use an
autoencoder-based novelty detector to judge whether those images are similar enough
to the training data for the resulting collision predictions to be trusted.

We extend the work of Pomerleau [94] in several ways. First, if our novelty
detector classifies a planning scenario as novel, we revert to a safe prior behavior,
which enables our system to operate autonomously in any environment, regardless
of whether it has received appropriate training. Second, we continuously collect
training data and label it in a self-supervised manner using our range-limited SLAM
map so that we can periodically re-train our models to improve, and expand the
set of environment types that are considered familiar, rather than needing either
human demonstrations or hand-engineered visual simulators for training in every
environment type. In the next section, we will present the problem formulation for
our visual navigation task. Then, we will describe our method of learning to predict
collisions from camera images, and our approach to novelty detection. Finally, we will
present simulated and experimental robot navigation results and a demonstration of
generalization in real-world environments.

5.1 Visual Navigation Problem Formulation

We aim to solve the problem of navigation to a goal location incurring minimum
expected cost in an unknown map, as we have done in the previous two chapters. In
expected cost in an unknown map, as we have done in the previous two chapters. In
this chapter, however, our goal is to use a camera as the primary perceptual sensor.
We assume that we will be building a geometric map online using SLAM because it is
helpful for practical reasons such as measuring progress toward the goal, but this map
will not be crucial for obstacle avoidance while under vision-based guidance. After
all, our heuristic estimate of remaining cost-to-goal depends on the local geometry of
the environment, requiring a map (although a promising line of future research would
be to develop geometry-free global guidance methods). However, for the reasons
discussed above, we assume that our ability to accurately infer dense environment
geometry through our camera is limited to 5 m or less, which is characteristic of many
monocular, stereo and RGBD methods. The Microsoft Kinect sensor, for instance,
has an advertised range of only 4 m [1]. Therefore, for the remainder of this chapter,
we will model our geometric sensing capabilities as an idealized range sensor with 5
m range and 110° field-of-view, equal to that of our onboard camera. Since a short
perceptual horizon limits the available stopping distance of the robot, the range-
limited geometric map is insufficient for high-speed navigation on its own. As has
been shown previously, using learned models for image-based guidance can extend
the effective perceptual range and improve navigation performance [48, 93, 118].

To facilitate the problem formulation in this chapter, we assume that the robot's
camera images can be converted into the same type of observations ot that we
described in Section 3.1, and that those observations can then be inserted into the
map contained in belief bt. This representation abstracts the true characteristics of
visual SLAM front ends, treating our camera essentially as a LIDAR with limited
range and field-of-view, but is conceptually reasonable for our purposes. We will still
maintain images it as distinct from ot since we will be processing the raw pixel values
of the images directly to predict collision beyond the geometric sensing range.

Our optimization problem in this chapter is a slight variation on the one we used
in Chapter 3. We simply extend the learned function modeling collision probability
to include the robot's current camera image, it, as input:

a^*(b_t) = \operatorname*{argmin}_{a_t \in \mathcal{A}} \big\{ J_a(a_t) + h(b_t, a_t) + J_c \cdot f_i(i_t, b_t, a_t) \big\},    (5.1)

where fi represents a learned approximation to the collision probability distribution,
analogous to fc in equation (3.10), as follows:

\sum_{s_{t+1}} P(s_{t+1} \mid i_t, b_t, a_t)\, \mathbb{I}_{\mathcal{C}_S}(s_{t+1}) \approx f_i(i_t, b_t, a_t).    (5.2)

As before, the collision probability is an unknown quantity, and since we do not
know the true distribution over the real-world environments in which we hope to plan,
we will approximate the distribution over collision probability by learning fi(it, bt, at)
from data. Since our goal in this chapter is to predict collisions from images in
order to overcome the range limitations (as well as other unmodeled failure modes) of
visual SLAM, we may be tempted to simply drop the dependence on bt and model the
collision probability term in equation (5.1) as fi(it, at), which could be implemented
as a neural network with an image and action as input.

However, as we have noted, neural networks may make erratic predictions when
queried with inputs unlike their training data, and standard neural networks have no
reliable way to estimate their own uncertainty. We must be able to detect when an
input is unlike the training data and revert to a prior collision probability estimate
that is known to be safe, though perhaps conservative. To do so, we introduce a
novelty detection function fnovel(it), which returns 1 if image it is novel (i.e., unlike
the training images), and returns 0 otherwise.¹ We then model collision probability
as follows:

f_i(i_t, b_t, a_t) =
\begin{cases}
f_{net}(i_t, a_t) & \text{if } f_{novel}(i_t) = 0 \\
P_{pr}(b_t, a_t) & \text{if } f_{novel}(i_t) = 1.
\end{cases}    (5.3)

In this equation, the function fnet(it, at) is a neural network trained to predict colli-
sions, and Ppr.(bt, at) represents a prior estimate of collision probability based on the
geometric map estimate contained in the belief bt that can encourage safe and sensible
behavior when the robot is faced with a novel-appearing environment for which fnet
is untrained. As in Chapter 3, a reasonable prior might be to limit speed such that
the robot can stop within the known free space of the map, if need be. Since our
geometric sensing range is now significantly limited, this speed may be rather slow.
In completely novel-appearing environments, the robot will navigate according to the
behavior suggested by the prior Ppr., whereas in environments of familiar appearance,
the robot will trust its learned collision prediction model fnet.
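
To make the structure of this gated model concrete, the following minimal Python sketch shows how equation (5.3) would plug into the action selection of equation (5.1). The names f_net, f_novel, p_prior, action_cost, and heuristic are hypothetical stand-ins for the components described in this chapter, not our actual implementation.

    def collision_probability(image, belief, action):
        # Equation (5.3): trust the network only on familiar-looking images.
        if f_novel(image):                  # returns 1 if image is unlike training data
            return p_prior(belief, action)  # safe geometric prior, equation (5.4)
        return f_net(image, action)         # learned collision prediction network

    def select_action(image, belief, actions, J_c):
        # Equation (5.1): minimize action cost plus heuristic cost-to-go plus
        # the weighted collision probability, over the action set A.
        return min(actions,
                   key=lambda a: (action_cost(a) + heuristic(belief, a)
                                  + J_c * collision_probability(image, belief, a)))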

Note that previously, we defined our learned collision probability model fc(φc(bt, at))
as a function of features φc. We now drop the feature notation and instead write
fnet(it, at) since we will not be explicitly designing feature functions to extract infor-
mation. Instead, our neural network will operate directly on the image and action,
adapting its own feature representation as it is trained. In the next section, we will
describe how to represent and train the function fnet(it, at) as a feedforward neural
network and how to model the prior collision probability estimate, Ppr.(bt, at). Then,
we will describe the method we use for building fnovel(it) to detect novel images.

¹It may be possible to estimate a probability of novelty, rather than a binary indicator, and use
the probability of novelty to blend the prior with the collision prediction produced by the neural
network. However, there are potential pitfalls of that approach if the novelty predictor assigns an
intermediate probability of novelty while the collision predictor produces an erroneous prediction.

5.2 Learning to Predict Collisions

As we have observed above, there are two main advantages to predicting collisions
from images based on training experience, rather than relying solely on a geometric
map. First, images contain information about the environment that extends well be-
yond the range with which detailed geometry can be reliably inferred using monocular
visual SLAM, stereo, or RGBD methods. And second, a prediction based on train-
ing data can account for elements of the environment-such as free space around a
corner-that are occluded from sensor view. For example, upon observing a turn in
a hallway, one might reasonably assume that there will be free space for navigation
around the corner even without directly observing that free space, again, enabling
faster navigation. We have already demonstrated this second point in Chapter 3.

Our objective in training a model to predict collision probabilities directly from
images is to capture these two advantages implicitly through training examples. As
before, we approach this problem as a probabilistic binary classification problem:
Given a camera image, and some choice of action, what is the probability that a
future collision is inevitable? Therefore, we must devise a method for building a
training dataset that provides examples of image-action pairs, with binary labels
indicating whether a collision would be inevitable or not.

We could reuse the same simulation training procedure that we developed in Chap-
ter 3, where we randomly sampled robot configurations within a known training map.
From each of these random configurations, we generated the associated sensor mea-
surements to build a belief, and we used our knowledge of the true hidden training
map to label actions. However, this strategy assumes that we have full knowledge
of the training map, and that we can accurately simulate sensor measurements from
that map. These assumptions were met in our simulation-based training procedure
in Chapter 3, but may not be feasible for vision-based navigation because simulated
training images may not be visually similar enough to real camera images. While
training visual models on simulated images is a promising line of future research, we
would like to be able to train on real images collected from real-world environments.

In this chapter, we adapt our training procedure so that we can train on images
collected by the robot as it navigates (either autonomously, or manually driven) in
real-world environments. As before, we must have knowledge of the true map and the
configuration from which each sensor measurement was taken within that map in order
to produce labels. This requirement is somewhat complicated by the fact that we
assume that our geometric sensing in this chapter is very range-limited. Nevertheless,
for a given data collection episode in a particular environment, for instance a trip
from start to goal, we can store the map that was generated by the SLAM process
using measurements along the way. Since the robot will not exhaustively map the
environment in general, this map may be incomplete. We also store the sequence of
configurations visited by the robot, and the images taken from those configurations.
The result is a collection of raw data, containing both the map and configuration-
image pairs: Draw = {m̂, (q0, i0), (q1, i1), (q2, i2), ..., (qNraw, iNraw)}, where m̂ is the
(perhaps incomplete) geometric map of an environment as produced by our SLAM
system having traversed the environment, qi is a robot configuration within that map,
ii is an image taken from configuration qi, and Nraw is the number of raw data points.
In the next section, we will describe how to turn this raw dataset into a labeled
dataset that we can use for training.

5.2.1 Labeling Data

Our strategy for labeling this dataset will be to cycle through each image-configuration
pair in Draw, randomly select an action ai that could have been taken from configu-
ration qi, and use the geometric map m̂ to determine whether that action would have
resulted in a collision at some point in the future. Since m̂ may be incomplete, we
consider unknown regions in m̂ to be occupied for the purposes of labeling data. This
assumption can lead to slightly conservative labeling since there is more free space
in the true environment than what is represented in m̂, but for the environments we
study in this chapter, there is no significant adverse effect on performance.
By associating ii and ai with the resulting binary label yi, the learned model
will be able to predict that the next time it encounters a similar image-action pair,
a similar outcome will be likely. The result will be a dataset of the form D =
{((i0, a0), y0), ((i1, a1), y1), ..., ((iN, aN), yN)}. This dataset differs from the labeled
dataset in Chapter 3 in that we store image-action pairs in place of features φ.
The labeling process is nearly identical to the one we described in Chapter 3,
except in this case, we use emergency-stop actions to test for future collision rather
than the forward-search method we used previously (illustrated in Figure 3-2). Both
methods would serve this purpose adequately, but implementation improvements ex-
panded the diversity of emergency-stop actions since the work that was conducted in
Chapter 3, enabling this method to produce reasonably accurate labels, while being
computationally faster than the forward-search method. Since the set of emergency-
stop actions still does not contain every possible trajectory, there are cases where a
collision-free stopping trajectory exists but is not found, and therefore these labels
tend to be slightly conservative.²
We illustrate several example labeling scenarios from a simulated hallway environ-
ment in Figure 5-1. In Figure 5-1(a), the robot is located in the middle of the hallway,
and the randomly selected action, illustrated in blue, steers toward the right wall at
low speed. To determine whether a collision is inevitable after executing this action,
we test the range of possible emergency-stop actions that would bring the robot to a
stop under maximum braking with a variety of steering strategies. In Figure 5-1(a),
all of these emergency-stop actions intersect the right wall, and are therefore colored
red. Hence, there exists no feasible action that can avoid collision after executing the
blue action. Figure 5-1(d) illustrates the robot's camera view from this configuration.
This image, paired with the blue action, would be labeled "collision".

Figure 5-1(b) illustrates another scenario in which the robot is approaching a turn
at a high speed, with Figure 5-1(e) illustrating the robot's camera view from this sce-
nario. Again, all emergency-stop maneuvers are doomed to collide with the far wall,
regardless of steering angle. Hence, in this case speed is the more important factor

²On the other hand, the forward-search method used in Chapter 3 tended to produce labels that
may have been slightly optimistic in some cases since it only searched to a horizon of three actions
and did not search all the way to a safe stop. We believe that reducing the bias in the labels is
largely a matter of how the labeling planner is implemented and not a significant limitation.

(a) "Collision". (b) "Collision". (c) "Non-Collision".

(d) Image for (a). (e) Image for (b). (f) Image for (c).

Figure 5-1: Examples of the offline data labeling procedure in a simulated hallway
environment, with the associated simulated image for each scenario.

that makes collision inevitable, rather than the choice of steering direction. Figures 5-
1(c) and 5-1(f) illustrate a scenario in which the robot is approaching the corner more
slowly and farther from the turn. In this case, an emergency-stop maneuver is feasible
(illustrated in black). This scenario would be labeled "non-collision".

The distribution of "collision" and "non-collision" labels is intended to capture the
relationship between the image observed by the robot, the geometry of the environ-
ment, and the steering and braking capabilities of the robot with respect to that
image and environment geometry. After training on a dataset constructed in this
manner, our learned model of collision probability will be able to safely guide a robot
through an environment similar to its training examples.

We also observe that we need not limit ourselves to a single action and label for
a given image, and since real images are relatively expensive to obtain, it is very
beneficial to reuse each image to produce numerous labeled data points. Therefore,
we pair each image with 50 different actions and their associated labels, resulting in a
50× increase in the amount of labeled training data. We assume that for every image
in our dataset, we will have a sufficiently dense sampling of labeled image-action pairs
associated with that image to train our collision predictor.
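
A minimal sketch of this labeling pass might look as follows. The helper callables sample_action, propagate, emergency_stops, and collision_free are hypothetical stand-ins for the components described above; this is an illustration of the procedure, not the thesis implementation.

    def label_raw_data(raw_pairs, occupancy_map, num_actions=50):
        # raw_pairs: (configuration, image) pairs recorded during a run;
        # occupancy_map: the stored SLAM map, with unknown cells treated as
        # occupied for labeling purposes.
        labeled = []
        for q, image in raw_pairs:
            for _ in range(num_actions):   # reuse each image 50 times
                a = sample_action(q)       # random action from configuration q
                q_end = propagate(q, a)    # state at the end of action a
                # Label "non-collision" (0) iff some maximum-braking
                # emergency-stop trajectory from q_end stays collision-free.
                safe = any(collision_free(traj, occupancy_map)
                           for traj in emergency_stops(q_end))
                labeled.append(((image, a), 0 if safe else 1))
        return labeled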

5.2.2 Neural Network

We model collision probability using a fully connected feedforward network, with
an image and action as input. We represent this input as a vector concatenating
the grayscale pixel intensities of the image with the x, y-position of the end point of
the action in the robot's body-fixed coordinate frame, and the speed at the end of
the action. Therefore, our network input is: x = [i0, i1, i2, ..., iK, ax, ay, av], where
ik ∈ [0, 1] is the intensity of the kth pixel in image i, and ax, ay, and av represent the
terminal position and speed of the action respectively. While it is easy to provide a
richer action parameterization, we found no benefit from doing so.
For our simulation results, we use three hidden layers (200 nodes per layer) with
sigmoid activation functions, and a softmax output layer for probabilistic classification
of "collision" vs "non-collision" categories. We train this network using mini-batch
stochastic gradient descent to minimize the negative log likelihood of the data labels
given the network parameters. We do not use dropout or regularization.
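
As one concrete (hypothetical) rendering of this training setup, a PyTorch-style loop could minimize the negative log likelihood with mini-batch stochastic gradient descent; the network variable, data loader, and hyperparameter values below are illustrative assumptions, not the settings used in our experiments.

    import torch.nn as nn
    import torch.optim as optim

    def train_collision_net(net, loader, epochs=10, lr=0.01):
        # Mini-batch SGD minimizing the negative log likelihood of the labels;
        # CrossEntropyLoss applies the softmax over the two output classes.
        opt = optim.SGD(net.parameters(), lr=lr)
        nll = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:   # x: input vector, y: 0/1 collision label
                opt.zero_grad()
                nll(net(x), y).backward()
                opt.step()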
In our experimental results, we modify this network structure slightly for com-
putational efficiency. We aim for a 20 Hz planning frequency, and in each planning
iteration as many as 200 actions from action set A may need to be evaluated, which
is infeasible for our onboard CPU. However, in cycling through these actions, the
image remains unchanged, leading to a large amount of essentially repeated compu-
tation. Therefore, in our experimental trials, we use a factored network architecture
in which the image alone is fed as input to an image-processing stage, consisting of
three hidden layers with 200 nodes per layer, as above. This image stage need only be
evaluated once per planning iteration. Then, we concatenate the 200-element output
of this image stage with the action description, [ax, ay, av], and feed this 203-element
vector into a second stage consisting of two hidden layers with 100 nodes per layer.
Finally, the output of this second stage is fed to the softmax classification layer.

This factored architecture significantly reduces computational effort, making 20 Hz
re-planning feasible. We have tested this modified network structure in simulation
and found its performance to be essentially equal to the simpler network described
above, that takes both image and action endpoint as a single input.
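
The factored architecture can be sketched as follows, again in PyTorch-style code for illustration only (the layer sizes are taken from the text; everything else, including the framework itself, is an assumption rather than a description of our implementation).

    import torch
    import torch.nn as nn

    class FactoredCollisionNet(nn.Module):
        def __init__(self, num_pixels):
            super().__init__()
            # Image stage: evaluated once per planning iteration.
            self.image_stage = nn.Sequential(
                nn.Linear(num_pixels, 200), nn.Sigmoid(),
                nn.Linear(200, 200), nn.Sigmoid(),
                nn.Linear(200, 200), nn.Sigmoid())
            # Action stage: 200 image features + (a_x, a_y, a_v) = 203 inputs.
            self.action_stage = nn.Sequential(
                nn.Linear(203, 100), nn.Sigmoid(),
                nn.Linear(100, 100), nn.Sigmoid(),
                nn.Linear(100, 2))   # logits for "collision" vs "non-collision"

        def forward(self, image, action):
            features = self.image_stage(image)  # reusable across all actions
            return self.action_stage(torch.cat([features, action], dim=-1))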

5.2.3 Prior Estimate of Collision Probability

The prior is designed to encourage safety with a minimum of domain knowledge or
model-specific information. Our prior predicts that an action carries zero probability
of future collision if there exists enough free space ahead of it to come to a stop before
intersecting an obstacle or unknown space. If there does not exist enough free space to
come to a stop, we assign probability equal to zero for actions that reduce speed, and
probability equal to one otherwise. While more sophisticated priors could be designed
(or learned), this one serves our purposes and illustrates that the performance of our
approach does not hinge on carefully hand-crafted functions.
We define the function dfree(bt, at), which returns the length of known free space
in the map estimate contained in belief bt, along the ray extending from some robot pose
along action at, averaged over several poses along the action. We also define the
function dstop(at), which returns the stopping distance required by the robot to come
to a stop from the speed at the end of action at. These are the same quantities we used
to develop the prior in Chapter 3, except that we now write them as functions of the
belief and action directly, since we are no longer computing them from intermediate
features. Finally, let vo(at) and vF(at) be the initial and final speeds along action at,
respectively. With these functions, we define our prior used in this chapter as:

P_{pr}(b_t, a_t) =
\begin{cases}
0 & \text{if } d_{free}(b_t, a_t) \ge d_{stop}(a_t) \\
0 & \text{if } d_{free}(b_t, a_t) < d_{stop}(a_t) \text{ and } v_0(a_t) > v_F(a_t) \\
1 & \text{if } d_{free}(b_t, a_t) < d_{stop}(a_t) \text{ and } v_0(a_t) \le v_F(a_t).
\end{cases}    (5.4)

It should be noted that this prior is not the same as the prior in equation (3.13) that
we used in Chapter 3, although it expresses the same intuition that the robot is safe
if it has enough room to come to a stop, and that if the robot is forced to exceed the
available stopping distance, then slower actions are safer than faster ones.
We switched to this prior for two reasons. The first reason is that this prior is
simpler. It has no manually-chosen parameters, such as kpr., that alter the shape of
the half-sigmoid and could be used to influence the behavior of the planner. Second,
we use a discrete set of speeds to create our precomputed action sets in this thesis, and
in practice, the shape of the sigmoid must be tuned using the constant kpr. depending
on the speed discretization to encourage the robot to slow down. The constant kpr.
must be chosen such that the expected collision penalty reduction at lower speed
outweighs the cost penalty of slowing down. The prior we use in this chapter has
neither of these issues, and works well regardless of speed discretization.
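
Equation (5.4) translates almost directly into code. The sketch below assumes helper functions d_free, d_stop, v_0, and v_F corresponding to the quantities defined above; it is an illustration of the rule, not our planner's source.

    def p_prior(belief, action):
        # Enough known free space to stop: no predicted collision risk.
        if d_free(belief, action) >= d_stop(action):
            return 0.0
        # Not enough room to stop, but the action reduces speed: treat as safe.
        if v_0(action) > v_F(action):
            return 0.0
        # Not enough room to stop and not slowing down: predicted collision.
        return 1.0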
We do not consider designing priors to be an important part of this thesis, and we
do not want our results to hinge upon careful choices of prior functions. Therefore,
we prefer the simplest and most transparent priors possible. The purpose of this prior
is to serve as a safe prediction of collision probability, to be used only in the cases
where our learned model is not appropriately trained. Therefore, we need a method
of detecting novel queries for which our learned model might make a poor prediction.
In the next section, we describe our novelty-detection approach.

5.3 Novelty Detection

We must be able to detect when an input is unlike the training data and revert to a
default collision probability estimate that is known to be safe, though perhaps con-
servative, for situations where our collision predictor is not appropriately trained.
Following Bishop [20], we expect the reliability and accuracy of our neural network
predictions to be somehow related to the unconditional probability density p(i, a)
of the image-action pair on which those predictions are made. We expect to make
accurate predictions on inputs that are well represented in our training dataset, corre-
sponding to high density. Since we can densely sample many training actions a to be
paired with each training image i, we need only consider the unconditional density of
training images p(i). If we could model the unconditional density of training images,
p̂(i), we could use that approximate density to estimate whether a test image had
been drawn from the training dataset or some other distribution of novel images. If
we determine that an image had been drawn from the training dataset, we could trust
the resulting neural network predictions made on that image. On the other hand, if
we determine that the image is novel, we could switch to a safe, conservative policy.
Bishop [20] provides a Bayesian classification method for determining the novelty
of neural network inputs. If we refer to the class of training images as CT and the class
of novel images as CN, we can classify an image i as belonging to the training class
by determining whether the following expression is true:

p(i \mid C_T) > \frac{p(i \mid C_N)\, p(C_N)}{p(C_T)}.    (5.5)
In this expression, p(i|CT) is the modeled distribution or density of familiar images,
which we have termed p̂(i), and p(CN) and p(CT) are prior probabilities of class
membership. The distribution p(i|CN) is some probability distribution over the set of
novel images, which could be, for instance, a uniform distribution. Evaluating (5.5)
is equivalent to setting a threshold Tp on p̂(i). In other words:

f_{novel}(i) =
\begin{cases}
1 & \text{if } \hat{p}(i) < T_p \\
0 & \text{otherwise.}
\end{cases}    (5.6)

The difficulty with this approach in our application is that we currently have no
effective or efficient methods for estimating the unconditional probability density of
a set of images due to their high dimension. Kernel methods in the raw pixel space
are unlikely to produce meaningful results because the data requirements scale expo-
nentially with the dimensionality of the space, and because the pixel-wise similarity
between two images is unlikely to capture the meaningful qualities of the image con-
tents. While generative neural network approaches are a promising line of current
research, they have not yet matured to the point that they can be used to efficiently
estimate the likelihood of an image under the generating distribution.

Another difficulty is that we assume that we only have access to the set of images
that have been observed and collected by our system. While we could obtain a set
of "unfamiliar" images in a variety of ways, the set of unfamiliar images should span
the space of all other natural images unlike the training set, which would be an
impractically large set of images. Therefore, we cannot pose novelty detection as a
normal binary classification problem. We must make an assessment of novelty based
purely on the characteristics of the training data alone.

Since we cannot efficiently model the unconditional likelihood of test images or
pose a binary classification problem, we will take an alternative approach to nov-
elty classification based on autoencoders, which circumvents both of these difficulties.
Since this autoencoder-based approach models the distribution of its training data
and then makes a classification on the basis of some measure of similarity, we can
consider it to be an approximation of the Bayesian method outlined above. This
autoencoder-based approach is described by Japkowicz et al. [56], and is also appli-
cable to a wide variety of complex domains, including acoustic signals [82], network
server anomalies [117], data mining [49], document classification [81] and others. As
we have mentioned, Pomerleau [94] used an early precursor to this method to estimate
the reliability of neural network predictions for autonomous vehicle navigation.

To classify images as novel, we use a feedforward autoencoder with three hidden
sigmoid layers and a sigmoid output layer. The autoencoder is trained using mini-
batch stochastic gradient descent to reconstruct images by minimizing the sum of
squared pixel differences between the input and reconstructed output. Let ik ∈ [0, 1]
denote the intensity value of the kth pixel of image i. The autoencoder loss is:

L_n(i) = \sum_{k=1}^{K} \big( i_k - r_k(i) \big)^2,    (5.7)

where r(i) refers to the output of the autoencoder, i.e., the reconstructed image, and
rk(i) refers to the kth pixel of image r(i). The number of pixels in i and r(i) is K. This
novelty detection approach assumes that the autoencoder network will learn a com-
pressed representation capturing the essential characteristics of its training dataset
such that it will reconstruct similar inputs with low reconstruction error. On the other
hand, when the autoencoder is presented with a novel input-that is, an input unlike
any of its training data-we expect the reconstruction error to be large. Therefore, to
classify novel images, we need only to set a threshold and determine whether the re-
construction error for a given image falls above that threshold. Mirroring the method
we described in equation (5.6), based on the unconditional probability density of a
test image, we now approximate novelty classification with the following expression:

f_{novel}(i) =
\begin{cases}
1 & \text{if } L_n(i) > T_L \\
0 & \text{otherwise,}
\end{cases}    (5.8)

where TL is a threshold on the autoencoder loss, analogous to Tp, except that the
inequality in this case is reversed since high image likelihood corresponds to low
reconstruction error. Similar to Japkowicz et al. [56], we might consider setting the
threshold as some function of the largest reconstruction error of any single training
data point. However, since this value would depend only on a single outlier, we
instead adopt a method that we have empirically found to produce more robust and
consistent results, which is to compute the empirical CDF of the distribution of losses
in the training dataset and select some high percentile, such as the 99th percentile, to
be the threshold. This method labels 99% of training data as familiar.
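
In code, this threshold selection and the resulting novelty test of equation (5.8) amount to a few lines. The sketch below assumes a hypothetical reconstruct callable mapping a flattened image with pixel values in [0, 1] to its autoencoder output; it illustrates the method rather than reproducing our implementation.

    import numpy as np

    def fit_novelty_threshold(reconstruct, train_images, percentile=99.0):
        # Equation (5.7): sum of squared pixel differences per image, then the
        # chosen percentile of the empirical distribution of training losses.
        losses = [np.sum((i - reconstruct(i)) ** 2) for i in train_images]
        return np.percentile(losses, percentile)

    def f_novel(image, reconstruct, threshold):
        # Equation (5.8): novel iff reconstruction error exceeds the threshold.
        return 1 if np.sum((image - reconstruct(image)) ** 2) > threshold else 0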

5.3.1 MNIST Examples

To illustrate the basic function of the autoencoder novelty-detection method, we
trained an autoencoder on the MNIST digit recognition dataset using three hidden
sigmoid layers with 50 nodes per layer. Figures 5-2(a) and 5-2(b) show an example
test input drawn from the MNIST dataset, and its corresponding output, respectively.
The reconstruction error for this example is Ln = 6.9 × 10⁻³, which falls below our
computed threshold of Ln = 1.5 × 10⁻². On the other hand, Figures 5-2(c) and 5-
2(d) illustrate an input drawn from the notMNIST dataset [2] of letters from non-
standard fonts, and its corresponding output, respectively. In this case, it is clear

(a) Input. (b) Output. (c) Input. (d) Output.

Figure 5-2: (a)-(b) Input and reconstruction (output) for an image of a '5' digit,
drawn from the training dataset. (c)-(d) Input and reconstruction for an image of
a 'J' character, drawn from the notMNIST dataset, exhibiting high reconstruction
error, indicating that this input image is considered novel.

that the reconstruction does not resemble the input, and that is reflected in the
high reconstruction error of Ln = 3.2 × 10⁻¹, which is well above the threshold for
novelty. Rather than using a discriminative prediction of class membership from a
neural network on this input, we should recognize that we do not have the necessary
training and instead revert to a prior such as a uniform distribution over class labels.

5.3.2 Robot Navigation Examples

To evaluate this novelty detection approach on a visual navigation domain, we trained
an autoencoder on a set of simulated camera images from a hallway-type environment
and tested it on simulated images of hallways (the training distribution) as well as
forests of cylindrical obstacles (the novel test distribution). Again, we used three
hidden layers with sigmoid activations, but since these images are more complex than
MNIST, we increased the hidden layer size to 200 nodes per layer. For computational
efficiency, we use very low resolution 60×44 images.
Figure 5-3 illustrates two examples of novelty detection. Figure 5-3(a) illustrates
an input image of a simulated hallway drawn from the distribution of images used
to train the autoencoder and 5-3(b) shows the corresponding reconstruction with a
reconstruction error of Ln = 4.5 × 10⁻³, which is lower than the threshold of 6.3 × 10⁻³,
resulting in a classification of "familiar". On the other hand, Figure 5-3(c) illustrates
an input image of a simulated forest of cylindrical obstacles not drawn from the

(a) Input. (b) Output. (c) Input. (d) Output.

Figure 5-3: (a) A familiar input image of a simulated hallway, drawn from the training
distribution, and (b) the associated accurate autoencoder reconstruction. (c) A novel
input image of a simulated "cylinder forest," not drawn from the training distribution,
and (d) the resulting inaccurate reconstruction.


Figure 5-4: Histogram of autoencoder loss on a training dataset of simulated hallway
images. The novelty threshold is drawn as a vertical dashed line.

training distribution and 5-3(d) shows the corresponding reconstruction, which has a
reconstruction error of Ln = 4.4 × 10⁻², much greater than the threshold, resulting
in a classification of this image as "novel". It is visually apparent that the
reconstruction in Figure 5-3(b) is much more accurate than that in Figure 5-3(d),
intuitively indicating that the simulated "cylinder-forest" image does not follow the
same encoding that the autoencoder learned from the simulated hallway environment.
Figure 5-4 shows a histogram of reconstruction errors for images in the hallway
training dataset, along with the PDF of a normal distribution computed from the
sample mean and variance of the reconstruction errors. The vertical dashed line
illustrates the threshold set at the 99% point in the CDF of this distribution.

5.4 Results

We tested our approach both in simulation and in experiments using the autonomous
RC car described in Section 3.3.2. In both cases, we initialized the system with an
empty dataset and a randomly initialized collision prediction network and autoen-
coder. The autoencoder threshold was initialized to zero so that all images appeared
novel. We then repeatedly deployed the robot to drive to a specified goal location,
collecting images during each run, labeling them, and periodically re-training the
collision predictor and autoencoder networks on the growing dataset. Video of simu-
lated and real-world experiments, including autoencoder reconstruction, is available
at: http://groups.csail.mit.edu/rrg/videos/safevisual-navigation.
In all simulation results, we used a 110° horizontal field-of-view camera model and
a 5 m range for geometric sensing. In our experimental results, we performed trials
using two different sensor configurations. In one configuration, we used monocular
visual-inertial odometry (VIO) based on GTSAM for state estimation, and an Xtion
RGBD camera for depth estimation. In the other sensor configuration, we used
a Hokuyo UTM-30LX planar LIDAR scanner to perform state estimation, and for
geometric perception (artificially limited to a 110° field-of-view and 5 m range). In
both configurations, we used a Point Grey Chameleon camera with a 110° field-of-view
as input to our collision-prediction and novelty-detection models.
While this work is targeted at camera-only perception, the drift and error in visual
state estimation during aggressive driving prevented us from achieving maximum per-
formance and reliability in the LIDAR-free configuration. Therefore, we employed the
LIDAR purely to test the merits of our learning approach under the idealized circum-
stances of highly accurate state estimation. We emphasize that in all tests, regardless
of the sensor suite being used for state estimation and mapping, collision prediction
was performed only on camera images combined with the action description, as de-
scribed above. Therefore our results will easily extend to future implementations of
camera-only navigation as visual state estimation and depth estimation improve.

5.4.1 Visualizing Neural Network Predictions

To illustrate the effectiveness of collision prediction using camera images, Figure 5-5
shows the predictions of collision probability on one real camera image from an
experimental trial and two simulated images. In each case, the networks used to


(a) Real hallway. (b) Simulated hallway. (c) Simulated forest.

Figure 5-5: Illustration of predicted collision probabilities. Each image is shown with
the learned collision probability model below it, with the horizontal axis representing
action direction and color representing speed of the action, from 1 m/s to 10 m/s.
Maps of these types of environments are illustrated in Figures 5-8 and 3-3.

make these predictions were trained with large datasets of relevant images, so novelty
was not a factor. The graph below each image illustrates the collision probability as
a function of the steering direction of the action (i.e., atan2(ay, ax)), shown on the
horizontal axis, and the speed at the end of the action, which is indicated by the
color of each line. The speeds illustrated vary from 1 m/s (blue) to 10 m/s (red),
intuitively illustrating that faster actions carry a greater probability of collision.

Figure 5-5(a) illustrates a case where the robot was approximately aligned with
the axis of a real hallway, and therefore the learned model assigns low probability of
collision to actions aligned with that axis. Figure 5-5(b) illustrates the robot entering
a turn in a simulated hallway, so the probability model assigns lowest probability of
collision to actions that bear about 25° to the right. Finally, Figure 5-5(c) illustrates
the robot in a simulated forest of cylindrical obstacles. Since there is a long gap
between obstacles near the center of the image, the probability of collision is lowest at
this point. However, the predicted collision probability increases sharply for steering
angles to the right, since there is an obstacle in the right half of the image. Steering
toward the left is safer given the greater free space in the left half of the image.


Figure 5-6: Simulated navigation performance in 25 random hallway-type environ-
ments using our neural network approach compared to the approach presented in
Chapter 3. Error bars indicate standard deviation for each data point.

5.4.2 Neural Network Compared to Hand-Engineered Features

To determine the effectiveness of collision prediction from images using learned fea-
tures, we compared the simulated performance of our neural network system against
the results presented in Chapter 3, which used manually-designed predictive features,
in 25 random hallway maps. Figure 5-6 illustrates success rates and normalized nav-
igation speeds as a function of collision cost for both methods. The neural network
approach not only resulted in slightly faster navigation, but was collision-free for
Jc ≥ 0.2, whereas using manually-designed features, the system still incurred a small
number of collisions with Jc = 0.2. This result demonstrates the effectiveness of
learned features and the information content available in images for navigation.

To produce this result, we adapted the training procedure from Chapter 3, us-
ing training configurations sampled uniformly at random from the feasible locations
within a training map, rather than the online data collection approach described in
Section 5.2. We used 200,000 data points for each method and relied on the learned
models only, without any prior collision probability model or novelty detection in
either case. To equalize the calculation of the heuristic, both methods used a 360°
field-of-view and 30 m range for the geometric sensor model that produced the local
map, although for the neural network approach, the image field-of-view was held at
110°, which is consistent with the rest of our image-based results.


Figure 5-7: Simulation results. Perceived novelty of images encountered online, and
average navigation speed, as a function of the number of training images. Blue
indicates hallway environments and red indicates "cylinder-forest" environments.

5.4.3 Online Learning Simulation Results

We tested our online, self-supervised approach in simulation with a single run through
each of eight random hallway-type environments followed by eight random "cylinder-
forest"-type environments, collecting data and re-training our models after each run.
Examples of these types of environments are shown in Figure 3-3. Navigating through
each environment contributed about 1250 images to the dataset, so that after 16 trials,
the system had collected 20,000 images in total. For reporting performance statistics,
we performed 10 separate testing runs for each amount of training data. All simulation
trials were completed without collision, demonstrating the safety of our approach.

Figure 5-7 illustrates the performance of our simulated system as a function of
the number of images observed. The first eight data points (blue) represent hallway
environments, while the second eight (red) represent "cylinder-forest" environments.
As the system gained more training data from hallway-type environments, it trusted
the learned model of collision probability on a greater fraction of observed images,
resulting in faster average speed of navigation. As the novelty fraction decreased,
the average speed of navigation increased and then reached a plateau. In total, the
average speed increased 42% from 4.90 m/s to 6.97 m/s in the hallway environments.

After completing trials in the hallway environments, we tested our system in
"cylinder-forest" environments, carrying over the complete dataset and trained models
from the final hallway test. Initially, all images were again classified as novel since
the forest environment type is visually distinct from the hallway environment type.
However, as our system accumulated images from the forest environments, the same
pattern emerged: Decrease in novelty with corresponding increase in speed. In forest
environments, the speed increased 50% from 5.84 m/s to 8.80 m/s. The reason
forest navigation was faster than hallway navigation, both before and after training,
is that there is on average more visible free space ahead of the robot in the forest
environments, since there are no sharp turns to create occlusions.

5.4.4 Experimental Results using LIDAR

We performed a total of 80 experimental trials using a small autonomous RC car in
much the same way that we performed our simulation trials, except that instead of
re-training after each run, we re-trained after every 10 runs through the environment
since the experimental environment was smaller than the simulation maps and yielded
fewer images per run. A generated map of the experimental environment is shown in
Figure 5-8 and an image from this environment is shown in Figure 5-11(a).
We observed essentially the same qualitative results in our experiments as we did
in simulation. As the system collected more training images, the fraction of images
considered novel decreased, resulting in a corresponding increase in navigation speed,
as illustrated in Figure 5-9. We observed an average speed increase of 25%, from
3.38 m/s to 4.24 m/s. Since the experimental environment was small compared to
the simulation environment, a substantial portion of the navigation time was spent
accelerating from the start and decelerating at the goal. Therefore, we also provide
the average maximum speeds observed for each amount of training data, where we
see a much more substantial improvement of 55%, from 3.79 m/s to 5.90 m/s. This
improvement in maximum speed implies that if the experimental environment were
larger, we would observe a greater improvement in average speed as well.
Figure 5-12 illustrates the autoencoder reconstruction error for three categories of
data after training on nearly 9000 training images from numerous runs through the
experimental environment depicted in Figures 5-8 and 5-11(a). The blue histogram and
PDF indicate the reconstruction error observed for the training images, from which


Figure 5-8: LIDAR-mapped hallway environment for training and testing.

7
- Average
6 - - Maximum)
S
S0.5 -eC) 5
C)
4

n
0 2000 4000 6000 8000 0 2000 4000 6000 8000
Number of Training Images Number of Training Images

Figure 5-9: Experimental results for 80 experimental trials. Each data point repre-
sents the average over 10 trials with a given model between re-training episodes.

the threshold (dashed line) was computed. The red histogram and PDF represents
the reconstruction error of approximately 1000 images collected during testing runs
in the same environment, but not used for training the autoencoder. While the
reconstruction error of these images was not as low as the training images, most fell
below the novelty threshold. Finally, the black histogram and PDF illustrate the
reconstruction error of a truly novel environment-a well-lit lab area with obstacles,
shown in Figures 5-10 and 5-11(b). Visually, this environment was quite different from
the training environment, and that is reflected in the much greater reconstruction
error, with all images falling above the novelty threshold.

Figure 5-10: Xtion-mapped novel environment used for LIDAR-free testing.

(a) Familiar Environment. (b) Novel Environment.

Figure 5-11: Images from the training hallway (a) (see map in Figure 5-8), and a
novel environment that was not used in training (b) (see map in Figure 5-10).

[Histogram of reconstruction errors, with curves for Training, Testing - Familiar
Environment, and Testing - Novel Environment, and the dashed novelty Threshold.
Horizontal axis: Autoencoder Loss (Reconstruction Error), 0 to 0.04.]

Figure 5-12: Histogram of autoencoder reconstruction errors for experimental data.

5.4.5 Experimental Results using Monocular VIO and RGBD

We performed experiments on our RC car using a LIDAR-free sensor configuration
in order to demonstrate steps toward a completely vision-based learning and navi-
gation pipeline. However, as noted previously, inaccuracies in monocular VIO made
this sensor configuration unsuitable for demonstrating the entire training and testing
procedure. Nevertheless, we were able to test the LIDAR-free sensor configuration in
the environments depicted in Figures 5-8, 5-10, and 5-11 using the collision-prediction
and autoencoder models previously trained with the other sensor configuration.
In the hallway training environment, we achieved a mean speed of 3.26 m/s and
a top speed of 5.03 m/s. This result significantly exceeded the maximum speeds
achieved when driving in this environment under the prior estimate of collision prob-
ability before performing any learning. On the other hand, in the novel environment,
for which our model was untrained, the novelty detector correctly identified every
image as being unfamiliar. In the novel environment, we achieved a mean speed of
2.49 m/s and a maximum speed of 3.17 m/s.
We emphasize that these results represent an improvement of 31% in mean speed
and 59% in maximum speed after learning, for monocular vision-based navigation
using a neural network for planning. Naturally, the robot navigates more slowly in
the novel environment than it does in the familiar one, but our method enables the
robot to safely navigate in the novel environment, potentially gathering data and
re-training, which would not be safe without novelty detection.

5.5 Generalization in Real Environments

We have now shown that learning from camera images can be used to improve naviga-
tion performance in the same real-world environment that produced the training data,
and in simulation environments where randomly-generated instances of a particular
environment type share the same textures, lighting and appearance.
We now aim to answer the question of whether our learning methods are able to
generalize to new environments, or whether they simply memorize the exact pixel-
by-pixel appearance of each particular environment on which they have been trained.
Zhang et al. [126] observe that while many deep neural network models generalize
very well, they also usually have enough representational flexibility to memorize each
data point in their training dataset, so we must carefully examine whether our deep
neural networks are learning meaningful models or trivial representations. We find
that for hallway environments in various buildings at MIT, our learning methods do
generalize, and we will demonstrate that capability using autoencoders to measure
apparent novelty and to illustrate the qualitative nature of the models being learned.
Figure 5-13 shows a selection of 16 different hallways in various buildings at MIT.
We collected approximately 5000 images from each one by manually driving our exper-
imental RC car down each hallway five times, in a combination of straight
and randomized turning motions. We did nothing to alter the environments, so many
of them contained occasional pieces of furniture, garbage cans, liquid nitrogen tanks,
plywood crates, and other items in the hallways.

(a) #1: (b) #2: (c) #3: (d) #4: (e) #5: (f) #6: (g) #7: (h) #8:
Bldg 26, Bldg 26, Bldg 26, Bldg 26, Bldg 16, Bldg 56, Bldg 16, Bldg 36,
5th Floor 4th Floor 3rd Floor Basement Basement Basement 3rd Floor 2nd Floor

(i) #9: (j) #10: (k) #11: (l) #12: (m) #13: (n) #14: (o) #15: (p) #16:
Bldg 37, Bldg 2, 3rd Bldg 2, 3rd Bldg 16, Bldg 16, Bldg 2, 2nd Bldg 2, 2nd Bldg 6, 2nd
5th Floor Floor Floor 2nd Floor 5th Floor Floor Floor Floor

Figure 5-13: MIT hallways used for data collection.

The hallways all share the same essential geometric characteristics of a hallway,
but they differ in their visual appearance. Some of these differences include textures
and patterns in the floor, walls and ceiling, the color and intensity of light, the pattern
of lighting fixtures, the reflections in the floor, and many other subtle characteristics.

5.5.1 Demonstrating Generalization

To test generalization, we trained a family of autoencoders, each one using data from
a different number of hallways for training, between 1 and 13. For instance, the first
autoencoder was trained on hallway #1 only (about 5000 images total), the second
autoencoder was trained on hallways #1 and #2 (about 10,000 images total), etc.,
and the thirteenth autoencoder was trained on hallways #1 through #13. After
training was completed, we then tested the ability of each of these autoencoders to
reconstruct images of the remaining three hallways that were never seen by any of
the autoencoders during training, i.e., hallways #14, #15 and #16. To test each au-
toencoder, we cycled through all images from the testing environments, recorded the
reconstruction errors, and classified those images as "novel" or "familiar" depending on
whether the reconstruction errors fell above or below the threshold, which was
determined according to the method described above.

[Plot: fraction of test images classified as novel (%) versus the number of selected
training hallways included in the training dataset.]

Figure 5-14: As we trained our novelty detectors on more and more hallway envi-
ronments, they performed better and better at reconstructing unseen testing hallway
environments. The black line illustrates the overall performance, averaged over 10
random permutations of training and testing hallways. Each colored line illustrates
the performance on each individual random permutation of data and choice of which
environments were used for training vs. testing.
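
For concreteness, this novelty test can be sketched in a few lines. The following is
a minimal NumPy illustration rather than our exact implementation: the
`autoencoder.reconstruct` call is a placeholder for a trained model, and the
per-image error is written here as a mean absolute difference.

import numpy as np

def reconstruction_errors(autoencoder, images):
    # `images` is an (N, D) array of flattened, normalized images;
    # `autoencoder.reconstruct` stands in for a trained model.
    recons = autoencoder.reconstruct(images)
    return np.mean(np.abs(images - recons), axis=1)

def novelty_threshold(train_errors, percentile=95.0):
    # Threshold set at a chosen percentile of the training-set errors.
    return np.percentile(train_errors, percentile)

def classify_novel(test_errors, threshold):
    # True where the reconstruction error exceeds the threshold.
    return test_errors > threshold

# Example: fraction of a held-out hallway's images classified as novel.
# thresh = novelty_threshold(reconstruction_errors(ae, train_images))
# frac = np.mean(classify_novel(reconstruction_errors(ae, test_images), thresh))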

The results are shown by the black line in Figure 5-14. These results show that
the perceived novelty of unseen testing hallways dropped monotonically as the au-
toencoder was exposed to more and more different training hallways. After training
on only one hallway, virtually all test images were classified as novel. This result
indicates that a single hallway does not contain enough variation to capture the char-
acteristics of hallways in general. But, after training in ten or more hallways, the
test hallways were mostly familiar. After training in thirteen hallways, only 9.4% of
images from the test hallways were considered novel, indicating that the autoencoder
had learned to generalize fairly well to new hallways.

Do the results vary if we use different sets (or orderings) of hallways for training
and testing? For example, what if we train on hallways 4, 9, 13, 1, 2, 14, 11, 12, 8,
6, 3, 16, and 15 and test on hallways 5, 10 and 7, as opposed to training on 1-13 and
testing on 14-16? We found that the selection and ordering of training hallways made
some difference, but that the overall trend was consistent across permutations of the
data. To test the importance of this effect, we shuffled the selection of which hallways
would be used for training and which would be used for testing, and repeated the test
outlined above for 10 random selections of training and testing hallways. The colored
lines in Figure 5-14 show the individual results for each permutation, while the black
line illustrates the average over all permutations.

The reason different permutations may perform differently is that if a hallway
has no unusual features (like hallway #5 in Figure 5-13(e)), or is in some sense a
representative "average" hallway, then it may be easier to reconstruct accurately with
the learned features, causing it to appear more familiar than a more complicated
hallway. The opposite scenario is also true. If a hallway has unusual characteristics
like a unique and visually prominent tile pattern on the floor that does not appear
in any training examples (the hallways in Figures 5-13(a)-5-13(c)), then the autoen-
coder may not be able to reconstruct it well because it has not learned the necessary
features to do so, which will make that test environment appear more novel. By ran-
domizing which environments are used for training and testing, we avoid exploiting
some accidental benefit due to the particular characteristics of certain environments.
All individual permutations of data showed the same general trend.

To show that our autoencoders are learning a meaningful representation, rather
than a trivial mapping that copies the input pixel-by-pixel, we must test them on a
variety of images including some that deviate from the training set characteristics.
First, our autoencoders classified 12.9% of images from the second floor hallways
of the MIT Stata Center as novel. This number is slightly higher than the novelty
classification of hallways illustrated in Figure 5-13, due in part to the different physical
characteristics of the Stata Center, specifically the dark carpeted floors and occasional
glass or wooden walls. Nevertheless, in classifying most Stata Center hallway images
as familiar, our autoencoders appear to have captured the main characteristics of
hallways well enough to generalize to this different appearance.

On the other hand, our autoencoders classified 91.5% of images from the out-
door Stata Center courtyard as novel, and classified 94.0% of images from the Stata
Center parking garage as novel. These results seem intuitively sensible since these en-
vironments are visually distinct from hallways and should be treated differently by a
planner. When the autoencoder misclassified a courtyard or parking garage image as
familiar, it was often the case that some elements of the environment actually caused
the image to look similar to a hallway. We will show an example of this scenario
below. However, in other cases, some images were relatively featureless, making them
easier to reconstruct with low pixel-value error despite being fundamentally different
from hallway environments. Some of these misclassifications were likely due to the
extremely low image resolution and lack of color. This is a potential failure mode
of the autoencoder method of novelty detection that should be addressed in future
work. It is likely that sharper images would lead to a sharper set of learned features,
and perhaps more accurate discrimination between environment types.

To perform this analysis, we made some modifications to the autoencoder pre-
sented above. First, to handle the greater natural variation of real data from 16
hallways, we increased the number of hidden nodes per layer from 200 to 400. This
increase gave the autoencoder sufficient capacity for training and capturing varia-
tion, but still imposed a significant information bottleneck, forcing the autoencoder
to learn a compact representation rather than simply copying pixel values. To train
more efficiently, we switched the hidden layer activation functions from sigmoid to
rectified linear units (ReLUs). We also replaced the sum of squares loss function in
equation (5.7) with a sum of absolute differences (SAD) loss function, which we found
to be slightly more effective at accurate reconstruction. However, we did not perform
an exhaustive evaluation of different loss functions or network architectures and there
are likely others that also perform well.
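
A minimal sketch of this modified architecture, written here in PyTorch for
illustration, is shown below. The 400-unit hidden layers, ReLU activations and SAD
loss follow the description above; the bottleneck width and the remaining details
are illustrative assumptions rather than the exact configuration we used.

import torch
import torch.nn as nn

class HallwayAutoencoder(nn.Module):
    def __init__(self, input_dim, hidden=400, bottleneck=100):
        super().__init__()
        # ReLU hidden layers of width 400, as described above; the
        # bottleneck width is an illustrative assumption that imposes
        # the information bottleneck discussed in the text.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, input_dim), nn.Sigmoid(),  # pixels in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Sum-of-absolute-differences (SAD) reconstruction loss.
sad_loss = nn.L1Loss(reduction='sum')

# One training step (model, optimizer and batch are assumed to exist):
# recon = model(batch); loss = sad_loss(recon, batch); loss.backward()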

To handle the wide variety of lighting conditions and ensure that we were not
simply classifying environments on the basis of their overall brightness, we normalized
all images to have a mean pixel intensity of 0.5 and clamped any outlying pixel
intensities to the interval [0, 1]. Finally, we reduced the threshold from the 99th to
the 95th percentile to account for the non-hallway images that were incidentally included
in the training and testing datasets, for instance images at the end of each hallway
that do not conform to general hallway characteristics. These non-hallway images are
outliers in the training datasets and tend to skew the distribution of reconstruction
errors upward. While this step would not be necessary when learning environments
in general, where there is no clear distinction between environment types, it is useful
for illustrative purposes to more clearly distinguish hallways from non-hallways.

(a) Input image. (b) Reconstructed image.

Figure 5-15: Example of an image being correctly labeled as novel. The red box
in the original image (a) contains sunlight backlighting a frosted glass wall and a
person emerging from a door, neither of which are present in any training examples.
Therefore, the autoencoder can only approximate these features with a blank wall,
seen in the red box in the reconstructed image (b).
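
A minimal sketch of the brightness normalization described above, under the
assumption that pixel intensities are represented as floating-point values:

import numpy as np

def normalize_brightness(img, target_mean=0.5, eps=1e-8):
    # Rescale so the mean pixel intensity is `target_mean`, then clamp
    # any outlying intensities to the interval [0, 1].
    img = img.astype(np.float64)
    img = img * (target_mean / max(float(img.mean()), eps))
    return np.clip(img, 0.0, 1.0)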

5.5.2 What Makes a Hallway Image Appear Novel?

Hallway images with no unusual characteristics will generally be classified as familiar
by autoencoders that have been trained on a sufficient number of hallways. But, if
there are rare objects or unusual lighting conditions, like sunlight coming through an
open office door, then the image may be classified as novel. Figure 5-15 shows an
example image of this scenario being reconstructed by an autoencoder that was not
trained on any similar scenarios. The red box in the upper left corner contains an
unusual pattern of sunlight backlighting a frosted glass wall, with a person emerging
from an open door. However, the corresponding red box in the reconstructed image
contains none of these features. Instead, it replaces the unusual lighting and the
person with a blank wall of approximately the same brightness. Note that the rest
of the image outside of the red box is reconstructed fairly accurately, including the
reflections on the floor, molding along the base of the wall, the bulletin board on the
right wall, and the door frame ahead. If not for the unusual contents of the red box,
this image would likely have been classified as familiar.

(a) Input image. (b) Reconstructed image.

Figure 5-16: Example of an image being correctly labeled as familiar. The input image
is displayed in (a) while the reconstruction is displayed in (b). This reconstruction
contains some notable features, like the doors and poster boards on the walls of the
hallway, as well as a reflection of the lights and doors on the floor. The door on the
right side of the image is curved due to the distortion of the wide-angle camera lens.

In situations such as this, the novelty detector is correct to label this image as
novel. Having never encountered any similar situations during training, the system
would have no way to know whether the red box contained a hazard, and ought to
slow down or revert to some baseline safe behavior.

5.5.3 Reconstruction Examples

To understand the characteristics of our hallway-trained models, we first show an
example of a hallway image (not used in training) being accurately reconstructed. The
original image and the reconstruction are illustrated in Figure 5-16. The reconstruc-
tion looks like a softened or blurred version of the input, but captures many of the
common features in a hallway, including the visually distinct straight lines along the
edges of the floor and ceiling, converging to a vanishing point. The reconstruction also
captures the poster board and doors along the walls of the hallway, the approximate
texture of the ceiling with lights, and the reflection of visual features in the floor.
In fact, many of the characteristics of the reconstruction in Figure 5-16(b) can be
observed in the weights of the output layer of an autoencoder trained on these types
of hallways. Figure 5-17 illustrates a random selection of these weights, displayed
as images.

Figure 5-17: Randomly selected weights from the output layer of an autoencoder
trained on hallway images. These weights have been displayed as image filters to
provide some intuition about the models being learned. Each filter displays some
common characteristics of a hallway, including the outline of the floor and ceiling
converging to a vanishing point, and rectangular shapes in the center, corresponding
to a doorway or intersection at the end of the hallway.

Each one of these weight-filters contains the characteristic outline of the
floor, walls and ceiling as straight lines converging to a vanishing point. These filters
also have a rectangular shape at the center, representing a doorway or intersection
at the end of the hallway. Finally, each filter has a pattern of lines on the walls and
ceiling that can be used to reconstruct patterns like doors and lights. Perhaps the
most subtle feature of these filters is a reflection of visual features across the horizon,
which occurs since most of the training hallway floors were reflective.
While these features were specifically tuned to identify hallways, there are occa-
sions when an image from a non-hallway environment happens to conform
to the visual appearance of hallways and gets reconstructed accurately as a result.
Figures 5-18(a)-5-18(b) illustrate this scenario with an image of the parking garage
in the MIT Stata Center. In the input (Figure 5-18(a)), the wall to the right side of
the image and the row of columns and lights to the left roughly follow a hallway-like
structure. The reconstruction in Figure 5-18(b) is sufficiently accurate for this im-
age to be labeled as familiar. This classification could be considered incorrect since
the parking garage is not a hallway. But, it is reasonable to expect that if we scale
these methods to higher-resolution color images and large state-of-the-art convolu-
tional neural networks, these types of accidental spoofing events will become more
rare. After all, there must be sufficient information in the image to make an accurate
classification, and even to the human eye, Figure 5-18(a) could look like a hallway.
On the other hand, Figure 5-18(c) shows an example of an outdoor scene, con-
taining grass, trees, and a backlit sky, which looks nothing like a hallway. Since the
learned features do not match this input, the reconstructed image in Figure 5-18(d)
is inaccurate. As a result, this image is correctly classified as novel with respect to
the hallway training data. Interestingly, the reconstruction contains some common
characteristics of hallway images that are absent in the input.

(a) Input (garage). (b) Output (garage). (c) Input (courtyard). (d) Output (courtyard).

Figure 5-18: Examples of input images from non-hallway environments and their
reconstructions. (a) and (b) are from the parking garage at the MIT Stata Center.
The input happens to appear like a hallway, despite being a non-hallway environment,
and is therefore reconstructed accurately and classified as familiar. (c) and (d) are
from the outdoor courtyard at the MIT Stata Center. The input looks very unlike a
hallway and is therefore reconstructed poorly, resulting in a classification as novel.

5.6 Limitations and Future Directions for Novelty Detection Methods

As we have shown, the feedforward autoencoder method works reasonably well for
highly structured data like MNIST digits and the robot navigation environments
we use in this thesis, where we have many similar examples that lie on a fairly low-
dimensional manifold of images. However, this method might not be expected to work
well for highly varied datasets of unique images like CIFAR-10, which would require a
large set of general-purpose image features to model, rather than environment-specific
features. Such datasets may require larger convolutional autoencoders, semantic or
discriminative training, or other techniques to capture the true structure of the data.
The underlying assumption that makes the autoencoder approach reasonable is
that equation (5.8) has some characteristics that relate it to (5.6). Specifically, (5.8)
provides a measure of distance that is low for data points near the training data, and
larger elsewhere. However, it may be the case that as an autoencoder is exposed to
a wide variety of natural images, it will develop general-purpose features that help
it to reconstruct general images, not just the images from its training dataset. In
that case, the autoencoder loss may be low for test data that are unlike the training
data, resulting in unfamiliar images being incorrectly classified as familiar. We must
be careful to protect against this type of failure mode, perhaps by enforcing a tight
bottleneck in the autoencoder architecture so that the hidden representation is very
low dimensional, or by imposing regularization terms to encourage sparsity.
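
As one concrete example of such a regularizer, an L1 penalty on the hidden code can
be added directly to the reconstruction loss. This PyTorch-style sketch assumes the
SAD reconstruction term used above and an illustrative penalty weight:

import torch

def sparse_reconstruction_loss(x, x_recon, code, lam=1e-3):
    # SAD reconstruction term plus an L1 penalty on the hidden code;
    # `lam` is an illustrative penalty weight, not a tuned value.
    recon = torch.sum(torch.abs(x - x_recon))
    sparsity = lam * torch.sum(torch.abs(code))
    return recon + sparsity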

As an alternative to autoencoder-based novelty detection, we might also consider
methods such as kernel density estimation, as proposed by Bishop [20], or search-
ing the dataset for nearest neighbors in pixel-space or some feature space. However,
such non-parametric or nearest-neighbor methods would be much more computation-
ally expensive since they would require querying or iterating over a high-dimensional
dataset of images for each prediction. Furthermore, proximity in pixel-space is not an
effective measure of similarity for images, although certain types of image descriptors
may perform well for density estimation. Krizhevsky et al. [68] proposed a method of
content-based image retrieval using the 4096 learned features at the final hidden layer
of a discriminatively-trained image classification network (AlexNet), which combines
learned feature representations with non-parametric or nearest-neighbor approaches.
Their method is to project a query image into this 4096-dimensional feature space
and then find nearest neighbors using Euclidean distance as a metric of similarity.
Because these features have been trained discriminatively using object labels, this
method produces images that are similar at a semantic level, not necessarily a pixel
level. For instance, from a single image of a purple flower or an elephant, they were
able to retrieve other pictures of purple flowers or elephants in different poses.
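
A sketch of this style of retrieval, assuming a feature-extraction function (here
called `embed`) that maps images to the network's final hidden layer:

import numpy as np

def nearest_neighbors(query_feat, train_feats, k=5):
    # Euclidean distance from the query embedding to every training
    # embedding; return the indices and distances of the k closest.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Usage (hypothetical `embed` maps images to, e.g., 4096-d features):
# train_feats = embed(train_images)
# idx, d = nearest_neighbors(embed(query_image), train_feats)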

In many ways, similarity in a semantic or conceptual space may be more impor-
tant than similarity in pixel space because it reflects a deeper understanding of the
image contents than could be accomplished by pixel-level reconstruction. Krizhevsky
et al. [68] contrast this proposed method with their earlier work [67], which used a
deep autoencoder to learn a low-dimensional feature representation for content-based
image retrieval. They observed that their autoencoder-based representation tended
to retrieve images with similar pixel patterns whether or not they were semantically
related. This is the same type of potential failure mode we observed with the image
of the parking garage in Figure 5-18(b).
If we had a learned feature representation where Euclidean distance was an effec-
tive and meaningful measurement of similarity, we could project all of our training
images into that feature space and make kernel-based non-parametric predictions ex-
actly as we did in Chapter 3. We could then use local data density as a measure
of confidence in our neural network predictions. Currently, the size and complex-
ity of neural networks required to perform these feature mappings are too great to
run in real time on a CPU, and querying a large 4096-dimensional dataset would
be prohibitively expensive, but GPU hardware acceleration and sophisticated data
structures are beginning to alleviate these difficulties.
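
As an illustration, a Gaussian-kernel estimate of local data density in such a
feature space might look like the following; the bandwidth and the choice of kernel
are assumptions made for the sketch:

import numpy as np

def local_density(query_feat, train_feats, bandwidth=1.0):
    # Unnormalized Gaussian-kernel density of the query point under the
    # training embeddings; low values indicate unfamiliar inputs, and
    # could be used to discount the learned model's predictions.
    sq_dists = np.sum((train_feats - query_feat) ** 2, axis=1)
    return np.mean(np.exp(-0.5 * sq_dists / bandwidth ** 2))
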
Other deep learning methods could also be used to detect novelty in other ways.
For instance, deep restricted Boltzmann machines (RBMs) have been used as genera-
tive models for images, meaning that they model the distribution over their training
images, and therefore could potentially be used to recover the likelihood of a query
image under the learned model. If we were able to compute the likelihood of an image
or probability that an image was drawn from the training distribution, we could apply
Bayesian methods as proposed by Bishop [20].
In this chapter, we have used the novelty detection method described by Japkowicz
et al. [56] and Pomerleau [94] for its simplicity and efficiency. But, we expect that as
new techniques mature, they may provide more effective methods of novelty detection
that can serve the same role in our vision-guided autonomous navigation approach.

5.7 Chapter Summary

In this chapter, we have demonstrated a method of safely learning a collision pre-
diction model from camera images collected online. We have shown that by using
novelty detection as a measure of uncertainty in our collision prediction network, we
can transition intelligently between the high performance of the learned model and
the safe, conservative performance of a simple prior, depending on whether the sys-
tem has been trained on the relevant data. In contrast with nearly all deep learning
applications, this novelty-detection approach enables us to deploy our system into
the real world with no training, so that it can safely gather data while performing
its specified navigation tasks and improving each time it re-trains its models on the
ever-growing dataset.
Our combination of a neural network collision predictor with novelty detection
mimics the Bayesian capability that we developed in Chapter 3, but does not rely
on manually designed predictive features. This advantage enables us to process raw
sensor data like camera images, which contain many rich contextual cues that can
significantly improve navigation performance. We expect that in future work, the
content of images will enable not only collision prediction, but also high-level
guidance and context-dependent behaviors. Furthermore, we anticipate that larger,
more sophisticated deep learning models can be used both for collision prediction
and novelty detection. As these models are developed, they may lead to more robust
and reliable performance, as well as generalization across planning scenarios that are
similar at a conceptual level rather than merely a pixel level.

Chapter 6

Conclusion

In this thesis, we have developed several methods for high-speed navigation in un-
known environments using machine learning. Our high-level goals were to understand
the limitations of conventional techniques for planning in unknown environments, and
to take steps toward overcoming those limitations. The basic challenge we face is sim-
ply that it is impossible to meaningfully plan actions with respect to unknown regions
of the environment without some form of information about what lies beyond the map
frontiers. Without any knowledge about the unknown, the best a planner can do is
to drive toward the goal in hopes of observing relevant portions of the map along its
way, so that it can proceed.
Learning from experience allows us to endow a planner with knowledge that may
be useful for planning, even if it does not reveal the exact structure of the unknown
map. Our claim in this thesis is that there are general environmental characteristics
that can be learned, implicitly or explicitly, and used to plan more effectively in new,
unseen environments. But, the means of expressing this knowledge depends on the
constraints and objectives of the planner.
In Chapter 3, we showed that we could lift the conventional safety constraint
and replace it with a learned model of collision probability that would guide the
robot to make safe action choices at higher speeds than would be feasible under the
conventional constraint. This model endowed the planner with implicit knowledge of
what environment structures might lie behind every occlusion and enabled it to act
based on the common, average-case environmental structures rather than on the
worst-case conservative assumptions.
In Chapter 4, we showed that even if the planner were required to enforce the con-
ventional safety constraint, we could endow it with another type of knowledge that
would guide it to observe more of the relevant parts of the environment. In this case,
rather than using the learning to represent implicit knowledge of environment struc-
tures, we used learning to help the robot directly observe those structures, thereby
gaining additional map knowledge to plan within and navigate faster as a result.
In these respects, we have taken concrete steps toward achieving the goals of this
thesis. While we have almost certainly not reached the maximum possible perfor-
mance using these proposed methods, it is likely that machine learning has an impor-
tant role to play in continuing to advance the performance and push the boundaries
of what is possible. We now propose some suggestions for future work that would
advance the directions of research explored in this thesis.

6.1 Future Work: Predicting Collisions

To develop an accurate, nuanced model of collision probability, it will be necessary
to use a richer feature representation than the four hand-coded features we used in
Chapter 3. One way to enrich the feature representation would be to train a neural
network, whose internal feature representation adapts to the training data. This is
the approach we took in Chapter 5. However, learning features through a neural
network approach still has the major shortcoming that the resulting features are not
necessarily stable, transferable, human-understandable concepts. If a learned model
used semantic features, it might be easier to compactly express complex scenarios like
"slow down before the driveway because a car may pull out onto the road." Descriptive
semantic features may enable the robot to learn from much less data, and they may
make it easier for a human expert to teach the system or provide prior knowledge.
Semantic features will also be useful in predicting route viability, which can be
viewed as a high-level abstract version of collision prediction. Determining route
viability means answering the question of whether a given path may lead to the goal,
and if so, how directly it will lead there. For example, a robot trying to exit a building
should not look for the corner of its current room that is closest to the goal, in hopes
of finding a path. Instead, it should look for a door, perhaps labeled with an exit sign.
Rooms, doors and signs are semantic or conceptual entities that can be understood
by a planner equipped with the right feature representation.
We also note that collision prediction and other types of hazard prediction are
completely compatible with conventional planning methods, and can be used to avoid
risks for which no conventional safety constraint exists. Consider an autonomous car
driving past a park or school. Even if the car is obeying the speed limit, it may
still decide to slow down further since there may be children playing near the street.
This is a case where the car may further reduce its risk by slowing down even if it
is moving slowly enough to stop before colliding with any of the observed obstacles.
This situation may also apply if the autonomous car observes erratic behavior by
some other car on the road ahead.

6.2 Future Work: Predicting Measurement Utility

Our method of predicting measurement utility can certainly benefit from the ex-
panded and semantic types of feature representations we have discussed above. But
there are other areas of improvement that could help as well. First, we have used only
a one-step look ahead for training. This design choice limits complexity to a modest
level, but may also limit the strength of the training signal. It could be that in some
cases, the robot must anticipate two or more actions in advance that it will need to
position itself for a particular measurement, which could only be learned by having a
multi-step look ahead during training.
We only demonstrated our approach in hallway-type environments since these are
very common environments and our intuition for the problem was derived from the
challenge of rounding a blind corner. Another useful area of future work might be to
characterize other types of environments where specific out-of-the-way observations
are important for good performance. When navigating in unfamiliar environment
types, it might also be beneficial to incorporate some prior knowledge, as we did with
collision prediction. A reasonable prior might encourage the robot to plan actions
that enable views of the map frontiers, which is a good general guiding principle of
exploration and agrees with the models we learned from training in this work.
Finally, in a high-level planning framework that depends on semantic understand-
ing of the world, it is likely that robots will need to seek out very specific pieces of
information. For example, in order for a robot to exit a building, it will first need to
find an exit route. Equipped with some learned knowledge of where to look for exit
signs (for instance, above doors in a hallway), it may be able to find an exit route
much more quickly than by random search. In that sense, a measurement revealing
an exit sign would be much more valuable than any other measurement. The same
type of formulation could be constructed for other kinds of robotic search problems.

6.3 Future Work: Learning Predictive Features

Feature learning through neural networks is perhaps the area with the greatest po-
tential for future work, with the first and most apparent extension being one of scale.
One of the most important extensions of our work in Chapter 5 will be to train on
larger datasets covering a much wider variety of scenarios and environment types. At
the same time, we will need to experiment with larger and more sophisticated neural
networks with convolutional architectures, which have dominated the state of the art
in image recognition and semantic labeling in recent years. It remains to be seen how
well autoencoders with various architectures may work for novelty detection as the
sizes of datasets and networks scale up, and whether they can be adapted to estimate
a probability or degree of novelty that is more nuanced than a binary prediction.
As we scale the neural networks and the size and diversity of training data, it
will be interesting to observe the generalization capabilities of our approach. For
instance, how much training data, and what types of training data, will be needed
to correctly predict collisions in general indoor environments? What about natural
environments? It seems intuitive that as our neural networks are trained on more and
more types of scenarios, eventually the internal feature representation will approach
some stationary point so that any additional training data will fit with the learned
model. How much data will it take to reach this point? Graphical simulators and
video game engines provide a potential source of vast amounts of training data, if it
can be made realistic or diverse enough.
In this thesis, we have focused only on single camera images, but for highly dy-
namic systems, it may also be useful to consider a history of images since the history
encodes the dynamics in a way that cannot be captured by a single image. We could
simply extend the neural network input to accept multiple images, or we could con-
sider using a recurrent neural network to learn temporal patterns. Processing images
over time opens many further questions about how to perform novelty detection or
uncertainty quantification for an image sequence.
One of the goals in this line of research is to explore the limits of what robots
can accomplish without maintaining a detailed estimate of the geometry of their
surroundings, and instead rely on learned models for sensor processing. We have
made some initial steps in this direction by testing collision prediction in place of
any conventional collision checking, but there is much more work to be done. For
instance, can we also compute effective heuristic functions directly from raw sensor
data? Can we develop all-purpose collision-avoidance models that can be implemented
efficiently in special ultra-low power neural-inspired circuitry for extremely small and
fast-moving robots? As robotic technology advances on all fronts, our hope is that
this thesis provides a useful step toward maximizing the performance of highly agile
autonomous systems, and demonstrates the future potential for machine learning to
play a transformative role in that development.

Bibliography

[1] Microsoft Kinect sensor. https://msdn.microsoft.com/en-us/library/hh438998.aspx.
Accessed: 2016-12-29.

[2] notMNIST dataset. http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html.
Accessed: 2017-01-26.

[3] Pieter Abbeel, Adam Coates, and Andrew Y. Ng. Autonomous helicopter aerobatics
through apprenticeship learning. International Journal of Robotics Research, 2010.

[4] Anayo K. Akametalu, Jaime F. Fisac, Jeremy H. Gillula, Shahab Kaynama,
Melanie N. Zeilinger, and Claire J. Tomlin. Reachability-based safe learning with
Gaussian processes. In Proc. IEEE Conference on Decision and Control, 2014.

[5] Daniel Althoff, James J. Kuffner, Dirk Wollherr, and Martin Buss. Safety
assessment of robot trajectories for navigation in uncertain and dynamic
environments. Autonomous Robots, 2012.

[6] Sankalp Arora, Sanjiban Choudhury, Sebastian Scherer, and Daniel Althoff. A
principled approach to enable safe and high performance maneuvers for autonomous
rotorcraft. In Proc. AHS 70th Annual Forum, 2014.

[7] Sankalp Arora, Sanjiban Choudhury, Daniel Althoff, and Sebastian Scherer.
Emergency maneuver library: ensuring safe navigation in partially known
environments. In Proc. IEEE International Conference on Robotics and Automation,
2015.

[8] Abraham Bachrach. Trajectory Bundle Estimation For Perception-Driven Planning.
PhD thesis, Massachusetts Institute of Technology, 2013.

[9] Abraham Bachrach, Samuel Prentice, Ruijie He, and Nicholas Roy. RANGE: Robust
autonomous navigation in GPS-denied environments. Journal of Field Robotics, 2011.

[10] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep
convolutional encoder-decoder architecture for image segmentation. arXiv preprint
arXiv:1511.00561, 2015.

[11] J. Andrew Bagnell, David Bradley, David Silver, Boris Sofman, and Anthony
Stentz. Learning for autonomous navigation. IEEE Robotics & Automation Magazine,
2010.

[12] Haoyu Bai, David Hsu, Wee Sun Lee, and Vien A. Ngo. Monte Carlo value
iteration for continuous-state POMDPs. In Algorithmic Foundations of Robotics IX,
2010.

[13] Haoyu Bai, David Hsu, Mykel J. Kochenderfer, and Wee Sun Lee. Unmanned
aircraft collision avoidance using continuous-state POMDPs. In Proc. Robotics:
Science and Systems, 2012.

[14] Haoyu Bai, David Hsu, and Wee Sun Lee. Integrated perception and planning in
the continuous space: A POMDP approach. International Journal of Robotics
Research, 2014.

[15] Dan Barnes, Will Maddern, and Ingmar Posner. Find your own way: Weakly-
supervised segmentation of path proposals for urban autonomy. In Proc. IEEE
International Conference on Robotics and Automation, 2017.

[16] Kostas E. Bekris and Lydia E. Kavraki. Greedy but safe replanning under
kinodynamic constraints. In Proc. IEEE International Conference on Robotics and
Automation, 2007.

[17] Felix Berkenkamp, Andreas Krause, and Angela P. Schoellig. Bayesian
optimization with safety constraints: Safe and automatic parameter tuning in
robotics. arXiv preprint arXiv:1602.04450, 2016.

[18] Felix Berkenkamp, Angela P. Schoellig, and Andreas Krause. Safe controller
optimization for quadrotors with Gaussian processes. In Proc. IEEE International
Conference on Robotics and Automation, 2016.

[19] Andreas Bircher, Mina Kamel, Kostas Alexis, Helen Oleynikova, and Roland
Siegwart. Receding horizon "next-best-view" planner for 3D exploration. In Proc.
IEEE International Conference on Robotics and Automation, 2016.

[20] Christopher M. Bishop. Novelty detection and neural network validation. 1994.

[21] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra.
Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424, 2015.

[22] Michael W. Bode. Learning the forward predictive model for an off-road skid-
steer vehicle. Technical Report CMU-RI-TR-07-32, Robotics Institute, 2007.

[23] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner,
Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller,
Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. End to end learning for
self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

[24] Oliver Brock and Oussama Khatib. High-speed navigation using the global
dynamic window approach. In Proc. IEEE International Conference on Robotics and
Automation, 1999.

[25] Adam Bry and Nicholas Roy. Rapidly-exploring random belief trees for motion
planning under uncertainty. In Proc. IEEE International Conference on Robotics and
Automation, 2011.

[26] Adam Bry, Abraham Bachrach, and Nicholas Roy. State estimation for aggressive
flight in GPS-denied environments using onboard sensing. In Proc. IEEE
International Conference on Robotics and Automation, 2012.

[27] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. DeepDriving:
Learning affordance for direct perception in autonomous driving. In Proc. IEEE
International Conference on Computer Vision, 2015.

[28] C. Connolly. The determination of next best views. In Proc. IEEE International
Conference on Robotics and Automation, 1985.

[29] Mark Cutler and Jonathan P. How. Autonomous drifting using simulation-aided
reinforcement learning. In Proc. IEEE International Conference on Robotics and
Automation, 2016.

[30] Shreyansh Daftry, Sam Zeng, J. Andrew Bagnell, and Martial Hebert.
Introspective perception: Learning to predict failures in vision systems. In Proc.
IEEE International Conference on Intelligent Robots and Systems, 2016.

[31] Marc Deisenroth and Carl E. Rasmussen. PILCO: A model-based and data-
efficient approach to policy search. In Proc. International Conference on Machine
Learning, 2011.

[32] David Eigen, Christian Puhrsch, and Rob Fergus. Depth map prediction from a
single image using a multi-scale deep network. In Proc. Advances in Neural
Information Processing Systems, 2014.

[33] Clement Farabet, Camille Couprie, Laurent Najman, and Yann LeCun. Learning
hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2013.

[34] Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using
velocity obstacles. International Journal of Robotics Research, 1998.

[35] Dieter Fox, Wolfram Burgard, and Sebastian Thrun. The dynamic window
approach to collision avoidance. IEEE Robotics & Automation Magazine, 1997.

[36] Thierry Fraichard. A short paper about motion safety. In Proc. IEEE
International Conference on Robotics and Automation, 2007.

[37] Thierry Fraichard and Hajime Asama. Inevitable collision states: a step towards
safer robots? Advanced Robotics, 2004.

[38] Jürgen Franke and Michael H. Neumann. Bootstrapping neural networks. Neural
Computation, 2000.

[39] Emilio Frazzoli, Munther A. Dahleh, and Eric Feron. Real-time motion planning
for agile autonomous vehicles. Journal of Guidance, Control, and Dynamics, 2002.

[40] Yarin Gal. Uncertainty in Deep Learning. PhD thesis, University of Cambridge,
2016.

[41] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation:
Representing model uncertainty in deep learning. In Proc. International Conference
on Machine Learning, 2015.

[42] Javier García and Fernando Fernández. A comprehensive survey on safe
reinforcement learning. Journal of Machine Learning Research, 2015.

[43] Jeremy H. Gillula and Claire J. Tomlin. Guaranteed safe online learning of a
bounded system. In Proc. IEEE/RSJ International Conference on Intelligent Robots
and Systems, 2011.

[44] Alessandro Giusti, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P.
Rodríguez, Flavio Fontana, Matthias Faessler, Christian Forster, Jürgen
Schmidhuber, Gianni Di Caro, Davide Scaramuzza, and Luca M. Gambardella. A
machine learning approach to visual perception of forest trails for mobile robots.
IEEE Robotics and Automation Letters, 2016.

[45] Héctor H. González-Baños and Jean-Claude Latombe. Navigation strategies for
exploring indoor environments. International Journal of Robotics Research, 2002.

[46] Hugo Grimmett, Rudolph Triebel, Rohan Paul, and Ingmar Posner. Introspective
classification for robot perception. International Journal of Robotics Research,
2016.

[47] Corina Gurau, Chi Hay Tong, and Ingmar Posner. Fit for purpose? Predicting
perception performance based on past experience. In International Symposium on
Experimental Robotics, 2016.

[48] Raia Hadsell, Pierre Sermanet, Jan Ben, Ayse Erkan, Marco Scoffier, Koray
Kavukcuoglu, Urs Muller, and Yann LeCun. Learning long-range vision for
autonomous off-road driving. Journal of Field Robotics, 2009.

[49] Simon Hawkins, Hongxing He, Graham Williams, and Rohan Baxter. Outlier
detection using replicator neural networks. 2002.

[50] Matanya Horowitz and Joel W. Burdick. Interactive non-prehensile manipulation
for grasping via POMDPs. In Proc. IEEE International Conference on Robotics and
Automation, 2013.

[51] Thomas M. Howard, Colin Green, David Ferguson, and Alonzo Kelly. State space
sampling of feasible motions for high-performance mobile robot navigation in
complex environments. Journal of Field Robotics, 2008.

[52] Thomas M. Howard, Mihail Pivtoraiko, Ross A. Knepper, and Alonzo Kelly.
Model-predictive motion planning: Several key developments for autonomous mobile
robots. IEEE Robotics & Automation Magazine, 2014.

[53] Kaijen Hsiao, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Robust grasping
under object pose uncertainty. Autonomous Robots, 2011.

[54] Larry D. Jackel, Eric Krotkov, Michael Perschbacher, Jim Pippine, and Chad
Sullivan. The DARPA LAGR program: goals, challenges, methodology, and phase I
results. Journal of Field Robotics, 2006.

[55] Lucas Janson, Edward Schmerling, and Marco Pavone. Monte Carlo motion
planning for robot trajectory optimization under uncertainty. In Proc. International
Symposium on Robotics Research, 2015.

[56] Nathalie Japkowicz, Catherine Myers, and Mark Gluck. A novelty detection
approach to classification. In Proc. International Joint Conference on Artificial
Intelligence, 1995.

[57] Joshua M. Joseph, Alborz Geramifard, John W. Roberts, Jonathan P. How, and
Nicholas Roy. Reinforcement learning with misspecified model classes. In Proc.
IEEE International Conference on Robotics and Automation, 2013.

[58] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement
learning: A survey. Journal of Artificial Intelligence Research, 1996.

[59] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning
and acting in partially observable stochastic domains. Artificial Intelligence, 1998.

[60] Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, and Sergey Levine.
Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint
arXiv:1702.01182, 2017.

[61] Sertac Karaman and Emilio Frazzoli. High-speed flight in an ergodic forest. In
Proc. IEEE International Conference on Robotics and Automation, 2012.

[62] Alex Kendall, Vijay Badrinarayanan, and Roberto Cipolla. Bayesian SegNet:
Model uncertainty in deep convolutional encoder-decoder architectures for scene
understanding. arXiv preprint arXiv:1511.02680, 2015.

[63] Jens Kober, J. Andrew Bagnell, and Jan Peters. Reinforcement learning in
robotics: A survey. International Journal of Robotics Research, 2013.

[64] Sven Koenig and Maxim Likhachev. Fast replanning for navigation in unknown
terrain. IEEE Transactions on Robotics, 2005.

[65] Sven Koenig, Maxim Likhachev, and David Furcy. Lifelong planning A*.
Artificial Intelligence, 2004.

[66] Thomas Kollar and Nicholas Roy. Trajectory optimization using reinforcement
learning for map exploration. International Journal of Robotics Research, 2008.

[67] Alex Krizhevsky and Geoffrey E. Hinton. Using very deep autoencoders for
content-based image retrieval. In Proc. European Symposium on Artificial Neural
Networks, 2011.

[68] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet
classification with deep convolutional neural networks. In Proc. Advances in Neural
Information Processing Systems, 2012.

[69] Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation,
and active learning. In Proc. Advances in Neural Information Processing Systems,
1995.

[70] Hanna Kurniawati, David Hsu, and Wee Sun Lee. SARSOP: Efficient point-based
POMDP planning by approximating optimally reachable belief spaces. In Proc.
Robotics: Science and Systems, 2008.

[71] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and
scalable predictive uncertainty estimation using deep ensembles. arXiv preprint
arXiv:1612.01474, 2016.

[72] John Leonard, Jonathan How, Seth Teller, Mitch Berger, Stefan Campbell,
Gaston Fiore, Luke Fletcher, Emilio Frazzoli, Albert Huang, Sertac Karaman,
Oliver Koch, Yoshiaki Kuwata, David Moore, Edwin Olson, Steve Peters, Justin
Teo, Robert Truax, and Matthew Walter. A perception-driven autonomous urban
vehicle. Journal of Field Robotics, 2008.

[73] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end
training of deep visuomotor policies. Journal of Machine Learning Research, 2016.

[74] Maxim Likhachev and Anthony Stentz. Probabilistic planning with clear
preferences on missing information. Artificial Intelligence, 2009.

[75] Brandon Luders, Mangal Kothari, and Jonathan P. How. Chance constrained RRT
for probabilistic robustness to environmental uncertainty. In Proc. AIAA Guidance,
Navigation, and Control Conference, 2010.

[76] Brandon D. Luders, Sertac Karaman, and Jonathan P. How. Robust sampling-based
motion planning with asymptotic optimality guarantees. In Proc. AIAA Guidance,
Navigation, and Control Conference, 2013.

[77] Sergei Lupashin, Angela Schöllig, Michael Sherback, and Raffaello D'Andrea. A
simple learning strategy for high-speed quadrocopter multi-flips. In Proc. IEEE
International Conference on Robotics and Automation, 2010.

[78] David J. C. MacKay. Probable networks and plausible predictions: a review of
practical Bayesian methods for supervised neural networks. Network: Computation in
Neural Systems, 1995.

[79] Anirudha Majumdar and Russ Tedrake. Funnel libraries for real-time robust
feedback motion planning. arXiv preprint arXiv:1601.04037, 2016.

[80] Alexei A. Makarenko, Stefan B. Williams, Frédéric Bourgault, and Hugh F.
Durrant-Whyte. An experiment in integrated exploration. In Proc. IEEE
International Conference on Intelligent Robots and Systems, 2002.

[81] Larry Manevitz and Malik Yousef. One-class document classification via neural
networks. Neurocomputing, 2007.

[82] Erik Marchi, Fabio Vesperini, Florian Eyben, Stefano Squartini, and Björn
Schuller. A novel approach for automatic acoustic novelty detection using a
denoising autoencoder with bidirectional LSTM neural networks. In Proc. IEEE
International Conference on Acoustics, Speech and Signal Processing, 2015.

[83] Jeff Michels, Ashutosh Saxena, and Andrew Y. Ng. High speed obstacle
avoidance using monocular vision and reinforcement learning. In Proc. International
Conference on Machine Learning, 2005.

[84] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness,
Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg
Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen
King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis.
Human-level control through deep reinforcement learning. Nature, 2015.

[85] Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, and Yann LeCun. Off-road
obstacle avoidance through end-to-end learning. In Proc. Advances in Neural
Information Processing Systems, 2006.

[86] Bart Nabbe and Martial Hebert. Extending the path-planning horizon. 2007.

[87] Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben
Tse, Eric Berger, and Eric Liang. Autonomous inverted helicopter flight via
reinforcement learning. In Experimental Robotics IX, 2006.

[88] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily
fooled: High confidence predictions for unrecognizable images. In Proc. IEEE
Conference on Computer Vision and Pattern Recognition, 2015.

[89] Gabriel L. Oliveira, Wolfram Burgard, and Thomas Brox. Efficient deep models
for monocular road segmentation. In Proc. IEEE International Conference on
Intelligent Robots and Systems, 2016.

[90] Gerhard Paass. Assessing and improving neural network predictions by the
bootstrap algorithm. In Proc. Advances in Neural Information Processing Systems,
1993.

[91] Sachin Patil, Jur van den Berg, and Ron Alterovitz. Estimating probability of
collision for safe motion planning under Gaussian motion and sensing uncertainty.
In Proc. IEEE International Conference on Robotics and Automation, 2012.

[92] Joelle Pineau, Geoff Gordon, and Sebastian Thrun. Point-based value iteration:
An anytime algorithm for POMDPs. In Proc. International Joint Conference on
Artificial Intelligence, 2003.

[93] Dean A. Pomerleau. ALVINN, an autonomous land vehicle in a neural network. In
Proc. Advances in Neural Information Processing Systems, 1988.

[94] Dean A. Pomerleau. Input reconstruction reliability estimation. In Proc.
Advances in Neural Information Processing Systems, 1993.

[95] Christian Potthast and Gaurav S. Sukhatme. A probabilistic framework for next
best view estimation in a cluttered environment. Journal of Visual Communication
and Image Representation, 2014.

[96] Samuel Prentice and Nicholas Roy. The belief roadmap: Efficient planning in
belief space by factoring the covariance. International Journal of Robotics
Research, 2009.

[97] Nathan D. Ratliff, David Silver, and J. Andrew Bagnell. Learning to search:
Functional gradient techniques for imitation learning. Autonomous Robots, 2009.

[98] Charles Richter and Nicholas Roy. Learning to plan for visibility in navigation
of unknown environments. In Proc. International Symposium on Experimental
Robotics, 2016.

[99] Charles Richter and Nicholas Roy. Safe visual navigation via deep learning and
novelty detection. In Proc. Robotics: Science and Systems, 2017.

[100] Charles Richter, William R. Vega-Brown, and Nicholas Roy. Markov chain
hallway and Poisson forest environment generating distributions. Technical Report
MIT-CSAIL-TR-2015-014, Computer Science and Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, 2015.

[101] Charles Richter, William R. Vega-Brown, and Nicholas Roy. Bayesian learning
for safe high-speed navigation in unknown environments. In Proc. International
Symposium on Robotics Research, 2015.

[102] Susan Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wen-
del, Debabrata Dey, J. Andrew Bagnell, and Martial Hebert. Learning monoc-
ular reactive UAV control in cluttered natural environments. In Proc. IEEE
International Conference on Robotics and Automation, 2013.

[103] Nicholas Roy and Sebastian Thrun. Coastal navigation with mobile robots. In
Proc. Advances in Neural Information Processing Systems, 1999.

[104] Edward Schmerling and Marco Pavone. Evaluating trajectory collision proba-
bility through adaptive importance sampling for safe motion planning. arXiv
preprint arXiv:1609.05399, 2016.

[105] Tom Schouwenaars, Jonathan How, and Eric Feron. Receding horizon path
planning with implicit safety guarantees. In Proc. American Control Confer-
ence, 2004.

[106] David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In Proc.
Advances in Neural Information Processing Systems, 2010.

[107] Reid Simmons. The curvature-velocity method for local obstacle avoidance. In
Proc. IEEE International Conference on Robotics and Automation, 1996.

[108] Reid Simmons and Sven Koenig. Probabilistic robot navigation in partially
observable environments. In Proc. International Joint Conference on Artificial
Intelligence, 1995.

[109] Boris Sofman, Bradford Neuman, Anthony Stentz, and J. Andrew Bagnell.
Anytime online novelty and change detection for mobile robots. Journal of
Field Robotics, 2011.

[110] Cyrill Stachniss, Giorgio Grisetti, and Wolfram Burgard. Information gain-
based exploration using Rao-Blackwellized particle filters. In Proc. Robotics:
Science and Systems, 2005.

[111] Anthony Stentz. Optimal and efficient path planning for partially-known envi-
ronments. In Proc. IEEE International Conference on Robotics and Automa-
tion, 1994.

[112] Anthony Stentz. The focussed D* algorithm for real-time replanning. In Proc.
International Joint Conference on Artificial Intelligence, 1995.

[113] Yanan Sui, Alkis Gotovos, Joel W. Burdick, and Andreas Krause. Safe explo-
ration for optimization with Gaussian processes. In Proc. International Confer-
ence on Machine Learning, 2015.

[114] Wen Sun, Luis G. Torres, Jur Van Den Berg, and Ron Alterovitz. Safe mo-
tion planning for imprecise robotic manipulators by minimizing probability of
collision. In Proc. International Symposium on Robotics Research, 2013.

[115] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduc-
tion. MIT Press, 1998.

[116] Selim Temizer, Mykel J. Kochenderfer, Leslie Pack Kaelbling, Tomás Lozano-
Pérez, and James K. Kuchar. Collision avoidance for unmanned aircraft using
Markov decision processes. In Proc. AIAA Guidance, Navigation, and Control
Conference, 2010.

[117] Benjamin B. Thompson, Robert J. Marks, Jai J. Choi, Mohamed A. El-
Sharkawi, Ming-Yuh Huang, and Carl Bunje. Implicit learning in autoencoder
novelty assessment. In Proc. International Joint Conference on Neural Net-
works, 2002.

[118] Sebastian Thrun, Mike Montemerlo, Hendrik Dahlkamp, David Stavens, An-
drei Aron, James Diebel, Philip Fong, John Gale, Morgan Halpenny, Gabriel
Hoffmann, Kenny Lau, Celia Oakley, Mark Palatucci, Vaughan Pratt, Pascal
Stang, Sven Strohband, Cedric Dupont, Lars-Erik Jendrossek, Christian Koe-
len, Charles Markey, Carlo Rummel, Joe van Niekerk, Eric Jensen, Philippe
Alessandrini, Gary Bradski, Bob Davies, Scott Ettinger, Adrian Kaehler, Ara
Nefian, and Pamela Mahoney. Stanley: The robot that won the DARPA Grand
Challenge. Journal of Field Robotics, 2006.

[119] Jur Van Den Berg, Pieter Abbeel, and Ken Goldberg. LQG-MP: Optimized
path planning for robots with motion uncertainty and imperfect state informa-
tion. International Journal of Robotics Research, 2011.

[120] William R. Vega-Brown, Abraham Bachrach, Adam Bry, Jonathan Kelly, and
Nicholas Roy. CELLO: A fast algorithm for covariance estimation. In Proc.
IEEE International Conference on Robotics and Automation, 2013.

[121] William R. Vega-Brown, Marek Doniec, and Nicholas Roy. Nonparametric
Bayesian inference on multivariate exponential families. In Proc. Advances in
Neural Information Processing Systems, 2014.

[122] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning,
1992.

[123] Michael Watterson and Vijay Kumar. Safe receding horizon control for ag-
gressive MAV flight with limited range sensing. In Proc. IEEE International
Conference on Intelligent Robots and Systems, 2015.

[124] Carl Wellington and Anthony Stentz. Learning predictions of the load-bearing
surface for autonomous rough-terrain navigation in vegetation. In Proc. Inter-
national Conference on Field and Service Robotics, 2003.

[125] Brian Yamauchi. A frontier-based approach for autonomous exploration. In
Proc. IEEE Symposium on Computational Intelligence in Robotics and Automa-
tion, 1997.

[126] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol
Vinyals. Understanding deep learning requires rethinking generalization. arXiv
preprint arXiv:1611.03530, 2016.
