
Robust Visual Inertial Odometry Using a Direct

EKF-Based Approach

Michael Bloesch, Sammy Omari, Marco Hutter, Roland Siegwart


Autonomous Systems Lab, ETH Zurich

Authors

Michael Bloesch - PhD, ETH Zurich

Sammy Omari - GoPro; PhD, ETH Zurich

Marco Hutter - Asst. Prof. Robotic Systems, ETH Zurich

Roland Siegwart - Director, ASL, ETH Zurich

What do we want to achieve?

R. Newcombe, CVPR 2014 VSLAM Talk

Why and how?

Why VO? - Navigate vehicles or robots through highly unstructured environments. Necessary for dynamic operation of these robots, e.g. small aerial vehicles.

Higher performance is achieved by integrating an IMU with the vision sensor.

Recover the path incrementally, pose after pose.

So far, researchers have:

Monocular SLAM demonstrated by Davison

Dense methods and RGB-D based methods by Cremers et al.

Semi-dense, semi-direct methods with real-time operation demonstrated by Forster et al.

CVG, ETHZ also published a semi-direct EKF-based VO approach in IROS 15

Challenges

Consistency of estimation framework

Feature representation and initialization

Use best of both worlds - feature based methods


and direct methods

Improve robustness of tracking and real time


operation

Direct Approach

Feature-based methods rely on salient and repeatable features that are tracked over the frames; appearance-based methods use the intensity information of all the pixels in the image or in subregions.

ROVIO uses a hybrid method where the photometric error of multilevel patch features is tracked within the EKF (in the innovation term) - allowing direct intensity feedback.

Takes texture into account and can also use edge features.

Robocentric Mapping

A state estimator is consistent if it is unbiased and its MSE - here the error covariance - converges to zero in a stochastic sense.

The standard EKF formulation suffers from linearisation errors, which produce inconsistency problems.

Solution? - Attach a reference frame to the vehicle as the base reference for the stochastic map.

Add the vehicle motion to the state formulation.

Estimate the incremental motion and optimise it. Transform the stochastic map according to the new camera pose - now centered at the origin.
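The re-centering step above can be sketched as follows. This is a minimal illustration for a landmark-only state with block-diagonal covariance (the function name `recenter` and the toy state layout are assumptions, not ROVIO's actual implementation):

```python
import numpy as np

def recenter(landmarks, P, R, t):
    """Re-express landmarks after the camera moved by (R, t).

    landmarks: (N, 3) positions in the old robocentric frame.
    P: (3N, 3N) landmark covariance (cross-terms with the rest of the
       state omitted for brevity).
    R, t: rotation and translation of the new frame in the old one.
    The map transform is x' = R^T (x - t); its Jacobian w.r.t. each
    landmark is R^T, so the covariance maps as J P J^T.
    """
    moved = (landmarks - t) @ R                 # row-wise R^T (x - t)
    J = np.kron(np.eye(len(landmarks)), R.T)    # block-diagonal Jacobian
    return moved, J @ P @ J.T

# One landmark, a 90-degree yaw of the camera, no translation:
R_yaw = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
moved, P_new = recenter(np.array([[1.0, 0.0, 0.0]]),
                        np.diag([1.0, 4.0, 9.0]), R_yaw, np.zeros(3))
```

Because the map stays centered at the origin, uncertainty never accumulates in a far-away world frame, which is the consistency benefit the slide refers to.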

Camera as a bearing sensor

The camera acts as a projective sensor that measures the bearing of a point with respect to the optical axis.

We can estimate depth by observing a point from different angles.
CVPR 2014 VSLAM Talk

The angle between the observations is the point's parallax.
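Depth from parallax can be illustrated with a small least-squares triangulation (the helper `triangulate_depth` and the toy geometry are assumptions for illustration only):

```python
import numpy as np

def triangulate_depth(b1, b2, t):
    """Estimate depth along b1 from two bearing observations.

    b1, b2: unit bearing vectors of the same point in frames 1 and 2.
    t: translation of frame 2's origin expressed in frame 1.
    Solves d1 * b1 = t + d2 * b2 for the depths in a least-squares sense.
    """
    A = np.column_stack((b1, -b2))          # unknowns: [d1, d2]
    d, *_ = np.linalg.lstsq(A, t, rcond=None)
    return d[0]                             # depth along b1

# Point at (0, 0, 5) seen from the origin and from (1, 0, 0):
p = np.array([0.0, 0.0, 5.0])
t = np.array([1.0, 0.0, 0.0])
b1 = p / np.linalg.norm(p)
b2 = (p - t) / np.linalg.norm(p - t)
depth = triangulate_depth(b1, b2, t)        # close to 5.0
```

The larger the parallax angle between b1 and b2, the better conditioned this system is, which is why pure rotation gives no depth information.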

Feature Initialization

Features are parameterized using a bearing vector and a depth parameter - the inverse distance.

Delayed initialization schemes treat new features separately until the depth uncertainty has been reduced - so they do not contribute immediately.

The above parameterization enables features to be integrated into the filter as soon as they are detected - in an undelayed manner.
S. Albrecht, Analysis of VSLAM
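A minimal sketch of the bearing-plus-inverse-distance parameterization (the helper name `landmark_from_feature` and the sample values are illustrative):

```python
import numpy as np

def landmark_from_feature(bearing, rho):
    """3D landmark from a unit bearing vector and inverse distance rho = 1/d."""
    return bearing / rho

b = np.array([0.0, 0.0, 1.0])            # feature straight ahead
near = landmark_from_feature(b, 0.5)     # rho = 0.5  ->  2 m away
far = landmark_from_feature(b, 0.01)     # rho = 0.01 -> 100 m away
# rho -> 0 approaches a point at infinity, so a freshly detected
# feature can enter the filter immediately with a rough rho and a
# large rho-variance instead of waiting for the depth to converge.
```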

State Parameterization

An inertial world frame, a body frame, and a camera-fixed frame are used.

Rotations and unit vectors are parameterised using the idea of manifolds - in a minimal representation. This makes it easy to compute the tangent space.

S. LaValle, Planning Algorithms


MTK Toolkit
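The manifold idea can be sketched for rotations: updates live in a minimal 3-dof tangent space and are mapped onto SO(3) with the exponential map. This is a generic "boxplus" sketch, not the MTK implementation:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(a) @ b = cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: map a tangent-space 3-vector to SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3) + skew(phi)     # first-order approximation
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def boxplus(R, delta):
    """Minimal 3-dof update of a rotation: R 'boxplus' delta."""
    return exp_so3(delta) @ R

# A 90-degree update about the z-axis rotates x onto y:
R = boxplus(np.eye(3), np.array([0.0, 0.0, np.pi / 2]))
```

Because the update vector `delta` has exactly three components, the EKF covariance stays minimal and no normalization constraint is needed.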

Filter Update

A 2D linear constraint for each feature i with bearing vector u visible in the frame is used in the innovation term.

Large uncertainty in the initial guess of the bearing vector will cause the update to fail.

Do a patch-based search for the feature.

Use Mahalanobis outlier detection to account for moving objects.

Features in the filter state
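The Mahalanobis gate can be sketched as follows (a standalone toy example; `passes_gate`, the fixed 95% threshold, and the sample covariance are illustrative assumptions, not ROVIO's code):

```python
import numpy as np

CHI2_95_2DOF = 5.991  # 95% quantile of the chi-squared distribution, 2 dof

def passes_gate(y, S):
    """Accept an update only if the 2D innovation y is plausible
    under its predicted innovation covariance S."""
    d2 = float(y @ np.linalg.solve(S, y))   # squared Mahalanobis distance
    return d2 <= CHI2_95_2DOF

S = 4.0 * np.eye(2)                          # 2 px std-dev per axis
ok = passes_gate(np.array([1.0, 1.0]), S)    # small residual: keep
bad = passes_gate(np.array([8.0, 0.0]), S)   # 4-sigma residual: reject
```

Residuals from moving objects or mismatched patches tend to be far outside the predicted covariance ellipse, so this simple test removes most of them before they corrupt the state.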

Feature Handling

Multi-level patch features are predicted according to the IMU motion model.

Convergence of the feature distance parameters and calibration of the extrinsics yield high-quality predictions.

CVPR VSLAM Talk

Feature Handling

Extract a multi-level patch of 8x8 pixels for a feature with a given bearing vector.

Allows direct intensity feedback into the filter.

Robust against bad initial guesses and blur.

Accounting for texture gives a more accurate model.

Feature Handling

Compute an affine warping matrix W to account for local distortion of the patches.

The multi-level patch is extracted again once a feature is tracked.
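The effect of W can be sketched by warping the sampling offsets of a patch (the helper `warped_patch_offsets` and the example shear are assumptions for illustration):

```python
import numpy as np

def warped_patch_offsets(W, half=4):
    """Pixel offsets of an (2*half)x(2*half) patch, distorted by a 2x2
    affine matrix W.

    W maps patch coordinates to the coordinates at which the image
    should be sampled around the feature's projected location.
    """
    ys, xs = np.mgrid[-half:half, -half:half]
    offsets = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # 2 x 64
    return W @ offsets

# Identity leaves the 8x8 grid untouched; a small shear skews it:
grid = warped_patch_offsets(np.eye(2))
sheared = warped_patch_offsets(np.array([[1.0, 0.2],
                                         [0.0, 1.0]]))
```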

Alignment equations

The intensity errors used in the update equation or for pre-alignment are modelled as the difference between the stored patch intensities and the corresponding warped image intensities.

Gauss-Newton optimisation is used for regular patch alignment, minimising the square of the error term to obtain the optimal patch coordinates.

Use QR decomposition to tackle computational intractability.
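The Gauss-Newton patch alignment can be sketched for a purely translational offset on a synthetic image (the functions `bilinear` and `align_patch` and the test image are illustrative assumptions; ROVIO additionally handles pyramid levels and the affine warp):

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate the image intensity at float coords (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0]
            + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0]
            + ax * ay * img[y0 + 1, x0 + 1])

def align_patch(img, template, p, iters=20):
    """Gauss-Newton on the translational patch position p = (x, y),
    minimising sum of e(u, v)^2 with e = img(p + (u, v)) - template."""
    h, w = template.shape
    for _ in range(iters):
        J, e = [], []
        for v in range(h):
            for u in range(w):
                x, y = p[0] + u, p[1] + v
                e.append(bilinear(img, x, y) - template[v, u])
                # Central-difference image gradients at the sample point
                gx = 0.5 * (bilinear(img, x + 1, y) - bilinear(img, x - 1, y))
                gy = 0.5 * (bilinear(img, x, y + 1) - bilinear(img, x, y - 1))
                J.append([gx, gy])
        J, e = np.array(J), np.array(e)
        step = np.linalg.solve(J.T @ J, J.T @ e)   # normal equations
        p = p - step
        if np.linalg.norm(step) < 1e-6:
            break
    return p

# Synthetic smooth image; the 8x8 template is cut out at (x, y) = (12, 9):
yy, xx = np.mgrid[0:40, 0:40]
img = np.sin(0.3 * xx) + np.cos(0.2 * yy)
template = img[9:17, 12:20].copy()
p = align_patch(img, template, np.array([11.0, 10.0]))
```

In practice the normal equations here are where a QR decomposition of the stacked Jacobian is used instead, as the slide notes, for better numerical behaviour.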

Feature detection and removal

Use the FAST corner detector.

Remove candidates very close to already tracked features.

Strategy for incorporating new features into the state: an adapted Shi-Tomasi score which considers the combined Hessian over multiple image levels.

Use heuristics to keep a sane filter size - evaluate local and global quality scores and use adaptive thresholding.
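The single-level Shi-Tomasi score can be sketched as the smaller eigenvalue of the gradient covariance in a window (the function name and window size are illustrative; ROVIO combines this over multiple pyramid levels):

```python
import numpy as np

def shi_tomasi_score(img, x, y, half=3):
    """Smaller eigenvalue of the gradient covariance in a window at (x, y)."""
    patch = img[y - half - 1:y + half + 2, x - half - 1:x + half + 2]
    # Central-difference gradients on the window interior
    gx = 0.5 * (patch[1:-1, 2:] - patch[1:-1, :-2])
    gy = 0.5 * (patch[2:, 1:-1] - patch[:-2, 1:-1])
    gxx, gyy, gxy = (gx * gx).sum(), (gy * gy).sum(), (gx * gy).sum()
    # Smaller eigenvalue of [[gxx, gxy], [gxy, gyy]]
    tr, det = gxx + gyy, gxx * gyy - gxy * gxy
    return 0.5 * (tr - np.sqrt(max(tr * tr - 4.0 * det, 0.0)))

img = np.zeros((20, 20))
img[10:, 10:] = 1.0                       # a step corner at (10, 10)
corner = shi_tomasi_score(img, 10, 10)    # positive: good feature
flat = shi_tomasi_score(img, 4, 4)        # textureless: scores 0
```

A large smaller eigenvalue means the intensity changes in both directions, so the patch can be localized unambiguously; edges and flat regions score near zero.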

Test Setup Configurations

Number of features: 50

Image pyramid levels: 4

Initial inverse distance parameter: 0.5

Initial camera extrinsics are guessed

Ground truth from Vicon

Results - Slow motion


Assessing performance by computing the relative position error with respect to the distance travelled.

Results - Fast motion


Tracking performance of the attitude and robocentric velocities. Estimates within 3 s.d. are plotted.

Results - Fast motion

Results - Flying

The VO framework is initialized without any calibration; the calibration parameters converge online.

The offset can be attributed to the online calibration of the filter.

Demonstration
https://www.youtube.com/watch?v=ZMAISVy-6ao&

Thank you
