
Robust Visual Inertial Odometry Using a Direct

EKF-Based Approach

Michael Bloesch, Sammy Omari, Marco Hutter, Roland Siegwart


Autonomous Systems Lab, ETH Zurich

Authors

Michael Bloesch - PhD, ETH Zurich

Sammy Omari - GoPro; PhD, ETH Zurich

Marco Hutter - Asst. Prof. Robotic Systems, ETH Zurich

Roland Siegwart - Director, ASL, ETH Zurich

What do we want to achieve?

R. Newcombe, CVPR 2014 VSLAM Talk

Why and how?

Why VO? - Navigate vehicles or robots through highly unstructured environments. Necessary for dynamic operation of these robots, e.g. small aerial vehicles.

Higher performance is achieved by integrating an IMU with the vision sensor.

Recover the path incrementally, pose after pose.

So far, researchers have:

Monocular SLAM demonstrated by Davison

Dense methods and RGB-D based methods by Cremers et al.

Semi-dense, semi-direct methods with real-time operation demonstrated by Forster et al.

CVG, ETHZ also published a semi-direct EKF-based VO approach in IROS 15

Challenges

Consistency of estimation framework

Feature representation and initialization

Use best of both worlds - feature based methods


and direct methods

Improve robustness of tracking and real time


operation

Direct Approach

Feature-based methods rely on salient and repeatable features that are tracked over the frames; appearance-based methods use the intensity information of all the pixels in the image or in subregions.

ROVIO uses a hybrid method where the photometric error of multilevel patch features is tracked within the EKF (in the innovation term) - allowing direct intensity feedback.

Takes texture into account and can also use edge features.

Robocentric Mapping

A state estimator is consistent if it is unbiased and its MSE - here the error covariance - converges to zero in a stochastic sense.

The standard EKF formulation suffers from linearisation errors, which produce inconsistency problems.

Solution? - Attach a reference frame to the vehicle as the base reference for the stochastic map.

Add the vehicle motion to the state formulation.

Estimate the incremental motion and optimise it. Transform the stochastic map according to the new camera pose - now centered at the origin.
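The re-centering step above can be sketched as follows. This is a minimal illustration for a landmark-only state with block-diagonal covariance (the function name `recenter` and the toy state layout are assumptions, not ROVIO's actual implementation):

```python
import numpy as np

def recenter(landmarks, P, R, t):
    """Re-express landmarks after the camera moved by (R, t).

    landmarks: (N, 3) positions in the old robocentric frame.
    P: (3N, 3N) landmark covariance (cross-terms with the rest of the
       state omitted for brevity).
    R, t: rotation and translation of the new frame in the old one.
    The map transform is x' = R^T (x - t); its Jacobian w.r.t. each
    landmark is R^T, so the covariance maps as J P J^T.
    """
    moved = (landmarks - t) @ R                 # row-wise R^T (x - t)
    J = np.kron(np.eye(len(landmarks)), R.T)    # block-diagonal Jacobian
    return moved, J @ P @ J.T

# One landmark, a 90-degree yaw of the camera, no translation:
R_yaw = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
moved, P_new = recenter(np.array([[1.0, 0.0, 0.0]]),
                        np.diag([1.0, 4.0, 9.0]), R_yaw, np.zeros(3))
```

Because the map stays centered at the origin, uncertainty never accumulates in a far-away world frame, which is the consistency benefit the slide refers to.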

Camera as a bearing sensor

The camera acts as a projective sensor that measures the bearing of a point with respect to the optical axis.

We can estimate depth by observing a point from different angles.
CVPR 2014 VSLAM Talk

The angle between the observations is the point's parallax.
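Depth from parallax can be illustrated with a small least-squares triangulation (the helper `triangulate_depth` and the toy geometry are assumptions for illustration only):

```python
import numpy as np

def triangulate_depth(b1, b2, t):
    """Estimate depth along b1 from two bearing observations.

    b1, b2: unit bearing vectors of the same point in frames 1 and 2.
    t: translation of frame 2's origin expressed in frame 1.
    Solves d1 * b1 = t + d2 * b2 for the depths in a least-squares sense.
    """
    A = np.column_stack((b1, -b2))          # unknowns: [d1, d2]
    d, *_ = np.linalg.lstsq(A, t, rcond=None)
    return d[0]                             # depth along b1

# Point at (0, 0, 5) seen from the origin and from (1, 0, 0):
p = np.array([0.0, 0.0, 5.0])
t = np.array([1.0, 0.0, 0.0])
b1 = p / np.linalg.norm(p)
b2 = (p - t) / np.linalg.norm(p - t)
depth = triangulate_depth(b1, b2, t)        # close to 5.0
```

The larger the parallax angle between b1 and b2, the better conditioned this system is, which is why pure rotation gives no depth information.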

Feature Initialization

Features are parameterized using a bearing vector and a depth parameter - the inverse distance.

Delayed initialization schemes treat new features separately until the depth uncertainty has been reduced - so they do not contribute immediately.

The above parameterization enables features to be integrated into the filter as soon as they are detected - in an undelayed manner.
S. Albrecht, Analysis of VSLAM
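A minimal sketch of the bearing-plus-inverse-distance parameterization (the helper name `landmark_from_feature` and the sample values are illustrative):

```python
import numpy as np

def landmark_from_feature(bearing, rho):
    """3D landmark from a unit bearing vector and inverse distance rho = 1/d."""
    return bearing / rho

b = np.array([0.0, 0.0, 1.0])            # feature straight ahead
near = landmark_from_feature(b, 0.5)     # rho = 0.5  ->  2 m away
far = landmark_from_feature(b, 0.01)     # rho = 0.01 -> 100 m away
# rho -> 0 approaches a point at infinity, so a freshly detected
# feature can enter the filter immediately with a rough rho and a
# large rho-variance instead of waiting for the depth to converge.
```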

State Parameterization

An inertial world frame, a body frame, and a camera-fixed frame are used.

Rotations and unit vectors are parameterised using the idea of manifolds - in a minimal representation. This makes it easy to compute the tangent space.

S. LaValle, Planning Algorithms


MTK Toolkit
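The manifold idea can be sketched for rotations: updates live in a minimal 3-dof tangent space and are mapped onto SO(3) with the exponential map. This is a generic "boxplus" sketch, not the MTK implementation:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(a) @ b = cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: map a tangent-space 3-vector to SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3) + skew(phi)     # first-order approximation
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def boxplus(R, delta):
    """Minimal 3-dof update of a rotation: R 'boxplus' delta."""
    return exp_so3(delta) @ R

# A 90-degree update about the z-axis rotates x onto y:
R = boxplus(np.eye(3), np.array([0.0, 0.0, np.pi / 2]))
```

Because the update vector `delta` has exactly three components, the EKF covariance stays minimal and no normalization constraint is needed.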

Filter Update

A 2D linear constraint for each feature i with bearing vector u visible in the frame is used in the innovation term.

Large uncertainty in the initial guess of the bearing vector will cause the update to fail.

Do a patch-based search for the feature.

Use Mahalanobis outlier detection to account for moving objects.

Features in the filter state
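The Mahalanobis gate can be sketched as follows (a standalone toy example; `passes_gate`, the fixed 95% threshold, and the sample covariance are illustrative assumptions, not ROVIO's code):

```python
import numpy as np

CHI2_95_2DOF = 5.991  # 95% quantile of the chi-squared distribution, 2 dof

def passes_gate(y, S):
    """Accept an update only if the 2D innovation y is plausible
    under its predicted innovation covariance S."""
    d2 = float(y @ np.linalg.solve(S, y))   # squared Mahalanobis distance
    return d2 <= CHI2_95_2DOF

S = 4.0 * np.eye(2)                          # 2 px std-dev per axis
ok = passes_gate(np.array([1.0, 1.0]), S)    # small residual: keep
bad = passes_gate(np.array([8.0, 0.0]), S)   # 4-sigma residual: reject
```

Residuals from moving objects or mismatched patches tend to be far outside the predicted covariance ellipse, so this simple test removes most of them before they corrupt the state.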

Feature Handling

Multi-level patch features are predicted according to the IMU motion model.

Convergence of the feature distance parameters and calibration of the extrinsics yield high-quality predictions.

CVPR VSLAM Talk

Feature Handling

Extract a multi-level patch of 8x8 pixels for a feature with a given bearing vector.

Allows direct intensity feedback into the filter.

Robust against bad initial guesses and blur.

Accounting for texture gives a more accurate model.

Feature Handling

Compute an affine warping matrix W to account for local distortion of the patches.

The multi-level patch is extracted again once a feature is tracked.
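The effect of W can be sketched by warping the sampling offsets of a patch (the helper `warped_patch_offsets` and the example shear are assumptions for illustration):

```python
import numpy as np

def warped_patch_offsets(W, half=4):
    """Pixel offsets of an (2*half)x(2*half) patch, distorted by a 2x2
    affine matrix W.

    W maps patch coordinates to the coordinates at which the image
    should be sampled around the feature's projected location.
    """
    ys, xs = np.mgrid[-half:half, -half:half]
    offsets = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # 2 x 64
    return W @ offsets

# Identity leaves the 8x8 grid untouched; a small shear skews it:
grid = warped_patch_offsets(np.eye(2))
sheared = warped_patch_offsets(np.array([[1.0, 0.2],
                                         [0.0, 1.0]]))
```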

Alignment equations

The intensity errors used in the update equation or for pre-alignment are modelled as the difference between the stored patch intensities and the corresponding warped image intensities.

Gauss-Newton optimisation is used for regular patch alignment, minimising the square of the error term to obtain the optimal patch coordinates.

Use QR decomposition to tackle computational intractability.
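The Gauss-Newton patch alignment can be sketched for a purely translational offset on a synthetic image (the functions `bilinear` and `align_patch` and the test image are illustrative assumptions; ROVIO additionally handles pyramid levels and the affine warp):

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate the image intensity at float coords (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0]
            + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0]
            + ax * ay * img[y0 + 1, x0 + 1])

def align_patch(img, template, p, iters=20):
    """Gauss-Newton on the translational patch position p = (x, y),
    minimising sum of e(u, v)^2 with e = img(p + (u, v)) - template."""
    h, w = template.shape
    for _ in range(iters):
        J, e = [], []
        for v in range(h):
            for u in range(w):
                x, y = p[0] + u, p[1] + v
                e.append(bilinear(img, x, y) - template[v, u])
                # Central-difference image gradients at the sample point
                gx = 0.5 * (bilinear(img, x + 1, y) - bilinear(img, x - 1, y))
                gy = 0.5 * (bilinear(img, x, y + 1) - bilinear(img, x, y - 1))
                J.append([gx, gy])
        J, e = np.array(J), np.array(e)
        step = np.linalg.solve(J.T @ J, J.T @ e)   # normal equations
        p = p - step
        if np.linalg.norm(step) < 1e-6:
            break
    return p

# Synthetic smooth image; the 8x8 template is cut out at (x, y) = (12, 9):
yy, xx = np.mgrid[0:40, 0:40]
img = np.sin(0.3 * xx) + np.cos(0.2 * yy)
template = img[9:17, 12:20].copy()
p = align_patch(img, template, np.array([11.0, 10.0]))
```

In practice the normal equations here are where a QR decomposition of the stacked Jacobian is used instead, as the slide notes, for better numerical behaviour.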

Feature detection and removal

Use the FAST corner detector.

Remove candidates very close to already tracked features.

Strategy for incorporating new features into the state: an adapted Shi-Tomasi score which considers the combined Hessian over multiple image levels.

Use heuristics to keep a sane filter size - evaluate local and global quality scores and use adaptive thresholding.
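The single-level Shi-Tomasi score can be sketched as the smaller eigenvalue of the gradient covariance in a window (the function name and window size are illustrative; ROVIO combines this over multiple pyramid levels):

```python
import numpy as np

def shi_tomasi_score(img, x, y, half=3):
    """Smaller eigenvalue of the gradient covariance in a window at (x, y)."""
    patch = img[y - half - 1:y + half + 2, x - half - 1:x + half + 2]
    # Central-difference gradients on the window interior
    gx = 0.5 * (patch[1:-1, 2:] - patch[1:-1, :-2])
    gy = 0.5 * (patch[2:, 1:-1] - patch[:-2, 1:-1])
    gxx, gyy, gxy = (gx * gx).sum(), (gy * gy).sum(), (gx * gy).sum()
    # Smaller eigenvalue of [[gxx, gxy], [gxy, gyy]]
    tr, det = gxx + gyy, gxx * gyy - gxy * gxy
    return 0.5 * (tr - np.sqrt(max(tr * tr - 4.0 * det, 0.0)))

img = np.zeros((20, 20))
img[10:, 10:] = 1.0                       # a step corner at (10, 10)
corner = shi_tomasi_score(img, 10, 10)    # positive: good feature
flat = shi_tomasi_score(img, 4, 4)        # textureless: scores 0
```

A large smaller eigenvalue means the intensity changes in both directions, so the patch can be localized unambiguously; edges and flat regions score near zero.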

Test Setup Configurations

Number of features: 50

Image pyramid levels: 4

Initial inverse distance parameter: 0.5

Initial camera extrinsics are guessed

Ground truth from Vicon

Results - Slow motion


Assessing performance by computing the relative position error with respect to the distance travelled.

Results - Fast motion


Tracking performance of the attitude and robocentric velocities. Estimates within 3 s.d. are plotted.

Results - Fast motion

Results - Flying

The VO framework is initialized without any calibration; the calibration parameters converge online.

The offset can be attributed to the online calibration of the filter.

Demonstration
https://www.youtube.com/watch?v=ZMAISVy-6ao&

Thank you
