PHD Kirchner Matteo November 2018

ARENBERG DOCTORAL SCHOOL
Faculty of Engineering Science
Joint state/input estimation in structural dynamics
State/force estimation using compressive sensing within a multistep approach
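[Cover figure: block diagram of the multistep estimator. Input: $\psi = f(u)$; system: $\dot{x} = g(x, \psi)$, $y = h(x, \psi)$. From the measurements $y_k$ the estimator returns $x_k$ and $\psi_k$, and the input follows as $u_k = f^{-1}(\psi_k)$.]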
Matteo Kirchner
Dissertation presented in partial fulfilment of the requirements for the degree of
Doctor of Engineering Science (PhD): Mechanical Engineering
November 2018
Examination committee:
Prof. dr. ir. Carlo Vandecasteele, chair
Prof. dr. ir. Wim Desmet, supervisor
Dr. ir. Bert Pluymers, co-supervisor
Prof. dr. ir. Paul Sas
Prof. dr. ir. Davy Pissoort
Dr. ir. ing. Jan Croes
Ir. Eugène J.M. Nijman (Virtual Vehicle Research Center, Austria)
Prof. Dr.-Ing. Claus-Peter Fritzen (University of Siegen, Germany)
© 2018 KU Leuven – Faculty of Engineering Science
Uitgegeven in eigen beheer, Matteo Kirchner, Celestijnenlaan 300 box 2420, B-3001 Leuven (Belgium)
Alle rechten voorbehouden. Niets uit deze uitgave mag worden vermenigvuldigd en/of openbaar gemaakt worden
door middel van druk, fotokopie, microfilm, elektronisch of op welke andere wijze ook zonder voorafgaande
schriftelijke toestemming van de uitgever.
All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm,
electronic or any other means without written permission from the publisher.
Fruitur tamen etas nostra beneficio precedentis, et sepe plura novit non suo
quidem precedens ingenio, sed innitens viribus alienis et opulenta patrum.
Dicebat Bernardus Carnotensis nos esse quasi nanos gigantium humeris
insidentes, ut possim plura eis et remotiora videre, non utique proprii visus
acumine aut eminentia corporis, sed quia in altum subvehimur et extollimur
magnitudine gigantea.
Metalogicon (III, 4)
Iohannes Saresberiensis
Preface
This is the outcome of almost six years of research, which would not have
been possible without funding. Therefore, I would like to begin this book
by acknowledging the European Commission for its support through the
Marie Skłodowska-Curie ITN-EID project eLiQuiD within the 7th Framework
Programme (GA 316422). Moreover, my research was partially supported by
Flanders Make, the strategic research centre for the manufacturing industry,
within the MoForM project. I am also grateful to the Flanders Innovation
& Entrepreneurship Agency and the Research Fund KU Leuven for their
support. Additionally, my work has been partially funded by the COMET K2 –
Competence Centres for Excellent Technologies Programme of the Austrian
Federal Ministry for Transport, Innovation and Technology (BMVIT), the
Austrian Federal Ministry for Digital, Business and Enterprise (BMDW), the
Austrian Research Promotion Agency (FFG), the Province of Styria and the
Styrian Business Promotion Agency (SFG).
Many people contributed to my life as a PhD student, and I would like to take
some time to express my gratitude to everyone who surrounded me during
the past years. Chronologically, my first big thanks goes to Stijn Donders. In
July 2012 you brought to my attention the open positions within the eLiQuiD
project at the KU Leuven Noise & Vibration Research Group, and you helped
me get in touch with my future supervisors. This is how my PhD journey
started and I am very grateful to you for this.
My second and greatest thanks goes to my supervisors: Wim Desmet and Bert
Pluymers. I remember very well my interview during ISMA 2012, when we
had a nice talk over an Irish coffee. Wim, thanks a lot for the opportunity
of pursuing a PhD within your group. The technical as well as the human
interaction with many colleagues made my research interesting, challenging
and certainly also enjoyable. Thanks for the feeling of freedom and trust that
accompanied my research in the past years, and thanks for having me still on
board in the upcoming months. Bert, organisation, projects and practicalities
could not go any smoother, thanks a lot for all your help.
My gratitude goes also to all members of my examination committee for the
discussion during the preliminary defence and for going through my manuscript:
to Bert Pluymers, Davy Pissoort, Eugène Nijman, Paul Sas, Wim Desmet, for
your constructive hints since day one as members of my supervisory committee;
to Claus-Peter Fritzen, for reading my text in detail and spotting the main
aspects that I could improve as external jury member; to Jan Croes, for the
countless inputs, technical discussions, feedback and revisions during the past
four years as colleague, friend and additional jury member; finally, to Carlo
Vandecasteele for making sure everything went smoothly as the chair of my
examination committee.
My PhD research started at Virtual Vehicle Research Center (ViF) in Graz, and
my next thanks goes to Anton Fuchs, Eugène Nijman and Jan Rejlek. I really
enjoyed the nice environment at ViF, for which you are certainly responsible:
Toni from the top, Eugène with great research ideas and Jan supporting me
during every step in the eLiQuiD project.
My initial PhD track was not at all oriented towards the development of a
technique for virtual sensing, and two occurrences, which I want to mention
here, played a crucial role in establishing my final topic. On December 19th,
2013, I was sitting at my desk at ViF, finalising and submitting my first
conference paper. Unexpectedly I received an email from Eugène Nijman, with
subject “article”, body “For the future...”, and with a paper attached. Eugène,
it was kind of exceptional receiving an email from you with both subject and
body that made sense: I was used to nothing at all or possibly the result of
randomly pressing keys. That said, in attachment there was the first paper I
read about compressive sensing. So right you were!
Almost one year later, after moving to KU Leuven, I was trying to find an
application of compressive sensing that would steer my research. On October
14th, 2014, I was at a graduate school on innovative technologies for energy
conversion, where Jan Croes gave a talk about filtering techniques. Jan, it is still
not fully clear to me what both of us were doing at a course on energy conversion.
My best guess is that you were replacing a speaker who declined last minute,
whereas I was curious to know more about topics beyond my main research
interests (and at the same time I was collecting precious credits for my doctoral
diary). The point is that we met there for the first time, and you enthusiastically
suggested considering compressive sensing for input representation within a
moving horizon estimator. Well, nice intuition! Eugène, Jan, thanks a lot!!!
My PhD journey brought me to different locations and several topics, and
this gave me the chance to interact with a lot of people. Concerning the
technical part of my research, I am very grateful to Eugène Nijman, Francesco
Cosco, Frank Naets, Goele Pipeleers, Jakob Fiszer, Jan Croes, Karim Asrih,
Luca Sangiuliano, Noé Geraldo Rocha de Melo Filho, Simon Vanpaemel, Ward
Rottiers, Wim Desmet, for all the help and inputs that I received from you.
Thanks to Alex Ricardo Mauricio, Daniel de Gregoriis, Daniele Brandolisio,
Eddy Smets, Elke Deckers, Florian Maurin, Gunther Penninckx, Jean-Pierre
Merckx, Sebastiaan van Aalst, Tom Henskens, for being present whenever I
needed something practical. Thanks to Daniel de Gregoriis, Mathijs Vivet,
Siemen Timmermans and Simon Vanpaemel for translating the title and abstract of
this dissertation into Dutch. Thanks to all my colleagues in the Noise & Vibration
Research Group at KU Leuven, and in particular thanks to the vibro-acoustics
group. Thanks also to the members of the consortia I interacted with during the
past years. Getting in touch with you helped me understand the importance
of keeping an eye on the bigger picture. Among others, I would like to mention
COST TU1105, EARPA, eLiQuiD, GRESIMO, MoForM.
It has been a pleasure to share my daily working time with the colleagues in the
open space of Area C at ViF, in my previous office in the MECH building, and
recently at LVL. Hoping not to forget too many of you, thanks for all the coffee
breaks and the funny moments with Christopher, Elmar, Fred, Giorgio, Jan,
Markus, Petra, Rafael, Sanaz, Sophie, Vittorio, Thomas, Yasser, Zoran (ViF),
Alireza, Anna, Axel, Elke, Jaime, Kengo, Matt, Nicolas, Philip, Sjoerd, Vamsi,
Yasuo (MECH), Amar, Andrea, Bart, Daniel, Daniele, Dries, Emin, Enrico,
Francesco, Giovanni, Harald, Hui, Jan, Jakob, Jelle, Karim, Kylian, Lorenzo,
Marco, Martijn, Mathijs, Maurice, Mikel, Niccolò, Pavel, Rocco, Sebastiaan,
Siemen, Simon, Simone, Thijs, Ward (LVL). Thanks to the MECH newcomers
2014, the MECH happy hour (in particular to Alireza, Laurens, Pavel, Philip,
Sjoerd), the PMA social event team, the ISMA 2018 team, the colleagues at
ICT, HR, financial office and secretary (of course including the lekkere broodjes).
Almost six years of my life do not include exclusively research, and many people
contributed to my well-being outside ViF and KU Leuven. Thanks to Ali,
Christopher, Francesca, Michael, Rafael, Sanaz, Sophie, for the multiple dinners
at Schillerheim. Thanks to Alberto, Alex, Alireza, Andrea, Anna, Barbara, Bart,
Costanza, Ettore, Florian, Gianmaria, Gorka, Hendrik, Hervé, Ines, Jonathan,
Laura, Laurens, Lise, Luca, Marco, Marcello, Matt, Mikel, Mirian, Nico, Noé,
Pavel, Philip, Sepide, Sergio, Siemen, Simona, Sjoerd, Vyacheslav, Wim, for the
nights in the Oude Markt, the barbecues, and similar high quality activities. Nen
dikke merci aan Barbara, ik ben blij dat je naast mij staat. Thanks to Adrian,
Attila, Dorota, Janick, Mario, Michal, Sebastian, Tânia, for the nice weekends
around Europe that we manage to have once in a while even if geography does
not help us much. Thanks to my friends in Trento. Knowing that you are there
means the world to me, and I am happy that somehow feelings do not change
much with time and distance. To Alessandro, Marco, Thomas, it is always a
good day to brew another batch of Cagnara. To Alessandro, Annalisa, Cristian,
Giulia, Marco, Martina, Mauro, Michele, Monte Bondone, Nicola, Roberto,
Susanna, Thomas, Veronica, it is always a good day to go for breakfast, apéritif,
evening drinks, have a barbecue, go cycling. I just wish I were there more often.
Thanks to my family, in particular to my grandparents, parents, cousins and
sister.
Dear fellow engineer, this thesis is for you. In writing this book I tried to report
all activities that brought me to the development of the CS-MHE, including
state of the art, formulas, practical matrix implementation, examples. I really
hope that my work can be part of your work, as starting point for future
methodological developments or by contributing to a specific application. I am
looking forward to seeing your results, and I wish you all the best in your career.
Throughout this manuscript, I refer to a generic third person by the word we.
Please consider this choice of style as a personal preference rather than a claim
to authority.
Many people contributed to my research. I mentioned some of them in this
preface as well as in the acknowledgements at the beginning of each chapter,
while I cited the work of others in the bibliography. In other words, the results
that I present in this book could not have been possible without the knowledge
of other people’s work. Bernard of Chartres used to say that we [the Moderns]
are like dwarves perched on the shoulders of giants [the Ancients], and thus we
are able to see more and farther than the latter. And this is not at all because
of the acuteness of our sight or the stature of our body, but because we are
carried aloft and elevated by the magnitude of the giants. May my work make
these giants slightly bigger.
My last thanks goes to everyone who contributes daily to my free time and
my passions, and to whoever lets me cultivate my (sometimes weird) hobbies.
Among others, outdoor sports and barbecues are possibly what recharges my
batteries the fastest, and music is what gives me some kind of peace. That is
why I decided to conclude this book quoting a piano sonata that is magic to
me in so many ways, and certainly belongs to my list of wonders of mankind.
It is an incredibly intimate and spiritual adieu by a giant (of course perched on
the shoulders of other giants), and it accompanied me along my whole journey,
including today, when I proudly say adieu to my life as a PhD student, looking
with a smile at whatever will come next, and being amazed every time I go out
and I see the beauty of nature.
Matteo
Leuven, November 6th, 2018

Abstract
The trend of increasing intelligence in mechatronic systems characterises many
products and production lines in our society, aiming at better performance,
improved reliability, higher safety, reduced costs and lower emissions. When
machines are operational, processes like control and monitoring employ
digital computers and sensors to take decisions in response to measured data.
Furthermore, during the design phase of a product additional sensors are desired
in response to tight design constraints. Although it can be relatively easy to
measure quantities such as temperature, pressure, light and acceleration, we
cannot state the same for forces and torques. Dedicated sensors do exist, but
in practice they may be difficult or even impossible to mount on a mechanical
system due to geometrical, safety or economic reasons. Those quantities can
therefore not be measured, whereas they are of paramount importance for
decision making related to design, durability, fault detection, failure prediction,
maintenance and advanced control strategies. To overcome this issue, within the
engineering community it is common practice to make use of models. In fact,
the combination of a model and any available measurements can lead to a
better knowledge of the system. This can be done by calling in the discipline of
state estimation, where the states are those quantities that we want to monitor
or control. Furthermore, in case of external unknowns, we can set up a joint
state/disturbance estimator, that may include an additional representation of
the unknown quantities. Model based (joint) estimators are extremely powerful
tools, but they also introduce a few challenges. First, we need to make sure
that the information that we want to estimate is observable, i.e., the level of
detail of the model and the type and amount of measurements are adequate
to capture all unknowns. Furthermore, a disturbance characterised by fast
dynamics may be difficult to detect even with a very high sampling rate.
The goal of this dissertation is to overcome the typical difficulties of joint
estimation problems, related to observability and disturbance dynamics. We
propose a novel time domain approach for joint state/input estimation of
mechanical systems, where the novelty consists of exploiting compressive
sensing (CS) principles in a moving horizon estimator (MHE), allowing for the
observation of a large number of input locations from a small set of measurements.
In the new approach, called compressive sensing–moving horizon estimator
(CS-MHE), the capability of the MHE of minimising the noise while correlating
a model with measurements is enriched with an $\ell_1$-norm optimisation in order
to promote a sparse solution for the input estimation. This allows us to model
an input through a few shape functions belonging to a predefined set, i.e., we
exploit a known input shape to represent an input and to relax the typical
limitations of estimation problems (observability, disturbance dynamics and
required sampling rate). Combining the estimation of states and inputs is
extremely valuable, since in the case of mere state estimation a model error
could be perceived as an input, i.e., the estimated input incorporates the model
inaccuracy.
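The sparsity-promoting effect of the $\ell_1$-norm can be illustrated with a minimal static sketch. It is not the CS-MHE itself: there is no dynamic model, no horizon and no noise. We simply assume a hypothetical dictionary `Phi` whose columns play the role of candidate shape functions, generate `n_y` measurements from only two active coefficients out of `n_alpha`, and minimise a least squares term plus `lam` times the $\ell_1$-norm with plain iterative shrinkage/thresholding. The names `Phi`, `ista`, `soft_threshold` and all numerical values are assumptions made for this illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(Phi, y, lam, n_iter=5000):
    """Minimise 0.5 * ||Phi @ alpha - y||_2^2 + lam * ||alpha||_1."""
    # Step size 1/L, with L the Lipschitz constant of the gradient
    # of the quadratic term (squared spectral norm of Phi).
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    alpha = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ alpha - y)
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha


# Hypothetical test problem: fewer measurements than unknown coefficients.
rng = np.random.default_rng(0)
n_y, n_alpha = 30, 50
Phi = rng.standard_normal((n_y, n_alpha))   # assumed random dictionary

alpha_true = np.zeros(n_alpha)
alpha_true[[7, 31]] = [1.0, -0.7]           # only two active shape functions
y = Phi @ alpha_true                        # noise-free measurements

alpha_hat = ista(Phi, y, lam=0.1)
# Keep only the coefficients above an (assumed) threshold level.
support = np.flatnonzero(np.abs(alpha_hat) > 0.1)
print("estimated support:", support)
print("estimated values: ", alpha_hat[support])
```

Although the linear system is underdetermined, the $\ell_1$ penalty selects the few columns that explain the measurements. The thresholding of small coefficients in the last step is only loosely analogous to the threshold level $\varepsilon_\alpha$ listed among the symbols.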
This book includes the state of the art of model based estimation techniques,
the mathematical derivation of the CS-MHE formulas, and two application
cases in the field of structural dynamics. In a first example we employ the
CS-MHE for the estimation of force impacts entering a mechanical system at
an unknown location, while in a second example we focus on the estimation of
periodic loads, which we typically find in rotating machinery.
Beknopte samenvatting
De trend naar toenemende intelligentie in mechatronische systemen is
kenmerkend voor vele producten en productielijnen in onze samenleving. Deze
zijn gericht op betere prestaties, verbeterde betrouwbaarheid, hogere veiligheid,
lagere kosten en lagere emissies. Wanneer machines operationeel zijn, gebruiken
meet- en controle processen digitale computers en sensoren om beslissingen
te nemen op basis van de geregistreerde meetdata. Bovendien zijn tijdens de
ontwerpfase van een product extra sensoren gewenst om na te gaan of aan de
strakke ontwerpbeperkingen voldaan wordt. Hoewel het relatief eenvoudig is
om grootheden zoals temperatuur, druk, licht en versnelling te meten, is dit
niet het geval voor krachten en koppels. Specifieke kracht- en koppelsensoren
zijn beschikbaar, maar in de praktijk kunnen ze omwille van geometrische,
veiligheids- of economische redenen moeilijk of zelfs onmogelijk gemonteerd
worden op een mechanisch systeem. De kracht- en koppelgrootheden kunnen
daarom niet worden gemeten, hoewel ze van het allergrootste belang zijn voor
beslissingen omtrent ontwerp, duurzaamheid, foutdetectie, foutvoorspelling,
onderhouds- en geavanceerde regelstrategieën. Om dit probleem op te lossen, is
het binnen de ingenieursgemeenschap gebruikelijk om over te gaan op wiskundige
modellen. Een model in combinatie met alle beschikbare meetdata kan leiden
tot een betere kennis van het werkelijke systeem. Dit kan bekomen worden
door een beroep te doen op de discipline van toestandsschatting. Hierbij zijn
de toestanden die geschat worden de grootheden die we willen controleren of
beheersen. In het geval van externe onbekenden kunnen we bovendien een
gezamenlijke toestands/storings-schatter opzetten, die een extra voorstelling
van de onbekende grootheden kan omvatten. Modelgebaseerde (gezamenlijke)
schatters zijn buitengewoon krachtige hulpmiddelen maar hebben ook enkele
uitdagingen. Ten eerste moeten we ervoor zorgen dat de informatie die we willen
schatten observeerbaar is; dit wil zeggen dat het model voldoende gedetailleerd
moet zijn en de soort en hoeveelheid meetdata voldoende moet zijn om alle
onbekenden te identificeren. Verder kan een storing die wordt gekenmerkt
door een hoge dynamica moeilijk te detecteren zijn, zelfs met een zeer hoge
bemonsteringssnelheid.
Het doel van dit proefschrift is om de typische uitdagingen van gezamenlijke
schattingsproblemen, gerelateerd aan observeerbaarheid en verstorende
dynamica, te overwinnen. We stellen een nieuwe tijdsdomeinaanpak voor met
betrekking tot de gezamenlijke toestands/input-schatting van mechanische
systemen, waarbij de nieuwigheid zit in het combineren van compressive sensing
(CS) en een bewegende horizon-schatter (moving horizon estimator, MHE),
wat toelaat om een groot aantal inputlocaties te observeren voor een kleine
set metingen. In de nieuwe aanpak, genaamd compressive sensing–moving
horizon estimator (CS-MHE), is aan de functionaliteit van de MHE, die de ruis
minimaliseert wanneer het model met metingen wordt gecorreleerd, een $\ell_1$-norm
optimalisatie toegevoegd met als doel een ijle oplossing voor de input-schatting
te verkrijgen. Het stelt ons in staat om een input te modelleren met enkele
vormfuncties behorend tot een vooraf gedefinieerde set. Dit wil zeggen dat
we een bekende inputvorm gebruiken om een input voor te stellen en om
de typische beperkingen van schattingsproblemen (observabiliteit, verstorende
dynamica en minimale bemonsteringssnelheid) te versoepelen. Het combineren
van toestands- en inputschatting is uiterst waardevol, aangezien in het geval
van loutere toestandsschatting een modelfout kan worden waargenomen als een
input: de geschatte input omvat de modelonnauwkeurigheid.
Dit boek bevat de state of the art van modelgebaseerde schattingstechnieken,
de wiskundige afleiding van de formules CS-MHE en twee toepassingen in het
gebied van structurele dynamica. In een eerste voorbeeld gebruiken we de
CS-MHE voor het schatten van kracht impacten die een mechanisch systeem
exciteren op een onbekende locatie. In een tweede voorbeeld focussen we op de
schatting van periodieke belastingen die typisch terug te vinden zijn in roterende
machines.
List of abbreviations
1D one-dimensional
2D two-dimensional
3D three-dimensional
AD algorithmic differentiation
BOB best orthogonal basis

BP basis pursuit
BPDN basis pursuit denoising
cf. confer
cond condition number
CP convex programming problem
CPU central processing unit
CS compressive sensing
CS-MHE compressive sensing–moving horizon estimator
CS-MUSIC compressive sensing–multiple signal classification
DAE differential algebraic equation

DC direct current
Def Definition
DFT discrete Fourier transform
DOF degree of freedom
e.g. exempli gratia

EKF extended Kalman filter
EMA experimental modal analysis
Eq Equation
FE finite element
Fig Figure
fps frames per second
Freq. Frequency
FRF frequency response function
i.e. id est
ID identification number
IP interior point
IST iterative shrinkage/thresholding
KF Kalman filter
KKT Karush-Kuhn-Tucker
LASSO least absolute shrinkage and selection operator

LED light-emitting diode
LICQ linear independence constraint qualification
LP linear program
LS least squares
LTI linear time-invariant
MHE moving horizon estimator

MMV multiple measurement vector
MOF method of frames
MOR model order reduction
MoUp model updating
MP matching pursuit
MPC model predictive control
MPF modal participation factor
MSE mean square error
MUSIC multiple signal classification
NAH nearfield acoustical holography

NI no input
NI-MHE no input–moving horizon estimator
NLP nonlinear programming
NP nondeterministic polynomial time
ODE ordinary differential equation

OMP orthogonal matching pursuit
PA polynomial approximation
PBH Popov-Belevitch-Hautus
PC personal computer
PDE partial differential equation
PDF probability distribution function
pdf probability density function
QCQP quadratically constrained quadratic programming problem

QP quadratic program
RIP restricted isometry property

ROI region of interest
RTI real time iteration
RW random walk
RW-MHE random walk–moving horizon estimator
SCP sequential convex programming

SHM structural health monitoring
SMV single measurement vector
SOCP second-order cone program
SQP sequential quadratic programming
SVD singular value decomposition
Thm Theorem
TTL transistor–transistor logic
UKF unscented Kalman filter
Note: suffix “s” indicates abbreviations in the plural form.

List of symbols
General symbols
$\dot{(\cdot)}$ first time derivative
$(\cdot)_{\mathrm{LB}}$ lower bound
$(\cdot)_{\mathrm{UB}}$ upper bound
$(\cdot)^\top$ transpose
$(\cdot)^{-1}$ inverse
$(\cdot)_k$ variable evaluated at time step $t = t_k$
$0$ null matrix
$1$ vector of all entries 1
$\mathbb{C}$ set of complex numbers
$\mathbb{C}^{n_n}$ vector of $n_n$ complex numbers
$\mathbb{C}^{n_n \times n_m}$ matrix of $n_n$ rows times $n_m$ columns of complex numbers
$d(\cdot)$ differential
$\partial(\cdot)$ partial differential
$E(\cdot)$ expected value
$I$ identity matrix
$\Im(\cdot)$ imaginary part of a complex number
$k$ generic time step $t = t_k$
$\|\cdot\|_0$ $\ell_0$-norm
$\|\cdot\|_1$ $\ell_1$-norm
$\|\cdot\|_2$ $\ell_2$-norm (Euclidean norm)
$\mathcal{N}(\bar{z}, \sigma_z^2)$ Gaussian distribution with mean $\bar{z}$ and standard deviation $\sigma_z$
$\mathbb{R}$ set of real numbers
$\mathbb{R}^{n_n}$ vector of $n_n$ real numbers
$\mathbb{R}^{n_n \times n_m}$ matrix of $n_n$ rows times $n_m$ columns of real numbers
$\Re(\cdot)$ real part of a complex number
$t$ time variable (continuous time)
$t_k$ time variable at time step $k$ (discrete time)
Specific symbols for the CS-MHE

$\bar{(\cdot)}$ the variable refers to the previous iteration
$A$ state matrix
$\alpha$ vector of the sparse representation of the input
$\alpha^*$ vector of the nonzero components of $\alpha$, governed by $\varepsilon_\alpha$
$B$ input matrix
$b$ constant term of the CS-MHE cost function
$\tilde{b}$ real constant term of the SOCP
$C$ output matrix
$D$ feedthrough matrix
$\varepsilon_\alpha$ threshold level for considering an input
$\varepsilon_Q$ scaling factor assigned to the covariance matrix $Q$
$\varepsilon_R$ scaling factor assigned to the covariance matrix $R$
$f(\cdot)$ right-hand side of the state equation
$g(\cdot)$ right-hand side of the measurement equation
$H$ Hessian matrix of the CS-MHE cost function
$\tilde{H}$ real Hessian matrix of the SOCP
$H_{\alpha x}$ bottom-left block of $H$ that refers both to the $n_x$ states of the system and to the $n_\alpha$ input components
$H_{\alpha\alpha}$ bottom-right block of $H$ that refers to the input
$H_{x\alpha}$ top-right block of $H$ that refers both to the $n_x$ states of the system and to the $n_\alpha$ input components
$H_{xx}$ top-left block of $H$ that refers to the $n_x$ states of the system
$\tilde{H}_\zeta$ Hessian matrix for the covariance matrix of an SOCP
$i$ current iteration of a recursive filter
$J_g$ Jacobian matrix of the constraints, for the covariance matrix in case of a QP
$\tilde{J}_\zeta$ Jacobian matrix of the constraints, for the covariance matrix in case of an SOCP
$\lambda$ balancing weight in the cost function of the CS-MHE
$N$ number of time instants in a horizon
$n_\alpha$ number of basis functions, i.e., length of $\alpha$
$n_{\alpha^*}$ number of nonzero basis functions, i.e., length of $\alpha^*$
$n_m$ number of sampling points
$n_r$ number of transducers
$n_x$ number of states, i.e., length of the state vector
$n_y$ number of measurements, i.e., length of the measurement vector
$n_z$ number of states after state augmentation
$\nu_{\alpha^*}$ noise term related to $\alpha^*$
$O$ observability matrix
$P_a$ covariance matrix associated with the arrival cost
$P_\alpha$ covariance matrix associated with $\alpha$
$P_{\alpha^*}$ covariance matrix associated with $\alpha^*$
$Q$ covariance matrix associated with the model
$Q_{\mathrm{drift}}$ drift term for the propagation of the prior information
$q$ first order vector of the CS-MHE cost function
$\tilde{q}$ real first order vector of the CS-MHE cost function
$q_\alpha$ part of $q$ that refers to the input
$q_x$ part of $q$ that refers to the $n_x$ states of the structure
$\tilde{q}_\zeta$ first order vector for the covariance matrix of an SOCP
$R$ covariance matrix associated with the measurements
$S$ sparsity
$s$ slack variable
$T$ current time step
$u(\cdot)$ input vector
$v(\cdot)$ vector of the measurement error
$w(\cdot)$ vector of the model error
$x(\cdot)$ state vector
$y(\cdot)$ measurement vector
$z$ vector of all augmented states
$\tilde{z}$ real vector of all augmented states
$\zeta$ real vector of all optimisation variables for the covariance matrix of an SOCP
Note 1: this list is not exhaustive. It includes some important general symbols
as well as most of the symbols that we use in the development of the
CS-MHE in chapter B1. The remaining symbols are explained in the text.
Note 2: throughout this dissertation, we refer to a column vector as vector.
Accordingly, we refer to a row vector through the transpose $(\cdot)^\top$.
Note 3: we refer to matrices and vectors with uppercase and lowercase Roman
italic characters, respectively. The only exceptions are the letters $f(\cdot)$,
$g(\cdot)$ and $h(\cdot)$, which are reserved to indicate functions, the letters
$k$, $n$, $N$ and $T$, which are scalars, and the letter $d$, which indicates a
differential. Scalars and other particular instances are indicated by Roman
characters (“upright”, both uppercase and lowercase). The use of Greek
characters and calligraphic letters will be clear from the context.
Contents
Abstract
Beknopte samenvatting
List of abbreviations
List of symbols
Contents
Introduction
Outline and structure of the dissertation
Contributions
A State of the art
A1 State estimation
A1.1 Dynamical system modelling
A1.2 Probability theory
A1.3 Least squares estimators and recursive formulation
A1.4 Single step estimators
A1.5 The moving horizon estimator
A1.6 Joint state/input/parameter estimation
A1.7 Input models
A1.8 Observability and matrix condition number
A1.9 Conclusions
A2 Optimisation
A2.1 An introduction to nonlinear programming
A2.2 Convex programming problems
A2.3 Numerical methods for nonlinear programming
A2.4 Norm approximation problems
A2.5 Complex optimisation variables
A2.6 Covariance matrix of constraint optimisation problems
A2.7 Conclusions
A3 Compressive sensing
A3.1 Introduction to compressive sensing
A3.2 Methods for solving the compressive sensing problem
A3.3 Feasibility considerations for compressive sensing
A3.4 Conclusions
B The compressive sensing–moving horizon estimator (CS-MHE) for joint state/input estimation
B1 Formulation of the CS-MHE
B1.1 Background and motivations of the CS-MHE
B1.2 The CS-MHE: a constrained optimisation problem
B1.3 The CS-MHE: limiting the amount of constraints
B1.4 The CS-MHE with complex input representations
B1.5 Discussion about the different CS-MHE formulations
B1.6 Conclusions
B2 Rank and condition number considerations for the CS-MHE
B2.1 Two estimation schemes to compare with the CS-MHE
B2.2 Numerical test cases
B2.3 Structure of the matrices
B2.4 Rank and condition number for different amounts of inputs
B2.5 Conclusions
C Applications
C1 Estimation of force impacts
C1.1 Numerical estimation of multiple force impacts
C1.2 Experimental estimation of one force impact
C1.3 Conclusions
C2 Estimation of periodic loads described by Fourier components
C2.1 Numerical estimation of a periodic load in time
C2.2 Numerical estimation of a periodic load in time and space
C2.3 Experimental estimation of a periodic load in time
C2.4 Conclusions
Conclusions and outlook
Possible application cases
Further methodology developments
Appendix 1: Matrix implementation of the CS-MHE
Appendix 2: Test case description for section C2.3
Bibliography
Curriculum vitae
List of publications

Introduction
Nowadays, automation is present in many aspects of our daily life. From small
home appliances to heating, ventilation and air conditioning systems, from
electric bicycles to spacecraft, from family businesses to industrial production
plants, an increasing number of tasks is performed by CPUs in an automated
fashion. Digital controllers make use of various configurations, algorithms,
modelling tools and measurements with the aim of obtaining better performance,
reducing costs and decreasing emissions. Furthermore, some specific tasks cannot
be carried out unless a controller continuously monitors a certain system flow.
Control strategies require specific information, which is not always available. In
such a situation, or if prediction is needed for taking future decisions, estimators
can help a controller by providing the missing information.
Besides control engineering, estimators are key elements within the communities
of structural health monitoring (with a primary interest in bridges and buildings)
and condition monitoring (related to machinery and moving components). In
fact, the evaluation of the health of a structural component or of the whole
assembly makes it possible to predict and possibly avoid failure and to plan maintenance, thus
improving safety and reliability, and reducing costs and energy consumption. An
accurate assessment of the health status requires some information, which could
be acquired by direct measurement. Unfortunately, there are many situations
in which economic or physical constraints do not allow a sensor to be placed.
In particular, forces and torques are of paramount importance for durability.
Let us just mention forces and torques in bearings and gearboxes, contact forces
on gear teeth, aerodynamic loads on wind turbine blades, and the interaction
between a machine tool and a work piece. In such situations a direct force
measurement is often not possible, while estimators can retrieve this information
by combining a model with other measurements such as accelerations or strains.
Such an estimator is also referred to as a virtual sensor.
In order to implement an automated routine on a digital computer, we can
describe mathematically a generic system through some so-called states, which
are variables that we want to estimate or control. The discipline of state
estimation is then one of the main ingredients of control engineering, condition
monitoring and virtual sensing. An unexpected event (e.g., a sudden change
of the system dynamics or an external input) may jeopardise the estimation
problem, and to overcome this issue it may be convenient to set up an estimator
that can track the time history of the states as well as the disturbance. This
comes with additional challenges, since the resulting joint state/disturbance
estimator may require a lot of measurements (which are in general expensive and
may not be available) in order for the system to be observable or controllable,
and extra knowledge about the disturbance, which is in general not trivial to
acquire.
For mechatronic systems we can design a joint estimator to estimate states
as well as internal unknowns (parameters) and external disturbances. In
particular, there is a growing interest in cost-efficient virtual sensors for static
and dynamic force and torque estimation, since those quantities are important in
design qualification, condition monitoring, control and sensing in general. During
the development phase of a mechatronic product or process, force and torque
data allow for better prediction of the lifetime and reduction of safety margins,
resulting in a decrease of failures, lower warranty costs, saved energy and higher
performance. Furthermore, we can foresee improved control strategies, proper
selection of machine elements and innovative design. For condition monitoring,
following the evolution of unknown external disturbances such as forces and
torques can be instrumental to failure indication systems. Moreover, we can
employ some information for feedback control and model updating depending
on the timing requirements. Force estimation for condition monitoring makes it
possible to develop a predictive maintenance strategy and possibly modify the operational
conditions in view of optimal lifetime and energy efficiency. Consequently,
we can lower maintenance and servicing costs by exploiting direct physical
properties, whereas other techniques must use standard measurements such
as temperature, current, vibration and noise level. Control engineering can
benefit from (in-situ) force data to steer and control a process and thus achieve
better performance, energy consumption and lifetime. Finally, force estimators
represent a powerful tool in specific domains where direct measurements are
not possible or are not expected to be robust over a long lifetime, or where
contactless measurement protocols (e.g., video based) are required.
In this dissertation we propose a novel technique for joint state/input estimation,
which we named compressive sensing–moving horizon estimator (CS-MHE). This
methodology exploits the intrinsic capability of the moving horizon estimator
(MHE) of combining a model of a system with some measurement within a finite
length window in time, while considering the stochastic phenomena related
to their accuracies. Furthermore, compressive sensing (CS) principles allow us to
include in the estimator the knowledge of an input shape. Consequently, the
estimation problem consists of a model, some measurements and an input model.
We can represent both the system and the input as physical models, i.e., they are
inspired by physical processes instead of relying on mere mathematical and/or
probabilistic assumptions. For the CS-MHE, exploiting a physical input shape
limits the observability issues which are typical of joint estimators with
multiple estimation variables, so that fewer measurements are required. Moreover,
we can choose input shapes that can represent an input with fast dynamics,
which is not a trivial task for other state of the art modelling techniques. CS
is based on signal sparsity, resulting in the need for a sparse input. We obtain
this by projecting an input onto a set of basis functions of which only a few
are active, and we define them on the finite length estimation window of an
MHE. Such representation is well suited for the estimation of a force impact
entering a mechanical system at an unknown location. Furthermore, we can
estimate inputs distributed in time and/or space provided that we define an
appropriate set of basis functions. Before moving to the outline and structure of
this dissertation and to the contributions, we conclude this broad introductory
section by describing how the CS-MHE is positioned within the new trend in
industry, denominated Industry 4.0 [76].
The compressive sensing–moving horizon estimator in the current industry trend
Industry 4.0 [76] is the current trend of automation and data exchange in
manufacturing technologies. Its name was proposed by the German federal
government (Bundesministerium für Bildung und Forschung [22]) and refers
to what is expected to be the fourth industrial revolution. Industry 4.0 aims
at creating a so-called smart factory, in which cyber-physical systems monitor
physical processes, create a virtual copy of the physical world and make
decentralised decisions. Cyber-physical systems communicate and cooperate
with each other and with humans in real time, and both internal and cross-
organisational services are offered and used by the participants of the product’s
value chain. Industry 4.0 has the following four design principles [76]:
1. interoperability, i.e., the ability of machines, devices, sensors, and people
to connect and communicate with each other;
2. information transparency, i.e., the ability of information systems to create
a virtual copy of the physical world by enriching digital plant models with
sensor data;
3. technical assistance, i.e., the ability of systems to support humans in
making informed decisions and solving urgent problems on short notice, as
well as the ability of cyber-physical systems to physically support humans
by conducting a range of tasks that are unsafe for humans;
4. decentralised decisions, i.e., the ability of cyber-physical systems to make
decisions on their own and to perform their tasks as autonomously as
possible.
An example for Industry 4.0 is a machine which can predict failures, trigger
maintenance processes autonomously and react to unexpected changes in
production. Such a smart product knows its history and its current and
target states. It can steer itself through the production process by instructing
other machines to perform some tasks in a certain production stage [76]. In
such a framework, the CS-MHE can provide a smart product with extra data to
be employed for self-optimisation, self-configuration and self-diagnosis, as well
as further support to human workers. The CS-MHE is based on physical models
(which are a central component in Industry 4.0) and allows for the estimation
of inputs which cannot be detected by other state of the art technologies. This
improves the knowledge of a process, and can be implemented by exploiting
already available hardware.
Outline and structure of the dissertation
Table I outlines the part and chapter subdivision of this dissertation.
Part A collects the state of the art regarding model based state/input
estimation, and includes a primer in nonlinear optimisation theory and
compressive sensing.
In Chapter A1 we give an overview of the most common state of the art
state estimation techniques which are typically employed in automotive,
manufacturing and chemical industry. After introducing single step estimators,
we focus on the (multistep) moving horizon estimator. Next, we consider several
approaches to model an input. In particular, this chapter includes the state
of the art of compressive sensing as a tool to improve the performance of
an estimator and as a way to represent a force input. Finally, we discuss the
concept of observability, which is relevant for all state estimation techniques and
becomes even more crucial in the framework of joint state/input estimation.
In Chapter A2 we outline the fundamentals of constrained optimisation, since
the CS-MHE needs to solve a minimisation problem. Moreover, we introduce
some relevant concepts such as norm approximation problems, optimisation
problems with complex variables, and how to obtain the covariance matrix in
case of constrained optimisation problems.
Table I: Structure of this thesis.
Introduction
A   State of the art
    A1  State estimation
    A2  Optimisation
    A3  Compressive sensing
B   The compressive sensing–moving horizon estimator (CS-MHE) for joint state/input estimation
    B1  Formulation of the CS-MHE
    B2  Rank and condition number considerations for the CS-MHE
C   Applications
    C1  Estimation of force impacts
    C2  Estimation of periodic loads described by Fourier components
Conclusions and outlook
In Chapter A3 we treat compressive sensing. Together with the moving horizon
estimator (cf. chapter A1), compressive sensing forms the foundation of the
CS-MHE.
Part B discusses the development of the CS-MHE for joint state/input
estimation.
In Chapter B1 we introduce the CS-MHE. We derive it starting from the
MHE formulation given in chapter A1 and we discuss its tuning parameters.
Furthermore, we present a strategy to deal with the information that regards
any already estimated input, based on the covariance matrix of the optimisation
problem. Next, we propose a second formulation of the CS-MHE, which
we obtain by including the constraints of the optimisation problem into the
cost function. This formulation has a few advantages, i.e., it requires less
computational effort, it keeps the matrices at a fixed size throughout operation
and it paves the way for implementing complex input representations and
for assessing rank and condition number of the matrices of the CS-MHE in
chapter B2.
Figure I: First application example. Experimental set-up (top left), model
(bottom left), impact estimation in time and space (right; axes: F [N], t [s], x [m]).
In Chapter B2 we assess rank and condition number of the matrices of the
CS-MHE, and we compare them with two other estimation approaches based
on an MHE. This analysis aims to establish a link with observability.
Part C shows a few application cases of the CS-MHE for joint state/input
estimation.
In Chapter C1 we present an application case that serves as first validation
scenario for the CS-MHE. An LTI mechanical system is subjected to a force
impact, and the CS-MHE estimates the states of the system (i.e., the first three
structural eigenmodes and their time derivatives) as well as the force impact
in terms of magnitude, time and location (Fig. I). The chapter includes some
numerical simulations and one experiment.
In Chapter C2 we present a few examples that exploit a complex input
representation. First, we present two numerical test cases with a 1D and a 2D
Fourier dictionary, respectively. Furthermore, we introduce an experimental
application case which exhibits more challenges in terms of system modelling
and measurements, i.e., we treat the case of a structure excited by a shaker,
modelled by finite elements, and measured through a high-speed camera (Fig. II).
We show the estimates for a periodic load applied at a known location and
consisting of up to four Fourier components, and we investigate the performance
of the CS-MHE in relation to the system calibration and to the length of the
estimation window.
Figure II: Second application example. Experimental set-up (top left), model
(bottom left), periodic load estimation in time (right).
Contributions
This thesis presents contributions related to the field of joint state/input
estimation. We exploit compressive sensing principles to model an input within
a moving horizon estimator. Such input representation can be instrumental
whenever other state of the art techniques face strong limitations. Examples are
inputs whose location is not known a priori, distributed inputs and inputs
characterised by fast dynamics. Although this thesis focuses on input
estimation, the CS-MHE can be employed for parameter estimation without
loss of generality [133]. Many engineering areas can benefit from techniques for
joint state/input/parameter estimation, e.g., unknown input and/or parameter
characterisation, process control, condition monitoring and virtual sensing.
In the remaining part of this introductory chapter we outline eight main
contributions of this thesis, divided into methodological contributions and
applications. For each contribution we indicate the chapter or section to which
it mainly refers. Most of the contributions follow linearly the structure of the
dissertation. However, both contributions (i) and (vi) result from section B1.2,
while contribution (v) involves the whole methodological chapter B1. Finally,
Table II summarises the connections between chapters and contributions.
Methodological contributions
Estimation of inputs applied at an unknown location (i)

It is not trivial to estimate an input entering a system at an unknown location.
In fact, classical techniques for input representation such as the random walk
model relate to a predefined input location. It is possible to model several
unknown inputs corresponding to a fine spatial sampling, but this may require
an overwhelming amount of measurements, since observability needs to be
satisfied. Compressive sensing has already been proposed in the literature as a tool
to overcome such observability issues, for a static time horizon (in which only
one location is excited at one time) and within a single step filter. With the
CS-MHE we exploit compressive sensing principles to improve the observability
for a multistep filter such as the MHE. This allows the estimation of an input
applied at an unknown location.
cf. section B1.2
Estimation of distributed inputs (ii)

This contribution is strictly related to the previous one, since distributed inputs
can be modelled as elementary inputs applied to multiple locations. Once
again, this results in a decrease of observability. The CS-MHE overcomes this
restriction by modelling a distributed input in space and/or in time (i.e., within
an estimation window) by means of shape functions, allowing for the estimation
of distributed inputs both in space and time. A subgroup of the distributed
inputs involves an input evolving in time applied at a known location. A few
existing estimation methodologies generate excellent solutions when estimating
inputs in time, e.g., an extended Kalman filter together with a random walk
model for representing the input [133]. In this framework, the CS-MHE offers
an alternative approach based on input projection onto a set of shape functions
defined within the estimation window.
cf. section B1.3
Estimation of periodic inputs modelled by complex values (iii)

Complex representations are suitable for specific types of signals. For example,
a periodic signal can be projected onto a set of Fourier basis functions by
means of the Fourier transform. Accordingly, we developed a formulation of the
CS-MHE that allows the implementation of complex shape functions. Possible
applications include the estimation of quasi-periodic loads which are typical of
rotating machinery.
cf. section B1.4
Rank and condition number assessment of the CS-MHE matrices (iv)

All the previous contributions relate to observability. In fact, the CS-MHE aims
at reducing observability issues by exploiting CS principles. In this dissertation
we look into this topic by numerically investigating rank and condition number
of the CS-MHE matrices in comparison with two other MHE schemes. This
allows us to illustrate the benefits of exploiting the knowledge of an input shape
and to define a threshold of the number of inputs that the CS-MHE can estimate,
based on input sparsity.
cf. chapter B2
Estimation of inputs characterised by fast dynamics with respect to the sampling rate (v)
Up to here we listed contributions in which we employ input models that involve
shape functions in order to improve observability. At the same time, such
input representations allow us to exploit a known input shape to model inputs
characterised by fast dynamics. High bandwidth dynamics are typically quite
challenging to estimate since they are very sensitive to noise sources. Exploiting
a known shape improves on the accuracy of input observers that do not make use of
a model, and strongly increases the bandwidth compared with random walk approximations.
This underlines the potential of choosing a structured input model in place
of a random walk.
cf. chapter B1
Estimation of force impacts (vi)

Impacts belong to the family of inputs characterised by fast dynamics.
Furthermore, force measurements are important for structural health monitoring,
durability and fatigue purposes, and the interest in force estimation and virtual
sensing is growing. Although dedicated transducers do exist, they are in general
expensive and cannot be placed in every location due to geometrical and/or
mechanical restrictions. The proposed CS-MHE can be instrumental for the
estimation of force impacts.
cf. section B1.2
Applications
The CS-MHE for the estimation of force impacts: application case (vii)
As a first application example, we propose an LTI mechanical system (i.e., a
cantilever beam) modelled analytically, subject to a force impact entering the
system at an unknown location. This example demonstrates the effectiveness of
compressive sensing as a tool to model an input, and includes an experimental
validation.
cf. chapter C1
The CS-MHE for the estimation of a distributed load: application case (viii)
A second test case involves the estimation of a periodic load applied at a known
location. This relies on an input projection onto a complex Fourier dictionary.
In particular, we propose an experiment in which we model a more elaborate
LTI system with finite elements and we measure a periodic load by contactless
high-speed camera recordings and digital image processing. We designed and
built this second demonstrator to be able to apply a wider family of load
typologies. The choice of vision measurements gives us a flexible sensor
array as far as the number and location of the sensors are concerned, which is
beneficial in the development phase of a virtual sensor.
cf. chapter C2
Table II: Connections between chapters and contributions.
Chapter / section                                                  Contribution(s)
B1    Formulation of the CS-MHE                                    (v)
B1.2  The CS-MHE: a constrained optimisation problem               (i) and (vi)
B1.3  The CS-MHE: limiting the amount of constraints               (ii)
B1.4  The CS-MHE with complex input representations                (iii)
B2    Rank and condition number considerations for the CS-MHE      (iv)
C1    Estimation of force impacts                                  (vii)
C2    Estimation of periodic loads described by Fourier components (viii)
Part A
State of the art
Chapter A1
State estimation
This chapter outlines the state of the art concerning estimation techniques
based on probability theory. Most of the approaches aim at correlating a model
with measurements while taking into account the related stochastic phenomena.
Our discussion focuses on discrete time systems and state-space representations,
which are widely employed in digital control engineering. In this chapter we
describe the moving horizon estimator (MHE) which, together with compressive
sensing (cf. chapter A3), constitutes the foundation of the CS-MHE.
We start by defining a state in section A1.1. Next, in section A1.2 we outline a few
concepts of probability theory, which represent the groundwork of least squares
estimators (section A1.3), of the Kalman filter and its extensions (section A1.4),
and of the MHE (section A1.5). We dedicate some space to the latter since it is
a key ingredient of the CS-MHE. Furthermore, in section A1.6 we introduce the
concept of joint estimation, while in section A1.7 we describe four approaches
for input modelling. These subjects raise the topic of observability, which
we address in section A1.8. Finally, in section A1.9 we close this chapter by
explaining the choice of the moving horizon estimator and compressive sensing
as the two main ingredients of the CS-MHE.
Acknowledgements
This chapter is an overview of the state of the art on the topic. Great sources
of inspiration for writing this chapter were [138, 183] and the excellent MHE
references such as [72, 166, 167]. Concerning input modelling, we acknowledge
the work related to KU Leuven and reported in [38, 124, 125, 127, 128, 133, 134].
The section about compressive sensing within estimation techniques comes
primarily from [104], of which Matteo Kirchner is first author.
A1.1 Dynamical system modelling
In this introduction to state estimation we present some tools which we will
recall when discussing several estimation approaches later in this chapter. We
begin by providing the definition of state-space models in continuous time,
which we then linearise and discretise in order to implement them on a digital
computer.
A1.1.1 States and state-space models
The states of a system are those variables that provide a complete representation
of the internal condition (or status) of the system at a given time instant. In
general a system does not have a unique representation, and different applications
may require different models, resulting in different sets of states. Therefore, a
state (or state variable) represents a certain quantity of a system which we want
to monitor or control. The aim of state estimation is then to extract from a
system all available information about a state. State estimation is applicable to
many scientific disciplines where it is possible to have a mathematical model of
the system under analysis.
The mathematical representation of a system through state variables is referred
to as state-space model (or state-space representation). Eq. (A1a) shows a
generic state-space representation of a continuous-time time-varying nonlinear
system.[A1] Throughout this thesis we will always consider two equations, i.e., the
above-mentioned state equation (A1a) combined with a measurement equation
(A1b). Time dependency is indicated by $t$, belonging to a continuous time
horizon.
\[
\dot{x}(t) = f\big(x(t), u(t), w(t), t\big) \tag{A1a}
\]
\[
y(t) = g\big(x(t), u(t), v(t), t\big) \tag{A1b}
\]

All quantities can be scalars or vectors:[A2] $x(t) \in \mathbb{R}^{n_x}$ represents $n_x$ states,
$y(t) \in \mathbb{R}^{n_y}$ are $n_y$ measurements, $u(t) \in \mathbb{R}^{n_u}$ are $n_u$ inputs, and finally
$w(t) \in \mathbb{R}^{n_x}$ and $v(t) \in \mathbb{R}^{n_y}$ are noise terms on states and measurements, respectively.
$\dot{x}(t)$ is the time derivative of the state vector, i.e., $\dot{x}(t) = \frac{\partial}{\partial t} x(t)$.
[A1] A time independent parameter $p \in \mathbb{R}^{n_p}$ may also be present, as well as extra states
whose time derivatives do not appear in the model [183].
[A2] Throughout this dissertation, we refer to a column vector as vector. Accordingly, we refer
to a row vector through the transpose $(\cdot)^\top$.
State estimation and state-space models became popular within the engineering
community during the second half of the twentieth century, and got a strong
boost after the introduction of the Kalman filter [94]. The two main engineering
applications of state estimation regard control engineering (the estimate of
a system state is needed in order to implement a state-feedback controller)
and measurement systems (certain estimates are sought if direct measurements
are not possible). A state equation such as Eq. (A1a) represents a dynamic
system, where the term dynamic refers to the time changing characteristics of a
system. It is possible to describe many real world processes by mathematical
models such as Eq. (A1a) by means of ordinary differential equations (ODEs)
or differential algebraic equations (DAEs).[A3]
It is difficult to handle nonlinear systems such as Eq. (A1a) mathematically,
while linear system theory provides us with a series of tools for linear systems
[183]. In order to apply those tools we need to linearise a system, obtaining a
matrix representation such as Eq. (A2), where $A_C \in \mathbb{R}^{n_x \times n_x}$ is the state matrix
(or system matrix), $B_C \in \mathbb{R}^{n_x \times n_u}$ is the input matrix (or control matrix),
$C_C \in \mathbb{R}^{n_y \times n_x}$ is the output matrix, $D_C \in \mathbb{R}^{n_y \times n_u}$ is the feedthrough matrix
(or feedforward matrix), $L_C(t) \in \mathbb{R}^{n_x \times n_x}$ and $G_C(t) \in \mathbb{R}^{n_y \times n_y}$ are two error
matrices, and finally $c_x \in \mathbb{R}^{n_x}$ and $c_y \in \mathbb{R}^{n_y}$ are constants which may appear
due to the linearisation (for linear systems, these constants are equal to zero
[183, 192]). Subscript C indicates continuous-time.
\[
\dot{x}(t) = A_C(t)\,x(t) + B_C(t)\,u(t) + L_C(t)\,w(t) + c_x \tag{A2a}
\]
\[
y(t) = C_C(t)\,x(t) + D_C(t)\,u(t) + G_C(t)\,v(t) + c_y \tag{A2b}
\]
Most real world systems manifest continuous-time dynamics. However, state
estimation is performed by computers which require discrete-time equations.
A discrete-time state-space model consists of a system which depends on a
discrete time step $t = t_k$ belonging to a discretised time horizon. Moreover,
measurements are usually sampled at discrete times. Generic discrete-time
state-space representations of the nonlinear system (A1) and the linear system
(A2) are shown in Eqs. (A3) and (A4), respectively. Subscript $k$ refers to time
step $t_k$, while subscript D indicates discrete-time.
\[
x_{k+1} = f(x_k, u_k, w_k, k) \tag{A3a}
\]
\[
y_k = g(x_k, u_k, v_k, k) \tag{A3b}
\]
[A3] Partial differential equations (PDEs) are also possible. They can be reduced to ODEs
and DAEs [204].
\[
x_{k+1} = A_{D,k}\,x_k + B_{D,k}\,u_k + L_{D,k}\,w_k + c_{x,k} \tag{A4a}
\]
\[
y_k = C_{D,k}\,x_k + D_{D,k}\,u_k + G_{D,k}\,v_k + c_{y,k} \tag{A4b}
\]
Up to here we introduced a few basic concepts regarding states and state-space

models. In the next two sections we will show how to perform the operations of
linearisation and discretisation.
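As an illustration of how a discrete-time linear model such as Eq. (A4) is stepped forward on a computer, the following minimal sketch propagates a noise-free system with $c_{x,k} = c_{y,k} = 0$. The function name `simulate`, the time-invariant matrices and the double-integrator example are assumptions chosen for illustration only; they are not taken from this dissertation.

```python
def simulate(A, B, C, D, x0, inputs):
    """Propagate x_{k+1} = A x_k + B u_k and y_k = C x_k + D u_k.

    Noise terms and the constants c_x, c_y of Eq. (A4) are assumed to be zero,
    and the matrices are assumed time-invariant (lists of rows).
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    x = list(x0)
    states, outputs = [], []
    for u in inputs:
        states.append(x)
        # measurement equation, cf. Eq. (A4b)
        outputs.append([a + b for a, b in zip(matvec(C, x), matvec(D, u))])
        # state equation, cf. Eq. (A4a)
        x = [a + b for a, b in zip(matvec(A, x), matvec(B, u))]
    return states, outputs


# Hypothetical two-state example: position and velocity driven by a unit input.
dt = 0.1
A = [[1.0, dt], [0.0, 1.0]]
B = [[0.5 * dt ** 2], [dt]]
C = [[1.0, 0.0]]  # only the position is measured
D = [[0.0]]
states, outputs = simulate(A, B, C, D, x0=[0.0, 0.0], inputs=[[1.0]] * 5)
```

In this sketch only the first state is measured, which is the typical situation that motivates an estimator: the second state has to be inferred from the model and the measurement history.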
A1.1.2 Linearisation
The linearisation of a model generates a linear approximation around a chosen
point. The validity of such approximation is application dependent and it is
strongly linked to the severity of the nonlinearity. Let us first consider a function
$f(x)$ of scalar $x$. Eq. (A5) shows the Taylor series of $f(x)$ around a nominal
operating point (or linearisation point) $x = \bar{x}$, where we defined $\tilde{x} = x - \bar{x}$.
\[
f(x) = f(\bar{x}) + \left.\frac{\partial f}{\partial x}\right|_{\bar{x}} \tilde{x}
+ \frac{1}{2!} \left.\frac{\partial^2 f}{\partial x^2}\right|_{\bar{x}} \tilde{x}^2
+ \frac{1}{3!} \left.\frac{\partial^3 f}{\partial x^3}\right|_{\bar{x}} \tilde{x}^3 + \cdots \tag{A5}
\]
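If $x \in \mathbb{R}^{n_x}$ is a vector, $f(x)$ can be expanded in Taylor series with $n_x$ partial
derivatives, as shown in Eq. (A6a). A linear approximation consists in omitting
the higher order derivatives, keeping only the first order, as shown in Eqs. (A6b–
A6c), where the definition of matrix $A$ follows from Eq. (A6b).
\begin{align}
f(x) &= f(\bar{x})
+ \left.\left(\sum_{i=1}^{n_x} \tilde{x}_i \frac{\partial}{\partial x_i}\right) f(x)\right|_{\bar{x}}
+ \frac{1}{2!} \left.\left(\sum_{i=1}^{n_x} \tilde{x}_i \frac{\partial}{\partial x_i}\right)^{2} f(x)\right|_{\bar{x}} \notag \\
&\quad + \frac{1}{3!} \left.\left(\sum_{i=1}^{n_x} \tilde{x}_i \frac{\partial}{\partial x_i}\right)^{3} f(x)\right|_{\bar{x}} + \cdots \tag{A6a} \\
&\approx f(\bar{x}) + \left.\left(\sum_{i=1}^{n_x} \tilde{x}_i \frac{\partial}{\partial x_i}\right) f(x)\right|_{\bar{x}} \tag{A6b} \\
&\approx f(\bar{x}) + A\tilde{x} \tag{A6c}
\end{align}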
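The first-order approximation of Eq. (A6c) can be checked numerically. The sketch below approximates the matrix $A$ by central finite differences and compares $f(x)$ with $f(\bar{x}) + A\tilde{x}$ for two perturbation sizes. The helper names `jacobian` and `linear_approx`, the step `eps` and the test function are assumptions made for illustration; an analytical Jacobian would normally be preferred when it is available.

```python
import math


def jacobian(f, x_bar, eps=1e-6):
    """Central-difference approximation of A = df/dx evaluated at x_bar."""
    n = len(x_bar)
    m = len(f(x_bar))
    A = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x_bar)
        xm = list(x_bar)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            A[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return A


def linear_approx(f, x_bar, A, x):
    """Evaluate f(x_bar) + A (x - x_bar), cf. Eq. (A6c)."""
    x_tilde = [xi - xb for xi, xb in zip(x, x_bar)]
    f0 = f(x_bar)
    return [f0[i] + sum(A[i][j] * x_tilde[j] for j in range(len(x_bar)))
            for i in range(len(f0))]


def f(x):
    """Hypothetical nonlinear function, used only for illustration."""
    return [math.sin(x[0]) + x[1] ** 2, x[0] * x[1]]


x_bar = [0.5, 1.0]
A = jacobian(f, x_bar)
errors = []
for step in (1e-1, 1e-2):
    x = [x_bar[0] + step, x_bar[1] + step]
    approx = linear_approx(f, x_bar, A, x)
    errors.append(max(abs(a - b) for a, b in zip(f(x), approx)))
    print(step, errors[-1])  # the error shrinks roughly quadratically with the step
```

The quadratic decay of the error reflects the neglected second order term of Eq. (A6a), and gives a practical indication of how far from $\bar{x}$ the linear model can be trusted.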
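Let us now apply an analogous procedure to a function of several variables such
as Eq. (A1a). If we omit the time dependency from the notation, we obtain the
formulas of Eq. (A7). We defined matrices $A_C$, $B_C$ and $L_C$ in Eq. (A7c) from
Eq. (A7b) in order to match Eq. (A2a), which differs just by the noise term
and the constant.
\begin{align}
\dot{x} &= f(x, u, w) \tag{A7a} \\
&\approx f(\bar{x}, \bar{u}, \bar{w})
+ \left.\frac{\partial f}{\partial x}\right|_{(\bar{x},\bar{u},\bar{w})} \tilde{x}
+ \left.\frac{\partial f}{\partial u}\right|_{(\bar{x},\bar{u},\bar{w})} \tilde{u}
+ \left.\frac{\partial f}{\partial w}\right|_{(\bar{x},\bar{u},\bar{w})} \tilde{w} \tag{A7b} \\
&= \dot{\bar{x}} + A_C \tilde{x} + B_C \tilde{u} + L_C \tilde{w} \tag{A7c}
\end{align}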
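If we also linearise the measurement equation (A1b), we finally obtain the linear
system (A8), where all matrices are given explicitly in Eqs. (A9).[A4]
\begin{align}
\dot{\tilde{x}} &= A_C \tilde{x} + B_C \tilde{u} + L_C \tilde{w} \tag{A8a} \\
\tilde{y} &= C_C \tilde{x} + D_C \tilde{u} + G_C \tilde{v} \tag{A8b}
\end{align}
\begin{align}
A_C &= \left.\frac{\partial f}{\partial x}\right|_{(\bar{x},\bar{u},\bar{w})} &
B_C &= \left.\frac{\partial f}{\partial u}\right|_{(\bar{x},\bar{u},\bar{w})} &
L_C &= \left.\frac{\partial f}{\partial w}\right|_{(\bar{x},\bar{u},\bar{w})} \tag{A9a} \\
C_C &= \left.\frac{\partial g}{\partial x}\right|_{(\bar{x},\bar{u},\bar{v})} &
D_C &= \left.\frac{\partial g}{\partial u}\right|_{(\bar{x},\bar{u},\bar{v})} &
G_C &= \left.\frac{\partial g}{\partial v}\right|_{(\bar{x},\bar{u},\bar{v})} \tag{A9b}
\end{align}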
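To make the definitions of Eqs. (A9) concrete, consider as a worked example a single-degree-of-freedom oscillator with a cubic stiffness term, $m\ddot{q} + c\dot{q} + kq + k_3 q^3 = u$, with states $x = [q, \dot{q}]^\top$, additive process noise $w$, and an acceleration measurement corrupted by additive noise $v$. This system and the symbols $m$, $c$, $k$, $k_3$ and $q$ are assumptions introduced here for illustration only. Evaluating the partial derivatives at $(\bar{x}, \bar{u})$ with $\bar{w} = 0$ and $\bar{v} = 0$ would give:

```latex
\begin{align}
f(x, u, w) &= \begin{bmatrix} x_2 \\ \dfrac{1}{m}\left(u - c\,x_2 - k\,x_1 - k_3\,x_1^3\right) \end{bmatrix} + w,
&
g(x, u, v) &= \frac{1}{m}\left(u - c\,x_2 - k\,x_1 - k_3\,x_1^3\right) + v, \\
A_C &= \begin{bmatrix} 0 & 1 \\ -\dfrac{k + 3k_3\bar{x}_1^2}{m} & -\dfrac{c}{m} \end{bmatrix},
&
B_C &= \begin{bmatrix} 0 \\ \dfrac{1}{m} \end{bmatrix},
\qquad L_C = I_2, \\
C_C &= \begin{bmatrix} -\dfrac{k + 3k_3\bar{x}_1^2}{m} & -\dfrac{c}{m} \end{bmatrix},
&
D_C &= \frac{1}{m},
\qquad G_C = 1.
\end{align}
```

Only the stiffness entries depend on the linearisation point $\bar{x}_1$, and for $k_3 = 0$ the matrices reduce to those of a linear mass-spring-damper. The nonzero feedthrough $D_C$ is a consequence of measuring acceleration rather than displacement.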
A1.1.3 Discretisation
The solution of the continuous-time linear system (A2a) is reported in
Eq. (A10a), assuming $c_x(t) = 0$ [183]. We want to discretise Eq. (A10a) at time
step $t = t_k$, assuming that the initial time $t_0$ corresponds to the previous time
step (i.e., $t_0 = t_{k-1}$) and that $A_C(t)$ and $u(t)$ are approximately constant in the
interval of integration. $\tau$ is the integration variable and it is defined between $t_0$
and $t$. Eq. (A10b) shows the resulting linear discrete-time approximation to the
continuous time dynamics, and finally Eq. (A10c) shows a compact formulation,
obtained by introducing the definitions of Eq. (A11), where $\Delta t = t_k - t_{k-1}$
is the discretisation step size. Subscript D indicates matrices that refer to
discrete-time.
[A4] The noise terms $\bar{w}$ and $\bar{v}$ are usually set to zero, such that $\tilde{w} \equiv w$ and $\tilde{v} \equiv v$ [183].
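\begin{align}
x(t) &= e^{A_C (t - t_0)} x(t_0) + \int_{t_0}^{t} e^{A_C (t - \tau)} \left[ B_C(\tau) u(\tau) + w(\tau) \right] \mathrm{d}\tau \tag{A10a} \\
x(t_k) &= e^{A_C (t_k - t_{k-1})} x(t_{k-1}) + \int_{t_{k-1}}^{t_k} e^{A_C (t_k - \tau)} B_C(\tau) \, \mathrm{d}\tau \; u(t_{k-1}) \notag \\
&\quad + \int_{t_{k-1}}^{t_k} e^{A_C (t_k - \tau)} w(\tau) \, \mathrm{d}\tau \tag{A10b} \\
x_k &= A_{D,k-1} x_{k-1} + B_{D,k-1} u_{k-1} + \int_{t_{k-1}}^{t_k} e^{A_C (t_k - \tau)} w(\tau) \, \mathrm{d}\tau \tag{A10c}
\end{align}
\begin{align}
x_k &= x(t_k) \tag{A11a} \\
u_k &= u(t_k) \tag{A11b} \\
A_{D,k} &= e^{A_C \Delta t} \tag{A11c} \\
B_{D,k} &= \int_{t_k}^{t_{k+1}} e^{A_C (t_{k+1} - \tau)} B_C(\tau) \, \mathrm{d}\tau \tag{A11d}
\end{align}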
The integral in Eq. (A11d) can be formulated explicitly if $A_C$ is invertible.
Otherwise, numerical methods may be needed to solve this so-called integration
problem, such as rectangular integration, trapezoidal integration or Runge-Kutta
integration. For a comparison of the performance of those methods we refer to
[183]. In general, the solution accuracy improves as $\Delta t$ decreases.
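For a constant $B_C$ and an invertible $A_C$, the integral of Eq. (A11d) has the closed form $B_D = A_C^{-1}\left(e^{A_C \Delta t} - I\right) B_C$. The sketch below checks this for a scalar system $\dot{x} = a_c x + b_c u$, comparing the closed form against a trapezoidal evaluation of the integral. The function names and the numerical values are assumptions chosen for illustration, and the scalar case is used only to keep the example free of external libraries.

```python
import math


def discretise_scalar(a_c, b_c, dt):
    """Closed-form discretisation of x' = a_c x + b_c u for a_c != 0.

    The input is assumed constant over the step, cf. Eqs. (A11c)-(A11d).
    """
    a_d = math.exp(a_c * dt)
    b_d = (a_d - 1.0) / a_c * b_c
    return a_d, b_d


def b_d_trapezoidal(a_c, b_c, dt, n=1000):
    """Trapezoidal approximation of the integral in Eq. (A11d) on n sub-intervals."""
    h = dt / n
    # integrand exp(a_c (dt - tau)) b_c sampled at tau = 0, h, ..., dt
    values = [math.exp(a_c * (dt - i * h)) * b_c for i in range(n + 1)]
    return h * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])


a_d, b_d = discretise_scalar(a_c=-2.0, b_c=1.0, dt=0.05)
b_d_num = b_d_trapezoidal(a_c=-2.0, b_c=1.0, dt=0.05)
print(a_d, b_d, b_d_num)
```

For matrix-valued systems the same closed form applies with the matrix exponential, while the numerical route remains available when $A_C$ is singular (e.g., in the presence of rigid body modes).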
A1.2 Probability theory
The state estimation techniques that we will mention in this chapter derive
from probability theory, the main reason being that process noise can often be
modelled as a random variable (also referred to as random quantity, aleatory
variable, or stochastic variable). Consequently, we introduce a few definitions
concerning probability theory. Our discussion follows mainly from [183].
Definition A1.1 (Probability of event A). The probability of event A is
defined as the number of times A occurs divided by the total number of outcomes,
and it is referred to as P(A). P(A) is called a priori probability if it does not
depend on any prior known information.
Definition A1.2 (Joint probability). The joint probability of A and B, i.e.,
the probability that events A and B both occur, is defined as P(A, B).
Definition A1.3 (Conditional probability). The conditional probability of
event A given B is defined as follows, provided that the probability of B is
nonzero:
\[ P(A \mid B) = \frac{P(A, B)}{P(B)}. \tag{A12} \]
P(A | B) is also called a posteriori probability since it applies to the probability

of A given the fact that some information about B is already known.
We note that in general P(A | B) ≥ P(A, B), since P(B) ≤ 1. Moreover, we can apply Eq. (A12)
to P(B | A), leading to the equivalence of Eq. (A13).
P(A, B) = P(A | B)P(B) = P(B | A)P(A) (A13)
Theorem A1.4 (Bayes’ rule). The so-called Bayes’ rule (or Bayes’ theorem)
is obtained by rearranging Eq. (A13) as follows:
\[ P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}. \tag{A14} \]
Bayes’ theorem was first introduced in [13]. For an exhaustive treatment of
Bayesian theory we refer to [176]. It is worth looking at Eq. (A14) considering
A as a state and B as a measurement. We note that this is exactly what the
test engineer (or the control system) aims at, i.e., the probability associated
to a state given a measurement. Bayes’ rule is relevant in many fields such
as probability and game theory, measurement systems, control engineering,
robotics and sensor fusion, virtual sensing, estimators and filters.
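A small numerical sketch may help to read Eq. (A14) with A as a state and B as a measurement. All numbers below are hypothetical, and P(B) is obtained from the law of total probability, which is not stated in the text but is a standard result.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule, Eq. (A14): P(A | B) = P(B | A) P(A) / P(B)."""
    if p_b <= 0.0:
        raise ValueError("P(B) must be nonzero")
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: A = "the structure is damaged" (state),
# B = "the monitoring sensor raises an alarm" (measurement).
p_a = 0.01                 # a priori probability of damage
p_b_given_a = 0.95         # alarm probability if damaged
p_b_given_not_a = 0.05     # false alarm probability
# Law of total probability (assumption of this sketch, not from the text).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(posterior)
```

With these numbers the a posteriori probability of damage remains well below one half, although the sensor is fairly reliable, because the a priori probability is small.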
Definition A1.5 (Random variable). A random variable X is defined as a

functional mapping from a set of experimental outcomes (domain) to a set of
real numbers (range). A random variable X exists independently of any of its
realizations x, i.e., any possible value X can assume. A random variable can
either be continuous or discrete [183].
Definition A1.6 (Probability distribution function (PDF)). The most

fundamental property of a random variable X is its probability distribution
function (PDF), defined as follows:
FX (x) = P(X ≤ x). (A15)

Definition A1.7 (Probability density function (pdf)). The probability
density function (pdf) is defined as the derivative of the PDF with respect to
x, i.e.,
\[ f_X(x) = \frac{\mathrm{d}F_X(x)}{\mathrm{d}x}. \tag{A16} \]
For a list of properties of the PDF and pdf we refer to [183].

Definition A1.8 (Expected value). The expected value of a random variable
X is defined as its average value over a large number of experiments. It is
indicated as E(X) (or alternatively E(x), X̄, x̄) and it is also referred to as the
expectation, mean or average of X.
Definition A1.9 (Variance). The variance of a random variable X is a measure
of how much X is expected to vary from its mean. Formally, the variance of X
is defined as follows:
\[ \sigma^2 = E\left[ (X - \bar{x})^2 \right]. \tag{A17} \]

Definition A1.10 (Standard deviation). The standard deviation of a

random variable X is defined as the square root of the variance, i.e., σ.
We use the notation in Eq. (A18) to indicate that X is a random variable with
mean x̄ and variance σ². A pdf of a random variable may be asymmetric around
its mean, and this phenomenon is referred to as skewness [183].
\[ X \sim (\bar{x}, \sigma^2) \tag{A18} \]
Definition A1.11 (Gaussian random variable). A random variable X is said
to be Gaussian (or normal) if its pdf is as follows:
\[ f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ \frac{-(x - \bar{x})^2}{2 \sigma^2} \right], \tag{A19} \]
where x̄ and σ are the mean and standard deviation of the Gaussian random
variable, respectively.
We use the notation in Eq. (A20) to indicate that X is a Gaussian random
variable with mean x̄ and variance σ².
\[ X \sim N(\bar{x}, \sigma^2) \tag{A20} \]
A peculiarity of Gaussian random variables concerns their PDF, which is
obtained by integrating the pdf (A19) (cf. Def. A1.7). Although this integral
has no closed-form solution, a Gaussian PDF can be approximated by a
closed-form expression [183], which is one reason why normal random
variables are mathematically convenient [167]. For our purposes, it is important
to consider how the pdf of a random variable propagates through a function.
It can be proven that a linear transformation of a Gaussian random variable
results in a new Gaussian random variable [183]. This aspect is crucial for the
Kalman filter (cf. section A1.4).
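The effect of a linear transformation on the mean and variance can be checked numerically. The sketch below assumes a scalar transformation Y = aX + b with illustrative values of a, b, x̄ and σ (none of them from the text); in this case the transformed variable is expected to have mean a x̄ + b and variance a²σ². The sample statistics only illustrate the first two moments and do not prove Gaussianity.

```python
import random

def sample_stats(samples):
    """Sample mean and (biased) sample variance of a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

rng = random.Random(0)           # fixed seed for repeatability
x_bar, sigma = 3.0, 0.5          # X ~ N(x_bar, sigma^2), illustrative values
a, b = 2.0, 1.0                  # linear transformation Y = a X + b
ys = [a * rng.gauss(x_bar, sigma) + b for _ in range(50000)]
y_mean, y_var = sample_stats(ys)
# Expected from theory: mean a * x_bar + b = 7 and variance a^2 * sigma^2 = 1.
print(y_mean, y_var)
```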
Up to this point we limited the discussion to one single random variable.
However, the framework of state/input/parameter estimation presents typically
a higher number of random variables, which may or may not be
correlated. This topic requires the concept of covariance.
Definition A1.12 (Covariance). The covariance of two scalar random
variables X and Y is defined as follows:
\begin{align*}
C_{XY} &= E\left[ (X - \bar{X})(Y - \bar{Y}) \right] \tag{A21a} \\
&= E(XY) - \bar{X}\bar{Y}. \tag{A21b}
\end{align*}
Definition A1.13 (Correlation coefficient). The correlation coefficient of

two scalar random variables X and Y is defined as follows:
\[ \rho = \frac{C_{XY}}{\sigma_x \sigma_y}, \tag{A22} \]
where σx and σy are the standard deviations of X and Y, respectively.
The correlation coefficient is a normalised measure of the linear dependence
between two random variables. Note that ρ = 0 in case of independent variables.
On the other hand, ρ = ±1 if one variable is a linear function of the other.
Definition A1.14 (Correlation). The correlation of two scalar random
variables X and Y is defined as follows:
RXY = E(XY). (A23)

Moreover, X and Y are said to be uncorrelated if RXY = E(X)E(Y).
We note that if two random variables are independent, then they are also
uncorrelated, but the converse does not hold.
Definition A1.15 (Orthogonality). Two random variables X and Y are said
to be orthogonal if RXY = 0.
Two uncorrelated random variables are orthogonal only if at least one of them
is zero-mean.
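The difference between uncorrelated and independent variables can be illustrated with a small discrete example, chosen here as an assumption for illustration: X takes the values −1, 0 and 1 with equal probability and Y = X². The sketch evaluates Eqs. (A21b) and (A23) exactly with rational arithmetic.

```python
from fractions import Fraction

# Hypothetical discrete example: X in {-1, 0, 1} with equal probability, Y = X^2.
outcomes = [(-1, 1), (0, 0), (1, 1)]
p = Fraction(1, 3)

def expect(fn):
    """Expected value of fn(X, Y) over the three equiprobable outcomes."""
    return sum(p * fn(x, y) for x, y in outcomes)

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
r_xy = expect(lambda x, y: x * y)     # correlation, Eq. (A23)
c_xy = r_xy - mean_x * mean_y         # covariance, Eq. (A21b)

# Independence would require P(X=1, Y=1) = P(X=1) P(Y=1).
p_joint = p                           # P(X=1, Y=1)
p_product = p * (2 * p)               # P(X=1) * P(Y=1)
print(c_xy, r_xy, p_joint, p_product)
```

Here the covariance vanishes, so X and Y are uncorrelated (and, since X is zero-mean, also orthogonal), yet the joint probability differs from the product of the marginals, so they are not independent.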
All previous definitions can be generalised for vectors, leading to quantities which
are vectors and matrices. For two vectors x ∈ Rnx and y ∈ Rny , the correlation
and covariance are defined as shown in Eqs. (A24) and (A25), respectively.
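\begin{align*}
R_{xy} &= E(x y^\top) \tag{A24a} \\
&= \begin{bmatrix}
E(x_1 y_1) & \cdots & E(x_1 y_{n_y}) \\
\vdots & \ddots & \vdots \\
E(x_{n_x} y_1) & \cdots & E(x_{n_x} y_{n_y})
\end{bmatrix} \tag{A24b}
\end{align*}
\begin{align*}
C_{xy} &= E\left[ (x - \bar{x})(y - \bar{y})^\top \right] \tag{A25a} \\
&= E(x y^\top) - \bar{x} \bar{y}^\top \tag{A25b}
\end{align*}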
Furthermore, it is possible to define an autocorrelation matrix Rx and an

autocovariance matrix Cx as shown in Eqs. (A26), which are both always
symmetric and positive semidefinite [183]. It is worth mentioning that normality
is preserved in linear transformations also in case of Gaussian random vectors
[183]. Moreover, we can employ the definitions that we presented here to model
stochastic processes and (white and coloured) noise.
\begin{align*}
R_x &= E(x x^\top) \tag{A26a} \\
C_x &= E\left[ (x - \bar{x})(x - \bar{x})^\top \right] \tag{A26b}
\end{align*}
Noise formulation for discrete-time systems
The linearisation process described by Eqs. (A7–A9) resulted in two error

matrices L(·) and G(·), which were already indicated in the state-space equations
in section A1.1.1. Furthermore, the error term in Eq. (A10c) resulting after
the discretisation procedure can also be expressed by a matrix. We can recall
Eq. (A26b) to show that, in case of discrete-time systems, the noise can always
be expressed as purely additive. Let us consider system (A27) with w̃k ∼ (0, Q̃k )
and ṽk ∼ (0, R̃k ).
\begin{align*}
x_k &= A_{D,k-1} x_{k-1} + B_{D,k-1} u_{k-1} + L_{D,k-1} \tilde{w}_{k-1} \tag{A27a} \\
y_k &= C_{D,k} x_k + D_{D,k} u_k + G_{D,k} \tilde{v}_k \tag{A27b}
\end{align*}
The error term of Eq. (A27a) has a covariance given by Eq. (A28), and a similar
formulation applies to the error term of the measurement equation (A27b)
[183]. Consequently, Eq. (A27) assumes the form of Eq. (A29), with the purely
additive error terms following the distributions indicated in Eq. (A30). It is
common practice to define covariances $Q_{D,k} = L_{D,k} \tilde{Q}_{D,k} L_{D,k}^\top$ and
$R_{D,k} = G_{D,k} \tilde{R}_{D,k} G_{D,k}^\top$, such that $w_k \sim (0, Q_{D,k})$ and $v_k \sim (0, R_{D,k})$.
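\begin{align*}
E\left[ (L_{D,k-1} \tilde{w}_{k-1})(L_{D,k-1} \tilde{w}_{k-1})^\top \right] &= L_{D,k-1} E(\tilde{w}_{k-1} \tilde{w}_{k-1}^\top) L_{D,k-1}^\top \tag{A28a} \\
&= L_{D,k-1} \tilde{Q}_{D,k-1} L_{D,k-1}^\top \tag{A28b}
\end{align*}
\begin{align*}
x_k &= A_{D,k-1} x_{k-1} + B_{D,k-1} u_{k-1} + w_{k-1} \tag{A29a} \\
y_k &= C_{D,k} x_k + D_{D,k} u_k + v_k \tag{A29b}
\end{align*}
\begin{align*}
w_k &\sim (0, L_{D,k} \tilde{Q}_{D,k} L_{D,k}^\top) \tag{A30a} \\
v_k &\sim (0, G_{D,k} \tilde{R}_{D,k} G_{D,k}^\top) \tag{A30b}
\end{align*}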
Throughout this thesis we only deal with discrete-time systems. We will not
explicitly indicate the dependence on the discrete time step k and we will
consider purely additive noise. Under those assumptions, the generic
discrete-time (nonlinear) system of Eq. (A3) becomes as shown in Eq. (A31).
\begin{align*}
x_{k+1} &= f(x_k, u_k) + w_k \tag{A31a} \\
y_k &= g(x_k, u_k) + v_k \tag{A31b}
\end{align*}
In case of linear systems, we will omit subscript D from matrices AD , BD , CD ,

DD , as well as the linearisation constants cx,k and cy,k , which may be included
if needed without loss of generality. From Eq. (A4), a generic linear system
and a linear time-invariant (LTI) system are given in Eqs. (A32) and (A33),
respectively. We will deal with Gaussian processes, and Eq. (A34) holds for the
model and measurement errors. The resulting system is referred to as a Markov
process [140].
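\begin{align*}
x_{k+1} &= A_k x_k + B_k u_k + w_k \tag{A32a} \\
y_k &= C_k x_k + D_k u_k + v_k \tag{A32b}
\end{align*}
\begin{align*}
x_{k+1} &= A x_k + B u_k + w_k \tag{A33a} \\
y_k &= C x_k + D u_k + v_k \tag{A33b}
\end{align*}
\begin{align*}
w_k &\sim N(0, Q_k) \tag{A34a} \\
v_k &\sim N(0, R_k) \tag{A34b}
\end{align*}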
Formulas for the propagation of states and covariances
The definitions of expected value and covariance allow us to introduce a crucial
aspect of state estimation, i.e., the propagation of states and covariances
throughout consecutive time steps. It will become clear starting from
section A1.3 that this propagation enables efficient recursive filters for dynamic
systems, which do not need to process the entire time history of a system
whenever new data are available. Let us consider the discrete-time linear
system (A29a), where $w_k \sim N(0, Q_{D,k})$ is a Gaussian noise with covariance $Q_{D,k}$.
Moreover, let its corresponding continuous-time error be $w(t) \sim N(0, Q_C(t))$.
We can investigate how the mean of the state xk changes with time by computing
the expected value of both sides of Eq. (A29a), as indicated in Eq. (A35).
Furthermore, we can define the quantity (xk − x̄k ), and obtain the covariance
Pk of xk as shown in Eq. (A36). All mathematical details can be found in [183].
Eq. (A37) shows the integral formula for QD,k−1 [183], which is strictly linked
to the integral in Eq. (A10c), obtained from the discretisation of w(t). QD,k−1
is in general difficult to calculate, but there exist a few approximations for small
∆t [183] and a convenient formula if AD is invertible [196].
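\begin{align*}
\bar{x}_k &= E(x_k) \tag{A35a} \\
&= A_{D,k-1} \bar{x}_{k-1} + B_{D,k-1} u_{k-1} \tag{A35b}
\end{align*}
\begin{align*}
P_k &= E\left[ (x_k - \bar{x}_k)(x_k - \bar{x}_k)^\top \right] \tag{A36a} \\
&= A_{D,k-1} P_{k-1} A_{D,k-1}^\top + Q_{D,k-1} \tag{A36b}
\end{align*}
\[ Q_{D,k-1} = \int_{t_{k-1}}^{t_k} e^{A_C (t_k - \tau)} Q_C(\tau) \, e^{A_C^\top (t_k - \tau)} \, \mathrm{d}\tau \tag{A37} \]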
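The recursions (A35b) and (A36b) can be sketched for a scalar system as follows. The numerical values of a, b, u and q are assumptions chosen so that the stationary variance q / (1 − a²) equals one; they are not taken from the text.

```python
def propagate(mean, var, a, b, u, q):
    """One step of Eqs. (A35b) and (A36b) for a scalar system."""
    new_mean = a * mean + b * u      # propagation of the mean
    new_var = a * var * a + q        # propagation of the covariance
    return new_mean, new_var

# Hypothetical stable system with no input, started from a known state.
mean, var = 1.0, 0.0
a, b, u, q = 0.9, 0.0, 0.0, 0.19
for _ in range(200):
    mean, var = propagate(mean, var, a, b, u, q)
print(mean, var)
```

Only the previous mean and variance are needed at each step, which is the property exploited by the recursive filters of the following sections.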
A1.3 Least squares estimators and recursive formulation
The idea on which least squares (LS) estimators are based dates back to the
beginning of the nineteenth century, when Karl Friedrich Gauss published his
Theoria Motus [59]. An example which most engineers face at least once during
their career is linear regression, which aims at minimising an LS residual function
in order to determine some parameters from a set of noisy measurements.A5
Let $x \in \mathbb{R}^{n_x}$ be a vector of unknowns and let $y \in \mathbb{R}^{n_y}$ be a vector of (noisy)
measurements. Furthermore, let y be a linear combination of x plus some noise
term $v \in \mathbb{R}^{n_y}$. Then every measurement can be expressed by Eq. (A38) or, in
matrix form, as Eq. (A39), where $C \in \mathbb{R}^{n_y \times n_x}$.
A5 We will consider the same example also in section A2.4, where we discuss optimisation
problems with norm approximations. In the context of optimisation, we will show that LS
minimisation involves a squared $\ell_2$-norm.
\[ y_i = \sum_{j=1}^{n_x} C_{i,j} x_j + v_i \quad \forall \, i = 1, \ldots, n_y \tag{A38} \]
\[ y = C x + v \tag{A39} \]
Furthermore, let us define a measurement residual $\epsilon_y \in \mathbb{R}^{n_y}$ in Eq. (A40), where
$\hat{x}$ is the best estimate of x.
\[ \epsilon_y = y - C \hat{x} \tag{A40} \]
The LS approach aims at minimising the sum of the squared measurement
residuals, i.e., the cost function J in Eq. (A41) (cf. section A2.1). The problem
can be solved analytically by substituting Eq. (A40) into Eq. (A41), computing
the derivative with respect to $\hat{x}$ and setting it equal to zero, as shown in
Eq. (A42).A6
\[ J = \sum_{i=1}^{n_y} \epsilon_{y,i}^2 = \epsilon_y^\top \epsilon_y \tag{A41} \]
\[ \frac{\partial J}{\partial \hat{x}} = -2 y^\top C + 2 \hat{x}^\top C^\top C = 0 \tag{A42} \]
The solution of Eq. (A42) is shown in Eq. (A43), and contains the term $(C^\top C)^{-1} C^\top$,
which is the so-called (left) Moore-Penrose pseudo-inverse, and exists if $n_y \geq n_x$
and C is full rank.
\[ \hat{x} = (C^\top C)^{-1} C^\top y \tag{A43} \]
We can generalise Eq. (A43) in case measurements are characterised by a
non-uniform confidence, resulting in the weighted LS estimator of Eq. (A44). The
weights are expressed through the measurement variances, collected in a
covariance matrix R, which is diagonal in case measurements are independent [183].
\[ \hat{x} = (C^\top R^{-1} C)^{-1} C^\top R^{-1} y \tag{A44} \]
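As an illustration of Eqs. (A43) and (A44), the following sketch fits a straight line y = x1 + x2 t to four samples, solving the 2×2 normal equations directly. The data, the helper name and the restriction to two unknowns with a diagonal R are assumptions made for brevity; the last sample is a deliberate outlier that receives a large variance in the weighted case.

```python
def weighted_least_squares(C, y, r_diag=None):
    """Solve Eq. (A44) for two unknowns and diagonal R; r_diag=None gives Eq. (A43)."""
    if r_diag is None:
        r_diag = [1.0] * len(y)
    # Accumulate the normal equations (C^T R^-1 C) x = C^T R^-1 y.
    m11 = m12 = m22 = b1 = b2 = 0.0
    for (c1, c2), yi, ri in zip(C, y, r_diag):
        w = 1.0 / ri
        m11 += w * c1 * c1
        m12 += w * c1 * c2
        m22 += w * c2 * c2
        b1 += w * c1 * yi
        b2 += w * c2 * yi
    det = m11 * m22 - m12 * m12
    if det == 0.0:
        raise ValueError("C is not full rank")
    return ((m22 * b1 - m12 * b2) / det, (m11 * b2 - m12 * b1) / det)

# Hypothetical data from y = 1 + 2 t; the last sample (9 instead of 7) is an outlier.
t = [0.0, 1.0, 2.0, 3.0]
C = [(1.0, ti) for ti in t]
y = [1.0, 3.0, 5.0, 9.0]
x_ls = weighted_least_squares(C, y)
x_wls = weighted_least_squares(C, y, r_diag=[1.0, 1.0, 1.0, 1e6])
print(x_ls, x_wls)
```

The unweighted estimate is pulled towards the outlier, whereas the weighted estimate, which trusts the last measurement very little, stays close to the underlying line.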

A6 For the stationary point to be a minimum, the Hessian of J should be positive semidefinite [183].
The implementation of the LS approach in a recursive way results in a scheme
which is particularly interesting for dynamic systems. Assume we have an
estimate x̂ after k − 1 measurements, and a new measurement yk becomes
available. If we want to update the estimate by re-evaluating Eqs. (A43) or (A44)
with all measurements, the system may become too big. A valid alternative is
then to design a linear recursive LS estimator such as Eq. (A45), where Kk is
called the estimator gain matrix and yk − Ck x̂k−1 is a correction term [183].
\begin{align*}
y_k &= C_k x + v_k \tag{A45a} \\
\hat{x}_k &= \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1}) \tag{A45b}
\end{align*}
The new estimate x̂k is computed only from the previous estimate x̂k−1 and the
new measurement yk without the need to augment the system. We note that if
either the gain or the correction term is zero, then the new estimate is equal to
the previous estimate. For a procedure to determine the optimal gain Kk we
refer to [183]. The least squares estimator is historically important because it
built a bridge between Bayes’ rule (Thm. A1.4) and optimal state estimation.
Moreover, it paved the way to the Wiener filter [198, 199] and to the Kalman
filter [94]. It is worth mentioning that Kk relies on a recursive formula for the
covariance of the LS estimation error. In fact, the propagation of states and
covariances (cf. section A1.2) is crucial for recursive filters for dynamic systems.
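A scalar sketch of the recursive update (A45b) is given below. The gain and covariance recursions used here, K = P C (C P C + R)⁻¹ and P ← (1 − K C) P, are the standard recursive LS expressions and are an assumption of this sketch, since the text refers to [183] for the optimal gain; the measurements and the large initial variance are illustrative.

```python
def rls_update(x_hat, p, y, c=1.0, r=1.0):
    """One recursive LS step for a scalar unknown (gain formula assumed, see lead-in)."""
    k = p * c / (c * p * c + r)          # estimator gain
    x_hat = x_hat + k * (y - c * x_hat)  # Eq. (A45b)
    p = (1.0 - k * c) * p                # covariance of the estimation error
    return x_hat, p, k

# Hypothetical noisy measurements of a constant close to 2.
measurements = [2.1, 1.9, 2.2, 1.8, 2.0]
x_hat, p = 0.0, 1e6                      # vague initial knowledge
for y in measurements:
    x_hat, p, k = rls_update(x_hat, p, y)
batch = sum(measurements) / len(measurements)   # batch LS estimate of a constant
print(x_hat, batch, k)
```

With a very uncertain initial estimate, the recursive result practically coincides with the batch average, although each step only uses the previous estimate and the newest measurement.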
A1.4 Single step estimators
Single step estimators are recursive filters in which the estimation takes place
at one single time step. Any available prior or future information is typically
condensed into a covariance matrix and propagated to the estimation time
step. According to the choice of the time step it is possible to distinguish four
approaches, recalling the concepts of conditional probability (Def. A1.3) and
expected value (Def. A1.8). The following four definitions outline the differences
between those approaches [183].
Definition A1.16 (A posteriori estimate). The a posteriori estimate $\hat{x}_k^+ = \hat{x}_{k|k}$
is defined as the expected value of $x_k$ conditioned on all of the measurements
up to and including time step k.
\[ \hat{x}_k^+ = E(x_k \mid y_1, y_2, \ldots, y_k) \tag{A46} \]
Definition A1.17 (A priori estimate). The a priori estimate $\hat{x}_k^- = \hat{x}_{k|k-1}$
is defined as the expected value of $x_k$ conditioned on all of the measurements up
to time step k but not including it.
\[ \hat{x}_k^- = E(x_k \mid y_1, y_2, \ldots, y_{k-1}) \tag{A47} \]
Definition A1.18 (Smoothed estimate). If $n_n$ measurements are available
after time step k, the smoothed estimate is defined as the expected value of
$x_k$ conditioned on all of the measurements up to and including time step $k + n_n$.
\[ \hat{x}_{k|k+n_n} = E(x_k \mid y_1, y_2, \ldots, y_k, \ldots, y_{k+n_n}) \tag{A48} \]
Definition A1.19 (Predicted estimate). In order to predict $x_k$ from the
measurements available up to time step $k - n_m$, the predicted estimate is
defined as the expected value of $x_k$ conditioned on all of the measurements up
to and including time step $k - n_m$.
\[ \hat{x}_{k|k-n_m} = E(x_k \mid y_1, y_2, \ldots, y_{k-n_m}) \tag{A49} \]
Each estimate can be associated with a covariance, according to the approach that

led to Eq. (A36). The following sections present a few state of the art linear as
well as nonlinear single step estimators.
A1.4.1 Linear single step estimators
The most popular linear estimator is the Kalman filter (KF) [94]. It exploits
Defs. A1.16–A1.17 and consists of two phases. The first phase involves Eqs. (A50),
which derive from Eqs. (A35–A36) and show the time update equations for $\hat{x}_k^-$ and
$P_k^-$, i.e., the propagation from the a posteriori estimate of the previous time
step to the a priori estimate of the current time step. This phase is referred to
as the prediction.
\begin{align*}
\hat{x}_k^- &= A_{k-1} \hat{x}_{k-1}^+ + B_{k-1} u_{k-1} \tag{A50a} \\
P_k^- &= A_{k-1} P_{k-1}^+ A_{k-1}^\top + Q_{k-1} \tag{A50b}
\end{align*}
Next, Eqs. (A50) (which are a priori estimates) are updated with a new
measurement, recalling Eq. (A45). This is the so-called update phase and results
in the a posteriori estimates (A51). For the details of the KF we refer to [183].
We note that at time step k = 0 we need to initialise $\hat{x}_0^+$ and $P_0^+$ according to
our best knowledge of the system.
\begin{align*}
K_k &= P_k^- C_k^\top (C_k P_k^- C_k^\top + R_k)^{-1} \tag{A51a} \\
\hat{x}_k^+ &= \hat{x}_k^- + K_k (y_k - C_k \hat{x}_k^-) \tag{A51b} \\
P_k^+ &= (I - K_k C_k) P_k^- (I - K_k C_k)^\top + K_k R_k K_k^\top \tag{A51c}
\end{align*}
Literature offers alternate forms for Pk+ and Kk . These can be found in [183],
together with their derivations and a discussion regarding their strengths and
weaknesses. Furthermore, it can be shown that if xk is a constant, then Ak = I,
Qk = 0 and uk = 0, and the KF reduces to a recursive LS estimator [183].
The most important feature of the KF is that Pk− , Kk and Pk+ do not depend
on the measurements yk , but depend only on the system parameters Ak , Ck ,
Qk and Rk . This implies that Kk can be precomputed off-line before starting
the filter, saving computational effort and allowing for a convenient real time
implementation.
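The two phases of Eqs. (A50) and (A51) can be sketched for a scalar system as follows. The system parameters and the two measurement sequences are hypothetical; the sketch runs the filter on both sequences to illustrate that the gains do not depend on the measured values.

```python
def kf_step(x_post, p_post, y, a, b, u, c, q, r):
    """One prediction/update cycle of Eqs. (A50)-(A51) for a scalar system."""
    x_prior = a * x_post + b * u                                  # (A50a)
    p_prior = a * p_post * a + q                                  # (A50b)
    k = p_prior * c / (c * p_prior * c + r)                       # (A51a)
    x_post = x_prior + k * (y - c * x_prior)                      # (A51b)
    p_post = (1 - k * c) * p_prior * (1 - k * c) + k * r * k      # (A51c)
    return x_post, p_post, k

def run(measurements):
    """Run the filter from a fixed initialisation and collect the gains."""
    x, p, gains = 0.0, 1.0, []
    for y in measurements:
        x, p, k = kf_step(x, p, y, a=0.95, b=0.0, u=0.0, c=1.0, q=0.01, r=0.1)
        gains.append(k)
    return x, p, gains

# Two hypothetical, very different measurement sequences.
x_1, p_1, gains_1 = run([1.0, 0.8, 0.9, 0.7])
x_2, p_2, gains_2 = run([-3.0, 5.0, 0.0, 2.0])
print(gains_1 == gains_2, x_1, x_2)
```

The state estimates differ between the two runs, while the gain and covariance sequences are identical, which is why they could be computed before any data are collected.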
We can assess the performance of the KF by introducing an error $\tilde{x}_k = x_k - \hat{x}_k$,
which is a random variable, since $x_k$ is driven by the stochastic process $w_k$ and
$\hat{x}_k$ is computed from measurements affected by the stochastic process $v_k$ [183].
We can then minimise the LS error as indicated by Eq. (A52), where $S_k$ is an
arbitrary (diagonal) weighting matrix.
\[ \text{minimise} \quad E(\tilde{x}_k^\top S_k \tilde{x}_k) \tag{A52} \]
If wk and vk are Gaussian, zero-mean, uncorrelated and white, then the
KF is the solution of problem (A52), i.e., the KF is optimal under those
hypotheses. If wk and vk are zero-mean, uncorrelated and white, then the KF
is the best linear solution of problem (A52), i.e., the KF is the best linear
filter under those hypotheses (even if the noise is not Gaussian). The
above-mentioned assumptions are often violated in practice, mostly due
to finite precision arithmetic (typical of digital processing) and modelling errors.
The designer of the filter should then pay attention to these aspects and act
accordingly [183]. Generalisations of the KF do exist if wk and vk are correlated
or coloured [183], while approximations of the KF have been proposed in case
of nonlinear systems (cf. section A1.4.2).
Besides the KF, the so-called H∞ filter (or minimax filter) is a single step
linear filter that can handle modelling errors and noise uncertainty, and for
this reason it belongs to the family of the so-called robust filters, i.e., the
H∞ filter does not make any assumption about the noise. It minimises the
worst-case estimation error, while the KF minimises the expected value of
the variance of the estimation error. Moreover, its formulation is based on
a constrained optimisation problem (cf. chapter A2). It is worth mentioning
that under certain (unconstrained) settings the H∞ filter reduces to the KF.
From this perspective it can be seen as a robust version of the KF [183].
Furthermore, mixed KF/H∞ estimation techniques have also been developed
[183]. To summarise, the H∞ filter may be preferred over the KF for systems in
which stability margins must be guaranteed, or if worst-case estimation
performance is of primary interest, or if the model changes unpredictably, or if
the model is poorly known [183].
A1.4.2 Nonlinear single step estimators
Let us now consider some nonlinear approaches based on the KF. Nonlinear
systems are common in practice since perfectly linear systems do not exist. It is
true on the one hand that some systems can be well represented by a linear model,
but on the other hand nonlinear models are unavoidable in many situations,
where a linear system would not lead to a sufficiently accurate approximation.
Furthermore, in the framework of joint estimators (cf. section A1.6) a linear
system may become nonlinear as far as inputs and/or parameters are concerned.
A straightforward approach for handling nonlinearities consists of linearising
a system (cf. section A1.1.2) and running a KF. The only problem is to choose a
nominal state trajectory for the linearisation. The so-called extended Kalman
filter (EKF) [14, 183] tackles this problem by exploiting the a posteriori KF
estimate at the previous time step, i.e., the EKF performs a first-order Taylor
approximation around $x_{k-1} = \hat{x}_{k-1}^+$ and $w_{k-1} = 0$ [183]. The EKF is the most
popular estimation approach for mildly nonlinear systems since the only extra
element in comparison with the KF is the linearisation. Consequently, its
computational demand remains relatively low.
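A scalar sketch of the EKF idea is shown below: the KF equations (A50) and (A51) are reused, with A and C replaced by Jacobians evaluated at the latest estimates. The functions f and g, their derivatives and all numerical values are hypothetical choices for illustration, and the measurements are generated without noise to keep the example deterministic.

```python
import math

def f(x):                    # hypothetical nonlinear state equation
    return x + 0.1 * math.sin(x)

def df(x):                   # its derivative with respect to the state
    return 1.0 + 0.1 * math.cos(x)

def g(x):                    # hypothetical nonlinear measurement equation
    return x * x

def dg(x):                   # its derivative with respect to the state
    return 2.0 * x

def ekf_step(x_post, p_post, y, q, r):
    """One EKF cycle: linearise, then apply the KF prediction and update."""
    a = df(x_post)                           # Jacobian at the previous a posteriori estimate
    x_prior = f(x_post)                      # nonlinear prediction of the state
    p_prior = a * p_post * a + q
    c = dg(x_prior)                          # Jacobian at the a priori estimate
    k = p_prior * c / (c * p_prior * c + r)
    x_post = x_prior + k * (y - g(x_prior))  # innovation uses the nonlinear g
    p_post = (1 - k * c) * p_prior * (1 - k * c) + k * r * k
    return x_post, p_post

x_true, x_hat, p = 1.0, 1.5, 1.0             # deliberately wrong initial estimate
for _ in range(50):
    x_true = f(x_true)                       # noise-free truth for this illustration
    x_hat, p = ekf_step(x_hat, p, g(x_true), q=1e-3, r=1e-2)
print(x_true, x_hat, p)
```

The only additions with respect to the linear KF are the two derivative evaluations per step, which is consistent with the low computational demand noted above.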
For highly nonlinear systems, higher-order approaches based on the EKF
may perform well. For example, the iterated EKF applies a further first-
order Taylor approximation around the new EKF estimate, and runs the EKF
again. This can be repeated iteratively until a certain linearisation accuracy
is obtained. Alternatively, a second-order EKF employs a second-order Taylor
series expansion of the system equations [6, 183]. Another approach for nonlinear
systems is the Gaussian sum filter [4, 183], in which a non-Gaussian pdf is
approximated by a sum of nm Gaussian pdfs, resulting in nm parallel KFs.
Their nm estimates are then combined in a final estimate. The pdf can also be
approximated by using non-Gaussian functions. In comparison with the EKF,
all other above-mentioned approaches require more computational power, and
their popularity is not growing. In fact, an EKF running at a high sampling
rate may be more efficient.
One of the challenges of nonlinear estimators regards the propagation of pdfs,
since they change shape under nonlinear transformations. For this reason,
linear approximations can result in errors in the propagation of means and
covariances when a random variable undergoes a nonlinear function [183]. In
case of severe nonlinearities, the unscented Kalman filter (UKF) [92] overcomes
this issue by performing the nonlinear transformation on individual points
instead of transforming the full pdf. Moreover, it is possible to find a set of
individual points in state-space whose sampled pdf approximates the true pdf of
a state vector. Those points are the sigma points, and take part in the so-called
unscented transformation. This strategy is more accurate than a linearisation for
propagating means and covariances, provided that the approximations to derive
the sigma points are valid, and it is used within a KF scheme by replacing the
EKF equations with unscented transformations, leading to the UKF. Different
unscented transformations are characterised by a different number of sigma
points. The simplex sigma points are the minimum quantity required, but
may lead to numerical instability [183], whereas the spherical unscented
transformation overcomes this issue [90, 91, 183].
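The benefit of propagating sigma points instead of linearising can be sketched for a scalar random variable passed through y = x². The sketch assumes the common symmetric set of 2n + 1 sigma points with a scaling parameter κ (here n = 1 and κ = 2), which is only one of several possible unscented transformations and is not the simplex or spherical variant mentioned above; mean and variance values are illustrative.

```python
import math

def unscented_transform(mean, var, fn, kappa=2.0):
    """Scalar unscented transformation with 2n + 1 = 3 sigma points (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [fn(p) for p in points]                     # nonlinear map applied point by point
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

mean, var = 1.0, 0.25                                # illustrative Gaussian input
ut_mean, ut_var = unscented_transform(mean, var, lambda x: x * x)
lin_mean = mean * mean                               # first-order (linearised) mean
lin_var = (2.0 * mean) ** 2 * var                    # first-order (linearised) variance
true_mean = mean * mean + var                        # exact mean of x^2
print(ut_mean, true_mean, lin_mean, ut_var, lin_var)
```

For this quadratic function the linearised mean misses the contribution of the variance, whereas the sigma points recover it.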
The last estimation approach that we indicate in this section is the particle filter,
which is based on Bayes’ rule (Thm. A1.4) implemented in a recursive fashion
[46, 66, 172]. The motivation for this approach comes from the weaknesses
of the already mentioned estimators, i.e., the filters based on the EKF can be
difficult to tune and have limitations if nonlinearities are severe, and the UKF is
an approximate nonlinear estimator. On the other hand, the particle filter is a
completely nonlinear estimator. It is probability-based, and comes at the price of
a higher computational effort. The particle filter collapses to the KF under the
assumptions of the KF (cf. section A1.4.1), proving the optimality of the latter.
We can implement the particle filter in practice by setting up a Monte Carlo
analysis. We generate nn state vectors based on an initial pdf, which we assume
to be known a priori. Those nn vectors are called particles [183] and they are
chosen at random, in contrast to the UKF, where the choice of sigma points
follows a specific criterion. Before concluding this list of single step estimators
and presenting a general comparison among them, let us mention that there
exist extensions of the estimators based on the KF that offer the possibility to
include state constraints [184].
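A minimal Monte Carlo sketch of a particle filter for a scalar, nearly constant state is given below. It uses the basic bootstrap scheme (propagate, weight by the measurement likelihood, resample), which is an assumption of this sketch rather than the specific algorithms of the cited references; the initial pdf, noise levels, number of particles and measurements are all hypothetical.

```python
import math
import random

def particle_filter(measurements, n_particles=500, r=0.25, jitter=0.02, seed=0):
    """Bootstrap particle filter for x_{k+1} = x_k + w_k, y_k = x_k + v_k (scalar)."""
    rng = random.Random(seed)
    # Particles drawn at random from the assumed initial pdf N(0, 2^2).
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    estimates = []
    for y in measurements:
        # Propagate each particle through the state equation with random process noise.
        particles = [p + rng.gauss(0.0, jitter) for p in particles]
        # Weight each particle by the Gaussian measurement likelihood (variance r).
        weights = [math.exp(-0.5 * (y - p) ** 2 / r) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Estimate: weighted mean of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Hypothetical noisy measurements of a state close to 1.
ys = [1.2, 0.7, 1.1, 0.9, 1.3, 0.8, 1.0, 1.1, 0.9, 1.0]
estimates = particle_filter(ys)
print(estimates[-1])
```

Each time step requires evaluating the model for every particle, which illustrates why the computational effort grows with the number of particles.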
A1.4.3 Discussion of single step estimators
In sections A1.4.1 and A1.4.2 we presented a few single step estimators, from
linear approaches (recursive LS estimator, KF, H∞ filter) up to methodologies
that can handle mild as well as severe nonlinearities (EKF and its variants,
UKF, particle filter). For linear systems, the KF provides the optimal estimate
under the hypothesis of white uncorrelated zero-mean Gaussian noise, and
it is the optimal linear filter (in an LS sense) if the noise is not Gaussian.
The KF reduces to a recursive LS estimator if the state to be estimated is
constant in time. Furthermore, the H∞ filter may perform better if the model is
poor or changes unpredictably, or in case constraints are needed, at the price of
solving a constrained optimisation problem.
For nonlinear systems, if nonlinearities are not severe the EKF stands out from
other state of the art estimators since it requires a reasonable computational
effort. Higher-order variants of the EKF have been proposed to cope with
strong nonlinearities, but their popularity is not growing due to the increased
computational requirements and the lack of evidence of a performance gain.
In case of severe nonlinearities, the UKF proposes to propagate a set of sigma
points that represent the mean and covariance of the current time step, which
allows a (zero skew) pdf to be reconstructed after a nonlinear transformation.
This implies a higher computational load proportional to the number of sigma
points. Finally, the particle filter is based on Bayesian probability theory, is a
fully nonlinear estimator and involves a Monte Carlo simulation. This requires
high computational power, which can be the bottleneck for online operations.
The choice of a filter is strongly application dependent, and the knowledge of
the system is crucial for selecting the best strategy. For this reason, it is difficult
to compare all filters. Reference [183] presents a generic comparison, where it
is clear that under the hypotheses of the KF it is not worth implementing any
filter other than the KF.
A1.5 The moving horizon estimator
In the previous section we introduced the KF as a single step linear estimator,
and we outlined possible extensions to cope with common scenarios in which
the assumptions on which the KF is based are violated. All variants have in
common an increased computational effort, which for the family of the EKF
consists of a linear (or higher-order) approximation (and, in case of the iterated
EKF, a few extra iterations), while the family of the UKF requires a certain
number of sigma points to be evaluated, which becomes even greater for the
particle filter.
In this section we consider a multistep approach for nonlinear systems, in which
the estimation takes place within a time window instead of at a single time
step. This methodology is the moving horizon estimator (MHE), which in its
simplest formulation can be seen as an extension of the EKF on a finite length
time window sliding over time [72, 166, 167]. The greatest advantage of this
estimator is its ability to capture nonlinear behaviours in a linearised fashion
within a time window instead of a single time step. Even if optimality cannot
be guaranteed, the MHE has been shown to perform better than the EKF in
case of cost functions with local optima [71, 72]. The price to pay is again a
higher computational effort, which scales proportionally to the window length.
The MHE is becoming a popular estimator thanks to its intrinsic capability
to handle nonlinearities. Moreover, a recent research stream regarding model
predictive control (MPC) and optimisation algorithms is developing tools with
the aim to improve MPC/MHE real time implementation (cf. chapter A2).
However, the basic idea of MHE is not new. A list of references to trace the
history of multistep estimators is given in [167].
In the context of the CS-MHE for joint state/input estimation, we chose the
MHE among all other estimators since it provides a time dimension in which
we can exploit input sparsity (cf. section A1.7.4). In the remaining part of
this section we introduce the MHE, which we will recall in chapter B1 for the
derivation of the CS-MHE.
A1.5.1 Derivation of the moving horizon estimator
In this section we show how the MHE is derived from probability theory. Since
we are interested in implementing an MHE scheme on a digital computer, we
limit the discussion to a discrete-time MHE. Let us consider the full length
time signal of a nonlinear discrete-time dynamic system such as Eq. (A31),
from an initial point (k = 0) to the current time step (k = T ). We recall
that noise is assumed to be purely additive (cf. section A1.2). Under the
assumption of a discrete time Markov process (cf. section A1.2) [140], we can
formulate a state estimation problem by recalling the definition of conditional
probability (Def. A1.3) applied to a probability density function (pdf). Our
interest regards the conditional pdf in Eq. (A53), i.e., the pdf of the state
evolution x0 , x1 , . . . , xT given the process measurements y0 , y1 , . . . , yT −1 . In
fact, the optimal estimate of the state at time step k given measurements
y0 , y1 , . . . , yT −1 is a function of Eq. (A53) [167].
\[ p(x_0, x_1, \ldots, x_T \mid y_0, y_1, \ldots, y_{T-1}) \tag{A53} \]
We can express the joint probability of the state (Def. A1.2) as in Eq. (A54),
where px0 represents the prior information about the initial state of the system.
Moreover, Eq. (A31b) for the measurements leads to the conditional probability
in Eq. (A55).
\[ p(x_0, x_1, \ldots, x_T) = p_{x_0}(x_0) \prod_{k=0}^{T-1} p(x_{k+1} \mid x_k) \tag{A54} \]
\[ p(y_0, y_1, \ldots, y_{T-1} \mid x_0, x_1, \ldots, x_{T-1}) = \prod_{k=0}^{T-1} p_{v_k}\left[ y_k - g(x_k, u_k) \right] \tag{A55} \]
By applying Bayes’ rule (Thm. A1.4) to Eqs. (A54–A55) we obtain the
proportional relationship in Eq. (A56). The properties of logarithms allow us to
recast Eq. (A56) as Eq. (A57), which paves the way to a multi-stage optimisation
[183].
\[ p(x_0, x_1, \ldots, x_T \mid y_0, y_1, \ldots, y_{T-1}) \propto p_{x_0}(x_0) \prod_{k=0}^{T-1} p_{v_k}\left[ y_k - g(x_k, u_k) \right] p(x_{k+1} \mid x_k) \tag{A56} \]
\begin{align*}
\arg\max_{x_0, x_1, \ldots, x_T} \; & p(x_0, x_1, \ldots, x_T \mid y_0, y_1, \ldots, y_{T-1}) \\
&= \arg\max_{x_0, x_1, \ldots, x_T} \log p(x_0, x_1, \ldots, x_T \mid y_0, y_1, \ldots, y_{T-1}) \tag{A57a} \\
&= \arg\max_{x_0, x_1, \ldots, x_T} \sum_{k=0}^{T-1} \Big( \log p_{v_k}\left[ y_k - g(x_k, u_k) \right] + \log p(x_{k+1} \mid x_k) \Big) + \log p_{x_0}(x_0) \tag{A57b}
\end{align*}
From the state equation (A31a), we can express p(xk+1 | xk ) as indicated in
Eq. (A58) and substitute it into Eq. (A57). This results in Eq. (A59), which is
an optimisation problem whose objective is defined by the pdfs pvk (·), px0 (·)
and pwk (·).
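\[ p(x_{k+1} \mid x_k) = p_{w_k}\left[ x_{k+1} - f(x_k, u_k) \right] \tag{A58} \]
\begin{align*}
\arg\max_{x_0, x_1, \ldots, x_T} \; & p(x_0, x_1, \ldots, x_T \mid y_0, y_1, \ldots, y_{T-1}) \\
&= \arg\max_{x_0, x_1, \ldots, x_T} \sum_{k=0}^{T-1} \Big( \log p_{v_k}\left[ y_k - g(x_k, u_k) \right] + \log p_{w_k}\left[ x_{k+1} - f(x_k, u_k) \right] \Big) + \log p_{x_0}(x_0) \tag{A59}
\end{align*}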
If we choose pvk (·), px0 (·) and pwk (·) as the three Gaussian distributions in
Eq. (A60), we obtain problem (A61), where we defined $\|z\|_H^2 = z^\top H z$. It is
worth noting that the new problem involves a minimisation of the errors of a
model and measurements.
\begin{align*}
p_{x_0}(\cdot) &\sim N(\bar{x}_0, \Pi_0) \tag{A60a} \\
p_{w_k}(\cdot) &\sim N(0, Q) \tag{A60b} \\
p_{v_k}(\cdot) &\sim N(0, R) \tag{A60c}
\end{align*}

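\begin{align*}
\arg\max_{x_0, x_1, \ldots, x_T} \; & p(x_0, x_1, \ldots, x_T \mid y_0, y_1, \ldots, y_{T-1}) \\
&= \arg\min_{x_0, x_1, \ldots, x_T} \sum_{k=0}^{T-1} \Big( \|y_k - g(x_k, u_k)\|_{R^{-1}}^2 + \|x_{k+1} - f(x_k, u_k)\|_{Q^{-1}}^2 \Big) + \|x_0 - \bar{x}_0\|_{\Pi_0^{-1}}^2 \tag{A61}
\end{align*}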
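The step from Eq. (A59) to Eq. (A61) can be made explicit for one of the three terms. The following worked derivation is a sketch for the measurement noise, assuming the zero-mean Gaussian pdf of Eq. (A60c) with dimension $n_y$; the same reasoning applies to the other two terms. The factor 1/2 and the normalisation constant do not depend on the optimisation variables, so they do not affect the minimiser.

```latex
\begin{align*}
p_{v_k}(z) &= \frac{1}{\sqrt{(2\pi)^{n_y} \det R}} \exp\left( -\tfrac{1}{2} z^\top R^{-1} z \right), \\
\log p_{v_k}(z) &= -\tfrac{1}{2} \|z\|_{R^{-1}}^2 - \tfrac{1}{2} \log\left( (2\pi)^{n_y} \det R \right), \\
\arg\max_{x_k} \log p_{v_k}\left[ y_k - g(x_k, u_k) \right] &= \arg\min_{x_k} \|y_k - g(x_k, u_k)\|_{R^{-1}}^2 .
\end{align*}
```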
Furthermore, we add a set of bounds on the optimisation variables, i.e.,
$x_k \in [x_k^{LB}, x_k^{UB}]$, $w_k \in [w_k^{LB}, w_k^{UB}]$, $v_k \in [v_k^{LB}, v_k^{UB}]$, where suffixes LB and UB
indicate lower and upper bounds, respectively. Those bounds are typically closed
and convex, such as polyhedral convex sets. A constraint on $w_k$ is common
practice, but the same is not true for $v_k$ and $x_k$. In fact, reference [167]
advises against a constraint on the measurement noise due to the possibility
of outliers. Furthermore, constraining a state is not a trivial task since $x_k$
and $w_k$ may be correlated, and a constraint may lead to a
violation of causality [167]. In general, constraints can significantly alter the
probabilistic structure of the problem. Nevertheless, their advantage is great
during the modelling process, since constraints allow for simplified models
[167, 184]. For example, in problem (A62) we can see that a generic state-space
model has been directly implemented by the constraints (A62b–A62c). We
defined $\{w_k\}_{k=0}^{T-1} = \{w_0, w_1, \ldots, w_{T-1}\}$.
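\begin{align}
\underset{x_0,\, \{w_k\}_{k=0}^{T-1}}{\text{minimise}} \quad
& \sum_{k=0}^{T-1} \Big( \|v_k\|_{R^{-1}}^2 + \|w_k\|_{Q^{-1}}^2 \Big) + \|x_0 - \bar{x}_0\|_{\Pi_0^{-1}}^2 \tag{A62a} \\
\text{subject to} \quad
& x_{k+1} = f(x_k, u_k) + w_k \tag{A62b} \\
& y_k = g(x_k, u_k) + v_k \tag{A62c} \\
& x_k \in [x_k^{LB}, x_k^{UB}],\; w_k \in [w_k^{LB}, w_k^{UB}],\; v_k \in [v_k^{LB}, v_k^{UB}] \tag{A62d}
\end{align}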
We notice that matrices Q and R are the tuning parameters for matching
the process model with the measurements. In fact, Q provides a measure of
confidence in the model, whereas R provides a measure of confidence in the
measurement system. Furthermore, matrix Π0 provides a measure of confidence
about the knowledge of the initial state $x_0$. Covariances Q and R have to be
built according to the information that is available about the model and the
measurement system. For the latter this is usually a simple task, since the
accuracy of the measurement system is known, while a few assumptions are
needed to choose a value for the model uncertainty.
Problem (A62) grows without bound in time, up to the point at which solving
it becomes infeasible. For this reason, in Eq. (A63) the cost
function (A62a) is divided into two parts, the second of which is characterised
by a fixed horizon of N time steps.
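\begin{align}
\Gamma_T = {} & \sum_{k=0}^{T-N-1} \Big( \|v_k\|_{R^{-1}}^2 + \|w_k\|_{Q^{-1}}^2 \Big) + \|x_0 - \bar{x}_0\|_{\Pi_0^{-1}}^2 \tag{A63a} \\
& + \sum_{k=T-N}^{T-1} \Big( \|v_k\|_{R^{-1}}^2 + \|w_k\|_{Q^{-1}}^2 \Big) \tag{A63b}
\end{align}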
The second line of $\Gamma_T$, indicated by Eq. (A63b), depends only on the state
$x_{T-N}$, the disturbance $\{w_k\}_{k=T-N}^{T-1}$ and the process measurements $\{y_k\}_{k=T-N}^{T-1}$.
The last step before we can obtain the MHE formulation is to define the arrival
cost from Eq. (A63a). The arrival cost is labelled as ZT −N (x̄0 ) in Eq. (A64),
where x̂(·) is an already available estimate. We will discuss this MHE parameter
in section A1.5.2. Finally, problem (A65) shows a generic MHE [167]. It is
worth mentioning that for unconstrained linear systems the MHE collapses to
the KF [71].
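\begin{align}
Z_{T-N}(\bar{x}_0) &= \underset{x_0,\, \{w_k\}_{k=0}^{T-N-1}}{\text{minimise}}
\big\{ \Gamma_{T-N}(x_0, \{w_k\}) : x_{T-N} = \bar{x}_0 \big\} \tag{A64a} \\
&= \|x_{T-N} - \hat{x}_{T-N}\|_{\Pi_{T-N}^{-1}}^2 \tag{A64b}
\end{align}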
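\begin{align}
\underset{x_{T-N},\, \{w_k\}_{k=T-N}^{T-1}}{\text{minimise}} \quad
& \|x_{T-N} - \hat{x}_{T-N}\|_{\Pi_{T-N}^{-1}}^2 + \sum_{k=T-N}^{T-1} \Big( \|w_k\|_{Q^{-1}}^2 + \|v_k\|_{R^{-1}}^2 \Big) \tag{A65a} \\
\text{subject to} \quad
& x_{k+1} = f(x_k, u_k) + w_k \tag{A65b} \\
& y_k = g(x_k, u_k) + v_k \tag{A65c} \\
& x_k \in [x_k^{LB}, x_k^{UB}],\; w_k \in [w_k^{LB}, w_k^{UB}],\; v_k \in [v_k^{LB}, v_k^{UB}] \tag{A65d}
\end{align}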
The generic MHE described by problem (A65) can in practice be subject to
small modifications connected to the availability of the measurements at the
current time step (i.e., yk ) as well as the need to predict the state of the next
time step. This topic has strong links with observability (cf. section A1.8) [105].
The estimate of the last time step of the moving horizon can thus be an a priori
or an a posteriori estimate. This section presents the MHE formulation for state
estimation problems that we described in [104], which we adopt throughout this
dissertation. Further details can be found in [72, 166, 167].
We define an estimation window of length N between the discrete time steps
k = T −N +1 and k = T , as shown in Fig. A1.1, and problem (A66) summarises
the MHE formulas. Accordingly, the prior information enters the estimation
window through the term x̄T −N +1 in Eq. (A66d), and its value is set according
to the chosen arrival cost strategy (cf. section A1.5.2). Consequently, the arrival
cost includes the past information from k = 0 to k = T −N +1 in the estimation.
It refers to the first time step of the moving window, i.e., k = T −N +1, and it
is identified by subscript “a” in problem (A66) (“a.c.” in Fig. A1.1).
[Figure A1.1: MHE strategy. The estimation window spans the time steps from k = T−N+1 to k = T, and the arrival cost (a.c.) refers to k = T−N+1. Figure reproduced from [105].]
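\begin{align}
\underset{w_a,\, w_k,\, v_k}{\text{minimise}} \quad
& w_a^\top P_a^{-1} w_a + \sum_{k=T-N+1}^{T-1} w_k^\top Q_k^{-1} w_k + \sum_{k=T-N+1}^{T} v_k^\top R_k^{-1} v_k \tag{A66a} \\
\text{subject to} \quad
& x_{k+1} = f(x_k, u_k) + w_k \tag{A66b} \\
& y_k = g(x_k, u_k) + v_k \tag{A66c} \\
& x_{T-N+1} = \bar{x}_{T-N+1} + w_a \tag{A66d} \\
& x_k \in [x_k^{LB}, x_k^{UB}],\; w_k \in [w_k^{LB}, w_k^{UB}],\; v_k \in [v_k^{LB}, v_k^{UB}] \tag{A66e}
\end{align}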
Eq. (A66a) is a cost function and consists of three noise terms to be minimised.
From left to right, they are related to the arrival cost $w_a \in \mathbb{R}^{n_x}$, the model
error $w_k \in \mathbb{R}^{n_x}$ and the measurement error $v_k \in \mathbb{R}^{n_r}$, where $n_x$ and $n_r$ are the
number of states and transducers, respectively. We assume that each variable
is associated with a covariance matrix as indicated in Eq. (A67).
\begin{align}
w_a &\sim \mathcal{N}(0, P_a) \tag{A67a} \\
w_k &\sim \mathcal{N}(0, Q_k) \tag{A67b} \\
v_k &\sim \mathcal{N}(0, R_k) \tag{A67c}
\end{align}
Eqs. (A66b–A66c) are the state-space representation of a discretised system
under the hypothesis of additive noise (cf. section A1.2). Functions $f(x_k, u_k)$
and $g(x_k, u_k)$ depend on the state vector $x_k \in \mathbb{R}^{n_x}$ and on the input $u_k$.
Eq. (A66d) refers to the arrival cost and finally Eq. (A66e) implements the
bounds on the optimisation variables.
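For a linear model without bounds, problem (A66) reduces to a weighted linear least squares problem in the states of the window. The following Python sketch illustrates this special case only. It assumes f(x, u) = A x + B u and g(x, u) = C x (no feedthrough), constant covariances, and it omits the bounds (A66e); the function name `mhe_window_linear` and all numerical values are hypothetical.

```python
import numpy as np

def mhe_window_linear(A, B, C, x_bar, Pa, Q, R, u, y):
    """Unconstrained linear sketch of problem (A66) over one window.

    u, y: arrays of shape (N, nu) and (N, ny) holding the N window steps.
    Returns the estimated states of the window, shape (N, nx).
    """
    N, nx = len(y), A.shape[0]
    # Whitening matrices W with ||r||^2_{M^-1} = ||W r||^2, where M = L L^T, W = L^-1
    Wa = np.linalg.inv(np.linalg.cholesky(Pa))
    Wq = np.linalg.inv(np.linalg.cholesky(Q))
    Wr = np.linalg.inv(np.linalg.cholesky(R))

    def block(i, M):
        """Place M in the columns that belong to window state i."""
        Z = np.zeros((M.shape[0], N * nx))
        Z[:, i * nx:(i + 1) * nx] = M
        return Z

    # Arrival cost (A66d): x_first - x_bar = w_a
    rows, rhs = [block(0, Wa)], [Wa @ x_bar]
    for i in range(N):
        # Measurement equation (A66c): y_i - C x_i = v_i
        rows.append(block(i, Wr @ C))
        rhs.append(Wr @ y[i])
        if i < N - 1:
            # State equation (A66b): x_{i+1} - A x_i - B u_i = w_i
            rows.append(block(i + 1, Wq) - block(i, Wq @ A))
            rhs.append(Wq @ (B @ u[i]))
    H, b = np.vstack(rows), np.concatenate(rhs)
    z, *_ = np.linalg.lstsq(H, b, rcond=None)
    return z.reshape(N, nx)

# Noise-free double integrator with an exact prior: the estimate matches the truth
dt, N = 0.1, 5
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])
u = np.ones((N, 1))
x_true = np.zeros((N, 2))
x_true[0] = [1.0, 0.0]
for i in range(N - 1):
    x_true[i + 1] = A @ x_true[i] + B @ u[i]
y = x_true @ C.T
x_hat = mhe_window_linear(A, B, C, x_true[0], np.eye(2),
                          0.01 * np.eye(2), 0.1 * np.eye(1), u, y)
```

With nonlinear f and g or with active bounds, the problem is no longer a single least squares solve and requires a nonlinear programming solver.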
A1.5.2 The arrival cost to include the past information into the MHE
The role of the arrival cost for the MHE is to include the past information in
the estimation. Reference [71] outlines two families of approaches regarding the
computation of the arrival cost, i.e., the filtering and the smoothing schemes
(cf. Defs. A1.17–A1.18). In a filtering scheme, the optimisation takes into
account the past information by penalising the deviations of the initial estimate
within a horizon from an a priori estimate. In other words, the arrival cost is
evaluated based on information which is collected before (and not including)
the moving horizon. On the other hand, a smoothing scheme involves the
penalisation of the deviations of the trajectory of the states within the whole
estimation horizon from an a priori estimate. In this case, the arrival cost can
benefit from information which lies in the past (i.e., before the current
time step) but which is at the same time virtually in the future with respect
to the first time step of the estimation horizon. Reference [71] indicates the
smoothing scheme as superior to the filtering scheme.
There are several ways to deal with the arrival cost. A first approach is simply
not to take into account any information prior to the sliding window. A second
method is to set a constant value for the arrival cost (i.e., Pa = constant),
and a third technique employs a recursive filter, such as the EKF or the UKF
(cf. section A1.4.2) [155, 167]. If we follow the smoothing approach, the so-
called smoothed arrival cost exploits the covariance matrix of the optimisation
problem [122, 123, 168], which implements a quadratic approximation of the past
information [37]. It will become clear in chapter B1 that this approach is well
suited for the CS-MHE, since the covariance matrix is required independently
of the arrival cost (cf. section B1.2.1).
The effect of the arrival cost is minimal for a long window, in which a large set of
data is available for the optimisation, such that an extra element does not have
a strong influence on the solution. For the same reason, a good estimation of the
arrival cost is crucial if the window is short. The latter is the most interesting
case from a practical point of view, because a short window allows for a faster
computation, which may be the bottleneck for online (real time) applications.
The window length is then a tuning parameter for the MHE. According to [167],
a good trade-off between accuracy and computational effort can be obtained by
setting the length of the moving window as twice the observability index of the
system.
A1.5.3 Discussion about the MHE
In section A1.4 we showed that the KF provides the optimal estimates for
linear Gaussian systems at a very reasonable computation cost. Therefore, in
such a situation an MHE would not perform better and would require a higher
computational effort. However, optimality is not guaranteed for nonlinear
systems, and this applies to single step approaches as well as to the MHE.
Furthermore, both the nonlinear extensions of the KF and the MHE require a
higher computational cost than the KF. A comparison is not a simple task since
the computational performance of the MHE depends on the horizon length and
on the implementation strategy.
In general we expect the MHE to be more computationally demanding than
the EKF, but this comes with great advantages of a better description of the
nonlinear behaviour, the possibility to capture fast changes in the system and
robustness. In other words, optimality is not guaranteed for the MHE, but in
practice the MHE performs better than an EKF for nonlinear systems and in
case of estimation problems with multiple optima [71, 72]. For those reasons, the
MHE is gaining popularity among all other model based estimation techniques.
Finally, we will mention in chapter A2 that tailored algorithms are available for
a fast implementation of the MHE.
A1.6 Joint state/input/parameter estimation
We opened this chapter by introducing the state-space representation as a tool
to model the internal condition of a system, which may also depend on unknown
internal parameters as well as unknown external sources. A low-complexity
model can ignore the influence of these factors by treating them as uncertainties,
but we should avoid this procedure if we aim at a detailed representation. In this
latter case, we can acquire the knowledge of parameters and external sources by
direct measurements and/or additional models, keeping in mind that those bring
some uncertainty into the system. This section describes a further strategy,
which proposes to treat parameters and inputs as unknowns, and estimate them
together with the states.
In case of estimation of multiple variables such as states, inputs^A7 and
parameters, two main approaches can be followed, i.e., a dual filter approach or
a joint estimator (Fig. A1.2). For an extensive treatment of this topic we refer
to [74, 195]. A dual filter approach consists of two filters running in parallel
and exchanging the updated estimates. This method does not imply that one filter is
dedicated to the states while the other takes into account the inputs/parameters.
In fact, it may be convenient for certain applications to split the state vector
into two parts, provided that the dependency between the two new state vectors
is not too strong. An example is the separation between variables referring
to a vehicle's longitudinal and lateral dynamics in [80]. On the other hand, a
joint estimator is a single filter based on state augmentation, i.e., the number of
state variables increases by the number of inputs/parameters to be estimated
[133, 183]. In parallel to this, the state equations can also be augmented with
extra equations that refer to the newly introduced variables.
Footnote A7: External quantities such as inputs are sometimes referred to as disturbances, and they
influence the internal condition of a system.
[Figure A1.2: Two strategies for the estimation of multiple variables. Dual filter
(left) and joint estimator (right).]
The CS-MHE is a joint filter, and in the next section we will present some
state of the art techniques to model an input. The main advantage of a joint
estimator over a dual filter approach consists in capturing the cross-coupling
between all (augmented) states by means of a single covariance matrix, while
a drawback involves possible observability issues that can arise if we want to
estimate too many variables from too little information (cf. section A1.8). It
is worth mentioning that a linear system can be nonlinear with respect to its
inputs and/or parameters. For this reason, joint estimators require nonlinear
techniques, such as the nonlinear single step approaches based on the KF (cf.
section A1.4.2) and the MHE (cf. section A1.5).
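A scalar example may help to see why state augmentation can turn a linear model into a nonlinear one. The sketch below is an illustration under stated assumptions: the scalar system with coefficients a and b, the augmented vector $z_k$ and the noise term $w_{a,k}$ are hypothetical and are not taken from the references.

```latex
\begin{align*}
x_{k+1} &= a\, x_k + b\, u_k + w_k
  && \text{(linear in } x_k \text{ for a known } a\text{)} \\
z_k &= \begin{bmatrix} x_k \\ a_k \end{bmatrix}, \qquad
z_{k+1} = \begin{bmatrix} a_k x_k + b\, u_k \\ a_k \end{bmatrix}
  + \begin{bmatrix} w_k \\ w_{a,k} \end{bmatrix}
  && \text{(unknown } a \text{ added to the state)}
\end{align*}
```

The product $a_k x_k$ couples two components of the augmented state, so the augmented state equation is bilinear in $z_k$ even though the original model is linear in $x_k$.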
A1.7 Input models
In the previous section we introduced the concepts of joint estimation and state
augmentation, i.e., inputs and/or parameters to be estimated become part of
the state vector (cf. section A1.6). At the same time, additional equations may
be needed to model these new states. Furthermore, the system may become
unobservable (cf. section A1.8). Joint state/input estimators can benefit from
input representations. In fact, these provide the estimation problem with some
extra information aiming at an improved observability and a higher estimation
accuracy.
In this section we outline four state of the art approaches for modelling an
augmented state in case of joint estimators (sections A1.7.1 to A1.7.4). We draw
attention to the basic concepts, main applications, and strengths
and weaknesses of each scheme. All methodologies that we present here can
be applied to represent both inputs and parameters. Nevertheless, parameter
estimation is in practice a simpler task since parameters influence the system
from an internal source which is relatively easy to locate. Moreover, their
dynamic behaviour is rather slow, since a parameter is likely to evolve slowly
over time. Since the estimation of an unknown input is our primary interest,
high dynamic ranges and uncertainties make the task more challenging. This
section ends with a discussion that allows us to justify the choice of the MHE
and CS as the foundation of the CS-MHE, and consequently of this dissertation
(section A1.7.5).
A1.7.1 No input model
Let us begin this overview of representations for unknown inputs and parameters
by mentioning that it is possible to set up an estimation problem without
including any additional model. In such case there are neither extra equations
nor augmented states. The unknowns are taken into account by some uncertainty
terms (covariances), which have to be properly tuned. Furthermore, the model
for the state estimation problem should be as accurate as possible, to avoid
that any model uncertainty is considered as an input. We refer to this situation
with the acronym NI (no input).
The NI approach is used in the field of linear observers for unknown inputs. It
is very general and it is proven to be optimal for linear cases, i.e., it results
in the minimum variance [61, 62, 81, 82, 83, 111, 124, 125, 128]. It requires
measurements (including acceleration) to be processed at their exact time step
and for this reason it is often implemented following a smoothing approach (cf.
Def. A1.18) [124, 127]. If the input location is not known, it is possible to define
an arbitrary input which excites the system in an equivalent way [124]. The NI
approach requires a good knowledge of the system model, it is not suitable for
many inputs and requires a certain amount of measurements (more measurements than
states [124]). Among different methods for force reconstruction that fall into the
NI group, we mention inverse methods (e.g., the inverse structural filter [186])
and the unknown input observer [112]. Inverse methods are widely employed
but they often suffer from numerical instability [70], while the unknown input
observer [69, 113] is a time domain approach which involves state augmentation.
It was originally developed for pole placement in control engineering, it operates
only on time-invariant systems, and it requires a full column rank feedthrough
matrix and a number of sensors which is at least equal to the number of unknown
inputs plus the number of nonlinear terms [112].
To summarise, the NI is well suited for linear systems if a very accurate model
is available and if a large number of transducers can be placed on the system.
While this could be the case in civil engineering, these requirements are likely not
to be satisfied for mechanical systems. For example, the joints of a mechanism
may bring in several model uncertainties as well as nonlinear behaviours, which
are extremely challenging to model accurately. Furthermore, conditions
such as temperature gradients and high rotational speeds make measurements
more difficult from a hardware point of view. In general, the installation of
transducers modifies the system dynamics and adds extra costs (for purchase
and maintenance). For these reasons, in this dissertation we will not focus on
the NI approach. However, in section B2.1 we will set up an MHE scheme for
joint state/input estimation that utilises the NI, which we employ as a term
of comparison while discussing rank and condition number of the CS-MHE
matrices.
A1.7.2 Random walk model
Within the framework of model based estimation approaches, the so-called
random walk (RW) model [183] stands out from all available modelling
strategies for the description of inputs, parameters and disturbances in general
[52, 61, 62, 125, 126, 133, 134, 169, 170, 183]. Eqs. (A68–A69) illustrate how
the RW is formulated in continuous time (subscript C) and discrete time
(subscript D), respectively. We chose variable u since this dissertation focuses
on input estimation. However, u in Eqs. (A68–A69) can refer to a parameter or
disturbance without loss of generality. We assume that the noise terms wC,u
and wD,u associated with the RW follow a Gaussian distribution with zero mean
and variances QC,u and QD,u , respectively.
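\begin{align}
\dot{u} &= 0 + w_{C,u} \tag{A68a} \\
w_{C,u} &\sim \mathcal{N}(0, Q_{C,u}) \tag{A68b}
\end{align}
\begin{align}
u_{k+1} &= u_k + w_{D,u} \tag{A69a} \\
w_{D,u} &\sim \mathcal{N}(0, Q_{D,u}) \tag{A69b}
\end{align}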
The RW involves a time derivative, and the case illustrated by Eqs. (A68–A69)
is referred to as a zero-order RW. Higher orders of RW models are possible,
but they are not popular since they require the estimation of extra augmented
states, leading to possible observability issues. Variances $Q_{C,u}$ and $Q_{D,u}$ are
crucial tuning parameters for the estimation. In fact, they determine how much
a variable can vary between two consecutive time steps. A low variance allows
the parameter to vary smoothly. At the same time, it implies a high accuracy
of the input model. On the other hand, we would require a high variance if we
expect a variable to evolve with fast dynamics, but this is very likely to make
the input model unreliable up to the point that the augmented state has no
weight in the estimation problem. For this reason, the designer of a filter should
pay special attention when setting the covariances, since a wrong choice can
jeopardise the estimation [195]. In case a covariance needs to be adapted online,
reference [75] proposes a so-called forgetting factor for the estimation of the
covariance of a parameter, while the Robbins-Monro stochastic approximation
scheme recursively adapts the covariance of the process noise based on the
Kalman gain [119].
A zero-order RW model can effectively represent a parameter, since parameters
are typically not expected to vary much over two consecutive time steps. On
the other hand, if an input signal is characterised by fast dynamics, then the
covariance associated with it would be relatively high and the model may lose its influence
within the estimator. An RW model is implemented by augmenting the state
vector with each input, assuming that the input signal evolves relatively slowly
and according to a predefined covariance value. Its applicability holds for
single step as well as for multistep filters (cf. sections A1.4–A1.5). We can
expect better observability in comparison with the NI approach provided that
the number of augmented states satisfies the observability requirements (cf.
section A1.8). In a filtering scheme, the RW model may result in a delayed
input estimation.
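The augmentation of a discrete-time linear model with the zero-order RW of Eq. (A69) can be sketched as follows. This is an illustration only: it assumes a model of the form x_{k+1} = A x_k + B u_k + w_k, y_k = C x_k + D u_k + v_k, and the function name `augment_with_random_walk` as well as the numerical values are hypothetical.

```python
import numpy as np

def augment_with_random_walk(A, B, C, D, Q_x, Q_u):
    """Augment the state with the input u, modelled as u_{k+1} = u_k + w_{D,u}.

    The augmented state is z = [x; u]. Returns (A_aug, C_aug, Q_aug).
    """
    nx, nu = B.shape
    # State rows: x_{k+1} = A x_k + B u_k; input rows: u_{k+1} = u_k
    A_aug = np.block([[A, B],
                      [np.zeros((nu, nx)), np.eye(nu)]])
    # Output: y_k = C x_k + D u_k
    C_aug = np.hstack([C, D])
    # Process noise covariance, with Q_u as the RW tuning parameter
    Q_aug = np.block([[Q_x, np.zeros((nx, nu))],
                      [np.zeros((nu, nx)), Q_u]])
    return A_aug, C_aug, Q_aug

# Hypothetical two-state model with one unknown input
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
A_aug, C_aug, Q_aug = augment_with_random_walk(
    A, B, C, D, 1e-4 * np.eye(2), 1e-2 * np.eye(1))

x = np.array([1.0, -0.5])
u = np.array([2.0])
z = np.concatenate([x, u])
z_next = A_aug @ z  # noise-free one-step prediction of the augmented state
```

In the noise-free prediction the input component stays constant, which is why a large Q_u is needed before the estimator can follow a fast input.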
To summarise, the RW is widely employed, since it is a very generic modelling
tool. It can be instrumental for both single step and multistep estimators, and
offers better observability in comparison with the NI approach, provided that
the dynamics of the estimate is relatively slow. The RW does not model any
physical behaviour, often results in a delayed estimation, and the performance
of the estimation problem depends strongly on the tuning of the covariances.
Each augmented state is linked to an RW equation, and observability issues
may arise if we need to model too many estimates, e.g., in case the location of
an input is not known a priori. Furthermore, an input applied at a location
which is not modelled may take part in both the state estimation and the input
estimation in an uncontrolled manner.
[Figure A1.3: Polynomial approximation strategy. Polynomials of order 0, 1 and 2 over the estimation window, which spans k = T−N+1 to k = T.]
A1.7.3 Polynomial approximation
Reference [38] proposes to model an input or parameter as a (low order)
polynomial within an MHE scheme. Such polynomial approximation (PA)
carries the information of the time derivatives of the input or parameter for
a time window, and for this reason its applicability is limited to multistep
estimators. The PA decreases the time delay which is typical of single step
estimators, and allows for value extrapolation in the neighbourhood of the time
window. The latter feature provides a prediction (cf. Def. A1.19) which can be
exploited within an MPC scheme. The mathematical formulation of an MHE
with a PA is included in [38].
An accurate representation of inputs or parameters may require a high
polynomial order, resulting in the need to estimate more values, i.e., each
coefficient of the polynomial is an augmented state in the estimation problem.
This holds especially for inputs which exhibit a fast dynamics in relation to
the window length, and may lead to observability issues. On the other hand,
parameters are usually expected to change slowly over time. For this reason,
reference [38] recommends a zero order polynomial for parameter estimation,
i.e., a parameter is assumed to be constant within the estimation window.
Fig. A1.3 illustrates three different polynomial orders within an estimation
window, obtained by least squares regression. In the context of an MHE, it is
worth mentioning that an RW model requires the estimation of one parameter
for each time step, while the PA confines the number of augmented states to
the order of the polynomial plus one (e.g., a zero order polynomial adds one
variable to the estimation problem). Furthermore, the PA can filter any noise
whose dynamics exceeds the chosen polynomial order.
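The relation between polynomial order and number of unknowns can be illustrated with a least squares fit over one window. The sketch below is not the MHE formulation of [38]; it only fits polynomials of order 0, 1 and 2 to a hypothetical input signal, and the function name `fit_input_polynomial` and the sample values are assumptions.

```python
import numpy as np

def fit_input_polynomial(k, u, order):
    """Least squares polynomial of the given order over the window samples.

    Returns the order + 1 coefficients, highest power first.
    """
    return np.polyfit(k, u, order)

# Hypothetical input samples over a window of 10 time steps (exactly quadratic)
k = np.arange(10.0)
u = 0.5 + 0.2 * k - 0.03 * k**2

fits = {order: fit_input_polynomial(k, u, order) for order in (0, 1, 2)}
residuals = {order: np.linalg.norm(np.polyval(c, k) - u)
             for order, c in fits.items()}
```

Each fit has order + 1 coefficients, independent of the window length, whereas an RW model would introduce one unknown per time step. The zero order fit returns the mean of the samples, which corresponds to a quantity assumed constant within the window.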
An interesting aspect arises when considering the PA as a Taylor expansion
around a chosen time step k [38]. In particular, three meaningful choices are
k = 0, k = T−N−1 and k = T . Having the centre of the Taylor expansion at the
origin (k = 0) implies a growing uncertainty in time, which may be undesired.
For this reason, centring the expansion at the beginning (k = T −N −1) or at
the end (k = T ) of the moving window may be a better choice. Those two latter
options are convenient for smoothing and filtering techniques, respectively (cf.
Defs. A1.17–A1.18) [38].
To summarise, the PA is a powerful modelling technique for quantities that vary
smoothly within an estimation window. It can result in better observability
compared to an RW especially in case of a long estimation window and a low
polynomial order. Furthermore, it compensates for the lack of the feedthrough
matrix allowing for prediction. The PA is not suited for single step estimators
and cannot accurately model a variable that evolves fast in time with respect
to the window size.
A1.7.4 Basis functions and compressive sensing
Compressive sensing (CS) is a technique for data acquisition and compression
which is currently being investigated as a powerful instrument in the framework
of estimation problems based on the KF. CS allows undersampled signals^A8 to be
acquired and recovered, and it is based on a concept referred to as signal
sparsity, i.e., a signal can be represented by a few components belonging to a
predefined set of basis functions [25, 73, 150]. In this section we outline the
state of the art of CS as a tool to boost the performance of estimation problems
and to represent an input. The discussion comes mainly from [104]. We will
treat the mathematical details of CS in chapter A3.
Footnote A8: We say that a signal is undersampled if its sampling rate does not satisfy the
condition of the Nyquist-Shannon sampling theorem.
We found the first example of CS as a tool to improve the KF in reference
[197], where the author exploited the fact that the sparsity pattern of a
signal changes slowly over time, which made it possible to design a KF with a limited
amount of measurements. References [29, 95] further developed this idea,
while [32] introduced two other sparsity conditions in order to improve the KF
performance in terms of estimation error or convergence time, i.e., sparsity in
the state and sparsity in the innovations. CS is gaining attention in the fields
of structural health monitoring (SHM) and fault detection in order to limit the
number of required sensors [9, 88, 117, 135, 148, 185] and the amount of data
to be transmitted for processing [129]. Reference [58] proposes to exploit CS
since the structural eigenmodes are sparse in the frequency domain. The way
the eigenmodes change in time indicates damage. For condition monitoring
of rolling bearings, reference [118] performs acoustic measurements and uses
CS for data compression and processing, in order to lower the computational
power that the high sampling rates of an acoustic signal require.
Regarding input modelling, reference [63] applies CS for the detection of a
single force impact entering a mechanical system at an unknown location,
such that the signal is known to be sparse in time and space, and CS allows
for an accurate input estimation. Input sparsity in space has also been
exploited in [171], where a frequency domain approach is proposed to identify
unknown dynamic forces on a structure. Input models are key elements of
force identification problems. References [151, 152, 153, 154] focus on inverse
methods for force identification based on the $\ell_1$-norm regularisation, which is a
key element of CS (cf. section A2.4). In particular, reference [154] describes a
fast iterative shrinkage/thresholding (IST) algorithm which leads to an accurate
force reconstruction for impulses and harmonic loads. The same approach is
employed in [151], where different types of shape functions are compared in
order to find the best force impact representation. Furthermore, the problem of
impact identification and location is solved in [152] by a two-step IST algorithm,
allowing the characterisation of one or two force impulses within a set of nine
candidate locations. Finally, reference [153] describes the sparse deconvolution
method for the reconstruction of impact forces in case of large scale ill-posed
inverse problems.
To summarise, CS is employed in the state of the art of estimation problems based
on a single step estimator to improve the observability of a system, thus reducing
the number of required measurements. Furthermore, CS makes it possible to model an
input by describing it through several types of shape functions. Such a modelling
approach does not present dynamic range limitations in comparison with the
RW and the PA. However, a crucial requirement of an input representation
based on CS is that the input needs to be sparse.
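The role of sparsity can be illustrated with a generic iterative shrinkage/thresholding scheme for the $\ell_1$-regularised least squares problem. The sketch below is a textbook-style iteration and not the algorithm of [154] or the CS-MHE; the dictionary `Phi`, the coefficient vector `c`, the weight `lam` and the problem sizes are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Shrink every entry of z towards zero by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(Phi, y, lam, n_iter=3000):
    """Iterate on min_c 0.5 * ||y - Phi c||_2^2 + lam * ||c||_1."""
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient
    c = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ c - y)                 # gradient of the quadratic term
        c = soft_threshold(c - grad / L, lam / L)    # gradient step, then shrinkage
    return c

# Undersampled problem: 40 measurements, 100 candidate components, 3 active ones
rng = np.random.default_rng(0)
m, n = 40, 100
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
support = [7, 42, 90]
c_true = np.zeros(n)
c_true[support] = [1.5, -2.0, 1.0]
y = Phi @ c_true

c_hat = ista(Phi, y, lam=0.01)
recovered = np.argsort(np.abs(c_hat))[-3:]  # indices of the three largest entries
```

Without the $\ell_1$ term the problem would be underdetermined, since there are fewer measurements than unknowns. The sparsity assumption is what allows the active components to be singled out.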
A1.7.5 Discussion regarding input models
Table A1.1 summarises the major features of the input models that we discussed
in this section (we omitted the NI approach). The features marked with a cross
between brackets indicate that in practice it is not possible for the RW and
the PA approaches to deal with unknown input locations and distributed loads
because of observability issues, even if in theory one could discretise the system
and apply a model to each node. From Table A1.1 we understand that the
estimation of lumped inputs with slow dynamics applied at a known location
is in general not a critical task, while the same does not apply to distributed
inputs with fast dynamics applied at unknown locations. We note that CS
can provide a solution to these problems provided that the input shape is known
and the estimation problem is formulated on a time window instead of on a
single step.
Table A1.1: Comparison of input models (✓ = yes, ✗ = no, (✗) = not possible in practice).
Feature                               RW     PA     CS
known input location                  ✓      ✓      ✓
unknown input location                (✗)    (✗)    ✓
single step estimator                 ✓      ✗      ✗
multistep estimator                   ✓      ✓      ✓
slow input dynamics                   ✓      ✓      ✓
fast input dynamics (e.g., impulse)   ✗      ✗      ✓
lumped input                          ✓      ✓      ✓
distributed input                     (✗)    (✗)    ✓
based on a model                      ✗      ✓      ✓
unknown input shape                   ✓      ✓      ✗
A1.8 Observability and matrix condition number
The concept of observability is fundamental to estimation problems since it
defines whether it is theoretically possible to determine the initial conditions
of a system given a set of measurements. For linear systems, observability for
continuous-time and discrete-time cases is defined as follows [183]:
Definition A1.20 (Observability of a continuous-time system). A continuous-
time system is observable if for any initial state x(0) and any final time t > 0
the initial state x(0) can be uniquely determined by the knowledge of the input
u(τ) and output y(τ) ∀ τ ∈ [0, t].
Definition A1.21 (Observability of a discrete-time system). A discrete-time
system is observable if for any initial state x0 and any final time k the initial
state x0 can be uniquely determined by the knowledge of the input ui and output
yi ∀ i ∈ [0, k].
Def. A1.20 is stricter than Def. A1.21. In fact, for continuous-time systems
the initial state must be determinable for any final time t > 0. The definition
of observability involves the initial state, and implies that all states between the
initial and final times can also be determined [183]. The literature offers several
equivalent tests to assess observability for continuous-time and discrete-time
LTI systems. Here we limit our discussion to the observability matrix (also
referred to as the Kalman observability matrix), while we refer to [114, 183] for
further theorems.
Definition A1.22 (Observability matrix). The observability matrix of a
continuous-time LTI system of type (A70) is defined as Eq. (A71).
\begin{align}
\dot{x}(t) &= A_C x(t) + B_C u(t) + w(t) \tag{A70a} \\
y(t) &= C_C x(t) + D_C u(t) + v(t) \tag{A70b}
\end{align}
\[
\mathcal{O} = \begin{bmatrix} C_C \\ C_C A_C \\ \vdots \\ C_C A_C^{n_x - 1} \end{bmatrix} \tag{A71}
\]
Theorem A1.23 (Observability based on the observability matrix). The $n_x$
state continuous-time LTI system (A70) is observable if and only if it satisfies
condition (A72).
\[
\operatorname{rank}(\mathcal{O}) = n_x \tag{A72}
\]
Similar considerations in discrete time can be derived for Eq. (A33). The
observability test based on the observability matrix in continuous time or
discrete time can be extended to time-variant systems. On the other hand,
observability for nonlinear systems is not simple to formalise [183].
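The rank test of Thm. A1.23 can be sketched in a few lines of Python. The example system (a double integrator with either a position or a velocity sensor) and the function names are hypothetical and serve only to illustrate Eqs. (A71–A72).

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, C A, ..., C A^(nx-1) as in Eq. (A71)."""
    nx = A.shape[0]
    blocks = [C]
    for _ in range(nx - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable(A, C):
    """Rank condition of Eq. (A72)."""
    return np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0]

# Double integrator: state = [position, velocity]
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
C_pos = np.array([[1.0, 0.0]])  # position sensor
C_vel = np.array([[0.0, 1.0]])  # velocity sensor

obs_pos = is_observable(A, C_pos)
obs_vel = is_observable(A, C_vel)
```

With a position sensor the velocity can be inferred from the change in position, so the pair is observable. With a velocity sensor alone the position never appears in the output, and the rank drops to one.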
A possible test which relates to the topic of observability involves rank and
condition number of the discrete-time system, when we consider the problem
as an overdetermined weighted least squares fitting. Those values establish a
link between observability and estimation uncertainty, and can be used to tune
estimation performance [79, 105]. Furthermore, such an approach is suited to
estimators and inverse problems in general, thus including multistep estimators
such as the MHE. We would like to stress the fact that a test of rank and
condition number does not assess observability as it was defined in Defs. A1.20
and A1.21. However, those two metrics are of paramount importance to determine
whether a problem is ill-posed. In particular, we point out the following:
• a matrix that corresponds to an unobservable system is not full rank, and
consequently its condition number tends to infinity;
• in case of a full rank matrix, its condition number is an indicator for
numerical (in)stability, and can be employed for judging and comparing
different system settings such as sensor configurations (sensor type,
location, amount). It is worth mentioning that it is possible for an
observable system to have a very high condition number which jeopardises
the estimation problem. In fact, a mere observability assessment does not
provide any information concerning the quality of the estimation.
The (ℓ2-norm) condition number of a matrix is defined as the ratio of its largest
singular value to its smallest. For matrices of arbitrary size we can
obtain the singular values through a singular value decomposition (SVD), while
for a symmetric (more generally, normal) square matrix they coincide with the
absolute values of its eigenvalues. Following this line, the
Popov-Belevitch-Hautus (PBH) criterion for local observability (Thm. A1.24)
[60, 114, 134] provides us with more information since it takes into account
every eigenvalue λ of the continuous-time state-space matrix A_C. We refer to
[38] and references therein for further metrics and considerations related to the
topic of observability.
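The distinction between rank and condition number can be made concrete with a short sketch. The two matrices below are assumptions made for illustration (they are the observability matrices of a hypothetical oscillator with a velocity sensor, once with a very soft and once with a stiffer spring); they are not the matrices analysed later in this dissertation.

```python
import numpy as np

def condition_number(M):
    """l2-norm condition number: largest over smallest singular value."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return np.inf if s[-1] == 0.0 else s[0] / s[-1]

# Hypothetical observability matrix for A = [[0, 1], [-1e-6, 0]], C = [0, 1]:
# full rank (hence observable), yet badly conditioned.
O_soft = np.array([[0.0, 1.0], [-1e-6, 0.0]])
# Counterpart with unit stiffness: full rank and well conditioned.
O_stiff = np.array([[0.0, 1.0], [-1.0, 0.0]])

rank_soft = int(np.linalg.matrix_rank(O_soft))
cond_soft = condition_number(O_soft)
cond_stiff = condition_number(O_stiff)
```

Both configurations pass the rank test, but only the condition number reveals that the first one is likely to amplify measurement noise considerably.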
Theorem A1.24 (PBH observability). The n_x-state continuous-time LTI
system (A70) is observable over either R or C if and only if it satisfies
condition (A73).
$$\operatorname{rank}\begin{bmatrix} A_C - \lambda I \\ C_C \end{bmatrix} = n_x \quad \forall\, \lambda \in \mathbb{C} \tag{A73}$$
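A possible numerical version of the PBH test is sketched below. It relies on the fact that the stacked matrix in (A73) can only lose rank when λ is an eigenvalue of A_C, so it is sufficient to test those values. The function name, the tolerance and the example system are assumptions introduced for illustration.

```python
import numpy as np

def pbh_observable(A, C, tol=1e-9):
    """Check condition (A73) at every eigenvalue of A."""
    nx = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.vstack([A - lam * np.eye(nx), C])
        if np.linalg.matrix_rank(M, tol=tol) < nx:
            return False  # the mode associated with lam is unobservable
    return True

# Hypothetical damped free mass, states x = [position, velocity].
A_free = np.array([[0.0, 1.0], [0.0, -0.1]])
C_position = np.array([[1.0, 0.0]])
C_velocity = np.array([[0.0, 1.0]])

pbh_with_position_sensor = pbh_observable(A_free, C_position)
pbh_with_velocity_sensor = pbh_observable(A_free, C_velocity)
```

Compared with the rank test on the observability matrix, this variant also indicates which eigenvalue (here λ = 0, the rigid-body mode) is responsible for the loss of observability.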
Since observability is a crucial feature in estimation problems, it is worth
mentioning how to enhance it. Let us consider a generic LTI system and assume
it is observable, i.e., it satisfies Eq. (A72). Next, let us suppose that we employ the
system in a joint state/input estimation context, i.e., we augment the system
with the inputs (cf. section A1.6). The number of inputs which we can estimate
while preserving observability is limited, and consequently beyond a certain number of
inputs the observability of the system is lost [104, 105]. In such a situation,
we can try to restore observability by choosing one or more actions among the
following:
1. add measurements;
2. implement or change an input model (cf. section A1.7);
3. keep the number of inputs within the observability threshold;
4. reduce the model complexity.
In general we prefer to avoid points 3–4, since they provide us with less
information. On the other hand, solution 1 is relatively easy to implement,
and for this reason it is widely employed. The major concerns are economic
constraints (measurements may be expensive due to the costs of transducers and
acquisition systems) and the fact that certain measurements which are needed
to increase observability may not be available. In fact, specific sensors may
not exist or cannot be physically mounted on a given set-up. Finally, point 2
draws attention to input models, going in the direction of virtual
sensing and advanced model based estimation techniques. The CS-MHE follows
this concept, since it exploits known input information in order to implement
a tailored modelling approach. This requires on the one hand some effort to
build the models, but on the other hand it reduces the need for an overwhelming
amount of measurements.
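The loss of observability caused by augmenting the state with too many inputs can be illustrated with a small sketch. It assumes a constant (random-walk type) input model, i.e., the augmented dynamics are [ẋ; u̇] = [[A, B], [0, 0]] [x; u], and a measurement without direct feedthrough; the system, the sensor and the function name are hypothetical.

```python
import numpy as np

def augmented_observability_rank(A, B, C):
    """Rank of the observability matrix of the input-augmented system."""
    nx, nu = B.shape
    n = nx + nu
    A_aug = np.vstack([np.hstack([A, B]), np.zeros((nu, n))])
    C_aug = np.hstack([C, np.zeros((C.shape[0], nu))])
    O = np.vstack([C_aug @ np.linalg.matrix_power(A_aug, i) for i in range(n)])
    return int(np.linalg.matrix_rank(O)), n

A = np.array([[0.0, 1.0], [-4.0, -0.1]])  # one mass, x = [position, velocity]
C = np.array([[1.0, 0.0]])                # single position sensor

B_one = np.array([[0.0], [1.0]])          # one unknown force
B_two = np.array([[0.0, 0.0], [1.0, 1.0]])  # two unknown forces on the same mass

rank_one, n_one = augmented_observability_rank(A, B_one, C)
rank_two, n_two = augmented_observability_rank(A, B_two, C)
```

With one unknown force the augmented system keeps full rank, while two forces acting on the same degree of freedom cannot be separated from a single sensor, so the rank stays at three for four augmented states.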
In chapter B2 we will address rank and condition number of the CS-MHE
[105]. Besides observability, the state of the art of control engineering and state
estimation offers additional system properties such as stability, detectability,
reachability and controllability. Throughout this dissertation we do not employ
those definitions, and therefore we refer to [5, 12, 33, 34, 53, 93, 183] for further
information.
A1.9 Conclusions
In this chapter we introduced the concepts of state-space representation and
state estimation, underlining their role in control engineering and virtual sensing.
Furthermore, we outlined a few concepts of probability theory, which constitute
the foundation of many (model based) estimation techniques. We summarised
formulations, advantages and disadvantages of single step estimators (e.g., the
KF and its nonlinear extensions) and of the MHE. Moreover, we outlined four
approaches for input estimation and we introduced the topics of joint estimation
and observability. The formulation of the MHE (section A1.5), the principles of
joint estimation (section A1.6) and the concept of observability (section A1.8)
constitute three milestones of the CS-MHE, which we will recall in part B of
this dissertation.
After introducing several state of the art estimators and input models, we can
clarify the motivation for the development of the CS-MHE as well as the choice
of its two main ingredients, i.e., the MHE and CS. We summarise them in the
following list:
• The estimation of unknown inputs is not a simple task and remains an
engineering challenge. In particular, (1) observability issues may arise
when we want to estimate multiple inputs, (2) classical estimators require
a high sampling rate, and (3) typical modelling approaches are not based
on physical input models.
• CS is a convenient modelling tool to cope with all three points above,
since it leads to better observability, it allows for modelling a fast dynamic
input, and it exploits the knowledge of the input shape.
• The MHE can deal with nonlinear systems, it is more robust than the EKF
in case of local minima, and its computational effort is acceptable (we
will give further details in chapter A2). Moreover, a multistep estimator
such as the MHE is a natural framework for CS, since it allows us to exploit
sparsity within a time window (and not only at a single time step), i.e.,
we can profit from sparsity in time and space.
Consequently, joining the MHE with CS into the CS-MHE seemed very
promising to us. We will describe the derivation of the CS-MHE in part B
of this dissertation, while in the next chapter we introduce a few concepts of
optimisation.
Chapter A2
Optimisation
This chapter provides an overview of the topic of nonlinear optimisation, which
is crucial to the CS-MHE since each iteration consists of one optimisation
problem. We begin with a few definitions which allow us to outline some optimality
conditions for (convex) optimisation problems in section A2.1. Next, we focus
on convex optimisation in section A2.2, which forms a particular class of
problems into which we cast the CS-MHE. We continue in section A2.3 by
presenting the fundamentals of two relevant numerical methods for solving
nonlinear programs, i.e., interior point methods and sequential quadratic
programming. Furthermore, in sections A2.4 and A2.5 we tackle two situations
which are typical of compressive sensing, i.e., ℓ1-norm optimisation (including
ℓ1-norm regularisation problems, such as the LASSO problem) and complex
variables. In section A2.6 we introduce a covariance matrix for constrained
estimation problems, and finally we summarise the conclusions in section A2.7.
Acknowledgements
This chapter is an overview of the state of the art on the topic. Great sources of
inspiration for writing this chapter were [21, 54, 137, 204]. We are grateful to
the developers of SPGL1 [193, 194] and CVX [67, 68], which are two packages
compatible with MATLAB® for specifying and solving convex programs that
helped us during our first steps in the world of numerical optimisation.
A2.1 An introduction to nonlinear programming
Problem (A74) defines a generic nonlinear programming (NLP) problem in
standard form, where z ∈ R^{n_z} is the vector of optimisation variables (or decision
variables), f : R^{n_z} → R is the objective function (or cost function) and functions
g : R^{n_z} → R^{n_eq} and h : R^{n_z} → R^{n_ieq} define equality and inequality constraints,
respectively.
$$\underset{z}{\text{minimise}} \quad f(z) \tag{A74a}$$
$$\text{subject to} \quad g(z) = 0 \tag{A74b}$$
$$h(z) \le 0 \tag{A74c}$$
Let us first provide a few basic definitions and theorems.

Definition A2.1 (Index set of equality constraints). The index set of
equality constraints is defined as follows:
E = {1, . . . , neq }. (A75)
Definition A2.2 (Index set of inequality constraints). The index set of
inequality constraints is defined as follows:
I = {1, . . . , n_ieq}. (A76)
Definition A2.3 (Feasible set). The feasible set (or domain) is defined as
follows:
F = {z | g(z) = 0, h(z) ≤ 0}. (A77)
Definition A2.4 (Feasible point). A point z is called feasible if and only if
z ∈ F. Problem (A74) is feasible if a feasible point exists.
Definition A2.5 (Global optimum). A point z ∗ is called global optimum
(or, in this case, global minimum) if and only if
f (z ∗ ) ≤ f (z), ∀ z ∈ F. (A78)
Definition A2.6 (Local optimum). A point z* is called local optimum (or,
in this case, local minimum) if and only if there exists an open ball of radius ε > 0
around z*, denoted by B_ε(z*), such that
f(z*) ≤ f(z), ∀ z ∈ F ∩ B_ε(z*). (A79)
Finding the global optimum to problem (A74) is in general more challenging
than finding a local optimum [51]. In this chapter we will present sufficient
conditions for local optimality, which become relevant for global optimality in
the special case of convex optimisation problems, defined as follows.
Definition A2.7 (Convex optimisation problem). An optimisation problem of
the form (A74) is convex if it has a convex feasible set F and the cost function
f : F → R is convex.
Theorem A2.8. For a convex optimisation problem, every local minimum is
also a global minimum.
Proof. The proof can be found in [137].

Definition A2.9 (Active constraint). A constraint h_i : R^{n_z} → R, i ∈ I is
called active if and only if h_i(z) = 0 for a feasible point z ∈ F. This definition
implies that all equality constraints g_i : R^{n_z} → R, i ∈ E are active at any
feasible point z ∈ F.
Definition A2.10 (Active set). The active set A(z) for a feasible point z ∈ F
is defined as the set of the indices related to all active constraints (i.e., equality
and inequality constraints), such that
A(z) = E ∪ {i ∈ I | h_i(z) = 0} ⊆ E ∪ I. (A80)
Definition A2.11 (Lagrangian function, Lagrange multipliers). The Lagrangian
function for the NLP (A74) is defined as follows:
$$L(z, \lambda, \mu) = f(z) - \lambda^\top g(z) - \mu^\top h(z), \tag{A81}$$
where vectors λ ∈ R^{n_eq} and µ ∈ R^{n_ieq} are referred to as Lagrange multipliers
(or dual variables), and can be interpreted as shadow prices, i.e., they define
the cost improvement that one could obtain by relaxing the constraints.
Dual variables are employed to build the so-called dual problem [137]. On the
other hand, the optimisation variables z are also referred to as primal variables,
and problem (A74) is called the primal problem.
A2.1.1 Local optimality conditions
In this section we define the conditions for local optimality. Let us first introduce
the concept of constraint qualification, which is necessary to ensure that the
constraint linearisation captures the essential geometric features of the feasible
set in a neighbourhood of z* [137]. Among the existing constraint qualifications,
we present the so-called linear independence constraint qualification, which is a
rather strong constraint qualification. Further (and weaker) qualifications can
be found in [137].
Definition A2.12 (Linear independence constraint qualification). A feasible
point z ∈ F satisfies the linear independence constraint qualification
(LICQ) if and only if the gradients ∇g_i(z), i ∈ E and the gradients ∇h_i(z),
i ∈ I ∩ A(z) are linearly independent.
We can now introduce the following theorem concerning the first order necessary
conditions for local optimality.
Theorem A2.13 (Karush-Kuhn-Tucker (KKT) conditions). Let z* be a local
solution to problem (A74) that satisfies the LICQ. Then there exist Lagrange
multipliers λ* and µ* such that the KKT conditions (A82) hold.
$$\nabla_z L(z^*, \lambda^*, \mu^*) = 0 \tag{A82a}$$
$$g(z^*) = 0 \tag{A82b}$$
$$h(z^*) \le 0 \tag{A82c}$$
$$\mu^* \ge 0 \tag{A82d}$$
$$\mu_i^*\, h_i(z^*) = 0, \quad i = 1, \dots, n_{ieq} \tag{A82e}$$
Definition A2.14 (KKT point). A point (z*, λ*, µ*) is called KKT point
(or primal-dual solution) if it satisfies the KKT conditions (A82).
Conditions (A82e) are called complementarity conditions and imply that either
constraint hi (z ∗ ) is active (i.e., hi (z ∗ ) = 0) or µ∗i = 0, or possibly both.
Definition A2.15 (Weakly active constraints and strictly active constraints).
Let point (z*, λ*, µ*) be a KKT point of problem (A74). If for an index
i ∈ I ∩ A(z*) it holds µ_i* = 0 then h_i(z*) is said to be weakly active, and i ∈ A_w(z*).
Alternatively, if µ_i* > 0 then h_i(z*) is said to be strictly active, and i ∈ A_s(z*).
Definition A2.16 (Strict complementarity). Let point (z*, λ*, µ*) be a KKT
point of problem (A74). If all active constraints are strictly active, i.e., if µ_i* > 0
for all i ∈ I ∩ A(z*), then strict complementarity holds.
Theorem A2.17 (Uniqueness of the Lagrange multipliers). Let point
(z*, λ*, µ*) be a KKT point of problem (A74). If strict complementarity
and LICQ hold, then the Lagrange multipliers (λ*, µ*) are unique.
The KKT conditions (A82) characterise critical points, while second derivative
information allows us to rule out undesirable candidates such as saddle
points [137] and can also be employed to state sufficient conditions for local
optimality.
Definition A2.18 (Jacobian of the strictly active constraints). The Jacobian
of the strictly active constraints is defined as
$$G_{sa} = \begin{bmatrix} \nabla g(z^*) & \nabla h_{A_s(z^*)}(z^*) \end{bmatrix}^\top. \tag{A83}$$
Definition A2.19 (Nullspace of the strictly active constraints). The nullspace
of the strictly active constraints is defined as
N_s = {q | G_sa q = 0}. (A84)
Theorem A2.20 (Second order necessary optimality conditions). Let point
(z*, λ*, µ*) be a KKT point of problem (A74) satisfying the LICQ. Then the
Hessian of the Lagrangian is positive semidefinite on the nullspace of the strictly
active constraints, i.e.,
$$q^\top \nabla_z^2 L(z^*, \lambda^*, \mu^*)\, q \ge 0 \quad \forall\, q \in N_s.$$
Theorem A2.21 (Second order sufficient optimality conditions). Let point
(z*, λ*, µ*) be a KKT point of problem (A74) satisfying the LICQ. If the
Hessian of the Lagrangian is strictly positive definite on the nullspace of the
strictly active constraints, i.e., if
$$q^\top \nabla_z^2 L(z^*, \lambda^*, \mu^*)\, q > 0 \quad \forall\, q \in N_s \setminus \{0\},$$
then z* is a local minimiser of problem (A74).
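The second order sufficient conditions can be checked numerically on a small example. The sketch below uses a toy problem that is an assumption made for illustration (minimise z1 + z2 subject to z1² + z2² − 2 = 0, with no inequality constraints), together with the sign convention of the Lagrangian (A81); the candidate point and multiplier were obtained by hand.

```python
import numpy as np

# Candidate KKT point of the toy problem (derived by hand).
z_star = np.array([-1.0, -1.0])
lam_star = -0.5

grad_f = np.array([1.0, 1.0])          # gradient of f(z) = z1 + z2
grad_g = 2.0 * z_star                  # gradient of g(z) = z1^2 + z2^2 - 2
hess_L = -lam_star * 2.0 * np.eye(2)   # Hessian of L = f - lam * g

# First order condition (A82a): grad f - lam * grad g = 0.
stationarity = grad_f - lam_star * grad_g

# Basis N of the nullspace {q | G q = 0} with G = grad_g^T, via the SVD.
G = grad_g.reshape(1, -1)
_, s, Vt = np.linalg.svd(G)
rank = int(np.sum(s > 1e-12))
N = Vt[rank:].T

# Hessian of the Lagrangian projected onto the nullspace.
reduced_hessian = N.T @ hess_L @ N
reduced_eigenvalues = np.linalg.eigvalsh(reduced_hessian)
```

All eigenvalues of the projected Hessian are strictly positive here, so the candidate point is a local minimiser in the sense of Thm. A2.21.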
The definitions and theorems regarding local optimality conditions that we included
in this section are intended as an overview of the topic, and do not claim to be
exhaustive. The interested reader can refer to specific optimisation textbooks
such as [137]. As far as this dissertation is concerned, the most important
message is included in Thm. A2.8, i.e., for a convex NLP every local minimiser
is also a global minimiser. Moreover, if the reduced Hessian (i.e., a projection of
the Hessian of the Lagrangian onto the space of free variables [137]) is strictly
positive definite, there exists a unique KKT point which is the unique global
minimiser [21, 137]. In the convex case there are further useful relationships
between the primal and the dual problem, that can be found in [137].
A2.2 Convex programming problems
Convex programming problems (CPs) are an NLP subclass which derives from
problem (A74) under the additional assumptions of Def. A2.7. In this section
we list a few noteworthy types of CPs.
Quadratic programs (QPs) are crucial in the framework of state estimation
problems based on an MHE, which include the CS-MHE that we will discuss
in chapter B1. They are relatively easy to solve and for this reason they often
appear in real time optimal control problems such as MPC [54]. Formally, a
generic QP has the standard form of problem (A85), where z ∈ R^{n_z}, H ∈ R^{n_z × n_z}
is symmetric, q ∈ R^{n_z}, A_eq ∈ R^{n_eq × n_z}, b_eq ∈ R^{n_eq}, A_ieq ∈ R^{n_ieq × n_z}, b_ieq ∈ R^{n_ieq}.
We assume that the QP is convex, i.e., H ≥ 0 (cf. footnote A9).
$$\underset{z}{\text{minimise}} \quad \tfrac{1}{2}\, z^\top H z + q^\top z \tag{A85a}$$
$$\text{subject to} \quad A_{eq}\, z = b_{eq} \tag{A85b}$$
$$A_{ieq}\, z \le b_{ieq} \tag{A85c}$$
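When only the equality constraints (A85b) are present, the KKT conditions of the QP reduce to a single linear system, which is one reason why QPs are comparatively easy to solve. The sketch below illustrates this special case under the sign convention of the Lagrangian (A81); the function name and the numerical data are assumptions, and inequality constraints are deliberately left out.

```python
import numpy as np

def solve_equality_qp(H, q, A_eq, b_eq):
    """Solve min 0.5 z'Hz + q'z  s.t.  A_eq z = b_eq  through its KKT system.

    Stationarity: H z + q - A_eq' lam = 0, feasibility: A_eq z = b_eq.
    """
    nz, neq = H.shape[0], A_eq.shape[0]
    K = np.block([[H, -A_eq.T], [A_eq, np.zeros((neq, neq))]])
    rhs = np.concatenate([-q, b_eq])
    sol = np.linalg.solve(K, rhs)
    return sol[:nz], sol[nz:]

# Hypothetical data: minimise z1^2 + z2^2 subject to z1 + z2 = 1.
H = 2.0 * np.eye(2)
q = np.zeros(2)
A_eq = np.array([[1.0, 1.0]])
b_eq = np.array([1.0])

z_opt, lam_opt = solve_equality_qp(H, q, A_eq, b_eq)
```

For this example the minimiser is the point of the line z1 + z2 = 1 closest to the origin. Handling the inequality constraints (A85c) requires an active-set or interior point strategy on top of such linear solves.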
Linear programs (LPs) represent the simplest type of convex optimisation
problems. The objective and constraint functions are all affine, and a generic
LP can be obtained from problem (A85) by omitting the quadratic part of the
cost function, i.e., H = 0.
Among the other forms of CPs which are relevant in the context of real
time dynamic optimisation [54], we mention the class of the so-called
quadratically constrained quadratic programming problems (QCQPs), which
extend problem (A85) by a set of convex quadratic constraints as illustrated
in Eq. (A86), where Q_i ∈ R^{n_z × n_z} is symmetric with Q_i ≥ 0 and q_i ∈ R^{n_z} for all
i ∈ {1, . . . , n_ieq}.
$$z^\top Q_i z + q_i^\top z \le 0 \quad \forall\, i \in \{1, \dots, n_{ieq}\} \tag{A86}$$
A9 We use notation H ≥ 0 to indicate that matrix H is positive semidefinite.
The last family of CPs that we indicate here is the second-order cone program
(SOCP), which is closely related to QCQPs [21]. In fact, the only difference is
the form of the inequality constraint, which we illustrate in Eq. (A87) for an
SOCP, where z ∈ R^{n_z} and A_i ∈ R^{n_i × n_z}. We call such a constraint a second-order
cone constraint [21].
$$\|A_i z + b_i\|_2 \le c_i^\top z + d_i \quad \forall\, i \in \{1, \dots, n_{ieq}\} \tag{A87}$$
SOCPs are more general than QCQPs, and will become relevant in the framework
of the CS-MHE in chapter B1, where we deal with the estimation of inputs
represented by complex shape functions (cf. section B1.4). A list of the
properties of CPs is given in [21]. For our purposes we simply recall that every
local solution of a CP is also a global solution (Thm. A2.8), and CPs are
generally considered significantly more tractable than (non-convex) NLPs [54].
A2.3 Numerical methods for nonlinear programming
In this section we briefly outline two popular numerical methods for solving
NLP problems of the form of problem (A74). Further approaches and details
can be found in references such as [21, 137] and references therein. Interior
point (IP) methods (section A2.3.1) and sequential quadratic programming
(SQP) (section A2.3.2) are both employed in the frameworks of MHE and
CS. Both methods aim at finding a local minimum by computing a KKT
point v ∗ = (z ∗ , λ∗ , µ∗ ) of a given NLP such as problem (A74). If the KKT
conditions (A82) were smooth, we could compute a KKT point by applying
Newton’s method. Unfortunately, Eqs. (A82c–A82e) are nonsmooth and need
special care [204].
A2.3.1 Interior point methods
IP methods (also referred to as barrier methods) handle inequality constraints
by means of a barrier function scaled by a barrier parameter. They initially
set the barrier parameter to a high value and progressively reduce it as the
algorithm converges. IP methods are among the most popular classes of NLP
algorithms, since they are proven to be highly competitive especially for convex
problems. Moreover, the recent evolution of compressive sensing is said to have
occurred thanks to some breakthroughs concerning IP methods [36].
One main disadvantage of IP methods is the need to always return to the so-called
central path [137], even if an initial guess very close to the solution is available
[204], i.e., the solver needs to start with a large barrier parameter even if the
solution is very close to the initial guess, and then perform some iterations which
drift away from the solution before going back to it. This fact can be limiting if
one is interested in solving a series of parametric problems for small variations
of the parameter. In such a framework SQP (cf. section A2.3.2) is typically much
faster, as it can be warm-started [204].
In IP methods, a vector of slack variables s ∈ R^{n_ieq} is usually introduced and
the inequality constraints (A74c) are reformulated as shown in Eq. (A88).
$$h(z) - s = 0 \quad \text{with} \quad s \ge 0 \tag{A88}$$
Primal interior point methods implement Eq. (A88) and add a barrier function
to the cost. Problem (A89) shows a logarithmic barrier, i.e., −τ Σ_{i=1}^{n_ieq} log(s_i).
In the limit τ → 0 the barrier function becomes the indicator function i(s), as shown in
Eq. (A90).
$$\underset{z,\,s}{\text{minimise}} \quad f(z) - \tau \sum_{i=1}^{n_{ieq}} \log(s_i) \tag{A89a}$$
$$\text{subject to} \quad g(z) = 0 \tag{A89b}$$
$$h(z) - s = 0 \tag{A89c}$$
$$\lim_{\tau \to 0} -\tau \log(s) = i(s) = \begin{cases} 0 & \text{for } s \ge 0 \\ \infty & \text{otherwise} \end{cases} \tag{A90}$$
The KKT conditions of problem (A89) are given in Eq. (A91), after having
defined S = diag(s) and e = [1, . . . , 1]^⊤ ∈ R^{n_ieq}. Conditions s_i* ≥ 0 and µ_i* ≥ 0 must
be enforced by selecting an appropriate step size [137].
$$\nabla f(z^*) - \nabla g(z^*)^\top \lambda^* - \nabla h(z^*)^\top \mu^* = 0 \tag{A91a}$$
$$-\tau\, (S^*)^{-1} e + \mu^* = 0 \tag{A91b}$$
$$g(z^*) = 0 \tag{A91c}$$
$$h(z^*) - s^* = 0 \tag{A91d}$$
Starting from a rather large value for τ , the IP approach consists of iteratively
solving problem (A89) and decreasing the value of the barrier parameter τ .
Once τ reaches a prescribed tolerance, the problem is solved one last time and
the solution is given as output.
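The barrier continuation described above can be sketched on a one-dimensional toy problem. The problem is an assumption chosen for illustration: minimise (z − 2)² subject to 1 − z ≥ 0, so that the slack s = 1 − z is nonnegative as in Eq. (A88). Each barrier subproblem is solved with a plain Newton iteration whose step is shortened to keep the iterate strictly feasible; the function name, the update factor 0.2 and the tolerances are illustrative choices, not those of a production solver.

```python
def barrier_minimiser(z, tau, max_iter=100, tol=1e-8):
    """Newton's method on (z - 2)^2 - tau * log(1 - z), keeping z < 1."""
    for _ in range(max_iter):
        slack = 1.0 - z
        grad = 2.0 * (z - 2.0) + tau / slack
        if abs(grad) < tol:
            break
        hess = 2.0 + tau / slack**2
        step = -grad / hess
        alpha = 1.0
        while z + alpha * step >= 1.0:  # shorten the step to stay strictly feasible
            alpha *= 0.5
        z += alpha * step
    return z

z, tau = 0.0, 10.0   # feasible initial guess and large initial barrier parameter
path = []
while tau > 1e-6:
    z = barrier_minimiser(z, tau)  # warm start from the previous subproblem
    path.append((tau, z))
    tau *= 0.2                     # progressively reduce the barrier parameter
```

For this problem the subproblem minimiser is known in closed form, z(τ) = (3 − √(1 + 2τ))/2, so the stored path can be compared against it; as τ decreases the iterates approach the constrained solution z = 1 from the interior.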
Two main disadvantages of primal interior point methods are the need to find
a feasible initial guess, as well as dealing with Eq. (A91b), which becomes highly
nonlinear near the solution, as s → 0. Primal-dual interior point methods
tackle the latter issue by multiplying Eq. (A91b) by matrix S* and solving the
equivalent system (A92), with s* ≥ 0 and µ* ≥ 0. We note that if τ = 0 we get
the standard KKT conditions (A82) [204].
$$\nabla f(z^*) - \nabla g(z^*)^\top \lambda^* - \nabla h(z^*)^\top \mu^* = 0 \tag{A92a}$$
$$-\tau e + S^* \mu^* = 0 \tag{A92b}$$
$$g(z^*) = 0 \tag{A92c}$$
$$h(z^*) - s^* = 0 \tag{A92d}$$
A2.3.2 Sequential quadratic programming
Unlike IP methods, SQP does not relax the complementary slackness
condition (A82e), but computes the step by using a local quadratic approximation
of the NLP and solving a QP at each iteration. Starting from an initial
guess v^(0) = (z^(0), λ^(0), µ^(0)), SQP computes a sequence of points v^(k) using
Eq. (A93), with α^(k) ∈ (0, 1] and ∆v^(k) defined in Eq. (A94), obtained from the
solution (∆z^(k), λ̃^(k), µ̃^(k)) of the QP subproblem (A95).
$$v^{(k+1)} = v^{(k)} + \alpha^{(k)} \Delta v^{(k)} \tag{A93}$$
$$\Delta v^{(k)} = \left(\Delta z^{(k)},\; \tilde{\lambda}^{(k)} - \lambda^{(k)},\; \tilde{\mu}^{(k)} - \mu^{(k)}\right) \tag{A94}$$
$$\underset{\Delta z \in \Omega^{(k)}}{\text{minimise}} \quad \tfrac{1}{2}\, \Delta z^\top H^{(k)} \Delta z + \nabla f(z^{(k)})^\top \Delta z \tag{A95a}$$
$$\text{subject to} \quad g(z^{(k)}) + \nabla g(z^{(k)})^\top \Delta z = 0, \tag{A95b}$$
$$h(z^{(k)}) + \nabla h(z^{(k)})^\top \Delta z \ge 0. \tag{A95c}$$
Several types of SQP exist, which differ in the computation
of the step size α^(k), the so-called Hessian matrix approximation H^(k) and
the choice of the set Ω^(k) ⊆ R^{n_z}. Under a suitable choice of H^(k), α^(k), Ω^(k)
and the initial guess v^(0), SQP is guaranteed to converge to v* [137]. The
generalisation of SQP is sequential convex programming (SCP), where the
sequence of subproblems involves any type of convex problem. This can be
advantageous in case of constraints involving norms or matrix inequalities [204].
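The SQP iteration (A93)–(A95) can be sketched for the special case without inequality constraints, in which every QP subproblem reduces to one linear system. The toy problem (minimise z1 + z2 subject to z1² + z2² − 2 = 0), the initial guess, the use of the exact Hessian and the full step α = 1 are all assumptions made for illustration; a practical implementation would add a globalisation strategy and handle (A95c).

```python
import numpy as np

def f_grad(z):
    return np.array([1.0, 1.0])   # gradient of f(z) = z1 + z2

def g(z):
    return z @ z - 2.0            # equality constraint g(z) = 0

def g_grad(z):
    return 2.0 * z

z = np.array([-1.2, -0.8])        # initial guess close to the solution
lam = -0.5
for k in range(10):
    H = -lam * 2.0 * np.eye(2)    # exact Hessian of L = f - lam * g
    a = g_grad(z)
    # KKT system of the QP subproblem (A95a)-(A95b):
    #   H dz + grad f - a * lam_qp = 0,   g + a' dz = 0
    K = np.block([[H, -a.reshape(2, 1)], [a.reshape(1, 2), np.zeros((1, 1))]])
    rhs = np.concatenate([-f_grad(z), [-g(z)]])
    sol = np.linalg.solve(K, rhs)
    dz, lam_qp = sol[:2], sol[2]
    z = z + dz                    # Eq. (A93) with alpha = 1
    lam = lam + (lam_qp - lam)    # multiplier update following Eq. (A94)
```

Started this close to the solution, the iteration converges in a handful of steps to z* = (−1, −1) with λ* = −0.5, which is consistent with the warm-starting argument made for SQP in section A2.3.1.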
Solvers for quadratic programs
A few approaches are available for solving QP (sub)problems. Here we limit
the discussion by mentioning two second-order methods (active-set methods
and interior point methods) and first-order methods in general. Second-order
methods employ the Hessian of the QP problem to compute the optimal solution.
Starting from an initial guess of the active set A^(0), primal active-set methods
iteratively solve the linear system (A96). Its primal solution ∆z̃^(k,i) is then
used to refine the guess on the active set using Eq. (A97). The step length
α^(i) ∈ [0, 1] is chosen to be the largest possible value for which all constraints
are satisfied, and can be computed explicitly in a cheap way [204].
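$$H^{(k)} \Delta\tilde{z}^{(k,i)} + \nabla f(z^{(k)}) - \nabla g(z^{(k)})\, \tilde{\lambda}^{(k,i)} - \nabla h_{A^{(i)}}(z^{(k)})\, \tilde{\mu}_{A^{(i)}}^{(k,i)} = 0 \tag{A96a}$$
$$g(z^{(k)}) + \nabla g(z^{(k)})^\top \Delta\tilde{z}^{(k,i)} = 0 \tag{A96b}$$
$$h_{A^{(i)}}(z^{(k)}) + \nabla h_{A^{(i)}}(z^{(k)})^\top \Delta\tilde{z}^{(k,i)} = 0 \tag{A96c}$$
$$\Delta z^{(k,i)} = \Delta z^{(k,i-1)} + \alpha^{(i)} \left(\Delta\tilde{z}^{(k,i)} - \Delta z^{(k,i-1)}\right) \tag{A97}$$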

IP methods use a smooth approximation of Eqs. (A82c–A82e) (cf. section A2.3.1).
Similarly to the NLP case, also for the QP case an iterative
procedure starts with a large value for the barrier parameter τ and progressively
reduces it. The main advantage of IP solvers over active-set solvers is that there
is no need to identify any active set. The main disadvantage is that they cannot
exploit an initial guess to be warm-started, and consequently they cannot reduce
the number of iterations needed to converge.
First-order methods neglect second-order information in favour of cheaper
computations. They typically require more iterations to converge to the solution,
but these can be so cheap that first-order methods become competitive for
specific applications [204].
Hessian approximations and derivative computation
The solution of an NLP using either an SQP or an IP method requires computing
the (exact or approximated) Hessian of the Lagrangian. Since computing the
exact Hessian of the Lagrangian can be computationally demanding, in some
cases using an approximated Hessian H^(k) might yield faster computations. The
method will in general need more iterations to converge, but each iteration will
be computationally cheaper. For an exhaustive discussion regarding the possible
approaches for computing an approximated Hessian we refer to [54, 137, 161, 204]
and references therein.
The optimisation approaches we mentioned in this chapter require also the
evaluation of derivatives. In fact, the computation of a descent direction needs
at least the first-order derivatives of functions f , g and h of a generic NLP
such as problem (A74). There exist several ways for computing the derivatives,
including finite differences, algorithmic differentiation (AD) and the complex-
step derivative. For further details we refer the interested reader to [3].
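Two of the derivative computations mentioned above, forward finite differences and the complex-step derivative, can be compared on a scalar function. The test function and the step sizes are assumptions chosen for illustration; the complex-step formula f'(x) ≈ Im f(x + ih)/h requires the function to be evaluated with complex arithmetic.

```python
import numpy as np

def f(x):
    return np.exp(x) * np.sin(x)

def forward_difference(fun, x, h=1e-8):
    """First-order finite difference; suffers from subtractive cancellation."""
    return (fun(x + h) - fun(x)) / h

def complex_step(fun, x, h=1e-20):
    """Complex-step derivative; no subtraction, so h can be tiny."""
    return np.imag(fun(x + 1j * h)) / h

x0 = 1.0
exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))  # analytical derivative
err_fd = abs(forward_difference(f, x0) - exact)
err_cs = abs(complex_step(f, x0) - exact)
```

In this sketch the finite difference is limited by the trade-off between truncation and round-off error, while the complex-step result is accurate to roughly machine precision.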
A2.3.3 Towards real time applications
The core of this dissertation is the development of the CS-MHE, which is a
methodology for joint state/input estimation based on an MHE scheme and on
CS. These two state of the art techniques involve many numerical aspects that
we outlined in this chapter as well as the solution of the integration problem
(cf. section A1.1). Although we focused on the methodology rather than on its
efficient implementation, in this section we look ahead to a possible online (real
time) implementation of the CS-MHE. The interested reader
will find inspiration and detailed arguments in the literature
that we indicate in the remainder of this section.
The CS-MHE can benefit from all the state of the art methodologies that
are currently being developed by the communities of nonlinear optimisation
and optimal control, since the problem structure corresponds to the classical
MPC/MHE implementation schemes. Online applications of fast dynamic
systems are characterised by tight timing requirements, which not every
available software package can meet. For this reason, tailored algorithms and QP solvers
have been developed specifically for real time optimal control and estimation
[1, 41, 50, 55, 56, 142, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165].
In the context of real time dynamic optimisation, the time needed to solve the
NLP is often not negligible compared to the sampling time. If the computation
of the NLP solution lasts longer than the sampling time, some information
becomes outdated [204]. To avoid this issue, the so-called real time iteration
(RTI) scheme for fast MPC/MHE computes approximate solutions using the
most up-to-date information rather than solving the MPC/MHE NLP problem
to convergence using outdated information [40]. It relies on the fast contraction
properties of Newton-type methods in order to track the exact NLP solution.
In this way, convergence is obtained while the system evolves in time. The RTI
scheme consists of a modified SQP method which computes only one full SQP
step per sampling time and keeps the initial state in MPC as a constraint in
the NLP formulation [204]. The interested reader will find a few concepts for
efficient MPC/MHE formulations in [204], and further related mathematical
and numerical details in [54, 156] and references therein.
A2.4 Norm approximation problems
Compressive sensing (cf. section A1.7.4) requires the solution of an ℓ1-norm
optimisation problem, which is a convex programming problem (cf. section A2.2)
that promotes a sparse solution. We will discuss compressive sensing in
detail in chapter A3, while in this section we highlight the properties of
the ℓ1-norm minimisation in comparison to the more widely employed ℓ2-norm
minimisation. Let us begin by introducing problem (A98), which is
an elementary unconstrained norm approximation problem, with z ∈ R^{n_z},
A ∈ R^{n_m × n_z} and b ∈ R^{n_m}. Notation ‖·‖_2 indicates the ℓ2-norm, or Euclidean
norm. A solution of the norm approximation problem (A98) is sometimes called
an approximate solution of Az ≈ b in the ℓ2-norm [21].
$$\underset{z}{\text{minimise}} \quad \|Az - b\|_2 \tag{A98}$$
Problem (A98) is convex and solvable [21]. Moreover, we assume without loss
of generality that the n_z columns of A are independent, and n_m > n_z, i.e., the
system is overdetermined. Note that if n_m = n_z the optimal solution is simply
z = A^{-1} b, while the case of n_m < n_z falls outside the scope of this discussion.
Problem (A98) is a regression problem, and admits multiple interpretations
such as approximation, estimation, projection, design [21]. The most common
approximation problem is the least-squares approximation problem of Eq. (A100),
which is obtained by squaring a cost function consisting of an ℓ2-norm. The objective
is the sum of squares of the residuals, which are defined in Def. A2.22. The
resulting problem is Eq. (A100a), which we can solve analytically by means of
the Moore-Penrose pseudoinverse, i.e., z = (A^⊤ A)^{-1} A^⊤ b, which is obtained
by setting to zero the gradient of the formulation given in Eq. (A100b). We note that this is exactly
the philosophy upon which LS estimators are based (cf. section A1.3).
Definition A2.22 (Residual). Vector
r = Az − b (A99)
is called the residual for problem (A98).
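$$\underset{z}{\text{minimise}} \; \|Az - b\|_2^2 = \underset{z}{\text{minimise}} \; \sum_{i=1}^{n_m} r_i^2 \tag{A100a}$$
$$= \underset{z}{\text{minimise}} \; z^\top A^\top A z - 2\, b^\top A z + b^\top b \tag{A100b}$$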
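The analytical least-squares solution can be verified numerically. The sketch below draws random data of the same size as the example used later in this section (n_m = 100, n_z = 50); the random seed and variable names are assumptions. It compares the normal-equation formula z = (A^⊤A)^{-1}A^⊤b with NumPy's SVD-based solver, which is generally preferable when A is poorly conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
nm, nz = 100, 50
A = rng.standard_normal((nm, nz))
b = rng.standard_normal(nm)

# Normal equations: set the gradient 2 A'A z - 2 A'b of (A100b) to zero.
z_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver, used here as a reference.
z_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

r = A @ z_lstsq - b            # residual of Def. A2.22
optimality = A.T @ r           # vanishes at the least-squares solution
```

The residual of the optimal solution is orthogonal to the columns of A, which is exactly the first order optimality condition of (A100b).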
When the ℓ1-norm is used, the resulting norm approximation problem assumes
the form of Eq. (A101), whose objective is the sum of the absolute residuals [21].
$$\underset{z}{\text{minimise}} \; \|Az - b\|_1 = \underset{z}{\text{minimise}} \; \sum_{i=1}^{n_m} |r_i| \tag{A101}$$
Problem (A101) can be cast as an LP as shown by Eq. (A102), where s ∈ R^{n_m}
is a slack variable.
$$\underset{z,\,s}{\text{minimise}} \quad \mathbf{1}^\top s \tag{A102a}$$
$$\text{subject to} \quad -s \le Az - b \le s \tag{A102b}$$
A generic norm approximation involves the ℓp-norm, for 1 ≤ p < ∞, resulting
in problem (A103). For convenience we consider the equivalent problem (A104),
whose objective is a separable and symmetric function of the residuals, so that the cost
depends only on the amplitude distribution of the residuals, i.e., the
residuals in sorted order.
$$\underset{z}{\text{minimise}} \quad \left(\sum_{i=1}^{n_m} |r_i|^p\right)^{1/p} \tag{A103}$$
$$\underset{z}{\text{minimise}} \quad \sum_{i=1}^{n_m} |r_i|^p \tag{A104}$$
Eq. (A105) gives a useful generalisation of the ℓp-norm approximation problem,
where φ : R → R is the (residual) penalty function. For this reason,
problem (A105) is referred to as the penalty function approximation problem,
and it is assumed to be convex. A solution vector z leads to an approximation
Az of b, which is associated to a residual vector r. The total penalty (or cost) is
the sum of the penalties for each residual φ(r_i), as shown in Eq. (A105a) [21].
$$\underset{z}{\text{minimise}} \quad \sum_{i=1}^{n_m} \phi(r_i) \tag{A105a}$$
$$\text{subject to} \quad r = Az - b \tag{A105b}$$
It is crucial to note that the shape of the penalty function influences the solution
of problem (A105), i.e., φ(u) is a measure of dislike of a residual of value u.
We can stress this aspect by comparing the ℓ1-norm and ℓ2-norm penalty function
approximation problems [21]. The penalty functions associated with the ℓ1-norm
and the ℓ2-norm are φ1(u) = |u| and φ2(u) = u², respectively, where the subscript
refers to the norm type. For |u| = 1, the two penalty functions assign the same
penalty. For small u we have φ1(u) ≫ φ2(u), i.e., the ℓ1-norm approximation puts
larger emphasis on small residuals than the ℓ2-norm approximation. On the other
hand, large u result in φ2(u) ≫ φ1(u), i.e., the ℓ1-norm approximation puts less
weight on large residuals than the ℓ2-norm approximation. This difference in
relative weightings for small and large residuals is reflected in the solutions of
the associated approximation problems. The amplitude distribution of the optimal
residual for the ℓ1-norm approximation problem will tend to have more zero and
very small residuals, compared to the ℓ2-norm approximation solution. In contrast,
the ℓ2-norm solution will tend to have relatively fewer large residuals [21]. This
fact becomes clear in Fig. A2.1, for an example involving A ∈ R^{100×50} and
b ∈ R^{100} chosen from a normal distribution.

Figure A2.1: Histogram of residual amplitudes for ℓ1-norm and ℓ2-norm penalty
functions, with the (scaled) penalty functions also shown for reference. For this
example n_m = 100 and n_z = 50.
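The relative weighting of small and large residuals can be checked numerically.
The following minimal sketch evaluates φ1(u) = |u| and φ2(u) = u² at a few residual
values; the function names `phi1` and `phi2` and the sample values are illustrative
choices and are not taken from [21].

```python
def phi1(u):
    """l1-norm penalty: phi1(u) = |u|."""
    return abs(u)


def phi2(u):
    """l2-norm penalty: phi2(u) = u**2."""
    return u * u


# Illustrative residual values: small, unit and large amplitude.
for u in (0.01, 1.0, 10.0):
    print(f"u = {u:5.2f}   phi1 = {phi1(u):8.4f}   phi2 = {phi2(u):8.4f}")
```

For the small residual the ℓ1 penalty dominates, for the large residual the ℓ2
penalty dominates, and both coincide at |u| = 1.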
From Fig. A2.1 we notice that the ℓ1-norm approximation generates many very
small (or even exactly zero) optimal residuals. This means that in ℓ1-norm
approximation we typically find that many of the equations are satisfied exactly
[21]. We refer to this phenomenon by defining the concept of sparsity, from
which we infer that the ℓ1-norm promotes sparsity.
Definition A2.23 (Sparsity). The sparsity S of a vector is defined as the
number of its nonzero elements.
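As a small illustration of Def. A2.23, the sketch below counts the nonzero entries
of a vector. The optional tolerance `tol` is an assumption added here for numerical
work, where entries are rarely exactly zero; with `tol = 0` the function reduces to
the definition.

```python
def sparsity(vector, tol=0.0):
    """Number of entries whose magnitude exceeds tol (Def. A2.23 for tol = 0)."""
    return sum(1 for value in vector if abs(value) > tol)


alpha = [0.0, 3.2, 0.0, 0.0, -1.5, 1e-12]
print(sparsity(alpha))            # counts every entry that is not exactly zero
print(sparsity(alpha, tol=1e-9))  # ignores numerically negligible entries
```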
Definition A2.23 states that enhancing the sparsity of a vector corresponds to
decreasing S. In other words, low values of S indicate good sparsity. Further
illustrations of the difference between norms (especially the ℓ2-norm and the
ℓ1-norm) can be found in compressive sensing publications such as [10, 27, 150,
178, 205, 206]. Up to this point we presented the main features of the ℓ1-norm
starting from the unconstrained problem (A101). Constrained problems lead to
similar considerations, provided that the constraints are convex such that the
problems are also convex. More details can be found in [21].
For the purposes of this dissertation, a further interesting case concerns
bi-criterion formulations that involve an ℓ1-norm term. Such problems belong to
the family of multi-objective optimisation problems and are widely employed
in denoising and regularisation schemes. An example is the regularisation of
a Euclidean norm with an ℓ1-norm, as shown in Eq. (A106). By varying the
parameter λ we can sweep out the optimal trade-off curve between ‖Az − b‖₂
and ‖z‖₁, which serves as an approximation of the optimal trade-off curve
between ‖Az − b‖₂ and the sparsity of z. Problem (A106) can be recast and
solved as an SOCP (cf. section A2.2) [21].
$$
\underset{z}{\text{minimise}} \quad \|Az - b\|_2 + \lambda \|z\|_1 \tag{A106}
$$
The last relevant problem for the purposes of this thesis is problem (A107),
which differs from (A106) since the Euclidean norm is squared. We will show
in chapter B1 that in the simplest case it leads to a QP (cf. section A2.2).
$$
\underset{z}{\text{minimise}} \quad \|Az - b\|_2^2 + \lambda \|z\|_1 \tag{A107}
$$
A least-squares approximation that includes an ℓ1-norm regularisation such as
Eq. (A107) is also known under the name LASSO (least absolute shrinkage and
selection operator) [21, 189], or basis pursuit (BP, cf. section A3.2) [7], and the
idea behind it was already suggested in [175]. A difficulty with LASSO is that the
objective function is not differentiable, and hence optimisation techniques such
as QP or, more generally, NLP are required. There exist a few approaches to solve
the LASSO problem. For example, reference [96] proposes a gradient descent
algorithm. As far as this dissertation is concerned, we will recast the CS-MHE
into a QP or an SOCP, since computational speed and real-time applications do
not constitute our primary interest. However, we refer to [141] for a performance
comparison of different solution methods.
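To make the role of the non-differentiable ℓ1-norm term concrete, the following
sketch solves problem (A107) with a proximal-gradient iteration (often called ISTA).
This is a generic textbook scheme chosen for illustration; it is not claimed to be
the algorithm of [96] nor the QP/SOCP recast used in this dissertation. The function
names and the test data are assumptions.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def lasso_objective(A, b, z, lam):
    """Cost of problem (A107): ||A z - b||_2^2 + lam * ||z||_1."""
    return np.sum((A @ z - b) ** 2) + lam * np.sum(np.abs(z))


def ista(A, b, lam, n_iter=500):
    """Proximal-gradient iterations for problem (A107), started from z = 0."""
    lipschitz = 2.0 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient = 2.0 * A.T @ (A @ z - b)        # gradient of the quadratic term
        z = soft_threshold(z - gradient / lipschitz, lam / lipschitz)
    return z


# With A = I the minimiser is known in closed form: soft_threshold(b, lam / 2).
b = np.array([3.0, -0.2, 1.0])
z_hat = ista(np.eye(3), b, lam=1.0)
print(z_hat)
```

In the identity case the small entry of b is set exactly to zero, which is the
shrinkage-and-selection behaviour that gives LASSO its name.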
A2.5 Complex optimisation variables
In chapter B2 we will face the situation in which complex optimisation variables
allow us to project a signal onto a set of complex basis functions. In this section
we show how to recast the resulting (complex) optimisation problem into a
standard SOCP [101, 120]. Let us consider problem (A107) in the case of complex
variables, i.e., z ∈ C^{n_z}, A ∈ C^{n_m×n_z}, b ∈ C^{n_m}. Eq. (A108) gives
explicitly the ℓ1-norm term of the optimisation variable.
$$
\|z\|_1 = \sum_{i=1}^{n_z} \sqrt{\Re(z_i)^2 + \Im(z_i)^2} \tag{A108}
$$

Problem (A107) can be reformulated as the convex problem (A109), where the
new variables are defined in Eq. (A110), and s ∈ R^{n_z} is a slack variable.
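$$
\begin{aligned}
\underset{\tilde{z},\,s}{\text{minimise}} \quad & \|\tilde{A}\tilde{z} - \tilde{b}\|_2^2 + \lambda \mathbf{1}^\top s && \text{(A109a)} \\
\text{subject to} \quad & \sqrt{\Re(z_i)^2 + \Im(z_i)^2} \le s_i, \quad i = 1, \dots, n_z && \text{(A109b)}
\end{aligned}
$$

$$
\tilde{z} = \begin{bmatrix} \Re(z) \\ \Im(z) \end{bmatrix} \in \mathbb{R}^{2 n_z} \tag{A110a}
$$

$$
\tilde{A} = \begin{bmatrix} \Re(A) & -\Im(A) \\ \Im(A) & \Re(A) \end{bmatrix} \in \mathbb{R}^{2 n_m \times 2 n_z} \tag{A110b}
$$

$$
\tilde{b} = \begin{bmatrix} \Re(b) \\ \Im(b) \end{bmatrix} \in \mathbb{R}^{2 n_m} \tag{A110c}
$$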
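The equivalence between the complex problem and its real-valued counterpart can be
verified numerically. The sketch below builds Ã and b̃ according to Eq. (A110) for
randomly generated data and checks that the residual norm and the ℓ1-norm term of
Eq. (A108) are unchanged. The helper name `stack_complex_problem`, the dimensions
and the random seed are assumptions made for this illustration.

```python
import numpy as np


def stack_complex_problem(A, b):
    """Real-valued A_tilde and b_tilde of Eq. (A110) from complex A and b."""
    A_tilde = np.block([[A.real, -A.imag],
                        [A.imag, A.real]])
    b_tilde = np.concatenate([b.real, b.imag])
    return A_tilde, b_tilde


rng = np.random.default_rng(0)          # arbitrary seed
n_m, n_z = 4, 3
A = rng.standard_normal((n_m, n_z)) + 1j * rng.standard_normal((n_m, n_z))
b = rng.standard_normal(n_m) + 1j * rng.standard_normal(n_m)
z = rng.standard_normal(n_z) + 1j * rng.standard_normal(n_z)

A_tilde, b_tilde = stack_complex_problem(A, b)
z_tilde = np.concatenate([z.real, z.imag])          # Eq. (A110a)

# Both formulations give the same residual norm and the same l1-norm term.
residual_complex = np.linalg.norm(A @ z - b) ** 2
residual_real = np.linalg.norm(A_tilde @ z_tilde - b_tilde) ** 2
l1_complex = np.sum(np.abs(z))
l1_real = np.sum(np.sqrt(z_tilde[:n_z] ** 2 + z_tilde[n_z:] ** 2))
print(np.isclose(residual_complex, residual_real), np.isclose(l1_complex, l1_real))
```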
A2.6 Covariance matrix of constrained optimisation problems
When dealing with estimation problems, it is important to assign a confidence
level to the estimates. This can be done by computing a covariance matrix of the
optimisation problem. References [18, 19] explain how to derive the covariance
matrix of constrained optimisation problems. This approach is implemented in
solvers such as the open-source qpOASES [50]. In chapter A1 we introduced the
concept of covariance of a (Gaussian) random variable, stating that its shape is
preserved if the process undergoes linear transformations (cf. section A1.2). If
this is not the case, a more formally correct term is weighting number.
Throughout this thesis we will talk about covariances, but in general those are
in fact weighting numbers whenever nonlinear transformations are present.
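Reference [19] develops the formula of the covariance matrix starting from
problem (A111), where z ∈ R^{n_z} is the optimisation variable, f : R^{n_z} → R^{n_m}
represents the cost function and g : R^{n_z} → R^{n_eq} implements n_eq (equality)
constraints.

$$
\begin{aligned}
\underset{z}{\text{minimise}} \quad & \tfrac{1}{2} \|f(z)\|_2^2 && \text{(A111a)} \\
\text{subject to} \quad & g(z) = 0 && \text{(A111b)}
\end{aligned}
$$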
It is worth underlining that g(z) includes all constraints, i.e., all the active
equality and inequality constraints must be taken into account (cf. Defs. A2.9–
A2.10). Moreover, the equality constraints are assumed to hold exactly.
f(z) and g(z) are generic nonlinear functions, which we can linearise through a
truncated Taylor series at the operating point z̄ (cf. section A1.1.2), resulting
in problem (A112). J_f(z) and J_g(z) are the Jacobians with respect to z of
f(z) and g(z), respectively. Furthermore, Eq. (A113) shows the expansion of
the cost function, which is that of a QP with a quadratic term governed by the
Hessian H(z) = J_f(z)ᵀ J_f(z) and a linear term f(z̄)ᵀ J_f(z) ∆z.
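$$
\begin{aligned}
\underset{z}{\text{minimise}} \quad & \tfrac{1}{2} \|f(\bar{z}) + J_f(z)\,\Delta z\|_2^2 && \text{(A112a)} \\
\text{subject to} \quad & g(\bar{z}) + J_g(z)\,\Delta z = 0 && \text{(A112b)}
\end{aligned}
$$

$$
\tfrac{1}{2} f(\bar{z})^\top f(\bar{z}) + f(\bar{z})^\top J_f(z)\,\Delta z + \tfrac{1}{2} \Delta z^\top J_f(z)^\top J_f(z)\,\Delta z \tag{A113}
$$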
Finally, Eq. (A114a) is the formula for the covariance matrix C ∈ R^{n_z×n_z} given
in [19]. C is a symmetric positive semidefinite square matrix. Its diagonal
entries are the variances (or weighting numbers) of the optimisation variables,
while any nonzero off-diagonal element denotes cross-correlation between the
elements of z. By rearranging Eq. (A114a) into Eq. (A114b), we notice that the
Jacobian of the cost function J_f no longer appears explicitly (only the Hessian
H = J_fᵀ J_f remains), limiting the computational effort and avoiding any related
numerical errors.
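$$
\begin{aligned}
C &= \begin{bmatrix} I & 0 \end{bmatrix}
\begin{bmatrix} H & J_g^\top \\ J_g & 0 \end{bmatrix}^{-1}
\begin{bmatrix} J_f^\top & 0 \\ 0 & I \end{bmatrix}
\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} J_f & 0 \\ 0 & I \end{bmatrix}
\begin{bmatrix} H & J_g^\top \\ J_g & 0 \end{bmatrix}^{-1}
\begin{bmatrix} I \\ 0 \end{bmatrix} && \text{(A114a)} \\
&= \begin{bmatrix} I & 0 \end{bmatrix}
\begin{bmatrix} H & J_g^\top \\ J_g & 0 \end{bmatrix}^{-1}
\begin{bmatrix} H & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} H & J_g^\top \\ J_g & 0 \end{bmatrix}^{-1}
\begin{bmatrix} I \\ 0 \end{bmatrix} && \text{(A114b)}
\end{aligned}
$$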
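A minimal numerical sketch of Eq. (A114b) is given below, assuming that the Jacobians
J_f and J_g are already available at the solution. The function name
`constrained_covariance` and the small example matrices are hypothetical and serve
only to illustrate the block structure; this is not the implementation of [19] or of
qpOASES.

```python
import numpy as np


def constrained_covariance(J_f, J_g):
    """Covariance matrix of Eq. (A114b) for given Jacobians J_f and J_g."""
    n_z, n_eq = J_f.shape[1], J_g.shape[0]
    H = J_f.T @ J_f                                      # Hessian H = J_f^T J_f
    kkt = np.block([[H, J_g.T],
                    [J_g, np.zeros((n_eq, n_eq))]])
    middle = np.zeros_like(kkt)
    middle[:n_z, :n_z] = H                               # block matrix [H 0; 0 0]
    select = np.vstack([np.eye(n_z), np.zeros((n_eq, n_z))])   # [I; 0]
    kkt_inv_select = np.linalg.solve(kkt, select)        # K^{-1} [I; 0]
    # K is symmetric, hence [I 0] K^{-1} is the transpose of K^{-1} [I; 0].
    return kkt_inv_select.T @ middle @ kkt_inv_select


J_f = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 3.0],
                [1.0, 1.0, 1.0]])
J_g = np.array([[1.0, 1.0, 0.0]])       # one linearised equality constraint
C = constrained_covariance(J_f, J_g)
print(np.round(C, 4))
print(np.allclose(J_g @ C, 0.0))        # no variance along the constrained direction
```

The second check reflects the assumption that the equality constraints hold exactly:
the covariance vanishes in the directions fixed by J_g.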
We will show in part B that the covariance matrix provides the CS-MHE not
only with the confidence levels for the estimated states and inputs, but also
with the arrival cost, which is crucial to include the past information in the
sliding estimation window (cf. section A1.5).
A2.7 Conclusions
In this chapter we have outlined the basic concepts of optimisation as well as
some numerical methods, focusing on a few specific convex problems which we
will encounter during the development of the CS-MHE in part B. As far as the
CS-MHE is concerned, the main messages of this chapter include the fact that
both QPs and SOCPs are convex problems, for which every local optimum is
also a global optimum, and they are relatively easy to solve. In particular, SQP
offers a convenient framework for applications that involve dynamic systems.
Online (real-time) implementation of the CS-MHE is possible by resorting to
tailored methodologies that have been developed in the context of MPC/MHE.
This research branch is very active and we can expect further improvements in
the near future. Throughout the dissertation we will often recall the content of
this chapter, since it includes concepts such as nonlinear optimisation, ℓ1-norm
minimisation, sparsity, complex variables and the covariance matrix.
Chapter A3
Compressive sensing
In section A1.7.4 we introduced compressive sensing (CS) as a candidate tool
for modelling an input within an estimation problem. In this chapter we provide
more details about CS, highlighting its mathematical formulation, tools and
challenges. Together with the MHE (cf. section A1.5), this chapter forms the
foundation of the CS-MHE, which we will introduce in chapter B1.
We begin by describing the general background of CS in section A3.1, while
in section A3.2 we define CS as an optimisation problem. Furthermore, in
section A3.3 we list some pragmatic tools to increase the rate of success of CS.
Finally, in section A3.4 we summarise the potential benefits that CS can offer
in the framework of estimation problems.
Acknowledgements
This chapter is mainly based on reference [107], which documents some research
carried out at Virtual Vehicle Research Center in Graz (Austria). Matteo
Kirchner is the first author of [107]. A special thanks goes to Eugène Nijman,
who is second author of [107], first spotted the potential of compressive sensing,
and served as daily discussion partner. Furthermore, Matteo Kirchner and
Eugène Nijman outlined the principles of compressive sensing also in [109, 110].
A3.1 Introduction to compressive sensing
Compressive sensing^A10 (CS) is a well-known scheme for data acquisition and
compression in the field of audio and image processing. CS aims to acquire
directly the minimum amount of data needed to fully represent the signal.
This is usually not the case with digital pictures, where cameras acquire a huge
amount of data which is then compressed before being stored.
CS is based on signal sparsity (cf. Def. A2.23), i.e., a signal can be represented
(fully or in an approximate way) by just a few components belonging to a certain
transformed space. This space is referred to as the dictionary, and its components
are the so-called basis functions or atoms [10, 25, 73]. Moreover, the sensing
scheme should have a dense representation in the dictionary [25]. This is often
achieved by a random sampling scheme (in space for images, in time for time
signals). Among others, we cite references [8, 10, 17, 25, 28, 30, 31, 73, 101,
146, 150].
Consequently, signals are compressible (i.e., they are well approximated by
sparse representations) when they have a sparse representation in some domain
[10, 25, 73]. The challenge is then to represent a signal in that specific domain
and keep only the relevant nonzero elements (the sparser the solution, the
better the compression). We refer to [73] for a simple overview of compressive
sensing, while references [25, 26] provide the mathematical guarantees to be
taken into account when dealing with CS.
Eq. (A115a) shows a sensing process. Vector u ∈ R^{n_u} is an unknown signal
to be measured^A11, y ∈ R^{n_y} is a set of measurements and Φ ∈ R^{n_y×n_u} is
the sensing matrix (or measurement matrix), which implements the operations
related to signal acquisition. In this way, the measurements can be seen as
linear combinations of the signal samples. Furthermore, Eq. (A115b) shows how u
is projected onto the dictionary Ψ ∈ R^{n_u×n_α}, such that α ∈ R^{n_α} is a
sparse vector. Finally, Θ ∈ R^{n_y×n_α} in Eq. (A115c) brings together the sensing
matrix and the dictionary in the so-called global sensing basis.
y = Φu (A115a)
u = Ψα (A115b)
y = ΦΨα = Θα (A115c)
^A10 In the literature compressive sensing is also referred to as compressive
sampling, compressed sampling or compressed sensing.
^A11 We chose the variable u since we will exploit CS for input estimation.
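The sensing process of Eq. (A115) can be assembled in a few lines. The sketch below
is an illustration under assumed choices: a cosine dictionary for Ψ, a random
selection of time samples for Φ, and arbitrary dimensions, seed and coefficients.
None of these values come from the references.

```python
import numpy as np

rng = np.random.default_rng(1)              # arbitrary seed
n_u, n_alpha, n_y = 64, 64, 16

# Dictionary Psi: cosine atoms sampled on n_u points (an illustrative choice).
t = np.arange(n_u)
Psi = np.cos(np.pi * np.outer(t + 0.5, np.arange(n_alpha)) / n_u)

# Sparse representation alpha with three active atoms.
alpha = np.zeros(n_alpha)
alpha[[3, 10, 25]] = [1.0, -0.5, 2.0]
u = Psi @ alpha                             # Eq. (A115b)

# Sensing matrix Phi: random selection of n_y samples of u.
rows = np.sort(rng.choice(n_u, size=n_y, replace=False))
Phi = np.eye(n_u)[rows]
y = Phi @ u                                 # Eq. (A115a)

Theta = Phi @ Psi                           # global sensing basis, Eq. (A115c)
print(Theta.shape, np.allclose(y, Theta @ alpha))
```

With n_y = 16 measurements and n_α = 64 atoms the system y = Θα is underdetermined,
which is the situation discussed next.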
The system is assumed to be underdetermined (typically n_y ≪ n_u ≤ n_α)
and has an infinite number of solutions. This happens for two reasons: first
because the number of samples n_y is kept low, and secondly because the
dictionary may be overcomplete [36], meaning that the signal projection of
Eq. (A115b) is not unique, i.e., one basis function can be represented by
other basis functions. Mathematically, this corresponds to a non-orthogonal
dictionary. Such a characteristic can result either from the fact that a dictionary
has been designed that way, or because some orthogonal dictionaries (complete
dictionaries) are merged together [36]. There are many existing candidates
for the composition of a dictionary for compressive sensing, such as Fourier
basis functions, wavelets and chirplets [36]. The crucial point is that the
dictionary has to be able to sparsify the signal, i.e., only a few atoms should
(fully or in an approximate way) describe the signal. Therefore, α should be sparse.
The reason why signal sparsity is a crucial aspect in CS will become clear in
section A3.3.
A3.2 Methods for solving the compressive sensing problem
Compressive sensing can solve Eq. (A115c) provided that y is sufficiently long
and α is sufficiently sparse [25]. Among all possible solutions of Eq. (A115c), CS
is interested in finding the sparsest solution, since the number of measurements
n_y needed to capture a sparse signal α scales with its sparsity (cf. Eq. (A120)).
Consequently, we need to choose a solution method that promotes a sparse solution
when inverting Eq. (A115c) in order to determine α. Recall that the Moore-
Penrose pseudo-inverse (cf. sections A1.3 and A2.4) does not provide a sparse
solution to an arbitrary underdetermined system such as Eq. (A115c) [10].
The literature offers several algorithms for obtaining a sparse reconstruction [191,
205]. Throughout this dissertation we focus on convex optimisation via ℓ1-norm
minimisation (cf. section A2.4), but this is not the only way to generate sparse
solutions. Within the CS community, a possible alternative approach involves
greedy algorithms [190]. These are approaches that aim at solving NP-hard
problems such as problem (A116) in a heuristic fashion by choosing certain
paths within a combinatorial problem. This often results in short computation
times, at the high risk of ending up in a local optimum. Reference [25] states
that when α is sufficiently sparse, the recovery via ℓ1-norm minimisation is
provably exact. Consequently, greedy algorithms may be preferred over convex
optimisation if a certain sparsity level cannot be guaranteed. The price to
pay is that greedy algorithms very rarely converge to a global
minimum. We can obtain the sparsest solution of Eq. (A115c) by minimising
the ℓ0-norm of α [24], as indicated by problem (A116).
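$$
\begin{aligned}
\underset{\alpha}{\text{minimise}} \quad & \|\alpha\|_0 && \text{(A116a)} \\
\text{subject to} \quad & \Theta\alpha = y && \text{(A116b)}
\end{aligned}
$$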
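The combinatorial nature of problem (A116) becomes apparent when it is solved by
exhaustive search. The sketch below tests every support of increasing size until
Θα = y can be satisfied; it is only meant for very small examples, since the number
of supports grows combinatorially with n_α. The function name `l0_brute_force`, the
tolerance and the example data are assumptions.

```python
from itertools import combinations

import numpy as np


def l0_brute_force(Theta, y, tol=1e-9):
    """Sparsest alpha with Theta @ alpha = y, found by testing every support."""
    n_alpha = Theta.shape[1]
    if np.linalg.norm(y) <= tol:
        return np.zeros(n_alpha)
    for s in range(1, n_alpha + 1):                  # increasing sparsity level
        for support in combinations(range(n_alpha), s):
            cols = list(support)
            coeffs = np.linalg.lstsq(Theta[:, cols], y, rcond=None)[0]
            if np.linalg.norm(Theta[:, cols] @ coeffs - y) <= tol:
                alpha = np.zeros(n_alpha)
                alpha[cols] = coeffs
                return alpha
    return None                                      # no exact solution exists


Theta = np.array([[1.0, 0.0, 1.0, 2.0],
                  [0.0, 1.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 3.0]])
y = Theta @ np.array([0.0, 0.0, 0.0, 2.0])           # 1-sparse ground truth
print(l0_brute_force(Theta, y))
```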
There are no efficient algorithms to solve such an NP-hard problem, due to the
non-convexity of the ℓ0-norm optimisation [23, 24]. We can overcome this by
“relaxing” the ℓ0-norm problem to an ℓ1-norm problem [42, 44], for the solution of
which some methods are available. Examples are the method of frames (MOF),
matching pursuit (MP), orthogonal matching pursuit (OMP), best orthogonal
basis (BOB) and basis pursuit (BP) [36]. Among these methods, basis pursuit has
been developed extensively, and efficient algorithms are now available, especially
thanks to IP methods (cf. section A2.3.1) [36]. BP is capable of finding
the sparsest solution within a dictionary composed of non-orthogonal basis
functions [35, 36], while the other methods need the atoms to be orthogonal.
This becomes crucial when implementing overcomplete dictionaries with ad hoc
basis functions. Moreover, BP is based on global optimisation, offers better
sparsity and stable superresolution, and can be used with noisy data [36]. For
those reasons, we decided to employ convex optimisation rather than greedy
algorithms.
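Under some conditions, which we will discuss in section A3.3, both the ℓ0-norm
and ℓ1-norm problems are proven to give the same and unique result. The
relaxation from ℓ0-norm to ℓ1-norm of problem (A116) leads to problem (A117),
which we solve through BP [36].

$$
\begin{aligned}
\underset{\alpha}{\text{minimise}} \quad & \|\alpha\|_1 && \text{(A117a)} \\
\text{subject to} \quad & \Theta\alpha = y && \text{(A117b)}
\end{aligned}
$$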
In case of solution approximation, regularisation and noise filtering, a slightly
different approach is basis pursuit denoising (BPDN) [36]. Its formulation is
shown in Eq. (A118) [25], where ε is a threshold level for denoising. Moreover,
Eq. (A115b) becomes u ≈ Ψα. Problem (A118) is often called LASSO (cf.
section A2.4) [25, 189].

$$
\begin{aligned}
\underset{\alpha}{\text{minimise}} \quad & \|\alpha\|_1 && \text{(A118a)} \\
\text{subject to} \quad & \|y - \Theta\alpha\|_2 \le \varepsilon && \text{(A118b)}
\end{aligned}
$$

In the next section we discuss the feasibility of CS.
A3.3 Feasibility considerations for compressive sensing
Starting from the first papers regarding the modern theory of CS [23, 27, 42],
the literature offers a considerable number of publications on this topic. Some
extend the already mentioned references, while others apply the theory to several
scientific areas, among which image compression may be the most prominent
field of application. A list of publications regarding the theory of compressive
sensing is given in [187], highlighting the main outcomes of each paper and
indicating whether a result has been superseded by a later publication. In
particular, references [25, 26] include mathematical conditions that replace many
previous ones.
In order to get the correct result of problem (A116) through the solution of
problem (A117), having few measurement points and few nonzero basis functions,
the matrices of the underdetermined system have to satisfy a condition known
as the restricted isometry property (RIP), which was proposed in [24]. The RIP
characterises matrices which are nearly orthonormal when they operate on
sparse vectors.
A3.3.1 The restricted isometry property
Let us consider the linear system of Eq. (A115c) and the ℓ1-norm problem (A117).
The RIP is a matrix condition which is set by means of restricted isometry
constants, defined as the smallest number δ_S such that Eq. (A119a) (or
alternatively Eq. (A119b), provided that ‖α‖₂² ≠ 0) holds for all S-sparse vectors
(S ≤ K, so that δ_S is defined for every S = 1, 2, ..., K) [26]. A vector is said to
be S-sparse if it has S nonzero entries (cf. Def. A2.23).

$$
(1 - \delta_S)\,\|\alpha\|_2^2 \le \|\Theta\alpha\|_2^2 \le (1 + \delta_S)\,\|\alpha\|_2^2 \tag{A119a}
$$

$$
1 - \delta_S \le \frac{\|\Theta\alpha\|_2^2}{\|\alpha\|_2^2} \le 1 + \delta_S \tag{A119b}
$$
A matrix Θ satisfies a certain RIP if, for any arbitrary vector α having S ≤ K
nonzero entries, the central term of Eq. (A119b) is confined within a certain
region. In other words, there is a certain sparsity K below which the amplification
introduced by the matrix transformation remains bounded. Concerning the
bounds, reference [26] states the following:
• if δ_{2S} < 1, then the ℓ0-norm problem (A116) has a unique S-sparse solution,
i.e., if we can prove that, for a certain sparsity S, the ratio ‖Θα‖₂²/‖α‖₂²
deviates from 1 by less than 1 for any vector α with sparsity up to 2S,
then there is a unique solution with sparsity S;
• if δ_{2S} < √2 − 1, then the solution to the ℓ1-norm problem (A117) is that
of the ℓ0-norm problem (A116), and the convex relaxation is exact, i.e., if
we can prove that, for a certain sparsity S, the ratio ‖Θα‖₂²/‖α‖₂² deviates
from 1 by less than √2 − 1 for any vector α with sparsity up to 2S,
then the solution of the ℓ1-norm problem corresponds to the
sparsest solution that the ℓ0-norm problem would give [107].
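For very small matrices, the restricted isometry constant of Eq. (A119) can be
computed by enumerating all supports, which also shows why the RIP cannot be checked
in practice for realistic sizes. The sketch below is an illustration under assumed
dimensions and an arbitrary random seed; the function name
`restricted_isometry_constant` is hypothetical, and the routine is only meaningful
for S not larger than the number of rows of Θ.

```python
from itertools import combinations

import numpy as np


def restricted_isometry_constant(Theta, S):
    """Smallest delta_S satisfying Eq. (A119a) for all vectors with at most S nonzeros."""
    delta = 0.0
    for support in combinations(range(Theta.shape[1]), S):
        # Squared extreme singular values bound ||Theta a||^2 / ||a||^2 on this support.
        sigma = np.linalg.svd(Theta[:, list(support)], compute_uv=False)
        delta = max(delta, sigma[0] ** 2 - 1.0, 1.0 - sigma[-1] ** 2)
    return delta


rng = np.random.default_rng(0)                       # arbitrary seed
n_y, n_alpha = 10, 20
Theta = rng.standard_normal((n_y, n_alpha)) / np.sqrt(n_y)   # variance 1/n_y

for S in (1, 2, 3):
    print(S, restricted_isometry_constant(Theta, S))
```

The constants can only grow with S, because every support of size S is contained in
a support of size S + 1.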
If measurements are corrupted with noise, Eq. (A115c) becomes y = Θα + v,
where v ∈ R^{n_y} is the noise term. The resulting problem is solved by Eq. (A118),
where ε is an upper bound on the amount of noise by which the measurements
are corrupted. For this case, reference [26] states similar guarantees as for the
noiseless case, provided that δ_{2S} < √2 − 1 and ‖v‖₂ ≤ ε. For further details we
refer to [26]. Condition δ_{2S} < √2 − 1 corresponds to having the RIP satisfied
(i.e., the RIP condition holds) [25].
A3.3.2 Signal sparsity and required number of measurements
One of the most important features of CS is its capability of representing and
reconstructing a signal with fewer data than the amount required by the
Nyquist-Shannon sampling theorem. This crucial aspect is strongly related
to sparsity, as illustrated in Eq. (A120), where n_y and n_α are the numbers
of measurements and basis functions, respectively, S is the sparsity of α (cf.
Def. A2.23) and c_S is a (small) constant [25]. In general, a number of samples
about 3 or 4 times the sparsity level suffices [10, 25]. This number may become
lower in case of specific matrices [25].
ny ≥ cS S log(nα /S) (A120)
Eq. (A120) indicates that a sparse signal (S small) requires a low number of
measurements. It is worth mentioning that if Eq. (A120) holds (i.e., if there
are enough measurements n_y for a certain sparsity S), then with overwhelming
probability a Gaussian random matrix obeys the RIP. Here a Gaussian random matrix
is a matrix whose elements are independent and identically distributed random
variables from a Gaussian pdf with zero mean and variance 1/n_y [25]. Further
types of matrices with a similar behaviour are listed in [25]. To summarise, the
RIP holds with overwhelming probability if the following two conditions are
satisfied:
1. Θ is a Gaussian random matrix;
2. for a given sparsity S (and a small constant c_S), the size of Θ ∈ R^{n_y×n_α}
satisfies Eq. (A120).
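The bound of Eq. (A120) is easily evaluated for a given dictionary size. In the
sketch below, the value c_S = 1 and the use of the natural logarithm are assumptions
made only for illustration (the constant is problem dependent and is not specified
here), and the function name `required_measurements` is hypothetical.

```python
import math


def required_measurements(S, n_alpha, c_S=1.0):
    """Smallest integer n_y satisfying Eq. (A120) for a given constant c_S."""
    return math.ceil(c_S * S * math.log(n_alpha / S))


# Number of measurements for a dictionary of 1000 atoms and increasing sparsity.
for S in (5, 10, 50):
    print(S, required_measurements(S, n_alpha=1000))
```

The required number of measurements grows with S, which is why an a priori estimate
of the sparsity level is needed to size the measurement set.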
This implies an a priori knowledge of the signal, which may be problematic
for the characterisation of an unknown input signal [107]. It is then crucial to
choose a dictionary which sufficiently sparsifies the signal. In order to enhance
sparsity, the literature offers some examples of dictionary learning, i.e., the
basis functions are optimised to sparsify the signal. Some examples are in
[2, 47, 116].
A3.3.3 Enhancing the RIP: sensing matrix and matrix coherence
The RIP condition cannot be verified for arbitrary matrices [10, 25, 30].
Nevertheless, we would like problem (A117) to yield the correct result. One
approach in the field of digital image processing proposes to pre-multiply both
sides of Eq. (A115c) by an additional sensing matrix, i.e., matrix Σ in Eq. (A121)
[10, 150]. Such operation aims at turning the quantity ΣΘ into a Gaussian
random matrix, which satisfies the RIP with high probability [11, 25]. A new
measurement vector y ∗ = Σy results then as a linear combination of the actual
measurements.
Σy = ΣΘα (A121)
Researchers are investigating the choice of Σ in order to increase the rate of
success of CS. A first approach proposes the Gaussian random matrix as a good
candidate for Σ [10]. A second approach regards the mutual coherence of the
dictionary Φ [43, 45], or the coherence between Σ and Φ [25]. In fact, a low
coherence corresponds to a higher probability for CS to work. For example,
references [48, 147] perform a coherence minimisation.
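As an illustration of the coherence concept, the sketch below computes the mutual
coherence of a matrix using its commonly used definition, i.e., the largest absolute
normalised inner product between two distinct columns. The function name
`mutual_coherence` and the two small example matrices are assumptions and do not
reproduce the procedures of [48, 147].

```python
import numpy as np


def mutual_coherence(D):
    """Largest absolute normalised inner product between two distinct columns of D."""
    columns = D / np.linalg.norm(D, axis=0)      # unit-norm atoms
    gram = np.abs(columns.T @ columns)
    np.fill_diagonal(gram, 0.0)                  # ignore self-products
    return gram.max()


identity = np.eye(4)                                         # orthonormal atoms
overcomplete = np.hstack([identity, np.full((4, 1), 0.5)])   # extra atom (1, 1, 1, 1)/2
print(mutual_coherence(identity), mutual_coherence(overcomplete))
```

An orthonormal set of atoms has zero coherence, while adding a non-orthogonal atom
raises it.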
In order to go beyond the Nyquist-Shannon limit, measurements have to carry
the high frequency information. This can be achieved by substituting the classical
regular sampling with a random sampling (cf. section A3.1 and references
therein). It is important to underline that neither a random measurement matrix
nor a low matrix coherence is a sufficient condition to have the RIP satisfied,
i.e., they are just tools to enhance (but not guarantee) the RIP.
In references [107, 109] we presented a situation in which CS fails because of
poor matrices and unknown input sparsity, where we investigated the possibility
to apply CS to nearfield acoustical holography (NAH) [200] in order to decrease
the number of measurements needed in the high frequency range [108]. The
conclusion of our research was that for NAH applications it is not possible to
increase the rate of success of BP just by randomising the microphone positions,
because of the intrinsic structure of the matrices, which could not be modified
just by randomising the sampling scheme.
A3.4 Conclusions
In this chapter we introduced the formulation and challenges of compressive
sensing. We will recall some of these aspects in chapter B1, where we apply
CS to model an input within the CS-MHE. We stated the motivation of such a
choice already in section A1.7 (cf. section A1.7.5), i.e., CS is promising in
the context of estimation problems since it can limit some observability issues
and it can represent high dynamic ranges by choosing an appropriate set of
shape functions. The price to pay when employing CS is the need for some
knowledge about the input signal, as well as a convex optimisation problem
which in general is not guaranteed to yield the correct result. In fact, the RIP
condition cannot be verified for arbitrary matrices, and CS has a probabilistic
nature.
Part B
The compressive sensing–moving horizon estimator (CS-MHE) for joint state/input estimation
Chapter B1
Formulation of the CS-MHE
The development of the compressive sensing–moving horizon estimator (CS-MHE)
forms the heart of this dissertation. This chapter presents the CS-MHE
approach, which we first proposed in [104]. The CS-MHE brings together
several aspects that we introduced in the previous chapters, such as the MHE
(section A1.5), CS (chapter A3), joint estimators (section A1.6), input models
(section A1.7), the covariance matrix of constrained optimisation problems
(section A2.6) and a few aspects of constrained optimisation (section A2.4).
First, section B1.1 outlines the major motivations and benefits of the CS-MHE in
comparison to other state-of-the-art estimation approaches. Then, section B1.2
presents the CS-MHE formulas, and includes a few details concerning the
recursive implementation of the CS-MHE. Next, in section B1.3 we propose a
second formulation of the CS-MHE, which differs from the one of section B1.2
since it keeps the number of constraints low. This requires an additional step
when casting the optimisation problem, but at the same time it may improve the
computational speed. This alternative CS-MHE formulation allows us to introduce
complex input representations such as a Fourier dictionary (section B1.4) and will
become crucial to assess rank and condition number in chapter B2. Afterwards,
in section B1.5 we discuss the choice of the CS-MHE formulation, and finally in
section B1.6 we summarise the main features of the CS-MHE.
Acknowledgements
This chapter is based on references [103, 104], of which Matteo Kirchner is
first author. A special thanks goes to Jan Croes, who is the second author in
[103, 104] and actively contributed to the development of the CS-MHE with
great ideas, bringing in his expertise in the fields of state estimation and input
modelling, and being always an excellent discussion partner. Thanks also to
Francesco Cosco and Wim Desmet, who are both co-authors in [103, 104] and
provided precious pieces of advice during both the research and the paper
revision. Thanks to Goele Pipeleers for the suggestions and hints concerning
the practical implementation of the resulting SOCP. Finally, thanks to Eugène
Nijman who encouraged me to study the matrix implementation of the discrete
Fourier transform.
B1.1 Background and motivations of the CS-MHE
In the previous chapters we addressed the motivations that brought us to
the development of the CS-MHE. In this short introduction we summarise
them before introducing and discussing the CS-MHE. State estimation is a
key aspect in many engineering areas such as control engineering, structural
health monitoring and virtual sensing. In fact, knowing the states of a system
provides a complete representation of the internal condition of the system at
a given time instant, and allows a system to be controlled. Furthermore, the
estimation of inputs and parameters can improve the performance of an estimator
when those variables are unknown or cannot be measured in a convenient way.
Specifically, a joint estimator can capture the cross-coupling between states,
inputs and parameters by means of a single covariance matrix, thus enabling
proper consideration of any interdependency among all estimated quantities.
However, this comes at the price of a higher computational cost and possible
observability issues, which may lead to estimation failure when dealing
with many estimates. A random walk model for the representation of inputs
and parameters can improve a filter provided that observability holds (i.e., the
number of random walks is limited by an observability requirement) and the
estimates do not exhibit high dynamic ranges [104].
In this context, the CS-MHE aims to limit the observability issues related to
the estimation of multiple variables and to allow for the estimation of inputs
characterised by fast dynamics. We employ CS principles to model a sparse
input within an MHE, which offers a framework to exploit input sparsity in
time and space. We achieve this by projecting an input onto a set of basis
functions of which only a few are active. As an example, such a representation is
well suited for the estimation of a force impact entering a mechanical system at
an unknown location. More generally, inputs distributed in time and/or space
can be estimated provided that an appropriate set of basis functions is available
[104].
B1.2 The CS-MHE: a constrained optimisation problem
Problem (B1) shows the CS-MHE [104]. We explained most of the notation
already in Eq. (A66), that refers to the classical MHE (cf. section A1.5.1). The
new parts involve the last two terms of the cost function (B1a), their related
bounds in Eq. (B1g), and the new constraints denoted as Eqs. (B1e–B1f).
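$$
\begin{aligned}
\underset{w_a,\,w_k,\,v_k,\,\nu_{\alpha^*},\,\alpha_k}{\text{minimise}} \quad
& w_a^\top P_a^{-1} w_a + \sum_{k=T-N+1}^{T-1} w_k^\top Q_k^{-1} w_k + \sum_{k=T-N+1}^{T} v_k^\top R_k^{-1} v_k \\
& + \nu_{\alpha^*}^\top P_{\alpha^*}^{-1} \nu_{\alpha^*} + \lambda \sum_{k=T-N+1}^{T-1} \|\alpha_k\|_1 && \text{(B1a)} \\
\text{subject to} \quad
& x_{k+1} = f(x_k, u_k) + w_k && \text{(B1b)} \\
& y_k = g(x_k, u_k) + v_k && \text{(B1c)} \\
& x_{T-N+1} = \bar{x}_{T-N+1} + w_a && \text{(B1d)} \\
& u_k = \psi_k \alpha_k && \text{(B1e)} \\
& \alpha^* = \bar{\alpha}^* + \nu_{\alpha^*} && \text{(B1f)} \\
& x_k \in [x_k^{LB}, x_k^{UB}], \quad w_k \in [w_k^{LB}, w_k^{UB}], \quad v_k \in [v_k^{LB}, v_k^{UB}], \\
& \nu_{\alpha^*} \in [\nu_{\alpha^*}^{LB}, \nu_{\alpha^*}^{UB}], \quad \alpha_k \in [\alpha_k^{LB}, \alpha_k^{UB}] && \text{(B1g)}
\end{aligned}
$$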
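To make the structure of the cost function (B1a) explicit, the following sketch
evaluates it for given values of the optimisation variables. It is not a solver and
not the implementation of [104]: the function name `cs_mhe_cost`, the argument
layout, the use of constant covariance matrices Q and R over the window, and the
numerical values are all simplifying assumptions made for illustration.

```python
import numpy as np


def cs_mhe_cost(w_a, P_a, w, Q, v, R, nu_alpha, P_alpha, alpha, lam):
    """Evaluate the cost (B1a); w, v and alpha are lists with one vector per time step."""
    def weighted(e, cov):
        return float(e @ np.linalg.solve(cov, e))            # e^T cov^{-1} e

    cost = weighted(w_a, P_a)                                 # arrival cost
    cost += sum(weighted(w_k, Q) for w_k in w)                # process noise terms
    cost += sum(weighted(v_k, R) for v_k in v)                # measurement noise terms
    cost += weighted(nu_alpha, P_alpha)                       # prior on detected inputs
    cost += lam * sum(np.sum(np.abs(a_k)) for a_k in alpha)   # CS (l1-norm) term
    return cost


cost = cs_mhe_cost(
    w_a=np.array([1.0, 0.0]), P_a=np.eye(2),
    w=[np.array([0.5, 0.5])], Q=0.25 * np.eye(2),
    v=[np.array([1.0]), np.array([2.0])], R=np.array([[4.0]]),
    nu_alpha=np.array([0.0]), P_alpha=np.array([[1.0]]),
    alpha=[np.array([0.0, -2.0, 0.0])], lam=0.1,
)
print(cost)
```

Only the last term is linear in the magnitude of the variables; all other terms are
quadratic forms weighted by inverse covariance matrices.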
The CS term in Eq. (B1a) consists of the ℓ1-norm of the sparse representation α_k
of the input. It is expressed by Eq. (B1e), where ψ_k is the part of a dictionary Ψ
that refers to time step k, and Ψ was defined in Eq. (A115b). We formulated the
optimisation problem under the assumption that the input is fully unknown. If
this is not the case, the equations can be extended to include any available input
information, without loss of generality. The CS term is the only linear term of
the cost function, while all other components are quadratic. A constant weight
λ balances this term with the rest of the cost function, and plays a crucial role
in the optimisation (cf. section A2.4). In fact, λ scales the contribution of the
sparsity exploitation with regard to the noise terms of model and measurements,
which are represented by the covariance matrices Q_k and R_k, respectively.
Eqs. (B1d) and (B1f) and their related terms in the cost function contribute to
the CS-MHE by including information prior to the current window. Specifically,
Eq. (B1d) refers to the arrival cost (cf. section A1.5), while Eq. (B1f) allows
us to exploit any available knowledge about an input, and we will discuss it in
section B1.2.1. Despite its similarity to a typical random walk equation (cf.
section A1.7.2), it is important to notice that Eq. (B1f) does not refer to the
input estimation. In fact, it propagates the participation factors of an already
detected input to the next iteration, while the estimation is performed by the CS
part. Eq. (B1g) includes two extra bounds on the newly introduced optimisation
variables (ν_{α*} and α_k). In case of real variables, i.e., if the dictionary Ψ is
real, problem (B1) is a QP (cf. section A2.2). Otherwise, we can recast it
as an SOCP (cf. section A2.5). We will discuss the latter case in detail in
section B1.4.
One of the motivations that drove us to develop the CS-MHE concerned
observability. Unfortunately, it is not easy to apply to the constrained
optimisation problem (B1) any of the methods for assessing observability that we
presented in section A1.8, including an evaluation based on rank and condition
number. In order to deal with this crucial aspect, we will introduce an equivalent
unconstrained CS-MHE formulation in section B1.3, which we employ in
chapter B2 for assessing rank and condition number.
B1.2.1 Exploiting the prior information
In section A1.5 we indicated that it is possible to determine both the accuracy
of the estimates and the arrival cost by means of a covariance matrix (cf.
section A2.6). In the framework of the CS-MHE, the covariance matrix is
crucial to transfer the knowledge of an input from the current window to the
next time step. This is done as described in Algorithm 1 and it is implemented
by Eq. (B1f), together with its related term in Eq. (B1a). The following list
illustrates the notation of Algorithm 1 and gives all details about the propagation
of the input information. Moreover, the few symbols of problem (B1) that were
not described yet are explained here. Items are labelled according to the line
numbers they refer to in Algorithm 1 [104].
Algorithm 1: Procedure for updating the sparse representation of an input.
 1 define window $i$, with $k = T-N+1, \ldots, T$;
 2 set optimisation problem $i$;
 3 solve optimisation problem (B1) for window $i$;
 4 compute covariance $i$;
 5 compute $\alpha^*_{i|i}$ and $n_{\alpha^*\,i|i}$;
 6 compute $\alpha^*_{i+1|i}$ and $n_{\alpha^*\,i+1|i}$;
 7 if $n_{\alpha^*\,i+1|i} \geq 1$ then
 8     compare $\alpha^*_{i|i}$ and $\alpha^*_{i|i-1}$;
 9     assign $P_{\alpha^*\,i+1|i}$ as follows:
10     $P_{\alpha^*\,i+1|i} = P_{\alpha^*\,i|i} + Q_{\mathrm{drift}}$ (for matching elements);
11     $P_{\alpha^*\,i+1|i} = Q_{\mathrm{drift}}$ (for new elements);
12 else
13     iteration $i+1$ will not have extra states;
14 end
1 Index i refers to the current optimisation problem.
2–4 These lines refer to the formulation and solution of the optimisation
problem i, as well as the computation of its associated covariance matrix.
Setting the problem includes any available prior information, i.e., the
arrival cost and the knowledge about a possible input. The solution of
the problem returns an estimation of states and inputs. The latter are
represented by the sparse vector $\alpha_{i|i}$, which is the collection of all $\alpha_k$
within the window (cf. footnote B1).
We omitted index $i$ from problem (B1). Similarly to the notation of the
arrival cost in Eqs. (A66d) and (B1d), we marked the prior data with a
bar, such that $\bar{\alpha}^*$ corresponds to $\alpha^*_{i|i-1}$.
5 Variable $\alpha^*_{i|i}$ collects the nonzero elements of $\alpha_{i|i}$, and their number is
denoted as $n_{\alpha^*\,i|i}$. We regard an element of $\alpha_{i|i}$ as nonzero if its
absolute value exceeds a predefined threshold level εα . The purpose of εα
is to filter out the noise components and limit the size of the optimisation
problem. Its side effect is that some energy coming from the input is
discarded, and consequently the input magnitude is underestimated. We
will show a simple way to avoid this energy loss for the application
examples in chapter C1.
$\nu_{\alpha^*} \in \mathbb{R}^{n_{\alpha^*}}$ in Eqs. (B1a) and (B1f) is a noise term related to the input,
and it is assumed to follow a Gaussian distribution $\nu_{\alpha^*} \sim \mathcal{N}(0, P_{\alpha^*})$,
where $P_{\alpha^*}$ carries the weighting numbers of any estimated input (cf. footnote B2).
Footnote B1: Notation $z_{i|j}$ refers to the estimation of variable $z$ at time step $i$ given the information at
time step $j$.
6 $\alpha^*_{i+1|i} \in \mathbb{R}^{n_{\alpha^*\,i+1|i}}$ transfers the current knowledge of the input to the
next window. The input knowledge is shared with the next iteration $i+1$
through weighting numbers. Those can be obtained from the covariance
matrix of the optimisation problem. To achieve this, the set $\alpha^*_{i+1|i}$ is
added to the problem as $n_{\alpha^*\,i+1|i}$ extra states. Note that $\alpha^*_{i|i}$ and $\alpha^*_{i+1|i}$
may differ since the window is sliding in time, and any information at
$T-N+1$ is thus omitted, i.e., time step $T-N+1$ of the current iteration
does not belong to the window of the next iteration.
7 If at least one element is shared with the next time step, the updating
procedure takes place. Otherwise, nothing is transferred to the next
iteration (line 13).
8–11 The comparison between current and previous windows governs the way
the input weighting numbers are updated. In fact, if an element was present
during the previous window, then the current problem and covariance
matrix include its weighting number Pα∗ i|i . This is added to a drift term
Qdrift , resulting in Pα∗ i+1|i (line 10). On the other hand, if an element is
new, only Qdrift is assigned to Pα∗ i+1|i (line 11).
In plain words, whenever the CS-MHE detects an input, the algorithm assumes
that the same input is very likely to be detected again in the
following iteration, until the step at which that input reaches the end of the
window. In other words, the sparsity pattern of the input does not change in
space, while it is shifted in time according to the sliding window. Moreover, a
drift term Qdrift is added to the input magnitude, to relax the constraints of the
next optimisation problem. Note that a small Qdrift implies tighter constraints,
while higher values give more freedom to the solver. The choice of Qdrift is
linked to the knowledge of the system, which is evaluated by the covariances
Qk and Rk [104].
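The bookkeeping of lines 5 to 13 of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the function name `propagate_input_prior`, the dictionary-based storage keyed by (time step, location), and all numerical values are hypothetical, and the weighting numbers `p_post` are assumed to have been extracted beforehand from the covariance matrix of the solved problem.

```python
def propagate_input_prior(alpha, p_post, prev_support, window_start,
                          eps_alpha, q_drift):
    """Sketch of lines 5-13 of Algorithm 1 (hypothetical data layout).

    alpha        : {(time step, location): estimated coefficient} for window i
    p_post       : {(time step, location): weighting number from the covariance}
    prev_support : keys that were already part of the prior of window i
    window_start : first time step of window i (T-N+1), which leaves the window
    """
    # Line 5: keep only the entries whose magnitude exceeds the threshold.
    active = {key: val for key, val in alpha.items() if abs(val) > eps_alpha}
    # Line 6: the window slides, so entries at the oldest time step drop out.
    carried = {key: val for key, val in active.items() if key[0] > window_start}
    # Lines 7 and 12-13: nothing to transfer means no extra states.
    if not carried:
        return {}, {}
    # Lines 8-11: matching elements inherit their weight plus the drift term,
    # new elements start from the drift term alone.
    weights = {}
    for key in carried:
        if key in prev_support:
            weights[key] = p_post[key] + q_drift
        else:
            weights[key] = q_drift
    return carried, weights


# Hypothetical window covering time steps 10..14 with two input locations.
alpha_est = {(10, 0): 0.8, (12, 1): 1.5, (13, 0): 1e-4, (13, 1): -0.6}
p_post = {(10, 0): 0.01, (12, 1): 0.02, (13, 0): 0.5, (13, 1): 0.05}
carried, weights = propagate_input_prior(
    alpha_est, p_post, prev_support={(12, 1)},
    window_start=10, eps_alpha=1e-2, q_drift=0.1)
```

In this toy run the entry at time step 10 leaves the window, the entry below the threshold is discarded, the element already known from the previous window receives its weighting number plus the drift, and the newly detected element receives the drift only.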
The knowledge of an input is shared with the next iteration in terms of
augmented states. Consequently, the number of variables of the optimisation
problem grows proportionally to nα∗ i+1|i . The size of the problem may be kept
smaller if the threshold for the input detection εα gets higher. There are several
ways to model the drift term to correct the input estimation. The simplest
approach employs a constant value, but a function of the time steps within
the window may be more appropriate [104]. We will show an example for the
numerical test case in chapter C1 (cf. section C1.1).
Footnote B2: In this chapter we adopt notation $x \in \mathbb{R}^{n_x}$ to indicate the size of a (column) vector, in
line with the notation of the classical MHE in problem (A66). However, in case of a complex
dictionary notation $x \in \mathbb{C}^{n_x}$ would be more appropriate. This will become clear in chapter B2,
where we will introduce a complex dictionary (cf. section B1.4). For this chapter, matrix and
vector sizes indicated by $\mathbb{R}$ may also refer to complex values.
Before concluding this section, it is worth mentioning that Eq. (B1f) does
not involve the last time step of the window, since it does not go beyond
k = T −1. In fact, it is based exclusively on the previous window (ᾱ∗ ). This
implies that the CS-MHE cannot estimate a force input acting at k = T unless
additional information enters the estimation problem. This can happen thanks to
acceleration measurements, since the state-space matrix D (direct feedthrough,
cf. section A1.1) is nonzero when accelerometers are employed, while it is zero
in case of displacement transducers. The situation is different if a zero-order
random walk model is introduced to represent an input (cf. section B2.1) [105].
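The role of the direct feedthrough can be illustrated with a single-degree-of-freedom oscillator, which is a textbook example and not the model used in this dissertation. The sketch below assumes the equation of motion m q̈ + c q̇ + k q = u with state x = [q, q̇]; the function name `output_matrices` and the numerical values are hypothetical.

```python
def output_matrices(mass, damping, stiffness, sensor):
    """Return (C, D) of y = C x + D u for the state x = [q, q_dot].

    Assumed equation of motion: mass*q_ddot + damping*q_dot + stiffness*q = u.
    """
    if sensor == "displacement":
        # y = q: the force does not appear in the measurement equation.
        return [1.0, 0.0], 0.0
    if sensor == "acceleration":
        # y = q_ddot = (-stiffness*q - damping*q_dot + u) / mass:
        # the force enters the measurement directly through D = 1/mass.
        return [-stiffness / mass, -damping / mass], 1.0 / mass
    raise ValueError(f"unknown sensor type: {sensor}")


c_disp, d_disp = output_matrices(2.0, 0.4, 50.0, "displacement")
c_acc, d_acc = output_matrices(2.0, 0.4, 50.0, "acceleration")
```

With a displacement transducer the feedthrough term vanishes, so a force acting at the last time step of the window leaves no trace in the measurements of that window, whereas an accelerometer senses it immediately.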
The accuracy of the CS-MHE depends on a few tuning parameters, such as the
window length (N ), the covariances associated with the model and measurement
errors (Q and R, respectively), the covariance related to the arrival cost (Pa ), a
drift term to propagate any input information to the next iteration (Qdrift ), a
threshold for the input estimation (εα ) and the balancing weight for the ℓ1-norm
term (λ) [104]. We will discuss the choice of these parameters in chapter C1,
where we introduce a numerical test case as well as an experimental validation.
B1.3 The CS-MHE: limiting the amount of constraints
In the previous section we introduced the CS-MHE, which we derived
starting from the MHE formulation that is simplest to implement in practice,
i.e., the process model and the measurement equations take part
in the optimisation problem in the form of constraints, leaving only their related noise
terms in the cost function (cf. Eqs. (A61–A62) in section A1.5). Consequently,
the filter designer can add the state-space equations to the constraints without
manipulating them. We will employ this formulation for the estimation of force
impacts in chapter C1. Nevertheless, this relatively easy implementation strategy
comes with a few drawbacks. First, the computational effort required may be
high due to the presence of many constraints. Furthermore, problem (B1)
does not provide a single matrix that can be used to determine rank and
condition number of the system (cf. section A1.8). Finally, the complexity of
the formulation does not offer an easy way to investigate input projections that
involve a complex dictionary (such as a Fourier series). Consequently, in this
section we propose a second formulation for the CS-MHE that goes back to the
idea of the MHE which we first presented in Eq. (A61).
Let us consider the CS-MHE described by problem (B1) and focus on the
constraint equations labelled as Eqs. (B1b–B1f). The hypothesis of additive
noise for discrete-time systems (cf. section A1.2, page 23) allows us to write
down explicitly the noise terms as shown in Eqs. (B2) and substitute them in
the cost function. This results in the optimisation problem (B3). Note that we
have first inserted Eq. (B1e) into Eqs. (B2a–B2b). The notation is the same as in
sections A1.5 and B1.2 (cf. Eqs. (A66) and (B1)). Problem (B3) represents the
starting point for introducing the complex dictionaries in section B1.4 and for
discussing rank and condition number in chapter B2.
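\[ w_k = x_{k+1} - f(x_k, u_k) \tag{B2a} \]
\[ v_k = y_k - h(x_k, u_k) \tag{B2b} \]
\[ w_a = x_{T-N+1} - \bar{x}_{T-N+1} \tag{B2c} \]
\[ \nu_{\alpha^*} = \alpha^* - \bar{\alpha}^* \tag{B2d} \]

\[
\begin{aligned}
\underset{x_k,\,\alpha_k}{\text{minimise}} \quad
& (x_{T-N+1} - \bar{x}_{T-N+1})^\top P_a^{-1} (x_{T-N+1} - \bar{x}_{T-N+1}) \\
& + \sum_{k=T-N+1}^{T-1} \big(x_{k+1} - f(x_k, \psi_k \alpha_k)\big)^\top Q_k^{-1} \big(x_{k+1} - f(x_k, \psi_k \alpha_k)\big) \\
& + \sum_{k=T-N+1}^{T} \big(y_k - h(x_k, \psi_k \alpha_k)\big)^\top R_k^{-1} \big(y_k - h(x_k, \psi_k \alpha_k)\big) \\
& + (\alpha^* - \bar{\alpha}^*)^\top P_{\alpha^*}^{-1} (\alpha^* - \bar{\alpha}^*) + \lambda \sum_{k=T-N+1}^{T-1} \|\alpha_k\|_1
\end{aligned}
\tag{B3a}
\]
\[ \text{subject to} \quad x_k \in \left[x_k^{LB}, x_k^{UB}\right], \quad \alpha_k \in \left[\alpha_k^{LB}, \alpha_k^{UB}\right] \tag{B3b} \]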
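To make the structure of the cost function (B3a) concrete, the following sketch evaluates it for a scalar LTI system with a Dirac delta dictionary (ψ_k = 1), so that every transpose and inverse reduces to a scalar operation. It is a minimal illustration under these assumptions; the function name `cs_mhe_cost`, its arguments and the numbers are hypothetical, and bounds (B3b) are not considered.

```python
def cs_mhe_cost(x, alpha, y, x_bar, a, b, c, d, p_a, q, r, lam,
                alpha_prior=None, p_alpha=1.0):
    """Evaluate a scalar version of the cost (B3a) over one window.

    x, alpha, y : lists of length N (states, input coefficients, measurements)
    alpha_prior : optional {index: prior value} for already detected inputs
    """
    n = len(x)
    # Arrival cost on the first state of the window.
    cost = (x[0] - x_bar) ** 2 / p_a
    # Model residuals w_k for k = T-N+1, ..., T-1.
    for k in range(n - 1):
        w = x[k + 1] - a * x[k] - b * alpha[k]
        cost += w * w / q
    # Measurement residuals v_k for k = T-N+1, ..., T.
    for k in range(n):
        v = y[k] - c * x[k] - d * alpha[k]
        cost += v * v / r
    # Prior knowledge on inputs detected in the previous window.
    for k, alpha_bar in (alpha_prior or {}).items():
        cost += (alpha[k] - alpha_bar) ** 2 / p_alpha
    # Sparsity-promoting l1 term, which stops at k = T-1.
    cost += lam * sum(abs(alpha[k]) for k in range(n - 1))
    return cost


# Hypothetical data: a single impact of amplitude 3 at the second time step.
x_true = [2.0, 1.0, 3.5]
alpha_true = [0.0, 3.0, 0.0]
y_meas = [2.0, 1.0, 3.5]
params = dict(x_bar=2.0, a=0.5, b=1.0, c=1.0, d=0.0,
              p_a=1.0, q=0.01, r=0.01, lam=0.1)
cost_true = cs_mhe_cost(x_true, alpha_true, y_meas, **params)
cost_no_input = cs_mhe_cost(x_true, [0.0, 0.0, 0.0], y_meas, **params)
```

For the consistent trajectory all quadratic residuals vanish and only the ℓ1 term remains, whereas removing the input leaves a large model residual. This is the trade-off that λ balances against Q and R.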
B1.4 The CS-MHE with complex input representations
In this section we exploit the CS-MHE formulation of Eq. (B3) in order to
implement an input projection on a complex dictionary. We can employ complex
representations to model distributed inputs and periodic inputs. For example,
the Fourier series can be instrumental to estimate forces and torques in rotating
machines, whose characteristics are of quasi-periodic nature. In fact, physical
quantities such as forces and torques are of paramount interest for condition
monitoring and control engineering, but are difficult or even impossible to
measure effectively [103].
This section discusses the CS-MHE for the estimation of a periodic load described
by a complex Fourier series. First, section B1.4.1 introduces a Fourier dictionary,
which we then insert into the CS-MHE formulation in section B1.4.2. We
recast the resulting complex problem into a second order cone program (SOCP,
cf. section A2.2), which can be solved by commercial software. Next, in
section B1.4.3 we modify the procedure to exploit the prior information (cf.
section B1.2.1) in case of a Fourier dictionary. Most of the material of this
section was first reported in reference [103], of which Matteo Kirchner is first
author.
B1.4.1 Fourier dictionary
This section introduces a Fourier dictionary, denoted by Ψ. Under the
assumptions of regular sampling and an equal number of samples and Fourier
components, a Fourier dictionary is a square matrix that applies the inverse
Fourier transform to a signal [107]. Eqs. (B4a–B4b) show this concept, where
vectors u and α are a signal and its Fourier representation, respectively. For
compressive sensing, α should be sparse (cf. chapter A3). Matrices F and
F −1 correspond to the direct and inverse discrete Fourier transform (DFT),
respectively. Each component of F is normalised by the square root of the
number of samples, such that the result of applying the inverse DFT is scaled
correctly.
\[ \alpha = F u \tag{B4a} \]
\[ u = F^{-1} \alpha = \Psi \alpha \tag{B4b} \]
Fourier components are orthogonal, i.e., each component cannot be represented

by a linear combination of other components. Consequently, this set of basis
functions is referred to as a complete dictionary (cf. chapter A3) [36]. Ψ is
a square matrix, made of elements ψk,j at row k and column j, as shown
in Eq. (B5). Such elements are built from the formulas of the inverse DFT.
Subscripts k and j refer to a sampling point and to a Fourier component,
respectively, i.e., each column of Ψ refers to an atom of the dictionary [103]. A
Fourier dictionary may also refer to an irregular and/or undersampled acquisition
scheme. In such a case (which is particularly interesting for CS) the dictionary
becomes overcomplete (cf. section A3.1) and cannot be obtained by matrix
inversion. For further details we refer to references [107, 110]. It is worth
mentioning that we can build matrix Ψ for functions of multiple variables. For
example, an input may depend on time and space, such that a 2D-DFT can
represent the full input space within an estimation window (cf. section C2.2).
\[
\Psi = \begin{bmatrix}
\psi_{T-N+1,\,T-N+1} & \cdots & \psi_{T-N+1,\,T-1} \\
\vdots & \ddots & \vdots \\
\psi_{T-1,\,T-N+1} & \cdots & \psi_{T-1,\,T-1}
\end{bmatrix}
\tag{B5}
\]
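The relation between Eqs. (B4a) and (B4b) can be checked numerically with a small example. The sketch below builds the normalised DFT matrix F and the dictionary Ψ = F⁻¹ for regular sampling with as many samples as Fourier components, using only the Python standard library. The helper names (`dft_matrix`, `fourier_dictionary`, `matvec`) and the test signal are assumptions made for illustration.

```python
import cmath
import math


def dft_matrix(n):
    """Direct DFT matrix F, each entry scaled by 1/sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def fourier_dictionary(n):
    """Dictionary Psi = F^{-1}: row k is a sampling point, column j an atom."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(2j * math.pi * k * j / n) for j in range(n)]
            for k in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product for lists of (complex) numbers."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


n_samples = 8
# One full period of a cosine: dense in time, sparse in the Fourier domain.
u = [math.cos(2 * math.pi * k / n_samples) for k in range(n_samples)]
alpha = matvec(dft_matrix(n_samples), u)               # Eq. (B4a)
u_back = matvec(fourier_dictionary(n_samples), alpha)  # Eq. (B4b)
active = [j for j, coeff in enumerate(alpha) if abs(coeff) > 1e-9]
```

Only two atoms are active for this signal (the positive and negative frequency of the cosine), and applying Ψ to α recovers the original samples up to rounding.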
B1.4.2 CS-MHE formulation with the Fourier dictionary
Eq. (B6) shows the input projection onto dictionary Ψ at time step k.
Consequently, the discrete-time state-space representation of a (linearised)
system with additive noise (cf. Eq. (A32)) becomes as in Eq. (B7). We note
that each time step k involves all atoms of the dictionary [103].
\[ u_k = \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k \tag{B6} \]
\[ x_{k+1} = A_k x_k + B_k \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k + w_k \tag{B7a} \]
\[ y_k = C_k x_k + D_k \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k + v_k \tag{B7b} \]
By extracting the noise terms from Eq. (B7) and inserting them into the CS-
MHE problem (B3), we obtain the formulas for the CS-MHE for a discrete-time
system that implements dictionary Ψ, which we show in Eq. (B8). All notation
should already be clear. In matrix form, problem (B8) corresponds to Eq. (B9).
The new optimisation variable z and consequently all other matrices are given
in Eq. (B10). In appendix 1 we further discuss the matrix implementation of
the CS-MHE, giving the example of an LTI system and a horizon length of
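N = 4 time steps.
\[
\begin{aligned}
\underset{x_k,\,\alpha_k}{\text{minimise}} \quad
& (x_{T-N+1} - \bar{x}_{T-N+1})^\top P_a^{-1} (x_{T-N+1} - \bar{x}_{T-N+1}) \\
& + \sum_{k=T-N+1}^{T-1} \Big(x_{k+1} - A_k x_k - B_k \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k\Big)^\top Q_k^{-1} \Big(x_{k+1} - A_k x_k - B_k \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k\Big) \\
& + \sum_{k=T-N+1}^{T} \Big(y_k - C_k x_k - D_k \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k\Big)^\top R_k^{-1} \Big(y_k - C_k x_k - D_k \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_k\Big) \\
& + (\alpha - \bar{\alpha})^\top P_{\alpha}^{-1} (\alpha - \bar{\alpha}) + \lambda \sum_{k=T-N+1}^{T-1} \|\alpha_k\|_1
\end{aligned}
\tag{B8a}
\]
\[ \text{subject to} \quad x \in \left[x^{LB}, x^{UB}\right], \quad \alpha \in \left[\alpha^{LB}, \alpha^{UB}\right] \tag{B8b} \]

\[ \underset{x,\,\alpha}{\text{minimise}} \quad z^\top H z + q^\top z + b + \lambda \|\alpha\|_1 \tag{B9a} \]

\[ z = \begin{bmatrix} x \\ \alpha \end{bmatrix} \tag{B10a} \]
\[ H = \begin{bmatrix} H_{xx} & H_{x\alpha} \\ H_{\alpha x} & H_{\alpha\alpha} \end{bmatrix} \tag{B10b} \]
\[ q = \begin{bmatrix} q_x \\ q_\alpha \end{bmatrix} \tag{B10c} \]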
Although some solvers may be able to deal with complex variables and with
an ℓ1-norm term, it is more convenient for computational speed to recast
the problem into the SOCP of Eq. (B11). We can obtain this by applying
the formulas that we introduced in section A2.5, defining a new optimisation
variable z̃ in which the real and imaginary parts of α are split as in Eq. (B12).
Moreover, a slack variable s keeps the relationship between Re(α) and Im(α)
through the new constraint Eq. (B11b) [101, 120]. Eq. (B12) gives all vectors
and matrices of problem (B11).
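\[ \underset{\tilde{z},\,s}{\text{minimise}} \quad \tilde{z}^\top \tilde{H} \tilde{z} + \tilde{q}^\top \tilde{z} + \tilde{b} + \lambda \mathbf{1}^\top s \tag{B11a} \]
\[ \text{subject to} \quad \sqrt{\Re(\alpha)^2 + \Im(\alpha)^2} \leq s \tag{B11b} \]
\[ x \in \left[x^{LB}, x^{UB}\right], \quad \alpha \in \left[\alpha^{LB}, \alpha^{UB}\right] \tag{B11c} \]

\[ \tilde{z} = \begin{bmatrix} x \\ \Re(\alpha) \\ \Im(\alpha) \end{bmatrix} \tag{B12a} \]
\[
\tilde{H} = \begin{bmatrix}
H_{xx} & \Re(H_{x\alpha}) & -\Im(H_{x\alpha}) \\
\Re(H_{\alpha x}) & \Re(H_{\alpha\alpha}) & -\Im(H_{\alpha\alpha}) \\
\Im(H_{\alpha x}) & \Im(H_{\alpha\alpha}) & \Re(H_{\alpha\alpha})
\end{bmatrix}
\tag{B12b}
\]
\[ \tilde{q} = \begin{bmatrix} q_x \\ \Re(q_\alpha) \\ \Im(q_\alpha) \end{bmatrix} \tag{B12c} \]
\[ \tilde{b} = \Re(b) + \Im(b) \tag{B12d} \]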
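The block pattern of real and imaginary parts in Eq. (B12b) rests on a standard identity: multiplying a complex matrix by a complex vector is equivalent to multiplying the real block matrix [[Re, −Im], [Im, Re]] by the stacked real and imaginary parts. The sketch below verifies this for a small hypothetical matrix (it only covers the α–α block, not the full H̃), and it also shows that the ℓ1-norm of a complex vector equals the sum of the moduli bounded by the slack variable s in Eq. (B11b). All names and values are assumptions made for illustration.

```python
import math


def real_split_matrix(h):
    """Real 2n x 2n representation [[Re, -Im], [Im, Re]] of a complex matrix."""
    n = len(h)
    top = [[h[i][j].real for j in range(n)] + [-h[i][j].imag for j in range(n)]
           for i in range(n)]
    bottom = [[h[i][j].imag for j in range(n)] + [h[i][j].real for j in range(n)]
              for i in range(n)]
    return top + bottom


def real_split_vector(v):
    """Stack the real parts on top of the imaginary parts."""
    return [c.real for c in v] + [c.imag for c in v]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Hypothetical complex block and coefficient vector.
h_aa = [[1 + 2j, 0.5j], [-1j, 3 + 0j]]
alpha = [1 - 1j, 2 + 0.5j]

lhs = matvec(real_split_matrix(h_aa), real_split_vector(alpha))
rhs = real_split_vector(matvec(h_aa, alpha))

# The l1-norm of a complex vector is the sum of the moduli, i.e. the sum of
# the tightest slack values that satisfy the cone constraint (B11b).
l1_complex = sum(abs(c) for c in alpha)
slack = [math.hypot(c.real, c.imag) for c in alpha]
```

Both sides of the identity agree up to rounding, which is why the recast problem contains real variables only.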
B1.4.3 Exploiting the prior information in case of a Fourier dictionary
In section B1.2.1 we indicated a procedure to carry the prior
information over to the next iteration. In this section we discuss how we adapted
the procedure in case of a complex Fourier dictionary, and we present it in
Algorithm 2. The following list explains Algorithm 2 and gives the details about
the propagation of the input information. Items are labelled according to the
line numbers they refer to in Algorithm 2.
Algorithm 2: Procedure for updating the sparse representation of an input in
case of a Fourier dictionary.
 1 define window $i$, with $k = T-N+1, \ldots, T$;
 2 set optimisation problem $i$ with complex variables (cf. Eqs. (B9–B10));
 3 set optimisation problem $i$ with real variables (cf. Eqs. (B11–B12));
 4 solve optimisation problem (B11) for window $i$;
 5 compute $\alpha^*_{i|i}$ and $n_{\alpha^*\,i|i}$;
 6 compute $\alpha^*_{i+1|i}$ by phase shift. $n_{\alpha^*\,i+1|i} = n_{\alpha^*\,i|i}$;
 7 compute covariance $i$;
 8 if $n_{\alpha^*\,i+1|i} \geq 1$ then
 9     assign $P_{\alpha^*\,i+1|i}$ as follows:
10     $P_{\alpha^*\,i+1|i} = P_{\alpha^*\,i|i} + Q_{\mathrm{drift}}$ (for matching elements);
11     $P_{\alpha^*\,i+1|i} = Q_{\mathrm{drift}}$ (for new elements);
12 else
13     iteration $i+1$ will not take any inputs into account;
14 end
1–5 Analogous to Algorithm 1, now with Eqs. (B9–B12).

6 This step differs significantly from Algorithm 1. In fact, the input of the
next step is obtained by phase shift taking into account the length of the
sampling time. Note that the phase shift occurs in time and not in space.
Furthermore, it does not affect the DC components.
7 Building the covariance matrix of the optimisation problem requires an
extra step in comparison with the case with a real dictionary. In fact,
in problem (B11) the optimisation variables are divided into two vectors,
since the slack variable s is not included in z̃. On the other hand, the
formulation of the covariance matrix requires one single vector, which we
call ζ in Eq. (B13) and which includes the slack variable s (cf. section A2.6).
Consequently, we need to update the Hessian and the Jacobian of the
constraints, the latter being obtained after linearising the constraint
Eq. (B11b). We will give further details after this list.
8–14 Analogous to Algorithm 1.
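The phase shift of line 6 can be illustrated with the shift property of the DFT. The sketch below assumes a signal that is exactly periodic over the window, regular sampling and the normalised DFT of section B1.4.1; under these assumptions, sliding the window by one sample multiplies the j-th coefficient by exp(2πi j/N) and leaves the DC component untouched. The function names and the test signal are hypothetical, and the actual implementation may account for the sampling time differently.

```python
import cmath
import math


def dft(u):
    """Normalised DFT of a list of samples."""
    n = len(u)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(u[m] * cmath.exp(-2j * math.pi * j * m / n)
                        for m in range(n))
            for j in range(n)]


def shift_coefficients(alpha, steps=1):
    """Advance the window by `steps` samples through a phase shift only."""
    n = len(alpha)
    return [coeff * cmath.exp(2j * math.pi * j * steps / n)
            for j, coeff in enumerate(alpha)]


n_samples = 8
# Hypothetical periodic load: offset plus first and second harmonic.
u = [0.7 + math.cos(2 * math.pi * m / n_samples)
     + 0.3 * math.sin(4 * math.pi * m / n_samples)
     for m in range(n_samples)]
u_next = u[1:] + u[:1]  # the same periodic signal seen one sample later

alpha = dft(u)
alpha_next = shift_coefficients(alpha)  # prediction by phase shift
alpha_ref = dft(u_next)                 # reference from the shifted samples
```

The predicted coefficients match the transform of the shifted samples, and the number of active coefficients does not change, in line with line 6 of Algorithm 2.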
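The new variable ζ in Eq. (B13) for the computation of the covariance matrix
of the optimisation problem comes with its associated Hessian $\tilde{H}_\zeta$ in Eq. (B14).
Furthermore, we linearise the cone constraint Eq. (B11b) following the Taylor
linearisation around point $\bar{\alpha}$ (cf. Eq. (A5)), which we obtain from the solution of
problem (B11). Eq. (B15) shows the linearisation, which results in the Jacobian
$\tilde{J}_\zeta$ in Eq. (B16). $\tilde{H}_\zeta$ and $\tilde{J}_\zeta$ correspond to $H$ and $J_g$ in Eq. (A114), respectively,
and allow us to compute the covariance matrix.
\[ \zeta = \begin{bmatrix} x \\ \Re(\alpha) \\ \Im(\alpha) \\ s \end{bmatrix} \tag{B13} \]
\[
\tilde{H}_\zeta = \begin{bmatrix}
H_{xx} & \Re(H_{x\alpha}) & -\Im(H_{x\alpha}) & 0 \\
\Re(H_{\alpha x}) & \Re(H_{\alpha\alpha}) & -\Im(H_{\alpha\alpha}) & 0 \\
\Im(H_{\alpha x}) & \Im(H_{\alpha\alpha}) & \Re(H_{\alpha\alpha}) & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\tag{B14}
\]
\[
\sqrt{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}
- \frac{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}{\sqrt{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}}
+ \frac{\Re(\bar{\alpha})}{\sqrt{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}} \cdot \Re(\alpha)
+ \frac{\Im(\bar{\alpha})}{\sqrt{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}} \cdot \Im(\alpha) \leq s
\tag{B15}
\]
\[
\tilde{J}_\zeta = \begin{bmatrix}
0 & \dfrac{\Re(\bar{\alpha})}{\sqrt{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}} & \dfrac{\Im(\bar{\alpha})}{\sqrt{\Re(\bar{\alpha})^2 + \Im(\bar{\alpha})^2}} & -I
\end{bmatrix}
\tag{B16}
\]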
The system as such is likely not to be observable (cf. section B2.1) since
all possible inputs are included, and this would lead to problems in the
matrix inversion in Eq. (A114). For the CS-MHE described in chapter B1
we circumvented this problem by considering only the active components of the
dictionary, with the consequence of changing the problem size according to the
number of nonzero elements nα∗ . For the case of Fourier components we propose
a different approach that does not require changing the problem size, allowing
us to keep the same matrix structure throughout the estimation. We implemented
this by introducing a regularisation factor on all the zero components of α. We
will show an example in chapter C2 (cf. section C2.1).
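The linearisation of Eq. (B15) and the Jacobian row of Eq. (B16) can be checked numerically for a single complex coefficient. The sketch below is an illustration under that scalar assumption, with hypothetical function names and values. It confirms that the linearised constraint is exact at the linearisation point and never exceeds the true modulus elsewhere, as expected for the tangent plane of a convex function. The formula requires a nonzero linearisation point.

```python
import math


def linearised_modulus(re, im, re_bar, im_bar):
    """Left-hand side of Eq. (B15) for one coefficient, linearised at alpha_bar."""
    radius = math.hypot(re_bar, im_bar)
    if radius == 0.0:
        raise ValueError("the linearisation point must be nonzero")
    return (radius - (re_bar ** 2 + im_bar ** 2) / radius
            + re_bar / radius * re + im_bar / radius * im)


def jacobian_row(re_bar, im_bar):
    """Entries of Eq. (B16) with respect to Re(alpha), Im(alpha) and s."""
    radius = math.hypot(re_bar, im_bar)
    if radius == 0.0:
        raise ValueError("the linearisation point must be nonzero")
    return (re_bar / radius, im_bar / radius, -1.0)


# Hypothetical linearisation point alpha_bar = 3 + 4i (modulus 5).
at_point = linearised_modulus(3.0, 4.0, 3.0, 4.0)
elsewhere = linearised_modulus(1.0, 1.0, 3.0, 4.0)
row = jacobian_row(3.0, 4.0)
```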
B1.5 Discussion about the different CS-MHE formulations
Up to here we described two possible formulations of the CS-MHE, the second
of which allowed us to introduce complex input representations. Moreover, we
presented two algorithms to exploit the prior information in case of time domain
and Fourier dictionaries. We gave a few hints of the application domains of the
different formulations, and in this section we further discuss this aspect.
Figure B1.1: Dirac delta dictionary (left) and magnitude of a complex Dirac
delta (right). Legend: Dirac delta (solid line with dot); signal to be modelled
(dashed line with cross). Panels: "Dictionary of Dirac deltas" (left, horizontal
axis: basis functions) and "Complex Dirac delta" (right, horizontal axis: single
basis function); vertical axis: amplitude.
As an example, let us consider a force impact. Throughout this dissertation we
already mentioned that the CS-MHE is well suited for the estimation of impacts
due to their sparsity in time and space. Formally, we model this idea through a
Dirac delta dictionary, which consists of a Dirac delta for every possible location
and time step within the estimation window [36, 63]. We can visualise this by
looking at the graphs on the left hand side in Fig. B1.1. From top to bottom,
they show a Dirac delta dictionary (made of unit impulses), its adaptation for
the representation of an impulse with amplitude A (green cross) applied at a
modelled location, and finally its adaptation for a location which is not modelled. The
latter case causes a worse sparsity level, and the true location can be determined
by linear interpolation [63] (cf. footnote B3). Let us now think of modelling an impulse in
a different way, i.e., by using a single Dirac delta which is shifted by
a phase relation (e.g., through complex numbers). We depicted this idea in
the right hand side graphs of Fig. B1.1 (where we show only the amplitude).
Such a representation is quite appealing for observability reasons, but it does
not comply with sparsity. In fact, a single basis function cannot form a sparse
signal. For this reason, in the context of the CS-MHE we prefer
to have a dictionary composed of Dirac deltas spanning a series of points in
time and/or space (pre-multiplied by a factor that determines the amplitude),
since such a representation promotes a sparse solution that we want to capture
through the ℓ1-norm minimisation. Consequently, for the estimation of impulses
we recommend a native time domain real dictionary, which allows us to operate
with a quadratic optimisation problem in the form of Eqs. (B1) or (B3), and
Algorithm 1 for the propagation of the prior information.
Footnote B3: The graphs assume an impulse of magnitude 1, which can be scaled arbitrarily without
loss of generality.
For different types of signals with time domain dictionaries, the choice of
formulation and algorithm could be the same as for impacts provided that the
dictionary is able to sparsify the signal. However, things are different in case
of a complex Fourier dictionary. In fact, complex representations require the
formulation given in Eq. (B3) (or Eq. (B8) specifically for a Fourier dictionary),
and the way of propagating the prior information may change, as we pointed
out in Algorithm 2, where a time shift translated into a phase shift according
to the chosen dictionary.
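The requirement that the dictionary sparsifies the signal can be made tangible by counting the active coefficients of two simple signals in a Dirac delta dictionary (where the coefficients are the samples themselves) and in a Fourier dictionary. The sketch below is a toy comparison with hypothetical signals, names and threshold, not one of the test cases of this dissertation.

```python
import cmath
import math


def dft(u):
    """Normalised DFT, i.e. the coefficients in a Fourier dictionary."""
    n = len(u)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(u[m] * cmath.exp(-2j * math.pi * j * m / n)
                        for m in range(n))
            for j in range(n)]


def count_active(coefficients, eps=1e-9):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in coefficients if abs(c) > eps)


n_samples = 8
impulse = [1.0 if m == 3 else 0.0 for m in range(n_samples)]
cosine = [math.cos(2 * math.pi * m / n_samples) for m in range(n_samples)]

# With a Dirac delta dictionary (identity matrix) the coefficients equal the samples.
counts = {
    "impulse": {"dirac": count_active(impulse),
                "fourier": count_active(dft(impulse))},
    "cosine": {"dirac": count_active(cosine),
               "fourier": count_active(dft(cosine))},
}
```

The impulse needs a single Dirac atom but every Fourier atom, while the cosine needs only two Fourier atoms but most of the Dirac atoms. This is why impacts are paired with the time domain dictionary and periodic loads with the Fourier dictionary.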
To summarise, the choice of the CS-MHE formulation follows from whether we
prefer to keep the model in form of constraint equations or we decide to have
the constraints integrated in the cost function. The latter option can lead to
lower computational load, and it is required in case of complex dictionaries.
The philosophy behind the propagation of the prior information is dictionary
dependent, and different dictionaries may require some adaptations of the
proposed algorithms. Among the dictionaries that we do not exploit in this
dissertation, we mention wavelets as a potential tool to model distributed loads
and transient phenomena.
B1.6 Conclusions
In this chapter we introduced the formulation of the CS-MHE. First, we derived

it from an MHE scheme for which the model is implemented as a constraint in
the optimisation problem. Moreover, we presented an algorithm to propagate
any available input information from the previous iteration to the current
iteration [104]. This CS-MHE formulation is on the one hand relatively easy
to use in practice since the state-space models are implemented as constraints,
but on the other hand the problem changes its size according to the number
of estimated inputs. Furthermore, this formulation does not offer a simple
way to implement complex dictionaries, which can be instrumental for the
representation of distributed inputs and periodic inputs, and it does not allow
to assess rank and condition number of the problem as an indication of possible
observability issues.
For these reasons, we introduced a second formulation of the CS-MHE, where
we included the state-space equations in the cost function instead of listing
them as constraint equations. This alternate formulation allowed us to also
implement complex input representations. Specifically, we showed the problem
formulation for a Fourier dictionary and we discussed how to update the prior
information. Moreover, we described how to evaluate the covariance matrix in
case of an SOCP. We will apply this formulation in chapter B2 to discuss rank
and condition number of the CS-MHE.
We will employ both CS-MHE formulations that we presented in this chapter
with numerical and experimental examples in part C of this dissertation. In
particular, in chapter C1 we will show a numerical example as well as an
experiment in which we use the first formulation for the estimation of force
impacts. We will also discuss the choice of a few tuning parameters such as
the balancing weight λ, the model and measurement covariances Qk and Rk
and the drift term to propagate any estimated input Qdrift . Furthermore, in
chapter C2 we will apply the second formulation to a few examples involving a
Fourier dictionary.
The major drawback of the MHE over the EKF is its higher computational effort,
and this disadvantage remains also for the CS-MHE. However, new efficient
algorithms are being developed for the MHE by the communities of nonlinear
optimisation and optimal control, from which the CS-MHE can benefit since
we did not change the structure of the optimisation problem (cf. chapter A2).
Chapter B2
Rank and condition number considerations for the CS-MHE
In this chapter we investigate rank and condition number of the CS-MHE,
beginning from the formulation of Eq. (B3). We compare the CS-MHE with
an MHE scheme with a random walk model for the input representation as
well as an MHE without any input model. This allows us to spot strengths and
weaknesses of the different methodologies for input modelling and to foresee a
link with observability. This chapter reports the research that we presented in
[105].
After introducing the MHE schemes and a few numerical scenarios in sections B2.1
and B2.2, we present the matrix structure for each scenario in section B2.3 and
we compare rank and condition number for an increasing amount of inputs in
section B2.4. Finally, we summarise the main conclusions in section B2.5.
Acknowledgements
This chapter reports the research that we presented in reference [105], of which
Matteo Kirchner is the first author. A big thanks goes to Jan Croes, who is
second author in [105] and contributed to the topic with ideas and served as
discussion partner.
B2.1 Two estimation schemes to compare with the CS-MHE
The discussion about rank and condition number of the CS-MHE consists of
a comparison with an MHE scheme with a random walk model for the input
representation (RW-MHE, cf. section A1.7.2) and an MHE without any input
model (NI-MHE, cf. section A1.7.1), and it is based on the test settings in the
following list [105].
• We consider an MHE in which the estimation of both states and inputs
reaches time step k = T (for the CS-MHE, this means that the summation
in the CS term in Eq. (B3a) goes up to k = T instead of going up to
k = T −1).
• We remove the asterisk from α∗ in Eq. (B3a), meaning that we consider
the full set of possible inputs (and not only the nonzero components
governed by the choice of εα , cf. section B1.2.1).
• We choose a Dirac delta dictionary for the CS-MHE (cf. section B1.5),
such that the input u and its projection α are the same, since a Dirac
delta dictionary is an identity matrix, i.e., uk = Iαk ≡ αk . Related to this
aspect, it is important to mention that the matrices contained in this
section (Eqs. (B20–B21)) are not generic, and hold only in case of a Dirac
delta dictionary. In case of other dictionaries we may obtain more complex
matrices, such as the ones in the example in appendix 1.
• We exclude bounds (B3b) from the set of active constraints, since
mathematically bounds introduce equations with zero covariance.
• We introduce an LTI numerical test case in section B2.2. In fact,
for multistep estimators such as the MHE we can have an indication
concerning observability by assessing rank and condition number of the
discretised system, looking at the problem as an overdetermined weighted
least squares fitting (cf. section A1.8). Accordingly, in the matrices that
we report in this section (Eqs. (B20–B22)) we omit the time step
dependency (subscript k), resulting in the formulation of an LTI system.
However, in case of nonlinear systems the time step dependency can be
reinserted in the matrices following the notation in Eq. (B18) [105].
We showed the matrix formulation of the CS-MHE in Eq. (B9), which we
report for convenience in Eq. (B17). Vector $z \in \mathbb{R}^{n_z}$ collects all variables to
be estimated, i.e., states and inputs, as shown in Eq. (B18). Consequently,
$n_z = n_x + n_m$. Superscripts CS, RW and NI indicate the CS-MHE, the RW-MHE
and the NI-MHE, respectively. The equivalence holds thanks to the hypothesis
of a Dirac delta dictionary as well as the input estimation reaching time step
T. The Hessian H is the only matrix which is relevant for observability related
purposes, since it is used to compute the covariance of the optimisation problem,
which is a measure of observability [79] (cf. section A1.8). In fact, if the system
is not observable the Hessian is not full rank, and the covariance tends to
infinity. A similar discussion applies to the covariance matrix of constrained
optimisation problems [18, 19]. We can further decompose H as in Eq. (B19)
[105], where $A_H$ is given in Eqs. (B20) and (B21) for the CS-MHE and the
RW-MHE, respectively. $A_H^{NI}$ for the NI-MHE is simply $A_H^{RW}$ excluding the last
block (which refers to the RW), and Σ is the diagonal matrix in Eq. (B22).
\[ \underset{x,\,\alpha}{\text{minimise}} \quad z^\top H z + q^\top z + b + \lambda \|\alpha\|_1 \tag{B17a} \]
\[
z^{CS} = \begin{bmatrix}
x_{T-N+1} \\ x_{T-N+2} \\ \vdots \\ x_{T-1} \\ x_T \\
\alpha_{T-N+1} \\ \alpha_{T-N+2} \\ \vdots \\ \alpha_{T-1} \\ \alpha_T
\end{bmatrix}
\equiv z^{RW} \equiv z^{NI} =
\begin{bmatrix}
x_{T-N+1} \\ x_{T-N+2} \\ \vdots \\ x_{T-1} \\ x_T \\
u_{T-N+1} \\ u_{T-N+2} \\ \vdots \\ u_{T-1} \\ u_T
\end{bmatrix}
\tag{B18}
\]
\[ H = A_H^\top \Sigma A_H \tag{B19} \]
Eq. (B22) collects all covariances Pa , Qk , Rk and Pα . It is always full rank
and consequently we focus our interest on rank and condition number of AH (cf. footnote B4).
The system is not observable if AH is not full rank, while the condition number
gives a relative measure of the solution stability. For an absolute evaluation, we
need to take Σ into account as well [105].
Footnote B4: A regularisation may be needed to cope with numerical errors, cf. sections B1.4.3 and
C2.1.
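The rank argument can be tried out on a toy problem before moving to the beam model. The sketch below assembles $A_H$ for a scalar state and a scalar input with a Dirac delta dictionary, following the block rows described for Eqs. (B20) and (B21): arrival cost, model rows, measurement rows and, depending on the scheme, either the prior block of the CS-MHE (assumed here to cover all time steps except the last one) or the random walk rows. The rank is computed exactly with rational arithmetic. The function names, the window length and the system coefficients are hypothetical, and the condition number is not computed here.

```python
from fractions import Fraction


def build_a_h(scheme, n, a, b, c, d):
    """Assemble A_H for scalar state/input; columns are x_1..x_n, alpha_1..alpha_n."""
    def empty_row():
        return [Fraction(0)] * (2 * n)

    rows = []
    row = empty_row()
    row[0] = 1                       # arrival cost on the first state
    rows.append(row)
    for k in range(n - 1):           # model rows: x_{k+1} - a x_k - b alpha_k
        row = empty_row()
        row[k], row[k + 1], row[n + k] = -a, 1, -b
        rows.append(row)
    for k in range(n):               # measurement rows: y_k - c x_k - d alpha_k
        row = empty_row()
        row[k], row[n + k] = -c, -d
        rows.append(row)
    if scheme == "CS":               # prior on the inputs, last time step excluded
        for k in range(n - 1):
            row = empty_row()
            row[n + k] = 1
            rows.append(row)
    elif scheme == "RW":             # random walk: alpha_{k+1} - alpha_k
        for k in range(n - 1):
            row = empty_row()
            row[n + k], row[n + k + 1] = -1, 1
            rows.append(row)
    elif scheme != "NI":
        raise ValueError(f"unknown scheme: {scheme}")
    return rows


def matrix_rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            if factor != 0:
                m[i] = [vi - factor * vp for vi, vp in zip(m[i], m[rank])]
        rank += 1
    return rank


# d = 0 mimics a displacement transducer, d = 1 an accelerometer (feedthrough).
ranks = {(scheme, d): matrix_rank(build_a_h(scheme, 3, Fraction(9, 10), 1, 1, d))
         for scheme in ("NI", "RW", "CS") for d in (0, 1)}
```

In this toy setting, without feedthrough only the random walk scheme reaches full column rank (six unknowns for a window of three steps), because the input at the last time step appears in no other equation. With feedthrough all three schemes are full rank. This is consistent with the remark in section B1.2.1 on inputs acting at k = T.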
\[
A_H^{CS} = \begin{bmatrix}
I & & & & & & & & & \\
-A & I & & & & -BI & & & & \\
& -A & I & & & & -BI & & & \\
& & \ddots & \ddots & & & & \ddots & & \\
& & & -A & I & & & & -BI & \\
-C & & & & & -DI & & & & \\
& -C & & & & & -DI & & & \\
& & \ddots & & & & & \ddots & & \\
& & & -C & & & & & -DI & \\
& & & & -C & & & & & -DI \\
& & & & & I & & & & \\
& & & & & & I & & & \\
& & & & & & & \ddots & & \\
& & & & & & & & I & 0
\end{bmatrix}
\tag{B20}
\]
\[
A_H^{RW} = \begin{bmatrix}
I & & & & & & & & & \\
-A & I & & & & -BI & & & & \\
& -A & I & & & & -BI & & & \\
& & \ddots & \ddots & & & & \ddots & & \\
& & & -A & I & & & & -BI & \\
-C & & & & & -DI & & & & \\
& -C & & & & & -DI & & & \\
& & \ddots & & & & & \ddots & & \\
& & & -C & & & & & -DI & \\
& & & & -C & & & & & -DI \\
& & & & & -I & I & & & \\
& & & & & & -I & I & & \\
& & & & & & & \ddots & \ddots & \\
& & & & & & & & -I & I
\end{bmatrix}
\tag{B21}
\]
(blank entries in Eqs. (B20) and (B21) are zero blocks)
\[
\Sigma = \operatorname{diag}\left(P_a^{-1},\; Q^{-1},\; Q^{-1},\; \ldots,\; Q^{-1},\; R^{-1},\; R^{-1},\; \ldots,\; R^{-1},\; R^{-1},\; P_\alpha^{-1}\right)
\tag{B22}
\]
B2.2 Numerical test cases
Let us now introduce a numerical test case to assess rank and condition number
of the three estimation schemes. We modelled analytically a cantilever beam,
considering an aluminium bar with uniform cross-section and length L
(Fig. B2.1). Table B2.1 contains geometry and material properties of the
beam. The model takes into account the first three eigenmodes, to which three
damping coefficients ζn , n = 1, 2, 3 are associated. The system is described by
six states (nx = 6), i.e., three position modal participation factors (MPFs) and
their time derivatives [63]. Fig. B2.1 shows also three transducers (s1 , s2 , s3 ) and
two nodes (grey crosses), which are equally distributed along the beam with the
exclusion of the clamping position. The nodes represent the locations at which
an input is applied, and their number nm is the only attribute that we vary
when comparing different scenarios. The beam is an LTI system, and has thus
a state-space representation that follows Eq. (A33). We built the measurement
equations by simulating both displacement and acceleration transducers. The
sizes of B, Ψ and α are $B \in \mathbb{R}^{n_x \times n_m}$, $\Psi \in \mathbb{R}^{n_m \times n_m}$ and $\alpha_k \in \mathbb{R}^{n_m}$.
Figure B2.1: Cantilever beam (horizontal axis from 0 to L). Legend: 1st mode
(solid line); 2nd mode (dashed line); 3rd mode (dash-dotted line); transducers
(s1 , s2 , s3 ); input locations (+). Figure reproduced from [105].
We can now construct matrix AH with the model and measurement equations
of the numerical example and subsequently employ it to investigate its rank and
Table B2.1: Beam geometry and material properties.
Parameter                 Value
Beam length [m]           0.400
Beam width [m]            0.025
Beam thickness [m]        0.003
Density [kg/m^3]          2727
Young's modulus [GPa]     67.8
ζ1                        0.010
ζ2                        0.012
ζ3                        0.040
condition number. We considered the following test conditions for an estimation

window of length N = 5:
1. Three estimation approaches
(a) NI-MHE
(b) RW-MHE
(c) CS-MHE
2. Two types of transducers
(a) displacements
(b) accelerations
3. Knowledge about the prior information
(a) the arrival cost is known
(b) the arrival cost is not available
4. Number of input locations nm = 1, . . . , 5
5. Two final test cases involving the NI-MHE without arrival cost. First, we
consider only one time step of a window for the input estimation, within
which we take into account a growing number of input locations [105].
Next, we discuss the case of one input location and an increasing number
of time steps within the estimation window.
[Figure B2.2: rank(O) plotted against nz, together with the reference line rank(O) = nz.]
Figure B2.2: Rank of the observability matrix O as a function of the number of states nz = nx + nm.
B2.2.1 Observability of the test case
Before presenting the results of the numerical study, it is worth looking
at the observability of the system for a single step estimator (cf. section A1.4)
with an RW model to represent an input. Let us consider the case in which
only the states are being estimated. Then, let us add the possible input
locations one by one as augmented states. We did this such that we always take
into account the beam tip (x = L), we never consider the point x = 0, and the
rest of the beam is covered uniformly. Moreover, we assign a random walk model
to each input position [133].
For the system to be observable, matrices A and C of the state-space
representation have to satisfy the condition given in Thm. A1.23. Fig. B2.2
shows the rank of O as a function of the number of states nz, which includes the
nx = 6 position and velocity MPFs as well as the nm augmented states for the
input estimation. The graph shows that the system becomes unobservable if
nz > 9, i.e., when the number of input locations is nm > 3. On the other hand,
we will show in chapter C1 that the CS-MHE is able to observe a higher number
of input positions, provided that the input is sparse. In fact, Figs. C1.1 and
C1.13 both exhibit nm = 8 (grey crosses).
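The rank test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the system is a toy discrete double integrator rather than the beam model, and the function names augment_with_random_walk and observability_rank are hypothetical helpers introduced here. The sketch shows how adding random-walk input states can leave the rank of O below nz.

```python
import numpy as np


def augment_with_random_walk(A, B, C, D):
    """Append one random-walk state per input: z = [x; u], u_{k+1} = u_k."""
    nx, nm = B.shape
    A_z = np.block([[A, B], [np.zeros((nm, nx)), np.eye(nm)]])
    C_z = np.hstack([C, D])
    return A_z, C_z


def observability_rank(A, C):
    """Rank of O = [C; CA; ...; CA^(n-1)] for the pair (A, C)."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.linalg.matrix_rank(np.vstack(blocks))


# Toy discrete double integrator (NOT the beam model), one position sensor.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])

# One input location: all nz = 3 augmented states are observable.
B1 = np.array([[0.0], [1.0]])
A1, C1 = augment_with_random_walk(A, B1, C, np.zeros((1, 1)))
rank_one_input = observability_rank(A1, C1)

# Two indistinguishable input locations: nz = 4, but the rank stays at 3.
B2 = np.array([[0.0, 0.0], [1.0, 1.0]])
A2, C2 = augment_with_random_walk(A, B2, C, np.zeros((1, 2)))
rank_two_inputs = observability_rank(A2, C2)
```

In the second case the rank saturates below the number of augmented states, which is the same qualitative behaviour that Fig. B2.2 reports for the beam when nz > 9.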
B2.3 Structure of the matrices
Let us begin by considering points 1 to 3 of the list on page 106, assuming a

constant number of input locations nm = 2, corresponding to two force impulses
at the two nodes shown in Fig. B2.1. We chose this value since the observability
matrix indicated that the system is observable for a single step estimator. At
the same time, the PBH test (cf. Thm. A1.24) suggested that some states may
encounter estimation difficulties when accelerometers are employed, due to the
fact that accelerometers cannot measure a DC component.
Fig. B2.3 shows matrix AH for the 12 resulting possible combinations of the
different estimation approaches, types of transducers and knowledge on the prior
information [105]. The columns correspond to the three different estimation
approaches, NI-MHE, RW-MHE and CS-MHE, respectively. The top block
considers the arrival cost to be known, whereas the bottom block does not
take this information into account. The two rows within each block make
a comparison between displacement and acceleration transducers. For the
numerical study of this section, all dependent variables are eliminated from
matrix H, which means that the system is observable if the rank equals the
number of columns, i.e., z contains only independent variables [105]. Every
set-up can be uniquely distinguished by looking at the distribution of the
nonzero components inside AH (small circles), to be compared with the blocks
in Eqs. (B20–B21). In general:
• The arrival cost (if present) is represented by an identity matrix I ∈ R^(nx×nx) located at the top left corner of AH.
• The RW model is characterised by two parallel diagonal lines of nonzero
values in the bottom right corner of AH . A single line indicates the
CS-MHE. In both cases, they populate the last N −1 rows of AH .
• The state-space matrix D is empty in case of displacement transducers,
while it is full when accelerometers are being employed.
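The block structure listed above can be assembled numerically as follows. This is a minimal sketch, not the implementation used for Fig. B2.3: the function name build_AH is hypothetical, the system matrices are random placeholders instead of the beam model, the dictionary is taken as the identity, and dropping the variables that appear in no equation is an assumption made here to mimic the elimination of dependent variables.

```python
import numpy as np


def build_AH(A, B, C, D, N, scheme, arrival_cost=True):
    """Assemble A_H for scheme 'NI', 'RW' or 'CS' (identity dictionary)."""
    nx, nm = B.shape
    nr = C.shape[0]
    n_prior = nx if arrival_cost else 0
    n_model_steps = N - 1 if scheme in ("RW", "CS") else 0
    n_rows = n_prior + (N - 1) * nx + N * nr + n_model_steps * nm
    AH = np.zeros((n_rows, N * (nx + nm)))

    def xs(k):  # columns of the state at step k
        return slice(k * nx, (k + 1) * nx)

    def us(k):  # columns of the input (or alpha) at step k
        return slice(N * nx + k * nm, N * nx + (k + 1) * nm)

    if arrival_cost:
        AH[0:nx, xs(0)] = np.eye(nx)
    row = n_prior
    for k in range(N - 1):  # model equations x_{k+1} = A x_k + B u_k
        AH[row:row + nx, xs(k)] = -A
        AH[row:row + nx, xs(k + 1)] = np.eye(nx)
        AH[row:row + nx, us(k)] = -B
        row += nx
    for k in range(N):  # measurement equations y_k = C x_k + D u_k
        AH[row:row + nr, xs(k)] = -C
        AH[row:row + nr, us(k)] = -D
        row += nr
    for k in range(n_model_steps):  # input model: last N-1 block rows
        if scheme == "RW":
            AH[row:row + nm, us(k)] = -np.eye(nm)
            AH[row:row + nm, us(k + 1)] = np.eye(nm)
        else:
            AH[row:row + nm, us(k)] = np.eye(nm)
        row += nm
    # Assumption: variables that appear in no equation are dropped.
    return AH[:, np.any(AH != 0.0, axis=0)]


rng = np.random.default_rng(0)
nx, nm, nr, N = 6, 2, 3, 5
A = rng.standard_normal((nx, nx))   # placeholder dynamics, not the beam
B = rng.standard_normal((nx, nm))
C = rng.standard_normal((nr, nx))
D = np.zeros((nr, nm))              # displacement transducers: D is empty

shapes = {s: build_AH(A, B, C, D, N, s).shape for s in ("NI", "RW", "CS")}
AH_cs = build_AH(A, B, C, D, N, "CS")
rank, cond = np.linalg.matrix_rank(AH_cs), np.linalg.cond(AH_cs)
```

With nx = 6, nr = 3, nm = 2 and N = 5, the resulting sizes are 45x38, 53x40 and 53x38 for the NI, RW and CS structures with arrival cost and displacement transducers. The rank and condition number of the random placeholder system are of course not those of the beam.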
Each subplot of Fig. B2.3 presents rank, condition number (cond) and size
of AH (number of rows × number of columns). All values derive from the
cantilever beam described in section B2.2. Here follow a few comments about
the structure of matrix AH [105]:
• With displacement measurements, the two inputs are observable up to

time step k = T −1 for NI-MHE and CS-MHE, while the RW-MHE reaches
time step k = T . The presence of the arrival cost does not influence
the rank, but the condition number of the NI-MHE without any input
model gets much higher if the arrival cost is not present, meaning that
the estimation is likely not to be accurate.
• Both physically and mathematically, the CS-MHE with displacement
transducers cannot observe an input which is applied at the last time step
of the window. In estimating a force, acceleration measurements provide
[Figure B2.3: sparsity patterns of AH (nonzero entries shown as small circles), one panel per test condition. Panel titles:
Arrival cost, disp: NI-MHE rank = 38, cond = 1.7e+08 (size 45x38); RW-MHE rank = 40, cond = 1e+07 (size 53x40); CS-MHE rank = 38, cond = 4.9e+05 (size 53x38).
Arrival cost, acc: NI-MHE rank = 40, cond = 2e+11; RW-MHE rank = 40, cond = 1.3e+11; CS-MHE rank = 40, cond = 5.7e+10.
No arrival cost, disp: NI-MHE rank = 38, cond = 6.9e+11; RW-MHE rank = 40, cond = 1.4e+07; CS-MHE rank = 38, cond = 3.3e+06.
No arrival cost, acc: NI-MHE rank = 38, cond = 2.7e+17; RW-MHE rank = 38, cond = 1.9e+17; CS-MHE rank = 40, cond = 5.8e+10.]
Figure B2.3: Matrix AH for 12 test conditions, with N = 5, nx = 6, r = 3 and nm = 2. Figure reproduced from [105].
a direct connection with the estimates, while displacement measurements

are characterised by a lag, due to the presence of time derivatives.
• The arrival cost makes the rank of the RW-MHE full for the case with
acceleration measurements. However, it is worth noting that the arrival
cost should carry accurate information. If this is not the case, the
condition number would increase substantially, with a general decrease in
the accuracy of the estimates. This is based on the fact that the arrival
cost should provide the DC component, which accelerometers do not
measure [133]. Accordingly, it is worth underlining that in practice it may
be better not to consider any arrival cost.
B2.4 Rank and condition number for different amounts of inputs
The number of states and inputs that can be observed with a certain number
of transducers is an important factor to take into account when setting up a
filter. Fig. B2.4 shows ranks and condition numbers for a set of input locations
nm = 1, . . . , 5. The graphs correspond to each case of Fig. B2.3, and include
also the test case with nm = 2 that we described in section B2.3. The solid grey
lines show the condition numbers, plotted on a base 10 logarithmic scale (grey
right vertical axes). The dashed black lines refer to the ranks (black left vertical
axes of the graphs), and different markers characterise each point indicating
one of the following situations:
○ full rank–column, i.e., AH has more rows than columns; its rank is full,
limited by the number of columns. All states are thus observable.
□ full rank–row, i.e., AH has more columns than rows; its rank is full,
limited by the number of rows. More information is needed in order to
observe all states.
× rank deficient–column, i.e., AH has more rows than columns; its rank is
not full, meaning that some states are not observable. The red digit next
to the marker indicates the difference between the rank and the number
of columns.
◇ rank deficient–row, i.e., AH has more columns than rows; its rank is not
full, meaning that more information is needed in order to observe all
states, and some information is redundant (linearly dependent). This can
happen because of critical locations of inputs and transducers.
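The four situations above amount to a simple decision rule on the rank and the size of AH. The following sketch makes it explicit; the function name classify_rank is hypothetical, and treating a square matrix as column-limited is an assumption made here, since the text does not cover that case.

```python
def classify_rank(rank, n_rows, n_cols):
    """Return the marker category and the difference rank - n_cols."""
    # Assumption: a square matrix is treated as limited by its columns.
    limit = "column" if n_rows >= n_cols else "row"
    is_full = rank == min(n_rows, n_cols)
    label = ("full rank" if is_full else "rank deficient") + "-" + limit
    return label, rank - n_cols


# A 45x38 matrix of rank 38: every column (state or input) is observable.
tall_full = classify_rank(38, 45, 38)
# A tall matrix whose rank is two short of its 40 columns.
tall_deficient = classify_rank(38, 47, 40)
# A wide matrix whose rank equals its number of rows.
wide_full = classify_rank(39, 39, 42)
```

The second element of the returned pair corresponds to the digit printed next to the × marker; for the row-limited cases it is returned only for completeness.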
[Figure B2.4: twelve panels arranged as in Fig. B2.3 (columns: NI-MHE, RW-MHE, CS-MHE; rows: disp and acc, with and without arrival cost); horizontal axis nm = 1, ..., 5.]
Figure B2.4: Ranks and condition numbers of AH as a function of nm. Legend: ranks (left y axis, dashed line); condition numbers (right y axis, base 10 logarithmic scale, solid line with + markers); full rank–column (○); full rank–row (□); rank deficient–column (×); rank deficient–row (◇). Figure reproduced from [105].
The following list discusses the main outcomes of the simulations [105]:
• With an increasing number of inputs nm , many scenarios that refer to

the NI-MHE have a rank which is limited by the number of rows of AH .
In other words, the NI-MHE requires more information for most cases.
Including no information about the input comes at the expense of the
need for more measurements.
• The scenarios with nm ≤ 3 are in general full rank for RW-MHE and
CS-MHE, in line with the observability matrix and the PBH test for single
step estimators (cf. section A1.8). An exception is the RW-MHE with
accelerometers and without arrival cost: this is the most critical case,
since an accurate arrival cost is crucial for acceleration measurements to
have a DC component.
• For nm > 3, AH remains full rank for the CS-MHE with displacement
transducers, while it is rank deficient for the RW-MHE and for the CS-
MHE with acceleration measurements. Condition numbers are much
higher than for nm ≤ 3, highlighting a possible decrease in accuracy.
• Displacement measurements lead to better condition numbers in compari-
son with acceleration measurements. In fact, a time derivative to obtain
velocity and acceleration from displacement data is computationally more
stable than an integration of an acceleration signal.
• A particular situation arises for the CS-MHE with displacement
measurements. In fact, the condition number is quite stable while nm
increases. This indicates a possible strength of the CS-MHE, but requires
a deeper insight considering also matrix Σ.
• All matrices of the CS-MHE are built using α instead of α∗ , such that
all possible basis functions belonging to the dictionary Ψ (and not only
the few nα∗ > 0 of its sparse representation) take part in the discussion.
This assumes that all inputs are associated with a weighting number
(covariance) in Σ. Consequently, more possible input locations increase
the overall system uncertainty, which can be quantified by the covariance
matrix of the full problem.
The last comment motivated two further investigations. First, we consider only
one time step for the input estimation (k = T−N+1, i.e., there are no B matrices
on the other time steps k = T −N +2, . . . , T −1), while as usual we estimate the
states within the whole window [105]. We only examine the NI-MHE without
arrival cost, and we start by taking into account one input location (nm = 1,
corresponding to a matrix B of size nx × 1). Next, we simulate an increasing
[Figure B2.5: two panels (disp, acc) showing rank and log10(cond) versus nm = 1, ..., 5.]
Figure B2.5: NI-MHE with no arrival cost. Scenario with an increasing number
of input locations within one time step.
[Figure B2.6: two panels (disp, acc) showing rank and log10(cond) versus the number of time steps with an input (1 to 4).]
Figure B2.6: NI-MHE with no arrival cost. Scenario with one input location on
an increasing number of consecutive time steps.
nm. Fig. B2.5 shows ranks and condition numbers for this test case. We see
that the rank grows proportionally to the number of inputs for nm ≤ 3, and
numerical instability arises for nm > 3. Finally, Fig. B2.6 shows ranks and
condition numbers for a last scenario with one single input location (nm = 1),
first active only at time step k = T−N+1 and then extended to consecutive
time steps up to k = T−1. As in the previous case, we report the results
for the NI-MHE without arrival cost, both for displacement and acceleration
measurements. We notice that the rank grows proportionally to the number
of added time steps. However, the condition number corresponding to 4 time
steps is much higher, revealing that the threshold nm ≤ 3 may still influence
the accuracy of the results.
Up to this point, we showed the results of a set of simulations that allowed us
to investigate the behaviour of rank and condition number of AH for different
scenarios involving input models, types of transducers and arrival cost. A
comparison helped us to understand strengths and limitations of each test case.
In many situations, a number of inputs nm > 3 results in numerical instability,
expressed by rank deficiency and/or a badly conditioned matrix AH . The
threshold nm > 3 is in line with the observability matrix (cf. section B2.2.1)
and with the PBH test for single step estimators.
Let us now focus on the CS-MHE, recalling that it deals with a sparse
representation of an input. Specifically, dictionary Ψ in Eq. (B1e) is now an
identity matrix. For the CS-MHE, the analysis in this chapter applies to the
number of nonzero components of a sparse vector, and the threshold nm ≤ 3
indicates a sparsity level that has to be guaranteed. Consequently, the CS-MHE
can take into account a much higher number of input locations, provided
that the number of nonzero basis functions stays within the above-mentioned
requirements. In the case of the cantilever beam example, it is possible to choose
nm > 3 provided that the input sparsity is S ≤ 3 (cf. chapter C1). From the
second-to-last case (Fig. B2.5), it is clear that a number of input locations nm > 3
leads to numerical instability, while we could not draw any specific conclusions
with regard to the input distribution in time (Fig. B2.6). As far as compressive
sensing is concerned, we recall that the whole sampling scheme (i.e., N·nm)
should be proportional to the input sparsity (cf. chapter A3) [25].
If we look back at Eqs. (A71–A72) on page 49 and Fig. B2.2 on page 107, we can
see that the rank of the observability matrix saturates quite fast, i.e., once the
rank condition is satisfied it is not worth adding any further information. This
information may come from additional sensors (as we have illustrated in Fig. B2.2)
as well as from additional time steps, going in the direction of the MHE. In other
words, in the case of an MHE with a random walk model, the window length follows
from a go/no-go assessment driven by observability requirements. Reference [38]
includes a study of the influence of the MHE window length for an unknown
input estimation with a random walk model. The outcome confirms that the
random walk model benefits only mildly from a longer window, and that this
trend dies out fast. On the other hand, let us consider
Eq. (A120) of compressive sensing on page 78, which indicates how the number
of required measurements (ny) scales with the signal length (nα) and its sparsity
(S). Fig. B2.7 (left) shows a numerical example of Eq. (A120) for a signal up to
nα = 50 and S = 10 (with cS = 3). We note that ny scales strongly with S and
smoothly with nα, especially for small values of S. Moving towards an MHE,
we can rewrite Eq. (A120) by considering that the number of measurements is
equal to the number of sensors times the number of time steps, i.e., ny = N·nr.
This results in Eq. (B23).
\[ n_r \geq \frac{c_S\, S \log(n_\alpha / S)}{N} \tag{B23} \]
Moreover, the number of basis functions is equal to the number of input locations
times the number of time steps, i.e., nα = N ·nm . Fig. B2.7 (right) depicts
Eq. (B23) for nm = 1 (i.e., nα ≡ N ). We can see that for a certain sparsity
the number of required sensors decreases with an increasing number of basis
functions, indicating that given a certain sparsity a longer signal may bring in
[Figure B2.7: two surface plots over signal length nα and sparsity S; vertical axis ny (left plot) and nr (right plot).]
Figure B2.7: Left: number of measurements (ny) required by compressive
sensing as a function of signal length nα and sparsity S, according to Eq. (A120),
with cS = 3. Right: number of sensors (nr) required by compressive sensing in
case of an MHE with window N as a function of signal length nα and sparsity S,
according to Eq. (B23), with cS = 3 and nm = 1 (small values correspond to a
dark colour).
[Figure B2.8: left panel ny versus nα; right panel nr versus nα = N·nm for nm = 1, nm = 2 and nm = 3; horizontal axes from 0 to 50.]
Figure B2.8: Sections from the curves in Fig. B2.7 for a constant sparsity S = 3.
In the right graph we also indicate the dependency on the number of input
locations (nm).
extra information with fewer sensors. In other words, a long MHE window may
be worthwhile if CS is used for input modelling. This behaviour is more evident if
we look at a section of constant sparsity, and Fig. B2.8 shows the case of S = 3.
The tendency towards a decreasing number of sensors also holds for multiple input
locations (right graph). Eq. (B23) does not express the number of sensors required
by the CS-MHE, since it reflects exclusively the theory behind compressive
sensing, and not its involvement in the CS-MHE. However, it indicates that the
need for sensors decreases with an increasing window length, and this trend
persists. Although these considerations come from Eq. (A120), which is a rather
empirical formula of CS, they constitute an interesting starting point for future
research, aiming at establishing rigorous sparsity requirements for the CS-MHE
and better understanding the CS-MHE results with displacement measurements
in Fig. B2.4.
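The scaling discussed in this section can be reproduced with a few lines of Python. The sketch below evaluates the right-hand side of Eq. (B23) and of the bound it was derived from by substituting ny = N·nr. The function names are hypothetical, and the base of the logarithm is not stated in this section, so it is exposed as a parameter (natural logarithm by default, which is an assumption).

```python
import math


def required_measurements(S, n_alpha, c_S=3.0, base=math.e):
    """Right-hand side of n_y >= c_S * S * log(n_alpha / S)."""
    return c_S * S * math.log(n_alpha / S, base)


def required_sensors(S, N, n_m=1, c_S=3.0, base=math.e):
    """Right-hand side of Eq. (B23), with n_alpha = N * n_m."""
    return required_measurements(S, N * n_m, c_S, base) / N


# Constant sparsity S = 3, one input location, growing window length N.
sensors_vs_window = [required_sensors(3, N) for N in range(9, 51)]

# Same sparsity and window, but three input locations instead of one.
one_location = required_sensors(3, 30, n_m=1)
three_locations = required_sensors(3, 30, n_m=3)
```

For S = 3 and nm = 1 the bound decreases monotonically once N exceeds S·e (about 8.2), which is the trend visible in the right graph of Fig. B2.8. More input locations raise the bound for a given window length.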
In this section we have indicated that the number of measurements required
by the CS-MHE scales with S, and in case of high S this can result in the
CS-MHE requiring a higher number of measurements in comparison with the
random walk model (we will show an example with the estimation of multiple
Fourier components in chapter C2, section C2.3.2). However, it can generate
very accurate results at a relatively low sampling rate. In this sense, we can
consider compressive sensing as a complementary approach to other approaches
for input modelling such as the random walk. The exact domain of applicability
of compressive sensing requires further investigation, and remains an interesting
open point for future developments of the CS-MHE.
B2.5 Conclusions
In this chapter we presented a few scenarios where we compared the rank and
condition number of the CS-MHE with an MHE with no input information
(NI-MHE) and an MHE with a random walk model for the input (RW-MHE).
The comparison involved different types of measurements (displacement and
acceleration) and the availability of prior information (arrival cost). This allows
us to list the following conclusions [105]:
• The number of inputs that can be observed with an NI-MHE is much lower
than with the RW-MHE and the CS-MHE. It is thus meaningful
to exploit any known input information.
• An accurate arrival cost is crucial when acceleration measurements are
employed. In fact, it carries the information on the DC component, which
cannot be measured by accelerometers. It is important to note that if
the arrival cost is required for observability reasons, the estimation can
work only if there is a specific time instant at which all variables are
known up to a certain accuracy. Otherwise, the filter will fail if we do not
provide explicitly that information. Moreover, the quality of the arrival
cost deteriorates while the window is sliding in time. This manifests in
a growing covariance of the first time step of the window in the next
iteration.
• The RW-MHE can detect an input applied at every time step of the
estimation window, while the CS-MHE does not reach the last time step
if displacement transducers are employed to estimate a force.
• For multistep estimators based on the MHE, the limit on the number
of inputs that can be observed by the NI-MHE and RW-MHE
translates into a limit on sparsity for the CS-MHE, and in this chapter we
showed this threshold. For an LTI numerical test case with 6 states and
3 displacement transducers, the NI-MHE can estimate 3 inputs within
a window, the RW-MHE can follow 3 input positions evolving in time,
while the CS-MHE can reconstruct a signal from its sparse projection up
to 3 active basis functions. In the case of accelerometers, the DC component
cannot be measured. An accurate arrival cost may improve the rank, but
the system is likely to deviate in time, leading to a poor estimation.
• The random walk model is a powerful representation which allows
the observation of a limited number of inputs under the hypothesis
of slow dynamics and known locations, whereas the CS-MHE admits
more locations and no dynamic constraints provided that the input
representation has a limited sparsity.
In this chapter we confined the discussion to an LTI system excited by an
external load modelled by a Dirac delta dictionary. As future work, it could be
interesting to investigate a nonlinear system as well as any rank and condition
number dependencies in relation to the type of dictionary, including complex
dictionaries (which turn the optimisation problem into a bigger SOCP, compared
with the QP that results from a real dictionary, cf. chapter A2).
Moreover, further points of interest may regard the influence of the sampling
rate and possible intrinsic sparsity correlation between time and space.
Besides the results that we obtained from the numerical experiments of this
chapter, an important outcome regards the design of further experiments. In
fact, we realised that any of our first application cases of the CS-MHE (both
numerical and experimental) should include an extremely flexible measurement
system as far as the number and location of the sensors are concerned. This is
quite easy to arrange when performing numerical simulations, while it is not
a simple task on a physical set-up. For example, a sensor array composed of
accelerometers or strain transducers may contribute strongly to the
system dynamics, such that adding or moving a sensor requires updating
the model. Moreover, such an array may demand several (costly) acquisition
channels. In order to avoid these issues, we chose to work with contactless
vision-based measurements. Consequently, the numerical examples in part C of this
dissertation involve virtual displacement transducers, and their experimental
counterparts utilise displacement information that originates from a sequence of
images.
Part C
Applications
Chapter C1
Estimation of force impacts
Throughout this dissertation we developed the CS-MHE with the aim of reducing
the observability issues which are typical of joint state/input estimators and
allowing for the estimation of inputs characterised by fast dynamics. In
particular, in chapter B1 we presented the formulas of the CS-MHE, while
in chapter B2 we discussed a few numerical aspects such as its rank and
condition number. However, up to here we did not show any application
example of the CS-MHE. This chapter presents a few numerical test cases and
one experimental validation that illustrate the capability of the CS-MHE to
go beyond the observability threshold linked to the number of possible input
locations and to capture the fast behaviour of force impacts. Furthermore, these
application cases allow us to illustrate how to determine some of the CS-MHE
tuning parameters, such as the balancing weight λ and the covariance matrices,
which are crucial to achieve a good estimation accuracy.
We begin by introducing a numerical test case in section C1.1, which
demonstrates the potential of the CS-MHE over other state-of-the-art techniques.
Furthermore, we present an experimental validation in section C1.2, before
concluding the chapter in section C1.3.
Acknowledgements
This chapter refers mostly to [104], of which Matteo Kirchner is first author. A
big thanks goes to Jan Croes and Francesco Cosco, who are co-authors of [104]
and actively contributed to the first experimental validation of the CS-MHE.
Thanks also to Jean-Pierre Merckx and Eddy Smets for their precious support
during the measurement campaign.
C1.1 Numerical estimation of multiple force impacts
The evaluation of force impacts is in general not an easy task. Force transducers
do exist, spanning from simple resistive solutions up to accurate sensors based
either on strain gauges (for low frequency ranges) or on piezoelectric elements
(for high frequency ranges). However, such measurement systems can be very
expensive and it is not always possible to integrate them in the design of a
mechanical system due to geometrical aspects or durability issues. In such a
context, model-based estimators and virtual sensors offer a very appealing
solution to the problem. In this chapter we employ the CS-MHE formulation
that we described in section B1.2 in order to perform joint state/input estimation
for the detection of force impacts entering the system at an unknown location.
For the reason behind the choice of this particular formulation we refer to
section B1.5 and to the literature regarding force modelling through sparse
dictionaries and CS in section A1.7.4. On the one hand, the examples in this
section demonstrate the capability of the CS-MHE to go beyond the observability
issues which are typical of other approaches such as a random walk model. On
the other hand, the estimation of an impulse offers a further challenge related to
its intrinsically fast dynamics, with which other state-of-the-art techniques
(especially single step estimators) can cope only to a limited extent (cf. section A1.7).
Most of the material of this chapter comes from reference [104].
We begin this section by introducing a numerical example to test the CS-
MHE formulation that we described in section B1.2. We consider an LTI
mechanical system, of which the generic state-space equations are in Eq. (A33).
More specifically, we modelled analytically a cantilever beam with uniform
rectangular cross-section, according to the Euler–Bernoulli beam theory. We
obtained the state-space model following the procedure given in reference [63],
which we adapted to a cantilever beam with displacement transducers. The
system is similar to the one we introduced in section B2.2 to assess rank and
condition number of the CS-MHE. Fig. C1.1 shows the beam as well as:
• three simulated displacement transducers s1 , s2 , s3 (nr = 3), located at

three equidistant points denoted as x(s1 ), x(s2 ), x(s3 ) in Table C1.1;
• the first three eigenmodes, which the model takes into account.
Consequently, the system is described by 6 states (nx = 6), i.e., three
position modal participation factors (MPFs) and their time derivatives
[63]. The damping values associated to the eigenmodes are denoted as ζ1 ,
ζ2 and ζ3 in Table C1.1;
• a spatial sampling (eight equally spaced grey crosses; nm = 8);
[Figure C1.1: cantilever beam along x [m], with axis marks at 0, 0.125, 0.245, 0.325 and 0.405; transducers s1, s2, s3; force impacts F1 to F4.]
Figure C1.1: Numerical test case. Legend: 1st mode (solid line); 2nd mode (dashed line);
3rd mode (dash-dotted line); spatial sampling (+); transducers (s1, s2, s3); input (F1, F2,
F3, F4). Figure reproduced from [104].
Parameter                 Value
Beam length [m]           0.405
Beam width [m]            0.025
Beam thickness h [m]      0.003
Density [kg/m^3]          7502
Young's modulus [GPa]     65.9
x(s1) [m]                 0.245
x(s2) [m]                 0.325
x(s3) [m]                 0.405
ζ1                        0.030
ζ2                        0.037
ζ3                        0.119
εα [N]                    1.0
Table C1.1: Parameters of the numerical test case.
• the input, which consists of four force impacts F1 , F2 , F3 , F4 entering

the system at different locations and time, as indicated in Table C1.2 and
Fig. C1.2.
Force ID   Time [s]   Location [m]   Magnitude [N]
F1         0.03       0.405          -10
F2         0.06       0.365          -5
F3         0.09       0.285          5
F4         0.12       0.165          10
Table C1.2: Location of the impacts in time and space.
Figure C1.2: Reference input for the numerical test case (—–×, the nonzero
components are marked with a darker circle), and window for the first CS-MHE
iteration (light blue). Figure reproduced from [104].
We note that the force (green crosses in Fig. C1.2) is zero except for the 4
impacts, such that the input signal is sparse in time and space. Consequently,
there is no need to project the signal onto any specific dictionary, and u and α
are equivalent (cf. section B2.1). The first estimation window is marked in light
blue, and consists of nm = 8 spatial points (the grey crosses in Fig. C1.1) and
N = 11 time steps, such that the input estimation takes place on 10 time steps
[105]. This value allows for good accuracy [167] and for a fast computation. The
system is at rest and this status does not change until F1 is applied. Table C1.1
summarises geometry and material properties of the beam.
The CS-MHE can directly estimate the location of an input only if this is
applied to a sampling point. If this is not the case, the input energy is spread
among the neighbouring nodes. However, we can still estimate accurately the
[Figure C1.3: Qdrift [N^2] (vertical axis, marks at 0, 2.5 and 25) versus time step k (marks at T−N+1, T−1 and T).]
Figure C1.3: Qdrift as a linear function of the time step k. Figure reproduced
from [104].
exact input location by linear interpolation, provided that the input consists of
one single impulse (see Fig. B1.1 on page 97, bottom left graph) [63]. In this
context, CS outperforms the random walk model in terms of robustness to an
inaccurately known input location. In fact, with the random walk model an input
applied at an unexpected location may jeopardise the estimation, since the
model does not take such uncertainty into account [104].
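One possible reading of the linear interpolation mentioned above is sketched below. It assumes that a single impulse applied between two neighbouring sampling points is split between them in proportion to its distance from each node; this splitting rule, the function name locate_impulse and the numerical values are illustrative assumptions and are not taken from [63].

```python
def locate_impulse(x_left, x_right, a_left, a_right):
    """Recover location and magnitude of one impulse split over two nodes."""
    total = a_left + a_right
    # Fraction of the way from the left node to the right node.
    xi = a_right / total
    return x_left + xi * (x_right - x_left), total


# Hypothetical estimates: 7.5 N at x = 0.285 m and 2.5 N at x = 0.325 m.
location, magnitude = locate_impulse(0.285, 0.325, 7.5, 2.5)
```

With these hypothetical amplitudes, the impulse is placed a quarter of the way between the two nodes, with a magnitude equal to the sum of the two estimates.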
Fig. C1.3 shows that the drift term (Qdrift ) follows a linear function of the time
step k within one window. This choice derives from the fact that the estimation
is expected to be more accurate if both past and future data take part in the
estimation (cf. definition A1.18), and this happens next to k = T −N +1 [173].
In order to investigate the influence of the modelling error, we simulated a
model mismatch by varying the beam thickness (h). Table C1.3 shows the first
3 eigenfrequencies of the beam for the reference test case (h = 0.003 m) as well
as 3 other scenarios involving a thinner beam, which results in a frequency
mismatch indicated by δ%. The choice of parameters εR, εQ and λ for each case
will become clear in section C1.1.1. We chose a sampling period of 2.5·10⁻³ s
(400 Hz), which satisfies the Nyquist-Shannon sampling theorem for the highest
eigenfrequency. Note that the CS-MHE exploits compressive sampling for the
observation of a large number of input positions; here we are not considering
its ability to acquire and reconstruct a signal undersampled in time [104].
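The sampling statement above can be checked with a few lines. The sketch uses the sampling period from the text and the reference eigenfrequencies of Table C1.3; the helper name is ours.

```python
sampling_period = 2.5e-3                  # s, from the text
sampling_rate = 1.0 / sampling_period     # Hz
nyquist_frequency = sampling_rate / 2.0   # Hz

# Eigenfrequencies of the reference case (h = 0.003 m), Table C1.3.
eigenfrequencies = [8.76, 54.88, 153.66]  # Hz


def satisfies_nyquist(f_max, f_s):
    """True if the sampling rate exceeds twice the highest frequency."""
    return f_s > 2.0 * f_max


ok = satisfies_nyquist(max(eigenfrequencies), sampling_rate)
print(f"fs = {sampling_rate:.0f} Hz, Nyquist = {nyquist_frequency:.0f} Hz, ok = {ok}")
```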
It is worth noting that, according to the observability matrix, the numerical
example is not observable, i.e., a random walk model applied to each sampling
point renders the system unobservable (cf. section B2.2.1). This is true even if
we know a priori where the 4 inputs are applied (they would require 4 random
walks), since more than 3 random walk models are not admitted. On the other
hand, the observability criterion for the CS-MHE requires that at most 3 nonzero
components are active at the same time within an estimation window. For
the case under examination (force impacts with a Dirac delta dictionary), the
number of active components corresponds to the number of force impulses
within a window (cf. sections A1.8 and B2.1).
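For a Dirac delta dictionary, the criterion above reduces to counting impulses per window. The following sketch is only an illustration: the limit of 3 and the 10 input time steps per window come from the text, while the impulse time steps and the function name are hypothetical.

```python
def max_active_per_window(impulse_steps, window_len):
    """Largest number of impulses inside any run of `window_len` time steps."""
    steps = sorted(impulse_steps)
    best = 0
    start = 0
    for end in range(len(steps)):
        # Shrink the window from the left until it spans < window_len steps.
        while steps[end] - steps[start] >= window_len:
            start += 1
        best = max(best, end - start + 1)
    return best


MAX_ACTIVE = 3           # limit quoted in the text
WINDOW_INPUT_STEPS = 10  # N - 1 input time steps for N = 11

well_separated = [12, 30, 47, 65]  # hypothetical impulse time steps
clustered = [12, 15, 18, 20]       # hypothetical, all within one window

for name, steps in (("well separated", well_separated), ("clustered", clustered)):
    n_active = max_active_per_window(steps, WINDOW_INPUT_STEPS)
    print(f"{name}: {n_active} active, admissible = {n_active <= MAX_ACTIVE}")
```

Four impulses are therefore admissible as long as no more than three of them fall into the same window.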
h [m]           0.0030            0.0029        0.0028        0.0027
δ% [%]          0.0 (reference)   -3.3          -6.7          -10.0
Freq. 1 [Hz]    8.76              8.47          8.17          7.88
Freq. 2 [Hz]    54.88             53.05         51.22         49.39
Freq. 3 [Hz]    153.66            148.54        143.42        138.30
εR [m²]         1.48·10⁻⁸         1.48·10⁻⁸     1.48·10⁻⁸     1.48·10⁻⁸
εQ              7.39·10⁻²         1.48          598.74        598.74
λ               3.27·10⁻⁴         2.94·10⁻²     1.98·10⁻⁴     3.27·10⁻⁴
Table C1.3: First 3 eigenfrequencies of the beam and CS-MHE tuning parameters
for the numerical test cases.
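The frequency mismatch in Table C1.3 can be reproduced with a small sketch. It assumes an Euler-Bernoulli beam with rectangular cross-section, for which the bending eigenfrequencies scale linearly with the thickness h; this assumption is ours, although the tabulated values are consistent with it. The function names are hypothetical.

```python
H_REF = 0.0030                   # m, reference thickness
F_REF = [8.76, 54.88, 153.66]    # Hz, reference eigenfrequencies (Table C1.3)


def mismatch_percent(h, h_ref=H_REF):
    """Relative frequency mismatch in percent, assuming f proportional to h."""
    return 100.0 * (h - h_ref) / h_ref


def scaled_frequencies(h, h_ref=H_REF, f_ref=F_REF):
    """Eigenfrequencies of the thinner beam under the same assumption."""
    return [f * h / h_ref for f in f_ref]


for h in (0.0030, 0.0029, 0.0028, 0.0027):
    freqs = ", ".join(f"{f:.2f}" for f in scaled_frequencies(h))
    print(f"h = {h:.4f} m  delta = {mismatch_percent(h):6.1f} %  f = {freqs} Hz")
```

The scaled values agree with the table to within about 0.01 Hz; the remaining difference is presumably due to the rounding of the reference frequencies.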
C1.1.1 Choice of the balancing weight λ
In chapter B1 we pointed out the key role of the weight λ and its dependency
on the model and measurement covariance matrices (cf. section B1.2). In this
section we discuss the choice of this crucial CS-MHE parameter. Fig. C1.4
shows the mean square error (MSE) of the input estimation of the reference
case for different values of λ and constant arbitrary covariances Q and R. We
note that the MSE drops within a region of λ, corresponding to an accurate
input estimation. If λ is too small, the optimisation gives more weight to the
minimisation of the model and measurement errors, and the input sparsity
cannot be exploited. On the other hand, a value of λ that is too high promotes
sparsity at the expense of the model and measurement errors, which are then
no longer minimised, resulting in a higher MSE [104]. In Fig. C1.4 we note a few
further aspects. First, the interval of λ for which the MSE drops is rather wide,
meaning that we can admit some uncertainty in the choice of λ without
compromising the estimation results. Next, we see that the curve is non-smooth
in the neighbourhood of the minimum MSE. This may be due to the fact that
the graph comes from a numerical investigation that involves discrete values
of λ (the solution is suboptimal, since it is governed by the choice of discrete
values, equidistant on a logarithmic scale), and it may also be linked to the
sampling scheme (a higher sampling rate may reduce the non-smoothness [38]).
Furthermore, if we look at the MSE outside the optimal area we note that
choosing a λ that is too small results in a smaller error than choosing one that
is too high. We know that the model is very good for the reference case, and it
then makes sense to rely on it rather than putting emphasis on the input sparsity.
Figure C1.4: Choice of λ for the numerical test case δ% = 0%, given arbitrary
constant Q and R: log10(MSE) of the input estimation versus log10(λ) (blue
line) and location of its minimum (green circle). Figure reproduced from [104].
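The sweep over λ described above can be sketched on a toy problem. The example below is not the CS-MHE cost function: it uses a scalar ℓ1-regularised least-squares problem, whose minimiser is the soft-thresholding operator, only to show the mechanics of scanning λ on a logarithmic grid against a known reference and picking the minimum MSE. All signal values and names are hypothetical.

```python
import random


def soft_threshold(y, lam):
    """Closed-form minimiser of 0.5 * (y - a)**2 + lam * abs(a)."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0


def mse(estimate, reference):
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


# Hypothetical sparse reference input with two impulses, plus Gaussian noise.
rng = random.Random(0)
reference = [0.0] * 50
reference[10] = 8.0
reference[35] = -6.0
measured = [r + rng.gauss(0.0, 0.3) for r in reference]

# Discrete values of lambda, equidistant on a logarithmic scale (1e-4 .. 1e4).
lambdas = [10.0 ** (-4 + 0.25 * i) for i in range(33)]
errors = [mse([soft_threshold(y, lam) for y in measured], reference)
          for lam in lambdas]

best = min(range(len(lambdas)), key=errors.__getitem__)
print(f"best lambda = {lambdas[best]:.3g}, MSE = {errors[best]:.3g}")
print(f"MSE at smallest lambda = {errors[0]:.3g}, at largest = {errors[-1]:.3g}")
```

As in the discussion of Fig. C1.4, a very large λ suppresses the whole input, whereas a very small λ leaves the noise untouched; the grid resolution limits how closely the true optimum is approached.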
Different combinations of Q and R lead to different values of λ that minimise
the MSE. Fig. C1.5 shows the MSE of the input estimation as a function
of εQ and εR, computed with the related optimal λ. εQ and εR are two scaling
factors assigned to Q and R, respectively. We carried out a simulation for each
of the 4 scenarios that we introduced in Table C1.3. It is worth noting that there
are particular combinations of εQ and εR that give a smaller MSE, and those
are located along a line in a log–log plane (white dashed line). We spot this
trend especially in case of modelling error, while the reference case (δ% = 0%)
manifests a wider interval of low MSE. Furthermore, Fig. C1.6 depicts the values
of the optimal λ for each combination, and we see that they scale logarithmically
while moving along the line of minimum MSE. Given a measurement set-up
(such that εR is known) and a model, there is a certain model accuracy (εQ)
and a certain optimal λ at which the MSE curve has a minimum (i.e., the
CS-MHE performs at its best). Those values are marked with a green circle and
are collected in Table C1.3 [104].
It is important to underline that the procedure for choosing λ that we propose
in this section is only applicable if we have some knowledge about the input,
i.e., the MSE in the graphs relies on a known reference input. This is likely to
happen in practice since we need to know the input shape in order to choose a
dictionary. Otherwise, we should make a guess according to our best knowledge
of the model and measurement accuracies. Finally, we note the presence of
several values for the measurement accuracy in the tuning process. If we know
the accuracy of the measurements we can simplify the procedure by fixing εR .
The reason why we kept the dependency relates primarily to possible violations
of Gaussianity connected to nonlinearities or deviations from zero mean noise.
Furthermore, in case of sensor arrays each transducer may have a different
accuracy which may be difficult to assess for each sensor. Consequently, in
practice it is easier to assign the same covariance to each transducer (we faced
this issue with camera measurements, cf. section C2.3 and appendix 2).
Figure C1.5: MSE of the input estimation as a function of εQ and εR (axes:
log10(εR) and log10(εQ); one panel each for δ% = 0%, −3.3%, −6.7% and −10%).
The white dashed lines indicate the combinations of εQ and εR that give a
minimum MSE (small values correspond to a dark colour). The chosen
combinations are marked with a green circle, and correspond to Table C1.3.
Figure C1.6: Optimal λ as a function of εQ and εR (axes: log10(εR) and
log10(εQ); one panel each for δ% = 0%, −3.3%, −6.7% and −10%). The white
dashed lines and the green circles correspond to Fig. C1.5 (small values
correspond to a dark colour).
C1.1.2 Results and discussion
In this section we present the results of the numerical investigation. Fig. C1.7
shows the estimation window i = 7 for δ% = 0% (the whole simulation is available
as supplementary material of [104]). The left graphs display the states, divided
into position (top) and velocity (bottom) MPFs. Their confidence intervals
are also shown (MPFn ± 3σn, i.e., 99.7% of the normal distribution, where
n = 1, 2, 3 identifies the first 3 eigenmodes of the structure). The right graph
shows the input estimation. The time axis in Fig. C1.7 is relative to the current
window. Fig. C1.8 shows the input estimation of the full simulation. We
obtained the graph by keeping the elements α ≥ εα that correspond to the best
estimation time step of each window, i.e., k = T−N+1 (cf. definition A1.18).
Figure C1.7: State/input estimation at window i = 7 for δ% = 0% (left graphs:
position MPF and velocity MPF versus t [s]; right graph: input estimation,
F [N] versus x [m] and t [s]). Legend (left graphs): 1st mode (—–); 2nd mode
(- - -); 3rd mode (· · ·). The thick green lines are the reference, the thick blue
lines are the CS-MHE estimation, confined between two thin blue lines that
represent the confidence level (MPFn ± 3σn). Legend (right graph): reference
values (—–×); CS-MHE estimation (—–◦). The time is relative to the current
window. Figure reproduced from [104].
We can follow this approach if the last estimate (k = T−1) is not required for
specific real-time applications. Moreover, we can calculate the discarded energy
(due to εα) and add it to each of the nonzero components proportionally
to their magnitude. This gives the solid dots in Fig. C1.8, which reveal the high
accuracy of the input estimation.
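The thresholding and the subsequent correction can be sketched as follows. The sketch interprets the "discarded energy" as the sum of the discarded amplitudes, which is an assumption on our side (suggested by the quantity sum(αi|i) shown in Fig. C1.9); the function name and the numerical values are hypothetical.

```python
def threshold_and_redistribute(alpha, eps_alpha):
    """Zero the components below eps_alpha and redistribute what was discarded.

    Assumption: the discarded amount is the signed sum of the small components,
    shared among the kept components proportionally to their magnitude.
    """
    kept = [a if abs(a) >= eps_alpha else 0.0 for a in alpha]
    discarded = sum(a for a in alpha if abs(a) < eps_alpha)
    total_kept = sum(abs(a) for a in kept)
    if total_kept == 0.0:
        return kept  # nothing exceeds the threshold, nothing to correct
    return [a + discarded * abs(a) / total_kept for a in kept]


# Hypothetical estimate at one time step [N], threshold of 1 N.
alpha = [0.2, 6.0, 3.0, 0.4, -0.1]
corrected = threshold_and_redistribute(alpha, 1.0)
print(corrected)
print(sum(alpha), sum(corrected))  # the total is preserved by construction
```

By construction the sum of all components is unchanged, while the number of nonzero components is reduced to those exceeding εα.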
Fig. C1.9 presents three further aspects regarding the input estimation within
each iteration. First, the top graph shows the MSE of every window. We
notice that the MSE grows with the model mismatch. The higher error of the
simulation with δ% = −10% is not only due to the higher model mismatch, but
also to a poor tuning. In fact, by comparing the values of εR, εQ and
λ in Table C1.3 for the case with δ% = −10% to the values for a smaller model
error, we note that the tuning encountered some issues. A finer tuning may
improve the solution, and we will discuss this aspect further in chapter C2. The
different MSE among the input locations is connected to the sensor positioning,
which is linked to the mode shapes of the beam. A simple way to compare
different locations involves the condition number of the observability matrix, or
checking all singular values of the PBH observability matrix (cf. section A1.8).
Furthermore, there exist techniques for optimally placing the transducers for
improved performance [77]. However, this is not a simple task when the location
of the input is not known. For this reason, we adopted a series of equidistant
sensors. Next, the central graph shows nα∗ i|i for each window, which depends
on the choice of εα. The dashed green line (recognisable by the crosses) is the
reference and corresponds to the sparse signal of Fig. C1.2. We notice a link
between the number of inputs and the MSE, i.e., detecting more components
increases the MSE. Finally, the bottom graph in Fig. C1.9 shows the sum of all
elements αi|i. We observe that the magnitudes of F2 and F4 do not converge to
their expected values, whereas F1 and F3 do converge. The reason for this offset
lies in the fact that part of the model error is seen as input by the CS-MHE.
We can limit this error with a better model or with a different sensor positioning
that improves local observability. In general, we can expect better results if the
input enters the system next to a transducer. In Fig. C1.1 we note that the
location of F1 corresponds to a sensor (s3), both F2 and F3 are surrounded by
all sensors (with the difference that F2 enters the system not far from a node of
the third mode), and F4 is located further away from the sensors. This results
in a lower estimation accuracy of F4 and, to a minor extent, also of F2.
Figure C1.8: Global solutions at k = T−N+1 and α > εα for δ% = 0% (F [N]
versus x [m] and t [s]). Reference (—–×), CS-MHE (—–◦), CS-MHE corrected
with the total energy (—–•). Figure reproduced from [104].
As an example, let us have a look at iteration 45 of the test case with δ% = −10%
(marked with a red diamond in Fig. C1.9). Fig. C1.10 shows the results of the
input estimation. We notice that most of the nonzero components are located
around the impulse, while some others are further away and are due to the
model mismatch. Unfortunately, it is not possible to filter out those components
a priori. However, we see that the results are accurate even in case of a high
model error.
Figure C1.9: MSE [N²] (top), nα∗ i|i (centre) and sum of all elements in αi|i [N]
(bottom) as functions of the window index i; the impacts F1 to F4 are marked
in each graph. Legend: reference values (- - -×), δ% = 0% (—–◦), δ% = −3.3%
(- - -), δ% = −6.7% (- · -), δ% = −10% (· · · ). Figure reproduced from [104].
Fig. C1.11 (left) shows the input estimation of the whole simulation with
δ% = −10%, to be compared to Fig. C1.8 for δ% = 0%. All peaks are well
estimated, together with some error in the form of unwanted peaks. A way to
filter those peaks out is to set a higher threshold εα, as shown in Fig. C1.11
(right). Neither graph in Fig. C1.11 includes any energy correction, which can be
implemented as discussed for the reference test case with δ% = 0%. However,
due to the modelling error the CS-MHE estimates a wrong impact. Fig. C1.11
highlights the importance of εα on the accuracy of the input estimation (cf.
section B1.2.1). Its choice follows from our best knowledge of the system noise
and the expected input. The development of an automated routine for choosing
εα is certainly an interesting open point for future research.
To summarise, the reference test case (δ% = 0%) gave a very accurate input
estimation, whereas a small error resulted when we increased the model mismatch
up to δ% = −10% on each eigenfrequency. Since the CS-MHE is a model-based
estimator, its performance depends on the model accuracy. The error can be
reduced by improving the model and tuning the CS-MHE parameters.
Figure C1.10: Input estimation at window i = 45 for δ% = −10%. Legend: see
Fig. C1.7 (right). Figure reproduced from [104].
Figure C1.11: Input estimation for δ% = −10% in case of εα = 1 N (left) and
εα = 2 N (right). Legend: see Fig. C1.8. Figure reproduced from [104].
EXPERIMENTAL ESTIMATION OF ONE FORCE IMPACT 133
Figure C1.12: Experimental set-up. Figure reproduced from [104].
Freq. ID    Freq. EMA [Hz]    Freq. MoUp [Hz]    Error δ% [%]
1           8.76              8.76               0.00
2           61.58             54.88              -3.71
3           180.52            153.66             -14.88
Table C1.4: Comparison of the first 3 eigenfrequencies of the beam, computed
experimentally and after model updating.
C1.2 Experimental estimation of one force impact
In this section we present an experimental validation for the numerical test
case of section C1.1. Fig. C1.12 shows the experimental set-up. A beam is
instrumented with several transducers. First, 3 accelerometers are mounted
along the beam. Moreover, an LED is placed in front of each accelerometer
for vision tracking. Their location as well as the other geometry and material
properties correspond to the ones of the numerical example in section C1.1 (cf.
Table C1.1).
We used the accelerometers to measure the modal parameters of the beam [145],
which acted as target values for updating the analytical model. The LEDs served
as displacement transducers, with their motion being optically tracked by a
Nikon Metrology K600 system [136]. We performed a model updating procedure
based on experimental data since the presence of the accelerometers strongly
influenced the dynamic behaviour of the beam. Moreover, the transducers’ cables
are responsible for the high damping values indicated in Table C1.1. Table C1.4
shows the eigenfrequencies of the beam, computed experimentally (first column)
[145] and after model updating (second column). The right column shows the
error (δ% ) for each eigenfrequency, and indicates that the third eigenmode
diverges significantly from the measurements. A better approximation could
be achieved through a more detailed model such as a finite element (FE)
representation [133], but an analytical model was chosen here for its simplicity.
We will employ an FE model in chapter C2 (cf. section C2.3).
Figure C1.13: Experimental test case (sensors s1, s2 and s3; x-axis ticks at
0, 0.125, 0.245, 0.325 and 0.405 m). Legend: see Fig. C1.1. Figure reproduced
from [104].
Figure C1.14: Reference input for the experimental test case. Legend: see
Fig. C1.2. Figure reproduced from [104].
We applied a force impact F with a hammer at x(s2), as shown in Fig. C1.13.
For synchronization purposes, the hammer provided a trigger signal to the data
acquisition system. The exact input location in time and space was measured
through a video recorded during the measurement campaign by a XIMEA
MQ042CG-CM high-speed camera [201] synchronized with the K600 system,
which showed that the contact between the hammer and the beam holds for
two time steps (the video is available as supplementary material of [104]). Since
no direct force measurement was available during the experiment, we assumed
that the impulse follows a quadratic curve centred in the middle of the two
nonzero components, going to zero at their adjacent time steps [89]. The four
points highlighted with dark green circles in Fig. C1.14 belong to that curve,
and serve as reference for interpreting the CS-MHE results. The magnitude
of its peak is purely indicative and was set according to our best knowledge
of the input. The values of εR, εQ and λ for the experimental test case follow
from Figs. C1.15–C1.16 and are given in the captions. If we compare Fig. C1.15
with Fig. C1.4, we see that the interval of minimum MSE is quite narrow. In
this case the model is not as accurate as it was for that specific numerical
investigation, resulting in the need for an accurate λ. This is reflected in the
whole MSE curve: if we rely on the model (small λ), the error is much higher than
what we can obtain by relying solely on the sparsity of the input (big λ). This
is the opposite of what we noted in Fig. C1.4.
Figure C1.15: Choice of λ for the experimental test case, given a constant Q
and R: log10(MSE) of the input estimation versus log10(λ) (blue line) and
location of its minimum λ = 2.14 (green circle). Figure reproduced from [104].
Figure C1.16: Optimal λ (left, log10(λ)) and MSE (right, log10(MSE)) of the
input estimation as functions of εQ and εR (axes: log10(εR) and log10(εQ)).
Legend: see Figs. C1.5–C1.6. The chosen values are εR = 8.16·10⁻⁸ and
εQ = 5.10·10⁻³.
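The reference impulse assumed for the experimental test case (a quadratic curve centred between the two contact time steps and vanishing at the adjacent ones) can be sketched as follows. The peak value and the time-step index are placeholders, since the text states that the peak magnitude is purely indicative; the function name is ours.

```python
def quadratic_impulse(k, k_first, peak):
    """Parabola centred between k_first and k_first + 1.

    The curve is zero at k_first - 1 and k_first + 2 (the adjacent time steps)
    and is set to zero outside that interval.
    """
    centre = k_first + 0.5
    half_width = 1.5
    offset = (k - centre) / half_width
    if abs(offset) >= 1.0:
        return 0.0
    return peak * (1.0 - offset ** 2)


# Placeholder values: contact at time steps 20 and 21, unit peak.
k_first, peak = 20, 1.0
four_points = [quadratic_impulse(k, k_first, peak) for k in range(19, 23)]
print(four_points)
```

With this parametrisation, the two nonzero samples equal 8/9 of the peak of the parabola, which itself lies halfway between two time steps.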
Fig. C1.17 shows the estimation window i = 12 (the whole simulation is
available as supplementary material of [104]). For the notation and the legend
we refer to the numerical example in section C1.1.2. A few nonzero components
are located in the neighbourhood of the expected input position, and some of
them will be filtered out since their absolute value does not exceed εα.

Figure C1.17: State/input estimation at window i = 12 for the experimental
test case. Legend: see Fig. C1.7. Figure reproduced from [104].
Figure C1.18: Global solutions at k = T−N+1 and α > εα for the experimental
test case, corrected with the total energy (F [N] versus x [m] and t [s]).
Reference (—–×), CS-MHE (—–◦). Figure reproduced from [104].
Fig. C1.18 shows the input estimation of the full simulation. We evaluated
the discarded energy due to εα and distributed it to each of the three nonzero
components, proportionally to their magnitude. Two components are located
where we expect them, while the third one is on a neighbouring node, located in
time at the second nonzero component of the impact and in space at x = 0.365
m, i.e., in the direction of the tip of the beam. A closer look at the video of the
acquisition (the video is available as supplementary material of [104]) shows
that the hammer hits the beam in the direction of that node, which justifies
the presence of the third component.
Figure C1.19: MSE [N²] (top) and nα∗ i|i (bottom) as functions of the window
index i. The dashed green line is a reference, and follows from the signal in
Fig. C1.14. Figure reproduced from [104].
Finally, Fig. C1.19 presents MSE and nα∗ i|i for the experimental test case. We
notice that the estimation gets more accurate when the input approaches the
end of the window, justifying the choice of Qdrift (cf. Fig. C1.3) as well as the
choice of recognizing the last time step of each window as the best estimate
[173]. A certain delay characterises the input detection, revealing the intrinsic
capability of a time window to detect an impulse, which is not a trivial task for
single step estimators (cf. section A1.4).
C1.3 Conclusions
In this chapter we presented a numerical example as well as an experimental
validation of the CS-MHE for the case of force impacts. The CS-MHE
formulation corresponds to the one we described in chapter B1. We exploited
the numerical example to show that the CS-MHE makes it possible to estimate
the states of an LTI mechanical system as well as a force impulse applied at an
unknown location. The example highlighted the potential of the CS-MHE in terms
of observability and input dynamic range in comparison to a random walk
model. Furthermore, it exhibited the robustness of the CS-MHE with regard
to modelling and measurement errors, and indicated a relationship between
the CS-MHE tuning parameters, i.e., the weight λ of the CS term in the cost
function and the covariances Q and R associated with model and measurement,
respectively. The experimental example served as a validation scenario, yielding
an accurate estimation for a real system subjected to model and measurement
errors, confirming the numerical results.
In chapter A3 we mentioned the RIP as a condition to increase the rate of
success of CS (cf. section A3.3.1). The RIP cannot be verified for arbitrary
matrices, but it can be tested numerically through a Monte Carlo approach
on CS problems like Eq. (A115c), given a matrix Θ and simulating a (large)
number of sparse vectors α up to a certain sparsity. Due to the more complex
structure of the CS-MHE problem, such a study becomes challenging. In fact,
it would involve all terms of the cost function including all tuning parameters
and settings of the CS-MHE, such as the number of sensors, the number of
possible input locations, the number of (augmented) states, the window length
and the balancing weight λ. At the same time, some of these parameters have to
satisfy the observability criterion. All the numerical test cases presented in this
chapter as well as the experiment converge to the expected results, suggesting
the success of the ℓ1-norm optimisation even if the RIP is formally not assessed.
Future work will investigate this aspect [104].
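The Monte Carlo test mentioned above for a plain CS problem can be sketched as follows. The sketch draws random sparse vectors α and records how far the ratio ‖Θα‖² / ‖α‖² deviates from 1; the largest deviation is an empirical lower bound on the restricted isometry constant, not a proof of the RIP. The Gaussian matrix Θ, its size, the sparsity level and all names are hypothetical and unrelated to the CS-MHE matrices.

```python
import math
import random


def empirical_rip_constant(theta, sparsity, trials, rng):
    """Monte Carlo lower bound on the restricted isometry constant of theta."""
    n_cols = len(theta[0])
    worst = 0.0
    for _ in range(trials):
        # Random support of size 1..sparsity with Gaussian amplitudes.
        size = rng.randint(1, sparsity)
        support = rng.sample(range(n_cols), size)
        alpha = {j: rng.gauss(0.0, 1.0) for j in support}
        norm_alpha = sum(v * v for v in alpha.values())
        if norm_alpha == 0.0:
            continue
        y = [sum(row[j] * v for j, v in alpha.items()) for row in theta]
        ratio = sum(v * v for v in y) / norm_alpha
        worst = max(worst, abs(ratio - 1.0))
    return worst


rng = random.Random(0)
m, n = 40, 80  # hypothetical numbers of measurements and dictionary atoms
theta = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
delta = empirical_rip_constant(theta, sparsity=3, trials=2000, rng=rng)
print(f"empirical lower bound on the isometry constant: {delta:.3f}")
```

For the CS-MHE the same idea would have to include the full cost function and its tuning parameters, which is what makes the study challenging.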
Chapter C2
Estimation of periodic loads described by Fourier components
In this chapter we give a few demonstrations of the formulation of the CS-MHE
in case of periodic inputs described by a complex Fourier dictionary. In
section C2.1 we begin by presenting a numerical example where we apply a
Fourier dictionary for the estimation of an input applied at a known location.
Next, in section C2.2 we extend the example to the estimation of distributed
loads using a two-dimensional Fourier dictionary. In section C2.3 we propose an
experimental validation that involves a finite element model (updated through
an experimental modal analysis and reduced on modal coordinates) and
vision-based measurements. Finally, we briefly conclude the chapter in section C2.4.
Acknowledgements
The first part of this chapter expands reference [103], of which Matteo Kirchner
is first author. Thanks to Jan Croes and Francesco Cosco, who are co-authors
of [103]. Thanks to Eddy Smets for building the test set-up and to Daniele
Brandolisio for his help in the lab. Thanks to Karim Asrih, Ward Rottiers, Luca
Sangiuliano and Simon Vanpaemel for their hints regarding Siemens NX. Thanks
to Frank Naets and Jakob Fiszer for the practicalities regarding state-space
models. Thanks to Noé Geraldo Rocha de Melo Filho for sharing with me
his knowledge of LMS Test.Lab. Thanks to Francesco Cosco for the camera
measurements and to Tom Henskens for the circuit board to synchronise the
pictures with the acquisition system. Thanks to Florian Maurin for Fig. C2.7.
Figure C2.1: Geometry of the numerical test case (left; sensors s1 and s2,
force F(t) at the tip, x [m]) and simulated sinusoidal force F(k) [N] applied at
the beam tip (right). Figure reproduced from [103].
C2.1 Numerical estimation of a periodic load in time
In chapter B1 we indicated the Fourier dictionary as a way to model a periodic
load, which we typically encounter in rotating machinery. Such a situation strongly
encourages the implementation of virtual sensors, since it may be extremely
expensive or even impossible to instrument a rotating shaft or a gearbox with
force and torque transducers. In this section we show a numerical example that
involves a periodic load applied at a known location on a mechanical system. We
model the load through a one-dimensional Fourier series and we assume input
periodicity within the estimation window. This example provides a first proof
of concept of the second CS-MHE formulation that we introduced in chapter B1
(cf. section B1.4 for the implementation of a complex Fourier dictionary). Part
of the material of this section comes from reference [103].
The numerical example recalls the cantilever beam of section B2.1 (see Table B2.1
for geometry and material properties). For this test case we simulated two
displacement transducers (s1 and s2 , located at x = L/2 and x = L, respectively)
and one sinusoidal force F, applied at the beam tip as shown in Fig. C2.1. The
force has an amplitude of 4 N and a frequency of 120 Hz, such that there are
two complete sines in a window of N = 11 time steps sampled at 600 Hz. Since
we take into account only one input location, the input vector uk consists of one
element, to be considered on N−1 time steps [105]. Consequently, nm = 1, nr = 2,
nx = 6 (3 position MPFs and their time derivatives, cf. section B2.2). The sizes
of αk, α, Pα and Ψ are thus $\alpha_k \in \mathbb{C}^{1}$,
$\alpha \in \mathbb{C}^{N-1}$, $P_\alpha \in \mathbb{C}^{(N-1)\times(N-1)}$ and
$\Psi \in \mathbb{C}^{(N-1)\times(N-1)}$. We solved the resulting optimisation
problem in MATLAB® by exploiting the modelling language YALMIP [121] and the
solver MOSEK [131].
Fig. C2.2 shows the results of the SOCP for a single time step (iteration 1). The
two graphs on the left-hand side refer to the state estimation, and depict position
and velocity MPFs. Moving to the right, the next two graphs show the results
of the input estimation, expressed as ℜ(α) and ℑ(α) and arranged according
to the MATLAB® convention for the DFT, i.e., first the DC component, then
the half space of positive wave numbers and finally the negative half space
(see footnote C1). The ±3σ confidence bands come from the covariance matrix
of the constrained optimisation problem [19], where we linearised the constraint
Eq. (B11b) (cf. Eq. B15). Finally, we obtained the right graph by applying the
inverse Fourier transform to the nonzero components of α, which are marked by
a solid circle. The input at iteration 1 is purely imaginary, since the signal is a
sine with a phase shift of π rad. Moreover, the DC component is null, since the
force has zero mean. A single Fourier component consists of one pair of complex
conjugate elements, scaled by the square root of the number of samples. The
behaviour of ℜ(α) and ℑ(α) with regard to the time shift of the optimisation
window can be seen in an animated GIF of the simulation, which is available on
YouTube [102]. We can see (predominantly on the velocity MPF of the 3rd
eigenmode) that the sizes of the confidence levels decrease throughout the
simulation, thanks to the propagation of the prior information through the
covariance matrix of the optimisation problem [104].
Figure C2.2: Results at iteration 1 (graphs: MPFpos,n ± 3σn, MPFvel,n ± 3σn,
ℜ(α) ± 3σ, ℑ(α) ± 3σ and F(k) [N]). Legend (state estimation): 1st mode (—–);
2nd mode (- - -); 3rd mode (· · ·). The thick green lines are the reference, the
thick blue lines are the CS-MHE estimation, confined between two thin blue lines
that represent the confidence level (MPFn ± 3σn). Legend (input estimation):
reference values (- - -×); CS-MHE estimation (—–◦). Figure reproduced from
[103].
C1 For further details we refer to the function fft in the MATLAB® help.
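The structure of the Fourier components described above can be reproduced with a short sketch for the simulated force (4 N, 120 Hz, sampled at 600 Hz on N−1 = 10 time steps, phase shift of π rad). The sketch assumes a unitary forward DFT, i.e., a scaling by the inverse square root of the number of samples; this normalisation is our assumption, chosen to match the scaling mentioned in the text, and the function name is hypothetical.

```python
import cmath
import math


def unitary_dft(x):
    """Forward DFT scaled by 1/sqrt(n) (assumed normalisation)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            / math.sqrt(n) for m in range(n)]


amplitude = 4.0   # N
f_load = 120.0    # Hz
f_s = 600.0       # Hz
n_samples = 10    # N - 1 time steps for N = 11
phase = math.pi   # rad

force = [amplitude * math.sin(2.0 * math.pi * f_load * k / f_s + phase)
         for k in range(n_samples)]
alpha = unitary_dft(force)

# Order: DC component, positive wave numbers, then negative wave numbers.
nonzero = [m for m, a in enumerate(alpha) if abs(a) > 1e-9]
print("nonzero components:", nonzero)
for m in nonzero:
    print(m, f"Re = {alpha[m].real:+.3f}", f"Im = {alpha[m].imag:+.3f}")
```

Under this assumption the two nonzero elements form a complex conjugate pair with vanishing real part and magnitude (4/2)·√10 ≈ 6.32, while the DC component is zero.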
Figure C2.3: Standard deviation (log10(σ)) as square root of the main diagonal
of the covariance matrix. The blocks at the top of the graph are x, ℜ(α), ℑ(α)
and s.
The example that we present in this section is characterised by relatively
small matrices, since it considers only one input location. This gives us the
chance to look into a few details of the CS-MHE, some of which we indicated
in chapter B1. At the end of section B1.4 (more precisely at the end of
section B1.4.3) we mentioned that we can keep the matrix structure and sizes
constant independently of the number of estimated inputs, by introducing a
regularisation factor on the zero elements of α. We can now show the meaning of
this regularisation factor by looking at the square root of the elements belonging
to the main diagonal of the covariance matrix (i.e., the standard deviation σ)
for iteration 1, which we report in Fig. C2.3. The block subdivisions at the top
of the graph correspond to the structure of ζ in Eq. (B13) (cf. section B1.4.3). In
the block of the states x we observe a sequence of 3 small values followed by 3
higher values. Those refer to position and velocity MPFs, respectively, and are
repeated for every time step within the estimation window. We also notice that
the uncertainty tends to get smaller while approaching the last time step of the
window. As for the remaining blocks, which are connected to each other by
the cone constraint Eq. (B11b), we recognise that the values that correspond
to a nonzero Fourier component assume a certain value, while all remaining
elements are constant (and relatively small). They represent the regularisation
factor, which is an arbitrary value that we assigned to the zero elements of α in
order to satisfy observability (cf. section A1.8 and chapter B2). The value of
the constant in the diagonal of the covariance matrix corresponds exactly to the
regularisation factor that we fed to the system, indicating that the operation
does not influence the results of the input estimation.
Fig. C2.3 allows us to discuss two further aspects, i.e., the computation of the
amplitude of the nonzero components of the estimated input α as well as the
weighting numbers associated with it. From the theory of complex numbers we
expect that Eq. (B11b) is an active constraint, i.e., the equality indicated in
Eq. (C1) holds. Both sides of Eq. (C1) should then lead to the amplitude of the
Fourier components, i.e., |α|. Fig. C2.4 shows this comparison, indicating that
the constraints are active, as the two sides of Eq. (C1) lead to the same values.
s = \sqrt{\Re(\alpha)^2 + \Im(\alpha)^2}    (C1)
Since the slack variable is part of the optimisation variables (and consequently
also part of the covariance matrix associated with it), a weighting number becomes
available (last block in Fig. C2.3). On the other hand, we can derive the same
quantity through the formula for propagating an uncertainty, assuming ℜ(α)
and ℑ(α) to be independent variables [115]. For our case, this yields Eq. (C2).
Fig. C2.5 displays the comparison between the standard deviations from the
covariance matrix and calculated through Eq. (C2), showing an excellent match.
\sigma_{|\alpha|} = \sqrt{ \frac{\Re(\alpha)^2}{\Re(\alpha)^2 + \Im(\alpha)^2} \cdot \sigma_{\Re(\alpha)}^2 + \frac{\Im(\alpha)^2}{\Re(\alpha)^2 + \Im(\alpha)^2} \cdot \sigma_{\Im(\alpha)}^2 }    (C2)
Figure C2.4: |α| for iteration 1 (versus Fourier components), obtained by the
slack variable |α| = s (- - -×) and derived as
|α| = \sqrt{\Re(\alpha)^2 + \Im(\alpha)^2} (—–◦).
Figure C2.5: Comparison of the standard deviation (σ|α|) for iteration 1 (versus
Fourier components). From the covariance of the slack variable (- - -×), derived
with Eq. (C2) (—–◦).
Figure C2.6: Geometry of the numerical test case for the distributed load
(load F(x,t); sensors s1, s2, s3 and s4; x [m] from 0 to 0.4).
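Equations (C1) and (C2) of section C2.1 can be evaluated with a few lines. The sketch implements the first-order propagation of uncertainty for |α|, with ℜ(α) and ℑ(α) treated as independent variables as stated in the text; the function name and the numerical values are hypothetical.

```python
import math


def amplitude_and_sigma(re, im, sigma_re, sigma_im):
    """Return |alpha| (Eq. C1) and its standard deviation (Eq. C2)."""
    amp_sq = re * re + im * im
    if amp_sq == 0.0:
        raise ValueError("Eq. (C2) is undefined for a zero component")
    amplitude = math.sqrt(amp_sq)
    variance = (re * re / amp_sq) * sigma_re ** 2 \
        + (im * im / amp_sq) * sigma_im ** 2
    return amplitude, math.sqrt(variance)


# Hypothetical component: Re = 3, Im = 4, standard deviations 0.1 and 0.2.
amplitude, sigma = amplitude_and_sigma(3.0, 4.0, 0.1, 0.2)
print(f"|alpha| = {amplitude:.3f}, sigma = {sigma:.4f}")
```

For a purely imaginary component, as found at iteration 1, the expression reduces to the standard deviation of ℑ(α).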
In this section we presented the estimation of a periodic input in time. We
will introduce an experimental validation for a similar 1D Fourier dictionary in
section C2.3, while in the next section we outline the 2D case for the estimation
of inputs distributed both in time and space.
C2.2 Numerical estimation of a periodic load in time and space
In this section we extend the example of section C2.1 to a
two-dimensional Fourier dictionary, i.e., the set of basis functions derives from
an inverse 2D-DFT [107]. This example serves as a proof of concept for the
estimation of distributed loads. Fig. C2.6 shows the beam test case, which
corresponds to the one of section C2.1. For this example we chose nm = 8 and
nr = 4 (as usual, nx = 6). Fig. C2.7 helps to visualise the distributed load.
Within the estimation window of (N = 11) × (nm = 8) points, the load has two
complete sine cycles in time (cf. section C2.1) and one complete sine cycle in space.
Fig. C2.8 shows the results for the first iteration. One 2D Fourier component
consists of four nonzero elements, which undergo a phase shift in time while
the filter is running. The graphs indicate that the CS-MHE can reconstruct
the input in the whole window by estimating only the four
nonzero elements of its sparse representation in the Fourier basis.
This highlights the benefits with respect to observability that CS can
provide to joint estimators. Future work will investigate different types of input
projections (e.g., wavelets) for the estimation of distributed inputs.
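The statement that one 2D Fourier component consists of four nonzero elements can be checked with the following minimal Python sketch. It builds a load with two sine cycles in time and one in space and counts the nonzero coefficients of a plain 2D DFT. The grid of 10 time samples by 8 spatial points, the unit amplitude and the function name are assumptions made for illustration.

```python
import cmath
import math

N_T, N_X = 10, 8  # assumed grid: 10 time samples x 8 spatial points

# Hypothetical unit-amplitude load: two sine cycles in time, one in space.
load = [[math.sin(2 * math.pi * 2 * k / N_T) * math.sin(2 * math.pi * m / N_X)
         for m in range(N_X)] for k in range(N_T)]

def dft2(x):
    """Plain (unscaled) 2D DFT of a 2D list, evaluated directly from the definition."""
    rows, cols = len(x), len(x[0])
    return [[sum(x[k][m] * cmath.exp(-2j * math.pi * (p * k / rows + q * m / cols))
                 for k in range(rows) for m in range(cols))
             for q in range(cols)] for p in range(rows)]

alpha = dft2(load)
peak = max(abs(c) for row in alpha for c in row)

# Indices (time bin, space bin) of the coefficients that are numerically nonzero.
active = sorted((p, q) for p, row in enumerate(alpha)
                for q, c in enumerate(row) if abs(c) > 1e-9 * peak)
print(active)
```

The four active entries are the two conjugate pairs that result from the product of a sine in time and a sine in space.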
Figure C2.7: Reference signal (—–×), overlapped by a surface (small values
correspond to a dark colour). The vertical planes indicate the estimation
window. [Axes: F [N] versus x [m] and k.]
C2.3 Experimental estimation of a periodic load in time
In order to experimentally test the CS-MHE with a periodic load, we designed

and built an ad hoc set-up. In this section we briefly introduce the test case
(section C2.3.1) and we present the experimental results (section C2.3.2). We
collected all details about the set-up in appendix 2, including the procedure that
we followed to build the model to use within the CS-MHE, and the measurement
system. Appendix 2 is rather extensive, since it involves several topics, i.e.,
the test set-up, its related finite element model (updated with measurements
through an experimental modal analysis) and the measurements. Since we chose
to operate with camera based measurements (cf. section B2.5), in appendix 2
we also describe the digital image processing steps that we took to transform
pixel information into displacement values. This section includes the results
that we presented in [106].
C2.3.1 Test case description
The test set-up consists of a 1 m long aluminium beam clamped on each side to
a vertical mount, which is fixed to the ground (Fig. C2.9). Nine steel masses
are attached to the beam every ∆x = 0.1 m with the aim of lowering the
eigenfrequencies and adding complexity to the system. We built a finite element
(FE) model and we updated it based on a series of experimental modal analyses
[Figure C2.8 panels: MPFpos,n ± 3σn and MPFvel,n ± 3σn versus k; ℜ(α) ± 3σ and ℑ(α) ± 3σ versus Γ[x] and Γ[k]; uk [N] versus x [m] and k.]
Figure C2.8: Results at iteration 1. Legend (state estimation): 1st mode (—–);
2nd mode (- - -); 3rd mode (· · ·). The thick green lines are the reference, the
thick blue lines are the CS-MHE estimation, confined into two thin blue lines
that represent the confidence level (MPFn ± 3σn). Legend (input estimation):
reference values (- - -×); CS-MHE estimation (—–◦); confidence bands on
the Fourier components (—–). The nonzero components are marked by a
solid circle. Γ[x] and Γ[k] refer to the Fourier components in space and time,
respectively.
Figure C2.9: Beam set-up (with two uniaxial accelerometers at x = 0.750 m).
(EMAs). For the experiments that we present in this section, three eigenmodes
govern the dynamical behaviour of the system, which we report in Table C2.1
and Fig. C2.10. These dominate the structure response due to the orientation
of the external force. In fact, we attached a shaker to the structure such that
it applies a force along the z axis, and these modes belong to plane xz (the
axis orientation follows the notation in Fig. C2.11). The ID numbers 1, 3, 8 in
Table C2.1 and Fig. C2.10 derive from the whole mode set that we present in
appendix 2.
In order to run the CS-MHE, we need the model of the set-up to be available in
MATLAB®. We tackled this problem by extracting the FE mass and stiffness
matrices and projecting them on modal coordinates [78]. This allows us to build a
reduced order state-space model that takes into account only the eigenmodes
under examination, and to operate in the same way as we did throughout this
dissertation. Besides the state-space representation, in this section we employ
camera based measurements to test the CS-MHE. The following list reports the
main hardware of the experimental set-up (see Fig. C2.12), and we refer again
to appendix 2 for further details.
• Three adhesive paper strips on the beam, each of them resulting in 53
markers equally distributed on a length of 0.260 m, i.e., a marker every
0.005 m. Due to the lighting conditions, we employ only the central strip
(“BEAM STRIP 2” in Fig. C2.12). This corresponds to having up to 53
contactless displacement transducers spanning the range from x = 0.370 m
to x = 0.630 m.
• One shaker, located at x = 0.750 m and acting along the z axis (THE
MODAL SHOP Miniature Inertial Shaker K2002E01 [188]).
Table C2.1: Model update for the first three beam modes in plane xz.
Mode ID   Freq. EMA [Hz]   Freq. MoUp [Hz]   Error [%]
1         25.74            25.79             0.19
3         68.93            69.19             0.38
8         134.71           136.00            0.96
Figure C2.10: Mode shapes of the first three beam eigenmodes in plane xz (Mode 1, Mode 3, Mode 8).
Figure C2.11: Reference system of the beam set-up.
Figure C2.12: Beam set-up.
• One impedance head, also located at x = 0.750 m and measuring the
dynamical force along the z axis (PCB ICP® 288D01 [144]). It measures
the force entering the system for validation purposes.
• One high-speed camera (Ximea MQ042CG-CM [201]).
• Additional hardware for data acquisition, i.e., an LMS SCADAS [180], a
circuit to synchronise the camera with the impedance head, and a PC.
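The projection on modal coordinates mentioned before the hardware list can be sketched as follows. This is a minimal Python illustration on a hypothetical 2-DOF system, not the FE model of the beam: the matrices, the damping ratio, the force and sensor locations and all function names are assumptions. The mode shapes are assumed to be mass-normalised, so that the projected mass matrix is the identity and the projected stiffness matrix is diagonal with the squared angular eigenfrequencies.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Hypothetical 2-DOF system with mass-normalised mode shapes as columns of Phi.
M = [[1.0, 0.0], [0.0, 1.0]]
K = [[2.0, -1.0], [-1.0, 2.0]]
r = 1.0 / math.sqrt(2.0)
Phi = [[r, r], [r, -r]]

# Projection on modal coordinates: identity mass and diagonal stiffness.
M_modal = matmul(matmul(transpose(Phi), M), Phi)
K_modal = matmul(matmul(transpose(Phi), K), Phi)

def modal_state_space(omega, zeta, phi_force, phi_sensor):
    """State-space matrices for x = [q; q_dot], one input force and
    displacement outputs, keeping only the selected modes."""
    n = len(omega)
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    B = [[0.0] for _ in range(2 * n)]
    for i in range(n):
        A[i][n + i] = 1.0
        A[n + i][i] = -omega[i] ** 2
        A[n + i][n + i] = -2.0 * zeta[i] * omega[i]
        B[n + i][0] = phi_force[i]
    C = [list(row) + [0.0] * n for row in phi_sensor]
    return A, B, C

kept = [0]                                    # retain the first mode only
omega = [math.sqrt(K_modal[i][i]) for i in kept]
zeta = [0.01 for _ in kept]                   # hypothetical damping ratio
phi_force = [Phi[1][i] for i in kept]         # force applied on DOF 2
phi_sensor = [[Phi[0][i] for i in kept]]      # displacement sensor on DOF 1
A, B, C = modal_state_space(omega, zeta, phi_force, phi_sensor)
```

Keeping three modes in the same way would give a model with six states, consistent with nx = 6 used in this chapter.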
C2.3.2 Experimental results
In this section we indicate the settings of the experiment as well as the values
for the CS-MHE tuning parameters. Next, we show the experimental results.
We will be rather concise since we already discussed how to choose the tuning
parameters εQ, εR and λ in chapter C1, and we presented a numerical example
with a 1D Fourier dictionary in section C2.1. Table C2.2 includes the values
for a first experiment that involves the estimation of a force composed of a
single Fourier component. Later in this section we will consider an increasing
number of components. We begin by showing two cases that differ in the
number of measurements (nr) and consequently in other tuning parameters.
The first run uses all 53 markers available in the central strip, while for the
second run we employ only 3 markers, located at x = 0.370 m, x = 0.500 m
Table C2.2: Parameters of the experimental test case.
Parameter   Value   Parameter   Value   Parameter   Value
N           17      ζ1          0.15    εQ          1.05
nx          6       ζ3          0.15    εR          4·10⁻⁴ mm²
nm          1       ζ8          0.10    εα          1
and x = 0.630 m. A different number of sensors requires recalibrating λ and
Qdrift, which we indicate in the captions of Figs. C2.14 and C2.16 for the two
runs, respectively. In fact, λ needs to be scaled with all available second order
information that enters the cost function (cf. section C1.1.1).
We chose the weights for the covariances associated with model and measurements
(εQ and εR in Table C2.2) through a simulation similar to those that led to
Figs. C1.5, C1.6 and C1.16 in chapter C1. In particular, the order of magnitude
of εR corresponds to the standard deviation that we can obtain by looking at
the measurements. Unfortunately we did not have enough measurements of
the structure at rest to perform a rigorous statistical analysis on each point,
and consequently we decided to assign the same uncertainty to each marker of
the central strip. We can visualise this aspect by considering the whole set of
measurements at an arbitrary time step such as the top graph in Fig. C2.13. We
can fit the values of the central strip with a smooth (second order) curve, remove
such trend from the signal and then compute mean and standard deviation of the
resulting signal, as in the bottom graph of Fig. C2.13. This gives σ = 0.016 mm
and consequently σ² ≡ εR = 2.57·10⁻⁴ mm², which is slightly lower than the
value in Table C2.2. However, employing this value as εR did not give the
smallest error in the input estimation, while a higher value εR = 4·10⁻⁴ mm²
performed better. This is due to the fact that each marker has a different
uncertainty (cf. section C1.1.1) and the measurements are not zero mean. We
could notice this through a comparison with the readings of a laser Doppler
vibrometer.
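The detrending step described above (second order fit of the central strip, removal of the trend, standard deviation of the residual) can be sketched in Python as follows. The marker positions follow the text (53 markers from 370 mm to 630 mm), whereas the displacement values are synthetic: the trend coefficients, the noise level used to generate the data and all function names are assumptions made for illustration.

```python
import random
import statistics

def fit_quadratic(u, z):
    """Least-squares fit z ~ c0 + c1*u + c2*u^2 via the 3x3 normal equations."""
    s = [sum(ui**p for ui in u) for p in range(5)]
    t = [sum(zi * ui**p for ui, zi in zip(u, z)) for p in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(3):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [ar - f * ac for ar, ac in zip(a[row], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def detrended_std(x_mm, z_mm):
    """Standard deviation of z after removing a second order trend in x."""
    centre = 0.5 * (max(x_mm) + min(x_mm))
    half_span = 0.5 * (max(x_mm) - min(x_mm))
    u = [(x - centre) / half_span for x in x_mm]  # scaled to [-1, 1] for conditioning
    c = fit_quadratic(u, z_mm)
    residual = [zi - (c[0] + c[1] * ui + c[2] * ui**2) for ui, zi in zip(u, z_mm)]
    return statistics.stdev(residual)

def trend(x):
    """Hypothetical smooth beam deflection [mm] along the central strip."""
    u = (x - 500.0) / 130.0
    return 0.05 - 0.02 * u + 0.03 * u**2

x_mm = [370.0 + 5.0 * i for i in range(53)]       # 53 markers, one every 5 mm
rng = random.Random(0)
z_clean = [trend(x) for x in x_mm]
z_noisy = [z + rng.gauss(0.0, 0.016) for z in z_clean]  # synthetic noise [mm]

sigma_clean = detrended_std(x_mm, z_clean)
sigma_noisy = detrended_std(x_mm, z_noisy)
eps_R = sigma_noisy**2  # variance estimate [mm^2]
```

With real data, the resulting variance would only be a starting point for εR, since the text above reports that a somewhat higher value performed better.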
The experimental results that follow involve a sinusoidal load with a frequency
of 128 Hz, applied at x = 0.750 m and sampled at 512 Hz (512 fps), i.e., there
are 4 samples per period. This corresponds to twice the Nyquist-Shannon
rate, but may be a very challenging frame rate for an estimator that employs
a random walk model, due to the relatively large amplitude difference between
two consecutive time steps (cf. section A1.7.2). Once again, this indicates
that the CS-MHE can be instrumental for the estimation of inputs which are
characterised by fast dynamics. Fig. C2.14 shows the CS-MHE results for
the first iteration using 53 displacement sensors, i.e., the whole central strip
Figure C2.13: Note concerning the choice of εR. Top graph: displacements of
the beam during an arbitrary time step (—–) and second order fitting of the
central strip (- - -); the signal corresponds to Fig. 14 in appendix 2. Bottom
graph: zoom on the central strip after removing the fitted curve (—–), including
its mean (- - -) and standard deviation (- · -). In particular, σ = 0.016 mm.
[Axes: z [mm] versus x [mm].]
of markers that had the most light (“BEAM STRIP 2” in Fig. C2.12). We
can see that the input estimation (solid blue) closely follows the measurements
acquired through the impedance head (dashed green; see footnote C2). The third eigenmode
(mode ID 8 in Table 5) dominates the beam response, which is not surprising since this
eigenmode is the closest to the excitation frequency. Fig. C2.15 shows the MSE
of the input estimation with and without Qdrift. We notice that the presence
of Qdrift does not always lower the MSE, but it smooths out its variability.
Lower values of Qdrift led to a lower estimation accuracy, whereas in general
we expect Qdrift to help the CS-MHE by providing a better guess for the next
iteration. We believe that this behaviour is connected to all covariances that
take part in the optimisation as well as to possible numerical issues, and future
research will investigate this aspect. Furthermore, we note that the MSE in
Fig. C2.15 oscillates considerably, and this is caused by the fact that the MSE
Footnote C2: For the experimental results in this chapter, the green curves that refer to the state
estimation (left hand side graphs) are not reference values. We obtained them by evaluating
the states with the model and the measured force, and thus they indicate the model accuracy
related to that specific load. As for the input estimation (middle and right hand side graphs),
the impedance head does not measure any static force. The graphs show the total force
Ftotal = Fstatic + Fdynamic applied by the shaker, where Fdynamic is given by the impedance
head and Fstatic is given by the total mass of the assembly that includes shaker, impedance
head and connection screws, multiplied by gravity, i.e., Fstatic = 0.296 kg · 9.81 m/s² = 2.9 N.
Figure C2.14: Results at iteration 1, obtained by nr = 53 displacement sensors,
λ = 4.22 and Qdrift = 150 N². Legend: see Fig. C2.2.
is very sensitive to small phase errors. This fact could also be related to the
choice of Qdrift .
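The sensitivity of the MSE to small phase errors can be illustrated with a minimal Python sketch. It compares a reference sinusoid with an estimate that has the correct amplitude but a phase error φ, sampled at 4 samples per period as in the experiment. Over an integer number of periods the MSE equals A²(1 − cos φ). The amplitude and the phase errors below are hypothetical values, and the function name is ours.

```python
import math

def mse_phase_error(amplitude, phase_error_rad, samples_per_period=4, n_periods=4):
    """MSE between A*sin(theta) and A*sin(theta + phi) over whole periods."""
    n = samples_per_period * n_periods
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / samples_per_period
        reference = amplitude * math.sin(theta)
        estimate = amplitude * math.sin(theta + phase_error_rad)
        total += (estimate - reference) ** 2
    return total / n

A = 10.0  # hypothetical force amplitude [N]
mse_5deg = mse_phase_error(A, math.radians(5.0))
mse_10deg = mse_phase_error(A, math.radians(10.0))
closed_form_10deg = A**2 * (1.0 - math.cos(math.radians(10.0)))
```

For small φ the MSE grows approximately as A²φ²/2, so doubling the phase error roughly quadruples the MSE even when the amplitude is estimated exactly.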
Next, Figs. C2.16 and C2.18 show the results (first iteration and MSE,
respectively) for the run with three measurement points at x = 0.370 m,
x = 0.500 m, x = 0.630 m. From Fig. C2.16 we note wider confidence intervals
(±3σ bands) in comparison with Fig. C2.14, where we employed 53 measurement
points. The MSE in Fig. C2.18 is higher than in Fig. C2.15, but the force
estimation is still very accurate. This graph does not include any drift term
Figure C2.15: MSE of the input estimation, obtained by considering 53
displacement sensors. [Axes: MSE [N²] versus iteration ID; curves: Qdrift = 150 N² and Qdrift = 0 N².]
since it did not improve the results. As already mentioned, this
aspect requires further investigation.
After showing some results for the cases with 53 and 3 measurement points, let
us now generalise the discussion by investigating the influence of the number of
transducers (nr ) on the results. First, Fig. C2.17 shows how the value of the
balancing weight λ changes with nr . We note that this is particularly relevant if
we want to keep nr low. The curve is not smooth due to the discrete values of λ
during tuning (cf. section C1.1.1). Moreover, we assumed the same covariance
for every measurement point, which is an approximation (it is possible that
every point has a different uncertainty due to the quality of the marker detection,
and taking this into consideration could lead to a smoother curve).
Next, from top to bottom Fig. C2.19 shows the averaged standard deviation (σ)
of the nonzero components of α, obtained by the square root of the diagonal
values of the covariance matrix that refers to the slack variable (cf. Fig. C2.5),
the standard deviation of the fifth element of the Fourier series (cf. Figs. C2.14
and C2.16) and the MSE of the input estimation, respectively. By comparing
the first two graphs we notice that the averaged values are lower than the single
component that refers to the active sinusoid. This happens because the DC
component has a lower uncertainty. Such behaviour is stronger for nr = 1, 2
due to the fact that there are more nonzero components (the system is not
observable, and extra regularisation is needed to compute the covariance matrix,
cf. section B2.1). This explains also why the first two points do not follow
the smooth curve of all other points. Furthermore, we note that σ decreases
as the number of transducers increases, but adding more than a certain
number of transducers (approximately 8) does not result in a strong decrease of
Figure C2.16: Results at iteration 1, obtained by nr = 3 displacement sensors,
λ = 0.38 and no Qdrift. Legend: see Fig. C2.2.
Figure C2.17: Choice of λ for different numbers of measurement points (nr).
[Axes: log10(λ) versus nr.]
Figure C2.18: MSE of the input estimation, obtained by considering 3
displacement sensors. [Axes: MSE [N²] versus iteration ID.]
Figure C2.19: Averaged standard deviation from the covariance associated to
the nonzero components of the slack variable (top), standard deviation of the
5th element of the DFT (middle) and MSE of the input estimation (bottom),
as a function of the number of measurement points (nr).
[Curves: iteration 1, iteration 2, iteration 3.]
uncertainty. In the bottom graph we see that the smooth decrease in uncertainty
does not correspond to better results, since the MSE tends to stay constant
(apart from the unobservable cases with nr = 1, 2). Lastly, we note that
σ corresponding to the first iteration is higher than for the second and third
iterations, due to the availability of the arrival cost in the latter cases. On
the other hand, we do not notice any similar trend for the MSE of the input
estimation, whose oscillating behaviour is mostly due to small phase errors. The
values of σ reported in Fig. C2.19 are scaled according to the chosen Fourier
transform convention (i.e., by 1/√(N − 1), where N − 1 is the size of the Fourier
spectrum, which differs by 1 from the window length N).
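To make the scaling convention concrete, the following minimal Python sketch applies a DFT scaled by 1/√(N − 1) to a signal of N − 1 = 16 samples containing a static offset and a sinusoid with 4 samples per period. We assume here that the convention corresponds to the unitary DFT, which preserves the signal energy. The sinusoid amplitude of 5 N is hypothetical, while the offset of 2.9 N is the static force reported in footnote C2.

```python
import cmath
import math

def unitary_dft(x):
    """DFT scaled by 1/sqrt(n), so that the signal energy is preserved."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x[k] * cmath.exp(-2j * math.pi * p * k / n) for k in range(n))
            for p in range(n)]

N = 17       # window length
n = N - 1    # size of the Fourier spectrum
# Static offset plus a sinusoid on bin 4 (4 samples per period); 5 N is hypothetical.
u = [2.9 + 5.0 * math.sin(2.0 * math.pi * 4 * k / n) for k in range(n)]

alpha = unitary_dft(u)
energy_time = sum(v * v for v in u)
energy_freq = sum(abs(a) ** 2 for a in alpha)
```

With this scaling, a sinusoid of amplitude A appears as two conjugate peaks of magnitude A√(N − 1)/2 (the fifth element of the spectrum and its mirror), and the DC component as a single peak of magnitude 2.9·√(N − 1).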
During a further measurement campaign, we investigated the influence of an
increasing number of sinusoids. Table C2.3 shows the different settings for
the tests. The notation “min(nr)” in the third column refers to the observability
requirement. We set the same amplitude for each sinusoidal component of the
input (see footnote C3). However, the dynamic response of the system composed of shaker
and structure acted as a filter, with a low-pass tendency. In comparison with
Fig. C2.16 (and previous analogous illustrations), the figures that follow contain
an extra graph (middle-right) with the amplitude of the Fourier components
(abs(α)), whose uncertainty bands are calculated following Eq. (C2). This
allows us to visualise the frequency dependent filtering behaviour of the assembly
composed of the structure and the shaker in response to the same amplitude
for each Fourier component of the load (green crosses in the middle-right graphs
of Figs. C2.20–C2.30). From the fourth column of Table C2.3 we see that the
window length varies with the number of sinusoids. We made this choice a
posteriori by choosing the shortest window that generates sufficient sparsity and
guarantees that all frequencies are modelled, in order to avoid spectral leakage
that would degrade the sparsity level (i.e., S increases). For the case of four
sines we considered a few window lengths. We recall that a single sinusoid
consists of two complex conjugate peaks, and a further nonzero element is due
to the static weight of the shaker (DC component).
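The sparsity count described above (two conjugate peaks per sinusoid plus one DC element) and the effect of a window that does not model all frequencies can be illustrated with the following minimal Python sketch. It uses the 512 Hz sampling rate and the frequencies of Table C2.3, and it assumes a spectrum of N − 1 samples, unit amplitudes and a static offset of 2.9 N. The function names and the threshold are hypothetical.

```python
import cmath
import math

FS = 512.0  # sampling rate [Hz], as in the experiment

def dft(x):
    """Plain DFT evaluated directly from the definition."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * p * k / n) for k in range(n))
            for p in range(n)]

def count_active(freqs_hz, n_samples, dc=2.9, rel_tol=1e-9):
    """Number of DFT coefficients above a relative threshold for a static
    offset plus unit-amplitude sines (hypothetical amplitudes)."""
    x = [dc + sum(math.sin(2.0 * math.pi * f * k / FS) for f in freqs_hz)
         for k in range(n_samples)]
    magnitudes = [abs(c) for c in dft(x)]
    return sum(1 for m in magnitudes if m > rel_tol * max(magnitudes))

four_sines = [64.0, 128.0, 160.0, 192.0]
s_one_sine = count_active([128.0], 16)     # N = 17: all frequencies on a bin
s_four_sines = count_active(four_sines, 32)  # N = 33: all frequencies on a bin
s_leakage = count_active(four_sines, 20)   # 25.6 Hz bins: 64, 160, 192 Hz off-bin
```

When every frequency is a multiple of the bin spacing FS/(N − 1), the count equals 2·(number of sines) + 1, which matches the min(nr) column of Table C2.3. With an unsuitable window length the energy leaks into many more coefficients.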
Let us now go through the estimation results. First, Fig. C2.20 refers to one
sinusoid, and the results are in line with the previous experiment (cf. Fig. C2.16).
Next, Fig. C2.21 shows two sinusoids. We did not manage to estimate both
components accurately while keeping the same window length as in the previous case, due to
the deteriorated sparsity resulting from the extra component. Consequently,
we chose N = 33, this being the next window length that includes the two
frequencies indicated in Table C2.3. To get an idea of the sparsity
levels we can look back at Eq. (A120), which suggests that S = 5 is quite high
for N = 17. It is worth mentioning that in general every combination
Footnote C3: We built custom audio files (wav files, taking care not to generate any signal clipping),
which LMS Test.Lab [181] passed to the shaker through an LMS SCADAS [180].
Table C2.3: Settings for the experiment with multiple sines.
# sines   Freq. [Hz]          min(nr)   N
1         128                 3         17
2         64; 128             5         33
3         64; 128; 192        7         33
4         64; 128; 160; 192   9         33; 49; 65; 81
of number of sensors nr and sparsity level S requires a different tuning, and
this applies especially to the balancing weight λ. Accordingly, we computed
every scenario by using a value obtained through simulations similar to what
we discussed in section C1.1.1 (see also Fig. C2.17).
Adding a third component produces the results of Figs. C2.22 and C2.23, where
the importance of a good tuning becomes clear. In Fig. C2.22 we notice that the
third component is not being captured, which may suggest that the window is
too short. However, we can detect all components by keeping the same window
length and adding measurement points (Fig. C2.23, where we also see that
the highest component has a certain phase error and a significant uncertainty).
This encourages a few considerations, which we report in the following list:
• In case we want a minimum number of measurement points (governed
by observability requirements), we may need to rely on the results in
Fig. C2.22. These offer the smallest MSE for the given set of tuning
parameters. The third component has a small amplitude in comparison
with the other components, and it is likely that the filter treats it as noise.
In case the third component is crucial for a specific application, we could
try to fine-tune all parameters or alternatively choose a lower λ in
order to relax the weight on the sparsity requirement, at the price of a
possibly higher MSE.
• In case the number of measurement points does not represent a rigid
constraint, then adding measurements may be the way to proceed, as we
pointed out in Fig. C2.23. In this context, it is important to underline
that adding measurements also comes with additional noise, which clearly
reflects on the covariance of the optimisation problem. In other words,
adding measurements does not always lead to better results, since they may
deteriorate the matrix conditioning. This aspect plays an important role
with our camera based measurements, where each point offers a different
uncertainty due to the quality of the marker detection (that involves the
Figure C2.20: Results at iteration 1 for one sinusoid, obtained by considering 3
displacement sensors and a window of 17 time steps. Legend: see Fig. C2.2.
marker itself in terms of typology, condition, positioning and light level),
which we assumed to be the same for each marker. We can see how this
affects the numerical stability of the covariance matrix by looking at the
wide uncertainty bands of the third component in Fig. C2.23. Concerning
this aspect, we believe that a rigorous noise assessment of the camera
measurement system can improve the estimation results.
Figure C2.21: Results at iteration 1 for two sinusoids, obtained by considering
5 displacement sensors and a window of 33 time steps. Legend: see Fig. C2.2.
Finally, Figs. C2.24–C2.30 illustrate the results with four Fourier components
for different window lengths (N = 33, 49, 65, 81) and numbers of measurement
points (nr = 9, 15, 50). To obtain these graphs we lowered λ, so as to avoid the
problems that we already mentioned with regard to Fig. C2.22, using as starting
points the values that correspond to the minimum MSE (cf. section C1.1.1). We
are aware that the computational effort increases substantially going from
N = 33 to N = 81, but here we want to explore the possible benefits that
Figure C2.22: Results at iteration 1 for three sinusoids, obtained by considering
a longer window would induce. By considering 33 time steps as in the case
of two and three Fourier components (cf. Figs. C2.21–C2.23), in Fig. C2.24
we notice that with 9 sensors the CS-MHE does not capture the component
with the lowest amplitude, whereas its neighbouring frequencies become active
instead (see footnote C4). Having 15 sensors, as in Fig. C2.25, overcomes this issue. This
Footnote C4: A few low frequency components become active as well. We can limit their number by
scaling the threshold on the active components (εα) with the window length.
Figure C2.23: Results at iteration 1 for three sinusoids, obtained by considering
15 displacement sensors and a window of 33 time steps. The ± 3σ interval that
falls outside the graph equals approximately ± 30. Legend: see Fig. C2.2.
tendency remains (to different extents) also in case of a window of 49 time steps
(Figs. C2.26–C2.27) and of 65 time steps (Figs. C2.28–C2.29). A last example
involves 81 time steps and 50 measurement points (Fig. C2.30).
We performed the exercise of an increasing number of Fourier components in
order to investigate the requirements and limits of the CS-MHE in terms of
number of measurements, window length (and consequently number of basis
Figure C2.24: Results at iteration 1 for four sinusoids, obtained by considering
functions in the dictionary) and tuning parameters (covariances and weight of
the CS term) for an increasing sparsity. We obtained good estimation results,
but at the same time we are not yet able to identify clear relations that would
make the CS-MHE a mature technology. Table C2.4 shows the settings and the
main results for the whole experiment. In the second-last column we indicate
an error in time domain (i.e., after the inverse Fourier transform), averaged
within the estimation window and computed as percentage with respect to
the peak-to-peak amplitude that results from the sum of all active Fourier
components. It is not a rigorous metric since a small phase discrepancy can
have a big influence, but it helps to recognise that once the window is long
enough for a certain sparsity it is not worth extending it further. Moreover, we
expect the error to decrease after a few iterations of the CS-MHE.
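The time-domain error metric described above can be sketched as follows. The text does not state the exact formula, so this minimal Python example assumes a mean absolute error within the window, expressed as a percentage of the peak-to-peak amplitude of the reference signal. The function name and the sample values are hypothetical.

```python
def window_error_percent(reference, estimate):
    """Mean absolute error within the window, as a percentage of the
    peak-to-peak amplitude of the reference (assumed definition)."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("signals must be non-empty and of equal length")
    peak_to_peak = max(reference) - min(reference)
    if peak_to_peak == 0.0:
        raise ValueError("reference signal has zero peak-to-peak amplitude")
    mean_abs_error = sum(abs(e - r) for r, e in zip(reference, estimate)) / len(reference)
    return 100.0 * mean_abs_error / peak_to_peak

# Hypothetical window: one period sampled 4 times, estimate offset by 0.1.
reference = [0.0, 1.0, 0.0, -1.0]
estimate = [r + 0.1 for r in reference]
error = window_error_percent(reference, estimate)
```

Normalising by the peak-to-peak amplitude makes cases with a different number of sinusoids comparable, although, as noted above, a small phase discrepancy can still dominate the result.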
The model error has a significant effect on the experimental results, especially
in relation to the input, i.e., the accuracy of the modal participation factors
is a key element for the force estimation. Different Fourier components at
different frequencies do not have the same weight in the estimation, since their
amplitude is scaled according to the dynamics of the system. Furthermore,
different Fourier components excite the structure eigenmodes in a different way.
Each eigenmode has a certain modelling error, resulting in the need to vary
the covariance associated to the model in order for the filter to perform better.
Accordingly, a comparison that takes into account only the number of Fourier
components can be misleading, since their magnitude and frequency (connected
to a certain modelling error) play an important role. In addition, a poorly
modelled damping (which is likely to happen in practice) affects the dynamic
response in the neighbourhood of the eigenmodes much more strongly than in
other portions of the spectrum. In other words, a model may turn out to be less
accurate in case of a broadband excitation. All these statements are typical
considerations in model based estimation problems, where in general a good
model improves the estimation accuracy.

Several factors determine the accuracy of the measurements. These include
systematic as well as stochastic phenomena. First, the light has a strong
influence on the measurement error. Furthermore, the quality of the markers,
the accuracy of their locations and the chosen image processing algorithm cause
systematic errors. At the same time, the stochastic behaviour that is typical
of any measurement system is present. All uncertainties are also linked to the
hardware, and in particular to the camera location, resolution, quantisation and
optics. Consequently, different measurement points have a different accuracy.
Some points add less noise to the problem than others, and thus an increase
in the amount of measurements may result in an improvement as well as a
deterioration of the estimates. This aspect links to numerical stability and
constitutes an interesting open point for future research.
The size of a Fourier dictionary depends on the length of the estimation window,
within which the input should be as sparse as possible due to compressive
sensing and observability requirements. This raises the question whether a
relatively long window (static or possibly shifting in time by a whole window
length) would outperform a more traditional MHE scheme as far as the accuracy
of the input estimation is concerned (by a traditional MHE scheme we mean
an MHE that aims at a short window for computational speed, shifting in time
by one time step and relying on an accurate arrival cost). In the experiments of
Table C2.4: Settings for the experiment with multiple sines.
# sines   N    nr   λ      εQ    εR       MSE [N²]   Error [%]   Fig.
1         17   3    0.34   2.0   2·10⁻⁴   0.01       0.6         C2.20
2         33   5    0.17   1.9   1·10⁻³   0.89       4.1         C2.21
3         33   7    0.17   0.3   1·10⁻³   1.06       5.2         C2.22
3         33   15   0.21   0.3   1·10⁻³   0.90       4.8         C2.23
4         33   9    0.04   0.3   1·10⁻³   1.67       6.7         C2.24
4         33   15   0.07   0.3   1·10⁻³   1.75       6.8         C2.25
4         49   9    0.05   0.3   1·10⁻³   1.14       5.5         C2.26
4         49   15   0.09   0.3   1·10⁻³   1.97       7.2         C2.27
4         65   9    0.08   0.3   1·10⁻³   1.27       5.8         C2.28
4         65   15   0.12   0.3   1·10⁻³   1.90       7.1         C2.29
4         81   50   0.14   0.3   1·10⁻³   2.69       8.5         C2.30
this section we did not vary the window length by unit steps, so as not to introduce
spectral leakage, i.e., we always made sure that the frequencies of all active
Fourier components of the input were modelled within the window. Current
research is investigating a few approaches to cope with possible spectral leakage.
C2.4 Conclusions
In this chapter we showed a few numerical as well as experimental examples
in which we employed a Fourier dictionary in order to model a periodic load.
We chose a linear model, but the procedure can be extended to nonlinear
systems provided that the state-space matrices are adapted accordingly. The
numerical examples served as a proof of concept of the formulas that we derived
in chapter B1 (cf. sections B1.3–B1.4). In particular, in the first numerical
example we employed the CS-MHE for the estimation of a periodic input,
described by Fourier components and applied at a known location on an LTI
mechanical system. The main outcome is that the CS-MHE is capable of handling
complex dictionaries, paving the way for estimating distributed loads, which we
briefly discussed in the second numerical example. Possible applications include
rotating machinery where loads have a quasi-periodic nature.
The experimental example involved the topics of FE modelling, modal analysis,
model update, model order reduction and high-speed camera measurements.
Appendix 2 includes all details regarding the steps we went through, from
building a sufficiently accurate model of a mechanical system up to running the
CS-MHE for joint state/input estimation. For this reason, in this chapter we
focused on the results. The experiment showed an accurate input estimation
and at the same time it helped to identify a few open points for future research.
These involve additional considerations of the drift term in relation to the
other uncertainties, and different methodologies in camera based high-speed
recordings for structural dynamics applications. Finally, we presented a series of
experiments with an increasing number of Fourier components, in order to assess
the feasibility domain of the CS-MHE in terms of number of measurement points,
window length and system calibration needed in relation to sparsity. Future
work will further develop this topic with the final aim of finding mathematical
relationships that can bring the CS-MHE to a more mature technological stage.
Conclusions and outlook
In this dissertation we reported the development process of the compressive

sensing–moving horizon estimator (CS-MHE) for the joint estimation of states
and inputs. We started by describing the advantages that such a tool can bring
to the current industry trend, we carried on by outlining the state of the art
that constitutes the foundation of the methodology, we derived the CS-MHE
formulas and finally we introduced a few numerical and experimental test cases.
The CS-MHE is a methodology for joint state/input estimation which we
developed in order to cope with the observability issues which are typical
of estimators that need to assess multiple forces, as well as to allow for the
estimation of inputs characterised by fast dynamics such as force impacts.
The CS-MHE takes advantage of the moving horizon estimator (MHE) for
correlating a model with measurements and minimising their uncertainties in a
given time window, while an ℓ1-norm term allows the estimation of an input signal
described by a small set of basis functions, which is a well-known principle of
compressive sensing (CS). This latter feature is particularly interesting because
it allows us to exploit additional known information about the input. Other state
of the art approaches for input estimation employ a random walk model, which
relies on a strong assumption concerning the dynamic range of the input, i.e.,
the input does not vary too much. On the other hand, the CS-MHE requires
that the input has a sparse representation in some basis. Sparsity is a property
that characterises many signals and in the context of force modelling it allows
for an efficient representation of high dynamic ranges.
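To make the role of the ℓ1-norm term concrete, the sketch below recovers a single impact from a toy response using ℓ1-regularised least squares. It is an illustration only: it uses the iterative soft-thresholding algorithm (ISTA) on a static problem rather than the moving horizon formulation, and the response matrix `phi`, the decay factor 0.8, the impact amplitude and the weight `lam` are all hypothetical.

```python
import math


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry towards zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def ista(phi, y, lam, iterations=2000):
    """Minimise 0.5*||y - phi a||_2^2 + lam*||a||_1 by iterative soft thresholding."""
    phi_t = transpose(phi)
    # Squared Frobenius norm: a cheap upper bound on the largest eigenvalue of phi^T phi.
    lipschitz = sum(x * x for row in phi for x in row)
    a = [0.0] * len(phi[0])
    for _ in range(iterations):
        residual = [yi - pi for yi, pi in zip(y, matvec(phi, a))]
        grad_step = [ai + gi / lipschitz for ai, gi in zip(a, matvec(phi_t, residual))]
        a = soft_threshold(grad_step, lam / lipschitz)
    return a


# Toy response matrix: column j is a decaying impulse response starting at sample j.
n = 8
phi = [[0.8 ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)]
alpha_true = [0.0] * n
alpha_true[3] = 5.0  # a single impact at sample 3 (Dirac-like dictionary)
y = matvec(phi, alpha_true)

alpha_hat = ista(phi, y, lam=0.1)
print([round(a, 3) for a in alpha_hat])
```

The estimate keeps one active coefficient at the impact location, slightly shrunk by the penalty, while the remaining coefficients are driven to zero. A random walk model would instead have to follow the jump sample by sample.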
In the framework of virtual sensing (for condition monitoring, measurement
systems, control engineering or smart processes and products) the CS-MHE
offers the possibility to use accurate force data for several purposes, such
as system identification, (contactless) force measurements, accurate product
life-cycle prediction and monitoring, efficient maintenance prediction, process
optimisation. This extra knowledge can result in a reduction of costs, power
consumption and emissions, and improved safety and reliability.
In this dissertation we showed that the CS-MHE outperforms other state of
the art estimators as far as observability and dynamic range are concerned. In
particular, we showed an example of impact detection which fails if a single step
estimator such as the extended Kalman filter (EKF) is employed in conjunction
with a few random walk models. Furthermore, we discussed the implementation
of complex input representations such as the Fourier series in order to model
a periodic signal in time and/or in space. This can bring great advantages in
rotating machinery where the availability of force and torque measurements
can improve the system performance, but in practice a direct measure of force
and torque is often not possible due to economic or mechanical (and safety)
constraints.
The CS-MHE requires an accurate model, a certain a priori knowledge of the
input shape, the tuning of a few parameters and a certain computational power,
since each iteration requires the solution of a constrained convex optimisation
problem (quadratic program or second order cone program). Computational
requirements constitute the major drawback of the MHE over the EKF, and this
disadvantage remains also for the CS-MHE. However, new efficient algorithms
are being developed for the MHE by the communities of nonlinear optimisation
and optimal control, from which the CS-MHE can benefit since we did not
change the structure of the optimisation problem. For these reasons, we see the
added value of the CS-MHE as a novel tool for condition monitoring, but we
do not exclude real-time applications in the near future, which would allow the
estimates to be exploited for control engineering.
During the development of the CS-MHE, a few ideas for future activities have
arisen. We collected them in the following two sections, that list possible
application cases and further methodology developments, and conclude this
dissertation.
Possible application cases
Estimation of forces and torques in rotating machinery

We believe that it is worth applying the CS-MHE to the estimation of forces and
torques in rotating machinery (e.g., gearboxes and other types of transmission).
In order to enhance sparsity with a Fourier dictionary, it is important to
minimise spectral leakage. If possible, we suggest setting the horizon length and
the sampling rate as a function of the rotational speed. We are aware that this
may not be feasible, and as a sidetrack of the work documented in this dissertation
we have already performed some simulations using the experimental data of
section C2.3, where we investigated how to minimise the spectral leakage that
results if the window does not contain a whole number of sine periods. We tackled
this problem by using the autocorrelation of the estimated input in time domain
[20], which allows us to detect the dominant periodic wave and consequently to
adapt the dictionary of the next iteration in order to match the correct period,
thus minimising the spectral leakage. In doing this we did not change
the sampling time nor the MHE window length (N), but we adapted the
Fourier shape functions such that they match the detected periodicity. It is
worth noting that this approach works best if N does not match the input
periodicity while the sampling scheme is set in accordance with it, and all sine
waves are higher orders of the dominant frequency. In case of more complex
dynamical behaviours (e.g., some form of combination of rotational speed and
structural eigenmodes), we could think of constructing an ad hoc dictionary
made of Fourier shape functions (multiple of the rotational frequency) as well
as a collection of shape functions that match the eigenmodes of the system.
In this context, it would be interesting to compare the performance of the
CS-MHE with other state of the art estimators such as the EKF or UKF with a
random walk model, under the hypothesis of an observable system. We expect
the CS-MHE to outperform those approaches especially for long windows and
relatively low sampling rates.
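The period detection step described above can be sketched as follows. This is a hedged illustration, not the implementation used for the simulations: the peak-picking rule (highest autocorrelation peak after the first sign change), the synthetic input and the helper names are assumptions, and reference [20] may proceed differently.

```python
import math


def autocorrelation(u):
    """Biased sample autocorrelation r[lag] = sum_k u[k] * u[k + lag]."""
    n = len(u)
    return [sum(u[k] * u[k + lag] for k in range(n - lag)) for lag in range(n)]


def dominant_period(u):
    """Return the lag of the highest autocorrelation peak after the first sign change."""
    r = autocorrelation(u)
    half = len(u) // 2
    # Skip the main lobe around lag 0, which would otherwise always win.
    first_negative = next(lag for lag in range(1, half) if r[lag] < 0.0)
    return max(range(first_negative, half + 1), key=lambda lag: r[lag])


def fourier_atoms(period, window_length, harmonics):
    """Cosine/sine shape functions whose fundamental matches the detected period."""
    atoms = []
    for h in range(1, harmonics + 1):
        w = 2.0 * math.pi * h / period
        atoms.append([math.cos(w * k) for k in range(window_length)])
        atoms.append([math.sin(w * k) for k in range(window_length)])
    return atoms


# Window of 90 samples holding 3.6 periods of a wave with a 25-sample period.
window = 90
u_hat = [math.sin(2 * math.pi * k / 25) for k in range(window)]
period = dominant_period(u_hat)
dictionary = fourier_atoms(period, window, harmonics=2)
print(period, len(dictionary))
```

The window length and sampling time stay untouched; only the shape functions are rebuilt so that their fundamental matches the detected period.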
Estimation of aerodynamic forces on wings and blades

As a further interesting application case we mention the estimation of distributed
loads such as aerodynamic forces on wings and blades. In this case, different
basis functions such as wavelets or ad hoc dictionaries built to represent an
aerodynamic load on an aerofoil may serve the purpose. Before working on such
case, we recommend investigating possible shape functions that can sparsify
an aerodynamic load, and carrying out some preliminary analyses concerning the type
and number of sensors. This may involve 2D numerical investigations of wind
loads as well as more complex 3D computational fluid dynamics simulations.
Estimation of contact forces (tire/road interaction, bearing friction)

Assessing contact forces is a challenging task since usually we cannot place
dedicated sensors due to geometrical constraints. In this context, virtual
sensors represent a very appealing technology. Among others, we mention
tire/road forces and friction in bearings. The forces in the tire/road contact
zone have some typical shapes, suggesting that an approximation through
shape functions could represent the contact force distribution. These may
look like parabolic shapes along the length of the contact zone [139] or may
come from finite element simulations. On the other hand, one of the ideas
for condition monitoring of bearings is to model the kinematics and dynamics
of faulty bearings. Their kinematics is described in [130], where the damage
on a specific part of the bearing results in an impulsive force that repeats
at a known rate as a function of bearing geometry and shaft angle. These
impulses excite the resonant modes of the rotating structure, which become
carriers of the impulsive signals. Exploiting such information could help us
design a dictionary of shape functions for bearing fault detection, aiming
at a sparse description. An accurate estimation of forces in bearings can lead
to a better assessment of the bearing condition and to a correct tracking of
degradation. Reliable force data are also important for simulation and data
generation, since experimental campaigns may be expensive and time consuming
(the system under investigation needs to run until breakdown). Moreover, such
destructive tests are often carried out at a higher speed and under a higher
load in comparison with the operating conditions, causing a certain mismatch
between simulations and reality. Consequently, improved force estimates can
lead to more reliable simulations. Bearing models are available as single degree
of freedom (DOF) [39], multiple DOFs [177] or finite elements [143].
Damage and defect detection

Compressive sensing is a very appealing technology for damage detection and
defect detection in structural dynamics. In fact, such phenomena are often
rather local and can thus have a sparse representation. Depending on the
system under consideration, this can apply at component level (e.g., there is
a localised defect in the material) or at assembly level (e.g., one screw or one
bearing constitutes the defect). In both cases, a sparse representation of the
damage or defect needs to be developed. In the previous application case we
have already mentioned the case of a defect in a rolling bearing, and here we
can extend this example by considering a system with several bearings. It
is very likely that only one bearing is damaged, which would translate into
sparse information within the full set of bearings. For example, references
[57, 179] discuss two applications of sparse representations for defect detection
in structural health monitoring (SHM). A few local stiffness changes form a
sparse damage parameter vector in [57], whereas [179] presents two sparsity
based algorithms for damage detection in plates. They make use of experimental
data and they solve a LASSO regression problem.
Further methodology developments
Observability
In chapter B2 we investigated rank and condition number of the matrices
that form the CS-MHE optimisation problem, as those metrics are strictly
related to observability. Since we limited the discussion to an LTI system
excited by an external load modelled by a Dirac delta dictionary, future research
could start from this evaluation to further expand the topic aiming at a more
formal observability assessment (cf. section B2.4, in particular Eq. B23). This
could include a nonlinear system as well as any rank and condition number
dependencies in relation to the type of dictionary. Further points of interest
may regard the influence of the sampling rate and of any possible correlation
between sparsity in time and space (cf. section B2.5).
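As a reminder of why multiple unknown forces are problematic for estimators that augment the state with random walk inputs, the sketch below applies the classical rank test to a toy system. It is not the analysis of chapter B2 (which concerns the matrices of the CS-MHE optimisation problem): the scalar state, the coefficients and the single-sensor configuration are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * p for x, p in zip(m[i], m[r])]
        r += 1
    return r


def observability_rank(a, c):
    """Rank of [C; CA; ...; CA^(n-1)] for the pair (A, C)."""
    rows, block = [], [list(row) for row in c]
    for _ in range(len(a)):
        rows.extend(block)
        block = matmul(block, a)
    return rank(rows)


# Scalar state x[k+1] = a*x[k] + b1*u1 + b2*u2, inputs modelled as random walks.
a_coef, b1, b2 = Fraction(9, 10), Fraction(1, 2), Fraction(1, 4)
# Augmented state [x, u1] with one sensor on x: full rank (2 of 2).
one_force = observability_rank([[a_coef, b1], [0, 1]], [[1, 0]])
# Augmented state [x, u1, u2] with the same single sensor: rank 2 of 3.
two_forces = observability_rank([[a_coef, b1, b2], [0, 1, 0], [0, 0, 1]], [[1, 0, 0]])
print(one_force, two_forces)
```

With a single sensor the two forces cannot be separated, which is the kind of rank deficiency that the sparsity prior of the CS-MHE is meant to relieve.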
Numerical stability
The matrices that result when setting up an estimation problem such as the
CS-MHE may be badly conditioned, and numerical issues may arise. We faced
some concerns related to this topic in part C of this dissertation, and we
tackled them by enforcing symmetry as well as introducing weighting factors
and regularisation factors. This allowed us to get the correct estimation, but
further issues may pop up for other applications. This may derive from different
transducers, models, dictionaries, units, and the resulting ill-posedness may
require specific regularisation approaches (e.g., the ones reported in [70]).
Overcomplete dictionaries
For the examples in this dissertation we employed a Dirac delta dictionary and
a Fourier dictionary, which are both complete dictionaries, i.e., all basis
functions are orthogonal. In particular, whenever we dealt with a Fourier
dictionary, we limited the discussion to a DFT with regular sampling and same
number of Fourier components as the length of the time domain data, i.e., the
number of shape functions equals the window length. However, overcomplete
(non-orthogonal) dictionaries and different sampling schemes may allow for
higher accuracy, better sparsity and lower sampling rates (cf. chapter A3 and
section B1.4) [107, 110].
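The difference between the two types of dictionary can be checked numerically through the mutual coherence, i.e., the largest inner product between two distinct unit-norm atoms. The sketch below is an illustration under assumptions: the window length of 8 samples, the factor-of-two oversampling of the frequency grid and the function names are hypothetical.

```python
import cmath
import math


def fourier_dictionary(window_length, atoms):
    """Unit-norm complex exponentials with frequencies m/atoms cycles per sample."""
    scale = 1.0 / math.sqrt(window_length)
    return [
        [scale * cmath.exp(2j * math.pi * m * k / atoms) for k in range(window_length)]
        for m in range(atoms)
    ]


def mutual_coherence(dictionary):
    """Largest |<psi_i, psi_j>| over all pairs of distinct (unit-norm) atoms."""
    worst = 0.0
    for i, a in enumerate(dictionary):
        for b in dictionary[i + 1:]:
            inner = sum(x.conjugate() * y for x, y in zip(a, b))
            worst = max(worst, abs(inner))
    return worst


n = 8
complete = fourier_dictionary(n, atoms=n)          # DFT basis: as many atoms as samples
overcomplete = fourier_dictionary(n, atoms=2 * n)  # twice as many atoms, finer frequency grid
print(mutual_coherence(complete), mutual_coherence(overcomplete))
```

The DFT basis has zero coherence up to rounding, whereas the overcomplete dictionary trades orthogonality for a finer frequency grid.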
Exploiting further sparsity patterns

Throughout this dissertation we modelled the system in a way that resembles
a single measurement vector (SMV), i.e., we always built a single vector
of unknowns whenever a matrix was present. This applies both for the
measurements and for the sparse input representation. However, treating it as
a multiple measurement vector (MMV) may allow us to exploit further sparsity
patterns. This idea was already applied in references such as [97, 98, 99, 100]
for combining compressive sensing and multiple signal classification (MUSIC)
into the CS-MUSIC, and could result in a more robust system, where the
probabilistic nature of compressive sensing is limited in favour of deterministic
results. Furthermore, in chapter C1 (cf. section C1.3) we mentioned that we
did not formally assess the restricted isometry property (RIP), which would give
us an indication of the rate of success of compressive sensing. We refer again to
[97, 98, 99, 100] for insight into how this issue was dealt with for the
CS-MUSIC, which involves both deterministic and stochastic behaviours.
Feasibility domain of the CS-MHE

The experiments in chapter C2 (cf. section C2.3) showed good estimation
results. However, a few topics for further discussion have arisen. In particular,
investigating the influence of Qdrift on the accuracy of the estimates in case of
Fourier components remains an open aspect. Moreover, further research could
aim at a formal assessment of the feasibility domain of the CS-MHE in terms of
number of measurement points, window length and system calibration needed
in relation to sparsity. Going in this direction can certainly bring the CS-MHE
to a more mature technological stage (cf. section C2.4).
Vision based measurements

Camera based measurements for applications in structural dynamics form a
relatively new domain which was made possible by the advances in digital
image processing techniques, digital cameras and protocols for high-speed data
transmission. In the experiments that we presented in chapter C2 we used a
high-speed camera, and we faced a few practical aspects that required special
attention. In particular, the synchronisation of the images with the data
acquisition system was not a trivial task. We believe that camera measurements
at high frame rates will become a powerful instrument in future research (not
only related to structural dynamics), and thus it is certainly worth acquiring
further knowledge concerning deterministic and stochastic delays given a camera,
an acquisition system and a protocol for data transmission. This should include
software delays as well as hardware delays connected to elements such as type
of pins (e.g., TTL pins and optically isolated pins) and parameters linked to
voltage and current levels. For a deeper insight into the typical delays that we
may encounter we refer to the technical manuals of cameras such as [84, 202].
Finally, future experiments could aim at improving the accuracy of camera
measurements, considering different algorithms of digital image processing.
Among others, we mention the subpixel resolution technique based on the
so-called optical-flow, proposed in reference [85]. We have already carried
out some preliminary analyses as a sidetrack of the work documented in this
dissertation, where we compared the image processing steps that we employed
in chapter C2 (described in appendix 2) with the optical-flow and with a laser
Doppler vibrometer.
Appendix 1
Matrix implementation of the CS-MHE
In this appendix we give the matrix formulation of the CS-MHE with a limited
amount of constraints, introduced in sections B1.3 and B1.4. We present the
matrices for an LTI system and window length N = 4. Let us start from Eq. (1),
which we presented in section B1.4.2 as Eq. (B9).1 Let us also recall Eq. (B10)
and add a more explicit definition of b in Eq. (2), denoted as Eq. (2d).
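\[
\begin{aligned}
\underset{x,\,\alpha}{\text{minimise}} \quad & z^\top H z + q^\top z + b + \lambda \lVert \alpha \rVert_1 && \text{(1a)} \\
\text{subject to} \quad & x \in [x_{LB},\, x_{UB}], \quad \alpha \in [\alpha_{LB},\, \alpha_{UB}] && \text{(1b)}
\end{aligned}
\]

\[
z = \begin{bmatrix} x \\ \alpha \end{bmatrix} \tag{2a}
\]

\[
H = \begin{bmatrix} H_{xx} & H_{x\alpha} \\ H_{\alpha x} & H_{\alpha\alpha} \end{bmatrix} \tag{2b}
\]

\[
q^\top = \begin{bmatrix} q_x^\top & q_\alpha^\top \end{bmatrix} \tag{2c}
\]

\[
b = a_b^\top A_b\, a_b \tag{2d}
\]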
1 We first present it in section B2.1 as Eq. (B17). Note that in this appendix we consider
an LTI system such as Eq. (A33).
Element    Formula
z          Eq. (4)
H_xx       Eq. (7)
H_αx       Eq. (8)
H_xα       H_xα = H_αx^T
H_αα       Eq. (10)
q_x^T      Eq. (9)
q_α^T      Eq. (11)
a_b^T      Eq. (5)
A_b        Eq. (6)
Table 1: Links between the elements of Eq. (2) and their formulas.
Table 1 links the different parts of Eq. (2) to their explicit formulations for
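N = T = 4. Accordingly, the dictionary Ψ in Eq. (B5) results in Eq. (3).
\[
\Psi = \begin{bmatrix}
\psi_{11} & \psi_{12} & \psi_{13} \\
\psi_{21} & \psi_{22} & \psi_{23} \\
\psi_{31} & \psi_{32} & \psi_{33}
\end{bmatrix} \tag{3}
\]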
The state-space matrices do not have a subscript k since we assumed an LTI
system. However, they can be generalised by assigning the same subscript as the
covariance matrices. In Eqs. (9) and (11) we notice that a factor 2 characterises
all elements. This is due to the fact that we unified two terms q1 and q2 into q,
i.e., q^T z = z^T q1 + q2^T z, by assuming that the covariance matrices Rk and Qk
are symmetric. All matrices derive from Eq. (B8), which we report in Eq. (12)
without any line breaks within the terms and for an LTI system.
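\[
z^\top = \begin{bmatrix} x_1^\top & x_2^\top & x_3^\top & x_4^\top & \alpha_1^\top & \alpha_2^\top & \alpha_3^\top \end{bmatrix} \tag{4}
\]

\[
a_b^\top = \begin{bmatrix} \bar{x}_1^\top & y_1^\top & y_2^\top & y_3^\top & y_4^\top & \bar{\alpha}_1^\top & \bar{\alpha}_2^\top & \bar{\alpha}_3^\top \end{bmatrix} \tag{5}
\]

\[
A_b = \operatorname{diag}\left( P_a^{-1},\, R_1^{-1},\, R_2^{-1},\, R_3^{-1},\, R_4^{-1},\, P_{\alpha_1}^{-1},\, P_{\alpha_2}^{-1},\, P_{\alpha_3}^{-1} \right) \tag{6}
\]

\[
H_{xx} = \begin{bmatrix}
P_a^{-1} + A^\top Q_1^{-1} A + C^\top R_1^{-1} C & -A^\top Q_1^{-1} & 0 & 0 \\
-Q_1^{-1} A & Q_1^{-1} + A^\top Q_2^{-1} A + C^\top R_2^{-1} C & -A^\top Q_2^{-1} & 0 \\
0 & -Q_2^{-1} A & Q_2^{-1} + A^\top Q_3^{-1} A + C^\top R_3^{-1} C & -A^\top Q_3^{-1} \\
0 & 0 & -Q_3^{-1} A & Q_3^{-1} + C^\top R_4^{-1} C
\end{bmatrix} \tag{7}
\]

\[
H_{\alpha x} = \begin{bmatrix}
\psi_{11}^\top B^\top Q_1^{-1} A + \psi_{11}^\top D^\top R_1^{-1} C &
-\psi_{11}^\top B^\top Q_1^{-1} + \psi_{21}^\top B^\top Q_2^{-1} A + \psi_{21}^\top D^\top R_2^{-1} C &
-\psi_{21}^\top B^\top Q_2^{-1} + \psi_{31}^\top B^\top Q_3^{-1} A + \psi_{31}^\top D^\top R_3^{-1} C &
-\psi_{31}^\top B^\top Q_3^{-1} \\
\psi_{12}^\top B^\top Q_1^{-1} A + \psi_{12}^\top D^\top R_1^{-1} C &
-\psi_{12}^\top B^\top Q_1^{-1} + \psi_{22}^\top B^\top Q_2^{-1} A + \psi_{22}^\top D^\top R_2^{-1} C &
-\psi_{22}^\top B^\top Q_2^{-1} + \psi_{32}^\top B^\top Q_3^{-1} A + \psi_{32}^\top D^\top R_3^{-1} C &
-\psi_{32}^\top B^\top Q_3^{-1} \\
\psi_{13}^\top B^\top Q_1^{-1} A + \psi_{13}^\top D^\top R_1^{-1} C &
-\psi_{13}^\top B^\top Q_1^{-1} + \psi_{23}^\top B^\top Q_2^{-1} A + \psi_{23}^\top D^\top R_2^{-1} C &
-\psi_{23}^\top B^\top Q_2^{-1} + \psi_{33}^\top B^\top Q_3^{-1} A + \psi_{33}^\top D^\top R_3^{-1} C &
-\psi_{33}^\top B^\top Q_3^{-1}
\end{bmatrix} \tag{8}
\]

\[
q_x^\top = \begin{bmatrix}
-2\bar{x}_1^\top P_a^{-1} - 2 y_1^\top R_1^{-1} C &
-2 y_2^\top R_2^{-1} C &
-2 y_3^\top R_3^{-1} C &
-2 y_4^\top R_4^{-1} C
\end{bmatrix} \tag{9}
\]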
\[
H_{\alpha\alpha} = \begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}, \qquad
h_{ij} = \sum_{k=1}^{3} \left( \psi_{ki}^\top B^\top Q_k^{-1} B\, \psi_{kj} + \psi_{ki}^\top D^\top R_k^{-1} D\, \psi_{kj} \right) + \delta_{ij}\, P_{\alpha_i}^{-1} \tag{10}
\]
where \(\delta_{ij}\) is the Kronecker delta, i.e., the terms \(P_{\alpha_1}^{-1}\), \(P_{\alpha_2}^{-1}\) and \(P_{\alpha_3}^{-1}\) appear only on the diagonal.

\[
q_\alpha^\top = \begin{bmatrix}
-2 y_1^\top R_1^{-1} D \psi_{11} - 2 y_2^\top R_2^{-1} D \psi_{21} - 2 y_3^\top R_3^{-1} D \psi_{31} - 2 \bar{\alpha}_1^\top P_{\alpha_1}^{-1} \\
-2 y_1^\top R_1^{-1} D \psi_{12} - 2 y_2^\top R_2^{-1} D \psi_{22} - 2 y_3^\top R_3^{-1} D \psi_{32} - 2 \bar{\alpha}_2^\top P_{\alpha_2}^{-1} \\
-2 y_1^\top R_1^{-1} D \psi_{13} - 2 y_2^\top R_2^{-1} D \psi_{23} - 2 y_3^\top R_3^{-1} D \psi_{33} - 2 \bar{\alpha}_3^\top P_{\alpha_3}^{-1}
\end{bmatrix}^{\top} \tag{11}
\]
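\[
\begin{aligned}
\underset{x_k,\,\alpha_k}{\text{minimise}} \quad
& \left( x_{T-N+1} - \bar{x}_{T-N+1} \right)^\top P_a^{-1} \left( x_{T-N+1} - \bar{x}_{T-N+1} \right) \\
& + \sum_{k=T-N+1}^{T-1} \Bigl( x_{k+1} - A x_k - B \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_j \Bigr)^{\top} Q_k^{-1} \Bigl( x_{k+1} - A x_k - B \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_j \Bigr) \\
& + \sum_{k=T-N+1}^{T} \Bigl( y_k - C x_k - D \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_j \Bigr)^{\top} R_k^{-1} \Bigl( y_k - C x_k - D \sum_{j=T-N+1}^{T-1} \psi_{k,j}\, \alpha_j \Bigr) \\
& + (\alpha - \bar{\alpha})^\top P_\alpha^{-1} (\alpha - \bar{\alpha}) + \lambda \sum_{k=T-N+1}^{T-1} \lVert \alpha_k \rVert_1 && \text{(12a)} \\
\text{subject to} \quad & x \in [x_{LB},\, x_{UB}], \quad \alpha \in [\alpha_{LB},\, \alpha_{UB}] && \text{(12b)}
\end{aligned}
\]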
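The structure of Eqs. (7) and (9) can be cross-checked numerically against the cost function of Eq. (12a). The sketch below does so for a scalar state and N = 4, under simplifying assumptions that are not part of the appendix: all input coefficients α and their priors are set to zero (so only the state-related blocks are exercised), and every numerical value is hypothetical.

```python
def build_state_terms(a, c, p_a, q, r, x_bar, y):
    """Scalar-state versions of H_xx (Eq. 7), q_x (Eq. 9) and the constant b (Eq. 2d)."""
    n = len(y)  # window length N; q holds N-1 values, r holds N values
    h = [[0.0] * n for _ in range(n)]
    for k in range(n):
        h[k][k] = c * c / r[k]
        if k == 0:
            h[k][k] += 1.0 / p_a          # arrival cost
        else:
            h[k][k] += 1.0 / q[k - 1]     # x_k as "next state" of step k-1
        if k < n - 1:
            h[k][k] += a * a / q[k]       # x_k as "current state" of step k
            h[k][k + 1] = h[k + 1][k] = -a / q[k]
    q_x = [-2.0 * y[k] * c / r[k] for k in range(n)]
    q_x[0] += -2.0 * x_bar / p_a
    b = x_bar ** 2 / p_a + sum(yk ** 2 / rk for yk, rk in zip(y, r))
    return h, q_x, b


def direct_cost(x, a, c, p_a, q, r, x_bar, y):
    """State-only part of Eq. (12a), evaluated term by term (all alpha set to zero)."""
    cost = (x[0] - x_bar) ** 2 / p_a
    cost += sum((x[k + 1] - a * x[k]) ** 2 / q[k] for k in range(len(x) - 1))
    cost += sum((y[k] - c * x[k]) ** 2 / r[k] for k in range(len(x)))
    return cost


# Hypothetical numbers for N = 4.
a, c, p_a, x_bar = 0.9, 2.0, 0.5, 0.1
q, r = [0.2, 0.3, 0.4], [1.0, 1.5, 2.0, 2.5]
y = [0.3, -0.1, 0.4, 0.2]
x = [0.2, -0.5, 0.7, 0.1]

h, q_x, b = build_state_terms(a, c, p_a, q, r, x_bar, y)
quadratic = sum(x[i] * h[i][j] * x[j] for i in range(4) for j in range(4))
quadratic += sum(qi * xi for qi, xi in zip(q_x, x)) + b
print(quadratic, direct_cost(x, a, c, p_a, q, r, x_bar, y))
```

Both evaluations agree up to rounding, which confirms the block tridiagonal pattern of H_xx and the factor 2 in q_x for this reduced case. Extending the check to H_αx and H_αα would require adding the dictionary terms.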
Appendix 2
Test case description for section C2.3
In this appendix we expand section C2.3.1, where we introduced the experimental

test case for the estimation of a periodic load described by Fourier basis functions
(cf. section C2.3). The following sections include a detailed description of the
physical set-up, the FE model and its updating procedure through experimental
modal analyses (EMAs), a note on the choice of the measurement system, and
the operations of digital image processing to extract measurement data from a
series of pictures.
Physical set-up
The set-up consists of an aluminium beam clamped on each side to a vertical

mount, which is fixed to the ground (Fig. C2.9 on page 147). Nine steel masses
are attached to the beam every ∆x = 0.1 m with the aim of lowering the
eigenfrequencies and adding complexity to the system, such that no analytical model
is accurate enough to represent the dynamic behaviour of the system. Moreover,
it is possible to customise the set-up by removing or replacing the masses or by
modifying the boundary conditions, thus making the set-up suitable for other
academic and industrial purposes. We chose dimensions and materials according
to some preliminary FE analyses, neglecting the influence of the vertical mounts
and assuming a rigid clamping. Table 2 indicates the geometry of the beam,
where the reference system was already sketched in Fig. C2.11 on page 148,
with the origin on the neutral axis of the beam at the left mount.
Table 2: Beam geometry.
Feature                 Along axis   Size [m]
Beam length (L)         x            1.000
Beam width              y            0.015
Beam thickness          z            0.010
Steel mass height       z            0.050
Steel mass width        x            0.020
Steel mass thickness    y            0.015
Mount height            z            0.500
Mount cross-section     plane xy     0.080 × 0.080
Finite element model, model update and state-space model
Since the preliminary FE model was incomplete and not accurate enough, we
performed a series of model updating procedures based on the following three
experimental modal analyses (EMAs) [78], using three uniaxial accelerometers
at a fixed location and one impact hammer. We extracted the eigenmodes in
LMS Test.Lab [145].
1. Aluminium beam in free-free conditions: we hung the beam with two
elastic bands,2 and we excited 4 points along the beam, in y and z
directions. This allowed us to measure the FRFs from the impacts to two
uniaxial accelerometers in y and z placed at x = 0.750 m (cf. Fig. C2.9).
2. Assembly of the aluminium beam and the 9 steel masses in free-free
conditions: we excited the same 4 points along the beam as the previous
case, in y and z direction.
3. Full set-up, i.e., aluminium beam with steel masses clamped-clamped to

the vertical mounts, which were fixed to the ground. In this case we added
a third accelerometer for the x direction, located at x = 1.100 m, and we
excited a higher number of points, in order to have a finer resolution. Fig. 1
shows an initial set of points, to which we added a few extra measurements
along x in a later stage (on the left mount and at each steel mass).
2 In free-free conditions the length of the beam was 1.200 m.

Figure 1: Schematic model for the EMA of the full set-up.
We note that the first two EMAs are rather coarse. However, their purpose
was just a material characterisation, which turned out to be accurate enough. On the
other hand, the third EMA is finer and allows for a more detailed study of the
eigenmodes, involving the torsional and axial directions of the beam, as well
as the flexural modes of the vertical mounts. Next, we imported the results
of the EMAs (i.e., eigenfrequencies and mode shapes) into the FE software as
target values for a model updating procedure [78], aiming at improving the
accuracy of the FE model. The model updating consisted of three main steps,
corresponding to the previous three EMAs:
1. Determine the Young’s modulus of the aluminium: we can shift the

eigenfrequencies of a beam by varying its mass and stiffness.3 Since
the mass is known (we weighed the beam on a precision scale), the
software needs to minimise the error on the eigenfrequencies by varying
the aluminium Young’s modulus (Ealu ). This model updating neglects
the holes in the aluminium beam, which are present to screw the steel
masses to the beam. Table 3 shows the results of this first model updating
procedure.
3 For the aluminium beam we assumed a known Poisson’s ratio νalu = 0.33.
Table 3: Model update aluminium beam, free-free boundary conditions.
Mode ID Plane Freq. EMA [Hz] Freq. MoUp [Hz] Error [%]
1 xz 35.70 34.99 -1.98
2 xy 53.08 52.49 -1.11
3 xz 93.55 96.41 3.05
4 xy 144.92 144.58 -0.24
5 xz 187.41 188.87 0.78
6 xy 286.03 283.12 -1.02
2. Determine the stiffness of the contact between the aluminium and the
steel masses: in general, modelling a mechanical contact is not a simple
task. However, for the beam set-up we chose to model the contact as a
uniform isotropic stiffness of the steel (Esteel ). This approximation does
not introduce relevant distortions since the structure is rather rigid and
each mass presents a single contact area. Another approximation regards
the mass distribution: in fact, the model assumes solid isotropic steel,
while in reality the presence of the screws changes the centre of mass.
However, we neglected this aspect. Table 4 shows the results of this second
model updating procedure.
3. Determine the stiffness of the vertical mounts (Emount ), including the

clamping stiffness. Once all properties of the beam are known thanks to
the previous two points, we optimised the stiffness of the vertical mounts
in order to match the experimental values. Table 5 shows the results of
this last model updating procedure. Finally, Table 6 reports the material
properties that resulted from all three steps.4
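The idea behind the first step of the list above (tuning the Young's modulus until the eigenfrequencies match) can be illustrated without an FE model. The sketch below swaps in the closed-form Euler-Bernoulli solution for a free-free beam in place of the FE-based update, so it is a plausibility check rather than the procedure we followed. It uses the geometry of Table 2, the free-free length of footnote 2, the density of Table 6 and the EMA frequencies of Table 3. The least-squares criterion on relative errors and all function names are assumptions.

```python
import math

# Roots beta_n * L of cos(x) * cosh(x) = 1 for a free-free Euler-Bernoulli beam.
BETA_L = [4.730041, 7.853205, 10.995608]

LENGTH = 1.200                    # m, free-free beam length (footnote 2)
WIDTH, THICKNESS = 0.015, 0.010   # m, Table 2 (y and z directions)
DENSITY = 2655.56                 # kg/m^3, Table 6

# (bending order, cross-section depth in the bending direction, EMA frequency in Hz).
# EMA values from Table 3: xz-plane modes bend across the thickness,
# xy-plane modes across the width.
EMA_MODES = [
    (0, THICKNESS, 35.70), (0, WIDTH, 53.08),
    (1, THICKNESS, 93.55), (1, WIDTH, 144.92),
    (2, THICKNESS, 187.41), (2, WIDTH, 286.03),
]


def beam_frequency(order, depth, youngs_modulus):
    """f_n = (beta_n L)^2 / (2 pi L^2) * sqrt(E I / (rho A)), with I/A = depth^2 / 12."""
    radius_of_gyration = depth / math.sqrt(12.0)
    wave_speed = math.sqrt(youngs_modulus / DENSITY)
    return BETA_L[order] ** 2 / (2.0 * math.pi * LENGTH ** 2) * wave_speed * radius_of_gyration


def fit_youngs_modulus():
    """Least-squares fit of sqrt(E) on the relative frequency errors (closed form)."""
    # Frequencies scale with sqrt(E): f_i = g_i * sqrt(E), g_i evaluated at E = 1 Pa.
    ratios = [beam_frequency(o, d, 1.0) / f_ema for o, d, f_ema in EMA_MODES]
    sqrt_e = sum(ratios) / sum(g * g for g in ratios)
    return sqrt_e ** 2


e_fit = fit_youngs_modulus()
errors = [100.0 * (beam_frequency(o, d, e_fit) - f) / f for o, d, f in EMA_MODES]
print(e_fit / 1e9, [round(e, 2) for e in errors])
```

The fitted modulus (printed in GPa) and the residual errors can be compared with Table 6 and Table 3, respectively. Since the holes, the shell discretisation and the software's own optimisation criterion are not represented, an exact match should not be expected.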
We performed all these operations by exploiting the model update scheme

implemented in Siemens NX [182], which is parametric and works on a reduced
model. The structure is made of simple components, such that beam and shell
elements can represent the dominant mode shapes. In particular, we chose 2D
elements (shell elements) for the beam and 1D elements (beam elements) for
the vertical mounts. We set the damping values by comparing several FRFs
computed from the FE model and from the EMA acceleration measurements.
4 We modelled the vertical mounts as a beam with cross-section 0.050 × 0.050 m.
Table 4: Model update aluminium beam with steel masses, free-free boundary
conditions.
Mode ID   Plane   Freq. EMA [Hz]   Freq. MoUp [Hz]   Error [%]
1         xz      26.23            25.74             -1.88
2         xy      39.16            38.69             -1.20
3         xz      65.62            65.98             0.55
4         xy      93.46            94.10             0.68
5         xz      120.38           120.32            -0.05
6         xy      173.01           172.51            -0.29
Table 5: Model update full set-up.
Mode ID   Plane   B/M   Freq. EMA [Hz]   Freq. MoUp [Hz]   Error [%]
1         xz      B     25.74            25.79             0.19
2         xy      B     34.72            34.13             -1.70
3         xz      B     68.93            69.19             0.38
4         xy      B     85.68            84.47             -1.41
5         xz      M     90.69            94.95             4.70
6         yz      M     120.59           117.09            -2.91
7         yz      M     141.90           135.14            -4.76
8         xz      B     134.71           136.00            0.96
9         xy(z)   B     181.17           178.45            -1.51
We modelled each of the two clamping points through two rigid connections (cf.
Fig. 1) going from the top node of the mounting (modelled by 1D elements) to
the top and bottom nodes of the aluminium beam (modelled by 2D elements).
We are aware that there are different ways to model the set-up (including 3D
FE models) and several ways to model the vertical mounts and the clamping,
which could result in a better eigenmode matching, especially for modes ID
5 and 7 (cf. Table 5). However, for our purposes we judged the model to be
accurate enough. Fig. 2 shows the mode shapes of the full set-up, linked to
Table 5. In the following list we give a few remarks about the eigenmodes of
the system:
Table 6: Material properties after model updating.
Part              Property          Value
Aluminium beam    ρalu [kg/m³]      2655.56
                  Ealu [GPa]        63.84
                  νalu              0.33
Steel masses      ρsteel [kg/m³]    8013.33
                  Esteel [GPa]      198.93
                  νsteel            0.29
Vertical mounts   ρmount [kg/m³]    2700.00
                  Emount [GPa]      43.45
                  νmount            0.33
• The third column of Table 5 indicates if a mode involves mainly the beam
(B) or the full set-up, including the vertical mounts (M). The six beam
modes (B) correspond well with our preliminary FE analysis that did
not take into account the vertical mounts. On the other hand, the three
modes that involve the whole set-up (M) are governed by the dynamical
behaviour of the vertical mounts.
• The second column of Table 5 indicates in which plane the mode is
predominantly located. This allows us to distinguish the following beam
(B) modes:
– Three beam bending modes in the vertical plane xz, i.e., modes 1,
3, 8. Excluding the clamping nodes, they have zero, one and two
nodal points, respectively. For the experimental example we chose
to excite the beam along z, such that these three modes govern the
whole dynamic response.
– Three beam bending modes in the horizontal plane xy, i.e., modes
2, 4, 9. Excluding the clamping nodes, they have zero, one and
two nodal points, respectively. Mode 9 has also a strong torsional
behaviour, and that is the reason why we used notation “plane xy(z)”
in Table 5.
Figure 2: Mode shapes of the first 9 eigenmodes of the full FE model after the
model updating (cf. Table 5).
In order to run the CS-MHE, we need the model of the test set-up to be available
in MATLAB®. We tackled this problem by extracting the FE mass and stiffness
matrices and projecting them on modal coordinates [78]. This allows us to build a
state-space model and to operate in the same way as we did throughout this
dissertation. Since for the experiments in chapter C2 we mounted the shaker
such that it applies a load in the xz plane along the z axis, we expected modes
1, 3 and 8 (cf. Table 5 and Fig. 2) to dominate the response. Accordingly, we
reduced the model limiting it to those three eigenmodes. Literature offers a few
model order reduction (MOR) techniques [15, 174], but for the beam set-up we
selected the mode shapes based on the participation factors of the excitation
configuration. This improves the observability of the system at the
expense of a very small loss of modelling accuracy.
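The step from modal coordinates to a state-space model can be sketched as follows. This is a minimal illustration under assumptions: it starts directly from modal parameters (mass-normalised mode shapes, so that the projected mass matrix is the identity and the projected stiffness matrix is diagonal) instead of the exported FE matrices, it is written in Python rather than MATLAB, and the damping ratios and mode-shape values at the input and output locations are hypothetical. Only the three eigenfrequencies are taken from Table 5.

```python
import math


def modal_state_space(freqs_hz, damping_ratios, phi_in, phi_out):
    """Continuous-time (A, B, C) with state [modal displacements; modal velocities].

    Assumes mass-normalised mode shapes, so the modal mass matrix is the identity
    and the modal stiffness matrix is diag(omega_n^2).
    """
    n = len(freqs_hz)
    omegas = [2.0 * math.pi * f for f in freqs_hz]
    a = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i, (w, zeta) in enumerate(zip(omegas, damping_ratios)):
        a[i][n + i] = 1.0                    # d(eta_i)/dt = eta_dot_i
        a[n + i][i] = -w * w                 # modal stiffness
        a[n + i][n + i] = -2.0 * zeta * w    # modal damping
    b = [[0.0] for _ in range(n)] + [[p] for p in phi_in]   # force enters the velocity rows
    c = [list(phi_out) + [0.0] * n]                          # displacement output
    return a, b, c


# Modes 1, 3 and 8 of Table 5 (updated FE model); damping and mode-shape values are hypothetical.
a, b, c = modal_state_space(
    freqs_hz=[25.79, 69.19, 136.00],
    damping_ratios=[0.01, 0.01, 0.01],
    phi_in=[0.8, -0.5, 0.3],
    phi_out=[0.6, 0.7, -0.4],
)
print(len(a), len(a[0]), len(b), len(c[0]))
```

Keeping three modes yields a six-state model with a displacement output, in line with the displacement transducers considered in the experiments.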
[Figure 3 plot: log10(FRF) versus frequency, 0 to 250 Hz; eigenmode markers for MoUp (modes 1, 3, 8) and EMA (modes 1 to 10).]
Figure 3: Comparison between an FRF obtained from the updated and reduced
state-space model (solid line) and an experimental FRF (dashed line). The mode IDs follow
the notation in Table 5 (an additional experimental mode is present). The
FRFs refer to a force input at x = 0.750 m and an acceleration output at
x = 0.450 m, and are expressed as acceleration over force [mm/(N s²)].
In order to provide a graphical indication of the model accuracy, Fig. 3 shows a

comparison between an FRF obtained from the updated and reduced state-space
model (solid blue) and an experimental FRF (dashed green). These FRFs refer
to a force input at x = 0.750 m and an acceleration output at x = 0.450 m, and
are expressed as acceleration over force [mm/(N s²)]. Since the model considers
displacement transducers, we evaluated the second derivative of the model FRF
in frequency domain. In general, we see that the two FRFs match well. More
specifically, we notice some discrepancies at low frequency (before mode 1),
around modes 4-5 and after mode 9. Whereas the unexpected behaviour of the
experimental FRF at low frequencies is linked to the quality of the experiment
(it results from a combination of background noise and difficulty to have a clear
excitation), the other effects are connected to the choice of having a reduced
model. The major dynamical phenomena are well captured, while the minor
phenomena are discarded in favour of a better observability. Furthermore, the
graph indicates that we can expect different estimation performances in relation
to the frequency content of the input, especially in the neighbourhood of the
eigenfrequencies (due to damping) and at higher frequencies (which are not
modelled).
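The conversion from a displacement FRF to an acceleration FRF mentioned above amounts to a multiplication by (jω)² = −ω² at each frequency line. The sketch below shows this step; the single-mode receptance sdof_receptance and its parameters are hypothetical test data, not the updated model of the beam.

```python
import math


def receptance_to_accelerance(freq_hz, h_disp):
    """Differentiate a displacement FRF twice in the frequency domain.

    Each time derivative multiplies the FRF by (j*omega), so that
    acceleration/force = (j*omega)**2 * displacement/force.
    """
    out = []
    for f, h in zip(freq_hz, h_disp):
        omega = 2.0 * math.pi * f
        out.append(-(omega ** 2) * h)
    return out


def sdof_receptance(f, f_n, zeta, modal_mass=1.0):
    """Hypothetical single-mode receptance, used here only as test data."""
    omega = 2.0 * math.pi * f
    omega_n = 2.0 * math.pi * f_n
    return 1.0 / (modal_mass * complex(omega_n ** 2 - omega ** 2,
                                       2.0 * zeta * omega_n * omega))


# At resonance the accelerance of this test system equals j / (2 * zeta).
acc_at_resonance = receptance_to_accelerance(
    [100.0], [sdof_receptance(100.0, 100.0, 0.01)])[0]
```

The factor −ω² strongly amplifies the high-frequency part of the FRF, which is one reason why unmodelled high-frequency dynamics are more visible in an acceleration FRF than in a displacement FRF.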
Choice of the measurement system
Besides the aforementioned state-space representation based on an updated
FE model, we could choose among several types of measurement systems to
validate the CS-MHE, such as accelerometers, strain sensors, a laser vibrometer,
or the Nikon Metrology K600 system [136]. At the end of section B2.5 we already
mentioned that we chose contactless measurements obtained through a
high-speed camera, since this approach offers a very flexible sensor array. In
the list that follows we summarise the main reasons behind the choice of
vision-based measurements:
• Compared to accelerometers and strain sensors, camera measurements
offer high spatial resolution and are contactless, such that a large
amount of information is available in one picture, and the model does
not have to include any extra mass, stiffness and damping values related
to a sensor. This was not the case with the experimental set-up that we
presented in section C1.2 (cf. Fig. C1.12).
• There have been recent advances in vision technology and high-speed
recording. We cite references [49, 85, 86, 87, 132, 203] for a few applications
of camera measurements in structural dynamics.
• Two different camera systems are available within our research group.
Using those systems allows us to gain some expertise within the group
and paves the way for future research and development.
Camera-based displacement measurements
In this section we list the steps we went through in order to extract the
displacement measurements from a sequence of images. This procedure involves
standard digital image processing techniques, which can be found in references
such as [64, 149]. Their implementation in MATLAB® is documented in [65],
and many functions are now available in the Image Processing and Computer
Vision toolboxes (MATLAB® R2017a). For this reason, we refer the interested
reader to the MATLAB® help and references therein. The experimental set-up
was instrumented with the following (see Fig. 4):
• Three adhesive paper strips on the beam, each of them resulting in 53
markers equally distributed on a length of 0.260 m, i.e., a marker every
0.005 m. They span the ranges x = 0.005 m to x = 0.265 m, x = 0.370 m
to x = 0.630 m, and x = 0.735 m to x = 0.995 m.
• Two adhesive paper strips on the vertical mounts, identical to the strips
on the beam.
• Two checkerboard patterns on two different planes, to reconstruct the
scene in 3D.
• One shaker (THE MODAL SHOP Miniature Inertial Shaker K2002E01
[188]).
• One impedance head (PCB ICP® 288D01 [144]), to measure the force
entering the system, to be used for validation purposes.
• One high-speed camera (Ximea MQ042CG-CM [201]).
• One photography lamp working with DC (rectified) current. The lamp
produces a spot light which is clearly visible in the centre of Fig. 4.
Consequently, the central strip on the beam (“BEAM STRIP 2” in Fig. 4)
receives more light.
• One data acquisition system (LMS SCADAS [180]).
• One synchronisation circuit.
• One personal computer.
Figure 4: Beam set-up.
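The marker layout of the three beam strips listed above can be checked with a few lines of arithmetic: 53 markers at a pitch of 5 mm cover 52 intervals, i.e., 0.260 m. The sketch below works in integer millimetres to avoid rounding issues; the function name marker_positions_mm is hypothetical.

```python
def marker_positions_mm(start_mm, n_markers=53, pitch_mm=5):
    """Return the x coordinates (in mm) of the markers on one strip."""
    return [start_mm + i * pitch_mm for i in range(n_markers)]


# Start coordinates of the three beam strips, taken from the list above.
strips_mm = [marker_positions_mm(start) for start in (5, 370, 735)]
strip_ends_mm = [strip[-1] for strip in strips_mm]
n_beam_markers = sum(len(strip) for strip in strips_mm)
```

The last marker of each strip falls at 265 mm, 630 mm and 995 mm respectively, in agreement with the ranges given in the list.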
Figure 5: Schematic representation of the test set-up.
Fig. 5 shows the schematic data flow when an experiment is running. The settings
for the LMS SCADAS [180] and for the camera are defined via two dedicated
programs. In LMS Test.Lab [181] we set the duration of the experiment, the
sampling rate, the signal for the shaker and the trigger for the camera. In
the camera software we configure the camera such that each frame is triggered by
the rising edge of the trigger signal and the exposure has a fixed duration.
Furthermore, a signal indicating the exposure time is sent back to the LMS
SCADAS for synchronisation purposes. The camera software also takes care of
saving the images on the PC. The LMS SCADAS is a data acquisition system
with multiple inputs and outputs. From top to bottom, four channels of the
LMS SCADAS are connected as follows:
1. INPUT from the impedance head: this channel acquires the dynamical
component of the force, which is generated by the shaker and captured
by the impedance head. We use this signal for validating the estimation
results.
2. OUTPUT to the shaker: this channel generates a signal that serves as
input for the shaker. This signal is amplified to reach the required force.
3. OUTPUT to the synchronisation box: this channel generates a sine wave
at the frequency corresponding to the desired frame rate. The camera
requires a square wave at a certain voltage level, and the purpose of the
synchronisation box is to convert a sine wave centred on 0 V DC (and with
a peak-to-peak amplitude of 5 V) to a digital square wave with a low
level of 0 V DC and a high level as indicated in the camera documentation
(24 V DC for the available version of the Ximea MQ042CG-CM; the
voltage ranges typically span from 0-3.3 V DC to 0-24 V DC, depending
on the hardware).
4. INPUT from the synchronisation box: this channel acquires the
synchronisation signal generated by the camera, which can be programmed
within the camera software. We chose a square wave indicating the time
of the exposure. In the synchronisation box, the signal from the camera is
pulled up to a voltage level compatible with the data acquisition system
(10 V DC for the LMS SCADAS).
Figure 6: Images for correcting the lens distortion.
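The role of the synchronisation box on the trigger side can be sketched as an ideal comparator followed by the rising-edge detection performed by the camera. The code below is only a behavioural model under stated assumptions: the actual circuit is not described here, the threshold of 0 V is assumed, and the sampling rate of 16384 Hz is used solely to obtain 32 samples per frame at 512 fps.

```python
import math


def sine_to_square(samples, high_level=24.0, threshold=0.0):
    """Ideal comparator: high_level where the input exceeds threshold, else 0 V."""
    return [high_level if v > threshold else 0.0 for v in samples]


def rising_edges(signal, high_level=24.0):
    """Indices at which the signal crosses half of high_level from below."""
    half = 0.5 * high_level
    return [i for i in range(1, len(signal)) if signal[i - 1] < half <= signal[i]]


FS_DAQ = 16384.0    # assumed sampling rate [Hz], i.e. 32 samples per frame
FRAME_RATE = 512.0  # desired frame rate [fps]

# Sine wave centred on 0 V with a 5 V peak-to-peak amplitude. The half-sample
# shift keeps the samples away from the zero crossings in this example.
sine = [2.5 * math.sin(2.0 * math.pi * FRAME_RATE * (n + 0.5) / FS_DAQ)
        for n in range(96)]
square = sine_to_square(sine)
edges = rising_edges(square)  # one camera frame per rising edge
```

In this model each rising edge triggers one frame, so consecutive edges are 32 samples apart.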
When performing vision-based measurements, the first phase involves the
estimation of the parameters for correcting the lens distortions (typically barrel
or pincushion distortions) as well as the parameters to calibrate the
scene, i.e., to transform the pixel information into metric measurements. The
following list summarises these steps:
1. Determine the correction needed to cope with lens distortion. This step
requires several pictures of the measurement area containing a calibration
checkerboard pattern (Fig. 6). We covered the other two checkerboard
patterns (cf. Fig. 4) since the MATLAB® toolbox for correcting lens
distortions can easily detect a single checkerboard pattern.
2. Create an up-to-scale 3D reconstruction of the scene. 3D reconstructions
require multiple camera poses (e.g., stereo vision), which we obtained by
moving one camera. Fig. 7 shows two pictures captured at different angles.
They are overlaid and the corresponding markers are highlighted.
Furthermore, Figs. 8–9 show the 3D reconstruction of the scene, including
the positions of the camera. The graphs are scaled such that the reference
system is centred and oriented according to the first location of the camera
(denoted as “1” in Figs. 8–9), while the distance between the two camera
poses corresponds to one unit. It is worth mentioning that this step
can be avoided and replaced by other techniques for correcting camera
misalignments and perspective effects, which can lead to very accurate
data if all measurements belong to the same plane (which is the case for
our experiment). However, we decided to reconstruct the 3D scene since
this approach may give higher accuracy.
3. Determine the scaling factor. In order to scale the system to the correct
metric units, we assumed that the distance between two points is known.
We chose the bottom marker on the left mount and the top marker on the
right mount. Other methods can take into account more points within an
optimisation scheme (e.g., LS minimisation).
4. Save the matrices that allow us to transform a pixel coordinate into a
displacement measurement.
Figure 7: Tracked features between two images. Legend: points of image 1 (×),
points of image 2 (◦), link (· · ·).
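The scaling step of the list above can be sketched as follows: a single known distance fixes the scale of the up-to-scale reconstruction, while several known distances lead to a least-squares (LS) estimate. The function names and the coordinates are hypothetical; the real reference points are the two markers on the mounts mentioned in the list.

```python
import math


def scale_factor(p_a, p_b, known_distance_m):
    """Scale from one known distance between two reconstructed 3D points."""
    d = math.dist(p_a, p_b)
    if d == 0.0:
        raise ValueError("the two reference points coincide")
    return known_distance_m / d


def scale_factor_ls(reconstructed, known_m):
    """LS scale from several distances: minimises sum((s * d_i - k_i)**2)."""
    num = sum(d * k for d, k in zip(reconstructed, known_m))
    den = sum(d * d for d in reconstructed)
    return num / den


def to_metric(points, s):
    """Apply the scale factor to a list of up-to-scale 3D points."""
    return [tuple(s * c for c in p) for p in points]


# Made-up up-to-scale coordinates of the two reference markers.
left_mount_bottom = (0.0, 0.0, 2.0)
right_mount_top = (3.0, 4.0, 2.0)
s_single = scale_factor(left_mount_bottom, right_mount_top, known_distance_m=1.0)
s_ls = scale_factor_ls([5.0, 10.0], [1.0, 2.0])
scaled = to_metric([(5.0, 0.0, 0.0)], s_single)
```

The single-distance variant is the one described in the text; the LS variant only illustrates the alternative mentioned at the end of point 3.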
Figure 8: Up-to-scale 3D reconstruction of the scene (view 1). The reference
system is aligned with camera 1, and the distance between the cameras
corresponds to one unit.
Figure 9: Up-to-scale 3D reconstruction of the scene (view 2).
Once all the parameters of the camera and of the scene are available, we can
apply the following procedure to every image (i.e., to every frame of a video
acquired during an experiment):
1. Make a first guess of the marker positions. This can be done manually
or automatically thanks to specific algorithms for pattern recognition.
Unfortunately, our images were quite dark and non-uniformly illuminated,
and the marker resolution was quite low, which caused the automatic
pattern recognition routines to fail. Consequently, we defined
the first guess for the marker positions manually.
2. Refine the marker positions up to sub-pixel resolution. This can be
done automatically by investigating the pixels in the neighbourhood of
the first guess. Fig. 10 shows the whole beam, while Fig. 11 contains
two zooms where we can notice the sub-pixel refinement of the marker
positions (green crosses).
3. Transform the pixel coordinates into metric values, by applying the lens
distortion correction (point 1 of the previous list) and the matrices for
roto-translation and scaling (point 4 of the previous list).
4. Repeat the procedure for every picture acquired during an experiment,
using the refined position of the current frame as first guess for the next
frame.
5. Correct the displacement measurements with the model. This step is not
a standard procedure in image processing, and we will discuss it after this
list.
Figure 10: First frame (ROI size: 32×2048 pixel).
Figure 11: Marker detection for the central strip (top) and zoom on the central
markers (bottom). Legend: first guess (◦), sub-pixel refinement (+).
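Points 2 and 4 of the list above can be sketched as a local refinement followed by a frame-to-frame tracking loop. The text does not state which refinement algorithm was used, so the intensity-weighted centroid below is an assumption chosen for simplicity; it further assumes bright markers on a dark background. The names refine_marker and track_markers, as well as the tiny test images, are hypothetical.

```python
def refine_marker(image, guess_row, guess_col, half_window=2):
    """Intensity-weighted centroid in a window centred on the first guess.

    image is a list of rows of grey levels (bright markers assumed).
    """
    n_rows, n_cols = len(image), len(image[0])
    r0, c0 = round(guess_row), round(guess_col)
    total = sum_r = sum_c = 0.0
    for r in range(max(0, r0 - half_window), min(n_rows, r0 + half_window + 1)):
        for c in range(max(0, c0 - half_window), min(n_cols, c0 + half_window + 1)):
            w = image[r][c]
            total += w
            sum_r += w * r
            sum_c += w * c
    if total == 0.0:
        return (guess_row, guess_col)  # empty window: keep the guess
    return (sum_r / total, sum_c / total)


def track_markers(frames, first_guess, half_window=2):
    """Refine every marker in every frame; each result seeds the next frame."""
    guesses = list(first_guess)
    history = []
    for image in frames:
        refined = [refine_marker(image, r, c, half_window) for r, c in guesses]
        history.append(refined)
        guesses = refined
    return history


def blank(n_rows=5, n_cols=7):
    return [[0.0] * n_cols for _ in range(n_rows)]


# Two made-up frames: a two-pixel marker that moves one pixel to the right.
frame_1, frame_2 = blank(), blank()
frame_1[2][2] = frame_1[2][3] = 1.0
frame_2[2][3] = frame_2[2][4] = 1.0
history = track_markers([frame_1, frame_2], first_guess=[(2, 2)])
```

Seeding each frame with the refined positions of the previous one works as long as the marker motion between two frames stays within the search window.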
Correct the displacement measurements with the model
If the shaker is attached to the structure and it is not active, we expect
the beam to be deformed under the static weight of the shaker. Fig. 12 shows
the static deformation of the beam, evaluated through the FE model.
Unfortunately, the
displacement measurements do not match the expected deformation shape. This
is the consequence of multiple systematic uncertainties such as the boundary
conditions of the beam (in particular the geometry of the connection, since
it can cause static deformations) and the gluing of the marker strips, as well
as stochastic uncertainties such as the marker printing accuracy and errors
during image processing. We can spot some of these issues in Fig. 13 (top),
which shows the measurements at the first frame of the recording, i.e., when
the shaker is still off. For this reason, we corrected the measurements with
the FE static simulation, obtaining the displacements in Fig. 13 (bottom). We
applied this correction to all frames, and Fig. 14 shows a frame acquired after
the transient, where we notice that mode 8 (cf. Fig. 2) dominates the harmonic
response. This happens since its shape lies in the same plane as the excitation,
and its eigenfrequency (142 Hz, cf. Table 5) is the closest to the excitation
frequency (128 Hz). Finally, Fig. 15 shows an overlay of the displacements
evaluated for each frame of a measurement run (approximately 50 s at 512 Hz),
where we can see the space spanned by the vibration as well as an issue related
to the illumination. In fact, we notice that the measurements become blurry
while approaching the vertical mounts of the set-up. The reason for this can be
understood by looking at the light in Fig. 10.
Figure 12: Beam deformed under the static load of the shaker (F), obtained
through an FE analysis. Axes: x [mm], z [mm].
Figure 13: Displacement measurements at the first time step. From image
processing (top) and corrected with the FE deformed shape (bottom).
Axes: x [mm], z [mm].
Figure 14: Displacement measurements at a single time step, corrected with
the FE deformed shape. Axes: x [mm], z [mm].
Figure 15: Overlap of the displacement measurements at all time steps, corrected
with the FE deformed shape.
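One possible reading of the correction discussed above is sketched below: the profile measured at the first frame (shaker off) is treated as a systematic offset and replaced by the static shape predicted by the FE model. This formula is an assumption made for illustration, since the exact expression is not given in the text; the function name and the numbers are hypothetical.

```python
def correct_with_static_shape(frames, fe_static):
    """Replace the first-frame profile with the FE static deformation.

    frames:    list of frames, each a list of marker displacements [mm].
    fe_static: FE static deformation at the same markers [mm].
    Assumed correction: z_corrected = z_measured - z_first_frame + z_FE.
    """
    reference = frames[0]
    if len(reference) != len(fe_static):
        raise ValueError("one FE value per marker is required")
    corrected = []
    for frame in frames:
        corrected.append([z - z0 + z_fe
                          for z, z0, z_fe in zip(frame, reference, fe_static)])
    return corrected


# Made-up data: two markers, two frames.
raw_frames = [[1.0, 2.0], [1.5, 1.0]]
fe_shape = [-0.1, -0.2]
corrected_frames = correct_with_static_shape(raw_frames, fe_shape)
```

By construction, the first corrected frame coincides with the FE static shape, while the later frames keep their dynamic displacement relative to the first one.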
A brief note on model, measurements and data synchronisation
In the next few lines we want to stress that building the model
and setting up the procedure for extracting displacement measurements from
images required a certain amount of effort and time. The CS-MHE is a model-based
estimator, and its performance is strictly linked to the model accuracy (cf.
the numerical example in section C1.1). Building an accurate model requires
knowledge of numerical and experimental methodologies, specific software
(including protocols to exchange information among different programs), and a
certain level of personal experience. Moreover, similar considerations apply to
the camera measurements. For example, the first FE model that we built did
not consider the vibration of the vertical mounts of the set-up. Furthermore,
the model updating procedure consisted of a single step in which we optimised
all material and contact parameters. In parallel to this, the measurements did
not include any lens correction and only considered the 2D plane in which
the beam vibrates. Under these circumstances, in our first trial run of the
CS-MHE we could notice that something was happening, but we did not manage
to find good values for the covariance matrices; consequently, it was not
possible to tune the balancing weight λ, and the results were not acceptable.
We cannot state that the sequence of operations that we applied to model and
measurements is the best possible, and there surely is room for improvement,
but we showed in section C2.3.2 that the results are satisfactory.
Figure 16: Synchronisation signal. The chosen time step is marked with a black
triangle (time step ID 31 in Fig. 17). Axes: t [s], sync [V].
Before concluding this appendix, let us spend a few words on the
synchronisation signal used to identify the time step at which a picture was
taken, thus allowing us to compare the CS-MHE results with the force measurements
provided by the impedance head. We already mentioned a few details about
hardware connections (cf. Fig. 5), signal shapes and voltage levels, while in
Fig. 16 we show a portion of the synchronisation signal that goes from the
camera to the acquisition system. It is sampled at 16384 Hz (≈ 16 kHz) in
order to have 32 points describing one square wave at a frame rate of 512 frames
per second (fps). The high level of the signal indicates the time in which the
camera is in exposure active mode, and its length was set manually to 1.8 ms,
i.e., at every rising edge of the trigger signal (consisting of a square wave at
512 Hz) the camera takes a picture with an exposure time of 1.8 ms, and at
the same time it sends back the exposure status to the data acquisition system.
Then, the camera waits for the next rising edge of the trigger signal. From this
signal we had to choose the synchronisation strategy, i.e., the time step
within the exposure active interval to which we consider the picture to refer.
We did this by trying 40 time steps in a portion of the signal that includes a
full exposure active interval,
and computing the phase difference between the force estimate given by the
CS-MHE and the force measured by the impedance head (if the impedance
head were not present, we could set an arbitrary delay according to the values
suggested by the camera manufacturer). Fig. 17 shows this phase difference,
and we can see that the synchronisation denoted by ID 31 leads to the lowest
phase error. This time step corresponds to the black triangles in Fig. 16, and it
is located towards the end of the exposure active signal.
Figure 17: Phase difference between the force estimated by the CS-MHE and
the force measured by the impedance head, expressed as a function of the
synchronisation time step [rad]. The choice of time step ID 31 (black triangles
in Fig. 16) gives the best alignment.
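The search for the synchronisation time step described above can be sketched as follows: for each candidate offset within one frame period, the phase of the estimated force at the excitation frequency is compared with the phase of the measured force, and the offset with the smallest phase difference is retained. The single-frequency projection, the function names and the synthetic signals are assumptions made for this sketch, with 32 samples per frame (16384 Hz / 512 fps) and a 128 Hz excitation. Offsets are counted from zero here, whereas the time step IDs in Fig. 17 start from one.

```python
import cmath
import math


def phase_at(values, times, freq_hz):
    """Phase of the freq_hz component, by projection on a complex exponential."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * t)
              for v, t in zip(values, times))
    return cmath.phase(acc)


def wrap_to_pi(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def phase_vs_offset(estimate, measured, fs_daq, frame_rate, freq_hz, n_offsets=40):
    """Phase difference (estimate minus measurement) for each candidate offset.

    estimate: one value per camera frame; measured: samples at fs_daq.
    An offset is the number of DAQ samples after the start of a frame period
    at which the picture is assumed to be taken.
    """
    step = round(fs_daq / frame_rate)
    daq_times = [i / fs_daq for i in range(len(measured))]
    phi_measured = phase_at(measured, daq_times, freq_hz)
    diffs = []
    for offset in range(n_offsets):
        frame_times = [(k * step + offset) / fs_daq for k in range(len(estimate))]
        phi_estimate = phase_at(estimate, frame_times, freq_hz)
        diffs.append(wrap_to_pi(phi_estimate - phi_measured))
    best = min(range(n_offsets), key=lambda o: abs(diffs[o]))
    return best, diffs


# Synthetic check: the "estimate" is the measured sine sampled 30 DAQ samples
# after the start of each frame period, so the search should return 30.
FS, FPS, F0 = 16384.0, 512.0, 128.0
measured_force = [math.sin(2.0 * math.pi * F0 * i / FS) for i in range(16384)]
estimated_force = [math.sin(2.0 * math.pi * F0 * (k * 32 + 30) / FS)
                   for k in range(500)]
best_offset, phase_diffs = phase_vs_offset(estimated_force, measured_force,
                                           FS, FPS, F0)
```

For a harmonic excitation, the phase difference varies linearly with the offset, so the minimum is well defined as long as the candidate range covers less than one excitation period.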
Bibliography
[1] Abdollahpouri, M., Takács, G., and Rohal’-Ilkiv, B. Real-time
moving horizon estimation for a vibrating active cantilever. Mechanical
Systems and Signal Processing 86, Part A (2017), 1–15.
[2] Aharon, M., Elad, M., and Bruckstein, A. K-SVD: An algorithm
for designing overcomplete dictionaries for sparse representation. Signal
Processing, IEEE Transactions on 54, 11 (Nov 2006), 4311–4322.
[3] Albersmeyer, J. Adjoint-based algorithms and numerical methods for
sensitivity generation and optimization of large scale dynamic systems.
PhD thesis, Ruprecht-Karls-Universität Heidelberg, 2010.
[4] Alspach, D., and Sorenson, H. Nonlinear bayesian estimation using
gaussian sum approximations. IEEE Transactions on Automatic Control
17, 4 (Aug 1972), 439–448.
[5] Åström, K. J., and Murray, R. M. Feedback Systems: An Introduction
for Scientists and Engineers. Princeton University Press, 2008.
[6] Athans, M., Wishner, R., and Bertolini, A. Suboptimal state
estimation for continuous-time nonlinear systems from discrete noisy
measurements. IEEE Transactions on Automatic Control 13, 5 (Oct
1968), 504–514.
[7] Bach, F. R., Jenatton, R., Mairal, J., and Obozinski, G.
Optimization with Sparsity-Inducing Penalties. In: Foundations and
Trends in Machine Learning, vol. 4, 1. 2011.
[8] Bai, M. R., and Chen, C.-C. Application of convex optimization to
acoustical array signal processing. Journal of Sound and Vibration 332,
25 (2013), 6596–6616.
[9] Bao, S., Luo, L., Mao, J., and Tang, D. Improved fault detection
and diagnosis using sparse global-local preserving projections. Journal of
Process Control 47 (2016), 121–135.
[10] Baraniuk, R. Compressive sensing [lecture notes]. Signal Processing
Magazine, IEEE 24, 4 (July 2007), 118–121.
[11] Baraniuk, R., Davenport, M., DeVore, R., and Wakin, M. A
simple proof of the restricted isometry property for random matrices.
Constructive Approximation 28, 3 (2008), 253–263.
[12] Bay, J. Fundamentals of Linear State Space Systems. Electrical
Engineering Series. WCB/McGraw-Hill, 1999.
[13] Bayes, M., and Price, M. An essay towards solving a problem in the
doctrine of chances. by the late rev. mr. bayes, f. r. s. communicated by mr.
price, in a letter to john canton, a. m. f. r. s. Philosophical Transactions
53 (1763), 370–418.
[14] Bellantoni, J., and Dodge, K. A square root formulation of the
kalman-schmidt filter. AIAA journal 5, 7 (1967), 1309–1314.
[15] Besselink, B., Tabak, U., Lutowska, A., van de Wouw, N.,
Nijmeijer, H., Rixen, D., Hochstenbach, M., and Schilders, W.
A comparison of model reduction techniques from structural dynamics,
numerical mathematics and systems and control. Journal of Sound and
Vibration 332, 19 (2013), 4403–4422.
[16] Biegler, L. Nonlinear Programming. Society for Industrial and Applied
Mathematics, 2010.
[17] Bing, L., DianGe, Y., and XiaoMin, L. An acoustic holography
method with random sparse microphone array to locate moving sound
sources. In 2008 9th International Conference on Signal Processing (Oct
2008), pp. 187–190.
[18] Bock, H., Körkel, S., Kostina, E., and Schlöder, J. Robustness
aspects in parameter estimation, optimal design of experiments and
optimal control. In Reactive Flows, Diffusion and Transport. Springer
Berlin Heidelberg, 2007, pp. 117–146.
[19] Bock, H. G., Kostina, E., and Kostyukova, O. Covariance matrices
for parameter estimates of constrained parameter estimation problems.
SIAM Journal on Matrix Analysis and Applications 29, 2 (2007), 626–642.
[20] Box, G. E. P., and Jenkins, G. M. Time Series Analysis: Forecasting
and Control, 3rd ed. Prentice Hall PTR, Upper Saddle River, NJ, USA,
1994.
[21] Boyd, S., and Vandenberghe, L. Convex Optimization. Cambridge
University Press, New York, 2004.
[22] Bundesministerium für Bildung und Forschung. https://
www.bmbf.de.
[23] Candes, E., Romberg, J., and Tao, T. Robust uncertainty principles:
exact signal reconstruction from highly incomplete frequency information.
Information Theory, IEEE Transactions on 52, 2 (Feb 2006), 489–509.
[24] Candes, E., and Tao, T. Decoding by linear programming. Information
Theory, IEEE Transactions on 51, 12 (Dec 2005), 4203–4215.
[25] Candes, E., and Wakin, M. An introduction to compressive sampling.
Signal Processing Magazine, IEEE 25, 2 (March 2008), 21–30.
[26] Candes, E. J. The restricted isometry property and its implications
for compressed sensing. Comptes Rendus Mathematique 346, 9-10 (2008),
589–592.
[27] Candes, E. J., Romberg, J. K., and Tao, T. Stable signal recovery
from incomplete and inaccurate measurements. Communications on Pure
and Applied Mathematics 59, 8 (2006), 1207–1223.
[28] Carin, L. On the relationship between compressive sensing and random
sensor arrays. Antennas and Propagation Magazine, IEEE 51, 5 (Oct
2009), 72–81.
[29] Carmi, A., Gurfil, P., and Kanevsky, D. Methods for sparse signal
recovery using kalman filtering with embedded pseudo-measurement norms
and quasi-norms. Signal Processing, IEEE Transactions on 58, 4 (April
2010), 2405–2409.
[30] Chardon, G., Daudet, L., Peillot, A., Ollivier, F., Bertin,
N., and Gribonval, R. Near-field acoustic holography using sparse
regularization and compressive sampling principles. The Journal of the
Acoustical Society of America 132, 3 (2012), 1521–1534.
[31] Chardon, G., Leblanc, A., and Daudet, L. Plate impulse response
spatial interpolation with sub-nyquist sampling. Journal of Sound and
Vibration 330, 23 (2011), 5678–5689.
[32] Charles, A., Asif, M., Romberg, J., and Rozell, C. Sparsity
penalties in dynamical system estimation. In Information Sciences and
Systems (CISS), 2011 45th Annual Conference on (March 2011), pp. 1–6.
[33] Chen, C.-T. Linear system theory and design. Oxford series in electrical
and computer engineering. Oxford university press, 1999.
[34] Chen, C.-T. Linear system theory and design, 3rd ed. Oxford University
Press, New York, 1999.
[35] Chen, S., and Donoho, D. Basis pursuit. In Signals, Systems and
Computers, 1994. 1994 Conference Record of the Twenty-Eighth Asilomar
Conference on (Oct 1994), vol. 1, pp. 41–44.
[36] Chen, S. S., Donoho, D. L., and Saunders, M. A. Atomic
decomposition by basis pursuit. SIAM Rev. 43, 1 (Jan. 2001), 129–159.
[37] Chu, E., Keshavarz, A., Gorinevsky, D., and Boyd, S. Moving
horizon estimation for staged qp problems. In 2012 IEEE 51st IEEE
Conference on Decision and Control (CDC) (Dec 2012), pp. 3177–3182.
[38] Croes, J. Virtual sensing in mechatronic drivelines – Bridging between
advanced methods and industrial applications. PhD thesis, KU Leuven,
September 2017.
[39] D’Elia, G., Cocconcelli, M., Mucchi, E., Rubini, R., and
Dalpiaz, G. Step-by-step algorithm for the simulation of faulted bearings
in non-stationary conditions. In Proceedings of ISMA2016 including USD
2016, Leuven, Belgium (2016), P. Sas, D. Moens, and A. van de Walle,
Eds., pp. 2393–2408.
[40] Diehl, M. Real-Time Optimization for Large Scale Nonlinear Processes.
PhD thesis, Universität Heidelberg, 2001.
[41] Domahidi, A., Zgraggen, A., Zeilinger, M., Morari, M., and
Jones, C. Efficient Interior Point Methods for Multistage Problems
Arising in Receding Horizon Control. In IEEE Conference on Decision
and Control (CDC) (Maui, HI, USA, Dec. 2012), pp. 668–674.
[42] Donoho, D. Compressed sensing. Information Theory, IEEE
Transactions on 52, 4 (April 2006), 1289–1306.
[43] Donoho, D., Elad, M., and Temlyakov, V. Stable recovery of sparse
overcomplete representations in the presence of noise. Information Theory,
IEEE Transactions on 52, 1 (Jan 2006), 6–18.
[44] Donoho, D., and Tanner, J. Thresholds for the recovery of sparse
solutions via l1 minimization. In Information Sciences and Systems, 2006
40th Annual Conference on (March 2006), pp. 202–206.
[45] Donoho, D. L., and Elad, M. On the stability of the basis pursuit in
the presence of noise. Signal Processing 86, 3 (2006), 511–532.
[46] Doucet, A., de Freitas, N., and Gordon, N., Eds. Sequential Monte
Carlo Methods in Practice. Springer-Verlag New York, 2001.
[47] Duarte-Carvajalino, J., and Sapiro, G. Learning to sense
sparse signals: Simultaneous sensing matrix and sparsifying dictionary
optimization. Image Processing, IEEE Transactions on 18, 7 (July 2009),
1395–1408.
[48] Elad, M. Optimized projections for compressed sensing. IEEE
Transactions on Signal Processing 55, 12 (Dec 2007), 5695–5702.
[49] Feng, D., and Feng, M. Q. Identification of structural stiffness
and excitation forces in time domain using noncontact vision-based
displacement measurement. Journal of Sound and Vibration 406,
Supplement C (2017), 15–28.
[50] Ferreau, H. J., Kirches, C., Potschka, A., Bock, H. G., and
Diehl, M. qpOASES: a parametric active-set algorithm for quadratic
programming. Mathematical Programming Computation 6, 4 (2014),
327–363.
[51] Floudas, C., Akrotirianakis, I., Caratzoulas, S., Meyer, C.,
and Kallrath, J. Global optimization in the 21st century: Advances and
challenges. Computers & Chemical Engineering 29, 6 (2005), 1185–1202.
[52] Forrier, B., Naets, F., and Desmet, W. Virtual sensing on
mechatronic drivetrains using multiphysical models. In ECCOMAS
Thematic Conference on Multibody Dynamics (2015).
[53] Franklin, G. F., Powell, D. J., and Emami-Naeini, A. Feedback
Control of Dynamic Systems, 5th ed. Prentice Hall PTR, Upper Saddle
River, NJ, USA, 2006.
[54] Frasch, J. Parallel Algorithms for Optimization of Dynamic Systems in
Real-Time. PhD thesis, KU Leuven and OVGU Magdeburg, September
2014.
[55] Frasch, J. V., Sager, S., and Diehl, M. A Parallel Quadratic
Programming Method for Dynamic Optimization Problems. Mathematical
Programming Computation 7, 3 (2015), 289–329.
[56] Frison, G., Sorensen, H., Dammann, B., and Jorgensen, J. High-
performance small-scale solvers for linear model predictive control. In
Proc. 2014 European Control Conference (ECC) (June 2014), pp. 128–133.
[57] Fritzen, C.-P., Ginsberg, D., and Loffeld, O. Vibration-based
structural damage identification using sparsity constrained extended
kalman filter concept. In Proceedings of ISMA2018 and USD2018, Leuven,
Belgium (17-19 September 2018), W. Desmet, B. Pluymers, D. Moens,
and W. Rottiers, Eds.
[58] Ganesan, V., Das, T., Rahnavard, N., and Kauffman, J. L.
Vibration-based monitoring and diagnostics using compressive sensing.
Journal of Sound and Vibration 394 (2017), 612–630.
[59] Gauss, K. F. Theory of Motion of the Heavenly Bodies Moving About
the Sun in Conic Sections: A Translation of Theoria Motus. Dover, New
York, 2004.
[60] Ghosh, B. K., and Rosenthal, J. A generalized popov-belevitch-
hautus test of observability. IEEE Transactions on Automatic Control 40,
1 (Jan 1995), 176–180.
[61] Gillijns, S., and Moor, B. D. Unbiased minimum-variance input and
state estimation for linear discrete-time systems. Automatica 43, 1 (2007),
111–116.
[62] Gillijns, S., and Moor, B. D. Unbiased minimum-variance input and
state estimation for linear discrete-time systems with direct feedthrough.
Automatica 43, 5 (2007), 934–937.
[63] Ginsberg, D., and Fritzen, C.-P. New approach for impact detection
by finding sparse solution. In Proceedings of ISMA2014 including
USD2014, Leuven, Belgium (15-17 September 2014), P. Sas, D. Moens,
and H. Denayer, Eds., pp. 2043–2056.
[64] Gonzalez, R., and Woods, R. Digital Image Processing (2nd Edition).
International Edition. Prentice Hall, 2002.
[65] Gonzalez, R., Woods, R., and Eddins, S. Digital Image Processing
Using MATLAB. Pearson Prentice Hall, 2004.
[66] Gordon, N. J., Salmond, D. J., and Smith, A. F. M. Novel approach
to nonlinear/non-gaussian bayesian state estimation. IEE Proceedings F -
Radar and Signal Processing 140, 2 (April 1993), 107–113.
[67] Grant, M., and Boyd, S. Graph implementations for nonsmooth
convex programs. In Recent Advances in Learning and Control, V. Blondel,
S. Boyd, and H. Kimura, Eds., Lecture Notes in Control and Information
Sciences. Springer-Verlag Limited, 2008, pp. 95–110.
[68] Grant, M., and Boyd, S. CVX: Matlab software for disciplined convex
programming, version 2.1, March 2014.
[69] Ha, Q., and Trinh, H. State and input simultaneous estimation for a
class of nonlinear systems. Automatica 40, 10 (2004), 1779–1785.
[70] Hansen, P. C. Rank-Deficient and Discrete Ill-Posed Problems:
Numerical Aspects of Linear Inversion. SIAM, 1998.
[71] Haseltine, E. L., and Rawlings, J. B. A critical evaluation of
extended kalman filtering and moving-horizon estimation. Tech. Rep.
2002-03, TWMCC – Texas-Wisconsin Modeling and Control Consortium,
3 2003.
[72] Haseltine, E. L., and Rawlings, J. B. Critical evaluation of extended
kalman filtering and moving-horizon estimation. Industrial & Engineering
Chemistry Research 44, 8 (2005), 2451–2460.
[73] Hayes, B. The best bits. American Scientist 97, 4 (July-August 2009),
276–280.
[74] Haykin, S. Kalman filtering and neural networks, vol. 47. John Wiley &
Sons, 2004.
[75] Haykin, S. Adaptive Filter Theory. Low price edition. Pearson Education,
2008.
[76] Hermann, M., Pentek, T., and Otto, B. Design principles for
industrie 4.0 scenarios. In 2016 49th Hawaii International Conference on
System Sciences (HICSS) (Jan 2016), pp. 3928–3937.
[77] Hernandez, E. M. Efficient sensor placement for state estimation in
structural dynamics. Mechanical Systems and Signal Processing 85 (2017),
789–800.
[78] Heylen, W., Lammens, S., and Sas, P. Modal Analysis Theory
and Testing. KU Leuven, Department of Mechanical Engineering, PMA
Section, 2007.
[79] Hinson, B. T. Observability-Based Guidance and Sensor Placement.
PhD thesis, University of Washington, 2014.
[80] Hong, S., Lee, C., Borrelli, F., and Hedrick, J. K. A novel
approach for vehicle inertial parameter identification using a dual kalman
filter. IEEE Transactions on Intelligent Transportation Systems 16, 1
(Feb 2015), 151–161.
[81] Hou, M., and Patton, R. J. Optimal filtering for systems with
unknown inputs. IEEE Transactions on Automatic Control 43, 3 (Mar
1998), 445–449.
[82] Hsieh, C.-S. Robust two-stage kalman filters for systems with unknown
inputs. IEEE Transactions on Automatic Control 45, 12 (Dec 2000),
2374–2378.
[83] Hsieh, C.-S. Extension of unbiased minimum-variance input and state
estimation for systems with unknown inputs. Automatica 45, 9 (2009),
2149–2153.
[84] JAI. User Manual SP-12000M-CXP4 – SP-12000C-CXP4 – 12M Digital
Progressive Scan Monochrome and Color Camera. Version 1.0, September
2016.
[85] Javh, J., Slavič, J., and Boltežar, M. The subpixel resolution
of optical-flow-based modal analysis. Mechanical Systems and Signal
Processing 88 (2017), 89–99.
[86] Javh, J., Slavič, J., and Boltežar, M. High frequency modal
identification on noisy high-speed camera data. Mechanical Systems and
Signal Processing 98, Supplement C (2018), 344–351.
[87] Javh, J., Slavič, J., and Boltežar, M. Measuring full-field
displacement spectral components using photographs taken with a dslr
camera via an analogue fourier integral. Mechanical Systems and Signal
Processing 100, Supplement C (2018), 17–27.
[88] Jayawardhana, M., Zhu, X., Liyanapathirana, R., and
Gunawardana, U. Compressive sensing for efficient health monitoring
and effective damage detection of structures. Mechanical Systems and
Signal Processing 84, Part A (2017), 414–430.
[89] Johnson, K. L. Contact Mechanics. Cambridge University Press, 1985.
[90] Julier, S. J. The spherical simplex unscented transformation. In
Proceedings of the 2003 American Control Conference (June 2003), vol. 3,
pp. 2430–2434.
[91] Julier, S. J., and Uhlmann, J. K. Unscented filtering and nonlinear
estimation. Proceedings of the IEEE 92, 3 (Mar 2004), 401–422.
[92] Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. F. A
new approach for filtering nonlinear systems. In Proceedings of the 1995
American Control Conference (Jun 1995), vol. 3, pp. 1628–1632.
[93] Kailath, T. Linear systems. Prentice Hall information and system
sciences series. Prentice-Hall, Englewood Cliffs, 1980.
[94] Kalman, R. E. A new approach to linear filtering and prediction
problems. Transactions of the ASME–Journal of Basic Engineering 82,
Series D (March 1960), 35–45.
[95] Kanevsky, D., Carmi, A., Horesh, L., Gurfil, P., Ramabhadran,
B., and Sainath, T. Kalman filtering for compressed sensing. In 13th
Conference on Information Fusion (FUSION) (July 2010), pp. 1–8.
[96] Kim, J., Kim, Y., and Kim, Y. A gradient-based optimization algorithm
for LASSO. Journal of Computational and Graphical Statistics 17, 4 (2008),
994–1009.
[97] Kim, J., Lee, O. K., and Ye, J. C. Compressive MUSIC: A missing
link between compressive sensing and array signal processing. CoRR
abs/1004.4398 (2010).
[98] Kim, J. M., Lee, O. K., and Ye, J. C. Compressive MUSIC with
optimized partial support for joint sparse recovery. In 2011 IEEE
International Symposium on Information Theory Proceedings (July 2011),
pp. 658–662.
[99] Kim, J. M., Lee, O. K., and Ye, J. C. Compressive MUSIC: Revisiting
the link between compressive sensing and array signal processing. IEEE
Transactions on Information Theory 58, 1 (Jan 2012), 278–301.
[100] Kim, J. M., Lee, O. K., and Ye, J. C. Dynamic sparse support
tracking with multiple measurement vectors using compressive MUSIC.
In 2012 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP) (March 2012), pp. 2717–2720.
[101] Kim, S.-J., Koh, K., Lustig, M., Boyd, S., and Gorinevsky, D. An
interior-point method for large-scale ℓ1-regularized least squares. IEEE
Journal of Selected Topics in Signal Processing 1, 4 (Dec 2007), 606–617.
[102] Kirchner, M., Croes, J., Cosco, F., and Desmet, W. Animation
paper ID #662, EURODYN 2017, https://youtu.be/MvHyCQ-SPi8, 2017.
[103] Kirchner, M., Croes, J., Cosco, F., and Desmet, W. Compressive
sensing-moving horizon estimator for distributed loads. Procedia
Engineering 199, C (2017), 447–452.
[104] Kirchner, M., Croes, J., Cosco, F., and Desmet, W. Exploiting
input sparsity for joint state/input moving horizon estimation. Mechanical
Systems and Signal Processing 101 (2018), 237–253.
[105] Kirchner, M., Croes, J., Cosco, F., Pluymers, B., and
Desmet, W. Compressive sensing-moving horizon estimator for combined
state/input estimation: an observability study. In Proceedings of
ISMA2016 including USD2016, Leuven, Belgium (19-21 September 2016),
P. Sas, D. Moens, and A. van de Walle, Eds., pp. 2947–2962.
[106] Kirchner, M., Croes, J., Cosco, F., Pluymers, B., and
Desmet, W. Compressive sensing-moving horizon estimator for periodic
loads: experimental validation in structural dynamics with video-based
measurements. In Proceedings of ISMA2018 and USD2018, Leuven,
Belgium (17-19 September 2018), W. Desmet, B. Pluymers, D. Moens,
and W. Rottiers, Eds.
[107] Kirchner, M., and Nijman, E. Cylindrical nearfield acoustical
holography using compressive sampling: feasibility and numerical
examples. In Proceedings of ISMA2014 including USD2014, Leuven,
Belgium (15-17 September 2014), P. Sas, D. Moens, and H. Denayer, Eds.,
pp. 1531–1546.
[108] Kirchner, M., and Nijman, E. Nearfield acoustical holography for the
characterization of cylindrical sources: practical aspects. SAE Technical
Paper 2014-01-2094, SAE International (2014).
[109] Kirchner, M., and Nijman, E. Cylindrical nearfield acoustical
holography. In eLiQuiD – Best Engineering Training in Electric,
Lightweight and Quiet Driving (2016), W. Desmet, B. Pluymers, and
M. Kirchner, Eds., KU Leuven, pp. 23–44.
[110] Kirchner, M., and Nijman, E. Cylindrical nearfield acoustical
holography: Practical aspects and possible improvements. In
SpringerBriefs in Applied Sciences and Technology, Automotive NVH
Technology (2016), A. Fuchs, E. Nijman, and H. Priebsch, Eds., Springer
International Publishing, pp. 47–62.
[111] Kitanidis, P. K. Unbiased minimum-variance linear state estimation.
Automatica 23, 6 (1987), 775–778.
[112] Klinkov, M., and Fritzen, C.-P. An updated comparison of the
force reconstruction methods. In Damage Assessment of Structures VII
(2007), vol. 347 of Key Engineering Materials, Trans Tech Publications,
pp. 461–466.
[113] Klinkov, M., and Fritzen, C.-P. Online estimation of external
loads from dynamic measurements. In Proceedings of ISMA2006, Leuven,
Belgium (2006), pp. 1779–1785.
[114] Korovin, S., and Fomichev, V. State Observers for Linear Systems
with Uncertainty. De Gruyter Expositions in Mathematics, No. 51. De
Gruyter, Berlin, Boston, 2009.
[115] Ku, H. H. Notes on the use of propagation of error formulas. Journal
of Research of the National Bureau of Standards – C. Engineering and
instrumentation 70C, 4 (October-December 1966), 263–273.
[116] Liao, Y., Xiao, Q., Ding, X., and Guo, D. A novel dictionary design
algorithm for sparse representations. In International Joint Conference
on Computational Sciences and Optimization (CSO 2009) (April 2009),
vol. 1, pp. 831–834.
[117] Liu, B., Ling, S., and Gribonval, R. Bearing failure detection using
matching pursuit. NDT & E International 35, 4 (2002), 255–262.
[118] Liu, C., Wu, X., Mao, J., and Liu, X. Acoustic emission signal
processing for rolling bearing running state assessment using compressive
sensing. Mechanical Systems and Signal Processing 91 (2017), 395–406.
[119] Ljung, L., and Söderström, T. Theory and practice of recursive
identification. The MIT Press series in signal processing, optimization
and control. The MIT Press, Cambridge MA, London, 1983.
[120] Lobo, M. S., Vandenberghe, L., Boyd, S., and Lebret, H.
Applications of second-order cone programming. Linear Algebra and
its Applications 284, 1 (1998), 193–228.
[121] Löfberg, J. YALMIP : A toolbox for modeling and optimization in
MATLAB. In Proceedings of the CACSD Conference (Taipei, Taiwan,
2004).
[122] Lopez Negrete de la Fuente, R. Nonlinear Programming Sensitivity
Based Methods for Constrained State Estimation. PhD thesis, Carnegie
Mellon University (Pittsburgh, PA), Department of Chemical Engineering,
2011.
[123] Lopez Negrete de la Fuente, R., and Biegler, L. T. A moving
horizon estimator for processes with multi-rate measurements: A nonlinear
programming sensitivity approach. Journal of Process Control 22, 4 (2012),
677–688.
[124] Lourens, E., Papadimitriou, C., Gillijns, S., Reynders, E.,
Roeck, G. D., and Lombaert, G. Joint input-response estimation for
structural systems based on reduced-order models and vibration data from
a limited number of sensors. Mechanical Systems and Signal Processing
29 (2012), 310–327.
[125] Lourens, E., Reynders, E., Roeck, G. D., Degrande, G., and
Lombaert, G. An augmented Kalman filter for force identification in
structural dynamics. Mechanical Systems and Signal Processing 27 (2012),
446–460.
[126] Lu, P., van Kampen, E.-J., de Visser, C. C., and Chu, Q.
Framework for state and unknown input estimation of linear time-varying
systems. Automatica 73 (2016), 145–154.
[127] Maes, K., Gillijns, S., and Lombaert, G. A smoothing algorithm for
joint input-state estimation in structural dynamics. Mechanical Systems
and Signal Processing 98 (2018), 292–309.
[128] Maes, K., Smyth, A., Roeck, G. D., and Lombaert, G. Joint
input-state estimation in structural dynamics. Mechanical Systems and
Signal Processing 70-71 (2016), 445–466.
[129] Mascareñas, D., Cattaneo, A., Theiler, J., and Farrar, C.
Compressed sensing techniques for detecting damage in structures.
Structural Health Monitoring 12, 4 (2013), 325–338.
[130] McFadden, P., and Smith, J. The vibration produced by multiple
point defects in a rolling element bearing. Journal of Sound and Vibration
98, 2 (1985), 263–273.
[131] MOSEK ApS. MOSEK Quickstart Guide Version 7.1 (Revision 60),
2015.
[132] Nady, R. H., and Hagara, M. A new procedure of modal parameter
estimation for high-speed digital image correlation. Mechanical Systems
and Signal Processing 93 (2017), 66–79.
[133] Naets, F., Croes, J., and Desmet, W. An online coupled
state/input/parameter estimation approach for structural dynamics.
Computer Methods in Applied Mechanics and Engineering 283 (2015),
1167–1188.
[134] Naets, F., Cuadrado, J., and Desmet, W. Stable force identification
in structural dynamics using Kalman filtering and dummy-measurements.
Mechanical Systems and Signal Processing 50-51 (2015), 235–248.
[135] Nagarajaiah, S. Sparse and low-rank methods in structural system
identification and monitoring. Procedia Engineering 199, C (2017), 62–69.
[136] Nikon Metrology NV. http://www.nikonmetrology.com.
[137] Nocedal, J., and Wright, S. J. Numerical optimization, 2nd ed.
Springer Series in Operations Research and Financial Engineering.
Springer, 2006.
[138] Ogata, K. Modern Control Engineering, 5th ed. Prentice Hall PTR,
Upper Saddle River, NJ, USA, 2010.
[139] Pacejka, H. B. Tire and Vehicle Dynamics, 2nd ed. Society of
Automotive Engineers, SAE International, 2006.
[140] Papoulis, A. Probability, random variables, and stochastic processes.
McGraw-Hill series in electrical engineering. McGraw-Hill, New York,
1991.
[141] Parikh, N., and Boyd, S. Matlab scripts for proximal methods –
Example 1: LASSO. https://web.stanford.edu/~boyd/papers/prox_algs/lasso.html.
[142] Patrinos, P., and Bemporad, A. An accelerated dual gradient-
projection algorithm for embedded linear model predictive control.
IEEE Transactions on Automatic Control 59, 1 (Jan 2014), 18–33.
[143] Paulson, N. R., Sadeghi, F., and Habchi, W. A coupled finite
element EHL and continuum damage mechanics model for rolling contact
fatigue. Tribology International 107 (2017), 173–183.
[144] PCB ICP® Impedance Head 288D01. http://www.pcb.com/products.aspx?m=288d01.
[145] Peeters, B., Van der Auweraer, H., Guillaume, P., and
Leuridan, J. The PolyMAX frequency-domain method: A new standard
for modal parameter estimation? Shock and Vibration 11, 3-4 (2004),
395–409.
[146] Peillot, A., Ollivier, F., Chardon, G., and Daudet, L.
Localization and identification of sound sources using “compressive
sampling” techniques. In 18th International Congress on Sound &
Vibration (ICSV18) (July 10-14 2011).
[147] Pereira, M. P., Lovisolo, L., da Silva, E. A., and de Campos,
M. L. On the design of maximally incoherent sensing matrices
for compressed sensing using orthogonal bases and its extension for
biorthogonal bases case. Digital Signal Processing 27, 0 (2014), 12–22.
[148] Perepu, S. K., and Tangirala, A. K. Reconstruction of missing data
using compressed sensing techniques with adaptive dictionary. Journal of
Process Control 47 (2016), 175–190.
[149] Pratt, W. Digital Image Processing: PIKS Scientific Inside (4th
Edition). A Wiley-Interscience publication. Wiley, 2007.
[150] Qaisar, S., Bilal, R., Iqbal, W., Naureen, M., and Lee, S.
Compressive sensing: From theory to applications, a survey. Journal of
Communications and Networks 15, 5 (Oct 2013), 443–456.
[151] Qiao, B., Mao, Z., and Chen, X. Sparse representation for the inverse
problem of force identification. In Proceedings of ISMA2016 including
USD2016, Leuven, Belgium (19-21 September 2016), P. Sas, D. Moens,
and A. van de Walle, Eds., pp. 1685–1696.
[152] Qiao, B., Zhang, X., Gao, J., and Chen, X. Impact-force sparse
reconstruction from highly incomplete and inaccurate measurements.
Journal of Sound and Vibration 376 (2016), 72–94.
[153] Qiao, B., Zhang, X., Gao, J., Liu, R., and Chen, X. Sparse
deconvolution for the large-scale ill-posed inverse problem of impact force
reconstruction. Mechanical Systems and Signal Processing 83 (2017),
93–115.
[154] Qiao, B., Zhang, X., Wang, C., Zhang, H., and Chen, X. Sparse
regularization for force identification using dictionaries. Journal of Sound
and Vibration 368 (2016), 71–86.
[155] Qu, C. C., and Hahn, J. Computation of arrival cost for moving horizon
estimation via unscented Kalman filtering. Journal of Process Control 19,
2 (2009), 358–363.
[156] Quirynen, R. Numerical Simulation Methods for Embedded Optimization.
PhD thesis, KU Leuven and University of Freiburg, January 2017.
[157] Quirynen, R., Gros, S., and Diehl, M. Efficient NMPC for
nonlinear models with linear subsystems. In Proceedings of the 52nd
IEEE Conference on Decision and Control (2013), pp. 5101–5106.
[158] Quirynen, R., Gros, S., and Diehl, M. Fast auto generated ACADO
integrators and application to MHE with multi-rate measurements. In
Proceedings of the European Control Conference (2013), pp. 3077–3082.
[159] Quirynen, R., Gros, S., and Diehl, M. Inexact Newton based
Lifted Implicit Integrators for fast Nonlinear MPC. In Proceedings of
the 5th IFAC Conference on Nonlinear Model Predictive Control (2015),
pp. 32–38.
[160] Quirynen, R., Gros, S., and Diehl, M. Lifted implicit integrators for
direct optimal control. In Conference on Decision and Control (2015).
[161] Quirynen, R., Houska, B., and Diehl, M. Efficient symmetric Hessian
propagation for direct optimal control. Journal of Process Control 50
(2017), 19–28.
[162] Quirynen, R., Houska, B., Vallerio, M., Telen, D., Logist, F.,
Van Impe, J., and Diehl, M. Symmetric Algorithmic Differentiation
Based Exact Hessian SQP Method and Software for Economic MPC. In
Conference on Decision and Control (2014), pp. 2752–2757.
[163] Quirynen, R., Vukov, M., and Diehl, M. Auto Generation of
Implicit Integrators for Embedded NMPC with Microsecond Sampling
Times. In Proceedings of the 4th IFAC Nonlinear Model Predictive Control
Conference (2012), M. Lazar and F. Allgöwer, Eds., pp. 175–180.
[164] Quirynen, R., Vukov, M., Zanon, M., and Diehl, M.
Autogenerating Microsecond Solvers for Nonlinear MPC: a Tutorial Using
ACADO Integrators. Optimal Control Applications and Methods 36 (2014),
685–704.
[165] Quirynen, R., Zanon, M., Kozma, A., and Diehl, M. A Compression
Algorithm for Real-Time Distributed Nonlinear MPC. In Proceedings of
the European Control Conference (2015).
[166] Rao, C. V. Moving Horizon Strategies for the Constrained Monitoring
and Control of Nonlinear Discrete-Time Systems. PhD thesis, University
of Wisconsin-Madison, February 2000.
[167] Rao, C. V., and Rawlings, J. B. Constrained process monitoring:
Moving-horizon approach. AIChE Journal 48, 1 (2002), 97–109.
[168] Rao, C. V., Rawlings, J. B., and Lee, J. H. Constrained linear
state estimation – a moving horizon approach. Automatica 37, 10 (2001),
1619–1628.
[169] Ray, L. R. Nonlinear state and tire force estimation for advanced vehicle
control. IEEE Transactions on Control Systems Technology 3, 1 (Mar
1995), 117–124.
[170] Ray, L. R. Nonlinear tire force estimation and road friction identification:
Simulation and experiments. Automatica 33, 10 (1997), 1819–1833.
[171] Rezayat, A., Nassiri, V., Pauw, B. D., Ertveldt, J., Vanlanduit,
S., and Guillaume, P. Identification of dynamic forces using group-
sparsity in frequency domain. Mechanical Systems and Signal Processing
70-71 (2016), 756–768.
[172] Ristic, B., Arulampalam, S., and Gordon, N. Beyond the Kalman
filter : particle filters for tracking applications. Artech House, Boston,
London, 2004.
[173] Sain, M., and Massey, J. Invertibility of linear time-invariant dynamical
systems. IEEE Transactions on Automatic Control 14, 2 (Apr 1969),
141–149.
[174] Sanchez, R. R., Buchschmid, M., and Müller, G. Model order
reduction techniques in structural dynamics. In ECCOMAS Congress
2016 – VII European Congress on Computational Methods in Applied
Sciences and Engineering (June 2016), M. Papadrakakis, V. Papadopoulos,
G. Stefanou, and V. Plevris, Eds.
[175] Santosa, F., and Symes, W. W. Linear inversion of band-limited
reflection seismograms. SIAM Journal on Scientific and Statistical
Computing 7, 4 (1986), 1307–1330.
[176] Särkkä, S. Bayesian Filtering and Smoothing. Cambridge University
Press, 2013.
[177] Sawalhi, N., and Randall, R. Simulation of vibrations produced
by localized faults in rolling elements of bearings in gearboxes. In 5th
Australasian Congress on Applied Mechanics (ACAM), Brisbane, Australia
(2007).
[178] Schmidt, P. Improvements in localization of planar acoustic holography.
Master’s thesis, Institute of Electronic Music and Acoustics, University of
Music and Performing Arts, Graz, Austria, May 2012.
[179] Sen, D., Aghazadeh, A., Mousavi, A., Nagarajaiah, S., and
Baraniuk, R. Sparsity-based approaches for damage detection in plates.
Mechanical Systems and Signal Processing 117 (2019), 333–346.
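[180] Siemens PLM Software. https://www.plm.automation.siemens.com/en/products/lms/testing/scadas/.
[181] Siemens PLM Software. https://www.plm.automation.siemens.com/en/products/lms/testing/test-lab/.
[182] Siemens PLM Software. https://www.plm.automation.siemens.com/en/products/nx/.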
[183] Simon, D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear
Approaches. Wiley-Interscience, 2006.
[184] Simon, D. Kalman filtering with state constraints: a survey of linear
and nonlinear algorithms. IET Control Theory Applications 4, 8 (August
2010), 1303–1318.
[185] Smith, C. B., and Hernandez, E. M. Exploiting spatial sparsity in
vibration-based damage detection. Procedia Engineering 199, C (2017),
1925–1930.
[186] Steltzner, A. D., and Kammer, D. C. Input force estimation using
an inverse structural filter. In Proceedings of the 17th International Modal
Analysis Conference, Florida, USA (1999), pp. 954–960.
[187] Tao, T. https://www.math.ucla.edu/~tao/preprints/sparse.html.
[188] THE MODAL SHOP Miniature Inertial Shaker System K2002E01.
http://www.modalshop.com/mini-inertial-shaker-system?id=1136.
[189] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal
of the Royal Statistical Society. Series B (Methodological) 58, 1 (1996),
267–288.
[190] Tropp, J. A., and Gilbert, A. C. Signal recovery from random
measurements via orthogonal matching pursuit. IEEE Transactions on
Information Theory 53, 12 (Dec 2007), 4655–4666.
[191] Tropp, J. A., and Wright, S. J. Computational methods for sparse
solution of linear inverse problems. Proceedings of the IEEE 98, 6 (June
2010), 948–958.
[192] van Aalst, S., Naets, F., Johan, T., and Desmet, W. Use of flexible
models in extended Kalman filtering for vehicle body force estimation. In
ECCOMAS Thematic Conference on Multibody Dynamics (2015), pp. 1329–
1340.
[193] van den Berg, E., and Friedlander, M. P. SPGL1 : A solver for
large-scale sparse reconstruction. http://www.cs.ubc.ca/labs/scl/spgl1,
June 2007.
[194] van den Berg, E., and Friedlander, M. P. Probing the Pareto
frontier for basis pursuit solutions. SIAM Journal on Scientific Computing
31, 2 (2008), 890–912.
[195] Van der Merwe, R. Sigma-Point Kalman Filters for Probabilistic
Inference in Dynamic State-Space Models. PhD thesis, Oregon Health &
Science University, April 2004.
[196] Van Loan, C. Computing integrals involving the matrix exponential.
IEEE Transactions on Automatic Control 23, 3 (Jun 1978), 395–404.
[197] Vaswani, N. Kalman filtered compressed sensing. In 15th IEEE
International Conference on Image Processing (ICIP 2008) (Oct 2008),
pp. 893–896.
[198] Wiener, N. Extrapolation, interpolation, and smoothing of stationary
time series with engineering applications. Technology Press of the
Massachusetts Institute of Technology, Cambridge MA, 1964.
[199] Wiener, N. I Am a Mathematician, The Later Life of a Prodigy; an
autobiographical account of the mature years and career of Norbert Wiener,
and a continuation of the account of his childhood in Ex-Prodigy. The
MIT Press, Cambridge MA, 1964.
[200] Williams, E. G. Fourier Acoustics: Sound Radiation and Nearfield
Acoustical Holography. Academic Press, 1999.
[201] XIMEA. https://www.ximea.com.
[202] XIMEA. xiQ USB 3.0 camera series – Technical Manual. Version 1.30,
April 2017.
[203] Yu, L., and Pan, B. Single-camera high-speed stereo-digital image
correlation for full-field vibration measurement. Mechanical Systems and
Signal Processing 94 (2017), 374–383.
[204] Zanon, M. Efficient Nonlinear Model Predictive Control Formulations
for Economic Objectives with Aerospace and Automotive Applications.
PhD thesis, KU Leuven, November 2015.
[205] Zhang, Z., Xu, Y., Yang, J., Li, X., and Zhang, D. A survey
of sparse representation: Algorithms and applications. IEEE Access 3
(2015), 490–530.
[206] Zwartjes, P., and Gisolf, A. Fourier reconstruction with sparse
inversion. Geophysical Prospecting 55, 2 (2007), 199–221.
Curriculum vitae
Matteo KIRCHNER
Italian. Born March 7, 1985 in Trento, Italy.
matteo.kirchner@gmail.com
2013 – 2018 PhD student
KU Leuven, Department of Mechanical Engineering,
Division PMA, Noise & Vibration Engineering, Leuven, Belgium
2013 – 2014 Researcher
Virtual Vehicle Research Center,
Area NVH & Friction, Graz, Austria
2011 – 2012 R&D researcher
LMS International, CAE Division, Leuven, Belgium
2011 Researcher
University of Trento, Faculty of Engineering,
Mechanical Measurement Group, Trento, Italy
2007 – 2010 Master of Science in Mechatronics Engineering
University of Trento, Trento, Italy
Thesis: “Design and Manufacturing of a Force Panel for the characterization
of 2D Human-Machine Interaction”, Supervisor Prof. Mariolino
De Cecco
2008 Visiting student
Ruhr Universität Bochum, Bochum, Germany
2004 – 2007 Bachelor of Science in Industrial Engineering
University of Trento, Trento, Italy
Thesis: “Development of a system based on Color Image Processing to
quantify the level of baking”, Supervisor Prof. Mariolino De Cecco
List of publications
Journal articles
M. Kirchner, J. Croes, F. Cosco and W. Desmet. Exploiting input
sparsity for joint state/input moving horizon estimation. Mechanical
Systems and Signal Processing 101 (2018), pages 237-253.
Papers at international scientific conferences
M. Kirchner, J. Croes, F. Cosco, B. Pluymers and W. Desmet.
Compressive sensing-moving horizon estimator for periodic loads: experimental
validation in structural dynamics with video-based measurements.
Proceedings of ISMA2018 and USD2018, Leuven, Belgium, September
17-19, 2018.
M. Kirchner, J. Croes, F. Cosco and W. Desmet. Compressive sensing-
moving horizon estimator for distributed loads. X International Conference
on Structural Dynamics (EURODYN 2017), Rome, Italy, September
10-13, 2017. Procedia Engineering 199C (2017), pages 447-452, 2017
(peer-reviewed paper).
M. Kirchner, J. Croes, F. Cosco, B. Pluymers and W. Desmet.
Compressive sensing-moving horizon estimator for combined state/input
estimation: an observability study. Proceedings of ISMA2016 including
USD2016, pages 2947-2962, Leuven, Belgium, September 19-21, 2016.
M. Kirchner and E. Nijman. Cylindrical Nearfield Acoustical Holography
using compressive sampling: feasibility and numerical examples. Proceedings
of ISMA2014 including USD2014, pages 1531-1546, Leuven, Belgium,
September 15-17, 2014.
M. Kirchner and E. Nijman. Nearfield Acoustical Holography for
the Characterization of Cylindrical Sources: Practical Aspects. 8th
International Styrian Noise, Vibration & Harshness Congress: The
European Automotive Noise Conference (ISNVH), Graz, Austria, July
2-4, 2014. SAE Technical Paper 2014-01-2094, SAE International, 2014
(peer-reviewed paper).
Book chapters
M. Kirchner and E. Nijman. Cylindrical Nearfield Acoustical Holography.
In W. Desmet, B. Pluymers and M. Kirchner, eLiQuiD – Best Engineering
Training in Electric, Lightweight and Quiet Driving, Chapter 2, pages
23-44, 2016.
M. Kirchner and E. Nijman. Cylindrical Nearfield Acoustical Holography:
Practical Aspects and Possible Improvements. In SpringerBriefs in Applied
Sciences and Technology, Automotive NVH Technology, Chapter 4, pages
47-62, Springer International Publishing, 2016.
Books
W. Desmet, B. Pluymers and M. Kirchner. eLiQuiD – Best Engineering
Training in Electric, Lightweight and Quiet Driving. KU Leuven, 2016.
Misc.
M. Kirchner and E. Nijman. An Experimental Method for the NVH
Characterization of Electric Motors. In J. Bernasch and A. Fuchs, Virtual
Vehicle Magazine, 19 (IV-2014), pages 14-16, 2014.
Older publications (without KU Leuven affiliation)
S. Manzato, M. Kirchner, H. Erdélyi, G. Baglini and M. Pieve. Numerical
and operational identification and assessment of motorcycle dynamics and
comfort. International Conference on Structural Engineering Dynamics
(ICEDyn 2013), Sesimbra, Portugal, June 17-19, 2013.
H. Erdélyi, M. Kirchner, S. Manzato and S. Donders. Multibody simulation
with a virtual dummy for motorcycle vibration comfort assessment.
Proceedings of ISMA2012 including USD2012, pages 2269-2282, Leuven,
Belgium, September 17-19, 2012.
M. Kirchner, M. Confalonieri, A. Paludet, F. Degasperi, M. Da Lio and
M. De Cecco. A joint force-position measurement system for accessibility
quantification. Proceedings of 2nd International AEGIS Conference and
Final Workshop, pages 263-273, Brussels, Belgium, November 28-30, 2011.
M. Kirchner, M. De Cecco, M. Confalonieri and M. Da Lio. A joint
force-position measurement system for neuromotor performances assessment.
Proceedings of the 2011 IEEE International Workshop on Medical
Measurements and Applications (MeMeA), Bari, Italy, May 30-31, 2011.
M. De Cecco, F. Setti, M. Lunardelli, R. Bini, M. Tavernini, L. Baglivo,
A. Del Bue, M. Kirchner, A. Paludet and M. Da Lio. VERITAS Project,
SIAMOC, Ferrara, Italy, 2010.
Source: https://fr.wikipedia.org/wiki/Sonate_pour_piano_n◦_32_de_Beethoven
FACULTY OF ENGINEERING SCIENCE
DEPARTMENT OF MECHANICAL ENGINEERING
DIVISION PMA – NOISE & VIBRATION ENGINEERING
Celestijnenlaan 300 box 2420
B-3001 Leuven
