
System Identification

Control Engineering B.Sc., 3rd year


Technical University of Cluj-Napoca
Romania

Lecturer: Lucian Busoniu

Part V
Prediction error methods

Table of contents

Identification of ARX models

Model structures

General prediction error methods

Implementation and extensions

We stay in the single-output, single-input case for all but the last section.

Classification

Recall the types of models from Part I:
1. Mental or verbal models
2. Graphs and tables (nonparametric)
3. Mathematical models, with two subtypes:
   - First-principles, analytical models
   - Models from system identification

Prediction error methods produce parametric, polynomial models.

Table of contents

Identification of ARX models


Analytical development
Theoretical guarantee
Matlab example

Model structures

General prediction error methods

Implementation and extensions

Recall: discrete-time model

ARX model structure

We consider first the ARX model structure, where the output y(k) at the current discrete time step is computed based on previous input and output values:

y(k) + a_1 y(k-1) + a_2 y(k-2) + ... + a_na y(k-na)
= b_1 u(k-1) + b_2 u(k-2) + ... + b_nb u(k-nb) + e(k)

equivalent to

y(k) = -a_1 y(k-1) - a_2 y(k-2) - ... - a_na y(k-na)
+ b_1 u(k-1) + b_2 u(k-2) + ... + b_nb u(k-nb) + e(k)

e(k) is the noise at step k.
Model parameters: a_1, a_2, ..., a_na and b_1, b_2, ..., b_nb.
Name: AutoRegressive (y(k) a function of previous y values) with eXogenous input (dependence on u).
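The difference equation above is straightforward to simulate. Below is a minimal pure-Python sketch (the coefficient values and the step input are illustrative assumptions, not from the lecture):

```python
# Simulate the ARX difference equation with na = 2, nb = 1 and zero noise:
# y(k) = -a_1 y(k-1) - a_2 y(k-2) + b_1 u(k-1) + e(k), zero initial conditions.
def simulate_arx(a, b, u, e=None):
    """Simulate y(k) = -sum_i a[i] y(k-1-i) + sum_j b[j] u(k-1-j) + e(k)."""
    N = len(u)
    if e is None:
        e = [0.0] * N
    y = [0.0] * N
    for k in range(N):
        acc = e[k]
        for i, ai in enumerate(a):
            if k - 1 - i >= 0:
                acc -= ai * y[k - 1 - i]     # autoregressive part
        for j, bj in enumerate(b):
            if k - 1 - j >= 0:
                acc += bj * u[k - 1 - j]     # exogenous part
        y[k] = acc
    return y

# Step response of an illustrative stable 2nd-order ARX model
y = simulate_arx(a=[-1.5, 0.7], b=[1.0], u=[1.0] * 30)
```

With these illustrative coefficients the step response settles near the DC gain b_1 / (1 + a_1 + a_2) = 1 / 0.2 = 5.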

Polynomial representation

Backward shift operator q^-1:
q^-1 z(k) = z(k-1)
where z(k) is any discrete-time signal.
Then:
y(k) + a_1 y(k-1) + a_2 y(k-2) + ... + a_na y(k-na)
= (1 + a_1 q^-1 + a_2 q^-2 + ... + a_na q^-na) y(k) =: A(q^-1) y(k)
and:
b_1 u(k-1) + b_2 u(k-2) + ... + b_nb u(k-nb)
= (b_1 q^-1 + b_2 q^-2 + ... + b_nb q^-nb) u(k) =: B(q^-1) u(k)

ARX model in polynomial form

Therefore, the ARX model is written compactly:
A(q^-1) y(k) = B(q^-1) u(k) + e(k)

The symbolic representation in the figure holds because:
y(k) = [1 / A(q^-1)] [B(q^-1) u(k) + e(k)]

Remark: The ARX model is quite general: it can describe arbitrary linear relationships between inputs and outputs. However, the noise enters the model in a restricted way, and later we introduce models that generalize this.

Linear regression model

Returning to the explicit recursive representation:
y(k) = -a_1 y(k-1) - a_2 y(k-2) - ... - a_na y(k-na)
       + b_1 u(k-1) + b_2 u(k-2) + ... + b_nb u(k-nb) + e(k)
     = [-y(k-1) ... -y(k-na)  u(k-1) ... u(k-nb)] [a_1 ... a_na  b_1 ... b_nb]^T + e(k)
     =: φ^T(k) θ + e(k)

So in fact ARX obeys the standard model structure in linear regression!
Regressor vector: φ(k) ∈ R^(na+nb), previous output and input values.
Parameter vector: θ ∈ R^(na+nb), polynomial coefficients.

Identification problem

Consider now that we are given a vector of data u(k), y(k), k = 1, ..., N, and we have to find the model parameters θ.
Then for any k:
y(k) = φ^T(k) θ + ε(k)
where ε(k) is now interpreted as an equation error.
Objective: minimize the mean squared error:
V(θ) = (1/N) Σ_{k=1}^N ε(k)^2

Remark: When k ≤ na or k ≤ nb, zero- and negative-time values for u and y are needed to construct φ. They can be taken 0 (assuming the system is in zero initial conditions).

Linear system

Writing the regression at each step:
y(1) = [-y(0)   ... -y(1-na)   u(0)   ... u(1-nb)] θ
y(2) = [-y(1)   ... -y(2-na)   u(1)   ... u(2-nb)] θ
...
y(N) = [-y(N-1) ... -y(N-na)   u(N-1) ... u(N-nb)] θ

Matrix form:

[y(1)]   [-y(0)   ... -y(1-na)   u(0)   ... u(1-nb)]
[y(2)] = [-y(1)   ... -y(2-na)   u(1)   ... u(2-nb)] θ
[ ... ]  [  ...          ...       ...        ...  ]
[y(N)]   [-y(N-1) ... -y(N-na)   u(N-1) ... u(N-nb)]

i.e. Y = Φθ, with notations Y ∈ R^N and Φ ∈ R^(N×(na+nb)).

ARX solution

1. From linear regression, to minimize Σ_{k=1}^N ε(k)^2 the parameters are:
   θ̂ = (Φ^T Φ)^-1 Φ^T Y
2. Since the new V(θ) = (1/N) Σ_{k=1}^N ε(k)^2 is proportional to the criterion above, the same solution also minimizes V(θ).

However, the form above is impractical in system identification, since the number of data points N can be very large. Better form:
Φ^T Φ = Σ_{k=1}^N φ(k) φ^T(k),   Φ^T Y = Σ_{k=1}^N φ(k) y(k)

θ̂ = [Σ_{k=1}^N φ(k) φ^T(k)]^-1 [Σ_{k=1}^N φ(k) y(k)]
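The sum form of the solution can be sketched in a few lines of pure Python (this is an illustrative stand-in for the course's Matlab toolbox; the true parameters, input, and noise level are assumptions):

```python
# Identify a first-order ARX model y(k) = -a1*y(k-1) + b1*u(k-1) + e(k)
# by least squares, using theta = [sum phi*phi^T]^-1 [sum phi*y].
import random

random.seed(0)
a1_true, b1_true = -0.8, 0.5          # illustrative true parameters
N = 5000
u = [random.choice([-1.0, 1.0]) for _ in range(N)]   # PRBS-like input
y = [0.0] * N
for k in range(1, N):
    y[k] = -a1_true * y[k - 1] + b1_true * u[k - 1] + random.gauss(0.0, 0.05)

# Accumulate (1/N) sum phi(k) phi(k)^T and (1/N) sum phi(k) y(k),
# with regressor phi(k) = [-y(k-1), u(k-1)]^T
S = [[0.0, 0.0], [0.0, 0.0]]
r = [0.0, 0.0]
for k in range(1, N):
    phi = [-y[k - 1], u[k - 1]]
    for i in range(2):
        r[i] += phi[i] * y[k] / N
        for j in range(2):
            S[i][j] += phi[i] * phi[j] / N

# Solve the 2x2 normal equations S*theta = r by Cramer's rule
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
a1_hat = (r[0] * S[1][1] - S[0][1] * r[1]) / det
b1_hat = (S[0][0] * r[1] - r[0] * S[1][0]) / det
```

With this much data the estimates land close to the true values -0.8 and 0.5, illustrating the consistency result discussed next.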

ARX solution (continued)

Remaining issue: the sum of N terms can grow very large, leading to numerical problems: (matrix of very large numbers)^-1 times vector of very large numbers.
Solution: normalize the element values by dividing them by N. In the equations, N cancels out, so it has no effect on the analytical development, but in practice it keeps the numbers reasonable.

θ̂ = [(1/N) Σ_{k=1}^N φ(k) φ^T(k)]^-1 [(1/N) Σ_{k=1}^N φ(k) y(k)]

Table of contents

Identification of ARX models


Analytical development
Theoretical guarantee
Matlab example

Model structures

General prediction error methods

Implementation and extensions

Main result

Assumptions
1. There exists a true parameter vector θ0 so that:
   y(k) = φ^T(k) θ0 + v(k)
   with v(k) a stationary stochastic process independent from u(k).
2. E{φ(k) φ^T(k)} is a nonsingular matrix.
3. E{φ(k) v(k)} = 0.

Theorem
ARX identification is consistent: the estimated parameters θ̂ tend to the true parameters θ0 as N → ∞.

Discussion of assumptions

1. Assumption 1 is equivalent to the existence of true polynomials A0(q^-1), B0(q^-1) so that:
   A0(q^-1) y(k) = B0(q^-1) u(k) + v(k)

2. To motivate Assumption 2, recall
   θ̂ = [(1/N) Σ_{k=1}^N φ(k) φ^T(k)]^-1 [(1/N) Σ_{k=1}^N φ(k) y(k)]
   As N → ∞, (1/N) Σ_{k=1}^N φ(k) φ^T(k) → E{φ(k) φ^T(k)}.
   E{φ(k) φ^T(k)} is nonsingular if the data is sufficiently informative (e.g., u(k) should not be a simple feedback from y(k); see Söderström & Stoica for more discussion).

3. E{φ(k) v(k)} = 0 if v(k) is white noise. For more details on Assumption 3 and the role of E{φ(k) v(k)} = 0, see Part VI.

Table of contents

Identification of ARX models


Analytical development
Theoretical guarantee
Matlab example

Model structures

General prediction error methods

Implementation and extensions

Experimental data

Consider we are given the following, separate, identification and validation data sets.
plot(id); and plot(val);

Remarks: Identification input: a so-called pseudo-random binary signal, an approximation of (non-zero-mean) white noise. Validation input: a sequence of steps.

Identifying an ARX model

model = arx(id, [na, nb, nk]);
Arguments:
1. Identification data.
2. Array containing the orders of A and B and the delay nk.

Structure slightly different from theory: it includes the explicit minimum delay nk between inputs and outputs, useful to model systems with time delays.
y(k) + a_1 y(k-1) + a_2 y(k-2) + ... + a_na y(k-na)
= b_1 u(k-nk) + b_2 u(k-nk-1) + ... + b_nb u(k-nk-nb+1) + e(k)
i.e. A(q^-1) y(k) = B(q^-1) u(k-nk) + e(k), where:
A(q^-1) = 1 + a_1 q^-1 + a_2 q^-2 + ... + a_na q^-na
B(q^-1) = b_1 + b_2 q^-1 + ... + b_nb q^-(nb-1)
Note: can transform this into the theoretical structure by rewriting the right-hand side using a B(q^-1) of order nk + nb - 1, with nk - 1 leading zeros.

Model validation
Assuming the system is second-order with no time delay, we take
na = 2, nb = 1, nk = 1. Validation: compare(model, val);

Results are quite bad.

Structure selection
Better idea: try many different structures and choose the best one.
Na = 1:15;
Nb = 1:15;
Nk = 1:5;
NN = struc(Na, Nb, Nk);
V = arxstruc(id, val, NN);
struc generates all combinations of orders in Na, Nb, Nk.
arxstruc identifies for each combination an ARX model (on the
data in 1st argument), simulates it (on the data in the 2nd
argument), and returns all the MSEs on the first row of V (see
help arxstruc for the format of V).

Structure selection (continued)


To choose the structure with the smallest MSE:
N = selstruc(V, 0);
For our data, N= [8, 7, 1].
Alternatively, graphical selection: N = selstruc(V, 'plot');

(Later we learn other structure selection criteria than smallest MSE.)

Validation of best ARX model


model = arx(id, N); compare(model, val);

A better fit is obtained.

Table of contents

Identification of ARX models

Model structures

General prediction error methods

Implementation and extensions

Motivation

Before deriving the general PEM, we will introduce the general class
of models to which it can be applied.

General model structure

y(k) = G(q^-1) u(k) + H(q^-1) e(k)

where G and H are discrete-time transfer functions, i.e. ratios of polynomials. Signal e(k) is zero-mean white noise.
By placing the common factors in the denominators of G and H in A(q^-1), we get the more detailed form:

A(q^-1) y(k) = [B(q^-1) / F(q^-1)] u(k) + [C(q^-1) / D(q^-1)] e(k)

where A, B, C, D, F are all polynomials, of orders na, nb, nc, nd, nf.

General model structure (continued)

A(q^-1) y(k) = [B(q^-1) / F(q^-1)] u(k) + [C(q^-1) / D(q^-1)] e(k)

Very general form; all other linear forms are special cases of this. Not for practical use, but to describe algorithms in a generic way.

ARMAX model structure

Setting F = D = 1 (i.e. orders nf = nd = 0), we get:

A(q^-1) y(k) = B(q^-1) u(k) + C(q^-1) e(k)

Name: AutoRegressive, Moving Average (referring to the noise model) with eXogenous input (dependence on u).

ARMAX: explicit form

A(q^-1) y(k) = B(q^-1) u(k) + C(q^-1) e(k)
A(q^-1) = 1 + a_1 q^-1 + ... + a_na q^-na
B(q^-1) = b_1 q^-1 + ... + b_nb q^-nb
C(q^-1) = 1 + c_1 q^-1 + ... + c_nc q^-nc

y(k) + a_1 y(k-1) + ... + a_na y(k-na)
= b_1 u(k-1) + ... + b_nb u(k-nb)
+ e(k) + c_1 e(k-1) + ... + c_nc e(k-nc)

with parameter vector:
θ = [a_1, ..., a_na, b_1, ..., b_nb, c_1, ..., c_nc]^T ∈ R^(na+nb+nc)

Compared to ARX, ARMAX can model more intricate relationships between the disturbance and the inputs and outputs.

Special case of ARMAX: ARX

Setting C = 1 in ARMAX (nc = 0), we get:

A(q^-1) y(k) = B(q^-1) u(k) + e(k)

precisely the ARX model we worked with before.

Special case of ARX: FIR

Further setting A = 1 (na = 0) in ARX, we get:

y(k) = B(q^-1) u(k) + e(k) = Σ_{j=1}^{nb} b_j u(k-j) + e(k)
     = Σ_{j=0}^{M-1} h(j) u(k-j) + e(k)

the FIR model from correlation analysis!
(where h(0), the impulse response at time 0, is assumed 0, i.e. the system does not respond instantaneously to changes in input)

Difference between ARX and FIR

ARX: A(q^-1) y(k) = B(q^-1) u(k) + e(k)
FIR:          y(k) = B(q^-1) u(k) + e(k)

Since ARX includes recursive relationships between current and previous outputs, it is sufficient to take orders na and nb (at most) equal to the order of the dynamical system.
FIR needs a sufficiently large order nb (or length M) to model the entire transient regime of the impulse response (in principle, we only recover the correct model as M → ∞).
More parameters means more data needed to identify them.

Overall relationship

General Form ⊃ ARMAX ⊃ ARX ⊃ FIR

ARMAX to ARX: less freedom in modeling the disturbance.
ARX to FIR: more parameters required.

Output error

Other model forms are possible that are not special cases of ARMAX, e.g. Output Error (OE):

y(k) = [B(q^-1) / F(q^-1)] u(k) + e(k)

obtained for na = nc = nd = 0, i.e. A = C = D = 1.
This corresponds to simple, additive measurement noise on the output (the output error).

Table of contents
Identification of ARX models

Model structures

General prediction error methods


Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees

Implementation and extensions

ARX reinterpreted as a PEM

1. Compute predictions at each step, ŷ(k) = φ^T(k) θ, given parameters θ.
2. Compute prediction errors at each step, ε(k) = y(k) - ŷ(k).
3. Find a parameter vector minimizing criterion V(θ) = (1/N) Σ_{k=1}^N ε^2(k).

Prediction error methods are obtained by extending this procedure to general model structures.

ARX reinterpreted as a PEM (continued)

Remarks:
- The ARX predictor ŷ(k) is chosen to achieve the error ε(k) = e(k). We will aim to achieve the same error in the general PEM, intuitively because we cannot do better.
  For the mathematically inclined: such a predictor is optimal in the sense of minimizing the variance of ε(k).
- The prediction error is for ARX just a rearrangement of the equation y(k) = φ^T(k) θ + ε(k) = ŷ(k) + ε(k).
- The criterion can be more general than the MSE, but we will always use the MSE in this section. For ARX, we already know how to minimize the MSE (from linear regression); for more general models new methods will be introduced.

Intermediate step: PEM for 1st order ARMAX

To make things easier to follow, we first derive PEM for a 1st-order ARMAX.
Looking at slide "ARMAX: explicit form", we write the 1st-order ARMAX in a slightly different way:

y(k) = -a y(k-1) + b u(k-1) + c e(k-1) + e(k)

1st order ARMAX: predictor

To achieve error e(k), the predictor must be:
ŷ(k) = -a y(k-1) + b u(k-1) + c e(k-1)                    (5.1)

This depends on the unknown noise e(k-1). However, we derive a recursive predictor formula that does not.
ŷ(k-1) = -a y(k-2) + b u(k-2) + c e(k-2)                  (5.2)

From Eqn. (5.1) + c * Eqn. (5.2):
ŷ(k) + c ŷ(k-1)
= -a y(k-1) + b u(k-1) + c e(k-1)
  + c (-a y(k-2) + b u(k-2) + c e(k-2))
= -a y(k-1) + b u(k-1) + c e(k-1)
  + c (-a y(k-2) + b u(k-2) + c e(k-2) + e(k-1) - e(k-1))
= -a y(k-1) + b u(k-1) + c e(k-1) + c y(k-1) - c e(k-1)
= (c - a) y(k-1) + b u(k-1)

1st order ARMAX: predictor (continued)

Final recursion:
ŷ(k) = -c ŷ(k-1) + (c - a) y(k-1) + b u(k-1)
Requires initialization at ŷ(0); this initial value is usually taken 0.
(This initialization needs some technical conditions to work properly; we do not go into them here.)

1st order ARMAX: prediction error

Similar recursion:
ε(k) = y(k) - ŷ(k)
ε(k-1) = y(k-1) - ŷ(k-1)
ε(k) + c ε(k-1) = y(k) + c y(k-1) - (ŷ(k) + c ŷ(k-1))
= y(k) + c y(k-1) - ((c - a) y(k-1) + b u(k-1))
= y(k) + a y(k-1) - b u(k-1)
so that:
ε(k) = -c ε(k-1) + y(k) + a y(k-1) - b u(k-1)
Requires initialization of ε(0), usually taken 0.
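The recursion can be checked numerically: on data simulated from the model itself, evaluating it at the true (a, b, c) should recover the driving noise e(k), up to a transient from the zero initialization. A small pure-Python sketch (parameter values and noise level are illustrative assumptions):

```python
# Prediction-error recursion for 1st-order ARMAX:
# eps(k) = -c*eps(k-1) + y(k) + a*y(k-1) - b*u(k-1), eps(0) = 0.
import random

def pred_errors(a, b, c, u, y):
    """Prediction errors for y(k) = -a*y(k-1) + b*u(k-1) + c*e(k-1) + e(k)."""
    eps = [0.0] * len(y)
    for k in range(1, len(y)):
        eps[k] = -c * eps[k - 1] + y[k] + a * y[k - 1] - b * u[k - 1]
    return eps

random.seed(1)
a, b, c = 0.7, 1.0, 0.3              # illustrative parameters, |c| < 1
N = 200
e = [random.gauss(0.0, 0.1) for _ in range(N)]
u = [random.choice([-1.0, 1.0]) for _ in range(N)]
y = [0.0] * N
for k in range(1, N):
    y[k] = -a * y[k - 1] + b * u[k - 1] + c * e[k - 1] + e[k]

eps = pred_errors(a, b, c, u, y)
# eps(k) approaches e(k) as the effect of the zero initialization decays like c^k
```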

Finding the parameters

Once the errors are available, the parameter θ is found by minimizing criterion V(θ) = (1/N) Σ_{k=1}^N ε^2(k).

Table of contents
Identification of ARX models

Model structures

General prediction error methods


Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees

Implementation and extensions

Recall: General model structure

y(k) = G(q^-1) u(k) + H(q^-1) e(k)

where G and H are discrete-time transfer functions.

Prediction error

We start by deriving e(k).

y(k) = G(q^-1) u(k) + H(q^-1) e(k)
e(k) = H^-1(q^-1) (y(k) - G(q^-1) u(k))

where H^-1 is the inverse of the fraction of polynomials H.
Recall that the predictor will be derived so that the prediction error ε(k) = y(k) - ŷ(k) = e(k), so the formula above can also be used to compute ε(k).

Predictor

To achieve error e(k), the predictor must be:
ŷ(k) = y(k) - e(k) = G u(k) + (H - 1) e(k)
= G u(k) + (H - 1) H^-1 (y(k) - G u(k))
= G u(k) + (1 - H^-1) (y(k) - G u(k))
= G u(k) + (1 - H^-1) y(k) - G u(k) + H^-1 G u(k)
= (1 - H^-1) y(k) + H^-1 G u(k)
=: L1 y(k) + L2 u(k)
where we skipped the arguments q^-1 to make the equations readable.

Remarks:
- New notations L1(q^-1) = 1 - H^-1(q^-1), L2(q^-1) = H^-1(q^-1) G(q^-1).
- In order to have a causal predictor (which only depends on past values of the output and input), we need G(0) = 0 and H(0) = 1, leading to L1(0) = L2(0) = 0.

Finding the parameters

Once the predictors and errors are available, the parameter θ is found by minimizing criterion V(θ) = (1/N) Σ_{k=1}^N ε^2(k).
Again, linear regression will not work in general. Computational methods to solve the minimization problem will be introduced in the next section.

Specializing the framework to ARX

It is instructive to see how the formulas simplify in the ARX case.
Rewriting ARX in the general model template:

y(k) = G u(k) + H e(k) = (B/A) u(k) + (1/A) e(k)

We have:
L1 = 1 - H^-1 = 1 - A,   L2 = H^-1 G = B
ŷ(k) = L1 y(k) + L2 u(k) = (1 - A) y(k) + B u(k) = φ^T(k) θ
ε(k) = H^-1 (y(k) - G u(k)) = A y(k) - B u(k)

which can easily be seen to be equivalent to the ARX formulation.

Specializing the framework to 1st order ARMAX

Homework! Plug in the polynomials for 1st order ARMAX and verify
that you obtain the same result as before.

Table of contents
Identification of ARX models

Model structures

General prediction error methods


Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees

Implementation and extensions

Experimental data

Consider again the experimental data on which ARX was applied.
plot(id); and plot(val);

Recall the theoretical ARMAX:
A(q^-1) y(k) = B(q^-1) u(k) + C(q^-1) e(k)

Identifying an ARMAX model

mARMAX = armax(id, [na, nb, nc, nk]);
Arguments:
1. Identification data.
2. Array containing the orders of A, B, C and the delay nk.

Like for ARX, the structure includes the explicit minimum delay nk between inputs and outputs.
y(k) + a_1 y(k-1) + a_2 y(k-2) + ... + a_na y(k-na)
= b_1 u(k-nk) + b_2 u(k-nk-1) + ... + b_nb u(k-nk-nb+1)
+ e(k) + c_1 e(k-1) + c_2 e(k-2) + ... + c_nc e(k-nc)
i.e. A(q^-1) y(k) = B(q^-1) u(k-nk) + C(q^-1) e(k), where:
A(q^-1) = 1 + a_1 q^-1 + a_2 q^-2 + ... + a_na q^-na
B(q^-1) = b_1 + b_2 q^-1 + ... + b_nb q^-(nb-1)
C(q^-1) = 1 + c_1 q^-1 + c_2 q^-2 + ... + c_nc q^-nc
Remark: As for ARX, can transform into the theoretical structure by using a new B(q^-1) of order nk + nb - 1, with nk - 1 leading zeros.

ARMAX model validation

Considering the system is 2nd order with no time delay, take na = 2, nb = 2, nc = 2, nk = 1. Validation: compare(val, mARMAX);

In contrast to ARX, results are good. The flexible noise model pays off.

Identifying an OE model

Recall the OE model structure:

y(k) = [B(q^-1) / F(q^-1)] u(k) + e(k)

Identifying an OE model (continued)

mOE = oe(id, [nb, nf, nk]);
Arguments:
1. Identification data.
2. Array containing the orders of B, F, and the delay nk.

y(k) = [B(q^-1) / F(q^-1)] u(k-nk) + e(k), where:
B(q^-1) = b_1 + b_2 q^-1 + ... + b_nb q^-(nb-1)
F(q^-1) = 1 + f_1 q^-1 + f_2 q^-2 + ... + f_nf q^-nf
Remark: Like before, can transform into the theoretical structure by changing B.

OE model validation

Considering the system is second-order with no time delay, we take nb = 2, nf = 2, nk = 1. Validation: compare(val, mOE);

Results as good as ARMAX. The system seems to obey both model structures.

Table of contents
Identification of ARX models

Model structures

General prediction error methods


Stepping stones: ARX and ARMAX
General case
Matlab example
Theoretical guarantees

Implementation and extensions

Preliminaries: Vector derivative and Hessian

Consider any function V(θ), V: R^n → R. Then:

dV/dθ = [∂V/∂θ_1; ∂V/∂θ_2; ...; ∂V/∂θ_n]

            [∂²V/∂θ_1²      ∂²V/∂θ_1∂θ_2  ...  ∂²V/∂θ_1∂θ_n]
d²V/dθ² =   [∂²V/∂θ_2∂θ_1   ∂²V/∂θ_2²     ...  ∂²V/∂θ_2∂θ_n]
            [     ...            ...      ...       ...    ]
            [∂²V/∂θ_n∂θ_1   ∂²V/∂θ_n∂θ_2  ...  ∂²V/∂θ_n²   ]

Assumptions

Assumptions (simplified)
1. Signals u(k) and y(k) are stationary stochastic processes.
2. The input signal u(k) is sufficiently informative.
3. Transfer functions G(q^-1) and H(q^-1) depend smoothly on θ.
4. The Hessian d²V/dθ² is nonsingular.

Discussion of assumptions

- Assumption 2 is informal; the formal name is persistently exciting input (it comes later).
- Recall that G and H have the parameters θ as coefficients; we denote them by G(q^-1; θ) and H(q^-1; θ) to make the dependence explicit. Assumption 3 ensures we can differentiate G and H w.r.t. θ.
  Note that dG/dθ, dH/dθ are vectors of transfer functions!
- Recall V(θ) = (1/N) Σ_{k=1}^N ε^2(k), the MSE. Assumption 4 ensures V is not flat around minima, so the minima are uniquely identifiable.

Guarantee

Theorem 1
Define the limit V̄(θ) = lim_{N→∞} V(θ). Given Assumptions 1-4, the identification solution θ̂ = arg min V(θ) converges to a minimum point of V̄(θ) as N → ∞.

Remark: This is a type of consistency guarantee, in the limit of infinitely many data points.

Further assumptions

Assumptions (simplified)
5. The true system satisfies the chosen model structure. This means there exists at least one θ0 so that for any input u(k) and the corresponding output y(k) of the true system, we have:
   y(k) = G(q^-1; θ0) u(k) + H(q^-1; θ0) e(k)
   with e(k) white noise.
6. The input u(k) is independent from the noise e(k) (the experiment is performed in open loop).

Additional guarantee

Theorem 2
Under Assumptions 1-6, θ̂ converges to a true parameter vector θ0 as N → ∞.

Remark: Also a consistency guarantee. Theorem 1 guaranteed a minimum-error solution, whereas Theorem 2 additionally says this solution corresponds to the true system, if the system satisfies the model structure.

Table of contents

Identification of ARX models

Model structures

General prediction error methods

Implementation and extensions


Solving the optimization problem
Multiple inputs and outputs
MIMO ARX: Matlab example

Optimization problem

Objective of the identification procedure: minimize the mean squared error

V(θ) = (1/N) Σ_{k=1}^N ε(k)^2

where ε(k) are the prediction errors. In the general case:
ε(k) = H^-1(q^-1) (y(k) - G(q^-1) u(k))
Solution: θ̂ = arg min V(θ)

So far we took this solution for granted and investigated its properties. While in ARX linear regression could be applied to find θ̂, in general this does not work. Main implementation question:
How to solve the optimization problem?

Minimization via derivative root

Consider first the scalar case, θ ∈ R.
Idea: at any minimum, the derivative f(θ) = dV/dθ is zero. So, find a root of f(θ).
Remarks:
- Care must be taken to find a minimum and not a maximum or inflection point. This can be checked with the second derivative, d²V/dθ² = df/dθ > 0.
- We may also find a local minimum which is larger (worse) than the global one.

Newton's method for root finding

Start from some initial point θ_0.
At iteration ℓ, the next point θ_{ℓ+1} is the intersection between the abscissa and the tangent of f at the current point θ_ℓ. By geometric arguments:

θ_{ℓ+1} = θ_ℓ - f(θ_ℓ) / [df(θ_ℓ)/dθ]

Remarks:
- Notation df(θ_ℓ)/dθ means the value of the derivative df/dθ at point θ_ℓ.
- The slope of the tangent is df(θ_ℓ)/dθ.
- θ_{ℓ+1} is the best guess for the root given the current point θ_ℓ.
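The update is a one-liner in code. A minimal sketch, applied to the derivative of a hypothetical criterion V(θ) = (θ - 2)^4 + (θ - 2)^2 with minimum at θ = 2 (the criterion is an illustrative assumption, not from the lecture):

```python
# Newton's method: theta_{l+1} = theta_l - f(theta_l) / f'(theta_l)
def newton(f, df, theta0, iters=50, tol=1e-12):
    theta = theta0
    for _ in range(iters):
        step = f(theta) / df(theta)
        theta -= step
        if abs(step) < tol:          # stop when the update becomes tiny
            break
    return theta

f = lambda t: 4 * (t - 2) ** 3 + 2 * (t - 2)     # f = dV/dtheta
df = lambda t: 12 * (t - 2) ** 2 + 2             # d2V/dtheta2 > 0: a minimum
theta_hat = newton(f, df, theta0=5.0)            # converges to 2
```

Since d²V/dθ² is positive everywhere here, the root found is indeed a minimum of V.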

Newton's method for the identification problem

Replace f(θ) by dV/dθ to get back to the optimization problem:

θ_{ℓ+1} = θ_ℓ - [dV(θ_ℓ)/dθ] / [d²V(θ_ℓ)/dθ²]

Extend to θ ∈ R^n. Recall that dV/dθ is now an n-vector, and d²V/dθ² an n×n matrix (the Hessian):

θ_{ℓ+1} = θ_ℓ - [d²V(θ_ℓ)/dθ²]^-1 dV(θ_ℓ)/dθ

Add a step size α_ℓ > 0:

θ_{ℓ+1} = θ_ℓ - α_ℓ [d²V(θ_ℓ)/dθ²]^-1 dV(θ_ℓ)/dθ

Remark: The step size helps with the convergence of the method, e.g. because in identification V is noisy.

Stopping criterion

The algorithm can be stopped:
- When the difference between consecutive parameter vectors is small, e.g. max_{i=1..n} |θ_{i,ℓ+1} - θ_{i,ℓ}| smaller than some preset threshold; or
- When the number of iterations ℓ exceeds a preset maximum.

Computing the derivatives

V(θ) = (1/N) Σ_{k=1}^N ε(k)^2

Keeping in mind that ε(k) depends on θ, from matrix calculus:

dV/dθ = (2/N) Σ_{k=1}^N ε(k) dε(k)/dθ

d²V/dθ² = (2/N) Σ_{k=1}^N [dε(k)/dθ] [dε(k)/dθ]^T + (2/N) Σ_{k=1}^N ε(k) d²ε(k)/dθ²

where dε(k)/dθ is the vector derivative, [dε(k)/dθ][dε(k)/dθ]^T is an n×n matrix, and d²ε(k)/dθ² the Hessian of ε(k).

Gauss-Newton

Ignore the second term in the Hessian of V and just use the first term:

H = (2/N) Σ_{k=1}^N [dε(k)/dθ] [dε(k)/dθ]^T

leading to the Gauss-Newton algorithm:

θ_{ℓ+1} = θ_ℓ - α_ℓ H^-1 dV(θ_ℓ)/dθ

Motivation:
- The quadratic form of H gives better algorithm behavior.
- Simpler computation.

Example: 1st order ARMAX

Recall the model and prediction error for 1st-order ARMAX:
y(k) = -a y(k-1) + b u(k-1) + c e(k-1) + e(k)
ε(k) = -c ε(k-1) + y(k) + a y(k-1) - b u(k-1)

We need dε(k)/dθ = [∂ε(k)/∂a, ∂ε(k)/∂b, ∂ε(k)/∂c]^T. Differentiating the second equation:

∂ε(k)/∂a = -c ∂ε(k-1)/∂a + y(k-1)
∂ε(k)/∂b = -c ∂ε(k-1)/∂b - u(k-1)
∂ε(k)/∂c = -c ∂ε(k-1)/∂c - ε(k-1)

So ∂ε(k)/∂a, ∂ε(k)/∂b, ∂ε(k)/∂c are dynamical signals that can be computed using the recursions above, starting e.g. from 0 initial values.

Example: 1st order ARMAX (continued)

Finally, the algorithm is implemented as follows:
1. Given the current value of the parameter vector θ_ℓ, apply the recursions above to find the signals dε(k)/dθ, k = 1, ..., N.
2. Plug dε(k)/dθ into the equations for dV/dθ, d²V/dθ².
3. Apply the Newton (or Gauss-Newton) update formula to find θ_{ℓ+1}.
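These three steps can be sketched end to end in pure Python. Everything numeric here is an illustrative assumption (true parameters, noise level, initial guess near the truth, fixed step size and iteration count); a real implementation would use a proper linear-algebra library and line search:

```python
import random

def eps_and_grads(theta, u, y):
    # Recursions from the slides: eps(k) and d eps(k)/d theta, theta = (a, b, c),
    # all initialized at 0
    a, b, c = theta
    N = len(y)
    eps = [0.0] * N
    de = [[0.0, 0.0, 0.0] for _ in range(N)]
    for k in range(1, N):
        eps[k] = -c * eps[k - 1] + y[k] + a * y[k - 1] - b * u[k - 1]
        de[k][0] = -c * de[k - 1][0] + y[k - 1]      # d eps / d a
        de[k][1] = -c * de[k - 1][1] - u[k - 1]      # d eps / d b
        de[k][2] = -c * de[k - 1][2] - eps[k - 1]    # d eps / d c
    return eps, de

def solve3(A, rhs):
    # Gaussian elimination with partial pivoting for a 3x3 system A x = rhs
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for col in range(i, 4):
                M[r][col] -= f * M[i][col]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

def gauss_newton(theta, u, y, iters=25, alpha=0.5):
    N = len(y)
    for _ in range(iters):
        eps, de = eps_and_grads(theta, u, y)
        # Gradient (2/N) sum eps*deps and Gauss-Newton Hessian (2/N) sum deps*deps^T
        g = [2.0 / N * sum(eps[k] * de[k][i] for k in range(N)) for i in range(3)]
        H = [[2.0 / N * sum(de[k][i] * de[k][j] for k in range(N))
              for j in range(3)] for i in range(3)]
        step = solve3(H, g)
        theta = [t - alpha * s for t, s in zip(theta, step)]
    return theta

# Simulated data from known (illustrative) parameters
random.seed(2)
a, b, c = 0.7, 1.0, 0.3
N = 2000
e = [random.gauss(0.0, 0.1) for _ in range(N)]
u = [random.choice([-1.0, 1.0]) for _ in range(N)]
y = [0.0] * N
for k in range(1, N):
    y[k] = -a * y[k - 1] + b * u[k - 1] + c * e[k - 1] + e[k]

theta_hat = gauss_newton([0.5, 0.8, 0.1], u, y)   # start near the truth
```

The fixed step size alpha = 0.5 is the conservative choice mentioned on the earlier slide; it trades convergence speed for robustness.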

Table of contents

Identification of ARX models

Model structures

General prediction error methods

Implementation and extensions


Solving the optimization problem
Multiple inputs and outputs
MIMO ARX: Matlab example

MIMO system

So far we considered y(k) ∈ R, u(k) ∈ R, i.e. Single-Input, Single-Output (SISO) systems:
y(k) = G(q^-1) u(k) + H(q^-1) e(k)
Many systems are Multiple-Input, Multiple-Output (MIMO).
E.g., aircraft. Inputs: throttle, aileron, elevator, rudder. Outputs: airspeed, roll, pitch, yaw.

General MIMO model

Consider y(k) ∈ R^ny, u(k) ∈ R^nu. Model:
y(k) = G(q^-1) u(k) + H(q^-1) e(k)

[y_1(k) ]            [u_1(k) ]            [e_1(k) ]
[y_2(k) ] = G(q^-1)  [u_2(k) ] + H(q^-1)  [e_2(k) ]
[  ...  ]            [  ...  ]            [  ...  ]
[y_ny(k)]            [u_nu(k)]            [e_ny(k)]

Different from SISO:
- The noise e(k) ∈ R^ny is a random vector of the same size as y, white noise (independent, zero-mean).
- G(q^-1) is a matrix of size ny × nu.
- H(q^-1) is a matrix of size ny × ny.
- The matrix elements are fractions of polynomials in q^-1.

MIMO ARX

A(q^-1) y(k) = B(q^-1) u(k) + e(k)
A(q^-1) = I + A_1 q^-1 + ... + A_na q^-na
B(q^-1) = B_1 q^-1 + ... + B_nb q^-nb

where I is the ny × ny identity matrix, A_1, ..., A_na ∈ R^(ny×ny), B_1, ..., B_nb ∈ R^(ny×nu).
Put in the general template:
y(k) = A^-1(q^-1) B(q^-1) u(k) + A^-1(q^-1) e(k)

MIMO ARX: concrete example

Take na = 1, nb = 2, ny = 2, nu = 3. Then:
A(q^-1) y(k) = B(q^-1) u(k) + e(k)

A(q^-1) = I + A_1 q^-1 = I + [a1_11 a1_12] q^-1
                             [a1_21 a1_22]

B(q^-1) = B_1 q^-1 + B_2 q^-2
        = [b1_11 b1_12 b1_13] q^-1 + [b2_11 b2_12 b2_13] q^-2
          [b1_21 b1_22 b1_23]        [b2_21 b2_22 b2_23]

MIMO ARX: concrete example (continued)

([1 0] + [a1_11 a1_12] q^-1) [y_1(k)]
([0 1]   [a1_21 a1_22]     ) [y_2(k)]
                                          [u_1(k)]
= ([b1_11 b1_12 b1_13] q^-1 + [b2_11 b2_12 b2_13] q^-2) [u_2(k)] + [e_1(k)]
  ([b1_21 b1_22 b1_23]        [b2_21 b2_22 b2_23]     ) [u_3(k)]   [e_2(k)]

Explicit relationship:
y_1(k) + a1_11 y_1(k-1) + a1_12 y_2(k-1)
= b1_11 u_1(k-1) + b1_12 u_2(k-1) + b1_13 u_3(k-1)
+ b2_11 u_1(k-2) + b2_12 u_2(k-2) + b2_13 u_3(k-2) + e_1(k)

y_2(k) + a1_21 y_1(k-1) + a1_22 y_2(k-1)
= b1_21 u_1(k-1) + b1_22 u_2(k-1) + b1_23 u_3(k-1)
+ b2_21 u_1(k-2) + b2_22 u_2(k-2) + b2_23 u_3(k-2) + e_2(k)
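The equivalence between the matrix form and the explicit scalar equations can be checked numerically. A pure-Python sketch for this concrete example (all coefficient values and inputs are arbitrary illustrative assumptions, zero noise):

```python
# Matrix-form MIMO ARX simulation vs. the explicit relationship, na=1, nb=2,
# ny=2, nu=3, zero noise and zero initial conditions.
import random

random.seed(4)
A1 = [[0.5, -0.2], [0.1, 0.3]]               # 2x2
B1 = [[1.0, 0.2, -0.5], [0.3, 0.8, 0.1]]     # 2x3
B2 = [[0.4, -0.1, 0.2], [0.0, 0.5, -0.3]]    # 2x3
N = 50
u = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N)]

# Matrix form: y(k) = -A1 y(k-1) + B1 u(k-1) + B2 u(k-2)
y = [[0.0, 0.0] for _ in range(N)]
for k in range(2, N):
    for i in range(2):
        y[k][i] = (-sum(A1[i][j] * y[k - 1][j] for j in range(2))
                   + sum(B1[i][j] * u[k - 1][j] for j in range(3))
                   + sum(B2[i][j] * u[k - 2][j] for j in range(3)))

# Explicit relationship for output 1 at one step k (the slide's scalar equation)
k = 10
y1 = (-A1[0][0] * y[k - 1][0] - A1[0][1] * y[k - 1][1]
      + B1[0][0] * u[k - 1][0] + B1[0][1] * u[k - 1][1] + B1[0][2] * u[k - 1][2]
      + B2[0][0] * u[k - 2][0] + B2[0][1] * u[k - 2][1] + B2[0][2] * u[k - 2][2])
assert abs(y1 - y[k][0]) < 1e-12    # both forms agree
```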

General error and predictor

The derivation mirrors that in the SISO case:
y(k) = G u(k) + H e(k)
e(k) = ε(k) = H^-1 (y(k) - G u(k))
ŷ(k) = y(k) - e(k) = G u(k) + (H - I) e(k)
= G u(k) + (H - I) H^-1 (y(k) - G u(k))
= G u(k) + (I - H^-1) (y(k) - G u(k))
= G u(k) + (I - H^-1) y(k) - G u(k) + H^-1 G u(k)
= (I - H^-1) y(k) + H^-1 G u(k)
=: L1 y(k) + L2 u(k)
where technical conditions must be satisfied, e.g. to ensure that H^-1 is meaningful.

PEM criterion

Important difference between SISO and MIMO: the identification criterion.

The SISO criterion, the MSE:
V(θ) = (1/N) Σ_{k=1}^N ε(k)^2
is no longer appropriate, since ε(k) is an ny-vector.

Define R(θ) = (1/N) Σ_{k=1}^N ε(k) ε(k)^T, an ny × ny matrix. Two options
for the new criterion:
V(θ) = trace(R(θ))
V(θ) = det(R(θ))
Assuming ε(k) is zero-mean, R(θ) is an estimate of its covariance.
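Both criteria are easy to compute from the N × ny matrix of stacked prediction errors. A small numpy illustration (the error values are arbitrary):

```python
import numpy as np

def pem_criteria(eps):
    """eps: N x ny array, row k holding eps(k).
    R = (1/N) * sum_k eps(k) eps(k)^T is the ny x ny sample covariance
    (assuming zero-mean errors); return trace(R) and det(R)."""
    N = eps.shape[0]
    R = eps.T @ eps / N
    return np.trace(R), np.linalg.det(R)

eps = np.array([[1.0, 0.0],
                [0.0, 2.0]])      # N = 2, ny = 2
tr, dt = pem_criteria(eps)        # the two candidate criteria
```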


Table of contents

Identification of ARX models

Model structures

General prediction error methods

Implementation and extensions
  Solving the optimization problem
  Multiple inputs and outputs
  MIMO ARX: Matlab example


Experimental data

Consider a continuous stirred-tank reactor:

Image credit: mathworks.com

Input: coolant flow Q
Outputs:
  Concentration CA of substance A in the mix
  Temperature T of the mix


Experimental data

Left: identification, Right: validation


MIMO ARX in Matlab

A(q^-1) y(k) = B(q^-1) u(k) + e(k)

A(q^-1) = [a^11(q^-1) a^12(q^-1) ... a^1ny(q^-1);
           a^21(q^-1) a^22(q^-1) ... a^2ny(q^-1);
           ...;
           a^ny1(q^-1) a^ny2(q^-1) ... a^nyny(q^-1)]
a^ij(q^-1) = δij + a1^ij q^-1 + ... + a_naij^ij q^-naij,
where δij = 1 if i = j and 0 otherwise.

B(q^-1) = [b^11(q^-1) b^12(q^-1) ... b^1nu(q^-1);
           b^21(q^-1) b^22(q^-1) ... b^2nu(q^-1);
           ...;
           b^ny1(q^-1) b^ny2(q^-1) ... b^nynu(q^-1)]
b^ij(q^-1) = b1^ij q^-nkij + ... + b_nbij^ij q^-(nkij+nbij-1)


Identifying a MIMO ARX model

m = arx(id, [Na, Nb, Nk]);
Arguments:
1 Identification data.
2 Matrices with the orders of the polynomials in A and B, and the delays nk:
Na = [na11 ... na1ny; ...; nany1 ... nanyny]   (ny × ny)
Nb = [nb11 ... nb1nu; ...; nbny1 ... nbnynu]   (ny × nu)
Nk = [nk11 ... nk1nu; ...; nkny1 ... nknynu]   (ny × nu)


MIMO ARX results

Take na = 2, nb = 2, and nk = 1 for all matrix elements:
Na = [2 2; 2 2]; Nb = [2; 2]; Nk = [1; 1];
m = arx(id, [Na Nb Nk]);
compare(m, val);

Appendix: Nonlinear ARX (for project)

Nonlinear ARX structure

Recall standard ARX:
y(k) = -a1 y(k-1) - a2 y(k-2) - ... - ana y(k-na)
       + b1 u(k-1) + b2 u(k-2) + ... + bnb u(k-nb) + e(k)
Linear dependence on the delayed outputs y(k-1), ..., y(k-na) and
inputs u(k-1), ..., u(k-nb).
Nonlinear ARX (NARX) generalizes this to any nonlinear dependence:
y(k) = g(y(k-1), y(k-2), ..., y(k-na),
         u(k-1), u(k-2), ..., u(k-nb); θ) + e(k)
The function g is parameterized by θ ∈ R^nθ, and these parameters can be
tuned to fit identification data and thereby model a particular system.
In our case, g will be a neural network.

Neural-net NARX: training data

y(k) = g(y(k-1), y(k-2), ..., y(k-na),
         u(k-1), u(k-2), ..., u(k-nb); θ) + e(k)
     = g(x(k); θ) + e(k)
with new notation for the network input:
x(k) = [y(k-1), ..., y(k-na), u(k-1), ..., u(k-nb)]^T
Then the training dataset consists of input-output pairs:
(x(1), y(1)), (x(2), y(2)), ..., (x(N), y(N))
where N = number of time steps in the data.
Note: negative- and zero-time y and u can be taken 0, assuming the
system starts from zero initial conditions.
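Building the training pairs is mechanical. A Python sketch for the single-output, single-input case, using the zero-padding convention from the note above (function name and the tiny dataset are illustrative):

```python
import numpy as np

def build_regressors(y, u, na, nb):
    """Stack x(k) = [y(k-1..k-na), u(k-1..k-nb)]^T for k = 1..N,
    with negative/zero-time samples padded by 0 (zero initial conditions)."""
    N = len(y)
    X = np.zeros((N, na + nb))
    for k in range(N):
        for i in range(na):
            if k - 1 - i >= 0:
                X[k, i] = y[k - 1 - i]
        for j in range(nb):
            if k - 1 - j >= 0:
                X[k, na + j] = u[k - 1 - j]
    return X, y  # training pairs (x(k), y(k))

y = np.array([1.0, 2.0, 3.0])
u = np.array([0.5, 0.6, 0.7])
X, targets = build_regressors(y, u, na=2, nb=1)
```

Each row X[k] is the network input for target y(k); any regression tool (here, a neural network) can then be fit on (X, targets).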

Neural-net NARX: training criterion

ŷ(k) = g(x(k); θ)
ε(k) = y(k) - ŷ(k) = y(k) - g(x(k); θ)
The criterion is V(θ) = (1/N) Σ_{k=1}^N ε^2(k), the MSE. Training should
return a parameter vector θ̂ minimizing V(θ).
So NARX identification is in fact:
A nonlinear prediction error method.
And also a nonlinear regression problem.

Using the NARX model

One-step-ahead prediction: if the true output sequence is known, the
network input x(k) is fully available:
x(k) = [y(k-1), ..., y(k-na), u(k-1), ..., u(k-nb)]^T
ŷ(k) = g(x(k); θ̂)
Example: on day k-1, predict the weather for day k.

Simulation: if the true outputs are unknown, use the previously
predicted outputs to construct an approximation of x(k):
x̂(k) = [ŷ(k-1), ..., ŷ(k-na), u(k-1), ..., u(k-nb)]^T
ŷ(k) = g(x̂(k); θ̂)
(Predicted outputs at negative and zero time can also be taken 0.)
Example: simulation of an aircraft's response to emergency pilot
inputs, which may be dangerous to apply to the real system.
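The simulation mode can be sketched as a feedback loop that feeds predicted outputs back into the regressor. In this Python sketch, g is a hypothetical linear stand-in for the trained network (any g(x; θ) with the right input size would do); all numbers are illustrative.

```python
import numpy as np

def g(x, theta):
    """Hypothetical stand-in for the trained network g(x; theta)."""
    return float(theta @ x)

def narx_simulate(u, theta, na, nb):
    """Simulate: build x-hat(k) from previously PREDICTED outputs,
    padding negative/zero-time values with 0."""
    N = len(u)
    yhat = np.zeros(N)
    for k in range(N):
        x = np.zeros(na + nb)
        for i in range(na):
            if k - 1 - i >= 0:
                x[i] = yhat[k - 1 - i]   # predicted, not measured, outputs
        for j in range(nb):
            if k - 1 - j >= 0:
                x[na + j] = u[k - 1 - j]
        yhat[k] = g(x, theta)
    return yhat

theta = np.array([0.5, 1.0])   # na = 1, nb = 1: yhat(k) = 0.5 yhat(k-1) + u(k-1)
yhat = narx_simulate(np.ones(5), theta, na=1, nb=1)
```

Swapping the yhat values in x for measured y values turns this same loop into the one-step-ahead predictor.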