
Numerical Optimization

Scientific Programming with Matlab WS 2015/16


apl. Prof. Dr. rer. nat. Frank Hoffmann
Univ.-Prof. Dr.-Ing. Prof. h.c. Torsten Bertram
Lehrstuhl für Regelungssystemtechnik

Numerical Optimization
Find the best solution for a given problem: compute or approximate the parameter vector x from the set of alternative solutions X that minimizes (or maximizes) the objective function F(x).

Numerical Optimization
Typical problem types and their cost functions:

- classification: $\min_\theta J(\theta) = \sum_i \ell\big(c_i, \hat c(x_i, \theta)\big)$
- regression: $\min_\theta J(\theta) = \sum_i \big(y_i - \hat y(x_i, \theta)\big)^2$
- system identification: $\min_\theta J(\theta) = \int \big(y(t) - \hat y(t \mid \theta)\big)^2 \, dt$
- optimal control: $\min_{u(t)} J(u(t)) = \int x'(t)\,Q\,x(t) + u'(t)\,R\,u(t) \, dt$ subject to $\dot x = f(x, u)$

Optimization Methods for Problem Types


The choice of method depends on whether the problem is
- nonlinear or linear
- local or global
- and on the order of the derivatives that are available.

Linear problems: the simplex method finds exact solutions.

Nonlinear local methods, by order of the derivatives used:
- 0: line search
- 1: gradient search
- 2: Newton method

Nonlinear global problems: heuristic methods
- evolutionary algorithms
- ant colony optimization
- simulated annealing
- hill climbing

Problem Classes in Optimization

- linear vs. nonlinear optimization
- nonlinear local vs. global optimization
- scalar or multi-objective problems
- unconstrained or constrained optimization
- continuous or integer programming
- known or unknown derivatives

Linear least squares regression


Data set: $\{(y_1, u_1), \dots, (y_p, u_p)\}$

Model linear in the parameters x and in the regressors u:

$y_i = u_{i1} x_1 + u_{i2} x_2 + \dots + u_{iq} x_q$

Model linear in the parameters x and nonlinear in the regressors u:

$y_i = f_1(u_{i1})\, x_1 + f_2(u_{i2})\, x_2 + \dots + f_q(u_{iq})\, x_q$

Polynomial Approximation

$y_i = x_0 + u_i x_1 + u_i^2 x_2 + \dots + u_i^{q-1} x_q$
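As a sketch of how such a regressor matrix is assembled and solved in Matlab (the data and the quadratic model below are purely illustrative, not from the slides):

% Minimal sketch (illustrative data): quadratic model y = x0 + x1*u + x2*u^2
u = linspace(0, 2, 50)';                    % regressor samples
y = 1 + 2*u - 0.5*u.^2 + 0.1*randn(50, 1);  % noisy observations
U = [ones(size(u)), u, u.^2];               % regressor matrix, linear in the parameters
x = U \ y;                                  % least squares estimate of [x0; x1; x2]
plot(u, y, 'ko', u, U*x, 'b-');             % data and fitted polynomial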

Least Squares Solution

System of p linear equations in q unknowns:

$u_{11} x_1 + u_{12} x_2 + \dots + u_{1q} x_q = y_1$
$u_{21} x_1 + u_{22} x_2 + \dots + u_{2q} x_q = y_2$
$\;\;\vdots$
$u_{p1} x_1 + u_{p2} x_2 + \dots + u_{pq} x_q = y_p$

with

$U = \begin{pmatrix} u_{11} & u_{12} & \dots & u_{1q} \\ u_{21} & u_{22} & \dots & u_{2q} \\ \vdots & & & \vdots \\ u_{p1} & u_{p2} & \dots & u_{pq} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix},$

i.e. $Ux = y$.

Quadratic Cost Function

Linear Least Squares

For p < q the solutions form a (q − p)-dimensional subspace of $\mathbb{R}^q$.
For p = q there is (in general) a unique solution.
For p > q the system is overconstrained and has no exact solution.
In the overconstrained case p > q, find the solution vector x that minimizes the squared equation errors

$E \overset{\text{def}}{=} \sum_{i=1}^{p} \big(u_{i1} x_1 + \dots + u_{iq} x_q - y_i\big)^2 = \lVert Ux - y \rVert^2 .$

Vector representation:

$E \overset{\text{def}}{=} e^{\mathsf T} e, \qquad e = Ux - y .$

Least Squares Solution


Overconstrained system of p>q linear equations in q unknowns x

$u_{11} x_1 + u_{12} x_2 + \dots + u_{1q} x_q = y_1$
$u_{21} x_1 + u_{22} x_2 + \dots + u_{2q} x_q = y_2$
$\;\;\vdots$
$u_{p1} x_1 + u_{p2} x_2 + \dots + u_{pq} x_q = y_p$

$Ux = y$

Least squares solution:

$x^* = U^{+} y = \arg\min_x \lVert Ux - y \rVert^2$

Pseudoinverse:

$U^{+} \overset{\text{def}}{=} (U^{\mathsf T} U)^{-1}\, U^{\mathsf T}$

There is no need to compute $U^{+}$ explicitly; in practice the singular value decomposition is used instead.
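A minimal Matlab sketch of this on random stand-in data: the backslash operator, pinv, and an explicit thin SVD all return the same least squares solution.

% Minimal sketch (random stand-in data): three ways to solve U*x = y for p > q
p = 100; q = 4;                 % overconstrained: more equations than unknowns
U = randn(p, q);  y = randn(p, 1);
x_bs   = U \ y;                 % QR-based solve, the usual choice in practice
x_pinv = pinv(U) * y;           % explicit pseudoinverse (computed via the SVD)
[Us, S, Vs] = svd(U, 'econ');   % thin SVD: U = Us*S*Vs'
x_svd  = Vs * (S \ (Us' * y));  % x = Vs * S^-1 * Us' * y
norm(x_bs - x_pinv), norm(x_bs - x_svd)   % all three agree up to round-off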

Regression or Curve Fitting


Curve fitting is the process of constructing a curve, or
mathematical function, that has the best fit to a series of data points

http://en.wikipedia.org/wiki/Curve_fitting#mediaviewer/File:Regression_pic_assymetrique.gif

Regression
Regression analysis is a statistical method of data analysis.
Objective: describe the relationship between a dependent variable y and one or several independent variables x,

$y = f(x) + e, \qquad y = f(x_1, \dots, x_n) + e,$

where e denotes the error or residual of the model f(x).
- quantitative description of the relationship
- prediction of values of the dependent variable y on the basis of known values of x
- analysis of the significance of the relationship


Example Linear Regression

http://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Linear_regression.svg/1000px-Linear_regression.svg.png


Regression

$y = w_2 x^2 + w_1 x + w_0$

$y = w_1 x + w_0$

http://de.wikipedia.org/wiki/Ausgleichungsrechnung#mediaviewer/File:Liniendiagramm_Ausgleich.svg

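As an illustrative sketch of the two regression models in the figure, using polyfit on synthetic data (the data and coefficients are assumptions, not the figure's values):

% Minimal sketch (synthetic data): compare a linear and a quadratic fit
x = linspace(-1, 3, 40)';
y = 2*x.^2 + x + 1 + 0.5*randn(size(x));   % data generated by a parabola
p1 = polyfit(x, y, 1);                     % coefficients of y = w1*x + w0
p2 = polyfit(x, y, 2);                     % coefficients of y = w2*x^2 + w1*x + w0
xi = linspace(-1, 3, 200)';
plot(x, y, 'ko', xi, polyval(p1, xi), 'r--', xi, polyval(p2, xi), 'b-');
legend('data', 'linear fit', 'quadratic fit');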

Nonlinear Optimization of F(x)


Goal: minimize scalar function F(x) over parameter vector x

$x^* = \arg\min_x F(x)$


Nonlinear Local Optimization Methods


Derivative-free methods
- line search
- secant method
- downhill simplex method

Methods based on the first derivative
- gradient descent and conjugate gradients
- quasi-Newton methods (BFGS, Gauss-Newton, Levenberg-Marquardt)

Methods based on the second derivative
- Newton method, Newton-Raphson method
- folded spectrum method

Simplex Search (Nelder-Mead)

- Simplex: a polytope of N + 1 vertices in N dimensions. Examples of simplices are a line segment on a line, a triangle in a plane, a tetrahedron in three-dimensional space, and so forth.
- Generate a new test position by extrapolating the behavior of the objective function measured at the test points arranged as a simplex.
- Replace the worst test point with the new one, typically the worst point reflected through the centroid of the remaining N points.

In Matlab this strategy is implemented by fminsearch (see the sketch below).
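A minimal sketch of the simplex search via fminsearch, applied to the Rosenbrock function that reappears later in these slides; the option values are illustrative:

% Minimal sketch: Nelder-Mead simplex search, no gradients are used
fun  = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;    % Rosenbrock function
x0   = [-1, 2];                                      % initial simplex is built around x0
opts = optimset('Display', 'final', 'MaxFunEvals', 2000, 'MaxIter', 2000);
[x, fval] = fminsearch(fun, x0, opts)                % x approaches [1, 1]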

Nelder Mead Algorithm

http://capsis.cirad.fr/capsis/_media/documentation/neldermeadsteps.gif

Nelder Mead Algorithm

http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method#mediaviewer/File:Nelder_Mead1.gif

Nelder Mead Algorithm

http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method#mediaviewer/File:Nelder_Mead2.gif

Nonlinear Optimization
Necessary and sufficient conditions for a minimum.
Taylor approximation:

$F(x + \Delta x) \approx F(x) + \Delta x^{\mathsf T} g(x) + \tfrac{1}{2}\, \Delta x^{\mathsf T} G(x)\, \Delta x + \dots$

Necessary condition: $g(x^*) = 0$

Sufficient condition: at $x^*$ the expansion reduces to

$F(x^* + \Delta x) \approx F(x^*) + \tfrac{1}{2}\, \Delta x^{\mathsf T} G(x^*)\, \Delta x + \dots$

so $\Delta x^{\mathsf T} G(x^*)\, \Delta x > 0$ for all $\Delta x \neq 0$, i.e. $G(x^*) > 0$ (positive definite).
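A quick numerical check of both conditions for the quadratic F(x) = x1^2 + x1*x2 + x2^2 (the same function used in the line search example later) at the candidate x* = (0, 0):

% Minimal sketch: check the optimality conditions at x* = [0; 0]
g = @(x) [2*x(1) + x(2); x(1) + 2*x(2)];   % gradient g(x)
G = [2 1; 1 2];                            % constant Hessian G
xstar = [0; 0];
disp(g(xstar))    % necessary condition: gradient vanishes at x*
disp(eig(G))      % sufficient condition: all eigenvalues positive, G > 0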

Nonlinear Optimization
Iterative algorithm
- initial parameter $x_0$, current iterate $x_k$
- search direction $p_k$
- determine $x_{k+1} = x_k + \alpha_k p_k$

Open issues
- How to determine $p_k$?
- How to determine the step size $\alpha_k$?
- How to choose the initial parameter $x_0$ (the final solution depends on it)?

Search direction
- Taylor expansion of F(x) at the current solution $x_k$:

$F_{k+1} = F(x_k + \alpha_k p_k) \approx F(x_k) + \frac{\partial F}{\partial x}^{\mathsf T} (x_{k+1} - x_k) = F_k + g_k^{\mathsf T} (\alpha_k p_k)$

- A descent direction requires $g_k^{\mathsf T} p_k < 0$; the choice $p_k = -g_k$ gives gradient descent.
- First-order gradient descent: $x_{k+1} = x_k - \alpha_k g_k$

Gradient Descent
F(x_k) decreases fastest if one moves from $x_k$ in the direction of the negative gradient $-\nabla F(x_k)$ of F at $x_k$.
If the step size $\gamma$ is small enough, then

$x_{k+1} = x_k - \gamma \nabla F(x_k)$

implies $F(x_{k+1}) \le F(x_k)$.
Start with a guess $x_0$ for a local minimum of F(x) and consider the sequence $x_0, x_1, x_2, \dots$ with

$x_{k+1} = x_k - \gamma_k \nabla F(x_k).$

Hopefully the sequence converges to the desired local minimum. The value of the step size $\gamma_k$ is allowed to change at every iteration.
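A minimal sketch of this iteration with a constant step size, again on F(x) = x1^2 + x1*x2 + x2^2; the step size and tolerance are illustrative choices:

% Minimal sketch: gradient descent with constant step size gamma
g     = @(x) [2*x(1) + x(2); x(1) + 2*x(2)];   % gradient of F
x     = [1; 1];                                 % initial guess x0
gamma = 0.1;                                    % constant step size
for k = 1:1000
    x = x - gamma * g(x);                       % x_{k+1} = x_k - gamma * grad F(x_k)
    if norm(g(x)) < 1e-8, break; end            % stop when the gradient (nearly) vanishes
end
x, k                                            % converges towards the minimum [0; 0]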

Gradient Descent
Gradient descent with constant step size:

$x_{k+1} = x_k - \gamma \nabla F(x_k)$

http://en.wikipedia.org/wiki/Gradient_descent#mediaviewer/File:Gradient_descent.svg

Gradient Descent
Gradient descent with constant step size on the Rosenbrock function

$f(x_1, x_2) = (1 - x_1)^2 + 100\,(x_2 - x_1^2)^2$

$x_{k+1} = x_k - \gamma \nabla F(x_k)$

http://en.wikipedia.org/wiki/Gradient_descent#mediaviewer/File:Banana-SteepDesc.gif

Gradient Descent
Gradient descent with constant step size on

$f(x_1, x_2) = \sin\!\big(\tfrac{1}{2} x_1^2 - \tfrac{1}{4} x_2^2 + 3\big)\, \cos\!\big(2 x_1 + 1 - e^{x_2}\big)$

$x_{k+1} = x_k - \gamma \nabla F(x_k)$

http://en.wikipedia.org/wiki/Gradient_descent#mediaviewer/File:Gradient_ascent_%28contour%29.png

Nonlinear Optimization
Line search: how is the step size $\alpha_k$ determined?

$x_{k+1} = x_k + \alpha_k p_k$

Select $\alpha_k$ to minimize $F_{k+1} = F(x_k + \alpha_k p_k)$.

Example with $F(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2$:

$x_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad p_0 = \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \quad x_1 = x_0 + \alpha\, p_0 = \begin{pmatrix} 1 \\ 1 + 2\alpha \end{pmatrix}$

$F(\alpha) = 1 + (1 + 2\alpha) + (1 + 2\alpha)^2$

$\frac{dF}{d\alpha} = 2 + 2\,(1 + 2\alpha) \cdot 2 = 0$

$\alpha^* = -\tfrac{3}{4}, \qquad x_1 = \begin{pmatrix} 1 \\ -\tfrac{1}{2} \end{pmatrix}$
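The exact step computed above can be checked numerically by minimizing along the search line; fminbnd and the bracket [-2, 2] are illustrative choices.

% Minimal sketch: verify alpha* = -3/4 for the line search example
F   = @(x) x(1)^2 + x(1)*x(2) + x(2)^2;
x0  = [1; 1];  p0 = [0; 2];
phi = @(a) F(x0 + a*p0);          % objective restricted to the search line
astar = fminbnd(phi, -2, 2)       % approximately -0.75 = -3/4
x1 = x0 + astar*p0                % approximately [1; -0.5]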

Line Search
- Search along a line until the local minimum is bracketed by search points.
- Tighten the bracket by
  - the golden section rule (sketched below)
  - interval halving
  - polynomial approximation
- Polynomial approximation:
  - approximate f(x) by a quadratic or cubic function
  - take its minimum as the next point
  - might diverge
  - efficient close to the minimum
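A minimal sketch of the golden section rule, applied to the line search function F(alpha) from the previous example; the initial bracket [-2, 2] is assumed to contain the minimum.

% Minimal sketch: golden section search on a bracket [a, b]
f = @(a) 1 + (1 + 2*a) + (1 + 2*a)^2;    % line search function from the example
a = -2;  b = 2;                          % initial bracket
r = (sqrt(5) - 1) / 2;                   % golden ratio factor, about 0.618
c = b - r*(b - a);  d = a + r*(b - a);
while (b - a) > 1e-6
    if f(c) < f(d)
        b = d;  d = c;  c = b - r*(b - a);   % minimum lies in [a, d]
    else
        a = c;  c = d;  d = a + r*(b - a);   % minimum lies in [c, b]
    end
end
alpha = (a + b) / 2                      % approximately -0.75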

Bisection Method
- Identification of zeros of a function.
- For optimization: find the zeros of the first derivative.
- Bisecting the interval yields the next candidate solution (see the sketch below).
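A minimal sketch of bisection applied to the first derivative, using the derivative of the line search example; it assumes the derivative changes sign on the initial interval.

% Minimal sketch: bisection on F'(alpha) to locate the stationary point
dF = @(a) 2 + 4*(1 + 2*a);      % derivative of F(alpha) = 1 + (1+2a) + (1+2a)^2
a = -2;  b = 2;                 % dF(a) < 0, dF(b) > 0
while (b - a) > 1e-8
    m = (a + b) / 2;            % midpoint is the next candidate solution
    if dF(a) * dF(m) <= 0
        b = m;                  % zero of F' lies in [a, m]
    else
        a = m;                  % zero of F' lies in [m, b]
    end
end
alpha = (a + b) / 2             % approximately -0.75, the exact line search step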

Secant Method (Line Search)

Second Order Methods


Notation:

$x_k = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad g_k = \frac{\partial F}{\partial x} = \begin{pmatrix} \dfrac{\partial F}{\partial x_1} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{pmatrix}$

Faster convergence
- Assumption: F is quadratic; Taylor expansion of the gradient $g_{k+1}$ at the point $x_{k+1}$:

$g_{k+1} = g(x_k + p_k) = g_k + G_k (x_{k+1} - x_k) = g_k + G_k p_k$

- For $x_{k+1}$ to become a minimum, require $g_{k+1} = 0$, hence $p_k = -G_k^{-1} g_k$ and

$x_{k+1} = x_k + p_k = x_k - G_k^{-1} g(x_k)$

Check the numerical condition of the Hessian

$G_k = \begin{pmatrix} \dfrac{\partial^2 F}{\partial x_1^2} & \cdots & \dfrac{\partial^2 F}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial^2 F}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 F}{\partial x_n^2} \end{pmatrix}$
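A minimal sketch of the resulting Newton iteration on the Rosenbrock function with analytic gradient and Hessian; the starting point and tolerance are illustrative.

% Minimal sketch: Newton's method on f(x) = 100*(x2 - x1^2)^2 + (1 - x1)^2
g = @(x) [-400*(x(2) - x(1)^2)*x(1) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];  % gradient
G = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];          % Hessian
x = [-1; 2];                           % initial guess
for k = 1:50
    p = -(G(x) \ g(x));                % Newton step p_k = -G_k^{-1} g_k
    x = x + p;
    if norm(g(x)) < 1e-10, break; end
end
x, k                                   % typically reaches [1; 1] after a few iterations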

Gradient Descent vs. Newton Method

- Gradient descent blindly follows the direction of steepest descent.
- The Newton method takes the curvature into account via a local second-order approximation of F(x) (the Hessian).

Quasi-Newton Methods (DFP, BFGS)

- Indirect estimation of the Hessian.
- Levenberg-Marquardt: a combination of the Newton method and gradient descent, depending on the numerical condition of the Hessian (see the damped-step sketch below).
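A sketch of the core Levenberg-Marquardt idea (not the toolbox implementation): damp the Hessian with lambda*I so the step interpolates between a Newton step and a short gradient step. The numbers reuse the first Newton iterate of the sketch above.

% Minimal sketch of a damped (Levenberg-Marquardt style) step
g = [396; 200];                    % gradient at the current iterate (example values)
G = [402 400; 400 200];            % Hessian at the current iterate (indefinite here)
lambda = 1e3;                      % damping parameter, adapted during the iteration
p = -((G + lambda*eye(2)) \ g);    % small lambda -> Newton step, large lambda -> -g/lambda

In practice lambda is decreased after successful steps and increased after failed ones.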

Nonlinear Optimization in Matlab


lsqlin: least squares method for (constrained) linear problems

$\min_x \lVert Cx - d \rVert^2 \quad \text{s.t.} \quad A x \le b, \;\; A_{eq} x = b_{eq}, \;\; x_{\min} \le x \le x_{\max}$

quadprog: quadratic programming for (constrained) quadratic programs

$\min_x \tfrac{1}{2}\, x' H x + f' x \quad \text{s.t.} \quad A x \le b, \;\; A_{eq} x = b_{eq}, \;\; x_{\min} \le x \le x_{\max}$

lsqnonlin: least squares method for nonlinear problems

$\min_x \sum_i f_i(x)^2$

lsqcurvefit: least squares method for regression problems (xdata, ydata)

$\min_x \sum_i \big(f(x, xdata_i) - ydata_i\big)^2$

Nonlinear Optimization in Matlab


fminunc: unconstrained nonlinear optimization

$\min_x f(x)$

fminsearch: simplex method (Nelder-Mead), no gradient information

$\min_x f(x)$

fmincon: constrained nonlinear optimization

$\min_x f(x) \quad \text{s.t.} \quad c(x) \le 0, \;\; c_{eq}(x) = 0, \;\; A x \le b, \;\; A_{eq} x = b_{eq}, \;\; x_{\min} \le x \le x_{\max}$

optimoptions: selection of the optimization method and its parameters

optimtool: graphical user interface

OPTIMTOOL

LSQLIN
>> C = [0.9501 0.7620 0.6153 0.4057
0.2311 0.4564 0.7919 0.9354
0.6068 0.0185 0.9218 0.9169
0.4859 0.8214 0.7382 0.4102
0.8912 0.4447 0.1762 0.8936];
>> d = [0.0578; 0.3528; 0.8131; 0.0098; 0.1388];
>> A = [0.2027 0.2721 0.7467 0.4659
0.1987 0.1988 0.4450 0.4186
0.6037 0.0152 0.9318 0.8462];
>> b = [0.5251; 0.2026; 0.6721];
>> Aeq = [3 5 7 9];
>> beq = 4;
>> lb = -0.1*ones(4,1);
>> ub = 2*ones(4,1);
>> x = lsqlin(C,d,A,b,Aeq,beq,lb,ub)
x = -0.1000 -0.1000 0.1599 0.4090

$\min_x \lVert Cx - d \rVert^2 \quad \text{s.t.} \quad A x \le b, \;\; A_{eq} x = b_{eq}, \;\; lb \le x \le ub$

QUADPROG
>> H = [1 -1; -1 2];
>> f = [-2; -6];
>> A = [1 1; -1 2; 2 1];
>> b = [2; 2; 3];
>> lb = zeros(2,1);
>> options = optimoptions('quadprog',...
'Algorithm','interior-point-convex','Display','off');

$\min_x \tfrac{1}{2}\, x' H x + f' x \quad \text{s.t.} \quad A x \le b, \;\; A_{eq} x = b_{eq}, \;\; lb \le x \le ub$

>> [x,fval,exitflag,output,lambda] = quadprog(H,f,A,b,[],[],lb,[],[],options);


>> x,fval,exitflag
x = 0.6667 1.3333
fval = -8.2222
exitflag = 1

LSQNONLIN
>> d = linspace(0,3);
>> y = exp(-1.3*d) + 0.05*randn(size(d));
>> fun = @(r)exp(-d*r)-y;
>> x0 = 4;
>> x = lsqnonlin(fun,x0)
Local minimum possible.
lsqnonlin stopped because the final
change in the sum of squares relative to
its initial value is less than the default
value of the function tolerance.
x = 1.2645
>> plot(d,y,'ko',d,exp(-x*d),'b-');

$\min_x \sum_i f_i(x)^2$

LSQCURVEFIT
>> xdata = [0.9 1.5 13.8 19.8 24.1 28.2 35.2 60.3 74.6 81.3];
>> ydata = [455.2 428.6 124.1 67.3 43.2 28.1 13.1 -0.4 -1.3 -1.5];
>> fun = @(x,xdata)x(1)*exp(x(2)*xdata);
>> x0 = [100,-1];
>> x = lsqcurvefit(fun,x0,xdata,ydata)
Local minimum possible.
lsqcurvefit stopped ...
x = 498.8309 -0.1013
>> times = linspace(xdata(1),xdata(end));
>> plot(xdata,ydata,'ko',times,fun(x,times),'b-')

$\min_x \sum_i \big(f(x, xdata_i) - ydata_i\big)^2$

FMINUNC
$\min_x f(x), \qquad f(x_1, x_2) = x_1\, e^{-(x_1^2 + x_2^2)} + \frac{x_1^2 + x_2^2}{20}$

>> fun = @(x)x(1)*exp(-(x(1)^2 + x(2)^2)) + (x(1)^2 + x(2)^2)/20;
>> x0 = [1,2];
>> [x,fval] = fminunc(fun,x0)
x = -0.6691 0.0000
fval = -0.4052
>> options = optimoptions(@fminunc,'Display','iter','Algorithm','quasi-newton');
>> [x,fval,exitflag,output] = fminunc(fun,x0,options)

Iteration  Func-count     f(x)       Step-size   First-order optimality
    0           3        0.256738                      0.173
    1           6        0.222149     1                0.131
    2           9        0.15717      1                0.158
    3          18       -0.227902     0.438133         0.386
    4          21       -0.299271     1                0.46
    5          30       -0.404028     0.102071         0.0458
    6          33       -0.404868     1                0.0296
    7          36       -0.405236     1                0.00119
    8          39       -0.405237     1                0.000252
    9          42       -0.405237     1                7.97e-07

FMINUNC
function [f,g] = rosenbrockwithgrad(x)
% Calculate objective f
f = 100*(x(2) - x(1)^2)^2 + (1-x(1))^2;
if nargout > 1 % gradient required
g = [-400*(x(2)-x(1)^2)*x(1)-2*(1-x(1));
200*(x(2)-x(1)^2)];
end
>> options = optimoptions('fminunc','Algorithm','trust-region','GradObj','on');
>> x0 = [-1,2];
>> fun = @rosenbrockwithgrad;
>> x = fminunc(fun,x0,options)

$f(x_1, x_2) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2, \qquad x_{opt} = [1, 1]$

FMINCON
$\min_x f(x) \quad \text{s.t.} \quad A x \le b, \;\; A_{eq} x = b_{eq}, \;\; lb \le x \le ub$

$f(x_1, x_2) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2, \qquad x_1 + 2 x_2 \le 1, \qquad 2 x_1 + x_2 = 1$

>> fun = @(x)100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
>> x0 = [0.5,0];
>> A = [1,2];
>> b = 1;
>> Aeq = [2,1];
>> beq = 1;
>> x = fmincon(fun,x0,A,b,Aeq,beq)
x = 0.4149 0.1701

Nonlinear Optimization in Matlab


The final solution depends on the initial solution x0:
- convergence to local minima
- multiple restarts to obtain consistent solutions (see the sketch below)
- global heuristic methods such as evolutionary algorithms
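A minimal sketch of such a multiple restart loop around fminunc, using the objective from the FMINUNC example above; the number of restarts and the sampling box are illustrative.

% Minimal sketch: multiple restarts of a local solver from random initial points
fun  = @(x) x(1)*exp(-(x(1)^2 + x(2)^2)) + (x(1)^2 + x(2)^2)/20;
opts = optimoptions(@fminunc, 'Display', 'off', 'Algorithm', 'quasi-newton');
best = inf;  xbest = [];
for k = 1:20
    x0 = 4*rand(1, 2) - 2;                  % random initial solution in [-2, 2]^2
    [x, fval] = fminunc(fun, x0, opts);
    if fval < best
        best = fval;  xbest = x;            % keep the best local minimum found so far
    end
end
xbest, best                                 % close to [-0.6691, 0] with fval -0.4052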

Optimization Toolbox Demos


datdemo.m

y = c(1)*exp(-lam(1)*t) + c(2)*exp(-lam(2)*t)
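A sketch of fitting this two-exponential model with lsqcurvefit on synthetic data (the demo itself may use a different solver; the parameter packing p = [c(1) c(2) lam(1) lam(2)] and the data are assumptions):

% Minimal sketch (synthetic data): fit y = c1*exp(-lam1*t) + c2*exp(-lam2*t)
t = linspace(0, 3, 50);
y = 3*exp(-1.3*t) + 2*exp(-4*t) + 0.02*randn(size(t));     % synthetic measurements
model = @(p, t) p(1)*exp(-p(3)*t) + p(2)*exp(-p(4)*t);     % two-exponential model
p0 = [1 1 1 2];                                            % initial guess for [c, lam]
p  = lsqcurvefit(model, p0, t, y);
plot(t, y, 'ko', t, model(p, t), 'b-');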

Optimization Toolbox Demos


bandem.m

Next: Global Optimization


Scientific Programming with Matlab WS 2015/16
apl. Prof. Dr. rer. nat. Frank Hoffmann
Univ.-Prof. Dr.-Ing. Prof. h.c. Torsten Bertram
Institute of Control Theory and Systems Engineering
