Benders Decomposition for Dummies

How I learned it

Gro Klæboe
Contents

For whom is this note intended?
What is Benders decomposition?
What are the benefits of using Benders decomposition?
Presentation of the case study
Dressing up the problem in the standard notation
  The first stage
  The second stage
  The optimality cuts – the link between the first and the second stage
  Taking care of stochasticity
  Example of an optimality cut
The Benders procedure from beginning to end
  The steps in Benders decomposition
    Step 1: Initialization
    Step 2: Sub problems
    Step 3: Convergence test
    Step 4: Master problem
  But how do I know how many iterations to run?
List of notation
References
For whom is this note intended?
The primary target group for this memo is very narrow: me. I need to write down
what I have just understood, so that I don't forget it. Other people who might
benefit from this memo are probably quite like me in the following respects:
Since I don't intend to publish these notes anywhere other than on my home page, I
don't bother to pretend that Google isn't my main source of information.
The list of references therefore includes other lecture notes, memos and other
useful stuff found on the net.
Within stochastic programming, one often refers to the L-shaped method [1],
which was developed by Van Slyke and Wets. I think they were the first to borrow
the Benders decomposition technique and apply it to stochastic programming.
(Someone named Kelley touched upon the same ideas.) The method
is basically the same as Benders decomposition, with the addition of
feasibility cuts (see [1] ch. 5.1 for details). Since we will work in this note
with problems with relatively complete recourse, we can disregard the feasibility
cuts.
The first stage problem is thus to determine how much lumber to buy, and how
much finishing- and carpentry-skilled labour to hire. The random variables are
the demand quantities, and the second stage problem is to figure out how many
desks, tables and chairs to produce, recognizing that production opportunities are
limited by the inputs bought in stage 1.
Ok. Let’s turn to the standard formulation. For the first stage, the notation is as
follows:
min z = c^T x + θ
s.t.  Ax = b
      x ≥ 0
If you are like me, it is often useful to see the vectors with some real values to
get a feel of the problem, so here is the objective.
min z = [c_l c_f c_c][x_l; x_f; x_c] + θ = [2 4 5.2][x_l; x_f; x_c] + θ

(Here [a; b; c] denotes a column vector. In the very first iteration θ is unbounded below, i.e. −∞.)
But what do the first-stage restrictions look like? In Higle's original problem,
none are specified (probably because Higle does not show how to solve the
problem using Benders). There are no restrictions on how much lumber, finishing
and carpentry labour we can buy. But when solving the case, we'll find that
to get started we really need some upper limits on the x-vector; otherwise the
first stage problem will be unbounded in some of the starting iterations. This will
often be the case with Benders. Luckily, it's usually not hard to think of some
limits on x that would be relevant to include. There might be a budget limit
on how much money the producer could spend on buying inputs, or, in the
extreme case, the availability of resources will ultimately be limited by global
availability. (There cannot be more labour hours available for carpentry in one
specific hour than there are people in the world!) So, I'll give you some upper
limits on x, which I know are sufficient for the levels of demand: lumber: max
3500 bd ft; finishing labour: max 1500 hours; carpentry labour: max 875 hours.
With this information at hand, the A-matrix is simply the identity matrix, and the
restriction Ax ≤ b becomes:
[1 0 0; 0 1 0; 0 0 1][x_l; x_f; x_c] ≤ [3500; 1500; 875]
Solving this problem for the first iteration of the first stage, it is obvious that the
solution is:
[x_l; x_f; x_c] = [0; 0; 0]
min w = q^T y
s.t.  W y = h(ω_s) − T x
      y ≥ 0
Some comments on the notation and stochasticity here. T is often named the
technology matrix, W the recourse matrix, h the resource vector¹ and q the cost
vector. All of these might be functions of the stochastic variable, but to use
Benders, W must be independent of ω. That is known as fixed recourse. Also, life
is easier when q is independent of ω, as this interferes with feasibility cuts in
some ways that I have not really thought very hard about (the interested reader
is referred to chapter 3 of [5]). However, in our case only demand is stochastic –
which means (as I will show) that only the resource vector is stochastic.
The objective in the second stage is really to maximize income given the available
inputs and demand constraints. However, as we want to work with a minimization
problem, we formulate the problem so that we minimize the negative income
from selling furniture.
¹ Note that if you are reading Higle, she denotes the resource vector by r rather than h.
min z = [q_d q_t q_c][y_d; y_t; y_c] = [−60 −40 −10][y_d; y_t; y_c]
Then, let's move on to the restriction matrix for the second stage. There are two
sets of equations in this particular problem: the input restrictions, saying that you
cannot make more furniture than the stock of inputs allows you (the first three
rows of the W, h and T vectors/matrices), and the demand restrictions, saying that you
cannot sell more to the market than the demand scenario allows you to (the three
last rows of the W, h and T vectors/matrices).
W y = h(ω_s) − T x, with

W = [8 6 1; 4 2 1.5; 2 1.5 0.5; 1 0 0; 0 1 0; 0 0 1]
y = [y_d; y_t; y_c]
h(ω_s) = [0; 0; 0; d_d(ω_s); d_t(ω_s); d_c(ω_s)]
T = [−1 0 0; 0 −1 0; 0 0 −1; 0 0 0; 0 0 0; 0 0 0]
x = [x_l; x_f; x_c]
Note that many of the rows in the technology matrix consist of all zeros. This took
a bit of time for me to realize (but maybe you are smarter than me and grasp it at
once): all restrictions that limit the second stage problem but are not
functions of the first stage variables will have all-zero rows in the T-matrix. However,
they are still important, because they influence the objective function in the
second stage problem, and as we will see later on, the second stage objective will
enter the approximated second stage cost (θ). Note also that the three last rows
of the h-vector have stochastic elements. Thus, we will actually not have only
one second stage problem – we will have S (in our case 3) problems, one for
each scenario.
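To make the all-zero-row observation concrete, here is a small sketch in Python. The scenario probabilities and demand figures are the ones used in the calculations later in this note; the list layout is just my own bookkeeping, not anything prescribed by the method.

```python
# Second-stage data for the Dakota case: W and T are fixed (fixed recourse),
# while the resource vector h depends on the demand scenario.
W = [[8, 6, 1],
     [4, 2, 1.5],
     [2, 1.5, 0.5],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
T = [[-1, 0, 0],
     [0, -1, 0],
     [0, 0, -1],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]
# (probability, (d_d, d_t, d_c)) per scenario
scenarios = [(0.3, (50, 20, 200)),
             (0.4, (150, 110, 225)),
             (0.3, (250, 250, 500))]

def h(demand):
    # three input rows (zeros), then the three stochastic demand rows
    return [0, 0, 0] + list(demand)

# The rows of T that are non-zero (input rows) are exactly the rows where
# h is deterministic, and the all-zero rows of T carry the stochastic demand.
for prob, demand in scenarios:
    for t_row, h_entry in zip(T, h(demand)):
        if any(t_row):           # input row: coupled to the first stage ...
            assert h_entry == 0  # ... and always zero in h
```

Running this confirms the structural claim above for all three scenarios.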
One thing that is imprecise in this exposition is that the second stage
restrictions are represented by an equality constraint, whereas they really should be
less-than-or-equal constraints. I guess the condensed form Wy = h − Tx
involves some slack variables to make the equation hold. However, I will ignore
this for now, since I am going to solve the problem through the high-level OR
program GAMS rather than by matrix calculations. If you are trying to solve the
problem in MATLAB, Maple or equivalent, you should probably think about this a
bit.
The optimality cuts – the link between the first and the
second stage
Since you are reading this text, you have probably looked at the Benders
decomposition method and know that the first and the second stage problems are
tied together through the optimality cuts. Thus, after the first initialization, the
first stage problem gets a set of extra equations (one for each iteration) that
limit θ.
min z = c^T x + θ
s.t.  Ax = b
      E_l x + θ ≥ e_l,  l = 1, …, L
      x ≥ 0
Note that θ is unbounded, so negative values are allowed. Note also
that θ has no subscript l, so for each iteration there will be more and more
restrictions limiting θ, ultimately pushing θ to its optimal value. But what do
these restrictions represent? Let us rewrite the optimality cut in the Birge &
Louveaux [1] notation:
θ ≥ e_l − E_l x,  l = 1, …, L
This equation says that θ must be greater than the right hand side. Thus, we limit
the objective (a minimization problem) by saying that the θ-part must be at least
as great as something. But what is this something? To put it shortly, the
something is the expected objective value of the second stage problem, e_l (given
some fixed values of x), less the contribution to reducing the second stage objective
value from changing the first stage variables x.
The next question is then: how do we know how much changing x is going to
reduce the second stage objective? The answer is that we use the marginal
values on the restrictions that x enters. If the marginal value is negative,
increasing x would reduce the second stage objective (thus making it better,
since it is a minimization problem), whereas a positive marginal value would
increase it. The coefficient E_l tells us how fast the change is.
[A word of caution: it is really easy to get sign-confused when working with
optimality cuts. Birge & Louveaux's formulation is more intuitively appealing,
whereas Higle's formulation is more mathematically correct. When reading the
two texts side by side it might seem as if they disagree on the sign of the x-term,
but they don't. In Birge & Louveaux's notation, E_l is the simplex multiplier of
Tx, whereas in Higle, the term βx is the simplex multiplier of −Tx. Therefore,
the two restrictions are in fact the same.]
Where does this whole idea of optimality cut come from, and why does it work?
The answer is duality theory. Let us go back to the formulation of the second
stage problem. If you know your linear programming basics, you know that you
can formulate the dual of the second stage problem as follows:
max π^T [h(ω_s) − T x]
s.t.  π^T W ≤ q^T
      π ≥ 0
where π is the vector of dual variables, and all other matrices should be known
from the definition of the primal second stage problem. [The last restriction,
π ≥ 0, really only holds if Wy ≤ h(ω_s) − Tx, but since I guess slack variables are
included in the B&L formulation, I don't worry too much about this.]
For the optimal dual solution, the objective of the dual second stage problem will be
equal to the objective of the primal second stage problem. For any other feasible
dual solution, the objective value of the dual problem will be less than the second
stage objective. This means that the objective of the dual second stage problem
acts as a lower bound on the true second stage cost.
E_l ≡ Σ_{s=1}^{S} p_s (π_{ls})^T T_s
With these definitions at hand, it is easy to see that the weak duality property (3)
is equivalent to the optimality cut.
In the first stage problem, the objective is still to minimize the cost of purchasing
inputs while balancing it against the income that those inputs could generate in the
second stage:

min z = [c_l c_f c_c][x_l; x_f; x_c] + θ = [2 4 5.2][x_l; x_f; x_c] + θ
We still have the first stage constraints limiting how much input it is possible to
buy:
[1 0 0; 0 1 0; 0 0 1][x_l; x_f; x_c] ≤ [3500; 1500; 875]
From the 3 earlier iterations, we have the following three restrictions on theta:
θ ≥ 0 + [−6.25 −2.5 0][x_l; x_f; x_c]

θ ≥ 0 + [0 −20 0][x_l; x_f; x_c]

θ ≥ 0 + [0 0 −30][x_l; x_f; x_c]
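Since a minimization master must respect every cut, the effective floor on θ at a given x is the largest of the right hand sides. A quick sketch of that evaluation; the point x = (3500, 1500, 875) is just the upper-limit vector from earlier, picked for illustration:

```python
# Cuts in the form theta >= e + g·x, copied from the three cuts above.
cuts = [(0, (-6.25, -2.5, 0)),
        (0, (0, -20, 0)),
        (0, (0, 0, -30))]

def theta_floor(x, cuts):
    # theta must be at least as large as every cut's right hand side,
    # so the binding restriction is the maximum over all cuts
    return max(e + sum(gi * xi for gi, xi in zip(g, x)) for e, g in cuts)

x = (3500, 1500, 875)
print(theta_floor(x, cuts))  # -25625.0: the first (lumber) cut binds here
```

Note that at x = 0 all three cuts give θ ≥ 0, which matches the observation below that the second stage objective is zero without inputs.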
If you reflect a bit upon the values of these three optimality cuts, you'll
notice that without at least one unit of each input, the Dakota Furniture Company
will be unable to produce a single table, desk or chair, and the objective value of
the second stage (e_l) is therefore 0 in all three cuts. Also, the feedback from the
second stage tells us that it is the lack of lumber that is the most binding
constraint, leading to a negative marginal value² on lumber in the first iteration.
Since the cost of lumber (2) is less than the expected income per unit of lumber
(6.25), we buy lumber in the second iteration, but then it is the lack
of finishing labour that is most restrictive and yields a negative marginal
value. Again, we decide to buy finishing labour as well, only to discover that it would be
beneficial also to have carpentry-skilled labour.

² Remember: with a minimization objective, a negative marginal value indicates a better
objective if we increase the level.
With this information we solve the first stage problem and get the following
decision on how much input to buy:
[x_l; x_f; x_c]_{l=4} = [3500; 1250; 833.33]
[I wonder what information it is that makes the program buy less than the
maximum of finishing and carpentry?]
Also, let us have a look at the marginal value of increased demand in the three
scenarios in this iteration:
Given the structure of this problem³, where the h-vector has zeros in all rows
where the T-matrix is non-zero, and vice versa, the constant term of the cuts can
be calculated in two ways: either directly, through the marginal values on the
h-vector, or indirectly, by subtracting the marginal values of the T-matrix
multiplied by the first stage decisions from the second stage objective.
Let us start out with the conventional way of calculating it that [1] prescribes:
e_l ≡ Σ_{s=1}^{S} p_s (π_{ls})^T h_s
Thus, for our problem, this becomes:
e_{l=4} = Σ_{s=1}^{3} p_s [π_{d,s} π_{t,s} π_{c,s}][d_{d,s}; d_{t,s}; d_{c,s}]
        = 0.3·[−60 −40 −10][50; 20; 200] + 0.4·[−60 −40 −10][150; 110; 225] + 0.3·[0 −10 0][250; 250; 500]
        = −8750
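The sum above is easy to check numerically; a minimal sketch:

```python
# Direct calculation of e_l: probability-weighted duals times the h-vector.
p  = [0.3, 0.4, 0.3]
pi = [[-60, -40, -10], [-60, -40, -10], [0, -10, 0]]    # demand-row duals
d  = [[50, 20, 200], [150, 110, 225], [250, 250, 500]]  # demand per scenario

e = sum(ps * sum(a * b for a, b in zip(pis, ds))
        for ps, pis, ds in zip(p, pi, d))
print(round(e, 6))  # -8750.0
```

The three scenario terms are 0.3·(−5800), 0.4·(−15650) and 0.3·(−2500).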
However, we can choose to do this in another way. Remember that the objective
of the second stage problem is equal to the dual variables times the right hand
side. We may exploit this in the following way:
w_s = π_s^T [h_s − T x]
π_s^T h_s = w_s + π_s^T T x
e_l ≡ Σ_{s=1}^{S} p_s π_s^T h(ω_s) = Σ_{s=1}^{S} p_s (w_s + π_s^T T x)
So, calculating this for our case gives the following. (Note that the marginal values
refer to −Tx, so we need to replace the + with a − sign to get the calculations
correct.)
e_{l=4} = Σ_{s=1}^{3} p_s (w_s − [π_{l,s} π_{f,s} π_{c,s}][x_l; x_f; x_c]_{l=4})
        = 0.3·(−5800 − [0 0 0][3500; 1250; 833.33])
        + 0.4·(−15650 − [0 0 0][3500; 1250; 833.33])
        + 0.3·(−21250 − [0 −15 0][3500; 1250; 833.33])
        = −8750

³ I have not really thought through whether this structure is a necessary condition for
being able to calculate e_l in two ways, or whether it can also be done with other types of
problems.
Which method is better depends on the structure of the problem. With a large
second stage problem (an h-vector with many non-zero elements), I think the
second approach is faster.
Let us now calculate E_l. That is just a question of calculating the probability-
weighted sums of the marginals over all scenarios. For lumber and carpentry the
value of increased inputs is 0 in all scenarios, but for finishing the marginal
value is −15 in the high demand scenario. This leaves us with a coefficient of
0.3·0 + 0.4·0 + 0.3·(−15) = −4.5 for finishing. The fourth cut will then look as
follows:
θ ≥ −8750 + [0 −4.5 0][x_l; x_f; x_c]
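The coefficient vector of the cut is just the probability-weighted sum of the input-row marginals; sketched:

```python
# E_l: probability-weighted input-row marginals (lumber, finishing, carpentry).
p = [0.3, 0.4, 0.3]
marginals = [[0, 0, 0], [0, 0, 0], [0, -15, 0]]   # one row per scenario

E = [sum(ps * m[i] for ps, m in zip(p, marginals)) for i in range(3)]
e = -8750
cut = (e, E)   # reads: theta >= e + E·x
print(E)
```

Only the finishing component is non-zero, giving the −4.5 coefficient above.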
The cut tells us that, from the second stage point of view, the expected objective
value given the input vector bought in iteration l = 4 is −$8750 (an expected income
of $8750), and that this income could be increased by $4.5 for each extra unit of
finishing labour that was available. Is this interpretation really correct?
- Each iteration contains one run of the first stage problem and s runs of the
  second stage problem, where s is the number of scenarios.
- The goal of the first stage problem is to decide upon the first stage
  decision variables, x. These are then transferred to the second stage
  problem of the same iteration and kept constant for all scenarios.
- The goal of the second stage problem is to find the expected value of the
  second stage objective, and also the gradient of the second stage objective
  with respect to the first stage variables.
- Each iteration of the second stage problem adds one cut to the first stage
  problem, whereas each run of the first stage problem only replaces the old
  set of first stage variables.
When you solve the first stage problem, the cost of the second stage problem is
bounded by the optimality cuts. Theta can be reduced by increasing the first
stage decision variables (the x's), a decision that has to be weighed against the
increased first stage costs. But the gradient describing how increased x reduces the
second stage cost is based on an optimal second stage decision given x, so the
estimate of theta from the first stage will act as a lower bound on the second
stage cost.

The second stage problem calculates the cost of the second stage given
optimal second stage decisions under fixed first stage variables.
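The interplay between cuts, theta and convergence is easiest to see on a toy problem. The sketch below is not the Dakota case: it is a one-variable problem I made up (buy x units of input at cost 2, sell min(x, demand) at price 5, demand 1 or 3 with equal probability), and the master problem is solved by grid search instead of an LP solver, so every name and number here is my own construction.

```python
def benders_toy(c=2.0, price=5.0, demands=(1.0, 3.0), probs=(0.5, 0.5),
                x_max=4.0, theta_min=-1000.0, tol=1e-9, max_iter=50):
    grid = [i * 0.5 for i in range(int(x_max / 0.5) + 1)]
    cuts = []                      # each cut reads: theta >= a + b*x
    for v in range(max_iter):
        # Master problem: pick x minimizing c*x + theta, theta floored by cuts
        def theta_floor(x):
            return max([a + b * x for a, b in cuts], default=theta_min)
        x = min(grid, key=lambda xx: c * xx + theta_floor(xx))
        theta = theta_floor(x)
        # Sub problems: expected second stage value and subgradient at x
        w = sum(p * (-price * min(x, d)) for p, d in zip(probs, demands))
        g = sum(p * (-price if x < d else 0.0) for p, d in zip(probs, demands))
        # Convergence test: theta has caught up with the true second stage value
        if theta >= w - tol:
            return x, c * x + theta
        cuts.append((w - g * x, g))   # supporting cut of w(x) at the current x
    raise RuntimeError("no convergence")

x_opt, obj = benders_toy()
print(x_opt, obj)  # 3.0 -4.0
```

Each cut is a supporting line of the (convex) expected second stage value at the current x, which is exactly why theta underestimates the second stage cost until convergence.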
The steps in Benders decomposition
This section follows [6].
Step 1: Initialization
v := 1 {iteration number}
UB := +∞ {upper bound}
LB := −∞ {lower bound}
Solve the initial master problem:
    min c^T x
    s.t. Ax = b
         x ≥ 0
x^v := x* {optimal values}
Go to step 2.

In later iterations the master problem (with the optimality cuts added) also yields θ, and the updates after solving it become:
x^v := x* {optimal values}
θ^v := θ*
LB := c^T x^v + θ^v
Go to step 2.
Let w^v = e^v − E^v x^v. If θ^v ≥ w^v, then stop: x^v is an optimal solution. Otherwise,
add the optimality cut to the master problem and run another iteration. However,
this knowledge is not very useful in predicting the progress of your algorithm,
since the gap w^v − θ^v does not necessarily decrease steadily. This is illustrated
for our case study in Figure 1.
Figure 1: Gap (w^v − θ^v) between the probability weighted sum of second stage objectives and the
approximation (theta) in the Dakota furniture problem, over iterations 1–11.
With large problems at hand, it is quite frustrating not to know how much closer
you are to the optimal solution, but just to sit and hope for an optimal solution in
the next iteration.
Figure 2: Example of bounding of the master objective of the Dakota furniture problem (upper and lower bounds over iterations 1–11).
However, finding out how these upper and lower bounds are calculated is rather
tedious. Let me present some intuition behind them here:
Since we have a minimization problem, the lower bound of the objective is the
best possible objective, right? And the upper bound represents the worst case. So,
after the first iteration we know that the expected profit of the Dakota furniture
company is between $14875 and $0 – not a very precise measure, huh?
Ok, now I reveal how the upper and lower bounds are calculated. The lower bound is
simply the optimal value of the master objective:

LB = c^T x^v + θ^v
The upper bound is the first stage cost plus the probability weighted sum of
optimal responses in the second stage:

UB = c^T x^v + Σ_s p_s w_s |_{x^v}
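With the scenario responses from the l = 4 iteration earlier in this note, the upper bound can be evaluated directly. The θ^v for that iteration was not reported above, so the lower bound is left as a function of θ^v rather than a number:

```python
# Bounds in the L-shaped method, evaluated with this note's l = 4 numbers.
c = [2, 4, 5.2]
x = [3500, 1250, 833.33]
p = [0.3, 0.4, 0.3]
w = [-5800, -15650, -21250]   # optimal second stage responses given x

first_stage_cost = sum(ci * xi for ci, xi in zip(c, x))
UB = first_stage_cost + sum(ps * ws for ps, ws in zip(p, w))

def LB(theta_v):
    # lower bound = optimal master objective, c^T x^v + theta^v
    return first_stage_cost + theta_v

print(round(UB, 2))
```

Here the first stage cost is about 16333.32 and the expected second stage response is −14375, so UB is about 1958.32 for this iteration.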
Aren't these two measures really the same? No, there is one big difference. When
calculating the lower bound, you are allowed to vary the first stage decisions
while simultaneously taking into account the impact of your choice of x in both the
first and the second stage – although the effect on the second stage can only be
approximated through the restrictions on theta. In the Dakota furniture problem,
this means that we choose the amount of inputs to buy while accounting for the
fact that lumber, finishing and carpentry represent both a direct cost and an
income opportunity in the second stage, and trade these two objectives off
against each other.
When calculating the upper bound, however, the choice of inputs is fixed, and the
only thing we can do about it is to make the best out of it when demand is
revealed. However, if the choice of inputs happened to be optimal, we will also
have an optimal solution.
Birge & Louveaux [1] describe the theory behind this in chapter 9. They state
that "The L-shaped method (…) is based on iteratively providing a lower bound
on the recourse objective, Q(x)." For details on the use of bounding, I refer the
interested reader to chapter 9, and section 9.3 in particular.
The lower bound improves continuously from iteration to iteration, whereas the upper
bound may stay the same for several iterations.
List of notation
e_l, E_l – optimality cut coefficients from iteration l
h – resource vector
l – iteration number in the L-shaped method
p – probability
s – scenario
T – technology matrix
W – recourse matrix
x – first stage decision variables
y – second stage decision variables
θ – approximation of the expected second stage objective
π – dual variables
References
[1] J. R. Birge and F. Louveaux, Introduction to Stochastic Programming. New
York: Springer, 1997.
[2] U. G. Christensen and A. B. Pedersen, "Lecture Note on Benders'
Decomposition," ed, 2008.
[3] C. C. Carøe and R. Schultz, "Dual Decomposition in Stochastic Integer
Programming," Konrad-Zuse-Zentrum für Informationstechnik Berlin, 1996.
[4] J. L. Higle, "Stochastic Programming: Optimization When Uncertainty
Matters," in Tutorials in Operations Research, ed: INFORMS, 2005.
[5] P. Kall and S. W. Wallace, Stochastic programming, 1 ed.: John Wiley &
Sons, 1994.
[6] E. Kalvelagen, "Benders Decomposition for Stochastic Programming with
GAMS," ed, 2003, p. 10.