
Statistical Computing

Prof. Dr. Matei Demetrescu

Summer 2019



Today’s outline

Combinatorial optimization

1 Heuristic searches

2 Simulated annealing and genetic algorithms

3 Tabu algorithms

4 Up next



Heuristic searches

Outline

1 Heuristic searches

2 Simulated annealing and genetic algorithms

3 Tabu algorithms

4 Up next



Heuristic searches

The basic setup

We have the same goal:


maximize f (θ) w.r.t. θ = (θ_1 , . . . , θ_p ), where θ ∈ Ω, but
Ω consists of a finite number N of possible solutions.

Sounds simple, but N may be large.



Heuristic searches

Example: Model selection

Given a dependent variable Y and a set of predictors Xj ,


we search for the best model of the form Y = β_0 + ∑_{j=1}^{s} β_{i_j} X_{i_j} + ε,
where {i_1 , . . . , i_s } is a subset of {1, . . . , p}; always with intercept.


Of course, θ is coded suitably (e.g., θ_i indicating inclusion of X_i).
What “best” means may be defined in various ways, say by the
Akaike information criterion (AIC). (Recall, R² is not suitable.)

Regressor selection ideally requires optimization over N = 2^p candidate
models!
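To make the objective concrete, here is a minimal sketch (not from the slides) of scoring one candidate model by its AIC, assuming a response vector y and an n × p predictor matrix X stored as numpy arrays; the Gaussian-likelihood form with additive constants dropped is used, and the exact constant convention does not matter when comparing candidates.

import numpy as np

def aic_subset(y, X, subset):
    """AIC of the OLS fit using the predictors indexed by `subset`
    (intercept always included); Gaussian form, additive constants dropped."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least squares coefficients
    rss = float(np.sum((y - Z @ beta) ** 2))       # residual sum of squares
    k = Z.shape[1]                                 # intercept + selected predictors
    return n * np.log(rss / n) + 2 * k

# A candidate theta coded as a 0/1 vector of length p would then be scored by
# -aic_subset(y, X, [j for j in range(p) if theta[j] == 1])  (maximization).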



Heuristic searches

More generally

Practitioners frequently have to deal with discrete optimization problems


where Ω has 2^p or even p! elements.

E.g. a likelihood function may depend on


so-called configuration parameters θ
which describe the form of a statistical model and
for which there are many discrete choices
and other parameters which are easily optimized for each
configuration.
In such cases, we may view f (θ) as the (log) profile likelihood of a given
configuration, θ.¹

The solution space Ω is obtained by combining p different features;

optimizing w.r.t. θ is demanding...
¹ I.e., the highest likelihood attainable using that configuration.
Heuristic searches

Just to get an idea

Parsing many possible solutions is not always feasible:

Time to solve a problem of order . . .

 p     O(p²)           O(2^p)          O(p!)
 20    1 minute        1 minute        1 minute
 21    1.10 minutes    2 minutes       21 minutes
 25    1.57 minutes    32 minutes      12.1 years
 30    2.25 minutes    17.1 hours      207 million years
 50    6.25 minutes    2042.9 years    2.4×10^40 years

Bottom line: Many combinatorial optimization problems are inherently too


difficult to solve exactly by traditional means.
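The table entries follow from simple rescaling, assuming a machine that needs one minute for the p = 20 instance in each column; a quick sanity check (illustrative code, not part of the slides):

import math

def runtime_minutes(p, growth, base_p=20, base_minutes=1.0):
    """Scale the one-minute p = 20 runtime by the growth function."""
    return base_minutes * growth(p) / growth(base_p)

for p in (21, 25, 30, 50):
    for name, g in (("p^2", lambda q: q ** 2),
                    ("2^p", lambda q: 2.0 ** q),
                    ("p!", math.factorial)):
        m = runtime_minutes(p, g)
        # 525,960 minutes per (average) year
        print(f"p = {p:2d}, O({name}): {m:.4g} minutes (~ {m / 525960:.3g} years)")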



Heuristic searches

A trade-off

So let’s avoid algorithms


that will find the global maximum, but
will take forever to do so.
An algorithm that finds a good local maximum in reasonable time is
clearly to be preferred.

Heuristic algorithms explicitly trade off global optimality for speed.



Heuristic searches

Local neighbourhood search

Let’s stay iterative (what else?):


we improve a current candidate solution, but
do so in a neighborhood of the current candidate, say N (θ (t) ).
The update θ (t) 7→ θ (t+1) ∈ N (θ (t) ) is a local move (step).
Don’t forget to add a stopping rule.

Note: Since the neighbourhood changes with each move, we do search


larger subsets of Ω.



Heuristic searches

Choice of neighborhood

The choice of neighbourhood influences performance:


So keep it simple: e.g., define N(θ^(t)) by allowing k changes to the
current candidate solution.²
Large k is time-consuming, but small k may miss improvements.
No universal answer on how to pick it.

In the model selection problem,


θ (t) is a binary vector coding the inclusion/exclusion of each X;
a 1-neighborhood is the set of models that either add or eliminate one
regressor from θ (t) (i.e. flips one element of θ).

² This defines a so-called k-neighborhood, and any move is called a k-change.
Heuristic searches

The moves

You may come up with many ways of updating the current candidate; you
typically want an upward move:

Simple ascent: θ (t+1) is the first better θ in N (θ (t) ).


Steepest ascent: θ (t+1) is the best θ in N (θ (t) ).

Then, update the neighbourhood – and repeat.

Different pros and cons; again, no universally best choice...
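A minimal sketch of a 1-change neighbourhood and steepest ascent for 0/1 coded candidates (illustrative names; `f` is any objective to maximize, e.g. minus the AIC sketched earlier):

def neighbors_1change(theta):
    """All candidates obtained by flipping exactly one element of the 0/1 vector."""
    for i in range(len(theta)):
        nb = list(theta)
        nb[i] = 1 - nb[i]
        yield nb

def steepest_ascent(f, theta, max_iter=15):
    """Move to the best 1-change neighbour until no neighbour improves f."""
    val = f(theta)
    for _ in range(max_iter):
        best_nb, best_val = None, val
        for nb in neighbors_1change(theta):
            v = f(nb)
            if v > best_val:
                best_nb, best_val = nb, v
        if best_nb is None:            # no improving neighbour: local maximum
            break
        theta, val = best_nb, best_val
    return theta, val

# Simple ascent differs only in accepting the *first* improving neighbour found.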



Heuristic searches

Random starts

Be smart (for the given budget):

Select many starting points at random,
run simple/steepest ascent from each point, and
choose the best solution as the final answer.
Works quite well in practice, actually.

Worth emphasizing: The random starts strategy can be combined with any
optimization method!
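A sketch of the random-starts wrapper (illustrative; any local search routine, such as the steepest-ascent sketch above, can be plugged in as `local_search`):

import random

def random_starts(local_search, p, n_starts=25, seed=1):
    """Run `local_search` from several random 0/1 starting vectors, keep the best.
    `local_search` maps a starting vector to a (solution, objective value) pair."""
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(n_starts):
        start = [rng.randint(0, 1) for _ in range(p)]
        sol, val = local_search(start)
        if val > best_val:
            best, best_val = sol, val
    return best, best_val

# e.g. random_starts(lambda th: steepest_ascent(f, th), p=27, n_starts=60)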



Heuristic searches

Example: Model selection, p = 27 predictors

[Figure] Results of random-starts local search, 1-change steepest ascent, 15
iterations max per start. (AIC ranges between −300 and −420.)
Heuristic searches

Example: Regression with 27 predictors cont’d

Predictors selected by various searches

Method 1 2 3 6 7 8 9 10 12 13 14 15 16 18 19 20 21 22 24 25 26 AIC
LS (2,4) ••• • • • • • • • • • −416.95
LS (1) •• • • • • • • • • −414.60
LS (3) •• • • • • • • • • • −414.16
LS (5) • •• • • • • • • • −413.52
S-Plus • •• • • • • • • • • • −416.94
Efroymson • •• • • • • • • −400.16

Random starts local search [LS] model selection results using 1-change
steepest ascent, as well as two standard procedures (greedy stepwise
ascent). (Bullets indicate inclusion of predictor j.)



Heuristic searches

Local search with variable neighborhoods

For standard ascent algorithms,


entrapment in a globally uncompetitive local maximum is inevitable
because no downhill moves are allowed, and
(many) random starts are mandatory.

We may want to
break out of a bad neighbourhood, or
move towards other (more promising) neighbourhoods.
Of course, while ensuring that you don’t slide back immediately...



Simulated annealing and genetic algorithms

Outline

1 Heuristic searches

2 Simulated annealing and genetic algorithms

3 Tabu algorithms

4 Up next



Simulated annealing and genetic algorithms

Annealing

Wikipedia says:
Annealing, in metallurgy and materials science, is a heat
treatment that alters the physical and sometimes chemical
properties of a material [...].
In annealing, atoms migrate in the crystal lattice [grid] and the
number of dislocations decreases, leading to a change in ductility
and hardness. As the material cools it recrystallizes.



Simulated annealing and genetic algorithms

An analogy

A crystal has the lowest level of potential energy among all


configurations.
Movement of atoms is random, depends on temperature.
Annealing “pushes” atoms to reach their optimal position in the
lattice. This involves:
heating to a predetermined temperature
holding the material for some time at that temperature
slowly cooling and repeating the procedure.
In real-life physics, atoms may break out of suboptimal positions
when heating – but we want such behavior for θ (t) as well!

We treat θ (t) as an atom that should reach the optimal θ ∗ .



Simulated annealing and genetic algorithms

Simulated annealing

To maintain the analogy, annealing problems are posed as minimizations.


Of course, we may always minimize −f if we need maximization.

The simulated annealing process


Initializes with a candidate θ (0) and a “temperature” τ0 .
Consists of several stages of constant temperature, τ0 , τ1 , τ2 , . . . , where
each stage j consists of mj iterations, during which updates may (but
need not) take place.
The “temperature” parameterizes how often the algorithm could take
non-improving steps,
... thus allowing for break-out attempts.



Simulated annealing and genetic algorithms

The algorithm
1 Pick (randomly) a candidate solution θ̃ from the current
neighborhood of θ (t) , according to a proposal distribution g (t) .
(Often the discrete uniform over N (θ (t) ).)
2 If proposal leads to an improvement, update θ (t+1) = θ̃
3 If proposal does not lead to an improvement,
randomly choose whether to update using θ̃ or not;
the current temperature controls the probability of updating!
So let θ^(t+1) = θ̃ with probability equal to

    min{ 1, exp( (f(θ^(t)) − f(θ̃)) / τ_j ) }.

Otherwise, let θ^(t+1) = θ^(t) .


4 Repeat steps 1 – 3 a total of mj times.
5 Increment j. Update temperature and stage length. Go to step 1.
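A compact sketch of these steps for a 0/1 coded problem, written (as on this slide) as a minimization, with a uniform 1-change proposal and the geometric cooling / constant stage length discussed on the next slide; all names and defaults are illustrative.

import math, random

def simulated_annealing(cost, theta0, tau0=10.0, a=0.9, stage_len=60,
                        n_stages=15, seed=2):
    """Minimize `cost` over 0/1 vectors by simulated annealing."""
    rng = random.Random(seed)
    theta, val = list(theta0), cost(theta0)
    best, best_val = list(theta), val
    tau = tau0
    for _ in range(n_stages):
        for _ in range(stage_len):
            prop = list(theta)
            i = rng.randrange(len(prop))
            prop[i] = 1 - prop[i]                     # uniform 1-change proposal
            prop_val = cost(prop)
            # accept with probability min{1, exp((val - prop_val)/tau)}
            if prop_val <= val or rng.random() < math.exp((val - prop_val) / tau):
                theta, val = prop, prop_val
                if val < best_val:
                    best, best_val = list(theta), val
        tau *= a                                      # cooling between stages
    return best, best_val

# e.g. simulated_annealing(lambda th: aic_subset(y, X, [j for j, g in enumerate(th) if g]),
#                          [0] * 27)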
Simulated annealing and genetic algorithms

Some details
The neighbourhoods:
should be small and easily computed.
any two candidates must be connectable via a sequence of
neighborhoods.

Let τj = α(τj−1 ) and mj = β(mj−1 ) (cooling schedule):


α(·) should slowly decrease temperatures to zero.
β(·) should increase stage lengths as temperatures drop.
Popular choices:
Set m_j = 1 ∀j and reduce temperature slowly, α(τ_{j−1}) = τ_{j−1}/(1 + a·τ_{j−1})
for some a ≪ 1. Or
Set α(τ_{j−1}) = a·τ_{j−1} for some a < 1 (usually a ≥ 0.9). Increase stage
lengths as you go, say β(m_{j−1}) = b·m_{j−1} for b > 1.
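Written out as small functions (the constants a and b are tuning parameters, not values from the slides):

def cooling_slow(tau_prev, a=0.001):
    """alpha(tau) = tau / (1 + a*tau) with a << 1; pair it with m_j = 1 for all j."""
    return tau_prev / (1.0 + a * tau_prev)

def cooling_geometric(tau_prev, m_prev, a=0.9, b=1.5):
    """alpha(tau) = a*tau with a < 1 (usually a >= 0.9); beta(m) = b*m with b > 1."""
    return a * tau_prev, int(round(b * m_prev))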
Simulated annealing and genetic algorithms

More practical details

Pick τ0 > 0 so that exp((f(θ_i) − f(θ_j))/τ0 ) is close to one for all θ_i
and θ_j in Ω.
This is problem-dependent and may take some effort, but
allows any point to be visited (with a reasonable chance) early on.
Larger temperature reductions typically require longer stages.
Long stages at high temperatures are usually useless; decrease
temperature rapidly at first but slowly thereafter.
Stop when there is no improvement for a couple of stages, you run out of
patience or budget, or the current solution is acceptable.
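One possible way to operationalize the choice of τ0 (a heuristic sketch, not prescribed by the slides): sample objective values at random candidates, look at the largest difference encountered, and set τ0 so that even such a difference would be accepted with a probability p0 close to one.

import math, random

def calibrate_tau0(f, random_candidate, n_draws=200, p0=0.95, seed=3):
    """Choose tau0 so that exp(-max observed f difference / tau0) is about p0."""
    rng = random.Random(seed)
    values = [f(random_candidate(rng)) for _ in range(n_draws)]
    spread = max(values) - min(values)
    return spread / (-math.log(p0))

# random_candidate(rng) should return a random point of Omega,
# e.g. lambda rng: [rng.randint(0, 1) for _ in range(27)].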



Simulated annealing and genetic algorithms

Example: Regression model selection

[Figure] Discrete uniform proposal, 1-neighbourhoods, 15-stage cooling schedule
with stage lengths of 60, 120, and 220, α(τ_{j−1}) = 0.9·τ_{j−1}, τ0 is 1 (bottom
solid) or 10 (top solid). (Dashed: baseline temperature at each step.)
Simulated annealing and genetic algorithms

Genetic algorithms

Rather than physics, we now mimic natural evolutionary processes to


improve candidate solutions.

The analogy is taken seriously:


1 One works with a so-called population of candidate solutions;
the best individual (according to a so-called fitness function) is the
current approximation to the solution.
2 Each iteration leads to a change of generations (population renewal).
3 Characteristics of individuals are inherited and modified from
individuals of the previous generation, usually in a way that involves
randomness.
4 The key aspect is that individuals with high fitness are more likely to
pass on their characteristics to the next generation.



Simulated annealing and genetic algorithms

The general structure

Genetic algorithms consist of


1 A genetic representation of the solution, typically an array of 0/1
values (some encoding may be necessary); the exact configuration of
the genetic code of an individual is called a chromosome. (Its elements
are the genes, and the possible values of the genes are called alleles.)
2 The fitness function to evaluate individuals (this is usually the target
function f of interest)
3 Initial population (should be large enough; either chosen at random or
clustered in subsets of the solution space)
4 The population renewal algorithm (the most important item here); to
keep the analogy, it consists of applying a set of genetic operators
(i.e. rules that change the genetic representation of individuals) to
selected individuals of the current population.
5 A stopping rule (well, duh...)



Simulated annealing and genetic algorithms

Model selection again

In this example,
1 The genetic representation of θ is simply inclusion/exclusion of the
respective regressor (obvious codes 0/1).
2 The AIC may play the role of the fitness function.
3 The initial population consists of several candidate models, say picked
at random.
4 To renew the population, we look at the best-performing models from
the current population and change them suitably to yield a new
generation (“breeding”).
5 We stop if no improvement has been observed for, say, the last 15
generations, or once a maximum number of generations has been reached (etc.).



Simulated annealing and genetic algorithms

Parent selection and genetic operators

We need current individuals to breed the next generation, so


Select a pair of parent solutions at random, with the probability of being
picked proportional to individual fitness or to relative rank in the current
population (sometimes more than two parents are better);
Create new individual (offspring) based on characteristics of parents;
repeat until you have enough new individuals.

Two basic flavours of genetic operators generating new individuals:


Cross-over: select random position and split parent chromosomes
correspondingly; then recombine pieces.
Mutation: randomly change one or several elements of the
chromosome(s)

There are many variations (rule of thumb: small mutation probabilities).
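A minimal sketch of one change of generations with rank-based parent selection, single-point crossover, and a small mutation probability (all settings illustrative; `fitness` is to be maximized, e.g. minus the AIC of the coded model):

import random

def next_generation(population, fitness, p_mut=0.01, rng=random):
    """Breed a new population of 0/1 chromosomes of the same size."""
    ranked = sorted(population, key=fitness)            # worst first, best last
    weights = list(range(1, len(ranked) + 1))           # rank-based selection weights
    children = []
    while len(children) < len(population):
        mom, dad = rng.choices(ranked, weights=weights, k=2)
        cut = rng.randrange(1, len(mom))                # single-point crossover
        child = mom[:cut] + dad[cut:]
        child = [1 - g if rng.random() < p_mut else g   # rare random mutations
                 for g in child]
        children.append(child)
    return children

# Iterating this for, say, 100 generations and tracking the fittest individual
# seen so far gives the current approximation to the optimum.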


Simulated annealing and genetic algorithms

Model selection again

[Figure] Population size P = 20, 100 generations breeding with fitness-rank based
selection probability, simple crossover, and 1% mutation.
Tabu algorithms

Outline

1 Heuristic searches

2 Simulated annealing and genetic algorithms

3 Tabu algorithms

4 Up next



Tabu algorithms

Potentially escaping entrapment

Let’s now consider downhill moves explicitly. We allow them


if we can’t find uphill ones, but with
care to avoid reversal in the next (or 2nd next) step,
since this may eliminate the potential long-term benefits of the downhill move.

SANN or GA deal with this to some extent through built-in randomness, but

local searches don’t. (And even SANN/GA may use some help.)

So, after a downhill step, let’s (temporarily) forbid (or make tabu/taboo)
moves that have the potential of taking us back. We need to
characterize moves to this end, and
keep an eye on the recent history (say H (t) ) of moves.



Tabu algorithms

Move attributes

We use so-called attributes to characterize moves:


in each step, we forbid attributes that would lead to reversing moves.
... but only do so for a couple of steps.

Several attributes may be relevant for a given problem


call the a-th attribute A_a ;
a move from θ (t) to θ (t+1) can be characterized by different
attributes, depending on t.

The tabu list is a list of attributes, not moves, so a single tabu attribute
may forbid entire classes of moves.



Tabu algorithms

Examples of attributes
Generic attribute, and the corresponding model selection example
(2-change neighbourhood):

1. A change in the value of θ_i^(t); the attribute may be the value from which
   the move began, or the value to which it arrived.
   A1: whether the i-th predictor is added to (or deleted from) the model.

2. A swap in the values of θ_i^(t) and θ_j^(t) when θ_i^(t) ≠ θ_j^(t).
   A2: whether the absent variable is exchanged for the included variable.

3. A change in the value of f resulting from the step, f(θ^(t+1)) − f(θ^(t)).
   A3: the reduction in AIC achieved by the move.

4. The value of some other strategically chosen function, g, i.e., g(θ^(t+1)).
   A4: the number of predictors in the new model.

5. A change in the value of g resulting from the step, g(θ^(t+1)) − g(θ^(t)).
   A5: the change in a different variable selection criterion such as adjusted R².



Tabu algorithms

Relevance at a given iteration

Recall, the presence of attributes on the tabu list is only temporary.


The recency R of an attribute A_a is the number of steps that have
passed since a move last exhibited A_a .
Let R(A_a , H^(t)) = 0 if the move yielding θ^(t) exhibits attribute A_a ,
R(A_a , H^(t)) = 1 if the last appearance of A_a was at time t − 1, etc.

History matters:
Current neighbourhoods may depend on search history; denote this
N (θ (t) , H (t) ).
The choice of θ (t+1) in N (θ (t) , H (t) ) may depend on the (recent)
search history too.
This effectively leads to using an augmented target function, f_{H^(t)} .
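The bookkeeping can be as simple as a dictionary mapping each attribute to the iteration at which it last appeared (an illustrative sketch, not from the slides):

def recency(attribute, history, t):
    """R(A_a, H^(t)): steps since the attribute last appeared (infinite if never)."""
    return t - history[attribute] if attribute in history else float("inf")

def is_tabu(attribute, history, t, tenure):
    """An attribute stays tabu for `tenure` iterations after its last appearance."""
    return recency(attribute, history, t) < tenure

# After a move at iteration t whose reversal exhibits attribute a:  history[a] = t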



Tabu algorithms

Workflow

Say we make a move exhibiting Aa


Then we put its complement on the tabu list (want to prevent
undoing this move),
... and leave it there for τ iterations. (This is a tuning parameter
controlling the way the algorithm works.)
The neighbourhood where we look for updates becomes

    N(θ^(t) , H^(t)) = { θ : θ ∈ N(θ^(t)) and no attribute of θ is tabu at time t }.

When R(A_a , H^(t)) first equals τ , we remove A_a from the tabu list.




Typical tabu validity in practice: 7 ≤ τ ≤ 20, or 0.5·√p ≤ τ ≤ 2·√p. If it’s
large enough, it will prevent cycles.



Tabu algorithms

Refinements: Aspiration

We may want to override the tabu when it is wise.

Most common:
permit a tabu move ...
if it provides a higher value of the objective function than has been
found in any iteration so far.

Aspiration by influence:
A very recent large increase in the target may imply that the algorithm is
exploring a new (promising) region of Ω.
So cancel the tabu against reversing the last downward move: further
exploration of the new region is meaningful.



Tabu algorithms

Refinements: Diversification

Bad candidates that appear often may correspond to

entrapment.
So construct f_{H^(t)} to penalize such (bad) moves/attributes.

This of course requires keeping track of the residence frequency F(A_a , H^(t)).




One way of imposing this:

    f_{H^(t)}(θ^(t+1)) = f(θ^(t+1))                       if f(θ^(t+1)) ≥ f(θ^(t)),
                         f(θ^(t+1)) − c·F(A_a , H^(t))    if f(θ^(t+1)) < f(θ^(t)).




Tabu algorithms

Refinements: Intensification

Construct f_{H^(t)} to extend the search near good solutions with high residency.
Reward retention of attributes that correlate with high-quality,
high-frequency solutions;
penalize removal of such attributes.

Easier said than done.



Tabu algorithms

To sum up
Identify a list of problem-specific attributes and initialize the tabu list. Then
1 Construct fH (t) (θ) based on f (θ), and perhaps
diversification/intensification penalties or incentives.
2 Identify neighbors of θ (t) , namely the members of N (θ (t) ).
3 Rank the neighbors in decreasing order of improvement, as evaluated
by fH (t) .
4 Select the highest ranking neighbor.
5 Is this neighbor currently on the tabu list? If not, go to step 8.
6 Does this neighbor pass an aspiration criterion? If so, go to step 8.
7 If all neighbors of θ (t) have been considered and none have been
adopted as θ (t+1) , then stop. Otherwise, select the next best neighbor
and go to step 5.
8 Adopt this solution as θ (t+1) and update the tabu list.
9 Has a stopping criterion been met? If so, stop. Otherwise, increment t
and go to step 1.
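Putting these steps together for the model selection problem, with the index of the flipped element as the only attribute, a fixed tenure τ, and the best-so-far aspiration criterion: this is a hedged sketch without diversification or intensification, not the exact implementation behind the results on the next slide.

def tabu_search(f, theta0, tenure=7, max_iter=70):
    """Maximize f over 0/1 vectors; the attribute of a move is the flipped index."""
    def flipped(theta, i):
        return theta[:i] + [1 - theta[i]] + theta[i + 1:]

    theta = list(theta0)
    best, best_val = list(theta), f(theta)
    tabu_until = {}                                    # bit index -> last iteration it is tabu
    for t in range(max_iter):
        scored = sorted(((f(flipped(theta, i)), i) for i in range(len(theta))),
                        reverse=True)                  # rank neighbours, best first
        for val, i in scored:
            if tabu_until.get(i, -1) >= t and val <= best_val:
                continue                               # tabu and no aspiration: try next
            theta = flipped(theta, i)
            tabu_until[i] = t + tenure                 # forbid re-flipping bit i for a while
            if val > best_val:
                best, best_val = list(theta), val
            break
        else:
            break                                      # every neighbour tabu: stop
    return best, best_val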
Tabu algorithms

Results of tabu search for model selection example

[Figure] Best solution was hit twice: iterations 44 and 66.


Up next

Outline

1 Heuristic searches

2 Simulated annealing and genetic algorithms

3 Tabu algorithms

4 Up next



Up next

Coming up

The expectation-maximization algorithm

