
Main Content

Art, Graphics, and Procedural Generation

Designing Non-repeating Patterns with Prime Numbers
Low-Complexity Art
Random Psychedelic Art
Seam-carving for Content-Aware Image Scaling
The Cellular Automaton Method for Procedural Cave Generation
Bezier Curves and Picasso
Making Hybrid Images

Signal Processing
The Fast Fourier Transform Algorithm, and Denoising a Sound Clip
The Two-Dimensional Fourier Transform and Digital Watermarking
Making Hybrid Images
The Welch-Berlekamp Algorithm for Correcting Errors in Data

Machine Learning and Data Mining

K-Nearest-Neighbors and Handwritten Digit Classification
The Perceptron, and All the Things it Can't Perceive
Decision Trees and Political Party Classification
Neural Networks and the Backpropagation Algorithm
K-Means Clustering and Birth Rates
Linear Regression
Eigenfaces, for Facial Recognition (via Principal Component Analysis)
Bandit Learning: the UCB1 Algorithm
Bandit Learning: the EXP3 Algorithm
Bandits and Stocks
Weak Learning, Boosting, and the AdaBoost Algorithm
Fairness in machine learning (introduction, statistical parity)
The Johnson-Lindenstrauss Transform
Singular Value Decomposition (motivation, algorithms)
Support Vector Machines (inner products, primal problem, dual problem)

Graphs and Network Science

Trees and Tree Traversal
Breadth-First and Depth-First Search
The Erdős-Rényi Random Graph
The Giant Component and Explosive Percolation
Zero-One Laws for Random Graphs
Community Detection in Graphs, a Casual Tour
Google's PageRank Algorithm:
A First Attempt,
The Final Product,
Why It Doesn't Work Anymore

Combinatorial Optimization
When Greedy is Good Enough: Submodularity and the 1 - 1/e Approximation
When Greedy Algorithms are Perfect: the Matroid
Linear Programming and the Most Affordable Healthy Diet Part 1
Linear Programming and the Simplex Algorithm
The Many Faces of Set Cover

Algorithmic Game Theory

Stable Marriages and Designing Markets
Serial Dictatorships and Housing Allocation

Quantum Computing
A Motivation for Quantum Computing
The Quantum Bit
Multiple Qubits and the Quantum Circuit
Concrete Examples of Quantum Gates

Elliptic Curves Introduction
Elliptic Curves as Elementary Equations
Elliptic Curves as Algebraic Structures
Elliptic Curves as Python Objects (over the rational numbers)
Programming with Finite Fields
Connecting Elliptic Curves with Finite Fields
Elliptic Curve Diffie-Hellman
Sending and Authenticating Messages with Elliptic Curves (Shamir-Massey-Omura and
The Mathematics of Secret Sharing
Zero Knowledge Proofs (primer, zero knowledge proofs for NP, definitions and theory)

Streaming and Sublinear Algorithms

Load Balancing and the Power of Hashing
Program Gallery entries

Natural Language
Metrics on Words
Word Segmentation, or Makingsenseofthis
Cryptanalysis with N-Grams

Computational Category Theory

A Sample of Standard ML (and a Preview of Monoids)
Categories, What's the Point?
Introducing Categories
Categories as Types
Properties of Morphisms
Universal Properties
The Universal Properties of Map, Fold, and Filter

Computational Topology
Computing Homology
Fixing Bugs in Computing Homology
The Čech complex and the Vietoris-Rips complex

The Wild World of Cellular Automata
Turing Machines and Conway's Dreams
Conway's Game of Life in Conway's Game of Life
Optimally Stacking the Deck: Kicsi Poker
Optimally Stacking the Deck: Texas Hold 'Em
Want to make a great puzzle game? Get inspired by theoretical computer science.

The Reasonable Effectiveness of the Multiplicative Weights Update Algorithm
Hunting Serial Killers
Learning to Love Complex Numbers
Holidays and Homicide
Numerical Integration
Busy Beavers, and the Quest for Big Numbers
Row Reduction over a Field
Complete Sequences and Magic Tricks
Well Orderings and Search

Teaching Mathematics: Graph Theory
Learning Programming: Finger-Painting and Killing Zombies
How to Take a Calculus Test
Why There's no Hitchhiker's Guide to Mathematics
Deconstructing the Common Core Mathematical Standard
Guest Posts
Math and Baking: Torus-Knotted Baklava
With High Probability: What's up with Graph Laplacians?


Prime Design
Posted on June 13, 2011 by j2kun

The goal of this post is to use prime numbers to make interesting and asymmetric graphics,
and to do so in the context of the web design language CSS.

Number Patterns
For the longest time numbers have fascinated mathematicians and laymen alike. Patterns in
numbers are decidedly simple to recognize, and the proofs of these patterns range from
trivially elegant to Fields Medal worthy. Here's an example of a simple one that computer
science geeks will love:

Theorem: $1 + 2 + 2^2 + \dots + 2^n = 2^{n+1} - 1$ for all natural numbers $n$.

If you're a mathematician, you might be tempted to use induction, but if you're a computer
scientist, you might think of using neat representations for powers of 2.

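Proof: Consider the base 2 representation of $2^i$, which is a 1 in the $i$th place and zeros
everywhere else. Then we may write the summation as

$2^n + 2^{n-1} + \dots + 2^0 = 10 \dots 0_2 + 01 \dots 0_2 + \dots + 00 \dots 1_2 = 11 \dots 1_2$

which is a string of $n+1$ ones. And clearly adding one to this sum gives the next largest
power of 2, namely $2^{n+1}$.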

This proof extends quite naturally to all powers, giving the following identity. Try to
prove it yourself using base $k$ number representations!

$1 + k + k^2 + \dots + k^n = \frac{k^{n+1} - 1}{k - 1}$
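For readers who would rather check the identity $1 + k + \dots + k^n = (k^{n+1} - 1)/(k - 1)$ numerically before proving it, here is a minimal Python sketch. The helper names `to_base` and `geometric_sum` are made up for this illustration. It checks, for a few small cases, that the sum is a run of ones in base $k$ and that multiplying by $k - 1$ and adding one gives the next power of $k$.

```python
def to_base(value, base):
    """Digits of a non-negative integer in the given base, most significant first."""
    if value == 0:
        return [0]
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append(remainder)
    return digits[::-1]


def geometric_sum(k, n):
    """Compute 1 + k + k^2 + ... + k^n by direct summation."""
    return sum(k ** i for i in range(n + 1))


for k in (2, 3, 10):
    for n in range(6):
        total = geometric_sum(k, n)
        # In base k the sum is written as n + 1 ones.
        assert to_base(total, k) == [1] * (n + 1)
        # (k - 1) * 11...1 is a run of digits k - 1, so adding one rolls over to k^(n+1).
        assert (k - 1) * total + 1 == k ** (n + 1)

print(to_base(geometric_sum(2, 4), 2))
```

A finite check like this proves nothing, of course, but it does show why the base $k$ picture works: the carry ripples through every digit.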

The only other straightforward proof of this fact would require induction on $n$, and as a
reader points out in the comments (and I repeat in the proof gallery), it's not so bad. But it
was refreshing to discover this little piece of art on my own (and it dispelled my boredom
during a number theory class). Number theory is full of such treasures.

Though there are many exciting ways in which number patterns overlap, there seems to be
one grand, overarching undiscovered territory that drives research and popular culture's
fascination with numbers: the primes.

The first few prime numbers are $2, 3, 5, 7, 11, 13, 17, 19, 23, 29, \dots$. Many elementary
attempts to characterize the prime numbers admit results implying intractability. Here are a
few:

There are infinitely many primes.
For any natural number $n$, there exist two primes $p_1 < p_2$ with no primes between
them and $p_2 - p_1 \geq n$. (there are arbitrarily large gaps between primes)
It is conjectured that for any natural number $n$, there exist two primes $p_1 < p_2$ larger
than $n$ with $p_2 - p_1 = 2$. (no matter how far out you go, there are still primes that
are as close together as they can possibly be)

Certainly then, these mysterious primes must be impossible to precisely characterize with
some sort of formula. Indeed, it is simple to prove that there exists no non-constant
polynomial formula with rational coefficients that always yields primes*, so the problem of
generating primes via some formula is very hard. Even then, much interest has gone into
finding polynomials which generate many primes (the first significant such example was
$n^2 + n + 41$, due to Euler, which yields primes for $0 \leq n \leq 39$), and this was one
of the driving forces behind algebraic number theory, the study of special number rings
called integral domains.
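Euler's polynomial $n^2 + n + 41$ is easy to test by machine. The sketch below uses a naive trial-division primality check (the function names are made up for this example) and counts how long the run of primes lasts starting from $n = 0$.

```python
def is_prime(n):
    """Trial division; fine for the small values used here."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def euler(n):
    """Euler's prime-generating polynomial n^2 + n + 41."""
    return n * n + n + 41


# Count how many consecutive inputs, starting at 0, give a prime.
prime_run = 0
while is_prime(euler(prime_run)):
    prime_run += 1

print(prime_run, euler(prime_run))
```

The run stops at $n = 40$, where the value is $40^2 + 40 + 41 = 41^2$, a reminder that no such polynomial can produce primes forever.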

*Aside: considering the amazing way that the closed formula for the Fibonacci numbers
uses irrational numbers to arrive at integers, I cannot immediately conclude whether the
same holds for polynomials with arbitrary coefficients, or elementary/smooth functions in
general. This question could be closely related to the Riemann hypothesis, and I'd expect a
proof either way to be difficult. If any readers are more knowledgeable about this, please
feel free to drop a comment.

However, the work of many great mathematicians over thousands of years is certainly not in
vain. Despite their seeming randomness, the pattern in primes lies in their distribution, not
in their values.

Theorem: Let $\pi(n)$ be the number of primes less than or equal to $n$ (called the prime
counting function). Then

$\lim_{n \to \infty} \frac{\pi(n)}{n / \log(n)} = 1$

Intuitively, this means that $\pi(n)$ is about $n / \log(n)$ for large $n$, or more specifically that if
one picks a random number near $n$, the chance of it being prime is about $1 / \log(n)$. Much
of the work on prime numbers (including equivalent statements to the Riemann hypothesis)
deals with these prime counting functions and their growth rates. But stepping back, this is a
fascinatingly counterintuitive result: we can say with confidence how many primes there are
in any given range, but determining what they are is exponentially harder!
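To get a feel for how quickly the prime counting function settles toward $n / \log(n)$, here is a small Python sketch that counts primes with a sieve of Eratosthenes and prints the ratio of the two quantities. The function names are illustrative, and the limit of $10^6$ is just a convenient choice that runs quickly.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit (assumes limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross off every multiple of i starting at i * i.
            multiples = range(i * i, limit + 1, i)
            sieve[i * i :: i] = bytearray(len(multiples))
    return [i for i in range(limit + 1) if sieve[i]]


def prime_count(n):
    """The prime counting function pi(n)."""
    return len(primes_up_to(n))


for exponent in range(2, 7):
    n = 10 ** exponent
    count = prime_count(n)
    estimate = n / math.log(n)
    print(n, count, round(count / estimate, 3))
```

The ratio drifts toward 1 only slowly, which is consistent with the theorem being a statement about the limit rather than about any particular $n$.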

And what's more, many interesting features of the prime numbers have been just stumbled
upon by accident. Unsurprisingly, these results are among the most confounding. Take, for
instance, the following construction. Draw a square spiral starting with 1 in the center, and
going counter-clockwise as below:

If you circle all the prime numbers you'll notice many of them spectacularly lie on common
diagonals! If you continue this process for a long time, you'll see that the primes continue to
lie on diagonals, producing a puzzling pattern of dashed cross-hatches. This Ulam
Spiral was named after its discoverer, Stanislaw Ulam, and the reasons for its appearance
are still unknown (though conjectured).
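The spiral is easy to reproduce in text form. The following Python sketch is one way to do it, under a few assumptions of my own: the grid has odd side length, the first step from the center goes to the right, and primes are marked with `#` while everything else is a dot. The function names are invented for this example.

```python
def is_prime(n):
    """Trial division; fast enough for a small grid."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def ulam_spiral(size):
    """Return the rows of a size x size Ulam spiral (size should be odd)."""
    grid = [[" "] * size for _ in range(size)]
    x = y = size // 2          # start at the center
    dx, dy = 1, 0              # first step goes right
    n = 1
    step_length = 1
    while True:
        for _ in range(2):     # two legs share each step length
            for _ in range(step_length):
                if not (0 <= x < size and 0 <= y < size):
                    return ["".join(row) for row in grid]
                grid[y][x] = "#" if is_prime(n) else "."
                n += 1
                x, y = x + dx, y + dy
            # Turn counter-clockwise; rows grow downward, so "up" is -y.
            dx, dy = dy, -dx
        step_length += 1


for row in ulam_spiral(41):
    print(row)
```

Even at 41 by 41 characters the diagonal streaks are already visible in the output.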

All of this wonderful mathematics aside, our interest in the primes is in their apparent lack
of pattern.
Primes in Design
One very simple but useful property of primes concerns least common multiples. The
product of two numbers is well known to equal the product of their least common multiple
and greatest common divisor. In symbols:

$\textup{lcm}(p, q) \cdot \gcd(p, q) = pq$

We are particularly interested in the case when $p$ and $q$ are distinct primes, because then
their greatest (and only) common divisor is 1, making this equation

$\textup{lcm}(p, q) = pq$
The least common multiple manifests itself concretely in patterns. Using the numbers six
and eight, draw two rows of 0s and 1s with a 1 every sixth character in the first row and
every eighth character in the second. You'll quickly notice that the ones line up every
twenty-fourth character, the lcm of six and eight:

000001000001000001000001000001000001000001000001
000000010000000100000001000000010000000100000001


If we instead use two numbers which are coprime (their greatest common divisor is 1, but
they are not necessarily prime; say, 9 and 16), then the 1s in their two rows would line up
every $9 \cdot 16 = 144$ characters. Now for pretty numbers like six and eight, there still
appears to be a mirror symmetry in the distribution of 1s and 0s above. However, if the two
numbers are prime, this symmetry is much harder to see. Try 5 and 7:

0000100001000010000100001000010000100001000010000100001000010000100001
0000001000000100000010000001000000100000010000001000000100000010000001


There is much less obvious symmetry, and with larger primes it becomes even harder to tell
that the choice of match-up isn't random.
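Rows of 0s and 1s like the ones described in this section are simple to generate for any pair of periods. Here is a short Python sketch; `marker_row`, `lcm`, and `alignments` are names chosen for this example, and the least common multiple is computed from the gcd using the identity $\textup{lcm}(p, q) \cdot \gcd(p, q) = pq$.

```python
from math import gcd


def lcm(p, q):
    """Least common multiple via lcm(p, q) * gcd(p, q) = p * q."""
    return p * q // gcd(p, q)


def marker_row(period, length):
    """A row of 0s with a 1 at every period-th character (counting from 1)."""
    return "".join("1" if i % period == 0 else "0" for i in range(1, length + 1))


def alignments(p, q, length):
    """Positions (counting from 1) where both rows show a 1."""
    top, bottom = marker_row(p, length), marker_row(q, length)
    return [i + 1 for i in range(length) if top[i] == bottom[i] == "1"]


for p, q in ((6, 8), (9, 16), (5, 7)):
    span = 2 * lcm(p, q)
    print(p, q, "lcm:", lcm(p, q), "aligned at:", alignments(p, q, span))
```

Swapping in larger primes pushes the first alignment further out, which is the whole trick behind the designs that follow.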

This trivial observation allows us to create marvelous, seemingly non-repeating patterns,
provided we use large enough primes. However, patterns in strings of 1s and 0s are not
quite visually appealing enough, so we will resort to overlaying multiple backgrounds in
CSS. Consider the following three images, which have widths 23, 41, and 61 pixels,
respectively.


Each has a prime width, semi-transparent color, and a portion of the image is deleted to
achieve stripes when the image is x-repeated. Applying our reasoning from the 1s and 0s,
this pattern will only repeat once every $23 \cdot 41 \cdot 61 = 57523$ pixels!

As designers, this gives us a naturally non-repeating pattern of stripes, and we can control
the frequency of repetition in our choice of numbers.
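The same reasoning extends to any number of layered tiles: the combined pattern repeats after the least common multiple of the tile widths. As a rough sketch (the function name `pattern_period` and the non-prime comparison widths are my own choices for illustration), we can compare the prime widths 23, 41, and 61 against similar-sized "pretty" widths and against a 1440-pixel-wide display.

```python
from functools import reduce
from math import gcd


def pattern_period(widths):
    """Pixels before a stack of x-repeated tiles lines up again (lcm of the widths)."""
    return reduce(lambda a, b: a * b // gcd(a, b), widths, 1)


SCREEN_WIDTH = 1440  # display width assumed for the comparison

for widths in ([23, 41, 61], [24, 40, 60]):
    period = pattern_period(widths)
    print(widths, "repeat every", period, "px;",
          "repeats on screen" if period <= SCREEN_WIDTH else "no repeat on screen")
```

Nudging each width to a nearby prime costs nothing visually but stretches the period from 120 pixels to 57523.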

Here is the CSS code to achieve the result:

html {
   background-image: url(23.png), url(41.png), url(61.png);
}
I'm using Google Chrome, so this is all the CSS that's needed. With other browsers you
may need a few additional lines like height: 100% or margin: 0, but I'm not going to
worry too much about that because any browser which supports multiple background
images should get the rest right. Here's the result of applying the CSS to a blank HTML
page:

Now I'm no graphic designer by any stretch of my imagination. So
as a warning to the reader, using these three particular colors may result in an eyesore more
devastating than an '80s preteen bedroom, but it illustrates the point of the primes, that on
my mere 1440×900 display, the pattern never repeats itself. So brace yourself, and click the
thumbnail to see the full image.

Now, to try something at least a little more visually appealing, we do the same process with
circles of various sizes on square canvases with prime length sides ranging from 157×157
pixels to 419×419. Further, I included a little bash script to generate a CSS file with
randomized background image coordinates. Here is the CSS file I settled on:

html {
   background-image: url(443.png), url(419.png), url(359.png),
      url(347.png), url(157.png), url(193.png), url(257.png),
      url(283.png);
   background-position: 29224px 10426px, 25224px 24938px, 8631px 32461px,
      22271px 15929px, 13201px 7320px, 30772px 13876px, 11482px 15854px,
      31716px 21968px;
}

With the associated bash script generating it:

#! /bin/bash

# Emit a CSS rule that layers the eight tiles at random offsets.
echo "html {"
echo -n "   background-image: url(443.png), url(419.png), "
echo -n "url(359.png), url(347.png), url(157.png), url(193.png), "
echo "url(257.png), url(283.png);"
echo -n "   background-position: "
# One "x y" pair per tile: seven followed by a comma, the last by a semicolon.
for i in {1..7}
do
   echo -n "${RANDOM}px ${RANDOM}px, "
done
echo "${RANDOM}px ${RANDOM}px;"
echo "}"
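For readers who would rather not use bash, the same idea can be sketched in Python. This is a rough equivalent rather than the script I actually used: the function name `make_css` and its parameters are made up for the example, and the offset range of 0 to 32767 is chosen to match what bash's `$RANDOM` produces.

```python
import random

# Tile file names, named after their prime side lengths.
TILES = ["443.png", "419.png", "359.png", "347.png",
         "157.png", "193.png", "257.png", "283.png"]


def make_css(tiles, rng, max_offset=32767):
    """Build a CSS rule layering the tiles, each at a random x/y offset in pixels."""
    images = ", ".join(f"url({name})" for name in tiles)
    positions = ", ".join(
        f"{rng.randint(0, max_offset)}px {rng.randint(0, max_offset)}px"
        for _ in tiles
    )
    return (
        "html {\n"
        f"   background-image: {images};\n"
        f"   background-position: {positions};\n"
        "}\n"
    )


# Seeding the generator makes a layout reproducible once you find one you like.
print(make_css(TILES, random.Random(2011)))
```

Passing in the random generator explicitly is a small design choice: it lets you regenerate a pleasing arrangement from its seed instead of saving the CSS by hand.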

Prime Circles

And here is the result. Again, this is not a serious attempt at a work of art. But while you
might not call it visually beautiful, nobody can deny that its simplicity and its elegant
mathematical undercurrent carry their own aesthetic beauty. This method, sometimes
called the cicada principle, has recently attracted a following, and the Design Festival blog
has a gallery of images, and a few that stood out. These submissions are the true works of
art, though upon closer inspection many of them seem to use such large image sizes that
there is only one tile on a small display, which means the interesting part (the patterns) can't
be seen without a ridiculously large screen or contrived HTML page widths.

So there you have it. Prime numbers contribute to interesting, unique designs that in their
simplest form require very little planning. Designs become organic; they grow from just a
few prime seedlings to a lush, screen-filling ecosystem. Of course, for those graphical
savants out there, the possibilities are endless. But for the rest of us, we can use these
principles to quickly build large-scale, visually appealing designs, leaving math-phobic
designers in the dust.

It would make me extremely happy if any readers who play around and come up with a cool
design submit them. Just send a link to a place where your design is posted, and if I get
enough submissions I can create a gallery of my own.

Until next time!


This entry was posted in Design, Number Theory and tagged bash, css, graphics, html,
mathematics, patterns, primes, programming.

12 thoughts on Prime Design


2. Nik Coughlin

April 22, 2012 at 7:06 pm

I know this post is quite old now, but I just came across it. If you haven't seen it already,
there is an article (and gallery) regarding this principle that you might like to look at:



o j2kun

April 22, 2012 at 8:27 pm

Yup! That was the inspiration for my post, because unfortunately back when I
started this blog I didn't have any original ideas. Interestingly, I find applications
for this concept all over the place, not just in design. For instance, I went to a salsa
club recently, and I noticed that the stage lights moved in patterns with very tiny
periods, the whole pattern repeating twice a minute at least. Why not make it more
random, and give each light's erratic path a prime number period? Then the lights
wouldn't repeat the same pattern for the entire night at least!


3. Jeremy

April 22, 2012 at 11:03 pm

I don't think the induction proof is so bad: first you could prove it

[Note, your formula should have denominator (k-1), not (n-1)]

Only a single variable of induction is required. In fact if you want to use a little abstract
algebra, then things get even easier (and more general).

And this will obviously expand out exactly to , exactly the sum you
want. And this will hold for an arbitrary ring.


o j2kun

April 23, 2012 at 8:59 am

I believe it could work (and your TeX is a bit wonky), but it certainly is messy. I
revisited this problem in the proof gallery to give the shortest possible proof. And I
also used polynomials in an alternate proof. Indeed, it holds for
arbitrary commutative rings with unity, but I was relaxed enough to leave it at
polynomials over a ring.



April 23, 2012 at 12:41 pm

How do I edit the post? I'm not that experienced with TeX, and I made
several mistakes.

But indeed the first proof does not require induction over two variables, and
the second holds in any ring where the sum is defined (commutativity not
required, and if there are no zero divisors, then the division will give a
unique answer rather than an equivalence class).



April 23, 2012 at 6:16 pm

I fixed the LaTeX issues, and worked it out myself, and I agree that you can
just prove it by induction on one variable, and that the proof is not so bad. I
guess I was just proud of finding my own little proof.

But I think you do have to worry about commutativity. Take for instance the
same idea and extend it to two variables: xy = yx as polynomials, but
plugging in random values for x and y doesn't make this still hold.



April 23, 2012 at 6:51 pm

Haha, thanks for fixing that. There's still a duplicate line, and some minor
mistakes, if you care. I definitely agree that your proof is cool, I suppose I
just commented because I like induction. Certainly your proof offers more

The reason I don't think you have to worry about commutativity is that the
only values you deal with are 1 and powers of x, which automatically
commute. Once you start adding multi-variable polynomials, then yeah, I
agree you have to be careful about how you manipulate them.



7. HarryPotter5777

November 4, 2015 at 2:02 pm

For polynomials with arbitrary coefficients, I think the proof isn't actually too bad: the
only polynomials which produce integer outputs for integer inputs are those with rational
coefficients. This can be seen by taking the series $f(0), f(1), f(2), \dots$ and taking
successive differences between terms. After $n+1$ differences for an $n^\textrm{th}$ degree
polynomial, the sequence will go to 0; then, writing $d_i$ for the first term of the $i$th
sequence of differences, compute the sum $\sum_{i=0}^{n} d_i {x \choose i}$. For instance,
with $f(x)=x^2$, the series is $0,1,4,9,16,\dots$, the first differences are $1,3,5,7,\dots$,
the second differences are $2,2,2,2,2,\dots$, and all successive differences are $0$. If we
look at $0{x \choose 0}+1{x \choose 1}+2{x \choose 2}$, we find that it's equal to $x^2$.
Since this provides a rational polynomial expression for any polynomial-generated series of
integers, we can conclude that all polynomials mapping $\mathbb{N}$ to $\mathbb{N}$ have
rational coefficients.

As for smooth functions in general, I think that such functions do exist; it shouldn't be too
hard to prove that by adding together an infinite series of smooth functions which decrease
rapidly enough away from a single spike, one can create a function that over time matches
more and more of the primes and converges at every real value.
Low Complexity Art
Posted on July 6, 2011 by j2kun

The Art of Omission

Whether in painting, fiction, film, landscape architecture, or paper folding, art is often said
to be the art of omission. Simplicity breeds elegance, and engages the reader at a deep,
aesthetic level.

A prime example is the famous six-word story written by Ernest Hemingway:

For sale: baby shoes, never worn.

He called it his best work, and rightfully so. To say so much with this simple picture is a
monumental feat that authors have been trying to recreate since Hemingway's day.
Unsurprisingly, some mathematicians (for whom the art of proof had better not omit
anything!) want to apply their principles to describe elegance.

Computation and Complexity

This study of artistic elegance will be from a computational perspective, and it will be based
loosely on the paper of the same name. While we include the main content of the paper in a
condensed form, we will deviate in two important ways: we alter an axiom with
justification, and we provide a working implementation for the reader's use. We do not
require extensive working knowledge of theoretical computation, but the informed reader
should be aware that everything here is theoretically performed on a Turing machine, but
the details are unimportant.

So let us begin with the computational characterization of simplicity. Unfortunately, due to
our own lack of knowledge of the subject, we will overlook the underlying details and take
them for granted. [At some point in the future, we will provide a primer on Kolmogorov
complexity. We just ordered a wonderful book on it, and can't wait to dig into it!]

Here we recognize that all digital images are strings of bits, and so when we speak of the
complexity of a string, in addition to meaning strings in general, we specifically mean the
complexity of an image.

Definition: The Kolmogorov complexity of a string is the length of the shortest program
which generates it.

In order to specify length appropriately, we must fix some universal description language,
so that all programs have the same frame of reference. Any Turing-complete programming
language will do, so let us choose Python for the following examples. More specifically,
there exists a universal Turing machine $U$, for which any program on any machine may be
translated (compiled) into an equivalent program for $U$ by a program of fixed size. Hence,
the measure of Kolmogorov complexity, when a fixed machine is specified (in this case
Python), is objective over the class of all outputs.
Here is a simple example illustrating Kolmogorov complexity: consider the string of one
hundred zeros. This string is obviously not very complex, in the sense that one could
write a very short program to generate it. In Python:

print("0" * 100)

One can imagine that a compiler which optimizes for brevity would output rather short
assembly code as well, with a single print instruction and a conditional branch, and some
constants. On the other hand, we want to call a string like

00111010010000101101001110101000111101

complex, because it follows no apparent pattern. Indeed, in Python the shortest program to
output this string is just to print the string itself:

print("00111010010000101101001110101000111101")

And so we see that this random string of ones and zeros has a higher Kolmogorov
complexity than the string of all zeros. In other words, the boring string of all zeros is
simple, while the other is complicated.
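While we cannot compute Kolmogorov complexity directly, we can get a very rough feel for the contrast between these two kinds of strings by compressing them. The Python sketch below uses `zlib` as a stand-in: the compressed length is only a crude proxy (roughly an upper bound, up to the fixed cost of a decompressor), not the Kolmogorov complexity itself. The function name `compressed_length`, the string length of 1000, and the seeded random generator are assumptions made for this illustration.

```python
import random
import zlib


def compressed_length(text):
    """Bytes needed by zlib at maximum compression; a crude complexity proxy."""
    return len(zlib.compress(text.encode("ascii"), 9))


# A long run of zeros, and an equally long string of pseudo-random bits.
zeros = "0" * 1000
rng = random.Random(0)
noise = "".join(rng.choice("01") for _ in range(1000))

print("zeros:", compressed_length(zeros))
print("noise:", compressed_length(noise))
```

The pseudo-random string is, strictly speaking, not complex at all in the Kolmogorov sense, since this short program generates it. That gap between what a compressor sees and what the shortest program can do is exactly why compression is only a proxy.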

Kolmogorov himself proved that there is no algorithm to compute Kolmogorov complexity
(the number itself) for any input. In other words, the problem of determining exact
Kolmogorov complexity is undecidable (by reduction from the halting problem; see the
Turing machines primer). So we will not try in vain to actually get a number for the
Kolmogorov complexity of arbitrary strings, although it is easy to count the lengths of
these provably short examples, and instead we speak of complexity in terms of bounds and

Kolmogorov Meets Picasso

To apply this to art, we want to ask, for a given picture, what is the length of the shortest
program that outputs it? This will tell us whether a picture is simple or complex.
Unfortunately for us, most pictures are neither generated by programs, nor do they have
obvious programmatic representations. More feasibly, we can ask, can we come up with
pictures which have low Kolmogorov complexity and are also beautiful? This is truly a
tough task.

To do so, we must first invent an encoding for pictures, and write a program to interpret the
encoding. That's the easy part. Then, the true test, we must paint a beautiful picture.

We don't pretend to be capable of such artistry. However, there are some who have created
an encoding based on circles and drawn very nice pictures with it. Here we will present
those pictures as motivation, and then develop a very similar encoding method, providing
the code and examples for the reader to play with.

Jürgen Schmidhuber, a long-time proponent of low-complexity art, spent a very long time
(on the order of thousands of sketches) creating drawings using his circle encoding method,
and here are some of his results:

Marvelous. Our creations will be much uglier. But we admit, one must start somewhere,
and it might as well be where we feel most comfortable: mathematics and programming.

Magnificence Meets Method

There are many possible encodings for drawings. We will choose one which is fairly easy to
implement, and based on intersecting circles. The strokes in a drawing are arcs of these
circles. We call the circles used to generate drawings legal circles, while the arcs are legal
arcs. Here is an axiomatic specification of how to generate legal circles:

1. Arbitrarily define a circle $C$ with radius 1 as legal. All other circles are
generated with respect to this circle. Define a second legal circle whose center is
on $C$, and which also has radius 1.
2. Wherever two legal circles of equal radius intersect, a third circle of equal radius is
centered at the point of intersection.
3. Every legal circle of radius $r$ has at its center another legal circle of radius $r/2$.

A legal arc is then simply any arc of a legal circle, and a legal drawing is any list of legal
arcs, where each arc has a width corresponding to some fixed set of values. Now we
generate all circles which intersect the interior of the base circle $C$, and sort them first by
radius, then by $x$ coordinate, then $y$ coordinate. Now given a specified order on the circles,
we may number them from 1 to $n$, and specify a particular circle by its index in the list. In
this way, we have defined a coordinate space of arcs, with points of the form (circle,
thickness, arc-start, arc-end), where the circle is given by its index and the arc-start and
arc-end coordinates are measured in radians.
We describe the programmatic construction of these circles later. For now, here is the
generated picture of all circles which intersect the unit circle up to radius :

The legal circles

In addition, we provide an animation showing the different layers:

And another animation displaying the list of circles sorted by index in increasing order. For an
animated GIF, this file has a large size (5MB), and so we link to it separately.

As we construct smaller and smaller circles, the interior of the base circle is covered up by a
larger proportion of legally usable area. By using obscenely small circles, we may
theoretically construct any drawing. On the other hand, what we care about is how much
information is needed to do so.

Because of our nice well ordering on circles, those circles with very small radii will have
huge indices! Indeed, there are about four circles of radius $r/2$ for each circle of
radius $r$ in any fixed area. Then, we can measure the complexity of a drawing by how
many characters its list of legal arcs requires. Clearly, a rendition of Starry Night would
have a large number of high-indexed circles, and hence have high Kolmogorov complexity.
(On second thought, I wonder how hard it would be to get a rough sketch of a
Starry-Night-esque picture in this circle encoding; it might not be all that complex).

Note that Schmidhuber defines things slightly differently. In particular, he requires that the
endpoints of a legal arc must be the intersection points of two other legal arcs, making the
arc-start and arc-end coordinates integers instead of radian measures. We respectfully
disagree with this axiom, and we explain why here:
Which of the two arcs is more complex?

Of the two arcs in the picture to the left, which would you say is more complex, the larger
or the smaller? We observe that two arcs of the same circle, regardless of how long or short
they are, should not be significantly different in complexity.

Schmidhuber, on the other hand, implicitly claims that arcs which begin or terminate at
non-standard locations (locations which only correspond to the intersections of sufficiently
small circles) should be deemed more complex. But this can be a difference as small
as , and it drastically alters the complexity. We consider this specification unrealistic,
at least to the extent to which human beings consider complexity in art. So we stick to
radian measures.
Indeed, our model does alter the complexity for some radian measures, simply because
finely specifying fractions requires more bits than integral values. But the change in
complexity is hardly as drastic.

In addition, Schmidhuber allows for region shading between legal arcs. Since we did not
find an easy way to implement this in Mathematica, we skipped it as extraneous.

Such Stuff as Programs are Made of

We implemented this circle encoding in Mathematica. The reader is encouraged
to download and experiment with the full notebook, available from this blog's GitHub page.
We will explain the important bits here.

First, we have a function to compute the centers of the six legal circles whose centers lie
on a given circle:

borderCircleCenters[{x_, y_}, r_] :=
   Table[{x + r Cos[i 2 Pi/6], y + r Sin[i 2 Pi/6]}, {i, 0, 5}];

We arbitrarily picked the first legal circle to be the unit circle, defined with center (0,0),
while the second has center (1,0). This made generating all legal circles a relatively simple
search task. In addition, we recognize that any arbitrary second chosen circle is simply a
rotation of this chosen configuration, so one may rotate their final drawing to accommodate
for a different initialization step.
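For readers without Mathematica, here is a rough Python sketch of the same border-circle computation. It is an illustration rather than a port of the notebook: the name `border_circle_centers` is our own, and it works in floating point where Mathematica works with exact values. It also checks the geometric fact that makes six the right number: neighboring centers sit exactly one radius apart, so circles of equal radius drawn around them intersect at each other's centers.

```python
import math


def border_circle_centers(center, r):
    """Six points spaced 60 degrees apart on the circle of radius r around center."""
    x, y = center
    return [
        (x + r * math.cos(i * math.pi / 3), y + r * math.sin(i * math.pi / 3))
        for i in range(6)
    ]


centers = border_circle_centers((0.0, 0.0), 1.0)
for i, (px, py) in enumerate(centers):
    qx, qy = centers[(i + 1) % 6]
    # Distance from the base center, and distance to the next border center.
    from_origin = math.hypot(px, py)
    to_neighbor = math.hypot(px - qx, py - qy)
    print(i, round(from_origin, 6), round(to_neighbor, 6))
```

With the base circle centered at the origin, the first border center is (1, 0), matching the choice of second legal circle described above.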

Second, we have the brute-force search of all circles. We loop through all circles in a list,
generating the six border circles appropriately, and then filtering out the ones we need,
repeating until we have all the circles which intersect the interior of the unit circle. Note our
inefficiencies: we search out as far as radius 2 to find small circles which do not necessarily
intersect the unit circle, and we calculate the border circles of each circle many times. On
the other hand, finding all circles as small as radius takes about a minute on an Intel
Atom processor, which is not so slow as to need excessive tuning for a prototype's sake.

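getAllCenters[r_] := Module[{centers, borderCenters, searchR,
                             ord, rt},
   (* ordering used for the final sort of the centers *)
   ord[{a_, b_}, {c_, d_}] := If[a < c, True, b < d];
   centers = {{0, 0}};

   rt = Power[r, 1/2];
   (* keep adding the six border centers of every known center
      until the search has moved far enough from the origin *)
   While[Norm[centers[[-1]]] <= Min[2, 1 + rt],
      borderCenters = Map[borderCircleCenters[#, r] &, centers];
      centers = centers \[Union] Flatten[borderCenters, 1]];

   (* keep only the circles that intersect the interior of the unit circle *)
   Sort[Select[centers, Norm[#] < 1 + r &], ord]
];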

Finally, we have a function to extract from the resulting list of all centers the center and
radius of a given index, and a function to convert a coordinate to its graphical
representation:

(* extracts a pair {center, radius} given the
   index of the circle *)
indexToCenterRadius[layeredCenters_, index_] :=
   Module[{row, length, counter},
      row = 1;
      length = Length[layeredCenters[[row]]];
      counter = index;

      (* skip whole layers until the index falls inside one *)
      While[counter > length,
         counter -= length;
         row++;
         length = Length[layeredCenters[[row]]];
      ];

      {layeredCenters[[row, counter]], 1/2^(row - 1)}
   ];

(* draws a single arc from its {index, thickness, start, end} coordinate *)
drawArc[{index_, thickness_, arcStart_, arcEnd_}] :=
   Module[{center, radius},
      {center, radius} = indexToCenterRadius[allCenters, index];
      Graphics[{Thickness[thickness],
         Circle[center, radius, {arcStart, arcEnd}]},
         ImagePadding -> 5, PlotRange -> {{-1, 1}, {-1, 1}},
         ImageSize -> {400, 400}]
   ];

And a front-end style function, which takes a list of coordinates and draws the resulting
picture:

paint[coordinates_] := Show[Map[drawArc, coordinates]];

Any omitted details (at least one global variable name) are clarified in the notebook.

Now, with our paintbrush in hand, we unveil our very first low-complexity piece of art.
Behold! Surprised Mr. Moustache Witnessing a Collapsing Soufflé:

Surprised Mr. Moustache, Jeremy Kun, 2011

Its coordinates are:

{{7, 0.005, 0, 2 Pi}, {197, 0.002, 0, 2 Pi},

{299, 0.002, 0, 2 Pi}, {783, 0.002, 0, 2 Pi},
{2140, 0.001, 0, 2 Pi}, {3592, 0.001, 0, 2 Pi},
{22, 0.004, 8 Pi/6, 10 Pi/6}, {29, 0.004, 4 Pi/3, 5 Pi/3},
{21, 0.004, Pi/3, 2 Pi/3}, {28, 0.004, Pi/3, 2 Pi/3}}

Okay, so it's lame, and took all of ten minutes to create (guess-and-check on the indices is
quick, thanks to Mathematica's interpreter). But it has low Kolmogorov complexity! And
that's got to count for something, right?

Even if you disagree with our obviously inspired artistic genius, the
Mathematica framework for creating such drawings is free and available for anyone to play
with. So please, should you have any artistic talent at all (and access to Mathematica), we
would love to see your low-complexity art! If we somehow come across three days of being
locked in a room with access to nothing but a computer and a picture of Starry Night, we
might attempt to recreate a sketch of it for this blog. But until then, we will explore other
topics.

Happy sketching!

Addendum: Note that the outstanding problem here is how to algorithmically take a given
picture (or specification of what one wants to draw), and translate it into this system of
coordinates. As of now, no such algorithm is known, and hence we call the process of
making a drawing art. We may attempt to find such a method in the future, but it is likely
hard, and if we produced an algorithm even a quarter as good as we might hope, we would
likely publish a paper first, and blog about it second.


6 thoughts on Low Complexity Art

1. paxinum

July 11, 2011 at 11:42 pm

What about fractals?


o j2kun

July 12, 2011 at 7:23 am

Technically this circle construction is a fractal (if we drew larger circles and smaller
circles ad infinitum), but we are selecting pieces of the fractal with the goal of
constructing a specific picture. The difference here is that we have well-defined
curves to choose from, whereas in something like Mandelbrot's set, it's a gradient.
Of course, there are other fractals which draw specific pictures, like the Barnsley
fern, but these sorts of constructions have a disadvantage for our purposes because
each algorithm is specific to the object being created.

With this construction we have a distinct analytical advantage. Any drawing can be
drawn, so our framework is universal. And when we restrict our attention to this
particular style of art, any drawing can be compared to any other drawing in terms
of complexity. We could theoretically construct both the Mandelbrot set and the
Barnsley fern using the coordinate system, but our real problem is to find those
drawings which have very low complexity in this framework and are still beautiful.

So, where's your drawing?


2. erniejunior

October 7, 2012 at 10:38 am

This is a nice attempt to compare the Kolmogorov complexity of images, but what happens
if you try to compare the complexities of the rather simple geometric figures of a square and
a circle?
Your system will tell you that the square has an infinite complexity (which is not true) and
the circle is rather simple. Your results are biased by the way you encode your information.
If you encoded your pictures with parts of straight lines instead of parts of circles the
comparison of a square and a circle would give you the opposite results (square rather
simple but the circle infinitely complex).
Now saying that you could just combine the circles-system and the lines-system would not
get you anywhere: now circles and squares are both simple (as they are supposed to be) but
for example a Mandelbrot fractal (with low Kolmogorov complexity) would still be graded
infinitely complex.
If you want any usable results you need to use math or any equivalent strong language to
encode your picture information. And then again the encoding of a picture is not unique any
more and you need to make sure that any image you draw/construct is encoded in the
simplest way possible which is equally hard as computing the Kolmogorov complexity.

This is not meant as a rant on your article. I love it and your whole blog. It makes me think
and teaches me! Thanks a lot for the effort you put into your articles.



o j2kun

October 7, 2012 at 11:09 am

You make some very good points, and the square counterexample clearly came
from a mathematical mind! And I think the only rebuttal is this: the circle
framework is not designed to be useful. In fact, it is not hard to see that determining
the correct Kolmogorov complexity of any image is undecidable, since any string
can be interpreted as the pixel information of an image.

So it is not fruitful to search for such a system, because no system exists. Here's a
relatively simple example, however, of a more expressive system that encapsulates
both circles and squares: Bezier curves. However, this system is just more complex,
and it sidesteps the point of the article.

That is, this is a question about aesthetics: are designs with provably low
Kolmogorov complexity more beautiful than those with higher Kolmogorov
complexity (with respect to a universal encoding system)?

Whatever you believe, Kolmogorov complexity is pretty fascinating stuff. I'm
planning to look at it in some more depth in the coming months. In particular it has
shown up in a number of machine learning applications.



October 7, 2012 at 11:30 am

I agree that some of my arguments stated the obvious. I just like to write
down my thought process and make everything as easy to understand as
possible.
A system based on Beziers would certainly be worth looking into. Especially
since they are not that hard to implement.
I am looking forward to your future articles about Kolmogorov complexity
because I feel that it is very important even though I do not (yet) see how it
could be at least approximated and used in any way.


3. MSM

May 2, 2013 at 12:54 pm

I'm a bit late to the party, but if you want some /really/ good low-complexity art, check old,
good demoscene stuff.

Random (Psychedelic) Art

Posted on January 1, 2012 by j2kun

And a Pinch of Python

Next semester I am a lab TA for an introductory programming course, and it's taught in
Python. My Python experience has a number of gaps in it, so we'll have the opportunity for
a few more Python primers, and small exercises to go along with them. This time, we'll be
investigating the basics of objects and classes, and have some fun with image construction
using the Python Imaging Library. Disappointingly, the folks who maintain the PIL are
slow to update it for any relatively recent version of Python (it's been a few years since 3.x,
honestly!), so this post requires one use Python 2.x (we're using 2.7). As usual, the full
source code for this post is available on this blog's Github page, and we encourage the
reader to follow along and create their own randomized pieces of art! Finally, we include a
gallery of generated pictures at the end of this post. Enjoy!

How to Construct the Images

An image is a two-dimensional grid of pixels, and each pixel is a tiny dot of color displayed
on the screen. In a computer, one represents each pixel as a triple of numbers $(r, g, b)$,
where $r$ represents the red content, $g$ the green content, and $b$ the blue content. Each of
these is a nonnegative integer between 0 and 255. Note that this gives us a total
of $256^3 = 2^{24}$ distinct colors, which is nearly 17 million. Some estimates of how much
color the eye can see range as high as 10 million (depending on the definition of color) but
usually stick around 2.4 million, so it's generally agreed that we don't need more.

The general idea behind our random psychedelic art is that we will generate three
randomized functions $f, g, h$, each with domain $[-1,1] \times [-1,1]$ and codomain $[-1,1]$, and at
each pixel $(x,y)$ we will determine the color at that pixel by the
triple $(f(x,y), g(x,y), h(x,y))$. This will require some translation between pixel
coordinates, but we'll get to that soon enough. As an example, if our colors are defined by
the functions $(\sin(\pi x), \cos(\pi x y), \sin(\pi y))$, then the resulting image is:

We use the extra factor of $\pi$ because without it the oscillation is just too slow, and the
resulting picture is decidedly boring. Of course, the goal is to randomly generate such
functions, so we should pick a few functions on $[-1,1]$ and nest them appropriately. The
first which come to mind are $\sin(\pi \cdot -)$, $\cos(\pi \cdot -)$, and simple multiplication. With these,
we can create convoluted functions by nesting them inside one another.

We could randomly generate these functions two ways, but both require randomness, so
let's familiarize ourselves with the capabilities of Python's random library.

Random Numbers
Pseudorandom number generators are a fascinating topic in number theory, and one of these
days we plan to cover it on this blog. Until then, we will simply note the basics. First,
contemporary computers cannot generate random numbers. Everything on a computer
is deterministic, meaning that if one completely determines a situation in a computer, the
following action will always be the same. With the complexity of modern operating systems
(and the aggravating nuances of individual systems), some might facetiously disagree.

For an entire computer the determined situation can be as drastic as choosing every single
bit in memory and the hard drive. In a pseudorandom number generator the determined
situation is a single number called a seed. This initializes the random number generator,
which then proceeds to compute a sequence of bits via some complicated arithmetic. The
point is that one may choose the seed, and choosing the same seed twice will result in the
same sequence of randomly generated numbers. The default seed (which is what one uses
when one is not testing for correctness) is usually some sort of time-stamp which is
guaranteed to never repeat. Flaws in random number generator design (hubris, off-by-one
errors, and even using time-stamps!) have allowed humans to take advantage of people who
try to rely on random number generators. The interested reader will find a detailed
account of how a group of software engineers wrote a program to cheat at online poker,
simply by reverse-engineering the random number generator used to shuffle the deck.
In any event, Python makes generating random numbers quite easy:

import random

random.seed()
print(random.random())
print(random.choice(["clubs", "hearts", "diamonds", "spades"]))

We import the random library, we seed it with the default seed, we print out a random
number in $[0,1)$, and then we randomly pick one element from a list. For a full list of the
functions in Python's random library, see the documentation. As it turns out, we will only
need the random() and choice() functions.
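The claim that choosing the same seed twice yields the same sequence is easy to check. The following is a minimal sketch written for a current Python 3 interpreter; the helper name first_draws and the particular seed values are our own choices for illustration.

```python
import random


def first_draws(seed, count=5):
    """Seed a private generator and return its first `count` draws."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


same_a = first_draws(2012)
same_b = first_draws(2012)
other = first_draws(2013)

print(same_a == same_b)   # identical seeds, identical sequences
print(same_a == other)    # a different seed gives a different sequence
```

Using a private random.Random instance rather than the module-level functions keeps the experiment from disturbing any other code that relies on the shared generator.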

Representing Mathematical Expressions

One neat way to represent a mathematical function is via a function! In other words, just
like Racket and Mathematica and a whole host of other languages, Python functions are
first-class objects, meaning they can be passed around like variables. (Indeed, they are
objects in another sense, but we will get to that later). Further, Python has support for
anonymous functions, or lambda expressions, which work as follows:

>>> print((lambda x: x + 1)(4))
5

So one might conceivably randomly construct a mathematical expression by nesting
lambdas:

import math
import random

def makeExpr():
    if random.random() < 0.5:
        # build the inner expression once, so the result is a fixed function
        inner = makeExpr()
        return lambda x: math.sin(math.pi * inner(x))
    return lambda x: x

Note that we need to import the math library, which has support for all of the necessary
mathematical functions and constants. One could easily extend this to support two
variables, cosines, etc., but there is one flaw with the approach: once we've constructed the
function, we have no idea what it is. Here's what happens:

>>> x = lambda y: y + 1
>>> str(x)
'<function <lambda> at 0xb782b144>'

There's no way for Python to know the textual contents of a lambda expression at
runtime! In order to remedy this, we turn to classes.

The inquisitive reader may have noticed by now that lots of things in Python have
associated things, which roughly correspond to what you can type after suffixing an
expression with a dot. Lists have methods like [1,2,3,4].append(5), dictionaries have
associated lists of keys and values, and even numbers have some secretive methods:

>>> 45.7.is_integer()
False

In many languages like C, this would be rubbish. Many languages distinguish
between primitive types and objects, and numbers usually fall into the former category.
However, in Python everything is an object. This means the dot operator may be used after
any type, and as we see above this includes literals.

A class, then, is just a more transparent way of creating an object with certain associated
pieces of data (the fancy word is encapsulation). For instance, if I wanted to have a type
that represents a dog, I might write the following Python program:

class Dog:
    age = 0
    name = ""

    def bark(self):
        print("Ruff ruff! (I'm %s)" % self.name)

Then to use the new Dog class, I could create it and set its attributes appropriately:

fido = Dog()
fido.age = 4
fido.name = "Fido"
fido.weight = 100

The details of the class construction require a bit of explanation. First, we note that the
indented block of code is arbitrary, and one need not initialize the member variables.
Indeed, they simply pop into existence once they are referenced, as in the creation of the
weight attribute. To make it more clear, Python provides a special function called
__init__() (with two underscores on each side of init; heaven knows why they decided
it should be so ugly), which is called upon the creation of a new object, in this case the
expression Dog(). For instance, one could by default name their dogs Fido as follows:

class Dog:
    def __init__(self):
        self.name = "Fido"

d = Dog()
d.name # contains "Fido"

This brings up another point: all methods of a class that wish to access the attributes of the
class require an additional argument. The first argument passed to any method is always the
object which represents the owning instance of the object. In Java, this is usually hidden
from view, but available by the keyword this. In Python, one must explicitly represent it,
and it is standard to name the variable self.

If we wanted to give the user a choice when instantiating their dog, we could include an
extra argument for the name like this:

class Dog:
    def __init__(self, name = 'Fido'):
        self.name = name

d = Dog()
d.name # contains "Fido"
e = Dog("Manfred")
e.name # contains "Manfred"

Here we made it so the name argument is not required, and if it is excluded we default to
"Fido".

To get back to representing mathematical functions, we might represent the identity
function on $[-1,1]$ by the following class:

class X:
    def eval(self, x, y):
        return x

expr = X()
expr.eval(3,4) # returns 3

That's simple enough. But we still have the problem of not being able to print anything
sensibly. Trying gives the following output:

>>> str(X)
'__main__.X'

In other words, all it does is print the name of the class, which is not enough if we want to
have complicated nested expressions. It turns out that the str function is quite special.
When one calls str() of something, Python first checks to see if the object being called has
a method called __str__(), and if so, calls that. The awkward __main__.X is a default
behavior. So if we soup up our class by adding a definition for __str__(), we can define
the behavior of string conversion. For the X class this is simple enough:

class X:
    def eval(self, x, y):
        return x

    def __str__(self):
        return "x"

For nested functions we could recursively convert the argument, as in the following
definition for a SinPi class:

class SinPi:
    def __str__(self):
        return "sin(pi*" + str(self.arg) + ")"

    def eval(self, x, y):
        return math.sin(math.pi * self.arg.eval(x,y))

Of course, this requires we set the arg attribute before calling these functions, and since
we will only use these classes for random generation, we could include that sort of logic in
the __init__() function.

To randomly construct expressions, we create the function buildExpr, which randomly
chooses whether to terminate or to continue nesting:

def buildExpr(prob = 0.99):
    if random.random() < prob:
        return random.choice([SinPi, CosPi, Times])(prob)
    return random.choice([X, Y])()

Here we have classes for cosine, sine, and multiplication, and the two variables. The reason
for the interesting syntax (picking the class name from a list and then instantiating it, noting
that these classes are objects even before instantiation and may be passed around as well!),
is so that we can do the following trick, and avoid unnecessary recursion:

class SinPi:
    def __init__(self, prob):
        self.arg = buildExpr(prob * prob)

In words, each time we nest further, we exponentially decrease the probability that we will
continue nesting in the future, and all the nesting logic is contained in the initialization of
the object. We're building an expression tree, and then when we evaluate an expression we
have to walk down the tree and recursively evaluate the branches appropriately.
Implementing the remaining classes is a quick exercise, and we remind the reader that the
entire source code is available from this blog's Github page. Printing out such expressions
results in some nice long trees, but also some short ones:

>>> str(buildExpr())
>>> str(buildExpr())
>>> str(buildExpr())
>>> str(buildExpr())
>>> str(buildExpr())
>>> str(buildExpr())

This should work well for our goals. The rest is constructing the images.
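For readers who want to experiment right away, here is one way the pieces described above might fit together. It is a sketch under our own assumptions rather than the code from the repository: in particular the bodies of Y, CosPi, and Times (and the attribute names lhs and rhs) are our guesses at the "quick exercise", written for a current Python 3 interpreter.

```python
import math
import random


class X:
    def eval(self, x, y):
        return x

    def __str__(self):
        return "x"


class Y:
    def eval(self, x, y):
        return y

    def __str__(self):
        return "y"


class SinPi:
    def __init__(self, prob):
        self.arg = buildExpr(prob * prob)

    def eval(self, x, y):
        return math.sin(math.pi * self.arg.eval(x, y))

    def __str__(self):
        return "sin(pi*" + str(self.arg) + ")"


class CosPi:
    def __init__(self, prob):
        self.arg = buildExpr(prob * prob)

    def eval(self, x, y):
        return math.cos(math.pi * self.arg.eval(x, y))

    def __str__(self):
        return "cos(pi*" + str(self.arg) + ")"


class Times:
    # Assumed design: two independently generated sub-expressions.
    def __init__(self, prob):
        self.lhs = buildExpr(prob * prob)
        self.rhs = buildExpr(prob * prob)

    def eval(self, x, y):
        return self.lhs.eval(x, y) * self.rhs.eval(x, y)

    def __str__(self):
        return str(self.lhs) + "*" + str(self.rhs)


def buildExpr(prob=0.99):
    # Nest further with probability prob, otherwise stop at a variable.
    if random.random() < prob:
        return random.choice([SinPi, CosPi, Times])(prob)
    return random.choice([X, Y])()


expr = buildExpr()
print(str(expr))
print(expr.eval(0.25, -0.5))
```

Because sines, cosines, and products of numbers in $[-1,1]$ stay in $[-1,1]$, every expression built this way maps the square into the interval we need for colors.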

Images in Python, and the Python Imaging Library

The Python Imaging Library is a third-party package rather than part of the standard
library, so it must be installed separately. Once it is available, we can access
the part we need by adding the following line to our header:

from PIL import Image

Now we can construct a new canvas, and start setting some pixels.

canvas = Image.new("L", (300,300))
canvas.putpixel((150,150), 255)

This gives us a nice black square with a single white pixel in the center. The "L" argument
to Image.new() says we're working in grayscale, so that each pixel is a single 0-255 integer
representing intensity. We can do this for three images, and merge them into a single color
image using the following:

finalImage = Image.merge("RGB", (redCanvas, greenCanvas, blueCanvas))

Where we construct redCanvas, greenCanvas, and blueCanvas in the same way as
above, but with the appropriate intensities. The rest of the details in the Python code are left
for the reader to explore, but we dare say it is just bookkeeping and converting between
image coordinate representations. At the end of this post, we provide a gallery of the
randomly generated images, and a text file containing the corresponding expression trees is
packaged with the source code on this blog's Github page.
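The bookkeeping mentioned above might look something like the following sketch. It deliberately avoids the imaging library so that it runs anywhere, and every name in it (pixel_to_unit, unit_to_intensity, intensity_grid) is our own invention for illustration. Each entry of the resulting grid is the kind of 0-255 value one would hand to putpixel for the corresponding pixel.

```python
import math


def pixel_to_unit(p, size):
    """Map a pixel index 0..size-1 linearly onto the interval [-1, 1]."""
    return 2.0 * p / (size - 1) - 1.0


def unit_to_intensity(v):
    """Map a value in [-1, 1] to an integer intensity in 0..255."""
    return int((v + 1.0) * 127.5)


def intensity_grid(func, size):
    """Evaluate func(x, y) at every pixel; grid[row][col] is in 0..255."""
    grid = []
    for py in range(size):
        # Flip the sign so that y increases upward, as in the plane.
        y = -pixel_to_unit(py, size)
        row = []
        for px in range(size):
            x = pixel_to_unit(px, size)
            row.append(unit_to_intensity(func(x, y)))
        grid.append(row)
    return grid


# The red channel of the example picture, on a tiny 5x5 canvas.
red = intensity_grid(lambda x, y: math.sin(math.pi * x), 5)
for row in red:
    print(row)
```

Truncating with int() rather than rounding is a design choice of this sketch; it guarantees that the value 1 lands exactly on 255 and the value -1 on 0.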

Extending the Program With New Functions!

There is decidedly little mathematics in this project, but there are some things we can
discuss. First, we note that there are many, many functions on the interval $[-1,1]$ that
we could include in our random trees. A few examples are: the average of two numbers in
that range, the absolute value, certain exponentials, and reciprocals of interesting sequences
of numbers. We leave it as an exercise to the reader to add new functions to our existing
code, and to further describe which functions achieve coherent effects.
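As a starting point for that exercise, here is a sketch of what an averaging node could look like. The class name Average and its constructor, which takes the two sub-expressions directly, are hypothetical; to plug it into buildExpr one would instead give it an __init__(self, prob) that builds its own children, in the same style as SinPi.

```python
class X:
    def eval(self, x, y):
        return x

    def __str__(self):
        return "x"


class Y:
    def eval(self, x, y):
        return y

    def __str__(self):
        return "y"


class Average:
    """Hypothetical node: the mean of two sub-expressions."""

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs

    def eval(self, x, y):
        # The mean of two numbers in [-1, 1] is again in [-1, 1].
        return (self.lhs.eval(x, y) + self.rhs.eval(x, y)) / 2.0

    def __str__(self):
        return "avg(" + str(self.lhs) + ", " + str(self.rhs) + ")"


expr = Average(X(), Y())
print(str(expr))
print(expr.eval(0.5, -1.0))
```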

Indeed, the designs are all rather psychedelic, and the layers of color are completely
unrelated. It would be an interesting venture to write a program which, given an image of
something (pretend it's a simple image containing some shapes), constructs expression trees
that are consistent with the curves and lines in the image. This follows suit with our goal of
constructing low-complexity pictures from a while back, and indeed, these pictures have
rather low Kolmogorov complexity. This method is another framework in which to describe
their complexity, in that smaller expression trees correspond to simpler pictures. We leave
this for future work. Until then, enjoy these pictures!


The picture generated by (sin(pi*x), cos(pi*x*y), sin(pi*y))


8 thoughts on Random (Psychedelic) Art

1. jakesprinter

January 1, 2012 at 7:40 pm

Brilliant colors

wish you luck for 2012


2. Axio

January 3, 2012 at 7:27 pm

Funny that this idea shows up on planet scheme. It had already three years ago (can't find
the original link, though), and I had given it a try as
well: http://fp.bakarika.net/index.cgi?show=5 (with ugly code).



3. sudonhim

June 3, 2012 at 3:20 am

OK I included the random library. Available are:

Random,WichmannHill, betavariate, choice, division,

expovariate, gammavariate, gauss, getrandbits, getstate,
jumpahead, lognormvariate, normalvariate, paretovariate,
randint, random, randrange, sample, seed, setstate,
shuffle, triangular, uniform, vonmisesvariate,
weibullvariate, SystemRandom
Awesome images btw!


o j2kun

June 3, 2012 at 11:16 am

I can't access your server. Did you set up a thing for users to create python images

For the random library all you need is random.choice()



June 3, 2012 at 6:50 pm

Sorry about that, some dev going on here!

Python image server is up again, however we are porting it (have a 95%
working version!) to client-side JavaScript. Although I do prefer Python, it's
just too much load on the server.
If you want to go ahead anyway, any images you submit to the Python
server will still appear with source in the next version, people just won't be
able to reuse the code.


4. EmoryM

May 11, 2013 at 8:05 pm

If you haven't read this yet, you'd probably enjoy
it: http://www.cs.ucf.edu/complex/papers/stanley_gpem07.pdf


5. anorthhare

January 23, 2015 at 8:05 pm

Thanks for posting this, I've just begun to scratch the surface of what Python can do. It's a
wonderful language. I'm working on a program to turn images into sound. I enjoyed your
examples, they are very useful.

6. Bidobido

January 15, 2016 at 8:42 am

Can't believe I've been having so much fun with this piece of code since yesterday!
Just added a Plus class, coded in the same fashion as the Times class, that adds its
arguments and trims the result to be in [-1,1]. Results are breathtaking (much more
so than averaging)! Thank you for these genuinely pythonic ideas about expression
nesting by the way.

Seam Carving for Content-Aware Image Scaling

Posted on March 4, 2013 by j2kun

The Problem with Cropping

Every programmer or graphic designer with some web development experience can attest to
the fact that finding good images that have an exactly specified size is a pain. Since the
dimensions of the sought picture are usually inflexible, an uncomfortable compromise can
come in the form of cropping a large image down to size or scaling the image to have
appropriate dimensions.

Both of these solutions are undesirable. In the example below, the caterpillar looks distorted
in the scaled versions (top right and bottom left), and in the cropped version (bottom right)
it's more difficult to tell that the caterpillar is on a leaf; we have lost the surrounding
context.

In this post we'll look at a nice heuristic method for rescaling images called seam-carving,
which pays attention to the contents of the image as it rescales. In particular, it
only removes or adds pixels to the image that the viewer is least likely to notice. In all but
the most extreme cases it will avoid the ugly artifacts introduced by cropping and scaling,
and with a bit of additional scaffolding it becomes a very useful addition to a graphic
designer's repertoire. At first we will focus on scaling an image down, and then we will see
that the same technique can be used to enlarge an image.

Before we begin, we should motivate the reader with some examples of its use.

It's clear that the caterpillar is far less distorted in all versions, and even in the harshly
rescaled version, parts of the green background are preserved. Although the leaf is warped a
little, it is still present, and it's not obvious that the image was manipulated.

Now that the reader's appetite has been whetted, let's jump into the mathematics of it. This
method was pioneered by Avidan and Shamir, and the impatient reader can jump straight to
their paper (which contains many more examples). In this post we hope to fill in the
background and show a working implementation.
Images as Functions
One common way to view an image is as an approximation to a function of two real
variables. Suppose we have an $n \times m$-pixel image ($n$ rows and $m$ columns of pixels). For
simplicity (during the next few paragraphs), we will also assume that the pixel values of an
image are grayscale intensity values between 0 and 255. Then we can imagine the pixel
values as known integer values of a function $f: \mathbb{R}^2 \to \mathbb{R}$. That is, if we take two
integers $0 \leq x < m$ and $0 \leq y < n$ then we know the value $f(x,y)$; it's just the intensity
value at the corresponding pixel. For values outside these ranges, we can impose arbitrary
values for $f$ (we don't care what's happening outside the image).

Moreover, it makes sense to assume that $f$ is a well-behaved function in between the pixels
(i.e. it is differentiable). And so we can make reasonable guesses as to the true derivative
of $f$ by looking at the differences between adjacent pixels. There are many ways to get a
good approximation of the derivative of an image function, but we should pause a moment
to realize why this is important to nail down for the purpose of resizing images.

A good rule of thumb with images is that regions of an image which are most important to
the viewer are those which contain drastic changes in intensity or color. For instance,
consider this portrait of Albert Einstein.

Which parts of this image first catch the eye? The unkempt hair, the wrinkled eyes, the
bushy mustache? Certainly not the misty background, or the subtle shadows on his chin.

Indeed, one could even claim that an image having a large derivative at a certain pixel
corresponds to high information content there (of course this is not true of all images, but
perhaps it's reasonable to claim this for photographs). And if we want to scale an image
down in size, we are interested in eliminating those regions which have the smallest
information content. Of course we cannot avoid losing some information: the image after
resizing is smaller than the original, and a reasonable algorithm should not add any new
information. But we can minimize the damage by intelligently picking which parts to
remove; our naive assumption is that a small derivative at a pixel implies a small amount of
information.

Of course we can't just remove regions of an image to change its proportions. We have to
remove the same number of pixels in each row or column to reduce the corresponding
dimension (width or height, resp.). Before we get to that, though, let's write a program to
compute the gradient. For this program and the rest of the post we will use the Processing
programming language, and our demonstrations will use the Javascript cross-compiler
processing.js. The nice thing about Processing is that if you know Java then you
know Processing. All the basic language features are the same, and it's just got a few extra
native types and libraries to make graphics rendering and image displaying easier. As
usual, all of the code used in this blog post is available on this blog's Github page.

Let's compute the gradient of this picture, and call the picture $I$:

A very nice picture whose gradient we can compute. It was taken by the artist Ria

Since this is a color image, we will call it a function $I: \mathbb{R}^2 \to \mathbb{R}^3$, in the sense that the
input is a plane coordinate $(x,y)$, and the output $I(x,y) = (r, g, b)$ is a triple of color
intensity values. We will approximate the image's partial
derivative $\partial I / \partial x$ at $(x,y)$ by inspecting values of $I$ in a neighborhood of the
point: $I(x-1,y)$, $I(x,y)$, and $I(x+1,y)$.

For each pixel we call the value $|I(x+1,y) - I(x-1,y)| / 2$ the partial derivative in
the $x$ direction, and $|I(x,y+1) - I(x,y-1)| / 2$ the partial in the $y$ direction. Note that
the values $I(x,y)$ are vectors, so the norm signs here are really computing
the distance between the two values of $I$.

There are two ways to see why this makes sense as an approximation. The first is analytic:
by definition, the partial derivative is a limit:

$$\frac{\partial I}{\partial x}(x,y) = \lim_{h \to 0} \frac{I(x+h,y) - I(x,y)}{h}$$

It turns out that this limit is equivalent to

$$\lim_{h \to 0} \frac{I(x+h,y) - I(x-h,y)}{2h}$$

And the closer $h$ gets to zero the better the approximation of the limit is. Since the closest
we can make $h$ is $h = 1$ (we don't know any other values of $I$ with nonzero $h$), we plug in
the corresponding values for neighboring pixels. The partial $\partial I / \partial y$ is similar.
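To get a feel for the two difference quotients, we can compare them on a one-variable function whose derivative we know. The sketch below is only an illustration: it uses $\sin$ at $x = 1$ (our choice, not an image), and the function names are our own. Even at $h = 1$, the largest step considered and the one we are stuck with for pixels, the symmetric quotient lands noticeably closer to the true derivative.

```python
import math


def forward_difference(f, x, h):
    """One-sided quotient (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 1.0
exact = math.cos(x)   # the true derivative of sin at x

for h in (1.0, 0.1, 0.01):
    forward_error = abs(forward_difference(math.sin, x, h) - exact)
    central_error = abs(central_difference(math.sin, x, h) - exact)
    print(h, forward_error, central_error)
```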

The second way to view it is geometric.

The slope of the blue secant line is not a bad approximation to the derivative at x, provided
the resolution is fine enough.

The salient fact here is that a nicely-behaved curve $y = f(x)$ will have a derivative at $x$ close to the
slope of the secant line between the points $(x-1, f(x-1))$ and $(x+1, f(x+1))$. Indeed, this idea
inspires the original definition of the derivative. The slope of the secant line is
just $(f(x+1) - f(x-1)) / 2$. As we saw in our post on numerical integration, we can do
much better than a linear guess (specifically, we can use any order of polynomial
interpolation we wish), but for the purposes of displaying the concept of seam-carving, a
linear guess will suffice.

And so with this intuitive understanding of how to approximate the gradient, the algorithm
to actually do it is a straightforward loop. Here we compute the horizontal gradient (that is,
the derivative $\partial I / \partial x$).

PImage horizontalGradient(PImage img) {
  color left, right;
  int center;
  PImage newImage = createImage(img.width, img.height, RGB);

  for (int x = 0; x < img.width; x++) {
    for (int y = 0; y < img.height; y++) {
      // index of pixel (x, y) in the one-dimensional pixel array
      center = x + y*img.width;

      // at the left and right edges, fall back to the pixel itself
      left = x == 0 ? img.pixels[center] : img.pixels[(x-1) + y*img.width];
      right = x == img.width-1 ? img.pixels[center] : img.pixels[(x+1) + y*img.width];

      newImage.pixels[center] = color(colorDistance(left, right));
    }
  }

  return newImage;
}

The details are a bit nit-picky, but the idea is simple. If we're inspecting a non-edge pixel,
then we can use the formula directly and compute the values of the neighboring left and
right pixels. Otherwise, the left pixel or the right pixel will be outside the bounds of the
image, and so we replace it with the pixel we're inspecting. Mathematically, we'd be
computing the difference $|I(x,y) - I(x+1,y)|$ and $|I(x-1,y) - I(x,y)|$, respectively.
Additionally, since we'll later only be interested in the relative sizes of the gradient, we can
ignore the factor of 1/2 in the formula we derived.

The parts of this code that are specific to Processing also deserve some attention.
Specifically, we use the built-in types PImage and color, for representing images and colors,
respectively. The createImage function creates an empty image of the specified size. And
peculiarly, the pixels of a PImage are stored as a one-dimensional array. So as we're
iterating through the rows and columns, we must compute the correct location of the sought
pixel in the pixel array (this is why we have a variable called center). Finally, as in Java,
the ternary if notation is used to keep the syntax short, and those two lines simply check for
the boundary conditions we stated above.

The last unexplained bit of the above code is the colorDistance function. As our image
function $I$ has triples of numbers as values, we need to compute the distance between
two values via the standard distance formula. We have encapsulated this in a separate
function. Note that because (in this section of the blog) we are displaying the results in an
image, we have to convert to an integer at the end.

int colorDistance(color c1, color c2) {
  // Euclidean distance between the two colors, viewed as (r, g, b) triples.
  float r = red(c1) - red(c2);
  float g = green(c1) - green(c2);
  float b = blue(c1) - blue(c2);
  return (int)sqrt(r*r + g*g + b*b);
}

Let's see this in action on the picture we introduced earlier.

The reader who is interested in comparing the two more closely may visit this interactive
page. Note that we only compute the horizontal gradient, so certain locations in the image
have a large derivative but are still dark in this image. For instance, the top of the door in
the background and the wooden bars supporting the bottom of the chair are dark despite the
vertical color changes.

The vertical gradient computation is entirely analogous, and is left as an exercise to the
reader.

Since we want to inspect both vertical and horizontal gradients, we will call the total
gradient matrix $G$ the matrix whose entries $g_{i,j}$ are the sums of the magnitudes of the
horizontal and vertical gradients at $(i,j)$:

$$g_{i,j} = \left| \frac{\partial I}{\partial x}(i,j) \right| + \left| \frac{\partial I}{\partial y}(i,j) \right|$$

The function $e(x,y) = g_{x,y}$ is often called an energy function for $I$. We will mention now
that there are other energy functions one can consider, and use this energy function for the
remainder of this post.
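
To make the energy function concrete, here is a minimal sketch in plain Java rather than
Processing. It is not the code from this post: it assumes a grayscale image stored as
gray[x][y] (so no colorDistance is needed), and the names EnergySketch, clamp, and
totalGradient are hypothetical. Edge pixels reuse themselves as the missing neighbor, as in
horizontalGradient above.

```java
final class EnergySketch {
    // Clamp an index into [0, size - 1]; an out-of-bounds neighbor is thereby
    // replaced with the edge pixel being inspected.
    static int clamp(int i, int size) {
        return Math.max(0, Math.min(size - 1, i));
    }

    // gray[x][y] is a grayscale intensity (an assumption of this sketch).
    // The result holds |horizontal difference| + |vertical difference| per pixel.
    static int[][] totalGradient(int[][] gray) {
        int width = gray.length, height = gray[0].length;
        int[][] energy = new int[width][height];
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                int dx = gray[clamp(x + 1, width)][y] - gray[clamp(x - 1, width)][y];
                int dy = gray[x][clamp(y + 1, height)] - gray[x][clamp(y - 1, height)];
                energy[x][y] = Math.abs(dx) + Math.abs(dy);
            }
        }
        return energy;
    }
}
```

As in the text, the factor of 1/2 is dropped, since only the relative sizes of the entries
matter later.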

Seams, and Dynamic Programming

Back to the problem of resizing, we want a way to remove only those regions of an image
that have low total gradient across all of the pixels in the region removed. But of course
when resizing an image we must maintain the rectangular shape, and so we have to add or
remove the same number of pixels in each column or row.

For the purpose of scaling an image down in width (and the other cases are similar), we
have a few options. We could find the pixel in each row with minimal total gradient and
remove it. More conservatively, we could remove those columns with minimal gradient (as
a sum of the total gradient of each pixel in the column). More brashly, we could just remove
pixels of lowest gradient willy-nilly from the image, and slide the rows left.

If none of these ideas sound like they would work, it's because they don't. We encourage
the unpersuaded reader to try out each possibility on a variety of images to see just how
poorly they perform. But of these options, removing an entire column happens to distort the
image less than the others. Indeed, the idea of a seam in an image is just a slight
generalization of a column. Intuitively, a seam is a trail of pixels traversing the image
from the bottom to the top, and at each step the pixel trail can veer to the right or left by at
most one pixel.

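Definition: Let $I$ be an image with nonnegative integer coordinates indexed from
zero. A vertical seam in $I$ is a list of coordinates $s_i = (x_i, y_i)$, for
$0 \leq i \leq n-1$, with the following properties:

$y_0 = 0$ is at the bottom of the image.
$y_{n-1}$ is at the top of the image.
$y_i$ is strictly increasing.
$|x_i - x_{i+1}| \leq 1$ for all $0 \leq i < n-1$.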

These conditions simply formalize what we mean by a seam. The first and second impose
that the seam traverses from top to bottom. The third requires the seam to always go up,
so that there is only one pixel in each row. The last requires the seam to be connected in
the sense that it doesn't veer too far at any given step.
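
The conditions can also be checked mechanically. The following is a small sketch in plain
Java, assuming the seam has exactly one pixel in every row, so that it can be stored as an
array xs where xs[i] is the column of the seam pixel in row i. The class and method names
are hypothetical.

```java
final class SeamCheck {
    // xs[i] is the column of the seam pixel in row i, so the row index is
    // automatically strictly increasing from the bottom row to the top row.
    static boolean isVerticalSeam(int[] xs, int width, int height) {
        if (xs.length != height) return false;              // one pixel per row
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] < 0 || xs[i] >= width) return false;   // inside the image
            // Connectedness: veer by at most one column per step.
            if (i > 0 && Math.abs(xs[i] - xs[i - 1]) > 1) return false;
        }
        return true;
    }
}
```

For example, in an image of width 3 the list of columns 1, 2, 2, 1 is a seam, while 0, 2, 1
is not, because the first step jumps two columns.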

Here are some examples of vertical seams. One can easily define horizontal seams by
swapping the roles of the two coordinates in the above list of conditions.

So the goal is now to remove the seams of lowest total gradient. Here the total gradient of a
seam is just the sum of the energy values of the pixels in the seam.

Unfortunately there are many more seams to choose from than columns (or even individual
pixels). It might seem difficult at first to find the seam with the minimal total gradient.
Luckily, if we're only interested in minima, we can use dynamic programming to compute
the minimal seam ending at any given pixel in time linear in the number of pixels.

We point the reader unfamiliar with dynamic programming to our Python primer on this
topic. In this case, the sub-problem we're working with is the minimal total gradient value
of all seams from the bottom of the image to a fixed pixel. Let's call this value $v(x,y)$. If
we know $v(x,y)$ for all pixels below, say, row $j$, then we can compute $v(x,j)$ for
the entire row $j$ by taking each pixel $(x,j)$, and adding its gradient value to the
minimum of the values of its possible predecessors in a seam, $v(x-1,j-1)$, $v(x,j-1)$, and
$v(x+1,j-1)$ (respecting the appropriate boundary conditions).

Once we've computed $v(x,y)$ for the entire matrix, we can look at the minimal value at the
top of the image, $\min_x v(x,h-1)$ where $h$ is the height of the image, and work backwards
down the image to compute which seam gave us this minimum.

Let's make this concrete and compute this function as a two-dimensional array called
seamFitness.

void computeVerticalSeams() {
  seamFitness = new float[img.width][img.height];

  // A seam of length 1 is a single pixel, so row 0 is just the gradient.
  for (int i = 0; i < img.width; i++) {
    seamFitness[i][0] = gradientMagnitude[i][0];
  }

  for (int y = 1; y < img.height; y++) {
    for (int x = 0; x < img.width; x++) {
      seamFitness[x][y] = gradientMagnitude[x][y];

      // Add the cheapest of the (up to three) predecessors in the previous row.
      if (x == 0) {
        seamFitness[x][y] += min(seamFitness[x][y-1], seamFitness[x+1][y-1]);
      } else if (x == img.width-1) {
        seamFitness[x][y] += min(seamFitness[x][y-1], seamFitness[x-1][y-1]);
      } else {
        seamFitness[x][y] += min(seamFitness[x-1][y-1], seamFitness[x][y-1], seamFitness[x+1][y-1]);
      }
    }
  }
}

We have two global variables at work here (global is bad, I know, but it's Processing; it's
made for prototyping): the seamFitness array and the gradientMagnitude array. We
assume at the start of this function that the gradientMagnitude array is filled with sensible
values.

Here we first initialize the zeroth row of the seamFitness array to have the same values as
the gradient of the image. This is simply because a seam of length 1 has only one gradient
value. Note here the coordinates are a bit backwards: the first coordinate represents the
choice of a column, and the second represents the choice of a row. We can think of the
coordinate axes of our image function having the origin in the bottom-left, the same as we
might do mathematically.

Then we iterate over the rows in the matrix, and in each column we compute the fitness
based on the fitness of the previous row. That's it!
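
The step of working backwards from the top row is not shown above, so here is one way it
might look. This is a self-contained sketch in plain Java, not the code from this blog's
repository: it takes the energy as a parameter instead of using globals, and the names
SeamSearch, seamFitness, and minimalSeam are chosen for illustration.

```java
final class SeamSearch {
    // fitness[x][y] = least total energy of a seam from row 0 to pixel (x, y).
    static float[][] seamFitness(float[][] energy) {
        int width = energy.length, height = energy[0].length;
        float[][] fitness = new float[width][height];
        for (int x = 0; x < width; x++) fitness[x][0] = energy[x][0];
        for (int y = 1; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float best = fitness[x][y - 1];
                if (x > 0) best = Math.min(best, fitness[x - 1][y - 1]);
                if (x < width - 1) best = Math.min(best, fitness[x + 1][y - 1]);
                fitness[x][y] = energy[x][y] + best;
            }
        }
        return fitness;
    }

    // Returns seam[y] = column of a minimal seam in row y, found by starting at
    // the smallest entry of the last row and repeatedly stepping to the
    // cheapest of the (up to three) predecessors in the row before.
    static int[] minimalSeam(float[][] fitness) {
        int width = fitness.length, height = fitness[0].length;
        int[] seam = new int[height];
        int best = 0;
        for (int x = 1; x < width; x++) {
            if (fitness[x][height - 1] < fitness[best][height - 1]) best = x;
        }
        seam[height - 1] = best;
        for (int y = height - 1; y > 0; y--) {
            int x = seam[y], prev = x;
            if (x > 0 && fitness[x - 1][y - 1] < fitness[prev][y - 1]) prev = x - 1;
            if (x < width - 1 && fitness[x + 1][y - 1] < fitness[prev][y - 1]) prev = x + 1;
            seam[y - 1] = prev;
        }
        return seam;
    }
}
```

Ties are broken arbitrarily here (the middle predecessor wins, then the left one), which is
enough for a sketch since any minimal seam will do.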

To actually remove a seam, we need to create a new image of the right size, and shift the
pixels to the right (or left) of the image into place. The details are technically important, but
tedious to describe fully. So we leave the inspection of the code as an exercise to the reader.
We provide the Processing code on this blog's GitHub page, and show an example of its use
below. Note that the image resizes each time the user clicks within it.
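
For readers who want the gist without opening the repository, the removal step can be
sketched as follows. This is an illustrative plain-Java version with hypothetical names,
assuming the image is stored as pixels[x][y] and the seam as seam[y] (the column to drop in
row y). It builds a new image one column narrower and shifts the pixels to the right of the
seam one step to the left.

```java
final class SeamRemoval {
    // pixels[x][y] holds the image; seam[y] is the column to drop in row y.
    static int[][] removeVerticalSeam(int[][] pixels, int[] seam) {
        int width = pixels.length, height = pixels[0].length;
        int[][] result = new int[width - 1][height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width - 1; x++) {
                // Pixels left of the seam stay put; the rest shift left by one.
                result[x][y] = pixels[x < seam[y] ? x : x + 1][y];
            }
        }
        return result;
    }
}
```

After each removal the gradient and the seam fitness have to be recomputed on the smaller
image before the next seam is chosen.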

Photograph by Raphael Goetter.

It's interesting (and indeed the goal) to see how at first nothing is warped, and then the lines
on the walls curve around the woman's foot, and then finally the woman's body is distorted
before she gets smushed into a tiny box by the oppressive mouse.

As a quick side note, we attempted to provide an interactive version of this Processing
program online in the same way we did for the gradient computation example. Processing is
quite nice in that any Processing program (which doesn't use any fancy Java libraries) can
be cross-compiled to Javascript via the processing.js library. This is what we did for the
gradient example. But in doing so for the (admittedly inefficient and memory-leaky)
seam-carving program, it appeared to run an order of magnitude slower in the browser than
locally. This was this author's first time using Processing, so the reason for the drastic jump
in runtime is unclear. If any readers are familiar with processing.js, a clarification would be
very welcome in the comments.

Inserting Seams, Removing Objects, and Videos

In addition to removing seams to scale an image down, one can just as easily insert seams to
make an image larger. To insert a seam, just double each pixel in the seam and push the rest
of the pixels on the row to the right. The process is not hard, but it requires avoiding one
pitfall: if we just add a single seam at a time, then the seam with minimum total energy will
never change! So we'll just add the same seam over and over again. Instead, if we want to
add k seams, we should compute the k minimum seams and insert them all. If the desired
resize is too large, then the programmer should pick an appropriate batch size and add
seams in batches.
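
Here is a sketch of the insertion step in plain Java, under the assumption that the batch of
seams has already been computed and is passed in as seams[k][y] (the column of seam k in
row y). The names are hypothetical. Each row counts how many seams pass through each
column and emits that many extra copies of the pixel, so overlapping seams are handled as
well.

```java
final class SeamInsertion {
    // pixels[x][y] holds the image; seams[k][y] is the column of seam k in row y.
    static int[][] insertVerticalSeams(int[][] pixels, int[][] seams) {
        int width = pixels.length, height = pixels[0].length;
        int[][] result = new int[width + seams.length][height];
        for (int y = 0; y < height; y++) {
            int[] copies = new int[width];           // extra copies per column
            for (int[] seam : seams) copies[seam[y]]++;
            int out = 0;
            for (int x = 0; x < width; x++) {
                for (int c = 0; c <= copies[x]; c++) {
                    result[out++][y] = pixels[x][y]; // original plus duplicates
                }
            }
        }
        return result;
    }
}
```

Every row receives exactly one extra pixel per seam, so the result stays rectangular.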

Another nice technique that comes from the seam-carving algorithm is to intelligently
protect or destroy specific regions in the image. To do this requires a minor modification of
the gradient computation, but the rest of the algorithm is identical. To protect a region,
provide some way of user input specifying which pixels in the image are important, and
give those pixels an artificially large gradient value (e.g., the maximum value of an integer).
If the down-scaling is not too extreme, the seam computations will be guaranteed not to use
any of those pixels, and inserted seams will never repeat those pixels. To remove a region,
we just give the desired pixels an arbitrarily low gradient value. Then these pixels will be
guaranteed to occur in the minimal seams, and will be removed from the picture.
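
The modification to the gradient might be sketched like this, in plain Java with hypothetical
names. The mask format (1 to protect, -1 to remove, 0 to leave alone) is an assumption of the
sketch. Instead of the maximum integer value, it uses a large finite double so that summing
many such entries along a seam cannot overflow.

```java
final class EnergyMask {
    static final double PROTECT = 1e12;   // assumed "artificially large" energy
    static final double REMOVE = -1e12;   // assumed "arbitrarily low" energy

    // mask[x][y]: 0 = leave alone, 1 = protect, -1 = mark for removal.
    static double[][] applyMask(double[][] energy, int[][] mask) {
        int width = energy.length, height = energy[0].length;
        double[][] result = new double[width][height];
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                if (mask[x][y] > 0) {
                    result[x][y] = PROTECT;
                } else if (mask[x][y] < 0) {
                    result[x][y] = REMOVE;
                } else {
                    result[x][y] = energy[x][y];
                }
            }
        }
        return result;
    }
}
```

The masked energy is then fed to the seam computation in place of the plain gradient.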

The technique of seam-carving is a very nice tool, and as we just saw it can be extended to a
variety of other techniques. In fact, seam-carving and its applications to object removal and
image resizing are implemented in all of the recent versions of Photoshop. The techniques
are used to adapt applications to environments with limited screen space, such as a mobile
phone or tablet. Seam carving can even be adapted for use in videos. This involves an
extension of the dynamic program to work across multiple frames, formally finding a
minimal graph cut between two frames so that each piece of the cut is a seam in the
corresponding frame. Of course there is a lot more detail to it (and the paper linked above
uses this detail to improve the basic image-resizing algorithm), but that's the rough idea.

We've done precious little on this blog with images, but we'd like to get more into graphics
programming. There's a wealth of linear algebra, computational geometry, and artificial
intelligence hiding behind most of the computer games we like to play, and it would be fun
to dive deeper into these topics. Of course, with every new post this author suggests ten new
directions for this blog to go. It's a curse and a blessing.

Until next time!

This entry was posted in Algorithms, Discrete, Optimization and tagged calculus, graphics,
javascript, mathematics, photoshop, processing, programming, seam carving.

11 thoughts on Seam Carving for Content-Aware Image Scaling

1. k

March 5, 2013 at 2:13 am

Note: GIMP too has had this with the resynthesizer (liquid rescale) plugin for some time.


2. jakeanq

March 5, 2013 at 5:24 am

I would love to see some of these concepts integrated into some form of game. As
mentioned, games use so much math that to see the process of game design from more of a
mathematical angle, instead of the all-programming approach employed in other tutorials
and blogs, would be very interesting.


o j2kun

March 5, 2013 at 9:14 am

Games do use a lot of math, and it's usually in the form of vector calculus to
emulate physics. The most sophisticated math usually goes into the graphics engine
itself: shading, lighting, texturing, etc. all require a ton of linear algebra, and things
like particle flow (for water, wind, flowing cloth) require differential equations
which are discretized and attacked with linear algebra.

I would be really interested to get into these sorts of things, but to be honest I've
never done any sort of graphics programming outside of basic 2d games. So I'll just
have to do my research into OpenGL.


3. Rafael Carrascosa

March 5, 2013 at 7:52 am

Hi, nice article, interesting idea.

I have one question and one request:
- Why dynamic programming and not Dijkstra's graph search? Or more advanced stuff like
A* and other informed searches?
- When inserting seams you suggest picking up the k minimal; afaik this is a non-trivial task.
Could you do a post sometime on the k-shortest-path algorithms?

Thank you!


o j2kun

March 5, 2013 at 9:01 am

Finding the k shortest *seams* in the seam carving example is not hard: just find
the k smallest entries of the bottom row of the seam fitness array, and compute the
seams starting at those positions. Since we don't want to remove two seams with the
same starting position, we don't care if the two shortest paths overlap at their base.
The problem is when those seams overlap even when they have different starting
points (and the overlap can't be avoided). That's why I remove the seams one by
one in the above code, and a priori it seems this problem would occur in any search
algorithm. I'm honestly not sure how this is overcome in practice, but at least for
adding seams it could simply be ignored.

AFAIK Dijkstra's algorithm is used for this problem as well as dynamic
programming, and the complexity is only off by a (probably small) constant factor. I
don't think A* search would be as useful since we have multiple starting points
(though there are probably variants of A* that account for this, I think dynamic
programming is a simpler solution).

4. Andy Bennett

March 5, 2013 at 9:25 am

Nice explanation: thanks!

Have you tried swapping the order of your for loops? That will make it more cache-friendly
and may speed it up by a few orders of magnitude! At the moment you're indexing your
array in such a way that it gives you bad locality, as you're sequentially plucking elements
out from each column (in which you're storing the x values) rather than running along the
rows of the array (where you're storing y).

I've been having trouble with the K-smallest seams as well. I can see how you can get the
smallest, but I can't see how you avoid the complexity exploding again on the way back
down the image.
Consider this matrix:


The two smallest seams land in column 7. How do you get the algorithm to select the right
hand path? Using min on the way down will always cause this path to be missed.

Moreover, one of the replies in the comments mentions that the seams are removed
one-by-one. Surely you have to recompute the weights and paths each time you remove a
seam, otherwise you are prone to minute distortions? Do you shift the pixels hard-right,
hard-left, or do you perform some other kind of row-averaging when you remove a pixel
from an arbitrary position in a row?

Thanks for an interesting article. I wasn't previously aware of this particular method!

PS: your comments box doesn't resize in a friendly way when one pastes a bunch of lines
in.



o j2kun

March 5, 2013 at 9:44 am

It's been a long time since I thought about locality. I'll give that a try.

I think the matrix you gave is the matrix for gradient values (because the matrix for
seam fitness values is monotonically increasing). Before we try to compute seams,
we transform the gradient matrix into the seam fitness matrix, and from there
finding the minimal paths is much easier. In any event, we certainly don't use min
on the gradient matrix: that would give non-optimal paths.

Yes, I do recompute everything after each seam removal. I shift pixels hard-right,
and actually construct a new image of the right size to store the shifted image.


5. Vince P.

March 15, 2013 at 2:00 pm

Hey, where are your permalinks? Dufuses like me like to add your articles to Pocket or the
like and currently I have to go to your comments page just to get a URL I can add.

Just a thought.


o j2kun

March 15, 2013 at 5:20 pm

They're at the bottom of each post. And I believe the permalinks aren't any
different from the regular URLs.


Vince P.

March 30, 2013 at 11:28 am

Well, if you put them at the top of the post, then dufuses like me will never
ask that question again. Just a thought.


6. Severyn Kozak

January 18, 2015 at 1:26 pm

Fantastic post. Sums up seam carving really well from start to finish, provides working code
examples, *and* contains a formal mathematical angle.

The Cellular Automaton Method for Cave Generation

Posted on July 29, 2012 by j2kun

Dear reader, this post has an interactive simulation! We encourage you to play with it as
you read the article below.

In our series of posts on cellular automata, we explored Conway's classic Game of Life and
discovered some interesting patterns therein. And then in our primers on computing theory,
we built up a theoretical foundation for similar kinds of machines, including a discussion
of Turing machines and the various computational complexity classes surrounding them.
But cellular automata served us pretty exclusively as a toy. It was a basic model of
computation, which we were interested in only for its theoretical universality. One wouldn't
expect too many immediately practical (and efficient) applications of something which
needs a ridiculous scale to perform basic logic. In fact, it's amazing that there are as many
as there are.

In this post we'll look at one particular application of cellular automata to procedural level
generation in games.

An example of a non-randomly generated cave level from Bethesda's The Elder Scrolls

The Need for More Caves

Level design in video games is a time-consuming and difficult task. It's extremely difficult
for humans to hand-craft areas that both look natural and are simultaneously fun to play in.
This is particularly true of the multitude of contemporary role-playing games modeled
after Dungeons and Dragons, in which players move through a series of areas defeating
enemies, collecting items, and developing their character. With a high demand for such
games and so many levels in each game, it would save an unfathomable amount of money
to have computers generate the levels on the fly. Perhaps more importantly, a game with
randomly generated levels inherently has a much higher replay value.

The idea of randomized content generation (often called procedural generation) is not
particularly new. It has been around at least since the 1980s. Back then, computers simply
didn't have enough space to store large, complex levels in memory. To circumvent this
problem, video game designers simply generated the world as the player moved through it.
This opened up an infinitude of possible worlds for the user to play in, and the seminal
example of this is a game called Rogue, which has since inspired series such
as Diablo, Dwarf Fortress, and many many others. The techniques used to design these
levels have since been refined and expanded into a toolbox of techniques which have
become ubiquitous in computer graphics and game development.

We'll explore more of these techniques in the future, but for now we'll see how a cellular
automaton can be used to procedurally generate two-dimensional cave-like maps.

A Quick Review of Cellular Automata

While the interested reader can read more about cellular automata on this blog, we will give
a quick refresher here.

For our purposes here, a 2-dimensional cellular automaton is a grid of cells $G$, where each
cell $c \in G$ is in one of a fixed number of states, and has a pre-determined and fixed set of
neighbors. Then $G$ is updated by applying a fixed rule to each cell simultaneously, and the
process is repeated until something interesting happens or boredom strikes the observer.
The most common kind of cellular automaton, called a Life-like automaton, has only two
states, dead and alive (for us, 0 and 1), and the rule applied to each cell is given as
conditions to be born or survive based on the number of adjacent live cells. This is often
denoted Bx/Sy where x and y are lists of single digit numbers. Furthermore, the choice of
neighborhood is the eight nearest cells (i.e., including the diagonally-adjacent ones). For
instance, B3/S23 is the cellular automaton rule where a cell is born if it has three living
neighbors, and it survives if it has either two or three living neighbors, and dies otherwise.
This rule is Conway's original Game of Life, and the name Life-like reflects that these
automata are modest generalizations of it. We give an example of a B3/S23 cellular automaton
initialized by a finite grid of randomly populated cells below. Note that each of the black
(live) cells in the resulting stationary objects satisfies the S23 part of the rule, but none of the
neighboring white (dead) cells satisfy the B3 condition.
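
To pin down the Bx/Sy notation, here is a minimal sketch of one update step in plain Java.
It is not the Mathematica or Javascript code from this blog. The names LifeLike, counts,
liveNeighbors, and step are hypothetical, and cells outside the finite grid are assumed to be
dead.

```java
final class LifeLike {
    // Parses the digits after marker ('B' or 'S') in a rule such as "B3/S23"
    // into a lookup table indexed by live-neighbor count (0 through 8).
    static boolean[] counts(String rule, char marker) {
        boolean[] table = new boolean[9];
        int start = rule.indexOf(marker) + 1;
        for (int i = start; i < rule.length() && Character.isDigit(rule.charAt(i)); i++) {
            table[rule.charAt(i) - '0'] = true;
        }
        return table;
    }

    static int liveNeighbors(boolean[][] grid, int row, int col) {
        int count = 0;
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                if (dr == 0 && dc == 0) continue;
                int r = row + dr, c = col + dc;
                // Cells outside the finite grid count as dead (an assumption).
                if (r >= 0 && r < grid.length && c >= 0 && c < grid[0].length && grid[r][c]) {
                    count++;
                }
            }
        }
        return count;
    }

    // Applies the rule to every cell simultaneously and returns the new grid.
    static boolean[][] step(boolean[][] grid, String rule) {
        boolean[] born = counts(rule, 'B');
        boolean[] survive = counts(rule, 'S');
        boolean[][] next = new boolean[grid.length][grid[0].length];
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[0].length; c++) {
                int n = liveNeighbors(grid, r, c);
                next[r][c] = grid[r][c] ? survive[n] : born[n];
            }
        }
        return next;
    }
}
```

Writing into a fresh grid, rather than updating in place, is what makes the rule apply to
every cell simultaneously.
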

A cellular automaton should really be defined for an arbitrary graph (or more generally, an
arbitrary state space). There is really nothing special about a grid other than that it's easy to
visualize. Indeed, some cellular automata are designed for hexagonal grids, others are
embedded on a torus, and still others are one- or three-dimensional. Of course, nothing
stops automata from existing in arbitrary dimension, or from operating with arbitrary (albeit
deterministic) rules, but to avoid pedantry we won't delve into a general definition here. It
would take us into a discussion of discrete dynamical systems (of which there are many,
often with interesting pictures).

It All Boils Down to a Simple Rule

Now the particular cellular automaton we will use for cave generation is simply
B678/S345678, applied to a random initial grid with a fixed live border. We interpret the
live cells as walls, and the dead cells as open space. This rule should intuitively work: walls
will stay walls even if more cells are born nearby, but isolated or near-isolated cells will
often be removed. In other words, this cellular automaton should smooth out a grid
arrangement to some extent. Here is an example animation quickly sketched up in
Mathematica to witness the automaton in action:

An example cave generated via the automaton rule B678/S345678. The black cells are
alive, and the white cells are dead.
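
For readers without Mathematica, the same experiment can be sketched in plain Java. This
is an illustration under stated assumptions rather than the code behind the animation: the
names and the seeded java.util.Random are choices made for this sketch. Under
B678/S345678 a live cell survives with three or more live neighbors, and a dead cell is born
with six or more.

```java
import java.util.Random;

final class CaveSketch {
    // Random initial grid whose border cells are always live (walls).
    static boolean[][] randomGrid(int rows, int cols, double liveFraction, long seed) {
        Random random = new Random(seed);
        boolean[][] grid = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                boolean border = r == 0 || c == 0 || r == rows - 1 || c == cols - 1;
                grid[r][c] = border || random.nextDouble() < liveFraction;
            }
        }
        return grid;
    }

    // One step of B678/S345678 on the interior; the border is kept live.
    static boolean[][] step(boolean[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        boolean[][] next = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (r == 0 || c == 0 || r == rows - 1 || c == cols - 1) {
                    next[r][c] = true;
                    continue;
                }
                int n = 0;
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        if ((dr != 0 || dc != 0) && grid[r + dr][c + dc]) n++;
                    }
                }
                next[r][c] = grid[r][c] ? n >= 3 : n >= 6;
            }
        }
        return next;
    }

    static boolean[][] generate(int rows, int cols, double liveFraction, int steps, long seed) {
        boolean[][] grid = randomGrid(rows, cols, liveFraction, seed);
        for (int i = 0; i < steps; i++) grid = step(grid);
        return grid;
    }
}
```

A call such as CaveSketch.generate(50, 50, 0.45, 20, 1L) produces a grid in which true
cells are walls and false cells are open space.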

As usual, the code to generate this animation (which is only a slight alteration to the code
used in our post on cellular automata) is available on this blog's GitHub page.

This map is already pretty great! It has a number of large open caverns, and they are
connected by relatively small passageways. With a bit of imagination, it looks absolutely
cave-like!
We should immediately note that there is no guarantee that the resulting regions of
whitespace will be connected. We got lucky with this animation, in that there are only two
disconnected components, and one is quite small. But in fact one can be left with multiple
large caves which have no connecting paths.

Furthermore, we should note the automaton's rapid convergence to a stable state. Unlike
Conway's Game of Life, in practice this automaton almost always converges within 15
steps, and this author has yet to see any oscillatory patterns. Indeed, they are unlikely to
exist because the survival rate is so high, and our initial grid has an even proportion of live
and dead cells. There is no overpopulation that causes cells to die off, so once a cell is born
it will always survive. The only cells that do not survive are those that begin isolated. In a
sense, B678/S345678 is designed to prune the sparse areas of the grid, and fill in the dense
areas by patching up holes.

We should also note that the initial proportion of cells which are alive has a strong effect on
the density of the resulting picture. For the animation we displayed above, we initially
chose that 45% of the cells would be live. If we increase that by a mere five percentage
points, to 50%, we get a picture like the following.

A cave generated with the initial proportion of live cells equal to 0.5

As expected, there are many more disconnected caverns. Some game designers prefer a
denser grid combined with heuristic methods to connect the caverns. Since our goal is just
to explore the mathematical ideas, we will leave this as a parameter in our final program.

Javascript Implementation, and Greater Resolution

One important thing to note is that B678/S345678 doesn't scale well to fine grid sizes. For
instance, if we increase the grid size to 200×200, we get something resembling an
awkward camouflage pattern.

A 200×200 grid cave generation. Click the image to enlarge it.

What we really want is a way to achieve the major features of the low-resolution image on a
larger grid. Since cellular automata are inherently local manipulations, we should not
expect any modification of B678/S345678 to do this for us. Instead, we will use
B678/S345678 to create a low-resolution image, increase its resolution manually, and smooth
it out with (you guessed it) another cellular automaton! We'll design this automaton
specifically for the purpose of smoothing out corners.

To increase the resolution, we may simply divide the cells into four pieces. The picture
doesn't change, but the total number of cells increases fourfold. There are a few ways to do
this programmatically, but the way we chose simply uses the smallest resolution possible,
and simulates higher resolution by doing block computations. The interested programmer
can view our Javascript program available on this blog's GitHub page to see this directly (or
view the page source of this post's interactive simulator).
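
One of the other ways to do this is to build the finer grid explicitly. The sketch below, in
plain Java with hypothetical names, does that; note that it is not the block-computation
approach used in our Javascript program.

```java
final class Upscale {
    // Splits every cell into a 2x2 block of cells with the same state.
    static boolean[][] doubleResolution(boolean[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        boolean[][] fine = new boolean[2 * rows][2 * cols];
        for (int r = 0; r < 2 * rows; r++) {
            for (int c = 0; c < 2 * cols; c++) {
                // Integer division maps each fine cell back to its parent cell.
                fine[r][c] = grid[r / 2][c / 2];
            }
        }
        return fine;
    }
}
```

The explicit copy quadruples memory use with every application, which is one reason to
prefer simulating the higher resolution instead.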

To design a smoothing automaton, we should investigate more closely what we need to
improve on in the above examples. In particular, once we increase the resolution, we will
have a lot of undesirable convex and concave corners. Since a corner is simply a block
satisfying certain local properties, we can single those out to be removed by an automaton.
It's easy to see that convex corners have exactly 3 live neighbors, so we should not allow
those cells to survive. Similarly, the white cell just outside a concave corner has 5 live
neighbors, so we should allow that cell to be born. On the other hand, we still want the
major properties of our old B678/S345678 to still apply, so we can simply add 5 to the B
part and remove 3 from the S part. Lastly, for empirical reasons, we also decide to kill off
cells with 4 live neighbors.

And so our final smoothing automaton is simply B5678/S5678.
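
The two neighbor counts are easy to verify by machine. Since the B and S lists of
B5678/S5678 coincide, a cell is live after one step exactly when it has at least five live
neighbors, whatever its current state. The following plain-Java sketch uses hypothetical
names and assumes that cells outside the grid count as live, to match a fixed live border.

```java
final class Smoothing {
    static int liveNeighbors(boolean[][] grid, int row, int col) {
        int count = 0;
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                if (dr == 0 && dc == 0) continue;
                int r = row + dr, c = col + dc;
                boolean inside = r >= 0 && r < grid.length && c >= 0 && c < grid[0].length;
                // Outside the grid counts as wall (an assumption of this sketch).
                if (!inside || grid[r][c]) count++;
            }
        }
        return count;
    }

    // One step of B5678/S5678: live afterwards iff at least 5 live neighbors.
    static boolean[][] smooth(boolean[][] grid) {
        boolean[][] next = new boolean[grid.length][grid[0].length];
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[0].length; c++) {
                next[r][c] = liveNeighbors(grid, r, c) >= 5;
            }
        }
        return next;
    }
}
```

On a solid 3-by-3 block of live cells, the corner cells have three live neighbors and die,
while the dead cell tucked into the elbow of an L-shaped wall has five and is born.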

We present this application as an interactive Javascript program. Some basic instructions:

- The "Apply B678/S345678" button does what you'd expect: it applies B678/S345678 to the
  currently displayed grid. It iterates the automaton 20 times in an animation.
- The "Apply B5678/S5678" button applies the smoothing automaton, but it does so only
  once, allowing the user to control the degree of smoothing at the specific resolution level.
- The "Increase Resolution" button splits each cell into four, and may be applied until the
  cell size is down to a single pixel.
- The "Reset" button resets the entire application, creating a new random grid.

We used this program to generate a few interesting looking pictures by varying the order in
which we pressed the various buttons (it sounds silly, but it's an exploration!). First, a nice
cave:
An example of a higher resolution cave created with our program. In order to achieve
similar results, first apply B678/S345678, and then alternate increasing the resolution and
applying B5678/S5678 1-3 times.

We note that this is not perfect. There are some obvious and awkward geometric artifacts
lingering in this map, mostly in the form of awkwardly straight diagonal lines and
awkwardly flawless circles. Perhaps one might imagine the circles are the bases of
stalactites or stalagmites. But on the whole, in terms of keeping the major features of the
original automaton present while smoothing out corners, this author thinks B5678/S5678
has done a phenomenal job. Further to the cellular automaton's defense, when the local
properties are applied uniformly across the entire grid, such regularities are bound to occur.
That's just another statement of the non-chaotic nature of B5678/S5678 (in stark contrast to
Conway's Game of Life).

There are various modifications one could perform (or choose not to, depending on the type
of game) to make the result more accessible for the player. For instance, one could remove
all regions which fit inside a sufficiently small circle, or add connections between the
disconnected components at some level of resolution. This would require some sort of
connected-component labeling, which is a nontrivial task; current research goes into
optimizing connected-component algorithms for large-scale grids. We plan to cover such
topics on this blog in the future.
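
To give a flavor of what connected-component labeling involves, here is a simple
breadth-first flood fill in plain Java. It is a baseline sketch with hypothetical names, not one
of the optimized algorithms alluded to above, and it assumes that open cells are connected
only through their four horizontal and vertical neighbors.

```java
import java.util.ArrayDeque;

final class CaveRegions {
    // Labels each open (dead) cell with the index of its 4-connected region,
    // starting at 1; wall (live) cells keep the label 0.
    static int[][] label(boolean[][] wall) {
        int rows = wall.length, cols = wall[0].length;
        int[][] labels = new int[rows][cols];
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        int next = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (wall[r][c] || labels[r][c] != 0) continue;
                next++;                                   // found a new region
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                labels[r][c] = next;
                queue.add(new int[]{r, c});
                while (!queue.isEmpty()) {
                    int[] cell = queue.remove();
                    for (int[] m : moves) {
                        int nr = cell[0] + m[0], nc = cell[1] + m[1];
                        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                        if (wall[nr][nc] || labels[nr][nc] != 0) continue;
                        labels[nr][nc] = next;
                        queue.add(new int[]{nr, nc});
                    }
                }
            }
        }
        return labels;
    }

    // Number of disconnected open regions in the map.
    static int countRegions(boolean[][] wall) {
        int max = 0;
        for (int[] row : label(wall)) {
            for (int v : row) max = Math.max(max, v);
        }
        return max;
    }
}
```

With the labels in hand, small regions can be filled in by size, and the remaining ones are
the candidates for connecting passageways.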

Another example of a cool picture we created with this application might be considered a
more retro style of cave.

Apply B678/S345678 once, and increase the resolution as much as possible before applying
B5678/S5678 as many times as desired.

We encourage the reader to play around with the program to see what other sorts of
creations one can make. As of the time of this writing, changing the initial proportion of
live cells (50%) or changing the automaton rules cannot be done in the browser; it requires
one to modify the source code. We may implement the ability to control these in the
browser given popular demand, but (of course) it would be a wonderful exercise for the
intermediate Javascript programmer.

Caves in Three Dimensions

It's clear that this same method can be extended to a three-dimensional model for
generating caverns in a game like Minecraft. While we haven't personally experimented
with three-dimensional cellular automata here on this blog, it's far from a new idea. Once
we reach graphics programming on this blog (think: distant future) we plan to revisit the
topic and see what we can do.

Until then!
This entry was posted in Algorithms, Discrete and tagged cellular automata, javascript,
mathematica, procedural generation, programming, video games.

24 thoughts on The Cellular Automaton Method for Cave Generation

1. paxinum

July 30, 2012 at 8:57 am

Really nice post! I will try it out in Mathematica.


2. mortoray

August 1, 2012 at 3:35 pm

What's awkward about circles? They exist in nature, especially in caves (things dripping to
create round stalagmites and eventually form columns). I'm actually upset that in video
games the open chambers are always empty, never having columns or protrusions.


o j2kun

August 1, 2012 at 4:22 pm

Perhaps I should clarify. It's not that I don't like circles, or that they don't belong in
caves. The problem is their regularity in this particular model. The picture I gave
above was lucky in that it did not have too many circles, but if you experiment with
the interactive simulation you'll notice that they show up frequently, and in every
single run. These perfect circles are simply a persistent side effect when we'd rather
they be a randomly occurring feature. I suppose it's largely a matter of perspective,
but at least it gives some insight into the nature of the automaton.


August 2, 2012 at 12:32 am

Is this where a non-binary cell value could help?



August 4, 2012 at 6:36 pm

You could. Now we're getting into the realm of more general discrete
dynamical systems (which I know absolutely nothing about). I don't quite
have the intuition to design such a system.


o Andrew van der Westhuizen

November 29, 2013 at 1:50 am

That is a great idea, thinking I might look into generating stalagmites or tites
using the basic code I used to generate waterfalls in this
cave. http://www.avanderw.co.za/making-a-cave-like-structure-with-worms/

Basic idea being create half-waterfalls and reverse-waterfalls and call them
stalagmites / stalactites. Will put aside some time to see what results.


3. Kris

August 5, 2012 at 2:31 pm Reply

As an aside, the Cahn-Hilliard equation describes phase separation from a mixture, and
results in a type of tortuous distribution of caves like your CA arrives at. Although I don't
think there is a guarantee that there will be a connected path from here to there.


This can be solved in 2D or 3D using Pythonic tools via FiPy.



o j2kun

August 5, 2012 at 3:24 pm Reply

Very nice! I never knew about this.


4. codiecollinge

August 11, 2012 at 7:59 pm Reply

Great post. Did you ever think about this being used for procedural textures as well as map
generation? I feel that B678/S345678 when straight scaled up would be a nice function to
start off with for a procedural texture; the smoothing would also come in handy.
Once again, thanks for the post. I'm reading it in the early hours and easily understood it!
Although I think I may read your other posts on cellular automata; they seem very interesting.


o j2kun

August 11, 2012 at 11:57 pm Reply

I've been meaning to derive and implement Perlin noise for a while on this blog,
and use it to do cool textures. Alas, work and research must come first (and I'm a
bit of a newbie to graphics, despite my extensive experience in both linear algebra
and C++). So textures are definitely on my list, and I'll keep your comment in mind.


5. Sascha

October 7, 2012 at 3:50 am Reply

It's the same way I did it: http://www.coffee2code.blogspot.com (check out the
ridiculously big screens).


6. xot

October 23, 2012 at 6:36 pm Reply

Nice to see an article about this. I posted a very brief suggestion in a CA topic in my forums
that Gérard Vichniac's Vote CA could be used for cave generation. It's very similar to
these, but has a couple of interesting features. It's quite dynamic looking as it runs, and the
longer it runs, the more homogeneous it gets. Rather than scaling and smoothing, you just
let it keep running until the features are the size you desire. It does mean a good deal more
computation, but it also results in structures with fewer lattice artifacts.

Gérard Vichniac's Vote CA rules: B4678/S4678

An image showing the development of a Vote CA pattern:


o xot

October 24, 2012 at 3:56 pm Reply

Whoops, I messed up my notation. What I was shooting for was the Vote variant
called Anneal or Vote 4/5.



7. YetAnotherPortfolio (@yetanotherportf)

October 26, 2012 at 2:59 pm Reply

Great article!
I have made a simple editor in javascript to play with cellular automata the way you
describe it here: http://www.yetanotherportfolio.fr/tmp/cellular/index.html
I first made it for myself, to better understand the automaton, so the way it works may be
a bit obscure. Let me know what you think.


o j2kun

October 26, 2012 at 5:22 pm Reply

Very fun. Did you find anything interesting in your playing?


YetAnotherPortfolio (@yetanotherportf)

October 27, 2012 at 10:26 pm

Applying a b/s345678 one time at the end removed all lone cells. It's pretty
useful to clean the map.
I'm also trying different sets of rules based on your article and what I found in
the comments to see how I can control the overall shapes of the blobs.
I'm adding a blob recognition system, to be able to add doors (or bridges)
between the blobs, but it's not quite finished.


8. gekkostate

January 10, 2013 at 8:58 pm Reply

Amazing post! I would like to try something like this in Java. I was hoping you could point
me in the right direction? What are the first steps I should take in learning this?


o j2kun

January 10, 2013 at 9:10 pm Reply

If you already know some Java, check out the JFrame GUI library. That will get you
started on drawing things. The javascript code I used on the demo page might help
you out with the logic.

9. rusyninventions

January 24, 2014 at 9:05 am Reply

I know that this post was written quite some time ago now, but I want to say how much I
love it. Several months ago, it really pointed me in the right direction for an idea I was
tampering with. I started blogging about the experience which is chiefly based on this
article. At present, there is only the first part of the series, but I have already extracted the
demonstration in this first article to be used in a 3D environment with sprawling, randomly
generated terrain to be featured in the followup articles.


o j2kun

January 24, 2014 at 9:20 am Reply

Very impressive! I've been getting contacted a lot about doing 3D versions of this
idea. I know for a fact it's possible, I just lack enough 3D-graphics knowledge and
time to do it. I'm interested to see your game as it progresses.



January 24, 2014 at 9:29 am

I suppose I should clarify a bit. I didn't take a true 3D approach as you
mention in this post. I used this same idea but in a 3D world so that it
creates walls. I have not yet blogged about it, but here is the demo video I
recorded last night with its current



January 24, 2014 at 9:40 am

Still very cool. I think I may have to do my 3D experiments using Unity like
you are, but again time is so scarce.

10. Dave S.

February 5, 2014 at 10:23 am Reply

rustyinventions: I liked your 3D world YouTube video.

j2kun: Many years ago I played around with Al Hensel's Life program (V1.06) and, like
you, came up with some interesting rules for cave/maze generation. Once I have deciphered
my scribblings I'll put something up on my web site.

Here is an example (after smoothing): https://www.mediafire.com/convkey/6e68/psnouz8r58ngk066g.jpg


11. Dave S.

February 6, 2014 at 3:43 pm Reply

rusyninventions: I read your name wrong, sorry about that. Anyway, I have
looked at my notes and experimented a little. Will update my website hopefully
sometime next week.

Bezier Curves and Picasso

Posted on May 11, 2013 by j2kun
Pablo Picasso in front of The Kitchen, photo by Herbert List.

Simplicity and the Artist

Some of my favorite of Pablo Picasso's works are his line drawings. He did a number of
them about animals: an owl, a camel, a butterfly, etc. This piece called "Dog" is on my

(Jump to interactive demo where we recreate Dog using the math in this post)

These paintings are extremely simple but somehow strike the viewer as deeply profound.
They give the impression of being quite simple to design and draw. A single stroke of the
hand and a scribbled signature, but what a masterpiece! It simultaneously feels like a hasty
afterthought and a carefully tuned overture to a symphony of elegance. In fact, we know
that Picasso's process was deep. For example, in 1945-1946, Picasso made a series of
eleven drawings (lithographs, actually) showing the progression of his rendition of a bull.
The first few are more or less lifelike, but as the series progresses we see the bull boiled
down to its essence, the final painting requiring a mere ten lines. Along the way we see
drawings of a bull that resemble some of Picasso's other works (number 9 reminding me of
the sculpture at Daley Center Plaza in Chicago). Read more about the series of lithographs

Picasso's "The Bull." Photo taken by Jeremy Kun at the Art Institute of Chicago in 2013.
Click to enlarge.

Now I don't pretend to be a qualified artist (I couldn't draw a bull to save my life), but I can
recognize the mathematical aspects of his paintings, and I can write a damn fine program.
There is one obvious way to consider Picasso-style line drawings as a mathematical object,
and it is essentially the Bezier curve. Let's study the theory behind Bezier curves, and then
write a program to draw them. The mathematics involved requires no background
knowledge beyond basic algebra with polynomials, and we'll do our best to keep the
discussion low-tech. Then we'll explore a very simple algorithm for drawing Bezier curves,
implement it in Javascript, and recreate one of Picasso's line drawings as a sequence of
Bezier curves.

The Bezier Curve and Parameterizations

When asked to conjure a curve most people (perhaps plagued by their elementary
mathematics education) will either convulse in fear or draw part of the graph of a
polynomial. While these are fine and dandy curves, they only represent a small fraction of
the world of curves. We are particularly interested in curves which are not part of the graphs
of any functions.

Three French curves.

For instance, a French curve is a physical template used in (manual) sketching to aid the
hand in drawing smooth curves. Tracing the edges of any part of these curves will usually
give you something that is not the graph of a function. It's obvious that we need to
generalize our idea of what a curve is a bit. The problem is that many fields of mathematics
define a curve to mean different things. The curves we'll be looking at, called Bezier
curves, are a special case of single-parameter polynomial plane curves. This sounds like a
mouthful, but what it means is that the entire curve can be evaluated with two polynomials:
one for the $x$ values and one for the $y$ values. Both polynomials share the same variable,
which we'll call $t$, and $t$ is evaluated at real numbers.

An example should make this clear. Let's pick two simple polynomials in $t$,
say $x(t) = t^2$ and $y(t) = t^3$. If we want to find points on this curve, we can just choose
values of $t$ and plug them into both equations. For instance, plugging in $t = 2$ gives the
point $(4, 8)$ on our curve. Plotting all such values gives a curve that is definitely not the
graph of a function:

But it's clear that we can write any single-variable function $f(x)$ in this parametric form:
just choose $y(t) = f(t)$ and $x(t) = t$. So these are really more general objects than regular
old functions (although we'll only be working with polynomials in this post).
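
To make "plug values of $t$ into both equations" concrete, here is a minimal sketch. The helper names `pointAt` and `samplePoints` are ours, and the polynomials $x(t) = t^2$, $y(t) = t^3$ are just an illustrative choice for this sketch.

```javascript
// Evaluate a parametric plane curve given its two coordinate functions.
function pointAt(x, y, t) {
   return [x(t), y(t)];
}

// Illustrative choice of polynomials (an assumption of this sketch).
var x = function(t) { return t * t; };
var y = function(t) { return t * t * t; };

// Sample the curve at evenly spaced parameter values in [tMin, tMax].
function samplePoints(x, y, tMin, tMax, count) {
   var points = [];
   for (var i = 0; i < count; i++) {
      var t = tMin + (tMax - tMin) * i / (count - 1);
      points.push(pointAt(x, y, t));
   }
   return points;
}

// Samples at t = -2, -1, 0, 1, 2.
var samples = samplePoints(x, y, -2, 2, 5);
```

Two of the samples share the same $x$ value but have different $y$ values, which is exactly what the graph of a function cannot do.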

Quickly recapping, a single-parameter polynomial plane curve is defined as a pair of
polynomials $x(t), y(t)$ in the same variable $t$. Sometimes, if we want to express the whole
gadget in one piece, we can take the coefficients of common powers of $t$ and write them
as vectors in the $x$ and $y$ parts. Using the $x(t) = t^2, y(t) = t^3$ example above, we can
rewrite it as

$\mathbf{f}(t) = (0,1) t^3 + (1,0) t^2$

Here the coefficients are points (which are the same as vectors) in the plane, and we
represent the function $\mathbf{f}$ in boldface to emphasize that the output is a point. The
linear-algebraist might recognize that pairs of polynomials form a vector space, and further
combine them as $(0, t^3) + (t^2, 0) = (t^2, t^3)$. But for us, thinking of points as coefficients
of a single polynomial is actually better.

We will also restrict our attention to single-parameter polynomial plane curves for which
the variable $t$ is allowed to range from zero to one. This might seem like an awkward
restriction, but in fact every finite single-parameter polynomial plane curve can be written
this way (we won't bother too much with the details of how this is done). For the purpose of
brevity, we will henceforth call a "single-parameter polynomial plane curve where $t$ ranges
from zero to one" simply a "curve."

Now there are some very nice things we can do with curves. For instance, given any two
points in the plane $P, Q$ we can describe the straight line between
them as a curve: $\mathbf{L}(t) = (1-t)P + tQ$. Indeed, at $t = 0$ the value $\mathbf{L}(t)$ is
exactly $P$, at $t = 1$ it's exactly $Q$, and the equation is a linear polynomial in $t$. Moreover
(without getting too much into the calculus details), the line $\mathbf{L}$ travels at "unit
speed" from $P$ to $Q$. In other words, we can think of $\mathbf{L}$ as describing the motion of a
particle from $P$ to $Q$ over time, and at time $1/4$ the particle is a quarter of the way there,
at time $1/2$ it's halfway, etc. (An example of a straight line which doesn't have unit speed
is, e.g. $(1-t^2)P + t^2 Q$.)
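
As a quick sketch of the straight-line curve $(1-t)P + tQ$, the following evaluates it at a few times. The name `lerp`, the helper `slowStart`, and the two sample points are our own choices for illustration.

```javascript
// Point on the straight line from p to q at time t: (1 - t) p + t q.
function lerp(p, q, t) {
   return [(1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1]];
}

var P = [0, 0], Q = [8, 4];      // illustrative endpoints
var start = lerp(P, Q, 0);       // the point P
var quarter = lerp(P, Q, 0.25);  // a quarter of the way from P to Q
var end = lerp(P, Q, 1);         // the point Q

// The same segment traversed unevenly: replace t by t^2.
function slowStart(p, q, t) {
   return lerp(p, q, t * t);
}
var lagging = slowStart(P, Q, 0.5); // only a quarter of the way at t = 1/2
```

Both functions trace the same segment; only the timing differs, which is the distinction the parenthetical example is making.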

More generally, let's add a third point $R$. We can describe a path which goes from $P$ to $R$,
and is "guided" by $Q$ in the middle. This idea of a "guiding" point is a bit abstract, but
computationally no more difficult. Instead of travelling from one point to another at
constant speed, we want to travel from one line to another at constant speed. That is, call the
two curves describing lines from $P \to Q$ and $Q \to R$ $\mathbf{L}_1, \mathbf{L}_2$, respectively.
Then the curve "guided" by $Q$ can be written as a curve

$\mathbf{F}(t) = (1-t) \mathbf{L}_1(t) + t \mathbf{L}_2(t)$

Multiplying this all out gives the formula

$\mathbf{F}(t) = (1-t)^2 P + 2t(1-t) Q + t^2 R$
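
One way to convince yourself that the "line between two moving lines" description and the multiplied-out quadratic $(1-t)^2 P + 2t(1-t) Q + t^2 R$ agree is to compute both. This is a small sketch with function names and sample points of our own choosing.

```javascript
function lerp(p, q, t) {
   return [(1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1]];
}

// Nested form: travel from the line P->Q to the line Q->R.
function quadraticNested(P, Q, R, t) {
   return lerp(lerp(P, Q, t), lerp(Q, R, t), t);
}

// Expanded form: (1-t)^2 P + 2t(1-t) Q + t^2 R.
function quadraticExpanded(P, Q, R, t) {
   var a = (1 - t) * (1 - t), b = 2 * t * (1 - t), c = t * t;
   return [a * P[0] + b * Q[0] + c * R[0], a * P[1] + b * Q[1] + c * R[1]];
}

var P = [0, 0], Q = [2, 4], R = [4, 0];         // illustrative points
var nested = quadraticNested(P, Q, R, 0.5);     // [2, 2]
var expanded = quadraticExpanded(P, Q, R, 0.5); // [2, 2]
```

Note that the curve passes through $(2, 2)$ at $t = 1/2$, not through the guiding point $(2, 4)$ itself.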

We can interpret this again in terms of a particle moving. At the beginning of our curve the
value of $t$ is small, and so we're sticking quite close to the line $\mathbf{L}_1$. As time goes
on the point $\mathbf{F}(t)$ moves along the line between the points $\mathbf{L}_1(t)$ and
$\mathbf{L}_2(t)$, which are themselves moving. This traces out a curve which looks like this

This screenshot was taken from a wonderful demo by data visualization consultant Jason
Davies. It expresses the mathematical idea quite superbly, and one can drag the three points
around to see how it changes the resulting curve. One should play with it for at least five
minutes.

The entire idea of a Bezier curve is a generalization of this principle: given a
list $P_0, \dots, P_n$ of points in the plane, we want to describe a curve which travels from the
first point to the last, and is "guided" in between by the remaining points. A Bezier curve is
a realization of such a curve (a single-parameter polynomial plane curve) which is the
inductive continuation of what we described above: we travel at unit speed from a Bezier
curve defined by the first $n$ points in the list to the curve defined by the
last $n$ points. The base case is the straight-line segment (or the single point, if you
wish). Formally,

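Definition: Given a list of points in the plane $P_0, P_1, \dots, P_n$ we define
the degree $n$ Bezier curve recursively as

$B_{P_0}(t) = P_0$
$B_{P_0 P_1 \dots P_n}(t) = (1-t) B_{P_0 P_1 \dots P_{n-1}}(t) + t B_{P_1 P_2 \dots P_n}(t)$

We call $P_0, \dots, P_n$ the control points of $B$.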
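
The recursive definition translates almost word for word into code. The sketch below (the name `bezierPoint` is ours) evaluates a Bezier curve of any degree by blending the curve on all but the last control point with the curve on all but the first. It recomputes shared sub-curves, so it is meant to mirror the definition rather than to be fast.

```javascript
// Evaluate a Bezier curve of any degree directly from the recursive definition.
function bezierPoint(points, t) {
   if (points.length === 1) {
      return points[0]; // base case: a single point
   }
   var first = bezierPoint(points.slice(0, points.length - 1), t);
   var last = bezierPoint(points.slice(1), t);
   return [(1 - t) * first[0] + t * last[0], (1 - t) * first[1] + t * last[1]];
}

var curve = [[1,2], [5,5], [4,0], [9,3]]; // four control points: a cubic
var start = bezierPoint(curve, 0);    // the first control point
var end = bezierPoint(curve, 1);      // the last control point
var middle = bezierPoint(curve, 0.5); // a point in the interior of the curve
```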

While the concept of travelling at unit speed between two lower-order Bezier curves is the
real heart of the matter (and allows us true computational insight), one can multiply all of
this out (using the formula for binomial coefficients) and get an explicit formula. It is:

$B(t) = \sum_{k=0}^{n} \binom{n}{k} (1-t)^{n-k} t^k P_k$

And for example, a cubic Bezier curve with control points $P_0, P_1, P_2, P_3$ would have

$B(t) = (1-t)^3 P_0 + 3(1-t)^2 t P_1 + 3(1-t) t^2 P_2 + t^3 P_3$
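
For comparison with the recursive view, here is a sketch of the explicit formula, in which control point $P_k$ of a degree $n$ curve is weighted by $\binom{n}{k} (1-t)^{n-k} t^k$. The function names `binomial` and `bezierExplicit` are ours.

```javascript
// Binomial coefficient "n choose k", computed iteratively.
function binomial(n, k) {
   var result = 1;
   for (var i = 1; i <= k; i++) {
      result = result * (n - k + i) / i;
   }
   return result;
}

// Explicit formula: the sum over k of C(n,k) (1-t)^(n-k) t^k P_k.
function bezierExplicit(points, t) {
   var n = points.length - 1;
   var x = 0, y = 0;
   for (var k = 0; k <= n; k++) {
      var weight = binomial(n, k) * Math.pow(1 - t, n - k) * Math.pow(t, k);
      x += weight * points[k][0];
      y += weight * points[k][1];
   }
   return [x, y];
}

var cubic = [[1,2], [5,5], [4,0], [9,3]];
var halfway = bezierExplicit(cubic, 0.5); // (P0 + 3 P1 + 3 P2 + P3) / 8
```

For a cubic the weights at $t = 1/2$ are $1/8, 3/8, 3/8, 1/8$, which is where the factors of 3 in the cubic formula show up.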

Higher degree Bezier curves can be quite complicated to picture geometrically. For
instance, the following is a fifth-degree Bezier curve (with six control points).

A degree five Bezier curve, credit Wikipedia.

The additional line segments drawn show the recursive nature of the curve. The simplest are
the green points, which travel from control point to control point. Then the blue points
travel on the line segments between green points, the pink travel along the line segments
between blue, the orange between pink, and finally the red point travels along the line
segment between the orange points.

Without the recursive structure of the problem (just seeing the curve) it would be a wonder
how one could actually compute with these things. But as we'll see, the algorithm for
drawing a Bezier curve is very natural.

Bezier Curves as Data, and de Casteljau's Algorithm

Let's derive and implement the algorithm for painting a Bezier curve to a screen using only
the ability to draw straight lines. For simplicity, we'll restrict our attention to degree-three
(cubic) Bezier curves. Indeed, every Bezier curve can be written as a combination of cubic
curves via the recursive definition, and in practice cubic curves balance computational
efficiency and expressiveness. All of the code we present in this post will be in Javascript,
and is available on this blog's Github page.

So then a cubic Bezier curve is represented in a program by a list of four points. For
example,
var curve = [[1,2], [5,5], [4,0], [9,3]];

Most graphics libraries (including the HTML5 canvas standard) provide a drawing
primitive that can output Bezier curves given a list of four points. But suppose we aren't
given such a function. Suppose that we only have the ability to draw straight lines. How
would one go about drawing an approximation to a Bezier curve? If such an algorithm
exists (it does, and we're about to see it) then we could make the approximation so fine that
it is visually indistinguishable from a true Bezier curve.

The key property of Bezier curves that allows us to come up with such an algorithm is the
following:

Any cubic Bezier curve $B$ can be split into two, end to end,
which together trace out the same curve as $B$.

Let's see exactly how this is done. Let $B(t)$ be a cubic Bezier curve with control
points $P_0, P_1, P_2, P_3$, and let's say we want to split it exactly in half. We notice the
formula for the curve when we plug in $t = 1/2$, which is

$B(1/2) = \frac{1}{2^3} (P_0 + 3P_1 + 3P_2 + P_3)$

Moreover, our recursive definition gave us a way to evaluate the point in terms of smaller-
degree curves. But when these are evaluated at 1/2 their formulae are similarly easy to write
down. The picture looks like this:

The green points are the degree one curves, the pink points are the degree two curves, and
the blue point is the cubic curve. We notice that, since each of the curves is evaluated
at $t = 1/2$, each of these points can be described as the midpoints of points we already
know. So $m_0 = (P_0 + P_1)/2$, $q_0 = (m_0 + m_1)/2$, etc.

In fact, the splitting of the two curves we want is precisely given by these points. That is,
the "left half" of the curve is given by the curve $L(t)$ with control
points $P_0, m_0, q_0, B(1/2)$, while the "right half" $R(t)$ has control
points $B(1/2), q_1, m_2, P_3$.

How can we be completely sure these are the same Bezier curves? Well, they're just
polynomials. We can compare them for equality by doing a bunch of messy algebra. But
note, since $L(t)$ only travels halfway along $B(t)$, to check they are the same is to
equate $L(t)$ with $B(t/2)$, since as $t$ ranges from zero to one, $t/2$ ranges from zero to one
half. Likewise, we can compare $B((t+1)/2)$ with $R(t)$.

The algebra is very messy, but doable. As a test of this blog's newest tools, here's a screen
cast of me doing the algebra involved in proving the two curves are identical.
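
As a numerical sanity check (not a substitute for the algebra), the sketch below builds the two halves from repeated midpoints and compares the left half at time $t$ with the original curve at $t/2$, and the right half at time $t$ with the original at $(t+1)/2$. The helper `cubicPoint`, the local `midpoints`, and the sample curve are our own choices for this check.

```javascript
// Sequential midpoints of a list of points.
function midpoints(pointList) {
   var result = [];
   for (var i = 0; i + 1 < pointList.length; i++) {
      result.push([(pointList[i][0] + pointList[i+1][0]) / 2,
                   (pointList[i][1] + pointList[i+1][1]) / 2]);
   }
   return result;
}

// Evaluate a cubic Bezier curve with the explicit formula.
function cubicPoint(c, t) {
   var s = 1 - t;
   var w = [s*s*s, 3*s*s*t, 3*s*t*t, t*t*t];
   var x = 0, y = 0;
   for (var i = 0; i < 4; i++) { x += w[i] * c[i][0]; y += w[i] * c[i][1]; }
   return [x, y];
}

var B = [[1,2], [5,5], [4,0], [9,3]];
var m = midpoints(B), q = midpoints(m), r = midpoints(q);
var left = [B[0], m[0], q[0], r[0]];
var right = [r[0], q[1], m[2], B[3]];

// Largest coordinate disagreement over a handful of sampled times.
var worst = 0;
for (var i = 0; i <= 10; i++) {
   var t = i / 10;
   var a = cubicPoint(left, t), b = cubicPoint(B, t / 2);
   var c = cubicPoint(right, t), d = cubicPoint(B, (t + 1) / 2);
   worst = Math.max(worst, Math.abs(a[0] - b[0]), Math.abs(a[1] - b[1]),
                           Math.abs(c[0] - d[0]), Math.abs(c[1] - d[1]));
}
```

If the splitting is right, `worst` should be zero up to floating-point rounding.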

Now that that's settled, we have a nice algorithm for splitting a cubic Bezier (or any Bezier)
into two pieces. In Javascript,

function subdivide(curve) {
   var firstMidpoints = midpoints(curve);
   var secondMidpoints = midpoints(firstMidpoints);
   var thirdMidpoints = midpoints(secondMidpoints);

   // left half, then right half; they share the point B(1/2)
   return [[curve[0], firstMidpoints[0], secondMidpoints[0], thirdMidpoints[0]],
           [thirdMidpoints[0], secondMidpoints[1], firstMidpoints[2], curve[3]]];
}

Here curve is a list of four points, as described at the beginning of this section, and the
output is a list of two curves with the correct control points. The midpoints function used
is quite simple, and we include it here for completeness:

function midpoints(pointList) {
   var midpoint = function(p, q) {
      return [(p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0];
   };

   var midpointList = new Array(pointList.length - 1);

   for (var i = 0; i < midpointList.length; i++) {
      midpointList[i] = midpoint(pointList[i], pointList[i+1]);
   }

   return midpointList;
}

It just accepts as input a list of points and computes their sequential midpoints. So a list
of $n$ points is turned into a list of $n-1$ points. As we saw, we need to call this
function $d$ times to compute the segmentation of a degree $d$ Bezier curve (three times
for a cubic).

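
The same idea extends past cubics. As a sketch (the name `subdivideAny` is ours, and this is one possible way to organize the loop), we can keep taking midpoints until a single point remains, collecting the first point of each level for the left half and the last point of each level for the right half.

```javascript
// Sequential midpoints of a list of points.
function midpoints(pointList) {
   var result = [];
   for (var i = 0; i + 1 < pointList.length; i++) {
      result.push([(pointList[i][0] + pointList[i+1][0]) / 2,
                   (pointList[i][1] + pointList[i+1][1]) / 2]);
   }
   return result;
}

// Split a Bezier curve of any degree at t = 1/2.
function subdivideAny(curve) {
   var left = [curve[0]];
   var right = [curve[curve.length - 1]];
   var level = curve;
   while (level.length > 1) {
      level = midpoints(level);
      left.push(level[0]);                     // grows from the start point inward
      right.unshift(level[level.length - 1]);  // grows from the end point inward
   }
   return [left, right];
}

// A quadratic example with three illustrative control points.
var halves = subdivideAny([[0,0], [2,4], [4,0]]);
```

For a curve with four control points this produces the same two lists as the cubic-only `subdivide` above.
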
As explained earlier, we can keep subdividing our curve over and over until each of the tiny
pieces are basically lines. That is, our function to draw a Bezier curve from the beginning
will be as follows:

function drawCurve(curve, context) {
   if (isFlat(curve)) {
      drawSegments(curve, context);
   } else {
      var pieces = subdivide(curve);
      drawCurve(pieces[0], context);
      drawCurve(pieces[1], context);
   }
}

In words, as long as the curve isn't flat, we want to subdivide and draw each piece
recursively. If it is flat, then we can simply draw the three line segments of the curve and be
reasonably sure that it will be a good approximation. The context variable sitting there
represents the canvas to be painted to; it must be passed through to the drawSegments
function, which simply paints straight lines to the canvas.
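
The post does not show drawSegments itself, so the following is only a guess at what such a helper might look like, not the author's code. It assumes `context` is a canvas 2D rendering context (or any object with the same `beginPath`, `moveTo`, `lineTo`, and `stroke` methods) and strokes the polyline through the control points.

```javascript
// Hypothetical helper: stroke the polyline through a curve's control points.
function drawSegments(curve, context) {
   context.beginPath();
   context.moveTo(curve[0][0], curve[0][1]);
   for (var i = 1; i < curve.length; i++) {
      context.lineTo(curve[i][0], curve[i][1]);
   }
   context.stroke();
}
```

In a browser one would obtain the context from a canvas element with `getContext('2d')` and pass it in unchanged; for a flat cubic the three segments drawn here are visually indistinguishable from the curve.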

Of course this raises the obvious question: how can we tell if a Bezier curve is flat? There
are many ways to do so. One could compute the angles of deviation (from a straight line) at
each interior control point and add them up. Or one could compute the area of the
enclosed quadrilateral. However, computing angles and areas is usually not very nice:
angles take a long time to compute and areas have stability issues, and the algorithms
which are stable are not very simple. We want a measurement which requires only basic
arithmetic and perhaps a few logical conditions to check.

It turns out there is such a measurement. It's originally attributed to Roger Willcocks, but
it's quite simple to derive by hand.

Essentially, we want to measure the "flatness" of a cubic Bezier curve by computing the
distance of the actual curve at time $t$ from where the curve would be at time $t$ if the curve
were a straight line.

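Formally, given $B(t)$ with control points $P_0, \dots, P_3$ as usual, we can define
the straight-line Bezier cubic as the colossal sum

$S(t) = (1-t)^3 P_0 + 3(1-t)^2 t \left( \frac{2P_0 + P_3}{3} \right) + 3(1-t) t^2 \left( \frac{P_0 + 2P_3}{3} \right) + t^3 P_3$

There's nothing magical going on here. We're simply giving the Bezier curve with control
points $P_0, \frac{2P_0 + P_3}{3}, \frac{P_0 + 2P_3}{3}, P_3$. One should think about this as
points which are a 0, 1/3, 2/3, and 1 fraction of the way from $P_0$ to $P_3$ on a straight line.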

Then we define the function $d(t) = \| B(t) - S(t) \|$ to be the distance between the two
curves at the same time $t$. The flatness value of $B$ is the maximum of $d$ over all values
of $t$. If this flatness value is below a certain tolerance level, then we call the curve flat.

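With a bit of algebra we can simplify this expression. First, the value of $t$ for which the
distance is maximized is the same as when its square is maximized, so we can omit the
square root computation at the end and take that into account when choosing a flatness
tolerance.

Now let's actually write out the difference as a single polynomial. First, we can cancel the
3's in $S(t)$ and write the polynomial as

$S(t) = (1-t)^3 P_0 + (1-t)^2 t (2P_0 + P_3) + (1-t) t^2 (P_0 + 2P_3) + t^3 P_3$

and so $B(t) - S(t)$ is (by collecting coefficients of the like terms $(1-t)^i t^j$)

$B(t) - S(t) = (1-t)^2 t (3P_1 - 2P_0 - P_3) + (1-t) t^2 (3P_2 - P_0 - 2P_3)$

Factoring out the $(1-t)t$ from both terms and setting
$a = 3P_1 - 2P_0 - P_3$, $b = 3P_2 - P_0 - 2P_3$, we get

$d^2(t) = \| (1-t) t ((1-t) a + t b) \|^2 = (1-t)^2 t^2 \| (1-t) a + t b \|^2$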

Since the maximum of a product of nonnegative quantities is at most the product of the
maxima, we can bound the above quantity by the product of the two maxes. The reason we
want to do this is because we can easily compute the two maxes separately. It wouldn't be
hard to compute the maximum without splitting things up, but this way ends up with fewer
computational steps for our final algorithm, and the visual result is equally good.

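Using some elementary single-variable calculus, the maximum value
of $(1-t)^2 t^2$ for $0 \leq t \leq 1$ turns out to be $1/16$. And the squared norm of a vector
is just the sum of squares of its components. If $a = (a_x, a_y)$ and $b = (b_x, b_y)$, then the
squared norm above is

$((1-t) a_x + t b_x)^2 + ((1-t) a_y + t b_y)^2$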

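And notice: for any real numbers $z, w$ the quantity $(1-t)z + tw$ is exactly the straight
line from $z$ to $w$ we know so well. Over all $t$ between zero and one, its square is
largest at one of the endpoints, so the maximum is $\max(z^2, w^2)$. So the max of our
distance function $d^2(t)$ is bounded by

$\frac{1}{16} \left( \max(a_x^2, b_x^2) + \max(a_y^2, b_y^2) \right)$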
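
To see the bound in action, here is a small numerical sketch (function names and the sample curve are ours). It samples the squared distance between a cubic and the straight-line cubic on the same endpoints, and compares the largest sampled value with $\frac{1}{16}(\max(a_x^2, b_x^2) + \max(a_y^2, b_y^2))$, where $a = 3P_1 - 2P_0 - P_3$ and $b = 3P_2 - P_0 - 2P_3$.

```javascript
// Evaluate a cubic Bezier curve with the explicit formula.
function cubicPoint(c, t) {
   var s = 1 - t;
   var w = [s*s*s, 3*s*s*t, 3*s*t*t, t*t*t];
   var x = 0, y = 0;
   for (var i = 0; i < 4; i++) { x += w[i] * c[i][0]; y += w[i] * c[i][1]; }
   return [x, y];
}

// Squared distance between the curve and the straight-line cubic at time t.
function squaredDistance(c, t) {
   var straight = [c[0],
      [(2 * c[0][0] + c[3][0]) / 3, (2 * c[0][1] + c[3][1]) / 3],
      [(c[0][0] + 2 * c[3][0]) / 3, (c[0][1] + 2 * c[3][1]) / 3],
      c[3]];
   var p = cubicPoint(c, t), s = cubicPoint(straight, t);
   return (p[0] - s[0]) * (p[0] - s[0]) + (p[1] - s[1]) * (p[1] - s[1]);
}

// The bound: (1/16) * (max(ax^2, bx^2) + max(ay^2, by^2)).
function flatnessBound(c) {
   var ax = 3 * c[1][0] - 2 * c[0][0] - c[3][0], ay = 3 * c[1][1] - 2 * c[0][1] - c[3][1];
   var bx = 3 * c[2][0] - c[0][0] - 2 * c[3][0], by = 3 * c[2][1] - c[0][1] - 2 * c[3][1];
   return (Math.max(ax * ax, bx * bx) + Math.max(ay * ay, by * by)) / 16;
}

var curve = [[1,2], [5,5], [4,0], [9,3]];
var worst = 0;
for (var i = 0; i <= 100; i++) {
   worst = Math.max(worst, squaredDistance(curve, i / 100));
}
var bound = flatnessBound(curve); // 113 / 16 for this curve
```

The sampled maximum `worst` never exceeds `bound`; the bound is generally not tight, which is the price paid for computing the two maxima separately.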

And so our condition for being flat is that this bound is smaller than some allowable
tolerance. We may safely factor the 1/16 into this tolerance bound, and so this is enough to
write a function.

function isFlat(curve) {
   var tol = 10; // anything below 50 is roughly good-looking

   // squared components of a = 3*P1 - 2*P0 - P3
   var ax = 3.0*curve[1][0] - 2.0*curve[0][0] - curve[3][0]; ax *= ax;
   var ay = 3.0*curve[1][1] - 2.0*curve[0][1] - curve[3][1]; ay *= ay;
   // squared components of b = 3*P2 - P0 - 2*P3
   var bx = 3.0*curve[2][0] - curve[0][0] - 2.0*curve[3][0]; bx *= bx;
   var by = 3.0*curve[2][1] - curve[0][1] - 2.0*curve[3][1]; by *= by;

   return (Math.max(ax, bx) + Math.max(ay, by) <= tol);
}

And there we have it. We write a simple HTML page to access a canvas element and a few
extra helper functions to draw the line segments when the curve is flat enough, and
present the final result in this interactive demonstration (you can perturb the control points).

The picture you see on that page (given below) is my rendition of Picasso's "Dog" drawing
as a sequence of nine Bezier curves. I think the resemblance is uncanny.

Picasso's "Dog," redesigned as a sequence of nine Bezier curves.

While we didn't invent the drawing itself (and hence shouldn't attach our signature to it),
we did come up with the representation as a sequence of Bezier curves. It only seems fitting
to present that as the work of art. Here we've distilled the representation down to a single
file: the first line is the dimension of the canvas, and each subsequent line represents a cubic
Bezier curve. Comments are included for readability.

"Dog" Jeremy Kun, 2013. Click to enlarge.

Because standardizing things seems important, we define a new filetype .bezier, which
has the format given above:

int int
(int) curve
(int) curve

Where the first two ints specify the size of the canvas, the first (optional) int on each line
specifies the width of the stroke, and a curve has the form

[int,int] [int,int] ... [int,int]

If an int is omitted at the beginning of a line, this specifies a width of three pixels.

In a general .bezier file we allow a curve to have arbitrarily many control points, though the
code we gave above does not draw them that generally. As an exercise, write a program
which accepts as input a .bezier file and produces as output an image of the drawing. This
will require an extension of the algorithm above for drawing arbitrary Bezier curves, which
loops its computation of the midpoints and keeps track of which end up in the resulting
subdivision. Alternatively, one could write a program which accepts as input a .bezier file
with only cubic Bezier curves, and produces as output an SVG file of the drawing (SVG
paths only support quadratic and cubic Bezier curves). So a .bezier file is a simplification
(fewer features) and an extension (Bezier curves of arbitrary degree) of an SVG file.
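
As a starting point for the second exercise, here is a hedged sketch of a cubic-only converter. It assumes the stroke width, when present, is written as a bare integer at the start of the line, that the input contains no comment lines (the comment syntax is not specified above), and that every curve has exactly four control points. The function names `parseBezier` and `toSvg` are ours.

```javascript
// Parse the text of a .bezier file (assumed here to contain no comments).
function parseBezier(text) {
   var lines = text.split('\n').filter(function(line) {
      return line.trim().length > 0;
   });
   var size = lines[0].trim().split(/\s+/).map(Number);
   var curves = lines.slice(1).map(function(line) {
      var widthMatch = line.trim().match(/^(\d+)\s/);
      var pairs = line.match(/\[\s*-?\d+\s*,\s*-?\d+\s*\]/g) || [];
      var points = pairs.map(function(pair) {
         return pair.replace(/[\[\]\s]/g, '').split(',').map(Number);
      });
      // an omitted width means three pixels
      return { width: widthMatch ? Number(widthMatch[1]) : 3, points: points };
   });
   return { width: size[0], height: size[1], curves: curves };
}

// Emit an SVG document; only cubic curves (four control points) are handled.
function toSvg(drawing) {
   var paths = drawing.curves.map(function(c) {
      var p = c.points;
      return '<path d="M ' + p[0].join(' ') + ' C ' + p[1].join(' ') + ', ' +
             p[2].join(' ') + ', ' + p[3].join(' ') +
             '" stroke="black" fill="none" stroke-width="' + c.width + '"/>';
   });
   return '<svg xmlns="http://www.w3.org/2000/svg" width="' + drawing.width +
          '" height="' + drawing.height + '">' + paths.join('') + '</svg>';
}

var sample = '500 500\n[180,280] [183,268] [186,256] [189,244]\n5 [1,2] [5,5] [4,0] [9,3]';
var drawing = parseBezier(sample);
var svg = toSvg(drawing);
```

Each curve becomes one SVG path using the `M` (move to) and `C` (cubic curve to) commands, so the browser's own Bezier primitive does the drawing.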

We didn't go as deep into the theory of Bezier curves as we could have. If the reader is
itching for more (and a more calculus-based approach), see this lengthy primer. It contains
practically everything one could want to know about Bezier curves, with nice interactive
demos written in Processing.

Low-Complexity Art
There are some philosophical implications of what we've done today with Picasso's "Dog."
Previously on this blog we've investigated the idea of low-complexity art, and it's quite
relevant here. The thesis is that "beautiful" art has a small description length, and more
formally the "complexity" of some object (represented by text) is the length of the shortest
program that outputs that object given no inputs. More on that in our primer on
Kolmogorov complexity. The fact that we can describe Picasso's line drawings with a small
number of Bezier curves (and a relatively short program to output the Bezier curves) is
supposed to be a deep statement about the beauty of the art itself. Obviously this is very
subjective, but not without its proponents.

There has been a bit of recent interest in computers generating art. For instance, this recent
programming competition (in Dutch) gave the task of generating art similar to the work
of Piet Mondrian. The idea is that the more elegant the algorithm, the higher it would be
scored. The winner used MD5 hashes to generate Mondrian pieces, and there were many
other impressive examples (the link above has a gallery of submissions).

In our earlier post on low-complexity art, we explored the possibility of representing all
images within a coordinate system involving circles with shaded interiors. But it's obvious
that such a coordinate system wouldn't be able to represent "Dog" with very low
complexity. It seems that Bezier curves are a much more natural system of coordinates.
Some of the advantages include that length of lines and slight perturbations don't affect the
resulting complexity. A cubic Bezier curve can be described by any set of four points, and
more intricate (higher complexity) descriptions of curves require a larger number of
points. Bezier curves can be scaled up arbitrarily, and this doesn't significantly change the
complexity of the curve (although scaling many orders of magnitude will introduce a
logarithmic factor complexity increase, this is quite small). Curves with larger stroke are
slightly more complex than those with smaller stroke, and representing many small sharp
bends requires more curves than long, smooth arcs.

On the downside, it's not so easy to represent a circle as a Bezier curve. In fact, it is
impossible to do so exactly. Despite the simplicity of this object (it's even defined as a
single polynomial, albeit in two variables), the best one can do is approximate it. The same
goes for ellipses. There are actually ways to overcome this (the concept of rational Bezier
curves, which are quotients of polynomials), but they add to the inherent complexity of the
drawing algorithm and the approximations using regular Bezier curves are good enough.

And so we define the complexity of a drawing to be the number of bits in its .bezier file
representation. Comments are ignored in this calculation.

The real prize, and what we'll explore next time, is to find a way to generate art
automatically. That is to do one of two things:

1. Given some sort of seed, write a program that produces a pseudo-random line
drawing.
2. Given an image, produce a .bezier image which accurately depicts the image as a
line drawing.

We will attempt to explore these possibilities in the follow-up to this post. Depending on
how things go, this may involve some local search algorithms, genetic algorithms, or other
methods.

Until then!

Addendum: want to buy a framed print of the source code for "Dog"? Head over to our
page on Society6.

This entry was posted in Algorithms, Design and tagged art, bezier curves, de
Casteljau, graphics, javascript, low-complexity art, math, picasso, programming, svg.

18 thoughts on Bezier Curves and Picasso

1. Pixel I/O (@pixelio)

May 11, 2013 at 5:13 pm Reply

Great post! Looking forward to the next one.

An ever so slightly simpler yet still robust test for flatness was discussed here and on

Also of interest, Pyramid Algorithms by Ron Goldman covers Wang's formula (chap.
5.6.3) for determining in advance how many levels of subdivision you need to achieve a
specified degree of flatness. Wang's formula is also discussed in: DEC Paris Research
Laboratory report #1, May 1989. Clearly this approach will be more conservative than
testing each segment.


o j2kun

May 11, 2013 at 5:58 pm Reply

That's a lot of great info! From what I understand the metric I presented here is
what's used in the Postscript language. Not to say whether that makes it good or
not, but at least it's stood the test of time (and engineers).

Pixel I/O (@pixelio)

May 12, 2013 at 11:01 pm

Another good read is here: A History of Curves and Surfaces in CAGD

[ http://kowon.dongseo.ac.kr/~lbg/cagd/history1.pdf ].


2. cpress

May 11, 2013 at 7:16 pm Reply

The href link to "this blog's Google code page" is incorrect.


3. mbaz

May 11, 2013 at 7:46 pm Reply

What tools did you use to make the video? It's great!


o j2kun

May 11, 2013 at 7:57 pm Reply

Sketchbook for drawing with a Wacom Bamboo tablet, and Screenflow to capture it
all. Made possible by donations to this blog.



May 11, 2013 at 8:30 pm

Thanks for your answer, and congratulations on getting enough out of the
blog to afford such awesome tools. Screenflow especially looks great; too
bad I don't use Macs.

4. stephanwehner

May 11, 2013 at 10:58 pm Reply

I tried to find out in which year Picasso made the dog drawing. Do you know (since you
have a copy)? A few years after 1957? See http://en.wikipedia.org/wiki/Lump_(dog)

The bull lithographs are from 1945 according to your link.

So my guess is that Bezier curves came after Picasso's lines, namely in the sixties,
see http://en.wikipedia.org/wiki/B%C3%A9zier_curve

How stable is your drawing? (Sensitivity to small changes in the numbers)

A related effort to yours, although less numerical, is at http://i-work-in-





o j2kun

May 12, 2013 at 11:58 am Reply

Apparently between 1936 and

1942. http://sapergalleries.com/PicassoLeChienDetail.html

The drawing is quite stable. The nature of Bezier curves makes them stable to small
perturbations of the control points.

In fact, the original inventor of Bezier curves was Paul de Casteljau, and he
published (or made public) his work on Bezier curves in 1959. So it's quite amazing
how close together these two ideas are in history.



May 12, 2013 at 5:50 pm

Thanks, I saw that page, but couldn't make out that it also related to the
simple line drawing. So you think it is not a drawing of Lump, the 1957
There's another sense of closeness, Picasso living in France during those
years, as did, I take it, Bézier and de Casteljau.



5. Frere Loup

May 12, 2013 at 7:20 am Reply

It seems you left out the 3 coefficients in the equation of the cubic Bezier curve?


o j2kun

May 12, 2013 at 11:55 am Reply

Yeah, I found I do that a lot.


6. Peter Gorgson

May 12, 2013 at 8:14 pm Reply

Picasso couldn't have used Bezier curves because they hadn't been invented in the 14th
Century when Picasso was painting.


o j2kun

May 12, 2013 at 8:59 pm Reply

You must certainly be trolling.


7. robert annett (@robert_annett)

May 13, 2013 at 2:47 am Reply

Fantastic article! It's wonderful to see different disciplines being explored at the same time.


8. Tomas

May 21, 2013 at 9:45 am Reply

Great article, it inspired me to make something like it in Racket.

It took me more time than I wanted, but now anyone can do this in DrRacket:
#lang s-exp (planet tomcoiro/doodle-draw:1:0/lang)
500 500
(180 280 183 268 186 256 189 244)
(191 244 290 244 300 230 339 245)
(340 246 350 290 360 300 355 210)
(353 210 370 207 380 196 375 193)
(375 193 310 220 190 220 164 205)
(164 205 135 194 135 265 153 275)
(153 275 168 275 170 180 150 190)
(149 190 122 214 142 204 85 240)
(86 240 100 247 125 233 140 238)
(show Picassos Dog)

And get a frame with a dog drawn on it.


o j2kun

May 22, 2013 at 1:09 pm Reply

Very cool. I'll have to try this.


9. Ahmed Hossam

June 8, 2016 at 1:27 am Reply

This is amazing! Now I know, what to do, in order to split a Bezier Curve into two
pieces! Could you please recommend me some clear explanations on B-Splines
too?! Thanks!!!!

Making Hybrid Images

Posted on September 29, 2014 by j2kun
The Mona Lisa

Leonardo da Vinci's Mona Lisa is one of the most famous paintings of all time. And there
has always been a discussion around her enigmatic smile. He used a trademark Renaissance
technique called sfumato, which involves many thin layers of glaze mixed with subtle
pigments. The striking result is that when you look directly at Mona Lisa's smile, it seems
to disappear. But when you look at the background, your peripheral vision sees a smiling face.

One could spend decades studying the works of these masters from various perspectives,
but if we want to home in on the disappearing nature of that smile, mathematics can provide
valuable insights. Indeed, though he may not have known the relationship between his work
and da Vinci's, hundreds of years later Salvador Dali did the artist's equivalent of
mathematically isolating the problem with his painting, Gala Contemplating the
Mediterranean Sea.
Gala Contemplating the Mediterranean Sea (Salvador Dali, 1976)

Here you see a woman in the foreground, but step back quite far from the picture and there
is a (more or less) clear image of Abraham Lincoln. Here the question of gaze is the blaring
focus of the work. Now of course Dali and da Vinci weren't scribbling down equations and
computing integrals; their artistic expression was much less well-defined. But we the
artistically challenged have tools of our own: mathematics, science, and programming.

In 2006 Aude Oliva, Antonio Torralba, and Philippe G. Schyns used those tools to merge
the distance of Dali and the faded smiles of da Vinci into one cohesive idea. In their
paper they presented the notion of a hybrid image, shown below.
The Mona Lisas of Science

If you look closely, you'll see three women, each of which looks the teensiest bit strange,
like they might be trying to suppress a smile, but none of them are smiling. Blur your eyes
or step back a few meters, and they clearly look happy. The effect is quite dramatic. At the
risk of being overly dramatic, these three women are literally modern day versions of Mona
Lisa, the Mona Lisas of Science, if you will.

Another, perhaps more famous version of their technique, since it was more widely
publicized, is their Marilyn Einstein, which up close is Albert Einstein and from far away
is Marilyn Monroe.

Marilyn Einstein

This one gets to the heart of the question of what the eye sees at close range versus long
range. And it turns out that you can address this question (and create brilliant works of art
like the ones above) with some basic Fourier analysis.

Intuitive Fourier analysis (and references)

The basic idea of Fourier analysis is that smooth functions are hard to understand, and
that it would be great if we could decompose them into simpler pieces.
Decomposing complex things into simpler parts is one of the main tools in all of
mathematics, and Fourier analysis is one of the clearest examples of its application.

In particular, the things we care about are functions $f(x)$ with specific properties I won't
detail here, like smoothness and finiteness. And the building blocks are the complex
exponential functions

$e^{2 \pi i k x}$

where $k$ can be any integer. If you have done some linear algebra (and ignore this if you
haven't), then I can summarize the idea succinctly by saying the complex exponentials form
an orthonormal basis for the vector space of square-integrable functions.

Back in colloquial language, what the Fourier theorem says is that any function of the kind
we care about can be broken down into (perhaps infinitely many) pieces of this form
called Fourier coefficients (I'm abusing the word "coefficient" here). The way it's broken
down is also pleasingly simple: it's a linear combination. Informally that means you're just
adding up all the complex exponentials with specific weights for each one. Mathematically,
the conversion from the function to its Fourier coefficients is called the Fourier
transform, and the set of all Fourier coefficients together is called the Fourier spectrum. So
if you want to learn about your function $f$, or more importantly modify it in some way, you
can inspect and modify its spectrum instead. The reason this is useful is that Fourier
coefficients have very natural interpretations in sound and images, as we'll see for the latter.

We wrote $f$ and the complex exponential as a function of one real variable, but you can
do the same thing for two variables (or a hundred!). And, if you're willing to do some
abusing and ignore the complexness of complex numbers, then you can visualize complex
exponentials in two variables as images of stripes whose orientation and thickness
correspond to two parameters (i.e., the $k$ in the offset equation becomes two coefficients).
The video below shows how such complex exponentials can be used to build up an image
of striking detail. The left frame shows which complex exponential is currently being
added, and the right frame shows the layers all put together. I think the result is quite

This just goes to show how powerful da Vinci's idea of fine layering is: it's as powerful as
possible because it can create any image!

Now for digital images like the one above, everything is finite. So rather than have an
infinitely precise function and a corresponding infinite set of Fourier coefficients, you get a
finite list of sampled values (pixels) and a corresponding grid of Fourier
coefficients. But the important and beautiful theorem is, and I want to emphasize how
groundbreakingly important this is:

If you give me an image (or any function!) I can compute the decomposition
very efficiently.

And the same theorem lets you go the other way: if you give me the decomposition, I can
compute the original function's samples quite easily. The algorithm to do this is called the
Fast Fourier transform, and if any piece of mathematics or computer science has a
legitimate claim to changing the world, it's the Fast Fourier transform. It's hard to pinpoint
specific applications, because the transform is so ubiquitous across science and engineering,
but we definitely would not have cell phones, satellites, internet, or electronics anywhere
near as small as we do without the Fourier transform and the ability to compute it quickly.
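
As a rough illustration of the two directions, here is a minimal sketch using numpy's FFT routines on a small synthetic array standing in for an image. The array, its size, and the variable names are made up for the example; the point is only that going to the spectrum and back reproduces the samples up to floating-point error.

```python
import numpy as np

# A small synthetic "image": an 8x8 grid of real samples (an assumption for the demo).
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Forward direction: samples -> grid of complex Fourier coefficients.
spectrum = np.fft.fft2(image)

# Backward direction: coefficients -> samples.
recovered = np.fft.ifft2(spectrum)

# The round trip reproduces the image up to floating-point error,
# and the leftover imaginary parts are negligible.
round_trip_error = np.max(np.abs(recovered.real - image))
max_imaginary = np.max(np.abs(recovered.imag))

# The coefficient at frequency (0, 0) is the sum of all the samples.
dc_term = spectrum[0, 0]
```

Modifying `spectrum` before calling `ifft2` is exactly the kind of manipulation the rest of this post relies on.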

Constructing hybrid images is one particularly nice example of manipulating the Fourier
spectrum of two images, and then combining them back into a single image. That's what
we'll do now.

As a side note, by the nature of brevity, the discussion above is a big disservice to the
mathematics involved. I summarized and abused in ways that mathematicians would object
to. If you want to see a much better treatment of the material, this blog has a long series of
posts developing Fourier transforms and their discrete analogues from scratch.
See our four primers, which lead into the main content posts where we implement the Fast
Fourier transform in Python and use it to apply digital watermarks to an image. Note that in
those posts, as in this one, all of the materials and code used are posted on this blog's
Github page.

High and low frequencies

For images, interpreting ranges of Fourier coefficients is easy to do. You can imagine the
coefficients lying on a grid in the plane like so:

Each dot in this grid corresponds to how intense the Fourier coefficient is. That is, it's the
magnitude of the (complex) coefficient of the corresponding complex exponential. Now the
points that are closer to the origin correspond informally to the broad, smooth changes in
the image. These are called low frequency coefficients. And points that are further away
correspond to sharp changes and edges, and are likewise called high frequency
components. So if you wanted to hybridize two images, you'd pick ones with
complementary intensities in these regions. That's why Einstein (with all his wiry hair and
wrinkles) and Monroe (with smooth features) are such good candidates. That's also why,
when we layered the Fourier components one by one in the video from earlier, we see the
fuzzy shapes emerge before the fine details.

Moreover, we can extract the high frequency Fourier components by simply removing the
low frequency ones. It's a bit more complicated than that, since you want the transition
from "something" to "nothing" to be smooth in some sense. A proper discussion of this
would go into sampling and the Nyquist frequency, but that's beyond the scope of this post.
Rather, we'll just define a family of filtering functions without motivation and
observe that they work well.

Definition: The Gaussian filter function with variance $\sigma^2$ and center $(a, b)$ is the function

$g(x,y) = e^{-\frac{(x - a)^2 + (y - b)^2}{2 \sigma^2}}$

It looks like this

image credit Wikipedia

In particular, at the center the function is 1 and it gradually drops to zero as you get farther away.
The parameter $\sigma$ controls the rate at which it vanishes, and in the picture above the center is
set to $(a, b) = (0, 0)$.

Now what we'll do is take our image, compute its spectrum, and multiply coordinatewise
with a certain Gaussian function. If we're trying to get rid of high-frequency components
(called a "low-pass filter" because it lets the low frequencies through), we can just multiply
the Fourier coefficients directly by the filter values $g(x,y)$, and if we're doing a "high-pass
filter" we multiply by $1 - g(x,y)$.

Before we get to the code, here's an example of a low-pass filter. First, take this image of
Marilyn Monroe

Now compute its Fourier transform

Apply the low-pass filter

And reverse the Fourier transform to get an image

In fact, this is a common operation in programs like Photoshop for blurring an image (it's
called a Gaussian blur for obvious reasons). Here's the Python code to do this. You
can download it along with all of the other resources used in making this post on this blog's
Github page.
import numpy
from numpy.fft import fft2, ifft2, fftshift, ifftshift
from scipy import misc
from scipy import ndimage
import math

def makeGaussianFilter(numRows, numCols, sigma, highPass=True):
    # fftshift puts the zero-frequency coefficient at index n // 2
    # (for both even and odd n), so that is where the Gaussian is centered
    centerI = numRows // 2
    centerJ = numCols // 2

    def gaussian(i,j):
        coefficient = math.exp(-1.0 * ((i - centerI)**2 + (j - centerJ)**2) /
                               (2 * sigma**2))
        return 1 - coefficient if highPass else coefficient

    return numpy.array([[gaussian(i,j) for j in range(numCols)]
                        for i in range(numRows)])

def filterDFT(imageMatrix, filterMatrix):
    # multiply the centered spectrum coordinatewise by the filter, then invert
    shiftedDFT = fftshift(fft2(imageMatrix))
    filteredDFT = shiftedDFT * filterMatrix
    return ifft2(ifftshift(filteredDFT))

def lowPass(imageMatrix, sigma):
    n,m = imageMatrix.shape
    return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma,
                                                     highPass=False))

def highPass(imageMatrix, sigma):
    n,m = imageMatrix.shape
    return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma,
                                                     highPass=True))

if __name__ == "__main__":
    # note: ndimage.imread and misc.imsave were removed in SciPy 1.2; on newer
    # versions, load and save the image with a library such as Pillow or imageio
    marilyn = ndimage.imread("marilyn.png", flatten=True)
    lowPassedMarilyn = lowPass(marilyn, 20)
    misc.imsave("low-passed-marilyn.png", numpy.real(lowPassedMarilyn))

The first function samples the values from a Gaussian function with the specified
parameters, discretizing the function and storing the values in a matrix. Then
the filterDFT function applies the filter by doing coordinatewise multiplication (note these
are all numpy arrays). We can do the same thing with a high-pass filter, producing the edgy
image below

And if we add these two images together, we get back to the original, since the low-pass
and high-pass filter values sum to 1 at every frequency.
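
The claim that the low-passed and high-passed images recombine into the original can be checked numerically. The sketch below is a hedged, vectorized stand-in for the filtering code above (the names `gaussian_filter` and `apply_filter` and the random test array are assumptions for the demo, not the post's code). It relies on the fact that the low-pass values $g$ and high-pass values $1 - g$ sum to 1 at every frequency when the same sigma is used, so the two outputs should add up to the input.

```python
import numpy as np
from numpy.fft import fft2, ifft2, fftshift, ifftshift

def gaussian_filter(num_rows, num_cols, sigma, high_pass=False):
    # After fftshift the zero-frequency coefficient sits at index n // 2.
    center_i, center_j = num_rows // 2, num_cols // 2
    i, j = np.meshgrid(np.arange(num_rows), np.arange(num_cols), indexing="ij")
    g = np.exp(-((i - center_i) ** 2 + (j - center_j) ** 2) / (2.0 * sigma ** 2))
    return 1.0 - g if high_pass else g

def apply_filter(image, filt):
    # Multiply the centered spectrum coordinatewise, invert, keep the real part.
    return ifft2(ifftshift(fftshift(fft2(image)) * filt)).real

# Random array standing in for a grayscale image (an assumption for the demo).
rng = np.random.default_rng(1)
image = rng.random((32, 48))

low = apply_filter(image, gaussian_filter(32, 48, sigma=5, high_pass=False))
high = apply_filter(image, gaussian_filter(32, 48, sigma=5, high_pass=True))

# With the same sigma, the two filtered images sum back to the input.
reconstruction_error = np.max(np.abs((low + high) - image))
```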

So the only difference between this and a hybrid image is that you take the low-passed part
of one image and the high-passed part of another. Then the art is in balancing the
parameters so as to make the combined image look right. Indeed, with the following picture
of Einstein and the above shot of Monroe, we can get a pretty good recreation of the
Oliva-Torralba-Schyns piece. I think with more tinkering it could be even better (I did barely any
centering/aligning/resizing to the original images).

Albert Einstein, Marilyn Monroe, and their hybridization.

And here's the code for it

def hybridImage(highFreqImg, lowFreqImg, sigmaHigh, sigmaLow):
    highPassed = highPass(highFreqImg, sigmaHigh)
    lowPassed = lowPass(lowFreqImg, sigmaLow)

    return highPassed + lowPassed
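
To see the hybridization step in isolation, here is a hedged usage sketch on two synthetic "images" whose spectra are easy to reason about: a slowly varying cosine (pure low frequency) and a checkerboard (pure highest frequency). The helper names and test images are assumptions for the demo and only mirror the structure of the function above. Because each input has a single frequency, the hybrid should be the checkerboard plus a slightly attenuated copy of the cosine.

```python
import numpy as np
from numpy.fft import fft2, ifft2, fftshift, ifftshift

def gaussian_mask(shape, sigma):
    # Gaussian centered on the zero-frequency position of a shifted spectrum.
    rows, cols = shape
    i, j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.exp(-((i - rows // 2) ** 2 + (j - cols // 2) ** 2) / (2.0 * sigma ** 2))

def filter_image(image, mask):
    return ifft2(ifftshift(fftshift(fft2(image)) * mask)).real

def hybrid_image(high_freq_img, low_freq_img, sigma_high, sigma_low):
    high_passed = filter_image(high_freq_img,
                               1.0 - gaussian_mask(high_freq_img.shape, sigma_high))
    low_passed = filter_image(low_freq_img,
                              gaussian_mask(low_freq_img.shape, sigma_low))
    return high_passed + low_passed

size = 64
i, j = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
smooth = np.cos(2 * np.pi * j / size)   # one slow oscillation across the image
checker = (-1.0) ** (i + j)             # alternates sign at every pixel

hybrid = hybrid_image(checker, smooth, sigma_high=4, sigma_low=4)

# The cosine lives at distance 1 from the spectrum's center, so the low-pass
# filter scales it by exp(-1 / (2 * 4**2)); the checkerboard passes untouched.
expected = np.exp(-1.0 / 32.0) * smooth + checker
```

With real photographs the two sigmas are the knobs to tinker with, as described above.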

Interestingly enough, doing it in reverse doesn't give quite as pleasing results, but it still
technically works. So it really matters that the high-passed image
has a lot of high-frequency components, and vice versa for the low pass.
You can see some of the other hybrid images Oliva et al constructed over at their web

Next Steps
How can we take this idea further? There are a few avenues I can think of. The most
obvious one would be to see how this extends to video. Could one come up with
generic parameters so that when two videos are hybridized (frame by frame, using this
technique) it is only easy to see one at close distance? Or else, could we apply a three-
dimensional transform to a video and modify that in some principled way? I think one
would not likely find anything astounding, but who knows?

Second would be to look at the many other transforms we have at our disposal. How
does manipulating the spectra of these transforms affect the original image, and can you
make images that are hybridized in senses other than this one?

And finally, can we bring this idea down in dimension to work with one-dimensional
signals? In particular, can we hybridize music? It could usher in a new generation of
mashup songs that sound different depending on whether you wear earmuffs.

Until next time!


This entry was posted in Design, Linear Algebra and tagged albert einstein, art, design, fourier
analysis, hybrid images, image manipulation, marilyn monroe, mathematics, mona
lisa, programming, python, salvador dali, signal processing.

6 thoughts on Making Hybrid Images

1. Jonathan

September 29, 2014 at 6:15 pm Reply

In sound, this is an awful lot like what a vocoder does, when it's used in music. The
low-frequency envelope is the performer's voice, the high-frequency signal comes from the


o Flo Vouin

September 30, 2014 at 2:50 am Reply

@Jonathan: In a vocoder, the spectrum of one signal is used as a filter to alter the
other signal, so it's slightly different. Mixing the low frequencies of one song
with the high frequencies of another sounds more like what a DJ does when
transitioning between two songs.

A slightly more complex image processing technique, but which is still a lot of fun
is Poisson editing: http://www.cs.jhu.edu/~misha/Fall07/Papers/Perez03.pdf


2. Helder

October 2, 2014 at 3:05 pm Reply

Very nice post!

Have you published the code used to reconstruct the image in the video?
If so, where?

Thank you.

o j2kun

October 2, 2014 at 9:40 pm Reply

Yes, the video was originally made as part of this

post: https://jeremykun.com/2013/12/30/the-two-dimensional-fourier-transform-


3. yboris

November 24, 2015 at 11:19 am Reply

Reblogged this on YBoris.


4. Umair Jameel

June 18, 2016 at 4:09 pm Reply

Just finished my UWP windows 10 hybrid image illusion app. You combine two
images and from the combined image, you see first image when seen from some
distance and see second image at a closer look. Have a look at it.

Markov Chain Monte Carlo Without all the Bullshit
Posted on April 6, 2015 by j2kun

I have a little secret: I don't like the terminology, notation, and style of writing in statistics.
I find it unnecessarily complicated. This shows up when trying to read about Markov Chain
Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo
article in the Encyclopedia of Biostatistics.

Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the
expectation of a statistic in a complex model. Successive random selections form a Markov
chain, the stationary distribution of which is the target distribution. It is particularly useful
for the evaluation of posterior distributions in complex Bayesian models. In the
Metropolis-Hastings algorithm, items are selected from an arbitrary proposal distribution and are
retained or not according to an acceptance rule. The Gibbs sampler is a special case in
which the proposal distributions are conditional distributions of single components of a
vector parameter. Various special cases and applications are considered.

I can only vaguely understand what the author is saying here (and really only because I
know ahead of time what MCMC is). There are certainly references to more advanced
things than what I'm going to cover in this post. But it seems very difficult to find an
explanation of Markov Chain Monte Carlo without superfluous jargon. The "bullshit" here
is the implicit claim of an author that such jargon is needed. Maybe it is to explain advanced
applications (like attempts to do "inference in Bayesian networks"), but it is certainly not
needed to define or analyze the basic ideas.

So to counter, here's my own explanation of Markov Chain Monte Carlo, inspired by the
treatment of John Hopcroft and Ravi Kannan.

The Problem is Drawing from a Distribution

Markov Chain Monte Carlo is a technique to solve the problem of sampling from a
complicated distribution. Let me explain by the following imaginary scenario. Say I have a
magic box which can estimate probabilities of baby names very well. I can give it a
string like "Malcolm" and it will tell me the exact probability $p_{\textup{Malcolm}}$ that you will choose
this name for your next child. So there's a distribution $D$ over all names, it's very specific
to your preferences, and for the sake of argument say this distribution is fixed and you don't
get to tamper with it.

Now comes the problem: I want to efficiently draw a name from this distribution $D$. This is
the problem that Markov Chain Monte Carlo aims to solve. Why is it a problem? Because I
have no idea what process you use to pick a name, so I can't simulate that process myself.
Here's another method you could try: generate a name $x$ uniformly at random, ask the
machine for $p_x$, and then flip a biased coin with probability $p_x$ and use $x$ if the coin lands
heads. The problem with this is that there are exponentially many names! The variable here
is the number of bits $n = |x|$ needed to write down a name $x$. So either the
probabilities $p_x$ will be exponentially small and I'll be flipping for a very long time to get a
single name, or else there will only be a few names with nonzero probability and it will take
me exponentially many draws to find them. Inefficiency is the death of me.
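
To make the inefficiency concrete, here is a minimal sketch of that naive method on a toy set of 1000 "names", only three of which have nonzero probability. The black box `p`, the names, and the weights are all hypothetical stand-ins. Each proposal is accepted with probability 1/1000 overall, so the sampler needs on the order of a thousand proposals per accepted name, and the cost grows with the size of the set.

```python
import random

def naive_sample(names, p, rng):
    """Propose a uniformly random name and accept it with probability p(name)."""
    attempts = 0
    while True:
        attempts += 1
        x = rng.choice(names)
        if rng.random() < p(x):      # biased coin with heads-probability p(x)
            return x, attempts

# Hypothetical black box over a toy set; only three names have nonzero probability.
names = ["name%d" % i for i in range(1000)]
weights = {"name7": 0.5, "name42": 0.3, "name99": 0.2}

def p(x):
    return weights.get(x, 0.0)

rng = random.Random(0)
results = [naive_sample(names, p, rng) for _ in range(200)]
samples = [x for x, _ in results]
average_attempts = sum(a for _, a in results) / len(results)
```

The accepted names do follow the target distribution; the trouble is purely the number of wasted proposals.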
So this is a serious problem! Let's restate it formally just to be clear.

Definition (The sampling problem): Let $D$ be a distribution over a finite set $X$. You are
given black-box access to the probability distribution function $p(x)$ which outputs the
probability of drawing $x \in X$ according to $D$. Design an efficient randomized
algorithm $A$ which outputs an element of $X$ so that the probability of outputting $x$ is
approximately $p(x)$. More generally, output a sample of elements from $X$ drawn according
to $p(x)$.

Assume that $A$ has access to only fair random coins, though this allows one to efficiently
simulate flipping a biased coin of any desired probability.
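
One standard way to simulate a biased coin from fair coins, sketched here as an illustration (the function name and the use of Python's `random` module as the source of fair bits are assumptions): generate the binary digits of a uniform random number one fair flip at a time, compare them with the binary digits of the desired probability, and stop at the first digit where they differ. On average this uses two fair flips.

```python
import random

def biased_coin(p, fair_bit):
    """Return True with probability p, using only calls to fair_bit().

    Compares a random binary fraction 0.b1 b2 b3 ... digit by digit against
    the binary expansion of p and stops at the first digit where they differ.
    """
    while True:
        p *= 2
        p_digit = 1 if p >= 1 else 0   # next binary digit of p
        p -= p_digit
        b = fair_bit()                 # next binary digit of the random number
        if b != p_digit:
            return b < p_digit         # random number is below p: heads

rng = random.Random(0)

def fair_bit():
    return rng.getrandbits(1)

trials = 20000
heads = sum(biased_coin(0.3, fair_bit) for _ in range(trials))
frequency = heads / trials
```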

Notice that with such an algorithm we'd be able to do things like estimate the expected
value of some random variable $f : X \to \mathbb{R}$. We could take a large sample $S \subset X$ via the
solution to the sampling problem, and then compute the average value of $f$ on that sample.
This is what a Monte Carlo method does when sampling is easy. In fact, the Markov
Chain solution to the sampling problem will allow us to do the sampling and the estimation
of $\mathbb{E}(f)$ in one fell swoop if you want.
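
As a small sketch of the Monte Carlo step when sampling is easy: below, Python's `random.choices` stands in for a solution to the sampling problem, the three-name distribution is made up, and the random variable is the length of the name. The sample average is compared against the exact expected value, which is computable here only because the toy set is tiny.

```python
import random

# Hypothetical toy distribution over names.
weights = {"Malcolm": 0.5, "Ada": 0.3, "Grace": 0.2}
f = len   # the random variable: length of the name

rng = random.Random(0)
names = list(weights)

# Stand-in for a solution to the sampling problem.
sample = rng.choices(names, weights=[weights[x] for x in names], k=20000)

# Monte Carlo estimate: average of f over the sample.
estimate = sum(f(x) for x in sample) / len(sample)

# Exact expected value, for comparison.
exact = sum(weights[x] * f(x) for x in names)
```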

But the core problem is really a sampling problem, and "Markov Chain Monte Carlo"
would be more accurately called the "Markov Chain Sampling Method." So let's see why a
Markov Chain could possibly help us.

Random Walks, the Markov Chain part of MCMC

Markov Chain is essentially a fancy term for a random walk on a graph.

You give me a directed graph $G = (V,E)$, and for each edge $e = (u,v) \in E$ you give me
a number $p_{u,v} \in [0,1]$. In order to make a random walk make sense, the $p_{u,v}$ need to satisfy
the following constraint:

For any vertex $x \in V$, the set of all values $p_{x,y}$ on outgoing edges $(x,y)$ must sum to 1, i.e.
form a probability distribution.

If this is satisfied then we can take a random walk on $G$ according to the probabilities as
follows: start at some vertex $x_0$. Then pick an outgoing edge at random according to the
probabilities on the outgoing edges, and follow it to $x_1$. Repeat if possible.

I say "if possible" because an arbitrary graph will not necessarily have any outgoing edges
from a given vertex. We'll need to impose some additional conditions on the graph in order
to apply random walks to Markov Chain Monte Carlo, but in any case the idea of randomly
walking is well-defined, and we call the whole object a Markov chain.

Here is an example where the vertices in the graph correspond to emotional states.
An example Markov chain; image source http://www.mathcs.emory.edu/~cheung/

In statistics land, they take the state interpretation of a random walk very seriously. They
call the edge probabilities state-to-state transitions.

The main theorem we need to do anything useful with Markov chains is the stationary
distribution theorem (sometimes called the "Fundamental Theorem of Markov Chains," and
for good reason). What it says intuitively is that for a very long random walk, the
probability that you end at some vertex $v$ is independent of where you started! All of these
probabilities taken together are called the stationary distribution of the random walk, and it is
uniquely determined by the Markov chain.

However, for the reasons we stated above ("if possible"), the stationary distribution theorem
is not true of every Markov chain. The main property we need is that the
graph $G$ is strongly connected. Recall that a directed graph is called connected if, when you
ignore direction, there is a path from every vertex to every other vertex. It is called strongly
connected if you still get paths everywhere when considering direction. If we additionally
require the stupid edge-case-catcher that no edge can have zero probability, then strong
connectivity (of one component of a graph) is equivalent to the following property:

For every vertex $v \in V(G)$, an infinite random walk started at $v$ will return to $v$ with
probability 1.

In fact it will return infinitely often. This property is called the persistence of the state $v$ by
statisticians. I dislike this term because it appears to describe a property of a vertex, when to
me it describes a property of the connected component containing that vertex. In any case,
since in Markov Chain Monte Carlo we'll be picking the graph to walk on (spoiler!) we will
ensure the graph is strongly connected by design.

Finally, in order to describe the stationary distribution in a more familiar manner (using
linear algebra), we will write the transition probabilities as a matrix $A$ where
entry $a_{j,i} = p_{i,j}$ if there is an edge $(i,j) \in E$ and zero otherwise. Here the rows and
columns correspond to vertices of $G$, and each column $i$ forms the probability distribution
of going from state $i$ to some other state in one step of the random walk. Note $A$ is the
transpose of the weighted adjacency matrix of the directed weighted graph $G$ where the
weights are the transition probabilities (the reason I do it this way is because matrix-vector
multiplication will have the matrix on the left instead of the right; see below).

This matrix allows me to describe things nicely using the language of linear algebra. In
particular if you give me a basis vector $e_i$ interpreted as "the random walk currently at
vertex $i$," then $Ae_i$ gives a vector whose $j$-th coordinate is the probability that the random
walk would be at vertex $j$ after one more step in the random walk. Likewise, if you give me
a probability distribution $q$ over the vertices, then $Aq$ gives a probability vector interpreted
as follows:

If a random walk is in state $i$ with probability $q_i$, then the $j$-th entry of $Aq$ is the probability
that after one more step in the random walk you get to vertex $j$.

Interpreted this way, the stationary distribution is a probability distribution $\pi$ such
that $A \pi = \pi$, in other words $\pi$ is an eigenvector of $A$ with eigenvalue 1.
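
Here is a small numerical sketch of this matrix view, using numpy and a made-up three-state chain (the entries of the matrix are assumptions for the demo). Columns hold the outgoing probabilities of each state, a basis vector picks out a column, and the stationary distribution is read off as the eigenvector with eigenvalue 1, rescaled so its entries sum to 1.

```python
import numpy as np

# Column i holds the probabilities of leaving state i, so columns sum to 1.
A = np.array([
    [0.6, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.5],
])

# A basis vector means "the walk is currently at vertex 0".
e0 = np.array([1.0, 0.0, 0.0])
one_step = A @ e0            # distribution after one more step (column 0 of A)

# A general distribution over the vertices is pushed forward the same way.
q = np.array([0.2, 0.5, 0.3])
next_q = A @ q

# The stationary distribution is an eigenvector of A with eigenvalue 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
k = np.argmin(np.abs(eigenvalues - 1.0))
pi = np.real(eigenvectors[:, k])
pi = pi / pi.sum()           # normalize to a probability vector
```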

A quick side note for avid readers of this blog: this analysis of a random walk is exactly
what we did back in the early days of this blog when we studied the PageRank algorithm for
ranking webpages. There we called the matrix $A$ "a web matrix," did random walks on it,
and found a special eigenvalue whose eigenvector was a "stationary distribution" that we
used to rank web pages (this used something called the Perron-Frobenius theorem, which
says a random-walk matrix has that special eigenvector). There we described an algorithm
to actually find that eigenvector by iteratively multiplying $A$. The following theorem is
essentially a variant of this algorithm but works under weaker conditions; for the web
matrix we added additional "fake" edges that give the needed stronger conditions.

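Theorem: Let $G$ be a strongly connected graph with associated edge
probabilities $\{ p_e \}_{e \in E}$ forming a Markov chain. For a probability vector $x_0$,
define $x_{t+1} = Ax_t$ for all $t \geq 0$, and let $v_t$ be the long-term average $v_t = \frac{1}{t} \sum_{s=1}^t x_s$.

1. There is a unique probability vector $\pi$ with $A \pi = \pi$.

2. For all $x_0$, the limit $\lim_{t \to \infty} v_t = \pi$.

Proof. Since $v_t$ is a probability vector we just want to show that $|Av_t - v_t| \to 0$ as $t \to \infty$.

Indeed, we can expand this quantity as

$Av_t - v_t = \frac{1}{t}(x_2 + \dots + x_{t+1}) - \frac{1}{t}(x_1 + \dots + x_t) = \frac{1}{t}(x_{t+1} - x_1)$

But $x_{t+1}$ and $x_1$ are unit vectors (in the 1-norm, being probability vectors), so their
difference has norm at most 2, meaning $|Av_t - v_t| \leq \frac{2}{t} \to 0$.

Now it's clear that this does not depend on $x_0$. For uniqueness we will cop out and appeal to
the Perron-Frobenius theorem that says any matrix of this form has a unique such
(normalized) eigenvector.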
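
The averaging in the theorem is easy to try numerically. This is a sketch under assumptions: the three-state matrix is made up, and `long_term_average` is a hypothetical helper that averages the first $t$ iterates obtained by repeatedly multiplying a starting probability vector by the matrix. The residual after applying the matrix once more should be at most $2/t$ in the 1-norm, and two different starting vectors should give nearly the same average.

```python
import numpy as np

# Made-up column-stochastic matrix of a strongly connected chain.
A = np.array([
    [0.6, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.5],
])

def long_term_average(A, x0, t):
    """Average of the iterates x_1, ..., x_t, where x_{s+1} = A x_s."""
    x = np.array(x0, dtype=float)
    total = np.zeros_like(x)
    for _ in range(t):
        x = A @ x
        total += x
    return total / t

t = 2000
v_from_first = long_term_average(A, [1.0, 0.0, 0.0], t)
v_from_last = long_term_average(A, [0.0, 0.0, 1.0], t)

# 1-norm of A v_t - v_t, which the proof bounds by 2 / t.
residual = np.sum(np.abs(A @ v_from_first - v_from_first))
```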
One additional remark is that, in addition to computing the stationary distribution by
actually computing this average or using an eigensolver, one can analytically solve for it as
the inverse of a particular matrix. Define $B = A - I_n$, where $I_n$ is the $n \times n$ identity
matrix. Let $C$ be $B$ with a row of ones appended to the bottom and
the topmost row removed. Then one can show (quite opaquely) that the last column
of $C^{-1}$ is $\pi$. We leave this as an exercise to the reader, because I'm pretty sure nobody uses
this method in practice.
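
For the curious, the matrix-inverse recipe can be tried directly in numpy. This is only a sketch on a made-up three-state chain: it subtracts the identity from the transition matrix, drops the top row, appends a row of ones, inverts, and reads off the last column. The idea is that the resulting linear system encodes "the stationary vector is fixed by the transition matrix" together with "its entries sum to 1".

```python
import numpy as np

# Made-up column-stochastic matrix of a strongly connected chain.
A = np.array([
    [0.6, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.5],
])
n = A.shape[0]

B = A - np.eye(n)
# Remove the topmost row of B and append a row of ones at the bottom.
C = np.vstack([B[1:], np.ones((1, n))])

# The last column of the inverse is the stationary distribution.
pi = np.linalg.inv(C)[:, -1]
```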

One final remark is about why we need to take an average over all our $x_t$ in the
theorem above. There is an extra technical condition one can add to strong connectivity,
called aperiodicity, which allows one to beef up the theorem so that $x_t$ itself converges to
the stationary distribution. Rigorously, aperiodicity is the property that, regardless of where
you start your random walk, after some sufficiently large number of steps the random
walk has a positive probability of being at every vertex at every subsequent step. As an
example of a graph where aperiodicity fails: an undirected cycle on an even number
of vertices. In that case there will only be a positive probability of being at certain vertices
every other step, and averaging those two long term sequences gives the actual stationary
distribution.
Image source: Wikipedia

One way to guarantee that your Markov chain is aperiodic is to ensure there is a positive
probability of staying at any vertex. I.e., that your graph has a self-loop. This is what we'll
do in the next section.
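
The even-cycle example and the self-loop fix can both be seen numerically. In this sketch (the 4-cycle and the "stay with probability 1/2" variant are choices made for the demo), the walk on the plain cycle keeps alternating between two sets of vertices forever, yet the average of its iterates is uniform; once every vertex gets a self-loop, the iterates themselves converge to the uniform stationary distribution.

```python
import numpy as np

n = 4
# Walk on an undirected 4-cycle: move to either neighbor with probability 1/2.
A = np.zeros((n, n))
for i in range(n):
    A[(i + 1) % n, i] = 0.5
    A[(i - 1) % n, i] = 0.5

x = np.zeros(n)
x[0] = 1.0
iterates = []
for _ in range(200):
    x = A @ x
    iterates.append(x)

last = iterates[-1]                    # supported on vertices 0 and 2 only
second_to_last = iterates[-2]          # supported on vertices 1 and 3 only
average = np.mean(iterates, axis=0)    # the long-term average is uniform

# Add self-loops: stay put with probability 1/2, otherwise walk as before.
lazy = 0.5 * np.eye(n) + 0.5 * A
y = np.zeros(n)
y[0] = 1.0
for _ in range(200):
    y = lazy @ y                       # now the iterates themselves converge
```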

Constructing a graph to walk on

Recall that the problem we're trying to solve is to draw from a distribution $D$ over a finite
set $X$ with probability function $p(x)$. The MCMC method is to construct a Markov
chain whose stationary distribution is exactly $p$, even when you just have black-box access
to evaluating $p$. That is, you (implicitly) pick a graph $G$ and (implicitly) choose transition
probabilities for the edges to make the stationary distribution $p$. Then you take a long
enough random walk on $G$ and output the $x$ corresponding to whatever state you land on.

The easy part is coming up with a graph that has the right stationary distribution (in fact,
"most" graphs will work). The hard part is to come up with a graph where you can prove
that the convergence of a random walk to the stationary distribution is fast in comparison to
the size of $X$. Such a proof is beyond the scope of this post, but the "right" choice of a
graph is not hard to understand.

The one we'll pick for this post is called the Metropolis-Hastings algorithm. The input is
your black-box access to $p(x)$, and the output is a set of rules that implicitly define a
random walk on a graph whose vertex set is $X$.

It works as follows: you pick some way to put $X$ on a lattice, so that each state corresponds
to some vector in $\{ 0, 1, \dots, n \}^d$. Then you add (two-way directed) edges to all neighboring
lattice points. For $d = 2$ it would look like this:

Image credit http://www.ams.org/samplings/feature-column/fcarc-taxi

And for $d = 3$ it would look like this:

Image credit http://www.chem.latech.edu/~upali/

You have to be careful here to ensure the vertices you choose for $X$ are not disconnected,
but in many applications $X$ is naturally already a lattice.

Now we have to describe the transition probabilities. Let $r$ be the maximum degree of a
vertex in this lattice ($r = 2d$). Suppose we're at vertex $i$ and we want to know where to go
next. We do the following:

1. Pick neighbor $j$ with probability $1/r$ (there is some chance to stay at $i$).

2. If you picked neighbor $j$ and $p(j) \geq p(i)$ then deterministically go to $j$.
3. Otherwise, $p(j) < p(i)$, and you go to $j$ with probability $p(j) / p(i)$.

We can state the probability weight $p_{i,j}$ on edge $(i,j)$ more compactly as

$p_{i,j} = \frac{1}{r} \min(1, p(j) / p(i))$ for $j \neq i$, and $p_{i,i} = 1 - \sum_{(i,j) \in E(G), j \neq i} p_{i,j}$
It is easy to check that this is indeed a probability distribution for each vertex $i$. So we just
have to show that $p(x)$ is the stationary distribution for this random walk.

Here's a fact to do that: if a probability distribution $v$ with entries $v(x)$ for each $x \in X$ has
the property that $v(x)p_{x,y} = v(y)p_{y,x}$ for all $x,y \in X$, then $v$ is the stationary distribution.
To prove it, fix $x$ and take the sum of both sides of that equation over all $y$. The result is
exactly the equation $v(x) = \sum_{y} v(y)p_{y,x}$, which is the same as $v = Av$. Since the
stationary distribution is the unique vector satisfying this equation, $v$ has to be it.

Doing this with our chosen $p(i)$ is easy, since $p(i)p_{i,j}$ and $p(j)p_{j,i}$ are both equal
to $\frac{1}{r} \min(p(i), p(j))$ by applying a tiny bit of algebra to the definition. So we're done! One
can just randomly walk according to these probabilities and get a sample.
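
Here is a minimal, non-industrial-strength sketch of that walk on a small two-dimensional lattice. Everything concrete in it is an assumption for the demo: the lattice $\{0,\dots,3\}^2$, the black box `p`, and the function names. Choosing an axis and a sign uniformly picks one of the $r = 2d$ directions with probability $1/r$; a direction that leaves the lattice counts as staying put. The walk only ever uses ratios of `p`, and it must start at a state with nonzero probability.

```python
import random
from collections import Counter

def edge_weight(p, i, j, r):
    """Transition probability from i to a neighboring lattice point j != i."""
    return min(1.0, p(j) / p(i)) / r

def metropolis_hastings_step(state, p, n, rng):
    """One step of the walk on the lattice {0, ..., n}^d; `state` is a tuple."""
    d = len(state)
    axis = rng.randrange(d)          # axis and sign together pick one of the
    delta = rng.choice((-1, 1))      # r = 2d directions, each w.p. 1/r
    coord = state[axis] + delta
    if not 0 <= coord <= n:
        return state                 # no lattice point there: stay put
    proposal = state[:axis] + (coord,) + state[axis + 1:]
    ratio = p(proposal) / p(state)
    if ratio >= 1 or rng.random() < ratio:
        return proposal
    return state

# Hypothetical black box on {0,...,3}^2: p(x, y) = (x + 1)(y + 1) / 100.
def p(state):
    x, y = state
    return (x + 1) * (y + 1) / 100.0

rng = random.Random(0)
state = (0, 0)
counts = Counter()
num_steps = 200000
for _ in range(num_steps):
    state = metropolis_hastings_step(state, p, 3, rng)
    counts[state] += 1

# Fraction of time spent in each state; this should be close to p.
empirical = {s: c / num_steps for s, c in counts.items()}
```

The function `edge_weight` is the compact formula for the edge probabilities, and it can be used to check the balance condition $p(i)p_{i,j} = p(j)p_{j,i}$ on any pair of neighbors.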

Last words
The last thing I want to say about MCMC is to show that you can estimate the expected
value of a function simultaneously while random-walking through your Metropolis-
Hastings graph (or any graph whose stationary distribution is ). By definition the
expected value of is .

Now what we can do is compute the average value of $f(x)$ just among those states we've
visited during our random walk. With a little bit of extra work you can show that this
quantity will converge to the true expected value of $f$ at about the same time that the
random walk converges to the stationary distribution. (Here the "about" means we're off
by a constant factor depending on $f$.) In order to prove this you need some extra tools I'm
too lazy to write about in this post, but the point is that it works.
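
As a rough illustration of that running average, here is a small Python sketch on a one-dimensional lattice $\{0, \dots, n\}$ (so $r = 2$). The function names, the choice of lattice, and the starting state 0 are assumptions made for the example, and the exact sum is computed only so the two numbers can be compared on a toy problem.

```python
import random


def estimate_expectation(f, p, n, steps, seed=None):
    """Average f over the states visited by a walk on {0,...,n}.

    Hypothetical helper: `p` is the black-box weight function, assumed
    positive at the starting state 0.
    """
    rng = random.Random(seed)
    state = 0
    total = 0.0
    for _ in range(steps):
        # Pick the left or right neighbor, each with probability 1/r = 1/2.
        proposal = state + rng.choice((-1, 1))
        # Off-lattice proposals are rejected; others are accepted with
        # probability min(1, p(j)/p(i)).
        if 0 <= proposal <= n and rng.random() < min(1.0, p(proposal) / p(state)):
            state = proposal
        total += f(state)
    return total / steps


def exact_expectation(f, p, n):
    """Direct sum of f(x) p(x), normalized; only feasible for small state spaces."""
    z = sum(p(x) for x in range(n + 1))
    return sum(f(x) * p(x) for x in range(n + 1)) / z


# Toy black box: weight x + 1 on {0,...,9}; estimate the mean of f(x) = x.
print(estimate_expectation(lambda x: x, lambda x: x + 1, 9, 200000, seed=0))
print(exact_expectation(lambda x: x, lambda x: x + 1, 9))
```

The estimate includes the early states visited before the walk has mixed, so it is only meaningful when the number of steps is large compared to the mixing time.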

The reason I did not start by describing MCMC in terms of estimating the expected value of
a function is because the core problem is a sampling problem. Moreover, there are many
applications of MCMC that need nothing more than a sample. For example, MCMC can be
used to estimate the volume of an arbitrary (maybe high dimensional) convex set. See these
lecture notes of Alistair Sinclair for more.
If demand is popular enough, I could implement the Metropolis-Hastings algorithm in code
(it wouldn't be industry-strength, but perhaps illuminating? I'm not so sure).

Until next time!

This entry was posted in Algorithms, Graph Theory, Linear Algebra, Probability Theory and
tagged markov chain, mathematics, MCMC, monte carlo, random walk. Bookmark the permalink.

42 thoughts on "Markov Chain Monte Carlo Without all the Bullshit"

1. Ben Buckley

April 6, 2015 at 3:13 pm Reply

It's not immediately obvious to me how this helps with our baby name blackbox. I assume
I'm missing something important.

My understanding is that, in the graph, each state would correspond to some name, where n
= 26 (letters in the alphabet) and d = 7 (just to keep things simple) so that "MALCOLM" is
one of the states. Won't the state's neighbours be crazy strings like "JALCOLM" and
"MALCZLM" for which the blackbox should return zero, and p(j)/p(i) is always zero? So,
if I do a walk on the graph, how am I supposed to leave the state "MALCOLM"?


o j2kun

April 6, 2015 at 3:27 pm Reply

This is a good observation, and you're right. Unfortunately this is a problem-dependent
issue, and in particular for names there is nothing stopping someone from
making up a name like Jalcom. So the issue is finding a way to map names to grid
vertices in a sensible way. I don't know of a simple way to do that off the top of my
head without a given enumeration of all legal names.

o paulie

August 30, 2017 at 7:57 am Reply

Thank you and God bless you for the inspiration we named our sweet baby boy


2. ZL

April 6, 2015 at 4:23 pm Reply

"If demand is popular enough, I could implement the Metropolis-Hastings algorithm in
code" yes please


o Hugle (@wulong3)

April 10, 2015 at 6:12 am Reply

There is a working version in



3. gt

April 6, 2015 at 11:51 pm Reply

Strictly speaking your theorem also requires the state space to be finite: a simple M/M/1
exploding queue will serve as a counter example. Having said that, your original
motivation was MCMC on a finite state set X, so perhaps this is implicit.


4. Amnon Harel

April 7, 2015 at 3:18 am Reply

That's quite a straw man, in the introduction. Not only full of jargon but after two badly
chosen and inexact sentences it moves on to specific usages and implementations that do
not belong in an introduction, without spelling out, e.g. "what is a Monte Carlo?". This web
page gives a very nice introduction to Markov Chain sampling. But the title is "Markov
Chain Monte Carlo", and all the basic concepts of the Monte Carlo method are missing. To
be sure, they are readily available elsewhere:

Still, I would get rid of expectation values. Integration is more accurate, basic, general,
and communicative to everyone who went through a calculus course and encountered the
fact that sometimes integrals are hard.


o j2kun

April 7, 2015 at 8:56 am Reply

I'm not sure what to say. Any search for "Markov chain sampling method" gives
you results for MCMC, or a scientific paper about Dirichlet distributions. And the
core of any Monte Carlo method is sampling, regardless of whether you use the
sample to estimate an integral.


5. Richard

April 7, 2015 at 7:53 am Reply

Your explanation is great. Thanks for writing this.

One small comment, though: In your definition of the sampling problem you use f both as a
probability density function and as a random variable, and that was a little confusing. It
would be very helpful (at least for me) if you used different symbols here (assuming they
are meant to be different?).


6. Josh

April 7, 2015 at 8:16 pm Reply

Nice post! And I've found many of your other primers very helpful as well.

Quick question: you wrote that a Markov chain is essentially a random walk on a graph. In
many important situations, however, we define Markov chains on continuous state spaces,
and I'm not sure I see how that fits into the framework you described. Can Markov chains
on continuous state spaces be interpreted as random walks on (implicit) graphs?

Also, a perhaps clearer introduction to MCMC than the one you cited is in chapter 29 of
David MacKay's book: http://www.inference.phy.cam.ac.uk/itprnn/book.html

o j2kun

April 7, 2015 at 8:34 pm Reply

You can define graphs on continuous state spaces just fine, and just as you would
for a usual Markov chain you can define the transitions implicitly and talk about
densities as integrals, etc.


o j2kun

April 7, 2015 at 8:36 pm Reply

And yes that text does appear to have a great treatment of the subject.


7. Ian Mallett

April 7, 2015 at 10:16 pm Reply

Very nice article; thanks!

One minor clarification: "you go to j with probability p_j / p_i." |-> "you go to j with
probability p(j) / p(i)."


o j2kun

April 7, 2015 at 10:39 pm Reply

Fixed, thanks!


8. Tyson Williams

April 14, 2015 at 9:48 pm Reply

Nice post. I completely agree that most explanations of MCMC are too jargon dense. Your
treatment here is great.
Given that your opening example was picking baby names, I was anxiously looking forward
to how you were going to define that two names are adjacent in the state graph. I became
disappointed when I read "you pick some way to put X on a lattice"


o j2kun

April 14, 2015 at 10:59 pm Reply

For what it's worth, I'm pretty sure the set of baby names is sparse in the set of all
strings, but yeah, it's a cop out.


9. Andreas Eckleder

April 16, 2015 at 2:41 am Reply

I think from your description it is not immediately clear that your black box does not have a
finite vocabulary of names but the name is really an arbitrary string. I think it would make
understanding this great article a lot easier if you explicitly mentioned that out at the


o j2kun

April 16, 2015 at 8:33 am Reply

It is finite.


10. Nick

April 20, 2015 at 4:40 pm Reply

A bit of a thought about the statement that a graph being strongly connected is equivalent to
"For every vertex v \in V(G), an infinite random walk started at v will return to v with
probability 1."

I can see how the former implies the latter, but without also requiring at least connectedness
already, the latter does not seem to imply the former. Consider a graph with more than one
vertex, where each vertex has exactly one edge which connects to itself with probability 1.
It definitely satisfies the latter property, but is also not strongly connected, or even
connected at all.

The property "for every pair of vertices u,v \in V(G), an infinite random walk started at u
will pass through v with probability 1." would imply strong connectedness, and I think,
though I haven't worked out the proof, that strong connectedness and all edge probabilities
positive would imply it.

Am I going horribly wrong here? Is "equivalent" only being used as a one directional
implication rather than as an iff?


o j2kun

April 20, 2015 at 9:02 pm Reply

You're right, I was being unclear. The confusion is because what the statistics
community calls "persistent" (which is the definition you quoted) really means the
connected component containing a vertex is strongly connected. It's sort of silly
because for any Markov chain you assume you're working with a connected graph
(in which case persistent means the whole graph is strongly connected), because to
analyze a graph which is a union of connected components you just analyze the
connected components one by one. I have updated the text to reflect this.


11. Lorand

April 24, 2015 at 4:07 am Reply

Hi, thanks for the article!

It is not clear to me how to choose the transition probabilities from one state (name) to
another state.
Also, is it important which neighbors a state has, or can I just randomly assign names to
vertices in the lattice (let's say I would have 100 names and assign them randomly to a
lattice of 10×10)?


12. isomorphismes

April 26, 2015 at 12:44 am Reply

Agree. Statisticians often let notation get the better of them.

o isomorphismes

April 26, 2015 at 12:45 am Reply

*notation and verbiage


13. Evan

June 20, 2015 at 3:45 pm Reply

In the last paragraph of the "Constructing a Graph to Walk On" section, I think there's a
small error in this sentence:

"Doing this with out chosen p(i) is easy, since p(i)p_{i,j} and p(i)p_{j,i} are both equal to
\frac1r \min(p(i), p(j)) by applying a tiny bit of algebra to the definition."

I think "p(i)p_{i,j} and p(i)p_{j,i}" is meant to be "p(i)p_{i,j} and p(j)p_{j,i}" (note the i
replaced by j as the argument to the second p()).

Thanks for the great article!


14. asmageddon

June 27, 2015 at 3:23 pm Reply

> without all the bullshit

> mathematical notation everywhere
No thanks.


15. Samchappelle

August 6, 2015 at 11:23 pm Reply

Reblogged this on schapshow.


16. Tann
January 12, 2016 at 3:37 am Reply

Glad I didn't get the version *with* all the bullshit. This is hard enough


17. Navaneethan

March 17, 2016 at 12:46 pm Reply

I had a question about MCMC that I'm finding hard to answer. If the idea is to sample from
a complicated distribution, how do you know that you've produced a representative sample?
Is there a property of MCMC that ensures that the sample is representative? If not, isn't that
a huge weakness of this framework?


o j2kun

March 17, 2016 at 1:13 pm Reply

Yes, in fact that is the entire point, and I gave a mathematical proof that it
works in the post.


18. mariusagm

August 12, 2016 at 9:55 pm Reply

Reblogged this on Marius.


19. thweealc

September 20, 2016 at 1:02 am Reply

should (u,v) element E correctly be (u,v) element V?


20. Cindy Yeh

September 24, 2016 at 1:41 pm Reply

Thank you, I needed help to understand what MCMC is, after listening to Ed Vul's talk
about cognitive biases and trying to model how reasoning works and seeing what biases it
explains. I thought it was a very interesting talk,
here https://www.youtube.com/watch?v=eSq_80TfUO0 Plan to read more of your blog
posts. Thanks very much.


21. Boris Jensen

April 6, 2017 at 5:35 am Reply

Thanks for the nice writeup.

"Doing this with out chosen p(i) is easy, since p(i)p_{i,j} and p(i)p_{j,i} are both equal to
\frac1r \min(p(i), p(j))"
Don't you mean p(i)p_{i,j} and p(j)p_{j,i}? Or am I misunderstanding something?


22. Richard

April 19, 2017 at 3:46 pm Reply

It is more general than this right? The black box is some constant k times p(x)?
So you only need to know the proportions of the probabilities via the black box


23. compostbox

May 4, 2017 at 9:37 pm Reply

Thanks for the good writing.

However, there is one thing not obvious to me. Why is the lattice thing necessary at all in
Metropolis-Hastings? Imagine a fully linked graph where you can move to any state from
any state. It seems to me everything will still work. Put another way, it's just that the r
will now equal the size of X minus 1, and notice in your proof the exact value
of r does not matter.


o j2kun

May 4, 2017 at 10:06 pm Reply

This is correct, however, often the size of X is exponentially large, and so a
complete graph will not be tractable.

For example, suppose your state space is the integer grid $\{1, \dots, n\}^d$. You may want
your algorithm to run in time polynomial in $d$, but there are $n^d$ states. This is why I
brought up the example of names, since it is also an exponentially large state space.


24. allenhw

May 8, 2017 at 11:43 am Reply

I first wanted to thank you and great job for getting ideas across clearly. It was super

I have one question: what's the advantage of MCMC over simpler algorithms using RNG?
For example, you can simulate a dice roll by dividing [0,1] into 6 subspaces, generating a
random number x in [0,1], and outputting a result based on which subspace x falls into. This can
be done as long as we have p(x) for all x in X.


o j2kun

May 8, 2017 at 12:27 pm Reply

MCMC is used when the number of possible outputs is exponentially large. To see
this, imagine your proposed die had 2^50 sides, each of which had a slightly
different probability of occurring (and there is no discernible pattern or formula to
tell you p(x) for a given x, you just have to ask the black box p(x) to give you a
value). How does your algorithm break down in this scenario? How long does it
take to simulate a single die roll on average?


25. Rob Forgione

June 6, 2017 at 11:31 am Reply

Thanks for writing this! Do you mind explaining why x_t and x_0 are unit vectors in your
proof of the eigenvector theorem? I was under the impression that each x vector is a
probability vector, which means it would sum to 1 but not necessarily have length 1. Any
help/clarification is appreciated thanks!

26. Matt

June 15, 2017 at 7:47 am Reply

Thanks for writing this. It was very helpful to understand the basic idea. Unfortunately, I
have some troubles to understand the last step (Constructing a graph to walk on). Could
you maybe explain on a simple example how you build up the correct graph?

Maybe for the following example:

Let us assume
X = {A,B,C}
p(A) = 2/10, p(B) = 5/10, p(C) = 3/10.
This means our stationary distribution should be
p = [2/10, 5/10, 3/10] (correct me if I am wrong).

How would you now build up the correct Graph?

What would in this case be n and d you used to build the lattice?
How would the lattice look like?

Thanks for your answers


o j2kun

June 15, 2017 at 8:15 am Reply

In your example, the number of nodes in the graph is only 3, which is so small that
any connected graph should work, if I'm not mistaken.

In general, there is no correct answer. You want a graph which is sparse, but also
has high connectivity properties, meaning you need to remove many edges in order
to disconnect the graph. Graphs that contain only one or two shortest paths between
two nodes would have bad mixing. Grid graphs tend to do well, but I don't know
how to pick the parameters in a principled way. There is also a theory of expander
graphs that is closely related, and you may want to look up that literature.
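
For the three-state example in the question, one possible choice can be checked by hand or with a few lines of Python. This is only a sketch: the path graph A - B - C (a one-dimensional lattice, so r = 2) is an assumption, not the only graph that works, and the variable names are made up. The transition probabilities follow the rule from the post, p_{i,j} = (1/r) min(1, p(j)/p(i)), with the leftover probability placed on staying put.

```python
from fractions import Fraction

# Target distribution from the example above.
p = {"A": Fraction(2, 10), "B": Fraction(5, 10), "C": Fraction(3, 10)}

# Assumed graph: the path A - B - C, whose maximum degree is r = 2.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
r = 2


def transition(i, j):
    """Probability of stepping from i to j under the rule from the post."""
    if j in neighbors[i]:
        return Fraction(1, r) * min(Fraction(1), p[j] / p[i])
    if j == i:
        # Whatever probability is not spent on neighbors stays at i.
        return 1 - sum(transition(i, k) for k in neighbors[i])
    return Fraction(0)


# Push p through one step of the walk; a stationary distribution is unchanged.
after_one_step = {j: sum(p[i] * transition(i, j) for i in p) for j in p}

for i in p:
    print(i, [str(transition(i, j)) for j in p])
print(after_one_step == p)
```

Exact fractions are used so that the comparison with p is not blurred by floating-point rounding.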



June 15, 2017 at 8:39 am

Maybe I am a little bit more confused than I thought. Have I
understood the procedure correctly?

We need to find a Graph that has the stationary distribution

p = [2/10, 5/10, 3/10]
This is the same like saying: We need to find a Matrix A with
Eigenvalue 1 and Eigenvector p.
If we have this we could walk on the Graph represented by matrix A.
If we do this a long time the vertex we will end could be used as
generator for random variables probability p.
Is this correct?