David S. Goodsell (Auth.) - Atomic Evidence - Seeing The Molecular Basis of Life-Copernicus (2016)

Atomic Evidence
Seeing the Molecular Basis of Life
David S. Goodsell
The Scripps Research Institute
and RCSB Protein Data Bank
La Jolla, California
USA
ISBN 978-3-319-32508-8 ISBN 978-3-319-32510-1 (eBook)

DOI 10.1007/978-3-319-32510-1
Library of Congress Control Number: 2016943685
© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the
whole or part of the material is concerned, specifically the rights of translation, reprinting,
reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other
physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter
developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in
this publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore free for
general use.
The publisher, the authors and the editors are safe to assume that the advice and
information in this book are believed to be true and accurate at the date of publication.
Neither the publisher nor the authors or the editors give a warranty, express or implied,
with respect to the material contained herein or for any errors or omissions that may have
been made.
Printed on acid-free paper
This Copernicus imprint is published by Springer Nature

The registered company is Springer International Publishing AG Switzerland
Preface
Why do I need a new flu shot each year? Should I be frightened by all the news about bacterial drug resistance? What about that new diet I just read about on the web? Biomolecular science is increasingly important in our everyday life, helping us answer questions like these, and giving us the knowledge to make critical choices about our diet, our health, and our wellness. How do fireflies light up? Why do plant and animal populations evolve over many generations? Biomolecular science also allows us to be curious, to look deeper into the natural world, and to be inspired by the complex inner workings of life.

In this book, I will take an evidence-based approach to current knowledge about the structure of biomolecules and their place in our lives, inviting us to explore how we know what we know and how current gaps in knowledge may influence our individual approach to the information. The book is separated into a series of short essays that present some of the foundational concepts of biomolecular science, with many examples of the molecules that perform the basic functions of life.

This book builds on my work with the RCSB Protein Data Bank, where I write a column each month that highlights atomic structures from the PDB archive. It has been a tremendous gift to have the opportunity to work on the Molecule of the Month, and I gratefully acknowledge Helen Berman, Stephen Burley, Christine Zardecki, and the entire RCSB team for their enthusiastic support over the past 15 years.

The molecular stories in this book are supported by a monumental body of work by scientists around the world. Throughout the book, I have included accession codes for structures at the PDB and EMDataBank. You can explore the structures directly at their websites (www.pdb.org and www.emdatabank.org). The database entries for each of these structures also include the primary journal publications that describe the detailed science supporting each structure.

David S. Goodsell
San Diego, CA, USA
Contents
1 The Protein Data Bank
2 Seeing Is Believing: Methods of Structure Solution
3 Visualizing the Invisible World of Molecules
4 The Twists and Turns of DNA
5 The Central Dogma
6 The Secret of Life: The Genetic Code
7 Evolution in Action
8 How Evolution Shapes Proteins
9 The Universe of Protein Folds
10 Order and Chaos in Protein Structure
11 Molecular Electronics
12 Green Energy
13 Peak Performance
14 Cellular Signaling Networks
15 GPCRs Revealed
16 Signaling with Hormones
17 Single-Molecule Chemistry: Enzyme Action and the Transition State
18 Seven Wonders of the World of Enzymes
19 Building Bodies
20 Coloring the Biological World
21 Amazing Antibodies
22 Attack and Defense: Weapons of the Immune System
23 Reconstructing HIV

1 The Protein Data Bank

D.S. Goodsell, Atomic Evidence, DOI 10.1007/978-3-319-32510-1_1
We're very lucky that today we can go to our computers and instantly
start exploring a hundred thousand atomic structures of biomolecules.
The structural biology community has spearheaded a comprehensive
effort to make the results of biostructural research freely
available to everyone. In 1971, a group of scientists at the Brookhaven
National Laboratory started an archive of atomic structures, called
the Protein Data Bank, as a way to make these structures available.
The first archive contained the seven protein structures that were
available at the time. Today the archive has grown to over a hundred
thousand entries and is managed by centers around the world:
RCSB and BMRB in the USA, PDBe in Europe, and PDBj in Japan.
Together, they have created online interfaces to this massive archive,
providing tools to deposit, curate, find, analyze, and visualize the
structures.
This wasn't always the case, however. In the early days of structural
science, many researchers chose to keep the primary results of
their work, the atomic coordinates, secret. Instead of making these
available, they published only pictures of their structures and
descriptions of their own ideas about the structure and function.
Arguably, this was justified: because these structures require so
much effort to solve, these researchers wanted to have the freedom
to analyze them completely themselves. Many researchers, however,
felt that this policy went against the spirit of science, where results
are made available and may be used by the entire community to
build a more complete picture of our world. And perhaps more
importantly, results need to be made available to allow other
researchers to check their authenticity and reproduce any scientific
insights gained from them.
For this reason, with the support of many researchers, Fred
Richards drafted a letter in 1988 to the major government institutes
funding science, requesting a policy that crystallographic data be
made available, at least for all research supported by public funds.
The effort was ultimately successful, and today, deposition of coor-
dinates and data in a public database is typically a mandatory con-
dition for funding of grants as well as for publication of results in
many prominent journals.
The widespread availability of coordinates has transformed the
study of molecular biology. Each structure is a window into a par-
ticular topic, allowing us to see the atomic details of biomolecular
processes. But that is only the beginning. An entire field of structure-
based drug design has been built upon these structures, allowing
the discovery of new pharmaceuticals to fight everything from HIV
to depression. Comparison of many different structures has led
directly to new insights about the general principles for biomolecu-
lar structure and function and the evolution of these molecules, and
these insights have blossomed into an entire field of protein design
and biotechnology.
Today, we can download atomic structures for nearly any biological
molecule we would be interested in exploring, from tiny hormones
to huge viruses (Fig. 1.1). Most of the illustrations in this
book are created directly from atomic coordinates from the PDB or,
in some cases, from the experimental data supporting the atomic
Fig. 1.1 Selected structures from the Protein Data Bank. The Protein Data Bank archives atomic structures of
biomolecules such as proteins, DNA, and RNA. A few familiar examples are shown here. Three small molecules are shown
for size comparison: (1) water, (2) glucose, and (3) ATP. Proteins in the blood: (4) antibody, (5) insulin, and (6) fibrinogen.
Digestive enzymes: (7) lysozyme, (8) pepsin, and (9) amylase. A virus: (10) rhinovirus. Membrane-bound proteins: (11) ATP
synthase, (12) adrenergic receptor and G-protein, (13) potassium ion channel, and (14) photosystem II. A few interesting
proteins: (15) hemoglobin, (16) green fluorescent protein, (17) luciferase, and (18) ribulose-bisphosphate carboxylase
oxygenase. Molecules involved in protein synthesis: (19) ribosome, (20) transfer RNA, (21) aminoacyl-tRNA synthetase,
(22) protein chaperone GroEL/GroES, and (23) ubiquitin. A few enzymes: (24) catalase, (25) nitrogenase, and (26) leucine
aminopeptidase. Proteins that bind to DNA: (27) repair protein DNA photolyase, (28) topoisomerase, (29) RNA
polymerase, (30) lac repressor, (31) catabolite gene activator protein, and (32) transcription factor complex. Iron storage
protein (33) ferritin and three enzymes involved in sugar metabolism: (34) hexokinase, (35) phosphofructokinase, and
(36) pyruvate kinase (PDB entries 1igt, 2hiu, 1m1j, 2baf, 1lz1, 5pep, 1smd, 4rhv, 1e79, 1c17, 3sn6, 3lut, 1s5l, 4hhb, 1g,
2d1s, 1rcx, 1j5e, 1jj2, 4tna, 1y, 1aon, 1ubq, 1qqw, 1n2c, 1lap, 1tez, 1a36, 1tlf, 1efa, 1cgp, 1ais, 1hrs, 1dgk, 4pfk, 1a3w)
structure. For each illustration, I have included the accession code
for the data at the PDB. With this information, you can easily
explore the structure yourself using the tools at one of the PDB sites.
The accession code also allows access to a variety of other informa-
tion about the structure, for instance, the scientists who determined
the structures, journal articles about the structure, and links to
other databases related to the structure. So if a particular topic cap-
tures your interest, make a visit to the PDB to explore the molecules
in more detail!
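To make the accession-code workflow concrete, here is a minimal Python sketch that checks the shape of a classic four-character PDB ID and builds links to an entry. It assumes the current RCSB URL patterns (`https://www.rcsb.org/structure/<ID>` for the summary page and `https://files.rcsb.org/download/<ID>.pdb` for the coordinate file), which may change over time; the helper names are hypothetical, and nothing is downloaded.

```python
import re

# Classic PDB IDs: a digit from 1 to 9 followed by three letters or digits,
# e.g. "4hhb" or "1mbo". Longer extended IDs are not handled by this sketch.
PDB_ID_PATTERN = re.compile(r"^[1-9][a-z0-9]{3}$")


def normalize_pdb_id(code: str) -> str:
    """Lower-case and validate a four-character PDB accession code."""
    code = code.strip().lower()
    if not PDB_ID_PATTERN.match(code):
        raise ValueError(f"not a classic PDB ID: {code!r}")
    return code


def entry_links(code: str) -> dict:
    """Build (assumed) RCSB URLs for an entry's summary page and coordinate file."""
    pdb_id = normalize_pdb_id(code).upper()
    return {
        "summary": f"https://www.rcsb.org/structure/{pdb_id}",
        "coordinates": f"https://files.rcsb.org/download/{pdb_id}.pdb",
    }


if __name__ == "__main__":
    # Accession codes taken from figures in this book.
    for code in ["4hhb", "1MBO", "6bna"]:
        print(code, entry_links(code)["coordinates"])
```

Pasting one of these links into a browser (or a tool such as `urllib.request`) is all it takes to start exploring a structure's atoms yourself.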
2 Seeing Is Believing: Methods of Structure Solution

Scientists are curious people. We're always asking questions and
then trying to figure out ways to answer them. This is particularly
tricky with molecular biology. There's no direct way to see individual
molecules, at least in atomic detail, so we're forced to use a
bunch of specialized methods that probe different aspects of the
structure. Then, from this information, we can build up an understanding
of the molecule and create images of the molecule that are
consistent with the data. Take, for instance, the ribosome
(Fig. 2.1). Researchers have been working for decades on this elusive
subject, assembling information from many sources to build the
detailed understanding we have today.
All of the methods currently used to determine the atomic
structures of molecules rely on observing many copies of the molecule.
For this reason, the first step is to purify the molecule, separating it
from its cellular context. This is a surprisingly big limitation of
these studies, for two reasons. First, we can't really see how the
molecule acts in the cell; we only observe how it behaves in an artificial,
purified state. Second, a variety of noncellular conditions are often
necessary to stabilize the molecule in its purified state. Fortunately,
in the case of ribosomes, when they are purified and mixed with the
proper partners, they happily go about their task of building proteins,
so we have reasonable confidence that they act similarly when
they are in their normal environment in the cell. Once we
have a purified (but still active) molecule, we can bring to bear the
three major techniques for exploring biomolecular structure:
electron microscopy, x-ray crystallography, and nuclear magnetic
resonance (NMR) spectroscopy.
Much of the seminal work on ribosome structure was per-
formed using electron microscopy. It is a satisfyingly visual method,
capturing more or less directly an image of individual ribosomes.
Early studies would spread ribosomes on a surface and stain them
with heavy metals, gathering pictures of the outer shape of the molecules.
Today, a field of molecules is frozen in a thin layer of ice,
and an image is captured. Images of the many individual molecules,
caught in different orientations, are aligned and combined by computer to
create a three-dimensional map of the molecule. As I write this
book, the field of cryoelectron microscopy is undergoing a techni-
cal revolution, and for some large, well-behaved molecules, this
process gives enough information to determine the location of each
atom in the molecule.
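Why combining many images helps can be seen in a toy model. The sketch below is not a cryo-EM pipeline: it assumes the images are already perfectly aligned (in practice, finding each particle's orientation is the hard part) and replaces each micrograph with a one-dimensional signal plus random Gaussian noise. Averaging N independent noisy copies shrinks the random noise by roughly a factor of √N, which is why many faint, noisy particle images can add up to a detailed map.

```python
import math
import random


def noisy_copy(signal, noise_sd, rng):
    """One simulated 'image': the true signal plus independent Gaussian noise."""
    return [value + rng.gauss(0.0, noise_sd) for value in signal]


def average_images(images):
    """Pixel-by-pixel mean of images that are assumed to be already aligned."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


if __name__ == "__main__":
    rng = random.Random(0)
    # A toy 1D "molecule": a smooth bump standing in for a density profile.
    truth = [math.exp(-((i - 50) / 10.0) ** 2) for i in range(100)]
    images = [noisy_copy(truth, noise_sd=1.0, rng=rng) for _ in range(400)]
    print("error of one image:      ", rms_error(images[0], truth))
    print("error of 400-image mean: ", rms_error(average_images(images), truth))
```

With 400 images, the √N rule predicts the noise should drop about twentyfold, and the printed errors should show roughly that.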
Electron microscopy was used to discover all the basic features
of ribosome structure and function: the shapes of the large and
small subunits, the threading of messenger RNA between them and
the location of the transfer RNA subsites, an exit tunnel out the
back of the large subunit, association of ribosomes with protein
transporters in the endoplasmic reticulum, and many other things.
Today, researchers are using the detailed structures from cryo-EM
to reveal piece by piece each step of protein synthesis and interac-
tions with the many molecules that assist with the process.
Fig. 2.1 Experimental views of a bacterial ribosome. The upper image
shows a 3D reconstruction from electron microscopy, with the small subunit in
green and the large subunit in blue. The lower image is an atomic structure
from x-ray crystallography and an NMR structure of a flexible protein stalk that
is not observed in the crystal structure (PDB entries 4v4q and 1rqv,
EMDataBank entry EMD1110)
X-ray crystallography is the least ambiguous, but perhaps the
most artificial, method for atomic structure determination. A very
pure solution of the molecule is coaxed to form crystals using a
variety of unusual methods, such as concentrated solutions of salt
or waxy polyethylene glycol. These crystals are then subjected to an
intense beam of x-rays, which is diffracted into a characteristic pattern
of spots by the many identically oriented copies of the molecule
inside the crystal. Finally, these spots are analyzed to generate
a three-dimensional map of the location of all of the electrons in the
molecule. From this, the location of each atom is determined, provided
that the crystal and diffraction are of high enough quality.
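The step from diffraction spots to an electron-density map is a Fourier synthesis. The minimal one-dimensional sketch below, a toy rather than real crystallographic software, computes structure factors F(h) for two point "atoms" in a unit cell and then sums them back into a density profile. It assumes we already know the phases of F(h); a real experiment measures only the intensities of the spots (proportional to |F(h)|²), and recovering the missing phases, the famous phase problem, is a major part of solving a crystal structure.

```python
import cmath


def structure_factors(atoms, hmax):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for h = -hmax..hmax.

    `atoms` is a list of (fractional_position, electron_count) pairs in a
    toy one-dimensional unit cell.
    """
    return {
        h: sum(f * cmath.exp(2j * cmath.pi * h * x) for x, f in atoms)
        for h in range(-hmax, hmax + 1)
    }


def density(factors, x):
    """Fourier synthesis: rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), scale omitted."""
    total = sum(F * cmath.exp(-2j * cmath.pi * h * x) for h, F in factors.items())
    return total.real  # imaginary parts cancel because F(-h) = conj(F(h))


if __name__ == "__main__":
    atoms = [(0.2, 6), (0.5, 8)]  # a "carbon" and an "oxygen"
    factors = structure_factors(atoms, hmax=10)
    profile = [density(factors, i / 100) for i in range(100)]
    peak = max(range(100), key=lambda i: profile[i])
    print("highest density at x =", peak / 100)
```

The reconstructed profile has peaks at the two atom positions, with the taller one at the atom carrying more electrons.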
Crystallography has revealed the inner secrets of the ribosome
in glorious detail. For many years, researchers studied the individ-
ual proteins by crystallography, slowly building up a picture of the
whole molecule. Then, in 2000, three labs presented atomic struc-
tures of the intact ribosomal subunits. One major insight from
these structures was the discovery that the ribosome is a ribozyme,
with one particular nucleotide in the RNA catalyzing the protein-
building reaction. The structures also revealed how the small sub-
unit positions the messenger RNA, the details of the tunnel where
the newly synthesized protein exits from the construction site, and
a host of other interesting details.
NMR spectroscopy captures biological molecules in a more cell-like
environment. A solution of the purified molecule is placed in a strong
magnetic field and subjected to pulses of radio waves, and a series of
characteristic resonances are measured. By tailoring the types of pulses,
information is obtained on the local conformation of the molecular
chain, and atoms that are close to one another may be identified. This
information is then used to create an atomic model of the molecule that
is consistent with all the observations. The complexity of NMR spectra
typically limits the method to smallish proteins and nucleic acids, at
least if entire atomic structures are going to be determined, but NMR
excels at studying flexible molecules, which typically thwart structure
determination by microscopy or crystallography. For instance,
a recent structure of the L7/L12 stalk of the ribosome was solved by
NMR methods, revealing how it changes conformations to organize
the interaction of the ribosome with the many protein factors that
guide each step of protein synthesis.
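The idea of a model "consistent with all the observations" can be made concrete. In the hypothetical sketch below, each NMR-derived restraint says that two hydrogen atoms must lie within an upper distance limit (these close-contact signals are typically seen only for protons within about 5 Å of each other), and a candidate model is checked for restraints it breaks. Real structure-calculation programs use many kinds of restraints and optimize the model against them; the atom names, coordinates, and limits here are invented for illustration.

```python
import math

# Hypothetical model: atom name -> (x, y, z) coordinates in angstroms.
model = {
    "H1": (0.0, 0.0, 0.0),
    "H2": (3.0, 0.0, 0.0),
    "H3": (0.0, 4.0, 0.0),
    "H4": (6.0, 8.0, 0.0),
}

# Hypothetical distance restraints: (atom_a, atom_b, upper_limit_in_angstroms).
restraints = [
    ("H1", "H2", 4.0),
    ("H2", "H3", 5.5),
    ("H1", "H4", 5.0),
]


def violations(model, restraints):
    """Return (atom_a, atom_b, excess) for every restraint the model breaks."""
    broken = []
    for a, b, upper in restraints:
        d = math.dist(model[a], model[b])
        if d > upper:
            broken.append((a, b, d - upper))
    return broken


if __name__ == "__main__":
    for a, b, excess in violations(model, restraints):
        print(f"{a}-{b} exceeds its limit by {excess:.2f} A")
```

Here H1 and H4 sit 10 Å apart although the restraint allows at most 5 Å, so this model would need to be adjusted before it agrees with the (invented) data.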
The structural biology community is currently very excited
about the concept of integrative structural biology. The idea is to
approach large and difficult problems by throwing everything we
have at it. This approach is opening many doors that were previ-
ously closed for study, particularly for large and flexible assemblies.
For instance, the integrative approach has been essential for all
aspects of the study of the ribosome. Electron microscopy was used
for years (and still is) to define the overall shape and evolution of
ribosomes and to discover all of the basic mechanisms of protein
synthesis. The recent atomic structures have revealed the details of
ribosomes and many aspects of the peptide-forming reaction and
interaction with drugs. But the integration of EM and crystallogra-
phy is still essential for defining how the many protein helpers
guide each of the steps and some of the more mobile aspects of the
structure.
The underlying foundation of the scientific method tells us to
question everything, and when we use the results of science, we
always need to be critical. Do the experimental data support the
structures or are we building them based on our biases or imagina-
tion? Are our discoveries about the function of the molecules based
on what we have observed or on our preconceived notions? When
we go to the PDB looking for a structure, we have to watch out for a
few potential pitfalls.
Fortunately, the overall validity of structures in the PDB is not
typically in question. Scientists are highly critical people, and there
are usually at least two or three different groups competing with one
another on a particular topic. We continually question our own
work and that of our competitors, making sure that the results are
supported by evidence. The PDB site, as well, contains a variety of
methods for validating structures and assessing the quality of the
underlying data. For instance, the quality of crystallographic data is
often measured by the resolution of the electron density maps,
which determine how much detail can be seen. Structures in the
PDB range from structures where every atom may be clearly seen to
elusive structures where only the general shape is observed
(Fig. 2.2).
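Resolution can be illustrated with the same kind of toy Fourier synthesis used for diffraction, under strong simplifying assumptions (one dimension, point atoms, known phases). Keeping only reflections up to index hmax in a cell of length a corresponds roughly to a resolution of d = a / hmax. In the sketch below, two equal atoms 1.5 Å apart, about the length of a carbon-carbon bond, show up as separate peaks only if the density dips between them.

```python
import cmath

CELL = 10.0           # toy unit-cell length in angstroms
ATOMS = [4.25, 5.75]  # two equal atoms 1.5 A apart (positions in angstroms)


def rho(x, hmax):
    """Density at x (angstroms) summed from reflections h = -hmax..hmax."""
    total = 0
    for h in range(-hmax, hmax + 1):
        F = sum(cmath.exp(2j * cmath.pi * h * xa / CELL) for xa in ATOMS)
        total += F * cmath.exp(-2j * cmath.pi * h * x / CELL)
    return total.real


def resolved(hmax):
    """True if the density between the atoms is lower than at an atom center."""
    midpoint = sum(ATOMS) / 2
    return rho(midpoint, hmax) < rho(ATOMS[0], hmax)


if __name__ == "__main__":
    for hmax in (20, 4):
        print(f"d = {CELL / hmax:.1f} A -> atoms resolved: {resolved(hmax)}")
```

At about 0.5 Å the two atoms appear as distinct peaks; at about 2.5 Å they blur into a single blob, much like the difference between the high- and low-resolution maps of DNA.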
Fig. 2.2 Resolution of crystallographic electron density maps. Three electron density maps of DNA are shown here.
At the upper left is a very high-resolution structure, where every atom is resolved, and we can even see hints of
hydrogen positions. At the lower left is a more typical map, similar in resolution to most of the structures in the PDB. The
overall shape of the bases and backbone, as well as a beautiful hydrated magnesium ion, is easily discernible, but
individual atoms are not resolved. At the right is a low-resolution structure, which is sufficient to place the overall shape
of the double helix, but not to resolve the individual nucleotides (PDB entries 4hig, 196d, 3gbi, maps taken from the
Uppsala EDS server)
Each of these experimental methods has distinct advantages but
also characteristic weaknesses. For instance, x-ray crystallography
is typically able to determine precise positions of heavy atoms
(carbon, nitrogen, oxygen, etc.) in a protein molecule but rarely
resolves the many tiny hydrogen atoms. For this reason, most of the
structures in the PDB are missing their hydrogens, and if they are
important for the study of the molecule, they need to be modeled
based on the known geometry. NMR spectroscopy, on the other
hand, observes the relative location of hydrogen atoms in a structure
and infers much of the remaining structure based on the
known chemical structure of the molecule.
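Modeling a missing hydrogen from known geometry can be as simple as the sketch below, which places the amide hydrogen on a protein backbone nitrogen. It assumes a planar peptide group and uses a common approximation rather than any particular program's method: the N-H bond points away from the two neighboring heavy atoms (the previous residue's carbonyl carbon and this residue's alpha carbon), along the bisector of the angle between them, at roughly 1.01 Å. The coordinates in the example are illustrative, not taken from a real structure.

```python
import math

NH_BOND = 1.01  # approximate N-H bond length in angstroms (assumed value)


def unit(v):
    """Scale a 3D vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def place_amide_h(n, c_prev, ca):
    """Place H on backbone N, pointing away from C(previous) and CA.

    The direction is the sum of the unit vectors C->N and CA->N, which lies
    along the bisector of the C-N-CA angle, on the side opposite both atoms.
    """
    from_c = unit(tuple(a - b for a, b in zip(n, c_prev)))
    from_ca = unit(tuple(a - b for a, b in zip(n, ca)))
    direction = unit(tuple(a + b for a, b in zip(from_c, from_ca)))
    return tuple(a + NH_BOND * d for a, d in zip(n, direction))


if __name__ == "__main__":
    n = (0.0, 0.0, 0.0)
    c_prev = (-1.33, 0.0, 0.0)     # illustrative coordinates in angstroms
    ca = (0.73, -1.26, 0.0)
    print(place_amide_h(n, c_prev, ca))
```

The same idea, applied with standard bond lengths and angles for every kind of hydrogen, is how missing hydrogens are usually added back to crystal structures.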
Atomic structures are difficult to determine, and researchers
often have to do drastic things to the molecules they study (Fig. 2.3).
For instance, flexible molecules are often cut into smaller, more
rigid pieces, and each piece is studied separately. To understand the
function of the whole protein, we then need to reassemble the
pieces in the computer to model the entire assembly. Proteins are
often engineered to make them easier to study, with strings of histidine
amino acids that are easy to purify or selenium atoms that have
a distinctive signal in crystallographic experiments. In most cases,
these modifications don't seriously perturb the function of the protein,
but this needs to be validated through experiment to make
sure we're getting a biologically relevant view.
Fig. 2.3 Pitfalls of the PDB. ATP synthase (left) is a rotary motor with several moving parts. The whole assembly has
not been crystallized yet, but structures have been obtained by cutting it into several more or less rigid pieces.
G-protein-coupled receptors were an elusive target for many years, until researchers engineered a version with an entire
lysozyme protein grafted into one loop. The lysozyme assists in the formation of crystals (right) (PDB entries 1c17, 1e79,
1l2p, 2a7u, 2rh1)
Given the evidence-based approach of this book, I will show
only the portions of the molecules that have been observed in
experiment and use a schematic approach to show the portions that
are inferred. Fortunately, science is a forever-growing field, and scientists
continue to shed light on these currently shadowy areas.
3 Visualizing the Invisible World of Molecules

In my career, I have had the great pleasure of combining two
of my interests: science and art. I began my studies at a serendipitous
time, when the field of molecular visualization was just getting
off the ground. When I was a graduate student,
computer graphics was brand new, and those of us who knew how
to use the hardware, and how to write the software to make it work,
had a monopoly on the new technology. Scientists routinely came
to us to create figures for papers, or movies for talks, or just to sit
and explore their molecules. It was a wonderfully exciting time: we
were making things up as we went, developing new methods for
viewing molecules and trying to make them practical enough that
we could use them in research (Fig. 3.1).
I'm happy to say that this has all changed now. Sophisticated
computer graphics hardware is available on everybody's desktop,
and even on our phones, and we have dozens of user-friendly
molecular graphics programs to help us look at our molecules.
Today, researchers produce most of their images themselves,
without needing me to act as middleman between them and their
molecules (Fig. 3.2).
Computer graphics images are our primary way of exploring
and understanding the structure of biological molecules, and the
pictures we create are the evidence that we use to document our
discoveries. So, it is critically important that we use visual methods
that are accurate and capture relevant aspects of the molecule's
structure and function. Over the years, researchers have developed
a number of useful ways to create images of molecules based on the
experimental atomic structures. Initially, these images were created
by clever scientists, often with the help of an artist. Today, nearly all
molecular images are created with computer graphics. This has the
great advantage of creating a picture directly from the experimental
structure, so the image is true to the actual data. The artistry comes
in when we design the best way to capture a particular aspect of our
molecular subjects.
Fig. 3.1 Some experiments in molecular visualization. Left: the Evans and Sutherland Multipicture System allowed
interactive display of dots and lines and was widely used by crystallographers to interpret their experimental electron
density maps. This image shows a cross section through a DNA molecule, with lines to show the bonds between atoms
and dots to show the surface of the molecule. Center: pen plotters were used to create illustrations for journal
publications, where most figures were printed in black and white. This illustration shows all of the sites of interaction
between this DNA molecule and its neighbors in the crystal lattice. We often printed stereopairs like this to provide
(with a little practice) a three-dimensional view. Right: raster images, which are used for almost everything today, were
quite slow when they were first developed. This illustration of DNA took almost an hour to calculate
Fig. 3.2 Modern molecular graphics programs. Dozens of effective programs for molecular graphics are freely
available to explore molecules on your computer. Top: Python Molecular Viewer is a modular molecular graphics
program with many sophisticated methods for displaying molecules, electron density maps, and other aspects of
molecular structure. Bottom: JSmol is the most popular method of embedding molecular graphics into web pages.
For instance, it is used at the RCSB Protein Data Bank site to allow instant viewing of any of the structures stored in the
archive, and as shown here, in the Molecule of the Month column at the RCSB site. Many of the illustrations in the book
were created with these two programs
Fig. 3.3 Visual representations of biomolecules. These three representations of myoglobin, capturing different
aspects of the molecule, were created with JSmol at the RCSB PDB website. Left: a bond diagram shows the atomic
details of oxygen binding to iron. Thin bonds are used for the protein, and balls and sticks are used to show the heme,
iron, oxygen, and an important histidine amino acid. Center: a spacefilling diagram shows how the heme fits in a
form-fitting pocket. Right: a cartoon diagram shows how the chain folds into a series of alpha helices that surround the
heme (PDB entry 1mbo)
When designing an image, we want to capture important
properties of the molecule in a way that is visually comprehensible.
Atomic structures of biomolecules typically include a list of atoms,
including where each atom is located in space and what type of
atom it is (carbon, nitrogen, etc.). The details of the experiment may
also include additional information, like how much each atom is moving
around or how confident we are that it is located where we think it is.
This is the basic information from the experiment. On top of this,
we can add a bunch of chemical knowledge. For instance, the atoms
are bonded together in a specific way, and proteins and nucleic
acids are built of a characteristic set of standard building blocks.
There are also interesting properties of the atoms or the entire mol-
ecule, such as the charge of the atoms or their reactivity. All of these
things may be captured in an image, if they are relevant to the story
we want to tell.
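The basic list of atoms is easy to see in the classic PDB file format, where each ATOM (or HETATM) record stores one atom in fixed columns. The sketch below pulls out the fields described above: atom type, position, and two per-atom measures from the experiment, the occupancy and the temperature factor (B-factor), which grows with how much the atom moves or how uncertain its position is. The column ranges follow the published PDB format specification; the sample line is invented for illustration, and the newer mmCIF files used by the archive would need a different parser.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str         # atom name within its residue, e.g. "CA"
    residue: str      # three-letter residue name, e.g. "VAL"
    chain: str        # chain identifier
    res_seq: int      # residue number
    x: float          # coordinates in angstroms
    y: float
    z: float
    occupancy: float  # fraction of molecules with the atom at this position
    b_factor: float   # temperature factor: larger means more motion or disorder
    element: str      # chemical element symbol


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM/HETATM line using the fixed PDB column layout."""
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        name=line[12:16].strip(),
        residue=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# An invented ATOM record laid out in the standard 80-column style.
SAMPLE = "ATOM      1  N   VAL A   1       6.204  16.869   4.854  1.00 49.05           N"

if __name__ == "__main__":
    print(parse_atom_record(SAMPLE))
```

Everything else in a molecular picture, from bonds to colors, is built on top of records like this one.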
Scientists and artists have experimented with many different
types of images over the years, to highlight these important aspects
of their structures. Three basic types of representations (bond diagrams,
spacefilling diagrams, and cartoon diagrams) have turned
out to be the most useful and popular, each capturing a different
aspect of the molecule (Fig. 3.3). In this book, I will use different
variations of these three basic types, picking the representation that
I think best captures the molecule's properties in each case. Most
modern molecular graphics programs provide all three of these
types of images, as well as a variety of options for customizing them
for a particular subject.
Fig. 3.4 Common coloring conventions for proteins. Left: hemoglobin is colored by atom type, based on the
scheme developed by Linus Pauling. Center: backbone diagrams often color individual chains differently, to show
how they assemble together. Right: a rainbow scheme makes it easier to follow the chain from one end to the other
(PDB entry 2hhb)
The workhorse of structural research is the bond diagram. In
this representation, the bonds connecting the atoms are drawn,
sometimes as thin lines that are fast to display and other times as
cylinders and balls to give more visual cues about how they are
arranged in space. Bond diagrams show the chemical connectivity
of the whole molecule, and with experience, many aspects of the
structure may be understood from the image. A major limitation of
these diagrams, however, is their complexity. Often, it is necessary
to view them interactively so that we can rotate and explore them or,
as I have done in Fig. 3.3, to include only a close-up of one interesting
portion of the molecule.
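Coordinate files often list atoms without spelling out every bond, so many viewers infer the bonds they draw. A common heuristic, shown in the sketch below, connects two atoms when their distance is less than the sum of their covalent radii plus a small tolerance. The radii are approximate single-bond covalent radii, and the 0.4 Å tolerance is an assumption; actual programs differ, and many use residue templates for standard amino acids and nucleotides instead.

```python
import itertools
import math

# Approximate single-bond covalent radii in angstroms.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05, "P": 1.07}
TOLERANCE = 0.4  # assumed slack for experimental error and bond-order variation


def infer_bonds(atoms):
    """atoms: list of (element, (x, y, z)). Returns index pairs judged bonded."""
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in itertools.combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[el_i] + COVALENT_RADIUS[el_j] + TOLERANCE
        if math.dist(pos_i, pos_j) < cutoff:
            bonds.append((i, j))
    return bonds


if __name__ == "__main__":
    # A water molecule: oxygen first, then its two hydrogens.
    water = [("O", (0.0, 0.0, 0.0)), ("H", (0.9572, 0.0, 0.0)), ("H", (-0.2400, 0.9266, 0.0))]
    print(infer_bonds(water))
```

For water, the two O-H bonds are found, while the hydrogens, about 1.5 Å apart, are correctly left unbonded. Checking every pair scales poorly for large molecules, so real programs usually sort atoms into a spatial grid first.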
To understand the overall shape and form of a molecule, and
how it interacts with other molecules, we need a representation of
the size of each atom. Linus Pauling developed a simple approach to
this, by placing a sphere at each atom position with a size that
encloses most of the electrons in the atom. These spacefilling dia-
grams are perfect for understanding the physical bulk of the mole-
cule, but they tend to hide all of the internal structure. Spacefilling
diagrams have always been my favorite way to draw molecules,
because to me, they seem to capture how a molecule might look if
we could actually see one.
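In a spacefilling diagram, each atom simply becomes a sphere whose radius is the atom's van der Waals radius. The sketch below assumes Bondi's widely used radii and reduces "drawing" to producing a list of spheres, plus a check of whether a point in space falls inside the molecule's physical bulk; a real program would hand these spheres to a graphics renderer.

```python
import math

# Bondi van der Waals radii in angstroms.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "P": 1.80}


def spheres(atoms):
    """Turn (element, position) pairs into (center, radius) spheres."""
    return [(pos, VDW_RADIUS[element]) for element, pos in atoms]


def inside_molecule(point, sphere_list):
    """True if the point lies within at least one atomic sphere."""
    return any(math.dist(point, center) <= radius for center, radius in sphere_list)


if __name__ == "__main__":
    water = [("O", (0.0, 0.0, 0.0)), ("H", (0.9572, 0.0, 0.0)), ("H", (-0.2400, 0.9266, 0.0))]
    balls = spheres(water)
    print(inside_molecule((0.0, 0.0, 1.5), balls), inside_molecule((0.0, 0.0, 2.0), balls))
```

Because the spheres overlap heavily, the individual bonds disappear from view, which is exactly why these diagrams show bulk and shape so well but hide the internal structure.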
Cartoon diagrams are used to simplify the complex structures
of molecules, capturing a few important aspects of their structure.
Two types of cartoons are particularly popular. For DNA, a ladder
diagram was presented in the classic paper by Watson and Crick
and has become the iconic way to represent the molecule. For pro-
teins, a similar approach was popularized by Jane Richardson, using
a cartoon ribbon to schematize the folding of protein chains. These
protein cartoons revolutionized the way we think about proteins
and their evolution by removing distracting detail and highlighting
the underlying architecture of the chains.
We also have a lot of flexibility in choice of colors, since most
biological molecules are colorless and the colors are all made up for
our benefit. A few conventions have appeared over the years
(Fig. 3.4). For instance, the common scheme of coloring carbon
black, oxygen red, and nitrogen blue is based on a scheme used in an
early set of plastic models developed by Corey, Pauling, and Koltun
and now has become a convention widely used by chemists and
biologists. Protein chains are often colored with rainbow colors,
helping us follow the chain from beginning to end. Often, each sub-
unit of a large complex will be given a different color, allowing us to
see how everything fits together. But if these common approaches
aren't effective for highlighting what we're trying to show, we're
mostly free to use whatever we think makes the point best.
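The two most common conventions are easy to express in code. The Python sketch below is a minimal illustration, not the scheme of any particular graphics program: the carbon, oxygen, and nitrogen colors follow the text, while the sulfur color, the gray fallback, the red-to-blue hue range, and the helper names `atom_color` and `rainbow_colors` are assumptions made for the example. Colors are RGB triples with components between 0 and 1.

```python
import colorsys

# Element colors: carbon black, oxygen red, nitrogen blue follow the text;
# the sulfur entry and the gray fallback are illustrative assumptions.
ATOM_COLORS = {
    "C": (0.0, 0.0, 0.0),  # carbon: black
    "O": (1.0, 0.0, 0.0),  # oxygen: red
    "N": (0.0, 0.0, 1.0),  # nitrogen: blue
    "S": (1.0, 1.0, 0.0),  # sulfur: yellow (assumed)
}
DEFAULT_COLOR = (0.5, 0.5, 0.5)  # gray for any element not listed (assumed)


def atom_color(element: str) -> tuple[float, float, float]:
    """Return an RGB triple for an element symbol (case-insensitive)."""
    return ATOM_COLORS.get(element.upper(), DEFAULT_COLOR)


def rainbow_colors(n_residues: int) -> list[tuple[float, float, float]]:
    """Color residues from red (start of the chain) to blue (end of the chain)."""
    colors = []
    for i in range(n_residues):
        # Fraction of the way along the chain: 0.0 at the first residue, 1.0 at the last.
        t = i / (n_residues - 1) if n_residues > 1 else 0.0
        hue = t * (2.0 / 3.0)  # hue 0 is red, hue 2/3 is blue
        colors.append(colorsys.hsv_to_rgb(hue, 1.0, 1.0))
    return colors


print(atom_color("O"))       # (1.0, 0.0, 0.0)
print(rainbow_colors(5)[0])  # (1.0, 0.0, 0.0)
```

For a five-residue chain, `rainbow_colors(5)` starts at pure red and ends at blue; stopping at blue rather than wrapping around the whole hue circle keeps the two ends of the chain from getting the same color.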
Fig. 3.5 The danger of defaults. These images show a structure of DNA bound to a sequence-reading antibiotic,
displayed using the default settings in several visualization methods available on the RCSB PDB website. From left to
right, the default static image at the PDB structure summary page, the JSmol default image, and the PV default image.
All of these are great for getting a quick feeling for the structure, but with a small amount of work, you can customize
the image to highlight features of interest. For instance, I created the image on the right in JSmol to highlight the
perfect fit of the antibiotic (blue) within the narrow minor groove of the DNA (PDB entry 6bna)
Finally, a bit of a warning and a challenge. Many different
molecular graphics programs are freely available for exploring and
studying molecular structures. These programs provide an
incredible scope for creativity. Most often, the program designer
has chosen a default representation that highlights some aspect of
the structure (Fig. 3.5). But there are also many menus, or perhaps
scripting capabilities, that allow you to customize the picture to
highlight the features of interest. With a little experience, you can be
generating compelling images in no time. So jump in, play with
some of these programs, and start picturing molecules yourself!
Chapter 4 The Twists and Turns of DNA
I'm lucky to be able to say that I have looked at DNA firsthand, or
as close as we can get to firsthand with current experimental techniques.
I did my graduate work with Richard Dickerson at UCLA,
and at the time his lab was interested in sorting out the fine struc-
ture of the DNA helix. A previous graduate student had solved the
structure of a short piece of DNA, 12 nucleotides long, revealing a
complex inner structure to the double helix and an interesting
interaction with the surrounding water. When I joined, the lab was
exploring other pieces of DNA and also the interaction of DNA
with drugs and proteins.
At the time, the basic structure of DNA was well known. Ladder
diagrams were common in textbooks and the popular media, based
largely on the figure that Watson and Crick included in their classic
journal article on the structure of the B-DNA double helix. The structure,
famously determined using the experimental data collected by
Rosalind Franklin, was a revelation, revealing how biological infor-
mation is stored and transmitted using a digital sequence of four
nucleotides.
Of course, as an enthusiastic graduate student, I had to try to
reproduce these results as I was (as yet unsuccessfully) trying to get
my own pieces of DNA to crystallize. I purchased a small bottle of
calf thymus DNA, mixed in some water and buffer, and pulled thin
fibers using a glass rod. After mounting these in a jury-rigged
receptacle built from a cut section of Eppendorf tube and two pieces
of Mylar, I put the fibers in our x-ray camera and took a few pic-
tures. After a few refinements to adjust the humidity of the cham-
ber, I was greeted by the familiar X-shaped diffraction pattern and
strong diffraction along the direction of the fiber. I had successfully
followed in the footsteps of the giants.
And so, I went back to my real research. This type of fiber dif-
fraction was perfect for determining the overall structure of the
DNA double helix, but single crystals are needed to see the atomic
details. So instead of using natural DNA from cells, we used small
pieces of synthetic DNA, assembled nucleotide by nucleotide in
the laboratory. The advantage of this approach is that the length of
the DNA, and also the sequence of nucleotides, can be precisely
controlled.
The first atomic structures of B-form DNA were determined
with a piece 12 base pairs long, with a palindromic sequence taken
from the cleavage site of EcoRI, a bacterial restriction enzyme (palindromic
sequences are often used in these types of studies, since
you only need to synthesize one strand, and then they all pair up
with each other to form double helices). Several interesting things
were revealed in this structure. First of all, the 12-base-pair helix
was a bit more than a full turn of the double helix, so neighboring
helices locked together rather than stacking on top of one another.
This deformed the DNA, causing a large bend at one end.
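Why does a palindromic sequence need only one synthesized strand? Because it is identical to its own reverse complement, so every copy can pair with every other copy. The minimal Python check below assumes the well-known 12-base-pair sequence CGCGAATTCGCG (often called the Dickerson dodecamer, which contains the EcoRI site GAATTC) as the example; the function names are illustrative.

```python
# Watson-Crick partners for the four DNA bases
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand, written 5'->3' like the input."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_self_complementary(seq: str) -> bool:
    """True if one strand can pair with an identical copy of itself."""
    return seq.upper() == reverse_complement(seq)


dodecamer = "CGCGAATTCGCG"  # assumed example sequence (contains GAATTC)
print(reverse_complement(dodecamer))          # CGCGAATTCGCG
print(is_self_complementary(dodecamer))       # True
print(is_self_complementary("CGCGAAAACGCG"))  # False
```

Because the strand reads the same on both sides of the duplex, a single synthesis yields molecules that all pair up into identical double helices.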
True to the serendipitous nature of science, this was taken as a
great benefit instead of a defect and was used to launch an entire
field of study on DNA bending and flexibility. Subsequent work
on other DNA helices showed that these bends most often occur
in places where they close down or open up the grooves of the
DNA, rather than torquing the DNA base pairs along their long
dimension. Later, the lab got to see unbent DNA by synthesizing
pieces of DNA with ten base pairs instead of twelve. These short
pieces are exactly the right length to form one complete turn of
the DNA helix. In the crystals, they all stack perfectly on top of
one another, mimicking an unbroken DNA helix. The regularity
of the helices within these crystals was also much improved, so
the crystals diffract x-rays beautifully, allowing location of each
atom in the helix (Fig. 4.1).
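The arithmetic behind the two crystal packings is simple. Assuming the idealized value of 10 base pairs per turn of B-DNA mentioned in the caption of Fig. 4.1 (real helices vary a little with sequence), each base-pair step twists by about 36 degrees, so a 12-mer overshoots a full turn while a 10-mer completes exactly one. A minimal Python sketch with illustrative function names:

```python
BP_PER_TURN = 10.0                  # idealized B-DNA value (assumption)
TWIST_PER_BP = 360.0 / BP_PER_TURN  # degrees of rotation per base-pair step


def helix_turns(n_bp: int) -> float:
    """Number of helical turns spanned by a duplex of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def end_mismatch_degrees(n_bp: int) -> float:
    """Rotation left over beyond whole turns after n_bp base-pair steps.

    0 means a copy stacked on the end would continue the helix in register
    (a simplification that ignores any twist at the junction).
    """
    return (n_bp * TWIST_PER_BP) % 360.0


for n in (10, 12):
    print(n, helix_turns(n), end_mismatch_degrees(n))
# 10 1.0 0.0
# 12 1.2 72.0
```

The leftover 72 degrees for the 12-mer is one way to see why those helices could not simply stack end to end, while the 10-mers line up as if they were one continuous helix.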
I worked on several of these DNA structures during my gradu-
ate and postdoctoral work, and they were always a pleasure to solve.
Amazingly, we could guess a lot about the structure after seeing the
first diffraction patterns. As with Rosalind Franklin's fiber diffraction
patterns, the diffraction patterns of these little pieces of DNA show,
if you squint a bit, the same X-shaped pattern and strong diffraction
from the stacking of bases. So, once you get your first x-ray picture,
you can figure out how the DNA double helices are arranged in the
crystal (Fig. 4.2).
These short pieces of DNA have been instrumental in explor-
ing all aspects of DNA structure and function. Under normal cel-
lular conditions, the DNA helix prefers the classic B-helix
structure, which has been observed in hundreds of structures of
DNA by itself and in complexes with drugs and proteins. In special
cases, however, it shifts into different shapes (Fig. 4.3). For
instance, when dehydrated with high concentrations of salt, DNA
forms a chunkier helix similar to the form of RNA double helices,
termed the A-helix. Hybrid double helices, with one DNA strand
and one RNA strand, also form this type of helix, for instance,
when HIV is building a DNA strand with its reverse transcriptase.
Some special sequences of DNA, for instance, DNA with alternating
cytosine and guanine nucleotides, can be induced to form a
helix with the opposite handedness, termed the Z-helix. It's still
not known if this is an experimental oddity or if it actually plays a
functional role in cells, for instance, helping to relieve stress in the
double helix when it is pulled apart to be duplicated or copied.
If you search around the PDB, you can also find several exotic
structures of DNA, such as odd X-shaped Holliday junctions
formed during the process of recombination and tough quadruplex
blocks of guanine bases that may seal off the ends of chromosomes
in telomeres.

Fig. 4.1 Packing of DNA in crystals. B-form DNA has almost exactly 10 nucleotides per turn of the double helix.
Early structures were determined using chains that were 12 nucleotides long, so they formed an odd crystal lattice with
the ends of the chains overlapped. Later structures shortened the chain to 10 nucleotides, which stacked beautifully to
simulate a long DNA double helix (PDB entries 1bna, 196d)

Fig. 4.2 X-ray diffraction of DNA. The x-ray diffraction image at the upper
left is from fibers of natural DNA, and the pattern at the lower left is from a
crystal of a small piece of synthetic DNA. Both show the distinctive pattern of
DNA diffraction, with a strong signal above and below the center and an
X-shaped feature closer to the center. The strong diffraction is produced by the
regular stacking of bases in the DNA helix (horizontal lines at right), and the
X-shaped pattern is produced by the helical arrangement of the backbones
(diagonal lines at right)

Fig. 4.3 A, B, and Z DNA. Early crystallographic structures of the three
forms are included at the top, and idealized models are included at the bottom
(PDB entries 1ana, 1bna, 2dcg)
Atomic structures using short pieces of DNA have revealed the
many ways that DNA interacts with other molecules. Just about
every possible variation has been observed, binding around and
inside the double helix. In my graduate work with Dickerson, the
lab was interested in a class of drugs that bind using a noninvasive
approach. They snuggle into the narrower of the two grooves of the
DNA, binding to the edges of the bases. Based on these structures,
a series of DNA-reading molecules have been designed (dubbed
lexitropsins by Dickerson), with modules that read the different
bases (Fig. 4.4). Other DNA-binding drugs take a far more
aggressive approach. They contain a portion that looks very similar
to a DNA base, and when they bind, they force their way between
the bases. Several atomic structures have revealed the basis of this
intercalation. These drugs are typically made as weapons by microorganisms,
and they're quite toxic, because they corrupt the copying
and reading of DNA.

Fig. 4.4 DNA-binding antibiotics. Lexitropsins bind in the narrow minor
groove of the DNA, reading the edges of the bases, and actinomycin D
intercalates between bases. In both cases, the structures were determined
using short pieces of DNA (PDB entries 334d, 173d)
The ways that proteins interact with DNA are even more complex,
twisting and bending and unwinding the double helix as necessary
(Fig. 4.5). Many regulatory proteins approach the DNA
and wrap arms around it, reading the edge of the bases to find
regions with the proper sequence. They use all manner of shapes
to do this: scissor-shaped pairs of helices, strings of little modules
organized around zinc atoms, blocky domains that jam into
the grooves, and flexible arms that wrap around. In many cases,
researchers have been forced to clip off these DNA-reading por-
tions for study, since they are part of flexible complexes that are
difficult to study.
Proteins involved in DNA packaging are experts at bending
DNA to fit into small places. In the nuclei of our cells, DNA is
wrapped twice around histone proteins to form compact nucleo-
somes. Long flexible arms extend from the complex and help to
regulate when the DNA needs to be released. In bacterial cells, a
smaller protein, HU, bends the DNA. Both of these proteins interact
mostly with the phosphate groups in the DNA backbone. This
makes sense, since they are generic packaging tools and don't need
to pay attention to the sequence of bases in the DNA.
Fig. 4.5 DNA-binding proteins. (A) Restriction endonuclease EcoRI, (B)
DNA photolyase, (C) RNA polymerase, (D) lac repressor, (E) catabolite gene
activator protein, (F) TATA-binding protein and transcription factor IIB, (G)
topoisomerase, (H) DNA helicase, (I) DNA polymerase, (J) nucleosome, (K) HU
protein, and (L) single-stranded DNA-binding protein (PDB entries 1eri, 1tez,
2e2i, 1lbh, 1efa, 1cgp, 1ais, 1a36, 4esv, 1tau, 1aoi, 1p51, 3a5u)
DNA repair proteins take an even more aggressive approach.
They scan along the DNA, and when they find a problem, they
wrench out the offending base and repair it. In some cases, they cut
the entire section out. In other cases, they fit the corrupted base into
a form-fitting active site and repair it on the spot. Atomic structures
using short DNA helices with odd bases have captured these repair
proteins in action.
I remember well my very first electron density map of one of
these short pieces of DNA. I sat at the computer graphics screen and
realized that I was actually seeing atoms. Although I'm no longer
crystallizing DNA and solving structures myself, it is still possible to
get this firsthand view of the structure of DNA. Most crystallographers
deposit their primary experimental data (a list of how bright
each spot is in the x-ray diffraction pattern) in the PDB archive.
From this, it is fairly easy to generate an electron density map and
take a personal tour of the atoms in a DNA helix. For instance, to
create the pictures of the data shown in Fig. 4.6, I calculated the
map at the free online Electron Density Server at Uppsala Universitet
and visualized it with the free Python Molecular Viewer.
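Turning spot intensities into a map needs one more ingredient: the intensities give only the amplitudes of the diffracted waves, and the missing phases are typically calculated from the deposited atomic model before a Fourier synthesis combines the two into electron density. The Python sketch below illustrates that idea in a deliberately simplified one-dimensional toy, assuming two point-like atoms in a unit cell of length 1 and an assumed Gaussian smearing factor; the constants `ATOMS`, `H_MAX`, and `B` are illustrative, and this is not how any particular server or program implements the calculation.

```python
import cmath
import math

# Toy model (assumptions): a 1D unit cell of length 1 holding two point-like
# atoms, each smeared into a Gaussian by the factor exp(-B * h^2).
ATOMS = [0.25, 0.6]  # fractional positions of the two atoms
H_MAX = 20           # highest reflection index included in the sums
B = 0.01             # smearing constant controlling the peak width


def structure_factors(positions, h_max=H_MAX, b=B):
    """F(h) = sum_j f(h) * exp(2*pi*i*h*x_j) for h = -h_max..h_max."""
    factors = {}
    for h in range(-h_max, h_max + 1):
        f_atom = math.exp(-b * h * h)
        factors[h] = sum(f_atom * cmath.exp(2j * math.pi * h * x) for x in positions)
    return factors


def density(factors, x):
    """Fourier synthesis: rho(x) = sum_h F(h) * exp(-2*pi*i*h*x)."""
    total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
    return total.real  # imaginary parts cancel because F(-h) = conj(F(h))


F_model = structure_factors(ATOMS)

# An experiment records spot intensities |F(h)|^2, i.e. amplitudes only.
amplitudes = {h: abs(F) for h, F in F_model.items()}
# The phases are not measured; in this toy they are taken from the known model.
phases = {h: cmath.phase(F) for h, F in F_model.items()}
F_map = {h: amplitudes[h] * cmath.exp(1j * phases[h]) for h in F_model}

grid = [i / 100 for i in range(100)]
rho = [density(F_map, x) for x in grid]
peak = grid[rho.index(max(rho))]
print(peak)  # one of the atom positions, 0.25 or 0.6
```

The tallest peak lands on an atom position. Lowering `H_MAX` drops the high-index terms and blurs the peaks, the one-dimensional analog of a lower-resolution map.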
Fig. 4.6 Tour of an electron density map. The upper figure shows the hexagonal crystal lattice, with one DNA helix in
red. Notice that there are large open channels between the helices, which are filled with disordered water molecules
that don't give a strong signal in the electron density. At bottom left, a spine of water molecules (small red spheres) fills
the narrow minor groove. At bottom right is an A-T base pair and a calcium ion (purple) surrounded by a coordination
sphere of seven water molecules (PDB entry 158D)
Chapter 5 The Central Dogma
The first thing we learn in molecular biology class is the central
dogma, presented with almost religious fervor: DNA begets RNA
begets protein. Then, we're all properly mortified when we learn of
the heretical ways that viruses corrupt this natural flow of information,
building RNA from RNA or, even worse, DNA from
RNA. The reverence we feel is well founded; when we look at the
molecular machines of the central dogma, we are looking at the
heart of what keeps us alive, and structural science is revealing
that these molecular machines are truly wonders of the subcellu-
lar world.
DNA replication is the first step: the enzyme DNA polymerase
duplicates the information in a DNA strand to build a comple-
mentary DNA copy. DNA polymerase faces several challenges
when performing this task. First, the copying must be accurate,
since it needs to make an exact copy of the genetic information.
Second, it needs to be highly processive, meaning that it can dupli-
cate huge stretches of DNA without needing to rest. Finally, the
antiparallel orientation of the two strands causes a unique problem:
if the replicative machinery moves in only one direction down the
double helix while duplicating it, two different methods need to be
employed for the two strands, one going forward and one going
backward.
The accuracy of DNA polymerase is a consequence of the
enzyme itself. It employs several methods to ensure that only the
proper nucleotides are used to build the new DNA strand. First, the
form-fitting active site is shaped to fit only the proper base pairs as
new nucleotides are added. Then, a nearby proofreading site tests
the match, and if it's not quite strong enough, it cuts out the offending
nucleotide, making room to add another one. This combination
of precise base pairing with proofreading allows DNA
polymerase to duplicate our entire genome while only making a
handful of errors.
The other two challenges, processivity and the antiparallel
nature of DNA, are solved by including DNA polymerase as part
of a large replisome complex (Fig. 5.1). The replisome includes a
helicase that separates the two strands, making the bases available
for copying. One strand feeds directly into DNA polymerase. A cir-
cular clamp is attached to the polymerase, locking the strand in
place and ensuring that long stretches may be duplicated without
falling off. The other strand is trickier. Since it is in the opposite
orientation, a large loop is unwound and replicated in pieces, which
are connected up later. This requires additional machinery: a pri-
mase to build little RNA primers to get the new strands started, a
loader to add new clamps and a DNA polymerase for each new
strand, and a special protein to protect the loop of DNA when it is
left exposed.
Atomic structures have been solved for many of the individual
pieces of this replisome, revealing how they work, but the
architecture of the whole complex is still a matter of controversy
and study. The complex is held together by a set of flexible protein
linkers that ensure that all the necessary pieces are nearby
and ready to perform their tasks. The painting included here
brings together information from several experimental sources,
including crystal structures of the main proteins, protein
sequences to estimate the length of the linkers, and electron
micrographs to help determine how to connect everything
together.

Fig. 5.1 Bacterial replisome. The replisome includes several proteins interconnected
with flexible linkers. The DNA is shown in blue, and the newly synthesized strands of DNA
are in white. Helicase (1) separates the two strands, and primase (2) builds a short piece
of RNA (green) on one strand to act as a primer. The clamp loader (3) encircles the DNA
strand with a sliding clamp (4), which improves the processivity of DNA polymerase (5).
DNA synthesis by DNA polymerase on the leading strand (6) proceeds continuously, but
it builds short segments on the lagging strand (7) since the lagging strand is oriented in
the opposite direction. Single-stranded DNA-binding protein (8) protects the lagging
strand while the DNA copy is being made (image created in collaboration with Jacob
Lewis and Nicholas Dixon, University of Wollongong)

Fig. 5.2 RNA polymerase. This structure of a yeast RNA polymerase (blue)
includes two strands of DNA (orange) that have been opened up to form a
transcription bubble and a short piece of RNA (red) being transcribed. A
magnesium ion (green) assists with the addition of each new nucleotide to the
growing RNA chain (PDB entry 5c4j)
The second step in the central dogma is the transcription of
DNA information into a strand of complementary RNA. The
major machine that performs this task, RNA polymerase II, also
uses flexible linkers to assist in its task. The main RNA-
polymerizing portion is a typical enzyme that unwinds the DNA,
slots it into a DNA-fitting groove, and adds new nucleotides one at
a time (Fig. 5.2). Many structures have been determined for this
portion of the enzyme, capturing it at different steps in the
transcription of RNA from DNA. Sequence analyses of RNA
polymerase have revealed that it also includes a long, flexible tail
that acts as bait to capture a variety of enzymes that process the
resulting strand of RNA, for instance, modifying the first nucleotide
to form a resistant cap and adding a long string of adenine
nucleotides to protect the other end.

Fig. 5.3 Ribosomes in action. Three atomic structures capture ribosomes (blue and green) in the process of
building a protein chain. Elongation factor Tu (magenta) delivers a new transfer RNA (yellow), pairing its anticodon with
the messenger RNA (red) codon. The ribosome then catalyzes the formation of the peptide bond; the structure at the
center includes two transfer RNA molecules with amino acids attached (bright green) and positioned in the catalytic
site. Finally, elongation factor G (magenta) shifts everything by one codon (toward the right in this illustration), opening
a space for the next transfer RNA (PDB entries 4v5g, 4v5d, 4v5f)
The final step of the central dogma is the most complex, where
the information in RNA is translated to build proteins. A combina-
tion of x-ray crystallography and electron microscopy has captured
the ribosome in many steps of protein synthesis, from initiation
of synthesis on a new messenger RNA strand, through elongation
of the new protein one amino acid at a time (Fig. 5.3), and finally
to termination when a stop codon is read. A constellation of proteins
and specialized RNA molecules is needed to prepare and
deliver the amino acids needed for each step, and researchers have
studied them one by one, filling out all the pieces to this biomolecu-
lar puzzle.
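The flow of information described in this chapter can be caricatured in a few lines of Python. This is a minimal sketch under simplifying assumptions: transcription is modeled as copying the coding strand with U in place of T, translation reads triplets from the first AUG until a stop codon, and `CODON_TABLE` is a small subset of the standard genetic code containing only the codons needed for the example gene, which is itself made up for illustration.

```python
# Small subset of the standard genetic code (RNA codon -> one-letter amino acid).
# A complete table has all 64 codons; only a few are listed here.
CODON_TABLE = {
    "AUG": "M",              # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G",              # glycine
    "AAA": "K",              # lysine
    "GCU": "A",              # alanine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand_dna: str) -> str:
    """DNA -> mRNA: the mRNA has the coding-strand sequence with U for T."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: start at the first AUG and read triplets until a stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons not in this subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "CCATGTTTGGCAAAGCTTAAGG"  # hypothetical example gene
mrna = transcribe(gene)
print(mrna)             # CCAUGUUUGGCAAAGCUUAAGG
print(translate(mrna))  # MFGKA
```

The two functions mirror the two arrows of the central dogma; the hard parts that the real machinery solves (unwinding, proofreading, delivering charged transfer RNAs) are exactly what this caricature leaves out.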
One of my favorite structures captures an odd corner of the
process of protein synthesis (Fig. 5.4). Ribosomes occasionally
get stalled when faulty messenger RNA molecules are read, for
instance, a messenger RNA that has broken and is thus
missing its stop codon. A special mechanism is used to rescue
these stalled ribosomes when they get stuck at the end of the
truncated chain. A strangely shaped RNA molecule mimics both
a transfer RNA and a messenger RNA, restarting the process and
cleaning up the mess.
Biology is never as simple as we might like, and when Francis
Crick codified this central dogma in 1956, there was already an
indication that the story was richer than this. At the time, researchers
knew that some viruses carry their genome in RNA and thus would
need a machine to create a duplicate RNA strand from an RNA
template. Additional study revealed that other viruses carry their
genome in RNA but create a DNA copy once they get inside the cell
and start wreaking havoc. These retroviruses, such as HIV, use a
reverse transcriptase enzyme to reverse the canonical information
flow (Fig. 5.5).

Fig. 5.4 Transfer-messenger RNA. Transfer-messenger RNA (top) includes a
portion that mimics a transfer RNA (red) and a portion that mimics a
messenger RNA (magenta), complete with a stop codon. It binds to stalled
ribosomes (bottom), resuming synthesis using its own short message.
Amazingly, this message encodes a small tag that is added to the end of the
truncated protein, signaling to the cell that the protein is faulty and needs to
be destroyed (PDB entries 3iyr, 4tna, 4v6t)
Lest it seem that this heretical use of information is limited to
viruses and other evil organisms, we only need to look to our own
cells to find an example of reversed information flow. The enzyme
telomerase contains a small piece of RNA that it uses as a template
to build long repeated sequences of DNA that protect the ends of
each of our chromosomes. However, the use of protein as a template
to reverse-translate other molecules is still largely forbidden.
There are special hard-wired cases, such as the synthesis of
polyA tails or the construction of small peptide antibiotics, but no
general methods for using protein chains as carriers of information
have been found, at least not yet.

Fig. 5.5 Reverse transcription. Bacterial DNA polymerase (left) and HIV
reverse transcriptase (right) are both shaped like a hand, with fingers and
thumb that wrap around the nucleic acid strands. DNA polymerase performs
the classic reaction, creating DNA using a DNA template, and reverse
transcriptase performs the unusual reaction of creating DNA using an RNA
template. Both structures were determined with short pieces of DNA (orange)
and/or RNA (red) bound in the active site (PDB entries 1tau, 4pqu)
Chapter 6 The Secret of Life: The Genetic Code
DNA is bursting with information, and we're just now at a point in
human history where we can take advantage of it. The genetic code,
now that we understand it, is quite straightforward, at least in its
basics: the 20 amino acids in proteins, as well as special instructions
for starting and stopping, are encoded in a linear string of four types
of DNA nucleotides. Once you know the sequence of the DNA, you
know the sequence of the protein. Well, almost: there are still many
oddities and exceptions that add much color and diversity to life.
But at the core, there is the genetic code of codons and anticodons.
The heart of information storage in living cells is the classic pair-
ing of nucleotides. In DNA, cytosine pairs with guanine and thy-
mine pairs with adenine, and in RNA, a small change is made, using
uracil instead of thymine. When the first atomic structures of DNA
were solved, they perfectly confirmed the pairing of bases proposed
by Watson and Crick, and for the bulk of biological information
transfer, these pairings do all the work. Our cellular machinery has
evolved to work perfectly with these pairings as they manage our
genetic information.
The classic A-T and G-C pairings are not the end of the story,
however. As is often the case with biology, many variations have
also evolved to add additional depth to this basic approach. For
instance, if we look at the atomic structure of transfer RNA, we
quickly find some unusual things going on. Most of the bases form
canonical base pairs, but a few odd pairings are needed to stabilize
the functional L-shaped structure of the whole molecule. Often
these odd pairings are enforced by modifying the base, so that the
typical pairing is not even possible (Fig. 6.1).
Fig. 6.1 Base pairing in transfer RNA. Transfer RNA is stabilized by many traditional base pairs, such as the two
shown on the left, but it has also evolved to incorporate unusual pairings to stabilize the structure. In the base pair at
the upper right, the adenine base has an extra methyl group, causing it to flip in its interaction with the uracil. In the
triplet at the lower right, a normal A-U base pair is joined by a second adenine (PDB entry 1tra)
Mispairing of bases also plays an essential role during the syn-
thesis of proteins. The 20 amino acids in proteins are encoded by
triplet codons in DNA, along with a few codons used to specify the
end of a protein. Doing the math, we see that there are 64 possible
codons, so there is some degeneracy to the code, and several different
codons are used to specify the same amino acid. However, if
we look inside the cell, we find that there are only 20 or so
types of transfer RNA that match up the appropriate amino acids
with their codons. This requires some mismatching of the transfer
RNA anticodon with all of these different codons. This is accomplished
by allowing some wobble in the third position of the
codon, so that different pairings are allowed. When the structures
of ribosomes were solved, it was found that the first two bases in
the codon are tightly controlled by the ribosome, ensuring only
the proper pairing, but the third position is looser, allowing some
wobble (Fig. 6.2).
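Both the counting argument and the wobble rule can be made concrete. The Python sketch below first confirms that three positions with four possible bases give 4 × 4 × 4 = 64 codons, then checks codon-anticodon pairing with strict Watson-Crick rules at the first two codon positions and, at the third, also the G-U wobble pair. It is a simplification: the real wobble rules include further pairs (for instance with the modified base inosine), and the helper names are illustrative. The anticodon GAA used here is that of phenylalanine transfer RNA.

```python
from itertools import product

BASES = "ACGU"

# Three positions, four choices each: 4 ** 3 = 64 possible codons.
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # simplified: only the G-U wobble pair


def pairs(codon: str, anticodon: str) -> bool:
    """Check a codon (5'->3') against an anticodon (5'->3').

    The two run antiparallel, so the anticodon's last base pairs with the
    codon's first base and its first base with the codon's third base.
    """
    partner = anticodon[::-1]  # line the anticodon up 3'->5' under the codon
    for position, (c, a) in enumerate(zip(codon, partner)):
        allowed = WATSON_CRICK if position < 2 else WATSON_CRICK | WOBBLE
        if (c, a) not in allowed:
            return False
    return True


phe_anticodon = "GAA"  # anticodon of phenylalanine transfer RNA
print(len(ALL_CODONS))                                     # 64
print([c for c in ALL_CODONS if pairs(c, phe_anticodon)])  # ['UUC', 'UUU']
```

One transfer RNA reads both phenylalanine codons: UUC through an ordinary G-C pair at the third position and UUU through a G-U wobble pair, matching the two structures in Fig. 6.2.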
The story continues to build from the pairing and mispairing of
bases: many additional levels of information are layered on top of
this. One edge of each DNA base is involved in base pairing, but this
leaves additional hydrogen-bonding groups exposed in the two
grooves of the double helix. These base edges are recognized by the
many proteins that regulate the use of DNA information. These pro-
teins reach into the grooves and feel for specific sequences of DNA
(Fig. 6.3). Researchers have searched unsuccessfully for a general
code (something akin to the pairing of A with T and G with C) to
understand how these are recognized. Rather, each protein seems to
use whatever it needs to recognize the base, often even trapping
water molecules to help with the reading of bases.

Fig. 6.2 Wobble in codon-anticodon pairing. Transfer RNA molecules (shown in red) are able to recognize several
different codons by allowing some wobble on the third base. These two structures show the transfer RNA that encodes
phenylalanine paired with the two codons that specify phenylalanine, UUU and UUC. The traditional base pair is
formed with UUC, and a wobble base pair is formed with UUU. The ribosome was also included in both of these
structures, revealing that it surrounds the bases and monitors the base pairing. The ribosome is not shown here, for
clarity (PDB entries 1ibl, 4v9d)

Fig. 6.3 DNA recognition by proteins. The basic principles of DNA recognition by proteins were discovered in early
structures of regulatory proteins from bacteriophages, such as the lambda repressor structure shown here (PDB entry
1lmb)
In my graduate work, I worked on a project that used a similar
approach to try to discover DNA-reading molecules that can be
used as drugs for cancer therapy (Fig. 6.4). These molecules take
a modular approach, adapted from a class of toxic DNA-reading
molecules made by bacteria. These molecules have small molecular
units, 5 or 10 atoms each, that can read the edge of each base. The
trick then is to synthesize custom molecules with these DNA-reading
elements all strung in the right order to target the DNA
sequence of choice. A series of atomic structures have been used to
refine the designs, honing the recognition of each of the four DNA
bases.

Fig. 6.4 Sequence-reading molecules. The toxic bacterial antibiotic netropsin, shown on the left in blue, reads A-T
base pairs in DNA. As shown in the center, it forms hydrogen bonds (green lines) with the base edges, positioning a
carbon atom (star) near the A base. If this base were G instead, it would have an amino group (shown in blue with the
letter N) that would clash with the netropsin carbon atom. Sequence-reading molecules are being designed by
replacing this carbon atom with other atoms, such as nitrogen. Two of these designed molecules typically bind side by
side in the DNA groove, as shown on the right, each reading one of the bases in the base pairs (PDB entries 6bna, 365d)

The story doesn't stop there. Additional layers of epigenetic
information are added on top of the typical genetic information
encoded in the sequence of nucleotides. A simple example is found
in many bacteria. They contain two enzymes: one that adds methyl
groups to the edge of bases in a particular base sequence and
another that cuts any DNA that doesn't have these methylated
bases. It turns out that this simple two-enzyme system is centrally
important both to the bacterium and, now, to us. In the bacterium,
this is a powerful mechanism to fight infection by viruses. Viruses
inject their genetic material into the bacterium, but since it isn't
labeled with the signature methyl groups, it is quickly destroyed.
For us, these enzymes allow us to cut DNA selectively at one particular
sequence of DNA, and building on this, they have spawned
the entire field of recombinant DNA biotechnology. By looking at
many bacteria, each with its own signature sequence, we have
gathered a collection of DNA scissors that allow us to cut and paste
DNA sequences into custom-engineered genomes. Using these
molecular tools, we're now able to engineer fast-growing organisms
like bacteria and yeast to create useful molecules. For instance,
most of the insulin used for treating diabetes is currently created
this way.
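The logic of the two-enzyme defense can be sketched in Python. This is a toy model under stated assumptions: it uses the EcoRI recognition sequence GAATTC (EcoRI cuts between the G and the first A), it follows a single strand only, it treats methylation simply as a set of protected site positions, and the function names and example sequence are hypothetical. Host DNA carries the methyl marks and survives; unmarked viral DNA is cut at every site.

```python
SITE = "GAATTC"  # EcoRI recognition sequence
CUT_OFFSET = 1   # EcoRI cuts after the G: G^AATTC


def find_sites(dna: str, site: str = SITE) -> list[int]:
    """Start positions of every occurrence of the recognition sequence."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def restrict(dna: str, methylated: set[int]) -> list[str]:
    """Cut one strand at each site that the methylase has NOT marked."""
    fragments, last = [], 0
    for pos in find_sites(dna):
        if pos in methylated:
            continue  # methyl mark present: the endonuclease leaves this site alone
        fragments.append(dna[last:pos + CUT_OFFSET])
        last = pos + CUT_OFFSET
    fragments.append(dna[last:])
    return fragments


seq = "TTGAATTCAAACCGAATTCGG"      # hypothetical sequence with two sites
host_marks = set(find_sites(seq))  # host methylase has marked every site
print(restrict(seq, host_marks))   # ['TTGAATTCAAACCGAATTCGG'] (left intact)
print(restrict(seq, set()))        # ['TTG', 'AATTCAAACCG', 'AATTCGG']
```

The same cutting rule, applied to DNA that lacks the marks, is what makes these enzymes useful as laboratory "scissors": every fragment ends at a predictable sequence.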
The epigenetic information in our own cells, as you might
expect, is even more complex. We also use modifications like meth-
ylation to mark our DNA, turning unneeded genes off when neces-
sary. Packaging of DNA in nucleosomes is also used to put certain
sequences in storage, and selective modification of the nucleosomes
determines how deep this storage is.
We've all inherited detailed instructions for building about 20,000 different proteins. Until recently, this information remained hidden, used every day by each of our cells, but not accessible for our own personal use. This changed with the availability of rapid gene-sequencing techniques, allowing us to take control of our own information. Genetic testing looks for specific mutations in our genes, so we can predict whether we will be susceptible to genetic diseases. For instance, mutations in the DNA repair protein BRCA2 (Figs. 6.5 and 6.6) have been linked to a higher incidence of breast cancer, so people who have these mutations are counseled to watch more carefully for warning signs and to reduce other risk factors.
Fig. 6.5 DNA modification. In the upper structure, HhaI methylase is captured in the process of adding a methyl group to a short piece of DNA. The enzyme has flipped the base out of the double helix and is using a cofactor (in green) to donate the methyl group. In the lower structures, EcoRV endonuclease is caught before and after its cleavage reaction. In the structure with cleaved DNA, the site of cleavage is shown with two stars (PDB entries 1mht, 1rva, 1rvc)
Fig. 6.6 BRCA2. BRCA2 is a huge, flexible protein involved in DNA repair. Several structures of different portions of the protein were used to assemble this illustration of the protein bound to a single strand of DNA (red) and the repair protein Rad51 (blue) (PDB entries 1miu, 1n0w)
Fig. 6.7 Adenovirus. Adenovirus is composed of an icosahedral protein coat (blue) with long filaments at the vertices (green). The filaments help the virus attach to the cells that it infects (PDB entries 1vsz, 1qiu)
Genetic sequences have also been very useful for forensics, for identifying individuals from the traces of DNA that they leave at the scene of a crime. For this, we need to analyze tiny amounts of DNA. This is typically done by creating many duplicate copies of the DNA, so that there is enough to sequence and analyze. To assist with this task, an inordinately useful enzyme was found in a bacterium that lives in boiling hot springs (shown in Fig. 5.5). The DNA polymerase of this bacterium is highly stable and has evolved to work at high temperatures. This is perfect for the polymerase chain reaction, a technique used to create many copies of a desired DNA strand. The sample is mixed with the polymerase, a supply of nucleotides, and short DNA primers that mark where copying should start, and the polymerase builds a duplicate strand. Then the whole mixture is heated, separating the sample strand from the duplicated strand. When it cools, the polymerase duplicates both of these. Repeated rounds of heating and replication create many identical copies of the DNA, with the number of copies roughly doubling each round. The whole process is made easier by using this heat-resistant polymerase, since you don't have to add new enzyme with each round.
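The repeated doubling described above is easy to put into numbers. This sketch assumes the idealized picture in the text, where every copy is duplicated in every round; the `efficiency` parameter is an added, hypothetical knob for reactions that fall short of perfect doubling, and the function name is illustrative.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of double-stranded copies after `cycles` rounds of PCR.

    efficiency: fraction of copies successfully duplicated per cycle
                (1.0 means perfect doubling; an illustrative assumption)
    """
    copies = float(initial_copies)
    for _ in range(cycles):
        # Heat: every double strand separates into two single strands.
        # Cool: the heat-stable polymerase builds a partner for each strand,
        # so each successfully copied double strand becomes two.
        copies += copies * efficiency
    return copies


print(pcr_copies(1, 30))         # one starting molecule, 30 perfect cycles
print(pcr_copies(1, 30, 0.9))    # same, if only 90% of copies duplicate each round
```

Thirty perfect cycles turn a single molecule into 2^30, a little over a billion copies. Real reactions fall short of perfect doubling and eventually level off as the ingredients run out, which is why the second call gives a smaller number.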
Another great hope, building directly on our knowledge of the genetic code, is gene therapy: the ability to replace faulty genes in the cells of a patient, curing genetic diseases at their source. Once we've identified the problem, synthesizing the corrected DNA is fairly straightforward, but the tricky part is finding a way to get the genes into the afflicted cells. Today, this is primarily done by creating an engineered virus, such as adenovirus (Fig. 6.7), that infects the cell and inserts the therapeutic DNA in the process. In this way, we're taking the reins from evolution and taking personal charge of our own genetic information.
Chapter 7 Evolution in Action
I can safely say that evolution is a familiar thing to me. After growing up with many visits to the museum and after gathering a small
personal collection of fossil fish and insects trapped in amber, I can
easily imagine a world very different from ours, with dinosaurs
roaming through a forest of giant ferns and giant dragonflies. With
the help of ancient pot shards and stone knives, I can imagine hair-
ier versions of myself discovering fire and hunting mammoths. I
can even imagine tiny cells, newly minted, colonizing the early
Earth and gradually, over millennia, flooding it with oxygen to cre-
ate the world we live in today.
We often take this type of historical view of evolution. Looking
at the similarities and differences between living organisms and
comparing them with fossil remains of ancient organisms, we
reconstruct the gradual changing of life on Earth over millions of
years (Fig. 7.1). But evolution is continuing today, naturally and
through our own intervention. You only need to visit a rose garden
filled with huge blooms, or compare a wolf with the many different
breeds of dogs being walked in your local park, to see the results of
human-driven evolution. Looking at the atomic structures of bio-
molecules, we can find abundant evidence for the history of evolu-
tion, and we can also watch evolution in action today.
What is evolution? Evolution is a unique process that produces organisms increasingly well suited to their surroundings, without the need for any intelligent intervention. It's no wonder that the theory of evolution caused so much consternation when proposed by Charles Darwin, since it's so different from anything in our familiar lives. It goes against our intuition, since we're used to planning and designing when we build things ourselves. Biological evolution instead takes a less directed, but highly successful, approach.
For evolution to work, a few things are needed. First, evolution requires a population of individuals that reproduce to create children: evolution doesn't work on a single organism, but rather works over many generations. Next, a source of variation in the population is needed, with traits that are passed from parents to children. In natural evolution, this variation is random and happens through mutation of DNA. Finally, evolution requires a source of selection that favors the best individuals in the population. Given these things (selection of a population that has inheritable variations), the population will gradually change and improve as the best individuals dominate and the weaker ones lose out.
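The three ingredients listed above (a reproducing population, inherited variation, and selection) can be put together in a few lines of code. This is a toy sketch, not a model of any real organism: the "genomes" are lists of bits, the `fitness` function that scores them is a hypothetical stand-in for the environment, and letting each parent leave one unmutated copy is a simplification that keeps the best individual from being lost by chance.

```python
import random


def evolve(pop_size=50, genome_len=20, generations=40, mutation_rate=0.02, seed=1):
    """Toy evolution: a population, inherited variation, and selection.

    Returns the best fitness found in each generation.
    """
    rng = random.Random(seed)

    def fitness(genome):
        # Hypothetical stand-in for "how well this individual survives":
        # simply the number of 1s in its genome.
        return sum(genome)

    def mutate(genome):
        # Variation: each bit flips with a small random probability.
        return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in genome]

    # Population: random individuals to start with.
    population = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        # Selection: only the fitter half of the population reproduces.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Inheritance: each parent leaves one exact copy and one mutated child.
        population = [child for p in parents for child in (list(p), mutate(p))]
        history.append(max(map(fitness, population)))
    return history


print(evolve())
```

The returned list tracks the best score in each generation. With the exact-copy rule it can never decrease, and it typically climbs toward the maximum of 20 as favorable mutations are selected and accumulate.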
Darwin developed his theory of evolution after observing varia-
tion in populations, such as the natural variations in finches on the
Galapagos Islands or the many types of pigeons bred by fanciers.
Every time I work in the garden, I try to make similar connections
between the shapes of the flowers, figuring out how the differences
might have improved their competitiveness as they were evolving.
Evolution is also firmly in my mind every time I pull weeds. I always feel like I'm selecting and evolving a breed of weeds that are best suited to elude me and my shovel.
Darwin observed these variations in populations of birds, but at the time, it wasn't known how the variation occurs or how it is passed to offspring. The discovery of DNA and genetic information was the missing piece of the puzzle for understanding the mechanisms of biological evolution. Organisms gain variability through mutation of their DNA. Natural radioactivity from the environment or errors in copying the DNA introduce small changes into the genome, which then cause small changes in the proteins that are encoded. In some cases, very small changes in the genome can have large effects on the form and behavior of the organism. Most of these changes cause problems, and scientists have uncovered countless examples of point mutations that corrupt the function of a protein and lead to a disease state or death. But in some cases, the mutation leads to an improved form of a protein, and a competitive benefit for the organism.
Fig. 7.1 Mammoth hemoglobin. Researchers have reconstructed a hemoglobin molecule from an extinct mammoth, based on DNA gathered from frozen animals. As you might expect, it is very similar to hemoglobin from living elephants, but has a few changes (shown in red) that make it more efficient in the cold climate where the mammoth lived (PDB entry 3vrf)
The classic example is sickle cell anemia, which surprisingly is both a loss-of-function mutation and a beneficial mutation. A single mutation in the gene for hemoglobin, which changes a charged glutamate to a carbon-rich valine, has wide-ranging effects. It creates a small sticky spot on the protein, which causes it to form long filaments under some conditions. These filaments distort the blood cells and cause life-threatening circulatory problems. But at the same time, the filaments inhibit infection by the parasites that cause malaria, so the mutation also provides a selective advantage in areas where malaria is a danger, for people who carry only one copy of the mutated gene. Structures are available for both the unmutated form and the mutated form of hemoglobin, revealing how this tiny mutation can induce the formation of fibers (Fig. 7.2).
Fig. 7.2 Sickle cell hemoglobin. The crystal structure of sickle cell hemoglobin shows how one small mutation causes the protein to form fibers. The site of the mutation is on the surface of the protein and forms a sticky spot that associates with neighboring molecules (PDB entry 2hbs)
Most of evolution occurs over millennia, as individuals with
useful traits reproduce and dominate populations. For instance, the
sickle cell gene probably arose about 10,000 years ago, at the time
when humans first developed agriculture and started to live in com-
munities and thus were more susceptible to infestations of mosqui-
toes. Since then, the mutation has persisted in populations where it
provides a benefit. However, in some cases, we can watch evolution
in action. For instance, when people infected with HIV are treated
with a single drug, such as AZT or a protease inhibitor, the levels of
virus in the blood drop rapidly as viral growth is halted. But then, in
a matter of days, the levels of virus rise rapidly. When we look at
these viruses, we find that they are a new form, with mutations that
make them resistant to the drug.
HIV evolves quickly because it reproduces quickly, has a huge
population of individuals, and mutates rapidly. There is also a very
strong selection pressure, since the immune system constantly
attacks the virus and the medical community fights it with antiviral
drugs. Structural biology has shown us both the mechanism for this
rapid mutation and the advantages of this rapid evolution for fight-
ing this selection pressure.
The high rate of mutation is caused by reverse transcriptase (see Fig. 5.5), the enzyme that copies the genetic information of the virus when it infects a cell. When HIV infects a new cell, it carries its genetic information in a short strand of RNA, which includes enough information to encode its handful of essential proteins. Inside the cell, reverse transcriptase makes a DNA copy of the RNA genome, which is then inserted into the cell's own DNA genome, where it directs the formation of many new copies of the virus. This enzyme is far more prone to making errors than the cell's polymerases, which incorporate proofreading methods to improve their accuracy.
When treatment with an antiviral drug begins, this error-prone
reverse transcriptase ensures that many mutated forms of the virus
are already circulating within the population and are quickly
selected if they show some resistance to the drug. In a matter of
days, they dominate the population, and the drug becomes useless.
Further mutation selects even more resistant forms. Today, the
most effective mode of treatment is to provide a cocktail of drugs
that attack several HIV proteins at once, playing the odds that there
are no viruses in the population that have drug-resistant mutations
in several genes at once.
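The odds behind the cocktail strategy can be made concrete with one line of arithmetic. The sketch assumes, for illustration only, that resistance to each drug requires its own mutation, that these mutations occur independently, and that each is already present in a fixed fraction of the viral population. The population size and frequency below are made-up round numbers, not measurements for HIV, and the function name is hypothetical.

```python
def expected_resistant(population, per_drug_frequency, n_drugs):
    """Expected number of viruses already resistant to all `n_drugs` drugs.

    Assumes (hypothetically) that resistance to each drug needs its own
    mutation, present at `per_drug_frequency`, independently of the others.
    """
    return population * per_drug_frequency ** n_drugs


# Illustrative numbers only, not measured values for HIV.
viruses = 1e10
freq = 1e-5
for drugs in (1, 2, 3):
    print(drugs, expected_resistant(viruses, freq, drugs))
```

With these made-up numbers, roughly 100,000 viruses already resist a single drug, but the expected number resisting all three at once falls far below one. Each added drug multiplies the odds against the virus, which is the bet the cocktail makes.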
This process was directly observed in the laboratory, revealing the atomic details of an evolving population. Viruses were grown in a culture of cells and subjected to increasing amounts of an experimental anti-HIV drug that targets HIV protease. Gradually, over a few weeks, mutant forms of the virus dominated the population. The first mutation appeared in the active site of the protease. It reduces the size of one amino acid, weakening the binding of the drug but also weakening the function of the enzyme in the viral life cycle. Other mutant forms were then selected in the population. Two additional mutations modify the mobility of flaps that cover the enzyme's active site and help to restore the enzyme's function, and three additional mutations buried deep inside the protein further tune the activity of the mutant form, ultimately yielding an active enzyme that is resistant to the drug (Fig. 7.3).
Evolution of resistance is also very common in other cases
where antibiotics are used to fight an organism. A few short years
after the discovery of penicillin, bacteria had already evolved mul-
tiple different ways to fight antibiotics: by destroying the drug
directly, by changing the target of the drug, or by pumping the drug
out of the cell before it can do any damage. To make things even
worse, bacteria have ways of sharing the genes for these resistance proteins with
other bacteria, by exchanging small circles of DNA that encode the
information for building them. Consequently, bacterial drug resis-
tance is currently one of the major challenges facing the medical
community, and structural biologists are busy characterizing new
targets for the creation of antibiotic drugs.
Fig. 7.3 HIV resistance mutation. Four structures of HIV protease follow the evolution of drug resistance. The enzyme is composed of two identical chains, so each mutation (shown in red) shows up in two places, one on each half of the complex. The drug (shown in blue) binds in a tunnel-shaped active site, gripped by two flaps that close over the top (PDB entries 2az8, 2az9, 2azb, 2azc)
Fast or slow, evolution has shaped the form of the biological world. Much of the early history of biology was devoted to classifying the diverse organisms living on the Earth and creating a "Tree of Life" that relates organisms that are very similar and those that are more different. The theory of evolution provided a way of understanding this tree as a family tree, representing the rise of organisms from common ancestors. Much of this work was done using the visible characteristics of the organisms: for instance, by comparing the number of legs, we find that a fly is more closely related to a grasshopper than it is to a spider.
Molecular biology allows us to look at this family tree in a much
more quantitative way. We can compare the proteins in different
organisms, and the DNA that encodes them, and estimate how
much time it took for the changes to build up. To do this for many
different organisms, we need to choose a protein that is essential for all of them and that resists change. The classic example is cytochrome c, a central protein in energy metabolism. By looking at the similarity of this protein across organisms, we can build up a lineage of our closest and most distant relatives in the biosphere (Fig. 7.4).
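The core of the comparison described above is counting differences between matching positions. This minimal sketch assumes the sequences have already been aligned to the same length and treats every change alike; the analysis in Fig. 7.4 also distinguishes changes to chemically similar amino acids from changes to entirely different ones, and real family-tree methods go further still. The species names, sequence fragments, and function names are all hypothetical.

```python
from itertools import combinations


def count_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_relative(sequences, name):
    """Name of the other sequence with the fewest differences from `name`."""
    others = (n for n in sequences if n != name)
    return min(others, key=lambda n: count_differences(sequences[name], sequences[n]))


# Hypothetical, already-aligned fragments (not real cytochrome c data).
seqs = {
    "species_A": "GDVEKGKKIF",
    "species_B": "GDVEKGKKIL",
    "species_C": "GDAEKGKKVL",
    "species_D": "GSAKKGATLF",
}
for p, q in combinations(sorted(seqs), 2):
    print(p, q, count_differences(seqs[p], seqs[q]))
print(closest_relative(seqs, "species_A"))
```

The pairwise counts form a small distance table: species_A and species_B differ at one position, while species_D differs from the others at six or seven, so it would sit on a more distant branch of the tree.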
Fig. 7.4 Cytochrome c evolution. A family tree of our ancient ancestors may be created by counting up the numbers of changes in the proteins found in modern organisms, identifying our close relatives and our distant relations. Cytochrome c is shown here. Our molecule is in pink, with the bound heme group in bright red. Amino acids that have changed to chemically similar amino acids are shown in lighter pink in the cytochrome c proteins from other organisms, and amino acids that change to entirely different amino acids are in white (PDB entries 3zcf, 2b4z, 1hrc, 1cyc, 1ycc)
They may not have called it evolution, but humankind has fiddled with evolution for millennia, using selection to breed bigger and better and tastier plants and animals. This is evolution in the classic sense. Breeders are the force of selection, culling out the weakest of the herd and allowing the desirable individuals to dominate. Throughout this process, the breeders rely on random variations to explore the evolutionary landscape, gradually selecting individuals with better and better traits.
More recently, scientists have started using evolution in the test tube to discover molecules with new functions. For instance, SELEX (systematic evolution of ligands by exponential enrichment) is a remarkably effective way to discover novel RNA and DNA molecules with highly complex functions. The concept is based on natural evolution. A large population of RNA molecules is synthesized with random sequences. These are then added to the target; the best ones bind, and all the rest are washed away and discarded. The bound RNA molecules are then duplicated and added to the target again, under conditions that only allow the best to bind. After several more rounds of duplication and selection, the best molecules are found. This technique has been used, for instance, to find molecules that bind to thrombin, which may be useful for treating blood clotting diseases (Fig. 7.5).
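Why a handful of rounds is enough follows from simple arithmetic: each round multiplies the proportion of good binders relative to poor ones. The sketch below assumes hypothetical retention probabilities for good and poor binders (`keep_binder`, `keep_nonbinder`) and treats the duplication step as copying every surviving molecule equally, which in a real experiment is only approximately true. The function name is illustrative.

```python
def binder_fraction(initial_fraction, rounds, keep_binder=0.5, keep_nonbinder=0.001):
    """Fraction of good binders in the pool after `rounds` of selection.

    keep_binder:    chance a good binder stays on the target (assumed)
    keep_nonbinder: chance a poor binder escapes being washed away (assumed)
    """
    f = initial_fraction
    for _ in range(rounds):
        bound = f * keep_binder                  # good binders that survive washing
        background = (1 - f) * keep_nonbinder    # poor binders that stick anyway
        # Duplication copies every surviving molecule, so it enlarges the pool
        # without changing the proportions; only selection shifts them.
        f = bound / (bound + background)
    return f


for r in range(6):
    print(r, binder_fraction(1e-9, r))
```

With these assumed numbers, each round enriches good binders about 500-fold (0.5 / 0.001), so a sequence that starts as one in a billion makes up nearly the whole pool after five rounds. That multiplicative effect is the "exponential enrichment" in the name.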
As understanding of biological evolution and biomolecular structure has grown, scientists have also tried their hand at intelligent design. Taking a modular approach to design, scientists have started with existing pieces from natural biomolecules and then reconnected them in novel ways. This approach has been used to design a tetrahedral cage built of protein, by linking together pieces that associate with known geometries (Fig. 7.5). Scientists have also built chimeric molecules that combine functions. For instance, antibody molecules have been tethered to deadly toxins to create new cancer therapies. The antibody binds specifically to cancer cells, allowing the toxin to kill them.
Fig. 7.5 Designer molecules. Scientists have used artificial evolution to discover small RNA molecules (aptamers) that bind selectively to thrombin, an enzyme involved in blood clotting. A modular approach was used to design the protein cage, by linking together existing proteins that form dimers and trimers (PDB entries 1hut, 4i7y, 3vdx)
Of course, in any discussion of biological evolution, we naturally
find our imagination drifting back to the very beginning. People are
great speculators, and when it comes to events that happened in the
distant past, there are as many theories as there are scientists. The
origin of life is one of these topics that promote much discussion
and much disagreement. Many scientists have worked to gather evi-
dence for processes that could have generated life based solely on
the molecules and conditions that were present on the early Earth.
Many lines of evidence have pointed to RNA as being the first
living molecule on the Earth. Experiments in the laboratory have
shown that RNA molecules can be artificially evolved to perform
reactions or make copies of themselves. These may represent some
of the first steps toward life. Scientists have also looked at the existing molecules in cells for clues. The most provocative piece of information came when the structure of the ribosome was solved. All living things rely on ribosomes, indicating that they must have been present in the earliest cells. Looking closely at the active site of the ribosome, where new proteins are built, the structure reveals that the machinery is composed of RNA, and a particular RNA base performs the reaction (Fig. 7.6). This, along with the central role played by RNA in all aspects of protein synthesis, has been taken as evidence for an "RNA World," where self-replicating RNA molecules evolved and discovered the basic processes of life.
Fig. 7.6 Active site of the ribosome. This structure includes a ribosome with the tips of two transfer RNA molecules (magenta and blue spheres) bound in the protein-building site. The ribosome nucleotide shown in red catalyzes the reaction (PDB entry 2wdl)
Chapter 8 How Evolution Shapes Proteins
Just yesterday, I was walking to my car from the lab and I found a
stick insect on the sidewalk. I picked him up to return him to
the greenery. As I was coaxing him onto my hand, he suddenly
folded up his long, spindly legs and turned into, well, a stick.
Evolution is an odd, meandering process, which often produces
unexpectedly magical results. There's much evidence for this in our
everyday world: you only need to look around. Here in my California
garden, I have found caterpillars shaped exactly like bird droppings.
We have katydids that look exactly like leaves. I've seen a moth that
looks exactly like a bumblebee and a hand-sized moth with such
perfect camouflage that it disappears completely when it lands on
the trunk of a tree.
All of these natural wonders have been shaped by evolution.
Because of natural selection, they are the best at what they do,
hiding from predators or scaring them off. Their ancestors were
the most successful, ultimately surviving where their less perfectly shaped relatives perished. In the same way, evolution has shaped the proteins in cells. Proteins are constructed in many strange and elaborate shapes and have evolved to optimize their many diverse functions.
The mechanisms of evolution impose some specific constraints
on the way proteins evolve. Evolution at the molecular scale is
tricky. In order for a mutant protein to be successful, it has to per-
form its job continuously and help keep the cell alive. So, cells with
harmful mutations, and faulty proteins, rapidly die. This means that
legacy is a key limitation of biological evolution: every step along
the way must build on a successful predecessor. This legacy is easily
seen by looking at any protein. Chemists tell us that amino acids
can be made in two similar varieties, a left-handed form and a right-handed form. However, all natural proteins, with the exception of a few odd antibiotics created by the occasional microorganism, are composed of amino acids with only one of these two possible handednesses (see Fig. 10.3).
The other handedness would work equally well. This was shown
in an amazing experiment from the laboratory of Steve Kent, where
they chemically synthesized a protein from scratch, entirely from
amino acids with the opposite hand. The structure was a perfect
mirror image of the natural protein, and it worked perfectly well on
substrates that also had a mirrored conformation. So, the current
ubiquitous handedness is a fossil from the earliest forms of life, and
we've been stuck with it ever since.
So how can a protein ever mutate and change if it must be con-
tinually active? One of the common mechanisms is to build a
backup copy through gene duplication. The gene for the protein is
copied and inserted into the genome. Then, one copy is able to
mutate and diverge, while the other remains the same and contin-
ues to perform its job. When you look at protein structures, exam-
ples of gene duplication show up everywhere. By comparing the
location of these different copies in the genomes of different organ-
isms, it has become apparent that our entire genome has been
duplicated several times, followed by a period where most of the
duplicate genes are weeded out. We only need to look at our most familiar protein, hemoglobin, to see an example of this. Our genome includes several very similar proteins, presumably all created by duplication from an original ancestor protein. These include the two chains of hemoglobin, a few different forms of hemoglobin optimized for use before birth, myoglobin, and two more recently discovered forms with as-yet unknown function: cytoglobin and neuroglobin (Fig. 8.1).
The ease of gene duplication has led to a modular approach to the evolution of proteins. Looking at the proteins in modern cells, most are composed of compact domains. Comparing different proteins, we find that these domains are reused over and over again in new functional contexts. Some domains are particularly successful and have been pressed into service in many different proteins. For instance, a domain that binds to the cofactor NAD, first discovered by Michael Rossmann and named for him, shows up in many different proteins that use the cofactor in their function (Fig. 8.2). In other cases, a similar domain may be repeated multiple times in a single protein. The giant protein titin, for example, which acts like a long elastic band that controls the stretching of muscle fibers, is composed of several hundred similar domains all strung in a row, like beads on a string.
Fig. 8.1 Gene duplication in human globins. All of these proteins are encoded in the human genome, and all are thought to have evolved from a common ancestor protein. The structures are colored to show their differences from the hemoglobin beta chain, with unchanging amino acids in red, mutations to similar amino acids in pink, and mutations to entirely different amino acids in white. Notice that the regions that stay the same are primarily clustered around the oxygen-carrying heme group and buried deep inside the protein (PDB entries 1hho, 1fdh, 3rgk, 1ut0, 1oj6)
Fig. 8.2 Modular domains in proteins. The Rossmann domain (top) specializes in binding to the cofactor NAD and is found in many different enzymes. Three examples are shown here; in each case, the Rossmann domain is connected to a different domain that defines how the NAD is used in the reaction. Titin (bottom) is composed of many domains that form a long, flexible band. This structure includes only four domains in the center of the protein (PDB entries 3gpd, 1i10, 1htb, 3b43)
By looking at the many organisms in the biosphere, evolutionary biologists have discovered fascinating patterns in the way that life has evolved. For instance, divergent evolution is a process where a population of organisms is split, and the separated groups gradually evolve new traits. A familiar example is our hand: it developed from the front feet of a distant mousy ancestor. If we look at our extended family of mammals, this same limb has evolved to form hooves and flippers. Convergent evolution, on the other hand, is just the opposite. This is when two different populations face a similar selection pressure and evolve traits that are similar. Eyes are a perfect example: being able to see is a great advantage, and light-sensing eyes have evolved independently in insects, octopuses, and humans.
Examples of divergent and convergent evolution are everywhere at the molecular scale. My favorite examples are found in the serine protease digestive enzymes. These enzymes all use a similar triplet of amino acids to perform their protein-cutting reaction. A serine interacts directly with the target protein chain, and a neighboring histidine and aspartate are perfectly positioned to activate it for the reaction. Looking at our digestive enzymes, we can find three very similar serine proteases (trypsin, chymotrypsin, and elastase) that evolved from a common ancestor protein and then diverged to attack different protein sequences (Fig. 8.3). If we cast our net a bit wider, we can find many other protein-cutting enzymes that use the same arrangement of serine, histidine, and aspartate but have entirely different foldings of the protein chain (Fig. 8.4). These are examples of convergent evolution, where a similar active site evolved within a different protein framework.
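The different tastes of these divergent enzymes (described in Fig. 8.3) can be sketched as a simple digestion rule: each enzyme cuts after amino acids that fit its pocket. The residue sets below are a deliberate simplification for illustration; real enzymes have more nuanced preferences and exceptions (trypsin, for example, usually does not cut before a proline). The peptide and the names `CUT_AFTER` and `digest` are hypothetical, and sequences use the standard one-letter amino acid codes.

```python
# Simplified preferences (an assumption for illustration): the amino acid just
# before the cut must fit the enzyme's specificity pocket.
CUT_AFTER = {
    "trypsin": set("KR"),        # positively charged side chains (lysine, arginine)
    "chymotrypsin": set("FWY"),  # large aromatic side chains
    "elastase": set("AGV"),      # small side chains
}


def digest(protein, enzyme):
    """Split a one-letter protein sequence after every residue the enzyme prefers."""
    targets = CUT_AFTER[enzyme]
    pieces, start = [], 0
    for i, residue in enumerate(protein[:-1]):   # no cut after the final residue
        if residue in targets:
            pieces.append(protein[start:i + 1])
            start = i + 1
    pieces.append(protein[start:])
    return pieces


peptide = "MAKWGRPLFEK"   # hypothetical peptide
for enzyme in CUT_AFTER:
    print(enzyme, digest(peptide, enzyme))
```

The same peptide is cut into different pieces by each enzyme, even though all three use the same serine-histidine-aspartate machinery: only the shape and charge of the pocket next to the reactive serine have changed.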
There are even examples of molecular mimicry, reminiscent of
the way that stick insects and bark-colored moths rely on mimicry
for protection. Our immune system is one of the most powerful
selective pressures for pathogenic organisms, and these pathogens
have evolved many ways of mimicking our own molecules to make
them invisible to the immune system. For instance, many viruses,
such as HIV and influenza, coat their surface proteins with sugar
chains, the same sugar chains that decorate all of our normal cell
surface proteins. The unique portions of the viral proteins, which are
essential for finding and infecting cells, are shielded behind this camouflage of humanlike sugars, so our immune system can't find them.
The bacteriophage T7 has a particularly striking example of molecular mimicry, creating a protein that mimics DNA. Many bacteria have a defensive system that marks their own DNA genome with methyl groups and then cuts any invading viral DNA that isn't marked with the methyls (see Fig. 6.5). T7 phages circumvent these defenses by building a protein that looks exactly like DNA, which binds to the defensive enzyme that normally cuts up the phage's DNA (Fig. 8.5). Once again, evolution has stumbled into the perfect molecule to solve the problem.
Fig. 8.3 Divergent evolution of serine proteases. These three enzymes cut protein chains using a similar active site that includes a serine, histidine, and aspartate (shown in shades of purple). They evolved from a common ancestor and then diverged to cut different proteins. Chymotrypsin has a large pocket next to the reactive serine (seen here above and to the right of the purple serine), so it preferentially digests protein chains next to large amino acids. Trypsin has evolved a negatively charged group at the bottom of this pocket, so it has a taste for positively charged protein targets. Elastase has a much smaller pocket, so it prefers small amino acids (PDB entries 2cha, 2ptn, 3est)
Fig. 8.4 Convergent evolution of serine proteases. These four enzymes use the same catalytic triad of serine (dark magenta), histidine (lighter magenta), and aspartate (lightest magenta) to perform their protein-cutting reactions. As you can see from the ribbon diagrams, however, the protein chains are entirely different, providing evidence that they evolved separately and converged on the same active site machinery. Their functions are also quite different: elastase and subtilisin cut in the middle of protein chains, carboxypeptidase Y clips chains from the end, and aspartyl dipeptidase breaks very small peptide chains (PDB entries 3est, 1scn, 1wpx, 1fye)
Fig. 8.5 DNA mimic. The protein ocr (short for "overcome classical restriction") protects bacteriophages from the defenses of the bacteria they infect. These structures show how ocr mimics DNA to block EcoKI (shown in shades of blue), a defensive enzyme that normally cuts infecting phage DNA. EcoKI surrounds the DNA, as shown in the upper left image. The image at upper right has one subunit removed to show the DNA inside. The complex with ocr is shown at the bottom. Notice how the shape of ocr matches the DNA double helix, and the negatively charged amino acids (in bright red) mimic the phosphate groups (in red and yellow) on the DNA (PDB entries 2y7h, 2y7c)
Chapter 9 The Universe of Protein Folds
When I was applying for graduate school, a crystallographer lent
me her copy of Jane Richardson's 1981 article, "Protein Anatomy."
And thus began, as it has for many scientists, an ongoing fascination
with the ways that protein chains fold. Her article is a perfect com-
bination of art and science, bringing to life a complex but endlessly
intriguing subject. The paper marks a cornerstone moment in the
study of protein structure: the time when enough structures of dif-
ferent proteins had been solved to start developing an understand-
ing of the general principles that are involved in folding a random,
tangled chain into a beautifully ordered, functional protein.
Everything that is needed to fold up a protein is encoded in the
protein chain, in the order of amino acids. But strangely, to under-
stand the way that proteins fold, we need to look not at the protein,
but rather to the water that surrounds it. Two aspects of the interac-
tion of proteins with water drive the folded shape of a protein. These
principles were apparent in the very first structure of a protein and
have been observed in every structure since then (Fig. 9.1).
Fig. 9.1 Basic principles of protein folding. The first structure of a protein revealed two basic principles of protein folding: (left) the peptide chain forms many hydrogen bonds (green) to form a scaffold of secondary structure; (right) carbon-rich amino acids (blue) are packed mostly in the interior, and charged amino acids (red) are displayed on the surface, in contact with water (PDB entry 1mbn)
First of all, we need to look at the protein backbone. It is built of peptide subunits that are relatively rigid, so they can only adopt a few stable conformations. These peptide units also interact favorably with water, forming hydrogen bonds, so we're going to pay a penalty if we try to bury them inside a protein. The two major structures seen inside proteins, alpha helices and beta sheets, are ways for protein chains to replace all their water hydrogen bonds with protein-protein hydrogen bonds, while keeping within the constraints of the allowable conformations that protein backbones can adopt (Fig. 9.2). Alpha helices fold the chain into a helix, with each peptide unit forming hydrogen bonds with peptides four amino acids away along the chain. Beta sheets, on the other hand, align the chains side by side, forming all possible hydrogen bonds between chains. These two types of structures form the building blocks for the overall fold of the protein chain.
Second, the side chains of the amino acids, which are different for each of the 20 amino acids, direct the folding of the chain into a particular globular structure. Carbon-rich amino acids shed their unfavorable interactions with water, driving the folding to place them in the interior of the protein. Charged amino acids and amino acids that form hydrogen bonds largely stay on the surface, interacting with the surrounding water. Many other properties tune and shape the fold. For instance, a bond may be formed between the sulfur atoms of two cysteine amino acids, gluing portions of the chain together. Positive and negative charges interact favorably with one another, and repulsion between like charges disfavors some possible folds. Specific hydrogen bonds between some amino acids may favor a particular conformation of a loop. All these forces contribute to stabilization of the final fold.
Fig. 9.2 Protein secondary structure. Alpha helices and beta sheets provide most of the secondary structure for proteins. Two other types of helices are rarely seen: 3₁₀ helices are wound more tightly than alpha helices, and pi helices are looser (taken from PDB entries 2viu, 2g8c, 3sbn, 1fuo)
Fig. 9.3 Protein folds. A few common protein folds are shown here, using the cartoon representation popularized by Jane Richardson. In each, the alpha helices are shown in magenta, the beta sheets in yellow, and the connecting loops in white (PDB entries 2ccy, 1mbn, 1lrv, 1ppr, 1cem, 1fbr, 1vie, 1prn, 4bcl, 1stm, 1hcd, 1jpc, 1rie, 1got, 1air, 1ndd, 1tim, 1kvd, 1fua, 2dnj)
Looking at the many structures that had been determined, Jane Richardson found that they fell into a few large classes. Some were composed primarily of alpha helices, others were composed of beta sheets rolled into barrels or sandwiched on top of one another, and others had layers of alpha helices stacked on a central beta sheet. Scientists being scientists, we soon began a widespread effort to develop rigorous classification schemes. Of course, the soft nature of biology resists this type of formal classification, but in spite of the many variations, two popular classification schemes, SCOP and CATH, have gained prominence, codifying the ways that proteins can fold (Fig. 9.3).
Fig. 9.4 New protein folds deposited each year in the PDB. The Protein Structure Initiative, started in 2000, had the goal of determining all of the ways that natural proteins fold. It achieved its goal in about 10 years of work, as shown in this graph of the number of unique protein folds as classified by two popular methods of analyzing protein folds
This understanding of protein folding is a foundational piece of information, particularly if we want to design new proteins ourselves. In search of this understanding, the scientific community launched an effort at the turn of this century, termed "structural genomics," to determine structures for all possible folds. At the time, proteins with new folding patterns were being discovered right and left, so structural biologists decided to take a more systematic approach to this grand challenge. The genome of an organism or set of organisms was analyzed using the best prediction tools. Then, proteins were selected that were predicted to be quite different from anything currently known. A sophisticated structure determination pipeline was then brought to bear to solve thousands of these structures. The new structures help improve the prediction methods, and the whole effort iteratively cranks out structures that fill in the gaps in knowledge.
Several of these large efforts were set up around the world, and
the effort was a complete success. Looking at the number of new
folding patterns that are deposited in the PDB, we see a large num-
ber around this time, and then they fall off after a few years, pre-
sumably as the universe of stable protein folds is effectively covered
(. Fig. 9.4). One of the side effects of the effort is an explosion of
structures for domains of unknown function. These structures
were determined based on this goal of finding new folds, and the
research community is now faced with the challenge of figuring out
what they do in the life of the cell.
Scientists being scientists, we also want to make use of our information once we have it. If we truly understand the rules of protein folding, we should be able to design entirely new proteins that fold up into custom shapes. But of course biology is never as simple as it seems, and the field of de novo protein design has gone through many fits and starts, though it is now quite successful.
One of the major problems that protein designers faced immediately is the negative design problem. Most of the rules for protein folding were discovered by looking at folded protein structures, so the focus has been on the features that stabilize proteins, such as a strong hydrophobic core and strategic placement of salt bridges and hydrogen bonds. But it turns out that it is also critically important to make sure that the chain has only a single stable folded conformation. So, during design, we also need to test all the possible competing folds and make sure that they are not stable. Evolution does this negative design naturally: organisms with proteins that adopt lots of nonfunctional folds quickly die out, leaving only those with correctly folding proteins. Scientists with their computers, on the other hand, need to test all these unwanted possibilities explicitly, one by one.
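The difference between positive and negative design can be sketched with a deliberately tiny toy model. This is a minimal Python sketch under stated assumptions, not how real design software works: each fold is reduced to a hypothetical set of buried positions in a 6-residue chain, residues are only hydrophobic (H) or polar (P), and the only energy term is -1 for every buried H. Positive design alone picks the sequence that makes the target fold most stable; negative design also maximizes the energy gap between the target and the best competing (decoy) fold.

```python
from itertools import product

# Hypothetical toy folds for a 6-residue chain, each described only by
# which positions end up buried in the core. These are not real folds.
TARGET = frozenset({0, 2, 3})
DECOYS = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]


def energy(sequence: str, buried: frozenset) -> int:
    """Toy score: each hydrophobic residue (H) buried in the core gives -1."""
    return -sum(1 for i, residue in enumerate(sequence)
                if residue == "H" and i in buried)


def energy_gap(sequence: str) -> int:
    """How much better the target fold scores than the best decoy fold."""
    best_decoy = min(energy(sequence, decoy) for decoy in DECOYS)
    return best_decoy - energy(sequence, TARGET)


# Enumerate every H/P sequence of length 6 (2**6 = 64 candidates).
sequences = ["".join(s) for s in product("HP", repeat=6)]

# Positive design only: make the target fold as stable as possible.
positive_pick = min(sequences, key=lambda s: energy(s, TARGET))

# Negative design: prefer the largest gap, then the most stable target.
negative_pick = max(sequences,
                    key=lambda s: (energy_gap(s), -energy(s, TARGET)))

for label, seq in (("positive", positive_pick), ("negative", negative_pick)):
    print(label, seq, "target:", energy(seq, TARGET), "gap:", energy_gap(seq))
```

In this toy, the positive-design pick is the all-hydrophobic chain, which scores just as well in the decoy folds as in the target, so its energy gap is zero and the chain has no single preferred fold. The negative-design pick keeps the same target energy while leaving every decoy at least one unit worse. Ties between equally good sequences are broken by enumeration order, which is arbitrary here.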
This work started out small and has continued to grow (Fig. 9.5). The first protein with an entirely designed sequence was created in Stephen Mayo's laboratory in 1997. They started with a known protein fold, with two beta strands and an alpha helix, and tested the many possible arrangements of different amino acids to find the best. It worked, and a structure determined by NMR spectroscopy confirmed the designed folding pattern. The next big milestone was the design of a protein with an entirely new fold, never (at least so far) observed in nature. In 2003, David Baker's laboratory developed a design method that iterates between optimizing the sequence for a given fold and optimizing the predicted structure for this fold. The result was a small, extremely stable protein whose crystallographic structure confirmed the designed fold. This successful design is strong evidence that our current understanding of protein folding is on the right track.
Fig. 9.5 Designed proteins. Designed proteins FSD-1 and Top7 both build on the basic principles of protein folding, with a scaffold of secondary structure (as seen in the ribbon diagrams at the top) and a partitioning of charged amino acids (red in the lower images) on the surface and carbon-rich amino acids (blue) in the interior (PDB entries 1fsd, 1qys)
10 Order and Chaos in Protein Structure
Our understanding is often shaped by the things that we can actually see. Science is bursting with examples of this. Galileo's crude telescope gave a blurry image of Saturn, so he thought it had two large moons like our own moon. But the better telescope used decades later by Huygens revealed them to be something entirely new: the rings of Saturn. The first microscopes changed the way we thought about disease by revealing pathogenic organisms, eventually launching the effort to find effective antibiotics. New techniques of seeing often lead to new insights. This is certainly the case with structural biology. Kendrew's crystallographic structure of myoglobin opened a whole new world of understanding, revealing the atomic details of biology.
In some cases, however, the things that we can see may limit our view. A perfect example has occurred in the structural biology community, quite ironically caused by the wonderful success of x-ray crystallography. To determine a crystallographic structure, we need a sample that can be crystallized. This usually means that the proteins need to be rigid bricks that can stack perfectly into a crystal lattice. Because of this, most of us now think of all proteins as being just like that very first myoglobin structure: a perfectly folded chain forming a functional globular protein.
Many proteins have this type of perfect order, as evidenced by the hundred thousand structures in the PDB. This order reaches glorious heights with the addition of symmetry. Sometimes a single protein chain just isn't enough to build the structure that is needed. Often, larger structures are built using many copies of a protein, which then associate to form a larger assembly. In some cases, these are point group symmetries that create a closed complex with an exactly defined number of subunits (Fig. 10.1). In other cases, these include translational symmetries, creating, for instance, helical complexes that span entire cells.
As more and more structures have become available, it has become apparent that multi-subunit proteins are most often symmetrical. This makes sense for many reasons. Since the subunits all sit in identical, symmetrical locations, they all have the same surfaces for interaction; basically, one type of subunit is all that is needed. This is easier to evolve and more economical to build. Also, symmetrical complexes, or at least those based on point groups, are self-limiting: they form a defined complex, not an open-ended aggregate. Aggregates are a great danger to cells, since they clog everything up. So, when we look to the PDB, we find that nearly all complexes are symmetrical.
Fig. 10.1 Symmetrical assemblies. Sliding clamps used in DNA replication have evolved to encircle DNA, but a bacterial clamp (left) and a human clamp (right) achieve this function using assemblies with two different symmetries (PDB entries 1axc, 2pol)
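The self-limiting nature of point-group assemblies can be made concrete by counting subunits. This sketch is an illustration rather than part of the original text. It uses the standard orders of the rotational point groups available to assemblies of chiral protein subunits: cyclic Cn, dihedral Dn, tetrahedral T, octahedral O, and icosahedral I. It assumes one subunit per symmetry-related position. For example, the dimeric bacterial sliding clamp has C2 symmetry, the trimeric human clamp (PCNA) has C3, and the simplest icosahedral virus shell has I. The helper name `point_group_order` is hypothetical.

```python
# Orders of the rotational point groups available to assemblies of chiral
# protein subunits. Mirror symmetries are excluded because amino acids are chiral.
FIXED_ORDERS = {"T": 12, "O": 24, "I": 60}


def point_group_order(group: str) -> int:
    """Number of subunits in a closed assembly with one subunit per
    symmetry-related position (hypothetical helper for illustration)."""
    if group in FIXED_ORDERS:
        return FIXED_ORDERS[group]
    kind, n = group[0], int(group[1:])
    if n < 1:
        raise ValueError(f"invalid point group: {group}")
    if kind == "C":   # cyclic: a single n-fold rotation axis
        return n
    if kind == "D":   # dihedral: n-fold axis plus n perpendicular twofold axes
        return 2 * n
    raise ValueError(f"unknown point group: {group}")


for name in ["C2", "C3", "D2", "T", "O", "I"]:
    print(name, point_group_order(name))
```

Every group has a finite order, which is why these assemblies close up on themselves rather than growing into an open-ended aggregate. Helical and other translational symmetries have no such limit.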
Ironically, highly symmetrical filaments are often some of the most difficult subjects to study with crystallography. The reason is that filaments are often built with perfect helical symmetry, but the symmetry is rarely exactly what is needed to build a crystal lattice. For instance, actin filaments have about 13 subunits in 6 turns of the helix, which doesn't fit nicely into the two-, three-, four-, and sixfold symmetries that are compatible with crystals (Fig. 10.2). So, these filaments are often studied using an integrative approach. The symmetry of the filament is studied using electron microscopy, and the atomic details are determined using a single subunit. Then, the two pieces of information are combined to create an atomic model of the whole filament.
Fig. 10.2 Actin structures. Crystallographic structures of actin are typically determined in complexes with an actin-binding protein, such as gelsolin shown on the left. Structures of the filament are obtained by fitting these types of structures into reconstructions from electron microscopy, as shown on the right (PDB entries 1yvn, 3j8j)
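A little arithmetic shows why the actin helix resists crystallization. This is a sketch under the stated assumptions, not from the original text. Take a helix with p subunits in q turns, reduced to lowest terms. Neighboring subunits are related by a rotation of 360·q/p degrees, and that rotation has order p. A crystal lattice permits rotations of order 1, 2, 3, 4, or 6 only (the crystallographic restriction theorem), so the helix can coincide with a crystallographic screw axis only when p is one of those values. The helper names are hypothetical.

```python
from fractions import Fraction

# Rotational orders permitted in a crystal lattice (crystallographic restriction).
CRYSTAL_COMPATIBLE_ORDERS = {1, 2, 3, 4, 6}


def helix_repeat(subunits: int, turns: int) -> tuple[int, int]:
    """Reduce `subunits` in `turns` to lowest terms: (p subunits, q turns)."""
    ratio = Fraction(subunits, turns)
    return ratio.numerator, ratio.denominator


def twist_per_subunit(subunits: int, turns: int) -> float:
    """Rotation in degrees between consecutive subunits along the helix."""
    return 360.0 * turns / subunits


def fits_crystal_lattice(subunits: int, turns: int) -> bool:
    """True if the subunit-to-subunit rotation has a crystal-compatible order."""
    p, _ = helix_repeat(subunits, turns)
    return p in CRYSTAL_COMPATIBLE_ORDERS


print(f"actin twist: {twist_per_subunit(13, 6):.1f} degrees per subunit")
print("actin (13 in 6) fits a lattice:", fits_crystal_lattice(13, 6))
print("6 per turn fits a lattice:", fits_crystal_lattice(6, 1))
```

For actin the twist is about 166 degrees per subunit, a rotation of order 13, so no crystal lattice can accommodate the filament's own symmetry. A hypothetical helix with exactly six subunits per turn would pass the test.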
There are also a few types of symmetry that are largely forbidden in biological molecules. These involve mirrors. In our familiar world, we have right and left shoes and right-turn and left-turn arrows, but mirror-image pairs like these are not typically found in proteins and nucleic acids. Amino acids and nucleotides have a specific handedness, which is used to build nearly all proteins and nucleic acids. Of course, there are exceptions: a few odd bacteria and fungi build small antibiotics with amino acids of the opposite hand (Fig. 10.3). These are particularly useful because the cell's defenses have evolved to digest chains of normal amino acids, so these flipped ones are more resistant.
X-ray crystallography has shaped the way we see proteins, and only now are we discovering that they are built with a much richer palette than what we had previously seen. Many of the earliest structures of proteins were these types of small, stable, globular proteins, but problems cropped up quite quickly. For instance, when researchers wanted to explore the structure of antibodies, the molecules proved very difficult to crystallize. This is because antibodies are composed of several functional domains, connected by flexible tethers. To get an atomic view of antibodies, scientists simply chopped them up and solved structures of the stable pieces. More recently, a few lucky researchers have managed to coax entire antibodies into a crystal lattice, trapping them in one frozen conformation (Fig. 10.4).
Fig. 10.3 Handedness of amino acids. Cyclosporin, a cyclic peptide made by fungi, contains a pair of alanines with opposite handedness (PDB entry 1cya)
These types of flexible tethers are very common in proteins that
need to adapt to different scenarios. Antibodies have two or more
flexible arms that bend and flex to adapt to the location of antigens
on their targets. The huge titin protein is composed of hundreds of
small stable domains connected by flexible hinges, acting like a
giant rubber band that stabilizes muscle contraction (see Fig. 8.2).
These flexible segments are built of a characteristic complement of
amino acids: they have lots of proline and glycine, which form kinks
in the chain that resist folding into stable globules, and lots of amino
acids that interact strongly with water.
Fig. 10.4 Antibody linkers. Flexible linkers connect the different functional domains of antibodies. These linkers contain many proline amino acids (green) that kink the chain and keep it from adopting a folded structure. The linker also includes several cysteine amino acids (with sulfur shown here in yellow), which form cross-links that connect the antibody chains (PDB entry 1igt)
The structures of icosahedral viruses also revealed a need for a different type of flexibility. These viruses surround their genome with a symmetrical protein coat, which delivers the genome to the cells they infect. Small viruses with tiny genomes can use a perfectly symmetrical shell with icosahedral symmetry, but other viruses need more space. It's not really practical, however, to make a bigger subunit or to use several different subunits, since viruses have only a limited amount of space in their genome to encode proteins. Instead, they employ quasisymmetry.
In the classic conception of quasisymmetry, proposed by Caspar and Klug in 1962, many copies of a viral protein form a huge spherical coat, with each subunit in an almost, but not quite perfectly, symmetrical arrangement with its neighbors. Small deformations in the protein make it all possible. This has been observed in numerous structures of different viruses, allowing the construction of a wide range of capsids with different sizes. The symmetry of these capsids is defined by a triangulation number, which specifies the number of unique subunit conformations that are needed in the tiling. A perfectly icosahedral capsid is T=1, a larger T=3 capsid has three unique conformations, and so on (Fig. 10.5). Each of these positions has similar, but not perfectly identical, interactions with its neighbors (Fig. 10.6).
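The allowed capsid sizes follow from Caspar and Klug's geometry. This sketch assumes the standard Caspar-Klug rule, which the text above does not spell out. Triangulation numbers take the form T = h² + hk + k² for non-negative integers h and k, not both zero, and a capsid built this way contains 60T subunits. The function names are illustrative.

```python
def triangulation_numbers(limit: int) -> list[int]:
    """All Caspar-Klug triangulation numbers T = h^2 + hk + k^2 up to `limit`."""
    values = set()
    for h in range(limit + 1):
        for k in range(limit + 1):
            t = h * h + h * k + k * k
            if 0 < t <= limit:
                values.add(t)
    return sorted(values)


def subunit_count(t: int) -> int:
    """Number of subunits in a quasisymmetric capsid with triangulation number t."""
    return 60 * t


for t in triangulation_numbers(7):
    print(f"T={t}: {subunit_count(t)} subunits")
```

For T up to 7 this lists 60, 180, 240, and 420 subunits. These match the satellite tobacco necrosis virus, tobacco bushy stunt virus, Nudaurelia capensis omega virus, and HK97 capsids in Fig. 10.5. Values such as T=2, 5, and 6 never appear in the classic scheme.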
Evolution blindly explores every possibility, and as more and more structures of viral capsids have been studied, numerous exceptions to classic quasisymmetry have been found (Fig. 10.7). For instance, bluetongue virus is built of 120 subunits, which occupy two very different environments. The whole thing is reminiscent of an Escher tiling, with a shape that is just right to lock together and completely cover the sphere. Simian virus 40 has an overall tiling similar to a classic T=7 quasisymmetry. However, the whole structure is built of pentamers, which normally can occupy only the fivefold symmetric vertices of the icosahedron. The trick is that each subunit has long, flexible arms that reach over and interact with whichever neighbor happens to be closest.
Fig. 10.5 Viral quasisymmetry. Quasisymmetry is used to construct viral capsids of different sizes. Satellite tobacco necrosis virus is composed of 60 subunits in perfect T=1 icosahedral symmetry. Tobacco bushy stunt virus is composed of 180 subunits in T=3 quasisymmetry: 60 form the fivefold vertices (colored red), and the remaining 120 form rings of six centered on the threefold axes (colored orange and yellow). Similarly, the Nudaurelia capensis omega virus has 240 subunits in T=4 quasisymmetry, and bacteriophage HK97 has 420 subunits in T=7 quasisymmetry (PDB entries 2buk, 2tbv, 1ohf, 1ohg)
Fig. 10.6 Protein deformations in quasisymmetry. Quasisymmetry requires small deformations in the subunits to accommodate the slightly different neighborhoods of the different positions in the capsid. Tobacco bushy stunt virus achieves this by building a subunit composed of two domains. The central domain here associates to form a dimer, and then the other domain flexes slightly to form the different interactions with neighbors (PDB entry 2tbv)
Fig. 10.7 Exceptions to quasisymmetry. Many exceptions to the classic concept of quasisymmetry have been discovered. Bluetongue virus (left) is composed of 120 subunits, which occupy two quite different positions (red and orange). Simian virus 40 (right) is similar to the T=7 capsid of HK97 but has a pentamer of subunits at the positions normally occupied by six subunits in a classic T=7 quasisymmetry (PDB entries 2btv, 1sva)
More recently, it has become apparent that many proteins do away with any folded structure at all. For instance, unstructured regions are widely used in cellular signaling. Many signaling proteins have an unstructured tail that is recognized by a form-fitting groove in a target protein. This has several great advantages: a single tail can adapt itself to many similar grooves in different proteins (perhaps with different affinities), and the interaction can be highly specific but rather weak, allowing the chain to interact with its target but quickly separate when its job is finished.
The CBP protein is a perfect example of how all of these features come into play (Fig. 10.8). It acts as an integrator of signaling information, interacting with many molecules and deciding if transcription will begin. It is composed of one long chain. Several regions in this chain fold into defined structures, and they are separated by stretches that are flexible and unstructured. The folded domains themselves interact with the unstructured portions of other molecules.
Fig. 10.8 CBP protein. The modular CBP protein has been studied by crystallography and NMR spectroscopy by cutting it into pieces and including only small pieces of the interacting proteins (green) (PDB entries 1l8c, 1kdx, 1jsp, 3biy, 2ka6, 1kbh)
As you can imagine, these unstructured proteins are quite difficult to study, since they never sit still long enough to get a good look. One common approach is to study only the part that binds to its target, so we're essentially looking at the structured state of an unstructured protein (Fig. 10.9). NMR spectroscopy can also give information on the range of conformations that are accessible to an unstructured chain, giving us some idea of their mobility.
Structural scientists are still trying to extend their reach into these many different areas of protein structure, developing methods to characterize structural states that are not amenable to x-ray crystallography. One of the challenging current topics is the study of amyloids. These are examples of what happens when protein sequences are taken out of context. Take, for instance, the amyloid precursor protein, which plays an important role in Alzheimer's disease (Fig. 10.10). Normally it is a stably folded membrane protein. But if it is clipped, one of the resulting peptides aggregates to form rigid fibers that clog up the function of neurons.
Fig. 10.9 Proteins with unstructured tails. NMR was used to study two initiation factor proteins with unstructured tails. When they form a complex, a portion of the tail (highlighted in turquoise) of eIF4E binds in a groove in eIF4G (red), adopting a defined structure (PDB entries 1ap8, 1rf8)
These fibrils have been very difficult to study at the atomic level, because they often have many similar but distinct forms, and they lack the order or periodicity needed to form crystals. Scientists have used a variety of methods to probe their structure, combining NMR studies, which can provide information on the local conformation of the chain and on portions that are close to one another, with electron microscopy, which gives an understanding of the overall form of the fibril and the way that the protein chains are stacked within it. By using this type of integrated approach, throwing every technique we have at a difficult problem, we're able to expand our conception of what proteins are, and how they balance order and chaos as they go about their jobs.
Fig. 10.10 Amyloid fibers. Amyloid-beta precursor protein, shown on the left, is normally found in the membrane of nerve cells. The cell's processing proteins cut it into different pieces, creating a small peptide (shown in green) in some cases. This peptide can refold to form long amyloid fibers, as shown on the right, that contribute to the nerve problems in Alzheimer's disease (PDB entries 1mwp, 1owt, 1rw6, 1iyt, 2m4j)
11 Molecular Electronics
Cells have evolved countless systems to keep their houses in order: systems for plumbing, systems for heating, and systems for recycling. So perhaps we shouldn't be surprised that they are also master electricians. In our familiar world, our houses are wired with copper, and the flow of electricity is controlled with switches and electrical components. Huge numbers of electrons hop from atom to atom through these wires, powering our lights and appliances. Cells, however, take a more personal approach to electricity. Unlike electronics in our familiar world, cells manage their electricity one electron at a time.
Electrons are slippery beasts, and cells need special tools to manage them. They typically use two approaches (Fig. 11.1). The first is to use a metal ion to capture the electron. Looking to the structures of proteins, we see many variations on this theme. By employing different ions, in different states, cells tailor the affinity of the ion for electrons to the particular task. Iron ions are very common. In some cases, the ion is immobilized in the center of a large heme group. In other cases, several iron ions are held in a small cluster with sulfur atoms. Other ions are used in special cases. For instance, a copper ion is used in the photosynthetic protein plastocyanin, giving it a beautiful bluish color.
Alternatively, some proteins use small organic molecules to
carry electrons. These molecules typically have large ring systems
that can adopt different charged states, capturing electrons and
releasing them, often capturing and releasing protons at the same
time. In our bodies, these carrier molecules are often created from
vitamins like niacin and riboflavin, since our cells don't have the
ability to construct them from scratch.
In some cases, cells need to move electrons over large distances, so they employ small carrier proteins. These proteins shelter their electron-carrying metal ion or cofactor and shuttle it from location to location. In other cases, electron-carrying cofactors are arranged in chains within large proteins, forming a nanowire that transmits electrons from one site to another. Remarkably, the electrons move along these chains, and from the carrier proteins to the chain, by quantum mechanical tunneling. The position of each electron is fuzzy: most of the time it's near the atomic nucleus, but there is a small chance that it will be found at a distance from the nucleus, a chance that becomes less and less probable over longer distances. Looking for clues in the structures of proteins with electron transport chains, we find that a distance of about 14 ångströms (1.4 nanometers) is the maximum distance at which this tunneling occurs at functional rates.
Fig. 11.1 Electron carriers. Soluble electron transport proteins use many tools to transport electrons. Cytochrome c uses an iron ion held in heme, and ferredoxin uses a cluster of iron and sulfur. Plastocyanin has a copper ion, and flavodoxin uses a flavin molecule (PDB entries 3cyt, 1a70, 5pcy, 1ag9)
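The steep falloff of tunneling with distance can be sketched numerically. This simplified illustration is not from the original text. It keeps only the distance dependence of the tunneling rate and assumes an exponential decay constant of about 1.4 per ångström, in line with commonly cited empirical estimates for tunneling through protein. Distances are measured from an assumed contact distance of 3.6 Å. Real rates also depend on the reaction's driving force and reorganization energy, which are ignored here.

```python
import math

# Assumed parameters (illustrative, not taken from the text above).
BETA_PER_ANGSTROM = 1.4   # decay constant for tunneling through protein, 1/Å
CONTACT_ANGSTROM = 3.6    # edge-to-edge distance treated as direct contact, Å


def relative_tunneling_rate(distance: float) -> float:
    """Tunneling rate at `distance` (Å) relative to cofactors in direct
    contact, counting only the exponential distance term."""
    return math.exp(-BETA_PER_ANGSTROM * (distance - CONTACT_ANGSTROM))


def tenfold_distance() -> float:
    """Extra distance (Å) that slows tunneling tenfold under this model."""
    return math.log(10) / BETA_PER_ANGSTROM


for d in (3.6, 7.0, 10.0, 14.0, 20.0):
    print(f"{d:5.1f} Å  relative rate {relative_tunneling_rate(d):.1e}")
print(f"tenfold slower every {tenfold_distance():.2f} Å")
```

Under these assumptions the rate drops roughly tenfold for every 1.6 Å. By 14 Å it is more than a million times slower than at contact, which illustrates why cofactors in electron transport chains are spaced closely.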
Both of these types of electron transport are exemplified in a
structure of cytochrome c with one of its metabolic partners, cytochrome bc1 (Fig. 11.2). Cytochrome bc1 is a large membrane-bound protein that uses a flow of electrons to pump protons across
the membrane. It has a string of iron atoms, held in heme and iron-
sulfur clusters, that electrons flow through to power the pump.
When the electron reaches the end, it is delivered to its final desti-
nation by cytochrome c, a small carrier protein with a heme group
at its center. The complex shows that the cytochrome c docking site
positions its heme group right next to one of the hemes in cyto-
chrome bc1, allowing an electron to tunnel across the gap.
The electricity that powers our homes is largely (at least for now) obtained by burning fossil fuels, with the heat used to power generators. Cells take a much more delicate approach to obtain their electrical energy. Early in the evolution of life, reactive molecules were common in the environment, and the earliest cells tapped them for energy. We can look to exotic bacteria living on the Earth today to get a feeling for what it must have been like then. For instance, some of these bacteria use hydrogen gas as their raw material, splitting it and extracting the electrons. Their electrical tools can be quite exotic. The one shown in Fig. 11.3 uses a pair of iron ions surrounded by cyanide and carbon monoxide and a compound with two sulfur atoms. Together, they grip the tiny hydrogen molecule and split it into two unequal halves: a hydride ion (a proton and two electrons) and a proton. Then, the two electrons are stripped off the hydride ion and transferred down a string of cofactors to an electron carrier.
Fig. 11.2 Cytochrome bc1 and cytochrome c. This complex includes the small soluble protein cytochrome c (red backbone) and the large membrane-bound protein cytochrome bc1 (blue backbone). The complex brings the heme groups of the proteins in close proximity, allowing an electron to tunnel from one to the other (PDB entry 1kyo)
Fig. 11.3 Hydrogenase. This atomic structure was determined by a combination of NMR spectroscopy and structural modeling and captures the transfer of electrons from hydrogenase (blue backbone) to a cytochrome (red backbone). The hydrogen-splitting site has an unusual cofactor (in atomic spheres at the left) with two iron ions, cyanide and carbon monoxide, and a small sulfur-containing molecule. The electrons that are released from the reaction jump through three iron-sulfur clusters (at center) and end up in the heme group of the cytochrome (at right) (PDB entry 1e08)
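The electron bookkeeping of hydrogen splitting can be written out explicitly. The scheme below restates the steps described above as balanced equations. It summarizes the overall chemistry rather than the detailed mechanism at the iron site.

```latex
\begin{aligned}
\mathrm{H_2} &\longrightarrow \mathrm{H^-} + \mathrm{H^+}
  && \text{unequal split at the iron site} \\
\mathrm{H^-} &\longrightarrow \mathrm{H^+} + 2\,e^-
  && \text{electrons passed down the cofactor chain} \\
\mathrm{H_2} &\longrightarrow 2\,\mathrm{H^+} + 2\,e^-
  && \text{net reaction}
\end{aligned}
```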
Today, we obtain our electrons from food molecules, ultimately shuttling them to the oxygen we breathe and using them to power our cells in the process. Food molecules are broken down and the electrons are transferred to the small carrier molecule NAD+. This carrier shuttles them to the respiratory electron transport chain, where they are used to charge an electrochemical battery that builds ATP. Atomic structures have revealed the complex circuitry of carriers and nanowires that carry this life-powering current (Fig. 11.4). Recent research has revealed that the respiratory chain forms a large complex, bringing three large electron-transferring protein complexes together for maximal efficiency. A recent electron microscopy structure of the complex from cow mitochondria shows a very compact supercomplex. This may be particularly important to reduce leakage of electrons. Occasional electrons escape from the chain, hopping directly onto an oxygen molecule to form toxic superoxide radicals that can wreak havoc in the cell by damaging proteins and DNA. Fortunately, we have another set of metalloproteins, superoxide dismutases, that detoxify these dangerous molecules.
Fig. 11.4 Respiratory supercomplex. Electron microscopy has been used to study a supercomplex of the three large protein complexes involved in respiratory electron transport. As electrons flow from cofactor to cofactor, they power the pumping of protons across the membrane, charging an electrochemical battery (PDB entry 2ybb)
Electrons are also needed for many chemical tasks. These are cases where the properties of a molecule need to be changed by adding electrons or removing them (Fig. 11.5). For example, superoxide dismutase detoxifies superoxide, using a copper ion and a zinc ion to extract an electron from one superoxide molecule and add it to another, creating oxygen and the less dangerous molecule hydrogen peroxide. Xanthine oxidoreductase takes a more typical approach. It binds an electron-carrier cofactor and uses a chain of cofactors to move electrons between this cofactor and the active site, where purine bases are made more soluble so they can be easily removed from the body.
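The superoxide dismutase reaction described above can be written as two half-reactions. This is the textbook scheme for the copper and zinc enzyme, given here as background rather than taken from the text. The copper ion alternates between Cu²⁺ and Cu⁺, while the zinc ion plays a structural role. One superoxide is oxidized to oxygen and the other is reduced to hydrogen peroxide.

```latex
\begin{aligned}
\mathrm{Cu^{2+}} + \mathrm{O_2^{\bullet-}}
  &\longrightarrow \mathrm{Cu^{+}} + \mathrm{O_2} \\
\mathrm{Cu^{+}} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+}
  &\longrightarrow \mathrm{Cu^{2+}} + \mathrm{H_2O_2} \\
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+}
  &\longrightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\end{aligned}
```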
This continual flow of electrons powers the entire biosphere. It
is needed for everything we do: to fuel biochemical processes, to
power motion, and to assist in metabolic reactions that construct,
assemble, and recycle our molecular building blocks. The ultimate
source of nearly all of this electrical energy is solar, captured by
plants and feeding the rest of the living world.
Fig. 11.5 Metalloenzymes. Superoxide dismutase (top) uses a copper ion and a zinc ion to extract electrons from a destructive superoxide radical, and xanthine oxidoreductase (bottom) shuttles electrons from a flavin cofactor (in spheres near the top) through two iron-sulfur clusters to a complicated molybdenum cofactor (in spheres near the bottom), where it performs a reaction that converts purine bases into soluble waste products that can be excreted (PDB entries 2sod, 1fo4)
12 Green Energy
Plants are the very definition of green energy. They are powered by sunlight and grow using a few common resources in the environment. They are infinitely renewable, returning everything to the environment when they die. And they do all this using molecules that color our world in beautiful shades of green and red and yellow. Structural biologists are looking to plants for hints about how they live so gracefully, with the hope that we can somehow incorporate these principles into our own management of energy resources.
At the center of the green energy of plants is a green molecule:
chlorophyll. It is a small organic molecule with a magnesium ion at
its center, which has the special property that it absorbs light and
uses it to energize an electron. These energetic electrons can then be
passed down a chain of electron carriers, which are wired to power
energy-requiring tasks, in particular, to charge up an electrochemi-
cal battery. Structures of these molecules have revealed a multitude
of amazing aspects to the process.
All the action occurs inside huge protein complexes, called pho-
tosystems, that hold the chlorophyll and other molecules in exactly
the right orientations. At the center is a special pair of chlorophyll
molecules, the ones that ultimately spit out the energetic electron
and are later restored by stripping a less-energetic electron out of
water. Surrounding this are a host of other brightly colored mole-
cules that absorb light and transfer the energy inward to this central
pair.
The most advanced methods for crystallography are currently
being used to watch this process in action. In these methods, tiny
crystals of the photosystem are subjected to a very powerful beam
from an x-ray laser, so powerful that it destroys each crystal in the
process. But before it does, x-rays are diffracted by the crystal and
measured, capturing one view of the crystal. This is repeated for
thousands of tiny crystals, each in a random orientation, gradually building up a full data set of
the diffraction pattern from different angles. One of the advantages
of the method is that it is very fast, capturing a defined moment. So,
researchers can illuminate the crystal and then determine a struc-
ture at defined times after the photon is absorbed.
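Why thousands of crystals? A toy Monte Carlo sketch can make the idea of "randomly building up a full data set" concrete. It assumes, purely for illustration, that all possible crystal orientations are divided into a hypothetical number of equal bins (`n_bins`) and that each destroyed crystal contributes one view in a random bin; real serial crystallography indexes each pattern and merges partial reflections, which this model ignores.

```python
import random


def crystals_needed(n_bins: int, seed: int = 0) -> int:
    """Count random snapshots until every orientation bin has been seen once.

    Toy coupon-collector model: each crystal is hit once, in a random
    orientation, and then destroyed by the x-ray laser pulse.
    """
    rng = random.Random(seed)
    seen = set()
    shots = 0
    while len(seen) < n_bins:
        seen.add(rng.randrange(n_bins))  # one random view per crystal
        shots += 1
    return shots


def expected_crystals(n_bins: int) -> float:
    """Analytical expectation for the same model: n * (1 + 1/2 + ... + 1/n)."""
    return n_bins * sum(1.0 / k for k in range(1, n_bins + 1))


if __name__ == "__main__":
    n = 500  # hypothetical number of orientation bins
    print(crystals_needed(n), round(expected_crystals(n)))
```

With a few hundred bins, the expected count is already several thousand crystals, because the last few unseen orientations take the longest to hit by chance.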
The results are quite subtle (Fig. 12.1). Most of the protein and
its associated cofactors stay in exactly the same places as light is
absorbed. But a strategically placed tyrosine amino acid changes
position slightly, shifting toward the chlorophyll molecules that
absorb the light. Spectroscopic studies of this protein have revealed
that this tyrosine loses its hydroxyl hydrogen, gaining a negative
charge. Although the hydrogens are not seen in the crystallographic
experiment, the motion is evidence of this change, as the negatively
charged form of the tyrosine moves toward the positively charged
chlorophyll and helps mediate the flow of electrons through the
chain of cofactors.
There are many other amazing aspects to the process. For
instance, many photosystems are surrounded by a field of antenna
proteins, themselves filled with light-absorbing molecules like
chlorophyll and carotene (Fig. 12.2). These all work diligently to absorb
photons and then transfer the energy from molecule to molecule
until it reaches the special pair at the center of the photosystem.
Fig. 12.1 Bacterial photosystem. Two structures of a bacterial photosystem were determined, before and after it
had absorbed a photon of light. The photosystem is shown on the left. Light is absorbed by a special pair of chlorophyll
molecules (green) at the center, and then electrons are transported down (shown with an arrow), ultimately reaching a
quinone (orange). The electrons are replenished from the top through a string of hemes (red). As shown on the right, the
two structures were quite similar, except for the motion of a key tyrosine amino acid, shown in blue. The change was
taken as evidence that the tyrosine loses its hydrogen atom in the light-activated state (the lower of the two in the
figure), gaining a negative charge and moving closer to the special pair of chlorophylls (PDB entries 2x5u, 2x5v)
The machinery for stripping electrons out of water has also been
revealed in atomic structures. These electrons are needed to replace
the ones that are sent down the electron transfer chain, producing
the oxygen that we all breathe in the process. The action occurs at a
complex cofactor composed of four manganese ions and a calcium
ion. The structures have revealed the arrangement of ions in the
cofactor, but researchers are still sorting out how it captures water
and produces oxygen (Fig. 12.3).
Looking inside plant cells, we find that they have vast arrays of
these photosystems, all surrounded by their fields of antennas. They
are arranged in disk-shaped compartments (termed grana), which
allow them to build up a gradient of protons as they perform their
light-driven pumping operation (Fig. 12.4). The energy of this
gradient is ultimately used to power the creation of sugar molecules,
which fuel the entire biosphere. The process of building sugar
involves many enzymes, but one plays a key role: ribulose bisphos-
phate carboxylase/oxygenase (RuBisCO).
RuBisCO (Fig. 12.5) is the enzyme that captures carbon dioxide
and fixes it into a molecule that can be used by the cell to build
sugar. Ironically, this enzyme is one of the least efficient enzymes in
cells. This is due in part to the similarity between carbon dioxide
and oxygen molecules. As reflected in the name of the enzyme, it
performs two competing reactions: a carboxylase reaction that fixes
carbon dioxide and an oxygenase reaction that creates a toxic side
Fig. 12.2 Antenna proteins. Photosynthetic reaction centers (shown in darkest green in the center of each complex)
are often surrounded by a core antenna complex (in medium green and pink) and peripheral antenna proteins (lightest
green and pink). Photosystem II (with the oxygen-evolving center in red and purple) is associated with the
light-harvesting protein LHCII and other proteins (not shown here). Photosystem I has several light-harvesting subunits
that associate with the main core to form the supercomplex shown here. The simple reaction center from a photosynthetic
bacterium (lower right) is surrounded by light-harvesting complex LH1 and associates loosely with LH2 (PDB entries
4ub6, 2bhw, 4y28, 1pyh, 2fkw)
Fig. 12.3 Oxygen-evolving center. The oxygen-evolving center of
photosystem II includes four manganese ions (purple) and a calcium ion
(green), all stitched together with oxygen and water (red). The oxygen atom
marked O is thought to be in the position where the reaction occurs, and the
position marked W may be a water molecule ready to be inserted into the
reaction (PDB entry 4ub6)
Fig. 12.4 Chloroplast. This cross section through a chloroplast shows the two-layered membrane at the top and the
stacked grana below. The photosynthetic electron transport chain is embedded in the membranes of the grana: (1)
photosystem II, (2) light-harvesting complex II, (3) plastoquinone, (4) cytochrome b6f, (5) plastocyanin, (6) photosystem
I, (7) ferredoxin, (8) ferredoxin reductase, and (9) ATP synthase. Many RuBisCO enzymes (10) are found in the soluble
space along with the machinery to synthesize and manage the chloroplast
Fig. 12.5 RuBisCO. RuBisCO is a huge enzyme complex composed of eight
copies of two different chains (shown in green and blue). A transition-state
analogue of the reaction is shown in the active sites in red (PDB entry 1rlc)
product. The plant cell then needs to clean up all these side prod-
ucts. It must all be worth the effort, however, because RuBisCO has
been estimated to be the most plentiful enzyme on the Earth.
Today, we think of plants as being the greenest of green energy,
but this was not always the case. In the early evolution of life, pho-
tosynthetic organisms were the major polluters on Earth, so much
so that they changed the basic characteristics of the environment.
The earliest organisms used other forms of energy to power their
process, grabbing readily available reactive molecules in the envi-
ronment. But obviously photosynthesis was a more successful evo-
lutionary innovation, and the oxygen released by these early
organisms filled the skies and gradually poisoned all of the
competitors.
Chapter 13 Peak Performance
Diet and exercise are a continual topic of discussion in my circle of
friends, and the discussion is a free mixture of science, pseudoscience,
and sheer wishful thinking. Again and again, when I look to
experts for advice, it all comes down to a logical balance of physical
exercise and the amount of food. We need to burn up the amount of
food that we eat or, if we're dieting, use up more food than we eat.
But the subject is tricky, of course, because everybody wants a
shortcut to health, and even the slightest scientific provenance can
support the newest craze.
Personally, I've gone on several diets. The most successful, in
terms of weight loss, was a low-sugar diet. The biochemical logic
behind this is a bit convoluted and seeks to change the way that
your body deals with food. In a typical high-carb diet, sugar is
stored as glycogen, and there's always plenty of it around to convert
back into glucose to feed hungry cells. In the low-carb diet, this
glycogen runs out, and fats need to be broken down instead.
Researchers are still arguing about it all: whether it works, whether
it's dangerous, or whether it makes any difference in the long run. I
had great results with it, but the cynical part of me still feels that the
reason it worked so well was that it was so difficult to find low-sugar
foods in our supermarket, so I ended up looking much more care-
fully at what I ate.
Structural biologists have looked at many of the enzymes
involved in the storage and release of glucose in glycogen. Glycogen
phosphorylase is a central player in this story. It is the enzyme that
releases glucose when needed. Since this is a critical role, it is highly
regulated, to make sure that it is only active at the appropriate times.
Structures of the enzyme reveal many interesting aspects to this
regulation (. Figs. 13.1 and 13.2). The enzyme is a dimer, with two
protein subunits, each with its own active site for clipping off glu-
cose from the glycogen chain. Each subunit also has a second site
that binds to glycogen; this tethers the enzyme to the glycogen
granule and has been termed the storage site.
If we flip the enzyme over, some of the regulatory machinery is
on the backside. This includes a serine amino acid that is phosphor-
ylated based on signals from hormones. For instance, adrenaline
triggers phosphorylation of the enzyme, turning it on and giving us
a burst of extra sugar to respond to whatever dangers we're
encountering. Insulin, on the other hand, leads to removal of the phos-
phate, turning the enzyme off and shifting to storage of glucose
instead. The enzyme is also regulated by the level of ATP in the cell.
When the cell needs energy, AMP is more plentiful, and it binds in
a cleft between the subunits and controls the flexing of the molecule
between active and inactive states.
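The two layers of control described above, hormone-driven phosphorylation and sensing of the cell's energy level, can be summarized as a simple decision rule. This is a deliberately crude sketch, not a kinetic model: the class, the field names, and the AMP-to-ATP threshold are all hypothetical, and real glycogen phosphorylase shifts gradually between states rather than flipping a switch.

```python
from dataclasses import dataclass


@dataclass
class PhosphorylaseState:
    serine_phosphorylated: bool  # set by hormone signals (adrenaline on, insulin off)
    amp_level: float             # arbitrary units
    atp_level: float             # arbitrary units, assumed positive


def is_active(state: PhosphorylaseState, amp_to_atp_threshold: float = 0.1) -> bool:
    """Toy rule for whether the enzyme releases glucose.

    Phosphorylation of the regulatory serine switches the enzyme on.
    Otherwise it is switched on only when AMP is plentiful relative to ATP,
    i.e., when the cell is running low on energy. The threshold is hypothetical.
    """
    if state.serine_phosphorylated:
        return True
    ratio = state.amp_level / max(state.atp_level, 1e-12)  # avoid division by zero
    return ratio >= amp_to_atp_threshold


# Adrenaline has triggered phosphorylation: active regardless of AMP.
print(is_active(PhosphorylaseState(True, 0.0, 1.0)))
# No phosphate, plenty of ATP: inactive.
print(is_active(PhosphorylaseState(False, 0.01, 1.0)))
```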
The biochemistry behind low-fat diets, on the other hand,
makes perfect sense to me. Gram for gram, fats provide more
energy than sugars: 9 calories per gram for fat and 4 for carbohy-
drates. This makes them the perfect energy-rich molecule to store
energy in cells (probably the reason they taste so good!). This also
means that we're going to have to exercise more if our diet is full of
fats. Structural biologists have explored many of the enzymes
involved in synthesis of fatty acids and their breakdown. Since
Fig. 13.1 Glycogen phosphorylase. Glycogen phosphorylase is a dimeric enzyme that includes an active site that
clips glucose from glycogen (with a nucleotide bound in the site in this structure, shown in red) and a storage site that
tethers the enzyme to the glycogen granule. Regulatory sites are seen on the backside of the enzyme, including a serine
that is phosphorylated (in green) and an allosteric site for binding nucleotides (in red) (PDB entry 6gpb)
fatty acids have long strings of carbon atoms, they are broken
down bit by bit by four enzymes, which release two-carbon units
and connect them to the carrier molecule coenzyme A. In our
mitochondria, three of these enzymes are associated into a multi-
enzyme complex that allows the fatty acid to transfer directly from
site to site during the reactions. The structure of a similar complex
from bacteria has been studied by crystallography, uncovering
some of the atomic details of how the fatty acids and other
necessary cofactors all bind to perform the progressive breakdown
(Fig. 13.3).
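The arithmetic behind these two paragraphs is simple enough to write down. The sketch below uses the rounded 9 and 4 Calories per gram quoted above, and counts how many turns of the four-step breakdown cycle a fatty acid needs. It assumes a saturated fatty acid with an even number of carbons (odd-chain and unsaturated fatty acids need extra steps not modeled here); the function names are illustrative only.

```python
# Rounded food-energy values quoted in the text (Calories, i.e., kcal, per gram).
KCAL_PER_GRAM = {"fat": 9, "carbohydrate": 4}


def food_energy_kcal(grams_fat: float, grams_carb: float) -> float:
    """Total food energy from fat and carbohydrate, using the rounded values."""
    return grams_fat * KCAL_PER_GRAM["fat"] + grams_carb * KCAL_PER_GRAM["carbohydrate"]


def beta_oxidation(n_carbons: int) -> tuple[int, int]:
    """Return (turns of the four-step cycle, two-carbon units released).

    Each turn clips off one two-carbon unit attached to coenzyme A; the
    final turn splits a four-carbon piece into two units, so a chain of
    n carbons needs n/2 - 1 turns and yields n/2 units.
    """
    if n_carbons < 4 or n_carbons % 2:
        raise ValueError("sketch handles even-chain fatty acids with >= 4 carbons")
    units = n_carbons // 2
    return units - 1, units


print(KCAL_PER_GRAM["fat"] / KCAL_PER_GRAM["carbohydrate"])  # fat stores 2.25x the energy per gram
print(beta_oxidation(16))  # palmitate, a common 16-carbon fatty acid
```

For palmitate this gives 7 turns of the cycle and 8 two-carbon units, which is why a long fatty acid is described as being broken down "bit by bit."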
Fig. 13.2 Glycogen phosphorylase regulation. Glycogen phosphorylase is an allosteric enzyme that shifts shape
between an inactive and an active form (PDB entries 8gpb, 1gpa)
Fig. 13.3 Fatty acid metabolism. This structure of a bacterial beta-oxidation multienzyme complex captures several
pieces of the story. The complex includes three different enzymes, with two copies of each. Two are found in the
subunits shown in blue, with NAD (pink) and a fatty-acid-like molecule (gray) bound in the active sites. The other
enzyme (green) performs the final step of attaching a piece of the fatty acid to the carrier molecule coenzyme A
(magenta) (PDB entry 1wdk)
Fig. 13.4 Supersweet proteins. Supersweet proteins like monellin, thaumatin, and brazzein, as well as sweeteners
like aspartame, bind to the sweet taste receptor, which is similar to the glutamate receptor shown in blue (PDB entries
3mon, 1thv, 2brz, 2e4z, 4or2)
Sometimes we need a bit of help keeping to our diets, and science
has come to the rescue. For instance, a number of sweet compounds
have been discovered that can replace some of the calories
in sugar. These molecules bind to the sweet receptor on our tongue
and trick us into thinking we've eaten some sugar. Normally, when this
receptor binds to sugar, it changes shape and sends a signal off
to the brain telling us we've eaten something enjoyable. An atomic
structure is not available for this receptor, but a similar receptor that
binds to glutamate has been studied. It is composed of two parts. It
has a classic GPCR that crosses the cell membrane, activating the
signaling machinery inside the cell (see Chap. 15 for more informa-
tion on GPCRs). On the outside, there is a two-domain portion that
closes around sugar (and sweeteners like aspartame) when it finds
it, passing the signal on to the GPCR portion. Since the whole
structure is so flexible, it has been studied in parts by
crystallography (Fig. 13.4).
Several supersweet proteins have also been discovered in nature,
which are thousands of times sweeter than common table sugar.
When scientists first started studying these proteins, they assumed
that the proteins would have a little extension, a sweet finger, that
would extend into the sugar-binding site of the receptor. They tried
clipping off pieces of the protein to see if they could find pieces that
worked just as well as the whole protein. Unfortunately, this was not
successful. So the current theory about how they work assumes that
Fig. 13.5 Whey proteins. Whey proteins are rich in essential amino acids (magenta), particularly the branched-chain
amino acids (red) (PDB entries 1beb, 1hfz, 3v03)
the whole protein binds in the cleft between the receptor domains,
acting like a wedge to create the sweet-tasting conformational
changes. Scientists are now using mutations to dissect the interaction,
trying to determine which portions are most important. Some
even sweeter versions of the proteins have been discovered, and
researchers have discovered ways to make them more attractive for
use in cooking. For instance, the two chains of monellin fall apart
when it is heated and the molecule loses its sweetness, but an engi-
neered single-chain version is much more stable.
I also went through a phase where I was a bit of a gym rat, trying
to build up some muscle. I got a personal trainer, who promptly
prescribed a course of supplements. These included a daily multivi-
tamin (which I still take), lots of protein in the form of shakes, and
a creatine supplement. The protein is easy to understand: my body
needs the building blocks to build new muscle. My trainer was
pushing whey proteins at the time. This is a collection of small, sol-
uble proteins from milk, left over when all the stuff needed to make
cheese is taken out. These have been found to be rich in essential
amino acids, the amino acids that our body can't make by itself.
Three of these get the most press in the context of the gym: the
branched chain amino acids leucine, isoleucine, and valine. Studies
have found that supplementing the diet with these amino acids can
stimulate muscle growth, so proteins that have more of them (like
whey proteins) are popular for protein supplements (Fig. 13.5).
The multivitamin makes perfect sense as well: these are needed
for many of the molecular machines that control metabolism.
Fig. 13.6 Vitamin B12. Vitamin B12 is collected by the intrinsic factor protein, which then binds to the cubam
receptor and is imported into our cells. It is used by two essential enzymes: one that is involved in the regeneration of
the amino acid methionine and the one shown here, which is involved in energy metabolism (PDB entries 3kq4, 2xiq)
A multivitamin ensures that all of these are in top shape. Structural
biology has allowed us to see many of these vitamins in action. The
B vitamins, in particular, are used to build many specialized
chemical tools that are needed by enzymes. The one shown in Fig. 13.6
is an unusual molecule, vitamin B12 or cobalamin, which has an
atom of cobalt at its center. Our cells can't make it on their own, but
bacteria in our gut do, so we have a set of machinery for gathering
it and delivering it into our cells. It is essential for the action of two
enzymes, involved in energy metabolism and regeneration of the
amino acid methionine. This might not seem critical, but if vitamin
B12 is missing from the diet (or we are unable to absorb it), it causes
many complications as molecules downstream in metabolism are
also impacted. Much the same is true for other vitamins, so it always
pays to make sure that we're getting enough in our diet or, if
necessary, in that multivitamin with breakfast.
The creatine prescribed by my trainer, however, is a bit more
problematic. Creatine is a small molecule that is obtained in the diet
or is constructed in cells from the amino acids arginine and glycine.
It forms a very unstable bond with phosphate groups and is used as
a way to shuttle energy around cells, in particular, around muscle
cells. ATP is made in large quantities in the mitochondria, and the
enzyme creatine kinase builds phosphocreatine using up the
ATP. This phosphocreatine then travels out to the working part of
the muscle, and a different form of creatine kinase performs the
opposite reaction, creating ATP to power muscle contraction. So
the logic is obvious: if we supplement our muscle cells with creatine,
Fig. 13.7 Creatine kinase. Creatine kinase in cytoplasm is a dimer of subunits, but the mitochondrial version of the
enzyme can also form huge octameric complexes (PDB entries 2crk, 1qk1)
we can potentially build up a larger storehouse of energy and
perform better in endurance sports that require large bursts of energy.
The science has been ambiguous, however: some studies see
improvements in strength and stamina, others don't. I did notice a
remarkably quick gain in muscle mass when I started taking it, but
that, I unfortunately later learned, was primarily due to retention of
water (Fig. 13.7).
For people with bigger goals in mind, such as professional
bodybuilders, there is abundant science available to help reach any goal
we'd like to obtain. For instance, the growth of muscle is highly
regulated in our bodies by a network of anabolic hormones, such as
testosterone, and these are easily tweaked by dosing with extra
amounts. Structural biology has revealed how these work: they bind
to a receptor in the cell nucleus, activating the genes that control
both androgenic properties related to male characteristics and
anabolic properties related to the synthesis of protein in muscle,
formation of blood cells, and the emotional and physical aspects of
sexual function.
The use of these types of steroid hormones is quite effective for
improving athletic performance but has been deemed unsportsmanlike
by most athletic organizations. This has led to a chemical
arms race as competitive athletes look for an edge, and sports
officials develop ways to test athletes to ensure a clean competition.
The molecule shown here (Fig. 13.8) is the designer steroid
tetrahydrogestrinone (THG), created by the nutritional supple-
ment company Bay Area Laboratory Cooperative (BALCO), which
Fig. 13.8 Steroid receptor. This model of the anabolic steroid receptor includes structures of the steroid-binding
domain, shown with the designer steroid THG, and the DNA-binding portion bound to a short piece of DNA. The
receptor also includes another large domain that has not been studied at the atomic level yet (PDB entries 2amb, 1r4i)
played a central part in a doping scandal in 2003 when this
previously undetectable steroid came to light.
So today, I take a simple approach to my own peak performance,
based in equal part on science and on personal preference. As I
write this, I'm currently on yet another diet, trying to shed a few
pounds gained over my holiday celebrations. I'm choosing not to
jump into a fad diet, as entertaining as they can be, and instead I'm
taking the simple approach: smaller portions and more exercise.
Science has helped me with some diet juices, to reduce my sugar
intake. I've cut out the nutritional supplements, but I still take a
multivitamin in spite of probably getting everything I need in my
meals. It may not be a miracle cure, but I'm certain it will get the job
done.
Chapter 14 Cellular Signaling Networks
I probably shouldn't admit this, but I always dread writing about
cellular signaling. It's a fascinating topic, but it's always fabulously
complex. The stories are never straightforward. I'd like to write
simple, understandable stories, like: Bob sent Mary an invitation to
a party, so she went. Cellular signaling stories, on the other hand,
end up more like: Bob sent Mary an invitation to a party, but Sam
cut off the bottom of it so she couldn't read the date, but Sally showed
Mary her invitation, but there was a power failure and Mary couldn't
read it, but Sean brought a flashlight, so she ended up going.
Let me show you what I mean. A few years ago, I had the oppor-
tunity to work with a team of students to develop a picture of signal-
ing during the development of blood vessels, as part of Tim
Herman's CREST project at the Milwaukee School of Engineering.
The growth factor VEGF (vascular endothelial growth factor) is
released by cells that aren't getting enough oxygen, and it promotes
the development of new blood vessels in the vicinity. This action is
essential for the development of the circulatory system in embryos
and for adding new blood vessels to compensate for injured or
blocked ones. VEGF also plays an important role in disease. For
instance, cancer cells often make a lot of VEGF to build blood ves-
sels in a growing tumor. Many processes are stimulated by VEGF,
including cell division, migration, and remodeling of the
connections between cells (Fig. 14.1).
The first step of VEGF signaling is fairly straightforward. VEGF
is released into the blood and circulates to its target cells and then
Fig. 14.1 Signaling pathway for VEGF. This diagram is taken from the KEGG Pathway Database (http://www.genome.jp/kegg/pathway.html),
a popular online database for signaling networks in cells. As you can see, the binding
of VEGF activates many interconnecting signaling pathways that lead to a variety of cellular changes. The picture is even
more complex than this, because all of the dotted lines on the right side of the diagram represent dozens of other
proteins that change or cause changes based on the signal
somehow has to relay its signal inside the cell. This task is accom-
plished by a specific receptor for VEGF, which is found in the mem-
brane of the target cell. The mechanism of signal transduction is a
matter of simple arithmetic: 1 + 1 = 2. Normally the receptor pro-
teins float around separately in the membrane and are in the off
state. When VEGF binds, it brings together two receptor molecules,
forming an active dimer that triggers a signal inside the cell.
This dimerization mechanism was revealed in crystallographic
structures of the receptor. The mechanism was first discovered in a
structure of the receptor for human growth hormone, and later
structures for the VEGF receptor showed a similar mechanism. These
receptors all have a similar modular structure. There is a large
domain on the outside of the cell that binds to the soluble factor,
connected to a short segment that crosses the membrane and, on the
inside, a domain that triggers the signal inside the cell. The whole
thing is rather flexible, so scientists often determine the structure in
parts (Fig. 14.2). If we allow ourselves a bit of latitude to mix and
match pieces from several similar forms of the receptor, we can build
up a rather complete picture of the whole thing.
The portion on the inside of the cell is a protein kinase domain.
Protein kinases are enzymes that add phosphate groups to protein
chains. When two VEGF receptors are brought together, these
kinases first modify each other, making them more active, and then
they start adding phosphates to other proteins in the signaling net-
work. This launches the signal inside the cell.
The VEGF receptor exemplifies several of the functional features
needed for an effective signaling protein. First, such proteins need to be
able to turn on and off quickly, so they can respond to the minute-
by-minute needs of the cell. The association and separation of the
two receptors provide these two states. At the same time, the signal
we need is a strong signal that's not too subtle, but it still needs to be
reversible. Phosphate groups are perfect for this. They carry a strong
negative charge, so they are easily recognizable by other proteins in
the signaling network. But at the same time, they are easy to add
and remove by employing specific kinases and phosphatases, so the
signal may be turned on and off quickly and efficiently.
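A back-of-the-envelope way to see why a kinase/phosphatase pair makes a good on/off switch is a simple first-order model, which is an assumption here and not something measured for the VEGF receptor: the kinase adds phosphates at rate `k_kinase`, the phosphatase removes them at rate `k_phosphatase`, and the protein settles where the two balance. When the kinase stops, the phosphates disappear exponentially. All rate values below are hypothetical.

```python
import math


def steady_state_fraction(k_kinase: float, k_phosphatase: float) -> float:
    """Fraction of a protein carrying a phosphate when first-order
    phosphorylation and dephosphorylation balance:
    dP/dt = k_kinase * (1 - P) - k_phosphatase * P = 0."""
    return k_kinase / (k_kinase + k_phosphatase)


def fraction_after_shutoff(p0: float, k_phosphatase: float, t: float) -> float:
    """Once the kinase stops, phosphates are removed exponentially:
    P(t) = p0 * exp(-k_phosphatase * t)."""
    return p0 * math.exp(-k_phosphatase * t)


def half_life(k_phosphatase: float) -> float:
    """Time for half of the remaining phosphates to be removed."""
    return math.log(2) / k_phosphatase


# Hypothetical rates, per second.
p_on = steady_state_fraction(0.9, 0.1)
print(p_on)                    # signal "on": most copies phosphorylated
print(half_life(0.1))          # how quickly the signal fades after shutoff
print(fraction_after_shutoff(p_on, 0.1, 30.0))
```

A faster phosphatase shortens the half-life, so the signal switches off more quickly, at the cost of a lower "on" level for the same kinase activity.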
Looking at the pathway diagram, we can see that many other
modes of signaling are also used. Calcium is released in some cases,
which binds to proteins and modulates their activity. The gas nitric
oxide is produced in other legs of the network. It diffuses to nearby
cells and causes its changes there. Like phosphorylation, these
small, mobile molecules are easily recognizable, sending a strong
signal that may be quickly quenched by gathering up or metaboliz-
ing the molecules when the signal is finished.
Activation by phosphorylation is essential in the next leg of
the signaling network, where the signal is amplified and delivered
to appropriate places within the cell. Activation of the VEGF
receptor has many effects in the cell. One is the stimulation of a
variety of genes involved in cell growth. A cascade of kinases, one
phosphorylating the next, amplifies the signal, ultimately delivering
it into the nucleus. A variety of helper proteins tune the process.
Atomic structures have revealed that many of these kinases have a
Fig. 14.2 VEGF receptor. Atomic structures of several portions of the
receptor have been determined, including the portion that binds to
VEGF (at the top, with VEGF in red), a small domain near the membrane, the
portion that crosses the membrane, and the kinase domains on the inside (PDB
entries 2x1x, 3kvq, 2m59, 4ase)
distinctive two-lobed structure, trapping the ATP that donates the
phosphate inside. Structures of the last kinase in this cascade,
ERK2, show that the phosphorylation remodels the active site,
making it over a thousand times more active in the reaction
(Fig. 14.3).
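The amplification in a kinase cascade is multiplicative: if each active kinase switches on several molecules of the next kinase, the gains multiply from tier to tier. The sketch below assumes, hypothetically, a fixed gain of 10 at each step of a Raf-1 to MEK to ERK2 relay; real cascades saturate, are tuned by helper proteins, and are opposed by phosphatases, none of which is modeled here.

```python
def cascade_output(n_initial: int, gains: list[int]) -> list[int]:
    """Number of active molecules at each tier of a kinase cascade.

    Assumes (hypothetically) that each active kinase switches on a fixed
    number of molecules in the next tier, so the gains multiply.
    """
    levels = [n_initial]
    for gain in gains:
        levels.append(levels[-1] * gain)
    return levels


# One active receptor-triggered kinase, three downstream tiers with hypothetical gains.
print(cascade_output(1, [10, 10, 10]))  # [1, 10, 100, 1000]
```

Three modest steps of tenfold amplification already turn one activated molecule into a thousand, which is why a handful of receptors at the membrane can produce a strong signal in the nucleus.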
Once the activated kinase gets into the nucleus, the message is
transferred to regulatory proteins that are expert in controlling the
genome. This is where things get really complex, since many different
genes need to be activated, and it's not always the same genes in
different types of cells. When I was developing the story with the
students, we decided to show C-fos and Jun. C-fos is phosphory-
lated by the signaling kinase, stimulating it to associate with Jun,
which together bind to specific sites in the genome. It binds as part
of an enhanceosome, which integrates signals from many different
transcription factors, together deciding if the gene will be expressed
or not.
Transcription factors come in many sizes and shapes. Atomic
structures have revealed that they typically are composed of two
Fig. 14.3 ERK2 and DUSP5. ERK2 is a kinase that adds phosphate groups
to transcription factors and is itself activated by phosphorylation. The two
structures on the left show how addition of phosphates tightens up the active
site. DUSP5 reverses the signal by removing phosphate groups. The image on
the right is built from two atomic structures: a crystallographic structure of the
catalytic domain and an NMR structure of a domain involved in substrate
recognition (note that this domain looks a bit different in the illustration, since
it includes all the hydrogen atoms). The catalytic domain includes sulfate
groups bound in the active site. An acidic amino acid (shown here in bright
magenta) performs the cleavage reaction (PDB entries 1erk, 2erk, 2g6z, 1hzm)
parts: one part receives the signal and responds to it, and the other
recognizes the appropriate sequence of DNA in the genome. Very
often, as with the cell surface receptors, scientists determine struc-
tures of these two portions separately. In the case of C-fos and Jun,
only the DNA-binding portions have been determined, and we have
to infer the rest from the sequences of the proteins. A related com-
plex is found as part of the enhanceosome for the interferon-beta
gene, as seen on the left-hand side of Fig. 14.4.
Finally, a specific phosphatase enzyme, DUSP5, removes the
phosphate from the last kinase in the signaling cascade, shutting the
whole process down. It's also a flexible protein and has been studied
in parts. A crystallographic structure revealed a dimeric structure
for the catalytic domain, which includes a small loop that hugs the
phosphate group and a perfectly positioned acid group that clips it
off. The protein also includes another domain involved in recognition
of the proteins that it dephosphorylates. A domain from a similar
protein is shown in Fig. 14.3, since the DUSP5 domain has
not been studied yet.
I worked with the CREST team to bring this information
together into a picture of the whole process. The students chased
down UniProt entries and atomic structures, along with electron
micrographs of vascular cells, filling out all the structural details.
Fig. 14.4 Enhanceosome. This illustration of an enhanceosome is cobbled
together from several different structures, each determining a portion of the
whole assembly. Missing pieces are shown schematically with circles (PDB
entries 1t2k, 2pi0, 2o6g, 2o61, 1qwt)
We decided to show two processes stimulated by VEGF: remodeling
of cellular adhesion sites and the actin cytoskeleton, and expression
of genes. Fig. 14.5 includes two details from the painting. In
spite of the complexity of this image, it only captures a few small
aspects of VEGF signaling.
There is nothing quite like the study of signaling networks to
convince us of the random process of evolution and its effectiveness
as a design process. When looking at enzymes like lysozyme or
trypsin, I see the cell as a finely honed machine, with every protein
optimized over millennia for perfect function. When looking at cell
signaling networks, however, I see the cell as an old jury-rigged
automobile, barely held together with baling wire and duct tape. I
can just imagine generation after generation of changes, randomly
adding a new kinase here or a backup phosphatase there, slowly
tuning the flow of information. As a result, our cells are filled with
thousands of receptors, kinases, phosphatases, and transcription
factors and a host of other signaling proteins that together decide
how to respond to our world minute by minute and year after year.
Fig. 14.5 VEGF in action. VEGF (1) signaling starts at the cell membrane, where it brings together two VEGF receptors (2), activating the kinase domains inside the cell and launching several signaling pathways. C-src (3) is phosphorylated, causing it to open up and phosphorylate cadherins (4) in adherens junctions between cells. This releases alpha-catenin (5), which dimerizes and bundles actin filaments (6). In another pathway, receptor dimerization launches a cascade of activation reactions through PLC-gamma (7), PKC (8), Raf-1 (9), MEK (10), and ultimately ERK2 (11). Activated ERK2 (11) is transported into the nucleus, where it phosphorylates C-fos (12), causing it to form a heterodimer with Jun (13). This then acts as part of an enhancer to promote transcription of genes needed for blood vessel growth, binding to the transcription mediator (14) and ultimately starting transcription by RNA polymerase (15). Finally, DUSP5 (16) terminates the process by removing the phosphates from ERK2
Chapter 15 GPCRs Revealed
In the past decade or so, there has been a quiet revolution in the field of cell signaling. An elusive target finally yielded its secrets: in 2007, the first atomic structure of a hormone-sensing GPCR was determined. GPCRs (G-protein-coupled receptors) sit in the membranes of cells throughout our bodies and pass messages inside to G proteins. They've been a particularly hard nut to crack because they are smallish proteins with most of their bulk buried in the membrane. The small bits that extend on either side of the membrane don't provide much leverage for forming crystals, so they eluded crystallography for many years.
The first glimpses of GPCRs were obtained using a similar protein, bacteriorhodopsin, made by bacteria that live in high-temperature brine pools. Bacteriorhodopsin forms beautiful arrays in the membrane of these bacteria, making it a perfect candidate for study by electron diffraction. Richard Henderson and Nigel Unwin worked for years to improve, step by step, the structure of bacteriorhodopsin. At first, they could only see the major feature: a bundle of parallel alpha helices that cross the membrane. With more work, they finally revealed the atomic details, including the loops that connect the helices, the chromophore bound inside that captures light, and even the amino acids involved in pumping protons across the membrane, powered by light energy (Fig. 15.1).
The big advance that opened the door to GPCR atomic structure came from protein engineering. The trick was to create a chimeric protein that replaces one of the GPCR loops with a small, stable protein. This acts like a handle, helping to coax the slippery membrane-crossing portion of the molecule into crystals. Crystallographers often use antibodies in the same way. Antibodies stick to the target protein and help link the proteins together into crystals. Indeed, parallel structures of the GPCR that recognizes adrenaline were solved in both of these ways: as a chimera with lysozyme (Fig. 15.2) and as a complex with an antibody.
Many amazing structures followed these breakthrough
structures, building on the method. For instance, many additional
structures of the adrenergic receptor have revealed the structural
basis of signaling. One of the big mysteries has been the way that GPCRs pass their signal from the outside to the inside of the membrane. By comparing an inactive conformation, frozen in place by an inhibitor that blocks signaling, with an active form bound to a G protein, we see that a few of the helices shift and bend, propagating a signal through the protein and across the membrane. One helix in particular bends at its center, forming a convenient pocket for the binding of the G protein (Fig. 15.3).
Additional structures have revealed a diverse collection of GPCRs, receiving all manner of signals and passing them into the cell (Fig. 15.4). Opsin receives the tiniest of signals, a single photon, ultimately launching a cascade of signals that tells the brain that it has seen light. The serotonin receptor is one of the cogs in the process of thought: it recognizes signals from the neurotransmitter serotonin and passes the message into a nerve cell. CXCR4 mediates signals passed between cells in the immune system and has the unfortunate distinction of being one of the proteins recognized by infecting HIV virions. Glucagon receptor helps monitor the level of glucose in the blood and tells cells to take appropriate action.

Fig. 15.1 The structure of bacteriorhodopsin was solved after many years of work, studying two-dimensional crystals of the protein using electron diffraction. The crystal lattice is shown at the top, with the proteins in purple and the surrounding lipids in the membrane in dark red. At the bottom, a cartoon shows the characteristic bundle of seven alpha helices, with the light-capturing chromophore in magenta (PDB entry 2brd)
Atomic structures of these GPCRs reveal that they are all built
around the same infrastructure of parallel alpha helices, but they
are customized for their particular signaling task. Opsin needs a
special chromophore to recognize its signal. Serotonin receptor has
a tiny binding site that perfectly fits the neurotransmitter. CXCR4
forms dimers that help to modulate its signaling. Glucagon receptor
has a large domain that encloses the hormone. Each new structure
brings a new surprise to extend the remarkable structural palette of
this class of molecules.
So what is all the fuss about GPCRs? Why is everyone so excited by these structures? It turns out that GPCRs are the targets of many important drugs, and atomic structures have revealed how these drugs work and opened new avenues for improving them. For instance, the most widely used psychoactive compound, caffeine, acts through a GPCR to stimulate cells and give us that special boost from our morning cup of coffee (Fig. 15.5). It blocks the adenosine receptor by mimicking the receptor's normal activator, adenosine, closely enough to occupy its binding site without switching the receptor on. Today, scientists are using these new structures of GPCRs to design new treatments. With more careful targeting of the adenosine receptor, we are finding ways to help people with Parkinson's disease. By blocking the histamine receptor, we can manage allergies. By targeting the opioid receptor, we can help manage pain. And the list goes on. These receptors are becoming ever more central to the way we manage our health.

Fig. 15.2 Adrenergic receptor. The adrenergic receptor was engineered to replace one small loop with an entire molecule of lysozyme, which has many charged amino acids (bright blue and red on the right) on its surface. The portion of the receptor that spans the membrane (shown with the gray bar) is coated with carbon-rich amino acids (shown with white spheres on the right) (PDB entry 2rh1)

Fig. 15.3 GPCR signaling. The motions involved in GPCR signaling were revealed in two structures of the adrenergic receptor: an inactive conformation (in blue at left) bound to an inhibitor (green) and an active conformation (in red at left) in the process of activating a G protein (shown on the right). The major change is a large swinging motion of one of the helices that propagates the message from the adrenaline-binding site to the G protein (PDB entries 3sn6, 2rh1)

Fig. 15.4 GPCRs. GPCRs come in all shapes and sizes. Opsin binds to retinal and senses light in our retinas. Serotonin receptor senses the levels of the neurotransmitter serotonin in the brain. CXCR4 senses the level of chemokines and acts as a dimer. Glucagon receptor has an extra domain that closes around the top of the small peptide hormone glucagon (PDB entries 1f88, 4iar, 3odu, 1gcn, 4ers, 4l6r)

Fig. 15.5 Adenosine receptor and caffeine. Caffeine blocks the adenosine receptor, a GPCR that plays a role in regulating the level of metabolism. These two structures capture the adenosine receptor doing its normal job (on the right) and after it is blocked by caffeine (PDB entries 2ydo, 3rfm)
Chapter 16 Signaling with Hormones
Failure to communicate is rarely a good thing. At a personal level, it can ruin a relationship; at a national level, it can lead to war and strife. The same is true in our own bodies. Diseases like diabetes are the direct result of a failure to communicate. Our bodies are very big,
and our cells are very small, so they need to talk to one another to
make sure that everyone is working towards the same goals. But
when this communication breaks down, it causes deadly problems.
Cells throughout the body communicate by passing hormones
to each other, which act like little molecular letters with a single
encoded instruction. These messages come in many shapes and
sizes. Some, such as human growth hormone and glucagon, are
small proteins. Others, such as the thyroid hormones, are small
organic molecules, occasionally with odd atoms like iodine. Some
are even smaller, such as the gas nitric oxide that is used for controlling the state of the circulatory system.
Insulin is probably the most familiar of these messages, due to
its connection with the current rise of diabetes in the western world.
When special cells in the pancreas sense that glucose levels are get-
ting high, maybe after a carb-heavy meal, they build insulin and
drop it into the bloodstream. It travels to cells throughout the body,
binding to dedicated receptors on cell surfaces. This triggers a
change in the cells, and they shift their duties to the uptake and storage of glucose in the form of glycogen (Fig. 16.1).
Atomic structures have revealed many of the details of this process, but there are still some mysteries. Insulin itself is a tiny protein, composed of two chains, termed the A-chain and B-chain. Crystal structures revealed that it forms a beautiful hexamer when zinc ions are around, a form that turns out to be important for storage of the molecule before release (Fig. 16.2). Many decades of structural research have also shown that the insulin fold is quite dynamic. In particular, one end of the B-chain adopts a range of different conformations that are important for signaling.
You might wonder how this tiny protein stays folded up into the
proper shape. Most proteins are considerably larger and have more
amino acids that can help to stabilize the whole protein. Many small
proteins, such as insulin, are stabilized by the addition of disulfide
bridges. These are bonds formed between two cysteine amino acids
in the chain, and they form extra connections that help glue the whole structure together. Insulin includes three of these linkages. Another puzzle is how it gets to this folded structure: it would be tricky for two tiny chains to come together and form exactly the right folded structure, with exactly the right disulfide linkages. This puzzle was solved when the sequence of the insulin gene was determined. Insulin is actually built as a larger protein, termed proinsulin, which folds up into the appropriate structure. Then, the extra bits are clipped off to create the active protein (Fig. 16.3).
The insulin receptor is a huge protein complex with many moving parts (Fig. 16.4). Since it is so flexible, structural biologists have chopped it into functional pieces and studied each one separately. The insulin-binding portion, which is displayed on the outer surface of the cell, is composed of two L-shaped domains that gather insulin. This propagates a signal to the smaller kinase domains on the inner side of the cell membrane, bringing them together so that they can activate each other. These kinases then launch the signal inside the cell by adding phosphate groups to signaling proteins.

Fig. 16.1 Insulin in action. This illustration shows two consequences of insulin signaling. Insulin (1) binds to its receptor (2) on the surface of the cell, activating a cascade of signaling molecules (3) inside the cell. These activate glycogen-building enzymes (4) and also stimulate the transport of vesicles with glucose transporters (5) to the cell surface. Together, they take glucose (white dots) into the cell from the blood and store it in large glycogen (6) molecules

Fig. 16.2 Insulin. Insulin forms a stable hexamer (left) when it is stored in the pancreas, which disassembles into active monomers (right) when it is delivered to the blood. Each insulin monomer is composed of two chains (colored blue and green) that are linked together with several disulfide linkages (yellow) (PDB entry 1trz)

Fig. 16.3 Proinsulin and insulin. Insulin is synthesized in cells as a longer protein, called proinsulin, which is then clipped to form the active protein (PDB entries 2kqp, 4ins)
The kinase domain in the insulin receptor is itself activated by
adding phosphoryl groups. Presumably when insulin binds, it brings
the two kinases closer, so they can activate each other. Two structures have captured this activation (Fig. 16.5). In the inactive conformation, a loop in the protein, which contains three key tyrosine amino acids, nestles in the active site of the protein, blocking it. When these tyrosines are phosphorylated, the whole loop pops free, exposing the active site and allowing it to add phosphoryl groups to tyrosines on other proteins. As seen in these structures, kinases are typically very mobile enzymes that open and close around their targets. They need to do this because the reaction needs to be sheltered from water. They close around a molecule of ATP and their target protein chain and then transfer a phosphate from the ATP to the protein. If there's too much water around, it could short-circuit the process, breaking the phosphate off the ATP before it can be transferred to the desired location.
As you might imagine, hormones are of great interest to the medical community. Slight glitches in the signaling network can cause severe health problems, and insulin is no exception. Diabetes arises, in part, when insulin signaling is corrupted, leading to chronically high levels of glucose in the blood. Over time, this leads to life-threatening complications. For instance, glucose and molecules that are made from it are mildly reactive, and they slowly attach themselves to sensitive amino acids in proteins, forming a weak connection. Over time, however, these attached sugars transform into strongly attached analogues that corrupt the function of the protein. The immune system is not happy about these problems and launches an inflammatory response. If this gets bad enough, it can lead to the terrible complications of diabetes. Doctors often look at a patient's hemoglobin to assess this damage, by analyzing the amount of sugar that has been attached to it. An atomic structure has captured one of the glycated hemoglobins (Fig. 16.6).

Fig. 16.4 Insulin receptor. These illustrations of the insulin receptor are constructed from several atomic structures of the individual parts. The inactive form is shown on the left, with the insulin-binding portion at the top and the kinase domains at the bottom. When insulin (red) binds, it is thought to bring the kinase domains together so that they can activate each other (PDB entries 3loh, 2mfr, 1irk, 3w14)

Fig. 16.5 Kinase domain of the insulin receptor. The kinase domains are activated when three tyrosines (green) are phosphorylated (yellow and red in the right image). This opens up the active site, allowing ATP (magenta) to bind. The active structure on the right also includes a small piece of a signaling protein (red), with a tyrosine ready to be phosphorylated by the ATP (PDB entries 1irk, 1ir3)

Fig. 16.6 Glycated hemoglobin. This structure of hemoglobin has a sugar attached to a lysine amino acid deep inside the protein (PDB entry 3b75)
Remarkably, researchers discovered a century ago that we may
reverse these problems by simply replacing the missing hormone.
Human insulin was very hard to come by at the time (it needed to
be purified from human cadavers), but they found that the very
similar insulin from livestock worked just as well. Today, human insulin is produced using engineered bacteria or yeast that build large quantities of recombinant insulin identical to the one we normally make. Using this insulin, diabetic patients can manage their
own blood glucose levels. But because insulin is a protein, it must be
injected, since it would be quickly digested if taken orally.
Researchers are currently taking insulin treatment to the next
step, building on the results from structural biology. A large quantity of insulin is released after each meal, telling the body to store the sugar that is teeming through the blood. But low levels of insulin are also secreted throughout the day and night, working with the
complementary hormone glucagon to ensure that there is always
just the right amount of sugar available to power cells throughout
the body. When insulin is injected, it has an immediate effect, but
this wears off within a few hours. Soon after the discovery of insulin
treatment, researchers began searching for ways to mimic the more
long-term, basal effects of insulin. This is where the atomic structures came to the rescue.
. Fig. 16.7 Designer insulins. Insulins with special properties have been engineered. Slow-acting insulin Degludec
(top) adds a hydrocarbon (pink) that links neighboring hexamers, and fast-acting insulin Aspart (bottom) changes one
amino acid to a charged aspartate (red) that destabilizes the hexamer (PDB entries 4ajx, 4gbc)
The trick to creating a long-lasting insulin is the same trick we use to make other long-lasting drugs: we create a form that slowly dissolves, releasing the active hormone slowly over time. The first approach was very traditional: researchers mixed insulin with the fish protein protamine and a little zinc, creating a complex composed of insulin hexamers in a gluey aggregate that falls apart slowly when injected. The atomic structures allowed researchers to find more sophisticated approaches that modify the insulin molecule itself. One successful approach is to connect long hydrocarbon chains to the insulin chains (Fig. 16.7). These chains reach out and interact with neighboring insulin hexamers, promoting the formation of aggregates that dissolve more slowly. They also interact with the fatty-acid-carrying protein albumin in the blood, further sequestering the insulin and slowing release. In this way, these designer insulins continue to have an effect over many hours.
Researchers have also developed designer insulins that act more quickly than regular insulin, for use immediately after meals. The trick with these is just the opposite of slow-acting insulins: we need to make the insulin hexamer fall apart quickly. One approach is to change one amino acid from a neutral amino acid to a charged aspartate. This weakens the interface between the insulin monomers several hundredfold, but doesn't affect the hormone's binding to the insulin receptor. A combination of these designer insulins, some fast acting and some slow acting, now allows patients to join in their body's dialogue about glucose levels, deciding when glucose needs to be stored and when it is needed for energy.
Chapter 17 Single-Molecule Chemistry: Enzyme Action and the Transition State
Life is control. The trick to staying alive is to control your environment. Living cells need to take the resources available around them
and change them into the molecules that they need. To do this, cells
build thousands of types of specialized proteins called enzymes.
Each one performs a specific chemical reaction needed for living.
Some break molecules into pieces; others connect molecules
together. Some change the shape of molecules, and others change
the chemical properties of molecules. Some capture energy in their
reactions; others require energy to perform a particularly difficult
reaction. All of these different enzymes work together to perform
the many chemical tasks needed in the cell.
The growth of medical science has made enzymes a familiar part of our own life. Many drugs are small molecules that bind to particular enzymes and block their action. For instance, if you take an aspirin, you are blocking the enzyme cyclooxygenase, an enzyme involved in the construction of molecules that deliver pain signals. Antibiotics like penicillin attack essential enzymes in pathogenic bacteria, blocking their action and crippling the bacteria (Fig. 17.1). These drugs are highly specific tools that we can use to target specific enzymes involved in health and disease.

Much of the research in structural biology has been focused on exploring the atomic secrets of enzymes. Enzymes are particularly amenable to study, since they are often stable, soluble proteins and may be coaxed with clever experiments to perform their atom-sized jobs while we are watching, so there is a vast amount of information available about how they are constructed and how they perform their reactions.

Fig. 17.1 Penicillin and bacterial enzymes. The enzyme shown here, D-alanyl-D-alanine peptidase, performs an essential reaction for the construction of the protective cell wall surrounding many types of bacteria. Penicillin, shown here in red, blocks the machinery of this enzyme, ultimately killing the cell (PDB entry 1pwc)
In 1965, D. C. Phillips solved the structure of lysozyme, giving us the first look at how enzymes work. The structure revealed a form-fitting active site, perfectly shaped to bind to its target, a bacterial carbohydrate chain. This structure confirmed the basic theory of enzyme action: enzymes stabilize the transition state of a chemical reaction. A chemical reaction typically begins with a stable substrate and ends up with a stable product. In the case of lysozyme, the substrate is a carbohydrate chain and the product is a cleaved chain. In the course of this reaction, however, the molecules must pass through less stable, high-energy arrangements, termed transition states. The enzyme's major job is to streamline the path through these transition states, guiding the reaction efficiently from substrates to products.
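The idea that an enzyme works by stabilizing the transition state can be put into numbers. What follows is a minimal sketch that assumes standard transition-state theory, a textbook relation this chapter does not itself spell out. Suppose the enzyme changes only the activation free energy and leaves everything else the same. Then a rate enhancement k_enzyme/k_uncatalyzed corresponds to an extra transition-state stabilization of ΔΔG = RT ln(k_enzyme/k_uncatalyzed). The function name and the room-temperature default are illustrative choices.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol * K)


def transition_state_stabilization(rate_enhancement: float,
                                   temperature_k: float = 298.15) -> float:
    """Return the extra transition-state stabilization, in kJ/mol,
    implied by a given enzyme rate enhancement.

    Transition-state theory gives k proportional to exp(-dG_act / RT),
    so k_enzyme / k_uncatalyzed = exp(ddG / RT), i.e.
    ddG = RT * ln(k_enzyme / k_uncatalyzed).
    """
    if rate_enhancement <= 0:
        raise ValueError("rate enhancement must be positive")
    return GAS_CONSTANT * temperature_k * math.log(rate_enhancement) / 1000.0


# A million-fold speedup at room temperature (about 25 degrees C)
print(f"{transition_state_stabilization(1e6):.1f} kJ/mol")  # prints 34.2 kJ/mol
```

At room temperature, a million-fold speedup (the size of enhancement this book later describes for carbonic anhydrase) corresponds to only about 34 kJ/mol, or roughly 8 kcal/mol, of extra stabilization. Because the relation is logarithmic, each additional tenfold gain in rate costs the same fixed amount of binding energy, about 5.7 kJ/mol at this temperature.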
The structure of lysozyme showed many ways that enzymes do this. A key concept is that enzymes make sure that everything is in the right place at the right time. For the lysozyme cleavage reaction, this includes several things. It has a form-fitting groove that grips the intended substrate, making sure that the enzyme only acts where it is supposed to act. But the groove isn't a comfortable fit. It grabs the two ends of the chain and torques the center, causing one of the sugar rings to distort into a less-stable conformation that is more amenable to the reaction (Fig. 17.2). Then, specific amino acids around the target bond deliver a water molecule and orchestrate the chemical steps of the cleavage reaction.
Fig. 17.2 Lysozyme mechanism. This structure captures lysozyme in the middle of its reaction. The structure includes a small fragment of the normal bacterial cell wall, with two sugars (in yellow and green) and a short peptide chain (extending off to the left). The sugar in the front (in green) is in the typical chair shape, which is quite stable. The sugar in the back (in yellow) is being attacked by an acidic group in the enzyme (in red) and has been distorted into an unstable shape that is needed to form the transition state of the reaction (PDB entry 148l)
Fig. 17.3 HGPRT active site. Three snapshots of HGPRT capture the enzyme with its starting substrates (left), after guanine has been added to the sugar to form GMP, releasing pyrophosphate (center), and a form with just the GMP product bound (right). Notice that a loop in the enzyme (green) has opened up in the final structure, but the release of GMP is still the slowest part of the whole process (PDB entries 1d6n, 1bzy, 1hmp)
One of the challenges of studying enzymes is that they perform their reactions so quickly that it's hard to observe their structures in action. Researchers have gotten around this problem by throwing a monkey wrench in the system, which halts the enzyme at one step and allows us to see what's happening. They construct molecules that are similar to the normal substrate of the enzyme but with a few atoms changed into particularly nonreactive analogues. The trick is to find molecules that are similar enough that the enzyme binds them and adopts the same shape as with the natural substrate, but different enough that they don't just get caught up in the reaction.
In this way, researchers have been able to observe the many steps in a chemical reaction. For instance, several analogue molecules were used to study the enzymatic reaction of HGPRT, an enzyme involved in the synthesis of nucleotides. It normally takes a phosphate-activated sugar molecule and attaches a guanine base to it, forming GMP, a building block for nucleic acids, and releasing pyrophosphate. Researchers got a look at how the two starting molecules are bound to the enzyme by using a nonreactive molecule similar to guanine, and a look at the final product by using a molecule with a nonreactive sugar (Fig. 17.3).
Molecules that mimic the transition state are useful for studying the action of enzymes, but they're even more useful as drugs. For instance, several HIV drugs mimic the transition state of an essential enzyme made by the virus: HIV protease. The reaction involves a water molecule that is activated by two acidic amino acids in the enzyme. The activated water attacks the protein chain at the bond to be cut, forming a transition state that has the water attached. This then breaks apart to finish the reaction. The drugs mimic this process by creating a molecule that looks like the protein chain with the water attached (Fig. 17.4). As these drugs have been optimized for better clinical properties, the protein portion has been modified and looks quite different from typical protein chains. But the attached analogue of water is found in all of these drugs, where it interacts with the two catalytic amino acids.

Fig. 17.4 HIV protease inhibitors. Three structures show some of the logic for design of drugs that block HIV protease. The enzyme is shown at the top, with a small peptide bound in the active site. Two acidic groups (turquoise) catalyze the cleavage reaction at the center of the peptide, activating a carbonyl group (red). Below this, three molecules are shown. At the top is the same peptide, with the carbonyl oxygen shown with a star. One of the early inhibitors was a symmetrical analogue of the transition state that has the oxygen changed to a noncleavable hydroxyl. Later developments led to effective drugs like saquinavir, which are smaller and bind much more tightly to the enzyme (PDB entries 1kj4, 9hvp, 1hxb)
This type of rational drug design is one of the most direct approaches to using structural information and taking control of our own molecules. The goal is to find a specific small molecule to block the action of a biological molecule. The approach has been successful in numerous systems, creating therapeutic drugs for everything from cancer to blood pressure. This is often a meandering process, making changes one by one to improve the drug. For instance, the antileukemia drug Gleevec was designed in several rational steps (Fig. 17.5). The target of the drug is a protein tyrosine kinase called ABL, which takes a mutated, overactive form in the leukemia cells. The drug design effort began with a small compound that binds to a related kinase, PKC. Addition of another ring enhanced this compound's activity against cells. Addition of an amide group at the other end made the compound active against ABL. Then, addition of a single methyl group surprisingly abolished the activity against PKC, making the drug more specific for the desired target. Finally, an additional group was attached to one end to make the whole thing more soluble, so it would be useful as an oral drug.

Fig. 17.5 Rational design of Gleevec. The antileukemia drug Gleevec was designed based on the structure of a protein tyrosine kinase (shown on the left, with the drug in green). A series of refinements were made during the design process, ultimately yielding a drug that is specific for the targeted protein and has good properties for use as a drug (PDB entry 1iep)
Scientists have also used this understanding of enzymes and their transition states in a clever way: to create new enzymes using the immune system. The immune system has the ability to build antibodies that bind to nearly any type of molecule. So scientists have used these types of transition state analog molecules to immunize animals, coaxing their immune system to create antibodies that bind tightly to the molecules. Since this is one definition of an enzyme (a protein that binds to reaction transition states), these "abzymes" often show enzymatic activity. The one shown in Fig. 17.6 was created using a transition state molecule that mimics the breakdown of cocaine, and consequently, we have a brand-new enzyme that could detoxify the drug in the bloodstream.
Fig. 17.6 Cocaine catalytic antibody. Three structures capture a catalytic antibody at different steps in its reaction: binding to cocaine and a water molecule, the transition state where the water has been attached to cocaine, and the cleaved cocaine products. Two amino acids that catalyze the reaction are shown in turquoise (PDB entries 2ajv, 2ajx, 2ajy)
Chapter 18 Seven Wonders of the World of Enzymes
The enzymes in our cells perform a bewildering variety of tasks. Some of these tasks, such as the cleavage reaction performed by lysozyme, are simple enough that a standard protein, composed of the normal 20 amino acids, will suffice. In other cases, however, the reaction may be too difficult, or too sensitive to the surrounding environment, or too dangerous to the cell to be performed by a typical enzyme, and cells have evolved many wondrous specializations in their enzymes to perform these tasks as needed. In this chapter, I have selected seven aspects of enzyme action that I find most amazing. But you only need to look to Nature to find many additional wonders.
One: Perfect Enzymes
Some enzymes are amazingly fast, so fast in fact that they perform their reactions as quickly as molecules can get to them. The diffusion of molecules through the watery cell environment is fast, but only so fast. This sets an interesting upper limit on the evolution of enzyme function: there's no need to improve the function further, because the enzyme is already perfect enough in the context in which it performs its job.
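How fast is "as quickly as molecules can get to them"? One common textbook estimate gives a rough answer. It is swapped in here for illustration and is not taken from this book. This is the Smoluchowski limit for two uniformly reactive spheres, k = 4πDRN_A. Here D is the sum of the two diffusion coefficients, R is the distance at which the molecules touch, and N_A is Avogadro's number. The inputs below are assumed, order-of-magnitude values for a small molecule in water, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mol


def diffusion_limit(d_total_m2_per_s: float, contact_radius_m: float) -> float:
    """Smoluchowski encounter rate constant for two uniformly reactive
    spheres, returned in M^-1 s^-1.

    k = 4 * pi * D * R * N_A, where D is the sum of the two diffusion
    coefficients (m^2/s) and R is the contact distance (m).
    """
    k_m3_per_mol_s = 4 * math.pi * d_total_m2_per_s * contact_radius_m * AVOGADRO
    return k_m3_per_mol_s * 1000.0  # 1 m^3 = 1000 L, so m^3/mol becomes L/mol = M^-1


# Assumed, illustrative inputs: a small molecule in water (~1e-9 m^2/s, ignoring
# the much slower enzyme's contribution) and a ~1 nm contact distance.
k_max = diffusion_limit(1e-9, 1e-9)
print(f"{k_max:.1e} per molar per second")  # prints 7.6e+09 per molar per second
```

The result, in the range of 10^9 to 10^10 per molar per second, should be read as a generous ceiling. Real enzymes react only when the substrate reaches a small active site, and measured k_cat/K_M values for the fastest enzymes are usually quoted at roughly 10^8 to 10^9 M^-1 s^-1. The model also leaves out electrostatic attraction entirely. That omission is why charge steering, as in superoxide dismutase, can push an enzyme beyond what simple geometry would allow.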
Scientists have found several examples of these perfect
enzymes. Most perform very simple tasks that require the capture
of a single molecule, followed by a small chemical change. Carbonic
anhydrase is a perfect example. It is important for solubilizing carbon dioxide in the blood. Throughout your body, it combines carbon dioxide and water to form carbonic acid and bicarbonate,
which are both very soluble. Then, in the lungs, it oversees the
reverse process and releases the carbon dioxide when we breathe
out. This reaction can occur naturally without the enzyme, but the
enzyme allows more control, speeding up the reaction in the desired
place by a million times.
But the evolution of enzymes hasnt stopped there. A study of
the enzyme superoxide dismutase found that it performs its reac-
tion even faster than might be expected. The structure of superox-
ide dismutase revealed that this enzyme gives its target an extra
boost. It has a strongly charged patch around the active site, which
forms a funnel that draws the oppositely charged radical molecule
into the right place (Fig. 18.1).
Two: Induced Fit

As more and more structures of enzymes have been determined, it
has become increasingly apparent that they are dynamic machines.
Some enzymes are virtually rigid, with a cleft on one side that binds
to molecules and catalyzes the reaction. Other enzymes, however,
change their shape to accommodate their substrates, in a process
that was dubbed induced fit by the scientists studying the process.
This motion may be a small rearrangement of amino acids to grip
the substrate molecule more tightly, or in some cases, the entire
enzyme will open up and close around its substrate, creating a
form-fitting pocket that completely surrounds it.
Fig. 18.1 Substrate steering by superoxide dismutase. Superoxide
dismutase speeds up an already fast reaction by drawing its substrate
superoxide into its active site. In the upper illustration, a metal ion in the active
site is shown in green. The lower illustration shows the electrostatic potential
around the enzyme, revealing a large positively charged region around the
active site (in blue). This attracts the negatively charged superoxide radical
(PDB entry 2sod)
The structure of hexokinase, the enzyme that performs the
first step in the breakdown of glucose, was the first enzyme struc-
ture where induced fit was observed. It was captured in two states:
an open state and a closed state bound to glucose. In the years
since then, many structures have shown that this is a common
approach used by many kinases, such as phosphoglycerate kinase
(Fig. 18.2), and other enzymes that use ATP in their reactions.
These enzymes all capture ATP and a target molecule and transfer
a phosphate from the ATP to the target. The transition state of this
reaction is sensitive to water, which would cause the phosphate to
be released without being added to the target. So, these kinase
enzymes surround their reaction, excluding any interfering water
molecules.
Fig. 18.2 Kinase induced fit. Phosphoglycerate kinase has two large
domains connected by a flexible hinge. The two structures show how it closes
around its substrates when it catalyzes its reaction (PDB entries 2xe6, 2wzb)
Three: Form-Fitting Active Sites
One of the great advantages of an enzyme active site that surrounds
a set of reactants is that it allows very tight control over the shape of
products that are formed. This allows enzymes to be highly specific,
performing one reaction quickly and efficiently, while leaving all
the other molecules in the cell alone.
When enzymes were first being studied, researchers were puz-
zled by their stereospecificity. For instance, they found that the
enzyme aconitase was able to distinguish between two seemingly
identical acid groups in citric acid, always acting on only one of the
two. Theoretical work on this enzyme led to the three-point
hypothesis for stereochemistry: the citrate molecule would land on
the surface of the enzyme, and the enzyme would recognize three different
groups, orienting the molecule properly. The structure revealed that
this is a simplification, and the enzyme actually surrounds the entire
molecule, recognizing it from top and bottom and right and left.
Looking at the many structures of enzymes that have been deter-
mined, we find that this is a general feature: enzymes recognize their
substrates by touch. The active site has exactly the right shape and
chemical characteristics to fit the substrate, or more specifically, the
transition state. My favorite example of this form-fitting chemistry
is the enzyme lanosterol synthase. This enzyme performs an amaz-
ing series of cyclization reactions, converting a long snaky molecule
into a chunky cholesterol-like molecule (Fig. 18.3).
Four: Allostery
Cells are filled with enzymes, all performing their individual jobs
quickly and efficiently. Some of these enzymes perform their reac-
tions reversibly, performing a particular reaction and the opposite
reaction equally well. Of course, if all of the enzymes in the cell did
this, the result would be chaos, and nothing would get done. To solve
this problem, cells include many enzymes that perform one reaction
preferentially and may be turned on and off according to need.
Fig. 18.3 Lanosterol synthase active site. Two structures capture the enzyme lanosterol synthase before and after
its reaction. A cascade of cyclization reactions converts an oxidosqualene molecule (left) into a lanosterol molecule
(right) (PDB entries 1ump, 1w6k)
Fig. 18.4 Allostery in pyruvate kinase. Pyruvate kinase, an enzyme
involved in energy production, flexes into an active state (right) when it binds
to molecules that signal the need for energy (PDB entries 1e0u, 1a3w)
One of the key ways that enzymes are regulated is through a
change in shape, termed allostery. The enzyme has two (or more)
states, typically an active and an inactive one. The switch between the states
is controlled by another site on the enzyme, which binds to a regula-
tory molecule. Atomic structures have revealed the nature of these
allosteric motions, capturing many enzymes in both their active
and inactive states (Fig. 18.4).
Fig. 18.5 Regulation by modification. The oncogene protein Src (upper)
closes up into an inactive shape when a key tyrosine is phosphorylated, and
the digestive enzyme pepsin (lower) is activated by removing a piece from its
inactive precursor pepsinogen (PDB entries 2src, 3psg, 5pep)
Enzymes may also be regulated by making physical changes that
turn them on or off (Fig. 18.5). In some cases, these changes are
temporary and may be reversed according to need. For instance,
phosphate groups are often added to turn kinases on and off in sig-
naling processes. Because the phosphate carries a strong negative
charge, these modifications make a substantial change to the properties
of the enzyme. In other cases, the modification is more drastic and
permanent. For instance, many digestive enzymes are built as pro-
tein chains that are longer than necessary. The extra bits inactivate
the enzyme and make it safe to build and deliver to the digestive
tract. Once in the proper place, the extra bits are clipped off and the
active enzyme launches into its job of destruction.
Five: Substrate Channeling
Some chemical reactions require the formation of toxic intermedi-
ates, or highly reactive intermediates that would quickly be
destroyed in the watery environment of the cell. As we saw before,
some enzymes completely surround the molecules as the reaction
is catalyzed. In even more complex tasks, several enzymes are
brought together, and they pass intermediates directly from one to the
next, making sure that the intermediate doesn't stray off and get
into trouble.
Fig. 18.6 Substrate tunneling. A cross section of tryptophan synthase shows the tunnel (red stars) that delivers
indole from one active site to the next. The two enzymes are shown in blue and green, with substrate molecules in red
spheres (PDB entry 1beu)
Tryptophan synthase is one of the classic examples. It is a
complex composed of two enzymes that perform two sequential
steps in the construction of the amino acid tryptophan. The com-
plex has a narrow tunnel that connects the two enzymes, deliv-
ering the toxic intermediate indole directly from one active site
to the next (Fig. 18.6). A huge enzyme complex involved in the
breakdown of sugar takes an entirely different approach. It uses a
flexible arm to transfer its substrates from one place to the next
(Fig. 18.7).
Researchers have speculated for years about even higher levels of
organization. For instance, we could imagine forming a huge com-
plex with all of the enzymes of glycolysis together, to streamline the
breakdown of sugar. The evidence for these glycosomes is compel-
ling but not conclusive. Other cases have stronger evidence, for
instance, a supercomplex of the respiratory electron transport pro-
teins (see Fig. 11.4).
Fig. 18.7 Substrate transfer. The pyruvate dehydrogenase complex includes many parts connected by flexible linkers.
At the center is a cubic core that organizes the whole complex. Small domains have a special lipoic acid group added
(magenta) that carries the substrate molecules from enzyme to enzyme around the outside. Only a few of these
enzymes are shown in the illustration; in reality, the entire complex is surrounded by them (PDB entries 1eaa, 1lac,
1w85, 1ebd)
Six: Chemical Cofactors
Most enzymes are made of protein, so they have a limited set of
chemical tools that they can bring to bear during a chemical reac-
tion. Of the twenty natural amino acids, only a few are reactive, and
cells play a lot of tricks to make them work. For instance, enzymes
will often increase the reactivity of a key amino acid by pairing it
with a charged amino acid, or one that can tweak its chemical state
appropriately. The classic example of this is the serine proteases.
They use a serine amino acid in the reaction, which attacks the pro-
tein chains that are cut by the enzyme. A chain of a histidine and an
aspartate activates this serine, making it much easier to transfer
a hydrogen to the molecule being attacked (see Figs. 8.3 and 8.4).
Fig. 18.8 Metal cofactors in nitrogenase. In the enzyme nitrogenase,
several metal clusters provide electrons for the difficult reaction
of splitting nitrogen to form ammonia. Iron is shown in brown, sulfur in yellow,
and a single atom of molybdenum is in red. The small molecule in white and
red helps to stabilize the unusual molybdenum atom (PDB entry 1n2c)
Some reactions, on the other hand, are too difficult for the 20
natural amino acids and need special chemical tools. In nature,
these tools abound. Many of them are small organic mole-
cules evolved to perform a specific task. For instance, ATP is perfect
for carrying phosphates, and NAD is perfect for carrying electrons.
Molecules built from the B vitamins specialize in carrying carbon
atoms and other small groups, and transferring them to other mol-
ecules. SAM performs a similar task for methyl groups.
In other cases, even more chemical creativity is needed, and
specific metal ions are employed. In some cases, the enzyme just
needs something to bind strongly to a charged group. In other cases,
the strong charges are needed to force a particularly difficult reac-
tion. For instance, the enzyme that fixes nitrogen, converting gas-
eous nitrogen into biologically useful ammonia, performs this
incredibly difficult reaction using a complex cluster of exotic metals
(Fig. 18.8).
Fig. 18.9 Ribozyme. Two structures of a minimal hammerhead ribozyme,
composed of two short RNA strands, capture the molecule before and after the
cleavage reaction. A small active site is formed on one strand (magenta) that
helps to catalyze the reaction (PDB entry 488d)
Seven: Ribozymes
Proteins aren't the only molecules that can catalyze chemical reac-
tions in cells; they're just the most creative. Scientists have also
discovered that RNA is used to build catalytic molecules, termed
ribozymes. The most famous one, of course, is the ribosome, which
uses an adenine base to catalyze the addition of amino acids to a
growing protein chain (see Fig. 7.6). Other common examples are
RNA molecules that can cleave other RNA molecules, or them-
selves. These RNA molecules fold into complex shapes reminiscent
of the globular structures of enzymes. They're particularly good for
these tasks because they have built-in machinery for recognizing
the target RNA sequence, since they can use typical base pairing to
bind to it. They usually employ metal ions to perform the actual
reaction.
A tiny ribozyme has been the object of much of this study, serv-
ing as a convenient model for the action of ribozymes. It is termed
the hammerhead ribozyme, since its chemical diagrams look like a
hammer. It was first found in plant pathogens, where it is
involved in self-cleavage of the RNA genome, and similar ones have
since been found in many organisms. Researchers whittled away at this
natural ribozyme, ultimately finding a minimal version of it that
performs the self-cleavage reaction with only two short RNA
strands (Fig. 18.9).
19
Building Bodies

Molecular biology and cell biology study the same subject, living
cells, but they approach it from opposite angles. Molecular biolo-
gists take a bottom-up approach. They look very closely at the
components of cells and then try to fit all these individual puzzle
pieces together into a coherent image. Cell biologists, on the other
hand, often take a top-down approach. They observe whole cells
and then try to ferret out what the individual molecules are doing.
These two approaches are gradually merging, as molecular biolo-
gists work with larger and larger assemblies, and cell biologists
develop ever more powerful methods of microscopy to probe finer
and finer details. Together, they are building a detailed image of
how cells, and our whole bodies, work.
The power of this complementary approach is particularly
apparent when we look at the infrastructure of the cell. Think of the
complex infrastructure that supports our familiar lives. We have
houses that protect us and allow us to compartmentalize our many
tasks. We also have a complex infrastructure of delivery that gets us
the resources we need and takes away the waste. Finally, there is a
rich infrastructure of communication, bringing us information
about our day-to-day life at home and work, and information about
the world as a whole. Cells, and entire organisms, rely on a similar
infrastructure to hold everything in place and orchestrate the many
processes of living.
The infrastructure of the cell has been remarkably difficult to
study at the atomic level. There are several reasons for this. The
physical infrastructure is large, heterogeneous, and flexible, so it
doesn't conform well to the strictures of the available methods for
structure determination. Also, much of this infrastructure is built
around membranes made of lipids, which are slippery at best. So
structural biologists have had to play all sorts of clever games with
these molecules to get a close look at them, working hand in hand
with cell biologists to see how they all fit together.
Take, for instance, the cytoskeleton. There is a network of fila-
ments inside each cell that defines the cell shape and manages trans-
port of resources from place to place. Micrographs of cells have
revealed that these filaments come primarily in three sizes: narrow
actin filaments, medium-sized intermediate filaments, and wide
microtubules, all arranged in a tangled web. Structural biologists
have found ways to get detailed structures of actin filaments and
microtubules, through a combination of atomic methods and elec-
tron microscopy, but researchers are still piecing together the details
of intermediate filaments like vimentin (Fig. 19.1). They are com-
posed of protein chains that form long alpha helices, which then
coil around a neighbor to form a sturdy protein rod. These then
stack side-by-side and end-to-end to form the filament. Atomic
structures have been determined for portions of the helical coils,
and then results from electron microscopy are used to model how
these coils fit together.
A diverse collection of molecules is involved in modeling
and remodeling the filaments in the cytoskeleton. These include
molecules that connect one filament to its neighbors and molecules
that guide the assembly of filaments during cell growth, movement,
and division. One of the most remarkable machines is a protein that sits
at the end of actin filaments and delivers new subunits to the end,
guiding growth of the filament. This protein, formin, has several
functional parts connected by flexible linkers. It has been studied at
the atomic level only in parts, looking at the actin-binding and regu-
latory domains separately and then cobbling everything together
for a complete view (Fig. 19.2).
Fig. 19.1 Cytoskeletal filaments. The cytoskeletal filaments actin (top), vimentin (center), and microtubules (bottom)
(PDB entries 1m8q, 1gk7, 3uf1, 3trt, 1gk4, 1tub)
The outer wall of the cell and the walls around all its internal
rooms are largely composed of membranes made of lipids. Lipid
membranes are wonderfully dynamic structures, with the individ-
ual molecules in constant motion, flowing past one another but still
forming a waterproof barrier. The membrane structure is mostly
what we see in classic cellular electron microscopy, because the
membranes pick up the heavy metal stains used to enhance the con-
trast. Membranes have been studied by a variety of biophysical
methods, quantifying how their thickness and fluidity change based
on the composition of different types of lipids. An atomic view,
however, has been hard to pin down, given the dynamic nature of
the membrane. Computer simulations have given us a
hint of how they might look (Fig. 19.3).
Of course, a wall with no doors or windows is only good as a
prison, so the cell needs ways to get materials and information
across its membranes. It does this with proteins that span the mem-
brane, forming pores and channels and signal transducers. These
have been remarkably difficult to study, because of the environment
they have evolved to occupy. These proteins typically have a belt of
carbon-rich amino acids around their middle, which interacts
beautifully with the carbon-rich interior of the membrane
(Fig. 19.4). This causes problems for study, however, because it
makes them finicky and insoluble when they are purified. Structural
biologists have used many methods to coax them into crystals,
including engineered forms with convenient handles, coating them
with antibodies, coating them with detergent, and anything else they
can think of (Fig. 19.5). Using these techniques, atomic structures
for hundreds of these molecules are currently available.
Fig. 19.2 Formin. Formin (yellow) assembles an actin filament (blue) by
adding actin molecules one at a time with the help of profilin (green). The
regulatory subunit of formin (at the top) binds to GTP-binding proteins such as
CDC42 (pink). The flexible linkers of formin were not seen in the crystallographic
structures and are shown here with dots (PDB entries 1y64, 2w4u, 3chw, 3eg5)
These structures have revealed many of the secrets of transport
across membranes. One of the most remarkable stories is the potas-
sium channel, which posed a molecular mystery. In experiments,
the channel was shown to pass potassium ions freely across the
membrane, but it somehow managed to block the flow of sodium
ions, which are smaller than potassium. So obviously, it is not sim-
ply a size filter. The atomic structure revealed that water plays a key
role (Fig. 19.6). The ions are normally associated with a charac-
teristic shell of water molecules when free in solution. The channel has
evolved to strip these water molecules away and replace them with
oxygen atoms from the protein chain, which mimic the arrangement of
waters around a potassium ion but not the different arrangement of
waters around a sodium ion.
Membranes are dynamic barriers, and the cell takes advantage of
this by building small, closed vesicles for delivery of cargo from site
to site. These vesicles are created by pinching off a section of
membrane to form a closed sphere. Electron micrographs of the
inner surfaces of cells revealed little dome-shaped protein assem-
blies with a characteristic geodesic texture, which were caught in the
act of building these vesicles to pull nutrients into the cell. A
three-armed molecule called clathrin forms these assemblies and provides
the force to bud the vesicle. A recent cryo-EM structure has captured
a very small clathrin coat in almost atomic detail, showing how the
arms interdigitate and embrace the vesicle inside (Fig. 19.7).
Fig. 19.3 Lipid bilayer. This model of a lipid bilayer was generated using
computer simulation in the laboratory of Klaus Schulten. The two layers of lipids
are seen in the center, with their carbon-rich tails (gray) pointing inwards and
charged groups (red, yellow and blue) exposed to water molecules on each side
Fig. 19.4 Membrane proteins. Membrane-spanning proteins are encircled with a belt of carbon-rich amino acids
that interact with the carbon-rich interior of the membrane. In this illustration, charged atoms are bright red and blue
and are mostly outside the membrane, and carbon-rich regions are in white. The three proteins are a photosynthetic
reaction center (left), an ion pump (center), and P-glycoprotein (right), a protein that pumps toxic molecules out of our
cells (PDB entries 1prc, 1su4, 3g61)
Fig. 19.5 Structure determination of the adrenergic receptor. This structure of
the adrenergic receptor shows some of the tricks that researchers need to use
to study membrane-bound proteins. Lipid molecules like cholesterol and
palmitate (turquoise) were used to stabilize the membrane-spanning portion,
and the protein chain was engineered as a chimera with lysozyme (green) to
add more water-soluble bulk to the protein (PDB entry 2rh1)
Cells in our bodies also need infrastructure to hold them together
as building blocks of a larger body. At a local level, cells have many
ways of connecting their neighbors. For instance, cells are tied
together and communicate with each other through gap junctions,
formed of a closely packed arrangement of hundreds or thousands
of connexon proteins. These connexons form a narrow pore that
connects the cytoplasm of the two cells, allowing small molecules
like ions and nucleotides to pass back and forth (Fig. 19.8).
Researchers have found that this flow of molecules stops when the
cell is damaged, however. Damage often leads to release of calcium
from internal storehouses, and these calcium ions bind to the con-
nexons. It has been thought for many years that this causes a confor-
mational change in the connexon, closing up the pore. A recent
structure, however, shows that the calcium-bound pore is wide
open, similar to the normal state of the pore. Based on this structure,
researchers now think that all of these calcium ions may form an
electrostatic barrier that inhibits the flow of positive ions like potas-
sium through the pore.
Fig. 19.6 Potassium channel. This structure of a potassium channel reveals
the basis for its specificity. A potassium ion (blue) surrounded by water
molecules (red spheres) is waiting in a vestibule at the bottom. The channel
forces it to release the waters but replaces them with a perfect arrangement of
oxygen atoms from the protein chain as the ions pass single file through the
channel. On the other side of the membrane (at the top here), the ions pick up
a new shell of waters (PDB entry 1k4c)
Fig. 19.7 Clathrin. Cryoelectron microscopy was used to capture the
structure of a particularly small and regular assembly of clathrin (PDB entry 1xi4)
Our bodies also need a larger infrastructure to tie everything
together. This is built of many very large molecules that together
form networks that support tissues and organs. Collagen is one of
the major structural components of these networks. It is composed
of three protein strands that form a characteristic triple helix. Early
work on collagen proposed this structure based on the unusual
amino acid sequence, which has a triplet repeat that includes a gly-
cine and a proline. Models of this helix proposed that the glycine is
needed to fit in a tight space between strands in the helix, and the
prolines are needed to form periodic kinks that keep the whole
structure tightly together. The structure is far too large to be studied
by crystallography or NMR spectroscopy, but structural biologists
have solved this problem by chopping it into manageable pieces.
Several atomic structures of these pieces are available in the PDB,
revealing the atomic details of the triple helix (Fig. 19.9).
Fig. 19.8 Gap junction. Connexons are arranged in an approximately
hexagonal lattice to form gap junctions. The cell membranes are shown
schematically in gray, and calcium ions are shown in red (PDB entry 5er7)
For many years, I have been trying to integrate this diverse infor-
mation from structural biology and from cell biology to build up
a coherent picture of the cell and its internal molecular processes
(Fig. 19.10). I am continually amazed, as I create these pictures, by
the complexity of the cellular infrastructure. When I research a new
topic, I always look carefully at each molecule. If it is in a particular
place, I then look for information on molecules that hold it there. And
then, on top of that, I search for information on the regulatory infra-
structure that makes sure it's there at the right time. Science is revealing
how truly complex and integrated all of this infrastructure is, ensuring
that we're at top performance, from molecules to cells to entire bodies.
Fig. 19.9 Collagen. Atomic structures have been determined for small pieces of the collagen triple helix, as shown
at the top. The characteristic glycines are shown with spheres and starred near the left side, and the many five-membered
rings are prolines or hydroxyprolines that kink the chain and direct it back towards the center of the triple helix
(PDB entry 1cag)
Fig. 19.10 Painting of a nerve synapse. This painting shows a cross section through a glutamatergic nerve synapse.
Remarkably, all of this complex infrastructure is needed to ensure that small neurotransmitter molecules (yellow dots)
are released at the proper time, delivering a signal to receptors on the surface of the lower cell. The infrastructure
includes vesicles that hold neurotransmitters, proteins that store the vesicles and deliver them to the surface of the cell
at the appropriate time, and proteins that manage the fusion of the vesicles with the cell surface to release the
neurotransmitters. There is also an infrastructure for holding the two cells together and arranging the receptors in the
proper place. On top of this, there is a complex regulatory infrastructure that modulates the activity of the synapse,
creating complex behaviors such as memory
20
Coloring the Biological World

One enzyme has arguably caused more human strife than any other:
the enzyme tyrosinase. The one shown here is from bacteria
(Fig. 20.1); the one in our cells is similar but is bound to
membranes. It performs an interesting reaction: it oxidizes the
amino acid tyrosine, which then forms huge aggregates called mela-
nin that strongly absorb light, looking dark brown or black. Cells
in our skin have special compartments that make this melanin to
help protect us from the dangerous effects of sunlight. Therein lies
the problem. Human populations around the world have evolved
cells that make different amounts and types of melanin, driven
largely by their historical exposure to sunlight. This has yielded a
beautiful diversity in skin color, ranging from clear white to darkest
black and everything in between. Similar molecules give hair its
shades of blonde, red, brown, and black. Unfortunately, human
society has never been good with differences, and this highly visible
consequence of a single enzyme has helped to fuel many of our cur-
rent societal challenges.
We're not at all unique in this variation of color, or in the strife
it can cause. The biological world is filled with colors, which have
evolved to provide a variety of selective advantages. These include
colors that help hide and colors meant specifically to be seen. Some
colors are a consequence of the selective absorption of other colors,
and some colors are a consequence of light actually created by the
cell.
The most common color in our biological world is green, the
ever-present green of plants. Ironically, this green light is leftover
light: light that the plant can't use. The chlorophyll used by plant
cells to capture the energy in light absorbs red and blue light
strongly, leaving the greenish hues. The color is caused by the large,
flat ring of atoms, termed a porphyrin, which has many atoms that
share electrons and can absorb the energy from visible light
(Fig. 20.2). These chlorophyll molecules are embedded in special
proteins that hold them in huge arrays, ready to soak up as much
light as possible (see Figs. 12.1 and 12.2).
Fig. 20.1 Tyrosinase. These two structures capture tyrosinase before and after it performs its reaction of converting
tyrosine to L-DOPA (PDB entries 4p6r, 4p6s)
To assist chlorophyll, plants also build molecules that absorb
other colors. For instance, beta-carotene absorbs blue and green
light, and thus looks orange. In the photosynthetic machinery of
many plants, these molecules are arrayed alongside chlorophyll.
Plants are also masters of color generation for decoration. They
build all manner of colorful molecules to decorate their flowers. The
evolutionary goal for these, amazingly, is to look pretty, at least
pretty to the insects that pollinate them.
Other colors in our own bodies are a consequence of the metal
ions we need for life. The bright red of blood is the most familiar. It
is caused by the iron ion that is held within a heme (Fig. 20.2). As
with chlorophyll, the color is a consequence of the delocalized elec-
trons in the porphyrin ring. Hundreds of millions of hemoglobin
molecules, each carrying four hemes, fill every red blood cell,
soaking up oxygen and blue, green, and yellow light. Similarly,
proteins such as cytochromes, as indicated by their
names, have the side effect of producing color. Other organisms use
different metal ions in these tasks, so their blood may be different
colors. For instance, hemocyanin, found in many mollusks and
arthropods, uses copper instead of iron and is blue-green.
Colored molecules are also ideal for sensing light. Cells in our
retinas use a particularly useful molecule, called, quite logically,
retinal. Like porphyrins, it has atoms with delocalized electrons that
absorb visible light. But when they do, they induce a change in the
shape of the molecule. This is perfect for sensing light. The retinal is
embedded in a protein, opsin, and the shape change triggers a
change in the protein, which amplifies the signal by passing it on to
many downstream signaling proteins (Fig. 20.3).
Fig. 20.2 Porphyrins. Porphyrins provide much of the color in our natural world. They are composed of a flat ring of
atoms that capture a metal ion in the center. The colors depend on the specific arrangement of atoms in the ring and
the type of metal ion at the center (from PDB entries 1s5l, 2hhb)
Fig. 20.3 Rhodopsin. Retinal (magenta) changes shape when it absorbs a photon, triggering the protein opsin
(white) to launch a signal to the brain (PDB entries 1u19, 3pqr)
Amazing color effects are generated by specialized organisms
using fluorescence. Fluorescent molecules absorb light and then
reradiate it with a different color. Often they absorb ultraviolet light
and then emit colored light. You're probably familiar with this from
the neon-bright paints that fluoresce under black lights. Fluorescent
proteins, such as GFP (green fluorescent protein), have become a
mainstay of research in cell biology. These small proteins, originally
discovered in jellyfish and corals, have now been engineered in
every color of the rainbow (Fig. 20.4). They are used to tag pro-
teins in cells, allowing researchers to track them as they move from
place to place. Atomic structures have revealed how GFP performs
its job. It has two amino acids locked away inside the protein, which
react with themselves when the protein folds up. This forms a new
chromophore that has the perfect characteristics for absorbing UV
light and reradiated colors. Other organisms have proteins that
make similar chromophores, but by tuning the characteristics of the
surrounding amino acids, the color is changed from green to red or
Chapter 20 Coloring the Biological World
153 20
. Fig. 20.4 Fluorescent proteins. The chromophore of GFP (left) forms spontaneously when a new bond (dotted
turquoise here) forms between three successive amino acids in the chain. Biotechnology researchers have made small
changes to the chromophore and the amino acids that surround it to create uorescent proteins with all the colors of
the rainbow (right) (PDB entries 1ema, 3m24, 2q57, 4ar7, 2y0g, 1huy, 2h5o, 2h5q)
to yellow. Scientists have jumped on this possibility and further

modified the molecules so that we now have a full rainbow of fluo-
rescent proteins to use in as tags in scientific experiments.
For instance, researchers commonly attach GFP to other proteins, such as the proteins that form the cytoskeleton, to create a probe that lights up those proteins inside living cells. Atomic structures of the proteins are an essential guide in this process, ensuring that the site where GFP is attached will not disrupt the function of the protein being studied. In an exciting twist on this, scientists have attached a calcium-binding protein to GFP, creating a glowing calcium sensor that can be used to track the level of calcium in living cells (Fig. 20.5).
But I've saved the best for last: some organisms have evolved ways to make their own light. The most familiar, of course, are fireflies. They use a protein called luciferase to emit cool greenish light (Fig. 20.6). Creating light is an energetic process, so it needs to be powered. Atomic structures have revealed that the chromophore forms a highly strained shape when it reacts with oxygen, consuming ATP in the process. When this oxygenated molecule breaks apart, releasing carbon dioxide, it releases enough energy to emit a photon of light.
Fig. 20.5 Fluorescent calcium sensor. The engineered calcium sensor GCaMP2 includes a circularly permuted green fluorescent protein (green), attached to calmodulin (magenta, with calcium ions in yellow) and a short chain from myosin (blue). The calmodulin portion changes shape when it binds to calcium, changing the fluorescence of the GFP portion (PDB entry 3evr)
Fig. 20.6 Luciferase. The chromophore luciferin is shown in the center, surrounded by the luciferase protein (PDB entry 2d1s)
Chapter 21 Amazing Antibodies
As I write this chapter, antibodies are much on our minds due to controversies about vaccination. This is a pity, since vaccines are one of the true wonders of medical science. By challenging our bodies with a weakened form of a deadly pathogen, we can prime our defenses, making us resistant to infection for the rest of our lives. Because of this, we no longer fear polio, or smallpox, or measles, or a host of other deadly viruses (Fig. 21.1). For most of us, this brings great peace of mind for ourselves and our children. Unfortunately, a supposed connection with autism, widely popularized but with no scientific support, has frightened some people enough that they forgo this protection, putting themselves at greater risk and, if their numbers grow too large, the entire population along with them.
The key to antibodies is their selectivity. Each type of antibody binds to a different target, such as a virus or a bacterial protein. When researchers started solving structures of proteins, antibodies were on their short list of most wanted structures, to reveal the basis of this incredible recognition ability. Genetic studies had revealed that the amino acid sequences of different antibodies are mostly the same, except for six small regions that show many changes, termed hypervariable loops. The atomic structure revealed that the antibody chains fold into a well-ordered structure with two stacked beta sheets, and that all of these hypervariable regions are arrayed at one end, where they form loops that together surround the binding site.
Fig. 21.1 Poliovirus neutralized by antibodies. This cryoEM structure includes the viral capsid (red and orange) and the virus-binding portion of the antibodies (blue). Since the resolution of the experiment was not sufficient to resolve individual atoms, the structure includes only a single atom for each amino acid, each represented here by a larger sphere than is normally used for atomic images (PDB entry 3j3p)
Flexibility is also a key component of antibody action. Antibodies
typically have two or more binding sites in a particular complex,
allowing them to make multiple connections on the surface of a
pathogenic organism. To make this even more efficient, the connec-
tors holding these binding sites are flexible, allowing them to
adapt to different types of surfaces. This, however, makes antibodies difficult to study. The classic Y-shaped antibody has been observed by electron microscopy, but most atomic structures have been solved using fragments of antibodies, which are more or less rigid. An atomic structure of an entire antibody had to wait until a lucky researcher found a crystal form that trapped the flexible antibody in one particular frozen pose (Fig. 21.2).
We now have structures of hundreds of antibodies, bound to
many different types of targets. These structures reveal the secrets of
antibody recognition. By genetically mixing and matching segments of these hypervariable loops, the immune system can build antibodies that recognize almost anything. Structures in the PDB include antibodies that bind to small molecules like cocaine or steroids, to soluble and membrane-bound proteins, to RNA, to DNA, and to entire viruses. For instance, structures have been determined for three different antibodies that all recognize the same protein but in different ways (Fig. 21.3). Atomic structures have also captured the process of antibody affinity maturation, where antibodies are tuned by the immune system to improve their binding ability (Fig. 21.4).
Fig. 21.2 Antibody structures. Many atomic structures of antibodies have been solved by breaking the molecule into stable fragments. Two early structures are shown here at the left: an antigen-binding Fab fragment that binds to the small molecule phosphocholine and the Fc fragment that is similar (or constant) in many antibodies. A handful of crystallographic structures of entire antibodies have also been determined, such as the one shown here on the right, capturing the flexible antibody in one particular pose (PDB entries 1mcp, 1fc1, 1igt)
Fig. 21.3 Anti-lysozyme antibodies. These three antibodies all recognize lysozyme (shown with a rainbow-colored cartoon), but they bind to different sides of the molecule. Notice that the binding sites are quite different on the antibodies: the one on the left has a cluster of positively charged amino acids (in bright blue), the one in the center has more negatively charged amino acids (in bright red), and the one on the right is largely uncharged (white and pastel colors) (PDB entries 1fdl, 1yqv, 3hfm)
Fig. 21.4 Antibody maturation. The immune system tunes antibodies by making small changes to improve binding to the target. The antibody on the left recognizes lysozyme (green); it is from the initial response to the protein and binds fairly weakly. The antibody on the right has been optimized by affinity maturation and binds a thousand times more tightly. Sites of mutation (red) are scattered through the antibody chains, together making a better fit to lysozyme (PDB entries 1mlc, 1p2c)
Fig. 21.5 Antibodies in science. Antibodies are used in many medical and scientific applications. The structure on the left shows two small fragments of antibodies bound to human chorionic gonadotropin. These types of antibodies are used in pregnancy tests, since the hormone is prevalent during pregnancy. Other tests, such as the commonly used test for HIV infection, use an antibody to recognize the unique shape of another antibody: the one that is built by the immune system to fight the virus (shown in the center). Antibodies are also used by structural scientists to assist in the crystallization of difficult systems, such as the small ion channel shown in green on the right (PDB entries 1qfw, 1iai, 1k4c)
The binding of antibodies is so tight and specific that antibodies have become indispensable scientific tools (Fig. 21.5). Antibodies are widely used in biological testing, for instance, in tests for pregnancy or HIV infection. Researchers have attached fluorescent molecules to antibodies and then used them to track molecules in living
cells. They are also widely used by crystallographers. Antibodies are
often bound to particularly recalcitrant proteins, providing a sturdy
handle to help crystallization.
Recently, unusual antibodies have been discovered in camels and sharks and have revolutionized the practical applications of antibodies (Fig. 21.6). These have binding sites that are composed of a single chain, unlike the two chains needed to form a typical antibody binding site, and have been dubbed nanobodies. This is a boon for research, because these nanobodies are easier to engineer and synthesize, and they are starting to fill some of the roles previously played by traditional antibody molecules from rabbits or goats.
The antibodies and immune system that evolved in our distant ancestors protected them from perils in their environment, but this quickly led to an evolutionary arms race with our attackers.
Viruses, bacteria, and parasites have all evolved methods to fight
back, evading our defenses, and the immune system has evolved in
turn to fight these. We can see an example of this happening today
with one of our greatest perils: HIV.
HIV is a particularly insidious virus that infects the cells of the immune system, slowly and relentlessly disabling it. One of the reasons that HIV is so effective is that, from the outside, it looks like a tiny human cell. This makes it difficult for antibodies to recognize that something is wrong. Viruses only need one thing on their surfaces: a machine to recognize susceptible cells and force their way inside. Everything else can be hidden away inside, invisible to the immune system. In HIV, this machine is called envelope glycoprotein, and its trick for survival is revealed in its name. It is covered with sugar chains that are the same as the sugar chains on our cell surface proteins. These form a protective coat of camouflage that hides the virus from the immune system.
Fig. 21.6 Antibody structures. As shown on the left, most antibodies are composed of two heavy chains (in blue) and two light chains (in green). The smaller antibodies made by camels and sharks are composed of two copies of a single chain, as shown on the right (PDB entry 1mel)
However, the immune system is extremely resourceful, and genetic shuffling and hypermutation can create many, many different types of antibodies in short order. Patients have been observed to produce several types of antibodies that are effective at neutralizing HIV. They use some amazing tricks, including long fingers that reach through the sugar coat to probe the underlying HIV protein, and the linking of several antibodies in tandem to create a complex that binds to the sugars themselves (Fig. 21.7). Unfortunately, it typically takes a long time for the immune system to create these antibodies, and the virus has already taken a firm hold by the time they are being produced. One hope for anti-HIV vaccines is to elicit these broadly neutralizing antibodies earlier in the infection.
Vaccines have changed our lives, but some targets have remained elusive. Influenza is a classic example. It changes so rapidly that our complement of antibodies quickly goes out of date and is ineffective against the newest strains, or even against old strains that haven't been seen for many, many years. This is why we need a new influenza shot each year: the medical establishment makes an educated guess about which strains will pose a danger and protects us with a vaccine against them. And then our antibodies get to work, patrolling through our bodies and protecting us from invaders.
Fig. 21.7 HIV envelope glycoprotein and antibodies. The Fab portions of two broadly neutralizing antibodies (blue) show unusual ways to recognize HIV envelope glycoprotein (yellow and red), which is protected by a coat of carbohydrates (orange). The one on the left has a long finger that pushes through the carbohydrates to reach the protein, and the one on the right is domain swapped, producing two tandem binding sites that recognize the carbohydrates directly (PDB entries 1nco, 1op5)
Chapter 22 Attack and Defense: Weapons of the Immune System
I don't think of myself as being particularly germophobic. I wash my hands regularly but not obsessively, and I keep the kitchen and bathroom reasonably disinfected. Sometimes, however, I can understand our society's growing obsession with germs. Based on what I hear on the news, a pandemic often seems just around the corner.
Make no mistake: we are constantly under attack. Bacteria are everywhere, they reproduce at blinding speed, and they're continually trying to gain a toehold in our food and on our exposed surfaces. Viruses are easily transmitted from person to person, and they quickly set up shop in our cells and start churning out more viruses. In recent history, we have become aware of this ever-present danger, and careful attention to sanitation has helped fight back the hordes of attackers. But many still get through, and it is left to our own bodies to resist.
Fortunately, we have a well-stocked arsenal of defenses against these
attacks. These foes have been present in our environment since the
dawn of mankind and have imposed a heavy selection pressure on
human populations. As a result, the human race has evolved many
ways of fighting them. Together these make up our immune system,
which stands guard in our defense. Many of the tools of the immune
system are hardwired to attack our most common foes. These com-
prise the innate immune system, which protects us from old ene-
mies. This works hand in hand with the adaptive immune system,
centered around antibodies, which is more flexible and takes care of
our newer enemies.
The innate immune system includes many weapons that attack
invaders at their weak points. For instance, the first one to be found was lysozyme, famously discovered by accident by Alexander Fleming. It is secreted in bodily fluids; for instance, it is
the most prevalent protein in tears. Its job is to seek out bacterial
cells and cut up their cell walls. When it was discovered, it was
hoped that it would be the magic bullet to kill bacteria and fight
infectious disease. But since it is a protein, it is difficult to adminis-
ter as a drug, and a true medical magic bullet had to wait until the
discovery of antibiotics like penicillin. However, the characteristics
of lysozyme have made it one of the most popular molecular lab rats
of structural science: it is small and stable, which allows it to survive
the harsh conditions outside the cell and stay dangerously active
while performing its job or while being studied by scientists. Many
important discoveries have been made using lysozyme as the test
subject. These include the first atomic structure of an enzyme
(see Fig. 17.2) and detailed work on protein folding and stability
by study of hundreds of atomic structures of lysozyme mutants.
Other proteins take a similar approach, attacking weak points of
the invading organisms. For instance, bacteria are surrounded by a
typical lipid membrane, which must remain continuous and closed
to support the proper environment inside the cell. We make a vari-
ety of antimicrobial peptides that attack this membrane. Atomic
structures of dermcidin have revealed how it creates a hole through
the membrane, allowing ions to enter and exit freely (Fig. 22.1). The structures have also helped solve a major puzzle posed by these defensive proteins: how do they keep from doing the same thing to our own membranes? The structures reveal that they are coated with positively charged amino acids, which recognize the negatively charged phospholipids that are common in bacterial membranes. Our own cells have more lipids that are neutral and thus are not as susceptible to attack.
Fig. 22.1 Antibacterial proteins. Siderocalin gathers up the siderophores, such as enterobactin, that bacteria use to gather iron, starving them of this essential nutrient. Dermcidin punches holes through the bacterial cell membrane. The picture on the right has two subunits removed to show the tunnel through the center, which is lined with charged amino acids (bright red and blue) (PDB entries 3cmp, 2ymk)
Iron ions are another weak point in bacteria that is targeted by
the innate immune system. Iron is a precious commodity in our
bodies. We have a lot of iron, but we keep it locked up inside pro-
teins like hemoglobin. This leaves very little free iron for infecting
bacteria to use for their own metalloproteins. One type of bacterium, the one that causes Lyme disease, has evolved a particularly draconian solution to this challenge: all of the proteins in its genome that normally require iron have been replaced by proteins that use other metals or no metals at all. Most other bacteria, however, need to find a way to gather up these rare iron ions for their own use.
Fig. 22.2 Complement C1. An electron micrograph reconstruction of complement C1 and an antibody is shown on the left, and the scientists' interpretation, based on atomic structures of individual domains of the proteins, is shown on the right, with complement C1q in blue, other C1 proteins in magenta, and immunoglobulin M in green (Figures were generated on the EMDataBank website for entry EMDB2507)
This has led to an evolutionary battle between our cells and bacterial invaders. Bacteria build unusual small molecules, termed siderophores, with a big appetite for iron. They release these siderophores into the environment and then gather them up after they have captured individual iron ions (Fig. 22.1). In response, our ancestors evolved proteins that grab siderophores, termed siderocalins, and sequester them before the bacteria get a chance. In turn, some bacteria have evolved stealth siderophores that can gather iron but are not recognized by siderocalins. And so the battle continues, and scientists are following every step with atomic structures.
Our immune system also builds a more elaborate system for
attacking bacteria, termed the complement system. When antibod-
ies (such as star-shaped immunoglobulin M) find a bacterium, the
complement C1 protein binds to them and launches a cascade of reactions that leads to the creation of a membrane attack complex, which pierces the bacterial cell membrane. These proteins are large and flexible and thus have been difficult to study. Atomic structures have been determined for many of the functional parts of the molecules, such as the antibody-recognizing arms of C1q, but electron microscopy has proven to be the best way to study the entire system in action (Fig. 22.2).
Viruses are much more slippery and require a different set of
weapons for defense. These look for the unusual aspects of viruses
and attack them there. For instance, many viruses have genomes
composed of double-stranded RNA, which is rarely found in cells.
So, if a cell notices that there is double-stranded RNA in the cyto-
plasm, it knows that something must be wrong. Plant and animal
cells have a sophisticated system for recognizing and silencing
RNA (Fig. 22.3). The system starts with a protein called dicer, which breaks the RNA into small, recognizable pieces. These little pieces, called small interfering RNA (siRNA), then activate RNA-digesting proteins called argonaute. Argonaute strips away one strand of the siRNA and then looks for other RNA that matches the remaining strand. In our cells, this system is mainly used to silence specific messenger RNA molecules when they are no longer needed to build the proteins they encode. But in plant and insect cells, and perhaps also in ours, siRNA also provides an effective way to recognize and destroy any viral RNA that is being made. Of course, viruses evolve quickly, and they have found ways to circumvent this system by building proteins that hide the siRNA before it can activate argonaute.
Fig. 22.3 Small interfering RNA. Atomic structures have shown that the large active site of the protein dicer is exactly the right size to cut double-stranded RNA into perfectly sized pieces, using several metal ions (left). These small interfering RNA molecules are then bound by argonaute and used to recognize and destroy RNA that matches the sequence (center). Some viruses circumvent this protection by creating proteins that sequester siRNA before it can find the viral RNA (right) (PDB entries 2f8s, 2, 4w5o, 1r9f)
Bacteria also face the same problem of fighting off viruses and
have evolved an elegant system that remembers viruses that have
attacked the population in the past (Fig. 22.4). When they are attacked, they harvest small pieces of viral DNA and store them in their own genome in a distinctive region called clustered regularly interspaced short palindromic repeats, or CRISPR for short. A collection of Cas proteins then uses this library of stored information
to monitor any nucleic acids that are in the cell, keeping vigilant
watch for a repeated attack by this same virus. The Cas system
includes proteins that process RNA made from the CRISPR library
and other proteins that display it and launch into action when they
find any DNA that matches it. These proteins are showing great
promise for medicine, since they are powerful tools for breaking
DNA at very specific sequences inside a living cell. Recently, these
have been used to engineer a potential cure for HIV infection, by
introducing a specific Cas protein into infected cells that will cut up
any HIV DNA.
Fig. 22.4 CRISPR and Cas. Cas9 uses CRISPR RNA (red) to recognize viral DNA (yellow) and then breaks it into pieces. Engineered versions of Cas9 are now being developed to destroy integrated HIV in infected cells (PDB entry 4un3)
Medical science has allowed us to play other direct roles in our own immune response. As described in the previous chapter, vaccines allow us to prime our adaptive immune system for future attack, by mobilizing the appropriate cells to build protective antibodies. When we take antihistamines, we're slowing down another
arm of the immune system: inflammation. When a trouble zone is
sensed, histamine and other molecules tell the body to make the
area more accessible to immune cells in the blood, so they can
assess the problem and figure out how to solve it. This system occa-
sionally gets a bit too aggressive and can cause dangerous problems,
so we take antihistamines or anti-inflammatories to calm every-
thing down. Atomic structures have been invaluable for character-
izing the many molecules involved in the inflammatory response
and designing new drugs that allow us to keep it under control (Fig. 22.5). I certainly benefit from this research every spring, as I try to convince my own immune system that dust and pollen don't really pose a life-threatening danger.
Fig. 22.5 Histamine receptor. The histamine receptor is the target of antihistamine drugs, such as the first-generation drug doxepin. Unfortunately, the drug also binds to other receptor proteins, which leads to side effects like drowsiness. By using atomic structures like this, researchers are designing new drugs that block only the histamine receptor (PDB entry 3rze)
Chapter 23 Reconstructing HIV
We're at a very exciting place in the study of molecular biology. Using the highly successful deconstructive approach of science, we understand many pieces of the puzzle, and we're starting to put them all together to reveal the overall picture. I have spent much of this book talking about these individual pieces, each fascinating in its own right. But things get really exciting, and challenging, when we start to put it all together.
As part of the AIDS-Related Structural Biology Program sup-
ported by the NIH, I put together a series of illustrations to capture
the current understanding of HIV and its life cycle. The goal of
these illustrations is to integrate what is known: structures of the
pieces, how they fit together, and how they orchestrate infection
and reproduction of the virus. HIV is arguably the best understood
of any organism, but there are still many mysteries yet to be solved.
A second goal of these illustrations is to identify these gray areas.
I started the story at the point of infection (Fig. 23.1). The
surface of the virus is studded with several copies of the envelope
glycoprotein. Its job is to find appropriate cells and then force the
viral genome inside. It has been known for some time that this pro-
tein recognizes key proteins on the cells of our immune system,
which is why HIV primarily attacks them. The primary target is
CD4, a protein that normally assists in recognition of pathogens by
the immune system. Then, the glycoprotein makes a secondary
interaction with CCR5 and similar GPCR proteins, which triggers
the transition that leads to fusion of the virus with the cell.
I've tried to capture several aspects of this process in the painting. Evidence from fluorescence microscopy indicates that the envelope glycoproteins are mostly clustered on one side of the virus, so I've included several attaching at one time. Atomic structures have been determined for portions of the glycoprotein and its complexes with receptors and antibodies, but often these are solved after engineering out particularly flexible loops and chopping off most of the protective coat of polysaccharides. The details of the portions on the inside of the virus are still a matter of some speculation; I've drawn them based on results from spectroscopy that show the tails as short helices lying on the inner surface of the membrane. There have also been some controversial results on the portions that cross the membrane: some results from EM show them all as a single stalk, while others show a tripod structure.
Once the virus gets inside the cell, it quickly gets to work. The
first task is performed by the enzyme reverse transcriptase, which
creates a DNA copy of the viral genome, which is carried in two
RNA strands in the infectious form of the virus (Fig. 23.2).
Current understanding of the virus sees this happening inside a
more-or-less intact capsid, which presumably protects the viral
RNA strands from RNA-cutting enzymes in the cell. The whole
thing is transported to the nucleus, and the capsid falls apart at
some point along the journey. There are also many interesting wrin-
kles to the process. For instance, a human transfer RNA is used as a
primer to get the process started, and the viral nucleocapsid pro-
tein, which is present in many copies bound to the RNA, assists
with keeping everything unfolded and ready to be copied into DNA.
Fig. 23.1 Infection by HIV. HIV is shown at the top and a target cell is shown at the bottom in blue. HIV envelope protein (1) has bound to the receptor CD4 (2) and then to coreceptor CCR5 (3), causing a change in conformation that inserts fusion peptides into the cellular membrane. This ultimately leads to fusion of the virus with the cell membrane
The picture also includes several ways that the virus protects itself. The capsid is dotted with a cellular protein, cyclophilin A. It blocks the binding of a cellular antiviral protein that works by coating the capsid and stopping it from releasing the viral DNA. The virus also injects the protein Vif, which attacks the cellular protein APOBEC. APOBEC normally modifies bases in the viral genome as it is copied into DNA, inactivating it before it can be used to build new viruses.
The first big advances in the fight against HIV were achieved in
the late 1980s, using the classic deconstructive approach of molecu-
lar biology on reverse transcriptase and the two other viral enzymes
encoded in its genome. These viral enzymes are attractive targets for
drug therapy because they play essential roles in the viral life cycle,
and there is abundant precedent for creating inhibitors to block
enzymes like these. So they were purified, crystallized, and studied at atomic resolution. The structures of these enzymes were key to the discovery of the effective anti-HIV drugs in current use.
Fig. 23.2 Reverse transcription. After the capsid has entered the cell, reverse transcriptase (1) creates a DNA copy (green) of the HIV RNA genome (yellow), using a cellular transfer RNA (2) as a primer. HIV nucleocapsid protein (3) acts as a chaperone to unfold the RNA secondary structure. The ribonuclease activity of RT removes the viral RNA after the DNA strand is created. Interaction of HIV Vif (4) with cellular APOBEC (5) is also shown
Reverse transcriptase inhibitors are key weapons on the front line of the battle against HIV. Two approaches have proven effective. The first
attacks the enzyme at its central machinery. Drugs like AZT look like
typical nucleotides, but when they are added to the growing viral
DNA by the enzyme, they terminate the chain and stop the virus
from propagating. Structures have captured the process in motion,
seeing both the binding of AZT and the DNA chain after it is termi-
nated. The second approach attacks reverse transcriptase from the
opposite side. Structures of the enzyme revealed a deep pocket in the
enzyme. When this is filled with a drug, such as nevirapine, it freezes
the essential motions of the enzyme, blocking its function.
Fig. 23.3 Integration of the viral DNA. Uncoating of the viral capsid (shown at the top) and interaction with nuclear pore proteins such as Nup358 (1) release the viral DNA (2). The DNA enters the nucleus through the nuclear pore (shown in purple) and is spliced into the cellular genome by the enzyme HIV integrase (3). Cellular protein LEDGF (4) is important for localization of the site of integration at DNA in nucleosomes (5)
Once a DNA copy of the viral genome is made by reverse transcriptase, it is transported into the nucleus and HIV integrase splices it into the cell's own DNA (Fig. 23.3). This is the secret weapon of HIV, and one of the reasons it has been so hard to eradicate. This copy stays hidden in the cell's DNA, so infected cells are virtually invisible to our intrinsic protective systems and can lie
dormant for many years. The search for an HIV cure (rather than a
treatment) has focused on ways to attack this integrated viral DNA,
either by coaxing dormant infected cells out in the open or using
molecular weapons to get inside these infected cells and destroy the
viral DNA.
This process of integration is one of the least understood steps of the viral life cycle, and researchers are busy at work studying it. The process of import into the nucleus is not well understood, so I have kept it very simple in the painting. It almost certainly involves many other proteins that assist with the import, which are not shown. Additional cellular proteins are also needed to assist with the actual integration. I have shown one here that has been well studied, termed LEDGF/p75. It directs the site of integration toward DNA wrapped around nucleosomes, targeting integration into cellular chromosomes.
Fig. 23.4 Transcription of viral RNA. HIV Tat protein (1), bound to the TAR RNA stem-loop structure, binds to the P-TEFb complex (2), activating transcription by RNA polymerase (3). The illustration also shows HIV Rev (4) bound to the Rev response element and CRM1 (5), a cellular protein involved in transport through the nuclear pore
Once the HIV DNA is integrated into the cell's genome, the virus co-opts the cell's own transcription machinery to build new viral RNA (Fig. 23.4). The virus faces a big hurdle here: the cell has a very complex system that regulates the transcription of RNA, and the virus needs to hijack this system to make its own RNA. It does this by inserting one small protein, Tat, into the process. Transcription normally has a checkpoint: RNA polymerase stalls soon after it starts if it doesn't get the right signals to continue. Tat short-circuits these controls and gives the signal to go.
Fig. 23.5 Construction of viral proteins. The HIV Gag polyprotein (1, shown in red) is built from the HIV RNA genome (in yellow) by cellular ribosomes (2). A short hairpin loop in the genome (3) induces a frameshift roughly 5% of the time, producing the longer Gag-Pol protein (4).
The painting also includes a viral protein involved in getting the viral RNA out of the nucleus. This protein, Rev, needs to bridge between the viral RNA and the cellular proteins that guide molecules through the nuclear pore. In the painting, I have drawn it based on a "jellyfish" model of the protein. Other researchers, however, see it as being a bit more compact than this.
Viral genomes are typically very compact, and they need to fit a lot of information in a small space. The HIV genome is no exception. One example of this is seen when the viral RNA is used to build viral proteins (Fig. 23.5). The virus builds its major proteins in two forms, using the cell's own ribosomes to do the job. The smaller viral protein, called Gag, includes the proteins that direct the budding of the virus and ultimately form the structure of the mature virus. About one time in twenty, however, a longer protein termed Gag-Pol is made, which includes these same proteins but with the three HIV enzymes added to the end. All of this is encoded in one long gene in the viral RNA, but at the end of the portion that encodes Gag, there is a special sequence that forms a little hairpin loop. This loop is just strong enough to stall the cell's ribosomes as they are creating the protein. Most of the time, the ribosome stops at the end of the Gag portion and falls off, releasing the shorter Gag protein. Occasionally, however, the stalled ribosome shifts its reading frame, manages to read through the loop, and creates the longer Gag-Pol protein.
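The one-in-twenty ratio can be made concrete with a small simulation. This is a hedged sketch rather than a model of real ribosome kinetics: it assumes that each ribosome acts independently and shifts frame with a fixed probability of 0.05 (the rate quoted in the caption of Fig. 23.5). The function name simulate_translation is hypothetical.

```python
import random


def simulate_translation(n_ribosomes: int, frameshift_rate: float = 0.05,
                         seed: int = 0) -> tuple[int, int]:
    """Count Gag and Gag-Pol products made by n independent ribosomes.

    Simplifying assumption: at the hairpin, each ribosome shifts frame
    with probability frameshift_rate and continues into Pol (Gag-Pol);
    otherwise it stops at the end of Gag and releases Gag.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    gag_pol = sum(1 for _ in range(n_ribosomes) if rng.random() < frameshift_rate)
    gag = n_ribosomes - gag_pol
    return gag, gag_pol


gag, gag_pol = simulate_translation(100_000)
# With a 5% rate, the expected Gag:Gag-Pol ratio is 0.95 / 0.05, about 19.
print(gag, gag_pol, round(gag / gag_pol, 1))
```

Because each ribosome is treated as an independent coin flip, the Gag-Pol count follows a binomial distribution, so the observed ratio fluctuates slightly around 19 from run to run.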
The Gag and Gag-Pol proteins assemble on the inner surface of the cell membrane, guiding the process of budding that produces new viruses (Fig. 23.6). This requires the assistance of many of the cell's own molecules to orchestrate the assembly, budding, and pinching off of the virus. This interplay of viral and cellular proteins is currently a major topic of study in the HIV biology community, as we try to understand the process and look for weak points that can lead to new treatments and cures. Many of the details still need to be resolved. I have included a few aspects in the illustration. The end of the viral RNA has a complex structure that dimerizes (to ensure that two copies of the genome end up in the virus) and captures the cellular transfer RNA that will prime reverse transcription, as well as the Tat protein that will promote transcription. Cyclophilin A is captured and will end up on the surface of the capsid, and as the whole thing buds out, bystander cellular proteins are swept up in the membrane and in the interior of the virus.
The final step of the life cycle is maturation, converting the newly budded immature form of the virus into the infectious mature form (Fig. 23.7). This process is orchestrated by a small viral enzyme: HIV protease. It cuts the Gag and Gag-Pol proteins into their functional pieces. The timing of this is critical. Some of the cuts need to be made first to ensure that everything assembles in the proper order. I have drawn the painting at two stages. The lower virus is just getting started, and the first cut is separating the structural proteins from the portions bound to the viral genome. The upper virus is at the very end of the process. All of the proteins have been processed, and they are assembling into the distinctive cone-shaped capsid surrounded by a spherical membrane envelope.
Fig. 23.6 Virus budding. HIV Gag protein (1) and Gag-Pol (2) form arrays on the cell surface, capturing two copies of the HIV genome (in yellow), which dimerize through a specific sequence (3) and bind to a cellular transfer RNA (4) that will act as primer for reverse transcription. Viral proteins Vpr (5) and Vif (6) are also incorporated. Several cellular proteins of the ESCRT system (7) are involved in the process of budding.
HIV protease is one of the major targets for drug therapy. To discover these drugs, scientists started with molecules that look very much like the viral proteins that the enzyme cuts. Then they tinkered and tweaked these molecules until they found a version that binds strongly to the enzyme but also has properties that allow it to be taken as a drug (see Fig. 17.4). This process is ongoing, continually improving the drugs and adding to our arsenal in the fight against AIDS. As I write this chapter, there are close to a thousand structures of HIV protease, capturing it in its many guises.
Currently, the most effective treatment plans combine these protease inhibitor drugs with drugs that bind reverse transcriptase and, increasingly, with additional drugs that attack integrase or other steps in the life cycle. This is our primary defense against viral drug resistance. By hitting the virus hard at multiple places, we make it much more difficult for the viral population to evolve resistant forms, since a virus must become resistant to every drug in the combination at once to escape the treatment.
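The arithmetic behind combination therapy can be sketched under a deliberately simplified assumption: resistance to each drug arises independently in a given viral genome, with the same small probability. The per-drug probability of 1e-4 used here is a hypothetical number chosen only for illustration, not a measured value, and p_all_resistant is a hypothetical helper.

```python
def p_all_resistant(p_per_drug: float, n_drugs: int) -> float:
    """Probability that a single viral genome resists every drug at once.

    Hypothetical model: resistance to each drug arises independently,
    with the same probability p_per_drug, so the probabilities multiply.
    """
    if not 0.0 <= p_per_drug <= 1.0:
        raise ValueError("p_per_drug must be a probability between 0 and 1")
    return p_per_drug ** n_drugs


# Hypothetical per-drug resistance probability, for illustration only.
p = 1e-4
for n in (1, 2, 3):
    print(f"{n} drug(s): {p_all_resistant(p, n):.0e}")
```

Under these assumptions, each added drug multiplies the odds by another small factor, which is why a resistant variant is far less likely to appear against three drugs used together than against any one of them alone.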
Fig. 23.7 Maturation of HIV. This illustration shows an immature virion in the process of maturation at bottom right and a nearly mature virion at upper left. HIV protease (1) is cleaving the Gag and Gag-Pol proteins into functional proteins.
Based on this understanding of the viral life cycle, we hope to discover new ways to cure HIV-infected people and to create vaccines to protect people from infection. One of the exciting advances in the search for a vaccine has been a better understanding of the ways that the immune system recognizes the virus. It turns out that the immune system can build effective antibodies against HIV, but they are quite unusual, are difficult to develop, and typically appear only after the virus has taken a strong hold in the body (Fig. 23.8). These broadly neutralizing antibodies use unusual methods to recognize the slippery virus (see Fig. 21.7).
Fig. 23.8 Broadly neutralizing antibodies attack HIV. HIV is shown at lower right, with viral proteins in red and magenta, and viral RNA in yellow. Blood plasma is shown at the top and left side. Several broadly neutralizing antibodies (1) are binding to HIV envelope glycoprotein (2). Other viral proteins include matrix (3), capsid (4), reverse transcriptase (5), integrase (6), protease (7), Vif (8), and Tat (9).
I have spent many years taking an artistic approach to the challenge of integrative biology, creating paintings of the so-called mesoscale between the atomic structure of molecules and the ultrastructure of cells and their internal compartments. This is an exciting scale to explore, since it is largely (at least until recently) invisible to experimental observation. The goal is to create an illustration of a significant portion of a cell, showing all of the molecules in the proper place, size, and concentration. When I started creating these pictures in the 1990s, there was just barely enough information to support them. Today, however, biological information has exploded, and access to it is incredibly easy through resources like the PDB, UniProt, and PubMed. This type of mesoscale modeling is currently transitioning, in my laboratory and in many others, from an artistic, descriptive approach to a more quantitative one. The idea is to create computational methods, such as our program cellPACK, that integrate diverse information from molecular and cellular biology into a three-dimensional view of HIV (Fig. 23.9), a portion of a cell, or indeed, an entire living cell.
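Showing each molecule "in the proper concentration" starts with a simple conversion from concentration to copy number, N = c × V × N_A, where c is molar concentration, V is volume in liters, and N_A is the Avogadro constant. The sketch below is an illustrative calculation only and is not taken from cellPACK; the function name copies_in_volume and the example values (1 mM in a cube 100 nm on a side) are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_in_volume(concentration_molar: float, edge_nm: float) -> float:
    """Expected number of molecules of one species in a cubic region.

    concentration_molar: concentration in mol/L (hypothetical input)
    edge_nm: edge length of the cube in nanometers
    """
    edge_dm = edge_nm * 1e-8      # 1 nm = 1e-8 dm, and 1 dm^3 = 1 L
    volume_liters = edge_dm ** 3
    return concentration_molar * volume_liters * AVOGADRO


# Hypothetical example: a 1 mM protein in a cube 100 nm on a side.
n = copies_in_volume(1e-3, 100.0)
print(round(n))  # about 602 copies
```

A packing program would then need to place that many copies without overlap; the count itself is the part that ties the picture to measured concentrations.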
Fig. 23.9 Three-dimensional model of HIV. The cellPACK program (http://www.cellpack.org) was used to create a three-dimensional model of HIV and blood plasma based on atomic structures and models of the individual molecules. Image created by Mathieu Le Muzic and Ivan Viola from a model created by Ludovic Autin, Graham Johnson, and Arthur Olson.