David S. Goodsell
Atomic Evidence
Seeing the Molecular Basis of Life
David S. Goodsell
The Scripps Research Institute
and RCSB Protein Data Bank
La Jolla, California
USA
Preface
Why do I need a new flu shot each year? Should I be frightened by all
the news about bacterial drug resistance? What about that new diet I
just read about on the web? Biomolecular science is increasingly
important in our everyday life, helping us answer questions like these,
and giving us the knowledge to make critical choices about our diet,
our health, and our wellness. How do fireflies light up? Why do plant
and animal populations evolve over many generations? Biomolecular
science also allows us to be curious, to look deeper into the natural
world, and to be inspired by the complex inner workings of life.

In this book, I will take an evidence-based approach to current
knowledge about the structure of biomolecules and their place in our
lives, inviting us to explore how we know what we know and how current
gaps in knowledge may influence our individual approach to the
information. The book is separated into a series of short essays that
present some of the foundational concepts of biomolecular science,
with many examples of the molecules that perform the basic functions
of life.

This book builds on my work with the RCSB Protein Data Bank, where I
write a column each month that highlights atomic structures from the
PDB archive. It has been a tremendous gift to have the opportunity to
work on the Molecule of the Month, and I gratefully acknowledge Helen
Berman, Stephen Burley, Christine Zardecki, and the entire RCSB team
for their enthusiastic support over the past 15 years.

The molecular stories in this book are supported by a monumental body
of work by scientists around the world. Throughout the book, I have
included accession codes for structures at the PDB and EMDataBank. You
can explore the structures directly at their websites (www.pdb.org and
www.emdatabank.org). The database entries for each of these structures
also include the primary journal publications that describe the
detailed science supporting each structure.

David S. Goodsell
San Diego, CA, USA
Contents
12 Green Energy
We're very lucky that today we can go to our computers and instantly
start exploring a hundred thousand atomic structures of biomolecules.
The structural biology community has spearheaded a comprehensive
effort to make the results of biostructural research freely
available to everyone. In 1971, a group of scientists at the Brookhaven
National Laboratory started an archive of atomic structures, called
the Protein Data Bank, as a way to make these structures available.
The first archive contained the seven protein structures that were
available at the time. Today the archive has grown to over a hundred
thousand entries and is managed by centers around the world:
RCSB and BMRB in the USA, PDBe in Europe, and PDBj in Japan.
Together, they have created online interfaces to this massive archive,
providing tools to deposit, curate, find, analyze, and visualize the
structures.
This wasn't always the case, however. In the early days of structural
science, many researchers chose to keep the primary results of
their work, the atomic coordinates, secret. Instead of making these
available, they published only pictures of their structures and
descriptions of their own ideas about the structure and function.
Arguably, this was justified: because these structures require so
much effort to solve, these researchers wanted to have the freedom
to analyze them completely themselves. Many researchers, however,
felt that this policy went against the spirit of science, where results
are made available and may be used by the entire community to
build a more complete picture of our world. And perhaps more
importantly, results need to be made available to allow other
researchers to check their authenticity and reproduce any scientific
insights gained from them.
For this reason, with the support of many researchers, Fred
Richards drafted a letter in 1988 to the major government institutes
funding science, requesting a policy that crystallographic data be
made available, at least for all research supported by public funds.
The effort was ultimately successful, and today, deposition of coor-
dinates and data in a public database is typically a mandatory con-
dition for funding of grants as well as for publication of results in
many prominent journals.
The widespread availability of coordinates has transformed the
study of molecular biology. Each structure is a window into a par-
ticular topic, allowing us to see the atomic details of biomolecular
processes. But that is only the beginning. An entire field of structure-
based drug design has been built upon these structures, allowing
the discovery of new pharmaceuticals to fight everything from HIV
to depression. Comparison of many different structures has led
directly to new insights about the general principles for biomolecu-
lar structure and function and the evolution of these molecules, and
these insights have blossomed into an entire field of protein design
and biotechnology.
Today, we can download atomic structures for nearly any biological
molecule we would be interested in exploring, from tiny hormones
to huge viruses (Fig. 1.1). Most of the illustrations in this
book are created directly from atomic coordinates from the PDB or,
in some cases, from the experimental data supporting the atomic
Chapter 1 The Protein Data Bank
Fig. 1.1 Selected structures from the Protein Data Bank. The Protein Data Bank archives atomic structures of
biomolecules such as proteins, DNA, and RNA. A few familiar examples are shown here. Three small molecules are shown
for size comparison: (1) water, (2) glucose, and (3) ATP. Proteins in the blood: (4) antibody, (5) insulin, and (6) fibrinogen.
Digestive enzymes: (7) lysozyme, (8) pepsin, and (9) amylase. A virus: (10) rhinovirus. Membrane-bound proteins: (11) ATP
synthase, (12) adrenergic receptor and G-protein, (13) potassium ion channel, and (14) photosystem II. A few interesting
proteins: (15) hemoglobin, (16) green fluorescent protein, (17) luciferase, and (18) ribulose-bisphosphate carboxylase
oxygenase. Molecules involved in protein synthesis: (19) ribosome, (20) transfer RNA, (21) aminoacyl-tRNA synthetase,
(22) protein chaperone GroEL/GroES, and (23) ubiquitin. A few enzymes: (24) catalase, (25) nitrogenase, and (26) leucine
aminopeptidase. Proteins that bind to DNA: (27) repair protein DNA photolyase, (28) topoisomerase, (29) RNA
polymerase, (30) lac repressor, (31) catabolite gene activator protein, and (32) transcription factor complex. Iron storage
protein (33) ferritin and three enzymes involved in sugar metabolism: (34) hexokinase, (35) phosphofructokinase, and
(36) pyruvate kinase (PDB entries 1igt, 2hiu, 1m1j, 2baf, 1lz1, 5pep, 1smd, 4rhv, 1e79, 1c17, 3sn6, 3lut, 1s5l, 4hhb, 1g,
2d1s, 1rcx, 1j5e, 1jj2, 4tna, 1y, 1aon, 1ubq, 1qqw, 1n2c, 1lap, 1tez, 1a36, 1tlf, 1efa, 1cgp, 1ais, 1hrs, 1dgk, 4pfk, 1a3w)
Seeing Is Believing: Methods of Structure Solution
Fig. 2.2 Resolution of crystallographic electron density maps. Three electron density maps of DNA are shown here.
At the upper left is a very high-resolution structure, where every atom is resolved, and we can even see hints of
hydrogen positions. At the lower left is a more typical map, similar in resolution to most of the structures in the PDB. The
overall shape of the bases and backbone, as well as a beautiful hydrated magnesium ion, is easily discernible, but
individual atoms are not resolved. At the right is a low-resolution structure, which is sufficient to place the overall shape
of the double helix, but not resolve the individual nucleotides (PDB entries 4hig, 196d, 3gbi, maps taken from the
Uppsala EDS server)
Fig. 2.3 Pitfalls of the PDB. ATP synthase (left) is a rotary motor with several moving parts. The whole assembly has
not been crystallized yet, but structures have been obtained by cutting it into several more or less rigid pieces.
G-protein-coupled receptors were an elusive target for many years, until researchers engineered a version with an entire
lysozyme protein grafted into one loop. The lysozyme assists in the formation of crystals (right) (PDB entries 1c17, 1e79,
1l2p, 2a7u, 2rh1)
Fig. 3.1 Some experiments in molecular visualization. Left: the Evans and Sutherland Multipicture System allowed
interactive display of dots and lines and was widely used by crystallographers to interpret their experimental electron
density maps. This image shows a cross section through a DNA molecule, with lines to show the bonds between atoms
and dots to show the surface of the molecule. Center: pen plotters were used to create illustrations for journal
publications, where most figures were printed in black and white. This illustration shows all of the sites of interaction
between this DNA molecule and its neighbors in the crystal lattice. We often printed stereopairs like this to provide
(with a little practice) a three-dimensional view. Right: raster images, which are used for almost everything today, were
quite slow when they were first developed. This illustration of DNA took almost an hour to calculate
Chapter 3 Visualizing the Invisible World of Molecules
great advantage of creating a picture directly from the experimental
structure, so the image is true to the actual data. The artistry comes
in when we design the best way to capture a particular aspect of our
molecular subjects.
Fig. 3.2 Modern molecular graphics programs. Dozens of effective programs for molecular graphics are freely
available to explore molecules on your computer. Top: Python Molecular Viewer is a modular molecular graphics
program with many sophisticated methods for displaying molecules, electron density maps, and other aspects of
molecular structure. Bottom: JSmol is the most popular method of embedding molecular graphics into web pages.
For instance, it is used at the RCSB Protein Data Bank site to allow instant viewing of any of the structures stored in the
archive, and as shown here, in the Molecule of the Month column at the RCSB site. Many of the illustrations in the book
were created with these two programs
Fig. 3.3 Visual representations of biomolecules. These three representations of myoglobin, capturing different
aspects of the molecule, were created with JSmol at the RCSB PDB website. Left: a bond diagram shows the atomic
details of oxygen binding to iron. Thin bonds are used for the protein, and balls and sticks are used to show the heme,
iron, oxygen, and an important histidine amino acid. Center: a spacefilling diagram shows how the heme fits in a
form-fitting pocket. Right: a cartoon diagram shows how the chain folds into a series of alpha helices that surround the
heme (PDB entry 1mbo)
Fig. 3.4 Common coloring conventions for proteins. Left: hemoglobin is colored by atom type, based on the
scheme developed by Linus Pauling. Center: backbone diagrams often color individual chains differently, to show
how they assemble together. Right: a rainbow scheme makes it easier to follow the chain from one end to the other
(PDB entry 2hhb)
Fig. 3.5 The danger of defaults. These images show a structure of DNA bound to a sequence-reading antibiotic,
displayed using the default settings in several visualization methods available on the RCSB PDB website. From left to
right, the default static image at the PDB structure summary page, the JSmol default image, and the PV default image.
All of these are great for getting a quick feeling for the structure, but with a small amount of work, you can customize
the image to highlight features of interest. For instance, I created the image on the right in JSmol to highlight the
perfect fit of the antibiotic (blue) within the narrow minor groove of the DNA (PDB entry 6bna)
Fig. 4.1 Packing of DNA in crystals. B-form DNA has almost exactly 10 nucleotides per turn of the double helix.
Early structures were determined using chains that were 12 nucleotides long, so they formed an odd crystal lattice with
the ends of the chains overlapped. Later structures shortened the chain to 10 nucleotides, which stacked beautifully to
simulate a long DNA double helix (PDB entries 1bna, 196d)
Chapter 4 The Twists and Turns of DNA
Fig. 4.2 X-ray diffraction of DNA. The x-ray diffraction image at the upper
left is from fibers of natural DNA, and the pattern at the lower left is from a
crystal of a small piece of synthetic DNA. Both show the distinctive pattern of
DNA diffraction, with a strong signal above and below the center and an
X-shaped feature closer to the center. The strong diffraction is produced by the
regular stacking of bases in the DNA helix (horizontal lines at right), and the
X-shaped pattern is produced by the helical arrangement of the backbones
(diagonal lines at right)
termed the A-helix. Hybrid double helices, with one DNA strand
and one RNA strand, also form this type of helix, for instance,
when HIV is building a DNA strand with its reverse transcriptase.
Some special sequences of DNA, for instance, DNA with alternat-
ing cytosine and guanine nucleotides, can be induced to form a
helix with the opposite handedness, termed the Z-helix. It's still
not known if this is an experimental oddity or if it actually plays a
functional role in cells, for instance, helping to relieve stress in the
double helix when it is pulled apart when being duplicated or cop-
ied. If you search around the PDB, you can also find several exotic
structures of DNA, such as odd X-shaped Holliday junctions
formed during the process of recombination and tough quadru-
plex blocks of guanine bases that may seal off the ends of chromo-
somes in telomeres.
Fig. 4.6 Tour of an electron density map. The upper figure shows the hexagonal crystal lattice, with one DNA helix in
red. Notice that there are large open channels between the helices, which are filled with disordered water molecules
that don't give a strong signal in the electron density. At bottom left, a spine of water molecules (small red spheres) fills
the narrow minor groove. At bottom right is an AT base pair and a calcium ion (purple) surrounded by a coordination
sphere of seven water molecules (PDB entry 158D)
Fig. 5.1 Bacterial replisome. The replisome includes several proteins interconnected
with flexible linkers. The DNA is shown in blue, and the newly synthesized strands of DNA
are in white. Helicase (1) separates the two strands, and primase (2) builds a short piece
of RNA (green) on one strand to act as a primer. The clamp loader (3) encircles the DNA
strand with a sliding clamp (4), which improves the processivity of DNA polymerase (5).
DNA synthesis by DNA polymerase on the leading strand (6) proceeds continuously, but
it builds short segments on the lagging strand (7) since the lagging strand is oriented in
the opposite direction. Single-stranded DNA-binding protein (8) protects the lagging
strand, while the DNA copy is being made (image created in collaboration with Jacob
Lewis and Nicholas Dixon, University of Wollongong)
Chapter 5 The Central Dogma
Fig. 5.2 RNA polymerase. This structure of a yeast RNA polymerase (blue)
includes two strands of DNA (orange) that have been opened up to form a
transcription bubble and a short piece of RNA (red) being transcribed. A
magnesium ion (green) assists with the addition of each new nucleotide to the
growing RNA chain (PDB entry 5c4j)
Fig. 5.3 Ribosomes in action. Three atomic structures capture ribosomes (blue and green) in the process of
building a protein chain. Elongation factor Tu (magenta) delivers a new transfer RNA (yellow), pairing its anticodon with
the messenger RNA (red) codon. The ribosome then catalyzes the formation of the peptide bond; the structure at the
center includes two transfer RNA molecules with amino acids attached (bright green) and positioned in the catalytic
site. Finally, elongation factor G (magenta) shifts everything by one codon (toward the right in this illustration), opening
a space for the next transfer RNA (PDB entries 4v5g, 4v5d, 4v5f)
knew that some viruses carry their genome in RNA and thus would
need a machine to create a duplicate RNA strand from an RNA
template. Additional study revealed that other viruses carry their
genome in RNA but create a DNA copy once they get inside the cell
and start wreaking havoc. These retroviruses, such as HIV, use a
reverse transcriptase enzyme to reverse the canonical information
flow (Fig. 5.5).
Lest it seem that this heretical use of information is limited to
viruses and other evil organisms, we only need to look to our own
cells to find an example of reversed information flow. The enzyme
telomerase contains a small piece of RNA that it uses as a template
to build long repeated sequences of DNA that protect the ends of
Fig. 5.5 Reverse transcription. Bacterial DNA polymerase (left) and HIV
reverse transcriptase (right) are both shaped like a hand, with fingers and
thumb that wrap around the nucleic acid strands. DNA polymerase performs
the classic reaction, creating DNA using a DNA template, and reverse
transcriptase performs the unusual reaction of creating DNA using an RNA
template. Both structures were determined with short pieces of DNA (orange)
and/or RNA (red) bound in the active site (PDB entries 1tau, 4pqu)
Fig. 6.1 Base pairing in transfer RNA. Transfer RNA is stabilized by many traditional base pairs, such as the two
shown on the left, but it has also evolved to incorporate unusual pairings to stabilize the structure. In the base pair at
the upper right, the adenine base has an extra methyl group, causing it to flip in its interaction with the uracil. In the
triplet at the lower right, a normal A-U base pair is joined by a second adenine (PDB entry 1tra)
Chapter 6 The Secret of Life: The Genetic Code
Mispairing of bases also plays an essential role during the syn-
thesis of proteins. The 20 amino acids in proteins are encoded by
triplet codons in DNA, along with a few codons used to specify the
end of a protein. Doing the math, we see that there are 64 possible
codons, so there is some degeneracy to the code, and several dif-
ferent codons are used to specify the same amino acid. However, if
we look inside the nucleus, we find that there are only 20 or so
types of transfer RNA that match up each amino acid
with its codons. This requires some mismatching of the transfer
RNA anticodon with all of these different codons. This is accom-
plished by allowing some wobble in the third position of the
codon, so that different pairings are allowed. When the structures
of ribosomes were solved, it was found that the first two bases in
the codon are tightly controlled by the ribosome, ensuring only
the proper pairing, but the third position is looser, allowing some
wobble (Fig. 6.2).
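
The arithmetic in this paragraph can be checked with a few lines of Python. This is only an illustrative sketch, not something from the book: the names BASES, codons, and group_by_first_two are made up for the example. It lists every triplet of the four RNA bases and then groups the triplets by their first two bases, leaving the third (wobble) position free.

```python
from itertools import product

# The four RNA bases; the name and ordering are a choice made for this sketch.
BASES = "UCAG"

# Every possible triplet of bases, i.e. every possible codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def group_by_first_two(codon_list):
    """Group codons that share their first two bases (third base left free)."""
    groups = {}
    for codon in codon_list:
        groups.setdefault(codon[:2], []).append(codon)
    return groups


boxes = group_by_first_two(codons)

print(len(codons))  # 4 * 4 * 4 = 64 possible codons
print(len(boxes))   # 4 * 4 = 16 groups that differ only at the third base
print(boxes["UU"])  # the four codons that start with UU
```

Each group holds the codons that a single transfer RNA could tell apart only at the third position. UUU and UUC, the two phenylalanine codons, fall in the same group, which is why some looseness at that position lets one transfer RNA read both.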
The story continues to build from the pairing and mispairing of
bases: many additional levels of information are layered on top of
this. One edge of each DNA base is involved in base pairing, but this
leaves additional hydrogen-bonding groups exposed in the two
grooves of the double helix. These base edges are recognized by the
many proteins that regulate the use of DNA information. These pro-
teins reach into the grooves and feel for specific sequences of DNA
(Fig. 6.3). Researchers have searched unsuccessfully for a general
code (something akin to the pairing of A with T and G with C) to
understand how these are recognized. Rather, each protein seems to
Fig. 6.2 Wobble in codon-anticodon pairing. Transfer RNA molecules (shown in red) are able to recognize several
different codons by allowing some wobble on the third base. These two structures show the transfer RNA that carries
phenylalanine paired with the two codons that specify phenylalanine, UUU and UUC. The traditional base pair is
formed with UUC, and a wobble base pair is formed with UUU. The ribosome was also included in both of these
structures, revealing that it surrounds the bases and monitors the base pairing. The ribosome is not shown here, for
clarity (PDB entries 1ibl, 4v9d)
Fig. 6.3 DNA recognition by proteins. The basic principles of DNA recognition by proteins were discovered in early
structures of regulatory proteins from bacteriophages, such as the lambda repressor structure shown here (PDB entry
1lmb)
Fig. 6.4 Sequence-reading molecules. The toxic bacterial antibiotic netropsin, shown on the left in blue, reads A-T
base pairs in DNA. As shown in the center, it forms hydrogen bonds (green lines) with the base edges, positioning a
carbon atom (star) near the A base. If this base were G instead, it would have an amino group (shown in blue with the
letter N) that would clash with the netropsin carbon atom. Sequence-reading molecules are being designed by
replacing this carbon atom with other atoms, such as nitrogen. Two of these designed molecules typically bind side by
side in the DNA groove, as shown on the right, each reading one of the bases in the base pairs (PDB entries 6bna, 365d)
Fig. 6.5 DNA modification. In the upper structure, HhaI methylase is captured in the process of adding a methyl
group to a short piece of DNA. The enzyme has flipped the base out of the double helix and is using a cofactor (in green)
to donate the methyl group. In the lower structures, EcoRV endonuclease is caught before and after its cleavage
reaction. In the structure with cleaved DNA, the site of cleavage is shown with two stars (PDB entries 1mht, 1rva, 1rvc)
Fig. 6.6 BRCA2. BRCA2 is a huge, flexible protein involved in DNA repair. Several structures of different portions of
the protein were used to assemble this illustration of the protein bound to a single strand of DNA (red) and the repair
protein Rad51 (blue) (PDB entries 1miu, 1n0w)
Fig. 6.7 Adenovirus. Adenovirus is composed of an icosahedral protein coat (blue) with long filaments at the
vertices (green). The filaments help the virus attach to the cells that it infects (PDB entries 1vsz, 1qiu)
heated up, separating the sample strand from the duplicated strand.
Then, the polymerase duplicates both of these. Repeated rounds of
heating and replication create many identical copies of the
DNA. The whole thing is made easier by using this heat-resistant
polymerase, so you don't have to add new enzyme with each round.
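
To get a feeling for how quickly repeated rounds add up, here is a minimal Python sketch. It is not from the book, and it assumes an idealized reaction in which every round exactly doubles the number of copies, which real reactions only approximate. The function name copies_after is invented for the example.

```python
def copies_after(rounds, starting_copies=1):
    """Idealized count: every round of heating and replication doubles the DNA."""
    return starting_copies * 2 ** rounds


# Print the idealized copy number after a few choices of round count.
for rounds in (1, 10, 30):
    print(rounds, copies_after(rounds))
```

Under this assumption, ten rounds turn one copy into about a thousand, and thirty rounds into about a billion.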
Another great hope, building directly on our knowledge of the
genetic code, is gene therapy: the ability to replace faulty genes in
the cells of a patient, curing genetic diseases at their source. Once
we've identified the problem, synthesizing the corrected DNA is
fairly straightforward, but the tricky part of this is finding a way to
get the genes into the afflicted cells. Today, this is primarily done by
creating an engineered virus, such as adenovirus (Fig. 6.7), that
infects the cell and inserts the therapeutic DNA in the process. In
this way, we're taking the reins from evolution and taking personal
charge of our own genetic information.
Evolution in Action
I can safely say that evolution is a familiar thing to me. After growing
up with many visits to the museum and after gathering a small
personal collection of fossil fish and insects trapped in amber, I can
easily imagine a world very different from ours, with dinosaurs
roaming through a forest of giant ferns and giant dragonflies. With
the help of ancient pot shards and stone knives, I can imagine hair-
ier versions of myself discovering fire and hunting mammoths. I
can even imagine tiny cells, newly minted, colonizing the early
Earth and gradually, over millennia, flooding it with oxygen to cre-
ate the world we live in today.
We often take this type of historical view of evolution. Looking
at the similarities and differences between living organisms and
comparing them with fossil remains of ancient organisms, we
reconstruct the gradual changing of life on Earth over millions of
years (Fig. 7.1). But evolution is continuing today, naturally and
through our own intervention. You only need to visit a rose garden
filled with huge blooms, or compare a wolf with the many different
breeds of dogs being walked in your local park, to see the results of
human-driven evolution. Looking at the atomic structures of bio-
molecules, we can find abundant evidence for the history of evolu-
tion, and we can also watch evolution in action today.
What is evolution? Evolution is a unique process that produces
increasingly better organisms, but without the need for any intelligent
intervention. It's no wonder that the theory of evolution caused
so much consternation when proposed by Charles Darwin, since it's
so different from anything in our familiar lives. It goes against our
intuition, since we're used to planning and designing when we build
things ourselves. But biological evolution takes a less directed, but
highly successful, approach.
For evolution to work, a few things are needed. First, evolution
requires a population of individuals that reproduce to create children:
evolution doesn't work on a single organism, but rather
works over many generations. Next, a source of variation in the
population is needed, with traits that are passed from parents to
children. In natural evolution, this variation is random and happens
through mutation of DNA. Finally, evolution requires a source of
selection that favors the best individuals in the population. Given
these things (selection of a population that has inheritable variations),
the population will gradually change and improve as the
best individuals dominate and the weaker ones lose out.
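
The three ingredients listed above (a reproducing population, inherited variation, and selection) can be put together in a toy simulation. The Python sketch below is only an illustration under simple assumptions, not a model from the book: a single number stands in for how well each individual does, the better-scoring half of the population survives each generation, and every survivor leaves two children whose trait is copied with a small random change. The function name evolve and all of its parameters are invented for the example.

```python
import random


def evolve(generations=50, size=100, mutation=0.1, seed=0):
    """Toy model: return the average trait value after each generation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    population = [rng.gauss(0.0, 1.0) for _ in range(size)]
    history = [sum(population) / size]
    for _ in range(generations):
        # Selection: the better-scoring half survives to reproduce.
        survivors = sorted(population, reverse=True)[: size // 2]
        # Inheritance with variation: each survivor leaves two children,
        # each copying the parent's trait plus a small random mutation.
        population = [
            parent + rng.gauss(0.0, mutation)
            for parent in survivors
            for _ in range(2)
        ]
        history.append(sum(population) / size)
    return history


history = evolve()
print(f"average trait: start {history[0]:.2f}, end {history[-1]:.2f}")
```

No individual in the sketch plans anything, yet the average trait climbs generation after generation, because random variation is filtered by selection and then passed on.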
Darwin developed his theory of evolution after observing varia-
tion in populations, such as the natural variations in finches on the
Galapagos Islands or the many types of pigeons bred by fanciers.
Every time I work in the garden, I try to make similar connections
between the shapes of the flowers, figuring out how the differences
might have improved their competitiveness as they were evolving.
Evolution is also firmly in my mind every time I pull weeds: I
always feel like I'm selecting and evolving a breed of weeds that are
best suited to elude me and my shovel.
Darwin observed these variations in populations of birds, but at
the time, it wasn't known how the variation occurs or how it is
passed to offspring. The discovery of DNA and genetic information
was the missing piece of the puzzle for understanding the mecha-
nisms of biological evolution. Organisms gain variability through
mutation of their DNA. Natural radioactivity from the environment
or errors in copying the DNA introduce small changes into the
genome, which then cause small changes in the proteins that are
encoded. In some cases, very small changes in the genome can
have large effects in the form and behavior of the organism. Most
cause problems, and scientists have uncovered countless examples
of point mutations that corrupt the function of a protein and lead
to a disease state or death. But in some cases, the mutation leads to
an improved form of a protein, and a competitive benefit for the
organism.
The classic example is sickle cell anemia, which surprisingly is
both a loss-of-function mutation and a beneficial mutation. A
single mutation in the gene for hemoglobin, which changes a
charged glutamate to a hydrophobic valine, has wide-ranging effects. It creates
a small sticky spot on the protein, which causes it to form long
filaments under some conditions. These filaments distort the
blood cells and cause life-threatening circulatory problems. But at
the same time, the filaments inhibit infection by the parasites that
cause malaria, so the mutation also provides a selective advantage
in areas where malaria is a danger, for people who carry only one
copy of the gene for the mutated protein. Structures are available for both the
unmutated form and the mutated form of hemoglobin, revealing
how this tiny mutation can induce the formation of fibers (Fig. 7.2).
Fig. 7.3 HIV resistance mutation. Four structures of HIV protease follow the evolution of drug resistance. The
enzyme is composed of two identical chains, so each mutation (shown in red) shows up in two places, on each half of
the complex. The drug (shown in blue) binds in a tunnel-shaped active site, gripped by two flaps that close over the top
(PDB entries 2az8, 2az9, 2azb, 2azc)
Fig. 7.4 Cytochrome c evolution. A family tree of our ancient ancestors may be created by counting up the
numbers of changes in the proteins found in modern organisms, identifying our close relatives and our distant relations.
Cytochrome c is shown here. Our molecule is in pink, with the bound heme group in bright red. Amino acids that have
changed to chemically similar amino acids are shown in lighter pink in the cytochrome c proteins from other organisms,
and amino acids that change to entirely different amino acids are in white (PDB entries 3zcf, 2b4z, 1hrc, 1cyc, 1ycc)
Fig. 7.5 Designer molecules. Scientists have used artificial evolution to discover small RNA molecules (aptamers)
that bind selectively to thrombin, an enzyme involved in blood clotting. A modular approach was used to design the
protein cage, by linking together existing proteins that form dimers and trimers (PDB entries 1hut, 4i7y, 3vdx)
that associate with known geometries (Fig. 7.5). Scientists have also
built chimeric molecules that combine functions. For instance,
antibody molecules have been tethered to deadly toxins to create
new cancer therapies. The antibody binds specifically to cancer
cells, allowing the toxin to kill them.
Of course, in any discussion of biological evolution, we naturally
find our imagination drifting back to the very beginning. People are
great speculators, and when it comes to events that happened in the
distant past, there are as many theories as there are scientists. The
origin of life is one of these topics that promote much discussion
and much disagreement. Many scientists have worked to gather evi-
dence for processes that could have generated life based solely on
the molecules and conditions that were present on the early Earth.
Many lines of evidence have pointed to RNA as being the first
living molecule on the Earth. Experiments in the laboratory have
shown that RNA molecules can be artificially evolved to perform
reactions or make copies of themselves. These may represent some
of the first steps toward life. Scientists have also looked at the existing
molecules in cells for clues. The most provocative piece of information
came when the structure of the ribosome was solved. All living
things rely on ribosomes, indicating that they must have been pres-
ent in the earliest cells. Looking closely at the active site of the ribo-
some, where new proteins are built, the structure reveals that the
machinery is composed of RNA, and a particular RNA base performs
the reaction (Fig. 7.6). This, along with the central role
played by RNA in all aspects of protein synthesis, has been taken as
evidence for an RNA World, where self-replicating RNA mole-
cules evolved and discovered the basic processes of life.
Fig. 7.6 Active site of the ribosome. This structure includes a ribosome with the tips
of two transfer RNA molecules (magenta and blue spheres) bound in the protein-building
site. The ribosome nucleotide shown in red catalyzes the reaction (PDB entry 2wdl)
Just yesterday, I was walking to my car from the lab and I found a
stick insect on the sidewalk. I picked him up to return him to
the greenery. As I was coaxing him onto my hand, he suddenly
folded up his long, spindly legs and turned into, well, a stick.
Evolution is an odd, meandering process, which often produces
unexpectedly magical results. There's much evidence for this in our
everyday world: you only need to look around. Here in my California
garden, I have found caterpillars shaped exactly like bird droppings.
We have katydids that look exactly like leaves. I've seen a moth that
looks exactly like a bumblebee and a hand-sized moth with such
perfect camouflage that it disappears completely when it lands on
the trunk of a tree.
All of these natural wonders have been shaped by evolution.
Because of natural selection, they are the best at what they do,
hiding from predators or scaring them off. Their ancestors were
the most successful, ultimately surviving where their less perfectly
shaped contemporaries perished. In the same way, evolution has
shaped the proteins in cells. Proteins are constructed in many
strange and elaborate shapes and evolved to optimize their many
diverse functions.
The mechanisms of evolution impose some specific constraints
on the way proteins evolve. Evolution at the molecular scale is
tricky. In order for a mutant protein to be successful, it has to per-
form its job continuously and help keep the cell alive. So, cells with
harmful mutations, and faulty proteins, rapidly die. This means that
legacy is a key limitation of biological evolution: every step along
the way must build on a successful predecessor. This legacy is easily
seen by looking at any protein. Chemists tell us that amino acids
can be made in two similar varieties, a left-handed form and a right-
handed form. However, all natural proteins, with the exception of a
few odd antibiotics created by the occasional microorganism, are
composed of amino acids with only one of these two possible
handednesses (see Fig. 10.3).
The other handedness would work equally well. This was shown
in an amazing experiment from the laboratory of Steve Kent, where
they chemically synthesized a protein from scratch, entirely from
amino acids with the opposite hand. The structure was a perfect
mirror image of the natural protein, and it worked perfectly well on
substrates that also had a mirrored conformation. So, the current
ubiquitous handedness is a fossil from the earliest forms of life, and
we've been stuck with it ever since.
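To make the idea of a mirror-image molecule a little more concrete, here is a minimal sketch in Python. It is not the method used in the experiment; it simply assumes that handedness can be labeled by the sign of the volume spanned by four groups around a central atom, and the coordinates are hypothetical, not taken from any PDB entry. Reflecting every point through a plane flips the sign, which is all that "opposite hand" means geometrically.

```python
# Handedness as the sign of a signed volume (illustrative sketch only).

def signed_volume(a, b, c, d):
    """Return six times the signed volume of the tetrahedron a, b, c, d."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w)
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def mirror(point):
    """Reflect a point through the yz-plane (x becomes -x)."""
    return (-point[0], point[1], point[2])

# Hypothetical positions of four different groups around a central carbon.
groups = [(1.0, 1.0, 1.0), (-1.0, -1.0, 1.0), (-1.0, 1.0, -1.0), (1.0, -1.0, -1.0)]

original = signed_volume(*groups)
mirrored = signed_volume(*[mirror(p) for p in groups])

print("original handedness sign:", "+" if original > 0 else "-")
print("mirrored handedness sign:", "+" if mirrored > 0 else "-")
```

The two signs always come out opposite, no matter how the molecule is rotated afterward, which is why a mirror-image protein needs mirror-image substrates.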
So how can a protein ever mutate and change if it must be con-
tinually active? One of the common mechanisms is to build a
backup copy through gene duplication. The gene for the protein is
copied and inserted into the genome. Then, one copy is able to
mutate and diverge, while the other remains the same and contin-
ues to perform its job. When you look at protein structures, exam-
ples of gene duplication show up everywhere. By comparing the
location of these different copies in the genomes of different organ-
isms, it has become apparent that our entire genome has been
duplicated several times, followed by a period where most of the
duplicate genes are weeded out. We only need to look at our most
Chapter 8 How Evolution Shapes Proteins
familiar protein, hemoglobin, to see an example of this. Our genome
includes several very similar proteins, presumably all created by
duplication from an original ancestor protein. These include the
two chains of hemoglobin, a few different forms of hemoglobin
optimized for use before birth, myoglobin, and two more recently
discovered forms with as-yet unknown function: cytoglobin and
neuroglobin (Fig. 8.1).
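One simple way to put a number on how far two duplicated proteins have drifted apart is to walk along a pair of aligned sequences and sort each position into identical, similar, or different, the same three classes used for the coloring in Fig. 8.1. The sketch below assumes the sequences are already aligned and of equal length. The similarity groups are a rough illustrative choice of mine (real analyses use substitution matrices), and the two short sequences are made up for the example, not real globin sequences.

```python
# Classify aligned positions as identical, similar, or different.
# The groups below are an illustrative assumption, not a standard.
SIMILAR_GROUPS = [
    set("AVLIMC"),  # small or carbon-rich
    set("FWY"),     # aromatic
    set("KRH"),     # positively charged
    set("DE"),      # negatively charged
    set("STNQ"),    # polar, uncharged
    set("GP"),      # special backbone behavior
]

def classify(a, b):
    """Return the class of one aligned pair of amino acids."""
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"

def compare(seq1, seq2):
    """Count identical, similar, and different positions in two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"identical": 0, "similar": 0, "different": 0}
    for a, b in zip(seq1, seq2):
        counts[classify(a, b)] += 1
    return counts

# Hypothetical aligned fragments, for illustration only.
result = compare("VLSPADKT", "VHTPEEKS")
print(result)
```

Mapping these per-position classes back onto the three-dimensional structure is what reveals where the unchanged positions cluster.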
The ease of gene duplication has led to a modular approach to
the evolution of proteins. Looking at the proteins in modern cells,
most are composed of compact domains. Comparing different pro-
teins, we find that these domains are reused over and over again in
new functional contexts. Some domains are particularly successful
and have been pressed into service in many different proteins. For
instance, a domain that binds to the cofactor NAD, first discovered
by Michael Rossmann and named for him, shows up in many different
proteins that use the cofactor in their function (Fig. 8.2). In
other cases, a similar domain may be repeated multiple times in a
single protein. For instance, the giant protein titin, which acts like a
Fig. 8.1 Gene duplication in human globins. All of these proteins are encoded in the human genome, and all are
thought to have evolved from a common ancestor protein. The structures are colored to show their differences from the
hemoglobin beta chain, with unchanging amino acids in red, mutations to similar amino acids in pink, and mutations to
entirely different amino acids in white. Notice that the regions that stay the same are primarily clustered around the
oxygen-carrying heme group and buried deep inside the protein (PDB entries 1hho, 1fdh, 3rgk, 1ut0, 1oj6)
Fig. 8.2 Modular domains in proteins. The Rossmann domain (top) specializes in binding to the cofactor NAD and is
found in many different enzymes. Three examples are shown here; in each case, the Rossmann domain is connected to a
different domain that defines how the NAD is used in the reaction. Titin (bottom) is composed of many domains that
form a long, flexible band. This structure includes only four domains in the center of the protein (PDB entries 3gpd, 1i10,
1htb, 3b43)
long elastic band that controls the stretching of muscle fibers, is
composed of several hundred similar domains all strung in a row,
like beads on a string.
By looking at the many organisms in the biosphere, evolutionary
biologists have discovered fascinating patterns in the way that life
has evolved. For instance, divergent evolution is a process where a
population of organisms is split, and they gradually evolve new
traits. A familiar example is our hand: it developed from the front
feet of a distant mousy ancestor. If we look at our extended family of
mammals, this same limb has evolved to form hooves and flippers.
Convergent evolution, on the other hand, is just the opposite. This is
when two different populations have a similar selection pressure
and evolve traits that are similar. Eyes are a perfect example: being
able to see is a great advantage, and light-sensing eyes have evolved
independently in insects, octopuses, and humans.
Examples of divergent and convergent evolution are every-
where at the molecular scale. My favorite examples are found in
the serine protease digestive enzymes. These enzymes all use a
similar triplet of amino acids to perform their protein-cutting
reaction. A serine interacts directly with the target protein chain,
and a neighboring histidine and aspartate are perfectly positioned
to activate it for the reaction. Looking at our digestive
enzymes, we can find three very similar serine proteases (trypsin,
chymotrypsin, and elastase) that evolved from a common
ancestor protein and then diverged to attack different protein
sequences (Fig. 8.3). If we cast our net a bit wider, we can find
many other protein-cutting enzymes that use the same arrangement
of serine-histidine-aspartate but have entirely different
foldings of the protein chain (Fig. 8.4). These are examples of
convergent evolution, where a similar active site evolved within a
different protein framework.
There are even examples of molecular mimicry, reminiscent of
the way that stick insects and bark-colored moths rely on mimicry
for protection. Our immune system is one of the most powerful
selective pressures for pathogenic organisms, and these pathogens
have evolved many ways of mimicking our own molecules to make
them invisible to the immune system. For instance, many viruses,
such as HIV and influenza, coat their surface proteins with sugar
chains, the same sugar chains that decorate all of our normal cell
surface proteins. The unique portions of the viral proteins, which are
essential for finding and infecting cells, are shielded behind this camouflage
of humanlike sugars, so our immune system can't find them.
The bacteriophage T7 has a particularly striking example of
molecular mimicry, creating a protein that mimics DNA. Many
bacteria have a defensive system that marks their own DNA genome
with methyl groups and then cuts any invading viral DNA that isn't
marked with the methyls (see Fig. 6.5). T7 phages circumvent
these defenses by building a protein that looks exactly like DNA,
which binds to the defensive enzyme that normally cuts up the
phage's DNA (Fig. 8.5). Once again, evolution has stumbled into
the perfect molecule to solve the problem.
Fig. 8.3 Divergent evolution of serine proteases. These three enzymes cut protein chains using a similar active site
that includes a serine, histidine, and aspartate (shown in shades of purple). They evolved from a common ancestor and
then diverged to cut different proteins. Chymotrypsin has a large pocket next to the reactive serine (seen here above
and to the right of the purple serine), so it preferentially digests protein chains next to large amino acids. Trypsin has
evolved a negatively charged group at the bottom of this pocket, so it has a taste for positively charged protein targets.
Elastase has a much smaller pocket, so it prefers small amino acids (PDB entries 2cha, 2ptn, 3est)
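The different tastes of these three enzymes can be sketched as a simple lookup. The Python below assumes a textbook-style simplification that is not taken from the structures themselves: trypsin cuts after lysine and arginine, chymotrypsin after the large aromatic amino acids, and elastase after a few small ones. The residue sets and the test peptide are illustrative assumptions, and real enzymes are less strict than this.

```python
# Simplified cleavage preferences (illustrative assumption; real
# specificity is broader and depends on neighboring residues).
PREFERENCES = {
    "trypsin": set("KR"),        # positively charged side chains
    "chymotrypsin": set("FWY"),  # large aromatic side chains
    "elastase": set("AGSV"),     # small side chains
}

def cut_sites(sequence, enzyme):
    """Return 1-based positions of residues after which the enzyme would cut."""
    preferred = PREFERENCES[enzyme]
    # The last residue is skipped: there is no bond after it to cut.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in preferred]

# Hypothetical peptide, written in one-letter amino acid codes.
peptide = "GAKFLRWSVY"
for enzyme in PREFERENCES:
    print(enzyme, cut_sites(peptide, enzyme))
```

The same chain is cut in three different sets of places, which is what makes a mixture of the three enzymes so effective at digestion.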
Fig. 8.4 Convergent evolution of serine proteases. These four enzymes use the same catalytic triad of serine (dark
magenta), histidine (lighter magenta), and aspartate (lightest magenta) to perform their protein-cutting reactions. As you
can see from the ribbon diagrams, however, the protein chains are entirely different, providing evidence that they
evolved separately and converged on the same active site machinery. Their functions are also quite different: elastase
and subtilisin cut in the middle of protein chains, carboxypeptidase Y clips chains from the end, and aspartyl
dipeptidase breaks very small peptide chains (PDB entries 3est, 1scn, 1wpx, 1fye)
Fig. 8.5 DNA mimic. The protein ocr (short for "overcome classical restriction") protects bacteriophages from the
defenses of the bacteria they infect. These structures show how ocr mimics DNA to block EcoKI (shown in shades of blue),
a defensive enzyme that normally cuts infecting phage DNA. EcoKI surrounds the DNA, as shown in the upper left image.
The image at upper right has one subunit removed to show the DNA inside. The complex with ocr is shown at the
bottom. Notice how the shape of ocr matches the DNA double helix, and the negatively charged amino acids (in bright
red) mimic the phosphate groups (in red and yellow) on the DNA (PDB entries 2y7h, 2y7c)
Fig. 9.1 Basic principles of protein folding. The first structure of a protein revealed two basic principles of protein
folding: (left) the peptide chain forms many hydrogen bonds (green) to form a scaffold of secondary structure, (right)
carbon-rich amino acids (blue) are packed mostly in the interior, and charged amino acids (red) are displayed on the
surface, in contact with water (PDB entry 1mbn)
Chapter 9 The Universe of Protein Folds
chains. These two types of structures form the building blocks for
the overall fold of the protein chain.
Second, the side chains of the amino acids, which are differ-
ent for each of the 20 amino acids, direct the folding of the chain
into a particular globular structure. Carbon-rich amino acids
shed their unfavorable interactions with water, driving the fold-
ing to place them in the interior of the protein. Charged amino
acids and amino acids that form hydrogen bonds largely stay on
the surface, interacting with the surrounding water. Many other
additional properties tune and shape the fold. For instance, a
bond may be formed between sulfur atoms in cysteine amino
acids, gluing portions of the chain together. Positive and negative
charges interact favorably with one another, and repulsion of
identical charges directs folding away from some possible folds.
Specific hydrogen bonds between some amino acids may favor a
Fig. 9.2 Protein secondary structure. Alpha helices and beta sheets provide most of the secondary structure for
proteins. Two other types of helices are rarely seen: 3₁₀ helices are wound more tightly than alpha helices, and pi helices
are looser (taken from PDB entries 2viu, 2g8c, 3sbn, 1fuo)
Fig. 9.3 Protein folds. A few common protein folds are shown here, using the cartoon representation popularized
by Jane Richardson. In each, the alpha helices are shown in magenta, the beta sheets in yellow, and the connecting
loops in white (PDB entries 2ccy, 1mbn, 1lrv, 1ppr, 1cem, 1fbr, 1vie, 1prn, 4bcl, 1stm, 1hcd, 1jpc, 1rie, 1got, 1air, 1ndd,
1tim, 1kvd, 1fua, 2dnj)
Fig. 9.4 New protein folds deposited each year in the PDB. The Protein Structure Initiative, started in 2000, had the
goal of determining all of the ways that natural proteins fold. It achieved its goal in about 10 years of work, as shown in
this graph of the number of unique protein folds as classified by two popular methods of analyzing protein folds
gained prominence, SCOP and CATH, that codify the ways that
proteins can fold (Fig. 9.3).
This understanding of protein folding is a foundational piece
of information, particularly if we want to design new proteins
ourselves. In search of this understanding, the scientific com-
munity launched an effort at the turn of this century, termed
structural genomics, to determine structures for all possible
folds. At the time, proteins with new folding patterns were being
discovered right and left, so structural biologists decided to take
a more systematic approach to this grand challenge. The genome
of an organism or set of organisms was analyzed using the best
prediction tools. Then, proteins of interest were identified,
which were predicted to be quite different from anything currently
known. A sophisticated structure determination pipeline
was then brought to bear to solve thousands of these structures.
The new structures help improve the prediction methods, and
the whole effort iteratively cranks out structures that fill in the
gaps in knowledge.
Several of these large efforts were set up around the world, and
the effort was a complete success. Looking at the number of new
folding patterns that are deposited in the PDB, we see a large num-
ber around this time, and then they fall off after a few years, pre-
sumably as the universe of stable protein folds is effectively covered
(Fig. 9.4). One of the side effects of the effort is an explosion of
structures for domains of unknown function. These structures
were determined based on this goal of finding new folds, and the
research community is now faced with the challenge of figuring out
what they do in the life of the cell.
Fig. 9.5 Designed proteins. Designed proteins FSD-1 and Top7 both build on the basic principles of protein folding,
with a scaffold of secondary structure (as seen in the ribbon diagrams at the top) and a partitioning of charged amino
acids (red in the lower images) on the surface and carbon-rich amino acids (blue) in the interior (PDB entries 1fsd, 1qys)
Fig. 10.1 Symmetrical assemblies. Sliding clamps used in DNA replication have evolved to encircle DNA, but a
bacterial clamp (left) and a human clamp (right) achieve this function using assemblies with two different symmetries
(PDB entries 1axc, 2pol)
Chapter 10 Order and Chaos in Protein Structure
locations, they all have the same surfaces for interaction; basically,
one type of subunit is all that is needed. This is easier to evolve and
more economical to build. Also, symmetrical complexes, or at least
those based on point groups, are self-limiting: they form a defined
complex, not an open-ended aggregate. Aggregates are a great danger
to cells, since they clog everything up. So, when we look to the PDB,
we find that nearly all complexes are symmetrical.
Ironically, highly symmetrical filaments are often some of the
most difficult subjects to study with crystallography. The reason for
this is that filaments are often built with perfect helical symmetry,
but the symmetry is rarely exactly what is needed to build a crystal
lattice. For instance, actin filaments have about 13 subunits in 6 turns
of the helix, which doesn't fit nicely into the two-, three-, four-, and
sixfold symmetries that are compatible with crystals (Fig. 10.2). So,
Fig. 10.4 Antibody linkers. Flexible linkers connect the different functional domains of antibodies. These linkers
contain many proline amino acids (green) that kink the chain and keep it from adopting a folded structure. The linker
also includes several cysteine amino acids (with sulfur shown here in yellow), which form cross-links that connect the
antibody chains (PDB entry 1igt)
Fig. 10.5 Viral quasisymmetry. Quasisymmetry is used to construct viral capsids of different sizes. Satellite tobacco
necrosis virus is composed of 60 subunits in perfect T=1 icosahedral symmetry. Tomato bushy stunt virus is composed
of 180 subunits in T=3 quasisymmetry: 60 form the fivefold vertices (colored red), and the remaining 120 form a ring of
six centered on the threefold axes (colored orange and yellow). Similarly, the Nudaurelia capensis omega virus has 240
subunits in T=4 quasisymmetry, and bacteriophage HK97 has 420 subunits in T=7 quasisymmetry (PDB entries 2buk,
2tbv, 1ohf, 1ohg)
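The subunit counts in this figure follow a simple pattern: each capsid has 60 subunits multiplied by its T number. As a small sketch, the Python below uses the standard Caspar-Klug relation T = h² + hk + k², which is not spelled out in the text, so treat the lattice indices (h, k) assigned to each virus as my own assumption chosen to reproduce the T numbers given in the caption.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number for lattice indices h and k."""
    return h * h + h * k + k * k

def subunit_count(t_number):
    """An icosahedral capsid with triangulation number T has 60*T subunits."""
    return 60 * t_number

# Lattice indices are assumptions picked to match the caption's T numbers.
capsids = {
    "STNV": (1, 0),   # T=1
    "TBSV": (1, 1),   # T=3
    "NwV": (2, 0),    # T=4
    "HK97": (2, 1),   # T=7
}

for name, (h, k) in capsids.items():
    t = triangulation_number(h, k)
    print(f"{name}: T={t}, {subunit_count(t)} subunits")
```

Only certain T numbers (1, 3, 4, 7, 9, 12, 13, and so on) can be generated this way, which is why viral capsids come in a limited set of sizes.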
Fig. 10.8 CBP protein. The modular CBP protein has been studied by crystallography and NMR spectroscopy by
cutting it into pieces and including only small pieces of the interacting proteins (green) (PDB entries 1l8c, 1kdx, 1jsp,
3biy, 2ka6, 1kbh)
Fig. 10.9 Proteins with unstructured tails. NMR was used to study two initiation factor proteins with unstructured
tails. When they form a complex, a portion of the tail (highlighted in turquoise) of eIF4E binds in a groove in eIF4G (red),
adopting a defined structure (PDB entries 1ap8, 1rf8)
These fibrils have been very difficult to study at the atomic level,
because they often have many similar, but different forms, and lack
the order or periodicity to form crystals. Scientists have used a vari-
ety of methods to probe their structure, combining NMR studies,
which can provide information on the local conformation of the
chain and portions that are close to one another, with electron
microscopy, which gives an understanding of the overall form of
the fibril and the way that the protein chains are stacked within it.
By using this type of integrated approach, throwing every technique
we have at a difficult problem, we're able to expand our conception
of what proteins are, and how they balance order and chaos as they
go about their jobs.
Fig. 10.10 Amyloid fibers. Amyloid-beta precursor protein, shown on the left, is normally found in the membrane
of nerve cells. The cell's processing proteins cut it into different pieces, creating a small peptide (shown in green) in some
cases. This peptide can refold to form long amyloid fibers, as shown on the right, that contribute to the nerve problems
in Alzheimer's disease (PDB entries 1mwp, 1owt, 1rw6, 1iyt, 2m4j)
Molecular Electronics
Fig. 11.1 Electron carriers. Soluble electron transport proteins use many tools to transport electrons. Cytochrome c
uses an iron ion held in heme and ferredoxin uses a cluster of iron and sulfur. Plastocyanin has a copper ion and
flavodoxin uses a flavin molecule (PDB entries 3cyt, 1a70, 5pcy, 1ag9)
Chapter 11 Molecular Electronics
quantum mechanical tunneling. The position of each electron is
fuzzy: most of the time it's near the atom's nucleus, but there is a
small chance that it will be found at a distance from the nucleus, a
chance that gets smaller and smaller over longer distances.
Looking for clues in the structures of proteins with electron transport
chains, we find that a distance of about 14 angstroms (1.4 nanometers) is the
maximum distance where this tunneling occurs at functional rates.
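To get a feeling for how steeply tunneling falls off with distance, here is a rough sketch in Python. It uses an empirical rule of thumb from the electron-transfer literature (often associated with Moser, Dutton, and colleagues) that is not part of this book's discussion, so the formula and its coefficients should be read as assumptions: the logarithm of the rate drops by about 0.6 for every additional angstrom between cofactors. The default energies are illustrative values chosen so that the driving-force term vanishes.

```python
def log10_rate(distance_angstrom, delta_g_ev=-0.7, reorg_ev=0.7):
    """Approximate log10 of the tunneling rate (per second).

    Assumed empirical form: 15 - 0.6*R - 3.1*(dG + lambda)^2 / lambda,
    with R in angstroms and the energies in electron volts.
    """
    driving_term = 3.1 * (delta_g_ev + reorg_ev) ** 2 / reorg_ev
    return 15.0 - 0.6 * distance_angstrom - driving_term

# Rates at a few edge-to-edge distances between cofactors.
for distance in (4, 8, 12, 16, 20):
    print(f"{distance:2d} angstroms: about 10^{log10_rate(distance):.1f} per second")
```

Under these assumptions, every extra 5 angstroms of separation slows the transfer a thousandfold, which is why proteins line up their cofactors in closely spaced chains.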
Both of these types of electron transport are exemplified in a
structure of cytochrome c with one of its metabolic partners,
cytochrome bc1 (Fig. 11.2). Cytochrome bc1 is a large
membrane-bound protein that uses a flow of electrons to pump protons across
the membrane. It has a string of iron atoms, held in heme and iron-
sulfur clusters, that electrons flow through to power the pump.
When the electron reaches the end, it is delivered to its final desti-
nation by cytochrome c, a small carrier protein with a heme group
at its center. The complex shows that the cytochrome c docking site
positions its heme group right next to one of the hemes in cyto-
chrome bc1, allowing an electron to tunnel across the gap.
The electricity that powers our homes is largely (at least for
now) obtained by the burning of fossil fuels, and the heat is used to
power generators. Cells take a much more delicate approach to
obtain their electrical energy. Early in the evolution of life, reactive
molecules were common in the environment, and the earliest cells
tapped them for energy. We can look to exotic bacteria living on the
Earth today to get a feeling for what it must have been like then. For
instance, some of these bacteria use hydrogen gas as their raw
Fig. 11.2 Cytochrome bc1 and cytochrome c. This complex includes the small soluble protein cytochrome c
(red backbone) and the large membrane-bound protein cytochrome bc1 (blue backbone). The complex brings the heme
groups of the proteins in close proximity, allowing an electron to tunnel from one to the other (PDB entry 1kyo)
Fig. 11.3 Hydrogenase. This atomic structure was determined by a combination of NMR spectroscopy and structural
modeling and captures the transfer of electrons from hydrogenase (blue backbone) to a cytochrome (red backbone). The
hydrogen-splitting site has an unusual cofactor (in atomic spheres at the left) with two iron ions, cyanide and carbon
monoxide, and a small sulfur-containing molecule. The electrons that are released from the reaction jump through three
iron-sulfur clusters (at center) and end up in the heme group of the cytochrome (at right) (PDB entry 1e08)
Fig. 11.4 Respiratory supercomplex. Electron microscopy has been used to study a supercomplex of the three large
protein complexes involved in respiratory electron transport. As electrons flow from cofactor to cofactor, they power
the pumping of protons across the membrane, charging an electrochemical battery (PDB entry 2ybb)
Fig. 11.5 Metalloenzymes. Superoxide dismutase (top) uses a copper ion and a zinc ion to extract electrons from a
destructive superoxide radical, and xanthine oxidoreductase (bottom) shuttles electrons from a flavin cofactor
(in spheres near the top) through two iron-sulfur clusters to a complicated molybdenum cofactor (in spheres near the
bottom), where it performs a reaction that converts purine bases into soluble waste products that can be excreted
(PDB entries 2sod, 1fo4)
Green Energy
Plants are the very definition of green energy. They are powered by
sunlight and grow using a few common resources in the environment.
They are infinitely renewable, returning everything to the
environment when they die. And they do all this using molecules
that color our world in beautiful shades of green and red and yellow.
Structural biologists are looking to plants for hints about how they
live so gracefully, with the hopes that we can somehow incorporate
these principles into our own management of energy resources.
At the center of the green energy of plants is a green molecule:
chlorophyll. It is a small organic molecule with a magnesium ion at
its center, which has the special property that it absorbs light and
uses it to energize an electron. These energetic electrons can then be
passed down a chain of electron carriers, which are wired to power
energy-requiring tasks, in particular, to charge up an electrochemi-
cal battery. Structures of these molecules have revealed a multitude
of amazing aspects to the process.
All the action occurs inside huge protein complexes, called pho-
tosystems, that hold the chlorophyll and other molecules in exactly
the right orientations. At the center is a special pair of chlorophyll
molecules, the ones that ultimately spit out the energetic electron
and are later restored by stripping a less-energetic electron out of
water. Surrounding this are a host of other brightly colored mole-
cules that absorb light and transfer the energy inward to this central
pair.
The most advanced methods for crystallography are currently
being used to watch this process in action. In these methods, tiny
crystals of the photosystem are subjected to a very powerful beam
from an x-ray laser, so powerful that it destroys the crystal in the
process. But before it does, x-rays are diffracted by the crystal and
measured, capturing one view of the crystal. This is repeated for
thousands of tiny crystals, randomly building up a full data set of
the diffraction pattern from different angles. One of the advantages
of the method is that it is very fast, capturing a defined moment. So,
researchers can illuminate the crystal and then determine a struc-
ture at defined times after the photon is absorbed.
The results are quite subtle (Fig. 12.1). Most of the protein and
its associated cofactors stay in exactly the same places as light is
absorbed. But a strategically placed tyrosine amino acid changes
position slightly, shifting toward the chlorophyll molecules that
absorb the light. Spectroscopic studies of this protein have revealed
that this tyrosine loses its hydroxyl hydrogen, gaining a negative
charge. Although the hydrogens are not seen in the crystallographic
experiment, the motion is evidence of this change, as the negatively
charged form of the tyrosine moves toward the positively charged
chlorophyll and helps mediate the flow of electrons through the
chain of cofactors.
There are many other amazing aspects to the process. For
instance, many photosystems are surrounded by a field of antenna
proteins, themselves filled with light-absorbing molecules like chlorophyll
and carotene (Fig. 12.2). These all work diligently to absorb
photons and then transfer the energy from molecule to molecule
until it reaches the special pair at the center of the photosystem.
Chapter 12 Green Energy
Fig. 12.1 Bacterial photosystem. Two structures of a bacterial photosystem were determined, before and after it
had absorbed a photon of light. The photosystem is shown on the left. Light is absorbed by a special pair of chlorophyll
molecules (green) at the center, and then electrons are transported down (shown with an arrow), ultimately reaching a
quinone (orange). The electrons are replenished from the top through a string of hemes (red). As shown on the right, the
two structures were quite similar, except for the motion of a key tyrosine amino acid, shown in blue. The change was
taken as evidence that the tyrosine loses its hydrogen atom in the light-activated state (the lower of the two in the
figure), gaining a negative charge and moving closer to the special pair of chlorophylls (PDB entries 2x5u, 2x5v)
The machinery for stripping electrons out of water has also been
revealed in atomic structures. These electrons are needed to replace
the ones that are sent down the electron transfer chain, producing
the oxygen that we all breathe in the process. The action occurs at a
complex cofactor composed of four manganese ions and a calcium
ion. The structures have revealed the arrangement of ions in the
cofactor, but researchers are still sorting out how it captures water
and produces oxygen (Fig. 12.3).
Looking inside plant cells, we find that they have vast arrays of
these photosystems, all surrounded by their fields of antennas. They
are arranged in disk-shaped compartments (termed grana), which
allow them to build up a gradient of protons as they perform their
light-driven pumping operation (Fig. 12.4). The energy of this
gradient is ultimately used to power the creation of sugar molecules,
which fuel the entire biosphere. The process of building sugar
involves many enzymes, but one plays a key role: ribulose bisphos-
phate carboxylase/oxygenase (RuBisCO).
RuBisCO (Fig. 12.5) is the enzyme that captures carbon dioxide
and fixes it into a molecule that can be used by the cell to build
sugar. Ironically, this enzyme is one of the least efficient enzymes in
cells. This is due in part to the similarity between carbon dioxide
and oxygen molecules. As reflected in the name of the enzyme, it
performs two competing reactions: a carboxylase reaction that fixes
carbon dioxide and an oxygenase reaction that creates a toxic side
Fig. 12.2 Antenna proteins. Photosynthetic reaction centers (shown in darkest green in the center of each complex)
are often surrounded by a core antenna complex (in medium green and pink) and peripheral antenna proteins (lightest
green and pink). Photosystem II (with the oxygen-evolving center in red and purple) is associated with the
light-harvesting protein LHCII and other proteins (not shown here). Photosystem I has several light-harvesting subunits
that associate with the main core to form the supercomplex shown here. The simple reaction center from a photosynthetic
bacterium (lower right) is surrounded by light-harvesting complex LH1 and associates loosely with LH2 (PDB entries
4ub6, 2bhw, 4y28, 1pyh, 2fkw)
Fig. 12.4 Chloroplast. This cross section through a chloroplast shows the two-layered membrane at the top and the
stacked grana below. The photosynthetic electron transport chain is embedded in the membranes of the grana: (1)
photosystem II, (2) light-harvesting complex II, (3) plastoquinone, (4) cytochrome b6f, (5) plastocyanin, (6) photosystem
I, (7) ferredoxin, (8) ferredoxin reductase, and (9) ATP synthase. Many RuBisCO enzymes (10) are found in the soluble
space along with the machinery to synthesize and manage the chloroplast
product. The plant cell then needs to clean up all these side prod-
ucts. It must all be worth the effort, however, because RuBisCO has
been estimated to be the most plentiful enzyme on the Earth.
Today, we think of plants as being the greenest of green energy,
but this was not always the case. In the early evolution of life, pho-
tosynthetic organisms were the major polluters on Earth, so much
so that they changed the basic characteristics of the environment.
The earliest organisms used other forms of energy to power their
process, grabbing readily available reactive molecules in the envi-
ronment. But obviously photosynthesis was a more successful evo-
lutionary innovation, and the oxygen released by these early
organisms filled the skies and gradually poisoned all of the
competitors.
Peak Performance
Fig. 13.1 Glycogen phosphorylase. Glycogen phosphorylase is a dimeric enzyme that includes an active site that
clips glucose from glycogen (with a nucleotide bound in the site in this structure, shown in red) and a storage site that
tethers the enzyme to the glycogen granule. Regulatory sites are seen on the backside of the enzyme, including a serine
that is phosphorylated (in green) and an allosteric site for binding nucleotides (in red) (PDB entry 6gpb)
fatty acids have long strings of carbon atoms, they are broken
down bit by bit by four enzymes, which release two-carbon units
and connect them to the carrier molecule coenzyme A. In our
mitochondria, three of these enzymes are associated into a multi-
enzyme complex that allows the fatty acid to transfer directly from
site to site during the reactions. The structure of a similar complex
from bacteria has been studied by crystallography, uncovering
some of the atomic details of how the fatty acids and other
necessary cofactors all bind to perform the progressive breakdown
(. Fig. 13.3).
. Fig. 13.2 Glycogen phosphorylase regulation. Glycogen phosphorylase is an allosteric enzyme that shifts shape
between an inactive and an active form (PDB entries 8gpb, 1gpa)
. Fig. 13.3 Fatty acid metabolism. This structure of a bacterial beta-oxidation multienzyme complex captures several
pieces of the story. The complex includes three different enzymes, with two copies of each. Two are found in the
subunits shown in blue, with NAD (pink) and a fatty-acid-like molecule (gray) bound in the active sites. The other
enzyme (green) performs the final step of attaching a piece of the fatty acid to the carrier molecule coenzyme A
(magenta) (PDB entry 1wdk)
. Fig. 13.4 Supersweet proteins. Supersweet proteins like monellin, thaumatin, and brazzein, as well as sweeteners
like aspartame, bind to the sweet taste receptor, which is similar to the glutamate receptor shown in blue (PDB entries
3mon, 1thv, 2brz, 2e4z, 4or2)
. Fig. 13.5 Whey proteins. Whey proteins are rich in essential amino acids (magenta), particularly the branched
amino acids (red) (PDB entries 1beb, 1hfz, 3v03)
the whole protein binds in the cleft between the receptor domains,
acting like a wedge to create the sweet-tasting conformational
changes. Scientists are now using mutations to dissect the interaction,
trying to determine which portions are most important. Some
even sweeter versions of the proteins have been discovered, and
researchers have discovered ways to make them more attractive for
use in cooking. For instance, the two chains of monellin fall apart
when it is heated and the molecule loses its sweetness, but an engi-
neered single-chain version is much more stable.
I also went through a phase where I was a bit of a gym rat, trying
to build up some muscle. I got a personal trainer, who promptly
prescribed a course of supplements. These included a daily
multivitamin (which I still take), lots of protein in the form of shakes, and
a creatine supplement. The protein is easy to understand: my body
needs the building blocks to build new muscle. My trainer was
pushing whey proteins at the time. This is a collection of small,
soluble proteins from milk, left over when all the stuff needed to make
cheese is taken out. These have been found to be rich in essential
amino acids, the amino acids that our body can't make by itself.
Three of these get the most press in the context of the gym: the
branched-chain amino acids leucine, isoleucine, and valine. Studies
have found that supplementing the diet with these amino acids can
stimulate muscle growth, so proteins that have more of them (like
whey proteins) are popular for protein supplements (. Fig. 13.5).
The multivitamin makes perfect sense as well: vitamins are needed
for many of the molecular machines that control metabolism.
. Fig. 13.6 Vitamin B12. Vitamin B12 is collected by the intrinsic factor protein, which then binds to the cubam
receptor and is imported into our cells. It is used by two essential enzymes: one that is involved in the regeneration of
the amino acid methionine and the one shown here, which is involved in energy metabolism (PDB entries 3kq4, 2xiq)
. Fig. 13.7 Creatine kinase. Creatine kinase in cytoplasm is a dimer of subunits, but the mitochondrial version of the
enzyme can also form huge octameric complexes (PDB entries 2crk, 1qk1)
. Fig. 13.8 Steroid receptor. This model of the anabolic steroid receptor includes structures of the steroid-binding
domain, shown with the designer steroid THG, and the DNA-binding portion bound to a short piece of DNA. The
receptor also includes another large domain that has not been studied at the atomic level yet (PDB entries 2amb, 1r4i)
Cellular Signaling
Networks
. Fig. 14.1 Signaling pathway for VEGF. This diagram is taken from the KEGG Pathway Database
(http://www.genome.jp/kegg/pathway.html), a popular online database for signaling networks in cells. As you can see, the binding
of VEGF activates many interconnecting signaling pathways that lead to a variety of cellular changes. The picture is even
more complex than this, because all of the dotted lines on the right side of the diagram represent dozens of other
proteins that change or cause changes based on the signal
somehow has to relay its signal inside the cell. This task is accom-
plished by a specific receptor for VEGF, which is found in the mem-
brane of the target cell. The mechanism of signal transduction is a
matter of simple arithmetic: 1 + 1 = 2. Normally the receptor pro-
teins float around separately in the membrane and are in the off
state. When VEGF binds, it brings together two receptor molecules,
forming an active dimer that triggers a signal inside the cell.
This dimerization mechanism was revealed in crystallographic
structures of the receptor. The mechanism was first discovered in a
structure of the receptor for human growth hormone, and later
structures for the VEGF receptor showed a similar mechanism. These
receptors all have a similar modular structure. There is a large
domain on the outside of the cell that binds to the soluble factor,
connected to a short segment that crosses the membrane and, on the
inside, a domain that triggers the signal inside the cell. The whole
thing is rather flexible, so scientists often determine the structure in
parts (. Fig. 14.2). If we allow ourselves a bit of latitude to mix and
match pieces from several similar forms of the receptor, we can build
up a rather complete picture of the whole thing.
The portion on the inside of the cell is a protein kinase domain.
Protein kinases are enzymes that add phosphate groups to protein
chains. When two VEGF receptors are brought together, these
kinases first modify each other, making them more active, and then
they start adding phosphates to other proteins in the signaling net-
work. This launches the signal inside the cell.
The VEGF receptor exemplifies several of the functional features
needed for an effective signaling protein. First, signaling proteins need
to be able to turn on and off quickly, so they can respond to the
minute-by-minute needs of the cell. The association and separation of the
two receptors provide these two states. At the same time, the signal
needs to be strong, not too subtle, but it still needs to be
reversible. Phosphate groups are perfect for this. They carry a strong
negative charge, so they are easily recognizable by other proteins in
the signaling network. But at the same time, they are easy to add
and remove by employing specific kinases and phosphatases, so the
signal may be turned on and off quickly and efficiently.
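The on/off behavior of a phosphate switch can be sketched with a toy calculation. This is only an illustration under simple assumptions, not a model taken from the structures: it imagines a single phosphorylation site, a kinase that adds the phosphate at a first-order rate, and a phosphatase that removes it at another. The function names and all rate values are hypothetical.

```python
import math

def steady_state_fraction(k_kinase, k_phosphatase):
    """Fraction of protein phosphorylated once adding and removing balance."""
    return k_kinase / (k_kinase + k_phosphatase)

def fraction_at_time(t, k_kinase, k_phosphatase, start=0.0):
    """Phosphorylated fraction P after time t, solving
    dP/dt = k_kinase * (1 - P) - k_phosphatase * P."""
    p_final = steady_state_fraction(k_kinase, k_phosphatase)
    return p_final + (start - p_final) * math.exp(-(k_kinase + k_phosphatase) * t)

# Hypothetical rates (per second), chosen only for illustration.
on = steady_state_fraction(k_kinase=9.0, k_phosphatase=1.0)   # kinase active
off = steady_state_fraction(k_kinase=0.1, k_phosphatase=1.0)  # kinase nearly silent
print(f"signal on: {on:.2f} phosphorylated, signal off: {off:.2f}")
```

In this simple picture, the switch settles at a rate set by the sum of the two rate constants, so a fast kinase paired with a fast phosphatase gives a signal that can be both raised and erased quickly.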
Looking at the pathway diagram, we can see that many other
modes of signaling are also used. Calcium is released in some cases,
which binds to proteins and modulates their activity. The gas nitric
oxide is produced in other legs of the network. It diffuses to nearby
cells and causes its changes there. Like phosphorylation, these
small, mobile molecules are easily recognizable, sending a strong
signal that may be quickly quenched by gathering up or metaboliz-
ing the molecules when the signal is finished.
Activation by phosphorylation is essential in the next leg of
the signaling network, where the signal is amplified and delivered
to appropriate places within the cell. Activation of the VEGF
receptor has many effects in the cell. One is the stimulation of a
variety of genes involved in cell growth. A cascade of kinases, one
phosphorylating the next, amplifies the signal, ultimately delivering
it into the nucleus. A variety of helper proteins tune the process.
Atomic structures have revealed that many of these kinases have a
. Fig. 14.3 ERK2 and DUSP5. ERK2 is a kinase that adds phosphate groups
to transcription factors and is itself activated by phosphorylation. The two
structures on the left show how addition of phosphates tightens up the active
site. DUSP5 reverses the signal by removing phosphate groups. The image on
the right is built from two atomic structures: a crystallographic structure of the
catalytic domain and an NMR structure of a domain involved in substrate
recognition (note that this domain looks a bit different in the illustration, since
it includes all the hydrogen atoms). The catalytic domain includes sulfate
groups bound in the active site. An acidic amino acid (shown here in bright
magenta) performs the cleavage reaction (PDB entries 1erk, 2erk, 2g6z, 1hzm)
parts: one part receives the signal and responds to it, and the other
recognizes the appropriate sequence of DNA in the genome. Very
often, as with the cell surface receptors, scientists determine struc-
tures of these two portions separately. In the case of C-fos and Jun,
only the DNA-binding portions have been determined, and we have
to infer the rest from the sequences of the proteins. A related com-
plex is found as part of the enhanceosome for the interferon-beta
gene, as seen on the left-hand side of . Fig. 14.4.
Finally, a specific phosphatase enzyme, DUSP5, removes the
phosphate from the last kinase in the signaling cascade, shutting the
whole process down. It's also a flexible protein and has been studied
in parts. A crystallographic structure revealed a dimeric structure
for the catalytic domain, which includes a small loop that hugs the
phosphate group and a perfectly positioned acid group that clips it
off. The protein also includes another domain involved in recogni-
tion of the proteins that it dephosphorylates. A domain from a sim-
ilar protein is shown in . Fig. 14.3, since the DUSP5 domain has
not been studied yet.
I worked with the CREST team to bring this information
together into a picture of the whole process. The students chased
down UniProt entries and atomic structures, along with electron
micrographs of vascular cells, filling out all the structural details.
. Fig. 14.5 VEGF in action. VEGF (1) signaling starts at the cell membrane, where it brings together two VEGF
receptors (2), activating the kinase domains inside the cell and activating several signaling pathways in the cell. C-src (3)
is phosphorylated, causing it to open up and phosphorylate cadherins (4) in adherens junctions between cells. This
releases alpha-catenin (5), which dimerizes and bundles actin filaments (6). In another pathway, receptor dimerization
launches a cascade of activation reactions through PLC-gamma (7), PKC (8), Raf-1 (9), MEK (10), and ultimately ERK2 (11).
Activated ERK2 (11) is transported into the nucleus, where it phosphorylates C-fos (12), causing it to form a heterodimer
with Jun (13). This then acts as part of an enhancer to promote transcription of genes needed for blood vessel growth,
binding to transcription mediator (14) and ultimately starting transcription by RNA polymerase (15). Finally, DUSP5 (16)
terminates the process by removing the phosphates from ERK2
107 15
GPCRs Revealed
In the past decade or so, there has been a quiet revolution in the
field of cell signaling. An elusive target finally yielded its secrets: in
2007, the first atomic structure of a GPCR was determined. GPCRs
(G-protein-coupled receptors) sit in the membranes of cells
throughout our bodies and pass messages inside to G proteins.
They've been a particularly hard nut to crack because they are
smallish proteins, most of which is buried in the membrane. The
small bits that extend on either side of the membrane don't provide
much leverage for forming crystals, so they eluded crystallography
for many years.
The first glimpses of GPCRs were obtained using a similar protein,
bacteriorhodopsin, made by bacteria that live in high-temperature
brine pools. Bacteriorhodopsin forms beautiful arrays
in the membrane of these bacteria, making it a perfect candidate for
study by electron diffraction. Richard Henderson and Nigel Unwin
worked for years to improve, step by step, the structures of
bacteriorhodopsin. At first, they could only see the major feature: a bundle
of parallel alpha helices that cross the membrane. With more work,
they finally revealed the atomic details, including the loops that
connect the helices, the chromophore bound inside that captures light,
and even the amino acids involved in pumping protons across the
membrane, powered by light energy (. Fig. 15.1).
The big advance that opened the door to GPCR atomic structure
came from protein engineering. The trick was to create a chimeric
protein that substitutes one of the GPCR loops with a small, stable
protein. This acts like a handle, helping to coax the slippery
membrane-crossing portion of the molecule into crystals.
Crystallographers often use antibodies in the same way. Antibodies
stick to the target proteins and help link them together into crystals.
Indeed, parallel structures of the GPCR that recognizes adrenaline
were solved in both of these ways: as a chimera with lysozyme
(. Fig. 15.2) and as a complex with an antibody.
Many amazing structures followed these breakthrough
structures, building on the method. For instance, many additional
structures of the adrenergic receptor have revealed the structural
basis of signaling. One of the big mysteries has been the way that
GPCRs pass their signal from the outside to the inside of the mem-
brane. By comparing an inactive conformation, frozen in place by
an inhibitor that blocks signaling, with an active form bound to a G
protein, we see that a few of the helices shift and bend, propagating
a signal through the protein and across the membrane. One helix in
particular bends at its center, forming a convenient pocket for the
binding of the G protein (. Fig. 15.3).
Additional structures have revealed a diverse collection of
GPCRs, receiving all manner of signals and passing them into the
cell (. Fig. 15.4). Opsin receives the tiniest of signals, a single
photon, ultimately launching a cascade of signals that tells the brain
that it has seen light. The serotonin receptor is one of the cogs in the
process of thought: it recognizes signals from the neurotransmitter
serotonin and passes the message into a nerve cell. CXCR4
mediates signals passed between cells in the immune system and
has the unfortunate distinction of being one of the proteins
. Fig. 15.1 The structure of bacteriorhodopsin was solved after many years of work, studying two-dimensional
crystals of the protein using electron diffraction. The crystal lattice is shown at the top, with the proteins in purple and
the surrounding lipids in the membrane in dark red. At the bottom, a cartoon shows the characteristic bundle of seven
alpha helices, with the light-capturing chromophore in magenta (PDB entry 2brd)
. Fig. 15.2 Adrenergic receptor. The adrenergic receptor was engineered to substitute one small loop with an entire
molecule of lysozyme, which has many charged amino acids (bright blue and red on the right) on its surface. The portion
of the receptor that spans the membrane (shown with the gray bar) is coated with carbon-rich amino acids (shown with
white spheres on the right) (PDB entry 2rh1)
. Fig. 15.3 GPCR signaling. The motions involved in GPCR signaling were revealed in two structures of the adrenergic
receptor: an inactive conformation (in blue at left) bound to an inhibitor (green) and an active conformation (in red at
left) in the process of activating a G protein (shown on the right). The major change is a large swinging motion of one of
the helices that propagates the message from the adrenaline-binding site to the G protein (PDB entries 3sn6, 2rh1)
. Fig. 15.4 GPCRs. GPCRs come in all shapes and sizes. Opsin binds to retinal and senses light in our retinas.
Serotonin receptor senses the levels of the neurotransmitter serotonin in the brain. CXCR4 senses the level of
chemokines and acts as a dimer. Glucagon receptor has an extra domain that closes around the top of the small peptide
hormone glucagon (PDB entries 1f88, 4iar, 3odu, 1gcn, 4ers, 4l6r)
. Fig. 15.5 Adenosine receptor and caffeine. Caffeine blocks the adenosine receptor, a GPCR that plays a role in the
level of metabolism. These two structures capture the adenosine receptor doing its normal job (on the right) and after it
is blocked by caffeine (PDB entries 2ydo, 3rfm)
work and opened new avenues for improving them. For instance,
the most widely used psychoactive compound, caffeine, acts
through a GPCR to stimulate cells and give us that special boost
from our morning cup of coffee (. Fig. 15.5). It blocks the action of
the adenosine receptor by perfectly mimicking the normal activator
of the receptor. Today, scientists are using these new structures of
GPCRs to design new treatments. With more careful targeting of
the adenosine receptor, we are finding ways to help people with
Parkinson's disease. By blocking the histamine receptor, we can
manage allergies. By targeting the opioid receptor, we can help
manage pain. And the list goes on. These central receptors are
becoming ever more central to the way we manage our health.
113 16
. Fig. 16.1 Insulin in action. This illustration shows two consequences of insulin. Insulin (1) binds to its receptor (2) on
the surface of the cell, activating a cascade of signaling molecules (3) inside the cell. These activate glycogen-building
enzymes (4) and also stimulate the transport of vesicles with glucose transporters (5) to the cell surface. Together, they
take glucose (white dots) into the cell from the blood and store it in large glycogen (6) molecules
. Fig. 16.2 Insulin. Insulin forms a stable hexamer (left) when it is stored in the pancreas, which disassembles into
active monomers (right) when it is delivered to the blood. Each insulin monomer is composed of two chains (colored
blue and green) that are linked together with several disulfide linkages (yellow) (PDB entry 1trz)
116 Chapter 16 Signaling with Hormones
16
. Fig. 16.3 Proinsulin and insulin. Insulin is synthesized in cells as a longer protein, called proinsulin, which is then
clipped to form the active protein (PDB entries 2kqp, 4ins)
. Fig. 16.4 Insulin receptor. These illustrations of the insulin receptor are constructed from several atomic structures
of the individual parts. The inactive form is shown on the left, with the insulin-binding portion at the top and the kinase
domains at the bottom. When insulin (red) binds, it is thought to bring the kinase domains together so that they can
activate each other (PDB entries 3loh, 2mfr, 1irk, 3w14)
. Fig. 16.5 Kinase domain of the insulin receptor. The kinase domains are activated when three tyrosines (green) are
phosphorylated (yellow and red in the right image). This opens up the active site, allowing ATP (magenta) to bind. The
active structure on the right also includes a small piece of a signaling protein (red), with a tyrosine ready to be
phosphorylated by the ATP (PDB entries 1irk, 1ir3)
. Fig. 16.6 Glycated hemoglobin. This structure of hemoglobin has a sugar attached to a lysine amino acid deep
inside the protein (PDB entry 3b75)
. Fig. 16.7 Designer insulins. Insulins with special properties have been engineered. Slow-acting insulin Degludec
(top) adds a hydrocarbon (pink) that links neighboring hexamers, and fast-acting insulin Aspart (bottom) changes one
amino acid to a charged aspartate (red) that destabilizes the hexamer (PDB entries 4ajx, 4gbc)
Single-Molecule
Chemistry: Enzyme Action
and the Transition State
. Fig. 17.1 Penicillin and bacterial enzymes. The enzyme shown here,
d-alanyl-d-alanine peptidase, performs an essential reaction for the
construction of the protective cell wall surrounding many types of bacteria.
Penicillin, shown here in red, blocks the machinery of this enzyme, ultimately
killing the cell (PDB entry 1pwc)
available about how they are constructed and how they perform
their reactions.
In 1965, D. C. Phillips solved the structure of lysozyme, giving us
the first look at how enzymes work. The structure revealed a
form-fitting active site, perfectly shaped to bind to its target, a bacterial
carbohydrate chain. This structure confirmed the basic theory of
enzyme action: enzymes stabilize the transition state of the reaction
they catalyze. A chemical reaction typically begins with a stable
substrate and ends up with a stable product. In the case of lysozyme,
the substrate is a carbohydrate chain and the product is a cleaved
chain. In the course of this reaction, however, the molecules must
pass through a number of less stable intermediate states, termed
transition states. The enzyme's major job is to streamline the path
through these transition states, guiding the reaction efficiently from
substrates to products.
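One common way to put numbers on transition state stabilization, which goes beyond what the structures themselves show, is the exponential relationship between reaction rate and the height of the free energy barrier: lowering the barrier by a given amount multiplies the rate by exp(amount / RT). The sketch below assumes that nothing else about the reaction changes and uses a body temperature of 310 K. The function names are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def rate_enhancement(barrier_drop_kj_per_mol, temperature_k=310.0):
    """Fold speedup from lowering the activation free energy by the given amount."""
    return math.exp(barrier_drop_kj_per_mol * 1000.0 / (R * temperature_k))

def barrier_drop_for(fold_speedup, temperature_k=310.0):
    """Barrier reduction (kJ/mol) needed for a given fold speedup."""
    return R * temperature_k * math.log(fold_speedup) / 1000.0

# Barrier reduction that corresponds to a millionfold speedup.
print(f"{barrier_drop_for(1e6):.1f} kJ/mol")
```

Under these assumptions, a millionfold speedup corresponds to stabilizing the transition state by roughly 36 kJ/mol, comparable to the energy of a handful of hydrogen bonds.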
The structure of lysozyme showed many ways that enzymes
do this. A key concept is that enzymes make sure that everything
is in the right place at the right time. For the lysozyme cleavage
reaction, this includes several things. It has a form-fitting groove
that grips the intended substrate, making sure that the enzyme
only acts where it is supposed to act. But the groove isn't a comfortable
fit. It grabs the two ends of the chain and torques the
center, causing one of the sugar rings to distort into a less-stable
conformation that is more amenable to the reaction (. Fig. 17.2).
Then, specific amino acids around the target bond deliver a
water molecule and orchestrate the chemical steps of the cleav-
age reaction.
. Fig. 17.3 HGPRT active site. Three snapshots of HGPRT capture the enzyme with its starting substrates (left), after
guanine has been added to the sugar to form GMP, releasing pyrophosphate (center), and a form with just the GMP
product bound (right). Notice that a loop in the enzyme (green) has opened up in the final structure, but the release
of GMP is still the slowest part of the whole process (PDB entries 1d6n, 1bzy, 1hmp)
. Fig. 17.4 HIV protease inhibitors. Three structures show some of the logic
for design of drugs that block HIV protease. The enzyme is shown at the top,
with a small peptide bound in the active site. Two acidic groups (turquoise)
catalyze the cleavage reaction at the center of the peptide, activating a
carbonyl group (red). Below this, three molecules are shown. At the top is the
same peptide, with the carbonyl oxygen shown with a star. One of the early
inhibitors was a symmetrical analogue of the transition state that has the
oxygen changed to a noncleavable hydroxyl. Later developments led to
effective drugs like saquinavir, which are smaller and bind much more tightly
to the enzyme (PDB entries 1kj4, 9hvp, 1hxb)
water is found in all of these drugs, where it interacts with the two
catalytic amino acids.
This type of rational drug design is one of the most direct
approaches to using structural information and taking control of our
own molecules. The goal is to find a specific small molecule to block
the action of a biological molecule. The approach has been successful
in numerous systems, creating therapeutic drugs for everything
from cancer to blood pressure. This is often a meandering process,
making changes one by one to improve the drug. For instance, the
. Fig. 17.5 Rational design of Gleevec. The antileukemia drug Gleevec was
designed based on the structure of a protein tyrosine kinase (shown on the
left, with the drug in green). A series of refinements were made during the
design process, ultimately yielding a drug that is specific for the targeted
protein and has good properties for use as a drug (PDB entry 1iep)
. Fig. 17.6 Cocaine catalytic antibody. Three structures capture a catalytic antibody at different steps in its reaction:
binding to cocaine and a water molecule, the transition state where the water has been attached to cocaine, and the
cleaved cocaine products. Two amino acids that catalyze the reaction are shown in turquoise (PDB entries 2ajv, 2ajx,
2ajy)
129 18
Seven Wonders
of the World of Enzymes
Some enzymes are amazingly fast, so fast in fact that they perform
their reactions faster than molecules can get to them. The diffusion
of molecules through the watery cell environment is fast, but only
so fast. This sets an interesting upper limit on the evolution of
enzyme function: there's no need to improve the function further,
because the enzyme is already perfect enough in the context in which
it performs its job.
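The speed limit set by diffusion can be estimated with the Smoluchowski expression for how often two spheres meet in solution, k = 4π(D_A + D_B)(r_A + r_B)N_A. Neither this formula nor the input values come from the structures discussed here; the numbers below are rough, assumed values for a small molecule finding an enzyme in water, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def diffusion_limited_rate(d_sum_m2_per_s, contact_radius_m):
    """Smoluchowski encounter rate constant, in 1/(M*s)."""
    # Encounter volume swept per second for one pair of molecules (m^3/s).
    k_m3_per_s = 4.0 * math.pi * d_sum_m2_per_s * contact_radius_m
    # Convert per molecule to per mole, and cubic meters to liters.
    return k_m3_per_s * AVOGADRO * 1000.0

# Assumed, order-of-magnitude inputs: combined diffusion coefficient of
# 1e-9 m^2/s and a contact distance of 0.5 nm.
k = diffusion_limited_rate(1e-9, 0.5e-9)
print(f"{k:.1e} per molar per second")
```

With these assumed inputs the estimate comes out at a few billion encounters per molar per second. An enzyme whose measured efficiency approaches this range has little left to gain from faster chemistry, since it is already limited by how quickly its substrate arrives.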
Scientists have found several examples of these perfect
enzymes. Most perform very simple tasks that require the capture
of a single molecule, followed by a small chemical change. Carbonic
anhydrase is a perfect example. It is important for solubilizing car-
bon dioxide in the blood. Throughout your body, it combines car-
bon dioxide and water to form carbonic acid and bicarbonate,
which are both very soluble. Then, in the lungs, it oversees the
reverse process and releases the carbon dioxide when we breathe
out. This reaction can occur naturally without the enzyme, but the
enzyme allows more control, speeding up the reaction in the desired
place by a million times.
But the evolution of enzymes hasn't stopped there. A study of
the enzyme superoxide dismutase found that it performs its reac-
tion even faster than might be expected. The structure of superox-
ide dismutase revealed that this enzyme gives its target an extra
boost. It has a strongly charged patch around the active site, which
forms a funnel that draws the oppositely charged radical molecule
into the right place (. Fig. 18.1).
Four: Allostery
Cells are filled with enzymes, all performing their individual jobs
quickly and efficiently. Some of these enzymes perform their reac-
tions reversibly, performing a particular reaction and the opposite
. Fig. 18.3 Lanosterol synthase active site. Two structures capture the enzyme lanosterol synthase before and after
its reaction. A cascade of cyclization reactions converts an oxidosqualene molecule (left) into a lanosterol molecule
(right) (PDB entries 1ump, 1w6k)
reaction equally well. Of course, if all of the enzymes in the cell did
this, the result would be chaos, and nothing would get done. To solve
this problem, cells include many enzymes that perform one reaction
preferentially and may be turned on and off according to need.
One of the key ways that enzymes are regulated is through a
change in shape, termed allostery. The enzyme has two (or more)
states that are active and inactive. The switch between the two states
is controlled by another site on the enzyme, which binds to a regula-
tory molecule. Atomic structures have revealed the nature of these
allosteric motions, capturing many enzymes in both their active
and inactive states (. Fig. 18.4).
. Fig. 18.6 Substrate tunneling. A cross section of tryptophan synthase shows the tunnel (red stars) that delivers
indole from one active site to the next. The two enzymes are shown in blue and green, with substrate molecules in red
spheres (PDB entry 1beu)
. Fig. 18.7 Substrate transfer. Pyruvate dehydrogenase complex includes many parts connected by flexible linkers.
At the center is a cubic core that organizes the whole complex. Small domains have a special lipoic acid group added
(magenta) that carries the substrate molecules from enzyme to enzyme around the outside. Only a few of these
enzymes are shown in the illustration. In reality, the entire complex is surrounded by them (PDB entries 1eaa, 1lac,
1w85, 1ebd)
with a charged amino acid, or one that can tweak its chemical state
appropriately. The classic example of this is the serine proteases.
They use a serine amino acid in the reaction, which attacks the pro-
tein chains that are cut by the enzyme. A chain of a histidine and an
aspartate activate this amino acid, making it much easier to transfer
a hydrogen to the molecule being attacked (see Figs. 8.3 and 8.4).
Some reactions, on the other hand, are too difficult for the 20
natural amino acids and need special chemical tools. Looking to
nature, these tools abound. Many of them are small organic mole-
cules evolved to perform a specific task. For instance, ATP is perfect
for carrying phosphates, and NAD is perfect for carrying electrons.
Molecules built from the B vitamins specialize in carrying carbon
atoms and other small groups, and transferring them to other mol-
ecules. SAM performs a similar task for sulfur.
In other cases, even more chemical creativity is needed, and
specific metal ions are employed. In some cases, the enzyme just
needs something to bind strongly to a charged group. In other cases,
the strong charges are needed to force a particularly difficult reaction.
For instance, the enzyme that fixes nitrogen, converting gaseous
nitrogen into biologically useful ammonia, performs this
incredibly difficult reaction using a complex cluster of exotic metals
(. Fig. 18.8).
Seven: Ribozymes
Proteins aren't the only molecules that can catalyze chemical
reactions in cells; they're just the most creative. Scientists have also
discovered that RNA is used to build catalytic molecules, termed
ribozymes. The most famous one, of course, is the ribosome, which
uses an adenine base to catalyze the addition of amino acids to a
growing protein chain (see Fig. 7.6). Other common examples are
RNA molecules that can cleave other RNA molecules, or
themselves. These RNA molecules fold into complex shapes reminiscent
of the globular structures of enzymes. They're particularly good for
these tasks because they have built-in machinery for recognizing
the target RNA sequence, since they can use typical base pairing to
bind to it. They usually employ metal ions to perform the actual
reaction.
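The recognition step can be illustrated with a few lines of code. This is a deliberately simplified sketch: it considers only the four standard Watson-Crick pairs (A with U, G with C), ignores wobble pairs and the three-dimensional fold of the RNA, and uses made-up sequences. The function names are hypothetical.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the strand that pairs with `rna`, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(rna))

def find_target_sites(guide, target):
    """Positions in `target` where `guide` can pair along its full length."""
    site = reverse_complement(guide)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]

# Hypothetical sequences, for illustration only.
print(find_target_sites("GGCU", "AAAGCCAUAGCC"))  # prints [2, 8]
```

A short recognition arm like this one finds two matching sites in the example target. Longer arms make a chance match less likely, which is one reason base pairing is such a convenient way to pick out a specific RNA.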
A tiny ribozyme has been the object of much study, serving
as a convenient model for the action of ribozymes. It is termed
the Hammerhead ribozyme, since chemical diagrams of it look like a
hammer. It was first found in plant pathogens, where it is
involved in self-splicing of the RNA genome, and similar ones have
been found in many organisms. Researchers whittled away at this
natural ribozyme, ultimately finding a minimal version of it that
performs the self-cleavage reaction with only two short RNA
strands (. Fig. 18.9).
139 19
Building Bodies
. Fig. 19.1 Cytoskeletal filaments. Cytoskeletal filaments actin (top), vimentin (center), and microtubule (bottom)
(PDB entries 1m8q, 1gk7, 3uf1, 3trt, 1gk4, 1tub)
. Fig. 19.3 Lipid bilayer. This model of a lipid bilayer was generated using
computer simulation in the laboratory of Klaus Schulten. The two layers of lipids
are seen in the center, with their carbon-rich tails (gray) pointing inwards and
charged groups (red, yellow and blue) exposed to water molecules on each side
. Fig. 19.4 Membrane proteins. Membrane-spanning proteins are encircled with a belt of carbon-rich amino acids
that interact with the carbon-rich interior of the membrane. In this illustration, charged atoms are bright red and blue
and are mostly outside the membrane, and carbon-rich regions are in white. The three proteins are a photosynthetic
reaction center (left), an ion pump (center), and P-glycoprotein (right), a protein that pumps toxic molecules out of our
cells (PDB entries 1prc, 1su4, 3g61)
a very small clathrin coat in almost atomic detail, showing how the
arms interdigitate and embrace the vesicle inside (. Fig. 19.7).
Cells in our bodies also need infrastructure to hold them together
as building blocks of a larger body. At a local level, cells have many
ways of connecting to their neighbors. For instance, cells are tied
together and communicate with each other through gap junctions,
formed of a closely packed arrangement of hundreds or thousands
of connexon proteins. These connexons form a narrow pore that
connects the cytoplasm of the two cells, allowing small molecules
like ions and nucleotides to pass back and forth (Fig. 19.8).
Researchers have found that this flow of molecules stops when the
cell is damaged, however. Damage often leads to release of calcium
from internal storehouses, and these calcium ions bind to the con-
nexons. It has been thought for many years that this causes a confor-
mational change in the connexon, closing up the pore. A recent
structure, however, shows that the calcium-bound pore is wide
open, similar to the normal state of the pore. Based on this structure,
researchers now think that all of these calcium ions may form an
electrostatic barrier that inhibits the flow of positive ions like potas-
sium through the pore.
Our bodies also need a larger infrastructure to tie everything
together. This is built of many very large molecules that together
form networks that support tissues and organs. Collagen is one of
the major structural components of these networks. It is composed
Fig. 19.9 Collagen. Atomic structures have been determined for small pieces of the collagen triple helix, as shown
at the top. The characteristic glycines are shown with spheres and starred near the left side, and the many five-membered
rings are prolines or hydroxyprolines that kink the chain and direct it back towards the center of the triple helix
(PDB entry 1cag)
Fig. 19.10 Painting of a nerve synapse. This painting shows a cross section through a glutamatergic nerve synapse.
Remarkably, all of this complex infrastructure is needed to ensure that small neurotransmitter molecules (yellow dots)
are released at the proper time, delivering a signal to receptors on the surface of the lower cell. The infrastructure
includes vesicles that hold neurotransmitters, proteins that store the vesicles and deliver them to the surface of the cell
at the appropriate time, and proteins that manage the fusion of the vesicles with the cell surface to release the
neurotransmitters. There is also an infrastructure for holding the two cells together and arranging the receptors in the
proper place. On top of this, there is a complex regulatory infrastructure that modulates the activity of the synapse,
creating complex behaviors such as memory
One enzyme has arguably caused more human strife than any other:
the enzyme tyrosinase. The one shown here is from bacteria
(Fig. 20.1); the one in our cells is similar but is bound to
membranes. It performs an interesting reaction: it oxidizes the
amino acid tyrosine, which then forms huge aggregates called mela-
nin which strongly absorb light, looking dark brown or black. Cells
in our skin have special compartments that make this melanin to
help protect us from the dangerous effects of sunlight. Therein lies
the problem. Human populations around the world have evolved
cells that make different amounts and types of melanin, driven
largely by their historical exposure to sunlight. This has yielded a
beautiful diversity in skin color, ranging from clear white to darkest
black and everything in between. Similar molecules give hair its
shades of blonde, red, brown, and black. Unfortunately, human
society has never been good with differences, and this highly visible
consequence of a single enzyme has helped to fuel many of our cur-
rent societal challenges.
We're not at all unique in this variation of color, or in the strife
it can cause. The biological world is filled with colors, which have
evolved to provide a variety of selective advantages. These include
colors that help hide and colors meant specifically to be seen. Some
colors are a consequence of the selective absorption of other colors,
and some colors are a consequence of light actually created by the
cell.
The most common color in our biological world is green: the
ever-present green of plants. Ironically, this green light is leftover
light, light that the plant can't use. The chlorophyll used by plant
cells to capture the energy in light absorbs red and blue light
strongly, leaving the greenish hues. The color is caused by the large,
flat ring of atoms, termed a porphyrin, which has many atoms that
share electrons and can absorb the energy from visible light
(Fig. 20.1). These chlorophyll molecules are held inside special
Fig. 20.1 Tyrosinase. These two structures capture tyrosinase before and after it performs its reaction of converting
tyrosine to L-DOPA (PDB entries 4p6r, 4p6s)
Chapter 20 Coloring the Biological World
proteins that hold them in huge arrays, ready to soak up as much
light as possible (see Figs. 12.1 and 12.2).
To assist chlorophyll, plants also build molecules that absorb
other colors. For instance, beta-carotene absorbs blue and green
light, and thus looks orange. In the photosynthetic machinery of
many plants, these molecules are arrayed alongside chlorophyll.
Plants are also masters of color generation for decoration. They
build all manner of colorful molecules to decorate their flowers. The
evolutionary goal for these, amazingly, is to look pretty, at least
pretty to the insects that pollinate them.
Other colors in our own bodies are a consequence of the metal
ions we need for life. The bright red of blood is the most familiar. It
is caused by the iron ion that is held within a heme (Fig. 20.2). As
with chlorophyll, the color is a consequence of the delocalized elec-
trons in the porphyrin ring. Trillions of these molecules fill every
red blood cell, soaking up oxygen and blue and green and yellow
light. Similarly, proteins such as cytochromes, as indicated by their
names, have the side effect of producing color. Other organisms use
different metal ions in these tasks, so their blood may be different
colors. For instance, hemocyanin from insects uses a copper ion
and is blue green.
Colored molecules are also ideal for sensing light. Cells in our
retinas use a particularly useful molecule, called, quite logically,
retinal. Like porphyrins, it has atoms with delocalized electrons that
absorb visible light. But when they do, they induce a change in the
shape of the molecule. This is perfect for sensing light. The retinal is
Fig. 20.2 Porphyrins. Porphyrins provide much of the color in our natural world. They are composed of a flat ring of
atoms that capture a metal ion in the center. The colors depend on the specific arrangement of atoms in the ring and
the type of metal ion at the center (from PDB entries 1s5l, 2hhb)
Fig. 20.3 Rhodopsin. Retinal (magenta) changes shape when it absorbs a photon, triggering the protein opsin
(white) to launch a signal to the brain (PDB entries 1u19, 3pqr)
Fig. 20.4 Fluorescent proteins. The chromophore of GFP (left) forms spontaneously when a new bond (dotted
turquoise here) forms between three successive amino acids in the chain. Biotechnology researchers have made small
changes to the chromophore and the amino acids that surround it to create fluorescent proteins with all the colors of
the rainbow (right) (PDB entries 1ema, 3m24, 2q57, 4ar7, 2y0g, 1huy, 2h5o, 2h5q)
Fig. 20.5 Fluorescent calcium sensor. The engineered calcium sensor GCaMP2 includes a circularly permuted green
fluorescent protein (green), attached to calmodulin (magenta, with calcium ions in yellow) and a short chain from
myosin (blue). The calmodulin portion changes shape when it binds to calcium, changing the fluorescence of the GFP
portion (PDB entry 3evr)
Fig. 20.6 Luciferase. The chromophore luciferin is shown in the center, surrounded by the luciferase protein
(PDB entry 2d1s)
21 Amazing Antibodies
Fig. 21.1 Poliovirus neutralized by antibodies. This cryoEM structure includes the viral capsid (red and orange) and
the virus-binding portion of the antibodies (blue). Since the resolution of the experiment was not sufficient to resolve
individual atoms, the structure includes only a single atom for each amino acid, which are represented here with a
larger sphere than is normally used for atomic images (PDB entry 3j3p)
revealed that the antibody chains fold into a well-ordered structure
with two parallel beta sheets, and all of these hypervariable regions
are arrayed at one end, where they form loops that together
surround the binding site.
Flexibility is also a key component of antibody action. Antibodies
typically have two or more binding sites in a particular complex,
allowing them to make multiple connections on the surface of a
pathogenic organism. To make this even more efficient, the connec-
tors holding these binding sites are flexible, allowing them to
accommodate to different types of surfaces. This, however, makes
antibodies difficult to study. The classic Y-shaped antibody has been
observed by electron microscopy, but most atomic structures have
been solved using fragments of antibodies, which are more-or-less
rigid. An atomic structure had to wait until a lucky researcher found
a crystal form that trapped the flexible antibody in one particular
frozen pose (Fig. 21.2).
We now have structures of hundreds of antibodies, bound to
many different types of targets. These structures reveal the secrets of
antibody recognition. By genetically mixing and matching segments
of these hypervariable loops, antibodies are able to recognize
almost anything. Structures in the PDB include antibodies that bind
to small molecules like cocaine or steroids, to soluble and
membrane-bound proteins, to RNA, to DNA, and to entire viruses.
For instance, structures have been determined for three different
antibodies that all recognize the same protein but in different ways
(Fig. 21.3). Atomic structures have also captured the process of
antibody affinity maturation, where antibodies are tuned by the
immune system to improve their binding ability (Fig. 21.4).
Fig. 21.2 Antibody structures. Many atomic structures of antibodies have been solved by breaking the molecule
into stable fragments. Two early structures are shown here at the left: an antigen-binding Fab fragment that binds to the
small molecule phosphocholine and the Fc fragment that is similar (or constant) in many antibodies. A handful of
crystallographic structures of entire antibodies have also been determined, such as the one shown here on the right,
capturing the flexible antibody in one particular pose (PDB entries 1mcp, 1fc1, 1igt)
Fig. 21.3 Anti-lysozyme antibodies. These three antibodies all recognize lysozyme (shown with a rainbow-colored
cartoon), but they bind to different sides of the molecule. Notice that the binding sites are quite different on the
antibodies: the one on the left has a cluster of positively charged amino acids (in bright blue), the one in the center has
more negatively charged amino acids (in bright red), and the one on the right is largely uncharged (white and pastel
colors) (PDB entries 1fdl, 1yqv, 3hfm)
Fig. 21.4 Antibody maturation. The immune system tunes antibodies by making small changes to improve binding
to the target. The antibody on the left recognizes lysozyme (green) and is from the initial response to the protein and
binds fairly weakly. The antibody on the right has been optimized by affinity maturation and binds a thousand times
more tightly. Sites of mutation (red) are scattered through the antibody chains, together making a better fit to lysozyme
(PDB entries 1mlc, 1p2c)
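As a rough worked example of what "a thousand times more tightly" means energetically, the sketch below converts a fold-change in binding affinity into a free-energy difference using the standard thermodynamic relation ΔΔG = -RT ln(fold). The thousand-fold figure comes from the caption above; the temperature of 298 K and the function name are assumptions made only for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold_tighter: float, temp_k: float = 298.0) -> float:
    """Free-energy change (kcal/mol) for binding that is `fold_tighter` times tighter.

    Negative values mean more favorable binding. The default temperature
    of 298 K is an assumption, not a value taken from the experiments.
    """
    return -R_KCAL * temp_k * math.log(fold_tighter)


print(round(ddg_from_fold_change(1000.0), 1))  # prints -4.1
```

Under these assumptions, a thousand-fold improvement corresponds to roughly 4 kcal/mol of extra binding energy, shared among the scattered mutation sites.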
Fig. 21.5 Antibodies in science. Antibodies are used in many medical and scientific applications. The structure on
the left shows two small fragments of antibodies bound to human chorionic gonadotropin. These types of antibodies are
used in pregnancy tests, since the hormone is prevalent during pregnancy. Other tests, such as the commonly used test
for HIV infection, use an antibody to recognize the unique shape of another antibody: the one that is built by the
immune system to fight the virus (shown in the center). Antibodies are also used by structural scientists to assist in the
crystallization of difficult systems, such as the small ion channel shown in green on the right (PDB entries 1qfw, 1iai, 1k4c)
Fig. 21.6 Antibody structures. As shown on the left, most antibodies are composed of two heavy chains (in blue)
and two light chains (in green). The smaller antibodies made by camels and sharks are composed of two copies of a
single chain, as shown on the right (PDB entry 1mel)
that something is wrong. Viruses only need one thing on their sur-
faces: a machine to recognize susceptible cells and force their way
inside. Everything else can be hidden away inside, invisible to the
immune system. In HIV, this machine is called envelope glycopro-
tein, and its trick for survival is revealed in its name. It is covered
with sugar chains that are the same as the sugar chains on our cell
surface proteins. These form a protective coat of camouflage that
hides the virus from the immune system.
However, the immune system is extremely resourceful, and
shuffling and hypermutation can create many, many different types
of antibodies in short order. Several types of antibodies have been
observed in patients that are effective for neutralizing HIV. They use
some amazing tricks, including long fingers to reach through the
sugar coat and probe the underlying HIV protein and the linking of
several antibodies in tandem to create a complex that binds to the
sugars themselves (Fig. 21.7). Unfortunately, it typically takes a
long time for these antibodies to be created by the immune system,
and the virus has already taken a strong hold by the time they are
being produced. One hope for anti-HIV vaccines would be to try to
elicit these broadly neutralizing antibodies earlier in the infection.
Vaccines have changed our lives, but some targets have remained
elusive. Influenza is a classic example. It changes so rapidly that our
complement of antibodies quickly goes out of date and is ineffective
against the newest strains or even old strains that haven't been seen
Fig. 21.7 HIV envelope glycoprotein and antibodies. The Fab portions of two broadly neutralizing antibodies (blue)
show unusual ways to recognize HIV envelope glycoprotein (yellow and red), which is protected by a coat of
carbohydrates (orange). The one on the left has a long finger that pushes through the carbohydrates to reach the
protein, and the one on the right is domain swapped, producing two tandem binding sites that recognize the
carbohydrates directly (PDB entries 1nco, 1op5)
for many, many years. This is why we need a new influenza shot
each year: the medical establishment makes an educated guess
about which strains will pose a danger and protects us with a vaccine
against them. And then our antibodies get to work, patrolling
through our bodies and protecting us from invaders.
defensive proteins: how do they keep from doing the same thing to
our own membranes? The structures reveal that they are coated
with positively charged amino acids, which recognize the negatively
charged phospholipids that are common in bacterial membranes.
Our own cells have more lipids that are neutral and thus are not as
susceptible to attack.
Iron ions are another weak point in bacteria that is targeted by
the innate immune system. Iron is a precious commodity in our
bodies. We have a lot of iron, but we keep it locked up inside pro-
teins like hemoglobin. This leaves very little free iron for infecting
bacteria to use for their own metalloproteins. One type of bacteria,
the one that causes Lyme disease, has evolved a particularly draco-
nian solution to this challenge: all of the proteins in its genome that
Chapter 22 Attack and Defense: Weapons of the Immune System
normally require iron have been replaced by proteins that use other
metals or no metals at all. Most other bacteria, however, need to
find a way to gather up these rare iron ions for their own use.
This has led to an evolutionary battle between our cells and
bacterial invaders. Bacteria build unusual small molecules, termed
siderophores, with a big appetite for iron. They release these sidero-
phores into the environment and then gather them up after they
have captured individual iron atoms (Fig. 22.1). In response, our
ancestors evolved proteins that grab siderophores, termed sidero-
calins, and sequester them before the bacteria get a chance. In
response to this, some bacteria have then evolved stealth sidero-
phores that can gather iron but are not recognized by siderocalins.
And so the battle continues, and scientists are following every step
with atomic structures.
Our immune system also builds a more elaborate system for
attacking bacteria, termed the complement system. When antibod-
ies (such as star-shaped immunoglobulin M) find a bacterium, the
complement C1 protein binds to them and launches a cascade of
responses that leads to the creation of a membrane attack complex
that pierces the bacterial cell wall. These proteins are large and flex-
ible and thus have been difficult to study. Atomic structures have
been determined for many of the functional parts of the molecules,
such as the antibody-recognizing arms of C1q, but electron micros-
copy has proven to be the best way to study the entire system in
action (Fig. 22.2).
Viruses are much more slippery and require a different set of
weapons for defense. These look for the unusual aspects of viruses
and attack them there. For instance, many viruses have genomes
composed of double-stranded RNA, which is rarely found in cells.
So, if a cell notices that there is double-stranded RNA in the cyto-
plasm, it knows that something must be wrong. Plant and animal
cells have a sophisticated system for recognizing and silencing
RNA (Fig. 22.3). The system starts with a protein called dicer, which
breaks the RNA into small, recognizable pieces. These little
Fig. 22.3 Small interfering RNA. Atomic structures have shown that the large active site of the protein dicer is
exactly the right size to cut double-stranded RNA into perfectly sized pieces, using several metal ions (left). These small
interfering RNA molecules are then bound by argonaute and used to recognize and destroy RNA that matches the
sequence (center). Some viruses circumvent this protection by creating proteins that sequester siRNA before it can find
the viral RNA (right) (PDB entries 2f8s, 2, 4w5o, 1r9f)
Fig. 22.4 CRISPR and Cas. Cas9 uses CRISPR RNA (red) to recognize viral DNA (yellow) and then it breaks it into
pieces. Engineered versions of Cas9 are now being developed to destroy integrated HIV in infected cells (PDB entry 4un3)
Reconstructing HIV
Fig. 23.1 Infection by HIV. HIV is shown at the top and a target cell is shown at the bottom in blues. HIV envelope
protein (1) has bound to the receptor CD4 (2) and then to coreceptor CCR5 (3), causing a change in conformation that
inserts fusion peptides into the cellular membrane. This ultimately leads to fusion of the virus with the cell membrane
The picture also includes several ways that the virus protects
itself. The capsid is dotted with a cellular protein, cyclophilin A. It
blocks the binding of a cellular antivirus protein that works by coat-
ing the capsid and stopping it from releasing the viral DNA. The
virus also injects the protein Vif, which attacks the cellular protein
APOBEC. APOBEC normally modifies bases on viral RNA, inactivating
it before it can be used to build new viruses.
The first big advances in the fight against HIV were achieved in
the late 1980s, using the classic deconstructive approach of molecu-
lar biology on reverse transcriptase and the two other viral enzymes
encoded in its genome. These viral enzymes are attractive targets for
drug therapy because they play essential roles in the viral life cycle,
and there is abundant precedent for creating inhibitors to block
enzymes like these. So they were purified, crystallized, and studied
Fig. 23.2 Reverse transcription. After the capsid has entered the cell, reverse transcriptase (1) creates a DNA copy
(green) of the HIV RNA genome (yellow), using a cellular transfer RNA (2) as a primer. HIV nucleocapsid protein (3) acts as
a chaperone to unfold the RNA secondary structure. The ribonuclease activity of RT removes the viral RNA after the DNA
strand is created. Interaction of HIV Vif (4) with cellular APOBEC (5) is also shown
Fig. 23.3 Integration of the viral DNA. Uncoating of the viral capsid (shown at the top) and interaction with nuclear
pore proteins such as Nup358 (1) releases the viral DNA (2). The DNA enters the nucleus through the nuclear pore
(shown in purple) and is spliced into the cellular genome by the enzyme HIV integrase (3). Cellular protein LEDGF (4) is
important for localization of the site of integration at DNA in nucleosomes (5)
Fig. 23.4 Transcription of viral RNA. HIV Tat protein (1), bound to the TAR RNA stem-loop structure, binds to the
P-TEFb complex (2), activating transcription by RNA polymerase (3). The illustration also shows HIV Rev (4) bound to the
Rev response element and CRM1 (5), a cellular protein involved in transport through the nuclear pore
Fig. 23.5 Construction of viral proteins. The HIV Gag polyprotein (1, shown in red) is built from the HIV RNA genome
(in yellow) by cellular ribosomes (2). A short hairpin loop in the genome (3) induces a frameshift roughly 5% of the time,
producing the longer Gag-Pol protein (4)
in two forms, using the cell's own ribosomes to do the job. The
smaller viral protein, called Gag, includes the proteins that direct
the budding of the virus and ultimately form the structure of the
mature virus. About one in twenty times, however, a longer protein
is made, termed Gag-Pol, that includes these same proteins but with
the three HIV enzymes added to the end. All of this is encoded in
one long gene in the viral RNA, but at the end of the portion that
encodes Gag, there is a special sequence that forms a little hairpin
loop. This loop is just strong enough to stall the cell's ribosomes as
they are creating the protein, and most of the time the ribosome falls
off, making the shorter Gag protein. Occasionally, however, it manages to
read through the loop and create the longer Gag-Pol protein.
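The one-in-twenty ratio described above can be illustrated with a small simulation. This is only a sketch: it treats each ribosome reaching the hairpin as an independent event with a 5% chance of producing Gag-Pol, which is a simplifying assumption, and the function name, the number of ribosomes, and the random seed are hypothetical.

```python
import random


def translate_gag(n_ribosomes: int, p_gag_pol: float = 0.05, seed: int = 0) -> dict[str, int]:
    """Count Gag and Gag-Pol products from `n_ribosomes` independent translation events.

    Each event yields Gag-Pol with probability `p_gag_pol` (assumed 5%),
    otherwise the shorter Gag protein.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    gag_pol = sum(rng.random() < p_gag_pol for _ in range(n_ribosomes))
    return {"Gag": n_ribosomes - gag_pol, "Gag-Pol": gag_pol}


counts = translate_gag(10_000)
print(counts, "Gag per Gag-Pol:", round(counts["Gag"] / counts["Gag-Pol"], 1))
```

With these assumed numbers the simulation yields roughly nineteen Gag proteins for every Gag-Pol, so the virus ends up with many structural proteins and only a few copies of each enzyme.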
The Gag and Gag-Pol proteins assemble on the inner surface of
the cell membrane, guiding the process of budding that produces
new viruses (Fig. 23.6). This requires the assistance of many of the
cell's own molecules to orchestrate the assembly, budding and
pinching off of the virus. This interplay of viral proteins and cellular
proteins is currently a major topic of study in the HIV biology com-
munity, as we try to understand the process and look for weak
points that can lead to new treatments and cures. Many of the
details still need to be resolved. I have included a few aspects in the
illustration. The end of the viral RNA has a complex structure that
dimerizes (to ensure that two copies of the genome end up in the
virus) and captures the cellular transfer RNA that will prime reverse
transcription, as well as the Tat protein that will promote transcrip-
tion. Cyclophilin A is captured and will end up on the surface of the
capsid, and as the whole thing buds out, bystanding cellular pro-
teins are swept up in the membrane and in the interior of the virus.
The final step of the life cycle is maturation, converting the
newly budded immature form of the virus into the infectious
mature form (Fig. 23.7). This process is orchestrated by a small
viral enzyme: HIV protease. It cuts the Gag and Gag-Pol proteins
into their functional pieces. The timing of this is critical. Some of
the cuts need to be made first to ensure that everything assembles in
the proper order. I have drawn the painting at two stages. The lower
virus is just getting started, and the first cut is separating the struc-
tural proteins from the portions bound to the viral genome. The
upper virus is at the very end of the process. All of the proteins have
been processed, and they are assembling into the distinctive cone-
shaped capsid surrounded by a spherical membrane envelope.
HIV protease is one of the major targets for drug therapy. To
discover these drugs, scientists started with molecules that look
very much like the viral proteins that the enzyme cuts. Then, they
tinkered and tweaked these molecules until a version was found that
binds strongly to the enzyme but has usable properties that allow it
to be taken as a drug (see Fig. 17.4). This process has been ongoing,
continually improving the drugs and adding to our arsenal in
the fight against AIDS. As I write this chapter, there are close to a
thousand structures of HIV protease, capturing it in its many guises.
Currently, the most effective treatment plans combine these
protease inhibitor drugs with drugs that bind reverse transcriptase
Fig. 23.6 Virus budding. HIV Gag protein (1) and Gag-Pol (2) form arrays on the cell surface, capturing two copies of
HIV genome (in yellow), which dimerize through a specific sequence (3) and bind to a cellular transfer RNA (4) that will
act as primer for reverse transcription. Viral proteins Vpr (5) and Vif (6) are also incorporated. Several cellular proteins of
the ESCRT system (7) are involved in the process of budding
Fig. 23.7 Maturation of HIV. This illustration shows an immature virion in the process of maturation at bottom right
and a nearly-mature virion at upper left. HIV protease (1) is cleaving the Gag and Gag-Pol proteins into functional
proteins
Fig. 23.8 Broadly neutralizing antibodies attack HIV. HIV is shown at lower right, with viral proteins in red and
magenta, and viral RNA in yellow. Blood plasma is shown at the top and left side. Several broadly neutralizing antibodies
(1) are binding to HIV envelope glycoprotein (2). Other viral proteins include matrix (3), capsid (4), reverse transcriptase
(5), integrase (6), protease (7), Vif (8), and Tat (9)
Fig. 23.9 Three-dimensional model of HIV. The cellPACK program (http://www.cellpack.org) was used to create a
three-dimensional model of HIV and blood plasma based on atomic structures and models of the individual molecules.
Image created by Mathieu Le Muzic and Ivan Viola from a model created by Ludovic Autin, Graham Johnson, and Arthur
Olson