Edited by
Stuart L. Schreiber,
Tarun M. Kapoor, and Günther Wess
Volume I
ISBN 978-3-527-31150-7
Chemical Biology: From Small Molecules to Systems Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Contents
Preface
Volume 1
Volume 2
9 Diversity-oriented Synthesis
9.1 Diversity-oriented Synthesis
Derek S. Tan
Volume 3
Index
Preface
is even more profound as synthetic organic chemists tackle the new challenges
noted above. The objects of synthesis planning, no longer limited by the
biochemical transformations used by cells in synthesizing naturally occurring
small molecules, require radically new strategies and methodologies.
Several contributors help us answer a related question that also influences
synthetic planning: What are the structural features of small, organic molecules
most likely to yield specific modulation of disease-relevant functions? They
note that the ability to assess the performance of these compounds, and to
compare their performance to other small molecules such as commercially
available or naturally occurring ones, is possible through public small-molecule
screening efforts and public small-molecule databases (e.g., WOMBAT,
PubChem, ChemBank). These developments are reminiscent of the early
stage of genomics research, where visionary scientists recognized the need to
create a culture of open data sharing and to develop public data repositories
(e.g., GenBank) and analysis environments (e.g., Ensembl, UCSC Genome
Browser).
Sometimes the line between small and macromolecules is blurred.
Oligosaccharides are often presented as a third class of macromolecules, yet
several contributions here reveal arguably greater similarities of carbohydrates
to small-molecule terpenes than to nucleic acids and proteins, both in terms
of their biosynthesis and cellular functions. Oligosaccharides are shown to be
synthesized by glycosyl transferases (analogous to isopentenyl pyrophosphate
transferases used in terpene biosynthesis) and, like the terpenes, are subject
to tailoring enzymes. Transferase enzymes are used to attach oligosaccharides
and terpenes to proteins, where they serve key functions (e.g., glycoproteins,
farnesylated Ras). Chemical biologists have illuminated and manipulated
oligosaccharides and the unquestionable member of the macromolecule
family, the proteins, with great aplomb. Several of our contributors are
pioneers in the revolution of protein chemistry and protein engineering, and
their chapters provide clear testimony to the consequences of these advances
to life science. Finally, in examining the similarities of and synergies between
chemical biology and systems biology, several of our contributors have perhaps
offered a glimpse into the future of these fields.
List of Contributors

Christel A. S. Bergström
AstraZeneca R&D
Discovery Medicinal Chemistry
15185 Södertälje
Sweden

Marco Betz
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Rochdi Bouhelal
Novartis Institutes for BioMedical Research
Lichtstrasse 35
4056 Basel
Switzerland

Rolf Breinbauer
Institute of Organic Chemistry
University of Leipzig
Johannisallee 29
04103 Leipzig
Germany

Erin E. Carlson
Department of Chemistry
University of Wisconsin
1101 University Avenue
Madison, WI 53706
USA

Tim Clackson
ARIAD Pharmaceuticals, Inc.
26 Landsdowne Street
Cambridge, MA 02139-4234
USA

Paul A. Clemons
Chemical Biology
Broad Institute of Harvard & MIT
7 Cambridge Center
Cambridge, MA 02142
USA

Philip A. Cole
Department of Pharmacology
Johns Hopkins School of Medicine
725 N. Wolfe St.
Baltimore, MD 21205
USA

Jon L. Collins
Discovery Research
GlaxoSmithKline Discovery Research
Research Triangle Park, NC 27709
USA

Virginia W. Cornish
Department of Chemistry
Columbia University
3000 Broadway, MC 3167
New York, NY 10027-6948
USA

Simon J. Crabb
School of Chemistry
University of Southampton
Highfield
Southampton SO17 1BJ
United Kingdom

Benjamin F. Cravatt
Neuro-Psychiatric Disorder Institute
The Skaggs Institute for Chemical Biology
The Scripps Research Institute
BCC 159
10550 North Torrey Pines Rd.
La Jolla, CA 92037
USA

Sean M. Davidson
The Hatter Cardiovascular Institute
67 Chenies Mews
University College Hospital
London WC1E 6DB
United Kingdom

Philip Dawson
Department of Cell Biology and Chemistry
The Scripps Research Institute
10550 N. Torrey Pines Road
La Jolla, CA 92037
USA

Frank L. Douglas
Aventis Pharma
Industriepark Höchst
65926 Frankfurt
Germany

Bettina Elshorst
Center for Biomolecular Magnetic Resonance
Institute of Organic Chemistry and Chemical Biology
Johann Wolfgang Goethe-University Frankfurt
Max-von-Laue-Str. 7
60439 Frankfurt
Germany

Hang Yin
Department of Chemistry
Yale University
225 Prospect St.
New Haven, CT 06520-8107
USA

European Molecular Biology Laboratory
Gene Expression Programme
Meyerhofstr. 1
69117 Heidelberg
Germany
PART I
Introduction
1
Chemistry and Biology - Historical and Philosophical Aspects
Gerhard Quinkert, Holger Wallmeier, Norbert Windhab, and Dietmar Reichert
Dedicated to Profs. Helmut Schwarz and Utz-Hellmuth Felcht on the occasion of their
respective 60th birthdays.
1.1
Prologue
between molecules the crucial aid is the open sesame represented by the periodic
1.2
Semantics
1.2.1
Synthesis - Genesis - Preparation
by the application of one of the usual degradative methods (alkali melt, effect
of oxidizing agents) to the naturally occurring dyestuff. These degradation
products were treated with an extraordinarily broad range of chemicals in
a form of intuitive combinatorial process, to examine whether the resulting
products would contain 3. In this way, Baeyer and Emmerling succeeded in
transforming isatin 10 into 3 in 1870. The preparation of 10 (from phenylacetic
acid 4: 1878) was, however, too elaborate to be commercially viable (Scheme 1-2).
As long as the constitution of a target molecule is unknown, the above
definition of a synthesis is inadmissible. The sequence of reactions depicted in
Scheme 1-2, however, characterizes a venture that serves for the preparation
of indigo. Two other pathways that afforded indigo in the laboratory were also
not industrially viable. A. von Baeyer encouraged BASF and Farbwerke Hoechst
to undertake a systematic search for an industrial synthesis of artificial indigo
(the constitution of which had meanwhile been established) in competition
with one another. This was finally achieved in a strategically clear and tactically
flexible manner through the already mentioned Heumann-Pfleger synthesis
(Scheme 1-1). It was envisaged that the artificial preparation of dyes from coal
tar should become a source of national wealth. Baeyer’s München University
laboratories and the two representatives of Germany’s flowering chemical
industry had exchanged ideas and experiences on a previously unknown scale
and had thus passed the test for a collaboration in partnership. In 1905, Adolf
von Baeyer was awarded the Nobel Prize for Chemistry for his contribution to
the development of organic chemistry and the chemical industry.
Scheme 1-2 Laboratory studies of the preparation of indigo 3 by A. (von) Baeyer and his
colleagues.
It has thus been demonstrated that the example of indigo is suitable for
conceptual differentiation between molecule construction according to a plan
(synthesis) and one without a plan (preparation). It can also provide an
illustration, based on the different character of the synthetic steps involved,
of differentiation between chemical and biological synthesis steps within the
overall indigo syntheses. Chemical synthesis steps [17a] can be understood
to include transformations achieved not only through the use of reagents or
catalysts prepared by chemists but also those in which enzymes, antibodies,
or even dead cells are used. Synthesis steps in which the synthetic capabilities
of living cells, either possessing their original genomes or new recombinant
variants, are deployed in a targeted manner, are classified as a part of biological
synthesis [17a]. Indigo was synthesized biologically in 1983 (Scheme 1-3) [18].
Biological indigo synthesis made use of an Escherichia coli strain with a
recombinant genome, being capable of converting aromatic hydrocarbons in
general into cis-1,2-dihydrodiols and, in particular, indole (obtained from
tryptophan 11 with the aid of tryptophanase) into cis-2,3-dihydroxy-2,3-
dihydroindole 13. The recombinant E. coli strain was augmented with the genes
expressing naphthalene dioxygenase from Pseudomonas putida. The initially
produced oxidation product spontaneously loses water, and the resulting
indoxyl 2 is converted by aerial oxidation into 3, which can be taken up into
organic solvents.
[Scheme 1-3: 11 → 12 (tryptophanase); 12 → 13, cis-2,3-dihydroxy-2,3-dihydroindole (naphthalene dioxygenase); 13 → 2 (- H2O); 2 → 3 (air oxidation)]
[Scheme 1-4: indole-3-glycerol phosphate → 12 → 2 → 3]
After the discussion on the biological synthesis of indigo with the aid
of a recombinant E. coli strain, one question still remaining relates to the
programmed genesis of indigo precursors in plants. Plants cultivated for indigo
production contain 2, stabilized by glycosylation (e.g., as indican = indoxyl
β-D-glucoside or as isatan B = indoxyl 5-ketogluconate) [19]. Indoxyl, for its
part, is produced from indole-3-glycerol phosphate [20] (Scheme 1-4) and that
in turn by the chorismate pathway.
This essay deals not only with preparation (intuitive) and synthesis (planned)
but also with genesis (programmed). Such (genetically and somatically
regulated) programs have arisen through Darwinian evolution. A plan for
a synthesis is devised by a synthetic chemist as designer and enacted by the
synthetic chemist as molecule maker. How is a synthesis planned?
1.2.2
Synthetic Design - Synthetic Execution
1.2.3
Preparative Chemistry - Synthetic Chemistry
The terms preparative chemistry and synthetic chemistry are often used
synonymously. We wish to draw some distinction between them: in preparative
chemistry we see a rich fund of knowledge from which the synthetic chemist
can draw, gained from work on chemical reactions. The preparative chemist is
concerned with broadly aimed investigations geared toward the discovery of
chemical reactions and the development and improvement of already known
ones. A chemical reaction may qualify as “mature” [17a] if it is capable of
transforming a starting compound of not too restricted substrate specificity in
a predictable manner:
under easily maintainable reaction conditions;
as far as possible with the use of substoichiometric proportions of effective catalysts;
without restriction to a particular scale;
with high chemical yield; and
with high regio- and stereospecificity.
1.3
Bringing Chemical Solutions to Chemical Problems
1.3.1
The Present Situation
At the beginning of the twenty-first century chemistry finds itself in the middle
of a phase of reorientation. In the chemical industry there is a clear trend
toward specialization and concentration. It cannot be ignored that traditional
organizational structures can be altered appreciably by investment and
disinvestment decisions, the maxim being away from the broadly diversified
chemical concern of yesterday toward the megacorporation of tomorrow,
with its focus on a few core competences. Measures adopted in established
organizations are disposition of particular branches, horizontal fusion of
adjoining core activities, and vertical integration of new high-tech ventures.
In the chemical sciences, progressive integration with chemical biology
and also with nanotechnology is underway. Self-organization of molecules
and modules into supramolecular and supramodular functional units plays a
prominent role in both fields of development, as is clear from research and
teaching in the top academic institutions. That this has been possible is due to
the development of physical methods without the aid of which it would be
impossible even to establish the existence or presence of systems with particular
properties. The core competence of chemistry, though, remains the provision
of new molecules through synthesis, a mission equally valid for synthetic
chemists in both industrial and academic environments. Both can point to
great successes in the past. Nonetheless, synthesis finds itself in a dilemma.
Scheme 1-5 Virtual synthetic pathways toward the steroid skeleton with rings A, B,
C, and D. Top row: stepwise conversion of a ring A (B, C, or D)-building block into the
ABCD system; middle row: expansion in a single step of an AB (AC, AD, BC, BD, or
CD)-building block into the ABCD system; bottom row: expansion in a single step of an
A (B, C, or D)-building block into the ABCD system.
Academic synthetic chemists tended to give the highest priority to the
elegance of the design of a synthesis, and this veneration was passed on to
their students. For industry’s molecular engineers, the expediency with which
the synthesis could be carried out held center stage: a concept which new
graduates did not have to come to terms with until their entry into their
industrial careers. Meanwhile, the constructive tension between elegance and
efficiency was usurped by the dream of the perfect reaction and the ideal
synthesis. The perfect reaction can be summarized in Derek Barton’s utopian
view: 100% yield, 100% stereoselectivity [25a]. B. M. Trost [25b] seeks to advance
toward the ideal through observance of atom-economy, and M. Beller [25c]
through transformation of multiple-component educts into single-component
1.3.2
Historical Periods of Chemical Synthesis
The pre-Woodwardian era largely concerned itself with the collection and
classification of synthetic tools: chemical reactions suited to broad application
to the constitutional construction of molecular skeletons (including Kiliani’s
chain-extension of aldoses, reactions of the aldol type, and cycloadditions of
the Diels-Alder type). The pre-Woodwardian era is dominated by two synthetic
chemists: Emil Fischer and Robert Robinson. Emil Fischer was emphasizing the
importance of synthetic chemistry in biology as early as 1907 [30]. He was
probably the first to make productive use of the three-dimensional structures
of organic molecules, in the interpretation of isomerism phenomena in
carbohydrates with the aid of the van’t Hoff and Le Bel tetrahedron model (cf.
family tree of aldoses in Scheme 1-6), and in the explanation of the action of
an enzyme on a substrate, which assumes that the complementarily fitting
surfaces of the mutually dependent partners are noncovalently bound for a
little while to one another (shape complementarity) [31].
Robert Robinson looked for suitable reactions with the aid of which
constitutional modifications in a pathway to, for example, a steroid synthesis
might be achieved. He was probably the first to employ mechanistic
[Scheme 1-6: family tree of the aldoses (glyceraldehyde; erythrose; arabinose, xylose, lyxose, ribose)]
between anionoid and cationoid atom groups [32] through space and through
the bonds lying between them (charge complementarity). Robinson used a
transparent accounting system (curly arrows) to illustrate the direction of charge
displacement (Scheme 1-7).
Case Study Estrone: Elisabeth Dane’s attempts to produce estrone 24
(Scheme 1-8) synthetically [33], beginning with a Diels-Alder reaction that
might formally give rise to two regioisomeric adduct components, ended in
disappointment: whilst no adduct at all was obtained from an attempted
reaction between the Dane diene 14 and the monoketonic dienophile
15a, the reaction between 14 and the diketonic dienophile 19a resulted
in a mixture of rac-20a and rac-21a, in which rac-20a, with the steroidal
molecular skeleton, was present only as a minor component. It is thus no
surprise that the Dane strategy was consigned to the files at the end of
the 1930s.
Scheme 1-7 Analysis of the relative orientation of Dane’s diene 14 and the
complementary dienophile following Robinson’s way.
3) Woodward graduated as a Doctor of Philosophy in 1937, after submission of his dissertation at MIT (Cambridge, Mass.) [34].
4) I have no doubt that they (Woodward’s seminars at ETH Zurich) played a major role in stimulating my own predilection for and enthrallment with the synthesis of complex natural products; A. E.: in [35].
5) See the concise Preface in [36a].
[Scheme: structures 22a (R = Me), 22b (R = Et), 23, and 24]
1.3.3
Diels-Alder Reaction - Prototype of a Synthetically Useful Reaction
[Scheme: structures 25 to 30; a: R = Me, b: R = Et]
10) For further examples see the section "Intramolecular Diels-Alder Reactions" in Ref. [47].
11) Optimization of the reaction conditions was carried out in the racemic series [48]. See [49] for the synthesis of the enantiomerically pure target compounds.
1.4
Bringing Chemical Solutions to Biological Problems
1.4.1
The Role of Evolutionary Thinking in Shaping Biology
Biology is such a hugely diversified field that a historical guide hardly helps as
an aid to orientation. Given this, it might then be reasonable to consciously
pick out some particular partial aspect, as Theodosius Dobzhansky did in his
famous statement “Nothing in Biology makes Sense except in the Light of
Evolution”. With evolutionary biology as a compass, it is not hard to discern
three historical periods.
12) See [50] for the Cuvier-Geoffroy debate before and beyond the Académie.
13) See [52]: Discussions between Goethe and Eckermann of the 2nd August 1830.
questions was left open; that of whether in the union of two gametes into
a zygote a mixture of the genes involved took place (blending inheritance),
occupied a key position. It could only be answered after:
Gregor Mendel [54] had set out statistical rules for the passing
on of particular hereditary characteristics from generation to
generation, which are useful for discussion on the complex
relationships in questions of heredity, and
Wilhelm Johannsen [55] had coined the terms phenotype and
genotype, which made it possible to distinguish between a
statistically apparent type (the phenotype) of observable
properties and the corresponding genetic make-up (the
genotype) of an organism.
The distinction between genotype and phenotype facilitated the separation
between genetics and embryology. It is clear from this separation that the
differentiation between genetic and environmental causes in embryology and
the wider discipline of developmental biology is something to talk about.
unfit mutants making uphill progress until a local peak is reached. For the
evolutionary process in the high-dimensional sequence space, local peaks in
the vicinity may readily be reached by small jumps, without the need to traverse
the valleys between them, and a continuous sequence of small jumps to reach
a global summit is a realistic prospect. To use Eigen’s own words: “Because
of frequent criss-crossing of paths in multidimensional sequence space, by
virtue of its inherent non-linear mechanism which gives the appearance of
goal-directedness the process of evolution is steered in the direction of optimal
value peak” [8b]. In brief, biological evolution uses two processes: genetic
mutation (as a means of generating random diversity) and natural selection
(as a means to optimize the peak-jumping technique) in the environmentally
shaped fitness landscape.
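The interplay of random mutation and selection described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not Eigen's model: the four-letter alphabet, the single-peak fitness function (number of positions matching a hypothetical optimum sequence), and the names `fitness`, `mutate`, and `evolve` are all illustrative choices that do not appear in the text.

```python
import random


def fitness(seq, target):
    """Toy fitness: number of positions at which seq matches a hypothetical optimum."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng, alphabet="ACGU"):
    """Return a copy of seq with one random position changed to a different letter."""
    pos = rng.randrange(len(seq))
    choices = [c for c in alphabet if c != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def evolve(start, target, generations=500, seed=0):
    """Alternate random mutation (diversity) with selection (keep mutants that are not less fit)."""
    rng = random.Random(seed)
    current = start
    history = [fitness(current, target)]
    for _ in range(generations):
        candidate = mutate(current, rng)
        # Selection step: a mutant replaces the current sequence only if it is at least as fit.
        if fitness(candidate, target) >= fitness(current, target):
            current = candidate
        history.append(fitness(current, target))
    return current, history


if __name__ == "__main__":
    final, history = evolve("AAAAAAAAAAAA", "GCAUGCAUGCAU")
    print(final, history[0], history[-1])
```

Because mutants of lower fitness are always rejected, the recorded fitness never decreases. A real fitness landscape has many peaks and valleys, which this single-peak sketch deliberately leaves out.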
Through the removal of subdisciplinary barriers, biology’s evolutionary
thinking has contributed on two occasions to enhance that science’s voice in
the choir of the natural sciences. In the 1940s and 1950s, a union of Darwinian
and Mendelian perspectives took place in Modern Synthesis [65], whilst at the
turn of the twentieth to the twenty-first century a union of developmental
and evolutionary biology into evolutionary developmental biology (Evo-Devo)
is taking place before our eyes in the New Synthesis [66].
1.4.2
On the Sequence of Chemical Synthesis (Preparation) and Biological Analysis
(Screening)
In an ideal starting situation for the synthetic chemist the structure of the
target molecule is already given. In the real world of the search for active
substances, the matter of whether a target molecule is to be synthesized is
determined by its presumed profile of properties. If a management decision
is made in favor of a target molecule to be synthesized, the synthetic chemist
then looks for a way to relate molecular function back to molecular structure.
This is based on the supposition that a functional unit should contain at
least two structurally complementary molecules non-covalently bound to one
another in a supermolecule. The idea of supermolecules as supramolecular
functional units, nowadays preached and systematically further developed
most conspicuously by Jean-Marie Lehn [67], goes back directly to Emil Fischer
[31], who introduced the instructive lock-and-key metaphor as early as 1894.
Fischer’s metaphor, as the tip of the submerged model of molecular recognition,
traces the function of a supermolecule back to structural interactions between
its complementary constituents. Through this, the complementarity between
substrate and enzyme was to become the basis of enzymology. Paul Ehrlich
seized on the lock-and-key metaphor in his 1908 Nobel lecture [68], and the
goal of chemotherapeutic endeavor thereafter came to be regarded as the
activation or deactivation of a receptor through noncovalent binding of a
[Scheme: structures 31 to 38; a: R = Me, b: R = Et]
What this means in detail should become clear through illustration with
later-generation gestagens.
Gestodene 39 (Scheme 1-11) has the lowest ovulation inhibitory dose
of all gestagens known to date. It displays both antiestrogenic and
antimineralocorticoid activity. A lower affinity to the androgen receptor
is not sufficient to produce measurable anabolic androgenic effects.
The pathway to 39 passes through compound 47 (Scheme 1-12) [76] and,
after microbiological introduction of an O function at C(15) (with the aid
of Penicillium raistrickii), on through the stations 48 (R = H or Ac) and
49 [77]. Compound 31b, incidentally, can be easily obtained starting from
47 [78].
Desogestrel 40 (Scheme 1-11) is a progestagen that is transformed in
the intestinal mucosa and in the liver into the actual effective metabolite
3-ketogestrel. The bioavailability is around 75%. Desogestrel, obtained
partially synthetically by chemists at Organon [79], displays minimal
androgenic and estrogenic activity. The long pathway from the 19-nor-steroid
estr-4-ene-3,17-dione includes a microbiological hydroxylation of
Pincus and Chang (Section 1.4.2.1.1), in their search for orally applicable
contraceptives, had decided upon norethindrone after some 200 steroidal
candidates had been examined one by one. Chemists at Schering AG had
stumbled upon drospirenone after some 600 newly prepared molecules
with antialdosterone activity had become available [84]. It can be justifiably
stated that the hardly ineffectual pharmaceutical industry had finished up
in a blind alley in its search for new active substances by using traditional
strategies [85].
The rapidly progressing expansion of the world market, where new suppliers
have arrived in great numbers (globalization), places serious decisions before
the management of every multinational company [86] (see Section 1.3.1).
These are not merely restricted to restructuring of portfolios of the products
manufactured; they also do not exclude the reorganization of the entire
company structure17). Under real pressure from financial analysts and
presumptive pressure from shareholders, questions have also been directed
toward the scientists involved: whether there might be new methods that
could afford more rapid access to new active substances. The answer was not
long in coming: with chirotechnology18) and the combinatorial acceleration of
the preparation and screening of whole populations of molecular candidates,
a new turn has been taken in the solution of biological problems through
chemical methods.
17) The consequences arising from reorganization of the structure of a business may be guessed by careful market analysis. Most difficult to predict is the reaction of employees. If the creative people among them are not convinced by the new orientation, or have even been put off by the way in which it has been implemented, they may defect to the competition, thus doubly weakening their previous employer.
18) One of the main challenges of synthetic chemistry in the post-Woodwardian era (see Section 1.3.2.3) is to find routes that satisfy the demands of industrial applicability to enantiomerically pure compounds [37]. In 1992, various international journals (Financial Times, Neue Zürcher Zeitung, Science, and Chemical & Engineering News), as if coordinated by a global editor, touched on the phenomenon of chirality. C&EN even predicted that chirotechnology may progress in the future as biotechnology had grown in the past.
[Scheme: structures of FK 506, rapamycin, and CsA]
molecular complexity. One can’t help wondering why the traditional method,
preparative rounds, each round allowing for the parallel attachment of one out
of seven building blocks available.
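The size of such a combinatorial variation, and the way beads acquire their sequences, can be sketched in a few lines. This is an illustrative sketch only: the number of rounds (three, chosen to match the tripeptides mentioned in connection with Scheme 1-17), the labels `B1` to `B7`, the bead count, and the function names are assumptions, not details taken from the text.

```python
import random
from itertools import product


def enumerate_variants(blocks, rounds):
    """List every sequence obtainable when each round attaches one of the blocks."""
    return ["-".join(combo) for combo in product(blocks, repeat=rounds)]


def split_and_pool(n_beads, blocks, rounds, seed=0):
    """Simulate split-and-pool: mix all beads, split them over one vessel per block, couple, repeat."""
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]
    for _ in range(rounds):
        rng.shuffle(beads)  # pool and mix the beads
        for i, bead in enumerate(beads):
            # Split evenly: bead i goes to vessel i mod len(blocks) and receives that block.
            bead.append(blocks[i % len(blocks)])
    return ["-".join(bead) for bead in beads]


blocks = [f"B{i}" for i in range(1, 8)]  # seven hypothetical building blocks
variants = enumerate_variants(blocks, rounds=3)
print(len(variants))  # 7**3 = 343 possible sequences

beads = split_and_pool(1000, blocks, rounds=3)
print(len(set(beads)))  # number of distinct sequences actually present on the beads
```

Each simulated bead carries exactly one sequence, which is the property that makes screening individual beads meaningful; only 3 x 7 = 21 coupling reactions are needed to reach all 343 sequences under these assumptions.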
The complete set of monomeric building blocks used in the construction
of the combinatorial variation of Scheme 1-17 is shown in Scheme 1-18. The
aesthetic elegance of the combinatorial strategy reveals itself when compared
with alternative strategies.
The bead-bound substrate variation was screened for binding to a biological
receptor (a fluorescence-conjugated immunophilin [87]) by mixing a sample
of the charged beads with a buffer containing the complementary protein.
The beads that carry variants with affinity for the receptor are easily identified
by visual inspection under a microscope with a fluorescent illuminator and
removed with the aid of a (non-plastic) syringe. The sequence of each bead-
bound substrate variant has been determined indirectly but unambiguously
by Clark Still’s encoding-decoding alternation [93].
Molecular encoding: During each step of the construction of a focused variation
of tripeptides (see Scheme 1-17) tagging molecules are attached to the beads
that encode both the step number (one through 21) and the reagent (amino
Scheme 1-19 On-bead molecules (rac-77 and rac-78) selected from the variation of
Scheme 1-17, and the seeming target structure 79.
[Scheme: synthesis of 80: 81 → 82 (a); 81 → 83 (b); 82 + 83 → 84 (c); 84 → 85 (d); 85 + 86 → 80 (e)]
a) Boc2O, aq. NaOH, dioxane, 90 %
b) MeOH, SOCl2, 98 %
c) 2-Chloro-1-methylpyridinium iodide, CH2Cl2, NEt3, 50 %
d) MeOH, 2.5 N NaOH, 74 %
e) 2-Chloro-1-methylpyridinium iodide, CH2Cl2, NEt3, 86 %
coevolution between them and the host may occur. There is, however, a
tremendous difference between a static variation and the immune system.
While the processes of preparation and screening of a static variation were
designed by chemists, what happens in immunology was not designed but
rather evolved.
The preparation of a dynamic variation (to be described in the following
section) is somewhat in between the two extremes, though very much closer
to the designer's end.
setup profile, a static molecular variation was prepared (on microscale) and
screened (collectively) to afford a select variant qualifying as the candidate
for subsequent synthesis (on macroscale). In this section, we present the self-assembly
of a variation of three sets of conjugates from which an added receptor
selects a number of effectors by molecular recognition. This selection works
by way of the interactions of protein surfaces within the receptor-effector
supermolecule, the knowledge of which ought to be helpful in drug design.
The self-assembly to be introduced is based on three pyranosyl-RNA (p-RNA)
[96] single strands (a, b, and c, Scheme 1-21) associating in a Watson-Crick-like
manner, initially into binary and further on into ternary supermolecules. In
$c_i + a_j \rightleftharpoons a_j{:}c_i$,
25) (1) and (2) form closed subsystems. As soon as all three components are present, however, the full system of equilibria (1-5) is valid. Equilibrium (5) represents the synchronous formation of the ternary complex out of the three single conjugates. Since this corresponds to third-order kinetics, a process of this type is significantly less probable than the purely bimolecular processes (1-4).
1.4 Bringing Chemical Solutions to Biological Problems
[Figure residue: panels labeled Variation of [a], Variation of [b], and Variation of [c].]
26) It should be pointed out that the transition from ac to cb does not take place as a direct, single process, but should be regarded only as a conflation of processes ac ⇌ a + c and cb ⇌ c + b. The corresponding edge of the bipyramid thus - unlike the other edges - does not symbolize a single equilibrium.
27) For the conjugates the following p-RNA sequences have been used: a = {CGGGGGN}, b = {NGAAGGG}, and c = {CCCTCTNCCCCCG}. N is a tryptamine nucleoside [98], which serves to attach the oligopeptides (discrete random variation of hexapeptides composed of the amino acids C, E, F, H, K, L, N, R, S, T, W).
1 Chemistry and Biology - Historical and Philosophical Aspects
are exchanged. There are three types of pure binary substitutions, and two
higher order substitutions where one conjugate is substituted for two others at
a time. Whether these simultaneous exchanges of several conjugates, as well as
the higher order associations and dissociations are relevant, though, remains
to be determined experimentally. The alternative of stepwise processes is
available in any case.
Topologically, the molecular species can be ordered into four levels of
complexity28) (Scheme 1-25). On the simplest level is the free receptor R. The
level above is represented by the binary complexes R:A, R:B, and R:C, the next
level by the ternary complexes R:AB, R:AC, and R:BC, whilst lastly the level of
highest complexity is occupied by the quaternary complex R:ACB. Accordingly,
the participating species can be arranged as vertices of a cube. All possible
equilibria are now either edges, or face- or space-diagonals of the cube and the
system is, by definition, described by a point inside the cube at any time.
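The cube picture can be made concrete with a small enumeration. The following is only an illustrative sketch, not part of the original work: it assumes that each species is fully described by the subset of conjugates (A, B, C) bound to the receptor R, and that an equilibrium is classified solely by the number of conjugates in which two species differ. All function names are hypothetical, and the quaternary complex is labeled alphabetically as R:ABC rather than R:ACB.

```python
from itertools import combinations

CONJUGATES = ("A", "B", "C")  # the three conjugate sets bound by receptor R


def species():
    """Return the 8 species as frozensets of bound conjugates (empty = free R)."""
    return [frozenset(c)
            for k in range(len(CONJUGATES) + 1)
            for c in combinations(CONJUGATES, k)]


def label(bound):
    """Human-readable name, e.g. 'R', 'R:A', 'R:ABC' (alphabetical order)."""
    return "R" if not bound else "R:" + "".join(sorted(bound))


def kind(s1, s2):
    """Classify an equilibrium by how many conjugates differ between two species."""
    return {1: "edge", 2: "face diagonal", 3: "space diagonal"}[len(s1 ^ s2)]


def count_equilibria():
    """Count all pairwise equilibria of the cube by type."""
    counts = {"edge": 0, "face diagonal": 0, "space diagonal": 0}
    for s1, s2 in combinations(species(), 2):
        counts[kind(s1, s2)] += 1
    return counts


def levels():
    """Group species labels by complexity level (number of bound conjugates)."""
    grouped = {}
    for s in species():
        grouped.setdefault(len(s), []).append(label(s))
    return grouped
```

Under these assumptions the eight vertices fall into four levels containing 1, 3, 3, and 1 species, and the 28 possible pairwise equilibria split into 12 edges, 12 face diagonals, and 4 space diagonals.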
The cube-style representation shows, firstly, that pathways from one species
to another are possible either via both edges and diagonals, or exclusively via
29) The biotinylated conjugates (ACB, AC, BC, or C) are captured by a sensor chip, whose surface is coated with immobilized streptavidin and which acts via surface plasmon resonance as a tool for enzyme (R) binding experiments.
30) The enzyme is mixed with its photolabeled substrate S. Upon cleavage by the enzyme, the label is activated and fluorescence can be detected. In case of inhibition by the effector, cleavage does not occur and fluorescence is not detected.
Scheme 1-26 Correlation diagram of affinity (binding) and activity (inhibition) for some
nodes of the network of Scheme 1-25. Values for ACB are set to 100%.
with [A] = [B] = 5000 nM and [C] = 555 nM, where the properties of A and B
have a 10 times greater statistical weight than those of C33). From the foregoing
discussion it can be directly inferred that the activity of a conjugate triplet is
not connected to a single molecular species from Scheme 1-25.
Given the dynamics of the supramolecular system described, one could
go a step further and transgress the confinements of molecular constitution.
It should be just as possible to use carbohydrates, steroids, terpenes or
even nonbiogenic substance classes - dendrimers, for example - in place
of the peptides. Through the addition of conjugates of different types of
constitution, the transition from one type to another could be studied in a
quasi-continuous way, opening up a further, new option for the determination
of structure-activity relationships.
The dynamics of the system allows it to adapt to changes in the environment.
Adaptation here means that the balance between the interactions inside the
effector (between the individual conjugates) on the one hand and those
between the effector and the receptor on the other hand, can change. Therefore,
depending on the prevailing conditions, different molecular species may be
responsible for the effects produced at the receptor. Particular combinations
of members of the three sets described may be used to map the affinity
profile of the receptor. In short: receptor profiling directly results from a
thorough investigation of the dynamic system under discussion. It reveals the
complementarity between the sites of the interacting surfaces of receptor and
effectors and suggests the design for a specific, biologically active substance
finally taking over from the analyzing effectors.
Ultimately, the potential of biologically active substances can only be assessed
in actual biological systems by means of animal experiments (Scheme 1-29)
and confirmed by subsequent clinical studies. En route to this, however, the
dynamic system described here offers various options for the analysis and
optimization of pharmacological parameters like affinity and activity. It is the
heterobifunctional character of the dynamic system that allows the synthetic
chemist to influence both intrinsic self-assembly as well as extrinsic molecular
recognition in a controlled way.
1.5
Bringing Biological Solutions to Chemical Problems
1.5.1
Proteins [99]
long time [17f], were finally taken up by the biochemist who could not afford
to ignore bio-macromolecules like nucleic acids and proteins any longer.
The bottom-up view of the biochemist eventually was complemented by the
top-down attitude of the (molecular) biologist. Quite a few of those scientists
who considered themselves molecular biologists entertained the idea [100a]
that “‘other laws of physics’ might be discovered by studying the gene”. This
search for the physical paradox [100b] remained an important element of the
psychological infrastructure of the creators of molecular biology. As a matter of
fact, the physicists among the new group were going to create a new approach
to biology [101].
1.5.1.1.1
The resulting adenylated amino acid appears to be tightly bound to its specific
enzyme, the corresponding aminoacyl-tRNA synthetase. Without leaving its
enzyme, the former, in a consecutive step, reacts with a low-molecular-weight
RNA (called soluble RNA = sRNA, later more logically known as transfer RNA
= tRNA) to afford an aminoacyl-tRNA [115,116].
acid. Thus one is led to suppose that after the activating step, discovered by
Hoagland and described earlier (vide supra), some other more specific step is
needed before the amino acid can reach the template”.
Which template? Several observations had excluded rRNAs from being
candidates for acting as templates. A cell, for example, could make a new type
of protein without making a new type of ribosome. The template-RNA was
finally disinterred as a class of unstable intermediates, self-explanatorily called
messenger-RNAs (mRNAs)36). When J. D. Watson informed the scientific
community “About the Involvement of RNA in the Synthesis of Protein”
[117a], he could begin with the sentence: “The ordered interaction of the three
classes of RNA controls the assembly of amino acids into protein”.
Now essential details in brief: protein genesis (translation) is the central event
in molecular biology. It takes place in the incredibly complex machinery37)
of the ribosome [124], where the syntactic structure of ribonucleic acids is
translated into the syntactic structure of proteins. During the translation
process, the information contained in a triplet codon of mRNA is decrypted by
an anticodon of a tRNA molecule, according to the instructions of the genetic
code. The genetic code is an abstract scheme for the redundant correlation of 64
“words” (nucleoside triplets) in the language of nucleic acids with 20 “words”
(canonical amino acids) in the language of proteins. The synthetic chemist
accepts the limitation on the number of amino acid building blocks as the
price for his readymade use of the ribosomal protein generating system. The
undisputed leading actors in the translation process at the stage of information
transfer from ribonucleic acids to proteins are aminoacyl-tRNAs [125]. These
are conjugates made up of proportions of both biopolymer types (language
systems), produced through esterification of an amino acid with a tRNA. A
particular tRNA with its anticodon corresponding to a specific amino acid is
covalently coupled (esterified) with precisely this amino acid. The esterification
takes place through the help of an enzyme (an aminoacyl-tRNA synthetase)
capable of specifically recognizing and coupling that particular tRNA and its
cognate amino acid [126]. Whilst the self-assembly of mRNA and tRNA during
translation is due to codon-anticodon interaction, based on Watson-Crick
36) Messenger-RNAs were the last of the RNA trio engaged in protein genesis, to be detected [120]. A further type of RNA has been discovered as a widespread, universal tool in biology for gene regulation by means of antisense-like interactions [121]. It is called interfering RNA (RNAi) and is produced from double stranded RNA in a cascade of enzymatic processes by a set of specific RNAses. Several regulatory pathways involving RNAi are known in many eukaryotes, including plants and mammals. RNAi is used extensively as a tool for research and its therapeutic potential is getting more and more obvious [122].
37) In an urgent appeal, we are certainly going to follow henceforth, Carl Woese [123] requests to stop looking at an organism as a molecular machine. The machine metaphor, according to his view, overlooks much of what biology is. To understand living systems in any deep sense, “we must come to see them not materialistically, as machines, but as stable complex, dynamic organization”.
pairing of complementary nucleobases, the mutual recognition of a tRNA and
its cognate synthetase during aminoacyl-tRNA formation is due to molecular
shape complementarity.
By Natural Selection
The genetic code has the potential for 64 (= 4^3) triplet codons, 61 of which
redundantly specify the 20 canonical amino acids. The methionine-specifying
triplet code AUG may take on the role of a starting signal at the beginning
of protein synthesis: it thus has a double function. Three triplet codes in a
mRNA - UAA (ochre), UGA (opal), and UAG (amber) - known as nonsense
codons, specify no amino acids; that is, there are no tRNAs with complementary
anticodons for these codons. As a consequence, translation breaks off here.
The nonsense codons are also, therefore, termed stop signals (termination
codons). Broader roles in protein genesis, however, have also been established
for two of these three stop signals in recent years. In E. coli (and also in a
whole range of other organisms) the UGA codon may be redefined to perform
one of two different functions: either it may function as a stop codon and thus
end the elongation of the protein chain under construction, or further growth
of the polypeptide chain may carry on with incorporation of selenocysteine
[129], not a member of the standard set of canonical amino acids. Which of
the two instructions is followed by the translation system is dictated by the
secondary and tertiary structure of the mRNA to be decrypted (and possibly by
protein factors). Similarly, structural alterations in mRNA are able to modify
the programming of the UAG codon: once more, a codon that continues a
translation in progress, in this case through the incorporation of pyrrolysine
[130], is produced from a stop codon. The genetic code is thus naturally
expanded from the standard set. Instead of the original 20 amino acids, 22
amino acids specified by mRNA sequences are currently recognized. Further
as yet unrecognized extensions of the genetic code through natural selection
cannot be excluded. Why no sense codon has (yet) been found to be doubly
coded, is unclear. The discovery that the genetic code, as a result of natural
selection, already has more than 20 amino acid building blocks for protein
genesis in store, poses the question of whether the genetic code might also be
expandable by design; that is, whether amino acids not specified by the genetic
code in their original version might be introducible into a polypeptide chain
by translation.
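The codon bookkeeping in this passage can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the constant and function names are hypothetical, the stop codon nicknames and the two recoded codons are taken from the text above, and no attempt is made to reproduce the full codon table.

```python
from itertools import product

BASES = "UCAG"  # the four RNA nucleobases

# Nonsense (stop) codons and their traditional names, as given in the text.
STOP_CODONS = {"UAA": "ochre", "UGA": "opal", "UAG": "amber"}

# Stop codons that can be redefined in a suitable mRNA context (per the text).
RECODED = {"UGA": "selenocysteine", "UAG": "pyrrolysine"}

CANONICAL_AMINO_ACIDS = 20


def all_codons(length=3):
    """Enumerate every codon of the given length over the RNA alphabet."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def summary():
    """Tally triplet codons and the amino acids recognized as genetically encoded."""
    codons = all_codons()
    sense = [c for c in codons if c not in STOP_CODONS]
    return {
        "total": len(codons),
        "sense": len(sense),
        "stop": len(STOP_CODONS),
        "amino_acids": CANONICAL_AMINO_ACIDS + len(set(RECODED.values())),
    }
```

The tally reproduces the figures quoted in the text: 64 triplets, of which 61 are sense codons and 3 are stop signals, and 22 amino acids once selenocysteine and pyrrolysine are included.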
By Design [131]
Peter G. Schultz, a leading protagonist of the movement to consider biology
an engineering discipline, is aiming at the construction of new proteins and,
eventually, of new organisms with enhanced properties. Two alternatives for
site-specific in vivo incorporation into proteins, of amino acids not specified
by the genetic code in their original version, have been designed to achieve
that goal: systematic reassignment of three-base nonsense codons or use of
supersized codons.
The addition of a non-canonical amino acid to the genetic code requires - in
the first case - additional components of the protein producing system: a
noncanonical amino acid, an exogenous tRNA/aminoacyl-tRNA synthetase
pair, and a unique codon that specifies the amino acid of interest.
Orthogonality between the exogenous translational components (Scheme 1-30)
and their endogenous opposite numbers is the key feature of this approach.
With the effect
that the codon for the noncanonical amino acid should not
encode a canonical amino acid;
that the new tRNA or the cognate aminoacyl-tRNA synthetase
should not cross-react with any endogenous tRNA/synthetase
pair; and
that the new synthetase should recognize only the
noncanonical and not any of the canonical amino acids.
A completely autonomous bacterium with a 21 amino acid genetic code was
engineered. The bacterium can generate p-aminophenylalanine from basic
carbon sources and incorporate this amino acid into proteins in response to
the amber nonsense codon [132].
As the restriction of non-coding triplet codons limits the number of non-
canonical amino acids, the question arises as to whether or not expansion of
the genetic code by use of a supersized codon and cognate tRNA with an expanded
anticodon loop might be possible. A study, “Exploring the Limits of Codon
and Anticodon Size” [133], reveals that the E. coli ribosome is capable of using
codons of three to five nucleobases. The tRNAs that decode these codons are
most efficient with a Watson-Crick complementary anticodon containing two
additional nucleotides on either side of the normal-sized anticodon in the loop.
An orthogonal synthetase/tRNA pair was designed and constructed, which
site-specifically incorporates a noncanonical amino acid (L-homoglutamine)
into proteins of E. coli in response to the four-base codon AGGA [134].
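To see why supersized codons relieve the shortage of free coding units, one can simply count the combinatorial codon space for each length. The sketch below is purely illustrative: the function names are hypothetical, and the numbers are upper bounds on distinct sequences, not a claim about how many extended codons the ribosome decodes efficiently.

```python
def codon_space(length, alphabet_size=4):
    """Number of distinct codons of a given length over the nucleobase alphabet."""
    return alphabet_size ** length


def codon_spaces(min_len=3, max_len=5):
    """Codon space for every codon length the E. coli ribosome is reported to use."""
    return {n: codon_space(n) for n in range(min_len, max_len + 1)}
```

For three, four, and five nucleobases this gives 64, 256, and 1024 possible codons respectively; the four-base codon AGGA mentioned above is one of the 256 quadruplets.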
Scheme 1-30 Incorporation of (a) canonical (yellow) and (b) noncanonical (red) amino
acids into proteins in vivo.
1.5.2
Antibodies
The ribosomal system is not the only evolutionary accomplishment the syn-
thetic chemist might use in pursuit of his ends. The immune system offers
an example of how a biological solution can successfully be brought to exploit
antibodies as enzymatic catalysts. As far as their functions are concerned,
enzymes and antibodies normally are quite different. Enzymes have been
selected for the transition state of a catalyzed reaction over millions of years
[105]. Antibodies have been selected for their affinity for the immunogen over
a period of weeks [135]. If the immunogen were a transition state analogue, the
resulting antibodies should catalyze the appropriate reaction. Richard A. Lerner
and Peter G. Schultz with their respective colleagues have designed molecules
that could be used to guide the process of clonal expansion and somatic muta-
1.6
Bringing Biological Solutions to Biological Problems
In the past it has been tried to link the problem to the question of
life’s origin in terms of molecular evolution [144]. Recently, sequencing of
the human and other complete genomes has shed some new light on this
field. The question of which minimal set of genes would be necessary
for a living organism can be put more concisely in the context of what
is now called synthetic biology [145]. Both approaches, the top-down way of
deactivating more and more genes of an existing species [146] and the bottom-
up way of assembling genes to build an organism with a fully synthetic
genome [147], have not yet reached the goal of explaining the transition from the
inanimate to the animate world. On the one hand, results obtained through
different methods to identify the minimal set of genes that constitute a living
organism point to roughly 250 genes [148]. On the other hand, none of
the synthetic constructs obtained so far covers the central functionality of
life, self-construction, metabolism, adaptation, self-repair, reproduction, and
evolution [149].
Nonetheless, the bottom-up route has turned into an engineering approach to
synthetic biology [150]. The strategy is to combine predefined DNA modules,
so-called bio-bricks, into bio-circuits designed to
implement biological functions [151]. In that sense, synthetic biology
is seen as the successor of molecular cloning, in particular, with respect to
safety issues.
1.7
EPILOGUE
To round off this essay, we point to two issues gaining more and more emphasis
in chemistry. One thing is the problem of shared use of the limited sources of
energy and raw materials. The other thing is the concept of a total synthesis, in
particular for complex natural substances. Both topics underline that organic
chemistry is far from being pure routine applying a comprehensive toolbox
to solve any problem in synthesis [152]. Medical therapeutics, agrochemicals,
and high-performance materials must be provided by organic chemistry to
fulfill global needs.
1.7.1
The Fossil Fuel Dilemma of Present Chemical Industry
For chemical industry, the interdependence of energy source and raw material
supply is typical. This double function of fossil fuel to act as a source of raw
material supply as well as an energy source will have to be terminated in
a not-too-distant future [153]. Being the main source of raw material, fossil
fuel should be maintained as long as possible for the chemical industry. A
final way out to disentangle energy requirement and raw material supply
would be to find new sources for one field or the other. Nuclear energy,
despite political moves to dispense with nuclear power, could play a role
as an alternative to fossil fuel. With petroleum supplies dwindling, there
is increasing interest in selective methods for transforming other carbon
feedstocks into hydrocarbons suitable for transportation fuel. The reductive
oligomerization of CO and H2 to produce hydrocarbons (specifically n-alkanes)
with highly controlled molecular weight (Fischer-Tropsch process [154]) from
the vast reserve of coal, natural gas, oil, or biomass is one such process that
was developed in the 1920s. The Goldman-Brookhart process (tandem alkane
dehydrogenation-olefin metathesis [155]) is of a similar kind, but of recent
origin.
1.7.2
Two Lessons From the Wealth of Published Total Syntheses
The final proof of the structure of a natural product after the latter has also
been synthesized in the chemist’s lab was, for a long time, common procedure
[156]. In a few cases, disagreement raised a few eyebrows. This was the case
for patchouli alcohol and for a molecule called hexacyclinol [157]. Quinine is
an example of the difficulties associated with the notion of a total synthesis.
Shouts [35, 37, 158] and murmurs [11b, 159] have been expressed to comment
on the wealth of total syntheses of natural products performed in the second
half of the twentieth century.
1.7.2.1 Synthetic Lesson from Patchouli Alcohol: The Trouble with “the Last
Structural Proof” [160]
The peculiar case of patchouli alcohol (87) (Scheme 1-31) was told and
commented on by Jack D. Dunitz [160a]. Following the advice of W. H. Perkin jun.
[156] to perform a total synthesis as the final proof of structure of
a natural product, 87 was synthesized [160c]. The synthetic product proved
to be identical to the sesquiterpene whose structure had been derived from
the results of a long series of chemical experiments lasting more than
50 years and apparently confirmed in 1961 by total synthesis [160c]. In
spite of this, X-ray structure determination [160a] revealed that the accepted
structure of patchouli alcohol was wrong. A careful reinvestigation showed
that during chemical degradation as well as during synthesis a rearrangement
of the molecular skeleton had taken place. The first reaction step of the
chemical degradation (acetate pyrolysis affording patchoulene 88) and the last
reaction step of the chemical synthesis (hydrolysis of the epoxide 89 obtained
from 88) were accompanied by a rearrangement proceeding in precisely
the reverse direction of the rearrangement in the other case. Taking this
Scheme 1-31 Synthesis and degradation of Patchouli alcohol.
1.7.2.2 Synthetic Lesson From Quinine 90: The Trouble with Formal Total
Syntheses [161a]
In the period between 1918 and 2001, a series of publications appeared that
turned the claimed total synthesis of 90 (Scheme 1-32) from a fact into a
myth. It started with a paper by Rabe and Kindler in 1918 [161b] on the partial
synthesis of 90 from quinotoxine (91), via quininone (92) (Scheme 1-32a). 91 is
a relay compound to 90, since it can easily be made from 90. In 1944 and 1945,
Woodward and Doering published two papers [161e] where they linked the partial
synthesis of Rabe and Kindler to their own synthesis of 91 (Scheme 1-32b),
taking the combination as a total synthesis of 90. Not being convinced of the
view of Woodward and Doering, Stork published a new total synthesis of 90
[Scheme 1-32 (a)-(c): structures of 90, 91, 92, 94, quinidine, 9-epi-quinine, and 9-epi-quinidine not reproduced.]
in 2001 [161f]. He started from the Taniguchi lactone (94) and proceeded via
desoxyquinine (95) (Scheme 1-32c). According to Stork, a distinction between
a real total synthesis and a formal one is necessary. Accordingly, the work of
Woodward and Doering is an example of a formal total synthesis.
Acknowledgments
149. (a) P.L. Luisi, About various definitions of life, Origins of Life and Evolution of the Biosphere 1998, 28, 613; (b) B. Korzeniewski, Cybernetic formulation of the definition of life, J. theor. Biol. 2001, 209, 275; (c) Y.N. Zhuravlev, V.A. Avetisov, The definition of life in the context of its origin, Biogeosciences 2006, 3, 281; (d) D.E. Koshland Jr., The seven pillars of life, Science 2002, 295, 2215.
150. (a) E. Andrianantoandro, S. Basu, D.K. Karig, R. Weiss, Synthetic biology: new engineering rules for an emerging discipline, Mol. Systems Biol. 2006, 2, msb4100073; (b) P. Fu, A perspective of synthetic biology: assembling building blocks for novel functions, Biotechnol. J. 2006, 1, 690; (c) J.B. Tucker, R.A. Zilinskas, The promise and perils of synthetic biology, The New Atlantis 2006, Spring 2006, 25.
151. A registry of standardized modules can be found at http://parts.mit.edu.
152. Editorial, Beauties of Synthesis, Nature 2006, 443, 1.
153. K. Weissermel, Energie und Rohstoff entkoppeln, aber wie?, Lecture given in Frankfurt am Main, Feb. 22nd, 1980, Hicom GmbH, http://www.hicom.de.
154. K. Weissermel, H.-J. Arpe, Industrial Organic Chemistry, Fourth Edition, Wiley-VCH, Weinheim, 2003.
155. A.S. Goldman, A.H. Roy, Z. Ahuja, W. Schinski, M. Brookhart, Catalytic Alkane Metathesis by Tandem Alkane Dehydrogenation-Olefin Metathesis, Science 2006, 312, 257.
156. W.H. Perkin, Jr., Experiments on the synthesis of the terpenes. Part I., J. Chem. Soc. 1904, 85, 654.
157. E. Marris, The proof is in the product, Nature 2006, 442, 492.
158. D.H.R. Barton, The relevance of organic chemistry, Chem. Britain 1973, 9, 149.
159. (a) R. Huisgen, The Adventure Playground of Mechanisms and Novel Reactions, in: Profiles, Pathways, and Dreams, J.I. Seeman (Ed.), American Chemical Society, Washington DC, 1994, p. XXII; (b) P. Schmalz, Interview mit Gilbert Stork: Organische - Zukunft und Gegenwart, Nachr. Chem. Tech. Lab. 1987, 35, 349.
160. (a) J.D. Dunitz, X-Ray Analysis and the Structure of Organic Molecules, Cornell University Press, Ithaca, 1978, p. 310; (b) J. Fleming, Selected Organic Syntheses, Wiley, London, 1973, p. 125; (c) G. Büchi, R.E. Erickson, N. Wakabayashi, Constitution of Patchouli Alcohol, J. Am. Chem. Soc. 1961, 83, 927; (d) G. Büchi, W.D. McLeod jr., J. Padilla O., Synthesis of Patchouli Alcohol, J. Am. Chem. Soc. 1964, 86, 4438.
161. (a) S.M. Weinreb, Synthetic lessons from quinine, Nature 2001, 411, 429; (b) P. Rabe, K. Kindler, Über die partielle Synthese des Chinins, Ber. dtsch. chem. Ges. 1918, 51, 466; (c) T.S. Kaufman, E.A. Ruveda, The quest for quinine: Those Who Won the Battles and Those Who Won the War, Angew. Chem. Internat. Ed. 2005, 44, 854; (d) J.I. Seeman, The Woodward-Doering/Rabe-Kindler Total Synthesis of Quinine: Setting the Record Straight, Angew. Chem. Internat. Ed. in press; (e) R.B. Woodward, W.E. Doering, The total synthesis of quinine, J. Am. Chem. Soc. 1944, 66, 849; 1945, 67, 860; (f) G. Stork, D. Niu, A. Fujimoto, E.R. Koft, J.M. Balkovec, J.R. Tata, G.R. Dake, The first stereoselective synthesis of quinine, J. Am. Chem. Soc. 2001, 123, 3239.
PART II
Using Small Molecules to Explore Biology
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
2
Using Natural Products to Unravel Biological Mechanisms
2.1
Using Small Molecules to Unravel Biological Mechanisms
Outlook
2.1.1
Introduction
2.1.2
Use of Small Molecules to Link a Protein Target to a Cellular Phenotype
Small molecules with dramatic cellular phenotypes have been used, without
knowledge of their protein target, to provide insight into biological processes.
If the effects of a small molecule are well characterized, then identification
of the protein target immediately provides a wealth of information about its
cellular functions because of the known inhibition phenotypes.
Fig. 2.1-1 (a) Overview of mitosis. (i) Chromosomes are replicated before mitosis. (ii) The spindle forms and chromosomes attach to spindle fibers. (iii) Chromosomes move to the center of the spindle at metaphase. (iv) Sister chromosomes separate at anaphase and move in opposite directions. (v) The cell divides as the cleavage furrow forms between the separated chromosomes. (vi) Two daughter cells form, each with exactly one copy of each chromosome. (b) Structures of two small molecules that target microtubules: colchicine and taxol.
Fig. 2.1-3 (a) Structures of the small molecules capsaicin and menthol. (b) Schematic of the VR1 receptor, a nonspecific cation channel. The channel is gated by capsaicin binding, heat, and protons. (c) Response of the VR1 receptor channel to capsaicin, temperature, and pH. Adapted from [Ref. 28].
2.1.3
Small Molecules as Probes for Biological Processes
Fig. 2.1-4 Screening strategy used to identify genes required for feedback control of anaphase onset in budding yeast [35]. (a) Cells were arrested in mitosis for 20 h with benomyl, a small molecule that targets tubulin and prevents spindle formation. After removal of benomyl, wild-type cells form a spindle and proceed normally through mitosis. Mutant cells fail to arrest and enter anaphase without forming a spindle, causing chromosome missegregation and eventual cell death. (b) Cells were mutagenized, and colonies were grown from single cells and then transferred to create two replicate plates. One plate (top) was grown without benomyl. The second plate (bottom) was treated with benomyl. Colonies that failed to grow on the second plate, indicating defective feedback control, were selected from the first plate to identify the mutated gene.
the genes identified in these screens. The Mad and Bub genes, which
are well conserved from yeast to mammals, have provided the foundation
for much of our current understanding of the mitotic spindle checkpoint.
Studies in transgenic mice have confirmed the importance of several of these
genes for faithful chromosome segregation in higher eukaryotes, as reduced
expression increases both aneuploidy and cancer susceptibility. In human
tumors, mutations have been reported in Mad1, Mad2, Bub1, and BubR1, a
related vertebrate protein (reviewed in [Ref. 1]). Additionally, human germline
mutations in BubR1 have been linked to mosaic variegated aneuploidy, a
condition associated with high risk of cancer [37].
Experiments examining the intracellular localization of Mad2 have suggested
a model for how the feedback control mechanism might operate [38, 39]. At
early stages of mitosis, Mad2 localizes to the kinetochore, a structure that forms
on each chromosome and mediates attachment to spindle microtubules. As
[Figure residue; labels: Anti-Mad2 antibody, Monastrol] ... arrest in mitosis with monopolar spindles due to activation of the spindle checkpoint. Microinjection of an antibody against the protein Mad2 inactivates the checkpoint so that cells divide with monopolar spindles.
Fig. 2.1-6 Correction of improper chromosome attachments by activation of Aurora kinase [44]. (a) Structures of two Aurora kinase inhibitors (AKI), hesperadin and AKI-1. (b) Assay schematic. (i) Treatment with the Eg5 inhibitor monastrol arrests cells in mitosis with monopolar spindles, in which sister chromosomes are often both attached to the single spindle pole. (ii) Hesperadin, an Aurora kinase inhibitor, is added as monastrol is removed. As the spindle bipolarizes with Aurora kinase inhibited, attachment errors fail to correct so that some sister chromosomes are still attached to the same pole of the bipolar spindle. (iii) Removal of hesperadin activates Aurora kinase. Incorrect attachments are destabilized by disassembling the microtubule fibers, pulling the chromosomes to the pole, while correct attachments are stable. (iv) Chromosomes move from the pole to the center of the spindle as correct attachments form. (c) Spindles were fixed after bipolarization either in the absence (i) or in the presence (ii) of an Aurora kinase inhibitor. Chromosomes are shown in blue and microtubule fibers in green. The arrows indicate sister chromosomes that are both attached to the same spindle pole. Projections of multiple image planes are shown, with optical sections of boxed regions (1 and 2) to highlight attachment errors. Scale bar 5 µm. (d) After removal of hesperadin, CFP tubulin (top) and chromosomes (bottom) were imaged live by three-dimensional confocal fluorescence microscopy and differential interference contrast (DIC), respectively. The arrow and arrowhead show two chromosomes that move to the spindle pole (marked by a circle in DIC images) as the associated kinetochore-microtubule fibers shorten, and then move to the center of the spindle. Time (min:s) after removal of hesperadin. Scale bar 5 µm. (With permission from Lampson et al. Nat. Cell Biol. 2004, Ref. 44.)
solution to this problem because they can be used to inhibit kinase function
and subsequently removed to activate the kinase. Understanding the function
of Aurora kinases is particularly important because they have been linked to
oncogenesis, and Aurora kinase inhibitors are currently in development as
cancer therapeutics [47, 48].
Several issues had to be resolved to devise a strategy for answering the
question of how attachment errors are corrected. First, kinase inhibition
should be temporally controlled to experimentally isolate the error correction
process, as Aurora kinases have been implicated in multiple mitotic processes.
Second, error correction likely involves some regulation of the dynamics of the
microtubule fibers that attach chromosomes to the spindle. These dynamics
can be analyzed with high temporal and spatial resolution by high-resolution
microscopy in living cells. Finally, the dynamics of individual microtubule
fibers are difficult to analyze if that fiber is obscured by other microtubules in
the spindle. The dynamics can be clearly observed, however, under conditions
in which the improperly attached chromosomes are positioned away from the
spindle body.
All of these issues were addressed through the development of an assay
using several reversible small molecule inhibitors (Fig. 2.1-6) [44]. First,
treatment with the Eg5 inhibitor monastrol arrests cells in mitosis with
monopolar spindles (Fig. 2.1-6(b) i). A particular chromosome attachment
error in which both sisters are attached to the single spindle pole, referred to
as syntelic attachment, is frequent in the monopolar spindles [49]. If monastrol
is removed, the spindle becomes bipolar, all of the accumulated attachment
Fig. 2.1-7 (a) Schematic of the secretory pathway. Transport vesicles carry
membrane and soluble material from the ER to the Golgi and from the Golgi to
the plasma membrane, where the soluble contents are released into the
extracellular space. (b) Structure of the small molecule Brefeldin A.
(c) Regulation of vesicle budding by the ARF GTPase. Exchange of GDP for GTP on
ARF triggers ARF-GTP binding to Golgi membranes. After ARF-GTP binding, the
coatomer complex assembles on the membrane and induces budding of a transport
vesicle. ARF hydrolyzes GTP after vesicle budding to release coatomer and
ARF-GDP from the membrane.
Together these experiments linked the COPI complex with forward membrane
transport from the Golgi, through the observed effects of BFA on both COPI
coat assembly and the dynamics of ER-Golgi trafficking.
[Figure residue; labels: growing polypeptide chain, elongated polypeptide chain, released tRNA, puromycin]
Fig. 2.1-8 (a) Elongation of a polypeptide chain. The amino group of the
incoming aminoacyl-tRNA joins the carbonyl group of the growing polypeptide
chain to replace the peptidyl-tRNA. (b) The small molecule puromycin replaces
the aminoacyl-tRNA in the polypeptide chain and prevents further elongation.
2.1.4
Conclusion
References
2.2
Using Natural Products to Unravel Cell Biology
Outlook
In recent years, a new discipline has emerged from the interface of chemistry
and biology, known as chemical biology. The unique foundation of this field is
the examination of biological questions through the use of chemical probes. An
example of chemical genetics is the use of biologically active natural products
as “inducible alleles” for elucidating protein function. In this chapter, we
discuss a variety of different natural products and their use in understanding
cell biology.
2.2.1
Introduction
2.2.2
Historical Development
Evolution has taught us that biological systems find or create ways to adapt
to exogenous forces or stressors. Natural products are often the result of this
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
survival mechanism. These often highly potent small molecules encompass
2.2.3
General Considerations
2.2.4
Applications and Practical Examples
of the first HDAC inhibitors trichostatin A (TSA) 1 and trapoxin (TPX) 2 in the
1990s [3], these and other similar inhibitors have provided insight into a diverse
array of cell-signaling events: cell cycle arrest, apoptosis, cell differentiation,
angiogenesis, and metastasis inhibition. The general mechanism of action
for many of these natural products entails an aliphatic chain with a metal
chelating moiety that interferes with zinc coordination in the binding pocket
of their targeted HDACs.
2.2.4.1.1 Trichostatin A
The antifungal natural product TSA, originally isolated from a Streptomyces, was
found to have reversible biological activity at low nanomolar concentrations.
Yoshida and coworkers [4] demonstrated that TSA causes the induction of
Friend leukemia cell differentiation as well as inhibition of the cell cycle of
normal rat fibroblasts in the G1 and G2 phases. This initial work revealed that
at low nanomolar concentrations, TSA induces the accumulation of acetylated
histones by inhibiting HDAC activity within the cell.
TSA has also been shown to induce apoptosis in various tumor cell lines
[5], thereby making HDACs possible targets for cancer treatment. By blocking
HDACs, inhibitors such as TSA affect the level of gene transcription, causing
both the up- and downregulation of many genes (~2% of the genome)
[6]. For example, TSA was found to reduce the expression of cyclin B1, a
key cyclin for the G2-M transition, but also stimulated expression of
p21CIP/WAF, an inhibitor of cyclin-dependent kinase (CDK) and Cdc2. Through
TSA-mediated HDAC inhibition, the G2-M transition is blocked because of
changes in the transcription of the cell cycle regulators p21CIP/WAF and cyclin B1. This
occurs via the modulation of histone acetylation at these gene promoters [7].
In addition, TSA has proved useful in the elucidation of important nuances
of cell differentiation. Cell cycle inhibitors had shown that inhibition of
proliferation was necessary, but not sufficient, for the differentiation of
neuronal precursor cells into oligodendrocytes [8]. Given the significant level
of chromatin remodeling that accompanies cellular differentiation,
Marin-Husstege and colleagues [9] hypothesized that histone acetylation plays a role
in oligodendrocyte differentiation. Using synchronized primary neonatal rat
cortical progenitors that were induced to differentiate into oligodendrocytes,
the authors showed that there is a temporal window during which histone
deacetylation is correlated with the acquisition of a branched morphology and
myelin gene expression. TSA-treated progenitors were able to exit from the
cell cycle but did not progress into oligodendrocytes. The ability of HDAC
inhibitors to inhibit oligodendrocyte differentiation is cell lineage dependent,
as TSA did not affect the precursor cells' ability to differentiate into
astrocytes. These results suggest that transcriptional repression is a crucial
event during oligodendrocyte lineage progression.
2.2.4.1.2 Trapoxin
The irreversible HDAC inhibitor TPX was first isolated as a fungal metabolite
that induced morphological reversion of v-sis-transformed NIH 3T3 cells
[10]. Using the known structure-activity relationship between other HDAC
inhibitors as a guide, a TPX affinity reagent was synthesized and used to
identify its target protein as an HDAC [11].
TPX was used to elucidate the protein interactions necessary for
HDAC-mediated transcriptional repression via the Mad:Max ternary complex [12].
Previous studies had suggested that Mad:Max transcriptional repression
was mediated by ternary complex formation with another unknown protein.
Biochemical experiments identified the proteins mSin3A or B as the primary
candidates responsible for this negative transcriptional function. Coexpression
of activated or inactivated MAD (a DNA-binding transcription factor) in the
presence of TPX demonstrated that HDAC activity was necessary for ternary
complex formation. Additionally, these and other experiments showed that
the Mad:Max heterocomplexes repress transcription in a mSin3A-associated
HDAC-dependent manner.
activity against various cancer cell lines [14], and like depudecin, displays
potent in vitro and in vivo antiangiogenic activities [15, 16]. Thus, given the
ability of HDAC inhibitors to arrest cell proliferation and reverse tumor cell
morphology, HDAC inhibitors have generated much attention as a new class
of antitumor drugs.
[Figure residue: chemical structures of compounds 5-11]
2.2.4.2.2 Flavopiridol
Flavopiridol (FLV) 11 is a semisynthetic flavonoid derived from rohitukine, an
alkaloid isolated from a plant indigenous to India [30]. FLV can induce cell cycle arrest by three
mechanisms: (a) direct inhibition of CDK via binding in the ATP-binding site;
2.2.4.3.1 Lactacystin
Originally characterized as a microbial metabolite that induced neurite
outgrowth in neuroblastoma cells [39, 40], lactacystin 14 was later found
to be a potent inhibitor of cell proliferation [41]. Using a [3H] lactacystin
analog, Fenteany and coworkers [39] demonstrated that lactacystin and
its related clasto-β-lactone covalently bind the N-terminal threonine of the
20S proteasome subunit. Functionally, lactacystin is a relatively nonspecific
protease inhibitor, also showing significant inhibition of peptidyl peptidase II
and cathepsin A [40]. Despite this cross-inhibitory activity, lactacystin has been
used to investigate the role of the Ub proteasome pathway in a diverse array
of systems such as Alzheimer's disease, breast cancer, neurobiology, kidney
research, and nephrology, to name a few [41-46].
2.2.4.3.2 α,β-Epoxyketones
Selective covalent inhibitors of the proteasome have also been developed.
Epoxomicin and eponemycin are members of the α,β-epoxyketone class
of proteasome inhibitors that were isolated from actinomycete strains and
found to exhibit in vivo antitumor activity against B16 melanoma [47, 48]. Early
structure-activity studies and structural motifs present in similar molecules
suggested that the terminal epoxyketone moiety was an important aspect of
the functional pharmacophore, possibly via covalent modification of its target
protein. Through synthetic chemistry and biochemical affinity techniques, the
natural products and corresponding biotinylated affinity reagents have been
used to identify the 20S proteasome as the molecular target of epoxomicin 12
and eponemycin 13 [38, 49].
X-ray crystallographic analysis demonstrated that the epoxyketone
pharmacophore of epoxomicin forms a covalent adduct as a morpholino ring [50] with
the amino terminal threonine of the 20S proteasome. Epoxomicin draws its
specificity from the uniqueness of the proteasomal N-terminal threonine;
nonproteasomal proteases lack an N-terminal nucleophilic residue and thus cannot
form a stable covalent morpholino adduct with the epoxomicin epoxyketone
pharmacophore [50].
These potent and specific proteasome inhibitors have been used to answer
questions in a number of biological fields and systems. For example, proteasome
inhibitors have been used to investigate inflammation, cancer biology
2.2.4.3.3 TMC-95A
Recently, more selective noncovalent inhibitors of the proteasome have been
developed. TMC-95A 15 is a potent, reversible, and selective inhibitor of the
chymotrypsin-like, trypsin-like, and caspase-like activities of the 20S proteasome.
In contrast, TMC-95A shows no inhibition of calpain, cathepsin, or trypsin.
This selectivity has led to a great deal of current biological interest
in TMC-95A [50, 52, 53], including X-ray crystallographic analysis showing that
TMC-95A does not covalently bind the yeast proteasome [54].
2.2.4.5.1 Curcuminoids
Curcuminoids, a group of natural products originally isolated from the
Indian spice turmeric, have been known to be potent antioxidant and
anti-inflammatory agents for many years. Curcuminoids reduce tissue factor (TF)
gene expression through the inhibition of the AP-1 and NF-κB transcription
factors and thus lead to the loss of angiogenesis initiation [61, 62].
and TNP-470 [64]. X-ray crystal structures of the free and the fumagillin-bound
2.2.4.6.2 Rapamycin
The immunosuppressive agent rapamycin was isolated from the bacterium
Streptomyces hygroscopicus, originally found in a soil sample from Rapa Nui
(Easter Island) in 1975. Although structurally similar to FK 506, rapamycin
demonstrated markedly different activity. Rapamycin does not affect the progression
from G0 to G1, but rather blocks T-cell progression from G1 to S phase.
As FK 506 and rapamycin share structural similarities, it was not surprising
that rapamycin also bound FKBP 12. However, binding studies revealed that
the FKBP 12-rapamycin complex does not target calcineurin, as the
FK 506-FKBP 12 complex does. Rather, using the FKBP 12-rapamycin complex as an
affinity reagent, the lipid kinases target of rapamycin 1 and 2 (TOR1 and
TOR2) were identified [71]; these proteins possess homology to the mammalian
phosphatidyl inositol-3-kinases, which are involved in the regulation
of cell cycle progression in stimulated cells. Studies have shown that growth
factor addition to cells leads to TOR activation and subsequent increased p70
S6 kinase activity [72].
2.2.4.7.1 Capsaicin
Some of the most commonly and frequently used spices throughout the
world are hot peppers of the Capsicum family, of which capsaicin 23 is the
major pungent ingredient. Because of its analgesic and anti-inflammatory
activities, topical application of capsaicin has been used for the treatment of
a variety of neuropathic pain conditions. Autoradiographic visualization of
a tritiated resiniferatoxin probe in tissues of various species identified the
vanilloid receptor (VR) as a molecular target [73, 741. Additionally, capsaicin
was used as a molecular probe to isolate the first nociceptive receptor, VR1 [75].
Characterization of VR1 revealed it to be a member of the Transient Receptor
Potential (TRP) ion channel family and a nonselective cation channel activated
by capsaicin or elevated temperatures.
2.2.4.7.2 Parthenolide
Parthenolide 24 is the biologically active natural product in the medicinal
herb feverfew, which has been used for 2000 years to treat fevers, headaches, and
inflammation [76]. Initial studies of the anti-inflammatory activity of
parthenolide showed that it was a potent inhibitor of NF-κB nuclear translocation
as well as IκB phosphorylation. Affinity chromatography experiments using a
biotinylated analog of parthenolide revealed that parthenolide formed a
covalent adduct with IκB kinase beta (IKKβ) in a specific and dose-dependent
manner [77]. This specific interaction between IKKβ and parthenolide was
confirmed by mass spectrometric analysis. Parthenolide was shown to
form a covalent adduct with Cys179 of IKKβ, which lies between the two
phosphorylated serines in the kinase activation loop. Moreover, constitutively
activated protein with a Cys179Ala point mutation was found to be insensitive
to 40 µM parthenolide, indicating that parthenolide inhibits IKKβ via Michael
addition by Cys179 in the kinase activation loop [77].
2.2.5
Future Development
2.2.6
Conclusions
After a decade out of favor, both natural product and cell-based bioassay
screening are enjoying a renaissance in the pharmaceutical industry.
Natural products still offer an impressive range of chemical diversity and
have a long track record of providing scaffolds for successful drugs. A greater
appreciation of their potential for the identification of novel hit structures
is propelling a new interest in the use of natural product screens in the
pharmaceutical industry. Likewise, cell-based bioassays are regaining some
of their previous acceptance in the drug development process, primarily
because of the success of novel target deconvolution strategies. New proteomic
technologies are largely behind the belief that the pharmaceutical industry has
the ability to identify the targets of compounds identified in cell-based assays.
Obviously, not all biologically active compounds identified in these screens
will be developed into therapeutic agents. However, this renewed interest in
both natural products and cell-based assays will, in turn, offer many new
opportunities for the development of novel cell biological probes, using the
fruits of these screens.
Acknowledgments
The authors would like to acknowledge the financial support of the NIH (grant
GM62160).
References
3
Engineering Control Over Protein Function Using Chemistry
3.1
Revealing Biological Specificity by Engineering Protein- Ligand Interactions
Outlook
Protein function can be altered in a rapid and graded manner through small
molecule ligand binding in both natural systems and through drug design. In
natural systems evolutionary pressure can lead to accumulation of mutations
that influence ligand binding specificity, thereby altering protein function.
Similarly, in the laboratory, mutations that have well defined effects on a
protein’s ligand specificity can provide a functional handle to elucidate the
protein’s biological role. Here we explore examples of mutations, introduced
in the laboratory or found in nature, that cause significant changes to protein
ligand specificity, with an emphasis on the biological and biochemical lessons
learned from these studies. The examples described here illustrate both the
challenges and the power of engineering protein-ligand interactions in order
to elucidate a protein’s biological role.
3.1.1
Introduction
powerful means to investigate these biological activities (e.g., observing the
3.1.2
The Selection of Resistance Mutations to Small-molecule Agents
Fig. 3.1-1 HIV PR bound to a NC-p1 peptide substrate [3] (a) and nelfinavir (b) [4].
in the enzyme and the Ala-to-Val mutation in the substrate are found to
coevolve.
There are at least three lessons from HIV PR inhibitor resistance. First,
relatively few mutations are often sufficient to induce inhibitor resistance,
and in many cases a single point mutation is sufficient. Interestingly, several
mutations allow HIV to overcome inhibitor sensitivity, demonstrating that
there are numerous solutions to the same engineering problem. While the
mutations are focused in regions that directly contact the inhibitor, as we might
expect, some are sufficiently subtle (e.g., acting through slight rearrangements
of the protein core) that it is hard to imagine predicting similar mutations
while attempting to rationally engineer a protein.
Second, relatively few mutations may be necessary to convergently engineer
a protein and its substrate - in this case natural selection led to an HIV PR
mutation (I82A) that changed its substrate selectivity and a compensatory
change in one of its substrates.
While the first two lessons are encouraging for the purposes of engineering
proteins with altered specificities, the third lesson is largely cautionary:
protein functions can be intimately interconnected. In at least one case,
altering the inhibitor surface of HIV PR affected the substrate specificity
of the mutant proteases. For this reason, engineering projects that intend
to dissect individual functions of a given protein must also take care to
control other unintended changes to the protein function. For example, it is
common that engineering a protein will adversely affect its stability or activity.
This natural example demonstrates the feasibility but also the challenges of
mutating a protein to alter its ligand specificity using only a small number of
mutations.
[Figure residue: chemical structure of rapamycin]
Fig. 3.1-3 The crystal structure of imatinib bound to Abl kinase [12]. The gatekeeper
residue (T315, colored red) packs tightly against imatinib (PDB: 1IEP).
position 315 of Bcr-Abl makes contact with the exocyclic amine of ATP and,
thus, lines the adenine-binding pocket of the kinase. The ATP-binding pocket
of most protein kinases is larger than necessary for binding ATP, especially
in the vicinity of the exocyclic amine of ATP. Thus, a large hydrophobic
pocket adjacent to adenine is available for small-molecule inhibitor binding.
Importantly, the size of the amino acid residue at position 315 controls access
to this extra pocket, and thus it has been termed the gatekeeper residue. In the
T315I mutant Bcr-Abl kinase, imatinib cannot access the hydrophobic pocket
because the larger isoleucine residue blocks its access. Since the bulkier
isoleucine occupies a pocket not used by substrate ATP, the T315I mutant is
still able to efficiently bind ATP and catalyze phosphotransfer reactions.
As the predominance of imatinib resistance mechanisms can be traced to
Bcr-Abl functional upregulation, the clinical resistance offers another proof
of mechanism akin to the genetic screen which identified TOR as the target
of rapamycin discussed in Section 3.1.2.2. In the former case imatinib was
more or less designed to be a Bcr-Abl inhibitor, thus its target was known
from the outset of the clinical trial. In the case of rapamycin, a genetic
screen was carried out to identify its target(s) and thus the molecular basis
for its effect on immune suppression. In an amalgam between these two
paradigms for target identification and clinical efficacy, a B-Raf inhibitor
BAY43-9006 displayed disappointing efficacy in clinical trials of melanoma
patients, despite the identification of activating mutations in B-Raf in this
form of cancer. Luckily, BAY43-9006 was also used in clinical trials of other
cancer types, where it showed surprising efficacy in the treatment of renal
cancer, which is thought to be particularly dependent on vascularization.
Subsequent biochemical studies demonstrated that BAY43-9006, which was
originally thought to be a highly specific B-Raf inhibitor, is a potent inhibitor
of the vascular endothelial growth factor receptor (VEGFR), providing a post
facto rationale for its efficacy in this VEGFR-dependent cancer type [13].
In another case of small-molecule assisted target identification, the imatinib
response of patients with idiopathic hypereosinophilic syndrome led to the
identification of a chromosomal rearrangement involving the tyrosine kinase,
and the known imatinib target, PDGFR, as a likely cause of this syndrome
[14]. The link between the PDGFR fusion and hypereosinophilic syndrome
was further strengthened when, after extended imatinib therapy, a relapse in
one patient was observed to correlate with the emergence of a T674I mutation
in PDGFRA. T674 is the gatekeeper residue in PDGFRA.
Similarly, imatinib has been found to be a useful therapy for gastrointestinal
stromal tumors (GIST), which are driven by the c-Kit tyrosine kinase, a previously
known "off-target" of imatinib when it was being developed as a Bcr-Abl
inhibitor. Again, resistance to imatinib in GIST patients has emerged, and a c-Kit
ATP-binding site mutation of the gatekeeper residue (T670I) is commonly
found [15].
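The gatekeeper substitutions named in this section (T315I in Bcr-Abl, T674I in PDGFRA, and T670I in c-Kit) share a compact notation: wild-type residue, position, mutant residue. As a minimal illustrative sketch, the labels can be parsed and compared as follows. The class and function names are hypothetical and are not taken from any cited work; only the three mutation labels come from the text.

```python
import re
from dataclasses import dataclass

# One-letter amino acid codes needed for the examples in the text.
AMINO_ACIDS = {"T": "threonine", "I": "isoleucine"}


@dataclass(frozen=True)
class PointMutation:
    """A single amino acid substitution such as T315I (hypothetical helper type)."""
    wild_type: str
    position: int
    mutant: str


def parse_point_mutation(label: str) -> PointMutation:
    """Parse a label of the form <wild type><position><mutant>, e.g. 'T315I'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return PointMutation(match.group(1), int(match.group(2)), match.group(3))


# Gatekeeper mutations reported in the text for imatinib-resistant patients.
GATEKEEPER_MUTATIONS = {"Bcr-Abl": "T315I", "PDGFRA": "T674I", "c-Kit": "T670I"}

for kinase, label in GATEKEEPER_MUTATIONS.items():
    mutation = parse_point_mutation(label)
    print(
        f"{kinase}: {AMINO_ACIDS[mutation.wild_type]} {mutation.position} "
        f"-> {AMINO_ACIDS[mutation.mutant]}"
    )
```

Parsing the labels this way makes the common pattern explicit: in all three kinases a threonine gatekeeper is replaced by the bulkier isoleucine.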
The lessons learned from imatinib and BAY43-9006 suggest that cancers can
be uniquely dependent on the catalytic activity of a single kinase. Moreover,
because of the highly conserved nature of the kinase ATP-binding pocket,
drug candidates always inhibit multiple family members. In some cases, off-
target effects will lead to new medicines (BAY43-9006). In some other cases
of course, off-target effects will lead to toxic side effects, and will predictably
lead to failures of clinical trials. Moreover, because a single amino acid in
the binding pocket of kinases, the gatekeeper residue, can control inhibitor-
binding specificity, resistance to these drugs has emerged quickly in cancer
patients. A central challenge in all therapeutic areas is to identify key kinase
targets for the treatment of the signaling defects in human diseases.
3.1.3
Exploiting Sensitizing Mutations to Engineer Nucleotide Binding Pockets
[Figure residue: chemical structures of PP1 and 1NM-PP1]
inhibitor was designed (based on the parent inhibitor PP1), which is only
capable of inhibiting kinases containing a glycine or alanine gatekeeper
residue. Importantly, the kinases with the smallest naturally occurring
gatekeeper residues, serine and threonine, are not inhibited by 1NM-PP1
(Fig. 3.1-4). It is interesting to note that the gatekeeper residue was selected
on the basis of structural models of kinase-ATP crystal structures and docking
models of pyrazolopyrimidine-based inhibitors prior to the discovery of the
gatekeeper mutations in imatinib resistant CML patients. The fact that
gatekeeper mutations can be used to confer inhibitor sensitivity through
rational design and inhibitor resistance through natural selection processes
highlights that this residue is a dominant feature controlling small molecule
access to the ATP-binding pocket without affecting kinase activity.
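The selectivity rule stated above (1NM-PP1 inhibits only kinases with a glycine or alanine gatekeeper, and not those with the smallest natural gatekeepers, serine and threonine) can be written as a simple predicate. The sketch below is illustrative only: the function name is hypothetical, and treating every residue other than glycine or alanine as insensitive is an assumption that extends the text's statement about serine and threonine.

```python
# Gatekeeper residues (one-letter codes) that the text says permit 1NM-PP1 binding.
SENSITIZING_GATEKEEPERS = frozenset({"G", "A"})


def predicted_1nm_pp1_sensitive(gatekeeper: str) -> bool:
    """Return True if a kinase with this gatekeeper residue is expected to be inhibited.

    Assumption: any residue other than glycine or alanine blocks access,
    generalizing the text's statement about serine and threonine.
    """
    residue = gatekeeper.strip().upper()
    if len(residue) != 1 or not residue.isalpha():
        raise ValueError(f"expected a one-letter amino acid code, got {gatekeeper!r}")
    return residue in SENSITIZING_GATEKEEPERS


# Engineered (G, A) versus naturally occurring small (S, T) gatekeepers.
for residue in ("G", "A", "S", "T"):
    print(residue, predicted_1nm_pp1_sensitive(residue))
```

A rule this simple works only because, as the text notes, a single residue dominates small-molecule access to the ATP-binding pocket.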
[Figure residue: chemical structures of ATP, N6-benzyl ATP, and GTP]
3.1.4
Engineering the Ligand Selectivity of Ion Channels
Fig. 3.1-7 The crystal structure of EF-Tu bound to a nonhydrolyzable GTP analog shows
Asp138 hydrogen bonding to guanine (PDB: 1EXM).
kinase pathway (Fig. 3.1-8). Thus, the DHP-resistant T1006Y mutant L-type
calcium channel provides the specificity handle necessary to dissect the
activity of L-type calcium signaling. For example, this T1006Y channel was
instrumental in the identification of a calmodulin-binding site on the C-
terminus of the channel. This binding site provides insight as to how L-type
calcium channel signaling can use local Ca2+ influx to interface specifically
with other cellular signaling pathways.
[Figure residue: chemical structure of capsaicin]
3.1.5
Conclusion
3.1.5.3 Conclusion
Reengineering protein-ligand interactions can provide powerful information
that complements traditional biochemical and genetic approaches. The power
of these engineering approaches will increase as new methods are developed
both in protein engineering and in our ability to genetically manipulate
the organisms we wish to study. These engineering approaches are most
useful in vitro or in organisms where genetic manipulation is tractable,
such as bacteria, yeast, flies, and mice. As pharmacological agents that
target wild-type proteins become increasingly selective, these reagents will
complement chemical genetic tools. Even in these cases, however, engineering
protein-ligand interactions can provide important information about the
specificity of the pharmacological agent, as was discussed earlier for rapamycin.
While the genome is vast, many of its features reoccur (e.g., domains,
cofactors, etc.) in several different signaling contexts. This biochemical
similarity presents a specificity problem on one hand but an engineering
opportunity on the other; introducing specificity handles using carefully
designed mutations can help provide insight into critical connections between
biochemical specificity and biological function.
References
39. H.A. Greisman, C.O. Pabo, A general strategy for selecting high-affinity zinc
finger proteins for diverse DNA target sites, Science 1997, 275(5300), 657-61.
40. R.R. Beerli, B. Dreier, C.F. Barbas, Engineering polydactyl zinc-finger
transcription factors, Nat. Biotechnol. 2002, 20(2), 135-41.
41. M.D. Simon, K.M. Shokat, Adaptability at a protein-DNA interface:
re-engineering the engrailed homeodomain to recognize an unnatural nucleotide,
J. Am. Chem. Soc. 2004, 126(26), 8078-9.
3.2
Controlling Protein Function by Caged Compounds
Andrea Giordano, Sirus Zarbakhsh, and Carsten Schultz
3.2.1
Introduction
3.2.2
Photoactivatable Groups and Their Applications
[Figure residue: structures of photoactivatable caging groups; surviving labels: DMNPE, DNP, NTP, DMNTP]
carboxylate group was attached to the cage (CNB, Fig. 3.2-1C), eliminating
the problem [17]. In addition, this CNB group showed faster release kinetics
than the NB group [17]. CNB has also been successfully used to cage glycine
derivatives [20]. However, additional charges are not always beneficial.
cAMP-dependent protein kinase A (PKA) was made to react with CNB bromide to
yield a caged version of the enzyme [21]. The caging group was introduced at
Cys199 and inactivated PKA. Unfortunately, the caged protein was unable to
undergo significant photoactivation. In contrast, simple o-nitrobenzyl bromide-
modified PKA not only exhibited a substantial loss in kinase activity but also
showed a 20-30 fold reactivation of the catalytic activity upon exposure to UV
light (for more detailed information on caged PKA, see below).
A particular form of CNB is (2-nitrophenyl)glycine (Npg). This artificial
amino acid (Npg, Fig. 3.2-1D) was successfully incorporated into ion channels
like the nicotinic acetylcholine receptor [22] by nonsense suppression, a
technique developed by Peter Schultz and coworkers [23-26]. Irradiation
(4 h, >360 nm) of proteins containing Npg led to peptide backbone cleavage in
Xenopus oocytes [22].
Like the nitrobenzyl group, NPE and CNB groups absorb only weakly at
wavelengths greater than 340 nm, thus limiting applications in the suitable
range of 350-400 nm. Wavelengths under 300 nm are inconvenient because
of considerable absorption by proteins and nucleic acids as well as by any kind
of glass, including microscope lenses.
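The wavelength constraints described above can be summarized as a rule of thumb. The sketch below is illustrative only: the function name and return labels are hypothetical, and the 300-350 nm and >400 nm regions are marked as not characterized because the text does not rate them.

```python
def classify_uncaging_wavelength(wavelength_nm: float) -> str:
    """Classify an irradiation wavelength using the ranges given in the text.

    - below 300 nm: inconvenient (absorbed by proteins, nucleic acids, and glass)
    - 350-400 nm: the range the text calls suitable
    - anything else: not rated in the text (an assumption of this sketch)
    """
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    if wavelength_nm < 300:
        return "inconvenient"
    if 350 <= wavelength_nm <= 400:
        return "suitable"
    return "not characterized"


for nm in (280, 340, 365, 420):
    print(nm, classify_uncaging_wavelength(nm))
```

The practical difficulty noted in the text is that the NB, NPE, and CNB groups absorb only weakly above 340 nm, so their absorption overlaps poorly with the suitable window.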
This was overcome when electron-donating groups were added to the
aromatic moiety. The 4,5-dimethoxy-2-nitrobenzyl (DMNB) (Fig. 3.2-1E) cage
(2-nitroveratryl) was introduced in 1970 by Patchornik and Woodward as
a “nitrogen” protecting group [27]. The substituents on the aromatic ring
alcohol groups. Amino groups are readily reacted with chloroformate derivatives
(Scheme 3.2-3). In fact, the most commonly used nitrobenzyl derivative
(DMNB-OCOCl) is commercially available. Other reagents are prepared by
reaction of the alcohol with phosgene or alternatively with carbonyldiimidazole
(CDI) [42]. Caging reactions proceed under mild conditions in aqueous
solution at slightly basic pH (9-10) [28]. An alternative is p-nitrophenyl
carbonate esters. The leaving group permitted the formation of a carbamate
directly from the hydrochloric acid salt of glutamate in the presence of
4-(dimethylamino)pyridine (DMAP) at room temperature (Scheme 3.2-3) [57].
Thiol groups are preferentially reacted with arylmethyl halides, for
instance, bromo nitrobenzyl derivatives (Scheme 3.2-4). The conditions are
extremely mild (Tris buffer, pH 7.2) and reactions were reported to be complete
within an hour [71]. When the reactive caging group is equipped with a suitable
amino acid docking sequence, a specific cysteine can be labeled, even with a
300-fold excess of the reagent [21]. Another photoactivatable caging reagent
that covalently binds to thiols in proteins is the α-haloacetophenone group.
Its aromatic character is recognized particularly well by phosphotyrosine
phosphatases (PTP) [72, 73]. Accordingly, haloacetophenone groups are potent
photoreleasable inhibitors of PTPs in vitro. No details about the labeling
procedure have been published so far.
It is of special interest to label serine and threonine residues, due to their role
as acceptors for posttranslational modifications, namely, for phosphorylation.
Scheme 3.2-5 A method that does not require base to form ethers of hydroxy amino acids.
Scheme 3.2-6 Two commonly used synthetic routes to diazo compounds. Ts = tosyl.
derivative of the caging group was reacted with hydrazine, followed by oxidation
to the diazo compound in the presence of MnO2 (Scheme 3.2-6(a)) [10, 81, 82].
After the removal of MnO2 by filtration and several washes, the diazo reagent
was used mostly without further purification. In an alternative method, a tosyl
hydrazone was formed. Treatment with base then gave the diazo compound
(Scheme 3.2-6(b)) [60, 61].
3.2.3
Caged Peptides and Proteins
iodoacetate (IA-TMR) directed against Cys707 that this residue was crucial
for sliding of F-actin filaments in the in vitro motility assay. Therefore, it
was reasoned that Cys707-caged HMM could show a similar behavior, which
eventually could be reverted upon photoactivation. HMM was reacted with
DMNB-Br in aqueous buffer at pH 7.4. Two cage groups per HMM molecule
(or one cage per ATPase domain of HMM) were incorporated in the reported
protocol. Although the calcium/ATPase activity of purified caged HMM was
increased fivefold compared to unlabeled HMM, caged HMM failed to produce
appreciable sliding of F-actin filaments, unless irradiated with pulsed (500 ms)
340-400 nm UV light, conditions that produced sliding of 90% of F-actin
filaments in the in vitro motility assay with a velocity of up to 4 µm s⁻¹, a value
comparable to unmodified HMM [%I.
Protein kinases constitute a large family of enzymes (>500) whose activity
includes the transfer of the γ-phosphoryl group of ATP to serine, threonine,
and tyrosine residues in a wide range of protein substrates, giving rise to
a large collection of phosphorylation-based signal transduction pathways. A
well-defined spatially and temporally activatable kinase is of invaluable utility
in elucidating many aspects of signal transduction phenomena in living cells,
under both physiological and pathological conditions.
One of the best-studied kinases is protein kinase A. An interesting
comparison of the behavior of three different caged catalytic subunits of
PKA was reported by Bayley and colleagues [91]. Working with a single
cysteine mutant (C343S) of the murine catalytic subunit of PKA, the unique
Cys residue 199 was masked with the thiol-reactive cage groups NB-Br, CNB-
Br, and DMNB-Br. Cys199 is placed in close proximity to the critical Thr197
in the “activation loop” of the enzyme [92]. The caged protein showed, as
expected, a significant inactivation when kinase activity was tested in vitro with
the artificial substrate Kemptide (LRRASLG). Interestingly, of the three, only
the NB-caged enzyme showed both low residual activity after caging
(3-5%) and satisfactory activity after photolysis (pH 6.0, 80-100%) with respect
to the unmodified enzyme. Moreover, the quantum yield of photolysis was an
impressive 0.84. The “lesson” from this work, using the authors’ phrasing,
is that given a particular target protein a variety of photoremovable protecting
groups have to be tested since a reagent that works well with one protein (for
instance, the CNB-caged αHL described earlier) may not work well with others.
Cofilin is a kinase-regulated, F-actin binding protein whose activation state
is regulated by phosphorylation at Ser3 through the LIM-domain-containing
kinase (LIM kinase). Unphosphorylated cofilin monomers bind cooperatively
to F-actin in vitro leading to depolymerization of actin filaments [93], while
phosphorylation by LIM kinase inactivates these features of the cofilin function
(Fig. 3.2-5). Lawrence and coworkers [94] observed that the cysteine mutant S3C
cofilin is constitutively active because it is unable to undergo phosphorylation
by LIM kinase, while a CNB-caged S3C cofilin is unable to depolymerize
actin filaments in vitro. This shows the importance of Ser3 for cofilin activity.
Accordingly, S3C cofilin activity was restored up to 80% upon irradiation and
Fig. 3.2-5 Activity of cofilin initiated by local decaging. A 2-s laser pulse aimed at the area
indicated in F gave local protrusion within 1 to 3 min. With permission from Ref. [95].
Scheme 3.2-7 Cysteine-containing proteins like phosphatases are caged in the active site
with phenacyl bromides or chlorides.
3.2.4
Caged Proteins by Introduction of Photoactive Residues via Site-Directed,
Unnatural Amino Acid Mutagenesis
is substituted with a nonsense codon (like the amber stop codon UAG)
via standard site-directed mutagenesis, (b) a specific “nonsense suppressor”
tRNA able to recognize this codon is prepared and acylated with the desired
unnatural amino acid, (c) addition of the mutagenized gene or mRNA and the
aminoacylated suppressor tRNA to an in vitro extract or biosynthetic apparatus
generates a mutant protein containing the unnatural amino acid at the desired
position.
Thus, the generation of the specific suppressor tRNA, its acylation with the
unnatural residue, and the synthesis of sufficient amount of mutagenized
protein are the key steps of the entire methodology, more recently expanded
in some technical aspects from its original design [101-103].
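Step (a) of the procedure, replacing the codon at the chosen position with the amber codon UAG, can be sketched in a few lines of Python. This is only an illustration: the function name, the toy mRNA sequence, and the 1-based residue numbering counted from the first codon of the sequence are assumptions made here and are not taken from the published protocols.

```python
AMBER_CODON = "UAG"  # nonsense (stop) codon recognized by the suppressor tRNA


def introduce_amber_codon(mrna: str, residue_number: int) -> str:
    """Return the coding sequence with one codon replaced by UAG.

    `residue_number` is 1-based and counts codons from the start of `mrna`
    (an assumption of this sketch, not a convention from the source).
    """
    if len(mrna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = (residue_number - 1) * 3
    if residue_number < 1 or start + 3 > len(mrna):
        raise ValueError("residue number outside the coding sequence")
    return mrna[:start] + AMBER_CODON + mrna[start + 3:]


# Hypothetical four-codon message (Met-Asp-Leu-stop), used only as a toy input
toy_mrna = "AUGGAUCUGUAA"
mutant = introduce_amber_codon(toy_mrna, 2)
print(mutant)
```

In the real experiment the substitution is made at the DNA level by site-directed mutagenesis; the string manipulation above only shows which triplet is exchanged.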
With this technique, caged amino acids have been successfully introduced
into various protein sequences as unnatural residues. Enzymatic catalysis
before and after photoirradiation has been explored by means of caged residues
replacing the natural ones in critical positions. Schultz and coworkers described
a mutant phage T4 lysozyme (T4L) containing an aspartyl β-nitrobenzyl ester
in place of the wild-type Asp20 in the active site of the enzyme [104]. This
residue, along with Glu11, is responsible for the catalytic activity [105]. The
caged protein, produced in 37% yield, showed no activity in vitro. After
irradiation at 315 nm (Hg-Xe arc lamp, 1000 W), activity was restored to 32%
of the wild-type level. In another experiment these
investigators managed to photochemically initiate protein splicing from the
Thermococcus litoralis Vent DNA polymerase by introducing the 2-nitrobenzyl
ether of serine in the place of the conserved Ser1082 [106].
NB- or DMNB-caged aspartates were instrumental in controlling the
dimerization of HIV-1 protease [107]. This enzyme exists as a 22-kDa monomer
that self-assembles into the active dimeric aspartyl protease. The active site is
placed at the interface of the homodimer and consists of Asp25 and Asp125,
both necessary for the proteolytic activity [108, 109]. Introduction of an NB-Asp
into position 25 led to minimal proteolytic activity, while its recovery after UV
irradiation (500 W mercury-xenon lamp, 10 min, 0 °C, pH 6.0) was about 97%
as revealed by a fluorescence-based protease assay [110]. The introduction of
the caged aspartate did not prevent dimerization, suggesting that H bonding
involving the wild-type residue is not a prerequisite for monomer association
of HIV-1 protease. Instead, it was believed that it affected the stability of the
dimer [107].
A similar behavior was shown by the H133A mutant of BamHI endonuclease
having incorporated a caged Lys132 [111]. Lys132 along with Glu167, Glu170,
and His133 participates in the salt-bridge network at the dimer interface of the
active wild-type enzyme [112, 113]. Site-directed introduction of DMNB-OCO-
Lys132 (yield 55%) in the H133A mutant did not prevent dimer formation
but abolished enzyme activity almost completely. Photoirradiation (365 nm,
20 min, 0 °C) led to a recovery of both activity and specificity toward a
substrate DNA (λ DNA). A different behavior was shown for the H133A
BamHI mutant incorporating DMNB-Glu167 or DMNB-Glu170, which did not
exhibit recovery of activity after photoactivation, suggesting misfolding of the
protein subsequent to the introduction of these caged residues. A site-directed
incorporation of a phenylazo-Phe residue (azoAla) at the same position 132
was also performed (incorporation efficiency of 52%) [114]. Dimer formation
and enzyme activity were achieved by inducing trans-cis photoisomerization
of the azobenzene moiety. The substitution K132azoAla produced a mutant
enzyme with drastically reduced activity (measured by cleavage efficiency of
a DNA substrate), while after irradiation and trans-cis isomerization almost
full activity was recovered compared to the wild-type enzyme. Thus, in its
trans conformation, the bulkiness of the azoAla residue prevented a correct
association of monomers, while the more compact size of the cis isomer did
not preclude the proper assembly into the active form. Gradual gain of activity
was observed within 5 min of photoirradiation (366 nm, 0 °C) without further
increase in a global 20 min exposure time.
Several proteins are naturally produced as inactive proenzymes and acquire
full activity only when cleaved at a specific position by another enzyme.
Caspase-3, a cysteine protease, is a key component of the apoptosis signaling
pathway. Its inactive form procaspase-3 is cleaved at position Ser176 by caspase-
8 in the “death receptor-induced’’ apoptosis pathway, eventually forming the
active tetramer. Majima and coworkers artificially reproduced the activation
mechanism of procaspase-3 by photoinducing the cleavage of the backbone
in a mutant protein containing a Npg residue specifically introduced at
position 176 [115]. The incorporation efficiency of Npg by using an in vitro
transcription/translation system was only 15%. Nevertheless, photoactivation
(366 nm, 0 °C, up to 10 min exposure time) of Npg-caspase-3 was followed
within 1 min by a clear increase in enzymatic activity as quantified by the
change in fluorescence of the peptidic substrate Z-DEVD-rhodamine 110.
All these studies were performed in vitro. Some in vivo experiments with
caged proteins engineered by nonsense suppression were successful, especially
on the acetylcholine receptor.
In the mouse muscle nicotinic receptor (nAChR), NB-tyrosine was
incorporated at positions 93 and 198 of the α subunit. These are conserved
residues crucial for acetylcholine binding. The mutagenized mRNA and the
corresponding nonsense suppressor tRNA charged with NB-Tyr were injected into
Xenopus oocytes. The channel was successfully expressed and incorporated into
the egg membrane [116]. In the following voltage-clamp study, a train of about
20 near-UV laser pulses (300-350 nm) was able to increase acetylcholine-
induced conductance across the membrane, with about 5% of the caged Tyr
residues decaged in any one flash.
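The per-flash figure lends itself to a rough estimate of how much caged tyrosine has been released after a pulse train. The sketch below assumes that each flash decages the same fraction (5%) of the residues that are still caged, independently of earlier flashes; that assumption and the function name are illustrative and are not taken from the original study.

```python
def fraction_decaged(per_flash: float, n_flashes: int) -> float:
    """Cumulative fraction of residues decaged after n identical flashes.

    Assumes each flash removes `per_flash` of the residues still caged,
    so the caged fraction decays geometrically: (1 - per_flash) ** n.
    """
    if not 0.0 <= per_flash <= 1.0:
        raise ValueError("per_flash must lie between 0 and 1")
    if n_flashes < 0:
        raise ValueError("n_flashes must be non-negative")
    return 1.0 - (1.0 - per_flash) ** n_flashes


for n in (1, 5, 10, 20):
    print(n, round(fraction_decaged(0.05, n), 3))
```

Under this assumption, a train of 20 flashes would decage roughly 64% of the residues rather than 100%, which is consistent with a stepwise increase in conductance over the pulse train.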
A qualitatively similar result was achieved in another elegant experiment
where the same ion channel was mutagenized by direct incorporation of
NB-Cys or NB-Tyr replacing a conserved leucine residue in the γ subunit that
is known to be involved in channel gating [117]. As stated by these authors, the
work represented the first successful incorporation of caged amino acids into a
transmembrane segment of a membrane protein. Interestingly, the presence
of the bulky nitrobenzyl group disturbed neither assembly nor trafficking
of the receptor, but likely distorted its conformation, leading to an alteration of
the conductance. This condition was reversed by photoactivation performed
with 1-ms pulses of UV light. The different and characteristic kinetics of
channel activation after flash photolysis were determined for the
tyrosine- and cysteine-caged receptors. Oocytes expressing the mutant
acetylcholine receptor αVal132Npg showed acetylcholine-induced conductance
similar to the wild type, but upon photoinduced cleavage of the backbone in
the localized region of the α subunit about 90% of the current was lost. Thus,
in addition to playing a key role in the correct assembly of the various subunits,
this conserved portion proved to be essential for receptor function [22].
The work of this group clearly showed the importance and usefulness of
caged proteins as tools for the elucidation of protein function in living cells
[118-120].
3.2.5
Small Caged Molecules Used to Control Protein Activity
attached to the phosphate. From a synthetic standpoint, there are two ways
of preparing caged phosphopeptides: by using an already assembled caged
phosphoamino acid or by introducing the caged phosphate after cleavage of
the mature peptide from the resin. Phosphopeptides will bind to proteins
usually interacting with phosphoproteins as soon as the cage is removed. With
the help of membrane-penetrating peptide sequences, “peptide interference”
is now on its way into biology labs.
The 20-amino acid residue peptide RS-20, whose sequence derives from
smooth muscle myosin light chain kinase (MLCK), is a well-known
calmodulin-binding peptide [144]. Both RS-20 and LMS-1, a 13-residue peptide
derived from the autoinhibitory domain of MLCK, have the capability of inhibiting
MLCK phosphorylation activity, normally directed toward the molecular motor,
actin binding protein myosin II, which is involved in physiological phenomena
like cell polarization and locomotion [145, 146].
The interaction of RS-20 with its target protein calmodulin has been
extensively studied and hydrophobic residues Trp5 and Leu18 were shown to
be critical for binding [147, 148]. Tyr9 in LMS-1 peptide is in turn crucial for
the inhibitory effect as is predicted from mutagenesis studies on MLCK [149].
Walker and others expanded the study on these molecules, both in vitro and in
vivo, using a caged version of both peptides (Scheme 3.2-9) [150]. Trp5 in RS-20
was replaced with a masked tyrosine bearing a CNB cage on the phenolic group.
The carboxylic group of the cage mimicked the negative charge of a glutamate,
a mutation known to have a negative effect on binding. Accordingly, the caged
RS-20 peptide was largely unable to bind to calmodulin, as assessed in vitro
by a quantitative calmodulin-dependent MLCK assay. The photoproduct 5Y-
RS-20 generated after 10-min irradiation at 300-400 nm showed an apparent
50-fold increase in its affinity toward calmodulin. A similarly Tyr9-caged
LMS-1 proved to be an effective switchable inhibitor of MLCK in vitro, being
indistinguishable from authentic LMS-1 in its inhibitory potency. The effect
of local photoactivation of the two caged peptides was finally assessed in
vivo in fast-moving newt eosinophil cells [151]. Peptides were introduced by
microinjection in an estimated concentration of 20-100 µM. Photoactivation
[Scheme 3.2-9: caged peptides 5cgY-RS-20 (H2N-ARRKYQKTGHAVRAIGRLSS-COOH) and
9cgY-LMS-1 (H2N-LSKDRMKKYMARR-COOH); structures not recoverable.]
Scheme 3.2-11 Tyrosine residues equipped with various caging groups rendered
peptides inactive with respect to SH2-domain binding.
This caged PKA showed less than 2% of the activity displayed by the native
protein, while UV irradiation (300-400 nm, up to 15 min) restored about 50%
of the activity of the unmodified enzyme in vitro. Following these in vitro
observations, 3-7 µM solutions of caged PKA were microinjected into living rat
embryo fibroblasts (REF; a 10-fold dilution was estimated after injection) and
irradiated with near-UV light (300-400 nm, up to 15 min). In these cells,
photoactivation of PKA led to disruption of actin stress fibers, membrane
ruffling, and change of cell shape from flat to rounded, in accordance with
the phenotype observed when unmodified, active catalytic PKA subunit was
injected into the same cells. Microinjected cells that were not exposed to UV
irradiation retained their stress fibers and flat morphology, indicating that the
PKA-induced pathway had not been activated [21].
PKI is a heat-stable protein first described in 1982 as a potent inhibitor
of PKA [153]. On the basis of a short binding sequence, a potent inhibitor
peptide with the sequence GRTGRRNAI was identified. The underlined
Arg residue played an essential role for the inhibitory behavior of this
Fig. 3.2-7 A chemotactic tripeptide caged at the N-formyl group.
3.2.6
Conclusions
References
R. Carraway, R.E. Ikebe, M. Fay, F.S. Walker, Proc. Natl. Acad. Sci. U. S. A.
1998, 95, 1568-1573.
152. J.S. Wood, M. Koszelak, J. Liu, D.S. Lawrence, J. Am. Chem. Soc. 1998,
120, 7145-7146.
153. S. Whitehouse, D.A. Walsh, J. Biol. Chem. 1982, 257, 6028-6032.
154. H.C. Cheng, B.E. Kemp, R.B. Pearson, A.J. Smith, L. Misconi,
S.M. Vanpatten, D.A. Walsh, J. Biol. Chem. 1986, 261, 989-992.
155. B.E. Cohen, T.B. McAnaney, E.S. Park, Y.N. Jan, S.G. Boxer, L.Y. Jan,
Science 2002, 296, 1700-1703.
156. M.E. Vazquez, D.M. Rothman, B. Imperiali, Org. Biomol. Chem. 2004,
2, 1965-1966.
157. A.J. Muslin, J.W. Tanner, P.M. Allen, A.S. Shaw, Cell 1996, 84, 889-897.
158. A. Nguyen, D.M. Rothman, J. Stehn, B. Imperiali, M.B. Yaffe,
Nat. Biotechnol. 2004, 22, 993-1000.
159. M.B. Yaffe, K. Rittinger, S. Volinia, P.R. Caron, A. Aitken, H. Leffers,
S.J. Gamblin, S.J. Smerdon, L.C. Cantley, Cell 1997, 91, 961-971.
160. M.E. Vazquez, M. Nitz, J. Stehn, M.B. Yaffe, B. Imperiali, J. Am. Chem.
Soc. 2003, 125, 10150-10151.
161. D. Derossi, G. Chassaing, A. Prochiantz, Trends Cell Biol. 1998, 8,
84-87.
162. M.E. Hahn, T.W. Muir, Angew. Chem., Int. Ed. 2004, 43, 5800-5803.
163. T.W. Muir, Annu. Rev. Biochem. 2003, 72, 249-289.
164. J.P. Pellois, M.E. Hahn, T.W. Muir, J. Am. Chem. Soc. 2004, 126,
7170-7171.
165. E.L. Becker, H.E. Bleich, A.R. Day, R.J. Freer, J.A. Glasel, J. Visintainer,
Biochemistry 1979, 18, 4656-4668.
166. M.C. Pirrung, S.J. Drabik, J. Ahamed, H. Ali, Bioconjug. Chem. 2000,
11, 679-681.
3.3
Engineering Control Over Protein Function; Transcription Control by Small
Molecules
John T. Koh
Outlook
3.3.1
Introduction
activate one of many cellular pathways responsive to the same ligand, and may
further provide new strategies to rescue disease-associated mutants of ligand-
dependent proteins. In addition, new methods to control gene expression with
light can be used to spatially and temporally pattern genes in tissues.
3.3.2
The Role of Ligand-dependent Transcriptional Regulators
3.3.3
Engineering New Ligand Specificities into NHRs
3.3.4
The Requirement of “Functional Orthogonality”
3.3.5
Overcoming Receptor Plasticity
Fig. 3.3-4 The estrogen receptor has sufficient flexibility to accommodate a diverse array
of ligands that interact with ER at low or sub-nanomolar potencies.
selective for the mutant over the wild-type receptor. These studies highlight
the remarkable ability of the wild-type receptor to accommodate ligands that
differ in hydrophobic shape even when modeling might suggest that these
ligands should not be accommodated by the ligand-binding site. In general,
protein plasticity limits the use of “bump and hole” engineering of flexible
proteins.
Our group has therefore focused on exploring methods to manipulate polar
groups to impart specificity to engineered ligand/receptor pairs, following
the general notion that polar interactions impart specificity to molecular
recognition events because mismatched polar interactions cannot be easily
avoided by simple side-chain reorganization. In an early work on the retinoic
acid receptor, hormone-binding selectivity was changed by modifying a key
arginine residue (Arg278) that forms a salt bridge to the carboxylate of
bound retinoic acid [39]. Although a neutral ethylamide analog of retinoic acid
displayed some mutant versus wild-type selectivity, this analog was notably
less potent than the wild-type retinoic acid-RAR (retinoic acid receptor)
pair and showed only partial selectivity. A more dramatic attempt to impart
selectivity through the manipulation of polar interactions was the reversal of
a ligand-receptor salt bridge by creating a guanidine-functionalized retinoid,
which showed selective but weak activity for the charge-complementing mutant
RARγ (S289G/R278E). The weaker cellular activity of this ligand-receptor pair
is not entirely unexpected in the light of studies by Warshel suggesting that
salt-bridge interactions are stabilized by protein dipoles that would be
destabilizing if the salt bridge were reversed [40, 41]. In general, charged or
neutral polar groups found in the interior of proteins are stabilized by
multiple polar
interactions from the protein in the form of ion pairs, hydrogen bonds, and
local or macrodipoles. Adding, removing, or rearranging polar groups found in
the interior of protein-ligand complexes is generally disfavored as it leaves the
associated polar groups unsatisfied. The solution to this problem of selectivity
is not immediately obvious but in at least some cases can be solved.
The Koh and the Katzenellenbogen groups simultaneously explored estrogen
analogs that could complement the same Glu353 → Ala or Ser mutation
in the estrogen receptor [42-44]. Glu353 forms an intramolecular salt bridge
with Arg274 and both residues form key hydrogen bonds to the 3-hydroxyl
of E2 (Fig. 3.3-5(a)). Mutations to Glu353 greatly reduce the receptor’s affinity
for the natural ligand E2. While a number of estrogen analogs bearing neutral
functional groups in place of the 3-hydroxyl of E2 could activate the Glu353
mutants with high affinity, in almost all cases, these analogs activated the
wild-type ERs with equal or greater potency. A few low-potency ligands (<2%
wild-type potency) show receptor selectivities as high as 34-fold (mutant/wild
type) (Fig. 3.3-6(a)) [42]. By comparison, carboxylate-functionalized estrogen
analogs designed to restore (intermolecularly) the lost protein salt bridge
with Arg274 form high affinity/potency complexes with the mutant receptor
(Fig. 3.3-5(b)). These complexes are not of higher affinity than the analogs
having neutral appendages, suggesting that the favorable energetics of forming
a salt bridge with Arg274 is offset by the substantial cost of desolvating
the ligand-associated carboxylate [44]. However, carboxylate-functionalized
ligands of appropriate size and shape provided a significant gain in selectivity,
which can be as high as 95- to 400-fold in favor of the mutant over the wild-type
Fig. 3.3-6 Complements for ERα(E353A). (a) Neutral modifications tend to provide
only modest mutant versus wild-type selectivity. (b) Acidic analogs of appropriate
structure provide high selectivity without significant loss in affinity. RTP - relative
transcription potency; RS - receptor selectivity (EC50 wild type/EC50 mutant).
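The receptor selectivity (RS) quoted for these ligands is a simple ratio of EC50 values, wild type over mutant, so values above 1 favor the mutant receptor. A minimal sketch of that arithmetic follows; the function name and the EC50 values are hypothetical and were chosen only so that the ratio reproduces a 34-fold selectivity.

```python
def receptor_selectivity(ec50_wild_type_nm: float, ec50_mutant_nm: float) -> float:
    """Receptor selectivity RS = EC50(wild type) / EC50(mutant).

    RS > 1 means the ligand is more potent on the mutant receptor.
    """
    if ec50_wild_type_nm <= 0 or ec50_mutant_nm <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_wild_type_nm / ec50_mutant_nm


# Hypothetical ligand: EC50 = 340 nM on wild type, 10 nM on the mutant
rs = receptor_selectivity(340.0, 10.0)
print(f"RS = {rs:.0f}-fold in favor of the mutant")
```

Because RS is a ratio, it says nothing about absolute potency: a ligand can be highly selective and still weak, which is why the relative transcription potency is reported alongside it.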
3.3.6
Nuclear Receptor Engineering by Selection
Miller and Whelan were perhaps the first to recognize the potential of screening
or selecting NHR mutants from receptor libraries to identify ERs with modified
ligand specificities [46, 47]. Using error-prone PCR, they generated populations
of mutant ERs in yeast that had decreased responsiveness to E2 but increased
responsiveness to the synthetic diphenyl indene-ol GR132706X. Despite their
elegant plan, the selected mutants had good potencies but relatively modest
3.3.7
Ligand-dependent Recombinases
[Figure residue; graphic not recoverable. Legible labels: recombination site
5′-TATAACTTCGTATAGATATGCTATACGAAGTTAT-3′; panel (b): ER ligand, ATG, STOP.]
have been reported that make use of Cre or the site-specific recombinase Flp
including Cre, Cre-PR (progesterone receptor fusion), Cre-GR (glucocorticoid
receptor fusion), and EcR-Flp [51-53].
Although some of these ligand-dependent recombinases have been
reengineered to selectively respond to synthetic receptor antagonists such
as tamoxifen-responsive Cre-ER or RU486-responsive Cre-PR, the need to
treat cells for up to several days with these potent receptor antagonists may
have unwanted side effects, particularly when used in in vivo developmental
models [50, 53]. This suggests that functionally orthogonal ligands may still
have an important role to play, providing the next generation of highly selective
ligand-dependent recombinases.
3.3.8
Complementation/Rescue of Genetic Disease
nuclear receptors are associated with a family of human genetic diseases, which
include VDR mutations associated with rickets, TR mutations associated with
resistance to thyroid hormone, mineralocorticoid resistance, PPAR mutations
associated with certain forms of severe insulin independent diabetes,
and androgen receptor mutations associated with androgen insensitivity
syndrome [67-691. Additionally, mutations to the androgen, estrogen, and TRs
are associated with the pathology of prostate, breast, and thyroid cancers [70].
A significant subset of these disease-associated mutations is located at the
receptor-hormone interface suggesting that appropriately designed hormone
analogs may be able to “complement” or “rescue” the function of these
receptors. Unlike current gene therapy strategies that use nucleic acid analogs,
hormone analogs typically have good druglike properties (i.e., bioavailability,
biostability) suggesting that hormone receptor complements may represent a
new strategy toward developing new treatments for genetic disease.
The possibility of using hormone analogs to rescue nuclear receptor
mutations was perhaps first explored by DeGroot et al. who demonstrated that
some synthetic hormone analogs were more potent than triiodothyronine (T3)
in mutant forms of TR, associated with resistance to thyroid hormone [71].
More recently, Feldman and Peleg similarly screened vitamin D3 analogs
that partially complement VDR mutants associated with vitamin D resistant
rickets [72], and Chatterjee et al. have identified PPAR agonists that can
restore activity to PPAR mutants associated with severe insulin independent
diabetes [73]. The first example of a molecule being designed to rescue
the function of a mutant protein associated with a genetic disease was the
development of the thyroid hormone analog HY1, which was designed
to complement the RTH (thyroid hormone resistance) associated mutant
TRβ(R320C) [74]. This study represented a significant advance over the earlier
studies by DeGroot, in that the complementing analog was selective for the
mutant form of TRβ over the TRα subtype. In more recent work, new thyroid
hormone analogs have been developed that restore efficacy and potency to three
of the most common RTH-associated mutants, Arg320 → Cys, Arg320 → His, and
Arg316 → His (Fig. 3.3-9) [75, 76]. All of the compounds used to rescue these
mutations affect the carboxylate-binding cluster of arginines, and are based on
the same general complementation strategy involving more neutral hydrogen
bonding groups in place of the ligand’s carboxylate. This suggests that once
general rules for designing complementing analogs are established, the process
of identifying new compounds may be reasonably efficient.
It is important to distinguish these “functional rescue” studies from sev-
eral other important studies showing that small molecules can stabilize or
chaperone folding of mutant proteins such as mutant p53 associated with
cancer [77, 78], mutant forms of V2R associated with nephrogenic diabetes
insipidus [79, 80], mutant forms of opsin associated with retinitis pigmentosa
[81], and β-glucosidase mutants associated with Gaucher disease [82, 83].
By contrast, nuclear receptor mutants are often well-folded, stable proteins that
[Fig. 3.3-9 residue; structures not recoverable. Legible labels, in printed
order: HY1; TRβ(R320C) EC50 = 7.0 nM, mut/α selectivity = 5.5; TRβ(R320H)
EC50 = 0.46 nM, mut/α selectivity = 1.0; KG-8; TRβ(R320C) EC50 = 7 nM,
mut/α selectivity = 12; TRβ(R316H) EC50 = 12.6 nM, mut/α selectivity = 4.]
3.3.9
De Novo Design of Ligand-binding Pockets
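[Figure residue; structures not recoverable. Legible labels, in printed order:
1,25-dihydroxyvitamin D3: wild-type VDR EC50 = 2.0 nM, VDR(R274L) EC50 = 2000 nM;
LG190155: wild-type VDR EC50 = 110.0 nM, VDR(R274L) EC50 = 85 nM;
ss-III: VDR(R274L) EC50 = 7.0 nM; ss-III: VDR(R274L) EC50 = 3.3 (unit not legible).]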
lysozyme [86]. Although these de novo binding sites have only weak affinity for
these solvent substrates, they clearly demonstrated that new small-molecule
binding sites could be engineered into proteins. Barbas and Schultz have been
able to use this strategy to create zinc finger domains that bind only in the
presence of isoindole derivatives [87]. By fusing these inducible zinc finger
domains to transactivation domains, the isoindoles can be used to remotely
regulate gene transcription. Currently, the affinity of these de novo designed
cavities for their ligands is only modest. However, combined with
recent advances in computational methods to de novo design ligand-binding
cavities [88-91], this general strategy provides a potentially powerful approach
to creating ligand-inducible transcriptional regulators.
3.3.10
Light-activated Gene Expression from Small Molecules
demonstrated that photocaged forms of RNA and DNA can be injected into
zebrafish oocytes (single cell stage) and are sufficiently stable to be carried
into essentially all cells of the developed organism [106]. The caged RNA could
then be released in a subpopulation of cells where it is locally translated into
gene product. The use of caged nucleic acids to photoregulate gene expression
was first demonstrated by Haselton et al. in mouse models [103-105]. The
application of caged RNAs has recently been expanded to light-activated RNAi
methods by Friedman [107].
References
caged transfected plasmid II: delivery by gene gun to organ cultured corneas, Invest. Ophthalmol. Vis. Sci. 1997, 38, 2083-2083.
104. F.R. Haselton, W.C. Tseng, M.S. Chang, Light activated protein expression using caged transfected plasmid I: delivery by liposomes to cultured retinal endothelium, Invest. Ophthalmol. Vis. Sci. 1997, 38, 2082-2082.
105. W.T. Monroe, M.M. McQuain, M.S. Chang, J.S. Alexander, F.R. Haselton, Targeting expression with light using caged DNA, J. Biol. Chem. 1999, 274, 20895-20900.
106. H. Ando, T. Furuta, R.Y. Tsien, H. Okamoto, Photo-mediated gene activation using caged RNA/DNA in zebrafish embryos, Nat. Genet. 2001, 28, 317-325.
107. S. Shah, S. Rangarajan, S.H. Friedman, Light-activated RNA interference, Angew. Chem. Int. Ed. 2005, 44, 1328-1332.
Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
4
Controlling Protein-Protein Interactions
4.1
Chemical Complementation: Bringing the Power of Genetics to Chemistry
Outlook
4.1.1
Introduction
the cell, such as nucleic acids and small molecules. We also consider the
possibilities for exploiting the two-hybrid assay for chemical discovery, extending
the power of genetics to chemistry not naturally carried out in the cell.
The two-hybrid assay detects protein-protein interactions through the
reconstitution of a transcriptional activator, a natural eukaryotic transcription
factor, and the resulting activation of a reporter gene. One protein is fused to the
DNA-binding domain (DBD) of the transcriptional activator, and the other
protein is fused to the activation domain (AD). If the two proteins bind to one
another, they effectively dimerize and hence reconstitute the transcriptional
activator (Fig. 4.1-2). In practice, this assay is used not just to test a single
protein-protein interaction, but to test all of the proteins expressed in a given
organism or cell line for binding to the protein of interest. A library of
AD-fusion proteins, encoding all ca. 10⁴ different proteins, is transformed en masse
into an appropriate two-hybrid selection strain containing the DBD-protein
fusion of interest. Only cells expressing an AD-protein fusion that binds
to the DBD-protein fusion will then survive under the appropriate reporter
gene selection conditions. The assay is general because the transcription-based
selection works for any protein-protein interaction. Therefore, while
Fig. 4.1-2 In the yeast two-hybrid system, dimerization of fusion proteins
X-DNA-binding domain and Y-activation domain reconstitutes the transcriptional
activator. The reconstituted transcriptional activator recruits the transcriptional
machinery to the promoter region of the reporter gene, initiating its
transcriptional activation.
the cell. Furthermore, these so-called n-hybrid assays extend these powerful
transcription-based genetic assays to chemistry not naturally carried out in
the cell. This extension should allow these genetic assays to be used not only
for the discovery of biological pathways but also for new chemistry, including
drug discovery and the directed evolution of molecules with new functional
properties.
4.1.2
History/Development
4.1.2.5 Catalysis
In all the previous applications, the n-hybrid assay is used to detect a binding
event, whether it is protein, DNA, RNA, or small molecule binding. Our
laboratory and others have been interested in the idea that this powerful
genetic assay could be brought to bear on a broader variety of questions.
Several different approaches have now been devised for linking enzyme
catalysis to reporter gene transcription using the n-hybrid assay. Our laboratory
introduced “Chemical Complementation”, which detects enzyme catalysis of
bond formation or cleavage reactions on the basis of covalent coupling of two
small molecule ligands in vivo (Fig. 4.1-4) [20]. In this assay, the enzyme is
introduced as a fourth component to the small molecule yeast three-hybrid
system, and the linker in the small molecule CID acts as the substrate for the
enzyme. Bond formation is detected as synthesis of the CID and hence the
activation of an essential reporter gene; bond cleavage is detected as cleavage
of the CID and hence the repression of a toxic reporter gene. In theory, this
approach should be readily extended to new chemistry, simply by synthesizing
small molecule heterodimers with different chemical linkers as the enzyme
substrates. Inspired by traditional genetics, our hope is to make a general
complementation assay that would link enzyme catalysis of a broad range of
chemical reactions to cell survival, extending genetic selections to chemistry
beyond that naturally carried out in the cell.
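
The read-out logic described above can be sketched as a small truth table. This is only an illustrative model under simplifying assumptions: enzyme activity is treated as all-or-nothing, and the names `Selection`, `cid_present`, and `cell_survives` are hypothetical, not part of any published protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """One chemical complementation experiment (illustrative model only)."""
    reaction: str        # "formation" or "cleavage" of the CID linker bond
    enzyme_active: bool  # assumed all-or-nothing: does the enzyme catalyze it?


def cid_present(sel: Selection) -> bool:
    """Return True if the intact small-molecule dimerizer (CID) is present."""
    if sel.reaction == "formation":
        # The CID exists only if the enzyme couples the two ligands.
        return sel.enzyme_active
    if sel.reaction == "cleavage":
        # The CID is supplied intact and is destroyed by an active enzyme.
        return not sel.enzyme_active
    raise ValueError(f"unknown reaction type: {sel.reaction!r}")


def cell_survives(sel: Selection) -> bool:
    """Map CID presence to survival under the matching reporter gene."""
    reporter_on = cid_present(sel)  # the CID bridges the DBD and AD fusions
    if sel.reaction == "formation":
        return reporter_on          # essential reporter: ON means growth
    return not reporter_on          # toxic reporter: OFF means growth


for reaction in ("formation", "cleavage"):
    for active in (True, False):
        sel = Selection(reaction, active)
        print(f"{reaction:9s} enzyme_active={active!s:5s} survives={cell_survives(sel)}")
```

In this sketch, pairing bond formation with an essential reporter and bond cleavage with a toxic reporter means that an active enzyme leads to cell survival in both cases, which is what allows either reaction type to be run as a growth selection.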
Fig. 4.1-4 Chemical Complementation. A reaction-independent complementation
assay for enzyme catalysis based on the yeast three-hybrid assay. A heterodimeric
small molecule bridges a DNA-binding domain-receptor fusion protein and an
activation domain-receptor fusion protein, activating transcription of a downstream
reporter gene in vivo. Enzyme catalysis of either cleavage or formation of the bond
between the two small molecules can be detected as a change in transcription of the
reporter gene. The assay can be applied to new chemical reactions simply by
synthesizing small molecules with different substrates as linkers and adding an
enzyme as a fourth component to the system.
4.1.3
General Considerations
Fig. 4.1-5 Small molecules used to create chemical inducers of dimerization (CIDs) for
the yeast three-hybrid system. Structure labels in the figure: dexamethasone, FK506, SLF,
trimethoprim, estrone, and biotin.
available, including Stratagene and Clontech, which market the Gal4 system,
Origene, for the LexA system, and Invitrogen, which offers versions of both
systems. All of the basic features of the two-hybrid system have been covered
already in several excellent reviews and the chapters on methods.
In our laboratory we have used the Brent two-hybrid system to build our
Dex-Mtx yeast three-hybrid system. We favor the Brent system, which uses
LexA, an E. coli transcription factor, and B42, an artificial activator isolated
from E. coli genomic DNA. Both LexA and B42 are orthogonal to standard
yeast genetic tools and nontoxic to the yeast cell, yet the artificial LexA-B42
transcriptional activator is on par with the strongest transcriptional activators
endogenous to S. cerevisiae [31]. Moreover, the LexA system permits the use of
the tightly regulated GAL1 promoter to drive the expression of the LexA DBD
and B42 AD-protein fusions, so that expression can be tuned by varying the
ratio of galactose and glucose in the growth medium. As reported by Lin et al.,
we use pMW103, a multicopy 2μ plasmid with a HIS3 marker, to encode the LexA
DBD fusions and pMW102, a multicopy 2μ plasmid with a TRP1 marker, to encode
the B42 AD fusions.
Rather than the original EGY48 LEU2 selection strain, we chose the FY251
strain (MATa trp1Δ63 his3Δ200 ura3-52 leu2Δ1 Gal+), which provides an
additional selective marker for greater flexibility. The LEU2 or URA3 markers
can then be used either for the transcription activation growth selection or
for the introduction of additional plasmids. In this initial publication, we then used the
lacZ reporter plasmid pMW112, which encodes the lacZ gene under control of
eight tandem LexA operators. Thus, small molecule CID-induced transcription
activation could be detected using standard lacZ transcription assays either on
plates or in liquid culture [25]. Further optimization of the yeast three-hybrid
system in our lab led us to conclude that integration of either the AD or DBD
into the yeast chromosome stabilizes the transcription read-out of the reporter
gene without losing transcriptional strength, effectively reducing the number
of false positives in the detection of novel ligand-receptor interactions [34].
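
The marker bookkeeping in this paragraph can be made explicit with a short sketch. The strain genotype and the plasmid markers are taken from the text; the function names and the marker-to-nutrient table are illustrative assumptions, and the sketch ignores the reporter plasmid, whose marker is not stated here.

```python
# Auxotrophic markers available in FY251 (trp1, his3, ura3, leu2), from the text.
STRAIN_AUXOTROPHIES = {"TRP1", "HIS3", "URA3", "LEU2"}

# Plasmid -> selectable marker, from the text.
PLASMID_MARKERS = {"pMW103": "HIS3", "pMW102": "TRP1"}

# Standard nutrient omitted from the medium to select for each marker.
NUTRIENT = {"HIS3": "histidine", "TRP1": "tryptophan",
            "URA3": "uracil", "LEU2": "leucine"}


def free_markers(auxotrophies: set, plasmid_markers: dict) -> set:
    """Markers still available for a growth selection or further plasmids."""
    used = set(plasmid_markers.values())
    unsupported = used - auxotrophies
    if unsupported:
        raise ValueError(f"strain cannot select for: {sorted(unsupported)}")
    return auxotrophies - used


def dropout_nutrients(plasmid_markers: dict) -> list:
    """Nutrients to leave out of the medium to maintain every plasmid."""
    return sorted(NUTRIENT[m] for m in plasmid_markers.values())


print("free markers:", sorted(free_markers(STRAIN_AUXOTROPHIES, PLASMID_MARKERS)))
print("omit from medium:", dropout_nutrients(PLASMID_MARKERS))
```

With the two fusion plasmids occupying HIS3 and TRP1, the sketch leaves LEU2 and URA3 free, matching the statement that these markers remain available for the growth selection or for additional plasmids.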
Fig. 4.1-6 The bacterial two-hybrid system developed by Hochschild and coworkers.
The λcI repressor and the α-subunit of RNAP are fused to two arbitrary proteins, X
and Y. Binding of the λcI repressor to the λ operon followed by dimerization of X and Y
recruits RNAP, leading to transcription activation of a downstream reporter gene.
in our opinion, comes from Golemis and Brent, in which they estimated that
the KD cutoff for the yeast two-hybrid assay is ca. 1 μM [46]. Assuming that
the proteins are being expressed at ca. 1 μM concentrations, the two-hybrid
assay can only detect relatively high-affinity interactions (ca. KD = 1 μM).
Thus, while the two-hybrid assay is quite successful at identifying new
interactions, it is probably not appropriate to assume that a high-throughput
two-hybrid assay gives a snapshot of all interactions. In fairness, however,
it should be pointed out that traditional affinity chromatography approaches
are even further impaired because they rely on the natural abundance of
any given protein in the cell. Extending this analysis to drug discovery
using the small molecule three-hybrid assay, it is our opinion that the
three-hybrid assay was long underutilized because the original systems had low
sensitivity owing to the CID anchor. Recently, we have shown that our
Mtx three-hybrid system has a KD cutoff of ca. 100 nM [29]. Consistent
with this idea, GPC Biotech reported last year the use of the Mtx
three-hybrid system for identification of protein targets of CDK inhibitors [47].
Interestingly, Hochschild and coworkers have shown that they can build
additional sensitivity into their bacterial two-hybrid assay by adding cooperative
interactions [48].
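
To see why a KD near the expression level acts as a practical cutoff, the equilibrium fraction of complex can be computed for a simple 1:1 binding model. This is a minimal sketch assuming both fusion proteins are present at the same total concentration (1 μM, as in the estimate above) and that nothing else competes for binding; the function names are illustrative.

```python
import math


def complex_conc(a_total: float, b_total: float, kd: float) -> float:
    """Equilibrium concentration of the A:B complex (same units as the inputs).

    Solves [AB]^2 - (A + B + KD)[AB] + A*B = 0 and returns the physical root.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fraction_bound(total_um: float, kd_um: float) -> float:
    """Fraction of each partner in complex when both are at total_um."""
    return complex_conc(total_um, total_um, kd_um) / total_um


for kd in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"KD = {kd:>6} uM -> fraction bound = {fraction_bound(1.0, kd):.3f}")
```

Under these assumptions, a pair with KD = 1 μM is a little under 40% complexed at 1 μM expression, and the fraction falls off steeply as KD rises above the protein concentration, consistent with weaker interactions going undetected.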
The n-hybrid assay can also be used for directed evolution. For example,
Pabo and coworkers have adapted a bacterial one-hybrid assay to evolve
zinc-finger variants with defined DNA-binding specificities [49]. Starting with a
three zinc-finger protein that has nanomolar affinity for its DNA-binding
site, the authors replaced the binding site for the third zinc finger with a
new DNA sequence and then randomized the third finger to evolve a
zinc-finger variant with increased affinity for the target sequence. Impressively,
the evolved zinc finger showed DNA affinity within 10-fold of the wt protein,
KD = 0.01 nM, and a 10- to 100-fold preference for the modified over the
wt DNA sequence. Given the low KD cutoff and the fact that the n-hybrid
assay is governed by equilibrium binding, there are two likely limitations to
using this assay for directed evolution. First, the assay cannot effectively detect
initial, weak binders. Second, the assay is limited in its ability to distinguish
evolved variants on the basis of improvements in KD since energy differences
of only a few kilocalories per mole determine whether a molecule is bound
at equilibrium. In theory, however, these limitations could be overcome by
varying the concentration of the n-hybrid components or, again, by building in
a series of tunable, cooperative interactions. Pabo and coworkers, then, chose
their problem well. They began with a zinc-finger protein with two out of three
zinc fingers intact. This initial binding affinity enabled them to select good
binders in a single round of selection, rather than trying to improve binding
affinity through multiple rounds of selection. A similar analysis suggests that
the n-hybrid assays may be ideally suited to catalysis applications since large
differences in catalytic activity are needed to significantly affect the half-life of
product formation.
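
The remark about "a few kilocalories per mole" follows from the standard relation between a fold change in KD and binding free energy, ΔΔG = RT ln(KD,1/KD,2). A minimal sketch, assuming a temperature of 25 °C (an assumption, not a value from the text):

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # assumed temperature (25 degrees C)


def ddg_kcal(kd_fold_change: float) -> float:
    """Free-energy difference (kcal/mol) for a given fold change in KD."""
    return R_KCAL * T_KELVIN * math.log(kd_fold_change)


for fold in (10, 100, 1000):
    print(f"{fold:>5}-fold change in KD ~ {ddg_kcal(fold):.2f} kcal/mol")
```

Each 10-fold change in KD corresponds to roughly 1.4 kcal/mol at this temperature, so even a 1000-fold improvement amounts to only about 4 kcal/mol, which is why an equilibrium read-out has limited power to rank variants by affinity.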
4.1.4
Applications
Although introduced only in 1989, the yeast two-hybrid assay has emerged as
an integral tool for biology research. Two-hybrid screens now appear regularly
in the biology literature. Genome-wide two-hybrid screens are even the focus of
major research publications. Somewhat surprisingly then, there have been few
applications of the related n-hybrid technologies to detect protein interactions
with DNA, RNA, and small molecules, or applications beyond cloning. Here
we look at more recent applications of n-hybrid assays with an eye for asking
whether this discrepancy results from the relative power of these different
n-hybrid assays or rather the biases of current research.
identify peptide aptamers that inhibit Cdk2 from a library of random peptide
sequences (Table 4.1-1) [52]. The 20-residue peptide library was displayed in
the active site loop of E. coli thioredoxin (TrxA). The TrxA loop library was
fused to the AD, and Cdk2 was fused to the DBD. In a single round of assay,
6 × 10⁶ TrxA-AD transformants, a very small percentage of the 20mers
possible, were tested for binding to LexA-Cdk2. From this assay, they isolated
66 colonies that activated transcription of both a LEU2 and a lacZ reporter
gene. Remarkably, these colonies converged on 14 different peptide sequences
that bound Cdk2 with high affinity. Using surface plasmon resonance, the
peptide aptamers were shown to bind Cdk2 with KDs of 30-120 nM. In kinase
inhibition assays, the peptide aptamers had IC50s for the Cdk2/cyclin E kinase
complex of 1-100 nM. What is particularly impressive about this experiment is
that nanomolar affinity ligands are being isolated in a single round of selection
from a library only on the order of 10⁶-10⁸. Similar results have been obtained
using peptide aptamers in a traditional genetic selection [53].
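
To put "a very small percentage of the 20mers possible" into numbers, the sampled fraction of sequence space can be estimated directly. This sketch assumes all 20 natural amino acids are allowed at each of the 20 positions and, optimistically, that every transformant carries a distinct sequence.

```python
AMINO_ACIDS = 20
PEPTIDE_LENGTH = 20
TRANSFORMANTS = 6 * 10**6   # library members tested, from the text

# Total number of distinct 20-residue peptides.
sequence_space = AMINO_ACIDS ** PEPTIDE_LENGTH

# Upper bound on the fraction sampled (assumes no duplicate sequences).
coverage = TRANSFORMANTS / sequence_space

print(f"possible 20mers:  {sequence_space:.3e}")
print(f"fraction sampled: {coverage:.1e}")
```

Under these assumptions the screen samples on the order of 10⁻²⁰ of the possible sequences, which underlines how notable it is that nanomolar binders were recovered in a single round.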
Given the success of this and related “aptamer” selections, it is somewhat
surprising that these “aptamer” scaffolds are not more widely used.
There are several potential advantages to directed evolution over traditional
monoclonal antibody technology for generating selective binding proteins.
Optimistically, six months are required from the start of immunization,
through immortalization, and finally screening to generate a monoclonal
antibody. On the other hand, if several peptide aptamer libraries were
maintained for routine use, the libraries could be screened against a new target,
false positives could be sorted out, and biochemical assays could validate a
target in less than a month and at considerably less expense. Moreover, protein
scaffolds other than antibodies may prove more robust for use as reagents and
Fig. 4.1-8 Development of zinc fingers specific for a specific DNA sequence using a
one-hybrid assay adapted from a bacterial two-hybrid system. Zinc fingers (ZF) 1, 2,
and 3 from the Zif268 protein were fused to the Gal11 protein. The Gal4 protein, which
binds Gal11 with high affinity, was fused to the α-subunit of RNAP. If ZF3 bound to the
first site with high affinity, the RNAP complex would be recruited, activating
transcription of a HIS3 reporter gene. Significantly, in just one round of assay,
several proteins were identified that bound specifically to the target DNA sequence.
24 codons at six amino acids per three zinc finger = (24⁶)³), which cannot
be covered by this high-throughput method. Thus, the authors are limited to
randomizing one finger at a time, while keeping the other two unchanged. We
believe that conserving the high affinity of two zinc fingers for the DNA may be
important for the success of Pabo and coworkers’ directed evolution, because
starting a directed evolution with a high-affinity protein for DNA ensures the
evolution of proteins within the dynamic range of the n-hybrid system. For this
zinc-finger evolution, they created a library of ca 10’ variants, and identified
a total of nine sequences that bound specifically to three target DNAs with a
preference of 10- to 100-fold for the modified over the wt DNA.
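
The library-size argument at the start of this paragraph can be checked with a few lines of arithmetic. The sketch assumes the reading suggested by the text: 24 codons allowed at each of six randomized positions per finger, and three fingers randomized at once for the full library.

```python
CODONS_PER_POSITION = 24   # from the text
POSITIONS_PER_FINGER = 6   # from the text
FINGERS = 3                # from the text

# Variants when a single finger is randomized.
one_finger = CODONS_PER_POSITION ** POSITIONS_PER_FINGER

# Variants when all three fingers are randomized simultaneously.
three_fingers = one_finger ** FINGERS

print(f"one finger:    {one_finger:,} variants")
print(f"three fingers: {three_fingers:.2e} variants")
```

Randomizing one finger gives about 1.9 × 10⁸ combinations, whereas randomizing all three at once gives roughly 7 × 10²⁴, far beyond what a transformation-limited selection can sample. That gap is the quantitative reason for evolving one finger at a time.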
Comparing their results for the zinc-finger evolution using the bacterial
hybrid system with earlier results obtained in a similar zinc-finger evolution
study using phage display, Pabo and coworkers conclude that the affinity and
specificity of the selected zinc fingers are superior to those obtained in earlier
phage display studies. Moreover, the bacterial hybrid system is a more rapid
alternative to phage display because it permits isolation of functional fingers
in a single selection step instead of using multiple rounds of enrichment.
Speaking to the power of this approach, Sangamo uses a modified one-hybrid
assay for its selection of artificial DNA-binding proteins for commercial
applications [55, 56]. The success found here raises the question of other binding
interactions. One could speculate that the success here depends on starting
with two known zinc fingers with high affinity for their DNA target, except that
the protein “aptamer” scaffold selections described in the previous section
have begun with scaffolds with no measurable affinity for their protein target.
interactions. An
impressive application of this system is the cloning of a regulatory protein from
Caenorhabditis elegans that binds to the 3′ untranslated region of the FEM-3
(fem-3 3′UTR) and mediates the sperm/oocyte switch in hermaphrodites [57].
In this assay, a bifunctional RNA plasmid possessing the fem-3 3′UTR and the RNA
ligand for the MS2 coat protein was introduced into a yeast strain expressing
a DBD-MS2 upstream of the HIS3 and lacZ reporter genes. Into this strain,
a complementary DNA-AD library was introduced. Cells containing a positive
protein-RNA interaction were selected first for HIS3 and lacZ activation
followed by screening for the presence of the bifunctional RNA plasmid. The
RNA plasmid from successful candidates was lost by reverse selection and
the cells were tested again for lacZ activation to reduce the number of false
positives. Cells that failed to activate lacZ after plasmid loss were tested for
fem-3 3′UTR binding specificity by reintroduction of the bifunctional RNA
plasmids. The protein encoded in the only cDNA-AD that satisfied all selection
and screening criteria was found to have 93% homology at the nucleotide level
with two genes encoded in the C. elegans genome. Further testing confirmed
these genes to be regulators of the sperm/oocyte switch in hermaphrodite
C. elegans. The specificity with which the RNA three-hybrid assay selected
just one protein from thousands for the selected protein-RNA interaction
illustrates the power of this assay for finding novel protein-RNA interactions
[16]. The recent discovery, for example, of RNAi highlights the need not to forget
about molecules other than proteins when carrying out genetic assays [58, 59].
4.1.4.5 Catalysis
The widespread utility and robust transcription read-out of the n-hybrid system
motivated several laboratories to develop general methods to detect enzyme
4.1.5
Future Development
are being studied increasingly at the systems level, the two-hybrid assay has
the potential to be quite useful for analyzing total protein dynamics in living
cells. As seen in the PCA work by Michnick and coworkers, it is here that
technical improvements will prove important for the two-hybrid assay.
But it is the n-hybrid assays that have the potential to extend the power
of genetics to molecules other than proteins, such as nucleic acids and
small molecules. Despite this enormous potential, use of these other n-hybrid
assays pales in comparison to that of the two-hybrid assay. As we argue in
this chapter, a consideration of the published literature suggests that this
discrepancy is not the result of some inherent technical limitation to the
n-hybrid assays, but rather likely reflects the bias of current practice. Thus,
it is here that we believe there is most potential for the future development
of the n-hybrid assay and indeed genetics as a whole. Technically, the n-
hybrid assays probably still can be further developed for different classes
of molecules or posttranslational modifications. But already in their present
form these assays seem to have tremendous potential for biological discovery,
uncovering new functions for the many classes of molecules that make up
the cell.
These advances also expand our ability to engineer the cell to harness
its synthetic and functional capabilities for chemical discovery. Just as
protein engineering impacted both basic research and the biotechnology
and pharmaceutical industries in the last 25 years, so should cell engineering
in this century. Such systems engineering likely will require a much more
quantitative understanding of cellular processes, and accordingly the n-hybrid
assays will have to be characterized and rebuilt on this level, allowing, for
example, the KD cutoff of the assay to be dialed in. Using this genetic assay
in entirely new ways should then open the door for new chemistry, with the
potential to match the complexity of cell function.
References
1. S. Fields, O. Song, A novel genetic system to detect protein-protein interactions, Nature 1989, 340, 245-246.
2. E.M. Phizicky, S. Fields, Protein-protein interactions: methods for detection and analysis, Microbiol. Rev. 1995, 59, 94-123.
3. L. Keegan, G. Gill, M. Ptashne, Separation of DNA binding from the transcription-activating function of a eukaryotic regulatory protein, Science 1986, 231, 699-704.
4. E.A. Golemis, Protein-Protein Interactions: a Molecular Cloning Manual, 1st ed., Cold Spring Harbor Lab Press, New York, 2002.
5. B.T. Carter, H. Lin, V.W. Cornish, in Directed Molecular Evolution of Proteins, (Eds.: S. Brakmann, K. Johnsson), Wiley-VCH Verlag, Weinheim, 2002.
6. E. Phizicky, P.I. Bastiaens, H. Zhu, M. Snyder, S. Fields, Protein analysis on a proteomic scale, Nature 2003, 422, 208-215.
7. C.R. Geyer, R. Brent, Selection of genetic agents from random peptide aptamer expression libraries, Methods Enzymol. 2000, 328, 171-208.
8. H. Lin, V.W. Cornish, In vivo protein-protein interaction assays: beyond proteins, Angew. Chem., Int. Ed. Engl. 2001, 40, 871-875.
9. H. Lin, V.W. Cornish, Screening and selection methods for large-scale analysis of protein function, Angew. Chem., Int. Ed. Engl. 2002, 41, 4402-4425.
10. L.H. Hwang, L.F. Lau, D.L. Smith, C.A. Mistrot, K.G. Hardwick, E.S. Hwang, A. Amon, A.W. Murray, Budding yeast Cdc20: a target of the spindle checkpoint, Science 1998, 279, 1041-1044.
11. J.A. Chong, G. Mandel, in The Yeast Two-Hybrid System, (Eds.: P.L. Bartel, S. Fields), Oxford University Press, New York, 1997, pp. 289-297.
12. M.K. Alexander, D. Bourns, V.A. Zakian, in Two-Hybrid Systems, Methods and Protocols, Vol. 177 (Ed.: P.N. MacDonald), Humana Press, New Jersey, 2001, pp. 241-260.
13. M.M. Wang, R.R. Reed, Molecular cloning of the olfactory neuronal transcription factor Olf-1 by genetic selection in yeast, Nature 1993, 364, 121-126.
14. S. Jaeger, G. Eriani, F. Martin, Results and prospects of the yeast three-hybrid system, FEBS Lett. 2004, 556, 7-12.
15. B. Zhang, B. Kraemer, D. SenGupta, S. Fields, M. Wickens, Yeast three-hybrid system to detect and analyze interactions between RNA and protein, Methods Enzymol. 1999, 306, 93-113.
16. D.J. SenGupta, B. Zhang, B. Kraemer, P. Pochart, S. Fields, M. Wickens, A three-hybrid system to detect RNA-protein interactions in vivo, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8496-8501.
17. N. Kley, Chemical dimerizers and three-hybrid systems: scanning the proteome for targets of organic small molecules, Chem. Biol. 2004, 11, 599-608.
18. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251, 283-287.
19. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12817-12821.
20. K. Baker, C. Bleczinski, H. Lin, G. Salazar-Jimenez, D. Sengupta, S. Krane, V.W. Cornish, Chemical complementation: a reaction-independent genetic assay for enzyme catalysis, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 16537-16542.
21. S.M. Firestine, F. Salinas, A.E. Nixon, S.J. Baker, S.J. Benkovic, Using an AraC-based three-hybrid system to detect biocatalysts in vivo, Nat. Biotechnol. 2000, 18, 544-547.
22. D.D. Clark, B.R. Peterson, Rapid detection of protein tyrosine kinase activity in recombinant yeast expressing a universal substrate, J. Proteome Res. 2002, 1, 207-209.
23. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262, 1019-1024.
24. J.F. Amara, T. Clackson, V.M. Rivera, T. Guo, T. Keenan, S. Natesan, R. Pollock, W. Yang, N.L. Courage, D.A. Holt, M. Gilman, A versatile synthetic dimerizer for the regulation of protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 10618-10623.
25. H. Lin, W. Abida, R. Sauer, V.W. Cornish, Dexamethasone-methotrexate: an efficient chemical inducer of protein dimerization in vivo, J. Am. Chem. Soc. 2000, 122, 4247-4248.
26. S.J. Kopytek, R.F. Standaert, J.C. Dyer, J.C. Hu, Chemically induced dimerization of dihydrofolate reductase by a homobifunctional dimer of methotrexate, Chem. Biol. 2000, 7, 313-321.
27. S. Gendreizig, M. Kindermann, K. Johnsson, Induced protein dimerization in vivo through covalent labeling, J. Am. Chem. Soc. 2003, 125, 14970-14971.
28. S.S. Muddana, B.R. Peterson, Facile synthesis of cids: biotinylated estrone oximes efficiently heterodimerize estrogen receptor and streptavidin proteins in yeast three hybrid systems, Org. Lett. 2004, 6, 1409-1412.
29. K.S. de Felipe, B.T. Carter, E.A. Althoff, V.W. Cornish, Correlation between ligand-receptor affinity and the transcription readout in a yeast three-hybrid system, Biochemistry 2004, 43, 10353-10363.
30. W.M. Abida, B.T. Carter, E.A. Althoff, H. Lin, V.W. Cornish, Receptor-dependence of the transcription read-out in a small-molecule three-hybrid system, Chembiochem 2002, 3, 887-895.
31. J. Gyuris, E. Golemis, H. Chertkov, R. Brent, Cdi1, a human G1 and S phase protein phosphatase that associates with Cdk2, Cell 1993, 75, 791-803.
32. M. Vidal, R.K. Brachmann, A. Fattaey, E. Harlow, J.D. Boeke, Reverse two-hybrid and one-hybrid systems to detect dissociation of protein-protein and DNA-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 10315-10320.
33. H.M. Shih, P.S. Goldman, A.J. DeMaggio, S.M. Hollenberg, R.H. Goodman, M.F. Hoekstra, A positive genetic selection for disrupting protein-protein interactions: identification of CREB mutations that prevent association with the coactivator CBP, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 13896-13901.
34. K. Baker, D. Sengupta, G. Salazar-Jimenez, V.W. Cornish, An optimized dexamethasone-methotrexate yeast 3-hybrid system for high-throughput screening of small molecule-protein interactions, Anal. Biochem. 2003, 315, 134-137.
35. J.C. Hu, E.K. O’Shea, P.S. Kim, R.T. Sauer, Sequence requirements for coiled-coils: analysis with lambda repressor-GCN4 leucine zipper fusions, Science 1990, 250, 1400-1403.
36. S.L. Dove, J.K. Joung, A. Hochschild, Activation of prokaryotic transcription through arbitrary protein-protein contacts, Nature 1997, 386, 627-630.
37. E.A. Althoff, V.W. Cornish, A bacterial small-molecule three-hybrid system, Angew. Chem., Int. Ed. Engl. 2002, 42, 2327-2330.
38. S.W. Michnick, I. Remy, F.X. Campbell-Valois, A. Vallee-Belisle, J.N. Pelletier, Detection of protein-protein interactions by protein fragment complementation strategies, Methods Enzymol. 2000, 328, 208-230.
39. I. Remy, J.N. Pelletier, A. Galarneau, in Protein-Protein Interactions, (Ed.: E. Golemis), Cold Spring Harbor Laboratory Press, New York, 2001, pp. 449-475.
40. S.W. Michnick, I. Remy, F. Valois, in Methods in Enzymology, Vol. 14, (Eds.: J. Abelson, S. Emr, J. Thorner), Academic Press, London, 2000, pp. 208-230.
41. F. Rossi, C.A. Charlton, H.M. Blau, Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 8405-8410.
42. T. Wehrman, B. Kleaveland, J.H. Her, R.F. Balint, H.M. Blau, Protein-protein interactions monitored in mammalian cells via complementation of beta-lactamase enzyme fragments, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 3469-3474.
43. I. Remy, S.W. Michnick, Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 5394-5399.
44. I. Remy, S.W. Michnick, Visualization of biochemical networks in living cells, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 7678-7683.
45. E.A. Althoff, Engineering Ligand-Receptor Interactions Using a Bacterial Three-Hybrid System, Columbia University, New York, 2004.
46. J. Estojak, R. Brent, E.A. Golemis, Correlation of two-hybrid affinity data
4.2
Controlling Protein-Protein Interactions Using Chemical Inducers and
Disrupters of Dimerization
Tim Clackson
Outlook
4.2.1
Introduction
Fig. 4.2-1 Schemes showing the principle of chemically induced dimerization of
proteins. (a) Homodimerization. In this example, fusion proteins are tethered to the
cell membrane through fusion to a peptide sequence that becomes myristoylated inside
cells. (b) Heterodimerization. In this example, one fusion protein is membrane
tethered; the other is expressed as a soluble cytosolic protein and is recruited to the
cell membrane upon addition of dimerizer.
4.2.2
Development of Chemical Dimerization Technology
In the initial paper, Spencer et al. used the FK506-FKBP interaction itself to
provide building blocks for the dimerizer system. They generated a dimerizer
by linking two molecules of FK506 to create FK1012, a molecule that can bind
two FKBP domains simultaneously (but not calcineurin). They then created
a suitable variant of their target protein, the T-cell receptor zeta chain, by
appending three copies of FKBP. Addition of FK1012 to cells expressing the
engineered protein led to clustering of the protein and activation of authentic
downstream cellular events.
FK1012 is a homodimerizer, with two identical binding motifs. It was
quickly recognized that induced heterodimerization should also be feasible, by
fusing the two proteins of interest to different protein-binding domains
that are targeted by a suitable nonsymmetrical dimerizer (Fig. 4.2-1(b))
[4-61. Dimerizers used for such approaches have included, for example,
dimers of FK506 and cyclosporine (FK-CsA) [4]. However, it is most
straightforward to simply use the bifunctional natural products directly.
Rapamycin, an immunosuppressive drug related to FK506, functions by
binding simultaneously to FKBP and the protein kinase FRAP/mTOR [7] and
can be used to heterodimerize proteins fused to these protein modules [5, 61.
The ability to induce a protein-protein interaction inside cells provided a
general way to generate inducible alleles of signaling and other proteins - one
that can be activated in real time, in contrast to classical genetic approaches [8].
This suggested a series of important applications, ranging from mechanistic
analysis of protein function to understanding the consequences of activating
signaling in whole cells and even transgenic animals. Initial hopes have been
more than fulfilled, and several hundred papers have now been published that
describe diverse uses of the technology [9].
4.2.3
Dimerization Systems
A major focus, following the initial reports, was on refining the tools used to
achieve chemical dimerization - in particular, the dimerizers themselves.
Important aims were to improve chemical feasibility, specificity, and
pharmacological properties, the latter to permit studies in experimental
animals. This section will describe the options that have evolved for
different types of induced dimerization. The focus will be on the FKBP-
based technologies and applications developed by the author’s group and its
collaborators, although other systems will also be mentioned.
4.2.3.1 Homodimerization
A series of FK1012 variants has been described with different linkers and, in
some cases, facile syntheses using FK506 as a starting point (Fig. 4.2-2) [10].
All of these can be used to effect dimerization between FKBP fusion proteins.
[Figure: chemical structures of FK1012 linker variants and AP1510; structures not reproduced]
4.2.3.2 Heterodimerization
Although early heterodimerization studies used molecules such as FK-CsA,
the most common approach is the use of rapamycin, which naturally functions
as a heterodimerizer [7]. One protein is fused to FKBP, and the other to the
~100 amino acid domain of FRAP/mTOR which binds to the FKBP-rapamycin
complex, termed FRB (for FKBP-rapamycin binding domain) [13]. FKBP and
FRB have no detectable affinity for one another in the absence of rapamycin,
yet the drug binds simultaneously to both proteins with high affinity. Thus,
addition of rapamycin to cells expressing FKBP and FRB fusion proteins leads
to strictly drug-dependent heterodimerization.
Because of its inherent directionality, heterodimerization is often a more
precise tool than homodimerization and can be used in many configurations.
For example, a protein can be inducibly recruited to the plasma membrane
by fusing it to one of the drug-binding domains, and fusing the other
to a myristoylation motif (see Fig. 4.2-1(b)) [4]. A major application of
heterodimerization is in the control of transcription (see Section 4.2.3.4) [5, 6].
In addition to the rapamycin system, other heterodimerization systems
have been described, including dimers of methotrexate and dexamethasone
to target dihydrofolate reductase and glucocorticoid receptor fusion proteins,
respectively [14, 15], and dimers of estrogen analogs and biotin analogs to
target fusions to estrogen receptors and streptavidin [16].
Fig. 4.2-3 Engineering specificity into FKBP dimerizing agents using “bumps and
holes”. (a) Homodimerization system. Bumped homodimers are able to induce
dimers between FKBP fusion proteins engineered with appropriate “holes”, while
evading endogenous FKBP. (b) Rapamycin-based heterodimerization system.
Bumped “rapalogs” are able to induce heterodimers between FKBP fusion
proteins and FRB fusion proteins engineered with a specific “hole”. The compounds can
still bind to endogenous FKBP, but have reduced or eliminated antiproliferative
activity because this complex cannot bind effectively to endogenous FRAP/mTOR.
Fig. 4.2-4 Bumped homodimerizers. These compounds are designed to bind potently
and specifically to the F36V mutant of FKBP.
[Figure: chemical structures of rapamycin and the rapalogs AP22594, AP1861, AP21967, AP23102, and MA-rap; structures not reproduced]
Fig. 4.2-5 Bumped rapalogs used as heterodimerizers. The rapalogs listed in the
panel are all active in dimerization systems incorporating the T2098L mutation in FRB
fusion proteins. Ma-rap (C20-methallyl rapamycin), in which the triene portion of
rapamycin is modified as shown, is active in dimerization systems incorporating the
specific FRB triple mutation PLF (K2095P/T2098L/W2101F) [22].
Fig. 4.2-6 Schemes for controlling transcription using chemically induced dimerization.
(a) Control using homodimerizers. (b) Control using heterodimerizers (rapalogs).
Fig. 4.2-7 Comparison of conventional and "reverse" FKBP dimerization systems.
(a) Induced dimerization using bumped homodimerizer AP20187 and F36V fusion
proteins. (b) Reverse dimerization system using monomeric ligand (AP21998) and
F36M fusion proteins.
4.2.4
Applications
Fig. 4.2-8 X-ray crystal structures of dimerized complexes. In each case, protein
N-termini are marked in blue and C-termini in red. (a) Structure of AP1903 in complex
with two molecules of FKBP-F36V (our unpublished data). The two proteins are
brought close to each other in a “parallel” configuration, and intramolecular
drug-drug interactions are extensive. (b) Structure of rapamycin in complex with
wild-type FKBP (green) and the FRB domain of FRAP/mTOR (gray) (Protein Data Bank
(PDB) ID: 4FAP) [7]. (c) Structure of the homodimeric complex of the
self-associating FKBP mutant F36M (PDB ID: 1EYM) [27]. The two molecules
interact through their ligand-binding sites in an “antiparallel” configuration.
inducible animal models of disease. The second is the direct use of the
technologies in potential therapeutic applications, generally in the context of
cell or gene therapies. Examples of both will be reviewed in the following
sections.
also be used to test potential drugs for the ability to block the induced FGFR1
signal and its consequences.
A general approach to creating animal models of degenerative diseases is to
induce apoptosis specifically in target tissues or organs. This can be achieved
through tissue-specific expression of inducible alleles of the Fas receptor or
through any number of downstream caspases. Mice in which hepatocytes can
be inducibly ablated represent a valuable model for liver diseases [38], and
mice expressing inducible caspase in macrophages are a valuable resource for
probing the roles of these cells [39].
4.2.4.4.1 Three-hybrid Approaches
Another use of dimerizer-controlled transcription is in three-hybrid assays
[14, 15]. In these applications, the “third hybrid” is the dimerizer, and gene
activation serves merely as an assay to report on the interaction between a
dimerizer and the two fusion proteins, rather than as the end in itself. Three-
hybrid assays can be used to identify target proteins for a given small molecule
(by incorporating the molecule into a dimerizer and screening against a cDNA
library fused to an AD; see Chapter 18.2), or to identify small molecules that
bind a given target (by cloning the target as an AD fusion protein and screening
against a library of dimerizers in which one monomer is diversified). More
recently, they have been applied to directed evolution of the catalytic properties
of proteins using “chemical complementation” (see Chapter 4.1).
Fig. 4.2-12 Use of the reverse dimerization system to control protein secretion in
mammalian cells. (a) Scheme for inducible secretion. (b) Chemical structure of
monomeric ligand AP21998. (c) Pulsatile release of insulin from engineered cells.
Cells expressing an insulin-F36M fusion protein were exposed to AP21998 for three
1-h periods as indicated, and medium was collected every hour and assayed for insulin
levels [55].
4.2.5
Future Development
4.2.6
Conclusion
Acknowledgments
I thank Len Rozamus, Xiaotian Zhu, Vic Rivera, and Renate Hellmiss
for preparing the figures. I am indebted to my many ARIAD colleagues
and collaborators, past and present, who have contributed to our work on
dimerization technology. Particular thanks are due to Vic Rivera for numerous
discussions over many years. Kits for the regulated dimerization of proteins
may be requested through ARIAD’s website at www.ariad.com/regulationkits.
References
47. T. Clackson, Regulated gene expression systems, Gene Ther. 2000, 7, 120-125.
48. H. Chong, A. Ruchatz, T. Clackson, V.M. Rivera, R.G. Vile, A system for small-molecule control of conditionally replication-competent adenoviral vectors, Mol. Ther. 2002, 5, 195-203.
49. R. Pollock, M. Giel, K. Linher, T. Clackson, Regulation of endogenous gene expression with a small-molecule dimerizer, Nat. Biotechnol. 2002, 20, 729-733.
50. X. Ye, V.M. Rivera, P. Zoltick, F. Cerasoli Jr, M.A. Schnell, G. Gao, J.V. Hughes, M. Gilman, J.M. Wilson, Regulated delivery of therapeutic proteins after in vivo somatic cell gene transfer, Science 1999, 283, 88-91.
51. V.M. Rivera, G.P. Gao, R.L. Grant, M.A. Schnell, P.W. Zoltick, L.W. Rozamus, T. Clackson, J.M. Wilson, Long-term pharmacologically regulated expression of erythropoietin in primates following AAV-mediated gene transfer, Blood 2005, 105, 1424-1430.
52. A. Auricchio, G.P. Gao, Q.C. Yu, S. Raper, V.M. Rivera, T. Clackson, J.M. Wilson, Constitutive and regulated expression of processed insulin following in vivo hepatic gene transfer, Gene Ther. 2002, 9, 963-971.
53. A. Auricchio, V. Rivera, T. Clackson, E. O’Connor, A. Maguire, M. Tolentino, J. Bennett, J. Wilson, Pharmacological regulation of protein expression from adeno-associated viral vectors in the eye, Mol. Ther. 2002, 6, 238-242.
54. L.M. Sanftner, V.M. Rivera, B.M. Suzuki, L. Feng, L. Berk, S. Zhou, J.R. Forsayeth, T. Clackson, J. Cunningham, Dimerizer regulation of AADC expression and behavioral response in AAV-transduced 6-OHDA lesioned rats, Mol. Ther. 2006, 13, 167-174.
55. V.M. Rivera, X. Wang, S. Wardwell, N.L. Courage, A. Volchuk, T. Keenan, D.A. Holt, M. Gilman, L. Orci, F. Cerasoli Jr, J.E. Rothman, T. Clackson, Regulation of protein secretion through controlled aggregation in the endoplasmic reticulum, Science 2000, 287, 826-830.
56. A. Volchuk, M. Amherdt, M. Ravazzola, B. Brugger, V.M. Rivera, T. Clackson, A. Perrelet, T.H. Sollner, J.E. Rothman, L. Orci, Megavesicles implicated in the rapid transport of intracisternal aggregates across the Golgi stack, Cell 2000, 102, 335-348.
57. J.E. Gestwicki, G.R. Crabtree, I.A. Graef, Harnessing chaperones to generate small-molecule inhibitors of amyloid beta aggregation, Science 2004, 306, 865-869.
58. C.Y. Majmudar, A.K. Mapp, Chemical approaches to transcriptional regulation, Curr. Opin. Chem. Biol. 2005, 9, 467-474.
59. S.A. Qureshi, R.M. Kim, Z. Konteatis, D.E. Biazzo, H. Motamedi, R. Rodrigues, J.A. Boice, J.R. Calaycay, M.A. Bednarek, P. Griffin, Y.D. Gao, K. Chapman, D.F. Mark, Mimicry of erythropoietin by a nonpeptide molecule, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 12156-12161.
4.3
Protein Secondary Structure Mimetics as Modulators of Protein-Protein and
Protein-Ligand Interactions
Outlook
4.3.1
Introduction
4.3.2
History and Development
Fig. 4.3-1 X-ray crystal structure of the hGH (purple)/hGHbp (cyan) complex. Side
chains of the critical amino acid residues (hot spots) are shown in stick representation.
4.3.3
General Considerations
Conventional drug discovery often starts by screening a large and diverse chem-
ical library, from which lead compounds can be identified using biochemical
and cell-based evaluation methods. The subsequent steps involve an iterative
loop of structure determination, modeling, and lead optimization. In many
cases, millions of compounds in the preliminary screening, dozens of high-
resolution X-ray structures of a drug target, as well as months of collaborative
research are necessary to achieve the potency, selectivity, and pharmacokinetic
and toxicological properties required of a preclinical drug candidate.
Rational inhibitor design offers a compelling alternative for the identification
of protein-protein disrupters as it is based on a structural knowledge of the
interface. In particular, synthetic scaffolds that mimic the key elements of a
protein surface can potentially lead to small molecules with the full activity of
a protein domain, a fraction of the molecular weight, and no peptide bonds.
Furthermore, lead compounds derived from rational design can be readily
optimized by structure-activity relationship (SAR) studies.
In general, structure-based drug design treats the backbone of the protein as
a relatively rigid entity. Once the structure of a complex of the protein with a
representative ligand has been solved experimentally, it can be used as a valid
template, onto which atoms or functional groups can be added to the ligand
if free space is available within the binding pocket. In reality, protein side
chains within the binding pocket may move to accommodate a ligand and, in
some cases, there may even be limited movement of the polypeptide backbone.
Moreover, bound solvent may define the surface of the binding pocket, rather
than the protein itself, and thus limit the space available for the addition of
substituents.
Before designing small molecule agents that target certain protein-protein
interfaces, it is helpful to consider the characteristics of a general protein-protein
complex. The association constant, which is determined by
the free energy difference (ΔG) between the associated and unassociated
contributions of the individual side chains did not correlate with their buried
surfaces [23]. In several cases, a set of energetically unimportant contacts
surrounded the hot spot, seeming to occlude bulk solvent in the manner of an
O-ring. Certain amino acid residues, in particular, tryptophan (21%), arginine
(13%), and tyrosine (12%), appear more frequently in hot spots (residues that contribute
more than 2 kcal mol⁻¹ to a binding interaction) than others, such as leucine,
methionine, serine, threonine, and valine, each of which account for less than
3% of the overall hot spot residues [24]. Tryptophan, arginine, and tyrosine
residues are also found more frequently in the protein interfaces, with 3.91-,
2.47-, and 2.29-fold enrichment, respectively, in hot spot areas. An enrichment
of tyrosine and tryptophan as well as a discrimination against valine, isoleucine,
and leucine has also been reported in antibody complementarity-determining
region (CDR) sequences [25]. Padlan et al. proposed that the enrichment
of these aromatic amino acid residues is due to their ability to participate
in hydrophobic contacts without large entropic penalty, as they have fewer
rotatable bonds.
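To give a feel for the 2 kcal mol⁻¹ hot-spot threshold quoted above, the following minimal sketch converts a loss of binding free energy into the corresponding fold change in dissociation constant, using the standard relation ΔG = RT ln Kd. The temperature of 298.15 K and the function names are assumptions made for illustration and are not taken from the original text.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def fold_change_in_kd(ddg_kcal_per_mol, temperature_k=298.15):
    """Fold increase in Kd when binding free energy is weakened by ddg.

    The default temperature (298.15 K) is an assumption for illustration.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from Kd, relative to a 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # A residue at the 2 kcal/mol hot-spot threshold.
    print(f"fold change in Kd: {fold_change_in_kd(2.0):.1f}")
    # Free energy corresponding to a sub-micromolar complex.
    print(f"dG at Kd = 340 nM: {delta_g_from_kd(340e-9):.2f} kcal/mol")
```

Under these assumptions, removing a single hot-spot side chain at the threshold weakens binding by roughly 30-fold, which is why such residues dominate the energetics of an interface.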
Recent developments in bioinformatics have provided insights into the
analysis of protein-protein interfaces and have aided the detection of hot
spots. A wealth of data on alanine mutations in various protein-protein
complexes is available (www.asedb.org) and has assisted in the design of
small molecules to modulate their interactions [26]. Table 4.3-1 lists the
protein-protein interactions whose alanine scanning energetic data are
currently available in the ASEdb database. Alternatives for detecting hot
spot regions include computational tools that generate combinatorial libraries
of functional epitopes and identify recurring sets of residues in the epitope [27].
The spatial arrangement of key structural motifs at protein-protein interfaces
has been efficiently detected by this method. Ben-Tal and coworkers have
developed an algorithm, Rate4Site, and a web-server Consurf (consurf.tau.ac.il)
[28] for identification of functional interfaces based on the evolutionary
relations among homologous proteins, as reflected in phylogenetic trees [29].
Using the tree topology and branch lengths corresponding to the evolutionary
relationships between two proteins, the algorithm accurately identified a
homodimer interface of a hypothetical protein Mj0577 that was also detected
in an X-ray crystallographic analysis.
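The idea of scoring positions by evolutionary conservation can be illustrated with a much simpler stand-in. The sketch below is not the Rate4Site algorithm: it ignores the phylogenetic tree and branch lengths entirely and merely ranks the columns of a multiple sequence alignment by Shannon entropy. The toy alignment and all function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy per column; lower values mean stronger conservation."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


def most_conserved(alignment, top_n):
    """Indices (0-based) of the top_n lowest-entropy columns."""
    scores = conservation_profile(alignment)
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))[:top_n]


if __name__ == "__main__":
    # Hypothetical aligned homologs, for illustration only.
    toy_alignment = ["MKWLR", "MRWLK", "MKWIR", "MQWLR"]
    print(most_conserved(toy_alignment, 2))
```

A tree-aware method weights substitutions by how much evolutionary time separates the sequences, which a plain entropy score cannot do; the sketch only shows the general shape of a per-position conservation ranking.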
4.3.4
Applications and Practical Examples
Fig. 4.3-2 Structure of β-D-glucose-based peptidomimetics of SRIF.
Previous studies had shown that cyclic hexapeptide 1 was a potent agonist
of SRIF [32], due to the dipeptide motif of Phe-Pro, enforcing a β-turn
conformation and the correct positioning of the remaining four side chains. In
addition, the aromatic side chains of the Phe-Pro dipeptide provide favorable
hydrophobic interactions with the SRIF receptor.
On the basis of this peptide agonist of SRIF, compound 2 was designed with
the critical side chains of 1 projected on a β-D-glucose scaffold (Fig. 4.3-2).
β-D-Glucose is a good design for a β-turn mimetic because: (a) the pyran
ring imposes an appropriate projection of the side chains, and (b) the glucose
backbone is relatively rigid. The shape and substitution pattern of β-D-glucose
were found to best present the Trp, Lys, and Phe side chains. A radiolabeled
binding assay showed that 2 completely displaced a peptide ligand, 125I-CGP
23996, from the SRIF receptor on membranes from AtT-20 cell lines with an
IC50 of 1.9 μM. Binding studies using cerebral cortex and pituitary membrane
cells showed similar results. Taken together, this study supported the validity
of using nonpeptide scaffolds to mimic protein secondary structures that are
of biological interest.
In a follow-up study, Smith and Hirschmann have elaborated a pyrrolinone-
based mimetic of the β-strand/β-sheet conformations [33, 34], in which
all of the key recognition features (i.e., side chains and hydrogen-bond
donors/acceptors) are faithfully represented within a low-molecular-weight
nonpeptide analog 4 (Fig. 4.3-3). This design has been applied to the
development of antagonists of HIV-1 protease and more recently to mimics of
major histocompatibility complex (MHC) class II protein substrate [34, 35].
Computational modeling using the Macromodel program suggested that
3,5-linked pyrrolin-4-ones can structurally mimic a short peptide in a
β-strand conformation. In a computer-simulated conformational search, the
pyrrolinone rings fix the dihedral angles analogous to φ, ψ, and ω in a
peptide (Fig. 4.3-3). This favored conformation is due to the hindrance of the
gauche interaction between the side chain substituents and their neighboring
pyrrolinone rings. The side chains appended at the 5-positions of pyrrolinone
Fig. 4.3-3 Polypyrrolinone-based β-turn peptidomimetic 4.
Fig. 4.3-4 Complex of the HIV-1 protease and β-strand peptide inhibitor JG-365.
Fig. 4.3-5 HIV-1 protease inhibitors 5 (L682,679) and 6.
To test this general design, Hamilton and coworkers have developed a-helix
mimetics of the Bak protein that binds into a shallow hydrophobic cleft on
the surface of Bcl-xL. Bak and Bcl-xL are members of the B-cell lymphoma-2
(Bcl-2) protein family, which plays an important role in the apoptotic
pathway [40]. This protein family can be divided into two subgroups: the
proapoptotic and the prosurvival subfamilies. The proapoptotic subfamily
proteins, such as Bak, Bad, and Bax, share a minimal helical homologous
region, the BH3 domain, which is responsible for mediation of apoptosis
through heterodimerization with the prosurvival Bcl-2 family members [41].
Overexpression of the prosurvival proteins, such as Bcl-2 and Bcl-xL, can
inhibit the potency of many currently available anticancer drugs by blocking
the apoptotic pathway [42].
A current strategy for modulating apoptosis is to target the Bak-recognition
site on Bcl-xL and thereby disrupt the protein-protein contact. The structure
of the Bcl-xL/Bak complex determined by NMR spectroscopy showed that a
helical region of Bak (amino acids 72 to 87) binds to a hydrophobic cleft on
the surface of Bcl-xL (Kd = 340 nM) [43]. Furthermore, the crucial residues for
binding, shown by alanine scanning, are Val74, Leu78, Ile81, and Ile85, which
project at the i, i + 4, i + 7, and i + 11 positions along one face of the Bak helix.
The design of agents that directly mimic the death-promoting BH3 domain
of the proapoptotic subfamily of Bcl-2 proteins is of much current interest as
they can potentially provide drugs that control apoptosis [44].
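The statement that residues at the i, i + 4, i + 7, and i + 11 positions lie along one face of a helix can be checked with a short helical-wheel calculation. The sketch below assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue); the actual Bak helix in the complex may deviate from this idealized geometry, and the function name is hypothetical.

```python
# Ideal alpha-helix: 3.6 residues per turn, i.e. 100 degrees per residue (assumption).
DEGREES_PER_RESIDUE = 100.0


def wheel_angle(residue_number, reference):
    """Angular position about the helix axis, in (-180, 180], relative to a reference residue."""
    angle = ((residue_number - reference) * DEGREES_PER_RESIDUE) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


# Hot-spot residues of the Bak BH3 helix named in the text.
hot_spots = {"Val74": 74, "Leu78": 78, "Ile81": 81, "Ile85": 85}

angles = {name: wheel_angle(number, 74) for name, number in hot_spots.items()}
span = max(angles.values()) - min(angles.values())

if __name__ == "__main__":
    for name, angle in angles.items():
        print(f"{name}: {angle:+.0f} degrees")
    print(f"angular span: {span:.0f} degrees")
```

With this idealized geometry the four side chains fall within a 60° arc of the helical wheel, which is the pattern a rigid scaffold such as a terphenyl is intended to reproduce.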
A series of terphenyl derivatives with different side chains was prepared
as structural mimetics of the Bak peptide using a modular and convergent
synthesis. We used a fluorescence polarization assay to monitor the interaction
between the inhibitor and the target protein. Some of the structure-activity
results are listed in Table 4.3-2. Terphenyl 7, with two carboxyl groups and
a substituent sequence of isobutyl, 1-naphthylmethylene, isobutyl groups in
the 3,2',2''-positions, was identified as a potent inhibitor (Kd = 114 nM) of the
Bak/Bcl-xL complexation. The binding specificity was confirmed by scrambling
the sequence of the substitutions, as in isomer 12, which caused a 25-fold drop
in affinity. The importance of the side chains was confirmed by terphenyl 13, which
lacks the ability to disrupt Bak binding to Bcl-xL, ruling out the possibility of
nonspecific binding by the terphenyl backbone.
15N-HSQC NMR experiments with 7 indicated that the terphenyl derivatives
target the same hydrophobic cleft on Bcl-xL as the Bak peptide (shown in blue,
Fig. 4.3-7). Residues A89, L99, L108, T109, S110, Q111, I114, Q125, L130,
F131, W137, G138, R139, I140, A142, S145, and F146 (shown in magenta
in Fig. 4.3-7) showed significant chemical shift changes on addition of the
synthetic inhibitor 7. Some other residues, including G94, L112, S122, G134,
K157, E158, and M159 (shown in yellow in Fig. 4.3-7) showed moderate
chemical shift changes under the same conditions. All these affected residues
lie near the shallow cleft on the protein surface into which the Bak BH3 helix
binds. The targeted residues V74, L78, and I81 of Bak BH3 are within 4 Å
distance of residues F97, R102, L108, L130, I140, A142, and F146 of Bcl-xL,
Table 4.3-2 (fragment; surviving rows only)
Compound 11: Bn, iBu; 2.73
Compound 12: iBu, iBu; 2.70
Compound 13: H, H, H; >30.0
Polarization measurements were recorded on titration of
inhibitors at varying concentrations in a solution of 15 nM
labeled Bak peptide (Fl-GQVGRQLAIIGDDINR-CONH2) and
184 nM Bcl-xL (25 °C, 1.0 mM PBS, pH 7.4).
most of which showed significant chemical shift changes (F97 overlapped with
NS), confirming that 7 and Bak BH3 target the same area on the exterior surface
of Bcl-xL. Overlay of 7 and the Bak BH3 peptide suggested that the terphenyl
indeed adopts a staggered conformation, mimicking the cylindrical shape of
the helix with the substituents making a series of hydrophobic contacts with
the protein surface.
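Returning to the fluorescence polarization assay, it can be useful to estimate how much of the labeled Bak peptide is bound to Bcl-xL before any inhibitor is added, since that fraction sets the dynamic range of the competition experiment. The sketch below solves the exact 1:1 binding equilibrium for the concentrations quoted in the note to Table 4.3-2 (15 nM peptide, 184 nM Bcl-xL). It assumes that the labeled peptide binds with the Kd of 340 nM reported for the Bak helix, which may not hold for the fluorescein conjugate; the function name is hypothetical.

```python
import math


def labeled_peptide_fraction_bound(protein_nm, peptide_nm, kd_nm):
    """Fraction of peptide in complex for 1:1 binding (exact quadratic solution).

    All concentrations are total concentrations in nM.
    """
    total = protein_nm + peptide_nm + kd_nm
    complex_nm = (total - math.sqrt(total * total - 4.0 * protein_nm * peptide_nm)) / 2.0
    return complex_nm / peptide_nm


if __name__ == "__main__":
    # Assay concentrations from the table note; Kd value is an assumption (see lead-in).
    fraction = labeled_peptide_fraction_bound(protein_nm=184.0, peptide_nm=15.0, kd_nm=340.0)
    print(f"fraction of labeled peptide bound: {fraction:.2f}")
```

Under these assumptions only about one-third of the tracer is bound at the start of the titration, so the polarization signal that an inhibitor can displace is correspondingly limited.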
Further studies using human embryonic kidney 293 (HEK293) cells have
shown that terphenyl 7 disrupts Bak/Bcl-xL binding in whole cells [16].
HEK293 cells transfected with both HA-Bcl-xL and flag-Bax, an analog of Bak,
were treated with terphenyl derivatives. After 24-h incubation, the cells were
harvested and lysed. HA-tagged Bcl-xL was collected via immunoprecipitation
with HA antibody. The resulting mixture was loaded on to a 12.5% SDS-PAGE
gel, and proteins transferred to nitrocellulose for western blot analysis. The
presence of Bax protein was probed with antiflag antibody. The inhibitory
potencies of the terphenyl compounds were determined by measuring the
relative intensity of the Bax protein bound to Bcl-xL. We found that 51% of the
Bak/Bcl-xL interaction was disrupted in HEK293 cells treated with terphenyl
7, indicating that certain terphenyls are competitive with the full-length
protein-protein interaction in a cellular environment.
Fig. 4.3-7 Results of the 15N-HSQC and computational docking experiments of 7
binding to Bcl-xL. The residues that showed significant chemical shift changes in the
presence of 7 are shown in yellow. The highest ranked binding mode of inhibitor 7
predicted from a computational docking simulation (Autodock 3.0) has been
superimposed on the helical Bak BH3 domain for comparison.
4.3.5
Future Developments
than one binding pocket, each of which might contribute separately to the
complex formation. Furthermore, smaller molecules offer better starting
points for drug discovery because they can be readily assembled into larger
compounds. Wells et al. have reported a powerful technique for identifying
antagonists of protein-protein interactions with only medium to low potency
(micromolar to millimolar) by using a dynamically interconverting thiol-
tethered library [50]. This method has a great advantage in searching for
inhibitors that target a mobile protein surface. Kodadek et al. have developed
a general methodology that is effective in searching for a second binding site
on the protein surface. A library of combinatorial oligomeric compounds
is attached to a low-affinity anchor compound that can recognize the
target protein. The resulting library is then screened under conditions too
demanding for the lead to support robust binding to the protein target.
Using MDM2 as a model, they have identified relatively potent chimeric
compounds that simultaneously recognize multiple binding sites on the
protein surface [51].
4.3.6
Conclusion
Acknowledgments
References
Angew. Chem. Int. Ed. Engl. 1991, 30, 1278-1301.
31. P. Brazeau, W. Vale, R. Burgus, R. Guillemin, Isolation of somatostatin (a somatotropin-release-inhibiting-factor) of ovine hypothalamic origin, Can. J. Biochem. 1974, 52, 1067-1072.
32. P. Brazeau, W. Vale, R. Burgus, N. Ling, M. Butcher, J. Rivier, R. Guillemin, Hypothalamic polypeptide that inhibits secretion of immunoreactive pituitary growth-hormone, Science 1973, 179, 77-79.
33. A.B. Smith, W.Y. Wang, P.A. Sprengeler, R. Hirschmann, Design, synthesis, and solution structure of a pyrrolinone-based beta-turn peptidomimetic, J. Am. Chem. Soc. 2000, 122, 11037-11038; A.B. Smith, H. Liu, R. Hirschmann, A second generation synthesis of polypyrrolinone nonpeptidomimetics: prelude to the synthesis of polypyrrolinones on solid support, Org. Lett. 2000, 2, 2037-2040; A.B. Smith, T.P. Keenan, R.C. Holcomb, P.A. Sprengeler, M.C. Guzman, J.L. Wood, P.J. Carroll, R. Hirschmann, Design, synthesis, and crystal-structure of a pyrrolinone-based peptidomimetic possessing the conformation of a beta-strand - potential application to the design of novel inhibitors of proteolytic-enzymes, J. Am. Chem. Soc. 1992, 114, 10672-10674; A.B. Smith, L.D. Cantin, A. Pasternak, L. Guise-Zawacki, W.Q. Yao, A.K. Charnley, J. Barbosa, P.A. Sprengeler, R. Hirschmann, S. Munshi, D.B. Olsen, W.A. Schleif, L.C. Kuo, Design, synthesis, and biological evaluation of monopyrrolinone-based HIV-1 protease inhibitors, J. Med. Chem. 2003, 46, 1831-1844; A.B. Smith, M.C. Guzman, P.A. Sprengeler, T.P. Keenan, R.C. Holcomb, J.L. Wood, P.J. Carroll, R. Hirschmann, De-novo design, synthesis, and x-ray crystal-structures of pyrrolinone-based beta-strand peptidomimetics, J. Am. Chem. Soc. 1994, 116, 9947-9962.
34. A.B. Smith, A.B. Benowitz, P.A. Sprengeler, J. Barbosa, M.C. Guzman, R. Hirschmann, E.J. Schweiger, D.R. Bolin, Z. Nagy, R.M. Campbell, D.C. Cox, G.L. Olson, Design and synthesis of a competent pyrrolinone-peptide hybrid ligand for the class II major histocompatibility complex protein HLA-DR1, J. Am. Chem. Soc. 1999, 121, 9286-9298.
35. A.B. Smith, R. Hirschmann, A. Pasternak, W.Q. Yao, P.A. Sprengeler, M.K. Holloway, L.C. Kuo, Z.G. Chen, P.L. Darke, W.A. Schleif, An orally bioavailable pyrrolinone inhibitor of HIV-1 protease: computational analysis and X-ray crystal structure of the enzyme complex, J. Med. Chem. 1997, 40, 2440-2444; P.V. Murphy, J.L. O’Brien, L.J. Gorey-Feret, A.B. Smith, Synthesis of novel HIV-1 protease inhibitors based on carbohydrate scaffolds, Tetrahedron 2003, 59, 2259-2271; P.V. Murphy, J.L. O’Brien, L.J. Gorey-Feret, A.B. Smith, Structure-based design and synthesis of HIV-1 protease inhibitors employing beta-D-mannopyranoside scaffolds, Bioorg. Med. Chem. Lett. 2002, 12, 1763-1766.
36. J.R. Huff, HIV protease - a novel chemotherapeutic target for AIDS, J. Med. Chem. 1991, 34, 2305-2314; A.L. Swain, M.M. Miller, J. Green, D.H. Rich, J. Schneider, S.B.H. Kent, A. Wlodawer, X-ray crystallographic structure of a complex between a synthetic protease of human immunodeficiency virus-1 and a substrate-based hydroxyethylamine inhibitor, Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 8805-8809.
37. W.D. Stein, The Movement of Molecules across Cell Membranes, Academic, New York, 1967, pp. 65-125.
38. D.P. Fairlie, M.L. West, A.K. Wong, Towards protein surface mimetics, Curr. Med. Chem. 1998, 5, 29-62.
39. L.D. Walensky, A.L. Kung, I. Escher, T.J. Malia, S. Barbuto, R.D. Wright, G. Wagner, G.L. Verdine, S.J. Korsmeyer, Activation of apoptosis in
5
Expanding the Genetic Code
5.1
Synthetic Expansion of the Central Dogma
Masahiko Sisido
Outlook
phosphorylated or glycosylated proteins as medicinal tools, and so on.
Furthermore, synthesis of mutant proteins that contain specialty amino acids
in living cells will open a way toward “synthetic microorganisms” that function
differently from the existing organisms.
5.1.1
Introduction
Progress in synthetic chemistry during the last century was truly remarkable.
Chemists with state-of-the-art knowledge and techniques can
produce almost any compound that can exist in nature. Moreover, they
can fabricate compounds into membranes, vesicles, and other supramolecular
assemblies by using secondary forces, such as hydrogen bonds, electrostatic
forces, and hydrophobic interactions. The question then arises whether
chemists can create a living organism. Creation of a living organism is not
an unrealistic target, because essential mechanisms of major reactions in
living cells and important structures of biomolecules that function inside
the cells have been clarified during the last 30 years. It may be possible, at
least in theory, to put all components of the DNA replicating system and
the protein synthesizing system inside an artificial liposome together with
relevant monomers for creation of a minimum prototype of a self-replicating
system.
The most advantageous point of the synthetic approach is, however, not
a simple reconstitution of the existing living organisms, but expansion or
alteration of the existing systems by introducing analogs and surrogates
of biomolecules. Analogs of biomolecules are artificial compounds that
resemble existing biomolecules and function like they do in living organisms.
Nonnatural amino acids and nonnatural nucleic bases, described in this
chapter, are typical analogs. Surrogates are likewise artificial molecules, but their
structures differ from those of existing biomolecules, although they function similarly
to, or as alternatives for, some of them. Peptide nucleic acid (PNA) is a typical
surrogate that emulates the hybridization behavior of DNAs and RNAs. By
introducing analogs and surrogates into biochemical systems, we can alter
or expand biochemical functions to create novel functions that have not
been observed in the existing organisms. In particular, expansion of the protein
biosynthesizing system to include a variety of nonnatural amino acids is the
subject of this chapter.
The introduction of a 21st amino acid, and of further nonnatural amino acids, requires
expansion of every step of protein synthesis (central dogma), as illustrated in
Fig. 5.1-1 [1-4].
1. Synthesis of nonnatural amino acids of desired functions.
2. Preparation of an orthogonal tRNA that cannot be
aminoacylated by any aminoacyl-tRNA synthetases
(aaRSs) in the biochemical system. The orthogonal tRNA,
5.1 Synthetic Expansion of the Central Dogma
Fig. 5.1-1 Mechanism of protein synthesis (central dogma) and its expansion to include
nonnatural amino acids.
Steps 4 and 6 may not pose serious problems, since both EF-Tu and the ribosome
accept all 20 naturally occurring amino acids, and this tolerance
may also hold for some nonnatural amino acids. However, if we want to
incorporate large nonnatural amino acids whose side chain structures
are very different from the naturally occurring ones, we cannot assume that
EF-Tu and the ribosome will tolerate them. In these cases, EF-Tu and the
ribosome will also have to be expanded.
5 Expanding the Genetic Code
5.1.2
Fig. 5.1-2 Hecht method for chemical aminoacylation of tRNA with a nonnatural amino
acid.
However, they have not confirmed if the aminoacylated tRNA really works
in vitro or in vivo.
5.1.2.2
Micelle-mediated Aminoacylation
Very recently, the author found that cationic micelles mediate aminoacylation
of tRNAs with N-protected amino acid activated esters under ultrasonic
irradiation (Fig. 5.1-3) [9]. A cationic micelle, such as a CTACl micelle, solubilizes the
hydrophobic N-pentenoyl amino acid cyanomethyl ester inside its hydrophobic
core, whereas the negatively charged tRNA molecules are concentrated on the
positively charged micelle surface. The two components are thus separated inside
and outside the micelle and do not react with each other while the mixture stands still.
When the mixture is ultrasonicated, the micellar structure may be disturbed,
allowing the reaction to take place. For example, when 5 mM
N-pentenoyl-L-2-naphthylalanine cyanomethyl ester and 0.01 mM tRNA were sonicated in a
90 mM imidazole buffer (pH 7.5) that contained 18 mM CTACl, up to 75%
yield of the aminoacylated tRNA was achieved within 10 minutes. Product
analysis indicated that about 70% of the aminoacylation occurs at the 2'
or 3' OH group of the 3' end and that no aminoacylation of the amino groups of the
nucleobases occurs. This high regioselectivity is surprising, because there are
77 OH groups in the tRNA and most of them are exposed to the solvent. The
remaining 30% of the aminoacylation occurs at the OH groups of other nucleotide units.
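As a rough worked example of the figures quoted in this paragraph, the short Python sketch below computes the molar excess of the activated ester over the tRNA and an upper estimate of the fraction of tRNA acylated at the 3'-terminal 2'(3')-OH. It assumes, purely for illustration, that the 75% yield and the 70% regioselectivity can be multiplied, that is, that each acylated tRNA carries a single acyl group; this assumption is not stated in the text, and the variable names are arbitrary.

```python
# Values quoted in the text for the micelle-mediated aminoacylation.
ester_mM = 5.0         # N-pentenoyl-2-naphthylalanine cyanomethyl ester
trna_mM = 0.01         # tRNA
max_yield = 0.75       # "up to 75%" of the tRNA is aminoacylated
terminal_share = 0.70  # share of acylation at the 2' or 3' OH of the 3' end

# Molar excess of activated ester over tRNA.
ester_excess = ester_mM / trna_mM

# Assumption (not from the text): one acyl group per acylated tRNA, so the
# yield and the regioselectivity can simply be multiplied.
correctly_acylated = max_yield * terminal_share
misacylated = max_yield * (1.0 - terminal_share)
unreacted = 1.0 - max_yield

print(f"ester excess over tRNA: about {ester_excess:.0f}-fold")
print(f"tRNA acylated at the 3' terminus: up to {correctly_acylated:.1%}")
print(f"tRNA acylated elsewhere: up to {misacylated:.1%}")
print(f"tRNA left unreacted: at least {unreacted:.1%}")
```

Under these assumptions, roughly half of the input tRNA would carry the amino acid at the productive position, even though the ester is present in large excess.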
protein synthesis, presumably because they cannot bind to EF-Tu and cannot
go into the A site of ribosome. Indeed, when the crude aminoacyl-tRNA was
added to an Escherichia coli in vitro protein biosynthesizing system, a mutant
protein incorporating a 2-naphthylalanine was obtained. The success of
micellar aminoacylation suggests that tRNA aminoacylation is inherently
specific to the 2’(3’)-OH group, presumably because of the high reactivity of
the gem-diol group. A drawback of the micellar aminoacylation is that a small
amount of the cationic detergent remains attached to the negatively charged
tRNA. This may reduce the protein yield to some extent.
5.1.2.3
Ribozyme-mediated Aminoacylation
tRNAs. From the library, they selected those that undergo self-aminoacylation
with a biotinylated amino acid cyanomethyl ester. The identified RNA sequence
worked as an artificial aaRS even after it was cleaved off from the original
tRNA. Because the ribozyme is flexible enough to aminoacylate a wide variety
of tRNAs that have a common ACCA 3’ end, with a variety of p-substituted
phenylalanine derivatives, it was named a flexizyme. After optimization
and minimization of the RNA sequence, the flexizyme was loaded onto a
columnar gel. The flexizyme column can aminoacylate tRNAs with a variety
of p-substituted phenylalanine cyanomethyl esters simply by passing a tRNA
with an amino acid cyanomethyl ester through the column [14-16]. The
aminoacylated tRNA has been shown to work in E. coli in vitro system to
introduce the p-substituted phenylalanine derivatives into proteins. Recently,
the flexizyme has been given tRNA specificity by extending its 3’ end with a
complementary chain to a specific tRNA [17].
5.1.2.4
PNA-assisted Aminoacylation
not too tightly, otherwise it will remain attached after the aminoacylation and
retard or even inhibit the protein synthesis. In the case of yeast phenylalanine
tRNA, the 9-mer PNA was the best choice, but the chain lengths had to be
optimized for other tRNAs. Addition of an equimolar amount of the aa*-S-
sp-PNA conjugate to the tRNA gave 40-50% yield of aminoacylation against
yeast phenylalanine tRNA.
The PNA-assisted aminoacylation was specific to a target tRNA that has
a complementary 3’-region to the PNA in an E. coli S30 in vitro protein
synthesizing system that contained a variety of endogenous tRNAs. When
we put a 2-naphthylalanine thioester-spacer-PNA conjugate together with an
orthogonalized yeast phenylalanine tRNA into the S30 system, the nonnatural
amino acid was successfully incorporated into the target protein.
The PNA-assisted aminoacylation/in vitro translation system is currently the
simplest way to obtain nonnatural mutants, provided that the relevant compound is available.
Since this is a chemical expansion of the aminoacylation process, it will be
applicable to a wide variety of nonnatural amino acids and different tRNAs.
The PNA-assisted aminoacylation is specific to a complementary tRNA and
is potentially effective in a living cell. The only obstacle to in vivo
aminoacylation is that the Nielsen-type PNA does not easily penetrate
cell membranes. Efforts to design different types of PNAs that can penetrate
cell membranes are in progress [20, 21].
5.1.2.5
Directed Evolution of Existing aaRS/tRNA Pair to Accept Nonnatural
Amino Acids
Fig. 5.1-6 Selection of tRNAs that are not aminoacylated by any of the aaRSs in E. coli.
Fig. 5.1-7 Negative selection for eliminating TyrRS mutants that aminoacylate the
orthogonal tRNA with Tyr or any of the natural amino acids in E. coli.
Fig. 5.1-8 Positive selection for picking up TyrRS mutants that aminoacylate the
orthogonal tRNA with O-methyltyrosine.
Fig. 5.1-9 Expanded living organism that produces proteins including a nonnatural
amino acid as the 21st one.
similar procedure, they introduced various nonnatural amino acids into living
cells [24-26]. Later, they put the orthogonal tRNA/aaRS pair together with
an enzyme that synthesizes p-aminophenylalanine from basic carbon sources
[27]. This is the first example of a cell that self-creates a 21st amino acid and
lives with it.
Yokoyama and coworkers also used a similar approach to find an orthogonal
aaRS/tRNA pair that works in mammalian cells. They used the orthogonal
pair to incorporate iodotyrosine into proteins [28, 29]. The in vivo system that
produces proteins in which iodine atoms are incorporated at specific positions
will find applications in large-scale production of heavy-atom labeled proteins
for X-ray analysis.
The elegant approaches of Schultz and Yokoyama are, however, typical
examples of biological expansion. It is not surprising, therefore, that their
screening processes, so far, produced aaRS/tRNA pairs only for amino acids
that are not far from the naturally occurring ones. It seems difficult, if not
impossible, to identify aaRS/tRNA pairs that can introduce large-sized amino
acids from their screening processes. Since nonnatural amino acids of specialty
functions, like fluorescence, electron donating, and accepting functions, often
carry large side groups, a more widely applicable method for aminoacylation
is needed.
At this moment, aminoacylation of tRNA with a nonnatural amino acid is
still the bottleneck step for nonnatural mutagenesis both in vitro and in vivo.
The Hecht method is applicable to almost any type of amino acid, but can be
carried out only on isolated tRNAs in a test tube. Further, the aminoacylation step
of pdCpA is sometimes tricky. For aminoacylation in a test tube, the micelle-mediated
method is easier than the Hecht method, at least for some types
of amino acids. The ribozyme technique of Suga is applicable to a variety
of p-substituted phenylalanines and to a wide variety of tRNAs. This is, at
present, the simplest and most dependable method of aminoacylation for
isolated tRNAs. It has not, however, been applied to in vivo systems or to
large amino acids. Our PNA-assisted aminoacylation method may also
be applicable to a wide variety of amino acids and tRNAs. Since the PNA-assisted
aminoacylation is tRNA selective, it is a potential amino acid
donor in living cells. The orthogonal tRNA/aaRS pairs reported by Schultz
and by Yokoyama are effective for some nonnatural amino acids with small
side groups, but they have not so far been applied to large amino acids.
5.1.3
Other Biomolecules That Must Be Optimized for Nonnatural Amino Acids
Fig. 5.1-10 Orthogonal tRNAs that are not aminoacylated with any of the natural amino acids
in E. coli, but can bring a nonnatural amino acid efficiently into the ribosome A site.
5.1.3.2
Adaptability of EF-Tu to Aminoacyl-tRNAs Carrying a Wide Variety of Nonnatural
Amino Acids
Aminoacyl-tRNAs that carry nonnatural amino acids enter into the A site
of ribosome with the aid of an enzyme called an elongation factor, EF-
Tu. Only a single type of EF-Tu molecule exists in E. coli and it delivers
all types of aminoacyl-tRNAs into the ribosome A site. Therefore, the
EF-Tu molecule has an adaptability to bind a wide range of aminoacyl-
tRNAs, presumably, including those with some nonnatural amino acids.
Our preliminary experiment indicates that the E. coli EF-Tu binds yeast
phenylalanine tRNA that carries a variety of nonnatural amino acids with,
however, reduced affinities [31]. Aminoacyl-tRNAs carrying bulky nonnatural
amino acids, like 1-pyrenylalanine, bind very weakly to EF-Tu. Although
the binding affinity to EF-Tu may not be directly proportional to the
incorporation efficiency, it is clear that insufficient binding to EF-Tu leads
to unsuccessful incorporation of the nonnatural amino acid. Design and
synthesis of engineered EF-Tus that bind a wider range of aminoacyl-tRNAs
with bulky nonnatural amino acids are now in progress.
5.1.3.3
Adaptability of Ribosome to Wide Variety of Nonnatural Amino Acids
Since the peptide bonds form in the ribosome, its expansion to accept a wide
range of nonnatural amino acids will be the final target. It is somewhat
surprising that amino acids that carry large side groups like those shown
in Fig. 5.1-11 (left) have been incorporated into proteins in fairly high yields
in E. coli and other biosynthesizing systems [32]. This indicates that the
ribosomes of various species are very tolerant to a wide variety of amino
acids even beyond the naturally occurring ones. At the same time, however,
there are kinds of nonnatural amino acids that are rigorously rejected from
the ribosome, although their side groups are not very bulky [32]. Some
examples are shown in Fig. 5.1-11 (right). Typically, D-amino acids have been
rigorously rejected by the E. coli ribosome [33, 34]. Similarly, our recent
experiment suggests that 9-anthrylalanine is rigorously rejected [32], even
though chemically aminoacylated yeast Phe tRNA with 9-anthrylalanine binds
to EF-Tu with somewhat reduced affinity [31].
The adaptability of E. coli ribosome has been investigated by using puromycin
analogs that carry a variety of nonnatural amino acids [35]. Since puromycin
is known to bind to the ribosomal A site without assistance of EF-Tu, the
extent of the inhibition of translation by the puromycin analogs can be a direct
measure of the adaptability of the A site to a variety of nonnatural amino acids.
The inhibition efficiency indicated that some aromatic amino acids that carry
widely expanded side groups, like 9-anthrylalanine and 1-pyrenylalanine, are
not accepted by the A site. Recently, Roberts and coworkers also showed that
analogs carrying D-amino acids or β-amino acids bind only weakly to the A site,
although they did not carry very large side groups [36].
These facts suggest that the inner structure of the A site is critical for rejecting
some types of amino acids, and that even small modifications of its structure may
expand its amino acid adaptability significantly. Indeed, Hecht and coworkers
showed that an E. coli ribosome whose 23S rRNA carries a UGGCA sequence
instead of GAUAA in the region 2447-2451 accepts D-amino acids to some
extent [37]. Elaboration of the ribosome structure will open a way to synthesize
proteins that contain a much wider variety of nonnatural amino acids.
5.1.4
Expansion of the Genetic Codes
5.1.4.2
Four-base Codons
We have demonstrated that several four-base codons like CGGG and AGGU
can be used independently in the framework of the existing three-base codon
system [45, 46]. The idea of the four-base codon was inspired by
naturally occurring frame-shift suppression. An undesired frame shift that
originates from an insertion of one nucleotide unit can be suppressed by
a frame-shift suppressor tRNA that contains a four-base anticodon. Similar
to the frame-shift suppressor tRNA, some of the four-base codons can be
successfully decoded by artificial frame-shift suppressor tRNAs that contain
the complementary four-base anticodons. Unsuccessful translation of a four-base
codon, in which it is read as the corresponding three-base codon, causes an undesired
frame shift, which often leads to a stop codon being encountered downstream
(Fig. 5.1-12). Therefore, in the four-base codon method, as in the amber method,
a full-length protein is obtained only when the nonnatural amino acid is incorporated at
that position, whereas undesired decoding as a three-base codon gives a truncated
protein. The probability of the undesired three-base decoding can be
reduced by choosing rare codons as the first three bases of the four-base codons.
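The decoding logic described in this paragraph can be sketched in a few lines of Python. The mRNA below is a made-up toy sequence, and the function name `translate`, the parameter `quad_site`, and the one-letter label `X` for the nonnatural amino acid are illustrative assumptions; only the codon table (the standard genetic code) is established fact. The sketch is deliberately simplified: the four-base codon CGGG is either read as four bases by the suppressor tRNA or read as CGG by an endogenous tRNA, and no competition between the two is modeled.

```python
from itertools import product

# Standard genetic code, built in U, C, A, G order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna, quad_site=None, label="X"):
    """Translate mrna from nucleotide 0 until a stop codon.

    If quad_site is given, the four bases starting at that index are read
    as one four-base codon and decoded as `label`; otherwise every codon
    is read as three bases.
    """
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        if i == quad_site:
            protein.append(label)  # four-base decoding by the suppressor tRNA
            i += 4
            continue
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break                  # stop codon ends translation
        protein.append(residue)
        i += 3
    return "".join(protein)


# Hypothetical message: AUG GCU CGGG UUA ACU GAA UAA
TOY_MRNA = "AUGGCUCGGGUUAACUGAAUAA"

full_length = translate(TOY_MRNA, quad_site=6)  # CGGG read as four bases
truncated = translate(TOY_MRNA)                 # CGG read as three bases
print(full_length, truncated)
```

In this toy message, four-base decoding gives the full-length product MAXLTE, whereas three-base reading of CGG shifts the frame and terminates at a UGA codon after five residues (MARVN).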
The most remarkable advantage of the four-base codons as compared
with the amber codon is that we can incorporate two or more different
nonnatural amino acids into single proteins [47, 48]. We have identified five
different four-base codons that work independently in the E. coli system, namely,
AGGU, CGGG, GGGU, CUCU, and CCCU [46]. Similarly, CGGU(CGCU),
CCCU, CUCU(CUAU), and GGGU work efficiently in the lysate of rabbit
reticulocyte [49]. Since they are independent and orthogonal to each other, we
can introduce, in theory, up to five different nonnatural amino acids into a
single protein in E. coli system, and up to four in the rabbit system. In practice,
however, because of the reduced incorporation efficiencies of nonnatural
amino acids, the maximum number of nonnatural amino acids in a single
protein is limited to three, at this moment. The multiple incorporation has
5.1.4.3
"Synthetic Codons" That Contain Nonnatural Nucleobases
[Figure: structures of the nonnatural nucleobases isoC and isoG]
5.1.5
In vivo Synthesis of Nonnatural Mutants
Fig. 5.1-14 Import of tRNA aminoacylated with nonnatural amino acids into a living cell
through endocytosis.
facts, for the transfection method to be efficient, the endosomes must be broken
in the cytoplasm as quickly as possible, or alternatively, another technique that
leads to direct penetration of aminoacyl-tRNA must be developed.
5.1.6
Application of Nonnatural Mutagenesis - Fluorescence Labeling
Fig. 5.1-15 Nonnatural amino acids carrying fluorescent groups that have been
incorporated into proteins with high efficiency.
5.1.7
Future Development and Conclusion
The basic strategy of nonnatural mutagenesis was first reported more than 15 years
ago, as a promising technology for structural and functional analyses of
proteins in vitro and in vivo and for creating proteins of specialty functions.
However, it has remained a special method for only a limited number of
researchers, mainly because of the lack of an easy way of aminoacylation
and lack of appropriate nonnatural amino acids for useful applications.
Fortunately, facile and dependable methods for aminoacylation are now
available and several nonnatural amino acids reported recently appear to
be really useful for fluorescence labeling, glycosylation, phosphorylation, and
other applications. Commercialization of the reagents for aminoacylation
and the nonnatural amino acids carrying specialty side groups will further
accelerate the prevalence of this method. Nonnatural mutagenesis is a unique
method that enables position-specific labeling with a variety of functional
groups. Further, the labeling can be done even in living cells. No alternative
technique can do this. Wide application of this method will open a new area
in protein research in general and, especially, in drug discovery and protein
network analysis.
Acknowledgments
Recent experimental results from our laboratory described in this chapter were
obtained with support from a Grant-in-Aid for Scientific Research from the
Ministry of Education, Science, Sports, and Culture, Japan (No. 15101008).
References
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
6
Forward Chemical Genetics
Stephen J. Haggarty and Stuart L. Schreiber
Outlook
6.1
Introduction
It is sometimes thought that the Neurospora work was responsible for the “one gene-one
enzyme” hypothesis - the concept that genes in general have single primary functions, aside
from serving an essential role in their own replication, and that in many cases this function
is to direct specificities of enzymatically active proteins. The fact is that it was the other way
around - the hypothesis was clearly responsible for the new approach.
Since the time of Gregor Mendel (1822-1884) and the discovery of “heritable
factors” [1], which are now referred to as genes, classical genetics, and more
recently molecular genetics, has become the dominant experimental paradigm
for understanding biological systems [2]. An attractive feature of the genetic
approach is its adherence to the logic that to understand a system you
should perturb it and observe the consequences. Another important feature
is its generality, that is, genetics provides an experimental approach that is
applicable to the dissection of almost all biological systems provided that the
systems can reproduce and heritable mutations in genes can be made.
Despite the successes of classical genetics and knowledge of the complete
sequence of deoxyribonucleic acid (DNA) that comprises the human genome
[3], the functions of the majority of genes and other regulatory elements
within the genome remain as enigmatic as they were at the time of Mendel.
In fact, many recent studies analyzing the basic tenets of what constitutes a
“gene”, as well as studies on the regulatory roles of ribonucleic acids (RNA),
challenge many of the tenets of the central dogma (DNA-to-RNA-to-protein).
Moreover, while knowledge of the complete human-genome sequence provides
a foundation for understanding disease biology, even for the majority of cases
of single-gene Mendelian disorders (e.g., Huntington’s disease, cystic fibrosis),
knowledge of the genetic variation that causes the diseases is only the first step
toward an understanding of the disease pathogenesis and the development of
therapeutic treatments. Furthermore, it is now widely recognized that many
common human diseases, including cancer, schizophrenia, and diabetes, have
a strong genetic component, but the heritability of these diseases is so-called
complex in terms of the number of alleles (variants of genes) that contribute
to the final outcome and susceptibility. As a result of these challenges, there
exist only a handful of medical treatments based on an understanding of the
molecular etiology of a particular disease, and very few treatments that take
into account an individual’s genetic history. Therefore, there exists a great need
to expand the “molecular toolkit” available to both research scientists and
clinicians - the field of chemical biology is well poised to contribute toward
this task.
As stated above, George W. Beadle, in his acceptance speech for the Nobel
Prize in Physiology or Medicine in 1958 (shared with Edward L. Tatum “for
their discovery that genes act by regulating definite chemical events” using
the red bread mold Neurospora crassa, and with Joshua Lederberg “for his
discoveries concerning genetic recombination and the organization of the
genetic material of bacteria”), noted that the desire to test new hypotheses
in science can be the genesis of new approaches that are transformative to
the existing scientific paradigm - rather than the other way around. With
this notion in mind, and with the aim of deciphering the functions of the
human and other model genomes, chemical genetics provides an approach
both to discover and to dissect the functions of gene products encoded within a
genome using biologically active small molecules (Fig. 6-1) [4-11]. By directly
targeting gene products, mostly encoding for proteins, rather than by mutating
an organism’s genetic material, this approach differs from classical genetics.
However, as discussed in this chapter and elsewhere in this book, the overall
logic of chemical genetics and many of the principles of the approach are
similar to classical genetics. Given the temporal control offered by small
molecules, and the ability to use combinations of small-molecule modulators,
chemical genetics promises to complement the use of pure genetic analysis
to study a wide range of biological systems and mechanisms. In this regard,
it is possible that many of the hypotheses that can be tested using chemical
genetics will ultimately play a transformative role in the coming years, much
like Beadle and Tatum’s efforts over a half-century ago.
To be effective as probes of biological mechanisms, and to function as
therapeutic agents in the clinical setting, small molecules must modulate
biological states by perturbing cellular networks through interactions with
macromolecules. The challenge of doing this effectively is
highlighted by emerging models from genome- and proteome-wide interaction
6.2
History/Development
Fig. 6-2 Examples of biologically active small molecules whose structural
complexity, protein targets, and consequent observable phenotypes are different.
(1) Penicillin G, an antibiotic; (2) thiamine (vitamin B1), a metabolite that is an
enzyme cofactor; (3) geldanamycin, an inhibitor of heat-shock protein 90 (HSP90);
(4) dopamine, a neurotransmitter; (5) haloperidol, a central nervous system
depressant and sedative; (6) colchicine, an inhibitor of mitosis that causes
microtubule destabilization; (7) rapamycin, an anticancer agent that inhibits TOR
proteins when complexed to FKBP12; (8) latrunculin B, a destabilizer of actin
microfilaments; (9) caffeine, a central nervous system stimulant that targets
proteins including cyclic nucleotide phosphodiesterases.
(6), first used by the Egyptians over 35 centuries ago for the treatment
of what is now recognized as cancer, and later used to discover tubulin,
a major component of the cytoskeleton; rapamycin (7), a natural product
with anticancer properties first isolated from the bacterium Streptomyces and
later used to discover mammalian FKBP12-rapamycin-associated protein
(FRAP)/mammalian target of rapamycin (mTOR); latrunculin (8), a natural
product isolated from the marine sponge that causes destabilization of the
actin cytoskeleton; and caffeine (9), a naturally occurring methylxanthine
found in coffee and tea, which has several cellular actions, including the
inhibition of cyclic nucleotide phosphodiesterases. Indeed, many aspects of
biological research - from using antibiotics (e.g., ampicillin), to selecting for
the transformation of Escherichia coli with a recombinant DNA plasmid, to
the vitamin constituents (e.g., vitamin B6) of the basic culture media used
to culture mammalian cells, to the inhibition of proteases (e.g., leupeptin)
and phosphatases (e.g., pervanadate) during biochemical purification of
proteins - rely on the use of small molecules. Besides these routine uses
in biology, biologically active small molecules are widely used as imaging
reagents in basic research and clinical diagnosis (e.g., fluorodeoxyglucose
Table 6-1 (continued)
Small molecule | Assay format | Key phenotype | Target
Chromatin remodeling
Trapoxin B | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
Depudecin | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
Trichostatin A | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
ITSA1 | Cultured cells | Bypasses cell-cycle arrest by trichostatin A | Unknown
Protein synthesis, folding, trafficking, and secretion
Geldanamycin
Leptomycin B | Antiviral/antifungal | Inhibits nuclear export | Crm1
Multiple inhibitors | In vitro translation extract | Inhibition of translation initiation and elongation | RNA and varied
Multiple inhibitors | Cultured cells | Inhibit FOXO1a nuclear export | Varied
Brefeldin A | Antiviral/antifungal | Blocks ER-to-Golgi transport | Arf1
Exo1 | Cultured cells | Blocks ER-to-Golgi transport | Unknown
Exo2 | Cultured cells | Blocks ER-to-Golgi transport | Unknown
Multiple sulfonamides | Cultured cells | Block Golgi-to-cell-membrane transport | Unknown
Sortins | Cultured cells | Induce secretion | Unknown
Ubiquitin-proteasome pathway
Lactacystin | Cultured cells | Neurite induction and protease inhibition | Proteasome
Ubistatin | Xenopus extract | Inhibits ubiquitin-dependent proteolysis | Multiubiquitin chain
Signaling pathway
Cyclopamine | Cultured cells | Inhibits hedgehog signaling | Smoothened
Cyclosporin | Cultured cells | Inhibits T-cell signaling | Cyclophilin and calcineurin
FK506 | Cultured cells | Inhibits T-cell signaling | FKBP12 and calcineurin
Rapamycin | Cultured cells | Inhibits T-cell signaling | FKBP12 and TOR kinase
Fumagillin | Cultured cells | Inhibits endothelial cell proliferation | Methionine aminopeptidase
SMIR4 | Cultured cells | Suppresses rapamycin | Nir1p (Ybr077cp)
Purmorphamine | Cultured cells | Induces osteogenesis | Hedgehog signaling agonist
TWS119 | Cultured cells | Induces neurogenesis | Glycogen synthase kinase-3β
Cardiogenol | Cultured cells | Induces cardiomyogenesis | Unknown
Concentramide | Zebrafish embryos | Disrupts heart patterning | Unknown
GS4012 | Zebrafish embryos | Suppresses cardiac defect | Upregulates VEGF levels
6.3
General Considerations
6.3.1
Small Molecules as a Means to Perturb Biological Systems Conditionally
6.3.2
Forward and Reverse Chemical Genetics
Table 6-2
Fig. 6-3 Forward versus reverse chemical genetics. While forward chemical genetics
relies on a phenotype of interest to guide the selection of biologically active small
molecules, reverse chemical genetics uses a protein of interest to identify small
molecules that can be used to probe the function of the selected protein. Both
approaches require the use of small molecules and phenotypic assays but differ
in the starting points of discovery.
Fig. 6-4 Phenotypic assays for chemical genetics. (a) Types of assays that have been
used for chemical-genetic screening. (b) Example of a cell-based assay involving
phospho-specific antibody-based determination of a cell state [31]. A cytoblot
involves growing cells on the bottom of a well, fixing the cells and probing the cells for
the presence of a particular antigen using a specific primary antibody in solution. A
secondary antibody covalently linked to horseradish peroxidase is added and the
presence of the entire complex is detected through the chemiluminescent reaction
caused by addition of luminol and hydrogen peroxide.
be low such that methods of analysis can readily identify which molecules
are active. Ideally, instead of using visual observations or considering a binary
descriptor of “0” or “1”, the assay being used is quantitative in nature in terms
of providing a continuous valued measure of activity that can be recorded
electronically using plate readers designed to measure changes in absorbance,
fluorescence, and luminescence.
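As a minimal illustration of what such a continuous readout allows, the Python sketch below converts raw plate-reader signals into percent activity relative to control wells and flags wells that deviate strongly from the negative controls. All readings, the function names, and the z-score cutoff of 3 are hypothetical choices made for this example, not values or methods taken from the text.

```python
from statistics import mean, stdev


def percent_activity(raw, neg_ctrl, pos_ctrl):
    """Scale raw signals so that negative controls average 0% and positive controls 100%."""
    low, high = mean(neg_ctrl), mean(pos_ctrl)
    return [100.0 * (x - low) / (high - low) for x in raw]


def z_scores(raw, neg_ctrl):
    """Express each raw signal in standard deviations from the negative-control mean."""
    mu, sd = mean(neg_ctrl), stdev(neg_ctrl)
    return [(x - mu) / sd for x in raw]


# Hypothetical readings (arbitrary luminescence units).
neg_ctrl = [100, 102, 98, 100]   # vehicle-only wells
pos_ctrl = [200, 198, 202, 200]  # wells with a known active compound
compound_wells = [101, 150, 199]

activity = percent_activity(compound_wells, neg_ctrl, pos_ctrl)
scores = z_scores(compound_wells, neg_ctrl)

Z_CUTOFF = 3.0  # assumed threshold for calling a well active
hits = [i for i, z in enumerate(scores) if abs(z) > Z_CUTOFF]
print(activity, hits)
```

Because the readout is continuous, the same data can be re-thresholded later or ranked by potency, which a binary "0" or "1" descriptor would not permit.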
High-throughput (10 000-200 000 compounds per day) phenotypic assays
involving the measurement of changes in calcium levels or second messengers,
like cyclic adenosine monophosphate (cAMP), in cultured cells have been
possible using “fluorescence imaging plate readers” (FLIPRs) for many years.
However, almost exclusively, these assays have been performed in the context
of the development of drugs targeting directly specific cell surface receptors,
including the large family of G-protein coupled receptors (GPCRs), whose
expression has been engineered to occur in a particular cell line that is readily
amenable to high-throughput screening. While these assays have produced
many biologically active small molecules that work as either receptor agonists
Fig. 6-5 Example of a high-content image-based screen for small molecules that
alter neural stem-cell differentiation. Unlike homogeneous, plate-reader based assays,
multiple cell types and phenotypes can be quantified from a single image using image
segmentation and computational analysis.
product leptomycin, directly inhibited the nuclear export factor CRM1. Besides
this class of compounds, a number of other compounds inhibiting PI3K/Akt
signaling were discovered, which included multiple antagonists of calmodulin
signaling and psammaplysene A [39], a natural product isolated from marine
extracts. Given the importance of the PI3K/PTEN/Akt signal transduction
pathway in a variety of cancers, and the ability of FOXO1a targeted to the
nucleus to reverse tumorigenicity of PTEN null cells, these small molecules
and their targets may provide a new generation of therapeutic agents.
6.3.4
Nonheritable and Combinations of Perturbations
6.3.6
Sources of Phenotypic Variation: Genetic versus Chemical Diversity
genetics would likely have never anticipated such developments, the advent of
even improved methods for genome manipulation, including gene disruptions
due to insertion of transposable elements, gene trap vectors, and homologous
recombination, now allow a wide spectrum of genetic variation to be studied.
The serendipitous discovery of small molecules “spontaneously” produced
by natural sources, such as cultured bacteria and marine sponges, has been a
long-standing source of bioactive small molecules [44, 451. Like the discovery
of X rays and other agents that can induce phenotypic variation, chemical
biologists are becoming increasingly adept at making small molecules that are
suitable for use in forward and reverse chemical-genetic studies [6, 46-49].
These methods include the use of DNA template-mediated, and target- and
diversity-oriented organic synthesis, peptide and carbohydrate synthesis, and
enzyme-mediated synthesis, the latter of which enables in vitro evolution,
protein engineering, and even nonnatural amino acids to be incorporated
into polypeptides. The collective aim is to provide increasingly complex and
effective small-molecule modulators of biological processes by developing
efficient (three- to five-step) syntheses of collections of small molecules having
rich skeletal and stereochemical diversity. Such synthetic strategies are not
directed toward any one molecular target, as occurs in target-oriented synthesis;
instead, the efforts are ultimately aimed at being able to target all molecular
components of the networks regulating biological processes [6, 46].
An important conceptual development in chemical library synthesis has
been the recognition of the importance of not only creating diversity (so as to
increase the likelihood of finding an active small molecule) but also retaining
the potential to site- and stereoselectively attach appendages to the small
molecule during a postscreening optimization stage. Such chemical handles
not only facilitate the addition of functionalities that increase the potency
or selectivity of the small molecule but, equally as important, can also be
used to facilitate the identification of interacting target proteins and pathways
(see below). With access to such idealized collections of small molecules, the
challenge for the field of chemical biology includes: (a) determining which of
these molecules have specific effects on biological systems (at various levels of
resolution from proteins to whole organisms), (b) determining the structural
and physicochemical properties of molecules that specify associated biological
activities, and ultimately (c) directing future synthetic efforts along particular
pathways in the synthetic network to effectively produce small molecules that
modulate biological systems in any desired manner.
6.3.7
The “Target Identification” Problem
that the exact nature of the molecular interactions that give rise to the phenotype
be further investigated, usually by lower-throughput methods. This situation
differs from efforts directed toward target validation through indirect means,
such as loss of function caused by gene targeting, overexpression, or reduction
in expression by RNAi. By considering the effects of small molecules on intact
biological networks as part of the initial discovery process, the logic of forward
chemical genetics is a reversal of the logic of most of the current efforts in drug
discovery. Current drug discovery often picks a specific molecular target based
on indirect means of target validation, and then optimizes the interactions of
small molecules with a network of main- and side-chain interactions from an
individual polypeptide in vitro or in silico. Since the eventual desire of the drug
discovery approach is to use the small molecule in the context of intact living
systems, the full spectrum of phenotypic effects is later explored only for a few
select compounds. As such, there exists a paucity of information about the
phenotypic effects of large collections of small molecules. Such information
would help enable the design of new probes and generations of small-molecule
therapeutics.
Besides the examples of the identification of the targets of the immuno-
suppressant compounds CsA and FK506 that are described above, there are
a growing number of successful examples of identifying the targets of small
molecules identified from forward chemical-genetic screens (Table 6-2) [50].
However, as was true for early geneticists who used random mutagenesis to
introduce genetic variation and then faced the challenge of identifying where
in the genome the mutation was, the most challenging aspect of forward
chemical genetics, and the rate-limiting step in the discovery cycle, involves
the identification of the target of the small-molecule perturbation. To be suc-
cessful in targeting the myriad possible gene products that might result in a
desired phenotypic effect, chemical genetics requires access to diverse small
molecules that incorporate structural features to assist in target identification
and resynthesis.
One method of target identification that requires the modification of the
small molecules, which was the approach taken to identify the cellular targets
of CsA and FK506, involves the fractionation of cellular extracts with an
affinity matrix covalently modified with the biologically active small molecules.
A classic example of this approach is that of the identification of the target of
microbially derived cyclotetrapeptide trapoxin B (Fig. 6-7) [51]. Like trichostatin
A and butyrate [52], trapoxin B was known at the time to share the properties
of causing both reversion of oncogene-transformed fibroblast cells and the
accumulation of acetylated histones [51]. However, unlike trichostatin A
and butyrate, trapoxin B was found to be an irreversible inhibitor of the
deacetylation of histones, and its cellular and in vitro activity were dependent
on the presence of the epoxide functionality [51]. Since trapoxin by itself was
not directly amenable to modification to facilitate target identification, using
a total of 20 steps from commercially available starting material, Taunton and
Fig. 6-7 Target identification of an inhibitor of histone deacetylation.
6.3.8
Relationship between Network Connectivity and Discovery of Small-molecule
Probes
models, where protein and genetic interaction networks are robust and have a
power-law distribution of edges, if a random perturbation results in a change
in phenotype, then the perturbation is more likely to target a highly connected
node (a node with many edges) than a node with a low degree of connectivity.
The relevance of these network properties can be illustrated by the following
experiment designed to simulate the act of screening small molecules in a cell-
based assay. Consider four nodes (modeling proteins), with edges (modeling a
function of a protein) of degrees of one, two, three, and four respectively, such
that the total sum of edges equals 10. If these nodes are randomly sampled by
picking an edge (simulating a molecular recognition event in which a small
molecule modulates a protein function), then even though there is a 25%
chance of picking each node, 70% of the time nodes of a degree equal to or
greater than three will be selected (assuming replacement of nodes after each
selection). This preferential selection of highly connected nodes is due to the
increased probability of interacting with a node with many edges. Thus, if we
consider that biological systems have evolved over time, and that many gene
products have been formed by reusing protein domains (e.g., immunoglobulin
or GTP-binding domains) and by gene duplications, then identifying small
molecules with similar phenotypic effects in evolutionarily distant organisms
may provide a method for mapping the chemical properties of highly connected
and, therefore, functionally important nodes in biological networks.
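The four-node thought experiment described above can be checked numerically. The following is a minimal sketch, assuming that every edge is equally likely to be hit by a perturbation; the node labels and function names are illustrative and do not come from the original text.

```python
import random
from fractions import Fraction

# Degrees of the four model proteins (nodes) from the text; the edges sum to 10.
DEGREES = {"node1": 1, "node2": 2, "node3": 3, "node4": 4}


def exact_hit_probability(degrees, min_degree):
    """Probability that a randomly picked edge belongs to a node of degree >= min_degree."""
    total_edges = sum(degrees.values())
    hits = sum(d for d in degrees.values() if d >= min_degree)
    return Fraction(hits, total_edges)


def simulated_hit_frequency(degrees, min_degree, trials=100_000, seed=0):
    """Monte Carlo version: nodes are drawn with weight proportional to degree, with replacement."""
    rng = random.Random(seed)
    names = list(degrees)
    weights = [degrees[name] for name in names]
    picks = rng.choices(names, weights=weights, k=trials)
    return sum(degrees[p] >= min_degree for p in picks) / trials


p_exact = exact_hit_probability(DEGREES, min_degree=3)
p_simulated = simulated_hit_frequency(DEGREES, min_degree=3)
print(f"exact: {p_exact}, simulated: {p_simulated:.3f}")
```

The exact value is (3 + 4)/10, which reproduces the 70% quoted in the text, whereas picking nodes uniformly would select a node of degree three or more only half of the time.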
In support of this, many small molecules, including rapamycin (inhibitor
of TOR proteins), FK506 (calcineurin phosphatase inhibitor), trichostatin A
(histone deacetylase inhibitor), colchicine/nocodazole (microtubule
destabilizers), taxol (microtubule stabilizer), latrunculin B (actin microfilament
destabilizer), brefeldin A (inhibits ADP ribosylation), etoposide/camptothecin
(topoisomerase inhibitors), wortmannin (phosphatidylinositol kinase inhibitor),
staurosporine (protein kinase C inhibitor), UCN-01 (Chk1/2 inhibitor), caffeine
(ATM/ATR kinase inhibitor), and roscovitine (cyclin-dependent kinase
inhibitor), target functionally important nodes in mammalian cells and have
similar biochemical interactions and phenotypic effects in organisms such as
S. cerevisiae. Testing the hypothesis that there exists a correlation between the
connectivity of proteins in a biological network and the likelihood of finding a
modulating small molecule by screening will require further characterization
of the targets of biologically active small molecules in multiple biological
systems, and the analysis of the connectivity of these targets in the relevant
biological network.
6.3.9
Computational Framework for Forward Chemical Genetics: Legacy of Morgan
and Sturtevant
a result of numerous such screens now available in the public domain, the
resulting datasets allow answering this question, but the size and complexity
(in terms of the number of possible comparisons between objects) of the
datasets require the use of computational tools that are designed for allowing
visualization and pattern recognition in high-dimensional spaces.
The need to develop a suitable computational framework is reminiscent of
the need of classical geneticists close to a century ago to develop an analytical
framework to guide the then nascent field. At that time, geneticists such
as Thomas H. Morgan and his graduate student Alfred H. Sturtevant, were
struggling with understanding the nature of Mendelian genes and trying
to interpret a growing amount of observational data on heritable variation
collected using forward genetic screens in the fruit fly Drosophila [2]. Particularly
puzzling was the pattern of inheritance of combinations of traits that did not
sort independently during meiosis as predicted by Mendel’s second law (law
of independent assortment) [1]. After many years of collecting mutants and
analyzing data, Morgan and Sturtevant recognized that the “. . .frequency of
crossing over (recombination) furnish[ed] evidence of the linear order of the
elements (genes) in each linkage group and of the relative position of the
elements (genes) with respect to each other” [2]. Accordingly, mutant genes
(or allelic variation) could be “mapped” as a point in a one-dimensional
space using the metric (measured in centiMorgans) of 1% recombination
equal to one map unit. By making overlapping distance measurements, it was
discovered that a genetic map corresponding to the relative arrangement of
genes in the linear space could be constructed.
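The idea of building a one-dimensional map from overlapping distance measurements can be illustrated with a small sketch. The recombination frequencies below are hypothetical numbers chosen for illustration, the gene labels are placeholders, and the code assumes that map distances are simply additive (no correction for double crossovers).

```python
from itertools import combinations

# Hypothetical pairwise recombination frequencies in percent (1% = 1 map unit).
RECOMBINATION = {("a", "b"): 5.0, ("a", "c"): 10.0, ("b", "c"): 15.0}


def build_map(rf):
    """Order genes on a line and assign positions from pairwise map distances."""
    genes = sorted({gene for pair in rf for gene in pair})

    def dist(g1, g2):
        if g1 == g2:
            return 0.0
        return rf[(g1, g2)] if (g1, g2) in rf else rf[(g2, g1)]

    # The two genes that are farthest apart must sit at the ends of the map.
    left, _right = max(combinations(genes, 2), key=lambda pair: dist(*pair))
    # Position of every gene = its distance from the chosen left end.
    positions = {gene: dist(left, gene) for gene in genes}
    order = sorted(genes, key=positions.get)
    return order, positions


order, positions = build_map(RECOMBINATION)
print(order, positions)
```

Here gene "a" is placed between "b" and "c" because the b-c distance equals the sum of the two shorter distances, which is the kind of consistency check that overlapping measurements provide.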
From these genetic maps, it became apparent that the deviation observed
from Mendel’s law of independent assortment could be explained by “linkage”
of genes due to their location within a similar position in the space representing
the underlying DNA sequence [2]. Although not obvious at the onset of Morgan
and Sturtevant’s studies, the maps of these genetic spaces are now known
to correspond physically to the arrangement of genes within a linear and
continuous sequence of the DNA, constituting a chromosome. In the end,
the recognition that genes could be arranged as a linear series provided the
conceptual foundation for the eventual sequencing of the complete genomes
of humans and of model organisms [3].
6.3.10
Mapping of Chemical Space Using Forward Chemical Genetics
research and, potentially, the discovery of novel therapeutic targets and agents
[74-76]. But how can biologically active small molecules be “mapped” as points
(loci) in a space? If they can be mapped, what would the global properties
of this space look like and, moreover, what might the global properties of
such space reveal about the nature of the interaction of small molecules with
biological systems? While it is much too early to have a full answer to these
questions, a number of ideas have emerged as to how the “mapping” of small
molecules using biological descriptors might be approached.
Unlike genes, which are physically located at a locus on a chromosome based
on their linkage to other sequences of DNA (although they may move owing
to transpositions and recombination events), small molecules that induce
phenotypic variation in biological systems are themselves not physically located
in a space. Thus, if small molecules are to be mapped to a common space, then
the space must be considered to represent “abstract space” in the sense that it
is mathematically derived [74-76]. This abstract space, which we will refer to
as “chemical space”, is formed by multiple dimensions, or axes, such that the
relative distance between small molecules represented by points becomes a
measure of their structural or functional similarity. The notion is that certain
regions in this space correspond to small molecules that have similar structure
or function.
According to such a framework, the corresponding data structure for
analyzing chemical space is most often that of a two-dimensional array, or
matrix, denoted by S, consisting of an ordered array of n columns and m rows
(Fig. 6-10). Each column ($\mathbf{y}_j$) in S corresponds to a descriptor, and is denoted
by a boldface, lowercase letter subscripted j (where j = 1 to n). Each row ($\mathbf{x}_i$)
in S corresponds to a chemical, and is denoted by a boldface, lowercase letter
subscripted i (where i = 1 to m). Accordingly, an element ($e_{mn}$) of S encodes
information (m, n) about chemical m for descriptor n. This allows the elements
of S to be considered as coordinates in a multidimensional space spanned by
the descriptor axes, which, in turn, allows each chemical to be represented
as a vector whose magnitude and direction are given by the corresponding
values in S, $\mathbf{x}_i = [e_1, e_2, \ldots, e_n]$. In this matrix-based representation of chemical
space, the relative distance between chemicals $\mathbf{x}_i$ becomes a measure of their
similarity with respect to the particular descriptors considered.
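As a concrete illustration of this matrix-based representation, the sketch below stores a small matrix S as a list of rows (chemicals) and computes the Euclidean distance between every pair of row vectors. The descriptor values are hypothetical and the function name is an assumption made for this example.

```python
import math

# Hypothetical matrix S: m = 3 chemicals (rows) by n = 4 descriptors (columns).
S = [
    [1.0, 0.0, 2.0, 1.0],  # x_1
    [1.0, 0.0, 2.0, 1.0],  # x_2, identical descriptor values to x_1
    [0.0, 3.0, 2.0, 5.0],  # x_3
]


def distance_matrix(rows):
    """Return the m x m matrix of Euclidean distances between row vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]


D = distance_matrix(S)
for row in D:
    print([round(value, 3) for value in row])
```

Chemicals with identical descriptor values are at distance zero, and the distance grows as their rows in S diverge, which is the sense in which distance serves as a measure of similarity.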
As depicted in Fig. 6-10, when considering the dimensions or axes of
chemical space there are two fundamentally different classes of descriptors
that are used: computed and measured [74-76]. These classes differ insofar as
the former are generally calculated using a computer and various algorithms
designed to determine the value of a specified mathematical function [77,
78], whereas the latter involve the observation of the effect of a given
small molecule on, for example, the function of a gene product (nucleic
acids, proteins) or metabolite (carbohydrate, lipid, other organic molecules)
[79, 80]. Recognizing the distinction between chemical spaces derived from
computed descriptors as compared to measured descriptors is of fundamental
importance. While the former is unambiguously definable, the latter involves
Fig. 6-10 Mapping chemical space [76]. Principal component models of chemical
space are shown for 480 small molecules analyzed using 24 computed molecular
descriptors and 60 measured phenotypic descriptors derived from a cell-based assay
of cell proliferation. By considering the elements of S as coordinates, small
molecules can be modeled as vectors, $\mathbf{x}_i = [e_1, e_2, \ldots, e_n]$, in an n-dimensional
vector space. By defining the Euclidean distance D between two vectors (e.g., $\mathbf{x}_1$ and
$\mathbf{x}_2$) in this vector space to be
$D_{12} = \sqrt{\sum_{j=1}^{n} (x_{1j} - x_{2j})^2}$,
the space of chemical-genetic observation can be considered as a metric space. This
means the relative distance D between chemicals $\mathbf{x}_i$ is informative with respect to
similarity between the particular descriptors considered. Accordingly, small
molecules $\mathbf{x}_i$ can be considered to be functionally similar if they are closely
positioned (i.e., within a specified radius) in the underlying descriptor space.
Since similarity between small molecules is determined by the pattern of
interaction with biological systems, the corresponding distance metric D
complements the definition of similarity obtained from calculated molecular
descriptors based on chemical structure. Furthermore, since similarity in cell-based
assays results from patterns of small molecules interacting with expressed gene
products, the corresponding distance metric D complements the definition of
similarity obtained from DNA sequence or gene-expression analysis.
the process of observation, and as such involves noise inherent to the process
of measurement. Measured phenotypic descriptors are also subject to the
influence of a variety of other variables, including the dose of the chemical,
length of treatment, and the genotype of the biological system.
Fig. 6-11 Small molecules as chemical graphs [Sl]. Representation of the structure
of small molecules as graphs encoded by an adjacency matrix that specifies the type
of node (atom), the type of edge (bond), and the connectivity of nodes. Hydrogen
atoms are not considered as nodes in the graph.
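A chemical graph of this kind can be written down directly. The sketch below is an illustration only: acetic acid is chosen here as an example molecule, atom types are kept in a separate list of node labels, and bond orders are used as the edge entries of the adjacency matrix. These encoding choices are assumptions, not the scheme used in the cited work.

```python
# Heavy atoms of acetic acid, CH3-C(=O)-OH; hydrogens are not nodes.
ATOMS = ["C", "C", "O", "O"]
# Edges as (atom index, atom index, bond order).
BONDS = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric matrix whose entry (i, j) is the bond order, or 0 if unbonded."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix


def degree(matrix, atom_index):
    """Number of heavy-atom neighbors (connectivity) of one node."""
    return sum(1 for entry in matrix[atom_index] if entry > 0)


A = adjacency_matrix(len(ATOMS), BONDS)
for label, row in zip(ATOMS, A):
    print(label, row)
```

The central carbon (index 1) has three heavy-atom neighbors, and the entry of 2 between indices 1 and 2 records the carbonyl double bond.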
6.3.11
Dimensionality Reduction and Visualization of Chemical Space
Fig. 6-12 Mapping chemical space using multidimensional phenotypic descriptors.
Phenotypic data from multiple assays are arranged in a chemical-genetic data array
and computational methods are used to select small molecules for further
characterization. Clustering and the construction of chemical-genetic networks
provide methods for visualization of high-dimensional observation spaces and
pattern finding.
SMPs (SMP-1 to -7) and a control treatment (e.g., only organic solvent), which
are subject to an array of five chemical-genetic screens consisting of three
cell-based assays measuring (a) neurite extension, (b) neuron viability, and
(c) synapse formation, and two in vitro assays with cell extracts to measure the
polymerization of (d) actin and (e) tubulin (Fig. 6-14(a)). In the resulting data
matrix, a value of “1” encodes the observation that the SMPs were active in
the assay and otherwise a value of “0” is used. Even with such a small dataset,
which uses a binary rather than a continuous valued measure, the challenge
of defining the major activity patterns and the compounds that are similar to
each other becomes apparent. What exactly does “similar” mean and how is it
computed?
[Fig. 6-14(a): binary data matrix of SMP-1 to SMP-7 and a control (rows) against five assays (columns): neurite extension, neuron viability, synapse formation, actin, and tubulin.]
Although for binary data other distance metrics are in general more
appropriate (e.g., Tanimoto metrics), for simplicity we can compute the
standardized (to the mean and standard deviation of the distribution) Pearson
correlation matrix, which contains the correlation coefficients between each of
the five assays. These data can then be used to cluster the chemicals based on
their correlation as a metric of similarity. The groupings depicted in Fig. 6-14(b)
reflect the fact that, of the seven SMPs, some had identical patterns of activity
(analogous to mutations mapping to the same region of the chromosome),
while others showed varying levels of common activity (analogous to mutations
mapping to different regions of a chromosome). Likewise, by transposing
the data matrix and considering the small molecules as descriptors for the
phenotypic assays, it becomes possible to use the information encoded in the
pattern of interaction of small molecules with biological systems to classify
the assay measurements instead of the small molecules (Fig. 6-14(c)). Just
as for the small molecules, the resulting data creates a high-dimensional,
information-rich signature of the biological system being probed, which in
turn can be used for pattern recognition and classification. The activity patterns
from small-molecule descriptors can provide a measure of the diversity of
particular cell types or cell states when subject to additional perturbations,
such as those provided by natural genetic variation and chemical-genetic
modifiers. When characterizing different genotypes, the generation of these
“perturbation profiles”, by analogy to mRNA profiling, has been referred to
as chemical-genomic profiling (see below) [82]. The nature of these profiles can
shed light on the underlying chemical differences between cell states, and
may eventually be useful as cellular network-based diagnostics to complement
traditional use of DNA sequence analysis. However, to date there have been
only a few studies that have purposefully used the patterns of activities of small
molecules to classify biological systems.
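The question of what "similar" means for binary activity profiles can be made concrete with a short sketch. The profiles below are hypothetical and are not the values of Fig. 6-14; the compound names and helper functions are likewise assumptions. The code computes both the Pearson correlation used in the text for simplicity and the Tanimoto coefficient mentioned as the more appropriate choice for binary data.

```python
import math

# Hypothetical binary activity profiles across five assays (1 = active, 0 = inactive).
PROFILES = {
    "cmpd_A": [1, 1, 1, 1, 0],
    "cmpd_B": [1, 1, 1, 1, 0],  # same pattern as cmpd_A
    "cmpd_C": [1, 0, 0, 0, 1],
    "cmpd_D": [1, 1, 1, 0, 0],
}


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (both must vary)."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def tanimoto(u, v):
    """Tanimoto coefficient: shared actives divided by actives in either profile."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either


for name in ("cmpd_B", "cmpd_C", "cmpd_D"):
    r = pearson(PROFILES["cmpd_A"], PROFILES[name])
    t = tanimoto(PROFILES["cmpd_A"], PROFILES[name])
    print(f"cmpd_A vs {name}: pearson={r:.3f}, tanimoto={t:.3f}")
```

Either similarity matrix could then be handed to a clustering routine. Note that the Pearson correlation is undefined for a profile with no variation (for example, a compound inactive in every assay), which is one practical reason to prefer the Tanimoto coefficient for binary data.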
Besides clustering, which has been widely used to group small molecules
into various structural and activity classes, another method of dimensionality
reduction for multidimensional chemical-genetic screening is that of principal
component analysis (PCA). Unlike clustering, this method does not group
small molecules into discrete groups by imposing a particular structure on the
data (i.e., to form clusters). Instead, to analyze the diversity of small molecules,
PCA consists of a linear transformation of the original system of axes formed
by the n-dimensions of the data matrix, where n is the number of descriptors.
This transformation is in the form of a Euclidean distance-preserving rotation,
the directions of which are determined by computing a set of eigenvectors
and corresponding eigenvalues of a diversity matrix created by computing
a standardized covariance matrix (i.e., Pearson correlation coefficients). The
resulting eigenvectors provide a new set of linearly independent, orthogonal
axes, called factors or principal components, each of which accounts for successive
directions in the n-dimensional ellipsoid spanning the multivariate distribution
of the original data. The corresponding eigenvalues account for progressively
smaller fractions of the total variance in the original data. Accordingly, PCA
creates a global model that minimizes the information lost on projection into a
space of reduced dimensionality, and is thus well suited for exploring complex
activity patterns and datasets that do not have a clustered structure. Besides
allowing for visualization of multidimensional data, PCA has a practical
application for data analysis, as the reduced number of dimensions simplifies
subsequent computations that may be memory- and time-intensive. While PCA
provides a readily computable, linear dimensionality reduction affording linear
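The PCA procedure outlined above (standardize the descriptors, compute the Pearson correlation matrix, take its eigenvectors as the new axes) can be sketched in a few lines. This is a minimal illustration that assumes NumPy is available and uses randomly generated data in place of real screening results; the function name is hypothetical.

```python
import numpy as np


def pca_from_correlation(X):
    """PCA by eigen-decomposition of the Pearson correlation matrix of X (rows = chemicals)."""
    X = np.asarray(X, dtype=float)
    # Standardize each descriptor to zero mean and unit standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    # eigh is appropriate because R is symmetric; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rotate the standardized data onto the principal component axes.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores


# Synthetic stand-in data: 50 "chemicals" by 4 "descriptors", two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)

eigvals, eigvecs, scores = pca_from_correlation(X)
explained = eigvals / eigvals.sum()
print(np.round(explained, 3))
```

Keeping only the first few columns of `scores` gives the reduced-dimensional view used for plots such as the three-component models discussed in this chapter; the eigenvalues indicate how much of the total variance each retained axis accounts for.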
6.3.12
Discrete Methods of Analysis of Forward Chemical-genetic Data
6.4
Applications and Practical Examples
One of the most useful applications of chemical genetics is to reveal the gene
products that function in pathways or processes in an unbiased manner. In
this section we will describe two practical examples. We will then end with
another example of applying collections of small molecules discovered using
chemical genetics to study the phenotypic differences of cells with different
genotypes in an unbiased, global manner (chemical-genomic profiling).
6.4.1
Example 1: Mitosis and Spindle Assembly
Fig. 6-17 Forward chemical-genetic screen for inhibitors of mitosis (data from
Ref. 73). (a) Overview of mitotic cell cycle. (b) Example of data from one 384-well
plate from the cytoblot primary screen with increased TG-3 mAb reactivity
indicative of an increased mitotic index. (c) Summary of compound activity from the
initial cell-based and in vitro tubulin polymerization assay. (d) Examples of a
compound that destabilized microtubules (deploy-2b) and a compound that stabilized
microtubules (synstab A).
6.4.2
Example 2: Protein Acetylation
Fig. 6-18 New activities in chemical space and the target of monastrol.
(a) Three-dimensional representation of chemical space showing the position of
15 120 small molecules (colored balls) in a molecular descriptor space derived from
the first three principal component axes (W1-W3) obtained from the analysis of the
corresponding structural and physicochemical descriptors (data from Refs 40, 41,
70, 80). Inset shows 132 biologically active small molecules colored based on
phenotypic data from cell-based assays for suppressors of the topoisomerase
inhibitor ICRF-193 (red), suppressors of the histone deacetylase inhibitor
trichostatin A (green), and antimitotics (blue). In all, there were 20 suppressors
of ICRF-193, 21 suppressors of ITSA, 89 antimitotics, and 2 small molecules that
scored in both the antimitotic and trichostatin A suppressor screen. Monastrol's
location was as shown. Testing over 30 structurally similar analogs revealed no
other active compounds [71]. (b) Cocrystal structure of monastrol with the motor
domain of human KSP (Eg5) showing that monastrol confers inhibition by creating an
"induced-fit" to a pocket away from the adenosine triphosphate and magnesium
binding site within the catalytic center (data from Ref. 87).
synthetic sources [55]. For example, using a panel of cell-based assays based
on the recognition of histone and α-tubulin acetylation on specific lysine
residues using antibodies and a library of over 7200 small molecules derived
from a diversity-oriented synthesis that included “biasing” elements to target
the compounds toward the family of HDACs [89], over 600 small-molecule
inhibitors of protein deacetylation were identified (Fig. 6-20) [80]. Following
the decoding of chemical tags and resynthesis, the selectivity of one inhibitory
molecule (tubacin) was shown toward α-tubulin deacetylation and that of
another (histacin) toward histone deacetylation (Fig. 6-21) [80]. Tubacin was
found not to affect the level of histone acetylation, gene-expression patterns,
or cell-cycle progression. Using immunoprecipitated, recombinant enzyme,
it was determined that the class II histone deacetylase 6 (HDAC6) is the
intracellular target of tubacin [90]. Through a combination of the use of
catalytically inactive point mutations in each of the two catalytic domains
of HDAC6 and tubacin, it was shown that only one of the two catalytic
domains of HDAC6 possesses tubulin deacetylase activity, and that only that
domain’s deacetylase activity could be inhibited by tubacin. Collectively, the
small molecules identified as suppressors of trichostatin A (ITSAs) and the
selective inhibitors of protein deacetylation should facilitate dissecting of the
role of acetylation in a variety of cell-biological processes (Fig. 6-22) [40, 90].
6.4.3
Example 3: Chemical-genomic Profiling
Fig. 6-20 Forward chemical-genetic screen for inhibitors of protein deacetylation
(data from Ref. 80). (a) Overview of cell-based screens of the 1,3-dioxane-based,
diversity-oriented synthesis-derived library using antibodies to measure tubulin
and histone acetylation. (b) Relative position of selected active compounds in a
three-dimensional principal component model computed from five cell-based assay
descriptors. AcTubulin-selective (red), AcLysine-selective (green), and most potent
(blue). (c) Chemical-genetic network from screening data after applying the
Fruchterman-Reingold “energy” minimization algorithm
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/). Nodes represent either assays or
small molecules according to the indicated colors. Edges (black lines) connect
bioactive small molecules to the corresponding assay.
Fig. 6-21 Selective inhibitors of α-tubulin (tubacin) and histone deacetylation (histacin)
identified by chemical-genetic screening [90].
Fig. 6-22 Molecular tools for the dissection of intracellular protein acetylation [40, 80].
Fig. 6-23 Chemical-genomic profiling (data from Ref. 82). (a) 276 unique
combinations and 24 single treatments of “small-molecule perturbagens” (SMPs) were
assayed for an effect on the cell cycle of budding yeast. Each of the 10 strains
profiled had a different genotype yielding a three-dimensional matrix of
24 x 24 x 10 observations. (b) Structures of 23 small molecules (other than
dimethylsulfoxide) used to profile 10 yeast genotypes in a three-dimensional
matrix. (c) Twenty-four node networks derived from the mapping of a matrix of
24 x 24 combinations of small molecules against a set of 10 strains of the budding
yeast. Graphs were visualized using Pajek v0.72 and “energy” minimizations
performed using the Fruchterman-Reingold algorithm
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/). None of the 10 chemical-genetic
networks were identical, indicating that the structure of the genetic network
determines the structure of the chemical-genetic network.
6.5
Future Development
For chemical genetics to truly compete with classical genetics, and for it to
function as a general approach to dissecting biological mechanisms, there
6.6
Conclusion
Indeed, the vista of the biochemist is one with an infinite horizon. And yet, this program of
explaining the simple through the complex smacks suspiciously of the program of explaining
atoms in terms of complex mechanical models. It looks sane until the paradoxes crop up and
come into sharper focus. In Biology we are not yet at the point where we are presented with
clear paradoxes and this will not happen until the analysis of the behavior of living cells has
been carried into far greater detail. This analysis should be done on the living cell’s own terms
and the theories should be formulated without fear of contradicting molecular physics.
Max Delbrück
Nobel prize in medicine or physiology, 1969
Acknowledgments
References
7
Reverse Chemical Genetics Revisited
7.1
Reverse Chemical Genetics - An Important Strategy for the Study of Protein
Function in Chemical Biology and Drug Discovery
7.1.1
Introduction
Drug discovery has seen several paradigm shifts over the last two decades.
Several new techniques have been introduced to widen what was believed
to be the bottleneck of this endeavor at the given time. Although many of
these techniques did not keep their initial promise, there is no doubt that
high-throughput screening (HTS) and protein structure-based drug design
have contributed enormously to the process of developing new high-affinity
protein binders and have made it more efficient. The sequencing of whole
genomes has provided numerous new potential drug targets. Unfortunately,
the undisputed value of these techniques has not (yet) led to an increase
in the number of new chemical entities entering the market. Spectacular
cases of costly failures of drug candidates in late-stage clinical trials
or, even worse, the withdrawal due to unanticipated side effects of several
drugs that had benefited millions of patients (e.g., COX-2 inhibitors) have
reminded us that the biological systems with which we are dealing are
extremely complex. Target validation has become the critical factor in drug
discovery. Consequently, all methods that contribute to a deeper understanding
of biological systems ranging from protein function within a cell to the
complex interplay within multicell organisms will gain importance in the
future. Systems biology, although still in its infancy, might be one approach
to achieve this goal.
The pharmacological approach, in which protein function is modulated by
small molecules, has played a prominent role in the study of biological systems.
Compared to other and complementary approaches, such as DNA knockouts,
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
Scheme 7.1-1 Probing of biological systems on different levels of hierarchy.
[Scheme labels: gene knockout, DNA-binder; antisense, RNAi; small molecules.]
7.1.2
History/Development
The concept of reverse chemical genetics has been applied ever since natural
product probes were first discovered as research tools in biology. In
experiments on the salivary gland of the cat, J. N. Langley (in 1878)
showed the mutually antagonistic effect between pilocarpine and atropine.
He observed a similar relationship between nicotine and curare in his study
of the contraction of muscle cells. These results inspired him to formulate the
“receptor theory” of drug/target interaction, which has become the main pillar
of pharmacology [4]. Once it was realized that the toxicity of colchicine, the
poison of meadow saffron, originates from its ability to lead to cell cycle arrest,
biologists have exploited this property to intentionally create this condition
and study the biological consequences. The use of microtubule poisons has
enabled numerous important discoveries, such as the determination of the
correct number of diploid chromosomes in humans or the demonstration
of the role of microtubuli in cell migration, tumor invasion, or anchoring
of the Golgi complex at the microtubule-organizing center [ S ] . Many other
such probes have been identified and as shown in Table 7.1-1 the number
of references containing their name may serve as an indicator how big their
impact is on biological studies.
Fig. 7.1-2 Development of cimetidine as an H2-selective antagonist for the treatment of ulcers. Structures shown: adrenaline (α/β-agonist), noradrenaline (α-agonist), isoprenaline (β-agonist), histamine (agonist), and cimetidine (H2-antagonist).
7.1.3
General Considerations
The key element of any reverse chemical genetics approach is access
to a small molecule that modulates protein function by binding to the
target protein [11]. Such molecules can be identified using two different
approaches: (1) high-throughput screening (HTS) of large compound collections
and (2) computer-aided design of compounds on the basis of the structure of
the target protein, followed by directed synthesis and biological testing of
selected compounds.
1. High-throughput screening: HTS is used to test large numbers of
compounds for their ability to affect the activity of target proteins. Today,
entire in-house compound libraries with millions of compounds can be
screened with a throughput of 10,000 (HTS) up to 100,000 compounds per
day (ultra high-throughput screening, uHTS) using robust test assays [12, 13].
Homogeneous “mix and measure” assays are preferred for HTS as they avoid
filtration, separation, and wash steps that can be time consuming and difficult
to automate. Assays for HTS can be grouped into two categories: so-called
solution-based biochemical assays and cell-based assays [14, 15]. The former
are based on radioactive (scintillation proximity assay, SPA), fluorescence
(fluorescence resonance energy transfer, FRET, fluorescence polarization, FP,
homogeneous time resolved fluorescence, HTRF, and fluorescence correlation
spectroscopy, FCS), calorimetric and surface plasmon resonance (SPR, e.g.,
BiaCore) detection methods to quantify the interaction of test compounds with
biological target molecules. SPAs in HTS have largely replaced heterogeneous
assays that make use of radiolabeled ligands with subsequent filtration steps to
measure high-affinity binding to receptors. Cell-based assays include (a) second
messenger assays that monitor signal transduction, (b) reporter gene assays
that monitor cellular responses at the transcriptional/translational level, (c) cell
proliferation assays that detect induction or inhibition of cell growth, and
(d) phenotypic assays that monitor change in cell morphology or related
parameters.
Once a robust test assay has been set up, the choice of suitable compound
libraries is the next key step. An excellent source of selective small molecule
probes is the natural product pool. In an evolutionary process of millions of
years, nature has come up with molecular structures that offer an evolutionary
advantage to the species that makes the effort to synthesize these molecules. In
most cases, these molecules are used to defend against enemies or to paralyze
or kill prey. It is in the nature of these processes that such molecular weapons
act most efficiently if they interfere with important biological processes of the
target species, meaning that biologically relevant protein targets are addressed.
A disadvantage of natural compounds is their often complex structure and
the associated low synthetic accessibility. However, as outlined in
Section 7.1.2, natural products were the first small molecule probes
used in biological studies and continue to be of significant importance (vide
infra). Recently, the combination of chemoinformatics, bioinformatics, and the
chemistry of natural products has led to the insight that natural products can
be regarded as evolutionarily selected starting points in chemical space and to
the establishment of “natural product guided compound library development”
[16, 17]. Historically grown libraries of synthetic compounds or compounds
from combinatorial chemistry approaches are usually the first choice in the
pharmaceutical industry for HTS. Every large pharmaceutical company and an
increasing number of startup companies and research institutions now have
access to a collection of these compounds. These collections have been built by
in-house synthetic efforts, purchased from commercial vendors, or obtained
by the synthesis of compound libraries using combinatorial methods [18].
2. Computer-assisted drug design: Small molecule probes can also be identified
or designed from scratch using computational tools exploiting knowledge of
pharmacophores or the protein structure as a guiding principle. Computational
tools encompass 3D-pharmacophore searches and high-throughput docking
[17, 19]. In 3D-database searching, structures of compounds from virtual
or physically existing libraries are screened to identify compounds that
fulfill a certain spatial arrangement of functional groups (a pharmacophore).
High-throughput docking involves the in silico docking of small molecules
into binding sites of target proteins with known or predicted structure.
Empirical scoring functions are used to evaluate the steric and electrostatic
complementarity (the fit) between the compounds and the target protein.
The highest ranked compounds are then suggested for biological testing.
These software tools are attractive and cost-effective approaches to generate
chemical lead structures, virtually and before committing expensive synthetic
chemistry. Furthermore, they allow rapid and thorough understanding of the
relationship between chemical structure and biological function. Depending
on the software used, the virtual screening of small molecules normally takes
less than a minute per chemical structure per computer processor (CPU)
[17]. Utilizing clusters of CPUs results in a high degree of parallelization. The
throughput with 100 parallel CPU machines is even higher compared to current
uHTS technologies. The main advantage is that the method does not depend
on the availability of compounds, meaning that not only in-house libraries can
be searched but also external or virtual libraries. The application of scoring
functions on the resulting data sets facilitates smart decisions about which
chemical structures bear the potential to exhibit the desired biological activity.
On the other hand, the high-throughput docking approach can only be applied
to protein targets for which structural information based on X-ray crystallography,
nuclear magnetic resonance (NMR), or homology models is available.
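The throughput comparison made above can be checked with a short back-of-the-envelope calculation. The sketch below takes the figures quoted in the text (10,000 to 100,000 compounds per day for HTS and uHTS, and at most one minute per structure per CPU for docking) and assumes perfect parallel scaling across CPUs; the function names and the one-million-compound library size are illustrative assumptions, not values from the cited references.

```python
# Back-of-the-envelope throughput comparison. The screening rates are the
# figures quoted in the text; perfect parallel scaling is an assumption.
MINUTES_PER_DAY = 24 * 60

HTS_PER_DAY = 10_000     # compounds per day, HTS (from the text)
UHTS_PER_DAY = 100_000   # compounds per day, uHTS (from the text)


def docking_throughput_per_day(n_cpus: int, minutes_per_structure: float) -> float:
    """Structures docked per day on n_cpus, assuming ideal parallelization."""
    return n_cpus * MINUTES_PER_DAY / minutes_per_structure


def days_to_screen(library_size: int, compounds_per_day: float) -> float:
    """Days needed to work through a library at a constant daily rate."""
    return library_size / compounds_per_day


# Worst case quoted in the text: one full minute per structure per CPU.
virtual_per_day = docking_throughput_per_day(n_cpus=100, minutes_per_structure=1.0)

LIBRARY_SIZE = 1_000_000  # hypothetical library size, for illustration only
for label, rate in [("HTS", HTS_PER_DAY), ("uHTS", UHTS_PER_DAY),
                    ("docking, 100 CPUs", virtual_per_day)]:
    print(f"{label}: {rate:,.0f} per day, "
          f"{days_to_screen(LIBRARY_SIZE, rate):.1f} days per million compounds")
```

Even at a full minute per structure, 100 CPUs process 144,000 structures per day under these assumptions, which is consistent with the statement that such a cluster exceeds current uHTS rates.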
Once a hit compound has been identified, its specificity for the protein target
has to be assessed. Ideally, the small molecule should exhibit perfect selectivity
toward the protein of interest. In reality, it is likely that none of the
small molecule probes used today fulfills this requirement. Compounds that
had previously been thought to be specific have turned out to hit additional protein
targets once they were subjected to screens against other protein targets. In the
light of new technological opportunities, and prompted by the failure of drugs in
clinical trials or practice due to off-target activity, efforts have been initiated to reinvestigate
the biological activity of existing drugs or interesting chemical compounds
and annotate their activity against as many proteins as are available. An example of a
pioneering effort in this direction is the proteomic analysis of the
selectivity of kinase inhibitors by the groups of Meijer, Daub, and Lockhart
[20-23].
As the development of protein assays progresses rapidly and leads to
improvements in quality and quantity of information and a significant increase
in scope of screened protein targets, the door for full annotation of chemical
compounds has been opened. Screening the hit compound against many
protein targets has become imperative for two reasons: First of all, lack of
selectivity might be addressed by preparation of a second generation compound
library using the methods described above, and secondly, if this process does
not lead to further improvement, knowledge about the off-target promiscuity
of a small compound probe will allow a careful and critical interpretation of
the results of the biological studies carried out with this probe (Scheme 7.1-2).
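The annotation of a hit compound against a panel of proteins can be summarized as fold selectivity, that is, the ratio of the potency against each off-target to the potency against the intended target. The following is a minimal sketch of that bookkeeping; the panel names, the IC50 values, and the 100-fold cutoff are hypothetical placeholders chosen for illustration and do not come from the studies cited above.

```python
from typing import Dict, List


def fold_selectivity(ic50_nM: Dict[str, float], target: str) -> Dict[str, float]:
    """Return IC50(off-target) / IC50(target) for every other panel member.

    Larger values mean the compound is more selective for the target.
    """
    target_ic50 = ic50_nM[target]
    return {name: value / target_ic50
            for name, value in ic50_nM.items() if name != target}


def flag_off_targets(ic50_nM: Dict[str, float], target: str,
                     min_fold: float = 100.0) -> List[str]:
    """List panel members hit with less than min_fold selectivity."""
    folds = fold_selectivity(ic50_nM, target)
    return sorted(name for name, fold in folds.items() if fold < min_fold)


# Hypothetical screening panel (IC50 in nM); values are made up.
panel = {"kinase_A": 10.0, "kinase_B": 50.0, "kinase_C": 5000.0, "kinase_D": 900.0}

print(fold_selectivity(panel, "kinase_A"))
print(flag_off_targets(panel, "kinase_A"))
```

A list of flagged off-targets of this kind is what either motivates a second-generation library or, failing that, frames the interpretation of biological results obtained with the probe.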
The small molecule probe that has been selected by the process detailed
above is then used as a tool in a series of biological studies, exploiting the
whole repertoire of modern molecular and cell biology, such as genomic or
proteomic profiling, imaging techniques, or functional readouts [24].
Other techniques that are used for the assignment of gene function involve
the preparation of DNA mutants or gene knockouts, the application of
gene silencing via antisense probes, or RNA interference [25]. As shown
in Scheme 7.1-1, biological systems are probed with these strategies at the
level of genetic information or transcriptional expression. Consequently, the
main advantage of these genetic techniques is the pronounced, in many
cases even absolute, specificity with which they allow the probing of biological
systems (Table 7.1-2). On the other hand, reverse chemical genetics has several
unique advantages complementing these genetic techniques [26, 271:
region [28].
The human genome encodes >500 kinases, many of them playing important
roles in key processes such as cell signaling and cell division. Although
all kinases have an ATP-binding pocket, which qualifies them for small
molecule binding, the structural similarity of these ligand-binding sites
renders specificity almost impossible. Shokat et al. have developed an elegant
approach that allows allele-specific chemical intervention with kinases.
A promiscuous kinase inhibitor was modified with a bulky substituent, which
prevents binding to the regular ATP-binding sites of native kinases. Almost
all kinases carry a hydrophobic residue at the ATP-binding site, which
functions as the “gatekeeper”. Mutational replacement of the gatekeeper
residue by Gly does not affect the regular activity of the kinase, but
opens it to intervention by the bulky inhibitor, which interacts only with sensitized
kinases. Shokat et al. used this technique, for instance, to show that there
are significant phenotypic differences between the rapid loss of activity by
inhibition and the deletion of the genomic copy of the cyclin-dependent kinase
Pho85 [29, 30].
7.1.4
Applications and Practical Examples
To date, 48 nuclear receptors have been identified in the human genome. Each
of these receptors contains the signature DNA-binding and/or ligand-binding
domain (LBD). However, only 12 receptors bind to the classical steroid and
retinoid hormones, and the remaining 36 have been designated as orphan
nuclear receptors. Researchers from GlaxoSmithKline Inc. used HTS of natural
compound and combinatorial chemistry libraries to deorphanize selected
members of the nuclear receptor family [49, 50]. The farnesoid X receptor (FXR)
has been shown to be weakly activated by farnesol. However, this effect is only
indirect since farnesol does not bind to the receptor. Screening of a collection of
naturally occurring steroids revealed that FXR is a receptor for bile acids, with
Fig. 7.1-4 Isotype-selective probes for ERα and ERβ. Reprinted with permission from The
Endocrine Society [58].
Natural killer (NK) cells and cytotoxic T lymphocytes (CTLs) are the primary line
of defense against viruses and other intracellular pathogens in the immune
system. The cytotoxic lymphocytes recognize infected host cells and kill them
with the help of the pore-forming protein perforin and by proteolytic events
carried out by members of the granzyme family of serine proteases. Although
an essential component of immunity under normal conditions, aberrant
cytotoxic lymphocyte activity has been associated with autoimmune disorders
such as rheumatoid arthritis, diabetes, or allograft rejection [65].
Craik and Mahrus applied a reverse chemical genetics approach to reveal
the role of the most important granzymes A and B in cell lysis, as two
classical approaches of cell biology have led to contradictory results: Cytotoxic
lymphocytes from knockout mice (lacking either granzyme A, granzyme B, or
both) behave relatively normally in their ability to lyse target cells. On the other
hand, a reconstituted system in which target cells are treated with sublytic
levels of perforin and either granzyme A or granzyme B leads to efficient cell
lysis. This discord in findings could result from the well-known limitations
of these two approaches: It is known that the results from genetic deletion
studies are obscured by compensation effects of similar genes, whereas in
reconstituted systems the concentrations and mode of delivery of the agents
can be nonphysiological.
Craik and Mahrus used a positional scanning approach to prepare two
isozyme-specific phosphonate inhibitors as affinity labels of granzymes A and
B (Fig. 7.1-5). Both inhibitors were tested against a panel of all known human
granzymes A, B, H, K, and M and only exhibited activity against their target
protein. Use of these activity-based probes in cytotoxicity assays then allowed
dissection of the contribution of granzymes A and B to lysis of target cells by
NK cells. Granzyme B functions as a major effector of target cell lysis, whereas
granzyme A is only a minor effector in the same process. The difference
between the outcome of the reverse chemical genetics approach and the above
mentioned conventional experiments might be a consequence of the fact that
in pharmacological studies high temporal control circumvents compensation,
and also because no alterations are made to the concentrations and mode of
delivery of granzymes and perforin.
Fig. 7.1-5 Isozyme-selective probes for reverse chemical genetics of granzymes A and B: probe A (granzyme A-selective) and probe B (granzyme B-selective).
The observation that the Ras proteins are critically involved in the development
of cancer has spurred substantial interest in developing new classes of
antitumor drugs on the basis of interference with the impaired signal
transducing activities of Ras. The Ras proteins belong to the class of proteins
whose biological activity is dependent on lipid modification. In the normal
and oncogenic state, the H- and N-Ras isoforms are anchored to the plasma
membrane by means of S-farnesylation and S-palmitoylation at their C-
terminus, which are required to exert their full biological activity. While
the enzyme farnesyltransferase is known and has become a drug
target for intervention in tumors carrying a mutation in the Ras oncogene, the
enzyme responsible for the palmitoylation of Ras and other G-proteins has
not been identified so far.
The only known “bona fide player” in Ras palmitoylation was acyl protein
thioesterase 1 (APT1), which depalmitoylates H-Ras and other lipidated
proteins [66]. However, its relevance to Ras biology was unclear. In an attempt
to elucidate the biological role of APT1, the groups of Giannis, Kuhlmann,
and Waldmann followed a chemical genetics approach, that is, developed a
Raspalin 3 (APT1: IC50 = 148 nM)
Fig. 7.1-7 Reduction of PC12 cell differentiation rate by Raspalin in the PC12
differentiation assay.
Fig. 7.1-8 Inhibition of plasma membrane localization of fluorescently labeled Ras
protein by Raspalin 3. Localization of the fluorescent lipoprotein was monitored 7 h
after microinjection by confocal microscopy. Although Ras protein alone shows a distinct
staining of the plasma membrane (a), coinjection of 2 µM inhibitor Raspalin 3
results in an accumulation of the lipoprotein in cytoplasmic structures, which is typical
for nonpalmitoylatable Ras constructs (b).
[Scheme: sildenafil (Viagra™) and the NO/cGMP pathway. Labels: NO, GTP, cGMP, GMP, smooth muscle relaxation, erection.]
7.1.5
Future Developments
MI1 Gly-Cys-Cys-Ser-Asn-Pro-Val-Cys-His-Leu-Glu-His-Ser-
a 6 b 2 Y u3B2
Asn=u-Cys-NH2
AuIA Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-As~-Ser-
a3p4
Asp-Tyr-vs-NHz
AuIC Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-As~-Ser-
u3p4
Gly-Tyr-CT-NHl
PnIA Gly-Cys-Cys-Ser-Leu-Pro-Pro-Cys-Ala-Ala-Asn-Asn-Pro-
- u3B2
Hz
Asp-Tyrl”1-Cys-N
PnI B Gly-Cys-Cys-Ser-Leu-Pro-Pro-Cys-Ala-Leu-Ser-Asn-Pro-
u7
Asp-Tyrlcys-NH2
EPI Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Asn-Met-Asn-Asn-Pro-
~ 3 ~u3B4. 2 . a7
Asp-TyrlGys-NH2
AnIA Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Ala-Asn-Asn-Gln-Asp-
- a3p2
TyrIal-Cys-NHl
AnlB Gly-G~Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Al~-Asn-Asn-
- a3B2
Gln-Asp-Tyr[”l-Cys-NHz
AnIC Gly-Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Phe-Ala-Ser-As~.
- u3P2
Pro-Asp-Tyrl”I-Cys-NH2
GIC Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Gly-As~-Asn-Gln-
u3b2 (~6B283
His-Ile-CGNHz
GID Ile-Arg-~p-Gla~’~-Cys-Cys-Ser-Asn-Pro-Ala-Cys-Arg-Val-
w3P2 2 (u7
Asn-Asn-Hyp-His-Val-Cys
VCl.1 Gly-Cys-Cys-Ser-Asp-P~Arg-Cys-Asn-Tyr-Asp-His-Pro-u3B4
G lu-He-CTNH 2
PIA Arg-Asp-Pro-Cys-Cys-Ser-Asn-Pro-Val-Cys-Thr-Val-His-
a 6 l a 382B3
Asn-Pro-Glu-Ile-Cys-NH2
AuIB Gly-Cys-Cys-Ser-~-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Pro-a3b4
ASP-CYS-NH~
ImI Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Ala-Trp-Arg-Cys-NHl
u7
lmI1 a7
Ala-Cys-Cys-Ser-Asp- Arg-Arg-Cys- Arg-Trp- Arg-qs-N Hz
- - n.d.(not
ImIII Tyr-Cys-Cys-His-Arg-Gly-Pro-Cys-Met-Val-Trp-C>-NHl
determined)
BuIA Gly-Cys-Cys-Ser-Thr-Pro-Pro-Cys-Ala-Val-Leu-Tyr-Cys-
- - a6lu3B2 Y
NH2 a6lu3p4
We think that the following developments will shape the future of the field to
a major extent:
1. The completion of the sequencing of the human genome
has provided a global map of the potential landscape of
7.1.6
Conclusion
Acknowledgments
References
7.2
Chemical Biology and Enzymology: Protein Phosphorylation as a Case Study
Philip A. Cole
Outlook
7.2.1
Overview
In 1994, the method of native chemical ligation was developed, which allows
for the efficient linking of large peptide segments with amide bonds [7].
The native chemical ligation strategy is based on Wieland’s chemoselective
reaction between an N-terminal Cys of one peptide and a C-terminal thioester
of another. This methodology was subsequently expanded in 1996 to use
in protein semisynthesis by generating N-terminal cysteines in recombinant
protein fragments via proteolysis [8]. An even more practical advance was
achieved when recombinant protein fragments containing thioesters were
generated by exploiting nature’s inteins [9, 101. These thioesters can be linked
to N-terminal cysteine containing peptides in a process that has been called
expressed protein ligation (EPL) (Fig. 7.2-1). This technology has been particularly
useful in the study of enzyme recognition, mechanism, and regulation. EPL
is most efficiently applied when the region of the protein under study is near
the C-terminus such that chemical modification can be introduced within the
N-terminal cysteine containing synthetic peptide.
7.2.2
The Enzymology of Posttranslational Modifications of Proteins
[Scheme: protein kinase-catalyzed conversion of ROH to ROPO3(2-).]
Fig. 7.2-4 Phosphorylated amino acid residues and genetically encoded mimics (structures labeled Ser, Thr, Tyr, Ala, Phe).
slightly different from an ester linkage, they are fairly close approximations.
The relative merits of fluoro versus hydrogen substitution in the bridging
methylene have also been described [21]. While the CF2 is slightly larger than
CH2 and sterically bulkier than a single oxygen atom, CF2, like oxygen, has the
potential to be a hydrogen bond acceptor via the fluorine lone pairs. Perhaps
more importantly, it confers a more physiologic pKa for the nonbridging
phosphate oxygens, encouraging the dianionic form at neutral pH. From a
practical perspective, the CF2 group can be exploited as a specific and sensitive
probe in NMR studies, although this has not been performed routinely.
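The effect of the second pKa on the charge state at neutral pH follows from the Henderson-Hasselbalch relation. The sketch below treats only the monoanion/dianion equilibrium and uses assumed, purely illustrative pKa2 values for a phosphate ester, a CH2-phosphonate, and a CF2-phosphonate; these numbers are placeholders that show the trend described above, not measured values from reference [21].

```python
def dianion_fraction(pka2: float, ph: float) -> float:
    """Fraction of the dianionic form for a monoanion/dianion equilibrium.

    Henderson-Hasselbalch: [dianion]/[monoanion] = 10 ** (pH - pKa2).
    The first ionization is assumed to be complete at the pH of interest.
    """
    return 1.0 / (1.0 + 10.0 ** (pka2 - ph))


# Assumed, illustrative second pKa values (not taken from the chapter).
ASSUMED_PKA2 = {
    "phosphate ester": 6.0,
    "CH2-phosphonate": 7.5,
    "CF2-phosphonate": 5.5,
}

PH = 7.0
for name, pka2 in ASSUMED_PKA2.items():
    print(f"{name}: {dianion_fraction(pka2, PH):.2f} dianionic at pH {PH}")
```

With these assumed values the CH2 analog is mostly monoanionic at pH 7, whereas the CF2 analog, like the phosphate ester, is predominantly dianionic, which is the sense in which fluorination is said to give a more physiologic pKa.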
Early work on the use of phenylalanine phosphonates in synthetic peptides
as SH2 domain ligands and phosphotyrosine phosphatase inhibitors proved
the efficacy of these agents in medicinal chemistry [20, 22]. Incorporation of
phosphonomethylene alanine (Pma) and phosphonomethylene phenylalanine
(Pmp) using nonsense-mediated suppression has also been shown to be
feasible using in vitro translation [5], but this has not been used for practical
applications, perhaps because of scale-up challenges. Pma and Pmp have not
yet been used in vivo in nonsense suppression, presumably because of the
limited cell permeability of the amino acids.
Protein semisynthesis and, in particular, EPL can provide a straightforward
route to phosphonate incorporation. Indeed, these techniques prove valuable
for site-specific incorporation of the standard phosphoamino acids, which have
been effectively used in structural and enzymatic analyses [9, 23]. EPL is most
efficiently used when the phosphate modification is within 50 amino acids of
the C-terminus of the desired protein or protein fragment. The next simplest
case for protein semisynthesis occurs when the modification of interest is near
the N-terminus and is installed in a C-terminal thioester containing peptide.
Because of the somewhat more challenging task of preparing complex peptides
carrying thioesters, this strategy can be a bit more cumbersome than EPL.
However, phosphonates have now been incorporated using both strategies
and in the following text, we will describe applications of these approaches in
investigations on PTPs and serotonin N-acetyltransferase.
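The rule of thumb in this paragraph can be written down as a small decision helper. This is only a sketch: the 50-residue window for the C-terminal case comes from the text, whereas applying the same window to the N-terminal case, the function name, and the protein length used in the example are assumptions made for illustration.

```python
def suggest_semisynthesis_strategy(protein_length: int, site: int,
                                   window: int = 50) -> str:
    """Suggest a semisynthesis route for a modification at residue `site`.

    `site` is 1-indexed. The 50-residue window for EPL is the figure given in
    the text; reusing it for the N-terminal case is an assumption.
    """
    if not 1 <= site <= protein_length:
        raise ValueError("site must lie within the protein sequence")
    residues_from_c_terminus = protein_length - site + 1
    if residues_from_c_terminus <= window:
        # Modification sits in the synthetic N-terminal Cys peptide (EPL).
        return "EPL"
    if site <= window:
        # Modification sits in a synthetic C-terminal thioester peptide.
        return "N-terminal thioester peptide"
    return "internal site: neither simple strategy applies directly"


# Example with an assumed protein length of 207 residues (hypothetical here).
print(suggest_semisynthesis_strategy(207, 205))
print(suggest_semisynthesis_strategy(207, 32))
```

For a site near both termini of a short protein the helper prefers EPL, mirroring the statement that the thioester-peptide route is somewhat more cumbersome.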
Fig. 7.2-6 Domain architecture of protein tyrosine phosphatases SHP-1 and SHP-2. The
highlighted tyrosine residues are modified by protein tyrosine kinases.
[Figure: unphosphorylated SHP-2 and its 542-phosphorylated and 580-phosphorylated forms generated by a protein tyrosine kinase. Labels: Y-542, Y-580, pY-542, pY-580, C-SH2, PTPase.]
[Figure: conversion of L-tryptophan by tryptophan hydroxylase and aromatic amino acid decarboxylase to serotonin (5-hydroxytryptamine), followed by N-acetyl-serotonin and melatonin.]
Fig. 7.2-9 Proposed model for the regulation of serotonin N-acetyltransferase (AANAT)
by phosphorylation. Panel labels: "Destruction", dimer, "Protection".
Pma-32 and PhosThr32 AANAT proteins showed strong (and similar) affinity
for 14-3-3, whereas the Ala and Glu AANAT proteins showed
minimal binding to 14-3-3 under these conditions [32]. Likewise, F2Pma-205
and PhosSer205 AANAT showed similar 14-3-3 binding affinity to each other
but enhanced 14-3-3 affinity compared to Ser205 AANAT.
The stabilities of semisynthetic AANATs were explored in Chinese hamster
ovary (CHO) cells using microinjection methods [32, 33]. This cell type, while
not identical to the natural pinealocytes, has been shown to recapitulate many
of the features of AANAT regulation and has, thus, been used as a model system
[34]. Immunocytochemistry showed that nonphosphorylated AANAT injected
into CHO cells is readily observed minutes after microinjection but has mostly
disappeared by 1 h [32]. Stabilities were low and similar for PhosThr32 and Glu32
containing AANATs. Strikingly, Pma-32 AANAT is greatly stabilized compared
to each of these other proteins, indicating a direct role for this phosphorylation
event in stimulating melatonin production [32]. It is noteworthy that PhosThr32
AANAT showed diminished stability compared to Pma-32 AANAT, and this
suggests that phosphatases play a critical role in rapidly reversing the effects
of cellular phosphorylation. The importance of 14-3-3 in contributing to the
AANAT regulation was revealed by demonstrating that PhosThr32 AANAT but
not Glu32 AANAT was significantly stabilized by concomitant microinjection
with the 14-3-3 adaptor protein [32]. Related findings were demonstrated in
the case of Ser205-modified protein comparing F2Pma and Ser205 AANAT
stability [33]. Thus, phosphonate analogs have been effectively utilized to clarify
the basis of AANAT and melatonin regulation.
vivo. They can be used to aid in structural studies and other biophysical analyses.
Numerous natural products and synthetic scaffolds have been employed for
this purpose [35]. Most efforts that have led to potent protein kinase inhibitors
have exploited the ATP-binding site [35]. The advantage of this site is that it is
relatively hydrophobic, deep, and contains hydrogen bond donors/acceptors,
which allow for enhanced affinity. Molecules that target the ATP site are often
cell permeable and can show favorable pharmacokinetic properties. However,
ATP binding is relatively conserved among protein kinases, making specificity
difficult to achieve.
Because protein kinases, by definition, always must bind a protein substrate
prior to phosphorylation, compounds that disrupt this interaction would also
be useful kinase inhibitors. The advantage of protein substrate sites is that
they often display relatively specific interactions with their individual tar-
gets, necessary for achieving their precise biological functions [36]. However,
the kinase interactions with protein targets are often of modest affinity,
reflecting the shallow interaction surfaces involved. Aside from a few notable
exceptions often inspired by naturally occurring protein kinase inhibitor
peptide sequences [37], protein substrate site inhibitors have not yet proved to
be highly efficacious.
An approach to inhibitors that have the potential to improve both potency
and specificity involves the covalent linking of nucleotide and peptide site
ligands. Often termed bisubstrate analogs, these compounds can, in principle,
achieve binding energies that are equal to or greater than the sum of the
binding energies of the individual ligands [38]. In the case of protein kinases,
much of the potency can be expected to be derived from the nucleotide-
binding site, whereas the specificity should relate to the more divergent
protein substrate-binding site. A critical element in the design of such protein
kinase-bisubstrate analog inhibitors relates to the choice of the linker. To
underscore this point, an early effort to prepare a potent protein kinase A
bisubstrate inhibitor resulted in a relatively weak compound [39]. In this
design, the consensus peptide substrate kemptide was directly linked via its
Ser oxygen to the γ-phosphate of ATP generating 1 (Fig. 7.2-10). Bisubstrate
analog 1 showed an approximate Ki of 125 µM and was slightly weaker in
affinity than ATP itself [39].
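The statement that a bisubstrate analog can reach the sum of the binding energies of its two parts can be made concrete with the standard relation ΔG° = RT ln Kd (Kd relative to a 1 M standard state). The sketch below converts the reported Ki of 125 µM for analog 1 into a free energy and shows what simple additivity would predict for two hypothetical fragments; the fragment affinities (100 µM and 1 mM) and the temperature are assumptions for illustration and are not values from the chapter.

```python
import math

R = 8.314       # gas constant, J mol^-1 K^-1
T = 298.15      # assumed temperature, K


def binding_free_energy(kd_molar: float) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant in M."""
    return R * T * math.log(kd_molar) / 1000.0


def kd_from_free_energy(dg_kj_per_mol: float) -> float:
    """Inverse of binding_free_energy: dissociation constant in M."""
    return math.exp(dg_kj_per_mol * 1000.0 / (R * T))


def additive_bisubstrate_kd(kd_a: float, kd_b: float) -> float:
    """Kd expected if the binding energies of two linked ligands simply add."""
    return kd_from_free_energy(binding_free_energy(kd_a) + binding_free_energy(kd_b))


# Reported value for bisubstrate analog 1 (Ki treated here like a Kd).
dg_analog_1 = binding_free_energy(125e-6)
print(f"analog 1: {dg_analog_1:.1f} kJ/mol")

# Hypothetical fragments: nucleotide-site ligand 100 µM, peptide-site ligand 1 mM.
ideal_kd = additive_bisubstrate_kd(100e-6, 1e-3)
print(f"additive prediction: {ideal_kd * 1e9:.0f} nM")
```

Because free energies add while dissociation constants multiply, two modest fragments could in principle give a nanomolar inhibitor under this idealized assumption; a compound in the 100 µM range therefore signals that the linker prevents both sites from being occupied productively.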
[Figure: bisubstrate analogs 1 and 2. Compound 1: R1 = NH2-Leu-Arg-Arg-Ala-, R2 = -Leu-Gly-CO2H. Compound 2: R1 = AcNH-Lys-Lys-Lys-Leu-Pro-Ala-Thr-Gly-Asp-, R2 = -Met-Asn-Met-Ser-Pro-Val-Gly-Asp-CO2H.]
Fig. 7.2-12 Bisubstrate analog inhibitors of the insulin receptor kinase with varying
linkers. R1 = AcNH-Lys-Lys-Lys-Leu-Pro-Ala-Thr-Gly-Asp-; R2 = -Met-Asn-Met-Ser-Pro-Val-Gly-Asp-CO2H.
Fig. 7.2-13 Synthetic scheme for the generation of a protein kinase A selective
bisubstrate analog inhibitor based on a dissociative transition state. Reagents: 1. (Ph3P)4Pd(0); 2. Et2NCS2H, Et3N. R1 = AcNH-Leu-Arg-Arg-Ala-; R2 = -Leu-Gly-CO2H; R3 = AcNH-Leu-Arg(Pmc)-Arg(Pmc)-Ala-; R4 = -Leu-Gly-CO2-Wang resin.
References
1. L.N. Johnson, D.C. Phillips, Nature 1965, 206, 761-763.
2. C.T. Walsh, Enzymatic Reaction Mechanisms, W.H. Freeman, 1978, New York, NY.
3. G. Winter, A.R. Fersht, A.J. Wilkinson, M. Zoller, M. Smith, Nature 1982, 299, 756-758.
4. T.W. Muir, S.B. Kent, Curr. Opin. Biotechnol. 1993, 4, 420-427.
5. L. Wang, P.G. Schultz, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
6. C.J. Wallace, Curr. Opin. Biotechnol. 1995, 6, 403-410.
7. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B. Kent, Science 1994, 266, 776-779.
8. D.A. Erlanson, M. Chytil, G.L. Verdine, Chem. Biol. 1996, 3, 981-991.
9. T.W. Muir, D. Sondhi, P.A. Cole, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6705-6710.
10. T.C. Evans Jr, J. Benner, M.Q. Xu, Protein Sci. 1998, 7, 2256-2264.
11. C.T. Walsh, Posttranslational Modification of Proteins: Expanding Nature’s Inventory, Roberts & Co, 2005, Greenwood Village, CO.
12. G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, Science 2002, 298, 1912-1934.
13. A. Alonso, J. Sasin, N. Bottini, I. Friedberg, A. Osterman, A. Godzik, T. Hunter, J. Dixon, T. Mustelin, Cell 2004, 117, 699-711.
14. K.M. Shokat, Chem. Biol. 1995, 2, 509-514.
15. M.A. Shogren-Knaak, P.J. Alaimo, K.M. Shokat, Annu. Rev. Cell Dev. Biol. 2001, 17, 405-433.
16. S.A. Johnson, T. Hunter, Nat. Methods 2005, 2, 17-25.
17. D.M. Williams, P.A. Cole, Trends Biochem. Sci. 2001, 26, 271-273.
18. P.A. Cole, A.D. Courtney, K. Shen, Z. Zhang, Y. Qiao, W. Lu, D.M. Williams, Acc. Chem. Res. 2003, 36, 444-452.
19. D. Wang, P.A. Cole, J. Am. Chem. Soc. 2001, 123, 8883-8887.
20. S.M. Domchek, K.R. Auger, S. Chatterjee, T.R. Burke Jr, S.E. Shoelson, Biochemistry 1992, 31, 9865-9870.
21. L. Chen, L. Wu, A. Otaka, M.S. Smyth, P.P. Roller, T.R. Burke Jr, J. den Hertog, Z.Y. Zhang, Biochem. Biophys. Res. Commun. 1995, 216, 976-984.
22. T.R. Burke Jr, Z.J. Yao, D.G. Liu, J. Voigt, Y. Gao, Biopolymers 2001, 60, 32-44.
23. J.W. Wu, M. Hu, J. Chai, J. Seoane, M. Huse, C. Li, D.J. Rigotti, S. Kyin, T.W. Muir, R. Fairman, J. Massague, Y. Shi, Mol. Cell. 2001, 8, 1277-1289.
24. W. Lu, D. Gong, D. Bar-Sagi, P.A. Cole, Mol. Cell. 2001, 8, 759-769.
25. H. Cho, R. Krishnaraj, M. Itoh, E. Kitas, W. Bannwarth, H. Saito, C.T. Walsh, Protein Sci. 1993, 2, 977-984.
26. B.G. Neel, H. Gu, L. Pao, Trends Biochem. Sci. 2003, 28, 284-293.
27. W. Lu, K. Shen, P.A. Cole, Biochemistry 2003, 42, 5461-5468.
28. Z. Zhang, K. Shen, W. Lu, P.A. Cole, J. Biol. Chem. 2003, 278, 4668-4674.
29. T. Araki, H. Nawa, B.G. Neel, J. Biol. Chem. 2003, 278, 41677-41684.
30. S. Ganguly, S.L. Coon, D.C. Klein, Cell Tissue Res. 2002, 309, 127-137.
31. S. Ganguly, J.L. Weller, A. Ho, P. Chemineau, B. Malpaux, D.C. Klein, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 1222-1227.
32. W. Zheng, Z. Zhang, S. Ganguly, J.L. Weller, D.C. Klein, P.A. Cole, Nat. Struct. Biol. 2003, 10, 1054-1057.
33. W. Zheng, D. Schwarzer, A. LeBeau, J.L. Weller, D.C. Klein, P.A. Cole, J. Biol. Chem. 2005, 280, 10462-10467.
34. G. Ferry, J. Mozo, C. Ubeaud, S. Berger, M. Bertrand, A. Try, P. Beauverger, C. Mesangeau, P. Delagrange, J.A. Boutin, Cell. Mol. Life Sci. 2002, 59, 1395-1405.
35. P. Cohen, Nat. Rev. Drug Discov. 2002, 1, 309-315.
36. D.S. Lawrence, J. Niu, Pharmacol. Ther. 1998, 77, 81-114.
37. J.H. Lee, S.K. Nandy, D.S. Lawrence, J. Am. Chem. Soc. 2004, 126, 3394-3395.
38. K. Parang, P.A. Cole, Pharmacol. Ther. 2002, 93, 145-157.
39. D. Medzihradszky, S.L. Chen, G.L. Kenyon, B.W. Gibson, J. Am. Chem. Soc. 1994, 116, 9413-9419.
40. A.S. Mildvan, Proteins 1997, 29, 401-416.
41. K. Parang, J.H. Till, A.J. Ablooglu, R.A. Kohanski, S.R. Hubbard, P.A. Cole, Nat. Struct. Biol. 2001, 8, 37-41.
42. A.C. Hines, K. Parang, R.A. Kohanski, S.R. Hubbard, P.A. Cole, Bioorg. Chem. 2005, 33, 285-297.
43. S.R. Hubbard, EMBO J. 1997, 16, 5572-5581.
44. A.C. Hines, P.A. Cole, Bioorg. Med. Chem. Lett. 2004, 14, 2951-2954.
45. P.A. Cole, K. Shen, Y. Qiao, D. Wang, Curr. Opin. Chem. Biol. 2003, 7, 580-585.
46. K. Shen, P.A. Cole, J. Am. Chem. Soc. 2003, 125, 16172-16173.
47. T. Obsil, R. Ghirlando, D.C. Klein, S. Ganguly, F. Dyda, Cell 2001, 105, 257-267.
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
7.3
Chemical Strategies for Activity-based Proteomics
Outlook
7.3.1
Introduction
order functional proteomics methods, a chemical proteomic strategy referred
7 Reverse Chemical Genetics Revisited
7.3.2
History/Development
[Figure labels: Genomics (microarrays); Proteomics (MudPIT); ABPP (chemical probes)]
Fig. 7.3-1 Overview of genomic and proteomic methods. Standard genomic and
proteomic approaches measure changes in mRNA and protein abundance, respectively.
In contrast, activity-based protein profiling (ABPP) applies active site-directed
chemical probes to measure dynamics in enzyme activities, directly in the context
of whole proteomes and living systems.
7.3.3
General Considerations
[Figure labels: Protease gene; Transcription; Protease mRNA; Translation; Inactive
zymogen; Secretion; Activation; Active protease; Endogenous inhibitors;
Degradation; ECM]
of a broad range of enzymes from a particular enzyme class (or classes), and
(b) one or more chemical tags, such as biotin and/or a fluorophore, for the
consolidated detection and isolation of probe-labeled enzymes from complex
proteomes. The RG elements of moderate reactivity and electrophilicity were
selected, thereby priming them to preferentially modify enzyme active sites
that offer a binding pocket enriched in nucleophilic residues important for
catalysis. Finally, in certain cases a third structural element may also be
introduced into probe design in the form of a binding group (BG) intended
to direct RGs to different enzyme active sites present in the proteome.
nondirected ABPP, which utilizes probes that, unlike directed reagents, lack
well-established selectivity for a given class of enzymes. Screening libraries of
probes against individual proteomes also provided a complementary method to
detect specifically labeled proteins, which were expected to show selectivity for
a select number of probes on the basis of the structure of their respective BGs
and should therefore be discernible from proteins that reacted indiscriminately
(i.e., nonspecifically) with the probe library.
The utility of nondirected methods for ABPP was initially demonstrated
with a modest-sized library of sulfonate ester (SE) probes bearing varying
alkyl/aryl BGs that was generated and screened against a collection of tissue
and cell line proteomes [43,44]. The SE-group was selected as the library’s
RG based on a general survey of the literature, which revealed that a large
range of enzyme classes, including proteases, kinases, and phosphatases,
are susceptible to covalent inactivation by natural products and/or synthetic
inhibitors that possess carbon electrophiles. Accordingly, it was hypothesized
that ABPP probes incorporating a carbon electrophile RG may prove capable
of profiling enzymes not only within but also across mechanistically distinct
classes. Consistent with this premise, several heat-sensitive protein targets of
the sulfonate library were identified and found to represent members of at least
nine different enzyme classes (Table 7.3-1). Interestingly, each enzyme target
displayed a unique reactivity profile with the SE probe library, indicating
that the structure of the variable BG strongly influenced probe-protein
interactions. Several lines of evidence supported that the sulfonate probes
labeled the active sites of their enzyme targets. For example, the addition
of cofactors and/or substrates was found to inhibit the labeling of several
enzymes, while the reactivity of others was either positively or negatively
affected by known allosteric regulators of catalytic activity [43, 44]. Notably,
for one enzyme target, aldehyde dehydrogenase-1 (ALDH-1), sulfonate probes
were shown to act as time-dependent inactivators of catalytic activity [43, 44].
Finally, advanced LC-MS platforms for ABPP have revealed that, in nearly
all cases, SE probes label their enzyme targets on conserved active site
residues [27].
While these original studies demonstrated that nondirected strategies can in
fact deliver bona fide activity-based probes for enzyme families not yet accessible
by directed methods, one major drawback remained: the limited structural
diversity of the SE library, a factor proposed to be responsible for the modest
differences in the proteome reactivity profiles observed for these probes. To
test the hypothesis that exploring further proteome space would require a
more structurally diverse library of electrophilic agents, one such library was
developed in which an α-chloroacetamide (α-CA) RG was coupled to a variable
dipeptide BG that would enable the intrinsic diversity of amino acid functional
groups to be exploited for probe binding to additional enzyme families [45]. In
addition to its tempered electrophilicity (stable under many synthetic chemistry
conditions), the α-CA group is small in size, therefore limiting the likelihood
7.3.4
Applications and Practical Examples
survival of several human parasites, the specific roles played by these enzymes
during the complex life cycle of P. falciparum remain ill defined. ABPP of
P. falciparum proteomes isolated at various stages of the parasite life cycle
identified a specific cysteine protease, falcipain 1, that was upregulated during
the invasive merozoite stage of growth. Falcipain 1-selective inhibitors were
then identified by screening epoxide-based chemical libraries for compounds
that blocked probe labeling of this enzyme in complex proteomes. These
inhibitors were subsequently demonstrated to inhibit parasite invasion of host
erythrocytes, with no detectable effect on other parasite processes (as opposed
to the general papain family protease inhibitor, E-64, which produced multiple
aberrations and, ultimately, developmental arrest). Importantly, this ABPP
analysis of falcipain 1 function and inhibition was carried out directly in whole
parasite lysates, circumventing the need for technically difficult gene ablation
experiments and/or recombinant enzyme expressions that often serve as the
basis for such studies.
Fig. 7.3-5 Inhibitor discovery by ABPP. The potency and selectivity of inhibitors
can be profiled in parallel by performing competitive ABPP reactions in proteomes.
Complex proteomes are treated with a reversible inhibitor library and an
activity-based probe, and subsequently analyzed to identify enzymes sensitive to
individual inhibitors (reflected by a reduction in intensity of probe labeling).
Active enzymes are denoted by open/unshaded active sites, with their
inhibitor-bound counterparts shaded in color.
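The competitive readout described in this caption reduces to simple arithmetic: the probe-labeling intensity of each enzyme in an inhibitor-treated proteome is divided by its intensity in an untreated control, and enzymes whose residual labeling falls below a cutoff are flagged as sensitive to that inhibitor. The Python sketch below illustrates the calculation under stated assumptions; the enzyme names, intensity values, function names, and the 50% cutoff are hypothetical and are not taken from the studies cited in this chapter.

```python
def residual_labeling(control, treated):
    """Fraction of probe labeling that remains for each enzyme.

    control, treated: dicts mapping enzyme name -> band/peak intensity.
    Enzymes with no measurable control signal are skipped.
    """
    residual = {}
    for enzyme, ctrl_intensity in control.items():
        if ctrl_intensity <= 0:
            continue  # cannot normalize without a control signal
        residual[enzyme] = treated.get(enzyme, 0.0) / ctrl_intensity
    return residual


def sensitive_targets(control, treated, cutoff=0.5):
    """Enzymes whose labeling drops below `cutoff` (hypothetical threshold)."""
    residual = residual_labeling(control, treated)
    return sorted(e for e, r in residual.items() if r < cutoff)


# Hypothetical intensities (arbitrary units) for three probe-labeled enzymes.
control = {"enzyme_A": 1000.0, "enzyme_B": 800.0, "enzyme_C": 500.0}
treated = {"enzyme_A": 100.0, "enzyme_B": 760.0, "enzyme_C": 450.0}

print(residual_labeling(control, treated))
print(sensitive_targets(control, treated))
```

Repeating the calculation over an inhibitor library gives one residual-labeling profile per compound, so potency (how far labeling drops) and selectivity (how many enzymes drop) can be read from the same table.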
tightly regulating their activity within the cell, including spatial and temporal
expression, binding to small-molecule or protein cofactors, and posttranslational
modification. Furthermore, since the physical disruption of cells and
tissues may alter the concentrations of endogenous activators/inactivators of
enzymes, as well as their respective subcellular distributions, in vitro proteomic
preparations can only, at best, approximate the dynamic functional state of
proteins within the physiologically relevant environment of the living cell or
organism.
A general method for performing ABPP in vivo required that this strategy
be transformed into a “tag-free” method, as most reporter groups (e.g., biotin
and fluorophores) impair the cell permeability and distribution of probes. To
address this issue, bio-orthogonal chemical reactions were sought to enable
ligation of reporter tags onto proteins after covalent labeling by ABPP probes.
In one example, conjugation of the reporter group to the probe following
proteome labeling was accomplished by engineering into these reagents a pair
of biologically inert coupling partners, the alkyne and azide, which can react
to form a stable triazole product via the Huisgen 1,3-dipolar cycloaddition
reaction [51, 52]. The key to the success of this strategy was the recent
description by Sharpless and colleagues of a Cu(I)-catalyzed, stepwise version
of the azide-alkyne cycloaddition reaction, which can be carried out under
mild conditions to produce high yields of product in short reaction times (“click
chemistry” [53]). Click chemistry-based ABPP has been applied to living cells
and organisms, leading to the discovery of enzymes that are selectively labeled
in vivo but not in vitro [52]. A second bio-orthogonal reaction, the Staudinger
ligation, has also been applied to profile proteasomal subunits labeled in
situ with azide-modified probes [37]. Collectively, these studies emphasize
the importance of performing ABPP in vivo and underscore the value of
bio-orthogonal chemical reactions to achieve this goal.
7.3.5
Future Development
obvious.
Finally, as the proteome coverage of ABPP continues to grow, it is
becoming clear that this strategy would benefit from improved methods for
the qualitative and quantitative analysis of probe-labeled samples. Currently,
most probe-labeled proteomes are analyzed by 1DE or 2DE, which exhibit
limited resolving power, especially for large protein families with members
of similar molecular mass. Future efforts to merge ABPP with gel-free (e.g.,
LC-MS [27], capillary electrophoresis [28]) proteomic platforms may provide a
complementary strategy for resolving large numbers of probe-labeled enzyme
activities. The enhanced resolution offered by gel-free methods may permit the
multiplexing of ABPP probes, such that proteomes of limited quantity could
be analyzed simultaneously with a collection of probes. Adapting ABPP for
direct LC-MS analysis should also permit comparative quantitation of probe-
labeled proteomes by isotope-coded mass tagging [11]. Still, it is important to
emphasize that, although such LC-MS platforms will surely exhibit superior
resolving power compared to 1DE gel-based methods for analyzing probe-
labeled proteomes, the 1DE approach does possess the advantage of exhibiting
much higher throughput (i.e., dozens of proteomes can be compared on
a single gel). Thus, the choice of whether to employ gel-based or gel-free
strategies (or both) for the analysis of ABPP experiments will likely depend on
the scientific problem under examination, with the former strategy being more
suitable for the rapid comparison of large numbers of proteomes and the latter
approach being superior for the in-depth analysis of a restricted set of samples.
In either case, continued efforts to advance both the chemical and technical
components of ABPP should foster the development of an increasingly robust
and sensitive platform for the functional analysis of both the proteome and its
individual constituents.
7.3.6
Conclusions
The field of proteomics aims to develop new tools and methods for the
functional characterization of proteins on a global scale. The daunting size and
diversity of eukaryotic proteomes, however, have inspired efforts to approach
this goal by developing technologies that address the proteome as tractable
functional units, that is, the profiling of activity state of specific enzyme classes.
In this chapter, we have attempted to illustrate how ABPP offers a powerful
strategy to directly access higher order biological information to assist in
elucidating the function of proteins in complex cell and organismal systems.
Ultimately, the general and systematic application of ABPP will likely require
the advent of integrated platforms for the design, synthesis, and analysis of
chemical probes that target a large diversity of enzyme classes. However,
as outlined here, the success of ABPP studies carried out thus far suggests
that this goal may in fact be attainable. This is highlighted by the impressive
number of enzyme classes for which activity-based probes have already been
developed as a result of both directed and nondirected approaches, as well as
the insights that have been gained by applying ABPP to complex biological
systems, ranging from cancer cells and tumors to invasive malarial parasites
to mouse models of obesity.
More broadly, this chapter has attempted to emphasize the potential of ABPP
to identify new diagnostic markers and therapeutic targets for human disease.
Through the integration of the comparative and competitive profiling platforms
that have been described here, ABPP provides a powerful new avenue for
the parallel discovery of disease-associated enzymes (target discovery) and
chemical inhibitors thereof (inhibitor discovery), thus complementing the
studies being carried out within other realms of chemical biology, as well
as providing valuable tools and insight that can be beneficial across multiple
disciplines, extending from the lab to the clinic. Indeed, it has been recently
stated that chemical biology, as a whole, has as one of its grand challenges the
charge of identifying small-molecule modulators for each individual function
of all human proteins [58], which would address the large gap that currently
exists between basic and clinical research. We anticipate that ABPP will play
an important role in achieving this goal.
Acknowledgments
The authors would like to acknowledge the support of the National Institutes of
Health [CA087660 (B.F.C.)], the California Breast Cancer Research Foundation
(N.J. and B.F.C.), and the Skaggs Institute for Chemical Biology.
References
1. P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 1999, 21, 33.
2. S.D. Patterson, R. Aebersold, Proteomics: the first decade and beyond, Nat. Genet. 2003, 33, 311.
3. B. Kobe, B.E. Kemp, Active site-directed protein regulation, Nature 1999, 402, 373.
4. Y. Liu, M.P. Patricelli, B.F. Cravatt, Activity-based protein profiling: the serine hydrolases, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 14694.
5. N. Jessani, B.F. Cravatt, The development and application of methods for activity-based protein profiling, Curr. Opin. Chem. Biol. 2004, 8, 54.
6. L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, S.H. Friend, Gene expression profiling predicts clinical outcome of breast cancer, Nature 2002, 415, 530.
7. R.A. Heller, M. Schena, A. Chai, D. Shalon, T. Bedilion, J. Gilmore, D.E. Woolley, R.W. Davis, Discovery and analysis of inflammatory disease-related genes using cDNA microarrays, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 2150.
8. T. Kodadek, Protein microarrays: prospects and problems, Chem. Biol. 2001, 8, 105.
9. W.F. Patton, B. Schulenberg, T.H. Steinberg, Two-dimensional electrophoresis: better than a poke in the ICAT? Curr. Opin. Biotechnol. 2002, 13, 321.
10. V. Santoni, M. Molloy, T. Rabilloud, Membrane proteins and proteomics: un amour impossible? Electrophoresis 2000, 21, 1054.
11. S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol. 1999, 17, 994.
12. M.P. Washburn, D. Wolters, J.R. Yates III, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol. 2001, 19, 242.
13. D.K. Han, J. Eng, H. Zhou, R. Aebersold, Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry, Nat. Biotechnol. 2001, 19, 946.
14. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki, A comprehensive two-hybrid analysis to explore the yeast protein interactome, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 4569.
15. A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, G. Superti-Furga, Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature 2002, 415, 141.
16. Y. Ho, A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, M. Tyers, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, Nature 2002, 415, 180.
17. G. MacBeath, S. Schreiber, Printing proteins as microarrays for high-throughput function determination, Science 2000, 289, 1760.
18. H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone, N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, T. Mitchell, P. Miller, R.A. Dean, M. Gerstein, M. Snyder, Global analysis of protein activities using proteome chips, Science 2001, 293, 2101.
19. D. Kidd, Y. Liu, B.F. Cravatt, Profiling serine hydrolase activities in complex proteomes, Biochemistry 2001, 40, 4005.
20. N. Jessani, Y. Liu, M. Humphrey, B.F. Cravatt, Enzyme activity profiles of the secreted and membrane proteome that depict cancer invasiveness, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 10335.
21. Y.A. DeClerck, S. Imren, A.M.P. Montgomery, B.M. Mueller, R.A. Reisfeld, W.E. Laug, Proteases and protease inhibitors in tumor progression, Adv. Exp. Med. Biol. 1997, 425, 239.
22. M. Huse, J. Kuriyan, The conformational plasticity of protein kinases, Cell 2002, 109, 275.
23. H. Shirato, H. Shima, G. Sakashita, T. Nakano, M. Ito, E.Y. Lee,
8
Tags and Probes for Chemical Biology
8.1
The Biarsenical-tetracysteine Protein Tag: Chemistry and Biological Applications
Stephen R. Adams
Outlook
8.1.1
Introduction
The ability to label proteins with green fluorescent protein (GFP) in living cells
has been a major research advance in cell biology in the last decade [1]. In
response to this success, chemical biologists have devised an ever-increasing
variety of alternative methods to provide a wider range of fluorescent colors and
other useful functionalities than those available from GFP and its variants. One
of the key features of GFP is that it can be genetically encoded; that is, the DNA
of the GFP gene can be fused to the DNA of any desired protein by standard
molecular biology techniques and then the chimeric protein can be expressed
in cells, tissues, or transgenic animals [2]. All the chemical biological methods
incorporate this major stratagem but differ from GFP in that the genetically
encoded peptide or protein sequence does not become autofluorescent (like
GFP) but acts as a specific receptor for derivatives of fluorophores that can
be added exogenously to the expressing cells. The size and structure of this
receptor can be quite varied, from proteins or enzymes the size of GFP (~240
8.1.2
History and Design Concepts of the Tetracysteine-biarsenical System
Scheme 8.1-1 The regeneration of protein-lipoate cofactors and enzyme thiols bound to
arsenic by reaction with small dithiols (X = H: 1,2-ethanedithiol, EDT; X = CH2OH:
British Anti-Lewisite, BAL).
with FlAsH. When FlAsH is bound to two moles of EDT, forming FlAsH-
EDT2, its fluorescence is almost completely quenched; but on reaction with
a tetracysteine peptide a strongly fluorescent complex is formed (Fig. 8.1-2).
This feature is particularly useful when labeling cells expressing tetracysteine-
tagged proteins, as unbound dye does not have to be fully removed by
washing to generate contrast unlike most alternative labeling methods. Even
so, nonspecific binding of FlAsH to thiols and hydrophobic sites can generate
some background signal that limits the sensitivity of this method compared to
GFP [8, 10, 20].
8.1.3
General Considerations
[Scheme labels: reagents HgO, TFA, AsCl3, Pd(OAc)2, and EDT; intermediates marked
"colorless", "colored, non-fluorescent", and "dianion, quinone tautomer: colored,
weakly fluorescent". Biarsenical derivatives shown: CHoXAsH-EDT2; ReAsH-EDT2;
BarNile-EDT2 (environment-sensitive fluorescence); SulfoFlAsH-EDT2 (membrane-
impermeant ligand for extracellular proteins); immobilized FlAsH-EDT2 (affinity
chromatography); Calcium Green FlAsH-EDT2 (low-affinity fluorescent Ca2+ indicator).]
Biarsenicals, which replace one or both of the phenolic groups with amino
substituents to form rhodol or rhodamine biarsenicals, can also be synthesized.
An amino derivative of Nile Red, a naphthorhodol, can be converted by the
usual method to give an environmentally sensitive biarsenical fluorophore
(Scheme 8.1-3) [23]. Biarsenical derivatives of tetramethyl rhodamine or
rhodamine B have also been made [S]; the usual mercuration conditions
gave no reaction, but reaction of the free base in nonpolar solvents
was successful. However, despite both rhodamine biarsenicals binding to
tetracysteine peptides, the complexes were neither fluorescent nor colored,
suggesting that the rhodamine is in the lactone form. This is presumably
because steric hindrance between the arsenic-dicysteine group and the N,N-
dialkyl group forces the nitrogen lone pairs out of conjugation and destabilizes
the quinone tautomer. Screening a library of tetracysteine variants failed to
find any sequences that formed fluorescent complexes with these biarsenicals,
appropriately named TrAsH and RbAsH (unpublished results). Rhodamines
lacking alkyl substituents have proven much harder to synthesize so far
and would also fluoresce in the green, like FlAsH; however, their improved
resistance to photobleaching might make them valuable as labels for single-
molecule studies.
Biarsenical derivatives of other fluorophores emitting at longer wavelengths
would also be useful, particularly those based on nonxanthene skeletons
the reaction products of FlAsH with different peptides indicate that a number
of isomeric products can be formed, indicating that such different binding
configurations are possible. However, with all the peptides containing the
CCXXCC motif, only two products were formed with identical fluorescent
quantum yields, suggesting conformational isomers involving the hindered
benzoic acid group. These isomers interconvert at pH 7 and are only isolatable
under the acidic conditions of the HPLC separation. Indeed, ReAsH that has
no benzoic acid substituent forms only a single product with these tetracysteine
peptides.
8.1.4
Practical Applications of the Biarsenical-tetracysteine System
GFP, typically CFP and YFP (yellow fluorescent protein), which are capable
8.1.4.1.6 An Assay for Targeted Nucleic Acid Repair for Gene Therapy in Yeast
The biarsenical-tetracysteine system has also been used to develop an in vivo
assay of nucleic acid repair, targeted by DNA-RNA hybrid oligonucleotides
[40, 411. These double-stranded hairpin-capped molecules form a double-D
loop structure on hybridization with the targeted chromosomal gene, initiating
repair, and have been used to correct mutated genes in several animal models.
As a model system to investigate the proteins involved in such repair and
the conversion efficiency, yeast was transfected with a plasmid that expresses
a mutated neomycin marker containing an internal stop codon TAG, which
is tagged with a C-terminal 19-amino-acid tetracysteine tag. Expression of the
protein results in premature termination, so no fluorescence is seen when the
cells are labeled with FlAsH. Repair oligonucleotides can be coelectroporated
into the cells and, if repair is successful, the G is converted to C and a
complete protein is produced, resulting in green fluorescence after labeling
with FlAsH. The biarsenical-tetracysteine system is advantageous over GFP
as the dyes bind and generate fluorescence rapidly with no requirement for
protein folding, allowing the rate of conversion of different oligonucleotides
to be compared. Cellular inheritance of the repair can also be demonstrated
by washing out the label, expanding individual cells, and then by relabeling;
long-lived GFP molecules might yield false positives.
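The logic of this assay can be illustrated with a toy translation. In the sketch below, an in-frame TAG codon stops translation before a C-terminal tetracysteine tag is reached; changing the G to C gives TAC (tyrosine), so the full-length, tagged product is made. The reading frame, the partial codon table, and the function names are hypothetical and chosen only for illustration; this is not the sequence of the neomycin marker or of the 19-residue tag used in the cited work.

```python
# Partial codon table: only the codons used in this toy example.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GGT": "G", "TGT": "C", "CCT": "P", "TAC": "Y",
    "TAG": "*", "TAA": "*", "TGA": "*",  # stop codons
}


def translate(dna):
    """Translate in frame from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def has_tetracysteine_tag(protein, tag="CCPGCC"):
    """True if the (hypothetical) C-terminal tag sequence was translated."""
    return tag in protein


# Toy mutant: Met-Ala, internal TAG stop, then Gly plus a CCPGCC tag.
mutant_codons = ["ATG", "GCT", "TAG", "GGT",
                 "TGT", "TGT", "CCT", "GGT", "TGT", "TGT", "TAA"]
repaired_codons = list(mutant_codons)
repaired_codons[2] = "TAC"  # G -> C repair converts the stop codon to Tyr

mutant = "".join(mutant_codons)
repaired = "".join(repaired_codons)

print(translate(mutant), has_tetracysteine_tag(translate(mutant)))
print(translate(repaired), has_tetracysteine_tag(translate(repaired)))
```

Only the repaired frame yields a product carrying the tag, which is why FlAsH fluorescence reports directly on successful conversion.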
Fig. 8.1-3 Specificity of FlAsH staining in HeLa cells expressing Cx43-tetracysteine.
A gap junction plaque between two transfected cells is marked with an arrow.
(a) FlAsH fluorescence, (b) staining with a Cx43-specific antibody, (c) overlay of
these channels combined with a propidium iodide stain (blue) to indicate nuclei.
Fig. 8.1-4 Two-color pulse chase of connexin43-tetracysteine in HeLa cells. See text for
details.
8.1.4.2.3 Probing the Intracellular Site of Synthesis of the HIV-1 Gag Protein
Recently, the two-color pulse chase has been used to image the dynamics
of recently synthesized Gag, a primary structural protein of human
immunodeficiency virus type 1 (HIV-1), in living HeLa, Mel JuSo, and
Jurkat T cells [43]. The biarsenical-tetracysteine system was used for its small
size and because binding of the dye is independent of localized secondary
structure, unlike GFP, which only generates fluorescence after folding (various
mutants have half-lives of 30 min to 4 h). Gag was tagged with a C-terminal
improved sequence (GSMPCCPGCCGC) derived from the first peptide library
screen described above, and gave good FlAsH staining in these cell types
that colocalized with anti-Gag antibody staining. Deconvolution microscopy
revealed that Gag-TC (tetracysteine) localized primarily to discrete areas
(possibly lipid rafts) of the plasma membrane (PM), even when two-color
pulse chase was used to detect recently synthesized protein (~30 min), suggesting
that Gag is synthesized close to the PM. Gag-tetracysteine and a similar construct
containing an extended linker were compatible with forming VLPs when cells
were transfected with a plasmid containing the complete HIV-1 genome. These
lower expressing viral plasmids also gave good plasma membrane staining,
although the construct with a longer linker showed more intracellular vesicular
staining that colocalized with markers for the protein degradation pathway. The
importance of posttranslational myristoylation of Gag for correct targeting was
demonstrated, as mutations at this site gave diffuse cytoplasmic fluorescence
with no plasma membrane or organellar fluorescence. In contrast, mutations
in the L-domain required for efficient budding from the PM had no effect on
Gag localization.
formation.
Fig. 8.1-5 Correlated fluorescence and electron microscopy of ReAsH-labeled
connexin43-tetracysteine in HeLa cells. (a) Fluorescence confocal image of a gap
junction plaque after FlAsH (green) and ReAsH (red) two-color pulse-chase labeling.
(b) The corresponding electron micrograph with photoconverted DAB staining indicated
with arrows. (c) Higher magnification micrograph of boxed region in (b).
8.1.5
Future Developments and Applications
with this approach in tissue culture cells; future work in more biologically
interesting cell types such as neurons is likely to be informative about the spatial
and temporal dynamics of Ca2+ signaling pathways. Imaging modes
other than fluorescence are currently being explored, such as luminescence,
which in conjunction with time-resolved imaging could lead to more sensitive
detection of tagged proteins in cells. Transgenic animals expressing
tetracysteine-tagged proteins have already been described, but the development
of new protocols or biarsenical derivatives for labeling live animals and tissues
will probably be required. Nonspecific staining will probably be the major
limitation in these applications and better antidotes will be required to either
prevent or remove such background. This will be aided by the more than 60
years of research into more effective dithiol antidotes to combat the enormous
human health problems resulting from arsenic-contaminated water supplies
in many parts of the world today. Finally, this method has considerable
potential as a general approach for site-specific labeling of crude or purified
proteins in vitro with any desired probe, for example, for phosphorescence and
fluorescence anisotropy, FRET, NMR, EPR (electron paramagnetic resonance),
and so on, by simple conjugation to a biarsenical. The ability to predict where
tetracysteines can be inserted and labeled in proteins may become possible
with the determination of the three-dimensional structure of the complex and
will lead to more applications, both in vitro and in living cells.
8.1.6
Conclusions
Designing protein tags for use in living cells requires chemistry compatible
with the complex biochemical milieu that proceeds with high reactivity and
selectivity. The biarsenical-tetracysteine method was one of the first such
methods to be developed and the lessons learnt during the process and from
its application to address biological questions should be of general interest to
chemical biologists.
Acknowledgments
I would like to thank all my coworkers over the years in the Tsien and Ellisman
labs who have contributed to the development of the biarsenical-tetracysteine
method, particularly Roger Tsien for devising the original concepts and for
continual input into their improvement.
References
1. J. Zhang, R. Campbell, A. Ting, R. Tsien, Creating new fluorescent probes for cell biology, Nat. Rev. Mol. Cell Biol. 2002, 12, 906-918.
Outlook
8.2.1
Introduction
8.2 Chemical Approaches to Exploit Fusion Proteins for Functional Studies
proteins can be observed in real time, thereby giving new insights into the
dynamic distribution and localization of proteins in the cell. Other prominent
fusion protein-based approaches to study protein function in live cells include
the yeast two-hybrid system and the split-ubiquitin sensor, two methods that
allow the characterization and identification of protein-protein interactions
[5, 6]. One impressive proof of the enormous utility of fusion proteins can
be found in the efforts to fuse all open reading frames of a given organism
to an appropriate tag and to exploit the properties of the tag to investigate
certain aspects of the biological function of the corresponding fusion protein
library. Examples include efforts to construct genome-wide protein interaction
maps using the two-hybrid system, to gain an inventory of all cellular protein
complexes using affinity tags, to observe the intracellular localization of all
proteins in the cell via GFP fusions, to map protein-protein interactions
among 705 integral membrane proteins of the yeast Saccharomyces cerevisiae,
or to display the entire proteome of an organism as a protein microarray
using a polyhistidine tag and nickel-coated glass slides [7-12]. As impressive
as these efforts were, the genome-wide application of fusion proteins dramatically
revealed two shortcomings of the currently available tags: their limitation to
properties that can be genetically encoded and the restriction of each fusion
tag to one particular type of functional assay. The latter point is acceptable
for the studies of individual proteins but more bothersome for genome-wide
approaches. In recent years, a new approach to exploit fusion proteins in
functional proteomics, which addresses these limitations, has been developed:
this new approach is based on a tag-mediated labeling of fusion proteins,
either in vitro or in live cells, with synthetic molecules that transfer a unique
and specific property to the fusion protein [13].
8.2.2
General Considerations
The labeling of proteins with small molecules that can serve as spectroscopic
probes or cross-linkers is one of the cornerstones of protein chemistry.
However, the lack of specificity of the underlying chemistry used in traditional
protein labeling makes its application in the living cell or in complex protein
mixtures impossible. Currently, there exist two different strategies to equip
proteins with synthetic probes to monitor and manipulate protein function:
the incorporation of unnatural amino acids pioneered by the group of Schultz
[14], and the use of protein tags to mediate an exclusive labeling with synthetic
molecules, which will be the focus of this article. To be of general use, the
mechanism of labeling must be sufficiently promiscuous with respect to the
synthetic molecule so that different functionalities can be attached to the
tag but at the same time highly specific with respect to the protein tag so
that only the fusion protein is labeled with the synthetic molecule. Currently
used approaches for the labeling of fusion proteins with small molecules or
ligands that carry the desired functionality can be classified into three groups:
(a) intein-based labeling of proteins with small molecules; (b) tags that bind to
a small molecule through noncovalent interactions, and (c) tags that bind to
a small molecule through covalent bond formation. Intein-based approaches
are a powerful method for the derivatization and semisynthesis of proteins in
vitro and applications of this approach will be discussed in detail in a different
chapter of this book [15, 16]. Concerning the labeling of proteins with small
synthetic probes in live cells, an approach based on trans-splicing inteins has
been developed by the group of Tom Muir [17]. This approach is very elegant
as the intein tag removes itself in the process of the labeling; however, its
general applicability remains to be shown. A list compiling the approaches
developed so far for the noncovalent or covalent labeling of fusion proteins is
shown in Table 8.2-1. Concerning the labeling of fusion proteins via tags that
noncovalently interact with small molecules, a variety of different approaches
has been developed over the last few years. These include antibodies binding
to haptens, streptavidin binding to biotin derivatives, dihydrofolate reductase
(DHFR) binding to methotrexate (Mtx) or trimethoprim derivatives, FKBP12
mutants binding to a synthetic ligand, and short peptides binding to derivatized
α-bungarotoxin or to Texas red derivatives [18-24]. These tags have been
successfully used for the labeling of fusion proteins with fluorophores and
other probes in live cells. A good example is the study of receptor trafficking of
the α-amino-3-hydroxy-5-methyl-4-isoxazole-propionate (AMPA) receptor [24].
In this study, the AMPA receptor was expressed as a fusion protein with an
α-bungarotoxin-binding peptide and subsequently labeled with fluorescent,
radioactive, or biotinylated α-bungarotoxin derivatives. Using this approach,
the total receptor expression, surface expression, internalization, and insertion
of receptors into the plasma membrane could be visualized and quantified in
fixed or live cells. A possible limitation of tags labeled through noncovalent
interactions is the reversibility of the labeling. This feature is disadvantageous
for applications such as pulse-chase type labeling experiments, long-term
studies, and the detection of the labeled protein under denaturing conditions.
The remaining part of the chapter is therefore dedicated to discussing the
approaches for a covalent labeling of fusion proteins in more detail.
The first tag allowing for a covalent labeling of fusion proteins in vitro
and in live cells was the tetracysteine tag that specifically binds to biarsenical
compounds such as FlAsH, a biarsenical fluorescein derivative [25, 33]. The two
main advantages of the tetracysteine tag are its relatively small size, which can
be as small as 6 amino acids (CCPGCC in one-letter code), and the possibility
to use different fluorophores. Potential disadvantages of the approach are the
reported unspecific binding of the biarsenical fluorophores and the need to
coincubate with dithiols such as 1,2-ethanedithiol to minimize this unspecific
binding. However, the Tsien group recently reported sequences with increased
affinity toward biarsenical compounds that should enhance the performance
of the approach in live cells [34]. The use of the tetracysteine tag is discussed
in detail by Steve Adams in another chapter of this book.
Table 8.2-1 Tags used for the selective labeling of fusion proteins with synthetic molecules

Tag | Size[a] | Label | Required additives | Type of linkage | Applications | Comments | References
Tetracysteine tag | 6-12 | Biarsenical fluorophores | Dithiols to suppress unspecific binding | Covalent and reversible | Intracellular | Cell surface applications require reduction of disulfide bonds | [25]
AGT | 182 | Benzylguanine derivatives | None | Covalent and irreversible | Intracellular, cell surface | - | [26]
N-terminal Cys | >1[b] | Thioester derivatives | None | Covalent and irreversible | Intracellular | Limited specificity of labeling; slow reaction | [27]
His tag | 16 | NTA derivatives | None | Noncovalent and reversible | Intracellular, cell surface | - | [28]
Texas red binders | 38, 42 | Texas red derivatives | None | Noncovalent and reversible | Intracellular, cell surface | - |
α-Bungarotoxin-binding peptide | 13 | α-Bungarotoxin derivatives (74 a.a.) | None | Noncovalent and reversible | Cell surface | - | [24]
DHFR | 157 | Methotrexate/trimethoprim derivatives | None | Noncovalent and reversible | Intracellular | - | [21, 22]

a Size is given in amino acids; AGT - O6-alkylguanine-DNA alkyltransferase; DHFR - dihydrofolate reductase; NTA - nitrilotriacetic acid; CP - carrier protein.
b Requires expression as N-terminal fusion with ubiquitin or intein.
8.2.3
Applications and Practical Examples
been controlled and studied with this approach, including signal transduction
and control of transcription in eukaryotic and prokaryotic cells. Previously used
CIDs relied on noncovalent interactions and we extended the approach through
covalent labeling of AGT fusion proteins with ligands capable of interacting
with other proteins. As a first ligand we chose Mtx. Mtx is a tight-binding
inhibitor of DHFR, and heterodimers of Mtx and dexamethasone, a ligand of the
glucocorticoid receptor (GR), have been used as CIDs to control transcription
in yeast [45]. In this so-called three-hybrid system, a DNA-binding domain and
a transcriptional activation domain were expressed as DHFR and GR fusion
proteins, respectively, and transcription was initiated through the addition of
the CID. On the basis of these studies, we synthesized an O6-benzylguanine-methotrexate
(BGMtx) heterodimer as CID (Figs. 8.2-1(b) and 8.2-3) [43]. To use
BGMtx as CID in a three-hybrid system, we constructed fusion proteins of AGT
with the DNA-binding domain LexA and of DHFR with the transcriptional
activation domain B42 (Fig. 8.2-3). The in vivo labeling of the AGT fusion protein
with Mtx using BGMtx then induced the dimerization of the AGT and DHFR
fusion proteins, leading to stimulation of transcription of a reporter gene.
Pairs of plasmids encoding LexA and B42 fusion proteins were transformed
into the yeast strain L40, in which the dimerization of LexA and B42 fusion
proteins leads to transcription of the reporter genes HIS3 and lacZ. Growing
these yeast strains in the presence of BGMtx then complemented the histidine
auxotrophy of the yeast and also induced the expression of β-galactosidase.
These experiments clearly showed that BG derivatives can be used as CIDs
to control transcription in yeast and, more generally, also demonstrated how
AGT fusion proteins can be used to control protein dimerization in vivo.
Fig. 8.2-5 (a) Use of AGT-based protein microarrays to screen for protein-protein
interactions. (b) Purified AGT-FKBP and AGT-FRB (both 1 µM) were immobilized in
arrays of 8 x 8 spots each on a BG-covered glass. The slide was then incubated with a
solution containing Cy3-labeled AGT-FKBP (100 pM), Cy5-labeled AGT-FRB (100 pM),
and rapamycin (100 nM) and after washing analyzed for fluorescence: (1) detection of
Cy3, (2) detection of Cy5 on same microarray as in (1), (3) overlay of (1) and (2).
(c) Same experiments as in (b) but using cell lysates of E. coli BL21 (DE3)
expressing either AGT-FKBP or AGT-FRB for spotting.
the labeling of the protein and its immobilization are practically irreversible, a
requirement that is not fulfilled by low-affinity tags such as the His tag.
The generation of AGT-based protein microarrays requires the display
of BG on otherwise bioinert glass slides. We have previously shown that
surfaces covered either with carboxymethylated dextran or polymer brushes
of poly(oligo(ethylene glycol) methacrylate) (POEGMA) and displaying BG are
sufficiently bioinert for the selective immobilization of AGT fusion proteins
[46, 48]. Building on these results, we used glass slides covered either
with carboxylated hydrogel or POEGMA. To demonstrate the use of AGT-
and remodel the extracellular matrix play the most prominent role in these
activities. The detailed in vivo characterization of proteins is therefore an
important prerequisite for understanding the biology of the cell surface in
molecular terms. As the surfaces of cultured cells are freely accessible to
chemical treatment, the labeling of their proteins with synthetic molecules
appears to be an attractive strategy to equip them with probes that allow
for their functional characterization [50]. Tetracysteine tag and AGT are two
examples for tags that were designed primarily for the covalent modification
of intracellular proteins. Consequently, these protein tags are not necessarily
suitable for applications in the oxidizing environment of the cell surface. For
example, the application of the tetracysteine tag on cell surfaces requires the
reduction of the otherwise oxidized and unreactive cysteines of the tag using
membrane-impermeable reductants such as 2-mercaptoethanesulfonate and
tris(carboxyethyl)phosphine [51]. Since this treatment will also reduce the
disulfide bridges of most cell surface proteins, it will automatically perturb
many of their activities. The labeling of AGT fusion proteins, on the other
hand, relies on the alkylation of the reactive cysteine of AGT. While we
have previously shown that AGT mutants with increased stability toward
oxidizing conditions can be displayed in an active form on cell surfaces or viral
particles, the requirement for a reactive cysteine makes AGT fusion proteins,
nevertheless, to some extent sensitive to the oxidative environment of cell
surfaces. The noncovalent labeling of cell surface proteins can alternatively
be achieved by expressing them with an oligohistidine tag and incubating
the corresponding cells with probes comprising a chromophore together
with a metal-ion-chelating nitrilotriacetate (NTA) moiety [28]. This moiety
binds reversibly to the oligohistidine sequences that are displayed by the
fusion proteins. The feasibility of the approach has been demonstrated by
binding NTA-chromophore conjugates to oligohistidine fusion proteins of a
ligand-gated ion channel and a G protein-coupled receptor (GPCR). Possible
drawbacks of the approach are the modest stability of the complex and
unspecific binding of the NTA derivative to other proteins. As already mentioned,
an alternative strategy is based on the expression of a cell surface protein as
a fusion protein with an α-bungarotoxin-binding peptide and the incubation
of cells expressing this protein with covalently derivatized α-bungarotoxin
derivatives [24]. This labeling is of higher specificity than the His tag-based
labeling, but also suffers from the fact that it is noncovalent and hence
reversible.
We have recently developed a novel labeling strategy for cell surface proteins,
which promises to overcome some of the limitations of these approaches [30].
Here, the protein of interest is fused to a carrier protein (CP) and the
corresponding fusion protein is then specifically labeled with CoA derivatives
through a posttranslational modification catalyzed by a phosphopantetheine
transferase (PPTase).
CPs are integral components of various primary and secondary metabolic
pathways, including fatty acid synthesis (FAS), nonribosomal peptide synthesis
(NRPS), polyketide synthesis (PKS), and lysine biosynthesis. All CPs harbor
a phosphopantetheine (Ppant) as a covalently attached prosthetic group
(Fig. 8.2-6(a)) [52]. The Ppant serves as the attachment site for the building
blocks and intermediates of different pathways. The different substrates are
coupled as acyl thioesters to the free SH group of Ppant. Depending on
the structure of the bound substrate, CPs are named acyl carrier proteins
(ACPs), peptidyl carrier proteins (PCPs) or aryl carrier proteins (ArCPs). The
covalent attachment of Ppant to the CP is catalyzed by a group of enzymes
named phosphopantetheine transferases [52]. PPTases use CoA as the source
for Ppant and attach it as a phosphodiester to an invariant serine residue of
the CP (Fig. 8.2-6(a)). Representative examples for PPTases are the PPTase
acyl-carrier protein synthase (AcpS) from E. coli, which modifies ACPs, and
the PPTase Sfp from Bacillus subtilis, which accepts PCPs from NRPS as
substrates but also ACPs of FAS and PKS [52]. The overlapping substrate
specificity of Sfp stands in contrast to that of AcpS which transfers the Ppant
only to ACPs, but not to the PCPs of the enterobactin synthetase EntF from
E. coli or other PCPs.
Structural and biochemical studies have revealed that the β-mercaptoethylamine
group of CoA does not participate in the recognition of CoA by
PPTases and that thiol-modified CoA derivatives can be employed for the
labeling of CPs [53]. This lack of sensitivity with respect to the modification
of the β-mercaptoethylamine of CoA has been exploited to achieve specific
labeling of CP-fusion proteins on the surface of eukaryotic cells (Fig. 8.2-6(b))
[30, 50]. In initial applications, we chose the ACP/PPTase pair from E. coli.
ACP from E. coli is a small protein of only 77 residues that folds into a
compact structure composed of four α-helices, a fold shared by other CPs.
The Ppant derivative is attached to Ser36 of ACP. The protein contains no
cysteines, thus avoiding a potential misfolding of secreted ACP fusion proteins
due to unwanted oxidations. When tested in vitro, ACP from E. coli is readily
modified by CoA derivatives and the rate of reaction does not show a significant
dependence on the nature of the label. At concentrations of 0.2 µM AcpS, 1 µM
ACP, and 5 µM of the CoA derivative, a typical labeling experiment is complete
within 10 min and the reaction is nearly quantitative.
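
The quoted labeling conditions imply that each enzyme molecule must turn over several times. The following back-of-the-envelope sketch makes that arithmetic explicit. It assumes the concentrations are micromolar and that labeling is fully quantitative; the variable names are illustrative only and the result is a lower bound on the average turnover, not a measured rate constant.

```python
# Concentrations quoted in the text, read here as micromolar (assumption),
# and the reaction time in minutes.
acps_uM = 0.2    # PPTase (AcpS), the catalyst
acp_uM = 1.0     # carrier protein (ACP), the substrate to be labeled
coa_uM = 5.0     # CoA derivative, the label donor
time_min = 10.0

# Assumption: labeling is quantitative, so every ACP is modified exactly once.
turnovers_per_enzyme = acp_uM / acps_uM

# Lower bound on the average turnover needed to finish within the stated time.
min_avg_turnover_per_min = turnovers_per_enzyme / time_min

# Molar excess of label donor over protein substrate.
coa_excess = coa_uM / acp_uM

print(f"turnovers per AcpS molecule: {turnovers_per_enzyme:.1f}")
print(f"minimum average turnover: {min_avg_turnover_per_min:.2f} per min")
print(f"CoA derivative excess over ACP: {coa_excess:.1f}-fold")
```

Under these assumptions the enzyme acts catalytically (substoichiometric to ACP) while the CoA derivative is in excess, which is consistent with the near-quantitative labeling described in the text.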
The ACP-Sag1p fusion protein serves as a representative example for the
modification of a protein on the surface of the yeast S. cerevisiae. Sag1p is
the α-agglutinin of yeast cells and is covalently attached to the β-1,6-glucan
of the cell wall via its modified glycosylphosphatidylinositol anchor [54].
For the construction of the fusion protein we replaced the natural signal
sequence of Sag1p with the signal sequence of the α-factor followed by the
coding sequence of the bacterial ACP. The combined addition of CoA-Cy3
and AcpS resulted in the specific labeling of yeast cells expressing ACP-Sag1p
(Fig. 8.2-7(a)). The observed specificity and efficiency of labeling can be
rationalized by two properties of the system. First, the cell surface separates
the cell-impermeable CoA derivatives and the appropriate PPTase from host
PPTase, host ACPs, and underivatized CoA, thereby suppressing unwanted
side reactions such as the labeling of internal CPs. Second, bacterial ACPs
8.2.4
Conclusions and Future Developments
The labeling of AGT and CP-fusion proteins discussed here demonstrates the
two main advantages of a tag-mediated labeling of fusion proteins. Firstly,
proteins can be equipped with functionalities that cannot be genetically
encoded. This can be achieved in live cells or in vitro and possible functionalities
range from synthetic fluorophores to ligands that mediate the interaction with
other proteins. Secondly, a single fusion protein can be used for a variety of
different applications. This second point applies, in particular, to AGT fusion
proteins that can be used for pulse-chase experiments in live cells or for
the generation of protein microarrays. Together, these properties make such
fusion proteins powerful tools for functional proteomics and we are convinced
that we will see many applications of these and other related technologies in
the near future.
What kind of further technological developments can be expected in this
area of research? An obvious extension of the previous work is the specific
labeling of fusion proteins in multicellular organisms such as Drosophila
melanogaster, Caenorhabditis elegans, or mice. Another important development
would allow the specific and simultaneous labeling of multiple fusion proteins
with different (fluorescent) probes to monitor multiple parameters and proteins
simultaneously in one cell. Such a multicolor imaging could either be achieved
by using different labeling approaches, such as AGT and the tetracysteine tag,
or by generating mutants of one tag with so-called orthogonal substrate
specificities. As previous experiments have shown, AGT appears to be an
ideal candidate for the latter strategy. Furthermore, the active transport of
membrane-impermeable compounds for labeling experiments in live cells
would significantly extend the general applicability of the approach. Here, the
recently described arginine transporters are attractive candidates to achieve
9
Diversity-oriented Synthesis
9.1
Diversity-oriented Synthesis
Derek S. Tan
Outlook
9.1.1
Introduction
Small molecules are extremely powerful tools for studying biological systems
[1]. They allow rapid and conditional modulation of biological functions,
often in a reversible, dose-dependent manner. Moreover, they can modulate
individual functions of multifunctional targets and distinguish between
different posttranslational modification and conformational states of proteins.
These features make the chemical genetic, or pharmacological, approach
a valuable complement to genetic and RNA interference-based methods,
particularly for dissecting complex, dynamic biological processes. Small
molecules can also be used to illuminate new potential therapeutic targets
and provide a very direct means of validating these targets in model systems.
However, the identification of new, highly specific small molecule probes
remains a major challenge in chemical biology. Structure- or mechanism-
based rational design is sometimes feasible when a single protein target and
a natural ligand are known. Conversely, high-throughput screening (HTS)
of small molecule libraries provides a practical and effective solution for
individual targets that may be less well characterized and for systems that
involve multiple targets. Diversity-oriented synthesis (DOS) has emerged as a
valuable approach to generate combinatorial libraries for use in these screens,
particularly novel libraries that explore untapped or underrepresented regions
of biologically relevant chemical structure space [2]. Efforts in DOS have led
to the discovery of powerful new biological probes and have also spurred
continuing advances in synthetic organic chemistry.
9.1.2
History/Development
Early efforts in DOS were reported in the 1990s. However, several key synthetic
technologies that were developed earlier laid the foundations for DOS.
Foremost among these are (1) solid-phase synthesis and related separation
techniques and (2) combinatorial synthesis.
that circumvent the need for tedious purifications at each step of a multistep
synthesis.
Fig. 9.1-1 Separation techniques used in diversity-oriented synthesis.
(a) Solid-phase synthesis allows reaction products to be separated easily from
excess reagents and reaction by-products by rinsing the solid supports (yellow)
with appropriate solvents. At the end of the synthesis, the products are usually
cleaved from the solid support for screening. (b) Precipitation tags or phase
switches (red) allow homogeneous reaction conditions, but can then be precipitated
(blue) from the crude reaction mixture by addition of the appropriate solvent or
reagent. The reaction products are again easily separated from excess reagents and
reaction by-products. Ideally, the precipitation tag can then be resolubilized
for subsequent reactions. (c) Fluorous tags (purple) are soluble in organic
solvents, again allowing homogeneous reaction conditions to be used. The reaction
products can then be separated from excess reagents and reaction by-products by
extraction with an immiscible fluorocarbon solvent, or by passage over a column of
fluorinated silica gel (not shown). (d) Solid-supported reagents, such as the
carbodiimide shown, are used with substrates in solution to facilitate removal of
reaction by-products that may be difficult to separate using traditional extraction
or chromatographic techniques. (e) Solid-supported scavengers, such as the amine
shown, are used to remove excess reagents or reaction by-products from solution
phase reactions.
Fig. 9.1-2 Synthetic strategies used to generate combinatorial libraries. Several
approaches to a 256-member library of tetrapeptides, composed of the four amino
acids aspartate (D), histidine (H), lysine (K), and threonine (T), are shown. For
simplicity, only the coupling reactions are considered in these analyses.
(a) Mixture synthesis involves simultaneous coupling of all building blocks in
one-pot reactions. The synthesis requires only four coupling steps, but provides a
complex mixture of 256 products, complicating screening and identification of
active library members. (b) In parallel synthesis, each library member (3 out of
256 are shown) is synthesized in a separate reaction vessel, allowing each to be
cleaved, purified, and screened individually. Since each tetrapeptide requires four
coupling steps, the overall synthesis requires 256 x 4 = 1024 coupling reactions.
For larger libraries, recourse to robotics may be necessary. (c) In split-pool
synthesis, each building block is coupled to the solid supports in a separate
reaction vessel. The solid supports are then pooled, mixed, and split into new
reaction vessels to provide a stochastic distribution of substrates for the next
reaction. Generally, at least three copies of a library are synthesized to maximize
the probability that each putative library member is represented at least once.
Since there are four coupling reactions required at each step, the overall
synthesis requires 4 x 4 = 16 coupling reactions. Importantly, each solid support
has been exposed to only a single synthetic sequence, and hence carries only a
single library member. Encoding the solid supports with orthogonal chemistry or
physical methods (e.g., TAGT) allows the history of each bead to be reconstructed
to identify active library members. (d) Recursive deconvolution can also be used
to identify active library members. In the first round, the last set of reaction
products are not repooled, but are screened separately so that, for a given active
library member (*), the identity of the final (N-terminal) building block is known.
Using this information, progressively smaller sublibraries are made with an
increasing number of fixed building blocks until the identities of all the building
blocks have been determined.
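
The coupling-reaction counts quoted in the Fig. 9.1-2 caption follow from simple combinatorics, sketched below. The function names are hypothetical. The coverage estimate additionally assumes that each bead ends up carrying a library member chosen uniformly and independently at random, which is a simplification of real split-pool mixing.

```python
def coupling_counts(n_blocks: int, length: int) -> dict:
    """Count coupling reactions for a library of all sequences of `length`
    positions drawn from `n_blocks` building blocks."""
    library_size = n_blocks ** length
    return {
        "library_size": library_size,
        "mixture": length,                   # one one-pot coupling per position
        "parallel": library_size * length,   # every member made in its own vessel
        "split_pool": n_blocks * length,     # one vessel per building block per step
    }


def expected_coverage(library_size: int, copies: int) -> float:
    """Expected fraction of library members present on at least one bead when
    `copies * library_size` beads are distributed at random (assumption:
    uniform, independent assignment of beads to members)."""
    beads = copies * library_size
    return 1.0 - (1.0 - 1.0 / library_size) ** beads


counts = coupling_counts(n_blocks=4, length=4)
print(counts)
for copies in (1, 3, 5):
    print(copies, "copies:", round(expected_coverage(counts["library_size"], copies), 3))
```

Under the random-assignment assumption, one copy of the library leaves roughly a third of the members unrepresented, whereas about three copies bring expected coverage to around 95%, in line with the rule of thumb stated in the caption.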
of split-pool synthesis is that the pooling steps obscure the precise identity
of each individual library member. Thus, either recursive deconvolution or
encoding strategies must be used to determine the identities of active library
members.
9.1.2.2.6 Encoding
Recursive deconvolution is effective but time consuming. An alternative
approach to identifying individual members of split-pool libraries is to use
a physical or chemical method to encode the building block coupled in each
reaction vessel [13]. This can be accomplished by attaching an inert chemical
“tag” to the solid supports using orthogonal reactions (Fig. 9.1-2(c)). Once an
active library member is found, its identity can be determined by decoding the
tags from the corresponding solid support. Notably, the tags do not identify
the structure of the product directly, but instead provide a history of reaction
conditions to which the solid support has been exposed. This reaction sequence
must then be repeated to determine the structure of the library member using
standard analytical techniques.
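
As a minimal sketch of the encoding idea, the following simulation attaches a tag to each bead at every coupling step and later decodes the bead's reaction history. The data layout (a dictionary per bead, tags stored as step and building-block pairs) is purely illustrative and does not correspond to any particular tagging chemistry.

```python
import random

BLOCKS = ["D", "H", "K", "T"]  # building blocks, as in the tetrapeptide example


def split_pool(n_beads: int, n_steps: int, rng: random.Random) -> list:
    """Simulate split-pool synthesis with one tag added per coupling step."""
    beads = [{"tags": [], "product": ""} for _ in range(n_beads)]
    for step in range(n_steps):
        rng.shuffle(beads)                     # pool and mix the solid supports
        for i, bead in enumerate(beads):
            block = BLOCKS[i % len(BLOCKS)]    # split evenly across the vessels
            bead["product"] += block           # coupling in that vessel
            bead["tags"].append((step, block)) # orthogonal tagging reaction
    return beads


def decode(tags: list) -> str:
    """Reconstruct the reaction history (not the structure itself) from tags."""
    return "".join(block for _, block in sorted(tags))


beads = split_pool(n_beads=768, n_steps=4, rng=random.Random(0))
hit = beads[0]  # pretend this bead scored as active in a screen
print("decoded history:", decode(hit["tags"]))
```

The decoded string records only which vessel the bead visited at each step. As the text notes, the corresponding compound would still have to be resynthesized and characterized to confirm its structure.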
Another approach involves physical tagging of the solid supports with
“barcoding” devices. This can be accomplished by direct modifications to
the solid supports or by enclosing the solid supports in a small permeable
reaction vessel along with the tag. A variety of tags have been used for this
purpose, ranging from Houghten’s original “tea bags” [14], to colored plastic
pegs, to fluorescent colloids, to radiofrequency transponders, to laser-etched
two-dimensional barcodes. In some cases, if the barcodes are assigned at the
very beginning of the synthesis and then read prior to each split step,
an exactly even distribution of synthetic intermediates can be accomplished,
allowing synthesis of exactly one copy of the library. Again, the tag only tracks
the history of reactions to which the solid support has been exposed.
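
The difference between stochastic splitting and barcode-directed sorting can be illustrated with a short sketch. Here each barcode is assigned a planned sequence up front and is routed to the matching vessel at every step, so exactly one copy of each library member is made. The integer barcodes and the `directed_sort` helper are hypothetical stand-ins for a physical reading and sorting device.

```python
from itertools import product

BLOCKS = "DHKT"  # four building blocks, one reaction vessel each per step


def directed_sort(length: int) -> dict:
    """Route each barcoded support to the vessel dictated by its planned
    sequence, returning barcode -> sequence actually synthesized."""
    # Assign one planned sequence to each barcode at the start of the synthesis.
    plan = {
        barcode: "".join(seq)
        for barcode, seq in enumerate(product(BLOCKS, repeat=length))
    }
    made = {barcode: "" for barcode in plan}
    for step in range(length):
        vessels = {block: [] for block in BLOCKS}
        for barcode, seq in plan.items():
            vessels[seq[step]].append(barcode)  # read barcode, sort into vessel
        for block, barcodes in vessels.items():
            for barcode in barcodes:
                made[barcode] += block          # coupling in that vessel
    return made


library = directed_sort(length=4)
print(len(library), "supports,", len(set(library.values())), "distinct members")
```

Because sorting replaces random splitting, no redundancy is needed to ensure coverage. The trade-off is that every support must be read individually before each step.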
Fig. 9.1-4 Early efforts toward multiscaffold libraries. (a) Armstrong converted
an Ugi multicomponent reaction product to several linear and cyclic derivatives
[19] and (b) proposed squaric acid as a versatile precursor to various cyclic
scaffolds, demonstrating several such reactions [20].
9.1.3
General Considerations
Current efforts in DOS are focused in three areas. First, a variety of library
design strategies are being explored to generate libraries that will provide
new biologically active molecules to probe a wide range of targets. Second,
new synthetic strategies are being developed to generate structural diversity
in a flexible, efficient fashion. Third, new chemical methodologies are being
developed to meet the stringent demands of DOS on reaction efficiency and
selectivity.
Fig. 9.1-5 Structures of synthetic drugs and natural products.
(a) Representative examples of approved synthetic drugs, which are often rich in
aromatic rings and nitrogen atoms. (b) Representative examples of clinically used
natural products, which are often rich in stereochemical features and complex ring
systems. For a recent comparison of synthetic drugs and natural products, see
Ref. [28]. See also Fig. 9.1-7.
Compounds shown include Diflucan (fluconazole), Cipro (ciprofloxacin), Paxil
(paroxetine), Claritin (loratadine), Viagra (sildenafil), penicillin G,
amphotericin B, vancomycin, vincristine, and paclitaxel (Taxol).
examples o f clinically used natural products,
and have been used in synthetic drugs. Some examples include purines,
indoles, and benzopyrans [26].
Fig. 9.1-6 Approaches to skeletal diversity. (a) Schultz used multiple
heterocyclic scaffolds, all having a common set of functional groups, as building
blocks that could then undergo the same set of appendage-coupling reactions [39].
(b) Schreiber used consecutive stepwise Diels-Alder cycloaddition reactions with
various dienophiles to generate multiple scaffolds from precursors, all having a
triene functionality in common [40]. (c) Schreiber used a set of differentially
functionalized furan precursors to generate multiple scaffolds under a single set
of reaction conditions [41]. The nature of the scaffold was determined by the
functionalization of the furan sidechains.
9.1.4
Applications and Practical Examples
Fig. 9.1-7 Example of principal component analysis (PCA) comparison of synthetic
drugs and natural products. A set of 20 synthetic drugs, including the top 10
best-sellers in 2004, and 20 natural products was analyzed for nine molecular
descriptors: molecular weight, hydrophobicity (X log P or C log P), # hydrogen-bond
donors, # hydrogen-bond acceptors, # rotatable bonds, topological polar surface
area [43], # stereogenic centers, # nitrogen atoms, # oxygen atoms. PCA was used to
reduce the nine-dimensional vectors to two-dimensional vectors, which were then
replotted as shown. The first principal component accounts for 55.1% of the
original information and the first two principal components account for 84.2%.
Synthetic drugs (squares, capitalized) and natural products (circles, italicized)
cover distinct regions of chemical space with limited overlap; Flonase and Zocor
are synthetic drugs that are analogs of natural products. Molecular descriptors
were obtained from PubChem (http://pubchem.ncbi.nlm.nih.gov/) and ChemBank
(http://chembank.broad.harvard.edu/) or calculated using ChemDraw/Biobyte and
Molinspiration (http://www.molinspiration.com). PCA was performed with R v1.01
(http://cran.r-project.org/). Adapted from Ref. [2] with permission.
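The dimensionality reduction described in this caption can be sketched in a few lines. The example below is only an illustration: it uses NumPy rather than R, it assumes that each descriptor is standardized before the analysis (the caption does not say whether this was done), and the 40 x 9 matrix is random placeholder data, not the descriptor values from PubChem or ChemBank. The function name pca_2d is hypothetical.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto the first two principal components.

    Returns (scores, explained_ratio): scores has shape (n_samples, 2) and
    explained_ratio holds the fraction of total variance on each component.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: descriptors carry different units (Da, counts, square
    # angstroms), so each column is centered and scaled to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular value decomposition: the rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = S**2 / (Z.shape[0] - 1)
    explained_ratio = variances / variances.sum()
    scores = Z @ Vt[:2].T
    return scores, explained_ratio[:2]


# Placeholder data: 40 compounds x 9 descriptors (random, for illustration only).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(40, 9))
scores, ratio = pca_2d(descriptors)
print(scores.shape)   # one (PC1, PC2) point per compound
print(ratio.sum())    # fraction of variance retained by the first two PCs
```

With real descriptor data, the summed ratio plays the role of the 84.2% quoted in the caption, and the two columns of scores are the coordinates that would be plotted.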
[Fig. 9.1-8 graphic: (a) 1,890-member library, high-throughput screening, uretupamine A, structure-activity relationship analysis, uretupamine B; (b) 7,200-member biased library, high-throughput screening and statistical analysis, tubacin and histacin.]
Fig. 9.1-8 Uretupamines, tubacin, and histacin. (a) Schreiber discovered
uretupamine A as a function-selective suppressor of the yeast nutrient signaling
protein Ure2p through HTS of a library of natural productlike compounds [44].
Analysis of SAR led to the development of an improved analog, uretupamine B. See
Fig. 9.1-9 for biological data. (b) Tubacin and histacin were discovered as
paralog-selective HDAC (histone deacetylase) family inhibitors from a related
library [45]. This biased library was targeted to HDACs by capping each library
member with a metal-binding functional group at the end of a long alkyl chain
(YR4). Each subset of the library was screened in two cytoblot assays for histone
acetylation and α-tubulin acetylation. PCA was used to replot the data to identify
selective inhibitors of histone versus α-tubulin deacetylation, including
histacin and tubacin. See Fig. 9.1-9 for biological data.
[Fig. 9.1-10 graphic: structures of TWS119, cardiogenol A (R = NHPh), cardiogenol B (R = OPh), cardiogenol C (R = OMe), cardiogenol D (R = (E)-CH=CHPh), purmorphamine, and reversine.]
Fig. 9.1-10 Small molecule modulators of stem cell differentiation. Schultz has
discovered a number of small molecules that modulate stem cell differentiation
from a multiscaffold library of druglike heterocycles [39] (see Fig. 9.1-6(a)).
(a) TWS119 induces neurogenesis of mouse embryonic stem cells [47]. (b) The
cardiogenols induce cardiomyogenesis of mouse embryonic stem cells [48].
(c) Purmorphamine induces osteogenesis of mouse mesoderm fibroblast cells [49].
(d) Reversine induces dedifferentiation of lineage-specific murine myoblasts to
multipotent mesenchymal progenitor cells, which can then be induced to
differentiate into osteoblasts or adipocytes [50]. See Fig. 9.1-9 for biological
data.
[Fig. 9.1-11 graphic: 10,000-member library (R' = 9 scaffolds); screening and secondary library synthesis (repeated); lead compounds (EC50 = 5-10 µM); fexaramine (EC50 = 25 nM).]
Fig. 9.1-11 Fexaramine, a potent, highly specific nonsteroidal agonist of the
farnesoid X receptor. Nicolaou used a library of compounds built around the
privileged 2,2-dimethylbenzopyran substructure, which is found in a wide range of
natural products, to discover lead compounds that were moderate agonists of the
farnesoid X receptor [51, 56]. Synthesis and screening of multiple secondary
libraries provided extensive SAR data, ultimately leading to the development of
fexaramine as a potent agonist. Fexaramine proved to be highly specific for
activation of the FXR signaling pathway. See Fig. 9.1-9 for biological data.
[Fig. 9.1-12 graphic: isoindoline diester, solution-phase synthesis, 240-member library, high-throughput screening, IIA6B17.]
Fig. 9.1-12 IIA6B17, a small molecule inhibitor of the Myc-Max protein-protein
interaction. Vogt and Boger identified IIA6B17 by screening a library built around
a peptidomimetic isoindoline scaffold [52]. A biochemical FRET assay was used in
the initial screen and the hits were analyzed further using ELISA, EMSA, and cell
foci formation assays. See Fig. 9.1-9 for biological data.
with small molecules. Such probes should be valuable tools for dissecting the
roles of these transcription factors in cancer and for evaluating their potential
as new therapeutic targets.
9.1.5
Future Development
DOS has provided a powerful arsenal of new small molecule probes to dissect
complex biological processes. It has also driven new advances in the field
of synthetic organic chemistry. In the continuing evolution of this field, the
current focus is on refining library design strategies so that new probes can be
identified as efficiently as possible given a particular biological target or system
of interest. For example, correlation of particular chemical scaffolds with
specific classes of biological targets will facilitate prioritization of appropriate
compounds to screen against these targets. Other targets may prove more
challenging, requiring ventures into new, uncharted regions of chemical
structure space. Systematic evaluation of various library design strategies
across a wide range of biological assays is on the horizon under the Molecular
Libraries Initiative of the National Institutes of Health [58]. Importantly, the
results of these experiments will be deposited into the publicly available
PubChem database (http://pubchem.ncbi.nlm.nih.gov/) to allow subsequent
statistical analyses through data mining. This will provide valuable information
for future efforts in library design.
9.1.6
Conclusion
Acknowledgments
References
1. J.S. Potuzak, S.B. Moilanen, D.S. Tan, Discovery and applications of small molecule probes for studying biological processes, Biotechnol. Genet. Eng. Rev. 2004, 21, 11-77.
2. D.S. Tan, Diversity-oriented synthesis: exploring the intersections between chemistry and biology, Nat. Chem. Biol. 2005, 1, 74-84.
3. R.B. Merrifield, Solid phase peptide synthesis. I. The synthesis of a tetrapeptide, J. Am. Chem. Soc. 1963, 85, 2149-2154.
4. F. Guillier, D. Orain, M. Bradley, Linkers and cleavage strategies in solid-phase organic synthesis and combinatorial chemistry, Chem. Rev. 2000, 100, 2091-2157.
5. C.C. Tzschucke, C. Markert, W. Bannwarth, S. Roller, A. Hebel, R. Haag, Modern separation techniques for efficient workup in organic synthesis, Angew. Chem. Int. Ed. Engl. 2002, 41, 3964-4000.
6. A. Kirschning, H. Monenschein, R. Wittenberg, Functionalized polymers-emerging versatile tools for solution-phase chemistry and automated parallel synthesis, Angew. Chem. Int. Ed. Engl. 2001, 40, 650-679.
7. J.G. Garcia, Scavenger resins in solution-phase combichem, Methods Enzymol. 2003, 369, 391-412.
8. X. Li, D.R. Liu, DNA-templated organic synthesis: Nature's strategy for controlling chemical reactivity applied to synthetic molecules, Angew. Chem. Int. Ed. Engl. 2004, 43, 4848-4870.
9. H.M. Geysen, R.H. Meloen, S.J. Barteling, Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid, Proc. Natl. Acad. Sci. U. S. A. 1984, 81, 3998-4002.
10. R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U. S. A. 1985, 82, 5131-5135.
11. A. Furka, F. Sebestyen, M. Asgedom, G. Dibo, General method for rapid synthesis of multicomponent peptide mixtures, Int. J. Pept. Protein Res. 1991, 37, 487-493.
12. K.S. Lam, S.E. Salmon, E.M. Hersh, V.J. Hruby, W.M. Kazmierski, R.J. Knapp, A new type of synthetic peptide library for identifying ligand-binding activity, Nature 1991, 354, 82-84.
13. R.L. Affleck, Solutions for library encoding to create collections of discrete compounds, Curr. Opin. Chem. Biol. 2001, 5, 257-263.
14. R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U. S. A. 1985, 82, 5131-5135.
15. J.A. Ellman, Design, synthesis, and evaluation of small-molecule libraries, Acc. Chem. Res. 1996, 29, 132-143.
16. S. Hobbs DeWitt, J.S. Kiely, C.J. Stankovic, M.C. Schroeder, D.M. Reynolds Cody, M.R. Pavia, "Diversomers": an approach to nonpeptide, nonoligomeric chemical diversity, Proc. Natl. Acad. Sci. U. S. A. 1993, 90, 6909-6913.
17. D.S. Tan, M.A. Foley, M.D. Shair, S.L. Schreiber, Stereoselective synthesis of over two million compounds having structural features both reminiscent of natural products and compatible with
libraries, Org. Biomol. Chem. 2003, 1, 908-920.
57. D.L. Boger, J. Desharnais, K. Capps, Solution-phase combinatorial libraries: modulating cellular signaling by targeting protein-protein or protein-DNA interactions, Angew. Chem. Int. Ed. Engl. 2003, 42, 4138-4176.
58. C.P. Austin, L.S. Brady, T.R. Insel, F.S. Collins, Policy forum: molecular biology: NIH molecular libraries initiative, Science 2004, 306, 1138-1139.
59. W. Zhang, Fluorous technologies for solution-phase high-throughput organic synthesis, Tetrahedron 2003, 59, 4475-4489.
60. T. Hideshima, J.E. Bradner, J. Wong, D. Chauhan, P. Richardson, S.L. Schreiber, K.C. Anderson, Small-molecule inhibition of proteasome and aggresome function induces synergistic antitumor activity in multiple myeloma, Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 8567-8572.
9.2
Combinatorial Biosynthesis of Polyketides and Nonribosomal Peptides
Outlook
9.2.1
Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN 978-3-527-31150-7
[Fig. 9.2-1 graphic: structures of actinorhodin, tetracenomycin, erythromycin A, tyrocidine A, and surfactin A.]
Fig. 9.2-1 Polyketide and nonribosomal peptide structures described in the text.
Erythromycin A is produced by a modular polyketide synthase. Actinorhodin and
tetracenomycin are constructed via aromatic polyketide synthases. Surfactin A and
tyrocidine A are produced through nonribosomal peptide synthetases.
the first module via thioester linkage to the active-site cysteine (Fig. 9.2-2).
The next sequential (downstream) acyl carrier protein (ACP) receives a specific
extender unit, usually derived from malonyl- or methylmalonyl-CoA, from
the appropriate acyl transferase (AT) domain. A Claisen-like decarboxylative
condensation between the primer and extender units affords an ACP-bound
β-ketothioester. The ultimate oxidation state and stereochemical configuration
of the intermediates are determined by collaboration of optional ketoreductase
(KR), dehydratase (DH), and enoyl reductase (ER) domains while docked at
the ACP. Once fully processed, the extended chain is passed to the KS of
the subsequent module by a transthioesterification reaction. The process is
repeated, leading to the final module where the product is generally excised
via hydrolysis or thioesterase (TE)-mediated macrocyclization.
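The module-by-module logic above lends itself to a small bookkeeping sketch. The code below is a toy model, not a description of any real synthase: the Module class, the three-module example line, and the state names are all hypothetical. It encodes only the rule stated in the text, namely that the optional KR, DH, and ER domains act in sequence on the β-carbon, so the set of domains present in a module fixes the functional group that module leaves behind.

```python
from dataclasses import dataclass

# Each reductive domain acts on the product of the previous one.
REDUCTIVE_ORDER = ("KR", "DH", "ER")
# Beta-carbon outcome after 0, 1, 2, or 3 successive reductive steps.
STATES = ("ketone", "hydroxyl", "alkene", "methylene")


@dataclass(frozen=True)
class Module:
    extender: str                        # "malonyl" or "methylmalonyl"
    reductive: frozenset = frozenset()   # optional subset of {"KR", "DH", "ER"}


def beta_carbon_state(reductive):
    """Functional group left at the beta-carbon by a module's optional domains."""
    steps = 0
    for domain in REDUCTIVE_ORDER:
        if domain not in reductive:
            break  # a later domain has no substrate if an earlier one is missing
        steps += 1
    return STATES[steps]


def run_assembly_line(modules):
    """Return (extender, beta-carbon state) for each module, in order."""
    return [(m.extender, beta_carbon_state(m.reductive)) for m in modules]


# Hypothetical three-module line, for illustration only.
line = [
    Module("methylmalonyl", frozenset({"KR"})),
    Module("malonyl"),
    Module("methylmalonyl", frozenset({"KR", "DH", "ER"})),
]
for step, (extender, state) in enumerate(run_assembly_line(line), start=1):
    print(step, extender, state)
```

Swapping or deleting entries in a module's reductive set is the in silico analog of the domain engineering experiments discussed later in this section; the sketch ignores stereochemistry, substrate tolerance, and kinetics, which in practice limit what such swaps achieve.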
The less clearly understood aromatic PKSs utilize a single KS(CLF)/ACP
pair capable of multiple elongation reactions to construct the complete
polyketide backbone. The number of elongation events is controlled by the CLF
associated with the KS domain. Transthioesterification and decarboxylative
condensation reactions proceed in an analogous fashion to modular systems.
The ultimate topology of advanced aromatic polyketides is controlled by
precise combination of tailoring enzymes responsible for redox chemistry and
cyclization pattern.
Analogous to polyketide biosynthesis, nonribosomal peptide natural prod-
ucts are produced by nonribosomal peptide synthetase (NRPS) assembly lines.
A thioester template similar to the PKS systems is employed but with very
different extender units. In place of simple malonate and substituted malonate
groups, NRPSs utilize amino acids (proteinogenic and nonproteinogenic) as
their aminoacyl-AMP derivatives for chain extension. Minimal NRPSs consist
of an adenylation domain (A), peptidyl carrier protein (PCP) or thiolation
domain (T), and a condensation domain (C). The A domain is responsible
for loading the PCP or T domain with the appropriate aminoacyl component.
The condensation domain then catalyzes the peptide bond formation between
flanking aminoacyl-PCP/T domains. Auxiliary domains including methylation
(M), epimerization (E), cyclization (Cy), and TEs combine to control peptide
topology and functionality similar to aromatic PKS assemblies.
An increasing number of “hybrid” systems containing both NRPS and PKS
components are being identified. The compatibility of these systems speaks
of the mechanistic similarities and offers an additional level of potential
regarding genetic and chemical reprogramming. Despite the many lingering
questions concerning nonribosomal peptide and polyketide syntheses (vide
infra), our current level of understanding provides numerous possibilities for
combinatorial biosynthesis. It is clear that deciphering the elaborate interplay
between chemistry and biology that governs the reactivity in these systems will
require innovative thought and experimentation.
In the simplest of terms, manipulation of polyketide and nonribosomal
peptide components involves alteration of materials, tools, or both. From a
chemical standpoint, modification of building blocks can ideally result in
structures limited only by our imagination. Biologically, genetic control over
biosynthetic machinery could allow, theoretically, for boundless reprogram-
ming capabilities. Realistically, insight from both perspectives will be required
as enzyme selectivity and reactivity can impede combinatorial prospects.
With a basic understanding of the intricate construction of polyketides
and nonribosomal peptides, we can discuss the potential for biosynthetic
generation of analogous compounds. Chemical synthesis provides a powerful
9.2.2
History/Development
[24-26]. The difficult task of adding these domains where they are absent
has been accomplished through generation of hybrid modules. Santi and
coworkers were able to control ultimate oxidation state of 6-deoxyerythronolide
B analogs by genetic insertion of redox-active domains from the rapamycin
synthase into various DEBS modules [27]. Interestingly, some modifications
resulted in incomplete reduction of intermediates possibly due to competition
between reduction and chain transfer to the downstream module. This
observation underscores the delicate reactivity balance that must be addressed
when combining domains and modules not naturally associated with one
another.
Attempts at altering polyketide chain length have resulted in a number
of abridged lactones. By repositioning the thioesterase domain in DEBS to the
C-terminal end of module 5, a 12-membered macrolactone analog of 10-
deoxymethynolide, the aglycon precursor to methymycin, was produced [28].
This study revealed the propensity for TE cyclization of nonnatural substrates,
which has since been used to permit multiple turnover experiments using
single, isolated modules. In contrast, the stand-alone TE domain exhibits
increased selectivity relative to those fused to various modules indicating a
possible change in the mechanism [29].
In contrast to the modular systems, our understanding of aromatic PKSs
remains largely undeveloped. However, this area does benefit from several
high-resolution crystal and solution structures of individual domains, which
provide enormous insight into enzyme specificity and mechanism [30-34].
The ability to program specific polymerization parameters promises readily
accessible structure variation. By simply choosing an appropriate starter unit
and polyketide length determinant, arrays of small aromatic molecules could
be potentially designed.
To elucidate the precise role of the CLF, the chain length specificity in
the actinorhodin (act) and tetracenomycin (tcm) PKSs was effectively altered
by site-specific mutagenesis of the CLF [35]. For this, residues associated
with the KS-CLF dimer interface (as determined from crystallographic data)
were compared across a number of aromatic PKSs that specifically produce
polyketide backbones ranging from C16 to C24. Mutation of two key residues
in the CLF enabled the production of decaketide products in the typically
octaketide-specific act system. Similarly, a single point mutation of the wild-
type tcm CLF effected conversion of a decaketide synthase to an octaketide
one. Importantly, overall polyketide yields in these mutant systems were
comparable to the natural synthases indicating no significant influence on
enzyme reactivity.
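The chain lengths quoted above follow from simple counting: each decarboxylative condensation with a malonyl extender adds two backbone carbons, so an acetate-primed n-ketide has 2n carbons. A minimal sketch of that arithmetic is given below; the function name and the starter_carbons parameter are illustrative choices, not terms from the chapter.

```python
def backbone_carbons(n_ketide_units, starter_carbons=2):
    """Backbone carbons for a chain built from one starter and malonyl extenders.

    n_ketide_units counts the starter as the first unit, so the number of
    elongation events is n_ketide_units - 1. Each elongation adds two carbons.
    """
    if n_ketide_units < 1:
        raise ValueError("a polyketide needs at least the starter unit")
    elongations = n_ketide_units - 1
    return starter_carbons + 2 * elongations


print(backbone_carbons(8))   # octaketide (act-type): 16
print(backbone_carbons(10))  # decaketide (tcm-type): 20
print(backbone_carbons(12))  # dodecaketide: 24
```

Changing starter_carbons gives a rough picture of why nonacetate primers, such as those discussed next, shift the carbon count without changing the number of elongation events.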
Some aromatic polyketides including frenolicin and R1128 are derived
from nonacetate starter units which require a unique primer module for
their incorporation into the iterative portion of the PKS [11]. Tang et al.
have recently combined the R1128 priming module with the actinorhodin
or tetracenomycin minimal PKS in an attempt to generate novel aromatic
polyketide structures [36-38]. The engineered bimodular PKS could efficiently
9.2.3
General Considerations
9.2.4
Applications and Practical Examples
9.2.5
Future Development
Fig. 9.2-7 Aromatic polyketide library from genetically combining initiation
modules (IM) with minimal aromatic PKSs. Compounds that have been reported are
shown in bold. Predicted combinations are shown in plain text. KS-CLFs that have
not been identified are in gray. Blue - ketoreductase (KR) requirements, red -
cyclase requirements, green - other methyltransferases (MT), and additional KRs.
Figure taken from Y. Tang, T.S. Lee, H.Y. Lee, C. Khosla, Tetrahedron 2004, 60,
7659-7671.
9.2.6
Conclusion
Given a wealth of natural chemical scaffolds for improved drug design, our
ability to generate novel pharmaceuticals requires increased understanding of
the biosynthetic processes that may lead to their discovery and production.
Polyketide and nonribosomal peptide assembly offers enormous potential for
development of combinatorial biosynthetic methods. The structural complexity
of these natural products often prohibits practical chemical synthesis, which
underscores the need for alternative means of accessing them in usable
quantities. Research in this area requires in-depth knowledge of chemical,
biological, and engineering principles that typify the field of chemical biology.
The studies highlighted in this chapter demonstrate significant forward
progress but there is much need for motivated scientists from all disciplines
to take part in the development and exploration of improved methods.
Acknowledgment
This work was supported by grants from the National Institutes of Health
(CA66736 and CA77248). Nathan A. Schnarr is a recipient of an NIH
postdoctoral fellowship.
References
10
Synthesis of Large Biological Molecules
10.1
Expressed Protein Ligation
Matthew R. Pratt and Tom W. Muir
Outlook
10.1.1
Introduction
10.1.2
History/Development
EPL had its genesis in the convergence of chemical synthesis and protein
biochemistry. The established areas of peptide and protein chemistry provided
10.1.3
General Considerations
has also been reported recently, which could be utilized in a similar fashion
to create peptide thioesters through thiolysis [44]. Another method involves
the coupling of “masked” thioester equivalents to fully protected peptide free
acids post-SPPS [45]. In one example (Fig. 10.1-2(b)), an amino acid derivative
was coupled to a fully protected peptide, followed by global deprotection, to
give a masked thioester intermediate. Treatment of this intermediate with
exogenous thiols reduces the disulfide bond, allowing for a spontaneous
rearrangement resulting in the formation of a peptide thioester. Finally, a
masked thioester equivalent has recently been introduced as a linker for SPPS
(Fig. 10.1-2(c)) [46]. Standard cleavage conditions allow for the isolation of the
peptide-linker intermediate, which upon treatment with thiols, rearranges to
yield a peptide thioester. These examples, along with others, have been used
successfully in NCL and EPL syntheses of peptides and proteins.
As noted above, the production of recombinant protein thioesters was first
achieved by the use of mutant inteins rendered incapable of resolving their
Fig. 10.1-5 The extension of ligation technology past the requirement of cysteine using
auxiliaries (a), desulfurization (b), and the Staudinger ligation (c).
10.1.4
Applications and Practical Examples
EPL has been applied to an array of proteins ranging from kinases and
phosphatases, to transcription factors, polymerases, ion channels, and many
others. A variety of modifications have been introduced into these proteins,
allowing for studies of protein structure and function that would be
difficult with other techniques. Some of these applications are highlighted
below.
Fig. 10.1-6 Biosensor for c-Abl phosphorylation of c-Crk-II. c-Abl
phosphorylates Tyr221 of c-Crk-II, which induces an intramolecular association
with the SH2 domain. This rearrangement yields a change in the distance between
the termini of the protein. This change is reported by the FRET pair
tetramethylrhodamine (Rh) and fluorescein (Fl) incorporated at the N- and
C-termini, respectively.
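A FRET pair reports a change in distance because transfer efficiency falls off with the sixth power of the donor-acceptor separation, E = 1 / (1 + (r/R0)^6), where R0 is the Förster radius of the pair. The sketch below evaluates this standard relation; the R0 of 55 Å and the two separations are illustrative assumptions chosen for the example and are not values reported for the c-Crk-II biosensor.

```python
def fret_efficiency(r, r0):
    """Förster transfer efficiency for donor-acceptor distance r and radius r0.

    Both arguments must be positive and in the same length unit.
    """
    if r <= 0 or r0 <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


R0 = 55.0  # assumed Förster radius in angstroms (illustrative value)

# Hypothetical end-to-end separations for two conformations of a sensor.
for label, r in (("termini far apart", 70.0), ("termini close", 40.0)):
    print(label, round(fret_efficiency(r, R0), 2))
```

Because of the steep distance dependence, a rearrangement that moves the termini by a few tens of angstroms near R0 produces a large, easily measured change in the fluorescence signal.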
[Fig. 10.1-7 graphic: structures labeled NorLeu, phospho-Ser/Thr, phospho-Tyr, Tyr phosphonate, N-biotin, and EDTA; R = Nε-lysine.]
Fig. 10.1-7 Some of the amino acids introduced into proteins using EPL.
versions containing either or both of the mucin domains (Fig. 10.1-8). The
two proteins containing only one mucin domain were synthesized using one
ligation site between a synthetic glycopeptide and a recombinant protein.
GlyCAM-1 containing both mucin domains was created using a three-part
sequential ligation strategy with two synthetic glycopeptides and a recombinant
thioester protected at the N-terminus with a factor Xa cleavage peptide.
The resulting glycoproteins bearing α-GalNAc residues can then be enzymatically
elaborated with further glycosyltransferases to generate the endogenous
6-sulfo sialyl Lewis x motifs required for L-selectin binding.
Transforming growth factor β (TGFβ) is a member of a large family of
secreted cytokines of central importance in eukaryotic development and
homeostasis [72]. The initiation of TGFβ signaling involves a ligand-induced
multiple phosphorylation event of TGFβ receptor I by TGFβ receptor II (TβR-I
and TβR-II, respectively). This yields an activated TβR-I, enabling it to
phosphorylate members of the Smad family of transcription factors. The
modification of Smads allows them to oligomerize, giving active transcription
complexes that can enter the nucleus and mediate gene expression. EPL has been
used elegantly to shed light on the molecular mechanisms of many of these steps
in the TGFβ signaling pathway. To understand the activation of TβR-I by
phosphorylation, a semisynthetic version of the receptor was produced containing
three phosphoserines and one phosphothreonine [73]. Access to this homogeneous
preparation of activated TβR-I allowed the mechanism of receptor activation
to be studied for the first time [74]. Accordingly, phosphorylation was shown
to increase the binding affinity of TβR-I for Smad2 and decrease its affinity
for an inhibitor of the pathway, FKBP12. These observations yielded a new
model of receptor activation in which phosphorylation of the receptor switches
it from an inhibited state into an activated form capable of binding substrate.
The next step in the pathway, the effect of phosphorylation on Smad2, has also
been investigated using EPL [75]. Phosphorylation occurs in the last two serine
residues in the C-terminus of Smad2 during signaling. It had been shown
previously that phosphorylation of Smad2 is indispensable in TGFβ signaling,
but how phosphorylation affects the conformation and function of Smad was
yet to be elucidated. To investigate this, a homogeneous, doubly phosphorylated
version of Smad2 was synthesized. Biochemical studies on this protein
indicated that phosphorylation induced trimerization of the protein. As shown
in Fig. 10.1-9, this conclusion was confirmed when the crystal structure of such
a trimer was determined. These investigations revealed how phosphorylation
of Smad2 allows dissociation from the activated TβR-I receptor and
simultaneously induces hetero-oligomerization with a key regulatory protein, Smad4.
Fig. 10.1-9 Semisynthetic Smad2 containing two phosphoserines was used to confirm
the trimeric state of the active protein.
Muir and coworkers have used EPL to generate two semisynthetic versions
of Smad2 to probe its transport to the nucleus. The first such protein contains
two phosphates, a fluorescent probe, a fluorescence quenching molecule, and
a photocleavable linker (Fig. 10.1-10) [76]. The linker acts as a bifunctional
caging group, both interfering with Smad2 trimerization and quenching the
fluorescence of the molecule. Thus, cleavage of this linker with light results in
the formation of active protein, as well as the induction of protein fluorescence.
Indeed, when examined by gel filtration, the caged protein was found to be
incapable of forming trimers, but after cleavage there was a clean conversion
to the trimeric state. Importantly, this was also accompanied by an ~26-fold
increase in fluorescence. This caged protein is currently the focus of study
for unraveling the behavior of Smad2 and the kinetics of the TGFB signaling
pathway. In a complementary system, the same group synthesized a unique
version of Smad2 in which the phosphate groups on the last two serines
are photocaged (Fig. 10.1-11(a)) [77]. Again, the caged protein was unable
to form the obligatory trimers for signaling. However, after photoactivation
the phosphates were released and oligomerization could occur. Furthermore,
the semisynthetic protein was used successfully in a nuclear import assay
Fig. 10.1-10 Design of caged Smad2 based on a modified C-terminal phosphopeptide.
Fluorescence and activity of Smad2 are blocked by a photocleavable caging group.
Photolysis with 365 nm light causes simultaneous activation of both Smad2 and
fluorescence.
the use of an artificially split S. cerevisiae VMA intein. Two model exteins,
maltose binding protein (MBP) and a polyhistidine-containing sequence (HIS),
were used to explore the scope of the technology. CPS displays little to no
background and produces the product within 10 min of the addition of
rapamycin, indicating the advantage of the posttranslational nature of CPS for
quick responses. Furthermore, the level of product formation was dose and
time dependent (Fig. 10.1-13(b)) and can be attenuated with inhibitors of the
three-hybrid system, such as ascomycin [90].
Because of the promiscuity of inteins for their flanking extein sequences,
CPS is expected to have a certain level of generality. In fact, the only strict
extein sequence requirement is the cysteine residue of the C-extein, necessary
in EPL. In the most general form of CPS, a polypeptide with a novel func-
tion could be obtained by splicing together two fragments that lack function
individually. This general goal can be achieved in several ways. For example,
two domains of a protein that display no activity could be spliced together to
give a functional protein. Alternatively, one splicing partner could be a peptide
localization sequence, resulting in relocalization of the splicing product on
addition of rapamycin.
Liu and coworkers have recently developed a different strategy for small-
molecule activated protein splicing [91]. In this report, an intein was inserted
into a protein of interest, interrupting its function, which is restored after
splicing. Simple insertion of a natural ligand-binding domain into a minimal
intein destroyed the splicing activity and yielded an evolvable intein-based
molecular switch that transduces binding of a small molecule into the activation
of a protein of interest. Specifically, the Mycobacterium tuberculosis RecA intein
was modified with the human estrogen receptor-α (ER) ligand-binding domain (LBD)
(residues 304-551), which binds the small molecule 4-hydroxytamoxifen. This
protein was then evolved through multiple rounds of mutation and selection
in S. cerevisiae by linking the splicing to cell survival or fluorescence. Iterated
cycles of mutagenesis and selection yielded inteins with strong splicing
activities that depended highly on the presence of the small molecule. Insertion of
one of these inteins into different unrelated proteins in living cells revealed
that the technology allows for ligand-dependent protein function that it is
10.1.5
Future Development
Because of the power of EPL and protein splicing, these techniques will
undoubtedly be used for many applications in the future. EPL provides
researchers with a versatile tool for the study of protein function by allowing
the preparation of proteins containing both natural and artificial modifications.
As seen above, this technology is well suited for biochemical and biophysical
studies; however, it may also be a valuable tool for areas such as proteomics,
material science, and nanotechnology. For example, the Yao group has reported
on the preparation of a protein microarray by first biotinylating proteins
using EPL and then spatially arranging these on an avidin-coated slide [92].
Importantly, EPL ensures that the site of modification in all proteins is
consistent with respect to the site of immobilization, the C-terminus in this
case. These types of protein surfaces could be used for both proteomic profiling
of cellular interactions and protein modifications. In addition, homogeneous
surfaces coated with specific proteins can be prepared, which can be useful
for materials and other biophysical applications (e.g., assay development,
and cellular patterning). The highly controlled nature of EPL could also be
used in the areas of biomedicine, through the generation of novel protein
therapeutic drugs and diagnostic tools. In one such example, Sydor et al.
established conditions that allow single-chain antibodies to be utilized in EPL
reactions [93]. Thus, it should now be possible to attach any synthetic molecule
to the C-terminus of an antibody. Used in conjunction with technologies such
as quantum dots and contrast reagents, EPL can be powerful in the area of
bioimaging, as well as vaccine development and targeted-drug delivery.
Protein transsplicing also has potential in the area of proteomics.
The Umezawa group has developed a two-hybrid approach to probe for
protein-protein interactions in the cytosol of prokaryotic [94] and eukaryotic
cells [95]. The strategy involves fusing each half of a reporter protein (GFP or
luciferase) to the appropriate end of a split intein. The intein fragments are
then fused to either a receptor protein (fish) or to a library of potential ligands
(bait). Interaction between a fish and bait pair results in protein splicing
and generation of an active reporter protein. This type of strategy could be
extended to profile interacting partners of a protein of interest, by tagging
binding partners with a reporter construct. CPS could also be extended to the
investigation of enzymes and signaling proteins. Indeed, this has already been
accomplished in vitro through the generation of an inducible version of the
kinase PKA [96]. Extrapolation of this technology to cellular systems should
follow in due course, and the development of nontoxic rapamycin analogs [97]
may broaden the technology to living animals.
10.1.6
Conclusion
References
1. P. Cohen, The development and therapeutic potential of protein kinase inhibitors, Curr. Opin. Chem. Biol. 1999, 3, 459-465.
2. N.L. Pohl, Functional proteomics for the discovery of carbohydrate-related enzyme activities, Curr. Opin. Chem. Biol. 2005, 9, 76-81.
3. J.M. Antos, M.B. Francis, Selective tryptophan modification with rhodium carbenoids in aqueous solution, J. Am. Chem. Soc. 2004, 126, 10256-10257.
4. N.S. Joshi, L.R. Whitaker, M.B. Francis, A three-component Mannich-type reaction for selective tyrosine bioconjugation, J. Am. Chem. Soc. 2004, 126, 15942-15943.
5. I. Chen, A.Y. Ting, Site-specific labeling of proteins with small molecules in live cells, Curr. Opin. Biotechnol. 2005, 16, 35-40.
6. P.M. England, Unnatural amino acid mutagenesis: a precise tool for probing protein structure and function, Biochemistry 2004, 43, 11623-11629.
7. L. Wang, P.G. Schultz, Expanding the genetic code, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
8. T.W. Muir, D. Sondhi, P.A. Cole, Expressed protein ligation: a general method for protein engineering, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6705-6710.
9. K. Severinov, T.W. Muir, Expressed protein ligation, a novel method for studying protein-protein interactions in transcription, J. Biol. Chem. 1998, 273, 16205-16209.
10. T.C. Evans Jr, J. Benner, M.Q. Xu, Semisynthesis of cytotoxic proteins using a modified protein splicing element, Protein Sci. 1998, 7, 2256-2264.
11. T.W. Muir, Semisynthesis of proteins by expressed protein ligation, Annu. Rev. Biochem. 2003, 72, 249-289.
12. R. David, M.P. Richter, A.G. Beck-Sickinger, Expressed protein ligation. Method and applications, Eur. J. Biochem. 2004, 271, 663-677.
13. C.J. Wallace, Peptide ligation and semisynthesis, Curr. Opin. Biotechnol. 1995, 6, 403-410.
14. D.F. Dyckes, T. Creighton, R.C. Sheppard, Spontaneous re-formation of a broken peptide chain, Nature 1974, 247, 202-204.
15. C.J. Wallace, I. Clark-Lewis, Functional role of heme ligation in cytochrome c. Effects of replacement of methionine 80 with natural and non-natural residues by semisynthesis, J. Biol. Chem. 1992, 267, 3852-3861.
16. Y. Chen, Y.W. Ebright, R.H. Ebright, Identification of the target of a transcription activator protein by protein-protein photocrosslinking, Science 1994, 265, 90-92.
17. J. Mukhopadhyay, A.N. Kapanidis, V. Mekler, E. Kortkhonjia, Y.W. Ebright, R.H. Ebright, Translocation of σ(70) with RNA polymerase during transcription: fluorescence resonance energy transfer assay for movement relative to DNA, Cell 2001, 106, 453-463.
18. D. Macmillan, R.M. Bill, K.A. Sage, D. Fern, S.L. Flitsch, Selective in vitro glycosylation of recombinant proteins: semi-synthesis of novel homogeneous glycoforms of human erythropoietin, Chem. Biol. 2001, 8, 133-145.
19. M. Ghosh, I. Ichetovkin, X. Song, J.S. Condeelis, D.S. Lawrence, A new strategy for caging proteins regulated by kinases, J. Am. Chem. Soc. 2002, 124, 2440-2441.
20. G.A. Homandberg, M. Laskowski Jr, Enzymatic resynthesis of the hydrolyzed peptide bond(s) in ribonuclease S, Biochemistry 1979, 18, 586-592.
21. D.Y. Jackson, J. Burnier, C. Quan, M. Stanley, J. Tom, J.A. Wells, A designed peptide ligase for total synthesis of ribonuclease A with unnatural catalytic residues, Science 1994, 266, 243-247.
22. F. Bordusa, Proteases in organic synthesis, Chem. Rev. 2002, 102, 4817-4868.
23. H.F. Gaertner, K. Rose, R. Cotton, D. Timms, R. Camble, R.E. Offord, Construction of protein analogues by site-specific condensation of unprotected fragments, Bioconjugate Chem. 1992, 3, 262-268.
24. H.F. Gaertner, R.E. Offord, R. Cotton, D. Timms, R. Camble, K. Rose, Chemo-enzymic backbone engineering of proteins. Site-specific incorporation of synthetic peptides that mimic the 64-74 disulfide loop of granulocyte colony-stimulating factor, J. Biol. Chem. 1994, 269, 7224-7230.
25. K. Rose, Facile synthesis of homogeneous artificial proteins, J. Am. Chem. Soc. 1994, 116, 30-33.
26. M. Schnölzer, S.B.H. Kent, Constructing proteins by dovetailing unprotected synthetic peptides: backbone-engineered HIV protease, Science 1992, 256, 221-225.
27. P.E. Dawson, S.B. Kent, Synthesis of native proteins by chemical ligation, Annu. Rev. Biochem. 2000, 69, 923-960.
28. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B. Kent, Synthesis of proteins by native chemical ligation, Science 1994, 266, 776-779.
29. M. Chytil, B.R. Peterson, D.A. Erlanson, G.L. Verdine, The orientation of the AP-1 heterodimer on DNA strongly affects transcriptional potency, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14076-14081.
30. C.J. Noren, J. Wang, F.B. Perler, Dissecting the chemistry of protein splicing and its applications, Angew. Chem., Int. Ed. Engl. 2000, 39, 450-466.
31. H. Paulus, Protein splicing and related forms of protein autoprocessing, Annu. Rev. Biochem. 2000, 69, 447-496.
32. M.Q. Xu, F.B. Perler, The mechanism of protein splicing and its modulation by mutation, EMBO J. 1996, 15, 5146-5153.
33. I. Giriat, T.W. Muir, F.B. Perler, Protein splicing and its applications, Genet. Eng. (N.Y.) 2001, 23, 171-199.
34. S. Chong, F.B. Mersha, D.G. Comb, M.E. Scott, D. Landry, L.M. Vence, F.B. Perler, J. Benner, R.B. Kucera, C.A. Hirvonen, J.J. Pelletier, H. Paulus, M.Q. Xu, Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element, Gene 1997, 192, 271-281.
35. T.C. Evans Jr, J. Benner, M.Q. Xu, The in vitro ligation of bacterially expressed proteins using an intein from Methanobacterium thermoautotrophicum, J. Biol. Chem. 1999, 274, 3923-3926.
36. S. Mathys, T.C. Evans, I.C. Chute, H. Wu, S. Chong, J. Benner, X.Q. Liu, M.Q. Xu, Characterization of a self-splicing mini-intein and its conversion into autocatalytic N- and C-terminal cleavage elements: facile production of protein building blocks for protein ligation, Gene 1999, 231, 1-13.
37. D.W. Wood, W. Wu, G. Belfort, V. Derbyshire, M. Belfort, A genetic system yields self-cleaving inteins for bioseparations, Nat. Biotechnol. 1999, 17, 889-892.
38. M.W. Southworth, E. Adam, D. Panne, R. Byer, R. Kautz, F.B. Perler, Control of protein splicing by intein fragment reassembly, EMBO J. 1998, 17, 918-926.
39. K.V. Mills, B.M. Lew, S. Jiang, H. Paulus, Protein splicing in trans by purified N- and C-terminal fragments of the Mycobacterium tuberculosis RecA intein, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 3543-3548.
40. H. Wu, Z. Hu, X.Q. Liu, Protein trans-splicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 9226-9231.
41. T. Yamazaki, T. Otomo, N. Oda, Y. Kyogoku, K. Uegaki, N. Ito, Y. Ishino, H. Nakamura, Segmental isotope labeling for protein NMR using peptide splicing, J. Am. Chem. Soc. 1998, 120, 5591-5592.
42. B.J. Backes, J.A. Ellman, An alkanesulfonamide "safety-catch" linker for solid-phase synthesis, J. Org. Chem. 1999, 64, 2322-2330.
43. Y. Shin, K.A. Winans, B.J. Backes, S.B.H. Kent, J.A. Ellman, C.R. Bertozzi, Fmoc-based synthesis of peptide-(α)thioesters: application to the total chemical synthesis of a glycoprotein by native chemical ligation, J. Am. Chem. Soc. 1999, 121, 11684-11689.
44. Y. Kwon, K. Welsch, A.R. Mitchell, J.A. Camarero, Preparation of peptide p-nitroanilides using an aryl hydrazine resin, Org. Lett. 2004, 6, 3801-3804.
45. J.D. Warren, J.S. Miller, S.J. Keding, S.J. Danishefsky, Toward fully synthetic glycoproteins by ultimately convergent routes: a solution to a long-standing problem, J. Am. Chem. Soc. 2004, 126, 6576-6578.
46. P. Botti, M. Villain, S. Manganiello, H. Gaertner, Native chemical ligation through in situ O to S acyl shift, Org. Lett. 2004, 6, 4861-4864.
47. M. Villain, J. Vizzavona, K. Rose, Covalent capture: a new tool for the purification of synthetic and recombinant polypeptides, Chem. Biol. 2001, 8, 673-679.
48. D. Bang, S.B. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534-2538.
49. G.J. Cotton, B. Ayers, R. Xu, T.W. Muir, Insertion of a synthetic peptide into a recombinant protein framework: a protein biosensor, J. Am. Chem. Soc. 1999, 121, 1100-1101.
50. R.M. Hofmann, T.W. Muir, Recent advances in the application of expressed protein ligation to protein engineering, Curr. Opin. Biotechnol. 2002, 13, 297-303.
51. L.E. Canne, S.J. Bark, S.B. Kent, Extending the applicability of native chemical ligation, J. Am. Chem. Soc. 1996, 118, 5891-5896.
52. L.Z. Yan, P.E. Dawson, Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization, J. Am. Chem. Soc. 2001, 123, 526-533.
53. E. Saxon, C.R. Bertozzi, Cell surface engineering by a modified Staudinger reaction, Science 2000, 287, 2007-2010.
54. B.L. Nilsson, R.J. Hondal, M.B. Soellner, R.T. Raines, Protein assembly by orthogonal chemical ligation methods, J. Am. Chem. Soc. 2003, 125, 5268-5269.
55. R.J. Hondal, B.L. Nilsson, R.T. Raines, Selenocysteine in native chemical ligation and expressed protein ligation, J. Am. Chem. Soc. 2001, 123, 5140-5141.
56. D. Wang, P.A. Cole, Protein tyrosine kinase Csk-catalyzed phosphorylation of Src containing unnatural tyrosine analogues, J. Am. Chem. Soc. 2001, 123, 8883-8886.
57. K. Alexandrov, I. Heinemann, T. Durek, V. Sidorovitch, R.S. Goody, H. Waldmann, Intein-mediated synthesis of geranylgeranylated Rab7 protein in vitro, J. Am. Chem. Soc. 2002, 124, 5648-5649.
58. R. Xu, B. Ayers, D. Cowburn, T.W. Muir, Chemical ligation of folded recombinant proteins: segmental isotopic labeling of domains for NMR studies, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 388-393.
59. F.I. Valiyaveetil, R. MacKinnon, T.W. Muir, Semisynthesis and folding of the potassium channel KcsA, J. Am. Chem. Soc. 2002, 124, 9113-9120.
60. T.M. Hackeng, J.H. Griffin, P.E. Dawson, Protein synthesis by native chemical ligation: expanded scope by using straightforward methodology, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10068-10073.
61. S. Chong, K.S. Williams, C. Wotkowicz, M.Q. Xu, Modulation of protein splicing of the Saccharomyces cerevisiae vacuolar membrane ATPase intein, J. Biol. Chem. 1998, 273, 10567-10577.
62. R.Y. Tsien, The green fluorescent protein, Annu. Rev. Biochem. 1998, 67, 509-544.
63. T.J. Tolbert, C.-H. Wong, Intein-mediated synthesis of proteins containing carbohydrates and other molecular probes, J. Am. Chem. Soc. 2000, 122, 5421-5428.
64. V. Mekler, E. Kortkhonjia, J. Mukhopadhyay, J. Knight, A. Revyakin, A.N. Kapanidis, W. Niu, Y.W. Ebright, R. Levy, R.H. Ebright, Structural organization of bacterial RNA polymerase holoenzyme and the RNA polymerase-promoter open complex, Cell 2002, 108, 599-614.
65. A. Iakovenko, E. Rostkova, E. Merzlyak, A.M. Hillebrand, N.H. Thoma, R.S. Goody, K. Alexandrov, Semi-synthetic Rab proteins as tools for studying intermolecular interactions, FEBS Lett. 2000, 468, 155-158.
66. A. Rak, O. Pylypenko, T. Durek, A. Watzke, S. Kushnir, L. Brunsveld, H. Waldmann, R.S. Goody, K. Alexandrov, Structure of Rab GDP-dissociation inhibitor in complex with prenylated YPT1 GTPase, Science 2003, 302, 646-650.
67. P.R. Selvin, Fluorescence resonance energy transfer, Methods Enzymol. 1995, 246, 300-334.
68. G.J. Cotton, T.W. Muir, Generation of a dual-labeled fluorescence biosensor for Crk-II phosphorylation using solid-phase expressed protein ligation, Chem. Biol. 2000, 7, 253-261.
69. R.M. Hofmann, G.J. Cotton, E.J. Chang, E. Vidal, D. Veach, W. Bornmann, T.W. Muir, Fluorescent monitoring of kinase activity in real time: development of a robust fluorescence-based assay for Abl tyrosine kinase activity, Bioorg. Med. Chem. Lett. 2001, 11, 3091-3094.
70. A. Varki, R. Cummings, J. Esko, Essentials of Glycobiology, Cold Spring Harbor Labs, Cold Spring Harbor, 1999.
71. D. Macmillan, C.R. Bertozzi, Modular assembly of glycoproteins: towards the synthesis of GlyCAM-1 by using expressed protein ligation, Angew. Chem., Int. Ed. Engl. 2004, 43, 1355-1359.
72. P.M. Siegel, J. Massague, Cytostatic and apoptotic actions of TGF-β in homeostasis and cancer, Nat. Rev. Cancer 2003, 3, 807-821.
73. M. Huse, M.N. Holford, J. Kuriyan, T.W. Muir, Semisynthesis of hyperphosphorylated type I TGF-β receptor: addressing the mechanism of kinase activation, J. Am. Chem. Soc. 2000, 122, 8337-8338.
74. M. Huse, T.W. Muir, L. Xu, Y.G. Chen, J. Kuriyan, J. Massague, The TGF beta receptor activation process: an inhibitor- to substrate-binding switch, Mol. Cells 2001, 8, 671-682.
75. J.W. Wu, M. Hu, J. Chai, J. Seoane, M. Huse, C. Li, D.J. Rigotti, S. Kyin, T.W. Muir, R. Fairman, J. Massague, Y. Shi, Crystal structure of a phosphorylated Smad2. Recognition of phosphoserine by the MH2 domain and insights on Smad function in TGF-beta signaling, Mol. Cells 2001, 8, 1277-1289.
76. J.P. Pellois, M.E. Hahn, T.W. Muir, Simultaneous triggering of protein activity and fluorescence, J. Am. Chem. Soc. 2004, 126, 7170-7171.
77. M.E. Hahn, T.W. Muir, Photocontrol of Smad2, a multiphosphorylated cell-signaling protein, through caging of activating phosphoserines, Angew. Chem., Int. Ed. Engl. 2004, 43, 5800-5803.
78. F.I. Valiyaveetil, M. Sekedat, R. Mackinnon, T.W. Muir, Glycine as a D-amino acid surrogate in the K(+)-selectivity filter, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17045-17049.
79. D. Cowburn, T.W. Muir, Segmental isotopic labeling using expressed protein ligation, Methods Enzymol. 2001, 339, 41-54.
80. J.A. Camarero, A. Shekhtman, E.A. Campbell, M. Chlenov, T.M. Gruber, D.A. Bryant, S.A. Darst, D. Cowburn, T.W. Muir, Autoregulation of a bacterial σ factor explored by using segmental isotopic labeling and NMR, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 8536-8541.
81. A. Romanelli, A. Shekhtman, D. Cowburn, T.W. Muir, Semisynthesis of a segmental isotopically labeled protein splicing precursor: NMR evidence for an unusual peptide bond at the N-extein-intein junction, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 6397-6402.
82. H. Iwai, A. Lingel, A. Pluckthun, Cyclic green fluorescent protein produced in vivo using an artificially split PI-PfuI intein from Pyrococcus furiosus, J. Biol. Chem. 2001, 276, 16548-16554.
83. H. Iwai, A. Pluckthun, Circular β-lactamase: stability enhancement by cyclizing the backbone, FEBS Lett. 1999, 459, 166-172.
84. C.P. Scott, E. Abel-Santos, M. Wall, D.C. Wahnon, S.J. Benkovic, Production of cyclic peptides and proteins in vivo, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 13638-13643.
85. J.A. Camarero, D. Fushman, S. Sato, I. Giriat, D. Cowburn, D.P. Raleigh, T.W. Muir, Rescuing a destabilized protein fold through backbone cyclization, J. Mol. Biol. 2001, 308, 1045-1062.
86. D.P. Goldenberg, T.E. Creighton, Folding pathway of a circular form of bovine pancreatic trypsin inhibitor, J. Mol. Biol. 1984, 179, 527-545.
87. T.M. Kinsella, C.T. Ohashi, A.G. Harder, G.C. Yam, W. Li, B. Peelle, E.S. Pali, M.K. Bennett, S.M. Molineaux, D.A. Anderson, E.S. Masuda, D.G. Payan, Retrovirally delivered random cyclic peptide libraries yield inhibitors of interleukin-4 signaling in human B cells, J. Biol. Chem. 2002, 277, 37512-37518.
88. I. Giriat, T.W. Muir, Protein semi-synthesis in living cells, J. Am. Chem. Soc. 2003, 125, 7180-7181.
89. H.D. Mootz, T.W. Muir, Protein splicing triggered by a small molecule, J. Am. Chem. Soc. 2002, 124, 9044-9045.
90. H.D. Mootz, E.S. Blum, A.B. Tyszkiewicz, T.W. Muir, Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo, J. Am. Chem. Soc. 2003, 125, 10561-10569.
91. A.R. Buskirk, Y.C. Ong, Z.J. Gartner, D.R. Liu, Directed evolution of ligand dependence: small-molecule-activated protein splicing, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 10505-10510.
92. M.L. Lesaicherre, R.Y.P. Lue, G.Y.J. Chen, Q. Zhu, S.Q. Yao, Intein-mediated biotinylation of proteins and its application in a protein microarray, J. Am. Chem. Soc. 2002, 124, 8768-8769.
93. J.R. Sydor, M. Mariano, S. Sideris, S. Nock, Establishment of intein-mediated protein ligation under denaturing conditions: C-terminal labeling of a single-chain antibody for biochip screening, Bioconjugate Chem. 2002, 13, 707-712.
94. T. Ozawa, S. Nogami, M. Sato, Y. Ohya, Y. Umezawa, A fluorescent indicator for detecting protein-protein interactions in vivo based on protein splicing, Anal. Chem. 2000, 72, 5151-5157.
95. T. Ozawa, A. Kaihara, M. Sato, K. Tachihara, Y. Umezawa, Split luciferase as an optical probe for detecting protein-protein interactions in mammalian cells based on protein splicing, Anal. Chem. 2001, 73, 2516-2521.
96. H.D. Mootz, E.S. Blum, T.W. Muir, Activation of an autoregulated protein kinase by conditional protein splicing, Angew. Chem., Int. Ed. Engl. 2004, 43, 5189-5192.
97. S.D. Liberles, S.T. Diver, D.J. Austin, S.L. Schreiber, Inducible gene expression and protein translocation using nontoxic ligands identified by a mammalian three-hybrid screen, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 7825-7830.
Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
10.2
Chemical Synthesis of Proteins and Large Bioconjugates
Philip Dawson
Outlook
This chapter describes the strategies and techniques used to chemically
synthesize large macromolecules. Because of the large size and functional
diversity of biological macromolecules, traditional approaches that require
extensive use of protecting groups have limited utility. Instead, biological
macromolecules are synthesized using chemical ligation methods, which use
highly chemoselective reactions to link medium-sized synthetic precursors
without the need for extensive functional group protection. Although these
reactions are also used for the synthesis of carbohydrates and nucleic acids,
the general principles are described here with a focus on the chemical
synthesis of proteins.
10.2.1
Introduction
10.2.2
History/Development
10.2.2.3 Protein Synthesis using Peptide Fragments Derived from Solid Phase
Peptide Synthesis
The ability of SPPS to generate high-purity polypeptides (30-60 amino
acids) in reasonable yields (5-25% based on the loading of the C-terminal
amino acid) has led to the development of approaches to assemble these
polypeptide fragments into the large polypeptides that compose proteins. One
approach uses the backbone protection methods described above to enable the
purification and assembly of protected peptide fragments [25]. However, more
frequently, these approaches start with largely unprotected peptides derived
from SPPS and purified by HPLC.
Fig. 10.2-1 Thioester method for the fragment condensation of partially protected
peptides (R = H or alkyl).
C-terminus, most side chains did not need protection except for the Cys thiol
group. In addition, this approach was not demonstrated using Lys with an
unprotected side chain amine. However, these acyl transfer reactions pro-
ceeded over several hours in dimethylsulfoxide (DMSO)/base and enabled the
synthesis of several peptides, up to 39 amino acids.
Fig. 10.2-6 Auxiliary-mediated native chemical ligation. (a) Transthioesterification,
S-to-N acyl transfer, and removal of the auxiliary. (b) Tmb auxiliary. (c) 2-Phenylethane thiol auxiliary.
ligation [66-69]. Both of these approaches enable ligation when there is a Gly
residue at the ligation junction.
10.2.3
General Considerations
10.2.4
Applications and Practical Examples
Fig. 10.2-9 Protein medicinal chemistry. The N-terminus of the chemokine RANTES was
systematically modified to improve receptor binding and HIV microbicide activity.
(Panel labels: X-RANTES(4-68), natural product, moderate potency; position 1
optimization; position 2 optimization; highly potent protein mimetic.)
Fig. 10.2-10 Total synthesis of HIV-1 matrix protein with an N-terminal myristoyl group.
proteins. HIV-1 matrix protein p17 is a 131 amino acid protein with an
N-terminal myristoyl (C14) group. When covalently linked to the HIV Gag
polyprotein, p17 targets the polyprotein to the host-cell membrane for particle
assembly. However, on HIV viral maturation, proteolytic cleavage occurs
at the C-terminus of p17 and enables p17 to partially dissociate from the
viral membrane. Since large quantities of myristoylated p17 cannot be
obtained through heterologous expression systems, the protein was chemically
synthesized to study the effects of myristoylation on p17 structure and function.
As shown in Fig. 10.2-10, the 131 amino acid protein was assembled from
three peptide segments using an S-Acm protecting group for the peptide
corresponding to residues 56-85 to avoid cyclization of this central subunit.
Using this approach, 275 mg of this 15-kDa lipoprotein was synthesized, which
enabled detailed biophysical measurements. These studies suggest that the
role of the myristoyl group is to stabilize the trimeric folded state of the
protein rather than to effect a conformational change as had been previously
proposed. Significantly, this large protein was synthesized with an overall yield
of 7.5% based on the loading of the peptide resin used in solid phase synthesis,
emphasizing the efficiency of the synthetic procedures (over 300 synthetic
steps were performed in the synthesis of this protein).
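The efficiency claim can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the original study: it treats the quoted "over 300 steps" as exactly 300 and the "15-kDa" mass as exactly 15 000 g/mol, and the function names are hypothetical. It estimates the average (geometric-mean) yield per step implied by a 7.5% overall yield, and the molar amount corresponding to 275 mg of protein.

```python
def average_step_yield(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean yield per step implied by an overall yield."""
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must be a fraction in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall_yield ** (1.0 / n_steps)


def micromoles(mass_mg: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass in milligrams to micromoles."""
    # mg divided by g/mol gives mmol; multiply by 1000 for micromoles.
    return mass_mg / molar_mass_g_per_mol * 1000.0


# Figures quoted in the text. "Over 300 steps" and "15 kDa" are rounded
# values, so both results are estimates (assumption, see lead-in).
per_step = average_step_yield(0.075, 300)
amount_umol = micromoles(275.0, 15_000.0)

print(f"average yield per step: {per_step:.2%}")
print(f"material obtained: about {amount_umol:.1f} micromol")
```

Under these assumptions the average yield works out to roughly 99.1% per step, which shows why even small per-step losses dominate the overall yield of a synthesis of this length.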
a linear template that contains multiple reactive groups onto which linear
peptides can be ligated to generate a branched peptide structure. Chemical
ligation approaches are the methods of choice for the generation of such
template assembled synthetic protein (TASP) [77] and multiantigenic peptide
(MAP) [78] structures, and they have been assembled using thioester [79],
thioether, oxime, hydrazone, and thiazolidine ligation reactions.
A notable example of this approach for assembling proteins is the synthesis of
tetrameric and pentameric TASP molecules on the basis of the transmembrane
(TM) domain of HIV virus protein u (Vpu). Viral membrane proteins frequently
oligomerize to form ion channels, but analysis of these channels is complicated
by difficulties in determining the oligomerization state of the protein. As a
result, the chemical synthesis of branched peptides with a desired (four or five)
stoichiometry of TM peptides is an attractive approach. However, TM peptides
are highly insoluble, which complicates the purification and assembly of the
multimeric product. To overcome these problems, a polyethylene glycol-derived
polyamide (PPO) solubilization tag was attached through a cleavable thioester
bond to the C-terminus of each Vpu TM peptide. In order to ligate the peptides
to the tetravalent or pentavalent template, an N-terminal aminooxy group was
incorporated into each TM peptide, complementary to the ketoamide moieties
on the template. As shown in Fig. 10.2-11, this synthetic strategy enabled the
assembly of soluble Vpu TM-PPO-based TASP molecules with a molecular
weight of over 20 000 Da. Cleavage of the thioester link to the solubilizing
PPO moiety and incorporation into liposomes enabled the characterization of
four- and five-helix bundle ion channels. Conductivity measurements on these Vpu
TASP molecules suggest that a pentamer is the oligomeric state of the Vpu
ion channel.
Another nonlinear architecture that has been explored in proteins is
head-to-tail cyclization. Small cyclic peptides are common in peptidomimetic
efforts to mimic protein loops, but traditional peptide cyclization
methods are not applicable to large polypeptide chains. Cyclic proteins
can be synthesized from a polypeptide containing both an N-terminal
Cys and a C-terminal thioester [80-82]. It has been shown in multiple
proteins that the intramolecular ligation reaction proceeds at a faster rate
than the competing polymerization reaction, giving cyclic polypeptides in
near-quantitative yield. This procedure has been used to synthesize naturally
cyclic proteins such as the cyclotide family [82] and also engineered cyclic
proteins designed to increase thermodynamic stability [80-82].
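The competition between cyclization and polymerization can be illustrated with a minimal kinetic sketch. This is an assumed textbook-style model, not one given in the chapter: the linear precursor at concentration C is consumed by first-order cyclization (rate constant k_cyc) and by second-order intermolecular ligation (rate constant k_inter), so that dC/dt = -k_cyc C - k_inter C^2. Integrating the cyclization flux over the whole reaction gives the fraction cyclized as ln(1 + x)/x, where x = C0/EM and EM = k_cyc/k_inter is the effective molarity. The function name and the numerical values below are hypothetical.

```python
import math


def fraction_cyclized(c0_molar: float, effective_molarity: float) -> float:
    """Fraction of linear precursor that ends up cyclized at full conversion.

    Assumed model: dC/dt = -k_cyc*C - k_inter*C**2, with
    effective_molarity = k_cyc / k_inter (units of mol/L).
    """
    if c0_molar < 0.0 or effective_molarity <= 0.0:
        raise ValueError("concentration must be >= 0 and EM must be > 0")
    x = c0_molar / effective_molarity
    if x == 0.0:
        return 1.0  # infinite dilution: only the intramolecular path remains
    return math.log1p(x) / x


# Hypothetical effective molarity of 10 mM, chosen only for illustration.
em = 1e-2
for c0 in (1e-5, 1e-4, 1e-3, 1e-2):
    print(f"C0 = {c0:.0e} M -> fraction cyclized = {fraction_cyclized(c0, em):.3f}")
```

In this model the cyclic product dominates whenever the starting concentration is well below the effective molarity, which is consistent with the near-quantitative cyclization described above for polypeptides whose termini are held close together.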
Protein cyclization was taken one step further by the synthesis of a protein
catenane, consisting of two interlocked cyclic peptides [83, 84]. This structure
was designed from the tetramerization domain of p53, which folds in a
bisecting U conformation (Fig. 10.2-12). To construct the catenane, linear
peptides corresponding to the p53 tet domain were synthesized with both an
N-terminal Cys and a C-terminal thioester. The catenane was assembled by
folding the peptide to preorganize the bisecting conformation. Since protein
folding is faster than chemical ligation, native chemical ligation of the ends
10.2.5
Future Directions
Fig. 10.2-12 Synthesis of a protein catenane based on the p53 tetramerization domain.
References
orthogonal coupling methods, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 12485.
62. L.Z. Yan, P.E. Dawson, Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization, J. Am. Chem. Soc. 2001, 123, 526.
63. L.E. Canne, S.J. Bark, S.B.H. Kent, Extending the applicability of native chemical ligation, J. Am. Chem. Soc. 1996, 118, 5891.
64. J. Offer, P.E. Dawson, Nα-2-Mercaptobenzylamine-assisted chemical ligation, Org. Lett. 2000, 2, 23.
65. J. Offer, C.N. Boddy, P.E. Dawson, Extending synthetic access to proteins with a removable acyl transfer auxiliary, J. Am. Chem. Soc. 2002, 124, 4642.
66. T. Kawakami, K. Akaji, S. Aimoto, Peptide bond formation mediated by 4,5-dimethoxy-2-mercaptobenzylamine after periodate oxidation of the N-terminal serine residue, Org. Lett. 2001, 3, 1403.
67. C. Marinzi, J. Offer, R. Longhi, P.E. Dawson, An o-nitrobenzyl scaffold for peptide ligation: synthesis and applications, Bioorg. Med. Chem. 2004, 12, 2749.
68. P. Botti, M. Villain, S. Manganiello, H. Gaertner, Chemical synthesis of proteins through native and extended chemical ligation, Biopolymers 2003, 71, 283.
69. P. Botti, M.R. Carrasco, S.B.H. Kent, Native chemical ligation using removable N-alpha-(1-phenyl-2-mercaptoethyl) auxiliaries, Tetrahedron Lett. 2001, 42, 1831.
70. T.M. Hackeng, J.A. Fernandez, P.E. Dawson, S.B. Kent, J.H. Griffin, Chemical synthesis and spontaneous folding of a multidomain protein: anticoagulant microprotein S, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 14074.
71. G.S. Beligere, P.E. Dawson, Synthesis of a three zinc finger protein, Zif268, by native chemical ligation, Biopolymers 2000, 52, 363.
72. D. Bang, S.B. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534.
73. T.M. Hackeng, J.H. Griffin, P.E. Dawson, Protein synthesis by native chemical ligation: expanded scope by using straightforward methodology, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10068.
74. M. Villain, H. Gaertner, P. Botti, Native chemical ligation with aspartic and glutamic acids as C-terminal residues: scope and limitations, Eur. J. Org. Chem. 2003, 17, 3267.
75. G.S. Beligere, P.E. Dawson, Conformationally assisted protein ligation using C-terminal thioester peptides, J. Am. Chem. Soc. 1999, 121, 6332.
76. C.L. Hunter, G.G. Kochendoerfer, Native chemical ligation of hydrophobic [corrected] peptides in lipid bilayer systems, Bioconjugate Chem. 2004, 15, 437.
77. M. Mutter, P. Dumy, P. Garrouste, C. Lehmann, M. Mathieu, C. Peggion, S. Peluso, A. Razaname, G. Tuchscherer, Template assembled synthetic proteins (TASP) as functional mimetics of proteins, Angew. Chem., Int. Ed. Engl. 1996, 35, 1482.
78. J.P. Tam, Recent advances in multiple antigen peptides, J. Immunol. Methods 1996, 196, 17.
79. P.E. Dawson, S.B.H. Kent, Convenient total synthesis of a 4-helix template-assembled synthetic protein (TASP) molecule by chemoselective ligation, J. Am. Chem. Soc. 1993, 115, 7263.
80. J.P. Tam, Y.A. Lu, Synthesis of large cyclic cystine-knot peptide by orthogonal coupling strategy using unprotected peptide precursor, Tetrahedron Lett. 1997, 38, 5599.
81. J.A. Camarero, T.W. Muir, Biosynthesis of a head-to-tail cyclized protein with improved biological activity, J. Am. Chem. Soc. 1999, 121, 5597.
82. N.L. Daly, S. Love, P.F. Alewood, D.J. Craik, Chemical synthesis and folding pathways of large cyclic polypeptide: studies of the cystine knot polypeptide kalata B1, Biochemistry 1999, 38, 10606.
83. L.Z. Yan, P.E. Dawson, Design and synthesis of a protein catenane, Angew. Chem., Int. Ed. Engl. 2001, 40, 3625.
84. J.W. Blankenship, P.E. Dawson, Thermodynamics of a designed protein catenane, J. Mol. Biol. 2003, 327, 537.
85. J.D. Warren, J.S. Miller, S.J. Keding, S.J. Danishefsky, Toward fully synthetic glycoproteins by ultimately convergent routes: a solution to a long-standing problem, J. Am. Chem. Soc. 2004, 126, 6576.
86. R.S. Goody, T. Durek, H. Waldmann, L. Brunsveld, K. Alexandrov, in GTPases Regulating Membrane Targeting and Fusion, Methods Enzymol. 2005, 403, 29.
87. Y. Kajihara, N. Yamamoto, T. Miyazaki, H. Sato, Synthesis of diverse asparagine linked oligosaccharides and synthesis of sialylglycopeptide on solid phase, Curr. Med. Chem. 2005, 12, 527.
88. L.E. Canne, P. Botti, R.J. Simon, Y.J. Chen, E.A. Dennis, S.B.H. Kent, Chemical protein synthesis by solid phase ligation, J. Am. Chem. Soc. 1999, 121, 8720.
89. A. Brik, E. Keinan, P.E. Dawson, Protein synthesis by solid-phase chemical ligation using a safety catch linker, J. Org. Chem. 2000, 65, 3829.
90. D. Bang, S.B.H. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534.
91. T.W. Muir, Development and application of expressed protein ligation, Synlett 2001, 733.
92. D. Bang, S.B. Kent, His6 tag-assisted chemical protein synthesis, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 5014.
93. B.L. Nilsson, L.L. Kiessling, R.T. Raines, High-yielding Staudinger ligation of a phosphinothioester and azide to form a peptide, Org. Lett. 2001, 3, 9.
10.3
New Methods for Protein Bioconjugation
Matthew B. Francis
Outlook
This chapter surveys new chemical methods for the attachment of synthetic
molecules to proteins. Strategies targeting both native and unnatural functional
groups are discussed, including an evaluation of the selectivity that each
technique can achieve. A particular emphasis has been placed on the
unique mechanistic attributes that these reactions possess and the practical
circumstances under which they can be used.
10.3.1
Introduction
Fig. 10.3-1 A survey of molecules and materials that are commonly attached to proteins
through bioconjugation reactions.
have become possible. These techniques are not used for the majority of
bioconjugation reactions at the time of this writing, but they are certain to
provide countless new strategies as these methods become more available
and general. Although these techniques are described in more detail in other
chapters of this book, some examples of their use in selective bioconjugation
will be presented whenever possible.
10.3.2
History/Development
[Scheme residue: R-N=C=X, where 2: X = O (isocyanates) and 3: X = S (isothiocyanates); 8: iodoacetamides; 5 (in varying amounts); 6: HOBT]
10.3.3
New Bioconjugation Methods Targeting the Natural Amino Acids
Fig. 10.3-3 Tyrosine residues as targets for bioconjugation. (a) In contrast to charged amino acid side chains, tyrosine residues (yellow) are more closely associated with the protein surface. The reactive 3- and 5-positions of the phenolic ring (indicated by the white arrows) can be (b) fully exposed, (c) partially buried, or (d) fully buried. The protein shown is α-chymotrypsinogen A. (e) Modification of tyrosine residues through electrophilic aromatic substitution reactions.
Fig. 10.3-4 Highly efficient modification of tyrosine residues using electron-deficient diazonium salts. (a) General preparation method for nitro-substituted diazonium salts. (b) There are 180 copies of tyrosine 85 (green) displayed on the interior surface of bacteriophage MS2. (c) Virtually all these sites can be modified using diazonium salt 10a, as evidenced by (d) MALDI-TOF MS and (e) the appearance of an azo absorption band in the visible spectrum. (f) Similarly, 2100 copies of tyrosine 139 (yellow) line the exterior surface of the tobacco mosaic virus (TMV). (g) These sites can be modified using a two-step diazonium-coupling/oxime formation strategy. In both cases, the reactions are completely selective for the indicated tyrosine residues.
studies. Through further elaboration of these sites, carrier materials are being
prepared for drug delivery applications and as targeted diagnostic agents. As
a second example, tyrosine 139 of the tobacco mosaic virus (TMV) capsid
was modified using ketone-substituted diazonium salt 10c, resulting in the
installation of 2100 sites on the exterior surface, which can be further labeled
through oxime formation [10]. Once again, virtually complete conversion was
obtained, and the capsid remained assembled after the modification reaction.
As a result, tubelike materials with tailorable surface properties have become
available for nanoscience applications.
The above studies emphasize the ability of diazonium-coupling reactions
to modify proteins with extremely high efficiency, but one of the limitations
of this method is the lack of selectivity that can be obtained when there
are multiple tyrosines on the surface of a single protein. This has not
been problematic for the viral capsids shown above, as only one tyrosine
is accessible on each monomer, but many applications demand higher levels
of selectivity than allowed by these coupling reactions. To address this need,
and to increase the substrate scope for bioconjugation reactions in general, a
versatile Mannich-type reaction has been developed for tyrosine modification,
Fig. 10.3-5 [25]. In this reaction, aldehydes and anilines are mixed to form
Fig. 10.3-5 Tyrosine modification using a three component Mannich-type reaction. (a) Aldehydes and anilines condense to form imines in situ, which react with tyrosine residues through an electrophilic aromatic substitution reaction. No reaction occurs when proteins are treated alone with either component. (b) The reaction conversion is listed for a number of anilines and aliphatic amines using α-chymotrypsinogen A as the substrate and formaldehyde as the aldehyde component.
imines 12, which subsequently react with phenolic side chains through an
electrophilic aromatic substitution reaction [26]. Anilines bearing electron-
donating substituents have proven to be the most effective components in
the reaction, affording over 70% overall conversion in some cases. To date,
no aliphatic amines have been observed to participate in the reaction - a
useful feature, as cross-linking reactions with lysine residues are avoided.
Formaldehyde has yielded the highest amount of reactivity, although aldehydes
such as pyruvaldehyde, glyoxylic acid, and furan-carboxylic acid have proven
effective in some instances. Enolizable aldehydes are generally ineffective in
the reaction, presumably due to competing aldol self-condensation pathways.
Some particularly attractive features of this reaction include its mild conditions
(pH 6.5, aqueous buffer, 22-37 °C), very high selectivity for tyrosine residues,
and broad substrate tolerance with respect to the aniline component. It
should be noted that formaldehyde cross-linking techniques require high
concentrations of the aldehyde (up to 37%) and/or elevated temperatures [27].
With the low concentrations used in these reactions, no modification of the
proteins has been observed in the absence of the aniline component.
In many labeling applications, anilines bearing additional aliphatic amino
groups (such as 13) are particularly useful building blocks, as the aliphatic
amino group of these compounds can be coupled to NHS esters before
using them in the Mannich coupling reaction. This effectively converts the
large number of commercially available lysine labeling reagents into more
selective tyrosine modification reagents using a simple one-pot procedure,
Fig. 10.3-6. This strategy has been applied to the labeling of two antibody
binders, protein A and protein G', with a number of useful functional groups
for immunoassays [28].
As the Mannich reaction does not target cysteine or lysine residues,
both thiols and aliphatic amines can be present in the bioconjugation
substrates. This allows unprotected peptides to be coupled to tyrosine residues
using a tandem Mannich-native chemical ligation (NCL) [29] strategy. To do
this, N-terminal cysteine mimic 14 has been coupled to tyrosine residues
using the Mannich reaction, Fig. 10.3-7(a). This functional group couples
to peptide thioesters (e.g., 15), ultimately resulting in the synthesis of
branched polypeptide backbone architectures. By moving the location of the
tyrosine residue through site-directed mutagenesis, the branch point can be
repositioned on the protein surface, Fig. 10.3-7(b). The use of this technique
allows the growing set of peptide building blocks, including lanthanide binding
peptides [30] and affinity tags, to be appended to proteins in a flexible manner.
Fig. 10.3-7 Native chemical ligations using tyrosine residues. (a) Reactive N-terminal cysteine mimics can be installed through tyrosine modification using the Mannich reaction. After disulfide reduction with DTT (dithiothreitol), these groups react with C-terminal thioesters (e.g., 15) obtained using solid-phase synthesis techniques. (b) By changing the location of the tyrosine residue, the branch point of the resulting structure can be moved.
Fig. 10.3-8 Tyrosine modification using palladium π-allyl chemistry. (a) Allylic acetates (shown), carbonates, and carbamates can be activated by palladium(0) in aqueous solution to yield electrophilic π-allyl complexes. These species alkylate tyrosine residues with high selectivity. (b) Charged groups can be attached to hydrophobic chains to assist in solubilization. These carriers are lost on formation of the π-allyl complexes, and thus are not incorporated into the protein targets. This provides a useful method for the synthesis of membrane-associated proteins.
[Panel (b) data: water soluble farnesyl derivative 18 with chymotrypsinogen A (200 µM), pH 9, RT, 3 h. ESI-MS (m/z): unmodified 25667 (expected 25656); M+1 modification 25875 (expected 25860).]
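The ESI-MS values reported in Fig. 10.3-8(b) can be checked with simple arithmetic. The sketch below is illustrative only. It assumes that the modification replaces one tyrosine hydrogen with a farnesyl group (C15H25), a net gain of C15H24, and it uses standard average atomic masses. Neither this assignment nor the function name comes from the original text.

```python
# Average atomic masses (Da); standard values, not taken from the chapter.
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def formula_mass(composition):
    """Return the average mass of a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


# Masses read from the ESI-MS panel of Fig. 10.3-8(b).
expected_shift = 25860 - 25656  # expected modified minus expected unmodified
observed_shift = 25875 - 25667  # observed modified minus observed unmodified

# Assumption: one O-H hydrogen is replaced by farnesyl (C15H25), so the
# protein gains C15H24 overall.
farnesyl_gain = formula_mass({"C": 15, "H": 24})

print(f"expected shift: {expected_shift} Da")
print(f"observed shift: {observed_shift} Da")
print(f"farnesyl gain (assumed C15H24): {farnesyl_gain:.1f} Da")
```

Under these assumptions the expected shift of 204 Da corresponds to a single farnesyl addition, and the observed shift of 208 Da differs from it by only a few daltons on a protein of roughly 25.7 kDa.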
Fig. 10.3-9 Tryptophan modification using rhodium carbenoids. (a) These species can be formed in situ through the reaction of vinyldiazo compound 19 with catalytic amounts of Rh2(OAc)4. Intermediate 20 can react with tryptophan residues, forming a mixture of N- and 2-alkylated indoles, in addition to reacting with the aqueous solvent. Control experiments that were run in the absence of rhodium catalyst afford no modification products. (b) Proposed binding modes for hydroxylamine at low 24 and elevated 25 pH levels.
[Conditions recoverable from the scheme: 100 µM Rh2(OAc)4, 75 mM HONH2·HCl, H2O/ethylene glycol (80:20), RT, 7 h.]
[Scheme residue: vinyldiazo compound 19 and intermediate 20; sigmatropic rearrangement to 28; 27b: X = Rh2(OAc)4; 27c: X = H; conditions 100 µM Rh2(OAc)4, RT, 7 h.]
catalyst 27b. This results in the transfer of the styryl acetic acid group to
this neighboring site. As is the case with the 1,3-insertion pathway, this
reaction preserves the disulfide linkage after protein modification. Although
the conditions of this reaction are unlikely to maintain secondary and tertiary
protein structures, it still provides the only protein modification method that
is directed by disulfide groups.
Fig. 10.3-11 Reductive alkylation of proteins using iridium catalyzed transfer hydrogenation. (a) The iridium(III) catalyst shown reacts with formate ion to form a water-stable hydride. This species reduces imines formed in situ. (b) This reduction process proceeds readily on proteins, affording multiple alkylated products. (c) Commercially available PEG alcohols can be readily oxidized to aldehydes using the Dess-Martin periodinane (DMP). This product can then be conjugated to proteins using the transfer hydrogenation process, as observed by SDS-PAGE analysis. The arrows indicate the PEG conjugates. No reaction occurs in the absence of catalyst.
[Panel (b) data: 10 µM protein, 20 µM catalyst, 25 mM HCO2Na, 50 mM K2HPO4 buffer, pH 7.4, 22-37 °C, 2-18 h. ESI-MS (m/z): 14309, 14428 (+1), 14547 (+2), 14665 (+3), 14781 (+4).]
[Panel (c) data: PEG alcohol (MW = 2000) oxidized with 1.1 equiv DMP in CH2Cl2 for 1 h, then PEG precipitation (37% conversion) to give aldehyde 34. Conjugation: 100 µM lysozyme, 20 µM catalyst, aldehyde 34 at 1 mM, 25 mM HCO2Na, 50 mM K2HPO4 buffer, pH 7.4, 37 °C, 15 h.]
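The ladder of ESI-MS peaks in Fig. 10.3-11(b) lends itself to a quick consistency check. The following sketch assumes that the peak at 14309 is the unmodified protein and that each later peak carries one more alkyl group. The function names are hypothetical and serve only to illustrate the arithmetic.

```python
# Peak positions read from the ESI-MS panel of Fig. 10.3-11(b).
peaks = [14309, 14428, 14547, 14665, 14781]


def peak_spacings(values):
    """Differences between neighbouring peaks, in the order given."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def count_modifications(mass, base, increment):
    """Estimate how many alkyl groups a peak carries (nearest integer)."""
    return round((mass - base) / increment)


spacings = peak_spacings(peaks)
mean_increment = sum(spacings) / len(spacings)

# Assumption: the first peak is the unmodified protein.
assignments = [count_modifications(m, peaks[0], mean_increment) for m in peaks]

print("spacings (Da):", spacings)
print("mean increment (Da):", mean_increment)
print("assigned number of alkyl groups:", assignments)
```

A nearly constant spacing between neighbouring peaks is what one would expect if every alkylation event adds the same group, which is consistent with the (+1) to (+4) labels in the figure.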
residues. This approach has been particularly successful in the context of NCL
strategies with thioesters (Fig. 10.3-12(a)) [29], a technique that is discussed in
depth elsewhere in this book (see also Fig. 10.3-14). In addition, N-terminal
cysteines can be modified with aldehydes through thiazolidine formation
(Fig. 10.3-12(b)) [43], although the amide linkage formed in NCL reactions is
more resistant to hydrolysis. Similar linkages have been reported using the
Pictet-Spengler reaction (Fig. 10.3-12(c)) [44], which proceeds via electrophilic
aromatic substitution reactions between indoles and imines formed with the
N-terminus. An extensive review of these techniques has recently appeared in
Ref. 43.
A critical consideration for N-terminal modification strategies is the ease
with which the identity of the first amino acid can be established. Although all
proteins begin with methionine as the first amino acid due to the commonality
of the AUG start codon, this group is nearly always removed after translation
in eukaryotes. The situation is more complicated in prokaryotes, however, as
the methionyl aminopeptidases are sensitive to the size of the second amino
a decarboxylation step. Because it can be used with many amino acids, this
technique provides a general method for the site-selective modification of
virtually any protein under mild reaction conditions.
[Scheme residue: nonfluorescent dye 39 binds a Cys-Cys-Pro-Gly-Cys-Cys sequence, releasing ethanedithiol (HS-SH), to give a fluorescent complex.]
single natural amino acid can be expected to display the required selectivity.
As a solution to this problem, a labeling technique based on the recognition
of a specific sequence of amino acids has been reported. It was recognized
that the ethanedithiol groups of fluorescein bis(arsenical) dye 39 (aka FlAsH)
can be displaced by tetracysteine motifs expressed on a protein of interest,
Fig. 10.3-15 [2]. Conformational changes that occur on binding reduce the
fluorescence-quenching effect of the arsenic atoms, resulting in a substantial
(up to 60-fold) enhancement in the quantum yield of the chromophore. The
unbound dye remains relatively nonfluorescent, thereby reducing the need for
scrupulous removal of the excess reagent. Although many (Cys)4 sequences
can be recognized, CCPGCC has been particularly effective. Since the initial
publication, additional chromophores with varied optical characteristics have
become available [50]. Although similar labeling selectivity can be achieved
on the translational level using green fluorescent protein (GFP) fusion
techniques [51], a particular strength of the FlAsH approach is the reliance on
a small molecule modification that is less likely to affect protein trafficking,
binding, and catalytic function. A more detailed description covering the
applications of this powerful technique in cellular imaging appears elsewhere
in this book.
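Because this labeling method depends on a short sequence motif, candidate sites can be located with a simple sequence search. The sketch below is a minimal illustration, not a published tool. It assumes the general pattern CCXXCC (two cysteines, any two residues, two cysteines), of which CCPGCC is the specific case named above, and the example sequence is hypothetical.

```python
import re

# Assumed general motif: CC, any two residues, CC. A lookahead is used so
# that overlapping matches are also reported.
TETRACYSTEINE = re.compile(r"(?=(CC[A-Z]{2}CC))")


def find_tetracysteine_motifs(sequence):
    """Return (0-based start index, matched motif) pairs for CCXXCC sites."""
    return [(m.start(), m.group(1)) for m in TETRACYSTEINE.finditer(sequence.upper())]


# Hypothetical one-letter sequence carrying an engineered CCPGCC tag.
example = "MAGSCCPGCCGSLEHHHH"
motifs = find_tetracysteine_motifs(example)
print(motifs)
```

Restricting the pattern to the literal string CCPGCC would report only the sequence that the text singles out as particularly effective.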
10.3.4
New Methods for the Biosynthetic Incorporation of Unnatural Functional Groups
biological functional groups have been developed. Of these, the azide has
proven particularly useful, as it has a high thermodynamic driving force for
several reactions, and yet it is kinetically inert under physiological conditions.
The first bioconjugation reaction to capitalize on these properties was the
Staudinger ligation [63]. In this method, azides on biomolecules react with
triarylphosphines (such as 45) to form iminophosphorane 46 with concomitant
loss of nitrogen gas, Fig. 10.3-17(a). Normally, this species would be hydrolyzed
to yield the amine and the phosphine oxide; however, it was shown that this
intermediate could be trapped by a pendant ester group displayed on the
aromatic ring. This ultimately results in the formation of an amide bond
that links the phosphine group to the biomolecular target of interest. The
mechanism of the reaction was examined in detail, including the isolation
and X-ray characterization of intermediate 47b, when the reaction was carried
out in anhydrous solvent [76]. These studies have determined that the reaction
rate is accelerated both in polar solvents (such as water) and when electron-
rich aryl rings are attached to the phosphorus atom (although this also leads
to more rapid aerobic oxidation). For aliphatic azides, the rate-determining
step is the formation of iminophosphorane 46, but for aromatic azides the
Fig. 10.3-17 The Staudinger ligation. (a) Triarylphosphines and azides react to form iminophosphorane intermediate 46, which is trapped by the pendant ester group. Intermediate 47b has been characterized by X-ray crystallography under anhydrous conditions. (b) Modification of cell surfaces using the Staudinger ligation. Treatment of mammalian cells with mannose derivative 40c results in the incorporation of azides into sialic acid residues through metabolic engineering. These groups can then be labeled using biotinylated phosphine 49. (c) For direct protein modification, azidohomoalanine 50 can be incorporated into proteins biosynthesized in methionine auxotrophs. In this example, the azides were labeled with a phosphine conjugated to a FLAG peptide epitope 51.
[Scheme labels: 45, 46, 47a (X = H), 47b (X = CH3), 48.]
intramolecular attack on the ester group is rate limiting. The size of the
ester substituent also influences the efficiency of the reaction, with bulky
alkyl groups favoring competing hydrolysis pathways. It should be noted that
“traceless” versions of this reaction have also been developed [77, 78], in which
the phosphine oxide moiety is excised during the peptide bond formation step.
This alternative method has proven especially useful for protein synthesis
via segment condensation reactions [79]. A review of both Staudinger ligation
types has recently appeared in Ref. 80.
The use of this reaction in the biological context was first demonstrated for
the chemospecific labeling of Jurkat cell surfaces [63]. Metabolic engineering
with N-acetylmannosamine derivative 40c was used to incorporate azides
into sialic acid groups on cell surfaces. The cells were then incubated with
biotinylated phosphine 49, and the extent of the reaction was quantified by
flow cytometry after treatment with fluorescent avidin. Importantly, neither
the azide nor the phosphine displayed any reactivity with the cell-surface
groups in the absence of its reactive partner. In addition, the cells showed
unchanged growth rates after modification.
Since the original disclosure, the Staudinger ligation has evolved into a
powerful tool for the study of glycosylation pathways. The reactive specificity
for the azide/phosphine pair allows virtually any substrate bearing an azido
sugar to be derivatized and quantified in a Western blot or well-plate assay.
As examples, azide analogs have been used to identify protein targets
for N-acetylglucosamine modification in crude lysates [81] and to identify
glycosidases using azidosugars further substituted with fluorine atoms to
prevent enzyme turnover [82]. It has also been used to develop a parallel-plate
“azido-ELISA” assay for the identification of specific peptide sequences that
are targeted for mucin-type O-glycosylation [68, 83]. More recently, the reaction
has even been used to modify cell-surface glycoproteins in living animals [84].
The Staudinger ligation has also been used to modify proteins into
which azides have been incorporated directly [85]. In this case, methionine
auxotrophic hosts were used to introduce azidohomoalanine 50 into multiple
sites of murine dihydrofolate reductase. These groups were then modified
using a phosphine bearing a FLAG peptide epitope 51 and detected using
a Western blot assay. The specificity of the labeling reaction was again
demonstrated by labeling proteins in crude cell lysates.
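Since azidohomoalanine is introduced in place of methionine, the number of potential labeling sites follows directly from the protein sequence. The short sketch below illustrates that counting step. The sequence shown is a made-up example rather than murine dihydrofolate reductase, and the function name is hypothetical. It also ignores any removal of the N-terminal residue after translation.

```python
def methionine_positions(sequence):
    """Return 1-based positions of methionine (M) in a one-letter sequence."""
    return [index + 1 for index, residue in enumerate(sequence.upper()) if residue == "M"]


# Hypothetical sequence used only to illustrate the calculation.
example = "MKTAYMAKQRMLG"
sites = methionine_positions(example)
print(f"{len(sites)} candidate azidohomoalanine sites at positions {sites}")
```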
Second generation phosphine reagents have been developed for the
fluorescent detection of azide groups [86]. This system employs coumarin-
substituted phosphine 52, which is nonfluorescent due to excited state
quenching by the lone pair on the phosphorus atom. On oxidation of the
phosphine in the Staudinger ligation this quenching process is relieved,
resulting in a dramatic enhancement in the quantum yield for the dye,
Fig. 10.3-18. The use of this activatable fluorescence system provides significant
advantages over the traditional Western blot analyses because it can detect
azide-labeled proteins without the need for extensive washing steps and
antibody-based detection schemes.
[Fig. 10.3-18 scheme residue: phosphine 52 (nonfluorescent, Φ = 0.011) reacts with an azide-bearing protein to give 53 (fluorescent, Φ = 0.65).]
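The size of this fluorescence turn-on can be estimated from the two quantum yields shown in the scheme. The sketch below assumes, as a simplification, that brightness scales with quantum yield alone (equal absorption before and after the reaction). The function name is hypothetical.

```python
def fold_enhancement(phi_before, phi_after):
    """Ratio of quantum yields after and before the reaction."""
    if phi_before <= 0:
        raise ValueError("the initial quantum yield must be positive")
    return phi_after / phi_before


# Quantum yields of 52 (before ligation) and 53 (after ligation).
fold = fold_enhancement(0.011, 0.65)
print(f"quantum yield increases about {fold:.0f}-fold")
```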
10.3.5.3 [3 + 2] Dipolar Cycloadditions of Azides and Alkynes
In 2001, Kolb, Finn, and Sharpless published an article enumerating the
stereospecific chemical reactions that can join reactive components with high
yields and little by-product formation [87]. An interesting feature shared by
these transformations, termed “Click” reactions, is a high degree of
exothermicity, achieved through the use of “spring-loaded” reactive
components. This report also focused on
the use of reactions that are air and water tolerant and can be used in the
absence of protecting groups. Thus, many of the reactions on the “Click”
list (e.g. hydrazone formation, oxime formation, and epoxide opening) would
be natural considerations for biomolecule modification, and, in fact, have
been used.
One reaction that proceeds particularly well in aqueous solution is the
Huisgen [3 + 2] cycloaddition of azides and alkynes [88]. Although the
individual components of the reaction are unreactive under most conditions,
they can be joined under thermal conditions (often by heating them to
80°C in the absence of solvent) to form triazole products. In the thermal
reaction, equimolar mixtures of syn- and anti-triazoles are obtained when
terminal alkynes are used. As an early demonstration of the specificity of these
components in this reaction, highly potent enzyme inhibitors were synthesized
in the active site of acetylcholine esterase using a library of azide and alkyne
components [89]. Although no reaction occurred between these compounds
in the absence of enzyme, the proximity of the reactive groups in the active
site promoted the [3 + 2] cycloaddition at room temperature, affording hybrid
compounds with femtomolar binding constants.
The chemospecificity of the reaction suggested that it could be carried out
using azides or alkynes attached to proteins if the reaction temperature could be
lowered. This breakthrough was achieved by two groups who simultaneously
reported that the reaction could be dramatically accelerated in the presence
of Cu(I) salts [90, 91]. This allowed the reaction to take place in aqueous
solution with temperatures from 4 °C to RT. In the copper-catalyzed version
of the reaction, terminal alkynes show high specificity for the anti product.
Fig. 10.3-19 Modification of proteins using “Click” chemistry. (a) Sixty azide groups were introduced on the surface of the cowpea mosaic virus (CPMV) through the alkylation of genetically introduced cysteine residues. These groups can be modified through exposure to alkynes, Cu(I) (the Cu(II) source is reduced in situ by the TCEP), and ligand 58 or 59. Similar results were obtained when capsids bearing alkynes were exposed to azides. (b) Azide- and alkyne-containing amino acids were incorporated into proteins using unnatural tRNA/synthetase pairs obtained using selection techniques. These groups can be modified with high chemoselectivity using the appropriate Click coupling partners.
of the wild-type protein. This study highlights the power of artificial amino
[Scheme residue: biotin-bearing reagent 62 (250 µM); products 63a and 63b.]
their surface. Although this reaction appears to be somewhat slower than the
copper-catalyzed reaction, no losses in cell viability were observed in these
studies.
[Scheme residue: compounds 64, 65, 67, 69, and 70; labeled steps: 1. oxidation, 2. H2O, 3. oxidation.]
10.3.6
New Methods for Bioconjugate Purification
10.3.7
Future Development
Fig. 10.3-23 A general strategy for the purification of chromophore-labeled proteins. (a) This approach takes advantage of host/guest interactions between Sepharose-bound cyclodextrins and hydrophobic organic molecules. A sample of compatible chromophores is shown at right. (b) The resin captures chromophore-labeled proteins selectively, allowing facile removal of unmodified protein via filtration. The captured proteins can be eluted from the resin using a competitive cyclodextrin binder, such as adamantane carboxylic acid (78). (c) Purification of Oregon Green labeled myoglobin. The removal of residual unlabeled protein can be confirmed through UV-vis analysis, or (d-f) by using ESI-MS.
is that even the most predictable chemical reactions can display unexpected
reactivity and selectivity when applied to complex molecular targets. Similar
behavior is often observed for protein modification, as each biomolecular
target presents multiple chemical environments of unmatched complexity.
The “personality” of each protein can be difficult to predict, owing to
variations in the solvent accessibility of targeted residues and the effects of local
environments on pKa values. Further complications arise on consideration of
the rapid conformational changes of the surface groups and the aggregation
of proteins and reagents in aqueous solution. As a result, the scope and utility
of each bioconjugation reaction can be evaluated only by applying it to many
10.3.8
Conclusion
Taken together, the new chemical tools described herein have dramatically
altered the landscape of chemical biology. Each of these techniques has
expanded the scope of bioconjugates that can be prepared, and thus the
creativity with which new experimental systems can be designed. Many of
the labeling reactions can achieve levels of selectivity that were previously
impossible to attain, even allowing single proteins to be targeted in the
complex biochemical settings of living cells. Equally important is the
continued development of the conceptual framework that is needed to create
future reactions. In addition to improving our understanding of enzyme
function and protein trafficking, these new techniques have enabled frontier
applications in proteomics, single molecule spectroscopy, and the preparation
of biomolecular materials, among others. As new strategies continue to
emerge, this field is certain to retain its crucially important role in chemical
biology.
Acknowledgments
I would especially like to thank the students with whom I have had the pleasure
of working during the past four years. They are an extremely talented and
creative group of scientists, and I cannot overemphasize my gratitude for their
enthusiasm, hard work, and intellectual input. Our efforts in the area of protein
modification have been generously funded by the Biomolecular Materials
Program at Lawrence Berkeley National Labs, the DOE Nanoscale Science and
Engineering Technology (NSET) program, the NIH (R01 GM072700-01), and
the Department of Chemistry at UC Berkeley.
References
1. A.F. Straight, A. Cheung, J. Limouze, I. Chen, N.J. Westwood, J.R. Sellers, T.J. Mitchison, Dissecting temporal and spatial control of cytokinesis with a myosin II inhibitor, Science 2003, 299, 1743-1747.
2. B.A. Griffin, S.R. Adams, R.Y. Tsien, Specific covalent labeling of recombinant protein molecules inside live cells, Science 1998, 281, 269-272.
3. E. Babini, I. Bertini, M. Borsari, F. Capozzi, C. Luchinat, X.Y. Zhang, G.L.C. Moura, I.V. Kurnikov, D.N. Beratan, A. Ponce, A.J. Di Bilio, J.R. Winkler, H.B. Gray, Bond-mediated electron tunneling in ruthenium-modified high-potential iron-sulfur protein, J. Am. Chem. Soc. 2000, 122, 4532-4533.
4. S. Zalipsky, Chemistry of polyethylene-glycol conjugates with biologically-active molecules, Adv. Drug Deliv. Rev. 1995, 16, 157-182.
5. S. Zalipsky, J.M. Harris, Introduction to chemistry and biological applications of poly(ethylene glycol), Poly(Ethylene Glycol) 1997, 680, 1-13.
6. H.C. Hang, C.R. Bertozzi, Chemoselective approaches to glycoprotein assembly, Acc. Chem. Res. 2001, 34, 727-736.
7. C.M. Niemeyer, Nanoparticles, proteins, and nucleic acids: biotechnology meets materials science, Angew. Chem. Int. Ed. Engl. 2001, 40, 4128-4158.
8. N.C. Seeman, A.M. Belcher, Emulating biology: building nanostructures from the bottom up, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 6451-6455.
9. For an excellent review of common bioconjugation techniques, see G.T. Hermanson, Bioconjugate Techniques, Academic Press, San Diego, 1996.
10. T.L. Schlick, Z.B. Ding, E.W. Kovacs, M.B. Francis, Dual-surface modification of the tobacco mosaic virus, J. Am. Chem. Soc. 2005, 127, 3718-3723.
11. J.A. Maurer, D.E. Elmore, H.A. Lester, D.A. Dougherty, Comparing and contrasting Escherichia coli and Mycobacterium tuberculosis mechanosensitive channels (MscL) - new gain of function mutations in the loop region, J. Biol. Chem. 2000, 275, 22238-22244.
12. Q. Wang, T.W. Lin, L. Tang, J.E. Johnson, M.G. Finn, Icosahedral virus particles as addressable nanoscale building blocks, Angew. Chem. Int. Ed. Engl. 2002, 41, 459-462.
13. For an example of double chromophore labeling for FRET studies, see M. Borsch, M. Diez, B. Zimmermann, R. Reuter, P. Graber, Stepwise rotation of the γ-subunit of EF0F1-ATP synthase observed by intramolecular single-molecule fluorescence resonance energy transfer, FEBS Lett. 2002, 527, 147-152.
14. R.F. Doolittle, Redundancies in protein sequences, in Prediction of Protein Structure and the Principles of Protein Conformation (Ed.: G.D. Fasman), Plenum Press, New York, 1989.
15. J. Houk, G.M. Whitesides, Structure reactivity relations for thiol disulfide interchange, J. Am. Chem. Soc. 1987, 109, 6825-6836.
16. T.P. King, Y. Li, L. Kochoumian, Preparation of protein conjugates via intermolecular disulfide bond formation, Biochemistry 1978, 17, 1499-1506.
17. For an example, see S. Zalipsky, M. Qazen, J.A. Walker, N. Mullah, Y.P. Quinn, S.K. Huang, New detachable poly(ethylene glycol) conjugates: cysteine-cleavable lipopolymers regenerating natural phospholipid, diacyl phosphatidylethanolamine, Bioconjug. Chem. 1999, 10, 703-707.
18. H.R. Adams, C.H. Paik, W.C. Eckelman, R.C. Reba, Electrophilic iodination of aromatic rings, J. Labelled Comp. Radiopharm. 1982, 19, 1477-1478.
19. W.C. Eckelman, H.R. Adams, C.H. Paik, Electrophilic iodination of aromatic rings, Int. J. Nucl. Med. Biol. 1984, 11, 163-166.
20. J.F. Leite, M. Cascio, Probing the topology of the glycine receptor by chemical modification coupled to mass spectrometry, Biochemistry 2002, 41, 6140-6148.
21. H.G. Higgins, D. Fraser, The reaction of amino acids and proteins with diazonium compounds. 1. A spectrophotometric study of azo-derivatives of histidine and tyrosine, Australian J. Sci. Res. Ser. A Phys. Sciences 1952, 5, 736-753.
22. H.G. Higgins, K.J. Harrington, Reaction of amino acids and proteins with diazonium compounds. 2. Spectra of protein derivatives, Arch. Biochem. Biophys. 1959, 85, 409-425.
23. J.A. Shin, Specific DNA binding peptide-derivatized solid support, Bioorg. Med. Chem. Lett. 1997, 7, 2367.
24. J.M. Hooker, E.W. Kovacs, M.B. Francis, Interior surface modification of bacteriophage MS2, J. Am. Chem. Soc. 2004, 126, 3718-3719.
25. N.S. Joshi, L.R. Whitaker, M.B. Francis, A three-component Mannich-type reaction for selective tyrosine bioconjugation, J. Am. Chem. Soc. 2004, 126, 15942-15943.
26. For an example of a lanthanide-promoted phenol modification with imines in organic solvents, see T.S. Huang, C.J. Li, Synthesis of amino acids via a three-component reaction of phenols, glyoxylates and amines, Tetrahedron Lett. 2000, 41, 6715.
27. H. Fraenkel-Conrat, H.S. Olcott, Reaction of formaldehyde with proteins. VI. Cross-linking of amino groups with phenol, imidazole, or indole groups, J. Biol. Chem. 1948, 174, 827-843.
28. N.S. Joshi, M.B. Francis, Submitted.
29. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B.H. Kent, Synthesis of proteins by native chemical ligation, Science 1994, 266, 776-779.
30. K.J. Franz, M. Nitz, B. Imperiali, Lanthanide-binding tags as versatile protein coexpression probes, Chembiochem 2003, 4, 265-271.
31. H. Dibowski, F.P. Schmidtchen, Bioconjugation of peptides by palladium-catalyzed C-C cross-coupling in water, Angew. Chem. Int. Ed. Engl. 1998, 37, 476-478.
32. D.T. Bong, M.R. Ghadiri, Chemoselective Pd(0)-catalyzed peptide coupling in water, Org. Lett. 2001, 3, 2509-2511.
33. A. Ojida, H. Tsutsumi, N. Kasagi, I. Hamachi, Suzuki coupling for protein modification, Tetrahedron Lett. 2005, 46, 3301-3305.
34. S.D. Tilley, M.B. Francis, Submitted.
35. J. Stubbe, D.G. Nocera, C.S. Yee, M.C.Y. Chang, Radical initiation in the class I ribonucleotide reductase: long-range proton-coupled electron transfer? Chem. Rev. 2003, 103, 2167-2201.
36. J.M. Antos, M.B. Francis, Selective tryptophan modification with rhodium carbenoids in aqueous solution, J. Am. Chem. Soc. 2004, 126, 10256-10257.
37. H.M. Davies, P.R. Bruzinski, D.H. Lake, N. Kong, M.J. Fall, Asymmetric cyclopropanations by rhodium(II) N-(arylsulfonyl)prolinate catalyzed decomposition of vinyldiazomethanes in the presence of alkenes. Practical enantioselective synthesis of the four stereoisomers of 2-phenylcyclopropan-1-amino acid, J. Am. Chem. Soc. 1996, 118, 6897-6907.
38. Most proteins are not denatured by the use of this cosolvent. For examples, see Y.L. Khmelnitsky, V.V. Mozhaev, A.B. Belova, M.V. Sergeeva, K. Martinek, Denaturation capacity - a new quantitative criterion for selection of organic-solvents as reaction media in biocatalysis, Eur. J. Biochem. 1991, 198, 31-41.
39. J.M. Antos, M.B. Francis, Unpublished results.
References I 6 3 1
40. An analogous rearrangement proteins - a review, J . Protein Chem.
pathway has been observed for small 1984,3,99-108.
molecule disulfides in organic 49. For examples, see T.J. Tolbert, C.H.
solvents M. Hamaguchi, T. Misumi, Wong, Intein-mediated synthesis of
T. Oshima, Reaction of proteins containing carbohydrates
vinylcarbenoids with cyclic disulfides: and other molecular probes, /. Am.
formation of 1,3-insertion products Chem. SOC. 2000, 122, 5421-5428.
as well as 1,l-insertion products, 50. S.R. Adams, R.E. Campbell, L.A.
Tetrahedron Lett. 1998, 39, Gross, B.R. Martin, G.K. Walkup,
7113-7116. Y. Yao, I. Llopis, R.Y. Tsien, New
41. J.M. McFarland, M.B. Francis, biarsenical ligands and tetracysteine
Reductive alkylation of proteins motifs for protein labeling in vitro
using iridium catalyzed transfer and in vivo: synthesis and biological
hydrogenation, J . Am. Chem. SOC. applications, J . Am. Chem. SOC. 2002,
2005, in press. 124,6063-6076.
42. T.J. Sereda, C.T. Mant, A.M. Quinn, 51. R.Y. Tsien, The green fluorescent
R.S. Hodges, Effect of alpha-amino protein, Annu. Rev. Biochem. 1998,
group on peptide retention behavior 67,509-544.
in reversed-phase 52. C.J. Noren, S.J. Anthonycahill, M.C.
chromatography - determination of Griffith, P.G. Schultz, A general
the pK(a) values of the alpha-amino method for site-specific incorporation
group of 19 different N-terminal of unnatural amino-acids into
amino-acid-residues, /. Chromatogr. proteins, Science 1989, 244, 182-188.
1993, 646,17-30. 53. J.A. Ellman, D. Mendel,
43. J.P. Tam, Q.T. Yu, Z.W. Miao, S. Anthonycahill, C.J. Noren, P.G.
Orthogonal ligation strategies for Schultz, P. G. Biosynthetic method
peptide and protein, Biopolymers for introducing unnatural
1999, 51,311-332. amino-acids site-specifically into
44. X.F. Li, L.S. Zhang, S.E. Hall, J.P. proteins, Methods Enzymol.1991,
Tam, A new ligation method for 202,301-336.
N-terminal tryptophan-containing 54. L. Wang, A. Brock, B. Herberich,
peptides using the Pictet-Spengler P.G. Schultz, Expanding the genetic
reaction, Tetrahedron Lett. 2000, 41, code of Escherichia coli, Science 2001,
4069-4073. 292,498-500.
45. P.H. Hirel, J.M. Schmitter, 55. J.W. Chin, S.W. Santoro, A.B. Martin,
P. Dessen, G. Fayat, S. Blanquet, D.S. King, L. Wang, P.G. Schultz,
Extent of N-terminal methionine Addition of p-azido-L-phenylalanine
excision from escherichia-coli to the genetic code of Escherichia
proteins is governed by the coli, J . Am. Chem. SOC. 2002, 124,
side-chain length of the penultimate 9026-9027.
amino-acid, Proc. Nut. Acad. Sci. U. S. 56. L. Wang, P.G. Schultz, Expanding
A. 1989, 86,8247-8251. the genetic code, Chem. Commun.
46. K.F. Geoghegan, J.G. Stroh, 2002, 1 , 1-11.
Site-directed conjugation of 57. L. Wang, Z. Zhang, A. Brock, P.G.
nonpeptide groups to peptides and Schultz, Addition of the keto
proteins via periodate-oxidation of a functional group to the genetic code
2-amino alcohol - application to of Escherichia coli, Proc. Nat. Acad.
modification at N-terminal serine, S C ~U.. S. A. 2003, 100, 56-61.
Bioconjug. Chem. 1992, 3, 138-146. 58. R.A. Mehl, J.C. Anderson, S.W.
47. J.M. Gilmore, R.A. Scheck, M.B. Santoro, L. Wang, A.B. Martin, D.S.
Francis, Unpublished results. King, D.M. Horn, P.G. Schultz,
48. For a related reaction catalyzed by Generation o fa bacterium with a 2 1
copper ions, see H.B.F. Dixon, amino acid genetic code, J . Am.
N-terminal modification of Chem. SOC. 2003, 125,935-939
6321 70 iynthesis of Large Biological Molecules
59. K.L. Kiick, D.A. Tirrell, Protein 69. 1. Chen, M. Howarth, W. Lin, A.Y.
engineering by in vivo incorporation Ting, Site-specificlabeling of cell
of non-natural amino acids: control surface proteins with biophysical
of incorporation of methionine probes using biotin ligase, Nat.
analogues by methionyl-tRNA Methods 2005, 2, 99-104.
synthetase, Tetrahedron 2000, 56, 70. M. Howarth, K. Takao, Y. Hayashi,
9487-9493. A.Y. Ting, Targeting quantum dots to
60. K.L. Kiick, R. Weberskirch, D.A. surface proteins in living cells with
Tirrell, Identification of an expanded biotin ligase, Proc. Nat. Acad. Sci. U.
set of translationally active S. A. 2005, 102,7583-7588.
methionine analogues in Escherichia 71. Y. Kho, S.C. Kim, C. Jiang, D. Barma,
coli, F E B S Lett. 2001, 502, 25-30. S.W. Kwon, J.K. Cheng, J. Jaunbergs,
61. K. Kirshenbaum, I.S. Carrico, D.A. C. Weinbaum, F. Tamanoi, J. Falck,
Tirrell, D. A. Biosynthesis of proteins Y.M. Zhao, A tagging-via-substrate
incorporating a versatile set of technology for detection and
phenylalanine analogues, proteomics of farnesylated proteins,
Chembiochem 2002,3, 235-237. Proc. Nut. Acad. Sci. U. S. A. 2004,
62. L.K. Mahal, K.J. Yarema, C.R. 101,12479-12484.
Bertozzi, Engineering chemical 72. J. Yin, F. Liu, X.H. Li, C.T. Walsh,
reactivity on cell surfaces through Labeling proteins with small
oligosaccharide biosynthesis, Science molecules by site-specific
1997, 276,1125-1128. posttranslational modification, J . Am.
63. E. Saxon, C.R. Bertozzi, Cell surface Chem. SOC.2004, 126,7754-7755.
engineering by a modified staudinger 73. V.W. Cornish, K.M. Hahn, P.G.
Schultz, Site-specificprotein
reaction, Science 2000, 287,
modification using a ketone handle,
2007-2010.
J. Am. Chem. SOC.1996, 118,
64. E. Saxon, S.J. Luchansky, H.C. Hang,
8150-8151.
C. Yu, S.C. Lee, C.R. Bertozzi,
74. W.P. Jencks, Studies on the
Investigating cellular metabolism of
mechanism of oxime and
synthetic azidosugars with the
semicarbazone formation, J . Am.
staudinger ligation, J. Am. Chem. SOC.
Chem. SOC.1959,81,475-481.
2002, 124,14893-14902.
75. Z.W. Zhang, B.A.C. Smith, L. Wang,
65. J.H. Lee, T.J. Baker, L.K. Mahal, A. Brock, C. Cho, P.G. Schultz, A
J. Zabner, C.R. Bertozzi, D.F. new strategy for the site-specific
Wiemer, M.J. Welsh, Engineering modification of proteins in vivo,
novel cell surface receptors for Biochemistry-Us2003, 42,6735-6746.
virus-mediated gene transfer, J. B i d . 76. F.L. Lin, H.M. Hop, H. van Halbeek,
Chem. 1999,274,21878-21884. R.G. Bergman, C.R. Bertozzi,
66. S.J. Luchansky, C.R. Bertozzi, Azido Mechanistic investigation of the
sialic acids can modulate cell-surface staudinger ligation, J. Am. Chem. SOC.
interactions, Chembiochem2004, 5, 2005, 127,2686-2695.
1706- 1709. 77. E. Saxon, J.I. Armstrong, C.R.
67. R.A. Chandra, E.A. Douglas, R.A. Bertozzi, A “traceless” Staudinger
Mathies, C.R. Bertozzi, M.B. Francis, ligation for the chemoselective
Programmable cell adhesion encoded synthesis of amide bonds, Org. Lett.
by DNA hybridization, Angew. Chem. 2000,2,2141-2143.
Int. Ed. Engl. 2006, 45,896-901. 78. B.L. Nilsson, L.L. Kiessling, R.T.
68. H.C. Hang,C.Yu, D.L. Kato, C.R. Raines, Staudinger ligation: a peptide
Bertozzi, A metabolic labeling from a thioester and azide, Org. Lett.
approach toward proteomic analysis 2000,2,1939-1941.
of rnucin-type 0-linked glycosylation, 79. B.L. Nilsson, R.J. Hondal, M.B.
Proc. Nat. Acad. Sci. U. S. A. 2003, Soellner, R.T. Raines, Protein
100,14846-14851. assembly by orthogonal chemical
References I 6 3 3
ligation methods, J . Am. Chem. Soc. 90. V.V. Rostovtsev, L.G. Green,V.V.
2003, 125,5268-5269. Fokin, K.B. Sharpless, A stepwise
80. M. Kohn, R. Breinbauer, The huisgen cycloaddition process:
staudinger ligation-A gift to copper(1)-catalyzed regioselective
chemical biology, Angew. Chem. Znt. “ligation” of azides and terminal
Ed. Engl. 2004,43, 3106-3116. alkynes, Angew. Chem. Znt. Ed. Engl.
81. D.J. Vocadlo, H.C. Hang, E.J. Kim, 2002,41,2596-2599.
J.A. Hanover, C.R. Bertozzi, A 91. C.W. Torn~re,C. Christensen,
chemical approach for identifying M. Meldal, Peptidotriazoles on solid
0-GlcNAc-modified proteins in cells, phase: [1,2,3]-triazoles by
Proc. Natl. Acad. Sci. U. S. A. 2003, regiospecific copper(1)-catalyzed
100,9116-9121. 1,3-dipolar cycloadditions of terminal
82. D. J. Vocadlo, C.R. Bertozzi, A alkynes to azides, J . Org. Chem. 2002,
strategy for functional proteomic 67,3057-3062.
analysis of glycosidase activity from 92. Q. Wang, T.R. Chan, R. Hilgraf, V.V.
cell lysates, Angew. Chem. Int. Ed. Fokin, K.B. Sharpless, M.G. Finn,
Engl. 2004,43,5338-5342. Bioconjugation by
83. H.C. Hang, C. Yu, M.R. Pratt, C.R. copper(1)-catalyzedazide-alkyne
Bertozzi, Probing glycosyltransferase +
(3 21 cycloaddition,]. Am. Chem.
activities with the staudinger ligation, SOC.2003, 125, 3192-3193.
1.Am. Chem. Soc. 2004, 126,6-7. 93. L.V. Lee, M.L. Mitchell, S.J. Huang,
84. J.A. Prescher, D.H. Dube, C.R. V.V. Fokin, K.B. Sharpless, C.H.
Bertozzi, Chemical remodelling of Wong, A potent and highly selective
cell surfaces in living animals, Nature inhibitor of human
2004,430,873-877. alpha-l,3-fucosyltransferasevia click
85. K.L. Kiick, E. Saxon, D.A. Tirrell, C.R. chemistry,]. Am. Chem. SOC.2003,
Bertozzi, Incorporation of azides into 125,9588-9589.
recombinant proteins for 94. P. Wu, A.K. Feldman, A.K. Nugent,
chemoselective modification by the C.J. Hawker, A. Scheel, B. Voit,
staudinger ligation, Proc. Natl. Acad. J. Pyun, J.M.J. Frechet, K.B.
S C ~U.. S . A. 2002, 99, 19-24. Sharpless, V.V. Fokin, Efficiency and
86. G.A. Lemieux, C.L. de Graffenried, fidelity in a click-chemistry route to
C.R. Bertozzi, A fluorogenic dye triazole dendrimers by the
activated by the staudinger ligation,]. copper(I)-catalyzed ligation of azides
Am. Chem. Soc. 2003, 125, and alkynes, Angew. Chem. Int. Ed.
4708-4709. Engl. 2004, 43, 3928-3932.
87. H.C. Kolb, M.G. Finn, K.B. 95. V.O. Rodionov, V.V. Fokin, M.G.
Sharpless, Click chemistry: diverse Finn, Mechanism of the ligand-free
chemical function from a few good Cu-I-catalyzed azide-alkyne
reactions, Angew. Chem. Znt. Ed. Engl. cycloaddition reaction, Angew. Chem.
2001,40,2004-2021. Znt. Ed. Engl. 2005, 44, 2210-2215.
88. R. Huisgen, in 1,3-Dipolar 96. W.G. Lewis, F.G. Magallon, V.V.
Cycloaddition Chemistry, (Ed.: Fokin, M.G. Finn, Discovery and
A. Padwa), Vol I, Wiley, New York, characterization of catalysts for
1984, pp. 1-176. azide-alkyne cycloaddition by
89. W.G. Lewis, L.G. Green, fluorescence quenching,]. Am.
F. Grynszpan, 2 . Radic, P.R. Carlier, Chem. SOC.2004, 126,9152-9153.
P. Taylor, M.G. Finn, K.B. Sharpless, 97. S . S . Gupta, J. Kuzelka, P. Singh,
Click chemistry in situ: W.G. Lewis, M. Manchester, M.G.
Acetylcholinesterase as a reaction Finn, Accelerated bioorthogonal
vessel for the selective assembly of a conjugation: a practical method for
femtomolar inhibitor from an array the ligation of diverse functional
of building blocks, Angew. Chem. Int. molecules to a polyvalent virus
Ed. Engl. 2002,41,1053-1057. scaffold, Bioconjug. Chem. in press.
634
I 10 Synthesis of Large Biological Molecules
98. A.E. Speers, G.C. Adam, B.F. Cravatt, functionality in bacterial cell surface
Activity-based protein profiling in proteins,]. Am. Chem. SOC.2004, 126,
vivo using a copper(1)-catalyzed 10598- 10602.
azide-alkyne [ 3 + 21 cycloaddition, ]. 102. N.J. Agard, J.A. Prescher, C.R.
Am. Chem. SOC. 2003, 125, Bertozzi, A strain-promoted ( 3 + 21
4686-4687. azide-alkyne cycloaddition for
99. A. Deiters, T.A. Cropp, M. Mukherji, covalent modification of
J.W. Chin, J.C. Anderson, P.G. biomolecules in living systems, J .
Schultz, Adding amino acids with Am. Chem. SOC.2004, 126,
novel reactivity to the genetic code of 15046- 15047.
Saccharomyces cerevisiae, /. Am. 103. J.M. Hooker, M.B. Francis,
Chem. SOC.2003, 125,11782-11783. Submitted.
100. A.J. Link, D.A. Tirrell, Cell surface 104. J.F. Corbett, Benzoquinone imines.
labeling of Escherichia coli via part IV. Mechanism and kinetics of
copper(1)-catalyzed [3 + 21 the formation of bandrowski’s base,
cycloaddition, J. Am. Chem. SOC. J . Chem. SOC. B 1969, 818.
2003, 125,11164-11165. 105. T. Nguyen, N.S. Joshi, M.B. Francis,
101. A.J. Link, M.K.S. Vink, D.A. Tirrell, Submitted.
Presentation and detection of azide
Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
11
Advances in Sugar Chemistry
11.1
The Search for Chemical Probes to Illuminate Carbohydrate Function
Outlook
Until the 1970s, it was believed that the major cellular functions of
carbohydrates were confined to their use as structural elements or energy
sources. Since then, evidence that glycoconjugates function in many diverse
roles has led to an increased appreciation of these biomolecules. Saccharides
act as information carriers and effect many signaling events, cell-cell
communication, cell adhesion, differentiation, inflammation, and tumor cell
metastasis [1-3]. Moreover, defects in the production of glycoconjugates cause
a series of human diseases referred to as congenital disorders of glycosylation
(CDG) [4, 5]. In prokaryotes, carbohydrates are essential constituents of
bacterial cell walls; consequently, agents that block their incorporation can
function as novel antimicrobials. These examples underscore the value
of understanding glycoconjugate biosynthesis and function for human
health.
11.1.1
Introduction
the physiological roles of glycoconjugates. The use of compounds that block
a specific protein-carbohydrate interaction or that inhibit the biosynthesis
of carbohydrates would facilitate insights complementary to those obtained
using only genetic approaches. Here, we will discuss the issues that have
complicated the efforts to determine how carbohydrates function, the tools
that have been developed to enhance our understanding, and the advances
in the generation and identification of chemical agents to probe carbohydrate
function.
11.1.2
History and Development
Carbohydrate-modifying Enzymes
For understanding glycoconjugate function, a complementary strategy to
inhibiting protein-carbohydrate interactions is to block glycoconjugate
assembly. The development of strategies for high-throughput analysis of
enzymes that act on carbohydrates or glycoconjugates is also challenging. Many
such enzymes use identical or similar glycosyl donors; therefore, it is difficult
to determine their acceptor specificity. For example, there are several hundred
glycosyltransferases in humans [56], and many utilize similar or identical
sugar-nucleotide substrates. Traditional proteomics-based strategies, such as
two-dimensional gel electrophoresis [57] and isotope-coded affinity tags [58],
can provide valuable information about protein abundance. However, these
experiments give no information about enzyme activity levels or substrate
specificity. The need for this information has prompted the development of
several new strategies [59]. Recently, Pohl and coworkers reported an assay
based on mass spectrometry for the study of a rabbit muscle phosphorylase
[60] and sugar nucleotidyltransferases from yeast and Escherichia coli [61]. This
group also has reported the design of a library of mass-differentiated substrates
to examine the substrate specificity of glycosidases [62]. Additionally, several
research groups have used sugar derivatives to directly label and detect active
enzymes [63-65]. Finally, carbohydrate microarrays have also been utilized
to explore the substrate specificity of carbohydrate-utilizing enzymes [39,
48]. These tools have provided valuable information about the activity and
specificity of carbohydrate-utilizing enzymes. This knowledge is fundamental
and also critical for developing assays that monitor the activities of biosynthetic
enzymes.
Another issue facing those interested in developing inhibitors of glycocon-
jugate biosynthesis is what types of compounds to test as inhibitors. Many
known ligands for carbohydrate-utilizing enzymes are transition state analogs.
For example, imino sugars are commonly used to mimic oxocarbenium ions
that serve as intermediates in glycosidase or glycosyltransferase reactions [66].
Although transition state analogs have provided important information about
many enzymes, they often lack selectivity for the target of interest. Moreover,
few of these compounds are cell permeable. Unlike protein-carbohydrate
interactions that occur on the surface of the cell, carbohydrate processing
occurs within the cell, necessitating the development of cell-permeable lig-
ands to investigate the roles of carbohydrate-utilizing enzymes within an
organism. Genetic knockout animals and human genetics have uncovered
new and unexpected roles for enzymes that participate in glycoconjugate
biosynthesis [6]. Some of these enzymes, however, are essential for early
development. Cell-permeable compounds that block these enzymes would offer a
number of benefits, including the ability to exert temporal control over enzyme
function [67].
One of the most efficient ways to identify "cell-permeable" ligands is
through the utilization of high-throughput screens. Identification of inhibitors
through this method has been hampered by the lack of effective assays.
Recently, however, several such assays have been developed for the study
of carbohydrate-enzyme interactions. As mentioned previously, carbohydrate
microarrays can be employed for inhibitor identification. Indeed, they were
recently used by Wong and coworkers to identify fucosyltransferase inhibitors
from a small library (85 compounds) of triazole-containing compounds [46,
47]. Kiessling [68] and Walker [69] have reported high-throughput binding
assays that use fluorescence polarization to facilitate the identification
of ligands for uridine 5’-diphosphate-galactopyranose mutase (UGM) and
MurG, enzymes that utilize nucleotide-sugar substrates and are involved
in bacterial cell wall biosynthesis. These assays were used to screen
large commercially available small molecule libraries (~16 000 and ~49 000
members, respectively). The availability of data from high-throughput screens
such as these may lead to the identification of key scaffolds for inhibitor
design. Such information will guide the development of effective probes for
glycobiology.
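The readout of a fluorescence polarization binding screen of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the cited studies: the function names, the G-factor default of 1.0, the control values, and the 50% displacement cutoff are all hypothetical. Polarization is computed from the parallel and perpendicular emission intensities in the standard way, and a compound is flagged when it displaces enough of the fluorescent probe to lower the polarization toward the free-probe value.

```python
def polarization_mP(i_parallel, i_perpendicular, g_factor=1.0):
    """Fluorescence polarization in millipolarization units (mP).

    P = (I_par - G * I_perp) / (I_par + G * I_perp), scaled by 1000.
    The G-factor is an instrument correction; 1.0 is an assumed default.
    """
    corrected = g_factor * i_perpendicular
    total = i_parallel + corrected
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - corrected) / total


def flag_hits(wells, bound_mP, free_mP, min_displacement=0.5):
    """Return the sorted IDs of wells that displace the fluorescent probe.

    wells maps a compound ID to (I_parallel, I_perpendicular).
    bound_mP and free_mP are control polarizations for fully bound and
    free probe. Displacement is 0 at the bound control and 1 at the free
    control; the 0.5 cutoff is a hypothetical choice.
    """
    window = bound_mP - free_mP
    if window <= 0:
        raise ValueError("bound control must exceed free control")
    hits = []
    for compound_id, (i_par, i_perp) in wells.items():
        p = polarization_mP(i_par, i_perp)
        displacement = (bound_mP - p) / window
        if displacement >= min_displacement:
            hits.append(compound_id)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical plate data: compound ID -> (parallel, perpendicular)
    plate = {"cpd_A": (300.0, 100.0), "cpd_B": (150.0, 100.0)}
    print(flag_hits(plate, bound_mP=450.0, free_mP=150.0))
```

A binding readout of this type does not require enzyme turnover, which is why it suits enzymes for which activity assays are difficult to run in high throughput.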
The biological processes that have been studied using these ligands include
virus interaction with host cells [73, 76, 77], bacterial toxin binding
[78, 79], and adhesion of leukocytes to endothelial cells [30, 74, 80, 81].
Thus, multivalent ligands have been used to explore protein-carbohydrate
interactions, and they often serve as potent inhibitors. The identification of
monovalent ligands of modest affinity can be leveraged to create multivalent
probes.
As the aforementioned examples highlight, most efforts to inhibit
either protein-carbohydrate interactions or the enzymes responsible for
glycoconjugate biosynthesis have focused on the utilization of carbohydrates
and their derivatives. Many of the available compounds, however, are
not optimal for studies in cells or organisms because they have low
binding affinity and selectivity, poor metabolic stability, and limited cell
permeability. Additionally, the synthesis of carbohydrate derivatives can
be difficult and labor intensive, and many iterations may be required
to improve the activities of the typical low-affinity carbohydrate leads.
Therefore, attention has recently turned to the design of compounds that
are not derived from carbohydrate building blocks. This review highlights the
development of noncarbohydrate-like ligands to study the physiological roles
of carbohydrates. First, we discuss the approaches to examine lectins, receptors
that use sugar-binding interactions to facilitate cell adhesion or cell signaling
events.
In conjunction with our overview of glycomimetics that block pro-
tein-carbohydrate interactions, we also discuss strategies to develop inhibitors
of carbohydrate-processing enzymes; this section focuses on the enzymes in-
volved in bacterial cell wall biosynthesis because it is an area in which many
new advances have been made. Enzymes that utilize sugars and synthesize
glycoconjugates unique to pathogens have been identified, and cell-permeable
inhibitors can be used to explore their biological roles or validate a potential
therapeutic target. The scaffolds identified in this work may also be applicable
to the development of probes of other prokaryotic and perhaps even eukaryotic
carbohydrate-utilizing enzymes.
11.1.3
General Considerations: Cell-surface Carbohydrate Recognition Interactions
Fig. 11.1-5 Selectins mediate the rolling of white blood cells, causing them to adhere to
and then pass through the endothelium toward the site of infection.
the search for high affinity monovalent inhibitors has continued. Specifically,
the therapeutic value of selectin inhibitors has prompted considerable effort
to develop more conventional “druglike” compounds that block these
protein-carbohydrate interactions. Moreover, higher affinity monovalent
ligands could be used to generate even more potent multivalent inhibitors.
11.1.4
Applications: Identification of Inhibitors of Protein-Carbohydrate Interactions
[Figure: structures 3 and 4, annotated with regions of hydrophobic interaction, ionic interaction, and calcium coordination.]
natural ligand and their proposed library of compounds. Pivotal to the design
of their docking experiments was the hypothesis that interactions between
the 2- and 3-OH groups on the fucose unit of sLeX and the calcium ion, and
the carboxylic acid group of the sialic acid moiety and Arg97 are essential
for binding. Their modeling studies suggested that a bicyclic mimic such
as compound 6, though significantly smaller than the natural ligand, would
possess the features necessary to favorably interact with the receptor. Indeed,
these molecules did have inhibitory activity comparable to the natural ligand.
However, the authors found that both enantiomers (only one of which has a
display of hydroxyl groups similar to that of fucose) had the same activity as
measured by a cell-based competition assay (IC50 = 4.5-7.0 mM). This result,
along with the elucidation of the structure of both P- and E-selectins bound to
sLeX by X-ray crystallography [109], suggests that their model was only partially
correct. Structural data confirm that interactions between the carboxylic acid
moiety and Arg97 are important for binding. These data also indicate that it
is the 3- and 4-OH groups that are important for substrate binding, not the
hypothesized 2- and 3-OH groups. This difference likely explains the lack of
specificity of compound 6 and its enantiomer.
As previously mentioned, high-throughput screening may lead to potent
inhibitors of protein-carbohydrate interactions. Some success in the selectin
field has been achieved by Slee et al., who identified several potent inhibitors
of P-selectin by screening a library of compounds in an ELISA [53]. After
initial lead identification, they performed modeling studies that suggested
ligand modifications that would enhance the activity of their ligand. They
ultimately identified a compound, 7, with very good P-selectin inhibitory activity
(IC50 = 300 nM) (Fig. 11.1-9). What sites on P-selectin this ligand binds,
however, are not apparent. Given that the lectin interacts with glycosylated
peptide sequences that contain sulfated tyrosine residues, compound 7 may
compete with the peptide sequence. Interestingly, compound 7 bears some
structural resemblance to the Kondo inhibitor 5, suggesting that this type
of “trimodal” scaffold may be a general selectin inhibitor. As mentioned
for the Kondo ligands, it is not clear whether this compound, with its long
alkyl substituent, acts as a true monovalent inhibitor. Still, this compound is
notable because it was also found to inhibit selectin-mediated rolling in vivo
and dramatically reduce inflammation in a mouse peritonitis model.
Fig. 11.1-9 Potent inhibitor of P-selectin and selectin-mediated rolling in vivo.
The vast majority of glycomimetic studies have been targeted toward one
or two members of each lectin class. Strategies in which the same scaffold
can be used to derive specific inhibitors of different members of a large
class of proteins are even more powerful. Until recently, general scaffolds for
inhibitors of protein-carbohydrate interactions had not been described. In
contrast, peptidomimetic scaffolds such as benzodiazepines have been shown
to be useful for generating a variety of agonists and antagonists to G-protein
coupled receptors [114, 115]. Kiessling and coworkers sought to develop this
type of privileged scaffold for use in generating glycomimetics. To ascertain
whether such a strategy could be implemented, they targeted C-type lectins.
Many C-type lectins bind oligosaccharides that possess a key carbohydrate
residue with the axial-equatorial-equatorial hydroxyl orientation in mannose
(and L-fucose). While these groups can be essential for binding, substitution
at C1 and C6 of the mannosylated (or fucosylated) ligand often varies [116].
Thus, Schuster et al. utilized shikimic acid 8 as a building block to synthesize
mannose (fucose)-like compounds 9 (Fig. 11.1-10) [55]. Functionalization of
shikimic acid through the conjugate addition of a nucleophile (i.e., a thiolate)
generates a structure that possesses the desired hydroxyl group orientation,
while introducing a site of diversity. Further library diversification can be
achieved by varying the amino acid substituent at the acid moiety (R1), adding
dithiols (R2), and subsequently functionalizing the resulting free thiol with
alkyl or benzyl bromides (R3). To test this strategy, they synthesized a focused
library of 192 compounds, which was screened for inhibition of MBP. From
this small library, they identified 10 compounds with activity comparable to or
better than the known ligand, α-methyl mannopyranoside (IC50 = 4-14 mM).
The high hit rate underscores the utility of this strategy.
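Comparisons such as "activity comparable to or better than the known ligand" rest on IC50 values read from dose-response data. The following Python sketch shows one simple way such a comparison could be made; it is an illustration only, and the function names, the log-linear interpolation, and the "no worse than the reference" criterion are assumptions rather than the method used by the authors.

```python
import math


def estimate_ic50(concentrations, percent_inhibition):
    """Estimate IC50 by log-linear interpolation between bracketing points.

    concentrations: ascending, positive values in any consistent unit.
    percent_inhibition: matching values on a 0-100 scale.
    Returns None when no adjacent pair of points brackets 50% inhibition.
    """
    points = list(zip(concentrations, percent_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo < 50.0 <= y_hi:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


def hits_vs_reference(ic50_by_compound, reference_ic50):
    """Return compounds whose IC50 is no higher than the reference ligand's."""
    return sorted(
        name
        for name, ic50 in ic50_by_compound.items()
        if ic50 is not None and ic50 <= reference_ic50
    )


if __name__ == "__main__":
    # Hypothetical dose-response series (mM) for one library member
    print(estimate_ic50([1.0, 10.0, 100.0], [20.0, 45.0, 80.0]))
```

With the numbers reported in the text, 10 hits from 192 compounds corresponds to a hit rate of roughly 5%.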
11.1.5
Overview and Future Development: Inhibition of Protein-Carbohydrate
Interactions
11.1.6
General Consideration: Inhibitors of Sugar-Nucleotide-binding Enzymes
11.1.7
Applications: Identification of Inhibitors of Sugar-Nucleotide-binding Enzymes
nature has generated an inhibitor of this enzyme: the natural product antibiotic,
fosfomycin 11. Fosfomycin covalently labels a cysteine residue in the PEP
binding site of MurA and renders the enzyme inactive [145]. A structure of the
MurA-fosfomycin complex, determined by X-ray crystallographic analysis,
has provided valuable information about the complex [146]. Moreover, it has
been utilized in the design of inhibitors of this sugar-nucleotide-processing
enzyme [143, 146].
Several research groups have reported the identification of non-carbohydrate
inhibitors of MurA [146-148]. For example, Bush and coworkers identified
inhibitors using a high-throughput screen of a library of compounds in an assay
that monitored formation of inorganic phosphate (Fig. 11.1-13) [148]. Three of
the identified inhibitors (12-14) exhibit lower IC50 values than does fosfomycin
(Fig. 11.1-14). Modeling and inhibition studies were used to determine the
likely binding mode of these compounds. These data suggest that the identified
inhibitors are noncovalently binding at or near the PEP binding site, leaving
the sugar site unoccupied. These compounds are not glycomimetics, yet they
suggest that targeting unique features of the sugar-nucleotide-binding site
can lead to potent inhibitors.
High-throughput screening techniques have also been utilized to identify
inhibitors of MurG, a glycosyltransferase that mediates one of the final
steps of peptidoglycan synthesis (Fig. 11.1-13) [69]. Rather than assaying
for activity, the Walker group screened for compounds that could inhibit
binding of the substrate UDP-GlcNAc. With a fluorescence polarization
assay, they tested a commercially available library of approximately 49 000
druglike compounds, and identified several MurG inhibitors containing a
2-thioxo-4-thiazolidinone core (15, Fig. 11.1-15, Kd = 1.3 µM, IC50 = 1.4 µM)
[69, 149]. Using the MurG structure determined by X-ray crystallography
[150], they modeled the complexes to explore the possible binding mode(s) of
this scaffold. The authors suggest that the thiazolidinone heterocycle could
mimic the diphosphate moiety by engaging in hydrogen-bonding interactions.
Presumably, the carbonyl (and carbonyl-like) moieties of the heterocycle
interact with hydrogen-bond donors on the protein. The studies also suggest
that the thiazolidinone substituents interact with the uridine and sugar-binding
regions of the protein. More recently, these inhibitors have been shown to
selectively block MurG over several other enzymes that utilize similar or
identical substrates [149].
Fig. 11.1-15 MurG inhibitor identified by Walker and coworkers.
Inhibitors of enzymes that use sugar-nucleotide substrates have also been
found in the pathway that leads to arabinogalactan synthesis in mycobacteria
[151, 1521. Arabinogalactan is composed of two sugars derived from the
donors, UDP-arabinofuranose and UDP-galactofuranose (UDP-Galf). The
biosynthetic donor of galactofuranose moieties (UDP-Galf) is synthesized by
UGM, and the Galf-containing oligosaccharides are assembled by the putative
enzyme, UDP-galactofuranosyltransferase. Most efforts to explore Galf
incorporation have focused on UGM.
UGM is responsible for the isomerization of the thermodynamically favored
UDP-galactopyranose to the less favored UDP-galactofuranose (Fig. 11.1-16).
Sugar-based probes have been employed to study both UGM [153-156] and
the transferase [157], but only recently have non-carbohydrate inhibitors
been identified. The Bertozzi and McNeil groups used a design strategy
that appears similar to that used by nature for tunicamycin. Specifically,
they modified a uridine with substituents. From their uridine-based library,
they identified several inhibitors of UGM [158]. Although the results are
promising, the initial hits did not appear to be cell permeable. It will
be interesting to explore the specificity of such ligands. Although it is
not clear whether one can achieve selectivity against other UDP-sugar
binding enzymes with this strategy, tunicamycin acts selectively on its
target.
[Figure: UGM-catalyzed interconversion of UDP-galactopyranose (93%) and UDP-galactofuranose (7%).]
The Kiessling group pursued an alternative approach. Whereas the assay
used by Bertozzi and McNeil monitored UGM activity, Soltero-Higgin et al.
used a high-throughput fluorescence polarization binding assay
to identify UGM inhibitors. As with the Walker screen, the hits identified
contain a thiazolidinone or related nitrogen-containing heterocyclic core
(16-18, Kd ≥ 4.0 µM, IC50 ≥ 1.6 µM) (Fig. 11.1-17) [68]. It is intriguing that
these compounds have structural features similar to those identified for
MurG. These shared features include the five-membered ring heterocycle
and the 1,3-arrangement of the substituents. This display of functionality
likely facilitates interactions with the sugar-nucleotide-binding regions of
the protein. Unlike the most potent MurG lead 15, which displays one
aromatic and one aliphatic substituent, all the UGM inhibitors contain
Fig. 11.1-18 Potent MurB inhibitors developed by Walsh (19) and Snyder (20).
antibacterial activity, which compound 19 did not have. One of the most potent
inhibitors 20 (IC50 = 15 µM, MIC = 4 µg mL⁻¹) is depicted in Fig. 11.1-18.
To identify probes of rhamnose biosynthesis, Lee and coworkers developed
an in silico library of 3888 compounds that were based on heterocycle 19. The
authors selected RmlC, as they believed it to be the best drug target in this
biosynthetic cascade. It has high substrate specificity, a unique structure, and
lacks a cofactor binding site. They docked these compounds into the active
site of RmlC and selected compounds with the best affinity (the top 5%) for
synthesis. They reported the synthesis of 47 of the 144 prospects (each of
the 47 compounds was synthesized as the esterified and free acid forms, for
example, 21 and 22 in Fig. 11.1-19). Although they did not find any compounds
that potently inhibit bacterial growth, they were able to identify molecules 21
and 22 that can differentiate between two similar enzymes, RmlC and RmlD
(Fig. 11.1-19) [163]. This result provides additional evidence that selective
inhibitors of nucleotide-sugar-processing enzymes can be discovered.
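The selection step described above (dock a virtual library, then keep the best-scoring fraction for synthesis) can be sketched in a few lines. This is a generic illustration, not the authors' workflow: the compound identifiers, the scores, and the convention that a lower score means better predicted affinity are all assumptions, and no real docking program is invoked.

```python
def select_top_fraction(scores, fraction):
    """Return compound ids with the best scores, best first.

    scores   -- dict mapping a compound id to a docking score
                (assumed convention: lower score = better predicted affinity)
    fraction -- fraction of the library to keep, e.g. 0.05 for the top 5%
    """
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # ascending: best score first
    return ranked[:n_keep]

# Hypothetical 40-member library with made-up scores (illustration only).
library = {f"cpd-{i:03d}": -5.0 - 0.1 * (i % 7) - 0.01 * i for i in range(40)}

hits = select_top_fraction(library, 0.05)
print(hits)
```

A fixed fraction cutoff is only one possible rule; an absolute score threshold or a visual inspection of poses could equally be used to pick candidates for synthesis.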
To identify inhibitors of several Mur enzymes, Mansour and coworkers
synthesized a small library (~50 members) of urea- or carbonate-containing
Fig. 11.1-20 The most potent urea-containing inhibitor (23) of MurA and B
and bacterial growth.
11.1.8
Overview and Future Development: Inhibitors of Carbohydrate-processing
Enzymes
Despite the relatively small number of studies that have identified non-
carbohydrate inhibitors of sugar-nucleotide-processing enzymes, it is appar-
ent that structural commonalities exist between these inhibitors (Fig. 11.1-21).
Some authors have suggested that these core structures may be acting as
electronic mimics of the diphosphate through hydrogen-bonding interactions
with their protein-binding partners. It is also possible that these core elements
are simply effective spatial mimics of the diphosphate moiety. The oriented
display of substituents of these heterocyclic scaffolds appears to be conserved
throughout the currently developed probes, suggesting that the spatial orien-
tation enforced by these core elements is at least partially responsible for the
inhibitory activity of these compounds. Undoubtedly, much will be learned
from the continued pursuit of molecules based on these and similar core
structures.
While the identification of these core structures suggests a promising
direction for generating inhibitors of glycan biosynthesis, it also suggests a
potential problem. Specifically, given the aforementioned similarities between
these probes, it may be difficult or impossible to achieve selectivity for
targeting one enzyme over another. While this problem may arise, the current
data suggest that selective inhibitors can be developed. For example, despite
Fig. 11.1-21 Several structurally and/or electronically related scaffolds have been
identified.
the large similarities between the MurG and UGM inhibitors presented here,
both the Walker and Kiessling groups report selective inhibition of their
target enzyme over related proteins [68, 149]. Thus, it seems likely that these
common core structures can be diversified to yield selective inhibitors of
many different sugar-nucleotide-utilizing enzymes. It is also possible that
information acquired from the study of bacterial sugar-processing enzymes
will provide clues for the development of probes for eukaryotic enzymes
that mediate glycan biosynthesis. In addition to its role in bacterial cell
wall biosynthesis, UGM is also found in eukaryotic parasites, such as
Leishmania, and multicellular organisms, such as C. elegans [165]. Therefore,
the thiazolidinone-based inhibitors identified for a bacterial UGM could be
tested for efficacy in a eukaryotic system. It will be intriguing to determine
whether these scaffolds or others will be identified as hits from screens with
eukaryotic enzymes. We anticipate that with the advent of cell-permeable
probes of glycan biosynthesis, a greater understanding of the roles of these
enzymes in human disease will emerge.
11.1.9
Conclusion
References
1. G.E. Ritchie, B.E. Moffatt, R.B. Sim, B.P. Morgan, R.A. Dwek, P.M. Rudd, Glycosylation and the complement system, Chem. Rev. 2002, 102, 305-319.
2. C.R. Bertozzi, L.L. Kiessling, Chemical glycobiology, Science 2001, 291, 2357-2364.
3. T. Feizi, Carbohydrate-mediated recognition systems in innate immunity, Immunol. Rev. 2000, 173, 79-88.
4. S. Grunewald, G. Matthijs, J. Jaeken, Congenital disorders of glycosylation: a review, Pediatr. Res. 2002, 52, 618-624.
5. H.H. Freeze, Human disorders in N-glycosylation and animal models, Biochim. Biophys. Acta 2002, 1573, 388-393.
6. J.B. Lowe, J.D. Marth, A genetic approach to mammalian glycan function, Annu. Rev. Biochem. 2003, 72, 643-691.
7. M.A. Schmidt, L.W. Riley, I. Benz, Sweet new world: glycoproteins in bacterial pathogens, Trends Microbiol. 2003, 11, 554-561.
8. A. Dell, H.R. Morris, Glycoprotein structure determination by mass spectrometry, Science 2001, 291, 2351-2356.
9. J. Zaia, Mass spectrometry of oligosaccharides, Mass Spectrom. Rev. 2004, 23, 161-227.
10. A. Holeman, P.H. Seeberger, Carbohydrate diversity: synthesis of glycoconjugates and complex carbohydrates, Curr. Opin. Biotechnol. 2004, 15, 615-622.
11. S.J. Keding, S.J. Danishefsky, Prospects for total synthesis: a vision for a totally synthetic vaccine targeting epithelial tumors, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11937-11942.
12. S. Hanson, M. Best, M.C. Bryan, C.-H. Wong, Chemoenzymatic synthesis of oligosaccharides and glycoproteins, Trends Biochem. Sci. 2004, 29, 656-663.
13. D. Kahne, Combinatorial approaches to carbohydrates, Curr. Opin. Chem. Biol. 1997, 1, 130-135.
14. P. Sears, C.-H. Wong, Toward automated synthesis of oligosaccharides and glycoproteins, Science 2001, 291, 2344-2350.
15. C. Leimkuhler, Z. Chen, R.G. Kruger, M. Oberthur, W. Lu, C.T. Walsh, D. Kahne, Glycosylation of glycopeptides: a comparison of chemoenzymatic and chemical methods, Tetrahedron: Asymmetry 2005, 16, 599-603.
16. P. Mowery, Z.Q. Yang, E.J. Gordon, O. Dwir, A.G. Spencer, R. Alon, L.L. Kiessling, Synthetic glycoprotein mimics inhibit L-selectin-mediated rolling and promote L-selectin shedding, Chem. Biol. 2004, 11, 725-732.
17. M.J. Grogan, M.R. Pratt, L.A. Marcaurelle, C.R. Bertozzi, Homogeneous glycopeptides and glycoproteins for biological investigation, Annu. Rev. Biochem. 2002, 71, 593-634.
18. Y. He, R.J. Hinklin, J. Chang, L.L. Kiessling, Stereoselective N-glycosylation by Staudinger ligation, Org. Lett. 2004, 6, 4479-4482.
19. D. Macmillan, A.M. Daines, Recent developments in the synthesis and discovery of oligosaccharides and glycoconjugates for the treatment of disease, Curr. Med. Chem. 2003, 10, 2733-2773.
20. W. Zhang, Fluorous tagging strategy for solution-phase synthesis of small molecules, peptides and oligosaccharides, Curr. Opin. Drug. Discov. 2004, 7, 2269-2272.
21. T. Feizi, W.G. Chai, Oligosaccharide microarrays to decipher the glyco code, Nat. Rev. Mol. Cell Biol. 2004, 5, 582-588.
22. I. Shin, S. Park, M.R. Lee, Carbohydrate microarrays: an advanced technology for functional studies of glycans, Chem. Eur. J. 2005, 11, 2894-2901.
23. D.M. Ratner, E.W. Adams, J. Su, B.R. O’Keefe, M. Mrksich, P.H. Seeberger, Probing protein-carbohydrate interactions with microarrays of synthetic oligosaccharides, ChemBioChem 2004, 5, 379-383.
24. O. Blixt, S. Head, T. Mondala, C. Scanlan, M.E. Huflejt, R. Alvarez, M.C. Bryan, F. Fazio, D. Calarese, J. Stevens, N. Razi, D.J. Stevens, J.J. Skehel, I. van Die, D.R. Burton, I.A. Wilson, R. Cummings, N. Bovin, C.-H. Wong, J.C. Paulson, Printed covalent glycan array for ligand profiling of diverse glycan binding proteins, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17033-17038.
25. Y.C. Lee, R.T. Lee, Carbohydrate-protein interactions: basis of glycobiology, Acc. Chem. Res. 1995, 28, 321-327.
26. E.J. Toone, Structure and energetics of protein carbohydrate complexes, Curr. Opin. Struct. Biol. 1994, 4, 719-728.
27. L.L. Kiessling, N.L. Pohl, Strength in numbers: non-natural polyvalent carbohydrate derivatives, Chem. Biol. 1996, 3, 71-77.
28. R. Roy, Syntheses and some applications of chemically defined multivalent glycoconjugates, Curr. Opin. Struct. Biol. 1996, 6, 692-702.
29. B.E. Collins, J.C. Paulson, Cell surface biology mediated by low affinity multivalent protein-glycan interactions, Curr. Opin. Chem. Biol. 2004, 8, 617-625.
30. W.J. Sanders, E.J. Gordon, O. Dwir, P.J. Beck, R. Alon, L.L. Kiessling, Inhibition of L-selectin-mediated leukocyte rolling by synthetic glycoprotein mimics, J. Biol. Chem. 1999, 274, 5271-5278.
31. K. Kakehi, M. Oda, M. Kinoshita, Fluorescence polarization: analysis of carbohydrate-protein interaction, Anal. Biochem. 2001, 297, 111-122.
32. E.G. Weinhold, J.R. Knowles, Design and evaluation of a tightly binding fluorescent ligand for influenza A hemagglutinin, J. Am. Chem. Soc. 1992, 114, 9270-9275.
33. G.S. Jacob, C. Kirmaier, S.Z. Abbas, S.C. Howard, C.N. Steininger, J.K. Welply, P. Scudder, Binding of sialyl Lewis X to E-selectin as measured by fluorescence polarization, Biochemistry 1995, 34, 1210-1217.
34. R.V. Weatherman, L.L. Kiessling, Fluorescence anisotropy assays reveal affinities of C- and O-glycosides for concanavalin A, J. Org. Chem. 1996, 61, 534-538.
35. P. Sorme, B. Kahl-Knutsson, M. Huflejt, U.J. Nilsson, H. Leffler, Fluorescence polarization as an analytical tool to evaluate galectin-ligand interactions, Anal. Biochem. 2004, 334, 36-47.
36. C.T. Oberg, S. Carlsson, E. Fillion, H. Leffler, U.J. Nilsson, Efficient and expedient two-step pyranose-retaining fluorescein conjugation of complex reducing oligosaccharides: galectin oligosaccharide specificity studies in a fluorescence polarization assay, Bioconjugate Chem. 2003, 14, 1289-1297.
37. M. Mizuno, M. Noguchi, T. Imai, T. Motoyoski, T. Inazu, Interaction assay of oligosaccharide with lectin using glycosylasparagine, Bioorg. Med. Chem. Lett. 2004, 14, 485-490.
38. E.A. Smith, W.D. Thomas, L.L. Kiessling, R.M. Corn, Surface plasmon resonance imaging studies of protein-carbohydrate interactions, J. Am. Chem. Soc. 2003, 125, 6140-6148.
39. B.T. Houseman, M. Mrksich, Carbohydrate arrays for the evaluation of protein binding and enzymatic modification, Chem. Biol. 2002, 9, 443-454.
40. D.A. Mann, L.L. Kiessling, in Glycochemistry: Principles, Synthesis, and Applications, 1st ed., (Eds.: P.G. Wang, C.R. Bertozzi), Marcel Dekker, New York, 2001, pp. 221-275.
41. D.M. Ratner, E.W. Adams, M.D. Disney, P.H. Seeberger, Tools for glycomics: mapping interactions of carbohydrates in biological systems, ChemBioChem 2004, 5, 1375-1383.
42. E.W. Adams, D.M. Ratner, H.R. Bokesch, J.B. McMahon, B.R. O’Keefe, P.H. Seeberger, Oligosaccharide and glycoprotein microarrays as tools in HIV glycobiology: glycan-dependent gp120/protein interactions, Chem. Biol. 2004, 11, 875-881.
43. S. Fukui, T. Feizi, C. Galustian, A.M. Lawson, W. Chai, Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions, Nat. Biotechnol. 2002, 20, 1011-1017.
44. S. Park, M.-r. Lee, S.-J. Pyo, I. Shin, Carbohydrate chips for studying high-throughput carbohydrate-protein interactions, J. Am. Chem. Soc. 2004, 126, 4812-4819.
45. T. Feizi, F. Fazio, W. Chai, C.-H. Wong, Carbohydrate microarrays - a new set of technologies at the frontiers of glycomics, Curr. Opin. Struct. Biol. 2003, 13, 637-645.
46. M.C. Bryan, L.V. Lee, C.-H. Wong, High-throughput identification of fucosyltransferase inhibitors using carbohydrate microarrays, Bioorg. Med. Chem. Lett. 2004, 14, 3185-3188.
47. F. Fazio, M.C. Bryan, O. Blixt, J.C. Paulson, C.-H. Wong, Synthesis of sugar arrays in microtiter plate, J. Am. Chem. Soc. 2002, 124, 14397-14402.
48. H.C. Hang, C. Yu, M.R. Pratt, C.R. Bertozzi, Probing glycosyltransferase activities with the Staudinger ligation, J. Am. Chem. Soc. 2004, 126, 6-7.
49. L. Nimrichter, A. Gargir, M. Gortler, R.T. Altstock, A. Shtevi, O. Weisshaus, E. Fire, N. Dotan, R.L. Schnaar, Intact cell adhesion of glycan microarrays, Glycobiology 2004, 14, 197-203.
50. M.D. Disney, P.H. Seeberger, The use of carbohydrate microarrays to study carbohydrate-cell interactions and to detect pathogens, Chem. Biol. 2004, 11, 1701-1707.
51. H. Moriyama, Y. Hiramatsu, T. Kiyoi, T. Achiha, Y. Inoue, H. Kondo, Studies on selectin blocker. 9. SARs of non-sugar selectin blocker against E-, P-, L-selectin bindings, Bioorg. Med. Chem. 2001, 9, 1479-1491.
52. P. Sorme, Y. Qian, P. Nyholm, H. Leffler, U.J. Nilsson, Low micromolar inhibitors of galectin-3 based on 3'-derivatization of N-acetyllactosamine, ChemBioChem 2002, 3, 183-189.
53. D.H. Slee, S.J. Romano, J. Yu, T.N. Nguyen, J.K. John, N.K. Raheja, F.U. Axe, T.K. Jones, W.C. Ripka, Development of potent non-carbohydrate imidazole-based small molecule selectin inhibitors with antiinflammatory activity, J. Med. Chem. 2001, 44, 2094-2107.
54. P. Sorme, P. Arnoux, B. Kahl-Knutsson, H. Leffler, J.M. Rini, U.J. Nilsson, Structural and thermodynamic studies on cation-π interactions in lectin-ligand complexes: high-affinity galectin-3 inhibitors through fine-tuning of an arginine-arene interaction, J. Am. Chem. Soc. 2005, 127, 1737-1743.
55. M.C. Schuster, D.A. Mann, T.J. Buchholz, K.M. Johnson, W.D. Thomas, L.L. Kiessling, Parallel synthesis of glycomimetic libraries: targeting a C-type lectin, Org. Lett. 2003, 5, 1407-1410.
56. P.M. Coutinho, E. Deleury, G.J. Davies, B. Henrissat, An evolving hierarchical family classification for glycosyltransferases, J. Mol. Biol. 2003, 328, 307-317.
57. H. Wang, S. Hanash, Intact-protein based sample preparation strategies for proteome analysis in combination with mass spectrometry, Mass Spectrom. Rev. 2005, 24, 413-426.
58. S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol. 1999, 17, 994-999.
59. N.L. Pohl, Functional proteomics for the discovery of carbohydrate-related enzyme activities, Curr. Opin. Chem. Biol. 2005, 9, 76-81.
60. C.J. Zea, N.L. Pohl, Kinetic and substrate binding analysis of phosphorylase b via electrospray ionization mass spectrometry: a model for chemical proteomics of sugar phosphorylases, Anal. Biochem. 2004, 327, 107-113.
61. C.J. Zea, N.L. Pohl, General assay for sugar nucleotidyltransferases using electrospray ionization mass spectrometry, Anal. Biochem. 2004, 328, 196-202.
62. Y. Yu, K.-S. Ko, C. Zea, N.L. Pohl, Discovery of the chemical function of glycosidases: design, synthesis, and evaluation of mass-differentiated carbohydrate libraries, Org. Lett. 2004, 6, 2031-2033.
63. C.-S. Tsai, Y.-K. Li, L.-C. Lo, Design and synthesis of activity probes for glycosidases, Org. Lett. 2002, 4, 3607-3610.
64. M. Ichikawa, Y. Ichikawa, A mechanism-based affinity-labeling agent for possible use in isolating N-acetylglucosaminidase, Bioorg. Med. Chem. Lett. 2001, 11, 1769-1773.
65. D.J. Vocadlo, C.R. Bertozzi, A strategy for functional proteomic analysis of glycosidase activity from cell lysates, Angew. Chem., Int. Ed. Engl. 2004, 43, 5338-5342.
66. P. Sears, C.-H. Wong, Carbohydrate mimetics: a new strategy for tackling the problem of carbohydrate-mediated biological recognition, Angew. Chem., Int. Ed. Engl. 1999, 38, 2300-2324.
67. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nat. Rev. Genet. 2000, 1, 116-125.
68. M. Soltero-Higgin, E.E. Carlson, J.H. Phillips, L.L. Kiessling, Identification of inhibitors for UDP-galactopyranose mutase, J. Am. Chem. Soc. 2004, 126, 10532-10533.
69. J.S. Helm, Y. Hu, L. Chen, B. Gross, S. Walker, Identification of active-site inhibitors of MurG using a generalizable, high-throughput glycosyltransferase screen, J. Am. Chem. Soc. 2003, 125, 11168-11169.
70. L.L. Kiessling, J.K. Pontrello, M.C. Schuster, in Carbohydrate-Based Drug Discovery, 1st ed. (Ed.: C.-H. Wong), Wiley-VCH, Weinheim, 2003, pp. 575-608.
71. L.L. Kiessling, T. Young, K.H. Mortell, in Glycoscience: Chemistry and Chemical Biology I-III, 1st ed., (Eds.: B. Fraser-Reid, K. Tatsuta, J. Thiem), Springer, New York, 2003, pp. 1817-1861.
72. L.L. Kiessling, J.E. Gestwicki, L.E. Strong, Synthetic multivalent ligands in the exploration of cell-surface interactions, Curr. Opin. Chem. Biol. 2000, 4, 696-703.
73. M. Mammen, S.-K. Choi, G.M. Whitesides, Polyvalent interactions in biological systems: implications for design and use of multivalent ligands and inhibitors, Angew. Chem., Int. Ed. Engl. 1998, 37, 2755-2794.
74. E.E. Simanek, G.J. McGarvey, J.A. Jablonowski, C.-H. Wong, Selectin-carbohydrate interactions: from natural ligands to designed mimics, Chem. Rev. 1998, 98, 833-862.
75. J.E. Gestwicki, C.W. Cairo, L.E. Strong, K.A. Oetjen, L.L. Kiessling, Influencing receptor-ligand binding mechanisms with multivalent ligand architecture, J. Am. Chem. Soc. 2002, 124, 14922-14933.
76. H. Kamitakahara, T. Suzuki, N. Nishigori, Y. Suzuki, O. Kanie, C.-H. Wong, A lysoganglioside poly-L-glutamic acid conjugate as a picomolar inhibitor of influenza hemagglutinin, Angew. Chem., Int. Ed. Engl. 1998, 37, 1524-1528.
77. J.D. Reuter, A. Myc, M.M. Hayes, Z.H. Gan, R. Roy, D.J. Qin, R. Yin, L.T. Piehler, R. Esfand, D.A. Tomalia, J.R. Baker, Inhibition of viral adhesion and infection by sialic-acid-conjugated dendritic polymers, Bioconjugate Chem. 1999, 10, 271-278.
78. P.I. Kitov, J.M. Sadowska, G. Mulvey, G.D. Armstrong, H. Ling, N.S. Pannu, R.J. Read, D.R. Bundle, Shiga-like toxins are neutralized by tailored multivalent carbohydrate ligands, Nature 2000, 403, 669-672.
79. E.K. Fan, Z.S. Zhang, W.E. Minke, Z. Hou, C. Verlinde, W.G.J. Hol, High-affinity pentavalent ligands of Escherichia coli heat-labile enterotoxin
11.2
Chemical Glycomics as Basis for Drug Discovery
Outlook
11.2.1
Introduction
Three major classes of polymers are responsible for the storage and transfer
of information in biological systems: nucleic acids, proteins, and
polysaccharides. DNA, the genetic material transferring information from
generation to generation, functions as the blueprint of life. RNA serves as a
transient repository of genetic information on the way from DNA to proteins,
but also has pivotal roles in cell division, gene expression, and catalysis. The
protein synthesis machinery, called the ribosome, consists largely of RNA [1]. Proteins,
the second major class of biopolymers, which are encoded by nucleic acids,
represent the catalytic machinery carrying out most of the reactions in the
cell. Proteins are also important as skeletal material of numerous organisms
to provide strength as well as flexibility. Glycosyltransferases, a special class of
enzymes, are responsible for the synthesis of carbohydrates, the third class of
biopolymers.
While nucleic acids and proteins are linear assemblies, carbohydrates are
structurally and stereochemically more diverse. A wide array of available
Fig. 11.2-1 Interactions of the three main classes of biopolymers.
11.2.2
Automated Carbohydrate Synthesis
11.2.3
Tools for Glycomics
Table 11.2-1 General cycle used with glycosyl phosphates for the
construction of oligosaccharides 1-3
over the chip. During this process, the refractive index of the chip changes
owing to the interaction as well as the accumulation of analyte. The kinetic data
obtained in this fashion allow one to calculate association and dissociation
constants from sub-microgram quantities of material. There is no need to
label the ligand or the analyte, so any influence of a label on the binding
affinities can be excluded. A further advantage is that these measurements
permit evaluation of both low- and high-affinity interactions. SPR is on its way
to becoming an extremely powerful tool in glycomics, since structure-activity
relationships are quickly assessed.
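How kinetic SPR data lead to binding constants can be illustrated with the simplest interaction model, a reversible 1:1 (Langmuir) binding of analyte to immobilized ligand. The sketch below is an assumption-laden illustration, not the analysis used in the work described here: the model choice, the function names, and all numerical values (rate constants, analyte concentration, maximal response) are hypothetical.

```python
import math

def association_response(t, conc, k_on, k_off, r_max):
    """SPR response during analyte injection for a 1:1 binding model.

    R(t) = R_eq * (1 - exp(-k_obs * t)), with k_obs = k_on * conc + k_off
    and R_eq = r_max * k_on * conc / k_obs.
    """
    k_obs = k_on * conc + k_off
    r_eq = r_max * k_on * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))

def dissociation_response(t, r_start, k_off):
    """Response after the injection ends: simple exponential decay."""
    return r_start * math.exp(-k_off * t)

def equilibrium_dissociation_constant(k_on, k_off):
    """K_D = k_off / k_on (in mol/L when k_on is in M^-1 s^-1)."""
    return k_off / k_on

# Hypothetical parameters chosen only for illustration.
k_on = 1.0e4    # association rate constant, M^-1 s^-1
k_off = 1.0e-2  # dissociation rate constant, s^-1
conc = 1.0e-6   # analyte concentration, M
r_max = 100.0   # response at full surface saturation, arbitrary units

k_d = equilibrium_dissociation_constant(k_on, k_off)
plateau = association_response(1.0e6, conc, k_on, k_off, r_max)
print(f"K_D = {k_d:.2e} M")
print(f"plateau response = {plateau:.1f}")
```

In practice the rate constants are obtained by fitting such curves to sensorgrams recorded at several analyte concentrations; with the values assumed here the analyte concentration equals K_D, so the plateau sits at half of the maximal response.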
11.2.4
Oligosaccharide Conjugate Vaccines: Malaria and HIV
11.2.5
Carbohydrate-Nucleic Acid Interactions: Aminoglycosides
11.2.6
Carbohydrate-Protein Interactions: Selectins and Heparin
[Figure: structures of kanamycin A, neomycin B, ribostamycin, and
6-N-β-alanin-1,3,3'-N-guanidino-ribostamycin.]
allow for normal trafficking and are involved in the extravasation of leukocytes
during the inflammatory cascade.
With the aid of monoclonal antibodies, sialylated carbohydrate structures,
notably sialyl Leᵃ and sialyl Leˣ, were discovered to function as receptors for
the selectins [43]. Sialyl Leˣ is usually located on leukocytes, but is also highly
expressed on a variety of different cancer cells [45]. The same holds true for
sialyl Leᵃ, which serves as a tumor marker on gastrointestinal and pancreatic
cancers [46]. Owing to the function of sialyl Lewis structures in the extravasation
of cancer cells from the bloodstream and in promoting metastatic spread to other
tissues, a clear correlation of expression of sialyl Leᵃ and sialyl Leˣ on tumors
with enhanced progression and metastasis was observed. Since it is assumed
that these tumor-associated carbohydrate markers enhance extravasation and
metastasis by interactions with selectins, experiments were performed in which
selectin expression was inhibited. Long-term studies showed that cancer
patients with tumors that express high amounts of sialyl Leᵃ had a 4.5 times
higher probability to survive over a 10-year period if the expression of E-selectin
was permitted [47]. These results point to a specific new form of cancer
therapy: direct inhibition of the carbohydrate-protein interactions that are
responsible for metastasis and tumor progression. Thus, the pharmaceutical
industry has explored the use of the bioactive conformations of sialyl Leᵃ
and sialyl Leˣ to design glycomimetic drugs that bind to selectins. Beyond
glycomimetics based on rational design, combinatorial approaches
have had much success. Solid-phase techniques were used to obtain libraries of
fucopeptides [48] for in vitro screening, and high-throughput screening with a
P-selectin assay showed that glycomimetics devoid of carbohydrate structures
also bind strongly [49]. In general, however, selectins are problematic
for drug discovery because they show relatively weak multivalent interactions
that make a general approach more difficult.
Heparin is widely known to be a biologically important and chemically
unique polysaccharide, regulating a large variety of physiological processes. It
interacts with a plethora of different proteins of physiological importance [50].
The interaction with antithrombin III (AT III) is best understood. Thus, since
the late 1930s heparin has served as a clinical anticoagulant in the treatment
of heart disease. Interactions with growth factors, chemokines, lipid-binding
proteins, and viral envelope proteins are worth noting [50].
Heparin is a linear, unbranched, highly sulfonated polymer that consists of
(1→4)-linked pyranosyluronic acid and glucosamine units (Fig. 11.2-6) [51].
The type of uronic acid varies; usually 90% of L-iduronic acid and 10% of
D-glucuronic acid are found. Commonly, 20 to 200 disaccharide repeat units
are found giving rise to a tremendous complexity.
Because of the high content of negatively charged sulfate and carboxyl groups,
the most prominent type of interaction between heparin and basic amino acids
of the protein is of ionic nature. But, in some cases, hydrogen bonding and
even hydrophobic interactions are not negligible. With the exception of the
AT III-heparin interaction, where the exact sequence of heparin associating
with the protein has been identified, the structure-function relationship of
11.2.7
Detection of Pathogenic Bacteria
11.2.8
Conclusion
Fig. 11.2-7 Laser scanning confocal microscopy image of: (a) Mutant Escherichia
coli that does not bind to polymer 17. (b) A fluorescent bacterial aggregate due to
multivalent interactions between the mannose-binding bacterial pili and the
polymer 17 (superimposed fluorescence and transmitted light images). (c) Fluorescence
microscopy image of a large fluorescent bacterial cluster. (d) Conventional
fluorescence spectra of polymer 17 (black) and normalized fluorescence spectra of a
bacterial cluster obtained using confocal microscopy (red).
Acknowledgments
We thank all present and past members of the Seeberger group and our
collaborators who contributed to the results reported in this chapter. Daniel
B. Werz is grateful to the Alexander von Humboldt Foundation for a Feodor
Lynen Research Fellowship and to the Deutsche Forschungsgemeinschaft
(DFG) for an Emmy Noether Fellowship. Peter H. Seeberger thanks the ETH
for financial support.
References
1996, 96,683-720; (d) Y.C. Lee, R.T. carbohydrates, Pure Appl. Chem. 1995,
Lee, Carbohydrate- 67,1609-1616.
protein interactions: Basis of 9. O.J. Plante, E.R. Palmacci, P.H.
glycobiology, Acc. Chem. Res. 1995, 28, Seeberger, Automated solid-phase
322-327; (e) W.H. Chambers, C.S. synthesis of oligosaccharides, Science
Brisette-Storkus, Hanging in the 2001,291,1523-1527.
balance: natural killer cell recognition 10. P.H. Seeberger, Automated
of target cells, Chem. Biol. 1995, 2, carbohydrate synthesis to drive
429-435. chemical glycomics, Chem. Commun.
3. T. Hunkapiller, R. J. Kaiser, B.F. Koop, 2003, 1115-1121.
L. Hood, Large-scale and automated 1 I. 0. J. Plante, R.B. Andrade, P.H.
DNA sequence determination, Science Seeberger, Synthesis and use of
1991,354,59-67. glycosyl phosphates as glycosyl
4. (a) M.H. Caruthers, Gene synthesis donors, Org. Lett. 1999, I, 211-214.
machines: DNA chemistry and its 12. R.R. Schmidt, W. Kinzy,
uses, Science 1985,230,281-285; Anomeric-oxygen activation for
(b) M.H. Caruthers, Chemical glycoside synthesis: the
synthesis of DNA and DNA analogs, trichloroacetimidate method, Adv.
Acc. Chem. Res. 1991,24,278-284. Carbohydr. Chem. Biochem. 1994, 50,
5 . E. Atherton, R.C. Sheppard, 21-123.
Solid-phase peptide synthesis: A practical 13. K.R. Love, P.H. Seeberger, Automated
approach, Oxford University Press, solid-phase synthesis of protected
Oxford, 1989. tumor-associated antigen and blood
6. (a) R. Rodebaugh, S. Joshi, group determinant oligosaccharides,
B. Fraser-Reid, H.M. Geysen, Angew. Chem., Int. Ed. 2004, 43,
Polymer-supported oligosaccharides 602-605.
via n-pentenyl glycosides: 14. M.C. Hewitt, P.H. Seeberger,
methodology for a carbohydrate Automated solid-phase synthesis of a
library, J . Org. Chem. 1997, 62,
branched Leishmania cap
5660-5661; (b) J. Rademann,
tetrasaccharide, Org. Lett. 2001, 3,
A. Geyer, R.R. Schmidt, Solid-phase
3699-3702.
supported synthesis of the branched
15. R.B. Andrade, O.J. Plante, L.G.
pentasaccharide moiety that occurs in
Melean, P.H. Seeberger, Solid-phase
most complex type N-glycan chains,
oligosaccharide synthesis: Preparation
Angew. Chem., Int. Ed. 1998, 37,
of complex structures using a novel
1241- 1245.
7. (a) S.J. Danishefsky, M.T. Bilodeau,
linker and different glycosylating
Glycals in organic synthesis: the agents, Org. Lett. 1999, I, 1811-1814.
evolution of comprehensive strategies 16. E.R. Palmacci, M.C. Hewitt, P.H.
for the assembly of oligosaccharides Seeberger, “Cap-Tag” - novel
and glycoconjugates of biological methods for the rapid purification of
consequence, Angew. Chem., Int. Ed. oligosaccharides prepared by
Engl. 1996, 35, 1380-1419; (b) P.H. automated solid-phase synthesis,
Seeberger, S.J. Danishefsky, Angew. Chem., Int. Ed. 2001, 40,
Solid-phase synthesis of 4433-4437.
oligosaccharides and glycoconjugates 17. G.Hummel, R.R. Schmidt,
by the glycal assembly method: A five Glycosylimidates. 79. A versatile
year retrospective, Acc. Chem. Res. preparation of the lactoneo-series
1998, 31, 685-695; (c) R.R. Schmidt, antigens-preparation of sialyl dimer
J.C. Castro-Palomino, 0. Retz, New Lewis X and the dimer Lewis Y,
Aspects of glycoside bond formation, Tetrahedron Lett. 1997, 38, 1173-1 176.
Pure Appl. Chem. 1999, 71,729-744. 18. P.P.Deshpande, S.J. Danishefsky,
8. C.-H. Wong, Enzymic and Total synthesis of the potential
chemo-enzymic syntheses of anticancer vaccine KH-1
References I689
adenocarcinoma antigen, Nature 1997, protein-carbohydrate interactions with
387,164-166. microarrays of synthetic
19. G. Ragupathi, P.P. Deshpande, D.M. oligosaccharides, ChemBioChem2004,
Coltart, H.M. Kim, L. J. Williams, S.J. 5, 379-383.
Danishefsky, P.O. Livingston, 23. D. Wang, S. Liu, B.J. Trummer,
Constructing an adenocarcinoma C. Deng, A. Wang, Carbohydrate
vaccine: immunization of mice with microarrays for the recognition of
synthetic KH-1 nonasaccharide cross-reactive molecular markers of
stimulates anti-KH-1and anti-Le(y) microbes and host cells, Nut.
antibodies, Znt. J . Cancer 2002, 99, Biotechnol. 2002, 20, 275-281.
207- 2 12. 24. B.T. Houseman, M. Mrkisch,
20. D.M. Ratner, E.W. Adams, M.D. Carbohydrate arrays for the evaluation
Disney, P.H. Seeberger, Tools for of protein binding and enzymatic
glycomics: Mapping interactions of modification, Chem. Biol. 2002, 9,
carbohydrates in biological systems, 443-454.
ChemBioChem2004, 5, 1375-1383. 25. E.W. Adams, J. Ueberfeld, D.M.
21. (a) D. Barnes-Seemann, S.B. Park, Ratner, B.R. O’Keefe, D.R. Walt, P.H.
A.N. Koehler, S.L. Schreiber, Seeberger, Encoded fiber-optic
Expanding the functional group microsphere arrays for probing
compatibility of small molecule protein-carbohydrate interactions,
microarrays: Discovery of novel Angew. Chem., Znt. Ed. 2003, 42,
calmodulin ligands, Angew. Chem., 5317-5320.
lnt. Ed. 2003, 42,2376-2379; 26. B.T. Houseman, E.S. Gawalt,
(b) S. Fukui, T. Feizi, C. Galustian,
M. Mrksich, Maleimide functionalized
A.M. Lawson, W. Chai,
self-assembled monolayers for the
Oligosaccharide microarrays for
preparation of peptide and
high-throughput detection and
carbohydrate biochips, Langmuir 2003,
specifity assignments of
19,1522-1531.
carbohydrate-protein interactions, Nat.
27. World Health Organization, World
Biotechnol. 2002, 20, 1011-1017:
malaria situation 1990, World Health
(c) A.N. Koehler, A.F. Shamji, S.L.
Stat. Q. 1992, 45, 257-266.
Schreiber, Discovery of an inhibitor of
28. L. Schofield, F. Hackett, Signal
a transcription factor using small
molecule microarrays and diversity transduction in host cells by a
oriented synthesis, J. Am. Chem. SOC. glycosylphosphatidylinositol toxin of
2003, 125,8420-8421; (d) P.J. malaria parasites, J . Exp. Med. 1993,
Hergenrother, K.M. Depew, S.L. 177,145-153.
Schreiber, Small molecule 29. L. Schofield, M.C. Hewitt, K. Evans,
microarrays: Covalent attachment and M.-A. Siomos, P.H. Seeberger,
screening of alcohol-containing small Synthetic GPI as a candidate anti-toxic
molecules on glass slides, J . Am. vaccine in a model of malaria, Nature
Chem. SOC.2000, 122,7849-7850. 2002,418,785-789.
22. (a) S. Bidlingmaier, M. Snyder, 30. M.C. Hewitt, D.A. Snyder, P.H.
Carbohydrate analysis prepares to Seeberger, Rapid synthesis of a
enter the “omics” era, Chem. Biol. glycosylphosphatidylinositol-based
2002, 9,400-401; (b) K.R. Love, P.H. malaria vaccine using automated
Seeberger, Carbohydrate arrays as solid-phase oligosaccharide synthesis,
tools for glycomics, Angew. Chem., Znt. J . Am. Chem. Soc. 2002, 124,
Ed. 2002, 41, 3583-3586: (c) L.L. 13434-13436.
Kiessling, C.W. Cairo, Hitting the 31. E.W. Adams, D.M. Ratner, H.R.
sweet spot, Nat. Biotechnol. 2002, 20, Bokesch, j.B. McMahon, B.R. O’Keefe,
234-235; (d) D.M. Ratner, W.W. P.H. Seeberger, Oligosaccharide and
Adams, J. Su, B.R. O’Keefe, glycoprotein microarrays as tools in
M. Mrkisch, P.H. Seeberger, Probing HIV glycobiology: Glycan-dependent
12
The Bicyclic Depsipeptide Family of Histone Deacetylase
Inhibitors
Paul A. Townsend, Simon J. Crabb, Sean M. Davidson, Peter W. M. Johnson,
Graham Packham, and Arasu Ganesan
Outlook
It is only a decade since the first human histone deacetylase (HDAC) was
identified. Within this short period of time, these enzymes have had a
glorious history. Broad ranging studies by both chemists and biologists have
dramatically increased our fundamental understanding of HDACs and their
function in eukaryotic cell regulation. On the drug discovery front, multiple
HDAC inhibitors are at stages of clinical development as anticancer agents.
It is probable that more than one will soon be approved as a drug. A further
development is the link between HDAC inhibitors and a growing set of
therapeutic indications outside the cancer area. One can anticipate proof of
concept animal models leading to clinical trials for these drugs in the near
future.
In this review, we have focused on the bicyclic depsipeptide family of natural
product HDAC inhibitors. Compared to other classes, these compounds exhibit
high potency and a marked degree of selectivity between individual HDACs.
One of the natural products, FK228, is currently in advanced clinical trials
for cancer. Others, the spiruchostatins, were recently discovered and show a
similar biological profile of action. With these natural products, it is unclear
(and unlikely) that their precise structure represents the optimal molecule
within this class for human therapeutics. Several academic laboratories,
including our own, have achieved the total synthesis of depsipeptides. These
routes are being applied to the preparation of novel unnatural analogs,
which hold great promise in further exploiting the depsipeptides as subtype-
selective biological probes of HDAC function and as potential therapeutic
agents.
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
12.1
Epigenetic Mechanisms of Gene Regulation
12.2
Histone Deacetylases
Transcription factor            p73, TCF, GATA-1, RelA, E2F, UBF, EKLF, NF-Y,
                                STAT6, CREB, c-Jun, C/EBPβ, E2A, HMGI(Y),
                                UBF, NF-κB p65/RelA, NF-κB p50, YY1, Bcl6,
                                Cart-1, HIV-1 Tat, Brm, MyoD, TAL1/SCL, E2A,
                                HIF-1α, TFIIE, TFIIF, PC4, TFIIB, TAFI68
Tumor suppressor                p53
Cell cycle                      Rb
Cell adhesion                   β-Catenin
Nuclear hormone receptor        AR, ERα
Nuclear import factor           Importin α, Rch1
Cytoskeleton protein            α-Tubulin
Chaperone protein               HSP90
Signaling regulation            Smad7
Apoptosis regulator             Ku70
Nonhistone chromatin protein    HMGB1/HMG1, HMGB2/HMG2,
                                HMGN1/HMG14, HMGN2/HMG17
DNA metabolism                  Flap endonuclease-1, thymine DNA glycosylase,
                                Werner DNA helicase
DNA replication factor          PCNA, MCM3
Chromatid cohesion protein      San, cohesion subunits
Viral protein                   Adenoviral E1A, large T antigen, HIV Tat, S-HDAg
Bacterial protein               Alba, CheY, acetyl CoA synthetase
Histone acetyl transferase      PCAF, p300, CBP
12.3
Class I and Class II HDACs as Drug Discovery Targets
12.4
HDAC Inhibitors
This strategy has yielded successful drugs in the past, such as the angiotensin
converting enzyme (ACE) inhibitor Captopril and later congeners. More
recent examples include inhibitors of matrix metalloproteinases and peptide
deformylase. For HDACs, the pharmacophore is defined by a metal-binding
group attached to a linear unit of similar dimensions to the lysine side chain of
the substrate. This is terminated by a “cap” that serves to orient the inhibitor
in the enzyme’s substrate-binding channel.
The difficulty of expressing eukaryotic HDACs and obtaining them in
pure form has hampered our understanding of the mechanism of action
at the molecular level. A seminal breakthrough came about in 1999 with
the X-ray structure [20] of a HDAC-like protein from the thermophilic
bacterium Aquifex aeolicus. Since bacteria lack histones, presumably the
protein acts as a lysine deacetylase upon other substrates. The bacterial
protein shares high homology with class I HDACs in its catalytic domain
and offers a reliable working model for the latter. The zinc atom in
the enzyme active site lies at the end of a narrow substrate-binding
channel that binds the acetyl-lysine side chain (Fig. 12-3). More recently,
the structures of human HDAC8 [21] and a bacterial enzyme [22] homologous
to class II HDACs were disclosed. At a gross level, all these structures are
similar in their substrate-binding channels. They are less informative in
Fig. 12-3 The X-ray structure of a bacterial histone deacetylase-like protein
homologous to human class I HDACs. The color coding of amino acid residues
corresponds to that of Fig. 12-4, with the catalytic zinc in purple. Source:
T. A. Miller, D. J. Witter, S. Belvedere, J. Med. Chem. 2003, 46, 5097-5116.
Fig. 12-4 Sequence homology between mammalian HDACs and the bacterial
HDAC-like protein (HDLP). Conserved residues within the active site, channel,
and rim regions are shown in color. Source: T. A. Miller, D. J. Witter,
S. Belvedere, J. Med. Chem. 2003, 46, 5097-5116.
[Structures shown: valproic acid, SAHA, MS-275, FK228]
Fig. 12-5 Examples of HDAC inhibitors that have reached clinical trials as
anticancer agents.
product trichostatin A (TSA). Although too toxic for therapeutic use, TSA was
the first HDAC inhibitor to be mechanistically identified as such [24]; it
remains the standard chemical probe of HDAC function and is widely used
as a molecular biological tool. Thousands of synthetic hydroxamic acid HDAC
inhibitors have been reported. Breslow's suberoylanilide hydroxamic acid
(SAHA) illustrates the design requirements for HDAC inhibition perfectly:
a hydroxamic acid metal-binding group, a linear spacer, and an anilide
cap. SAHA was commercialized via the startup Aton Pharmaceuticals, later
acquired by Merck for several hundred million dollars. SAHA is currently
under review for FDA approval and is an excellent illustration that drugs can be
minimalistic in structure and be successfully discovered in an academic setting.
The third family of HDAC inhibitors are cyclic tetrapeptide natural products
exemplified by the trapoxins and apicidins. A ketone functions as the metal-
binding group and an adjacent epoxide capable of irreversible covalent binding
to the enzyme is often present. The natural products contain a mixture of L
and D amino acids and a proline residue to favor the tight turn necessary to
cyclize a tetrapeptide. Although the cyclic tetrapeptides have yet to advance
to clinical trials, they are important biological tools. Schreiber's group [25]
used an affinity column with immobilized trapoxin B to identify its target of
action, and this led to the first characterization of a mammalian HDAC. More
recently, Nishino and Yoshida have reported [26] a series of unnatural analogs
based on the tetrapeptide scaffold with different zinc-binding groups.
Benzamides represent a fourth class of HDAC inhibitors. Unlike the other
HDAC inhibitors above, benzamides do not conform to the simple pharmacophore
model with an obvious metal-binding group connected to a linear
spacer. Whether they work by the same mechanism or target an allosteric
site on the enzyme is not fully resolved. Nevertheless, they display nanomolar
potency, and more than one compound has reached phase I clinical trials for
cancer.
Fig. 12-6 Relative expression levels of HDACs, by qPCR, in a series of cancer cell lines.
12.5
The Depsipeptide HDAC Inhibitors
[Structures shown: FK228; FR901,375; spiruchostatin A (R = i-Pr), spiruchostatin B
(R = s-Bu), spiruchostatin C (R = i-Bu); spiruchostatin D. Intracellular disulfide
reduction converts FK228 into its active form, which bears two free thiol (SH) groups.]
12.6
Total Synthesis of Depsipeptide HDAC Inhibitors - Routes to the β-Hydroxy Acid
Fragment
Disconnection of the depsipeptides at the amide and ester bonds plus the
intramolecular disulfide bridge leads to a peptide fragment and a β-hydroxy
acid. Neither of these is particularly daunting by the standards of modern day
complex molecule total synthesis. Nevertheless, the molecule as a whole has
an intricate array of functional groups that need to be selectively manipulated.
In addition, two macrocycles need to be made, which is always challenging
due to the entropic difficulty of making large-sized rings.
All the depsipeptides contain a common β-hydroxy acid, which can be
disconnected by an aldol reaction. However, it is an example of an “acetate
aldol” that suffers from poor facial selectivity of the acetate enolate. Many
of the auxiliaries and reagent-based conditions that work for propionate and
other α-substituted enolates are unsuitable for acetate aldols. In the event, each
depsipeptide total synthesis has featured a different route for the synthesis of
this β-hydroxy acid fragment.
In Simon’s pioneering FK228 synthesis [34], methyl pentadienoate was
reacted with trityl thiol to give the 1,6 conjugate addition product 1 as an
inconsequential mixture of α,β- and β,γ-unsaturated isomers. Reduction
to the alcohol 2 and oxidation provided the α,β-unsaturated aldehyde 3.
The key asymmetric acetate aldol reaction was carried out using Carreira’s
conditions (Scheme 12-1) to give 4 in nearly quantitative yield and perfect
enantioselectivity, followed by hydrolysis to acid ent-5. This is the enantiomer of
the fragment present in the natural products. Because of later difficulties with
the macrolactonization, that step was carried out under Mitsunobu conditions
with inversion of the alcohol, hence necessitating the opposite stereochemistry
in precursor ent-5.
In the Wentworth-Janda synthesis [35] of FR-901375, aldehyde 3 was
obtained by a shorter route via conjugate addition to acrolein and Wittig
reaction (Scheme 12-2). The authors had difficulties reproducing the
high enantioselectivity of Simon’s aldol reaction and alternative solutions
were sought. The successful synthesis utilized the Evans chiral auxiliary
with chloroacetate. The chloride is a “dummy” substituent ensuring high
diastereoselectivity in the aldol adduct 6. The chloride was then reduced and
the auxiliary removed to give acid ent-5.
In our synthesis [36] of spiruchostatin A, we followed Simon’s procedure for
the preparation of 3. We too were unable to achieve the Carreira aldol in good
yield. Moreover, the reaction requires the preparation of three noncommercial
materials: the binaphthyl chiral aminophenol, the t-butyl salicylaldehyde, and
the silyl ketene acetal. Instead, we opted for a diastereoselective aldol with
the Nagao auxiliary. For reasons that are not completely clear, the Nagao
thiazolidinethione auxiliary exhibits high diastereoselectivity in acetate aldols,
unlike the more popular Evans oxazolidinone auxiliary. In this case, aldol
adduct 7 was obtained in good yield (Scheme 12-3). Unlike the other syntheses,
this was coupled directly to the peptide rather than hydrolyzed to the acid 5.
In the Doi-Takahashi synthesis [37] of spiruchostatin A, the acetate aldol was
performed with the Seebach quaternary oxazolidinone chiral auxiliary. The best
Scheme 12-1 Simon's route to acid 5. Reagents and conditions: (a) 1.2 equiv
TrtSH, 1.2 equiv Cs2CO3, THF, 20 h. (b) 2 equiv DIBAL, CH2Cl2, -78 °C, 3 h.
(c) 1.2 equiv (COCl)2, 2.4 equiv DMSO, CH2Cl2, -78 °C, 30 min; 2.4 equiv Et3N,
-30 °C, 4 h. (d) 10 equiv LiOH, MeOH, 3 h.
12.7
Total Synthesis of Depsipeptide HDAC Inhibitors - Peptide Synthesis and
Formation of the seco-Hydroxy Acid
Simon's FK228 synthesis, the first in this area, provided a blueprint for
preparation of the peptide fragment and its linkage to the β-hydroxy acid
5. Starting from D-valine, standard peptide coupling furnished the linear
peptide 9 (Scheme 12-4). The dehydrobutyrine side chain was now introduced
by conversion of the threonine to a tosylate followed by elimination. After
Fmoc deprotection, the free N-terminus was coupled to acid ent-5, and the
C-terminus methyl ester hydrolyzed to give seco-acid 10. A similar strategy was
employed in Wentworth and Janda's synthesis of FR901,375. For this target,
the absence of the dehydrobutyrine unit simplifies the tetrapeptide synthesis,
which was accomplished in a straightforward manner. Coupling with ent-5
and hydrolysis gave the seco-acid 11.
Scheme 12-3 The Ganesan and Doi-Takahashi procedures for enantioselective acetate
aldol reactions with aldehyde 3.
In the spiruchostatin syntheses, the presence of a statine unit in the peptide
fragment requires a significantly different protecting group strategy. Statine
esters, unless sterically hindered, rapidly undergo intramolecular cyclization
was removed, and the amine coupled with D-Cys(Trt) as described above. The
free alcohol was protected and the peptide sequentially coupled with D-alanine
and thiazolidinethione 7. Reductive removal of the trichloroethyl ester under
neutral buffered conditions provided seco-acid 13. The Doi-Takahashi route was
essentially similar, except that the statine unit 14 was an allyl ester, and the seco-
acid 15 had a free alcohol in place of the triisopropylsilyl (TIPS) protected 13.
12.8
Total Synthesis of Depsipeptide HDAC Inhibitors - Macrocyclizations and
Completion of the Synthesis
Scheme 12-6 Completion of the total syntheses of FK228 and FR901,375 by Mitsunobu
macrolactonization.
Scheme 12-7 Final stages in the Ganesan and Doi-Takahashi total syntheses of
spiruchostatin A, and the structure of spiruchostatin A epimer 16.
12.9
The Biological Characterization of Spiruchostatin A
As described above, the spiruchostatins were first isolated on the basis of their
ability to regulate gene expression in cell-based reporter assays. Nevertheless,
the close structural similarity to FK228 suggested that these natural products
were HDAC inhibitors. Following our total synthesis, we characterized in
detail the activity of spiruchostatin A as an HDAC inhibitor in various model
systems. Initial analysis [36] demonstrated that spiruchostatin A was a potent
nanomolar growth inhibitor of MCF7 human breast cancer cells. An increase
was observed in histone acetylation and in p21cip1/waf1 promoter activity - two
characteristic cellular responses to HDAC inhibitors.
FK228 is believed to work by a prodrug mechanism involving intracellular
activation by reduction of the disulfide bond. We have obtained evidence [41]
from in vitro enzyme inhibition assays that spiruchostatin A works in a similar
fashion. In the presence of DTT, reduced spiruchostatin A inhibited total HeLa
cell HDAC activity with an IC50 of approximately 2 nM. In the absence of
DTT, intact spiruchostatin A was essentially inactive. Another hallmark of
FK228 is its selectivity between HDAC isoforms. The Yoshida group has
investigated this with overexpressed HDACs containing an epitope tag, which are
partially purified from cell lysates by immunoprecipitation using antibodies.
In this assay, spiruchostatin A was approximately 500-fold more active
against the class I HDAC1 than against the class II HDAC6 (Table 12-2). These
results show that FK228 and spiruchostatin A have similar characteristics and
mechanisms as HDAC inhibitors.
Trichostatin A                  15.0    61
FK228 (with DTT)                 4.0    790
Spiruchostatin A (with DTT)      0.6    360
[Figure: dose-response plots for spiruchostatin A, FK228, TSA, and SAHA in MCF7
and NHDF cells; x-axis: log dose (nM). A further panel, which includes a DMSO
entry, has the axis label "Fold induction".]
Fig. 12-11 Induction of histone acetylation by spiruchostatin A. (a) MCF7 cells
were treated with spiruchostatin A, reduced spiruchostatin A or spiruchostatin A
in serum free media (SFM), all at 15 nM for up to 24 h. Untreated cells were
analyzed as a control (Co). (b) MCF7 cells were treated with indicated
concentrations of spiruchostatin A in the presence or absence of
epi-spiruchostatin A. Histone acetylation and PCNA expression (loading control)
were analyzed by immunoblotting.
The kinetics of acetylation were not altered by culturing cells in the absence
of serum, suggesting that binding to serum proteins does not limit drug
action (Fig. 12-11(a)). We also tested the effect of prereducing spiruchostatin
A before addition to cells. However, the kinetics of acetylation induced by
reduced and oxidized spiruchostatin A were essentially identical, suggesting
that intracellular reduction is not a rate-limiting step (Fig. 12-11(a)). Finally,
we used the inactive epimer of spiruchostatin to investigate the potential
contribution of saturable transporters (Fig. 12-11(b)). We reasoned that this
chemically similar compound might compete for a putative transporter and
interfere with spiruchostatin A-induced acetylation. However, spiruchostatin
A-induced acetylation was equivalent in the presence or absence of its epimer.
Further studies are required to determine the factors that influence the kinetics
of action of depsipeptide HDAC inhibitors.
The significance of these findings for the clinical application of these
compounds is unclear. We and others have shown that transient histone
acetylation associated with “pulse” treatment of cells with hydroxamic acids
is not sufficient to promote G2M arrest. Consistent with this, it may be the
duration of histone acetylation rather than the peak levels that best predict
responses in individual patients in clinical trials. Therefore, the ability of
depsipeptide inhibitors to promote prolonged acetylation may be advantageous.
However, it may be necessary to maintain the circulating concentrations of
these compounds above a threshold for a considerable time before acetylation
is induced. A combination of a rapid acting hydroxamic acid HDAC inhibitor
and a long-lived depsipeptide HDAC inhibitor may provide a particularly
attractive combination.
References
13
Chemical Informatics
13.1
Chemical Informatics
Paul A. Clemons
Outlook
13.1.1
Introduction: Cheminformatics and Chemical Space
13.1.2
General Considerations: Chemical Structure Graphs
Fig. 13.1-1 The concept of chemical space. (a) Chemical structure as an input to
operations producing numerical outputs. (b) Conceptual illustration of a possible
predictive relationship between arbitrary computed and measured properties.
(c) Chemical space as a mathematical framework for comparing molecules, where
“distance” is related to “dissimilarity”.
13.1.3
History and Development: Computable Representations of Structure
Since the advent of modern computers, much attention has been paid to
methods to represent chemical structure in ways that are electronically
encodable. Such representations underlie most modern systems designed
to store and utilize chemical information, such as chemical documentation
using databases. Beginning in the mid-twentieth century, several methods
of encoding chemical information for machine processing were developed.
Chemical cipher notations had been introduced and refined by Gordon [17,
18], Dyson [19], Waldo [20, 21], and Wiswesser [22, 23], among others,
beginning in the late 1940s. In 1962, Bouman introduced one of the first
linear-cipher representations, a “linearly organized chemical code for use in
computer systems (Locus)”, whose representations of chemical structure are
recognizably ancestral to modern molecular line-entry notations (Fig. 13.1-2(a))
[24]. Significantly, one stated objective of Bouman was to reduce the chemical
knowledge required to use the system, allowing more of the coding work to be
done by machines or by chemically naive clerical staff.
In 1964, Spialter introduced the “atom connectivity matrix (ACM)” in
an attempt to define algebraically a “characteristic polynomial” associated
Muir was well aware of “lock and key” theory of enzyme-substrate reactions,
. . .[and] I was conditioned to explain substituent effects in the electronic terms
of the Hammett equation.” Hansch et al. were considering different ways of
mathematically combining Hammett constants and partition coefficients to
reduce data variance in their models, and Fujita had initially suggested a linear
combination. Only later, when Hansch could “bring [him]self to postulate that
log (1/C) was not linearly but parabolically dependent on log P”, did they obtain
a generally useful relationship.
Hansch rationalized this relationship by saying that molecules that are highly
hydrophilic will not penetrate lipophilic barriers, while highly hydrophobic
molecules will be soaked up by the first lipophilic material they encounter;
either way, such molecules will have difficulty reaching their sites of action.
Thus, only molecules with intermediate lipophilicities will readily exert
biological influence. These insights represent groundbreaking thinking for
their time, and herald the modern age of QSAR. Currently, both linear and
nonlinear relationships between structure and activity are routinely considered,
and the effects of both electronic (polar) and hydrophobic interactions
are embedded within QSAR models. Such considerations allow generally
predictive models of activity based on small-molecule structures, at least
within congeneric series of molecules. Moreover, hydrophobicity, expressed
as the octanol-water partition coefficient (log P), has proven useful in
predicting various biological observations [37, 40], and this property is now used
extensively in drug discovery and predictive toxicology [41, 42]. The Hansch-type
approach that correlates physicochemical properties with activities using
multivariable regression techniques has subsequently been widely applied to
problem areas such as toxicity, enzyme inhibition, ligand-receptor binding,
carcinogenicity, mutagenesis, and metabolism [43], and the insights of Hansch
with respect to the interplay of hydrophobic and electronic parameters presage
decades of research into molecular descriptor analysis that continues to
this day.
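The parabolic dependence of log (1/C) on log P described above can be illustrated with a small least-squares fit. The following is only a sketch under stated assumptions: the data points are synthetic and invented for illustration, the helper names (`det3`, `fit_parabola`) are hypothetical, and only the hydrophobic term is modeled (no Hammett constant).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]            # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):                                       # Cramer's rule
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)                                       # (a, b, c)


# Synthetic (hypothetical) data: an assumed parabola plus small perturbations.
log_p = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.05, -0.03, 0.02, 0.00, -0.02, 0.03, -0.05]
log_inv_c = [-0.25 * x ** 2 + 1.0 * x + 2.0 + e for x, e in zip(log_p, noise)]

a, b, c = fit_parabola(log_p, log_inv_c)

# A negative quadratic coefficient means activity peaks at an intermediate
# lipophilicity, log P = -b / (2a).
optimal_log_p = -b / (2.0 * a)
```

In this toy data set the fitted quadratic coefficient is negative, so the model predicts maximal activity at an intermediate log P, which is the qualitative behavior Hansch rationalized.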
\[ W(G) = \sum_{i<j} d_{ij} \]
where $d_{ij}$ is the shortest distance obtained by counting bonds between the two
atoms $i$ and $j$, and the sum is computed over all pairs of atoms in G.
Importantly, it has subsequently been shown that the Wiener index for
a molecular graph may have strong correlations with chemical properties
[49-51]. Consequently, it is often the objective of synthetic efforts, particularly
in drug discovery optimization, to construct compounds with certain properties
by synthesizing lead compounds with a particular Wiener index. This strategy
is an important example of how computed properties (that correlate predictively
with desired properties) can be used to create new compounds that have certain
values; that is, they occupy certain regions of a chemical descriptor space [44].
Wiener also observed the following relation for molecules that have acyclic
graphs:
\[ W(G) = \sum_{i} n_1(\mathrm{bond}_i \mid G)\, n_2(\mathrm{bond}_i \mid G) \]
where the sum is computed over all bonds in G, and where $n_1(\mathrm{bond}_i \mid G)$ and
$n_2(\mathrm{bond}_i \mid G)$ are the number of atoms lying on either side of a given bond
[14, 46, 48]. This result can be conceptualized first by considering that large
contributions to the sum in the first definition of W will come from atoms near
the molecular perimeter, since these are more bonds removed, on average,
from most other atoms, whereas smaller contributions to the sum will come
from more central atoms (Fig. 13.1-3(a)). Since all pair-wise distances used
in the sum are obtained by counting bonds, the alternative calculation of W
involves considering the number of times each bond must be traversed to
account for all paths between pairs of atoms separated by at least that bond
(Fig. 13.1-3(b)).
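The equivalence of the two calculations can be checked on small acyclic graphs. The sketch below is illustrative only: the adjacency-dictionary representation of hydrogen-suppressed graphs and the function names are assumptions made here, not part of the original work.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_by_distances(adjacency):
    """Sum of shortest bond-count distances over all pairs of atoms."""
    total = sum(sum(bond_distances(adjacency, a).values()) for a in adjacency)
    return total // 2          # every pair was counted from both ends


def wiener_by_bonds(adjacency):
    """Acyclic graphs only: sum over bonds of n1 * n2 (atoms on each side)."""
    n_atoms = len(adjacency)
    total = 0
    for u in adjacency:
        for v in adjacency[u]:
            if u < v:          # visit each bond once
                # Count atoms reachable from u without crossing the u-v bond.
                seen = {u}
                stack = [u]
                while stack:
                    atom = stack.pop()
                    for nbr in adjacency[atom]:
                        crossing = (atom == u and nbr == v)
                        if nbr not in seen and not crossing:
                            seen.add(nbr)
                            stack.append(nbr)
                n1 = len(seen)
                total += n1 * (n_atoms - n1)
    return total


# Hydrogen-suppressed carbon skeletons (atom labels are arbitrary integers).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

w_butane = (wiener_by_distances(n_butane), wiener_by_bonds(n_butane))
w_isobutane = (wiener_by_distances(isobutane), wiener_by_bonds(isobutane))
```

For n-butane both routes give 10 (1·3 + 2·2 + 3·1 by bonds), and for isobutane both give 9, so the more branched isomer has the smaller index.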
Wiener’s work set the stage for one of the first true multidimensional
molecular descriptor spaces. In 1979, Randić et al. published the details of
a program, written in both BASIC and FORTRAN, which found all the
paths through a molecular skeleton represented using a molecular graph [52].
Though the total number of such paths increases rapidly with molecular size,
and especially with the number of rings in a molecule, even at the time of
first publication such path counting was a practical computing task for most
chemical structures.
The strategy of this approach was to develop a set of molecular codes
corresponding to the number of self-avoiding paths of each length in a
molecule, for use both as a convenient representation in subsequent similarity
searches [52-541, and as a quantitative measure of structural complexity.
Since the basic calculation method for these codes was again based on
counting bonds, it is easy to visualize how these path codes are related to the
Wiener index (Fig. 13.1-3). While these initial molecular codes did not address
multiple bonds, Randić later published a second version of the program [53]
that enumerates paths in chemical graphs with multiple bonds (Fig. 13.1-3(d)).
In this case, both the input information and the algorithm are more complex,
and the numeric values of the codes could be much larger, especially in the
case of molecules with multiple double bonds, but this improvement was a
step closer to representing the chemical reality of bond order.
Fig. 13.1-3 Topological indices and path counts. (a) Illustration of path counting
leading to topological indices; beginning with the atom labeled 1, red bonds
illustrate paths of lengths 1 through 6, terminating with the atoms labeled with
asterisks. (b) Illustration of Wiener’s observation [14, 48] that a bond, labeled
with an asterisk, will be traversed 3 x 9 = 27 times to account for all paths
between pairs of atoms on either side of the bond. (c) Illustration of a graph G
as a molecular representation, and of the relationship between Randić’s path
coding system [52] and the Wiener index. (d) Illustration of a graph G’
representing Randić’s later attempts [53] to include bond order in finding paths;
note how this modification breaks the symmetry of this graph, requiring
relabeling of four atoms.
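The path codes described above, the number of self-avoiding paths of each length, can be sketched with a short depth-first enumeration. This is an illustrative reimplementation under simple assumptions (single bonds only, a hypothetical adjacency-dictionary input, a hypothetical function name), not the original BASIC or FORTRAN program.

```python
def path_counts(adjacency):
    """Count self-avoiding paths of each length (in bonds) in a graph.

    Each undirected path is reported once, regardless of direction.
    """
    counts = {}

    def extend(atom, visited, length):
        for nbr in adjacency[atom]:
            if nbr not in visited:
                counts[length + 1] = counts.get(length + 1, 0) + 1
                visited.add(nbr)
                extend(nbr, visited, length + 1)
                visited.remove(nbr)

    for start in adjacency:
        extend(start, {start}, 0)

    # The search finds every path twice, once from each end.
    return {length: n // 2 for length, n in sorted(counts.items())}


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cyclopropane = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

butane_code = path_counts(n_butane)
cyclopropane_code = path_counts(cyclopropane)

# In an acyclic graph each pair of atoms is joined by exactly one path, so
# summing length * count over the code recovers the Wiener index.
wiener_from_code = sum(length * n for length, n in butane_code.items())
```

For n-butane the code is three paths of length 1, two of length 2, and one of length 3, and the weighted sum reproduces the Wiener index of 10. The ring in cyclopropane adds paths of length 2 without adding atoms, which is why path counts grow quickly with the number of rings.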
Randić’s methods allow the association of numerical parameters with
chemical structure in a way analogous to more detailed structural studies based
on numerical calculations derived from theoretical models (e.g., quantum
chemical calculations). The distinction between these two approaches is in
the nature of the parameters, rather than the goal, which in both cases
is to define correlative relationships between numerical computation and
Fig. 13.1-4 Atom-pairs and topological torsions. (a) Illustration of atom-types
used in atom-pair descriptor calculation including atomic identities, pi-bonding,
and molecular topology. (b) Distinct atom-types, some of which occur multiple
times, make up the basic unit of atom-pair calculation. (c) Atom-pairs are
enumerated by assembling the list of all distinct pairs of atom-types and the path
length connecting them. (d) Distance metric defined by Carhart [73] in an
atom-pair descriptor space. (e) Topological torsions represent the topologies of
sets of four directly connected atoms, using the same atom-types as atom-pairs;
these encode local information only, whereas atom-pairs contain information about
both local and distant pair-wise relationships. Note how the inclusion of
stereochemistry in topological torsions would increase the number of distinct
topological torsions in this molecule from 18 to 20 (gray ovals).
have many fewer atom-pairs than the theoretical maximum possible (n(n-1)/2
for a molecule with n atoms), both by virtue of having multiple
atoms of the same type, and because the order of the two atoms’ appearance
within an atom-pair is not important (Fig. 13.1-4(c)). A very significant aspect
of the definition of atom-types is that Carhart et al. provided both a distance
metric and a normalized similarity score for molecules based on the atom-pair
definition (Fig. 13.1-4(d)). Formally, such a provision is a requirement for any
metric descriptor space intended to afford a basis for comparison of molecular
similarity or analysis of molecular diversity. Often, however, descriptors are
\[ I = \frac{(2/N)^2\, \delta^v + 1}{\delta} \]
where $N$ is the principal quantum number, $\delta$ is the number of connected
atoms other than hydrogen, and $\delta^v$ is the number of valence electrons not
involved in bonds to hydrogen (Fig. 13.1-5(a)). The intrinsic state aims to
encode the accessibility of an atom to intramolecular interaction as well as the
collection of bonds over which adjacent atoms may influence its state [83, 841.
Note that this definition provides identical resolution of structural elements as
the atom-types used for atom-pair and topological torsion calculation (compare
with Fig. 13.1-4(b)). E-states, however, modify the intrinsic state by accounting
for all influences between atoms using the formula:
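\[ S_i = I_i + \sum_{j \neq i} \frac{I_i - I_j}{r_{ij}^2} \]
where $S_i$ is the E-state of atom $i$ and $r_{ij}$ is the number of atoms in the
shortest path between atoms $i$ and $j$, counting both $i$ and $j$.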
E-state values now reflect the influences of neighboring atoms, and thus
discriminate atoms with quite similar environments as having at least slightly
different E-state values (Fig. 13.1-5(b)). One of the primary benefits of the
E-state description of molecules is its generality; the calculations proceed from
first principles and can produce, overall, a high-dimensional “state space”
into which each molecule is positioned. Indeed, Kier and Hall argue that
to “generalize any analysis of molecular description to large collections of
arbitrary structures, it is necessary to work in a mathematical framework
that accounts adequately for the number and type of descriptors necessary
to build a relatively complete description of chemical structure.” This and
similar methods allow for an encoding of such structural features as size,
branching, unsaturation, cyclicity, heteroatom content, etc., in quantitative
terms, and provide a framework for numerous structure-activity applications
[55, 56, 85-87].
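The intrinsic-state and E-state calculations can be sketched in a few lines. The example below assumes the standard Kier-Hall forms, I = ((2/N)^2 δ^v + 1)/δ for the intrinsic state and S_i = I_i + Σ_j (I_i - I_j)/r_ij^2 for the E-state, with r_ij taken as the number of atoms in the shortest path between i and j. The function names and the ethanol input encoding are hypothetical and chosen only for illustration.

```python
from collections import deque


def intrinsic_state(n_quantum, delta_v, delta):
    """I = ((2/N)^2 * delta_v + 1) / delta for one non-hydrogen atom."""
    return ((2.0 / n_quantum) ** 2 * delta_v + 1.0) / delta


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def e_states(atoms, bonds):
    """atoms: list of (N, delta_v) per heavy atom; delta is read from the bond list."""
    n = len(atoms)
    adjacency = {i: [] for i in range(n)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    intrinsic = [
        intrinsic_state(n_q, d_v, len(adjacency[i])) for i, (n_q, d_v) in enumerate(atoms)
    ]
    states = []
    for i in range(n):
        dist = bond_distances(adjacency, i)
        # r_ij counts atoms in the path, i.e. bonds + 1 (assumed convention).
        perturbation = sum(
            (intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2 for j in range(n) if j != i
        )
        states.append(intrinsic[i] + perturbation)
    return intrinsic, states


# Ethanol heavy atoms: CH3 (delta_v = 1), CH2 (delta_v = 2), OH (delta_v = 5); N = 2 for all.
ethanol_atoms = [(2, 1), (2, 2), (2, 5)]
ethanol_bonds = [(0, 1), (1, 2)]
intrinsic, states = e_states(ethanol_atoms, ethanol_bonds)
print("intrinsic states:", intrinsic)
print("E-states:", [round(s, 3) for s in states])
```

Because every perturbation term appears once with each sign, the E-states of a molecule sum to the same total as its intrinsic states; the electronegative oxygen gains at the expense of the neighboring carbons.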
13.1.4
Applications and Examples: Molecular Descriptor Spaces
outcomes. In this particular study, Patterson et al. used their method to validate
a number of individual descriptors and multidimensional descriptor spaces,
concluding that CoMFA fields, as well as two-dimensional (2-D) fingerprints
of the variable portions of the molecule series (each a molecular description of
high dimensionality), most often exhibited neighborhood behavior.
Encouragingly, later work using these concepts at Bristol-Myers Squibb [115]
allowed for the prospective choice of molecules to synthesize that were
significantly enriched in biological activity against angiotensin II. In these later
studies, the topomer shape similarity description was once again shown to be a
highly effective predictor of activity, followed by the atom-pair description. For
this particular problem, most other descriptions did not exhibit the required
“neighborhood” behavior.
Consistent with the results of Patterson, which allow large differences
in diversity descriptors to produce large variation in biological activity,
later work found that the use of “valid” molecular description methods
was more important than whether the test compounds used to inform the
prospective syntheses were weakly active or strongly active, suggesting that
this method should be a general way to approach lead optimization problems.
To generalize these conclusions with respect to chemical descriptor spaces,
especially notable is the better performance of two-dimensional fingerprints
of variable side-chains relative to whole-molecule two-dimensional fingerprints in the
original validation study [114], suggesting that the highest dimensional space
relating to the variable portions of the molecules is desirable to use as a diversity
description. Intuitively, such descriptor spaces correspond to the
most information-rich description of the molecules under consideration.
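The neighborhood-behavior idea discussed above can be made concrete with a small sketch: for every pair of molecules, record the distance in descriptor space and the absolute difference in activity, then check that close pairs never differ much in activity. This is only an illustration of the concept, not the validation procedure of Patterson et al.; the descriptor vectors, activity values, radius, and function names are all hypothetical.

```python
import math
from itertools import combinations


def neighborhood_points(descriptors, activities):
    """Return (descriptor distance, |activity difference|) for every pair."""
    points = []
    for i, j in combinations(range(len(descriptors)), 2):
        distance = math.dist(descriptors[i], descriptors[j])
        activity_gap = abs(activities[i] - activities[j])
        points.append((distance, activity_gap))
    return points


def max_gap_within(points, radius):
    """Largest activity difference among pairs no farther apart than `radius`."""
    gaps = [gap for distance, gap in points if distance <= radius]
    return max(gaps) if gaps else 0.0


# Hypothetical two-dimensional descriptors and activities on a -log10 scale.
descriptors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
activities = [6.0, 6.2, 8.0, 8.1]

points = neighborhood_points(descriptors, activities)
print("pairs:", len(points))
print("largest activity gap among close pairs:", round(max_gap_within(points, 0.5), 3))
```

In this toy set the close pairs differ by at most about 0.2 log units while distant pairs differ by about 2, which is the pattern a descriptor with neighborhood behavior is expected to show.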
Benigni et al. [116] also compared different molecular description methods,
inspired by the study of global versus local properties of a molecular descriptor
space. Comparing a series of 148 structure keys, similar to those described
earlier, to a heterogeneous set of 37 one-dimensional (e.g., molecular weight),
two-dimensional (e.g., Wiener indices and E-states), and three-dimensional
(e.g., surface areas) molecular descriptors, Benigni et al. investigated a col-
lection of nearly 300 noncongeneric small molecules at both global and local
levels. Among the strengths of this approach was the authors’ clear distinction
between effects evident using local methods such as cluster analysis and effects
evident using global methods such as principal component analysis (PCA).
While cluster analysis techniques provided a detailed description of local struc-
ture within a chemical space, such as similarities between cluster members
and intercluster distances, factorial techniques, such as PCA, describe the en-
tire dataset in terms of a small number of orthogonal basis vectors. The authors
make use of this complementarity to show that the two descriptor spaces are
globally similar (isomorphic) as judged by the overall high mutual correlation
of their PCA transforms, and the progressive increase in this concordance
with increasing numbers of principal components (matched between the two
spaces to achieve similar levels of explanation of the overall variance). On the
other hand, cluster analysis, using k-means clustering and several choices of
k, revealed that the structure-key description had much lower cluster propensity
(departure from a uniform population of the descriptor space) than did
the composite space composed of the one-dimensional, two-dimensional, and
three-dimensional descriptors. The authors suggest that this result can be ex-
plained by the much lower information density of the former space, composed
as it is from a series of binary features (presence or absence of predefined
structural features; see also Section 13.1.3.3) rather than from a collection of
discrete- or continuous-value variables. The generality of these results to ad-
ditional descriptor spaces will likely require additional experiments involving
many more compounds, but the conclusion that global isomorphism between
two descriptor spaces does not predict similarity in the fine structure between
those spaces is inescapable. The latter result has very important consequences
when considering the use of molecular descriptors in different computational
chemistry tasks. First, it suggests that any sufficiently information-rich repre-
sentation of chemical structure, whether composed of a large number of binary
variables (such as fingerprints) or composed of a smaller number of discrete-
or continuous-valued variables, is suitable for global analysis problems, such
as maximizing the overall diversity of a screening collection. On the other
hand, it suggests that the choice of descriptor space is quite important for local
problems such as lead exploration as envisioned in the neighborhood plots of
Patterson, or QSAR studies among members of congeneric series.
Rusinko et al. [117] reported an elegant method for feature (chemical
subspace) selection among binary descriptors using recursive partitioning.
The method requires that some measure of activity be recorded for the
compounds, but this activity figure can be qualitative. In this study, the
activities were simply 0, 1, 2, 3, representing no activity, weak, moderate, or
strong activity. The authors' method uses sparse-matrix techniques to move
quickly through a very large set of descriptors and choose those descriptors
most responsible for discriminating active compounds from inactive ones.
The descriptors used were atom-pairs, topological torsions, and atom-triples,
computed for a group of 1650 monoamine oxidase (MAO) inhibitors. Using
the statistical t-test to find individual descriptors that accounted for large
differences in mean activities between the two groups, the authors achieved
15-fold enrichment (7/227 versus 72/35 631) in inhibitors relative to random
selection. However, the false-negative and false-positive rates were both high,
since the method picked 220 other molecules that were not MAO inhibitors
and failed to find 65 MAO inhibitors in the dataset. The authors provide an
excellent discussion of the comparison of this method with other methods,
especially including those methods that fail badly when multiple mechanisms
of action are simultaneously operant in a dataset.
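The enrichment arithmetic quoted above can be checked directly. The sketch below assumes the figures are read as 7 inhibitors among 227 selected molecules, against 72 inhibitors in a dataset of 35 631 compounds; the function and variable names are hypothetical.

```python
def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in the selected subset divided by the hit rate in the whole dataset."""
    return (hits_selected / n_selected) / (hits_total / n_total)


selected, true_hits = 227, 7
all_inhibitors, dataset_size = 72, 35631

ef = enrichment_factor(true_hits, selected, all_inhibitors, dataset_size)
false_positives = selected - true_hits        # selected but not inhibitors
false_negatives = all_inhibitors - true_hits  # inhibitors that were missed

print(f"enrichment: {ef:.1f}-fold")
print("false positives:", false_positives, "false negatives:", false_negatives)
```

Under this reading the counts are internally consistent: the ratio is roughly 15, and the differences reproduce the 220 false positives and 65 missed inhibitors mentioned in the text.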
Also using chemical space as a framework, Agrafiotis [118] presented a
very fast method for diversity analysis on the basis of simple assumptions,
statistical sampling of outcomes, and principles of probability theory. This
method presumes that the optimal coverage of a chemical space is that
of uniform coverage. The central limit theorem of probability theory
flexible substituents. The authors confront the apparent paradox that the
search for synthetic substitutes for natural compounds often proceeds by
making exactly the types of changes known to medicinal chemists to
result in weaker and less specific activities. Not surprisingly, actual drug
molecules occupy a region of chemical space overlapping with both natural
products and synthetic molecules, since some drugs come from each of these
sources. Here, the authors suggest complementing traditional “drug-like”
property filters (i.e., Lipinski’s “rule of 5” [40]) with “natural product-like”
property filters in an effort to synthesize molecules sharing more features
in common with natural products, in hope of synthetically accessing a
potentially underpopulated portion of pharmacologically relevant chemical
space.
These examples provide a good survey of approaches to problems in
cheminformatics, which rely on molecular descriptors and the definition
of a molecular descriptor space. One take-home message underpinning
all of these studies is that in defining chemical similarity and diversity,
both the choices of objects (molecules) and attributes (descriptors) are
important in determining the outcome. Many of these studies also show
how advances in computer hardware and software have been brought
to bear to address large-scale problems not explicitly tractable even a
generation ago.
13.1.5
Future Development: Multidimensional Outcome Metrics
References
1. E.J. Corey, X.-M. Cheng, The Logic of Chemical Synthesis, John Wiley, New
   York, 1989.
2. I. Ugi, J. Bauer, K. Bley, A. Dengler, A. Dietz, E. Fontain, B. Gruber,
   R. Herges, M. Knauer, K. Reitsam, N. Stein, Computer-assisted solution
   of chemical problems - the historical development and the present state of
   the art of a new discipline of chemistry, Angew. Chem., Int. Ed. Engl.
   1993, 32, 201-227.
3. K. Zuse, Der Computer, Mein Lebenswerk, Springer, Berlin, New York, 1984.
4. J. Lederberg, Topological mapping of organic molecules, Proc. Natl. Acad.
   Sci. U.S.A. 1965, 53, 134-139.
5. R.K. Lindsay, Applications of Artificial Intelligence for Organic Chemistry:
   The DENDRAL Project, McGraw-Hill Book, New York, 1980.
6. G.E. Vleduts, Concerning one system of classification and codification of
   organic reactions, Inf. Storage Retr. 1963, 1, 117.
7. D. Bonchev, D.H. Rouvray, Chemical Graph Theory: Introduction and
   Fundamentals, Abacus Press, New York, 1991.
8. J. McMurry, Organic Chemistry, Brooks/Cole Publisher, Pacific Grove, 1992.
9. C.A. Russell, The History of Valency, Humanities Press, New York, 1971.
10. A. Cayley, On the mathematical theory of isomers, Philos. Mag. 1874,
    47, 444-446.
11. J.J. Sylvester, Chemistry and algebra, Nature 1877, 17, 284.
12. D. Vukicevic, A. Milicevic, S. Nikolic, J. Sedlar, N. Trinajstic, Paths and
    walks in acyclic structures: plerographs versus kenographs,
    ARKIVOC 2005, x, 33-44.
13. N. Biggs, E.K. Lloyd, R.J. Wilson, Graph Theory 1736-1936, Clarendon
    Press, Oxford [England], 1976.
14. I. Gutman, D. Vidovic, L. Popovic, Graph representation of organic
    molecules: Cayley’s plerograms vs
13.2
WOMBAT and WOMBAT-PK Bioactivity Databases for Lead and Drug Discovery
Marius Olah, Ramona Rad, Liliana Ostopovici, Alina Bora, Nicoleta Hadaruga,
Dan Hadaruga, Ramona Moldovan, Adriana Fulias, Maria Mracec, and
Tudor I. Oprea
Outlook
13.2.1
Introduction: The WOMBAT Databases
indeed active - some are merely patent claims with no factual basis; (b) not all
chemotypes disclosed as active are equally active, or selective for that matter, on
the target(s) of choice; and (c) not all compounds sharing the same therapeutic
indication behave in the same manner with respect to, for example, side effects.
Some of these were considered at AstraZeneca R&D Molndal, Sweden, in
May 2001, to initiate a data-gathering project centered primarily on the Journal
of Medicinal Chemistry (JMC), in collaboration with scientists at the Romanian
Academy Institute of Chemistry in Timisoara, Romania. The major goal of
this project was to capture chemical structures and the associated biological
activities disclosed in the JMC, with an initial goal of 20 000 entries set for
the first year. The first version of this database was available at AstraZeneca
R&D Molndal in May 2002; this version contained 21 700 structures (with
duplicates), and 36 738 experimental activities on 324 targets, captured from
837 JMC papers (1996-1999).
Because the internal dissemination of this database within AstraZeneca
R&D (a company with 11 R&D sites across four continents) was not deemed
a success, AstraZeneca decided to discontinue the project as of May 2002.
Backed by private funding, the database, renamed World of Molecular
BioAcTivity (WOMBAT) in 2003, continued to evolve [13] as discussed for WOMBAT
2006.1, below. Recognizing the paucity of chemical databases that capture
clinical pharmacokinetics data in a searchable manner, we further developed
WOMBAT-PK (WOMBAT-Pharmacokinetics) to index such data from the
literature [14]. This chapter summarizes the contents of WOMBAT and
WOMBAT-PK [15], some of the problems encountered in appropriately indexing biological
activities and correct chemical structures (with focus on machine-readable
contents for data mining), and provides some examples of data mining with
WOMBAT. Other bioactivity databases [16], focused mostly on patent literature,
are shown in Table 13.2-1 together with the on-line references.
13.2.2
WOMBAT 2006.1: Overview
WOMBAT 2006.1 contains 154 236 entries (136 091 unique SMILES, Simplified
Molecular Input Line Entry System [17, 18]), covering 6801 series from
over 6791 papers with more than 307 700 activities for 1320 unique targets.
All biological activities are automatically converted to the -log10 of the molar
concentration, regardless of activity type. Numerical values for activity are
stored in three fields; the additional two fields capture the experimental error,
when reported. Besides exact numeric values (the vast majority), WOMBAT
Fig. 13.2-2 Target type distribution pie charts in WOMBAT 2006.1, classified by
activity value (in the -log10 scale). The size of the pie chart is proportional to the
representation of each activity category: inactives, 2%; low activity (0-6), 18%;
medium activity (6-8), 41%; and high activity (8-14.4), 40%.
a) Entries indicate the number of structures recorded for each target class,
whereas "actives" indicate those entries with an activity of 100 nM or better;
percentage values relate to the total number of entries.
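The -log10 conversion and the activity categories of Fig. 13.2-2 can be sketched as follows. This is a minimal illustration, not WOMBAT's own code: the unit table, the function names, and the decision to place boundary values (exactly 6 or 8) in the higher category are assumptions, and the "inactives" category is not modeled because the text does not define it numerically.

```python
import math

# Conversion factors to molar concentration (assumed unit labels).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_activity(value, unit):
    """Return -log10 of the molar concentration, e.g. 100 nM -> about 7."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def activity_category(p_activity):
    """Bin a -log10 value into the low / medium / high ranges of Fig. 13.2-2."""
    if p_activity < 6:
        return "low"
    if p_activity < 8:
        return "medium"
    return "high"


for value, unit in [(10, "uM"), (100, "nM"), (1, "nM")]:
    p = to_p_activity(value, unit)
    print(f"{value} {unit} -> {p:.2f} ({activity_category(p)})")
```

On this scale the 100 nM threshold used for "actives" corresponds to a value of 7, which falls in the medium range.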
Table 13.2-3 Medicinal chemistry publications covered in WOMBAT 2006.1
[Fig. 13.2-3: WOMBAT record structure. ROOT fields: SMDLID (entry identifier);
SID (series identifier, related to the references database); Reference (short
bibliographic reference); TargetName (target name); SwissProtID (SwissProt
ID/AN and species).]
panel (Fig. 13.2-4) provides bioactivity types and values, some basic target
information, the minimal reference information as well as structural, chemical
(2D depiction and SMILES code), and related information (chirality, salt). The
Target and Biological Information panel (Fig. 13.2-5) provides detailed target
information, including biological information (species, tissue, etc.), detailed
target and target class information (including hierarchical classification for
G-protein coupled receptors, nuclear hormone receptors, and enzymes) as
well as further information regarding the bioassays (radioligand, assay type,
etc.). SwissProt [20] reference IDs are stored for most targets (~88%). The
Computed Chemical Properties panel (Fig. 13.2-6) includes several calculated
and experimental properties for each chemical structure, for example, counts
of miscellaneous atom types, Lipinski’s rule-of-five (Ro5) parameters [21]
(including the calculated octanol/water partition coefficient), ClogP [22] and
Tetko’s calculated water solubility [23], polar surface areas (PSAs) and nonpolar
surface areas (NPSAs), and so on. Finally, the Reference Database contains
bibliographic information (Fig. 13.2-7), including the Digital Object Identifier
(DOI) format [24] with URL links to pdf files for all literature entries, as well
as the PubMed ID for each paper.
13.2.3
WOMBAT Database Structure
WOMBAT is a dynamic database, which evolves as new data types are included.
The database structure is, however, preserved as much as possible from one
release to the next. Each root record (or WOMBAT entry) is identified by a
unique number (SMDLID), and is defined by the combination of one chemical
structure and one or more associated biological activities as entered in one
publication (Fig. 13.2-3). One field, series identifier (SID), links all the root
records indexed from one reference (article). There are 6801 SID values in
WOMBAT 2006.1 (see also Fig. 13.2-7). At the root level, information about
the bibliographic reference (unique SID) from which the entry originated
is recorded together with various properties (illustrated in Fig. 13.2-6).
Separate keywords describe structural characteristics, related to
stereochemistry (e.g., absolute, relative, ±, R/S, ‘non-chiral’ or racemic) and to the salt
see also Fig. 13.2-3. We record the salt separately to avoid the salt-
13.2.4
WOMBAT Quality Control
[Fig. 13.2-8 labels: (1R,2S,3S,5S)-8-methyl-3-phenyl-2-propyl-8-azabicyclo[3.2.1]octane;
(2R,3R,4S,5R)-2-(6-amino-9H-purin-9-yl)-5-(Rgroup)-tetrahydrofuran-3,4-diol;
Error: ‘Stereo bonds are only allowed between chiral and achiral atoms’]
this with ACDName [27] on the structures depicted in Fig. 13.2-8: The
software does not perceive two stereo centers for the tropane ring on
the left side and returns an error for the sugar structure. The errors
are not specific to ACDName - this program is used only to illustrate
the problem. Another type of problem in structure-conversion is the cross
up/down wedge error, when two such bonds emerge from the same chiral
center (Fig. 13.2-8): Software cannot assign the proper chirality, since by
convention three atoms are in the ‘paper plane’, and only one is ‘wedged’
(up or down); two wedged bonds are simply not possible according to the
convention. Most of these errors can be corrected by checking previous
literature. Sometimes, even the cited reference may turn out to be an
error, for example, the reported MW is not consistent with the drawn, or
named, structure.
From a quality control standpoint, the assignment of the SwissProt ID
for each target can be a challenge, as publications do not always specify the
exact target used in an assay. In some instances, the species from which the
target was isolated is not explicitly mentioned, whereas some publications
do not mention what target subtype was used. For example, there are 1780
entries in WOMBAT 2006.1 that contain ‘estrogen receptor’ (ER) in the target
name, which implies that ERs present in a particular organ (e.g., uterus,
breast, brain) were tested for binding, agonism or antagonism. Of these, 1201
entries were annotated for a specific receptor subtype, either ERα or ERβ, or
‘3A1’ and ‘3A2’ according to the nuclear receptor nomenclature [28]. For the
remaining 579 entries, a target could not specifically be assigned to a single
SwissProt ID.
This raises the question of storing multiple SwissProt ID values when a
mixture of targets is present. This situation is common for integrin receptors
that have the two protein chains separately defined in SwissProt. In the
ER example, 114 of the 579 entries were tested on MCF7 cells; however, it
is now clear that a third ER, GPR30 [29], could be present in MCF7 cells
[30]. Therefore, the observed anti-estrogenic activities for these 114 entries
should be questioned in the light of this new information; should three such
receptors be encoded? It further illustrates the dynamic nature of biological
targets: As biologists uncover more information about a particular target or
class of targets, and as our understanding about each target evolves, the
exact nomenclature changes as well. For example, there are 852 entries in
WOMBAT 2006.1 that contain ‘VEGFR-2’ as the target name: This target
name stands for the vascular endothelial growth factor receptor subtype 2,
but was previously known as ‘Flk-1/KDR’, or ‘fetal liver kinase-1’ and ‘kinase
insert domain-containing receptor’. The VEGFR-2 name is present in all
852 entries, even though some of the older (before 1999) publications did
not refer to this target by the VEGFR-2 name. In an annotated database
such as WOMBAT, one has to monitor and update not only changes
related to biology but also changes related to chemistry (and chemical
772
I errors), discussed
13 Chemical informatics
in more detail below. Practical applications based on
WOMBAT data mining using targets [31] and descriptors [32] have been
described.
13.2.5
Uncovering Errors From Literature
Example 2. In Table 1b of Ref. 37, page 4361, the core structure contains an
oxygen atom instead of the correct sulfur atom [38]:
[Structure drawings omitted. Molecular formulas: wrong, C34H42N6O7S2; correct, C33H40N6O7S2]

Compound identifiers and error description (Merck Index structure versus
correct structure):
M630, anagyrine (CAS# 486-89-5): chiral center -
M1854, carisoprodol (CAS# 78-44-4): completely different structure. All other
information about M1854 is correct (name, formula and molecular weight). The
formula is correct in the ninth edition of the Merck Index
The examples from SciFinder and the Merck Index are not intended to
question the quality of these products, which we consider to be outstanding.
They are invaluable resources to many chemists worldwide, and the error rate
in these two databases is insignificant if one takes into account the enormous
volume of indexed data. We have published a structure-activity paper on
HIV-protease inhibitors [41] in which a modified peptide was present in both
the training set and the test set. Al Leo of Pomona College has recently [42]
detected 100 chemical and name errors in the printed version of the sixth
edition of Burger's Medicinal Chemistry [43], errors that are to be corrected in
the on-line edition [44]. One can never be too careful in verifying the available
information, in particular if one is to invest a significant amount of resources
in that area.
13.2.6
As PK data has become more important during lead discovery and evaluation,
we screened the clinical pharmacokinetics literature and developed a chemical
database that captures such data in numerical searchable format (WOMBAT-PK).
Its organization is shown in Figs. 13.2-9 to 13.2-11, which illustrate three
of the four panels of the database: The Compound Description panel (Fig. 13.2-9)
provides the drug marketed names, some physico-chemical characteristics,
as well as structural, chemical (2D depiction and SMILES code), and related
information (chirality, salt). The Pharmacokinetic Data panel (Fig. 13.2-10)
provides the drug target information, and multiple PK and Tox parameters,
indexed in both numerical and text form. The third panel, Potential Side
Effects, captures data for BBB (blood-brain barrier) permeability, cardiac
toxicity data, possibly related to hERG (human ether-a-go-go potassium
channel 1) bioactivity, in vitro bioactivities from WOMBAT, as well as
mammalian tox data (e.g., the lethal dose 50%, LD50). The fourth panel,
Computed Chemical Properties, is identical to the one in WOMBAT
(see Fig. 13.2-6). The 2006 release of WOMBAT-PK contains 900 marketed
drugs (in rare cases, some are metabolites) with documented PK and Tox
properties.
Currently indexed PK, Tox, and physico-chemical properties data are
summarized in Table 13.2-4. The top nine properties were captured from
the following sources: Goodman & Gilman's ninth edition [45] (G&G),
Avery's fourth edition [46] (Av), and the Physician Desk Ref. 11 (PDR). FDA's
Center for Drug Evaluation and Research website [47] was consulted for
FDA-approved drug labels. Other resources (e.g., Google™) were sometimes
used to compile the WOMBAT-PK database. The maximum recommended
therapeutic dose [48] (MRTD) is available from the FDA [49], whereas MRTD-U
(MRTD corrected for the fraction-unbound) was determined by using the
percentage plasma protein binding (%PPB) data already indexed in WOMBAT-PK.
Thus, MRTD-U = MRTD × (1 - %PPB), with %PPB expressed as a fraction, and is
available for 498 drugs.
Experimental LogD7.4 and LogP values from compilation tables [50] and from
the Sangster database [51], and pKa values from Avery [46] and the Merck
Index [25] were collected for these drugs. In WOMBAT-PK, drug targets
are assigned to 753 drugs (of these, 97% have SwissProt IDs), whereas the
phase I metabolizing enzymes (all with SwissProt IDs) are recorded for
419 entries. Regarding cardiac toxicity, there are 218 drugs indexed for QT-
prolongation (a clinical observation based on the ECG, the electrocardiogram),
89 for Torsade de Pointes risk (another ECG signal), and 71 with hERG
binding data. Curating clinical PK data requires individual examination
[52], and sources such as Goodman & Gilman’s are often considered more
reliable.
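The MRTD-U correction described above is a one-line calculation; the sketch below spells out the unit handling. It assumes %PPB is stored as a percentage between 0 and 100 and converted to a fraction before use. The function name and the example values are hypothetical and do not come from WOMBAT-PK.

```python
def mrtd_unbound(mrtd, ppb_percent):
    """MRTD-U = MRTD * (1 - PPB), with plasma protein binding given in percent."""
    if not 0.0 <= ppb_percent <= 100.0:
        raise ValueError("plasma protein binding must be between 0 and 100 percent")
    return mrtd * (1.0 - ppb_percent / 100.0)


# Hypothetical drug: MRTD of 10 (in whatever dose unit MRTD is stored), 90% bound.
example = mrtd_unbound(10.0, 90.0)
print(f"MRTD-U = {example:.2f}")
```

A highly bound drug therefore has an MRTD-U much smaller than its MRTD, while a drug with no protein binding keeps the same value.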
Often, such experimental values are “greater than” or “less than” a given cut-
off value. A systematic round-off procedure was implemented, whereby < 5” “
13.2.7
Datamining With WOMBAT
MW < 300, ClogP < 3, number of hydrogen bond donors and acceptors ≤ 3,
flexible bonds ≤ 3, and PSA ≤ 60 Å². Using these criteria, WOMBAT 2006.1
returns 6607 entries. Of these, 2001 entries contain at least one biological
activity better than, or equal to 100 nM, and 543 of these contain a generic
name. This usually means that they are either launched drugs, or natural
products, or otherwise in an advanced stage of development. The examples
given in Fig. 13.2-13 illustrate the chemotype, target, and activity diversities
that can be found in rule-of-three compliant molecules: Neurotransmitter
and nuclear hormone receptor agonists (EC50) and antagonists (Ki, IC50, and
A2), neurotransmitter transporters, as well as enzyme inhibitors are present,
most of them with multiple activities. On the basis of the WOMBAT 2006.1
entries, it appears that there are a number of interesting chemotypes that
are rule-of-three compliant. Such cheminformatics-based mining can identify
target-specific small molecules for fragment library design [63].
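The rule-of-three query described above amounts to a conjunction of simple property thresholds. The sketch below applies the cut-offs as stated in the text (MW < 300, ClogP < 3, donors ≤ 3, acceptors ≤ 3, flexible bonds ≤ 3, PSA ≤ 60 Å²) to in-memory records. The record layout, field names, and the two example fragments with their property values are hypothetical; they are not WOMBAT entries or the WOMBAT query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mw: float             # molecular weight
    clogp: float          # calculated logP
    hbd: int              # hydrogen bond donors
    hba: int              # hydrogen bond acceptors
    rotatable_bonds: int  # flexible bonds
    psa: float            # polar surface area, in square angstroms


def is_rule_of_three_compliant(c):
    """Apply the six thresholds quoted in the text."""
    return (
        c.mw < 300
        and c.clogp < 3
        and c.hbd <= 3
        and c.hba <= 3
        and c.rotatable_bonds <= 3
        and c.psa <= 60
    )


# Hypothetical records for illustration only.
library = [
    Compound("fragment_a", 219.3, 2.0, 1, 2, 2, 31.0),
    Compound("fragment_b", 312.4, 3.4, 2, 5, 6, 85.0),
]
compliant = [c.name for c in library if is_rule_of_three_compliant(c)]
print(compliant)
```

In a real setting the same predicate would be combined with an activity filter, for example keeping only entries with at least one activity of 100 nM or better.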
13.2.8
Conclusions and Future Challenges
[Fig. 13.2-13: examples of rule-of-three compliant molecules in WOMBAT 2006.1;
structure drawings and per-target activity values not reproduced]
Compound                 MW       ClogP
Quinpirole               219.33   2.02
Physostigmine            275.35   1.95
Norethindrone            298.43   2.78
Ondansetron              293.37   2.71
RTI-110                  279.77   3.12
Morphine                 285.35   0.57
5-OMe-α-Me-Tryptamine    204.27   1.75
LY-191704                249.74   2.82
SU-5416                  238.29   2.83
Acknowledgments
The authors thank Prof. Hugo Kubinyi (Heidelberg, Germany) for suggestions.
References
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
14
Chemical Biology and Drug Discovery
14.1
Managerial Challenges in Implementing Chemical Biology Platforms
Frank L. Douglas
14.1.1
Introduction
This chapter will present the experiences and perspectives that led to the
creation of a concept named Chemical Biology Platform (CBP). CBPs embrace
the modern day version of the “drug discoverer” and the management
challenges associated with innovation. The management challenges are largely
due to the complexity and marked increase in quantity of information about
chemical structures, disease targets, and pathophysiology, as well as the
pharmacology studies in disease models and patient subpopulations.
Currently, management must also address the additional complexity of
mergers, which also affects information integration and organizational
collaboration. The challenge of accessing and correlating information
generated by the partners in the merger is often underestimated. Perhaps,
even more challenging is the attempt to build a culture for the newly merged
company in which scientists from different countries and organizations
share information, collaborate, determine global standards, and leverage both
tacit and explicit knowledge. The discussion will therefore focus on both
the scientific and cultural underpinnings of CBPs within an organizational
context.
14.1.2
The Management Challenge
development to approval. Since, not surprisingly, the probability of success
[Figure: probability of success (vertical axis) versus discovery and development
time (horizontal axis, 0 to 15 years, with IND/CTX marked)]
14.1.3
Observation-based Discovery Background
his further pursuit of the observations [4]. In 1928, Sir Alexander Fleming
observed that a species of the mold Penicillium had inhibited the growth of
Staphylococcus aureus in a culture.
Like a true drug discoverer, however, Sir Alexander Fleming, having
discovered lysozyme in 1922, sensed the importance of his serendipitous
observation and pursued it. His tireless enthusiasm for and presentation
of his work on penicillin finally won the interest of Drs. Cecil Paine,
Howard Florey, and Ernest Chain. They were able to demonstrate the medical
potential of penicillin in individual infected cases, as well as succeed in
extracting “purified” drug in about 1940. Between 1940 and 1942, efforts
were successfully focused on the challenge of optimizing the production of
penicillin. Seventeen years later, John Sheehan of Massachusetts Institute of
Technology achieved a total synthesis of natural penicillin [5].
Thus, the penicillin story demonstrates a history similar to that of aspirin.
As in the case of aspirin, a target was serendipitously recognized from an in
vitro observation and there was a simultaneous proof of presence of an active
ingredient or compound. Drug discovery was thereafter focused on isolating
and synthesizing the active ingredient while pharmacological experiments
were performed in parallel. As with aspirin, the discovery of penicillin
is another case in which one started with a validated, unidentified target and
an active, unidentified, unoptimized drug. Progress was accelerated when the
structure of penicillin was solved and its mechanism in inhibiting the
cross-linking of peptidoglycan was identified [6]. This discovery led to a number of
semisynthetic and synthetic penicillins and cephalosporins, both based on the β-lactam
structure that inhibited the enzyme that forms the peptidoglycan structure of
the cell walls of bacteria.
14.1.4
Mechanism-based Discovery Background
14.1.5
Twenty-first Century Experience: Ketek (Novel Anti-infective Drug in 2003)
In our own experience at Aventis with Ketek, we could go rapidly from concept
to regulatory submission, because the in vitro biological models existed. The
models rapidly validated (a) its antibacterial activities and (b) the binding at
two sites on the 23S rRNA of the 50S ribosomal subunit which made it
effective against penicillin-resistant Streptococcus pneumoniae [8]. Secondly, an
understanding of the drug’s metabolism enabled targeted clinical studies to
evaluate any potential liabilities with respect to liver side effects or QTc. Thus,
the POS was high due to the extensive knowledge in the antibiotic arena and
expertise in QTc that existed in Hoechst Marion Roussel where it could be
leveraged during the discovery and development of Ketek. This was the case
of a validated target but unoptimized compound (Fig. 14.1-1). Ketek was also the
second compound in the series, as the first compound was terminated because
of liver side effects.
The above examples satisfy the Sir James Black criteria for selecting projects
with a high initial POS. Sir James Black’s advice was:
1. Start with a clinical problem.
2. Identify the controlling chemicals or hormones in the
system.
3. Start at the most basic molecular level and test similar
molecules for in vitro activity [9].
The three points mentioned above were clearly observed in the discovery
of Enbrel. In this case, a fusion protein consisting of soluble p75-TNF (tumor
necrosis factor) receptor type II and the Fc portion of human IgG
was the “chemical” of interest. This approach was very clever in that Craig
Smith and Raymond Goodwin proposed that injecting a soluble TNF receptor
would assist in binding the excess TNF, which on interacting with its receptor
on the cell triggers the inflammatory process in rheumatoid arthritis patients.
The excess circulating TNFα was the identified and somewhat validated target.
This cytokine plays a critical role in synovial proliferation. The technical
optimization step was the cloning and expression of the TNF receptor. And
as in the earlier case of propranolol, an animal model existed, namely,
14.1.6
Observation Summary and Future Application
The above examples reveal the following characteristics for an enhanced POS:
1. degree of validation of the target
2. optimization of leads
3. ability to link optimization of lead with in vivo validation
of target
4. ability to test early in humans, particularly with aid of
biomarkers
5. rapid prototyping through leveraging of knowledge
generated from previous, relevant studies.
initial work in clinical phase III is underway, and only at the conclusion
of phase III are the data available to determine whether the target is valid and
relevant.
The upper curve is the best-case scenario. Here the target is not only
identified but also validated. In addition, the biological structure is known and
as a result one can start with rational drug design and de novo synthesis. Here,
the time to LI is shortest. At the very outset of the project, the POS is very high,
both because the target is validated and there is structural information that
enables rapid lead finding, optimization, and prototyping. This situation is
approached when one is working on follow-on or next generation compounds
for a drug that is already in the market, and has a clear mechanism of action
or target.
The genomic age presents a significant opportunity to rapidly generate
information and approximate the upper or common mechanism curve. Genomics,
proteomics, metabolomics, pharmacogenomics, and bioinformatics will bear
fruit when two additional disciplines mature. These disciplines are
structural biology and the application of knowledge management to families of
targets such as kinases, proteases, ion channels, and G-protein coupled
receptors (GPCRs). This will enable prediction and generation of SARs in silico,
which is the hope and future of CBPs.
14.1.7
Establishment of Organizational Structures for Chemical Biology Platforms
each project. The members of project teams also benefited from the knowledge
that existed in their disciplines, as they could bring the expertise of their
colleagues to any challenge.
In 1999, during another set of discussions on how to best share knowledge
across project teams in different sites, we discerned several key points. First,
we had 54 projects with kinases as targets. These projects were focused on
inflammatory diseases, cancer, and central nervous system disorders and
existed in all three sites. Secondly, there were no organized mechanisms to
foster communication or knowledge sharing among the scientists.
A third revelation was that there were some common problems, for example,
the toxicity of lead compounds against kinase targets; or the need to develop
biased libraries of compounds to enhance “hit” finding; or lack of structural
information about the specific kinase enzymes.
A fourth revelation was that, although we had made significant progress
in DMPK, we were still dramatically losing compounds in man because of
safety issues. However, sharing of knowledge among the DMPK scientists did
contribute positively to the improvement in attrition rate due to poor DMPK
characteristics.
Another reality was that 60% of the 200 top selling drugs came from four
classes of mechanisms, namely, GPCRs, proteases, kinases, and ion channels
and transporters.
Finally, there was the recognition that the strategies used to find leads
were related to the amount of information we had about the structure of the
target. Thus the more knowledge available, the less time was needed to find
a lead compound. In fact, the strategies used to find lead compounds were, in
decreasing order of the knowledge they required: de novo synthesis, virtual
screening, focused screening, and high-throughput screening.
A focus on understanding the structure of the target to identify the spatial and
energy requirements of the potential agonist or inhibitor was a clear need.
The anticipated deciphering of the human genome was seen as the event
that would catalyze the ability to elucidate the structure of targets and further
enable rational drug design.
14.1.8
Chemical Biology Platforms (CBP)
In 2000, I introduced the Kinase Chemical Biology Platform that was the
first of our four CBPs. The initial step was to identify all scientists across
the company (now Aventis) with expertise and interest in kinases. The survey
yielded about 300 scientists, many of whom were actively involved in kinase
projects. We created a Kinase Community of Practice with these scientists as
members and used knowledge mail to facilitate communication, exchange,
and development of the kinase network.
The second step was the establishment of the Platform. There were two
key principles in establishing the CBP: (a) there would be no changes in the DI&A
basic organizational structure and (b) the goal of the Platform was to facilitate
knowledge transfer to enable simultaneous drug discovery. (Simultaneous
drug discovery meant anticipating the critical issues and working on them in
a parallel rather than sequential fashion.)
A CBP core team was appointed and given a charter. This team consisted of
senior scientists who were respected by their peers. Each represented one of the
following disciplines: medicinal chemistry, computational chemistry, struc-
tural biology, molecular biology, toxicology, DMPK, clinical pharmacology,
and IT. A knowledge management specialist was assigned to the CBP.
The overall responsibility of each CBP core team was to:
leverage globally the target family knowledge across projects
independent of disease focus and priorities of each site;
improve Aventis’ target family compound collections (focused
libraries);
develop and apply the concept “all target compounds see all
targets of a family”;
develop target family-specific predictive models and tools;
use external networks of experts in the field.
Each member of the CBP core team was expected to convene a small team of
individuals from his/her discipline, who were active members of project teams
within the same target family. These CBP strategy teams, as they were called,
identified problems that were common to several project teams and developed
strategies to solve them. Sometimes this involved engaging academic experts
to assist in the resolution. The results and “learnings” were shared with all
interested scientists (Fig. 14.1-3).
The responsibility of the core team was to discuss issues being pursued by the
strategy teams, identify the downstream implications for their individual areas,
and to look for “breakthrough” solutions or new methods of solving problems.
Areas of particular interest included use of structural biology information,
strategies for designing focused libraries, and identification of biomarkers.
thus, the Protease Chemical Biology Platform with Hans Peter as head was
launched.
Shortly thereafter, a total of four chemical biology platforms were in
operation: kinase (CBK) led by Dr Andreas Batzer, protease (CBP) led by
Dr Hans Peter Nestler, ion channels and transporters (CBICT) led by
Dr Heiner Glombik, and G-protein coupled receptors (CBG) led by Dr Bruce Baron. Thus,
within 18 months of my describing CBPs in my keynote address at the IBC Drug
Discovery Conference in Boston in 2000, four CBPs were functioning.
Incidentally, this conference was very significant because the other keynote
address was delivered by Dr Craig Venter, who described the challenges
of deciphering the human genome. The next address was mine and it
acknowledged that, due to this incredible achievement that was led by
Dr Venter and Dr Francis Collins, one would be able to think in terms of target
families and develop knowledge about both structure and pathophysiology
more rapidly. The deciphering of the genome was critical to the application of
CBPs in industry.
14.1.9
Other Organizational and Knowledge Challenges
The desire to correlate information across projects and sites disclosed a critical
barrier. As a consequence of mergers or groups working independently, such
as in business unit structures within a single company, there was a lack of
standardization of assays, connectivity of databases, and annotation of data;
hence, we were unable to leverage knowledge or data. Thus, the correlation of
chemical and biological data was very difficult. We therefore launched, with
the help of a small team from McKinsey & Company, a program to establish
an informatics platform to support the CBPs. The goals of this effort included:
Provision of a curated, standardized, central repository to
enable rapid querying and retrieval of diverse, accurate
biological data (e.g., sequence similarity, expression, disease
association).
Knowledge-based establishment of correlations between
chemical space (compounds, hits, leads, etc.) and biological
space (e.g., target sequence and target 3D structure, as well as
ADMET data).
Ability to increase POS of the selected portfolio of projects by
selecting groups of targets with similar biological properties.
Identification of additional predictive and simulation tools to
leverage curated data, for example, ADMET (absorption,
distribution, metabolism, elimination and toxicology).
Rapid identification of “privileged fragments” that lead to selection
of compounds of high interest for a specific target.
The overall hope was that the IT platform would not only improve
communications among the scientists but lead to increased correlations and
serendipitous findings.
14.1.10
Conclusion
Source: CBK
mechanism projects across sites. External networks were under way and the
early results of the experiment were encouraging. I would recommend further
evaluation of this organizational approach to improve productivity in the
biopharmaceutical industry, and of the attempts made to quantify the results
to determine organizational benefits.
References
14.2
The Molecular Basis of Predicting Druggability
Bissan Al-Lazikani, Anna Gaulton, Gaia Paolini, Jerry Lanfear, John Overington,
and Andrew Hopkins
14.2.1
Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
14.2.2
14.2.3
Molecular Recognition is the Basis for Druggability
14.2.4
Estimating the Size of the Druggable Genome
Number of targets at <10 µM per gene family. In each group, "All" is the full
count and "Ro5" is the Ro5 count.

                              Redundant           Redundant           Nonredundant
                              ortholog targets    ortholog mammalian  human targets
                              (all species)       targets
Gene family                   All       Ro5       All       Ro5       All       Ro5
Aminergic GPCRs               71        71        61        61        34        34
Aspartyl proteases            10        4         9         4         7         3
Cysteine proteases            20        18        19        17        16        14
Enzymes - others              149       117       131       104       102       81
GPCRs class A - others        59        47        49        38        35        30
GPCRs class B                 12        7         10        5         5         2
GPCRs class C                 20        20        19        19        10        10
Hydrolases                    54        44        46        37        34        28
Ion channels - ligand gated   52        42        47        37        26        20
Ion channels - others         20        18        18        16        14        12
Kinases - others              11        8         11        8         7         6
Metalloproteases              60        56        53        50        41        39
Nuclear hormone receptors     45        33        33        26        22        19
Others                        188       144       146       109       108       79
Oxidoreductases               67        63        62        58        39        37
PDEs                          15        13        15        13        11        11
Peptide GPCRs                 99        72        80        59        52        42
Protein kinases               101       90        87        78        75        66
Serine proteases              34        30        34        30        27        24
Transferases                  68        46        57        39        42        30
Total                         1155      943       987       808       707       587
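The Ro5 columns of the table can be read as a share of each family's targets. The following is a minimal sketch, assuming that the last two columns are the nonredundant human targets below 10 µM and the subset of those with at least one rule-of-five compliant compound. Only a few rows are copied in, and the names `HUMAN_TARGETS` and `ro5_fraction` are illustrative, not taken from the chapter.

```python
# Counts copied from the nonredundant human columns of the table:
# family -> (targets below 10 µM, Ro5 count). Reading the Ro5 count as a
# subset of the first number is an assumption of this sketch.
HUMAN_TARGETS = {
    "Aminergic GPCRs": (34, 34),
    "Aspartyl proteases": (7, 3),
    "GPCRs class B": (5, 2),
    "Protein kinases": (75, 66),
    "Total": (707, 587),
}


def ro5_fraction(n_targets: int, n_ro5: int) -> float:
    """Return the share of targets that fall in the Ro5 column."""
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    if not 0 <= n_ro5 <= n_targets:
        raise ValueError("n_ro5 must lie between 0 and n_targets")
    return n_ro5 / n_targets


if __name__ == "__main__":
    for family, (n_targets, n_ro5) in HUMAN_TARGETS.items():
        print(f"{family:<20} {ro5_fraction(n_targets, n_ro5):.0%}")
```

Run over the full table, the same function would give a per-family proportion of the kind a bar chart by gene family would show.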
database, more than doubles the number of identified proteins with existing
lead matter.
Using this larger database of drug targets, which show some precedent
of modulation by small-molecule leads or drugs, we attempted to estimate
the size of the potential druggable genome based on homology to known
drug targets. The underlying assumption in this analysis is that if one gene
family member has shown the propensity to selectively bind small-molecule
modulators, other members of the gene family may share
physicochemical and architectural properties that are also likely to bind
druglike small molecules. Proteins that have a similar sequence are generally
likely to share very similar three-dimensional properties and perform similar
or related functions. If a protein therefore has a high degree of sequence
similarity to the target of a drug (or other protein that is known to be
Fig. 14.2-4 Proportion of targets with leads observed with at least one rule-of-five
compliant compound within each gene family.
database, identified 2921 protein sequences within the same sequence identity
cut-offs.
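The homology argument can be sketched as a simple filter: compute the percent identity between a candidate protein and a known target from a pairwise alignment, and keep candidates at or above a cut-off. This is only an illustration under stated assumptions. The 40% default, the sequence strings and all identifiers below are hypothetical, since the cut-offs used in the chapter are not given in this passage, and a real analysis would take its alignments from a dedicated alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither side has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_homologs(alignments, cutoff=40.0):
    """Return {candidate: (best matching known target, identity)}.

    `alignments` yields (candidate_id, target_id, aligned_candidate,
    aligned_target). `cutoff` is a hypothetical threshold in percent.
    """
    flagged = {}
    for candidate, target, seq_c, seq_t in alignments:
        identity = percent_identity(seq_c, seq_t)
        best_so_far = flagged.get(candidate, (None, -1.0))[1]
        if identity >= cutoff and identity > best_so_far:
            flagged[candidate] = (target, identity)
    return flagged


if __name__ == "__main__":
    toy_alignments = [  # made-up sequences, for illustration only
        ("candidate_1", "known_target_X", "MKTAYIAK-QR", "MKTAYLAKGQR"),
        ("candidate_2", "known_target_X", "MSSPLWQ-AAC", "MKTAYLAKGQR"),
    ]
    print(flag_homologs(toy_alignments))
```

Ignoring gap columns is one of several common identity conventions; a different convention would change the numbers and therefore which candidates pass a given cut-off.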
In addition to using a sequence homology approach, we also approached the
problem of identifying the druggable subset of the human proteome using a
feature-based Bayesian method.
algorithm was trained against a set of 400 protein complexes binding
small-molecule, rule-of-five compliant ligands. From this analysis, a decision
tree was derived to predict the druggability of a binding site or cavity from
calculated physicochemical properties. The decision tree predicts whether a
cavity is druggable within the statistical confidence levels of the tree. This
method has demonstrated a 91% success rate when predicting druggability on
the protein drug targets (of oral drugs as defined in Inpharmatica’s Drugstore
database of approved drugs). The method requires either an experimentally
derived structure or a high-quality homology model. Ideally, because of the
inherent flexibility of many protein-ligand binding sites, a sample of multiple
conformations is preferred. The method is scalable enough to be employed on the
entire PDB (December 2004 release). After removing short peptides, 27 409 files
were suitable for analysis; these were further classified into 76 322 structural
domains using SCOP [32] and DISCO base, of which 28% (21 522) were found to
have at least one site predicted, to some degree, to be druggable. Because of
the high redundancy in the PDB and the high number of ligand-protein
complexes, these results reduce to a nonredundant set of human targets in
which 427 proteins were predicted to contain a druggable binding
site, with 281 of these proteins having no prior known compounds or drugs
developed against those targets. Structure-based druggability algorithms could
be automatically applied to continuously assess the stream of novel structures
determined by the structural genomic initiatives.
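To make the idea of a decision tree over cavity descriptors concrete, here is a small hand-written sketch. It is not the algorithm described above: the three descriptors, their thresholds and the names `Cavity` and `predict_druggable` are all hypothetical choices made for illustration. The last helper only recomputes the quoted share of structural domains with a predicted site from the two counts given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cavity:
    # Hypothetical stand-ins for the "calculated physicochemical
    # properties" of a binding site; the real feature set is not
    # given in this passage.
    volume: float                # cavity volume in cubic angstroms
    hydrophobic_fraction: float  # apolar share of the surface, 0..1
    buriedness: float            # degree of enclosure, 0..1


def predict_druggable(cavity: Cavity) -> bool:
    """Walk a tiny decision tree; every threshold is illustrative only."""
    if cavity.volume < 200.0:
        return False  # too small to hold a druglike ligand
    if cavity.hydrophobic_fraction >= 0.5:
        return True   # large and mostly apolar
    return cavity.buriedness >= 0.7  # polar sites must be well enclosed


def fraction_with_site(n_domains_with_site: int, n_domains: int) -> float:
    """Share of structural domains with at least one predicted site."""
    return n_domains_with_site / n_domains


if __name__ == "__main__":
    print(predict_druggable(Cavity(350.0, 0.6, 0.4)))
    # Counts from the text: 21 522 of 76 322 structural domains.
    print(f"{fraction_with_site(21522, 76322):.1%}")
```

A tree derived from data would normally be fitted with a learning library and validated on held-out complexes; the fixed rules here only show the shape of the prediction step.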
Combining a nonredundant set of genes from all the following methods:
current targets of approved drugs;
current targets of chemical lead or chemical tool;
sequence homology to current drug targets;
sequence homology to current chemical lead targets;
feature-based sequence probability prediction;
structure-based prediction;
sequence homology to structure-based prediction,
that were outlined earlier, we can identify a total of 3505 unique genes that
are predicted with first- and second-order evidence and with high confidence
level to encode small-molecule druggable proteins, of which only 170 are the
primary human targets for marketed drugs (Table 14.2-3). The results of this
combined analysis concur with the previous result estimated by Hopkins and
Groom [2], which shows that approximately 14% of the human genome could
be inferred to be potentially druggable.
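Combining the methods listed above into one nonredundant gene set amounts to a set union that remembers which method supports each gene. The sketch below assumes each method yields a set of gene identifiers. The method labels and the `GENE_*` identifiers are placeholders, not data from the chapter. The second helper shows the gene count implied by reading 3505 genes as roughly 14% of the genome.

```python
from collections import defaultdict


def combine_evidence(method_hits):
    """Map each gene to the set of methods that predict it as druggable."""
    support = defaultdict(set)
    for method, genes in method_hits.items():
        for gene in genes:
            support[gene].add(method)
    return dict(support)


def implied_gene_count(n_druggable: int, fraction: float) -> float:
    """Total gene count implied by a druggable count and its share."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return n_druggable / fraction


if __name__ == "__main__":
    toy_hits = {  # placeholder identifiers, for illustration only
        "approved_drug_target": {"GENE_A", "GENE_B"},
        "chemical_lead_target": {"GENE_B", "GENE_C"},
        "structure_based_prediction": {"GENE_C", "GENE_D"},
    }
    support = combine_evidence(toy_hits)
    print(len(support), "unique genes")
    for gene in sorted(support):
        print(gene, sorted(support[gene]))
    # 3505 genes at about 14% implies a genome of roughly 25 000 genes.
    print(round(implied_gene_count(3505, 0.14)))
```

Keeping the supporting methods per gene, rather than only the union, makes it possible to separate genes backed by several lines of evidence from those backed by one.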
14.2.5
How Many Drug Targets are Accessible to Protein Therapeutics?
If, in our explorations, the proportion of the protein targets expressed by the
human genome accessible to modulation by high-affinity, druglike small
molecules is limited, how much larger is the universe of drug targets if we
expand our investigations to include targets of protein therapeutics such as
antibodies and recombinant biologicals? At the time of writing, approved
antibody therapeutics were known to act on 15 human targets whilst in total
all biological drugs in the pharmacopeia currently work via 59 modes of
action. Because of the inherently lower toxicity observed for fully humanized
antibodies and the rising rate of biological approvals, it has been argued that
antibodies may soon overtake NCE approvals [33]. Interestingly, it has also
been observed by studying rates of attrition that antibodies acting against novel
modes of action often show a higher chance of success in phase II clinical
studies than small-molecule drugs acting on mechanisms of precedence
[34-36]. Thus, we attempted to estimate how many targets are accessible to
biological drugs as the targets of antibody therapies. Other criteria, such as
antigenicity are also important in developing inhibitory antibodies. However,
these have not been considered in this analysis, as they are not common to
both antibody and other protein drugs.
To estimate the number of genes expressing products that could be accessible
to antibody therapeutics, we assume that proteins are required to be located
in the extracellular matrix. We also assume that the extracellular location
is the union of secreted and transmembrane sets of proteins. Where the
extracellular location is known, this is often included in Swiss-Prot and gene
ontology (GO) [37] database annotation for the protein. Secreted proteins
can be predicted by the presence of a signal peptide whilst transmembrane
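The stated assumption, that the extracellular location is the union of the secreted and transmembrane protein sets, can be written down directly. The sketch below takes both sets as given inputs, whether they come from database annotation or from prediction. The gene identifiers are placeholders and the function names are illustrative.

```python
def antibody_accessible(secreted, transmembrane):
    """Genes whose products are assumed reachable by protein therapeutics:
    the union of the secreted and transmembrane sets."""
    return set(secreted) | set(transmembrane)


def shared_with_small_molecule(accessible, small_molecule_druggable):
    """Genes predicted druggable by both modalities (set intersection)."""
    return set(accessible) & set(small_molecule_druggable)


if __name__ == "__main__":
    secreted = {"GENE_S1", "GENE_S2"}        # placeholder identifiers
    transmembrane = {"GENE_T1", "GENE_S2"}   # a gene may appear in both sets
    small_molecule = {"GENE_T1", "GENE_X9"}

    accessible = antibody_accessible(secreted, transmembrane)
    print(sorted(accessible))
    print(sorted(shared_with_small_molecule(accessible, small_molecule)))
```

Because a union is used, a gene annotated as both secreted and transmembrane is counted once.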
14.2.6
Conclusion
Fig. 14.2-5 Gene family distributions (a) small-molecule druggable genome (b) protein
therapeutics.
for approved, small-molecule drugs. While there may be many more proteins
expressed by the human genome, which may be discovered to be modulated
by small-molecule tools or drugs, the proteins identified as belonging to the
subset known as the druggable genome represent those targets we can readily
predict as having a higher confidence level of discovering a small-molecule
chemical tool than the remaining genes in the genome. Since it was first
proposed that the various physicochemical constraints on druglike chemicals
would reduce the available target space, it has been suggested that accessible
drug target space may expand considerably with the application of biologic
drugs such as fully humanized antibodies. Protein therapies approved to date
act via about 59 human targets; 18 of these are targeted by marketed antibodies.
With the commercialization of recombinant protein production, the number
of biological drugs receiving approval and being studied in the clinic is steadily
rising. Several commentators predict that the rise of antibody therapies may
challenge the premier position of small-molecule chemical entities as the
dominant technology of medicines [33]. Our estimate of the proportion of
the genome potentially accessible to modulation by protein therapeutics such
as antibodies is around 13%, with 3258 genes predicted to encode proteins
druggable via protein therapeutics. Interestingly, 70% of all the drug targets
are also predicted to be accessible to modulation by antibody therapy. Indeed, if
we expand the analysis to compare the overlap between the antibody-accessible
druggable genome and the small-molecule druggable genome, 1516 genes
are predicted to encode proteins druggable by both small molecules and
protein therapeutics; which is approximately 45% of our current estimate of
the small-molecule druggable genome (Figs. 14.2-5 and 6).
We would like to thank Colin Groom (UCB Celltech, Cambridge, UK) for
his long-standing contribution to this work. We also sincerely thank Edith
Chan (Inpharmatica, London), Robin Spencer (Pfizer, Groton), Lee Beeley
(Pharmamatters, Ramsgate), and Jonathan Mason (Pfizer, Sandwich) for their
helpful discussions in the development of this work.
References
1. A.L. Hopkins, C.R. Groom, Target analysis: a priori assessment of
druggability, Ernst Schering Research Foundation Workshop, Berlin, 2003, 42.
2. A.L. Hopkins, C.R. Groom, The Druggable Genome, Nat. Rev. Drug Discov.
2002, 1, 727-730.
3. J. Overington, Prioritizing the proteome: identifying pharmaceutically
relevant targets, Drug Discov. Today 2002, 7, 516-521.
4. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and
computational approaches to estimate solubility and permeability in drug
discovery and development settings, Adv. Drug Deliv. Rev. 1997, 23, 3-25.
5. A. Ajay, W.P. Walters, M.A. Murcko, Can we learn to distinguish between
“drug-like” and “nondrug-like” molecules? J. Med. Chem. 1998, 41, 3314-3324.
6. J. Wang, K. Ramnarayan, Towards designing drug-like libraries: a novel
computational approach for prediction of drug feasibility of compounds,
J. Comb. Chem. 1999, 1, 524-533.
7. W.P. Walters, A. Ajay, M.A. Murcko, Recognizing molecules with drug-like
properties, Curr. Opin. Chem. Biol. 1999, 3, 384-387.
8. C.A. Lipinski, Drug-like properties and the causes of poor solubility and
poor permeability, J. Pharmacol. Toxicol. Methods 2000, 44, 3-25.
9. B.L. Podlogar, I. Muegge, L.J. Brice, Computational methods to estimate
drug development parameters, Curr. Opin. Drug Discov. Devel. 2001, 4, 102-109.
10. I. Muegge, S.L. Heald, D. Brittelli, Simple selection criteria for drug-like
chemical matter, J. Med. Chem. 2001, 44, 1841-1846.
11. D.F. Veber, S.R. Johnson, H.Y. Cheng, B.R. Smith, K.W. Ward, K.D. Kopple,
Molecular properties that influence the oral bioavailability of drug
candidates, J. Med. Chem. 2002, 45, 2615-2623.
12. J.R. Proudfoot, Drugs, leads, and drug-likeness: an analysis of some
recently launched drugs, Bioorg. Med. Chem. Lett. 2002, 12, 1647-1650.
13. W.P. Walters, M.A. Murcko, Prediction of ‘drug-likeness’, Adv. Drug
Deliv. Rev. 2002, 54, 255-271.
14. W.J. Egan, W.P. Walters, M.A. Murcko, Guiding molecules towards
drug-likeness, Curr. Opin. Drug Discov. Devel. 2002, 5, 540-549.
15. I. Muegge, Selection criteria for drug-like compounds, Med. Res. Rev.
2003, 23, 302-321.
16. M.S. Lajiness, M. Vieth, J. Erickson, Molecular properties that influence
oral drug-like behavior, Curr. Opin. Drug Discov. Devel. 2004, 7, 470-477.
17. M. Vieth, M.G. Siegel, R.E. Higgs, I.A. Watson, D.H. Robertson, K.A. Savin,
P.A. Durst Hipskind, et al. Characteristic physical properties and
structural fragments of marketed oral drugs, J. Med. Chem. 2004, 47, 224-232.
18. I.D. Kuntz, K. Chen, K.A. Sharp, P.A. Kollman, The maximal affinity of
ligands, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9997-10002.
19. P. Ertl, B. Rohde, P. Selzer, Fast calculation of molecular polar surface
area as a sum of fragment based contributions and its application to the
prediction of drug transport properties, J. Med. Chem. 2000, 43, 3714-3717.
20. P.J. Hajduk, J.R. Huth, S.W. Fesik, Druggability Indices for protein
targets derived from NMR-based screening data, J. Med. Chem. 2005, 48,
2518-2525.
21. J. Drews, S. Ryser, Classic drug targets, Nat. Biotechnol. 1997, 15,
1318-1319.
22. J. Drews, Genomic sciences and the medicine of tomorrow, Nat.
Biotechnol. 1996, 14, 1516-1518.
23. J. Drews, Drug discovery: a historical perspective, Science 2000, 287,
1960-1964.
24. E. Lander, Initial sequencing and analysis of the human genome, Nature
2001, 409, 860-921.
25. J. Venter, The sequence of the human genome, Science 2001, 1304-1351.
26. A.P. Orth, S. Batalov, M. Perrone, S.K. Chanda, The promise of genomics to
identify novel therapeutic targets, Expert Opin. Ther. Targets 2004, 8,
587-596.
27. A.P. Russ, S. Lampel, The druggable genome, Drug Discov. Today 2005,
10(23-24), 1577-9.
28. K. Davies, Cracking the ‘Druggable Genome’. Bio-IT world, 2002,
http://www.bio-itworld.com/archive/100902/firstbase.html.
29. C. Burgess, I. Golden, IBC Drug Discovery and Technology Conference,
Curagen Corp., Boston, 2002.
30. J.B. Golden, Prioritizing the human genome: knowledge management for
drug discovery, Curr. Opin. Drug Discov. Devel. 2003, 6, 310-316.
31. J. Golden, Towards a tractable genome: knowledge management in
drug discovery, Curr. Drug Discov. 2003, 17-20.
32. A.C. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, SCOP: a structural
classification of proteins database for the investigation of sequences and
structures, J. Mol. Biol. 1995, 274, 536-540.
33. S. Arlington, S. Barnett, S. Hughes, J. Palo, Pharma 2010: The Threshold of
Innovation, IBM Business Consulting Services, London, 2002.
34. A.K. Pavlou, J.M. Reichert, Recombinant protein therapeutics-success
rates, market trends and values to 2010, Nat. Biotechnol. 2004, 22, 1513-1519.
35. J.M. Reichert, Protein therapeutic success rates increase with biotech
advances. Tufts center for the study of drug development impact report
2005, 7.
36. Windhover, Know thy R&D enemy: the key to fighting attrition, In Vivo
2005.
37. G.O. Consortium, Creating the gene ontology resource: design and
implementation, Genome Res. 2001, 11, 1425-1433.
15
Target Families
15.1
The Target Family Approach
Hans Peter Nestler
Outlook
15.1.1
Introduction
The sequencing of the human genome [1] marked the apex of the transformation
of biology from an observational and descriptive activity to a
hypothesis-driven science. With the information about the building blocks for cells, it is
now possible to modulate and investigate the phenomenology of organisms
at a molecular level. Drug discovery underwent, in parallel, a tremendous
change from an empirical process driven by the experience of medicinal
chemists that translated pharmacological effects to changes in molecules, to a
knowledge-driven operation based on biochemistry, high-throughput synthesis
and screening, and structure-driven drug design. Yet, in spite of this evolution,
the productivity of the pharmaceutical industry has plummeted and 2004 saw
the lowest number of new drugs coming to the market in history. Soon after
the sequences became available, discussions arose about how many of the
approximately 27 000 genes that had been assigned [1] would be “druggable”, that
is, their associated protein products could be modulated with small molecules
by anecdotal examples, we will focus on target family ideas that enable the early
stages. We will demonstrate their application and applicability to representative
target families in this chapter. We will pay particular attention to the core
aspect of “chemical biology”, the matching of chemical and biological spaces
[6]. For a drug molecule to exert its pharmaceutical action, it is crucial that
the molecular shape complements the cast offered by the target protein. This
fact was first recognized by Emil Fischer, who phrased it as a “Key-Lock”
principle [10], being unaware of the dynamic and flexible nature of protein
15.1.2
Understanding Biological Space
accessible: All proteins can be enumerated at the gene level and classified on
the basis of their sequence homology by bioinformatic tools [1]. However, this
comfortable straightforward picture is complicated by the fact that genes can
be expressed in various forms, but the target family classifications hold up in
a first approximation.
In spite of the successes at the genomic and proteomic levels, the identification
of novel protein targets for modulation does not proceed at the expected
pace, as the proteins do not act as isolated entities but as complexes in an
almost overcrowded environment. To exert their biological effect, the individual
entities enter into dynamic physical interactions with each other, and our
textbook knowledge about kinetics and thermodynamics does not necessarily
stand up to the task because of the high concentration and high viscosity of
the cytoplasmic space. Furthermore, the monitoring of gene expression and
protein analysis does not reveal the complete picture about their respective
binding partners. Today, we are still ignorant about many protein/protein
and protein/ligand complexes, such as GPCR-agonist, ion channel-modulator,
or protease-substrate pairs, that associate and dissociate in a cell and are
responsible for biological activity. Even in cases where we know the respective
binding partners, we are a long way from understanding the structural basis
and dynamics of these interactions. Structural biology methods such as
crystallography and nuclear magnetic resonance (NMR) have taught us much
about soluble proteins, such as kinases and proteases, but gaining structural
insights about membrane-bound proteins, such as GPCRs and ion channels,
remains difficult. To date, only one structure for a bacterial GPCR and three
for ion channels have been reported [12-15]. We will discuss in this section
the approaches to identify physiological and artificial ligands for proteins as
well as to gain structural knowledge about their interactions.
often for a variety of ligands to each protein. This information is used intensely
for inhibitor optimization purposes but also allows structural comparisons at a
target family level. These analyses are based on structural overlays of the protein
structures within the target families (or, for proteases, within the subfamilies)
and on the affinity and repulsion of various small molecular probes, such as water
or methanol, to the active site’s surface. The studies provide “target family
landscapes” that show the relationships of the target family members at a
structural level [21-23]. The landscapes provide the tools necessary to understand
the cross-reactivities of inhibitors with closely related proteins or to assess the
likelihood of success for transforming an inhibitor for a particular target into
an inhibitor for another family target (Fig. 15.1-2). Furthermore, they allow
selection of closely related proteins as structural surrogates for those family
members where crystallographic information is not available. This so-called
homology modeling is of crucial importance for understanding the structural
space covered by membrane-bound proteins, such as GPCRs or ion channels.
Using the rhodopsin GPCR structure [12] as a template and target family
homology, it has been possible to get topological information about the binding
sites for many GPCRs to foster an understanding of the binding modes
of ligands [25]. At a resolution of about 3.5 Å, which can usually be achieved,
it is possible to understand differential binding of ligands to the receptors
and to rationalize their activation, as demonstrated by Goddard et al. in a
homologous series of ketones activating the olfactory receptor 912-93. Fur-
thermore, the differences in activation between mouse and human orthologs
could be assigned to a Ser105/Gly105 mutation [26]. This study also points to
an instrumental aspect for the structural modeling of membrane proteins. In
addition to sequence homologies, ligand-binding strengths are used to refine
the topologies and interactions. If combined with molecular dynamics refinement
of the loops connecting the transmembrane helices, as demonstrated
by the program PREDICT [27], the models become accurate
enough to perform virtual screening and to discriminate between ligands and
their binding modes [28, 29]. In the ion-channel field, homology models can
be based on three crystal structures of various potassium channels, two of
which show the channel in the open [13, 14] and one in the closed state [15].
Although ion channels are multimeric proteins and structurally more diverse
than GPCRs, good models have become available using the three structures
and ligand-activity information, as highlighted by the possibility of predicting
hERG blocking activity of ligands [30, 31]. The hERG channel is of general
pharmacological interest as an antitarget, because blocking this channel can
induce fatal cardiac fibrillation. Thus, most biological data is available and the
homology-based models, even though they are built on the bacterial MthK
channel [13], have meanwhile reached the same accuracy as models derived
from SAR data [32] and can guide chemical optimization to achieve specificity
of ligands. Beyond the prediction of ligand-binding, homology models help
the functional analysis of ion channels. In a recent example, the gating of the
Fig. 15.1-2 Assigning membership of a protein to a protein family and analyzing the
structural relationships can be achieved by two major concepts. Starting from protein
sequence information, the similarities of the sequences can be investigated and proteins
can be clustered in phylogenetic trees. These analyses were the basis of the
assignments of target families as reported, for example, by Venter et al. [1]. At a higher
resolution, such trees can also be generated within gene families. While these trees can
provide information about the evolutionary relationships, the relations do not translate
into structural similarities at a detailed level, as shown by the distribution of affinities
toward various small molecule ligands throughout the kinome [24]. To gain insight
at the structural level, three-dimensional structures must be aligned and compared.
The comparison involves studies of interactions with various probes such as
amides, carbonyl, or water. The proteins are positioned in a cube and the interaction of
the probes at various positions in the cubes is measured. The statistical analysis of the
interaction surfaces provides the dimensions for separating the proteins in
structure-based landscape maps [21, 22]. The protein relations within these maps
reflect the affinity profiles toward small molecule ligands and can be used to
rationalize specificities.
Kir6.2 channel by ATP could be explained at the atomic level [33] utilizing the
structures of the open Kir3.1 channel [14] and the closed KirBac1.1 channel [15].
GPCR action and it is a valid assumption that many of the “orphan drugs” will
prove to be GPCR modulators, thus expanding the tool chest of deorphaning
agents.
For kinases and proteases, the search for substrates may seem more
straightforward, as these enzymes act on and transform other proteins.
Phosphoproteomics has been established for kinases to identify interaction
partners at the protein level on a genomic scale [35]. Basically, cell cultures
are incubated with 32P-ATP and the cellular extracts are analyzed by two-dimensional
gel electrophoresis. As all kinases can use ATP as a substrate,
the phosphorylation patterns become very complex and do not point to an
individual kinase. To achieve specificity in detection and to avoid the heavy
use of radioisotopes, antibodies reacting to the phosphorylated proteins are
required. While nonspecific antibodies recognizing phosphoserine or phosphotyrosine
are available, they pose the same challenge of deconvoluting the specific
phosphorylation of one substrate by a specific kinase. Sequence-specific
antibodies can be raised against the phosphorylated peptide epitope [36].
To identify the epitopes, combinatorial peptide libraries are incubated with
purified kinases and 32P-ATP. The phosphorylated peptides can be identified
by microradiography and Edman degradation [37, 38] and can be used for
raising the antibodies. The gained sequence information could be applied for
designing selective inhibitors addressing the substrate-binding pockets instead
of the ATP site, an approach that is currently not followed, as the peptide-binding
sites are not as distinct as for proteases. While antibodies reveal
information on the phosphorylation state of a protein, it remains unclear
which kinase is responsible for the phosphorylation at a specific position. In
a complementary approach, Shokat et al. were able to track phosphorylation
substrates for individual kinases, using kinases with an extended ATP-binding
site and a bulky ATP derivative. As only the mutated kinases are able to use the
bulky ATP analog, only the substrates of this kinase will be phosphorylated at
the specific phosphorylation sites [39]. Taking the information from all these
approaches together, it is possible to decipher the signaling pathways of the
kinome and to derive structural insights from the substrate sequences, which
could be translated into inhibitors and drugs.
Tracking protease activity remains one of the major challenges. As
mentioned earlier, gene expression levels do not correlate tightly with the
activity of a protease and even monitoring tools like in situ hybridization
cannot elucidate the protease activity in tissues or cellular systems, as the
antibodies employed often do not discriminate between the proenzyme and
activated proteases. Recently, efforts to image protease activity in a cell have led
to activity labeling probes that act as suicide substrates and lead to fluorescent
tagging of the active site of active proteases [40]. Currently, this technology is
limited to proteases that allow for covalent attachment of the probes, namely,
serine and cysteine proteases that act through a nucleophilic substitution, and
it does not reveal the proteins that are cleaved by the protease. Unfortunately,
straightforward labeling approaches as for kinases are not suitable, as no
additional moieties are introduced. Therefore, alternate approaches based
15.1.3
Exploring Chemical Space
Fig. 15.1-3 Schematic visualization of the various concepts to address chemical and
biological space (shaded areas) in drug discovery. Medicinal chemistry focused on
compound series (red dots) that had shown activity in pharmacological assays, and
compound optimization was driven by a tight feedback from biological experiments,
leading to a focused nonarrayed addressing of chemical space. The combinatorial
promise was to systematically explore the chemical space with diverse arrays of
compounds (blue dots) to find the suitable starting points. Analysis of combinatorial
chemistry libraries showed their limited diversity and often mismatch to biological
space. Chemical biology approaches combine the technologies established for
array synthesis with choosing appropriate starting points for the libraries. Focused
libraries start from known active compounds. Scaffold hopping (blue arrows)
and morphing (green arrows) attempt to evolve known structures by searching for
close neighbors or by combination of elements of two compound series. Fragment
approaches identify chemical motifs with biological activity that can provide novel
starting points (flags) for arrayed synthesis.
weights below 500, estimates reached ballpark figures of 10^60 [45]. Even if
we assume that we could represent this space through 1% of the structures,
an estimate that is made often for representative selections from compound
sets, we are still looking at 10^58 structures. The material requirements for
a single representation of each structure go beyond the resources available
in the known universe. Besides the disillusionment caused by the numbers, it
was soon recognized that compounds from combinatorial libraries were often
inactive or poorly active on biological molecules unless they were derived from
known active compounds. The structures were based on chemical feasibility
and therefore densely populated the regions of chemical space offered by the
scaffolds. With the insight that combinatorial libraries would not be capable
of addressing the biological space and would even fall way short of filling
the chemical space even within the boundary of molecular weights below
500, the utilization of combinatorial chemistry and parallel synthesis shifted
from a diversity approach to densely populating chemical space around proven
starting points, compounds with documented biological activity.
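The scale argument above is simple arithmetic and can be reproduced in a few lines. The sketch below assumes the frequently quoted size of about 10^60 druglike structures, the 1% representative fraction mentioned in the text, and, purely for illustration, a sample of 1 mg per compound. The function names and the sample size are assumptions made here, not figures from the cited studies.

```python
# Back-of-the-envelope estimate of the material needed to sample chemical space.
# All inputs are illustrative assumptions, not measured values.

EARTH_MASS_KG = 5.97e24  # approximate mass of the Earth, for comparison


def representative_count(space_size: float, fraction: float) -> float:
    """Number of structures in a representative subset of chemical space."""
    return space_size * fraction


def total_mass_kg(n_compounds: float, mg_per_compound: float) -> float:
    """Total mass in kg if every compound is made once in the given amount."""
    return n_compounds * mg_per_compound * 1e-6  # 1 mg = 1e-6 kg


if __name__ == "__main__":
    n = representative_count(1e60, 0.01)  # assumed space size, 1% subset
    mass = total_mass_kg(n, 1.0)          # assumed 1 mg per compound
    print(f"representative structures: {n:.1e}")
    print(f"total mass: {mass:.1e} kg")
    print(f"Earth masses: {mass / EARTH_MASS_KG:.1e}")
```

Changing the assumed sample size by a few orders of magnitude does not alter the conclusion, because the number of structures dominates the product.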
The literature and database on marketed drugs provide many of these
starting points. The analysis of drugs in the market and development revealed
that a limited set of 32 frameworks formed the basis of more than 50% of
the marketed drugs [46]. Although this analysis, like all retrospective studies,
may be biased toward GPCR activity modulators that represent a significant
fraction of drugs in the market, the study underlines two aspects. First, to
date we have explored only a very limited subset of chemical space in our drug
discovery efforts, but remaining within this space makes us quite successful.
Second, nature may not be as structurally creative and tolerant as has been
assumed and therefore biological space may not be as diverse as envisioned.
Beyond these points, the bias toward GPCR ligands may not be as limiting
as it may seem, as GPCRs, through their subfamilies, bind a variety of
structural motifs, such as nucleotides, lipids, and peptides, and small molecule
ligands like nicotinic acid or dopamine [47]. These ligand types are actually
shared with other target families and therefore the structural motifs from
the drugs in the market can be transferred to drug discovery of other target
families that may seem unrelated at first glance, such as nucleotide mimics
for kinases and peptide mimics for proteases. Although we are using a target
family approach, molecular frameworks may be the uniting concept between
target families, a fact underlining the importance of structural analysis and
knowledge gathering discussed earlier.
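The framework analysis mentioned above boils down to counting how often each molecular framework occurs in a drug set and asking how few frameworks are needed to cover a given fraction of it. The following is a minimal sketch of that counting step only. It assumes that a framework label has already been assigned to every drug (for example with a cheminformatics toolkit); the function name and the toy labels are hypothetical and do not reproduce the data of the cited study.

```python
from collections import Counter


def frameworks_for_coverage(frameworks, target=0.5):
    """Return the most frequent frameworks needed to cover `target` of the set.

    `frameworks` holds one framework label per drug. The result is a tuple of
    (chosen labels in order of decreasing frequency, fraction covered).
    """
    total = len(frameworks)
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for label, count in Counter(frameworks).most_common():
        chosen.append(label)
        covered += count
        if covered / total >= target:
            break
    return chosen, covered / total


if __name__ == "__main__":
    # Hypothetical toy data set: one framework label per drug.
    drugs = ["benzene"] * 5 + ["pyridine"] * 3 + ["indole"] + ["steroid"]
    print(frameworks_for_coverage(drugs, target=0.5))
    print(frameworks_for_coverage(drugs, target=0.8))
```

A strongly skewed count, where a handful of labels reaches the target fraction, is the pattern the text describes for marketed drugs.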
These insights have reshaped our thinking about library synthesis and high-throughput
screening and led to the concept of focused target family libraries
to improve screening efficiency. Focused screening sets provide, if constructed
appropriately, multiple advantages. First, they reduce the cost and efforts of
screening campaigns and address the throughput limitations of some assay
types. Second, high-quality activity data are gathered from the beginning as
the smaller compound numbers allow measuring of multiple data points per
compound and thus reduce false positive and negative occurrence. Third, they
provide higher hit rates and thus SAR from the initial screening and provide
guidance for chemical programs directly. Yet, a delicate balance between
focused screening and the chance for serendipity remains to be maintained,
especially to address the challenge of discovering novel chemotypes that enable
securing an intellectual property position and exploring novel interfaces of
chemical and biological spaces.
As we have gained more and more structural insights, the rational design
of lead structures and the virtual screening of compound collections or
even virtual compound collections have gained tremendous importance.
While the methods have become more sophisticated over the years, the
challenges of making extrapolations from known chemotypes and data
remain. With the advent of combinatorial chemistry, molecular diversity was
one of the predominant themes. Although many measures for diversity
have been devised, the “Tanimoto” coefficient being the best known, the
results depend heavily on the descriptors used to span the chemical space.
Furthermore, because they derive from a structural diversity assessment, the measures do
not reflect the diversity with respect to the targets. Until today, the development
and selection of suitable descriptors for the chemical space remains a
challenge: An exhaustive enumeration of molecules in the “druglike” space
is not feasible, therefore all the descriptor sets in use focus on specific
applications and pharmacophoric subregions of chemical space. The use of
the above-mentioned “privileged fragments” as virtual building blocks for the
enumeration of structures constitutes one approach that has proven useful
for the design of target family oriented libraries (Fig. 15.1-4). Utilizing these
scaffolds, for example, fused heteroaromatic cores for kinases and nucleotide-
binding GPCRs, offers the ability to target the libraries toward the respective
protein families and ensures the stability of the computational methods
through the similarity of the generated structures. The targeted libraries
usually represent 200-1000 compounds around a given scaffold, giving a high
certainty in assessing whether the elaborated chemotype is suitable for a given
target or target family.
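As an illustration of the similarity measure named above, the Tanimoto coefficient of two binary fingerprints is the number of features they share divided by the number of features present in either. The sketch below represents each fingerprint as a set of on-bit indices. The fingerprints, compound names, and threshold are hypothetical, and, as the text stresses, the values obtained in practice depend entirely on the descriptors used to build the fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similar_compounds(query: set, library: dict, threshold: float) -> list:
    """Sorted names of library compounds at or above the similarity threshold."""
    return sorted(
        name for name, fp in library.items() if tanimoto(query, fp) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer stands for one structural feature.
    library = {
        "cpd_A": {1, 2, 3, 4},
        "cpd_B": {1, 2, 3, 9},
        "cpd_C": {7, 8, 9},
    }
    query = {1, 2, 3, 5}
    for name, fp in library.items():
        print(name, round(tanimoto(query, fp), 2))
    print(similar_compounds(query, library, threshold=0.5))
```

Lowering the threshold corresponds to the relaxed similarity requirements used for scaffold hopping, while a high threshold keeps a library densely clustered around its starting scaffold.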
The privileged fragments mimic in most cases the natural ligands.
This makes kinases and nucleotide-binding GPCRs quite suitable for this
approach and the scaffolds used cannot deny their pedigree. In addition
to these ATP mimetics, kinases accept another class of ligands, “hinge-
binders”, out of their catalytically inactive conformation. This conformation
has been termed DFG-out conformation, due to the observed orientation
of a loop containing the amino acid triplet aspartate-phenylalanine-glycine.
This binding mode was unexpected but is used by many selective kinase
inhibitors, such as Gleevec. The other subclasses of GPCRs, such as amine
or peptide-binding GPCRs, accept tertiary amines or dipeptide ligand mimics.
Peptidomimetic approaches are used heavily to build protease scaffolds.
Selective protease inhibitors are quite straightforward to obtain because
of the substrate variety and specificity of the proteases. However, the
concept of privileged scaffolds does not carry far. The unifying element
in protease substrates is the extended β-strand conformation that allows
interactions with four to six subpockets in the protease active site [43].
Mimics for this conformation have been developed but they still lack universal
applicability. Unlike the scaffolds for kinase or GPCR ligands, the cores
of protease inhibitors, like the peptidic backbone in the substrate, do not
contribute the majority of binding energy, and are therefore not crucial for
[Fig. 15.1-5, structures shown: starting probe (described as active against 5-HT3A in the MDDR); atropine (plant alkaloid), muscarinic cholinergic receptor antagonist; cocaine (plant alkaloid), dopamine receptor antagonist; epibatidine (poison frog), acetylcholine receptor; anatoxin (cyanobacteria), “very fast death factor”.]
Fig. 15.1-5 In silico scaffold hopping and biological scaffold morphing. Starting from
a bioactive probe reported as active against the 5-HT3A receptor in the MDDR, about
120 000 records of the MDDR were searched using relaxed similarity requirements. The
discovered chemotypes provide novel ideas for chemistry [51]. The bicyclic structure
evolved to address multiple targets for biological defense. While cyanobacteria as
monocellular organisms use only cytotoxicity for defending themselves, multicellular
organisms have fine-tuned the activity of tropane-like molecules to affect the central
nervous system of natural enemies, while at the same time being resistant to the
poisons. Yet the successful bicyclic amine was maintained as a core of the molecule.
Fig. 15.1-6 Selected fragment screening experiment applied to proteases and
kinases. In their landmark study, Fesik et al. equilibrated hydrophobic molecules with
stromelysin and detected binding by shift of NMR signals, retrieving structural
information from the initial study [61]. Other studies screened fragment collections using
mass spectrometry [62] or surface plasmon resonance [65] and established the binding
modes of the ligands after identification through crystallography. In a recent
approach, crystals of CDK2 were used to select oxindole ligands from a dynamic
combinatorial library and established the binding modes by crystallography in situ [76].
fragments to be screened against DHNA. The probing of the enzyme with the
same fragment set that had been used for urokinase by Nienaber et al. [71],
allowed establishing the structural requirements for selectivity in the initial
screening run and guiding the extension of the discovered fragments into
nanomolar inhibitors for DHNA [74]. Starting from privileged scaffolds for
the ATP pocket of kinases, fragments binding to p38 MAP kinase and cyclin-dependent
kinase 2 (CDK2) were discovered that can serve as novel central
building blocks for kinase inhibitors [75]. As the throughput of crystallography
is still limited compared to biochemical screenings, collection sizes have to be
small or as in the previous example, mixtures of fragments have to be screened.
To expand the size of collections that can be screened by crystallography,
Congreve et al. devised a dynamic combinatorial library system using CDK2
protein crystals as selectors for the tightest binding ligands which are formed
from the condensation of isatin and hydrazines. Instead of equilibrating with a
large amount of template protein, the reaction mixture is exposed to individual
crystals of CDK2 guiding the selective formation of imino-indolones. The
structures of selected reaction products are determined by crystallography,
immediately establishing a binding mode for the nanomolar inhibitors of
CDK2 [76].
Today, the application of fragment approaches is still limited to soluble
proteins, but in future there will be adaptations to membrane-bound proteins,
especially those in which the ligand does not have to compete with natural
ligands, like GPCRs or ion channels, to exert a functional response in a
biochemical assay. The structural insights in the target families will guide the
selection of fragment sets and allow using individual proteins as surrogates
for the whole target family.
15.1.4
Epilogue
Over the last 5 years chemical biology has reshaped the methods of doing
drug discovery. The investigations of the structural characteristics of target
families allow us today to take a more rational approach toward selecting
appropriate compounds for synthesis and testing. Through the sequencing
of the human genome, we have the blueprint of the building blocks of life
that can be modulated in their interactions through therapeutics. In addition
to the aspects discussed in this chapter, the analysis of pharmacokinetic
characteristics of molecules in the human body has established guidelines and
boundaries for molecules that help us to navigate the chemical space toward regions
that offer a higher population of structures that may be suitable as
drugs [3, 4]. While many of the concepts of the target family approach may not
be novel if looked at individually, their conscious combination adds another
dimension: “chemical biology” is based on a thorough structural knowledge of
similarities and differences within a target family. On the basis of the sequence
homologies of proteins we can currently make predictions for ligands to
hitherto unexplored targets, thus building a powerful stepping-stone for lead
discovery. We have also learned how to use closely related family members
as surrogates when the target under study is not amenable to a particular
technology, such as crystallography. Today’s structural understanding also
allows us to make more sophisticated choices about investigations to prevent
side effects, and the increasing biological knowledge helps us to rationalize
side effects of drugs and to modify affected drugs accordingly. Yet, we
still run into the trap of building assay schemes for drug discovery that
allow high throughput and are self-consistent. The high-throughput design
sacrifices the biochemical mimicking of the cellular environment, such as the
previously mentioned high concentration and viscosity, for technical feasibility.
The self-consistency often leads to the risk of losing the relevance for the
pathophysiological phenomenology and thus jeopardizes the predictivity for
the therapeutic setting, being detached from reality like Hesse's “Glass
Bead Game” [77]. Eventually “systems biology” will elucidate how the building
blocks of life work together in networks and pathways and which results can
be expected by tweaking one dial in the system, leading to novel and powerful
assay set-ups. Thus, drug discovery may come full circle to where it started,
but equipped with the chemical biology armamentarium of understanding and
predicting the phenomenological changes observed in diseased states and after
the administration of drugs.
References
1. J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton,
H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, J.D. Gocayne, P. Amanatides,
R.M. Ballew, D.H. Huson, J.R. Wortman, Q. Zhang, C.D. Kodira, X.H. Zheng,
L. Chen, M. Skupski, G. Subramanian, P.D. Thomas, J. Zhang, G.L. Gabor Miklos,
C. Nelson, S. Broder, A.G. Clark, J. Nadeau, V.A. McKusick, N. Zinder,
A.J. Levine, R.J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos,
A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern,
S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington,
J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill,
I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco,
P. Dunn, K. Eilbeck, C. Evangelista, A.E. Gabrielian, W. Gan, W. Ge, F. Gong,
Z. Gu, P. Guan, T.J. Heiman, M.E. Higgins, R.R. Ji, Z. Ke, K.A. Ketchum,
Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G.V. Merkulov,
N. Milshina, H.M. Moore, A.K. Naik, V.A. Narayan, B. Neelam, D. Nusskern,
D.B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang,
J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang,
H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert,
S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An,
A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam,
A. Carver, A. Center, M.L. Cheng, L. Curry, S. Danaher, L. Davenport,
R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg,
A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun,
D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline,
S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, The sequence
of the human genome, Science 2001, 291, 1304-1351.
2. J. Drews, Drug discovery: a historical perspective, Science 2000, 287,
1960-1963.
3. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and
computational approaches to estimate solubility and permeability in drug
discovery and development settings, Adv. Drug Delivery Rev. 1997, 23, 3-25.
4. D.F. Veber, S.R. Johnson, H.-Y. Cheng, B.R. Smith, K.W. Ward et al.,
Molecular properties that influence the oral bioavailability of drug
candidates, J. Med. Chem. 2002, 45, 2615-2623.
5. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov.
2002, 1, 727-730.
6. G. Wess, M. Urmann, B. Sickenberger, Medicinal chemistry: challenges and
opportunities, Angew. Chem., Int. Ed. Engl. 2001, 40, 3341-3350.
7. P.P. Wangikar, A.V. Tendulkar, S. Ramya, D.N. Mali, S. Sarawagi,
Functional sites in protein families uncovered via an objective and
automated graph theoretic approach, J. Mol. Biol. 2003, 326, 955-978.
8. M.A. Koch, R. Breinbauer, H. Waldmann, Protein structure similarity as
guiding principle for combinatorial library design, Biol. Chem. 2003, 384,
1265-1272.
9. L. Orning, G. Krivi, F.A. Fitzpatrick, Leukotriene A4 hydrolase.
Inhibition by bestatin and intrinsic aminopeptidase activity establish its
functional resemblance to metallohydrolase enzymes, J. Biol. Chem. 1991,
266, 1375-1378.
10. E. Fischer, Effekt der Zuckerkonfiguration auf die Enzymwirkung, Ber.
Dtsch. Chem. Ges. 1894, 27, 2985.
11. D.E. Koshland Jr, The lock-and-key principle and the induced-fit theory,
Angew. Chem., Int. Ed. Engl. 1994, 33, 2475-2478.
12. K. Palczewski, T. Kumasaka, T. Hori, C.A. Behnke, H. Motoshima et al.,
Crystal structure of rhodopsin: A G protein-coupled receptor, Science 2000,
289, 739-745.
13. Y. Jiang, A. Lee, J. Chen, M. Cadene, B.T. Chait et al., Crystal
structure and mechanism of a calcium-gated potassium channel, Nature 2002,
417, 515-522.
14. M. Nishida, R. MacKinnon, Structural basis of inward rectification:
cytoplasmic pore of the G protein-gated inward rectifier GIRK1 at 1.8 Å
resolution, Cell 2002, 111, 957-965.
15. A. Kuo, J.M. Gulbis, J.F. Antcliff, T. Rahman, E.D. Lowe et al., Crystal
structure of the potassium channel KirBac1.1 in the closed state, Science
2003, 300, 1922-1926.
16. R. Fredriksson, M.C. Lagerstrom, L.-G. Lundin, H.B. Schioth, The
G-protein-coupled receptors in the human genome form five main families.
Phylogenetic analysis, paralogon groups, and fingerprints, Mol. Pharmacol.
2003, 63, 1256-1272.
17. D.K. Vassilatis, J.G. Hohmann, H. Zeng, F. Li, J.E. Ranchalis et al.,
The G protein-coupled receptor repertoires of human and mouse, Proc. Natl.
Acad. Sci. U.S.A. 2003, 100, 4903-4908.
18. M.H. Saier Jr, A functional-phylogenetic classification system for
transmembrane solute transporters, Microbiol. Mol. Biol. Rev. 2000, 64,
354-411.
19. S. Caenepeel, G. Charydczak, S. Sudarsanam, T. Hunter, G. Manning, The
mouse kinome: discovery and comparative genomics of all mouse protein
kinases, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11707-11712.
20. S.M. Foord, Receptor classification: post genome, Curr. Opin. Pharmacol.
2002, 2, 561-566.
21. T. Naumann, H. Matter, Structural classification of protein kinases
using 3D molecular interaction field analysis of their ligand binding sites:
target family landscapes, J. Med. Chem. 2002, 45, 2366-2378.
22. H. Matter, W. Schwab, Affinity and selectivity of matrix
metalloproteinase inhibitors: a chemometrical study from the perspective of
ligands and proteins, J. Med. Chem. 1999, 42, 4506-4523.
23. M. Vieth, R.E. Higgs, D.H. Robertson, M. Shapiro, E.A. Gragg et al.,
Kinomics-structural biology and chemogenomics of kinase inhibitors and
targets, Biochim. Biophys. Acta 2004, 1697, 243-257.
24. M.A. Fabian, W.H. Biggs, D.K. Treiber, C.E. Atteridge, M.D. Azimioara
et al., A small molecule-kinase interaction map for clinical kinase
inhibitors, Nat. Biotechnol. 2005, 23, 329-336.
25. N. Vaidehi, W.B. Floriano, R. Trabanino, S.E. Hall, P. Freddolino
et al., Prediction of structure and function of G protein-coupled
receptors, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12622-12627.
26. P. Hummel, N. Vaidehi, W.B. Floriano, S.E. Hall, W.A. Goddard III, Test
of the binding threshold hypothesis for olfactory receptors: explanation of
the differential binding of ketones to the mouse and human orthologs of
olfactory receptor 912-93, Protein Sci. 2005, 14, 703-710.
27. S. Shacham, Y. Marantz, S. Bar-Haim, O. Kalid, D. Warshaviak, N. Avisar,
B. Inbal, A. Heifetz, M. Fichman, M. Topf, Z. Naor, S. Noiman, O.M. Becker,
PREDICT modeling and in-silico screening for G-protein coupled receptors,
Proteins 2004, 57, 51-86.
28. T. Klabunde, G. Hessler, Drug design strategies for targeting
G-protein-coupled receptors, ChemBioChem 2002, 3, 928-944.
29. O.M. Becker, Y. Marantz, S. Shacham, B. Inbal, A. Heifetz et al., G
protein-coupled receptors: in silico drug discovery in 3D, Proc. Natl. Acad.
Sci. U.S.A. 2004, 101, 11304-11309.
30. J.S. Mitcheson, J. Chen, M. Lin, C. Culberson, M.C. Sanguinetti, A
structural basis for drug-induced long QT syndrome, Proc. Natl. Acad. Sci.
U.S.A. 2000, 97, 12329-12333.
31. R.A. Pearlstein, R.J. Vaz, J. Kang, X.-L. Chen, M. Preobrazhenskaya
et al., Characterization of HERG potassium channel inhibition using CoMSiA
3D QSAR and homology modeling approaches, Bioorg. Med. Chem. Lett. 2003, 13,
1829-1835.
32. A.M. Aronov, Predictive in silico modeling for hERG channel blockers,
Drug Discov. Today 2005, 10, 149-155.
33. J.F. Antcliff, S. Haider, P. Proks, M.S.P. Sansom, F.M. Ashcroft,
Functional analysis of a structural model of the ATP-binding site of the
KATP channel Kir6.2 subunit, EMBO J. 2005, 24, 229-239.
34. O. Civelli, GPCR deorphanizations: the novel, the known and the
unexpected transmitters, Trends Pharmacol. Sci. 2005, 26, 15-19.
35. D.E. Kalume, H. Molina, A. Pandey, Tackling the phosphoproteome: tools
and strategies, Curr. Opin. Chem. Biol. 2003, 7, 64-69.
36. H. Zhang, X. Zha, Y. Tan, P.V. Hornbeck, A.J. Mastrangelo et al.,
Phosphoprotein analysis using antibodies broadly reactive against
phosphorylated motifs, J. Biol. Chem. 2002, 277, 39379-39387.
37. Z. Songyang, S. Blechner, N. Hoagland, M.F. Hoekstra, H. Piwnica-Worms
et al., Use of an oriented peptide library to determine the optimal
substrates of protein kinases, Curr. Biol. 1994, 4, 973-982.
38. P.M. Chan, H.P. Nestler, W.T. Miller, Investigating the substrate
specificity of the Her-2/Neu kinase using peptide libraries, Cancer Lett.
2000, 160, 159-169.
39. L.A. Witucki, X. Huang, K. Shah, Y. Liu, S. Kyin et al., Mutant tyrosine
kinases with unnatural nucleotide specificity retain the structure and
phospho-acceptor specificity of the wild-type enzyme, Chem. Biol. 2002, 9,
25-33.
40. D.C. Greenbaum, W.D. Arnold, F. Lu, L. Hayrapetian, A. Baruch et al.,
Small molecule affinity fingerprinting a tool for enzyme family
subclassification, target identification, and inhibitor design, Chem. Biol.
2002, 9, 1085-1094.
41. H.P. Nestler, A. Doseff, A two-dimensional, diagonal sodium dodecyl
sulfate polyacrylamide gel electrophoresis technique to screen for protease
substrates in protein mixtures, Anal. Biochem. 1997, 251, 122-125.
42. M. Meldal, I. Svendsen, K. Breddam, F.-I. Auzanneau, Portion-mixing
peptide libraries of quenched fluorogenic substrates for complete subsite
mapping of endoprotease specificity, Proc. Natl. Acad. Sci. U.S.A. 1994, 91,
3314-3318.
43. J.D.A. Tyndall, T. Nall, D.P. Fairlie, Proteases universally recognize
beta strands in their active sites, Chem. Rev. 2005, 105, 973-999.
44. A. Eschenmoser, One hundred years of the lock-and-key principle, Angew.
Chem., Int. Ed. Engl. 1994, 33, 2363.
15 Target Families
850
45. R.S. Bohacek, C. McMartin, W.C. Guida, The art and practice of structure-based drug design: a molecular modeling perspective, Med. Res. Rev. 1996, 16, 3-50.
46. G.W. Bemis, M.A. Murcko, The properties of known drugs. 1. Molecular frameworks, J. Med. Chem. 1996, 39, 2887-2893.
47. K. Bondensgaard, M. Ankersen, H. Thogersen, B.S. Hansen, B.S. Wulff et al., Recognition of privileged structures by G-protein coupled receptors, J. Med. Chem. 2004, 47, 888-899.
48. Y.C. Martin, J.L. Kofron, L.M. Traphagen, Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-4358.
49. A.M. Aronov, M.A. Murcko, Toward a pharmacophore for kinase frequent hitters, J. Med. Chem. 2004, 47, 5616-5619.
50. H. Matter, E. Defossa, U. Heinelt, P.-M. Blohm, D. Schneider et al., Design and quantitative structure-activity relationship of 3-amidinobenzyl-1H-indole-2-carboxamides as potent, nonchiral, and selective inhibitors of blood coagulation factor Xa, J. Med. Chem. 2002, 45, 2749-2769.
51. J.L. Jenkins, M. Glick, J.W. Davies, A 3D similarity method for scaffold hopping from known drugs or natural ligands to new chemotypes, J. Med. Chem. 2004, 47, 6144-6159.
52. D. Horvath, C. Jeandenans, Neighborhood behavior of in silico structural spaces with respect to in vitro activity spaces-a novel understanding of the molecular similarity principle in the context of multiple receptor binding profiles, J. Chem. Inf. Comput. Sci. 2003, 43, 680-690.
53. H. Matter, Selecting optimally diverse compounds from structure databases: a validation study of two-dimensional and three-dimensional molecular descriptors, J. Med. Chem. 1997, 40, 1219-1229.
54. L. Naerum, L. Norskov-Lauritsen, P.H. Olesen, Scaffold hopping and optimization towards libraries of glycogen synthase kinase-3 inhibitors, Bioorg. Med. Chem. Lett. 2002, 12, 1525-1528.
55. D.G. Lloyd, C.L. Buenemann, N.P. Todorov, D.T. Manallack, P.M. Dean, Scaffold hopping in de novo design: ligand generation in absence of receptor information, J. Med. Chem. 2004, 47, 493-496.
56. I.D. Kuntz, K. Chen, K.A. Sharp, P.A. Kollman, The maximal affinity of ligands, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9997-10002.
57. P.A. Rejto, G.M. Verkhiver, Unraveling principles of lead discovery: from unfrustrated energy landscapes to novel molecular anchors, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8945-8950.
58. D.A. Erlanson, R.S. McDowell, T. O’Brien, Fragment-based drug discovery, J. Med. Chem. 2004, 47, 3463-3482.
59. D.C. Rees, M. Congreve, C.W. Murray, R. Carr, Fragment-based lead discovery, Nat. Rev. Drug Discov. 2004, 3, 660-672.
60. H.P. Nestler, Combinatorial chemistry and fragment screening - two unlike siblings? Curr. Drug Discov. Technol. 2005, 2, 1-12.
61. Y.M. Dunayevskiy, P. Vouros, E.A. Wintner, G.W. Shipps, T. Carell et al., Application of capillary electrophoresis-electrospray ionization mass spectrometry in the determination of molecular diversity, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 6152-6157.
62. G. Agnihotri, M.P. Scott, M.H. Alaoui-Ismaili, U.F. Mansoor, D. Murphy et al., Identification of potent inhibitors of c-Jun N-terminal kinase-1 (JNK1) using ultra high-throughput affinity based screening, 12th Symposium on Second Messengers and Phospho-proteins (SMP-2004), 2004.
63. Y. Hou, J. Felsch, A. Annis, C.E. Whitehurst, C.C. Cheng et al., Identification of small molecule
15.2
Chemical Biology of Kinases Studied by NMR Spectroscopy
Marco Betz, Martin Vogtherr, Ulrich Schieborr, Bettina Elshorst, Susanne Grimme,
Barbara Pescatore, Thomas Langer, Krishna Saxena, and Harald Schwalbe
Outlook
15.2.1
Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
signaling networks, governing processes such as cell growth, cell division, and
cell death. The pathophysiology of many diseases arises from the perturbation
of these pathways, whether caused by environmental stresses or genetic defects.
Regarding protein kinases, deregulated activity is involved in all aspects of
neoplasia, including proliferation, invasion, angiogenesis, and metastasis [5].
As a result of the sequencing of the human genome, approximately 500 protein
kinases have been predicted. Although only a comparatively small number
of protein kinases have actually been targeted by established drugs, it is now
accepted that finding protein kinase inhibitors is a viable way to discover
new drugs. The remarkable success of the first inhibitors, the anticancer
drugs Gleevec (Novartis) [6] and Iressa (AstraZeneca) [7], supports the idea
of targeting a kinase that is pivotal to a malignant phenotype. These findings
have increased the efforts in drug discovery and development research in
this area.
as a hinge. The catalytic site is located at the interface region between both
lobes. The adenine moiety of ATP binds deep in a hydrophobic pocket between
the lobes, while the phosphates of ATP are aligned by interactions with the
backbone amides of a glycine-rich loop. The protein substrate binding site is
associated mostly with the C-terminal lobe. The catalytic cycle of the phosphate
transfer and the conformational reorganizations linked to it are reasonably
well understood [19].
Crystallographic studies of mammalian protein kinase A (PKA) with and
without Mg-ATP and an inhibitory polypeptide have revealed two different
conformational states. The so-called open form is seen in the apo form and in
the binary complex with the peptide. The N-terminal lobe is turned away from
the C-terminal lobe by 14° when compared to the closed conformation. The
closed structure can be observed in the ternary complex with Mg-ATP and the
peptide substrate. This conformation is necessary to bring the residues into
the correct orientation to promote catalysis [20].
A key aspect of regulation is that most kinases can be activated by specific
phosphorylation, but there are numerous other kinase-specific activation
and inactivation pathways that involve protein-protein interactions. The
phosphorylation takes place on residues located in a particular segment
in the center of the kinase domain, which is termed the activation segment. It
is defined as the region spanning conserved sequences DFG and APE. The
conversion from an inactive to an active state involves conformational changes
in the protein that lead to the correct disposition of substrate binding and
catalytic groups. Structures of kinases with unmodified activation loops fall
into two classes. Cyclin-dependent kinase 2 (CDK2) and insulin receptor kinase
(IRK) are representatives of enzymes that adopt inactive conformations in
their resting state. Their activation loop has an inhibitory fold, blocking the
sites for ATP and substrate [14, 16]. p21-activated protein kinase 1 (PAK1) and
PKA, when freed of a negative regulator such as the inhibitory switch (IS) domain
[21] or the R subunit [22], respectively, appear to relax into an accessible
conformation.
15.2.2
Protein NMR Spectroscopy on Kinases
[Figure: panels (a) Met: 12/12 signals, (b) Ile: 12/13 signals, (c) Leu: 25/26 signals; horizontal axis: δ 1H [ppm]]
Fig. 15.2-3 [1H,15N]-TROSY spectra of different BTK samples with selectively
labeled 15N-amino acids: Met (a), Ile (b), Leu (c), Val (d), and Phe (e). The
circles in the upper corner represent the 20 possible amino acids. Proline is
not considered because it lacks the NMR-detectable amide proton.
Protein kinase B (PKB/Akt) is a validated target in drug design, but the catalytic
domain cannot be expressed in E. coli in a functional form. PKA and Akt/PKB,
both of which belong to the AGC family (PKA/protein kinase G/PKC) of
protein kinases, share a high sequence homology. Distinct point mutations
in the active site of PKA (PKAB6 and PKAB8 chimeras) are introduced to
enhance their similarity and their corresponding binding profile [32].
Depending on the expression system and fusion protein used, the yield
of an expressed kinase can vary by one order of magnitude. Changing the
glutathione S-transferase (GST) fusion protein to an N-terminal His tag for
the expression of p38 increases the yield of the recombinant protein by a
factor of 8.
Construct optimization is a further issue, as shown for
mitogen-activated protein kinase-activated protein kinase-2 (MAPKAP-2). Screening
of 20 protein constructs with different N- and C-terminal ends led to an
NMR-feasible kinase. In this case, the protein expression yield of the different
constructs is of minor importance; the major goal is a properly
folded protein with long-term stability during the recording of the NMR
spectra.
The domain boundary has to be carefully chosen, which is also observed for
PKA expression. The A helix (amino acids 16-31) contains an N-myristylation
motif and a 50-residue extension at the C-terminus of the catalytic domain.
This amino acid stretch with the aromatic FTEF sequence must be
included during the NMR investigations because it folds back onto a
hydrophobic patch of the N-terminal lobe, thus stabilizing the whole protein
construct [33].
As a result, these expression efforts lead to a triple-labeled protein kinase
sample, providing the basis for the NMR assignment of the specific kinase.
Fig. 15.2-6 [1H,15N]-TROSY spectrum of active murine protein kinase A (PKA) with the
annotated assignment.
sequences is to be included. There are three practical tools that are used
to enable the assignment of the protein: (a) the chemical shift matching
procedure, (b) the use of paramagnetic spin labels, and (c) the use of amino
acid-selective labeled samples as described previously in Sections 15.2.1
and 15.2.2 [33].
15.2.2.3.1
Fig. 15.2-7 The shift match procedure can be divided into two steps: the
identification of resonances of subsequent residues and the calculation of the
root mean square difference (RMSD) in calculated and predicted chemical shifts.
(a) Strips from the three-dimensional NMR spectra HNCOCACB and HNCACB. Through
the matching of Cα and Cβ chemical shifts the neighboring amino acids can be
concatenated to a “stretch”. The positioning of this stretch onto the primary
sequence, putative Ala93 to Leu96, is ambiguous at this stage. (b) In each of
these diagrams, at the bottom, the RMSD is shown as a function of the assumed
start residue in the protein sequence. Low RMSD values indicate possible correct
start residues. The diagram shows the alignment of the resonances of two
residues, 93-94 (left), three residues, 93-95 (middle), and four residues, 93-96
(right). The alignment of two residues leads to three different possible
solutions (circles). After the identification of three or four subsequent
resonance sets, the correct alignment is the unique solution of the shift
matching process.
In nearly all cases, the solution becomes unique upon the addition of a third
amino acid.
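The second step of the shift match procedure, sliding a stretch of linked residues along the primary sequence and scoring each start position by RMSD, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the predicted Cα/Cβ shifts per residue type are rough illustrative values, and the sequence, the observed shifts, and the function names are hypothetical.

```python
import math

# Illustrative (approximate) predicted CA/CB shifts in ppm per residue type.
# These numbers are assumptions for the sketch, not values from the chapter.
PREDICTED = {
    "A": (53.1, 19.0),
    "L": (55.6, 42.3),
    "V": (62.5, 32.7),
    "S": (58.7, 63.8),
    "K": (56.9, 32.8),
    "E": (57.3, 30.0),
}


def stretch_rmsd(observed, sequence, start):
    """RMSD between observed (CA, CB) shifts of a stretch and the shifts
    predicted for the sequence window beginning at index `start`."""
    squared = 0.0
    for offset, (ca, cb) in enumerate(observed):
        pred_ca, pred_cb = PREDICTED[sequence[start + offset]]
        squared += (ca - pred_ca) ** 2 + (cb - pred_cb) ** 2
    return math.sqrt(squared / (2 * len(observed)))


def rank_start_positions(observed, sequence):
    """Return (start_index, rmsd) pairs sorted by increasing RMSD."""
    n_windows = len(sequence) - len(observed) + 1
    scores = [(s, stretch_rmsd(observed, sequence, s)) for s in range(n_windows)]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical sequence in which the dipeptide "AL" occurs twice (index 2 and 6).
sequence = "KEALSVALVKE"
two_residues = [(53.0, 19.2), (55.4, 42.0)]          # Ala-like, Leu-like
three_residues = two_residues + [(58.9, 63.5)]       # followed by Ser-like

ranking_two = rank_start_positions(two_residues, sequence)
ranking_three = rank_start_positions(three_residues, sequence)
```

With two linked residues the two "AL" positions score identically, so the placement is ambiguous; adding the third residue leaves a single low-RMSD start position, which mirrors the behavior described in the caption of Fig. 15.2-7.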
Residue   Distance [Å]      Residue   Distance [Å]
MET431    12.8 *            MET501    21.3
MET437    16.2              MET509    14.9 *
MET449    13.6 *            MET570    18.9
MET450    14.9              MET587    17.1
MET477    16.5              MET596    18.9
MET489    25.1              MET630    23.3
Fig. 15.2-9 (a) [1H,15N]-TROSY spectra of the selectively 15N-Met labeled kinase
BTK (black spectrum) showing 12 peaks corresponding to the 12 methionines in the
primary sequence. Upon addition of spin-labeled adenosine, the peak intensities
are attenuated according to the distance of the amino acid to the paramagnetic
center. The percentage of the residual peak intensity is denoted by the peaks
(light gray spectrum). Four peaks, which are strongly attenuated, are marked
with an asterisk. (b) Ribbon presentation of the structure of BTK. Methionines
are depicted as balls and the spin-labeled adenosine is shown as sticks with the
unpaired electron marked as a star. (c) Table of the distances of the
methionines to the spin-labeled adenosine. Four methionines in closer distance
to the spin-labeled adenosine are marked with an asterisk.
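To give a feeling for how steeply the attenuation falls off with distance, the tabulated distances can be converted into relative paramagnetic relaxation enhancements. The sketch below assumes the usual r^-6 distance dependence of the paramagnetic effect, which is a standard approximation and is not stated in the caption; the function name is hypothetical and the distances are those of the table above.

```python
# Distances (in angstrom) of the methionines to the spin label, from the table.
DISTANCES = {
    "MET431": 12.8, "MET437": 16.2, "MET449": 13.6, "MET450": 14.9,
    "MET477": 16.5, "MET489": 25.1, "MET501": 21.3, "MET509": 14.9,
    "MET570": 18.9, "MET587": 17.1, "MET596": 18.9, "MET630": 23.3,
}


def relative_pre(distances):
    """Relative relaxation enhancement, assuming an r**-6 dependence and
    normalizing to the residue closest to the paramagnetic center."""
    r_min = min(distances.values())
    return {name: (r_min / r) ** 6 for name, r in distances.items()}


pre = relative_pre(DISTANCES)
# Residues ordered from strongest to weakest expected attenuation.
ranked = sorted(pre, key=pre.get, reverse=True)

for name in ranked:
    print(f"{name}  {DISTANCES[name]:5.1f} A  relative PRE {pre[name]:.3f}")
```

Under this assumption a methionine at about twice the distance of the closest one experiences less than 2% of its relaxation enhancement, which is why only the nearest residues are strongly attenuated.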
comparably lower. Even the samples with 15N selectively labeled amino acids
yield fewer than the expected number of peaks, indicating that the disappearance
of signals is not due to overlapping resonances. Detailed statistics for
p38 MAP kinase are given in Table 15.2-1. For PKA even fewer peaks could be
observed and assigned. Figure 15.2-10 depicts the extent of both assignments
mapped on the crystal structures of each kinase.
Asp      27    20          15
Ile      22    15          15
Leu      42    37          22
Met      10     9           5
Phe      13    12          12
Tyr      15    12           9
Val      22    21          19
Total   345   261 (76%)   167 (64%)
Fig. 15.2-10 Ribbon representation of the protein kinase PKA (a) and p38 MAP
kinase (b) showing the N-lobe, the C-lobe, and the ATP-binding site. In both
proteins the assigned regions marked in yellow are the more surface exposed
regions. (c) Statistics of the assigned and unassigned peaks in the
[1H,15N]-TROSY spectra.
In both proteins, the assigned regions are the more surface exposed regions.
The N- and C-terminal sequences and also the β-sheet N-lobe are almost
entirely assigned. On the other hand, the C-helix, the catalytic loop, and parts of
the activation segment remain unassigned in both proteins. These unassigned
regions are solvent inaccessible in the tertiary structure and form a contiguous
patch. However, the distribution of assigned versus unassigned regions of
both proteins (see Fig. 15.2-11) is different in many regions of the C-lobe.
It can be speculated that this observation indicates that the dynamics in the
C-lobe, or in the activation segment, are different in the two kinases, which
could correspond to the different functionality of these two proteins.
It is documented for the crystal structures of inactive human CDK2 and the
partially activated human CDK2-cyclin A complex that large conformational
changes of the activation segment occur [48]. Comparing the position of the
activation segment in the structures of twitchin kinase, IRK,
calmodulin-dependent kinase I (CaMKI), and MAPK reveals a variety of
conformations that are accessible to different kinases in their inactive state
[15, 16, 49, 50]. The survey of the static crystal structures provides clues to the
conformational malleability of particular regions of the protein kinases, as
they move through the catalytic cycle while various substrates, inhibitors,
and scaffold proteins participate. It can be presumed that the mentioned
regions have residual mobility even in the absence of any ligands. These
local segmental motions could happen on a timescale that is unfavorable
for conventional detection by solution NMR; as a consequence, resonances
vanish because of excessive line broadening. Fluorescence resonance energy
transfer (FRET) measurements support this hypothesis. One cysteine at the
N-terminal lobe of PKA was labeled with a fluorescent probe acting as an acceptor.
The fluorescent donor was anchored at the opposing lobe and the observed
intramolecular anisotropy decay revealed that the apoenzyme is likely to be
highly dynamic [51].
Fig. 15.2-12 (a) Overlay of a section of the [1H,15N]-TROSY spectra of wild-type
protein kinase A (PKA) (black) and mutant T197A (red). The mutant T197A lacks
the autophosphorylation site Thr197 and is therefore constitutively inactive.
The conformational rearrangement between the active state of wild-type PKA and
the inactive mutant T197A is proven by the large CSPs shown in the overlay.
(b) The mutation of the other phosphorylation site Ser338 to an alanine does
not cause conformational changes since no CSPs could be observed in the
spectrum.
even possible for cases in which a resonance appears or disappears upon ligand
binding, as shown in an example in Section 15.2.4.2.
15.2.3
Screening of Kinases by NMR
increases the demands for the protein production and pushes the achievable
size limit to 40-50 kDa. An investment in a cryo-probehead reduces the
protein amount by at least fourfold.
The synergy of ligand- and protein-based NMR screening is revealed in their
combination. Ligand-detected NMR is used as a primary screen in large-scale
sampling; hit validation is performed by protein-based NMR with far fewer
samples. False positives obtained from the primary screen are ruled out, and
subsequent analysis during the validation step increases the knowledge
about the desired interaction mode of the prestage drug candidates.
Fig. 15.2-14 Kinase selectivity profiles for a representative dataset obtained
by the ligand-detected NMR fragment approach. Each row represents a kinase and
each column represents a small-molecule fragment. Eight hundred and seventy
compounds were chosen out of a larger kinase-biased screening library. The
color-coding scheme corresponds to a particular compound having K_I values
greater (light gray) or lower (dark gray) than 1.5 mM toward a single kinase.
The horizontal order is the result of a hierarchical clustering analysis
(euclidean-Ward) with 65 descriptors (out of 210 chemical descriptors), which
correlate best with the observed kinase affinities. Twenty-nine clusters are
lined up consecutively and all the compounds that are members of a single
cluster are ordered again. The higher the average affinity, the further to the
left a compound is positioned within a given cluster, and vice versa. At the
bottom, three clusters are denoted as an example. Most fragments bind with
similar affinities to all kinases. To choose for selectivity, isolated dark
gray areas are to be picked within a row, where several similar compounds group
together but do not show the same affinities toward the other kinases (light
gray in the vertical).
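The reading rule given in the caption (dark gray in one row, light gray in the same column for all other rows) can be expressed as a small filter over a binarized affinity matrix. The following is a sketch only: the 1.5 mM cutoff is taken from the caption, whereas the kinase names, fragment names, and affinity values are hypothetical placeholders, and no clustering step is included.

```python
THRESHOLD_MM = 1.5  # affinity cutoff quoted in the caption (mM)

# Hypothetical affinities in mM; all names and numbers are placeholders.
AFFINITIES = {
    "kinase_1": {"frag_a": 0.4, "frag_b": 0.9, "frag_c": 3.0},
    "kinase_2": {"frag_a": 0.6, "frag_b": 2.5, "frag_c": 4.0},
    "kinase_3": {"frag_a": 0.5, "frag_b": 5.0, "frag_c": 1.0},
}


def binarize(affinities_mm, threshold=THRESHOLD_MM):
    """True ("dark gray") where the affinity constant is below the cutoff."""
    return {
        kinase: {frag: value < threshold for frag, value in row.items()}
        for kinase, row in affinities_mm.items()
    }


def selective_fragments(binders, target):
    """Fragments that bind `target` but none of the other kinases."""
    others = [kinase for kinase in binders if kinase != target]
    return sorted(
        frag
        for frag, is_hit in binders[target].items()
        if is_hit and not any(binders[other][frag] for other in others)
    )


BINDERS = binarize(AFFINITIES)
print(selective_fragments(BINDERS, "kinase_1"))
```

In this toy matrix, frag_a binds all three kinases and is therefore not selective, while frag_b and frag_c each single out one kinase.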
Fig. 15.2-15 Development of kinase-biased screening libraries during an
NMR-based fragment approach. The K_I values of the fragments are obtained by
quantification of the STD NMR resonances of an adenosine derivative, which is
used as the reporter ligand in a competition assay. (a) As a proof of
principle, published high-affinity ligands are fragmented into their
components. The NMR method is applied to this validation set, which reveals the
typical affinities of a particular kinase toward the “standard” kinase
fragments. For example, fragments usually exhibit higher affinities to kinase A
than to the others due to the activated state of this kinase. (b) A larger
fragment library is created by virtual screening, which utilizes the
pharmacophoric fingerprints of known kinase inhibitors. After the screening,
viable hits are selected and characterized by protein-based NMR. The
information about selectivity, binding site, and binding mode was used for
step-by-step optimization of small compound collections. (c) and (d) show that
the synthesis efforts result in the enhancement of selectivity toward the third
kinase.
15.2.4
Characterizing Kinase-Ligand Interactions by NMR
There are many interaction sites for inhibiting the phosphotransferase activity
of a protein kinase. Antagonism of the ATP-binding site to inhibit enzymatic
activity is the center of most investigations. Inhibition of this site can be
accomplished by unspecific inhibitors like staurosporine, and various kinase-
specific inhibitors have been discovered. Nevertheless, selectivity continues to
be a problem due to the commonality in the binding of ATP. All ATP site
binders bind to the highly conserved “hinge” region that connects N- and
C-terminal lobes. But the deep ATP cleft consists of several subsites that can
be utilized in the structure-based design of inhibitors. For example, the pivotal
role of protruding nonconserved residues has been reported, which control
access to particular subpockets like a gatekeeper. In the cases of imatinib,
gefitinib, and erlotinib, clinical trials showed that single point mutations in
the active site lead to chronic resistance during drug treatment [88, 89].
Alternatively, kinase activation can be prevented by interfering with regulatory
subunit binding. Interactions can be stabilized that maintain the
kinase in the inactive form, in which it cannot bind ATP or in which the residues
are misaligned for catalytic activity. Since inactive kinases must be correctly
recognized by activating enzymes, they differ more strongly from one another
than the activated forms, all of which fulfill the same function. The design of
binders to the inactive form could therefore achieve a higher degree of selectivity. In
particular, the Asp-Phe-Gly motif (DFG) of the activation loop has attracted
much attention from medicinal chemists. A selective inhibitor at an adjacent
binding site turns a residue of the DFG loop into an “out” conformation
that precludes ATP from binding [90, 91]. Kinase activity can also be indirectly
inhibited by blocking the protein substrate recruitment site or by direct
inhibition of the substrate phosphoacceptor subsite. Like all protein-protein
interaction surfaces, this binding site is more difficult to target with
small-molecule inhibitors. Selectively targeting individual kinases in this
manner remains a considerable task.
Fig. 15.2-16 Ligand binding is detected by CSPs. The two-dimensional
[1H,15N]-TROSY spectra of the uniformly 15N-labeled p38 MAP kinase in the
absence and presence of the ligand are compared. The difference of a given
amide resonance on ligand binding is calculated and projected on the crystal
structure of the kinase. A color-coding scheme is used, which considers the
average value of CSPs and their mean square deviation. (a) CSPs of the
small-molecule inhibitor SB203580 are mapped on 1A9U.pdb. (b) CSPs of the
oligopeptide (KPDLRVVIPP) derived from the protein substrate MEF2A mapped on
1LEW.pdb.
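A CSP analysis of this kind can be sketched in a few lines. The combined 1H/15N perturbation formula with a nitrogen scaling factor of 0.2 and the three-level classification relative to the mean and standard deviation are assumptions chosen for illustration; the chapter does not specify the exact weighting or color thresholds. Residue labels and shift values are hypothetical.

```python
import math
import statistics

N_SCALE = 0.2  # assumed weighting of 15N shift changes relative to 1H


def combined_csp(free, bound, n_scale=N_SCALE):
    """Combined amide CSP (ppm) per residue from (1H, 15N) shift pairs."""
    return {
        res: math.hypot(bound[res][0] - free[res][0],
                        n_scale * (bound[res][1] - free[res][1]))
        for res in free
        if res in bound
    }


def classify(csps):
    """Assign each residue a class relative to the mean and the standard
    deviation of all CSPs (assumed scheme for color-coding)."""
    values = list(csps.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    classes = {}
    for res, value in csps.items():
        if value > mean + sd:
            classes[res] = "strong"
        elif value > mean:
            classes[res] = "medium"
        else:
            classes[res] = "weak"
    return classes


# Hypothetical (1H, 15N) shifts in ppm without and with ligand.
free = {"A": (8.00, 120.0), "B": (8.50, 118.0), "C": (7.90, 125.0),
        "D": (9.10, 110.0), "E": (8.20, 121.0)}
bound = {"A": (8.00, 120.0), "B": (8.52, 118.1), "C": (8.20, 126.0),
         "D": (9.10, 110.05), "E": (8.21, 121.0)}

csps = combined_csp(free, bound)
classes = classify(csps)
```

Residues classified as "strong" would then be highlighted on the crystal structure; resonances that are missing in one of the two spectra are skipped here and need separate treatment.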
15.2.4.2 DFG-in/DFG-out
Recently, an alternative binding site adjacent to the ATP-binding cleft has been
exploited for pharmaceutical intervention. The pyrazole-urea-based inhibitor
BIRB796 (for the structure, see Fig. 15.2-8(d)) induces an alternative conformation of
the DFG motif of p38 MAP kinase, turning the side chain of Phe169 from an
“in” to an “out” configuration. The corresponding loop undergoes a 10 Å shift
that precludes ATP binding through the incompatibility of the new position
of the Phe side chain. This recognition principle has been successfully applied
to protein kinases such as Raf [92], p38 MAP kinase [90, 91, 93], or kinase
insert domain receptor (KDR) [94].
In the NMR analysis of this part of the polypeptide chain, the DFG loop
(Asp168-Phe169-Gly170) turned out to be one of the segments that could
not be assigned in the spectra of the apo form of p38 MAP kinase. A
[1H,15N]-TROSY spectrum recorded from selectively 15N-Phe labeled samples
revealed 12 of 13 phenylalanine correlations. The 12 visible signals were
unambiguously assigned; the unobservable signal belongs to Phe169 in the
DFG loop. This finding was confirmed by the spectrum of the selectively 15N-Phe
labeled mutant Phe169Tyr, which exhibited an identical TROSY spectrum with
12 peaks. Altered field strengths, temperatures, and more sensitive acquisition
conditions with a cryoprobe head did not affect the result.
On addition of the pyrazole-urea-based DFG-out inhibitor to a selectively
15N-Phe labeled p38 sample, 13 peaks can be detected. A further investigation
with 13C'-labeled Asp/15N-labeled Phe, recording an HNCO-type experiment,
confirmed the assignment of Phe169. The lineshape of the Phe169 amide
resonance was simulated and analyzed with respect to the ability to detect the peak
in a [1H,15N]-TROSY spectrum. The chemical shift difference between DFG-in
and DFG-out conformations was estimated by a chemical shift prediction
according to the published X-ray structures. Figure 15.2-17 shows the relative
maximum peak intensities of the amide 15N-resonance of Phe169 as a function
of the exchange rate and the relative population of the “out” state in the
simulated spectra. The lowest peak intensities are expected for medium exchange
rates at equally distributed states. The extent of this area shrinks with
decreasing field strength of the spectrometer. The situation during the NMR
measurement of the apo protein and complexes with DFG-in ligands seems to
be in the depicted area, where the lineshape leads to excessive broadening. In
principle, the peak is detectable again by changes in temperature (move left or
right in the diagram), by a decrease of the field strength, or by “freezing” one
of the two conformations with a ligand (move up or down in the diagram).
For the apo form of the p38 MAP kinase it was deduced that the absence
of the amide peak for Phe169 in the DFG motif under all tested NMR
conditions is consistent with a conformational “in/out” equilibrium taking
place at an intermediate NMR timescale. Binding of the pyrazole-urea-based
DFG-out inhibitor is not compatible with the DFG-in conformation; therefore,
the inhibitor directly interferes with the conformational exchange process of the DFG loop.
Fig. 15.2-17 Simulation of NMR spectra of a two-state DFG-in/DFG-out model. The
grayscale represents the relative maximum peak intensities of the 15N-amide
resonance of Phe169 as a function of the exchange rate and the population of
the “out” state. The magnetic field strength is set according to a 1H resonance
at 600 MHz. The chemical shift difference was set to 13.7 ppm, as predicted by
chemical shift calculations applied to the published X-ray structures
(1P38.pdb and 1KV1.pdb). The lowest maximum detectable peak intensities are
expected for medium exchange rates and about uniformly distributed states. The
extent of this area shrinks with decreasing field strength of the spectrometer.
Unobservable peaks can be made visible again by changes in temperature (move
left or right in the diagram), by decrease of the field strength, or by
“freezing” one of the two conformations with a ligand (move up or down in the
diagram).
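The qualitative behavior of such a two-state simulation can be reproduced with the standard Bloch-McConnell lineshape for two exchanging sites. The sketch below is not the simulation used for Fig. 15.2-17: the intrinsic transverse relaxation rate (20 s^-1, equal for both states), the frequency grid, and the function name are assumptions. The 13.7 ppm shift difference is taken from the caption, and 60.8 MHz is the 15N frequency corresponding to a 600 MHz 1H resonance.

```python
import math


def max_peak_intensity(k_ex, p_out, delta_ppm=13.7, n15_mhz=60.8,
                       r2=20.0, n_points=4001):
    """Maximum of the absorption lineshape for two-site exchange.

    k_ex    total exchange rate (in->out plus out->in) in s^-1
    p_out   population of the "out" state (0..1)
    r2      assumed intrinsic transverse relaxation rate in s^-1
    """
    p_in = 1.0 - p_out
    d_omega = 2.0 * math.pi * delta_ppm * n15_mhz   # shift difference, rad/s
    w_in, w_out = -d_omega / 2.0, d_omega / 2.0
    k_io = p_out * k_ex    # rate in -> out
    k_oi = p_in * k_ex     # rate out -> in
    best = 0.0
    for i in range(n_points):
        w = -d_omega + 2.0 * d_omega * i / (n_points - 1)
        # Elements of the 2x2 matrix i(w - Omega) + R2 - K
        a = complex(r2 + k_io, w - w_in)
        d = complex(r2 + k_oi, w - w_out)
        det = a * d - k_io * k_oi
        # Sum over both sites of (matrix inverse) x (equilibrium populations)
        signal = ((d + k_io) * p_in + (a + k_oi) * p_out) / det
        best = max(best, signal.real)
    return best


slow = max_peak_intensity(1.0, 0.5)             # slow exchange
intermediate = max_peak_intensity(5000.0, 0.5)  # k_ex close to the shift difference
fast = max_peak_intensity(1.0e7, 0.5)           # fast exchange
skewed = max_peak_intensity(5000.0, 0.05)       # one state strongly favored
low_field = max_peak_intensity(5000.0, 0.5, n15_mhz=30.4)  # half the field
```

Within this simple model the peak height collapses for intermediate exchange at equal populations and recovers when the exchange rate is changed, when one state is strongly favored, or when the field strength is lowered, in line with the three escape routes named in the caption.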
with the results obtained by biological assays, that is, DFG-in ligands do not
interfere with the p38 activation [98], whereas DFG-out inhibitors block both
activity and activation of p38 [99].
15.2.4.3 LIGDOCK
The LIGDOCK procedure [71] was suggested for the determination of
protein-ligand complex structures from non-X-ray data. Ambiguous
experimental data from NMR [100, 101] or from other biophysical or biochemical
experiments are introduced in an ambiguous manner [102-104], which makes
it possible to determine protein-ligand complexes on the basis of only a few
experiments. The concept is to collect readily available CSPs
first. If necessary, more sophisticated experimental results are added to
improve the accuracy of the structure determination. The calculations consist
of three stages. In the first stage, the two molecules are positioned apart from
each other and a rigid body minimization is performed. Poses that best fulfill
the experimental parameters proceed to a simulated annealing in torsion angle
space, keeping the ligand and the binding area of the protein flexible. Possible
solutions are equilibrated with a molecular dynamics simulation using explicit
water. A critical step of the procedure is the ranking of the structures. Accurate
structures are picked from a “selection plot” in which both the intermolecular
van der Waals and experimental energy are plotted. Structures having both a
low van der Waals and a low experimental energy are possible solutions. By
contrast, structures in which only one of the two energy terms is low are
discarded. The approach was tested for three examples with increasing degree of
complexity. The structure of PTP8 in complex with PTP1B can be resolved
with CSPs only. Here, the definition of the binding site suffices to resolve the
structural problem. The calculation for H7 in complex with PKA presents two
problems that are common in structure determination using non-X-ray
data: only a partial NMR assignment of the protein was available and,
additionally, the protein conformation in the complex is an “open” conformation but
the apo structure has a “closed” form. The choice of the starting conformation
influences the result of the simulation. Nevertheless, the calculations were
started with the “wrong” apo form. Surprisingly, the orientation and possible
constructive interactions of the quinazoline ring that is the main feature of the
H series of inhibitors are correctly reproduced, although the starting structure
of the protein and the known X-ray structure of the complex were very different
and only a partial assignment of PKA was available. The determination of the
structure of SB203580 in complex with p38 was the most complicated because of
the specific shape of the ligand. It has one twofold and one threefold rotational
symmetry axis, implying that the ligand can also occupy the binding site in
other symmetry-related orientations. Therefore, it is not possible to determine
the complex structure with CSPs only. But in combination with either STD
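The ranking step based on the “selection plot” can be illustrated by a simple filter that keeps only those poses for which both energy terms are low. This is a sketch under stated assumptions and not the LIGDOCK implementation: "low" is defined here as lying within the best quarter of each energy term, and the pose names and energies are hypothetical.

```python
def select_poses(poses, vdw_fraction=0.25, exp_fraction=0.25):
    """Return names of poses that lie in the lowest fraction of BOTH the
    van der Waals energy and the experimental (restraint) energy."""

    def cutoff(values, fraction):
        ordered = sorted(values)
        index = max(0, int(len(ordered) * fraction) - 1)
        return ordered[index]

    vdw_cut = cutoff([pose["e_vdw"] for pose in poses], vdw_fraction)
    exp_cut = cutoff([pose["e_exp"] for pose in poses], exp_fraction)
    return [
        pose["name"]
        for pose in poses
        if pose["e_vdw"] <= vdw_cut and pose["e_exp"] <= exp_cut
    ]


# Hypothetical poses with energies in arbitrary units.
POSES = [
    {"name": "p1", "e_vdw": -42.0, "e_exp": 3.1},
    {"name": "p2", "e_vdw": -40.5, "e_exp": 25.0},
    {"name": "p3", "e_vdw": -12.0, "e_exp": 2.5},
    {"name": "p4", "e_vdw": -20.0, "e_exp": 18.0},
    {"name": "p5", "e_vdw": -41.0, "e_exp": 30.0},
    {"name": "p6", "e_vdw": -5.0, "e_exp": 4.0},
    {"name": "p7", "e_vdw": -30.0, "e_exp": 12.0},
    {"name": "p8", "e_vdw": -25.0, "e_exp": 9.0},
]

selected = select_poses(POSES)
print(selected)
```

In this toy set, pose p5 packs well but violates the experimental data and pose p3 fits the data but packs poorly; both are discarded, and only p1 remains.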
References
65. M. Vogtherr, K. Fiebig, EXS 2003, 93, 183-202.
66. J.M. Moore, Curr. Opin. Biotechnol. 1999, 10, 54-58.
67. B. Meyer, T. Peters, Angew. Chem., Int. Ed. Engl. 2003, 42, 864-890.
68. M. Pellecchia, D.S. Sem, K. Wuthrich, Nat. Rev. Drug Discov. 2002, 1, 211-219.
69. K.A. Mercier, R. Powers, J. Biomol. NMR 2005, 31, 243-258.
70. S.B. Shuker, P.J. Hajduk, R.P. Meadows, S.W. Fesik, Science 1996, 274, 1531-1534.
71. U. Schieborr, M. Vogtherr, B. Elshorst, M. Betz, S. Grimme, B. Pescatore, T. Langer, K. Saxena, H. Schwalbe, Chembiochem 2005, 13, 13.
72. N. Baurin, F. Aboul-Ela, X. Barril, B. Davis, M. Drysdale, B. Dymock, H. Finch, C. Fromont, C. Richardson, H. Simmonite, R.E. Hubbard, J. Chem. Inf. Comput. Sci. 2004, 44, 2157-2166.
73. P.D. Lyne, P.W. Kenny, D.A. Cosgrove, C. Deng, S. Zabludoff, J.J. Wendoloski, S. Ashwell, J. Med. Chem. 2004, 47, 1962-1968.
74. E. Vangrevelinghe, K. Zimmermann, J. Schoepfer, R. Portmann, D. Fabbro, P. Furet, J. Med. Chem. 2003, 46, 2656-2662.
75. E. ter Haar, W.P. Walters, S. Pazhanisamy, P. Taslimi, A.C. Pierce, G.W. Bemis, F.G. Salituro, S.L. Harbeson, Mini. Rev. Med. Chem. 2004, 4, 235-253.
76. C. Chuaqui, Z. Deng, J. Singh, J. Med. Chem. 2005, 48, 121-133.
77. C. Dalvit, M. Flocco, S. Knapp, M. Mostardini, R. Perego, B.J. Stockman, M. Veronesi, M. Varasi, J. Am. Chem. Soc. 2002, 124, 7702-7709.
78. W. Jahnke, P. Floersheim, C. Ostermeier, X. Zhang, R. Hemmig, K. Hurth, D.P. Uzunov, Angew. Chem., Int. Ed. Engl. 2002, 41, 3420-3423.
79. W. Jahnke, M.J. Blommers, C. Fernandez, C. Zwingelstein, R. Amstutz, Chembiochem 2005, 6, 1607-1610.
80. W. Jahnke, A. Florsheimer, M.J. Blommers, C.G. Paris, J. Heim, C.M. Nalin, L.B. Perez, Curr. Top. Med. Chem. 2003, 3, 69-80.
81. M.A. McCoy, M.M. Senior, D.F. Wyss, J. Am. Chem. Soc. 2005, 127, 7978-7979.
82. C.I. Chang, B.E. Xu, R. Akella, M.H. Cobb, E.J. Goldsmith, Mol. Cells 2002, 9, 1241-1249.
83. G. Kontopidis, M.J. Andrews, C. McInnes, A. Cowan, H. Powers, L. Innes, A. Plater, G. Griffiths, D. Paterson, D.I. Zheleva, D.P. Lane, S. Green, M.D. Walkinshaw, P.M. Fischer, Structure (Camb) 2003, 11, 1537-1546.
84. C. McInnes, M.J. Andrews, D.I. Zheleva, D.P. Lane, P.M. Fischer, Curr. Med. Chem. Anticancer Agents 2003, 3, 57-69.
85. C. McInnes, P.M. Fischer, Curr. Pharm. Des. 2005, 11, 1845-1863.
86. J. Fejzo, C. Lepre, X. Xie, Curr. Top. Med. Chem. 2003, 3, 81-97.
87. J. Fejzo, C.A. Lepre, J.W. Peng, G.W. Bemis, M.A. Murcko, Ajay, J.M. Moore, Chem. Biol. 1999, 6, 755-769.
88. R. Ren, Nat. Rev. Cancer 2005, 5, 172-183.
89. T.A. Carter, L.M. Wodicka, N.P. Shah, A.M. Velasco, M.A. Fabian, D.K. Treiber, Z.V. Milanov, C.E. Atteridge, W.H. Biggs III, P.T. Edeen, M. Floyd, J.M. Ford, R.M. Grotzfeld, S. Herrgard, D.E. Insko, S.A. Mehta, H.K. Patel, W. Pao, C.L. Sawyers, H. Varmus, P.P. Zarrinkar, D.J. Lockhart, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 11011-11016.
90. C. Pargellis, L. Tong, L. Churchill, P.F. Cirillo, T. Gilmore, A.G. Graham, P.M. Grob, E.R. Hickey, N. Moss, S. Pav, J. Regan, Nat. Struct. Biol. 2002, 9, 268-272.
91. J. Regan, A. Capolino, P.F. Cirillo, T. Gilmore, A.G. Graham, E. Hickey, R.R. Kroe, J. Madwed, M. Moriak, R. Nelson, C.A. Pargellis, A. Swinamer, C. Torcellini, M. Tsang, N. Moss, J. Med. Chem. 2003, 46, 4676-4686.
15 Target Families
890
I 92. P.T. Wan, M.J. Garnett, S.M. Roe, P.G. McCaffrey, S.P. Chambers, M.S.
S. Lee, D. Niculescu-Duvaz, V.M. Su,J. Biol. Chem. 1996, 271,
Good, C.M. Jones, C.J. Marshall, C.J. 27696-27700.
Springer, D. Barford, R. Marais, Cell 98. S. Kumar, M.S. Jiang, J.L. Adams,
2004, I 1 6,855-867. J.C. Lee, Biochem. Biophys. Res.
93. J. Branger, B. van den Blink, Commun. 1999,263,825-831.
S. Weijer, J. Madwed, C.L. Bos, 99. Y. Kuma, G. Sabio, J. Bain,
A. Gupta, C.L. Yong, S.H. Polmar, N. Shpiro, R. Marquez, A. Cuenda, J.
D.P. Olszyna, C.E. Hack, S.J. van Biol. Chem. 2005, 280,19472-19479.
Deventer, M.P. Peppelenbosch, 100. C. Dominguez, R. Boelens, A.M.
T. van der Poll, J. Immunol. 2002, Bonvin, J. Am. Chem. SOC.2003, 125,
168,4070-4077. 1731-1737.
94. P.W. Manley, G. Bold, J. Bruggen, 101. A.D. van Dijk, R. Boelens, A.M.
G. Fendrich, P. Furet, J. Mestan, Bonvin, J.P. Linge, S.I. O’Donoghue,
C. Schnell, B. Stolz, T. Meyer, M. Nilges, FEBSJ. 2005, 272,
B. Meyhack, W. Stark, A. Strauss, 293-312.
J. Wood, Biochim. Biophys. Acta 2004, 102. J.P. Linge, S.I. O’Donoghue,
1697,17-27. M. Nilges, Methods Enzymol. 2001,
95. Z . Wang, B.J. Canagarajah, J.C. 339,71-90.
Boehm, S. Kassisa, M.H. Cobb, P.R. 103. M. Nilges,J. Mol. Biol. 1995, 245,
Young, S. Abdel-Meguid, J.L. Adams, 645-660.
E.J. Goldsmith, Structure 1998, 6, 104. M. Nilges, S.I. O’Donoghue, Prog.
1117-1128. N M R Spectrosc. 1998, 32, 107-139.
96. 2. Wang, P.C. Harkins, R.J. Ulevitch, 105. P.J. Hajduk, J.C. Mack, E.T.
J. Han, M.H. Cobb, E.J. Goldsmith, Olejniczak, C. Park, P.J. Dandliker,
Proc. Nutl. Acud. Sci. U.S.A. 1997, 94, B.A. Beutel,]. Am. Chem. SOC.2004,
2327-2332. 126,2390-2398.
97. K.P. Wilson, M.J. Fitzgibbon, P.R.
Caron, J.P. Griffith, W. Chen,
Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
15.3
The Nuclear Receptor Superfamily and Drug Discovery*
Outlook
15.3.1
Introduction
A central theme that defines the field of endocrinology is the control of
activities and processes at distal sites in the body. Signaling molecules, in
some cases nonprotein small molecules, traverse the body and ultimately relay
their chemically encoded information to a protein receptor at the target tissue.
The nuclear hormone receptor (NR) is a classic example of a receiver for such
small-molecule chemical messengers. The NR is well adapted for this type of
function because it not only specifically binds the small molecule but is also
capable of relaying or transducing a complex set of signals carried along by
the properties of the ligand. As reviewed herein, the nature of the information
that the ligand-bound NR relays depends on a complex interplay of factors,
such as ligand and cell type.
In humans, 48 NR genes have been identified (Fig. 15.3-1) [1]. A feature that
unifies the NRs as a superfamily is that each receptor consists of an assembly
of functional modules (Fig. 15.3-2) [2]. For the purpose of this review, the
module most relevant to current drug discovery approaches is the C-terminal
* A similar version of this paper was published in ChemMedChem 2006, 1, 504-523, Wiley-VCH,
Weinheim, Germany.
Fig. 15.3-1 The NR superfamily represented as a phylogeny plot. The 48
identified receptors within the human genome are shown clustered according to
amino acid sequence relationships. NRs are named according to the accepted
unified nomenclature (see Table 15.3-1 for more description) [1].
ligand-binding domain (LBD). The LBD is typically about 250 amino acids in
length and contains a key regulatory element, the so-called activation function
2 (AF2) domain, as well as all the recognition elements required for ligand
binding (Fig. 15.3-3(a), left) [3].
The fold of the NR LBD is typically described as three stacked α-helical
sheets. The helices comprising the “front” and “back” sheets are roughly
aligned parallel to one another. The helices in the middle sheet run across the
two outer sheets and occupy space only in the upper portion of the domain
(Fig. 15.3-3(a), right). The space in the lower part of the domain is relatively
Fig. 15.3-2 Domain organization of the NRs. Shown are the basic structural
modules comprising an NR (AF-1, activation function-1; DBD, DNA binding
domain; LBD, ligand binding domain). In the linear schematic at the top, the
general functions of the respective regions of NRs are noted. Examples of
selected NRs (see Table 15.3-1 for abbreviations) are shown below to
demonstrate that most NR LBDs are similar in amino acid length, but the
N-terminal region varies amongst family members. Numbers represent amino
acid position.
Fig. 15.3-3 Representative structures of NR functional modules. (a) The first NR
LBD to be solved crystallographically was the apo RXR LBD [5]. The
representative example structure shown here, depicted as a ribbon diagram, is
for PR bound to its natural ligand progesterone [6]. This structure, which was
the first of the steroid receptors to be solved, shows the basic fold conserved
among members of the NR superfamily. The major helices (red) are labeled, the
well-conserved small β-sheet is shown in yellow, and the random coil stretches
connecting the major structural elements are colored green. The final
C-terminal helix is labeled as the AF2 helix and is described in more detail in
the text. The progesterone molecule is colored by atom type with carbons shown
as blue and oxygens shown as red. The domain on the right is rotated 90° to
clearly show the three helical layers that comprise the NR LBD fold. (b) Shown
as a ribbon diagram is the first X-ray crystal structure of an NR DBD bound to
a DNA response element. This representative structure is the DBD from GR bound
in an antiparallel fashion to its inverted direct-repeat DNA response site [7].
The GR DBD is bound as a homodimer, where one of the subunits is colored yellow
and the other is colored blue. The DNA helix is shown with atoms represented as
spheres and colored according to atom type (carbon - green; oxygen - red;
nitrogen - blue; phosphorus - magenta).
void of protein, and for most NRs this creates an internal cavity for small
molecule ligands.
The central part of the typical NR contains the DNA binding domain (DBD),
which is usually about 70 amino acids, contains two zinc-finger motifs, and
is the most highly conserved sequence segment amongst the NRs. For some
NRs, the DBD forms a dimer and binds a DNA response element containing
a direct repeat of six base pairs [4]. The typical DBD contains three helices
(Fig. 15.3-3(b)), the first of which docks into the major groove of the DNA
recognition site. A second smaller helix and the loop preceding it create a
domain-domain interface. The third helix makes no DNA or other contacts.
Most NRs have an N-terminal domain, commonly referred to as the activation
function 1 (AF1) domain. This module varies greatly in length amongst
receptors and generally contains a ligand-independent transcriptional AF.
Upon activation by the ligand messenger, NRs typically function as
transcription factors: they bind to recognition elements and regulate
the expression of target genes. Once complexed to DNA, NRs recruit accessory
proteins such as coactivators, corepressors, and basal transcriptional factors,
thus initiating gene transcription (Fig. 15.3-4). In some cases, genes under the
control of a negative response element are downregulated by an NR; thus NRs
are able to act directly as activators or suppressors of gene function. As will
be discussed later in this chapter, NR pathway regulation goes beyond direct,
DNA-mediated transcriptional regulation. For example, some NRs crosstalk
with other important signal transduction schemes such as nuclear factor
kappa B (NF-κB) and activator protein 1 (AP-1) [8] (Fig. 15.3-4).
NRs have a rich and long-standing history in drug discovery. This can be
attributed to several features inherent to this class of targets: (a) NRs have been
designed by nature to selectively bind “druglike” small molecules and (b) a diverse
set of biologically important functions can be regulated through a single
ligand-activated receptor (see Table 15.3-1 for examples of NR-targeted drugs). Data
Table 15.3-1
General category [a] | Name | Subtypes and abbreviations (other common abbreviations) | Unified nomenclature [b] | Natural ligand | Examples of therapeutic ligands (trade name) | Therapeutic
Classic steroid receptors | Estrogen receptor | ERα, ERβ | NR3A1, NR3A2 | Estradiol, estrogens | Tamoxifen, raloxifene (Evista), genistein, diethylstilbestrol, equine estrogens (Premarin) | Menopausal symptoms, osteoporosis prevention, breast cancer
 | Glucocorticoid receptor | GR | NR3C1 | Cortisol; glucocorticoids | Prednisone, dexamethasone, fluticasone propionate (Flovent, Flonase), mometasone furoate (Nasonex), budesonide (Rhinocort/Pulmicort) | Inflammatory and immunological diseases, asthma, arthritis, allergic rhinitis, cancer, immune suppressant for transplant
 | Mineralocorticoid receptor | MR | NR3C2 | Aldosterone; deoxycorticosterone | Spironolactone (Aldactone), eplerenone (Inspra) | Hypertension, heart failure
 | Progesterone receptor | PR | NR3C3 | Progesterone; progestins | RU486 (Mifepristone) | Abortifacient, menstrual control
 | Androgen receptor | AR | NR3C4 | Testosterone; androgens | Flutamide, bicalutamide (Casodex) | Prostate cancer
Classic RXR-heterodimer receptors | Thyroid hormone receptor | TRα, TRβ | NR1A1, NR1A2 | Thyroid hormone | Levothyroxine (Synthroid) | Thyroid deficiency
 | Retinoic acid receptor | RARα, RARβ, RARγ | NR1B1, NR1B2, NR1B3 | Retinoic acid | Isotretinoin (Accutane) | Acne
 | Peroxisome proliferator-activated receptor | PPARα, PPARδ, PPARγ | NR1C1, NR1C2, NR1C3 | Fatty acids, eicosanoids | Fenofibrate (Tricor; PPARα), thiazolidinediones (Avandia, Actos; PPARγ) | Dyslipidemia (PPARα), diabetes and insulin sensitization (PPARγ)
 | Liver X receptor | LXRα, LXRβ | NR1H2, NR1H3 | 24,25-Epoxycholesterol, 24-hydroxycholesterol | - | Role in lipid and cholesterol metabolism; atherosclerosis
 | Farnesoid X receptor | FXR | NR1H4 | Chenodeoxycholic acid | | Cholesterol maintenance, protect hepatocytes from bile toxicity; cholestasis
 | Vitamin D receptor | VDR | NR1I1 | Vitamin D, bile acids | Calcitriol (Rocaltrol) | Hypocalcemia, osteoporosis, renal failure
 | Retinoid X receptor | RXRα, RXRβ, RXRγ | NR2B1, NR2B2, NR2B3 | All trans-retinoic acid | LG1069 (Targretin) | Skin cancer
Xenobiotic receptors | Pregnane X receptor | PXR | NR1I2 | Xenobiotics | St. John’s wort, rifampicin | Role in protection from toxic metabolites
 | Constitutive androstane receptor | CAR | NR1I3 | Xenobiotics | Phenobarbital | Role in protection from toxic metabolites
Orphan receptor (or recently deorphaned) | ER-related receptor | ERRα, ERRβ, ERRγ | NR3B1, NR3B2, NR3B3 | Unknown | Tamoxifen, diethylstilbestrol (ERRγ) | Muscle fatty acid metabolism (ERR…
 | RAR-related orphan receptor | RORα, RORβ, RORγ | NR1F1, NR1F2, NR1F3 | Cholesterol, cholesterol sulfate | - | Role in cerebellu… development, maintenance of b… (RORα); circadian… (RORβ); lymph n… organogenesis (R…
 | Human nuclear factor 4 | HNF4α, HNF4γ | NR2A1, NR2A2 | Palmitic acid | | Role in diabetes
 | Reverse erbA | Rev-erbAα, Rev-erbAβ | NR1D1, NR1D2 | Unknown | | Circadian rhythm
 | Testis receptor | TR2, TR4 | NR2C1, NR2C2 | Unknown | | Unknown
 | Tailless-like | TLX | NR2E1 | Unknown | | Role in neuronal development
 | Photoreceptor-specific nuclear receptor | PNR | NR2E3 | Unknown | | Role in photorece… differentiation
 | Chicken ovalbumin upstream promoter-transcription factor | COUP-TFI, COUP-TFII, COUP-TFIII (Ear2) | NR2F1, NR2F2, NR2F6 | Unknown | | Role in neuronal development (CO…; vascular develop… (COUP-TFII)
 | NGF-induced factor B | NGFIBα (also NUR77) | NR4A1 | Unknown | | Role in thymocyte apoptosis
 | Nur related factor 1 | NGFIBβ (NURR1, NOT1) | NR4A2 | Unknown | | Role in dopaminergic neuron development
 | Neuron-derived orphan receptor 1 | NGFIBγ (NOR1) | NR4A3 | Unknown | | Unknown
 | Steroidogenic factor 1 | SF1 | NR5A1 | Phospholipids | | Role in mammalian sexual development
 | Liver receptor homologous protein 1 | LRH1 | NR5A2 | Phospholipids | | Role in lipid homeostasis, cell-cycle control
 | Germ cell nuclear factor | GCNF | NR6A1 | Unknown | | Role in vertebrate embryogenesis
NR-like, DBD-less repressors | DSS-AHC critical region on the chromosome, gene 1 | DAX1 | NR0B1 | Unknown | | Role in sex determination and development
 | Short heterodimer partner | SHP | NR0B2 | Unknown | | General repressor of NRs, obesity
a Each of the 48 human receptors is roughly categorized into
several very generalized groups. The order descends from the
historically more studied, classical receptors (top) to the more
recently discovered family members (bottom).
b Nomenclature from Ref. [1].
c Biological role of the receptor if ligand is currently not
identified.
15.3.2
Brief History of NRs in Medicine and Drug Discovery
became the gold standard for the endocrine treatment of breast cancer and
relatively recently became the first approved cancer chemopreventative agent.
Not surprisingly, the first set of NR genes cloned were from the steroid
receptor subgroup, where prior research yielded compounds to aid in
purification of the receptor. The first human NR cloned was the glucocorticoid
receptor (GR), an accomplishment that relied heavily on reagents made
available from the purification and biochemical characterization of adrenal
extracts. With purified receptors, selective antibodies were used to help isolate
the corresponding cDNA [14-16]. cDNAs representing the full-length coding
region of GR provided the first full-length amino acid sequence of an NR. The
estrogen receptor (ER) was also cloned around the same time by three groups
using independent strategies [17-19].
Comparison of emerging NR sequences (from human as well as from other
species) revealed conserved domains shared among virtually all NRs. The
finding that NRs could be isolated without knowledge of their ligand increased
the rate at which new NRs could be identified. Initially, oligonucleotides
representing conserved NR motifs (such as the highly conserved DBD) were
employed as molecular probes to perform low stringency DNA hybridizations
to cDNA libraries. The number of orphan NRs quickly surpassed the number
of classical NRs [20-22].
By the late 1990s, the method of choice for identifying new NRs shifted
from the laboratory to in silico methods. This advance was made possible by the
availability of large databases of randomly generated partial cDNA sequences,
known as expressed sequence tags (ESTs), and the development of bioinformatic
search and query tools such as BLAST. Two new mammalian NRs were
successfully identified through automated searches of EST databases. The
pregnane X receptor (PXR) was identified in a public database of mouse
ESTs by a high-throughput in silico screen for NR-like sequences [23],
and the photoreceptor cell-specific receptor (PNR) was found in a human
EST database [24]. After the isolation of PNR from EST databases, the
number of human NRs totaled 48. The availability of the complete human
genome sequence in 2001 confirmed that this set of 48 is the complete NR
genome [25, 26].
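To make the idea of an in silico screen for NR-like sequences concrete, the following is a minimal sketch in Python. It is not the BLAST-based procedure used in the cited studies; it swaps in a much simpler technique, translating each EST in the three forward reading frames and looking for a cysteine spacing of the form C-x2-C-x13-C-x2-C. That pattern is an assumed, illustrative stand-in for the zinc-finger region of the conserved DBD, and the function names (translate, scan_est) are hypothetical.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed toy signature (not a validated NR profile):
# four cysteines spaced C-x2-C-x13-C-x2-C.
ZINC_FINGER_LIKE = re.compile(r"C.{2}C.{13}C.{2}C")


def translate(dna, frame):
    """Translate one forward reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def scan_est(dna):
    """Return (frame, matched peptide) for each motif hit in frames 0-2."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        for match in ZINC_FINGER_LIKE.finditer(protein):
            hits.append((frame, match.group()))
    return hits
```

Only the forward strand is scanned here; a fuller version would also translate the reverse complement, and a realistic screen would score similarity over the whole domain rather than match a single fixed pattern.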
As new NRs were isolated, new connections between first-generation drugs
and their targets were made. For example, thiazolidinediones (TZDs) had
previously been discovered through traditional pharmacological methods
to show clinical benefit in diabetes; however, the molecular basis for
this therapeutic effect remained unclear. By using expression constructs
derived from the isolated NR genes, activity screens for each receptor were
developed. Using these screens, TZDs were found to be potent and selective
activators of peroxisome proliferator-activated receptor gamma (PPARγ)
[27]. Once this link was made, the search for a second generation of
PPARγ compounds could be initiated using an in vitro assay for PPARγ
activation.
15.3.3
Basic Principles for Ligand-NR Recognition
Fig. 15.3-6 Structure of the PPARγ LBD and features of ligand binding. (a) Shown in
blue is a ribbon diagram of the crystal structure of PPARγ LBD bound with
rosiglitazone [40]. The AF2 helix, which is colored red, is in the active
position for binding an LXXLL coactivator peptide (not shown). The
rosiglitazone molecule is buried in the receptor and is represented in
space-filling mode with carbons colored green, oxygens red, and nitrogens blue.
(b) Close-up of the binding site with the PPARγ LBD. The front face of the site
is clipped away to show the bound rosiglitazone molecule and the hydrophobic
backside of the binding pocket. As shown, a tyrosine residue from the AF2 helix
of PPARγ makes a hydrogen bond with the thiazolidinedione head group of
rosiglitazone. (c) Representative structure of a well-known PPARγ ligand.
Fig. 15.3-7 Structure of the LXRβ LBD and features of ligand binding. (a) A ribbon
diagram representing the crystal structure of LXRβ in complex with the
synthetic agonist ligand, T0901317, is shown in orange [42]. The AF2 helix,
which assumes the agonist conformation, is colored red. The ligand is shown in
space-filling mode and carbons are colored green, oxygens red, nitrogens blue,
and fluorines magenta. Similar to the orientation of steroids with the steroid
receptors, the D ring, or D-ring mimetics in the case of nonsteroidal synthetic
molecules, protrudes toward the AF2 helix. (b) Close-up of the ligand-binding
pocket for LXRβ. The front half of the receptor is cut away to show the ligand
bound back face of the pocket. The histidine/tryptophan switch, which is key
for ligand-induced activation of LXR, is highlighted. The His-mediated hydrogen
bond is indicated with a yellow line. (c) Representative structures of
well-known LXR ligands.
15.3.4
Influence of Ligand on NR LBD Conformation
There have been numerous key studies demonstrating that ligand binding
does not simply trigger NRs from an off-state to an on-state. In fact, these
studies revealed, at a molecular level, that activation of an NR by a small
molecule ligand is dramatically more complex than a two-state process. The
concept that ligand alters NR conformation to produce activity profiles pertains
mostly to the steroid receptors, PPARs, TR, RXR, RAR (retinoic acid receptor),
LXR, and FXR. Considerable doubt exists whether this concept applies to select
“constitutively active” receptors such as HNF4 and NGFIB.
One of the first studies to reveal the conformational effect of ligand utilized
a protease digestion assay to show that ER ligands could differentially affect
the pattern of protease-generated peptides [48]. As suspected from earlier
work, this study demonstrated that different ligand classes could affect NR
conformation and thus alter the AF2 activity of the receptor.
Structural studies, predominantly using X-ray crystallography, have shed
light on how ligands can alter NR conformation. In the late 1990s, two
groundbreaking reports on ER showed that ligand can particularly affect the
orientation of the most C-terminal α-helix of the LBD, referred to as the AF2
helix [34, 36]. In these studies, the AF2 helix of ER, bound with an agonist
ligand such as estradiol or the synthetic DES, was shown to adopt a position
similar to that seen in the original RAR and PPARγ agonist-bound structures
[5, 40] (Fig. 15.3-8(a)). In this active conformation, the AF2 helix spans across
H3 and H10. This arrangement creates a shallow, hydrophobic groove adjacent
Fig. 15.3-8 Examples showing the many possible conformations of the AF2 helix.
(a) ERα with the agonist diethylstilbestrol [34]; (b) ERα with the antiestrogen
4-hydroxytamoxifen [34]; (c) PPARα with the antagonist GW6471 [49]. Each
receptor, oriented in the standard position with H1/H3 in front and slightly
off to the right, is shown in space-filling mode. The AF2 helix for each
receptor is shown as a green ribbon, or as a green random coil for PPARα. On
DES:ERα, the AF2 helix lies across the receptor to help form a binding site for
an LXXLL coactivator peptide, which is colored yellow. The ligand tamoxifen
sterically interferes with the loop preceding the AF2 helix and causes the AF2
helix to reorient, bind within the coactivator cleft, and block LXXLL peptide
binding. For the PPARα:GW6471 complex, the AF2 helix is perturbed in a way to
allow accommodation of a corepressor peptide (shown in magenta). In this case
the AF2 helix is somewhat unwound and localizes on the receptor in a different
position relative to that seen for other NR LBD structures.
to the AF2 helix. This pocket accommodates a short helical peptide presented
at the surface of a coactivator protein (reviewed in a section below). Peptides
that bind this region of the activated NR typically contain an LXXLL motif
(where L and X represent leucine and any amino acid, respectively). This
short peptide motif is typically α-helical and the leucine residues are presented
on one face of the amphipathic helix. Additional electrostatic interactions
between amino acid side chains of the receptor and the peptide backbone are
believed to aid the orientation and stability of the interaction.
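As a small illustration of the LXXLL definition given above (L is leucine, X is any amino acid), the sketch below scans a one-letter protein sequence for every occurrence of the motif. The function name find_lxxll is hypothetical, and a sequence match alone says nothing about whether a given stretch is helical or actually binds an NR.

```python
import re

# A lookahead is used so that overlapping motifs are all reported.
LXXLL = re.compile(r"(?=(L..LL))")


def find_lxxll(protein):
    """Return (1-based start position, motif) for every LXXLL occurrence."""
    return [(m.start() + 1, m.group(1)) for m in LXXLL.finditer(protein.upper())]
```

For example, calling find_lxxll on the illustrative peptide "AAKHKILHRLLQDS" reports a single motif, LHRLL, starting at residue 7.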
The structures of ER bound with either tamoxifen or raloxifene, both of which
are antagonists for AF2 function, strikingly revealed that the AF2 helix could
be repositioned from the agonist conformation (Fig. 15.3-8(b)). In each of
these structures, an amine-containing head group from the ligand protrudes
toward the surface of ER to destabilize the active position of the AF2 helix.
This shift causes the AF2 helix to rotate approximately 90° from the active
position. In the antagonist position, the AF2 helix occupies the coactivator
peptide-binding site on the surface of the receptor. These studies highlight
the ligand-induced flexibility and plasticity of the NR LBD, particularly with
respect to the AF2 helix.
More recent structural studies using the GR LBD further demonstrate how
ligand can influence the conformation of the LBD [29, 50]. The structure of
GR bound with the agonist dexamethasone shows that the AF2 helix exists in
an active position to allow coactivator peptide association. Two structures of
GR bound with the antagonist ligand RU486 have shown that a protruding
dimethylaniline group effectively prevents the AF2 helix from occupying the
active position. In one of these structures, the AF2 helix intramolecularly
blocks the coactivator site. In the other structure, the AF2 helix extends away
from the core of the LBD and associates with an adjacent LBD subunit in
the crystal. Again, these studies suggest that the AF2 helix and the loop that
precedes it are prone to ligand-induced conformational flexibility.
Two studies dealing with PPAR also demonstrate the ligand-induced
conformational aspects of the LBD. In a structure of PPARα in complex
with both an antagonist ligand, GW6471 (Fig. 15.3-9), and a peptide motif
Fig. 15.3-9 Examples of NR tool compounds and drugs, many of which are
referred to and discussed in the text. For some ligands, the region of the
molecule that is oriented toward the AF helix (as determined from the crystal
structure of the NR-ligand complex) is shaded.
15.3.5
The Multitude of Ligand-induced NR Actions
15.3.5.1 Gene Regulation and the Role of Activity-Enhancing Accessory Proteins
At various stages in the activity cycle, NRs act in concert with a variety of
binding partners. For example, prior to ligand binding, GR resides in the
cytoplasm of the cell in complex with chaperone proteins such as hsp90 or
p23 [54]. Ligand association causes dissociation of chaperones and allows
GR to traverse the nuclear envelope. Using amino acids within the DBD,
the GR binds to a recognition site on a specific promoter, a site referred
to as a glucocorticoid response element (GRE). NR response elements have a
general half-site consensus of RGGTCA (where R is a purine); these DNA
half sites are commonly arranged as repeats, either direct or inverted. The
precise mechanism by which NRs associate with DNA response elements
varies amongst the superfamily. In general, the steroid receptors bind to their
response elements as homodimers, although GR can form heterodimers with
MR, and ERα and ERβ also can bind DNA as heterodimers. Several NRs, such
as TR, PPARs, LXR, VDR, RAR, and FXR, require heterodimerization with
RXR. Further, many of the orphan receptors, such as LRH1, SF1, and NGFIB,
can bind DNA as monomers.
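The half-site arrangement described above can be sketched in a few lines of Python. This is an illustrative scanner only, under stated assumptions: half sites are matched exactly against RGGTCA (R = A or G) on the forward strand and against its reverse complement TGACCY (Y = C or T), only neighbouring half sites are paired, and the spacer limit of five base pairs is an arbitrary choice. The names find_half_sites and classify_repeats are hypothetical.

```python
import re

FORWARD = re.compile(r"(?=([AG]GGTCA))")  # RGGTCA, R = A or G
REVERSE = re.compile(r"(?=(TGACC[CT]))")  # reverse complement of RGGTCA

HALF_SITE_LENGTH = 6


def find_half_sites(dna):
    """Return sorted (start index, strand) pairs for every half site."""
    dna = dna.upper()
    sites = [(m.start(), "+") for m in FORWARD.finditer(dna)]
    sites += [(m.start(), "-") for m in REVERSE.finditer(dna)]
    return sorted(sites)


def classify_repeats(dna, max_spacer=5):
    """Pair neighbouring half sites and label each pair.

    Returns (kind, spacer length, start of first half site) tuples, where
    kind is "direct" (same orientation), "inverted" (forward half site
    followed by its reverse complement), or "other".
    """
    sites = find_half_sites(dna)
    repeats = []
    for (p1, s1), (p2, s2) in zip(sites, sites[1:]):
        spacer = p2 - (p1 + HALF_SITE_LENGTH)
        if spacer < 0 or spacer > max_spacer:
            continue
        if s1 == s2:
            kind = "direct"
        elif s1 == "+":
            kind = "inverted"
        else:
            kind = "other"
        repeats.append((kind, spacer, p1))
    return repeats
```

With these definitions, "AGGTCAAAGGTCA" is reported as a direct repeat with a one-base spacer, and "AGGTCACAGTGACCT" as an inverted repeat with a three-base spacer. Real response elements tolerate mismatches, so an exact-match scan of this kind would miss many functional sites.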
The DNA-bound, ligand-activated NR serves as the docking site for a rather
large extended family of proteins called coactivators. Binding of a coactivator
protein is believed to be one of the key events in initiating transcriptome
the CoRNR box is approximately one α-helical turn longer, and the AF2
helix on PPARα is pushed out of position and does not play a role in
molecular recognition (Fig. 15.3-8(c)). There are several reports showing that
NRs occupied by nonagonist ligands, such as ER with raloxifene and GR with
RU486, increase corepressor binding. These results suggest that these types of
ligands not only disfavor coactivator binding but also create a surface on the
NR favorable for corepressor binding.
15.3.6
Specific Examples of Recent NR Drugs and Novel Drug Candidates
As mentioned earlier in the text, the NRs have a rather illustrious history in
pharmaceutical discovery (Table 15.3-1). Once a synthetic ligand has been
identified for a receptor, typically via screening and/or structure-guided
design efforts, the goal is to chemically alter the properties of the ligand
to appropriately modulate the activities of the receptor. Throughout the last
decade or so, ligands that display differential activities relative to the natural
ligand have been commonly referred to as selective nuclear receptor modulators
(SNuRMs). One of the original demonstrations of this concept involved ER
and the two classic selective estrogen receptor modulators (SERMs), OHT and
raloxifene. Essentially, it was found that these SERMs retained tissue-selective
agonist activity (such as in bone tissue and on lipid profile for raloxifene),
but functioned as antagonists in reproductive tissues [84, 851. Furthermore,
even though both molecules were originally considered "antiestrogens", OHT
generally shows a trend toward estradiol-like activity in uterine tissue [85],
whereas raloxifene does not. The groundbreaking work around novel ER
ligands has opened the gates to find novel, tissue-selective synthetic modulators
for several of the therapeutically relevant NRs.
In this section we will highlight a few of the more recent pursuits of
SNuRMs (Fig. 15.3-9). The purpose of this brief discussion is to give an
overview of the current state of the art for ligand and drug discovery by
mentioning a few recent, specific examples. Overall, the present
mission in NR drug discovery is to manipulate the receptor with ligand
to retain tissue-selective benefits while minimizing the unwanted activities
(Table 15.3-2). These few selected examples cover the basic principles of NR
drug discovery - such as identifying small molecule binders and modifying
hits for NR modulation - and the use of recent techniques and methodologies.
Another relatively recent focus for ER-directed drug discovery relates to the
fact that there are two subtypes of this receptor, ERα and ERβ, which derive
from two separate genes [32, 97]. Stimulated by the distinct tissue distribution
pattern of these two related receptors, the concept is that new indications, such
as inflammation and cancer, can be treated with an ER-selective molecule.
Toward this goal, several reports have demonstrated that it is possible to identify
ERβ-selective ligands [37, 98-100].
15.3.7
New Approaches to NR Drug Discovery
One of the more recent principles in the field of NR research and drug
discovery is the realization that a subset of the myriad functions of NRs
can be selectively manipulated by ligand, a general concept referred to as NR
modulation. New technologies, including advanced computational methods, are
inspiring new strategies for discovering novel NR-modulating drug candidates.
Importantly, new technologies allow profiling of NR ligands at greater speed
and in a more physiologically relevant context. Several new approaches to NR
it has also identified unique sets of endogenous target genes for use in ligand
screening assays.
Using a system similar to that described above (U2OS cells expressing either ERα
or ERβ), Tee and colleagues [124] evaluated the effects of different ER ligands
(including the SERMs raloxifene and tamoxifen) on ERα and ERβ target genes.
Microarray analysis showed that raloxifene and tamoxifen regulated only 27%
of the same genes in both the ERα- and ERβ-containing cells. These results
indicate that estrogens and SERMs exert tissue-specific effects by regulating
unique sets of target genes through ERα/β. Thus, these specific genes serve
as unique identifiers of compound action, and a subset is especially useful in
discriminating ER ligands.
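A figure such as the 27% overlap quoted above comes from comparing lists of regulated genes. The sketch below shows one way such an overlap could be computed; it assumes the percentage is taken relative to all genes regulated by either compound (the union), which the text does not specify, and the function name and gene labels are hypothetical.

```python
def shared_fraction(genes_a, genes_b):
    """Fraction of genes regulated by both ligands, relative to the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gene labels, for illustration only.
raloxifene_regulated = {"gene1", "gene2", "gene3"}
tamoxifen_regulated = {"gene2", "gene3", "gene4"}
overlap = shared_fraction(raloxifene_regulated, tamoxifen_regulated)
```

Choosing a different denominator (for example, the genes regulated by only one of the two compounds) would give a different percentage, so the convention has to be stated alongside any reported overlap.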
Higher throughput methods to analyze gene expression hold the promise of
screening large numbers of compounds in a cellular environment using a cost-effective
technology. For example, with advances in glass slide preparations
for monitoring transcriptional changes of thousands of genes, a hit from a
multiwell cell treatment can be inexpensively assessed over a genome-wide
range of genes. With such an analysis, it is possible to observe distinctions
between even very closely related chemotypes. A recent study has used
gene expression profiling to characterize breast cancer cells and to identify
desired “molecular fingerprints” within the data [125]. Key “biomarkers” can
be identified, which provide information linked to the phenotypic effect of a
compound. With such a screen, knowledge of the target of the compounds (e.g.,
whether a compound has antiestrogen effects) is not an a priori requirement.
One challenge in this type of approach is that vast amounts of data are
generated and bioinformatics analysis becomes a limiting factor. Current
advances in gene expression profiling as a drug-screening method must go
hand in hand with advances in bioinformatics and data handling.
Changes in the steady-state levels of mRNA do not tell the whole story. Research
groups are now integrating the data obtained from mRNA steady-state
level analysis with proteomic data. Huber et al. analyzed differences
between the gene and protein expression patterns of the human breast
carcinoma cell line T47D and its derivative T47D-r, which is resistant to the
pure antiestrogen ZM 182780 [126]. Microarray analysis was carried out in
parallel to a proteomics analysis where the total cellular protein content of
T47D or T47D-r was separated on two-dimensional gels. Thirty-eight proteins
were found to be reproducibly up- or downregulated more than twofold in
T47D-r versus T47D in the proteomics analysis. Comparison with differential
mRNA analysis revealed that 19 of these were up- or downregulated in parallel
with the corresponding mRNA molecules. For 11 proteins, the corresponding
mRNA was not found to be differentially expressed, and for 8 proteins an
inverse regulation was found at the mRNA level. A general conclusion from
such studies is that, though the pattern of expression of the two data sets is
similar, the disconnected trends emphasize the importance of posttranslational
mechanisms in cellular development. These types of changes can only be
15.3.8
Future Developments and Conclusions for NR Chemical Biology
The human NRs as a structural class are essential for life and survival, and
they play an integral role in many critical physiological processes such as
metabolism, homeostasis, differentiation, growth and development, aging,
and reproduction. This family of receptors has a common evolutionary
history as evidenced by their sequence relationship and their commonality
in cellular function [129]. The myriad of functions of NRs is vastly complex
and the pathways they control are intertwined with each other as well as with
numerous accessory proteins and partners in function. Even with this inherent
complexity, as reviewed briefly above, this family of receptors has had a long
and fruitful history for drug discovery. With the advent of high-throughput
chemistries, structural biology, novel biochemical methods, and pathway
analysis technologies, such as differential gene expression and proteomics,
there will undoubtedly be new discoveries leading to drugs with improved
therapeutic profiles. These NR modulator efforts should help in defining
better the ligand-induced activities that produce tissue-selective beneficial
effects and in minimizing unwanted activities. In addition, there are likely
to be advances toward ligand discovery for the remaining orphan receptors.
Studies using these tool compounds should lead to target validation and better
definition of therapeutic relevance for the remaining orphan NRs. Overall,
the future of targeting the NR superfamily with novel synthetic ligands holds
tremendous potential and should lead to a variety of safer, more effective
medicines for treatment of a plethora of human diseases.
Acknowledgment
We would like to thank Tim Willson for critically reading this review. We also
thank Lakshman Ramamurthy for his kind contribution of the NR superfamily
phylogeny plot. Finally, we would like to thank our many GlaxoSmithKline
colleagues for helpful discussions and collaborations on NR-related projects.
References
61. T.E. Spencer, G. Jenster, M.M. Burcin, C.D. Allis, J. Zhou, C.A. Mizzen, N.J. McKenna, S.A. Onate, S.Y. Tsai, M.J. Tsai, B.W. O’Malley, Steroid receptor coactivator-1 is a histone acetyltransferase, Nature 1997, 389, 194-198.
62. M.L. Privalsky, The role of corepressors in transcriptional regulation by nuclear hormone receptors, Annu. Rev. Physiol. 2004, 66, 315-360.
63. J.D. Chen, R.M. Evans, A transcriptional co-repressor that interacts with nuclear hormone receptors [see comment], Nature 1995, 377, 454-457.
64. A.J. Horlein, A.M. Naar, T. Heinzel, J. Torchia, B. Gloss, R. Kurokawa, A. Ryan, Y. Kamei, M. Soderstrom, C.K. Glass, M.G. Rosenfeld, Ligand-independent repression by the thyroid hormone receptor mediated by a nuclear receptor co-repressor [see comment], Nature 1995, 377, 397-404.
65. M.G. Guenther, O. Barak, M.A. Lazar, The SMRT and N-CoR corepressors are activating cofactors for histone deacetylase 3, Mol. Cell. Biol. 2001, 21, 6091-6101.
66. X. Hu, M.A. Lazar, The CoRNR motif controls the recruitment of corepressors by nuclear hormone receptors, Nature 1999, 402, 93-96.
67. M. Gottlicher, S. Heck, P. Herrlich, Transcriptional cross-talk, the second mode of steroid hormone receptor action [see comment], J. Mol. Med. 1998, 76, 480-489.
68. L.I. McKay, J.A. Cidlowski, Cross-talk between nuclear factor-kappa B and the steroid hormone receptors: mechanisms of mutual antagonism, Mol. Endocrinol. 1998, 12, 45-56.
69. L.I. McKay, J.A. Cidlowski, Molecular control of immune/inflammatory responses: interactions between nuclear factor-kappa B and steroid receptor-signaling pathways, Endocr. Rev. 1999, 20, 435-459.
70. A. Ray, K.E. Prefontaine, P. Ray, Down-modulation of interleukin-6 gene expression by 17 beta-estradiol in the absence of high affinity DNA binding by the estrogen receptor, J. Biol. Chem. 1994, 269, 12940-12946.
71. A. Ray, K.E. Prefontaine, Physical association and functional antagonism between the p65 subunit of transcription factor NF-kappa B and the glucocorticoid receptor, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 752-756.
72. E. Caldenhoven, J. Liden, S. Wissink, A. Van de Stolpe, J. Raaijmakers, L. Koenderman, S. Okret, J.A. Gustafsson, P.T. Van der Saag, Negative cross-talk between RelA and the glucocorticoid receptor: a possible mechanism for the antiinflammatory action of glucocorticoids, Mol. Endocrinol. 1995, 9, 401-412.
73. V. Doucas, Y. Shi, S. Miyamoto, A. West, I. Verma, R.M. Evans, Cytoplasmic catalytic subunit of protein kinase A mediates cross-repression by NF-kappa B and the glucocorticoid receptor, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 11893-11898.
74. R. Losel, M. Wehling, Nongenomic actions of steroid hormones, Nat. Rev. Mol. Cell Biol. 2003, 4, 46-56.
75. V. Boonyaratanakornkit, M.P. Scott, V. Ribon, L. Sherman, S.M. Anderson, J.L. Maller, W.T. Miller, D.P. Edwards, Progesterone receptor contains a proline-rich motif that directly interacts with SH3 domains and activates c-Src family tyrosine kinases, Mol. Cells 2001, 8, 269-280.
76. M.A. Shupnik, Crosstalk between steroid receptors and the c-Src-receptor tyrosine kinase pathways: implications for cell proliferation, Oncogene 2004, 23, 7979-7989.
77. C.Z. Song, X. Tian, T.D. Gelehrter, Glucocorticoid receptor inhibits transforming growth factor-beta signaling by directly targeting the transcriptional activation function of Smad3, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11776-11781.
78. A. Bruna, M. Nicolas, A. Munoz, J.M. Kyriakis, C. Caelles, Glucocorticoid receptor-JNK interaction mediates
111. R.S. Tan, S.J. Pu, J.W. Culberson, Role of androgens in mild cognitive impairment and possible interventions during andropause, Med. Hypotheses 2004, 62, 14-18.
112. A.F. Santos, H. Huang, D.J. Tindall, The androgen receptor: a potential target for therapy of prostate cancer, Steroids 2004, 69, 79-85.
113. J. Rosen, A. Negro-Vilar, Novel, non-steroidal, selective androgen receptor modulators (SARMs) with anabolic activity in bone and muscle and improved safety profile, J. Musculoskelet. Neuronal Interact. 2002, 2, 222-224.
114. J.P. Berger, A.E. Petro, K.L. Macnaul, L.J. Kelly, B.B. Zhang, K. Richards, A. Elbrecht, B.A. Johnson, G. Zhou, T.W. Doebber, C. Biswas, M. Parikh, N. Sharma, M.R. Tanen, G.M. Thompson, J. Ventre, A.D. Adams, R. Mosley, R.S. Sunvit, D.E. Moller, Mol. Endocrinol. 2003, 17, 662-676.
115. M. Downes, M.A. Verdecia, A.J. Roecker, R. Hughes, J.B. Hogenesch, H.R. Kast-Woelbern, M.E. Bowman, J.L. Ferrer, A.M. Anisfeld, P.A. Edwards, J.M. Rosenfeld, J.G. Alvarez, J.P. Noel, K.C. Nicolaou, R.M. Evans, A chemical, genetic, and structural analysis of the nuclear bile acid receptor FXR [see comment], Mol. Cells 2003, 11, 1079-1092.
116. E.M. Quinet, D.A. Savio, A.R. Halpern, L. Chen, C.P. Miller, P. Nambi, Gene-selective modulation by a synthetic oxysterol ligand of the liver X receptor, J. Lipid Res. 2004, 45, 1929-1942.
117. B. Miao, S. Zondlo, S. Gibbs, D. Cromley, V.P. Hosagrahara, T.G. Kirchgessner, J. Billheimer, R. Mukherjee, Raising HDL cholesterol without inducing hepatic steatosis and hypertriglyceridemia by a selective LXR modulator, J. Lipid Res. 2004, 45, 1410-1417.
118. S. Folkertsma, P. van Noort, J. Van Durme, H.J. Joosten, E. Bettler, W. Fleuren, L. Oliveira, F. Horn, J. De Vlieg, G. Vriend, A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain, J. Mol. Biol. 2004, 341, 321-335.
119. J.D. Baxter, J.W. Funder, J.W. Apriletti, P. Webb, Towards selectively modulating mineralocorticoid receptor function: lessons from other systems, Mol. Cell. Endocrinol. 2004, 217, 151-165.
120. J.D. Norris, L.A. Paige, D.J. Christensen, C.Y. Chang, M.R. Huacani, D. Fan, P.T. Hamilton, D.M. Fowlkes, D.P. McDonnell, Peptide antagonists of the human estrogen receptor, Science 1999, 285, 744-746.
121. M.A. Iannone, C.A. Simmons, S.H. Kadwell, D.L. Svoboda, D.E. Vanderwall, S.-J. Deng, T.G. Consler, J. Shearin, J.G. Gray, K.H. Pearce, Correlation between in vitro peptide binding profiles and cellular activities for estrogen receptor modulating compounds, Mol. Endocrinol. 2004, 18, 1064-1081.
122. K.H. Pearce, M.A. Iannone, C.A. Simmons, J.G. Gray, Discovery of novel nuclear receptor modulating ligands: an integral role for peptide interaction profiling, Drug Discov. Today 2004, 9, 741-751.
123. F. Stossi, D.H. Barnett, J. Frasor, B. Komm, C.R. Lyttle, B.S. Katzenellenbogen, Transcriptional profiling of estrogen-regulated gene expression via estrogen receptor (ER) alpha or ERbeta in human osteosarcoma cells: distinct and common target genes for these receptors, Endocrinology 2004, 145, 3473-3486.
124. M. Kian Tee, I. Rogatsky, C. Tzagarakis-Foster, A. Cvoro, J. An, R.J. Christy, K.R. Yamamoto, D.C. Leitman, Estradiol and selective estrogen receptor modulators differentially regulate target genes with estrogen receptors alpha and beta, Mol. Biol. Cell 2004, 15, 1262-1272.
125. P.E. Young, D.K. Bol, High-throughput transcriptional profiling for drug discovery and lead development, Genet. Eng. News 2003, 23.
126. M. Huber, I. Bahr, J.R. Kratzschmar, A. Becker, E.C. Muller, P. Donner, H.D. Pohlenz, M.R. Schneider, A. Sommer, Comparison of proteomic and genomic analyses of the human breast cancer cell line T47D and the antiestrogen-resistant derivative T47D-r, Mol. Cell. Proteomics 2004, 3, 43-55.
127. M. Podvinec, M.R. Kaufmann, C. Handschin, U.A. Meyer, NUBIScan, an in silico approach for prediction of nuclear receptor response elements, Mol. Endocrinol. 2002, 16, 1269-1279.
128. V.X. Jin, Y.W. Leu, S. Liyanarachchi, H. Sun, M. Fan, K.P. Nephew, T.H. Huang, R.V. Davuluri, Identifying estrogen receptor alpha target genes using integrated computational genomics and chromatin immunoprecipitation microarray, Nucleic Acids Res. 2004, 32, 6627-6635.
129. H. Escriva, S. Bertrand, V. Laudet, The evolution of the nuclear receptor superfamily, Essays Biochem. 2004, 40, 11-26.
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
15.4
The GPCR - 7TM Receptor Target Family
Outlook
15.4.1
Introduction
ISBN: 978-3-527-31150-7
act on GPCRs
[Figure: chemical structures of representative drugs acting on GPCRs: gabapentin (GABAB agonist), salmeterol (β2 agonist), valsartan (AT1 antagonist), and leuprorelin (LH-RH agonist).]
cell-cell interactions and (b) the taste receptors were reclassified into
two subgroups, one within the glutamate group and one together with the
frizzled/taste2 group. While the GRAFS classification is useful, for historical
reasons we maintain the A, B, C nomenclature in this chapter, as described
above.
In recent decades, several GPCR subfamilies have been explored so systematically
that selective ligands and drugs are now known for a large number
of receptors of these families [10]. The elucidation of the human genome, with
the discovery of the sequences of many novel orphan GPCRs with unknown
functions, provided the basis for further systematization of the exploration
of the GPCR superfamily for drug discovery. Because of the evolutionarily
conserved commonalities within a homogeneous subgroup of GPCRs,
especially in aspects of molecular recognition, it is reasonable to expect
that further focus within subfamilies will make it possible to find ligands
of the new receptors and to discover innovative medicines [11, 12].
This chapter will summarize the milestones of GPCR research and show how
modern chemical biology disciplines and discovery technologies are currently
used to explore this highly important target family and to contribute to new
and better medicines.
15.4.2
History/Development
Term Definition
factor for their discovery was the existence of relatively well established
knowledge of the physiology of the related hormone, and that new chemical
compounds were systematically tested in biological models of multiple disease
protein for other signaling pathways and recruits, for instance, the c-Src kinase
via the poly-Pro-SH3 domain, and thereby activates mitogen-activated protein
(MAP) kinase signaling. Also, G-protein-independent signaling toward the
NHE1 ion exchanger was observed. This occurs via the Na+/H+ exchanger
regulatory factor (NHERF) protein interacting by its postsynaptic density-95,
disc large, zonula occludens-1 (PDZ) domain with the PDZ binding motifs
found at the C-terminus of several GPCRs [33].
The investigation of the mechanism of agonist-induced receptor signaling,
desensitization, internalization, trafficking, and recycling resulted in the
discovery of many proteins that interact with GPCRs and are collectively
called G-protein-coupled receptor interacting proteins (GIPs) [34, 35]. The GIPs
link GPCRs to large protein networks, called receptosomes, whose mechanistic
investigation and exploration for drug discovery is the subject of intense
research activity. We will elaborate more on this topic at the end of the chapter.
15.4.3
General Considerations
G receptors belong to
class C in both species, and none belong to class F/S.
Olfactory receptor genes represent the largest mammalian subgroup. They
are class A receptors encoded by single exons, and they are transcribed in the
olfactory epithelium, where they interact specifically with the G-protein Golf to
transduce odorant signals. They provided the basis for the understanding of
odor recognition, for which Buck and Axel were awarded the 2004 Nobel Prize
in Physiology or Medicine [37]. For some olfactory receptors, expressed
sequence tags (ESTs) were picked up in peripheral organs (e.g., prostate-specific
gene receptor (PSGR)); however, the significance of these findings remains
unclear at present. Especially for the human olfactory receptor family, it
is not yet entirely clear which of these receptors are functionally expressed,
as about 50% of the genes identified likely represent pseudogenes. In the
mouse family, the majority of olfactory receptors appear to be functional.
The annotation and functional characterization of olfactory receptors is
rapidly evolving and specific databases have been created that follow recent
developments “online” (see Table 15.4-3). In addition to olfactory receptors,
taste and pheromone receptors are identified as chemosensory.
The pheromone receptors play an important role in modulating behavior in
rodents; whether they are involved in human behavior is a matter of debate.
Pheromone receptors belong to class C. They are specifically expressed in the
vomeronasal organ in rodents, which is a specific structure separate from, but
in proximity to, the main olfactory epithelium. While there are more than 100
active receptors in the mouse, only 11 have been identified in humans, and
their ligands are unknown.
Taste receptors come in two families that are rather well conserved between
human and mouse. One group belongs to class C and has three members
(T1R1, T1R2, T1R3); these receptors form heterodimers like γ-aminobutyric acid type
B (GABAB) receptors, and the different entities formed are responsible for
detecting sugars and the amino acid glutamate. The second group of taste receptors
is class A-like (T2Rs) and comprises more than 30 receptors in humans, which
appear to be involved in detecting bitter tastes. All taste receptors are expressed
exclusively in the tongue, and there is a separation between cells expressing
T1- and T2-type receptors.
The opsins represent the highly interesting small family of light-detecting
GPCRs [38]. In addition to the four well known opsins operating in rod
and cone cells, there are four additional opsin-related receptors (retinal G
protein-coupled receptor (RGR) opsin, peropsin, melanopsin, encephalopsin)
that are likely to bind chromophores and appear to play interesting roles in
light sensing outside the well-described primary phototransduction processes.
For instance, melanopsin may be involved in the control of circadian rhythms.
ESTs for encephalopsin were isolated from several tissues, including brain
and skin.
The genome of the nematode C. elegans was the first to be sequenced
in full, followed by Drosophila shortly after. These very distantly related
Table 15.4-3 Publicly available Internet molecular informatics
resources providing relevant information for GPCR chemical
biology research
http://www.iuphar-db.org/iuphar-rd/index.html/
  Official database of the IUPHAR Committee on Receptor Nomenclature and Drug
  Classification; includes information on name synonyms, structure, functional assays,
  ligands, agonist and antagonist potencies, radioligand assays, transduction mechanisms,
  receptor distribution, tissue function, and phenotype.
http://kidb.bioc.cwr.edu/
  Database of the NIMH Psychoactive Drug Screening Program. Pharmacoinformatics system
  with strong focus on GPCR pharmacology and profile structure-activity data.
http://www.gpcr.org/7tm/
  GPCRDB: Information system of CMBI in Nijmegen; contains information about
  sequences, multiple sequence alignments, phylogenetic trees, 3D models, GPCR mutation
  data, and ligand-binding constants.
http://bioinfo-pharma.u-strasbg.fr/gpcrdb/gpcrdb-form.html
  hGPCRdb: The human druggable GPCR database at the University Louis Pasteur of
  Strasbourg provides searching capabilities for chemogenomics analyses of the 7TM and
  the binding cavity domains of human GPCRs.
http://senselab.med.yale.edu/senselab/ORDB/
  Olfactory Receptor Database of the SenseLab project at Yale University, which is a
  long-term effort to build integrated, multidisciplinary models of neurons and neural
  systems, using the olfactory pathway as a model. The database provides metadata of gene
  and protein sequences of olfactory receptors.
http://www-grap.fagmed.uit.no/GRAP/homepage.html/
  The GRAP database at the University of Tromso contains information on mutants of family
  A GPCRs with detailed description of the ligand-binding and signal transductional
  properties.
http://umber.sbs.man.ac.uk/dbbrowser/gpcrPRINTS/
  A diagnostic bioinformatics resource at the University of Manchester profiling a query
  sequence against the PRINTS fingerprint database to determine most similar families or
  receptor subtypes.
http://bioinformatics.biol.uoa.gr/PRED-GPCR/
http://www.soe.ucsc.edu/research/compbio/gpcr-subclass/
  Additional bioinformatics classifiers of GPCRs exist at the University of Athens and the
  University of California Santa Cruz, and are, respectively, based on Hidden Markov Model
  (HMM) and SVM methods.
http://chembank.med.harvard.edu/
http://pubchem.ncbi.nlm.nih.gov/
  ChemBank at Harvard University and PubChem at the NCBI are cheminformatics
  databases for small molecules and their biological activities. Both systems are supported by
  the NCI’s initiative for chemical genetics.
http://www.ebi.ac.uk/interpro/
  InterPro at EBI is a general bioinformatics database of protein families, domains, and
  functional sites in which identifiable features found in known proteins can be applied to
  unknown protein sequences.
Fig. 15.4-4 Three ligand binding sites model for monoamine-related GPCRs
illustrated by a rhodopsin-based 3D model of the 5-HT1A receptor (left:
extracellular view; right: side view). We recently proposed a three binding site
hypothesis for the molecular recognition of ligands at monoamine GPCRs by
combining: (a) analyses of the architectures of known monoamine GPCR ligands
(see Fig. 15.4-9), (b) analyses of molecular models of the ligand-receptor
interactions, and (c) structural bioinformatics analyses of the sequence
similarities of the three distinct binding regions of "one-site filling" ligand
fragments within the monoamine GPCR family. For the 5-HT1A receptor, which
provided a template for the discussion of other related ligand-GPCR interactions,
mutagenesis studies map three spatially distinct binding regions, which correspond
to the binding sites of the "small, one-site filling" ligands 5-HT (serotonin -
yellow), propranolol (cyan), and 8-hydroxy-N,N-dipropylaminotetralin
(8-OH-DPAT - green), respectively. All three binding sites are located within the
highly conserved 7TM domain of the receptor and overlap at the residue Asp3.32
(D116) in TM3, which constitutes the key anchor site for basic monoamine ligands.
The three distinct binding sites are also reflected by the architectures of known
high-affinity ligands, which cross-link two or three "one-site filling" fragments
around a basic amino group. For further detail see references [51, 53]. Throughout
this chapter the residue positions are number coded according to van Rhee and
Jacobson [32]: The first digit gives the transmembrane domain and the following
number indicates the position of the residue relative to position 50, which is
arbitrarily attributed to the most conserved residue in each helix.
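The residue numbering convention described in the caption can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, and the TM3 reference position used below is not stated in the text but is derived from the single pairing the caption gives (Asp3.32 = D116 in the 5-HT1A receptor).

```python
def generic_number(helix, seq_pos, ref_pos):
    """Label a residue as 'helix.index', where the reference residue is index 50."""
    return f"{helix}.{50 + (seq_pos - ref_pos)}"


def reference_position(seq_pos, generic_index):
    """Sequence position of index 50, derived from one known (position, index) pair."""
    return seq_pos + (50 - generic_index)


# The caption pairs D116 with 3.32, so position 3.50 in this receptor
# lies 18 residues further along the sequence (derived, not quoted).
tm3_ref = reference_position(116, 32)
label = generic_number(3, 116, tm3_ref)
print(tm3_ref, label)
```

Because every helix is anchored on its own most conserved residue, labels of this form can be compared across receptors without a full sequence alignment.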
the possibility for design of selective ligands based on privileged motifs. A broad
spectrum of homology modeling techniques ranging from strict, template-based
methods to de novo prediction methods (e.g., the PREDICT method [57])
is used to build GPCR models. Although some reports suggest that rhodopsin
template-based approaches can be adapted to the entire GPCR repertoire
[58], the underlying sequence alignments of such models, which for some
helices in some subfamilies are not obvious, must be carefully investigated [9].
While most of the time these models neglect the long intracellular loops and
N- and C-terminal domains, some studies emphasized the role of the second
extracellular loop E2 in ligand specificity. In the bovine rhodopsin structure,
the E2 loop, which is bridged via a conserved disulfide link to the residue
Cys3.25 at the top of TM3, covers parts of the central binding crevice in a lidlike
manner. One of the two β-strands that define the fold of the loop directly
contacts the retinal ligand. As the length of the loop varies significantly
within the class A family, general conclusions are difficult. Recently, it was
proposed on the basis of random saturation mutagenesis experiments of the
C5aR that the E2 loop acts as a negative regulator of receptor activation and
stabilizes the nonsignaling receptor conformation in the absence of the agonist
ligand [59]. Also, the E2 loop has been implicated in ligand-ligand allosteric
interactions, which were experimentally investigated by the SCAM approach
[60]. For instance, in the interaction of the muscarinic M1 receptor with the
allosteric modulator gallamine, an acidic sequence segment just before the
loop cysteine residue could be linked to these effects. A potential role of the E2
loop has also been reported in the allosteric effects of amiloride on the action of
antagonists of the α1A and α2A adrenoceptors and dopamine receptors.
Recently, the potential value of GPCR models for in silico screening
applications has become of interest. Using a 3D model of the NK1 receptor
generated by the modeling binding sites including ligand information explicitly
(MOBILE) approach, in combination with 2D and 3D database searches, novel
submicromolar NK1 antagonists were discovered [61]. As shown in another
study [62], models of the dopamine D3, muscarinic M1, and vasopressin V1a
receptors based on the rhodopsin template seemed to be of sufficient accuracy
to be useful (20- to 40-fold enrichment compared to random screening) in
protein-based virtual screening experiments. This procedure used standard
docking software such as DOCK, FlexX, or GOLD and searched for GPCR
antagonists starting from antagonist-bound models shaped by minimizing a
manually docked antagonist in the binding site. The same procedure
was, however, not applicable when a single agonist ligand was used for
the binding site shaping step, indicating that the structural changes that can
be achieved by minimization to expand the binding site are not sufficient
for simulating the conformational changes occurring in receptor activation.
Instead, a multiagonist ligand pharmacophore-based receptor refinement
method needed to be used to generate useful models for agonist virtual
screening. Corroborative findings were described for models generated with
the PREDICT method and using the DOCK software in prospective virtual
screening for the D2, 5-HT1A, 5-HT4, NK1, and CCR3 GPCRs [63].
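The "fold enrichment compared to random screening" quoted above is commonly computed as an enrichment factor: the hit rate in the top-ranked fraction of a docked library divided by the hit rate in the whole library. The sketch below assumes that definition; the function name and the toy ranking are hypothetical and are not taken from [62] or [63].

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the hit rate of the full library.

    ranked_is_active: booleans ordered from best to worst docking score.
    top_fraction: fraction of the ranked library that is inspected (0 < f <= 1).
    """
    n = len(ranked_is_active)
    if n == 0 or not 0 < top_fraction <= 1:
        raise ValueError("need a non-empty ranking and 0 < top_fraction <= 1")
    total_actives = sum(ranked_is_active)
    if total_actives == 0:
        raise ValueError("enrichment is undefined without any actives")
    n_top = max(1, int(round(n * top_fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    # Equivalent to (hits_top / n_top) / (total_actives / n).
    return hits_top * n / (n_top * total_actives)


# Toy library: 1000 compounds, 10 actives, 4 of them ranked in the top 1%.
ranking = [True] * 4 + [False] * 6 + [True] * 6 + [False] * 984
ef = enrichment_factor(ranking, 0.01)
print(ef)
```

An enrichment factor of 1 corresponds to random selection, so values in the range reported in the text indicate that the model ranks known ligands far above chance.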
Given the differences in the length of the intra- and extracellular
loops, of which the latter are expected to contribute to ligand entry, binding, and/or
modulation, especially for the peptide- and protein-binding GPCRs, and given
that the currently available inactive-state rhodopsin structures can, at best,
be a reference for an antagonist state of related class A GPCRs, there remain
many significant unknowns in the understanding of the structure-function
relationship of GPCRs. In this respect, the modeling and indirect structural
experiments of GPCRs also revealed the functional role of structural
microdomains as opposed to simply considering individual residues. An
important microdomain is the so-called DRY domain, which refers to a
conserved sequence patch at the cytoplasmic end of TM3 in class A GPCRs
and which also involves residues in TM2, TM6, and TM7 [64]. The overall
picture common to many class A GPCRs is that residue Arg3.50 is hydrogen
bonded to a carboxylate side chain at position Asp3.49 and to one or two residues
in TM6 equivalent to residues Glu6.30 and Thr6.34 in rhodopsin. Removal of
these interactions often results in constitutive activation of the receptor,
and based on this and the findings of analysis of structural intermediates
of the photocycle of rhodopsin, the emerging theory for receptor activation
suggests a mechanism involving a separation of the TM3 and TM6 domains
together with a twist in TM6, which pulls the third intracellular loop (I3)
into the cell, uncovering residues related to G-protein coupling. Since the
DRY microdomain is not conserved in other GPCR families (exceptions are
some class C GPCRs), it may be concluded that the conformational changes
and signaling mechanisms are not strictly conserved. Importantly, as the
active conformations generated through constitutively activating mutations
and specific agonist ligands seem to be nonidentical, the concept of protean
ligands was defined by Kenakin to explain that each specific ligand-receptor
pair defines a functional entity with distinct signaling and functional properties
[65]. Obviously, this concept raises questions on the generality of the
above-mentioned virtual screening studies for GPCR agonists.
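As a small illustration of how the DRY microdomain can be located in a sequence, the sketch below scans for the exact D-R-Y triplet and reports the position of its arginine, the residue labeled 3.50 in the numbering used in this chapter. The sequence fragment and function name are hypothetical; receptors carrying a variant triplet (for example Glu instead of Asp at 3.49) would need a broader pattern, and a match outside TM3 would of course not be the microdomain discussed here.

```python
import re


def arg_positions_in_dry(sequence):
    """Return the 1-based position of the Arg in every D-R-Y triplet found."""
    seq = sequence.upper()
    # m.start() is the 0-based index of the Asp; the Arg follows it, hence +2
    # to obtain a 1-based position. The lookahead allows overlapping scans.
    return [m.start() + 2 for m in re.finditer(r"(?=DRY)", seq)]


# Hypothetical fragment standing in for the cytoplasmic end of TM3.
toy_segment = "AVDRYLAI"
positions = arg_positions_in_dry(toy_segment)
print(positions)
```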
Regarding class B and class C GPCRs, significantly fewer modeling studies
have been reported. For class B GPCRs, a general two-site model has emerged for
peptide binding [7]. In this mechanism, the C-terminal ligand region binds
the extracellular N-terminal domain of the receptor. This interaction acts as an
affinity trap, promoting the interaction of the N-terminal region of the ligand
with the juxtamembrane 7TM domain of the receptor. Molecular models
were, for instance, generated for the interaction of peptide agonists with the
CFR2 and PTH receptors, putting emphasis on α-helix recognition sites [66, 67].
Nonpeptide ligands bind the juxtamembrane or the N-terminal domain and, in
most cases, allosterically modulate peptide-ligand binding [7]. Also noteworthy
is the modeling work around the allosteric binding sites of the class C Ca2+-sensing
receptor (CaR) [68] and mGluR1 and mGluR5 receptors [69], where
site-directed mutagenesis and rhodopsin-based homology modeling showed a
novel antagonist binding site within the 7TM bundles, clearly separated from
the agonist binding site located in the N-terminal domains of these receptors.
Oligomerization of GPCRs appears to further contribute to the complexity
of the picture [70, 71], and recently a structural hypothesis was provided using
molecular modeling to describe how the G-protein transducin docks on to
dimer and tetramer oligomeric states of rhodopsin, revealing structural details
of this critical interface in the signal transduction process [72]. Biophysical
studies of the leukotriene B4 BLT1 receptor reconstituted with a heterotrimeric
G-protein, using a combination of mass spectrometry after chemical cross-linking
together with neutron scattering in solution, support this
hypothesis by providing evidence for the overall assembly of a pentameric
complex formed by two BLT1 units and one trimeric G-protein [73].
Ultimately, it will require high-resolution structures of multiple receptors
bound to multiple ligands including agonist, inverse agonist, and antagonist,
coupled to G-proteins and other modulators to understand fully the confor-
mational dynamics of GPCRs. The development of systematic approaches for
X-ray and nuclear magnetic resonance (NMR) analysis of GPCR structures is
hence currently a major scientific challenge, which requires further progress in
the expression, purification, and crystallization of GPCRs and their interacting
proteins [74].
[Figure: 2-aryl-indole structures with the potencies shown in the original: an NPY ligand (IC50 = 0.8 nM), an NK ligand (IC50 = 0.8 nM), and two chemokine receptor (CCR) ligands (IC50 = 1190 nM and 920 nM).]
Fig. 15.4-5 Examples of GPCR active compounds based on the 2-aryl-indoles
privileged scaffold identified from a focused combinatorial library at Merck.
Screening of the library against several GPCRs led to the discovery of NPY5, NK1,
chemokine CCR3/CCR5, serotonin (5-HT), and somatostatin (SST) receptor
antagonists [91].
needed in the context of generating knowledge from HTS data [92]. On the
basis of the molecular framework approach developed by Bemis and Murcko
[93], we recently initiated a systematic analysis using reference compound
and target information. Using the framework analysis as implemented in the
Scitegic Pipeline Pilot software, we designed a data pipelining protocol that
generates frequency analysis based on the input of the various reference sets.
The approach is illustrated in Fig. 15.4-6 for the monoamine GPCRs.
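The frequency-analysis step of such a protocol can be sketched independently of any particular software. The example below assumes that each compound has already been reduced to a framework label (for instance by a Bemis-Murcko style decomposition performed elsewhere); the labels, set names, and function name are hypothetical and do not reproduce the Pipeline Pilot protocol described above.

```python
from collections import Counter


def framework_frequencies(reference_sets):
    """Map each framework to its relative frequency within every reference set.

    reference_sets: {set_name: [(compound_id, framework_label), ...]}
    Returns {framework_label: {set_name: fraction_of_that_set}}.
    """
    table = {}
    for set_name, compounds in reference_sets.items():
        counts = Counter(framework for _compound_id, framework in compounds)
        total = sum(counts.values())
        for framework, count in counts.items():
            table.setdefault(framework, {})[set_name] = count / total
    return table


# Hypothetical reference sets with precomputed framework labels.
reference_sets = {
    "monoamine_gpcr_ligands": [
        ("cpd1", "aryl-piperazine"), ("cpd2", "aryl-piperazine"),
        ("cpd3", "2-aryl-indole"), ("cpd4", "aminotetralin"),
    ],
    "background_compounds": [
        ("cpd5", "aryl-piperazine"), ("cpd6", "biphenyl"),
        ("cpd7", "biphenyl"), ("cpd8", "benzene"), ("cpd9", "benzene"),
    ],
}
freq = framework_frequencies(reference_sets)
for framework, per_set in sorted(freq.items()):
    print(framework, per_set)
```

Comparing the per-set fractions highlights frameworks that are over-represented among ligands of a target family relative to a background set, which is the kind of signal a privileged-scaffold analysis looks for.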
A different type of fragment-based design method called thematic analysis
was developed by researchers at Biofocus for the design of focused class
A GPCR libraries [77]. This knowledge-based method is comparable to a
method developed at Novartis, which is illustrated in Section 4.4.3 [53].
SARs were analyzed in detail across the whole class A GPCR family, and
family-activity relationships were used to develop a new classification process
on the basis of the pairing of sequence themes and ligand structural motifs.
A sequence theme is a consensus collection of amino acids within the central
binding cavity and a motif is a specific structural element binding to such
a particular microenvironment of the binding site. The analysis resulted in
a compilation of themes and motifs that, to date, are used at Biofocus to
generate focused discovery libraries and to increase the lead optimization
efficiency for these targets. The individual compound libraries are targeting
15.4 The GPCR - 7TM Receptor Target Family
15.4.4
Applications and Practical Examples
Fig. 15.4-7 Cluster analysis of the expression of 100 randomly selected mouse
endo-GPCR genes in 17 peripheral tissues and 9 different brain regions. The
genes were analyzed individually by RT-PCR as shown and the intensity of the
observed bands was determined by scanning. Each gene is represented by a
single row of colored boxes with four different expression levels: no
expression, blue; low expression, purple; moderate expression, dark red;
strong expression, pure red. Three groups of endo-GPCRs with broadly related
profiles were observed. In the first group (a) genes were expressed primarily
in peripheral tissues. Seven of these genes were expressed exclusively in
peripheral tissues and not in the brain. The second group (b) contained genes
expressed primarily in brain. Of these 41 genes, 14 were solely expressed in
brain and not in peripheral tissues. In the third group (c), the genes were
broadly expressed in the brain and throughout the periphery. Figure
reproduced with permission from [36].
during the antagonist assay phase. Modulators are detected by using a small
agonist concentration in the second phase (Fig. 15.4-8(b)) and may be devoid
of agonist properties. Antagonists appear clearly in the third phase,
following an injection of a higher GABA concentration, and are characterized
by a lack of intrinsic activity in the first phase (Fig. 15.4-8(c)).
Multiplex assays do not achieve the compound throughputs possible with
single measurement assays; however, they produce much richer information
already in primary screening, which is invaluable for compound categorization
and prioritization by the medicinal chemists. A further advantage of such assays
[Structures shown: serotonin (5-HT agonist), propranolol (β antagonist),
8-OH-DPAT (5-HT1A partial agonist), RO-16814 (β agonist), and Kissei-1
(D antagonist).]
15.4.5
Future Development
signaling but does not bind the ligand. It was recognized that new compound
screening strategies, allowing the detection of ligand binding or function only
by a heterodimer pair in the presence of the corresponding homodimers, are
required to allow rapid and effective identification of ligands with these
characteristics. Only with such ligands at hand will it be possible to tease
out the physiological relevance of GPCR heterodimerization [71]. The opioid
agonist ligand 6’-guanidinonaltrindole (6’-GNTI) is the first example of such
a ligand. 6’-GNTI has the unique property of selectively activating only
δ/κ-opioid receptor heterodimers but not homomers [122]. Importantly, 6’-GNTI
is an analgesic, thereby demonstrating that opioid receptor heterodimers are
indeed functionally relevant in vivo. However, 6’-GNTI induces analgesia only
when it is administered in the spinal cord but not in the brain, suggesting
that the organization of heterodimers is tissue specific. Other studies are
indirect and may reflect cross-talk between the signaling pathways at a level
downstream of receptor activation. The ability of β-blockers to interfere with
angiotensin AT1-mediated signaling, and the ability of the AT1 receptor
blocker valsartan to reduce catecholamine-induced elevation in the heart rate,
may indicate functional angiotensin AT1-adrenoceptor interactions in vivo.
The discovery that some GPCRs appear to function in preformed and
dynamic complexes with other signal transduction and scaffolding proteins
opens many interesting possibilities for drug discovery. For instance, targeting
the postsynaptic density (PSD-95) and Homer scaffolding proteins might
result in a new manner to modulate receptor activity [123]. PSD-95 is known
to function in synaptic neurotransmission and plasticity by enhancing or
depressing the synaptic strength depending on the frequency of neuronal
firing. The protein is a multiadapter, which binds via its PDZ domain
specific GPCRs (e.g., 5-HT2A, 5-HT2C) and ion channels (e.g.,
N-methyl-D-aspartate (NMDA)) and enables, together with other protein-protein
interactions, the spatial organization of complex microarchitectures jointly
with the cytoskeleton. Similarly, the Homer proteins, which play a role in
glutamatergic synaptic transmission, are composed of an N-terminal
enabled VASP homology type 1 (EVH1) domain, interacting with GPCRs
(e.g., mGluR1, mGluR5), ion channels (e.g., IP3 or ryanodine Ca2+ receptor
channels, transient receptor potential cation channel 1 and 2) and other
proteins, and a C-terminal coiled-coil domain that enables dimerization and
complex formation. It remains to be seen how general or specific these
intracellular GPCR modulator mechanisms are. In addition, small-molecule
compounds able to disrupt or reinforce these interactions are needed to further
understand their physiological importance.
A new trend is also the therapeutic evaluation of monoclonal antibodies
against GPCRs. Although small molecule drugs seem to be the preferred
agents, recent success stories targeting the CCR5 receptor against HIV entry,
or the thyroid-stimulating hormone (TSH) receptor in Graves' disease, show
that this route is also feasible.
15.4.6
Conclusions
Acknowledgments
References
agonist shows in vivo relevance of G protein-coupled receptor dimers,
Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 9050-9055.
123. J. Bockaert, L. Fagni, A. Dumuis, P. Marin, GPCR interacting proteins
(GIP), Pharmacol. Ther. 2004, 103, 203-221.
124. W.E. Evans, H.L. McLeod, Pharmacogenomics-drug disposition, drug
targets, and side effects, N. Engl. J. Med. 2003, 348, 538-549.
125. K.M. Small, D.W. McGraw, S.B. Liggett, Pharmacology and physiology of
human adrenergic receptor polymorphisms, Annu. Rev. Pharmacol. Toxicol.
2003, 43, 381-411.
126. E.A. Hallem, A. Nicole Fox, L.J. Zwiebel, J.R. Carlson, Olfaction:
mosquito receptor for human-sweat odorant, Nature 2004, 427, 212-213.
15.5
Drugs Targeting Protein-Protein Interactions
Patrick Chène
Outlook
15.5.1
Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
needed, however, because this large number of possible new targets may not
15.5.2
The Diversity of Protein-Protein Interfaces
revealing a higher lipophilicity for this part of the contact region. There is a
correlation between the number of recognition patches and the size of the
interface [7, 14]. The larger the interface, the more hot spots are present.
However, in most cases, only one hot spot is present at the interface, and on
average it buries 1560 ± 340 Å² of surface upon binding. In interfaces with
multiple recognition patches, one of them is generally larger and it has a size
similar to that of the hot spots found in single-patch interfaces. The presence
of recognition patches at protein interfaces is interesting for drug discovery.
Compounds that interact with these hot spots should prevent interaction
because a large part of the binding energy is concentrated in these areas. Since
the hot spots are smaller than the full interface, it might be easier
to identify low-molecular-weight compounds - comparable in size to enzyme
inhibitors - that inhibit interaction. By contrast, if the binding energy were
equally distributed over the entire interface, much larger molecules, with a
lower likelihood of success as drug-development candidates, would have to be
designed.
The shape of the interface is another important parameter for drug discovery
because it is more difficult to obtain potent inhibitors for flat interfaces than
for interfaces that contain well-defined cavities (pockets). The less flat the
interface between two proteins, the greater the tendency of one partner to be
buried and to form a more stable complex. The heterocomplexes have more
planar interfaces than homodimers, and permanent heterocomplexes have
more twisted contact surfaces than nonpermanent ones [8]. This suggests
that the most attractive complexes for drug discovery - the nonpermanent
complexes (see above) - have rather flat interfaces. The presence of cavities
(pockets) at the contact region should therefore be looked at very carefully
during the evaluation of a protein-protein interaction.
Even if an interface contains cavities, they must be suitable for drug
discovery. Of course, they must be large enough to accommodate inhibitors,
but their shape complementarity is also important. It might be more difficult
to generate potent competitive inhibitors if the two interacting chains are
closely packed and make an extensive number of direct interactions. In
contrast, if within the cavity, the shape complementarity between the two
chains is low, the interacting subunits may only make a limited number of
direct interactions. For such cavities, it might be easier to improve the potency
of the inhibitors. A potent inhibitor should contain chemical groups that, upon
binding to the target protein, mimic the key interactions (the most important
for ΔG) made by the competing subunit, as well as chemical groups that make
new interactions with the target protein. The creation of these additional
contacts between the inhibitor and the target protein leads to a favorable
15.5.3
A Proposed Decision Tree to Select Interfaces for Drug Discovery
[Decision tree figure. Recoverable labels: interface to evaluate; structure;
hydrophobicity; complementarity.]
Fig. 15.5-2 Competitive inhibition. The inhibitor (I) binds to the target
protein P1, blocking its association with protein P2. I50 corresponds to the
concentration of inhibitor required to inhibit/inactivate the P1P2 complex by
50%. Note the influence of [S] on I50. Cheng and Prusoff have published a
detailed analysis on the relationship between IC50 and inhibition of
enzymes [22].
15.5.4
Experimental Validation of the Selected Interface
All the selection criteria presented in the decision tree in Fig. 15.5-1 are
general, and many protein interfaces will only fulfill some of them. In these
cases - and also for interfaces that meet all the decision tree criteria - an
experimental study of the interface should be carried out before starting drug
discovery activities. This experimental validation should enable a good level of
confidence to be obtained on the druggability of the selected interface.
A powerful way of performing this experimental validation is to combine
site-directed mutagenesis and peptide-binding experiments. Site-directed
mutagenesis is used to demonstrate the role of selected residues in the
interaction, while peptides will help in mapping the binding site and also
in defining the importance of key amino acids. The synthesis of peptides
containing nonnatural amino acids can also be used to create new contacts
with the targeted subunit. This should help in validating some optimization
15.5.5
Screening Techniques, Compound Libraries, and Targets
Since the goal of any drug discovery programme that deals with a
protein-protein interaction is to identify low-molecular-weight compounds
that bind to a well-defined pocket, the technologies and compound libraries
used to identify enzyme inhibitors can also be used to identify protein-protein
interaction inhibitors.
Various assays are used to identify competitive inhibitors of protein-protein
interactions, but the ones in which the inhibition of the complex is directly
measured - competition assays - are the most commonly used. Several assay
formats exist: enzyme-linked immunosorbent assay (ELISA), fluorescence
polarization, fluorescence resonance energy transfer, and others. These
assays are designed in such a way that they use either the two full-
length proteins, only their interacting domains, or even, when possible,
peptides that mimic the binding region. One must be very cautious with
this type of assay when determining IC50s. The measured potency of competitive
inhibitors depends on the amount of the competing protein present in
the assay (Fig. 15.5-2). The amount of competing protein present in the
assay may vary between laboratories and even between different protein
batches (change in specific activity). To obtain an accurate estimate of the
binding properties of the inhibitors, their Kd should be measured. The data
obtained with the competition assay should therefore be complemented with
the Kd measurements obtained, for example, by isothermal calorimetry.
Calorimetric measurements also provide valuable information about the
energy of the interaction, which can be used to further optimize the
compounds (e.g., to generate more enthalpy-driven or entropy-driven
compounds [24]).
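The dependence of IC50 on the amount of competing protein can be illustrated with a short numerical sketch. It uses the classical Cheng-Prusoff relationship, which was derived for enzyme inhibition [22]; applying it to a protein-protein competition assay is an assumption that holds only for simple 1:1 competitive binding with free concentrations close to the total concentrations. The function names, the variable names, and the numerical values are hypothetical.

```python
def ic50_from_ki(ki, competitor_conc, kd_competitor):
    """Expected IC50 for a competitive inhibitor (Cheng-Prusoff form).

    ki              -- inhibition constant of the inhibitor for P1
    competitor_conc -- concentration of the competing protein P2
    kd_competitor   -- dissociation constant of the P1-P2 complex
    All three values must be given in the same concentration unit.
    """
    if ki <= 0 or kd_competitor <= 0 or competitor_conc < 0:
        raise ValueError("constants must be positive, concentration non-negative")
    return ki * (1.0 + competitor_conc / kd_competitor)


def ki_from_ic50(ic50, competitor_conc, kd_competitor):
    """Convert a measured IC50 back to Ki under the same assumptions."""
    if ic50 <= 0 or kd_competitor <= 0 or competitor_conc < 0:
        raise ValueError("constants must be positive, concentration non-negative")
    return ic50 / (1.0 + competitor_conc / kd_competitor)


# Hypothetical example: the same inhibitor (Ki = 0.1 uM) tested in two
# laboratories that use different amounts of competing protein (Kd = 0.5 uM).
ki = 0.1
kd = 0.5
ic50_lab_a = ic50_from_ki(ki, competitor_conc=0.5, kd_competitor=kd)
ic50_lab_b = ic50_from_ki(ki, competitor_conc=5.0, kd_competitor=kd)
print(f"IC50 lab A: {ic50_lab_a:.2f} uM, IC50 lab B: {ic50_lab_b:.2f} uM")
```

In this sketch the apparent potency differs by more than fivefold between the two assays although the affinity of the inhibitor is unchanged, which is why a directly measured Kd is the more reliable figure for comparing compounds.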
15.5.6
An Example: The Design of Inhibitors of the p53-hdm2 Interaction
Fig. 15.5-3 Regulation of p53 by hdm2. The tumor suppressor p53 is a
tetrameric transcription factor. Under various stress conditions such as DNA
damage, activation of various oncogenes, or hypoxia, p53 is activated and
binds to DNA. Depending on the cell line and/or the cellular stress, p53
induces either a cell-cycle arrest or apoptosis. p53 is also able to mediate
other biological responses such as senescence. hdm2 is a negative regulator of
p53. Upon binding to p53 it inhibits its transcriptional activity, promotes
its degradation, and favors its export from the nucleus. Therefore, in the
presence of hdm2 the tumor suppressor activity of p53 is inhibited.
Fig. 15.5-4 The structure of p53 (residues 17 to 29) in complex with hdm2
(residues 25 to 109) [40]. (a) The surface of hdm2 is represented in white,
the p53-binding site in green, and the p53 peptide in red. The lateral chains
of p53 Phe19, Trp23, and Leu26 are shown. (b) p53 Leu22 has been replaced by a
tyrosine residue and the lateral chain is manually located in the structure of
the p53-hdm2 complex. The backbone of the p53 peptide is shown in gray and
hdm2 Lys94 is represented. (c) The different hdm2 residues (Leu57, Phe86,
Ile99, and Ile103) surrounding p53 Trp23 are indicated and their van der Waals
surface is represented in green.
Peptide  Sequence                                                           IC50 (µM)
1        Ac-Gln-Glu-Thr-Phe19-Ser-Asp-Leu-Trp23-Lys-Leu-Leu26-Pro-NH2       8.7
2        Ac-Met-Pro-Arg-Phe19-Met-Asp-Tyr-Trp23-Glu-Gly-Leu26-Asn-NH2       0.3
3        Ac-Phe19-Met-Asp-Tyr-Trp23-Glu-Gly-Leu26-NH2                       8.9
4        Ac-Phe19-Met-Aib-Tyr-Trp23-Glu-Ac3c-Leu26-NH2                      2.2
5        Ac-Phe19-Met-Aib-Pmp-6-Cl-Trp23-Glu-Ac3c-Leu26-NH2                 0.005
6-Cl-Trp: 6-chloro-tryptophan.
was not as good as predicted by the structural analysis of the interface but
also obtaining potent low-molecular-weight inhibitors was not an achievable
goal. However, scientists at Hoffmann-La Roche recently demonstrated the
feasibility of inhibiting the p53-hdm2 interaction with low-molecular-weight
compounds. Since the publication of the first reports on peptidic inhibitors
of the p53-hdm2 interaction, it took about 10 years to obtain such results!
By screening a library of synthetic chemicals, Vassilev et al. were able to
identify cis-imidazolines (11 - Fig. 15.5-5), which they optimized for potency
and specificity [60]. These compounds bind at the p53-binding site on hdm2,
and their different substituents mimic the key contacts made by p53 Phe19,
Trp23, and Leu26 (Fig. 15.5-6). Furthermore, the halogen (Cl or Br) present
on one of their phenyl groups mimics the chlorine atom of 6-Cl-Trp in peptide
5. Finally, these molecules are built around a heterocycle and have a rigid
conformation that minimizes the entropic cost of binding. Their
potency (IC50), measured in a competition assay, is in the 100 to 300 nM
Fig. 15.5-6 Binding mode of cis-imidazoline and p53 peptide. The structures
of the cis-imidazoline-hdm2 complex [60] and the p53-hdm2 complex [40] have
been superimposed. Only the bound cis-imidazoline and the p53 peptide (in red)
are represented. The lateral chains of p53 Phe19, Trp23, and Leu26 are shown.
range. These compounds are active in various tumor cells (IC50 between 1
and 2 µM), in which they induce the activation of the p53 pathway. More
importantly, they show efficacy as single agents in a tumor model in mice. One
of these compounds (11 - Fig. 15.5-5) given orally at a dose of 200 mg kg⁻¹
twice daily for 20 days induces 90% inhibition of tumor growth (i.e., of cells
overexpressing hdm2). This treatment does not induce toxicity as measured by
body weight and necropsy. These data are highly encouraging,
and it will be very exciting to see the effect of these molecules - or of their
follow-up - in the clinic.
15.5.7
Conclusions
and the only way to decide whether an interface is a “good” or a “bad” target for
drug discovery is to carry out a careful analysis of its structure before starting
any drug discovery activity. This should help in selecting better targets, thereby
reducing the risk of investing time and resources in programmes that do not
deliver the expected molecules. The p53-hdm2 interaction is one example of
the interfaces that have been successfully targeted with low-molecular-weight
compounds (see also Table 15.5-1). Many other protein-protein interactions
are under investigation, and it is likely that new inhibitors of protein-protein
interaction will be described in the future.
References
1. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov.
2002, 1, 727-730.
2. A.R. Fersht, Enzyme Structure and Mechanism, 2nd ed., Freeman, New York,
1985.
3. P. Chene, The ATPases: a new family for a family-based drug design
approach, Expert Opin. Ther. Targets 2003, 7, 453-461.
4. S. Li, C.M. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P.O.
Vidalain, J.D. Han, A. Chesnau, T. Hao, D.S. Goldberg, A map of interactome
network of the metazoan C. elegans, Science 2004, 303, 540-543.
5. B. Kleizen, I. Braakman, Protein folding and quality control in the
endoplasmic reticulum, Curr. Opin. Cell Biol. 2004, 16, 343-349.
6. I.M.A. Nooren, J.M. Thornton, Diversity of protein-protein interactions,
EMBO J. 2003, 22, 3486-3492.
7. R.P. Bahadur, P. Chakrabarti, F. Rodier, J. Janin, Dissecting subunit
interfaces in homodimeric proteins, Proteins 2003, 53, 708-719.
8. S. Jones, J.M. Thornton, Principles of protein-protein interactions, Proc.
Natl. Acad. Sci. U.S.A. 1996, 93, 13-20.
9. L. Lo Conte, C. Chothia, J. Janin, The atomic structure of protein-protein
recognition sites, J. Mol. Biol. 1999, 285, 2177-2198.
10. S.J. Wodak, J. Janin, Structural basis of macromolecular recognition, Adv.
Protein Chem. 2003, 61, 9-73.
11. I.M. Nooren, J.M. Thornton, Structural characterisation and functional
significance of transient protein-protein interactions, J. Mol. Biol. 2003,
325, 991-1018.
12. N. Brooijmans, K.A. Sharp, I.D. Kuntz, Stability of macromolecular
complexes, Proteins 2002, 48, 645-653.
13. W.L. DeLano, Unraveling hot spots in binding interfaces: progress and
challenges, Curr. Opin. Struct. Biol. 2002, 12, 14-20.
14. P. Chakrabarti, J. Janin, Dissecting protein-protein recognition sites,
Proteins 2002, 47, 334-342.
15. J. Janin, Wet and dry interfaces: the role of solvent in protein-protein
and protein-DNA recognition, Structure 1999, 7, R277-R279.
16. C.J. Tsai, S.L. Lin, H.J. Wolfson, R. Nussinov, Protein-protein
interfaces: architectures and interactions in protein-protein interfaces and
in protein cores. Their similarities and differences, Crit. Rev. Biochem. Mol.
Biol. 1996, 31, 127-152.
17. T.A. Larsen, A.J. Olson, D.S. Goodsell, Morphology of protein-protein
interfaces, Structure 1998, 6, 421-427.
18. Y. Ofran, B. Rost, Analysing six types of protein-protein interfaces,
J. Mol. Biol. 2003, 325, 377-387.
19. C. Cole, J. Warwicker, Side-chain conformational entropy at
protein-protein interfaces, Protein Sci. 2002, 11, 2860-2870.
20. J.A. Wells, Binding in the growth hormone receptor, Proc. Natl. Acad.
Sci. U.S.A. 1996, 93, 7-12.
21. T.R. Gadek, J.B. Nicholas, Small molecule antagonists of proteins,
Biochem. Pharmacol. 2003, 65, 1-8.
22. Y.C. Cheng, W.H. Prusoff, Relationship between the inhibition constant
(Ki) and the concentration of inhibitor which causes 50 per cent inhibition
(IC50) of an enzymatic reaction, Biochem. Pharmacol. 1973, 22, 3099-3108.
23. J.J. Schwartz, S. Zhang, Peptide-mediated cellular delivery, Curr. Opin.
Mol. Ther. 2000, 2, 162-167.
24. A. Velazquez-Campoy, I. Luque, E. Freire, The application of
thermodynamic methods in drug design, Thermochim. Acta 2001, 380, 217-227.
25. K.H. Vousden, X. Lu, Live or let die: the cell's response to p53, Nat.
Rev. Cancer 2002, 2, 594-604.
26. T. Soussi, K. Dehouche, C. Beroud, p53 website and analysis of p53 gene
mutations in human cancer: forging a link between epidemiology and
carcinogenesis, Hum. Mutat. 2000, 15, 105-113.
27. S.M. Picksley, D.P. Lane, The p53-mdm2 autoregulatory feedback loop: a
paradigm for the regulation of growth control by p53?, BioEssays 1993, 15,
689-690.
28. X. Wu, J.H. Bayle, D. Olson, A.J. Levine, The p53-mdm-2 autoregulatory
feedback loop, Genes Dev. 1993, 7, 1126-1132.
29. J. Momand, G.P. Zambetti, D.C. Olson, D. George, A.J. Levine, The mdm-2
oncogene product forms a complex with the p53 protein and inhibits
p53-mediated transactivation, Cell 1992, 69, 1237-1245.
30. R. Honda, H. Yasuda, Activity of MDM2, a ubiquitin ligase, toward p53 or
itself is dependent on the RING finger domain of the ligase, Oncogene 2000,
19, 1473-1476.
31. S. Fang, J.P. Jensen, R.L. Ludwig, K.H. Vousden, A.M. Weissman, Mdm2 is a
RING finger-dependent ubiquitin protein ligase for itself and p53, J. Biol.
Chem. 2000, 275, 8945-8951.
32. J. Roth, M. Dobbelstein, D.A. Freedman, T. Shenk, A.J. Levine,
Nucleo-cytoplasmic shuttling of the hdm2 oncoprotein regulates the levels of
the p53 protein via a pathway used by the human immunodeficiency virus rev
protein, EMBO J. 1998, 17, 554-564.
33. J. Momand, D. Jung, S. Wilczynski, J. Niland, The MDM2 gene amplification
database, Nucleic Acids Res. 1998, 26, 3453-3459.
34. B. Eymin, S. Gazzeri, C. Brambilla, E. Brambilla, Mdm2 overexpression and
p14ARF inactivation are two mutually exclusive events in primary human lung
tumors, Oncogene 2002, 21, 2750-2761.
35. D. Polsky, B.C. Bastian, C. Hazan, K. Melzer, J. Pack, A. Houghton,
K. Busam, C. Cordon-Cardo, I. Osman, hdm2 protein overexpression, but not
amplification, is related to tumorigenesis of cutaneous melanoma, Cancer Res.
2001, 61, 7642-7646.
36. J.D. Oliner, J.A. Pietenpol, S. Thiagalingam, J. Gyuris, K.W. Kinzler,
B. Vogelstein, Oncoprotein mdm2 conceals the activation domain of tumour
suppressor p53, Nature 1993, 362, 857-860.
37. J. Chen, V. Marechal, A.J. Levine, Mapping of the p53 and mdm-2
interaction domains, Mol. Cell. Biol. 1993, 13, 4107-4114.
38. J. Lin, J. Chen, B. Elenbaas, A.J. Levine, Several hydrophobic amino
acids in the p53 amino-terminal domain are required for transcriptional
activation, binding to mdm-2 and the adenovirus 5 E1B 55-kD protein, Genes
Dev. 1994, 8, 1235-1246.
39. S.M. Picksley, B. Vojtesek, A. Sparks, D.P. Lane, Immunochemical analysis
of the interaction of p53 with mdm2; fine mapping of the mdm2 binding site on
p53 using synthetic peptides, Oncogene 1994, 9, 2523-2529.
40. P.H. Kussie, S. Gorina, V. Marechal, B. Elenbaas, J. Moreau, A.J. Levine,
N.P. Pavletich, Structure of the mdm2 oncoprotein bound to the p53 tumor
suppressor transactivation domain, Science 1996, 274, 948-953.
41. M.J.J. Blommers, G. Fendrich, C. Garcia-Echeverria, P. Chene, On the
interaction between p53 and mdm2: transfer NOE study of p53-derived peptide
ligated to mdm2, J. Am. Chem. Soc. 1997, 119, 3425-3426.
42. M. Uesugi, G.L. Verdine, The α-helical FXXΦΦ motif in p53: TAF interaction
and discrimination by mdm2, Proc. Natl. Acad. Sci. U.S.A. 1999, 96,
14801-14806.
43. Z. Lai, K.R. Auger, C.M. Manubay, R.A. Copeland, Thermodynamics of p53
binding to hdm2(1-126): effects of phosphorylation and p53 peptide length,
Arch. Biochem. Biophys. 2000, 381, 278-284.
44. O. Schon, A. Friedler, M. Bycroft, S.M.V. Freund, A.R. Fersht, Molecular
mechanism of the interaction between mdm2 and p53, J. Mol. Biol. 2002, 323,
491-501.
45. R. Stoll, C. Renner, S. Hansen, S. Palme, C. Klein, A. Belling,
W. Zeslawski, M. Kamionka, T. Rehm, P. Muhlhahn, R. Schumacher, F. Hesse,
B. Kaluza, W. Voelter, R.A. Engh, T.A. Holak, Chalcone derivatives antagonize
interactions between the human oncoprotein MDM2 and p53, Biochemistry 2001,
40, 336-344.
46. R.A. Laskowski, SURFNET: a program for visualizing molecular surfaces,
cavities and intramolecular interactions, J. Mol. Graph. 1995, 13, 323-330.
47. V. Bottger, A. Bottger, S.F. Howard, S.M. Picksley, P. Chene,
C. Garcia-Echeverria, H.K. Hochkeppel, D.P. Lane, Identification of novel mdm2
binding peptides by phage display, Oncogene 1996, 13, 2141-2147.
48. A. Bottger, V. Bottger, C. Garcia-Echeverria, P. Chene, H.K. Hochkeppel,
W. Sampson, K. Ang, S.F. Howard, S.M. Picksley, D.P. Lane, Molecular
characterization of the hdm2-p53 interaction, J. Mol. Biol. 1997, 269,
744-756.
49. C. Garcia-Echeverria, P. Chene, M.J. Blommers, P. Furet, Discovery of
potent antagonists of the interaction between human double minute 2 and tumor
suppressor p53, J. Med. Chem. 2000, 43, 3205-3208.
50. R. Banerjee, G. Basu, P. Chene, S. Roy, Aib-based peptide backbone as
scaffolds for helical peptide mimics, J. Pept. Res. 2002, 60, 88-94.
51. A. Bottger, V. Bottger, A. Sparks, W.L. Liu, S.F. Howard, D.P. Lane,
Design of a synthetic Mdm2-binding mini protein that activates the p53
response in vivo, Curr. Biol. 1997, 7, 860-869.
52. C. Wasylyk, R. Salvi, M. Argentini, C. Dureuil, I. Delumeau, J. Abecassis,
L. Debussche, B. Wasylyk, p53 mediated death of cells overexpressing MDM2 by
an inhibitor of MDM2 interaction with p53, Oncogene 1999, 18, 1921-1934.
53. P. Chene, J. Fuchs, J. Bohn, C. Garcia-Echeverria, P. Furet, D. Fabbro, A
small synthetic peptide, which inhibits the p53-hdm2 interaction, stimulates
the p53 pathway in tumour cell lines, J. Mol. Biol. 2000, 299, 245-253.
54. P. Chene, J. Fuchs, I. Carena, P. Furet, C. Garcia-Echeverria, Study of
the cytotoxic effect of a peptidic inhibitor of the p53-hdm2 interaction in
tumour cells, FEBS Lett. 2002, 529, 293-297.
55. J.W. Harbour, L. Worley, D. Ma, M. Cohen, Transducible peptide therapy for
uveal melanoma and retinoblastoma, Arch. Ophthalmol. 2002, 120, 1341-1346.
56. J. Zhao, M. Wang, J. Chen, A. Luo, X. Wang, M. Wu, D. Yin, Z. Liu, The
initial evaluation of non-peptidic small-molecule HDM2 inhibitors based on
p53-HDM2 complex structure, Cancer Lett. 2002, 183, 69-77.
57. P.S. Galatin, D.J. Abraham, A nonpeptidic sulfonamide inhibits the
p53-mdm2 interaction and activates p53-dependent transcription in
mdm2-overexpressing cells, J. Med. Chem. 2004, 47, 4163-4165.
58. S.J. Duncan, S. Gruschow, D.H. Williams, C. McNicholas, R. Purewal,
M. Hajek, M. Gerlitz, S. Martin, S.K. Wrigley, M. Moore, Isolation and
structure elucidation of chlorofusin, a novel p53-mdm2 antagonist from a
Fusarium sp, J. Am. Chem. Soc. 2001, 123, 554-560.
59. N. Majeux, M. Scarsi, A. Caflisch, Efficient electrostatic model for
protein-fragment docking, Proteins 2001, 42, 256-268.
60. L.T. Vassilev, B.T. Vu, B. Graves, D. Carvajal, F. Podlaski, Z. Filipovic,
N. Kong, U. Kammlott, C. Lukacs, C. Klein, N. Fotouhi, E.A. Liu, In vivo
activation of the p53 pathway by small-molecule antagonists of mdm2, Science
2004, 303, 844-848.
16
Prediction of ADMET Properties
Ulf Norinder and Christel A. S. Bergström
Outlook
This chapter describes some of the approaches and techniques used currently
to derive in silico models for the prediction of absorption, distribution,
metabolism, elimination/excretion, and toxicity (ADMET) properties. The
chapter also discusses some of the fundamental requirements for deriving
statistically sound and predictive ADMET relationships as well as some of
the pitfalls and problems encountered during these investigations. It is
the intention of the authors to make the reader aware of some of the
challenges involved in deriving useful in silico ADMET models for drug
development.
16.1
Introduction
these determinations, one to two candidate drugs (CDs) are selected from the
library for further development (Fig. 16-1).
The increase in new structures generated each year has not resulted in
the expected increase in the number of new drugs marketed annually. Among
other factors, this has been attributed to poor pharmacokinetic (PK)
properties of the CDs, and as much as 40% of the attrition of CDs has been
related to poor PK profiles [1]. Given this, reliable screening filters for
factors such as absorption, distribution, metabolism, elimination/excretion,
and toxicity (ADMET) are highly desirable [2-4]. Indeed, the considerable
effort that has been invested in the development of experimental
absorption filters, for example, cell monolayers for permeability
determinations [5, 6] and the turbidimetric method for solubility measurements [7],
Fig. 16-2 Reasons for attrition in drug development in the years 1991 and
2000. The following reasons were observed: clinical safety (black), efficacy
(red), formulation (green), PK/bioavailability (blue), commercial (yellow),
toxicology (gray), cost of goods (purple), and unknown/others (white). Note
that formulation and cost of goods were only observed as reasons for attrition
in 2000 and not 1991. Further, PK/bioavailability profiles of new drugs were
largely improved during this decade. Finally, commercial reasons for attrition
were more than threefold higher in 2000 than in 1991 [8].
16.1.1
Drug Solubility
16.1.2
Intestinal Permeability
A compound can permeate the intestinal wall by using the paracellular route
(between the cells) or the transcellular route (through the cells) by passive
diffusion. To generalize, small, hydrophilic, and/or charged compounds, which
cannot permeate the lipophilic cell membrane, diffuse through the aqueous
pores. However, the pores cover less than 1% of the intestinal surface [ll],and
this in concert with the solute restriction caused by the tight junctions of the
pores largely limits the contribution of the paracellular pathway. Compounds
that show a reasonable hydrophobicity (log D p H 7 . 4 of0-2) and intermediate size
(up to a molecular weight of 500) are assumed to permeate the intestinal wall by
passive transcellular diffusion. Even though the transport by the transcellular
route seems to be a rather complex process, demanding partitioning between
lipophilic and hydrophilic milieus several times, the vast majority of druglike
compounds utilize this pathway. Larger molecules with a large number of
hydrogen bond donors and acceptors, sometimes in combination with a high
lipophilicity value, may be utilizing active processes and transport proteins to
get through the cells. However, the latter properties also increase the risk that
the compound might be transported by efflux proteins, resulting in a secretion
of the compound back to the intestinal lumen. Such efflux results in a lower
drug concentration reaching the blood circulation and the site of action.
To conclude, two of the main factors influencing intestinal drug absorption
are aqueous solubility and intestinal permeability. These characteristics
depend on opposing physicochemical properties, which makes it difficult
to find easily interpretable models for predicting the drug absorption
process. Several computational solubility and permeability models have so far
been developed, and a majority of these are either dataset restricted, that is,
only a small volume of the druglike space has been included in the training
of the model, or mechanism based, that is, valid for a specific transport
route or transport protein. This indicates that firstly, the datasets used in the
development of absorption models applicable in the drug discovery process
need to cover a large volume of the druglike space. Secondly, the development
of pharmaceutical informatics tools is crucial to extract correct information
from combinations of all mechanism-based models that are available.
16.1.3
Toxicity
16.2
History and Development
Traditionally, the discovery setting has worked in serial with the primary
focus set on identification of new structures that show good pharmacological
16.3
General Considerations
16.3.1
General Terms
When trying to develop in silico models for the prediction of ADMET properties
there is in most cases a trade-off between accuracy, speed, and, many times,
transparency of the derived models. This is not always a significant problem
as the various models may be intended for different usages, for example, for
high-throughput in silico screening or for guidance and focusing, respectively.
In reality, this often means that rapidly computed descriptors, often of one-
Fig. 16-4 The traditional setting applied in the candidate drug (CD) selection
was a serial experimental testing of pharmacology (P) followed by the different
ADMET properties, resulting in extended development times and difficulties in
finding the optimal compound. Currently the pharmaceutical industry applies a
parallel setting and moves toward the knowledge-based setting. In the parallel
setting, both pharmacology and ADMET properties are experimentally evaluated
simultaneously and the complete profile can be used when selecting the CD. In
the knowledge-based setting, a virtual library designed in the computer is
primarily evaluated through different in silico models for pharmacology and
ADMET properties. A privileged library is synthesized on the basis of the
results from the virtual screening and the compounds are thereafter
experimentally tested.
and two-dimensional nature, are utilized in the former kind of models while
more computer intensive, three-dimensional based, variables are employed
(sometimes in conjunction with one- and two-dimensional representations) in
the latter type of models.
Cronin and Schultz have in a recent article [14] quite nicely put forward
some rather basic requirements to derive statistically sound models:
1. well-defined and measurable target
2. a chemically and biologically diverse data set
3. physicochemical descriptors that are consistent with the
modeled target
4. usage of an appropriate statistical technique
5. where possible, a strong mechanistic basis.
16.3.2
Datasets and Models
16.3.3
Statistical Tools
The PLS model becomes identical to the MLR when the number of
latent variables of a PLS derived model becomes equal to the number
of actual independent variables, something that rarely happens as a
consequence of model validation. The regression coefficients of the MLR
model are straightforward to interpret while the PLS latent variables need
to be retransformed into original variable space to be interpreted in a
similar manner. This also means that the PLS “regression” coefficients
are dimension dependent, that is, they depend on how many latent variables
(PLS components) are used. However, since each PLS component explains
a decreasing amount of variance it is usually not that important if a PLS
model is based on three or four components, which also means that the PLS
“regression” coefficients will not differ very much between the three- and
four-component models.
$$Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SSY}} \qquad (3)$$
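As a minimal sketch of how Eq. (3) can be evaluated, the following Python code computes Q2 by leave-one-out cross-validation for a linear model with a single descriptor. It assumes the usual definitions, which are not spelled out in this passage: PRESS is the sum of squared differences between observed values and cross-validated predictions, and SSY is the sum of squared deviations of the observed values from their mean. The function names `fit_line` and `q2_loo` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return mean_y - a1 * mean_x, a1


def q2_loo(xs, ys):
    """Q2 = 1 - PRESS/SSY using leave-one-out cross-validation.

    Assumption: PRESS sums the squared errors of predictions made for
    each compound by a model trained on all the other compounds.
    """
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        a0, a1 = fit_line(x_train, y_train)
        press += (ys[i] - (a0 + a1 * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ssy = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ssy
```

A Q2 close to 1 indicates that left-out compounds are predicted well, while values near or below zero indicate a model with little or no predictive ability.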
quite often be looked upon as “gray”, not “black”, boxes since each model can
be interpreted but the multitude of them makes the overall picture difficult to
comprehend.
1-D and 2-D descriptors are generally much faster to compute than the
corresponding 3-D based ones. Also, the possible problems associated with
generating a reasonable 3-D conformation for the investigated structure are
eliminated.
1. 1-D descriptors such as molecular weight, molar
refractivity, as well as number of atoms and bonds have
been used to model permeability, absorption, solubility,
and toxicological effects. These kinds of descriptors are
usually rather easy to interpret.
2. A large number of 2-D descriptors exist. Many of them are
topological in nature, that is, they are computed from the
connectivity of the investigated compound or, more
specifically, from the mathematical graph that the
structure represents, and often contain important
information with respect to ADMET modeling. Some of
the more well known, and often much used, topological
variables are the Kier and Hall descriptors. However,
many times these topological descriptors are somewhat
difficult to interpret with respect to the question: “How
should the present structure be modified to improve the
ADMET property presently investigated?” A particular
subset of topological descriptors, the so-called
electrotopological ones, is an exception with respect to
interpretability. These kinds of descriptors are quite easy
to interpret in terms of hydrogen bonding and quite a few
published investigations have found the electrotopological
(or e-state) descriptors useful for deriving good ADMET
models.
3. In many cases 3-D based descriptors are superior to lower
dimensional ones because they capture important
information, such as internal hydrogen bonds, and other
potentially important, but buried functional groups
revealed only by using the actual 3-D representation of
the investigated compound. The 3-D descriptors may also be
easier to interpret than some of the previously mentioned
variables. However, choosing the correct 3-D
conformation may, in some cases, cause problems
depending on how rapidly the descriptors must be
generated. Software packages exist for converting 2-D
structures into 3-D ones, for example, Corina and
Concord, but although quite successful in a vast majority
of cases, both these programs sometimes fail during the
16.4
Applications and Practical Examples
16.4.1
Physiological Factors and Experimental Parameters Influencing the Accuracy of
Predictions of Intestinal Drug Absorption
16.4.1.1 Solubility
The intestinal solubility of a compound is dependent on physicochemical
properties of the molecule (discussed in Sections 16.1.1 and 16.4.2), the
location in the GI tract, the general physiology, and the dosage form. By
analyzing the descriptors in the Noyes-Whitney equation [15], the physiological
and pharmaceutical influence on dissolution becomes apparent:
Software  Company  Dissolution  Sol  Perm  Trp  Oral bioavailability  HIA  BBB  Metabolism  Other PK  Toxicity
AbSolv  X X
ACD Solubility DB  ACD Labs  X
ADME batches  Pharma Algorithms  X X X
ADME boxes  Pharma Algorithms  X X X X
Cerius2  Accelrys  X X X X X X
Cloe PK  Cyprotex  X X X
GastroPlus  Simulations Plus  X X X X X X X
iDEA PKexpress  Lion Biosciences  X X X X
KnowItAll ADME/Tox  Bio-Rad Laboratories  X X X X X X
Oraspotter  ZyxBio  X X X
PK-sim  Bayer Technology Services  X X X X X X
QikProp  Schrödinger  X X X
QMPRPlus  Simulations Plus  X X X X
SLIPPER  TimeTec  X X X X
Crosses show the properties predicted by each of the listed software
packages. The following abbreviations are used: Sol - solubility,
Perm - intestinal permeability, Trp - transporters,
HIA - human intestinal absorption, BBB - blood-brain
barrier permeability, PK - pharmacokinetic properties.
pH values below the pKa value. For ampholytes, the lowest solubility will be
found at the isoelectric point, which is obtained at a pH value between the
acidic and basic pKa values. Another physiological factor that will influence the
solubility is the ionic strength of the intestinal fluid. This will be dependent on
food and fluid intake, and on the absorption and secretion of fluid within the
intestine [17]. In general, the solubility decreases with increased ionic strength,
because of the salting-out effect and/or the common ion effect displayed by
the counterions in the solution [18, 19]. However, the presence of electrolytes
can in specific cases improve the solubility [10]. This phenomenon is known
as the salting-in effect, and occurs when additives such as electrolytes loosen
up the tight water structure and thereby drive the formation of solvent cavities
for the drug molecule. Further, food induces the secretion of bile salts, that
is, surfactants secreted by the gallbladder, which may improve the solubility
of poorly soluble compounds by acting as a wetting agent or by solubilization
within the lipophilic core of bile salt micelles formed at higher bile salt
concentrations [21].
The in silico models derived for solubility are based on intrinsic solubility
as their experimental input data. The intrinsic solubility is the solubility value
determined for the neutral (i.e., uncharged) species of the compound and
is generally determined at 2 pH units above the pKa value for bases and
2 pH units below the pKa value for acids. Ampholytes are determined at
their isoelectric point. The solubility values used for the model development
therefore seldom reflect the apparent solubility seen in the intestinal
fluids. Hence, the predicted values obtained from the models need to be
transferred to an in vivo situation, for instance, by use of the Henderson-
Hasselbalch equation, which takes into account the pH dependency of
solubility [16].
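The transfer from intrinsic to pH-dependent solubility can be sketched as follows. This is a minimal illustration, assuming a monoprotic acid or base and the standard Henderson-Hasselbalch relationship, in which the total solubility is the intrinsic solubility multiplied by one plus the ratio of ionized to neutral species. The function name `apparent_solubility` and its parameters are hypothetical, and the sketch ignores the upper limit set by the solubility of the salt form as well as the ionic strength and bile salt effects discussed above.

```python
def apparent_solubility(s0, pka, ph, kind):
    """Total solubility at a given pH from the intrinsic solubility s0.

    Assumptions: a single ionizable group, ideal behaviour, and no
    salt precipitation. `kind` is "acid" or "base".
    """
    if kind == "acid":
        # Acids ionize above their pKa, which raises the solubility.
        return s0 * (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        # Bases ionize below their pKa, which raises the solubility.
        return s0 * (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


# Example: an acid with pKa 4.5 at an intestinal pH of 6.5 is predicted
# to be about 100 times more soluble than its neutral form alone.
ratio = apparent_solubility(1.0, 4.5, 6.5, "acid")
```

Under these assumptions, a compound measured 2 pH units on the neutral side of its pKa has a total solubility within about 1% of the intrinsic value, which is consistent with the measurement convention described above.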
16.4.1.2 Permeability
The rate and extent of intestinal permeation is dependent on the physico-
chemical properties of the compound (see Sections 16.1.2 and 16.4.3) and the
physiological factors. Drugs are mainly absorbed in the small intestine due to
its much larger surface area and less tight epithelium in comparison to the
colon [17]. The permeation of the intestine may be affected by the presence of
an aqueous boundary layer and mucus adjacent to cells, but for a majority of
substances the epithelial barrier is the most important barrier to drug absorp-
tion. The lipoidal cell membrane restricts the permeability of hydrophilic and
charged compounds, whereas large molecules are restricted by the ordered
structure of the lipid bilayer.
In the GI tract, a pH-dependent permeability is seen (see also
Section 16.4.1.1): the higher the degree of ionization of the compound, the
poorer the permeability. Other physiological factors influencing the perme-
ability value of the compounds are the motility of the GI tract, the expression
of transport proteins, and the thickness of the mucus layer adherent to the
To conclude, it is not unusual that FA data for the same compound vary
by 50% in the literature; for example, FA can be reported as either 10 or 60%,
generally sorted as poor and intermediate FA, respectively. If such data are used
for training the in silico model, the model will to a large extent be based on
noise, leading to poor external predictions and noninterpretable results. In our
view, it is more relevant to estimate the FA on the basis of in silico solubility
and permeability screens.
16.4.2
In silico Solubility Models
Gasteiger   MLR                          797   0.79  0.93   496  0.82  0.79   21  0.56  1.20   Yan and Gasteiger, J. Chem. Inf. Comput. Sci., 2003, 429-434
            ANN 40-8-1                   797   0.93  0.50   496  0.92  0.59   21  0.85  0.77
Liu         ANN 7-2-1                    1033  0.86  0.70   258  0.86  0.71   21  0.79  0.93   Liu and So, J. Chem. Inf. Comput. Sci., 2001, 1633-1639
Tetko       MLR                          879   0.86  0.75   412  0.85  0.81   21  0.77  0.99   Tetko et al., J. Chem. Inf. Comput. Sci., 2001, 1488-1493
            ANN 33-4-1                   879   0.94  0.47   412  0.91  0.60   21  0.90  0.64
Huuskonen   MLR                          884   0.89  0.67   413  0.88  0.71   21  0.83  0.88   Huuskonen, J. Chem. Inf. Comput. Sci., 2000, 773-777
            ANN 30-12-1                  884   0.94  0.47   413  0.88  0.60   21  0.91  0.63
Wegner      ANN 9-15-1                   1016  0.94  0.52   253  0.93  0.54   21  0.82  0.79   Wegner and Zell, J. Chem. Inf. Comput. Sci., 2003, 1077-1084
Norinder    PLS                          800   0.87  0.69   497  0.93  0.58   21  0.80  0.82   Unpublished work
Norinder    RDS/ensemble                 800   0.97  0.35   497  0.95  0.51   21  0.87  0.67   Unpublished work

Model       Type                         n     Accuracy (%)   n    Accuracy (%)   n   Accuracy (%)
Norinder    RDS/classification           800   82.10          497  80.30          21  0.83     Unpublished work
Norinder    RDS/classification/ensemble  800   98.00          497  86.90          21  0.91     Unpublished work

n - number of compounds, R2 - squared correlation
Fig. 16-5 Model of the Huuskonen aqueous solubility dataset using PLS [34].
Triangles - training set, circles - test set. The plot (horizontal axis:
experimental log(S)) shows the "deceptively" good performance of the developed
model with respect to usage for predicting aqueous solubility for new potential
drug compounds (see also Figure 16-6).
16.4.3
In silico Models of Permeability and FA
[Figure: plot with horizontal axis "Experimental log(S)"]
the size and hydrophilicity of the compound, and thus, the use of these
two components might be regarded as more sound than log Poct. Indeed,
the use of molecular weight and number of hydrogen bonds has been
shown to predict the permeability of a smaller dataset better than did
log Poct [40].
The introduction of more complex datasets used for model development
has pointed at the need for several descriptors and multivariate data analysis
(Table 16-3). For instance, combinations of PSA and nonpolar surface area
(NPSA) proved to predict the permeability of a series of peptides when PSA
alone failed [41]. Moreover, the introduction of larger structures and structures
with larger flexibility showed that the partitioned total surface areas (PTSAs),
that is, the surface area of the molecule occupied by a specific atom, and/or
descriptors related to the flexibility of the molecule are important in the permeability
predictions [42, 43].
Electrotopological indices have been used to predict permeability computationally
(Table 16-3). The electrotopological descriptors are not always easily
comprehended, even though they can be attributed to describe hydrophobicity,
hydrophilicity, and size. Other typical 2D generated descriptors are
related to dispersion forces, polarizability, solute molar volume, and hydrogen
bonding acidity and basicity [44-47]. Descriptors such as log Poct/log Doct,
polarizability, polarity, strength of Lewis base and acid, number and strength
of hydrogen bond donors/acceptors, obtained from quantum mechanics have
also been correlated to permeability [42, 48, 49]. These descriptors did show
high accuracy in the prediction, even though less complex and more rapidly
calculated descriptors were almost as accurate. Thus, since quantum mechanical
descriptors do not outperform the fragment-based descriptors with respect
to accuracy, they will not be usable in the drug discovery setting until such
calculations become faster.
16.4.4
A Computer-based Biopharmaceutical Classification System
In a recent study we used a BCS with six classes, where the solubility
was classified as either “low” or “high” in accordance with the cut-off
values set by the FDA and the permeability was classified as “low”
(FA < 20%), “intermediate” (20% < FA < 80%), or “high” (FA > 80%) [55].
This classification was chosen because we believe it provides a better tool
for absorption ranking of compounds in drug discovery than the stricter
permeability classification provided by the FDA. Experimental determinations
of the Caco-2 permeability and intrinsic solubility were performed in-house,
and PLS in silico models based on PTSAs were derived. In comparison to the
experimentally determined data, the combination of the two in silico models
resulted in 87% of the compounds being sorted into the correct class. The
compounds included in a reference test set given by the FDA were correctly
sorted with an accuracy of 77%. To summarize, these results indicate that
more sophisticated in silico models combining computational analysis of the
solubility and permeability can successfully estimate the absorption process
both qualitatively and quantitatively [55].
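The six-class scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in [55]. It assumes that "high" solubility means the maximum dose dissolves in 250 mL of aqueous medium (the FDA volume criterion, which is not stated in this section) and it assigns the boundary values FA = 20% and FA = 80% to the intermediate class, a choice the text leaves open. All function and parameter names are hypothetical.

```python
def solubility_class(max_dose_mg, solubility_mg_per_ml, volume_ml=250.0):
    """'high' if the maximum dose dissolves in volume_ml, else 'low'.

    Assumption: the 250 mL default mirrors the FDA BCS volume criterion.
    """
    volume_needed_ml = max_dose_mg / solubility_mg_per_ml
    return "high" if volume_needed_ml <= volume_ml else "low"


def permeability_class(fa_percent):
    """Three permeability classes based on the fraction absorbed (FA)."""
    if fa_percent < 20.0:
        return "low"
    if fa_percent <= 80.0:
        # Boundary values are placed here by assumption.
        return "intermediate"
    return "high"


def six_class_bcs(max_dose_mg, solubility_mg_per_ml, fa_percent):
    """Return the (solubility class, permeability class) pair."""
    return (
        solubility_class(max_dose_mg, solubility_mg_per_ml),
        permeability_class(fa_percent),
    )
```

In a purely computational setting, the solubility and FA inputs would come from the in silico solubility and permeability models rather than from experiments.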
16.4.5
In silico Toxicity Models
is not too surprising since the concept of toxicophores has been used for
quite some time in explaining the toxicological behavior of compounds. At the
same time, however, the authors of the article also state that for large datasets
there is a clear need for the development of new descriptors and/or statistical
methods.
16.5
Future Development and Conclusions
Fig. 16-8 (a) To improve the drug discovery setting, the development of
informatics tools suitable for virtual pharmaceutical screening is highly
desirable. Such tools must have the ability to extract important information
related to each of the main areas investigated during the drug discovery and
early development process, that is, pharmacological effect and ADMET
properties. (b) Each of these groups is further divided into a large number of
subgroups as exemplified by absorption. These subgroups may cooperate,
counteract, or be independent of each other. Furthermore, both qualitative and
quantitative information are compiled in these screenings, further stressing
the importance of development of specific software for this application.
noise on the model. Thirdly, the models should be as simplified as possible.
Acknowledgments
Christel Bergstrom acknowledges financial support from the Knut and Alice
Wallenberg foundation and the Swedish Fund for Research without Animal
Experiments.
Glossary
$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \varepsilon \qquad (5)$$
The error parameter ε is the residual. The parameters a_i are adjusted
so that the sum of the squared errors (Σε²) for all the investigated objects
(compounds) is minimized.
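The least-squares adjustment of the parameters in Eq. (5) can be illustrated with a short, dependency-free Python sketch. It solves the normal equations by Gaussian elimination, which is one of several standard ways to minimize the sum of squared residuals and is chosen here for brevity, not because it is the method used in the studies cited in this chapter. The function names `solve_linear` and `fit_mlr` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(rows, ys):
    """Return [a0, a1, ..., an] minimizing the sum of squared residuals.

    `rows` holds one tuple of descriptor values (x1..xn) per compound.
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives a0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear(xtx, xty)
```

With strongly correlated descriptors the normal equations become ill conditioned, which is one practical reason why latent-variable methods such as PLS are often preferred for descriptor-rich datasets.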
U=BxT
The basic idea of the network is to adjust the weights (w_i) of each connection
so that, as was the case for MLR, the sum of the squared errors (Σε²) between
experimental and predicted output for all the investigated objects (compounds)
is minimized.
Huuskonen Dataset
The Huuskonen dataset [31] consists of 1297 compounds compiled from
the AQUASOL dATAbASE of the University of Arizona (Yalkowsky, S. H.;
Dannenfelser, R. M. The ARIZONA dATAbASE of Aqueous Solubility;
College of Pharmacy, University of Arizona:
BCUT Descriptors
The BCUT descriptors are the lowest and highest eigenvalues of a connectivity
matrix of a molecule in which the diagonal elements for each atom are assigned
properties such as atomic charges, atomic polarizability, or atomic hydrogen
bond parameters, respectively.
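A simplified sketch of a BCUT-style calculation is given below. It is an illustration under stated assumptions rather than the published BCUT definition: bonded atom pairs receive an off-diagonal value of 1, nonbonded pairs receive 0, and the diagonal carries one atomic property per atom. The eigenvalues of the symmetric matrix are obtained with the classical Jacobi rotation method so that the example needs no external library. The function names `symmetric_eigenvalues` and `bcut` are hypothetical.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def bcut(atom_properties, bonds):
    """Return (lowest, highest) eigenvalue of the property-weighted matrix.

    Assumption: off-diagonal entries are 1 for bonded pairs and 0 otherwise.
    """
    n = len(atom_properties)
    m = [[0.0] * n for _ in range(n)]
    for i, value in enumerate(atom_properties):
        m[i][i] = float(value)
    for i, j in bonds:
        m[i][j] = m[j][i] = 1.0
    eigenvalues = symmetric_eigenvalues(m)
    return eigenvalues[0], eigenvalues[-1]
```

For a two-atom fragment with a property value of 2 on each atom and one bond, the matrix is [[2, 1], [1, 2]] and the descriptor pair is (1, 3).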
References
42. P. Stenberg, U. Norinder, K. Luthman, P. Artursson, Experimental and computational screening models for the prediction of intestinal drug absorption, J. Med. Chem. 2001, 44, 1927-1937.
43. D.F. Veber, S.R. Johnson, H.Y. Cheng, B.R. Smith, K.W. Ward, K.D. Kopple, Molecular properties that influence the oral bioavailability of drug candidates, J. Med. Chem. 2002, 45, 2615-2623.
44. M.J. Kamlet, R.M. Doherty, V. Fiserova-Bergerova, P.W. Carr, M.H. Abraham, R.W. Taft, Solubility properties in biological media 9: prediction of solubility and partition of organic nonelectrolytes in blood and tissues from solvatochromic parameters, J. Pharm. Sci. 1987, 76, 14-17.
45. J.A. Gratton, M.H. Abraham, M.W. Bradbury, H.S. Chadha, Molecular factors influencing drug transfer across the blood-brain barrier, J. Pharm. Pharmacol. 1997, 49, 1211-1216.
46. M.H. Abraham, Y.H. Zhao, J. Le, A. Hersey, C.N. Luscombe, D.P. Reynolds, G. Beck, B. Sherborne, I. Cooper, On the mechanism of human intestinal absorption, Eur. J. Med. Chem. 2002, 37, 595-605.
47. O.A. Raevsky, S.V. Trepalin, H.P. Trepalina, V.A. Gerasimenko, O.E. Raevskaja, SLIPPER-2001 - Software for predicting molecular properties on the basis of physicochemical descriptors and structural similarity, J. Chem. Inf. Comput. Sci. 2002, 42, 540-549.
48. U. Norinder, T. Osterberg, P. Artursson, Theoretical calculation and prediction of Caco-2 cell permeability using MolSurf parametrization and PLS statistics, Pharm. Res. 1997, 14, 1786-1791.
49. U. Norinder, T. Osterberg, P. Artursson, Theoretical calculation and prediction of intestinal absorption of drugs in humans using MolSurf parametrization and PLS statistics, Eur. J. Pharm. Sci. 1999, 8, 49-56.
50. M.D. Wessel, P.C. Jurs, J.W. Tolan, S.M. Muskal, Prediction of human intestinal absorption of drug compounds from molecular structure, J. Chem. Inf. Comput. Sci. 1998, 38, 726-735.
51. G.L. Amidon, H. Lennernas, V.P. Shah, J.R. Crison, A theoretical basis for a biopharmaceutic drug classification: the correlation of in vitro drug product dissolution and in vivo bioavailability, Pharm. Res. 1995, 12, 413-420.
52. E. Walter, S. Janich, B.J. Roessler, J.M. Hilfinger, G.L. Amidon, HT29-MTX/Caco-2 cocultures as an in vitro model for the intestinal epithelium: in vitro-in vivo correlation with permeability data from rats and humans, J. Pharm. Sci. 1996, 85, 1070-1076.
53. S. Winiwarter, N.M. Bonham, F. Ax, A. Hallberg, H. Lennernas, A. Karlen, Correlation of human jejunal permeability (in vivo) of drugs with experimentally and theoretically derived parameters. A multivariate data analysis approach, J. Med. Chem. 1998, 41, 4939-4949.
54. N.A. Kasim, M. Whitehouse, C. Ramachandran, M. Bermejo, H. Lennernas, A.S. Hussain, H.E. Junginger, S.A. Stavchansky, K.K. Midha, V.P. Shah, G.L. Amidon, Molecular properties of WHO essential drugs and provisional biopharmaceutical classification, Mol. Pharmacol. 2004, 1, 85-96.
55. C.A.S. Bergstrom, M. Strafford, L. Lazorova, A. Avdeef, K. Luthman, P. Artursson, Absorption classification of oral drugs based on molecular surface properties, J. Med. Chem. 2003, 46, 558-570.
56. N. Green, Computer systems for the prediction of toxicity: an update, Adv. Drug Deliv. Rev. 2002, 54, 417-431.
57. T.W. Schultz, M.T.D. Cronin, T.I. Netzeva, The present status of QSAR in toxicology, J. Mol. Struct. (THEOCHEM) 2003, 622, 23-38.
58. J.C. Dearden, In silico prediction of drug toxicity, J. Comput.-Aided Mol. Des. 2003, 17, 119-127.
59. D.F.V. Lewis, S. Modi, M. Dickins, Quantitative structure-activity relationships (QSARs) within substrates of human cytochromes P450 involved in drug metabolism, Drug Metab. Drug Interact. 2001, 18, 221-242.
60. C. Hansch, S.B. Mekapati, A. Kurup, R.P. Verma, QSAR of cytochromes P450, Drug Metab. Rev. 2004, 36, 105-156.
61. T.R. Stouch, J.R. Kenyon, S.R. Johnson, X.-Q. Chen, A. Doweyko, Y. Li, In silico ADME/Tox: why models fail, J. Comput.-Aided Mol. Des. 2003, 17, 83-92.
62. J. Feng, L. Lurati, H. Ouyang, T. Robinson, Y. Wang, S. Yuan, S.S. Young, Predictive toxicology: benchmarking molecular descriptors and statistical methods, J. Chem. Inf. Comput. Sci. 2003, 43, 1463-1470.
PART VII
Systems Biology
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
17
Computational Methods and Modeling
17.1
Systems Biology of the JAK-STAT Signaling Pathway
Outlook
17.1.1
Introduction
17.1.2
History and Development
The control theory developed for metabolic systems allows one to infer, for
example, the effects of local changes, such as the properties of an enzyme,
on global properties such as the flux through the system. Furthermore, general
global properties of the systems were captured by summation and connectivity
theorems; see [5] for a comprehensive review.
For signaling pathways and gene regulatory networks, the above constraints
do not hold and similar general statements are not available. For specific
examples, however, the ideas from metabolic systems have been generalized to
signaling pathways [6], and design principles of signaling pathways and gene
regulatory networks have been discovered [7, 8]. An important topic of recent
research is the robustness of the systems, because they have to function in a
noisy environment under fluctuating conditions. These investigations range from
bacterial chemotaxis [9, 10] via components of signaling pathways [11] to
developmental biology [12]; see Ref. 13 for a recent review.
For signaling pathways, recent years have seen an increasing number of
studies of specific pathways in which mathematical modeling is applied to infer
systems' properties from the models. These applications include the
mitogen-activated protein (MAP)-kinase pathways [14-16], apoptotic pathways
[17-19], the Wnt/β-catenin pathway [20], and the Janus kinase-signal transducer
and activator of transcription (JAK-STAT) pathway [21]. A regulatory network
that has been studied intensively is the cell cycle [7, 22, 23].
Because of the nascent state of systems biology, only a few textbooks are
available [24-26].
17.1.3
General Considerations
Since Newton’s days, Physics and Engineering have been extremely successful
in understanding the inanimate part of nature by applying mathematics and
translating these insights into technological developments. It is foreseeable
that in the twenty-first century an analogous development will take place for
the animate part of nature, including technology based on the insights of the
basic sciences.
Arguments for the helpful contributions of mathematics applied to the life
sciences include:
Make assumptions explicit
Decades of work in biology have produced enormous
amounts of knowledge rendering it difficult to see the forest
for the trees, that is, to judge what the important players and
effects are. A mathematical description necessitates being
explicit about what the assumptions of a model are.
Understand essential properties from failing models
If a mathematical model fails to describe biological data, this
gives the valuable information that the assumptions of the
model missed an essential part.
Condense information, handle complexity
The huge extent of biological knowledge is also an obstacle
since it does not allow for intuition-based reasoning due to its
complexity. Mathematical modeling can help handle the
complexity by condensing it into a model.
Understand role of dynamical processes, for example,
feedback
Dynamic properties like combinations of positive and
negative feedbacks induce system properties that can only be
captured by mathematics, see Ref. 16 for an example, where a
mathematical treatment elucidates why cells react differently
to transient and sustained stimuli.
Impossible experiments become possible
Mathematical models allow for in silico biology. Experiments
that might be impossible biochemically can be conducted
using the computer.
Prediction and control
On the basis of mathematical models, new experiments can
be suggested and their outcome can be predicted. Especially,
the control of networks can be investigated in silico. This
enables identification of targets for medical intervention.
Understand what is known
Pure biological facts can be understood in the context of
dynamic behavior.
All these arguments apply to biology in general; but due to its network
structure, especially to cell biology in terms of metabolism, signal transduction,
and gene regulation.
Systems biology can be defined as the endeavor to understand biomedical
systems using data-based mathematical modeling of their dynamic behavior.
The final goal is to turn the life sciences from a qualitative, descriptive science
into a quantitative, predictive science. Systems biology relies on other fields of
research but should also be distinguished from them, since systems biology is
more than . . .
. . . Mathematical Biology because systems biology is data
based
Mathematical Biology formulates and investigates
mathematical models inspired by biology but it is de facto a
part of mathematics often not getting back to biology. Systems
biology requires close collaborations between theoreticians
and experimentalists. This ranges from the joint planning of
experiments to the cooperative interpretation of the results of
the mathematical models including the formulation of new
hypotheses to be tested in the next cycle between “wet-lab”
and “dry-lab”.
. . . Bioinformatics because systems biology considers the
dynamics
Bioinformatics is an important basis for systems biology in,
for example, identifying the components involved but does
not deal with the dynamic aspects of networks that are
substantial for systems biology.
. . . another ‘omics’ technology because systems biology
involves mathematics
Proteomics, genomics, metabolomics, and other
high-throughput technologies to monitor the state of cells in
certain respects provide important information for systems
biology, but systems biology should not be understood as
“putting the . . .omics together”. It should be noted that the
term systems in systems biology originates from systems
17.1.4
Practical Example
by x4(t), we arrive at the following dynamic model where the time dependence
is suppressed for the sake of clarity:
$$\dot{x}_3 = +0.5\,k_2 x_2^2 - k_3 x_3 \qquad (3)$$
$$\dot{x}_4 = +k_3 x_3 \qquad (4)$$
These equations describe the yield and loss of the different components.
For example, Eq. (1) states that the unphosphorylated STAT-monomer x1
is reduced, expressed by the minus sign, with a certain rate k1 due to the
interaction of the STAT-monomer with the activated receptor, described by
x1 EpoRA. Since this interaction leads to the phosphorylated STAT-monomer
x2, the same term as in Eq. (1), but with positive sign, appears in Eq. (2). The
second part of Eq. (2) describes the loss of the phosphorylated STAT-monomer
x2 by dimerization with rate constant k2. This term appears in Eq. (3) with the
factor of 0.5 since two monomers form one dimer. The second term in Eq. (3)
and the right-hand side of Eq. (4), finally, describe the transport of the dimer
into the nucleus.
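The feed-forward model can be explored numerically with a short Python sketch. Eqs. (1) and (2) are written here as reconstructed from the verbal description above (loss of x1 at rate k1 x1 EpoRA, and gain of x2 by the same term minus dimerization k2 x2²), so they should be read as an assumption. The sketch also assumes a constant receptor activation EpoRA, whereas in the study it is a measured, time-dependent input, and all parameter values in the example are hypothetical. A classical fourth-order Runge-Kutta scheme is used for integration.

```python
def rhs(x, k1, k2, k3, epo_ra):
    """Right-hand sides of the feed-forward model, Eqs. (1)-(4)."""
    x1, x2, x3, x4 = x
    dx1 = -k1 * x1 * epo_ra                  # Eq. (1), reconstructed
    dx2 = k1 * x1 * epo_ra - k2 * x2 ** 2    # Eq. (2), reconstructed
    dx3 = 0.5 * k2 * x2 ** 2 - k3 * x3       # Eq. (3)
    dx4 = k3 * x3                            # Eq. (4)
    return (dx1, dx2, dx3, dx4)


def rk4_step(x, dt, k1, k2, k3, epo_ra):
    """One classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    s1 = rhs(x, k1, k2, k3, epo_ra)
    s2 = rhs(shifted(x, s1, dt / 2), k1, k2, k3, epo_ra)
    s3 = rhs(shifted(x, s2, dt / 2), k1, k2, k3, epo_ra)
    s4 = rhs(shifted(x, s3, dt), k1, k2, k3, epo_ra)
    return tuple(
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, s1, s2, s3, s4)
    )


def simulate(x1_0, k1, k2, k3, epo_ra=1.0, t_end=60.0, dt=0.01):
    """Integrate from t = 0 with x2 = x3 = x4 = 0 and return the final state."""
    x = (x1_0, 0.0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        x = rk4_step(x, dt, k1, k2, k3, epo_ra)
    return x


# Hypothetical parameter values, chosen only to exercise the code.
final_state = simulate(x1_0=1.0, k1=0.5, k2=2.0, k3=0.1)
```

A useful sanity check on such an implementation is mass conservation: because each dimer contains two monomers, the quantity x1 + x2 + 2 x3 + 2 x4 stays constant over time in this model.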
The initial values for x2, x3, and x4 are zero; the initial value for x1 is a free
parameter that, in addition to the parameters k1, k2, and k3, has to be estimated
from the data.
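The feed-forward model can be explored numerically. The following is a minimal sketch, not the authors' implementation: Eqs. (1) and (2) are written out from their verbal description above, the receptor input `epo_receptor_activity` is a hypothetical pulse standing in for the measured receptor activation, and the parameter values are illustrative assumptions.

```python
import math

def epo_receptor_activity(t):
    """Hypothetical receptor input EpoR_A(t): a smooth pulse peaking at 1 at t = 5 min.

    This stands in for the measured receptor activation; it is not data from the study.
    """
    return (t / 5.0) * math.exp(1.0 - t / 5.0)

def rhs(t, x, k1, k2, k3):
    """Right-hand sides of Eqs. (1)-(4) for the state x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    u = epo_receptor_activity(t)
    return (
        -k1 * x1 * u,                  # Eq. (1), as described in the prose
        k1 * x1 * u - k2 * x2 ** 2,    # Eq. (2), as described in the prose
        0.5 * k2 * x2 ** 2 - k3 * x3,  # Eq. (3)
        k3 * x3,                       # Eq. (4)
    )

def simulate(x1_0, k1, k2, k3, t_end=60.0, dt=0.01):
    """Integrate the model with the classical fourth-order Runge-Kutta scheme.

    Returns a list of (t, (x1, x2, x3, x4)) tuples, starting at t = 0.
    """
    def shifted(x, slope, h):
        return tuple(xi + h * si for xi, si in zip(x, slope))

    x = (x1_0, 0.0, 0.0, 0.0)  # x2, x3, x4 start at zero, as stated in the text
    trajectory = [(0.0, x)]
    for step in range(int(round(t_end / dt))):
        t = step * dt
        s1 = rhs(t, x, k1, k2, k3)
        s2 = rhs(t + dt / 2, shifted(x, s1, dt / 2), k1, k2, k3)
        s3 = rhs(t + dt / 2, shifted(x, s2, dt / 2), k1, k2, k3)
        s4 = rhs(t + dt, shifted(x, s3, dt), k1, k2, k3)
        x = tuple(
            xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, s1, s2, s3, s4)
        )
        trajectory.append((t + dt, x))
    return trajectory

# Illustrative parameter values only; they are assumptions, not fitted values.
trajectory = simulate(x1_0=1.0, k1=0.5, k2=2.0, k3=0.1)
x1, x2, x3, x4 = trajectory[-1][1]
total_stat5 = x1 + x2 + 2 * x3 + 2 * x4  # in monomer units; conserved by Eqs. (1)-(4)
```

Counting each dimer as two monomers, the sum x1 + x2 + 2 x3 + 2 x4 stays at its initial value throughout, which is a convenient sanity check on any implementation of this model.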
These equations have a vivid meaning. For example, Eq. (1) means that the
rate of change of the unphosphorylated monomer is negative and proportional
to the interaction of the monomer with the activated receptor. The rate is
determined by k1.
By quantitative immunoblotting, the time courses of the phosphorylated
(monomeric, x2, and dimeric, x3) STAT-5 in the cytoplasm y1(t), the total
amount of STAT-5 in the cytoplasm y2(t), and the activation of the Epo receptor
y3(t) were determined. The measured values represent relative units. For a
detailed description of the biochemical techniques to measure the different
components, see Ref. 21.
All together, the observation equations read:
[Figure: panels "Simulation 3" and "Simulation 4", each plotted against t.]
Equation (8) captures the dynamical equations (1-4), the parameters, and
the activation of the Epo receptor as an external input u. Equation (9) describes
how the sampled observables are linked to the dynamical variables and
also includes observational noise ε(t_i) always present in experimental data.
Estimation of the parameters is based on minimizing the error function:
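A common choice for such an error function, assumed here for illustration rather than taken from the text, is the sum of squared residuals between measured and simulated observables, each scaled by the measurement noise. The function name `weighted_sse` and its argument layout are hypothetical.

```python
def weighted_sse(observed, predicted, sigma):
    """Sum of squared, noise-scaled residuals over all observables and time points.

    observed, predicted, sigma: lists with one inner list per observable,
    each inner list holding one value per sampling time.
    """
    total = 0.0
    for y_obs, y_mod, s in zip(observed, predicted, sigma):
        for yo, ym, si in zip(y_obs, y_mod, s):
            total += ((yo - ym) / si) ** 2
    return total

# Two observables sampled at two time points (made-up numbers).
error = weighted_sse(
    observed=[[1.0, 2.0], [3.0, 4.0]],
    predicted=[[1.0, 1.0], [3.0, 6.0]],
    sigma=[[1.0, 0.5], [1.0, 2.0]],
)
```

A parameter estimate is then the parameter set for which the simulated observables minimize this quantity; any general-purpose optimizer can be used for the search.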
Fig. 17.1-3 Examples of the measured time series, plotted against time (min). (a) Activation of the Epo receptor.
(b) Phosphorylated STAT-5 in the cytoplasm. (c) Total amount of STAT-5 in the cytoplasm.
Fig. 17.1-4 Fit of the feed-forward model, Eqs. (1-4), to the measured time series of
phosphorylated (a) and total (b) STAT-5 in the cytoplasm, plotted against time (min).
in the cytoplasm is completely missed. This calls for a reconsideration of the
17 Computational Methods and Modeling
$\dot{x}_4 = p_3 x_3 - p_4 x_3^{\tau}$ (14)
The results of a fit of this model to the data are displayed in Fig. 17.1-5 and
demonstrate a good agreement of the model trajectories with the experimental
data. As a surprising result, the sojourn time τ of STAT-5 in the nucleus
turned out to be approximately 6 min. The fitted trajectory for phosphorylated
STAT-5 shows that the "plateau" between 10 and 30 min is not a plateau, but
results from waves of phosphorylated STAT-5 through the nucleus.
Simulating the model allows investigation of the single populations x1 to x4
of STAT-5. The in silico results are given in Fig. 17.1-6.
It is observed that the unphosphorylated monomer x1 is completely processed
in the first wave of activation. Furthermore, the concentration of the
phosphorylated monomer x2 is low for the whole time because the dimerization
process is fast. This explains, in a natural way, the experimental experience
that the phosphorylated monomer is difficult to measure.
On the basis of the fitted model, a sensitivity analysis is performed. These
in silico investigations mean that the parameters of the model are changed and
the (predicted) effect on the function of the system is determined. Because we
deal with signal transduction, activation of target genes is the most important
function. For the study, target gene activation is assumed to be proportional to
the shuttling STAT-5 in the nucleus. The results are displayed in Fig. 17.1-7.
Surprisingly, the first step in the network, that is, variation of the
phosphorylation of the monomeric STAT-5 described by k1 has the smallest
17.1.5
Future Development
The limiting factor in systems biology is high quality data [16]. Mathematical
modeling can only give as much information as is coded in the data. Unfortu-
nately, most techniques including the high-throughput “omics” technologies
References
17.2
Modeling Intracellular Signal Transduction Processes
Outlook
The ability to control normal and diseased cell function will require quantitative
analyses of how cells perceive and decode information. Involving enzyme-
catalyzed reactions and assembly of protein-protein and protein-lipid
complexes that modulate enzyme activity, signal transduction is the biochemical
integration of information inside the cell, and manipulation of signal
transduction networks thus offers a broad-based approach to influence cell
behavior. Mathematical modeling approaches, wherein chemical kinetics,
spatial distributions of molecules, and biophysical constraints may be described
in dynamic and unambiguous terms, are being applied with increasing
frequency to analyze biochemical signaling mechanisms more critically. Once
validated by quantitative measurements, such models may soon offer a means
to predict the integrated behavior of interacting pathways and combinations
of cell stimuli. We discuss here the recent advances in, and challenges faced
by, this emerging field.
17.2.1
Introduction
The past 15 years or so have seen a shift in the focus of biological research to
the study of molecular mechanisms underlying cell regulation and function.
Thus, we now have a qualitative roadmap of how intracellular molecules are
organized to form signal transduction pathways, which govern cell decision-making
in a tightly controlled, context-dependent manner [1]. However, it is
not yet fully appreciated how biochemical mechanisms affect the kinetics of
pathway activation, or how the magnitudes and/or timing of those signals are
related to the likelihood and quality of a cell response.
Mathematical modeling of signal transduction interactions, pathways, and
networks is emerging as a powerful tool that can aid in explaining and
interpreting experimental data. In most cases, the explanations are fairly
intuitive (at least in hindsight) once the model has been applied to the problem
at hand; in other cases, the conclusions are less so. In any case, quantitative
models provide a way to organize hypotheses and integrate the many effects
that may be at play. If done correctly, all the inherent assumptions are clearly
laid out, because the system is described in the unambiguous language of
mathematics.
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
17.2.2
Receptor-Binding and Regulation Mechanisms
The first step in most signaling pathways is the binding of cell surface receptors,
which links the presence and concentration of a specific extracellular ligand
to the intracellular processes that ultimately govern the cellular response.
One often thinks of receptor binding simply as a reversible, bimolecular
process, characterized by the dissociation (inverse equilibrium) constant, KD;
an apparent KD value is generally defined as the free concentration of ligand that
yields half-maximal binding to the cell surface (or to receptors immobilized on
a solid support). In the simplest model, each ligand-bound receptor is activated
for signal transduction. This picture belies a number of complexities, however,
which are most often neglected in models of signal transduction. Arguably the
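The simplest picture of receptor binding described in this section, a reversible bimolecular process characterized by KD, can be sketched as follows. The function name and the example concentrations are illustrative assumptions; the sketch deliberately ignores the complexities the text goes on to mention.

```python
def fraction_bound(free_ligand, k_d):
    """Equilibrium fraction of receptors occupied for simple 1:1 reversible binding.

    free_ligand and k_d must be given in the same concentration units.
    """
    return free_ligand / (k_d + free_ligand)

# By definition, a free ligand concentration equal to KD gives half-maximal binding.
half_max = fraction_bound(free_ligand=1.0, k_d=1.0)

# Occupancy saturates gradually: a ninefold excess over KD still leaves 10% free.
near_saturation = fraction_bound(free_ligand=9.0, k_d=1.0)
```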
17.2.3
Receptor-mediated Covalent Modifications and Molecular Interactions
and dissociate with receptor complexes before they are dephosphorylated. Slow
versus rapid exchange is determined by the relative rates of substrate phospho-
rylation, dissociation from the receptor complex, and dephosphorylation within
the complex and in the cytosol; fast exchange has the effect of homogenizing
the phosphorylation state of the protein, which thereby responds globally to the
average status of the receptor complexes [28, 30]. The ability to hold information
about the local receptor environment, in the context of phosphorylation
within the receptor complex, requires slow substrate exchange [33].
17.2.4
Spatial Organization and Gradients on Cellular and Subcellular Length Scales
Most of the examples cited above are purely kinetic models with variables
changing only with respect to time. While processes may be compartmentalized
in such models, with rate terms that account for transfer between cellular
compartments, spatial gradients within compartments are obviously not
accounted for. In most cases, signaling molecules encounter one another
through mutual diffusion, and net molecular transport from one location to
another depends on such gradients. However, the concept of a concentration
gradient serving as a “driving force” for macroscopic diffusion leads to a
common misconception. On a microscopic level, biological molecules are
constantly in motion through collisions with water (and occasionally other)
molecules, and thus it is obvious that they can associate in the absence of
concentration gradients. If one were to survey the cytoplasm of a typical cell,
the average distance between the plasma membrane and the nucleus is in the
range of L ≈ 1-10 μm. The diffusion coefficient D of a small molecule such as
Ca2+ or ATP in the cytosol is ~10³ μm² s⁻¹, and that of a larger macromolecule
is ~10 μm² s⁻¹ (the cytosolic D value for green fluorescent protein, medium
sized at 27 kDa, has been measured at 40 μm² s⁻¹). In three dimensions, the
average time associated with traversing that distance is L²/6D, which yields a
range of times from 0.2 ms to 2 s. One concludes that diffusive transport in the
cytosol is relatively efficient on cellular length scales, and that the formation of
macroscopic gradients requires a fairly rapid degradation/turnover of the
molecule. In the case of intracellular calcium and certain other second
messengers, fluorescence imaging experiments and detailed kinetic and spatial
modeling [40, 41] have demonstrated that spatial waves propagate in the cell
as a result of rapid dynamical processes characteristic of excitable media [42].
For signaling proteins that are phosphorylated or otherwise modified at the
plasma membrane and/or at endosomal membranes but dephosphorylated
throughout the cell, models have been used to evaluate the possibility and
functional consequences of gradients of these phosphorylated proteins in the
cytosol [28, 43-45]; when the cytosolic phosphatase activity is either very strong
or very weak, however, a kinetic model is adequate [33].
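The time-scale estimate quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using the L²/6D expression and the order-of-magnitude values given in the text; the function name is an illustrative choice.

```python
def diffusion_time(length_um, d_um2_per_s):
    """Average time (s) to traverse a distance L by diffusion in three dimensions."""
    return length_um ** 2 / (6.0 * d_um2_per_s)

# Fastest case: small molecule (D ~ 1e3 um^2/s) over the shortest distance (1 um).
t_fast = diffusion_time(length_um=1.0, d_um2_per_s=1e3)    # roughly 0.2 ms

# Slowest case: macromolecule (D ~ 10 um^2/s) over the longest distance (10 um).
t_slow = diffusion_time(length_um=10.0, d_um2_per_s=10.0)  # roughly 2 s
```

Because the time scales with the square of the distance, a tenfold longer path costs a hundredfold more time, which is why the two extremes differ by four orders of magnitude.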
microscopy.
17.2.5
Downstream Signaling Cascades and Networks
17.2.6
Prospects and Challenges
perhaps foremost, one must choose a model structure that relates to molecular
mechanisms that may not be known completely, and so it is inevitable
that complex models will include controversial elements. Like conceptual
models of signaling mechanisms, quantitative models will need to be refined
and/or revised in the light of new findings, but then the model bears the
burden of showing whether earlier predictions and analyses remain valid.
Second, a fundamental problem with complex models is that they require the
specification of an increasing number of parameter (e.g., rate constant) values;
even when such values are obtained from the literature or from best-fits to
available data sets, it must be recognized that there is a great deal of uncertainty
associated with this exercise. In the best-case scenario, the model would be
validated by direct comparison with quantitative measurements that assess
multiple intermediates activated under the same stimulation conditions, and
even then a sensitivity analysis will be warranted to identify those parameter
values that drive the quality of fit; in spite of the vast literature on signaling
mechanisms, the field is currently limited by the availability of such data.
Model generality is a related issue; it seems unlikely that a model that was
trained on one cellular context will transfer well to the analysis of other systems.
Finally, more comprehensive models can be cumbersome to work with, and
how one might approach the analysis depends on the specific question(s)
being asked. In response, it has been suggested that one might build models
from smaller process modules, which might be analyzed individually and in
the context of other modules [87, 88]. Software packages such as Virtual Cell
(http://www.nrcam.uchc.edu/) [89] have been developed for the purpose of
linking models together in a seamless and interactive way.
17.2.7
Concluding Remarks
Acknowledgments
References
29. B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles, G. Muller, Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors, Nat. Biotechnol. 2002, 20, 370-375.
30. J.M. Haugh, I.C. Schneider, J.M. Lewis, On the cross-regulation of protein tyrosine phosphatases and receptor tyrosine kinases in intracellular signaling, J. Theor. Biol. 2004, 230, 119-132.
31. R.G. Posner, C. Wofsy, B. Goldstein, The kinetics of bivalent ligand-bivalent receptor aggregation: ring formation and the breakdown of the equivalent site approximation, Math. Biosci. 1995, 126, 171-190.
32. U.S. Bhalla, R. Iyengar, Emergent properties of networks of biological signaling pathways, Science 1999, 283, 381-387.
33. J.M. Haugh, A.C. Huang, H.S. Wiley, A. Wells, D.A. Lauffenburger, Internalized epidermal growth factor receptors participate in the activation of p21ras in fibroblasts, J. Biol. Chem. 1999, 274, 34350-34360.
34. J.M. Haugh, A. Wells, D.A. Lauffenburger, Mathematical modeling of epidermal growth factor receptor signaling through the phospholipase C pathway: mechanistic insights and predictions for molecular interventions, Biotechnol. Bioeng. 2000, 70, 225-238.
35. T.W. McKeithan, Kinetic proofreading in T-cell receptor signal transduction, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 5042-5046.
36. W.S. Hlavacek, A. Redondo, C. Wofsy, B. Goldstein, Kinetic proofreading in receptor-mediated transduction of cellular signals: receptor aggregation, partially activated receptors, and cytosolic messengers, Bull. Math. Biol. 2002, 64, 887-911.
37. P.A. Gonzalez, L.J. Carreno, D. Coombs, J.E. Mora, E. Palmieri, B. Goldstein, S.G. Nathenson, A.M. Kalergis, T cell receptor binding kinetics required for T cell activation depend on the density of cognate ligand on the antigen-presenting cell, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4824-4829.
38. C. Wofsy, D. Coombs, B. Goldstein, Calculations show substantial serial engagement of T cell receptors, Biophys. J. 2001, 80, 606-612.
39. D. Coombs, A.M. Kalergis, S.G. Nathenson, C. Wofsy, B. Goldstein, Activated TCRs remain marked for internalization after dissociation from pMHC, Nat. Immunol. 2002, 3, 926-931.
40. C.C. Fink, B. Slepchenko, I.I. Moraru, J. Schaff, J. Watras, L.M. Loew, Morphological control of inositol-1,4,5-trisphosphate-dependent signals, J. Cell Biol. 1999, 147, 929-935.
41. J.C. Schaff, B.M. Slepchenko, Y.S. Choi, J. Wagner, D. Resasco, L.M. Loew, Analysis of nonlinear dynamics on arbitrary geometries with the virtual cell, Chaos 2001, 11, 115-131.
42. S.Y. Shvartsman, Shooting from the hip: spatial control of signal release by intracellular waves, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 9087-9089.
43. B.N. Kholodenko, G.C. Brown, J.B. Hoek, Diffusion control of protein phosphorylation in signal transduction pathways, Biochem. J. 2000, 350, 901-907.
44. B.N. Kholodenko, MAP kinase cascade signaling and endocytic trafficking: a marriage of convenience? Trends Cell Biol. 2002, 12, 173-177.
45. I.V. Maly, H.S. Wiley, D.A. Lauffenburger, Self-organization of polarized cell signaling via autocrine circuits: computational model analysis, Biophys. J. 2004, 86, 10-22.
46. A. Gierer, H. Meinhardt, A theory of biological pattern formation, Kybernetik 1972, 12, 30-39.
47. M. Postma, P.J.M. Van Haastert, A diffusion-translocation model for gradient sensing by chemotactic cells, Biophys. J. 2001, 81, 1314-1323.
48. A. Levchenko, P.A. Iglesias, Models of eukaryotic gradient sensing: application to chemotaxis of amoebae and neutrophils, Biophys. J. 2002, 82, 50-63.
49. K.K. Subramanian, A. Narang, A mechanistic model for eukaryotic gradient sensing: spontaneous and induced phosphoinositide polarization, J. Theor. Biol. 2004, 231, 49-67.
50. L. Ma, C. Janetopoulos, L. Yang, P.N. Devreotes, P.A. Iglesias, Two complementary, local excitation, global inhibition mechanisms acting in parallel can explain the chemoattractant-induced regulation of PI(3,4,5)P3 response in Dictyostelium cells, Biophys. J. 2004, 87, 3764-3774.
51. J.M. Haugh, F. Codazzi, M. Teruel, T. Meyer, Spatial sensing in fibroblasts mediated by 3' phosphoinositides, J. Cell Biol. 2000, 151, 1269-1279.
52. J.M. Haugh, I.C. Schneider, Spatial analysis of 3' phosphoinositide signaling in living fibroblasts: I. Uniform stimulation model and bounds on dimensionless groups, Biophys. J. 2004, 86, 589-598.
53. G. Adam, M. Delbrück, Reduction of dimensionality in biological diffusion processes, in Structural Chemistry and Molecular Biology, (Eds.: A. Rich, N. Davidson), W.H. Freeman and Co., San Francisco, 1968, 198-215.
54. H.C. Berg, E.M. Purcell, Physics of chemoreception, Biophys. J. 1977, 20, 193-219.
55. L.D. Shea, G.M. Omann, J.J. Linderman, Calculation of diffusion-limited kinetics for the reactions in collision coupling and receptor cross-linking, Biophys. J. 1997, 73, 2949-2959.
56. J.M. Haugh, A unified model for signal transduction reactions in cellular membranes, Biophys. J. 2002, 82, 591-604.
57. H. Berry, Monte Carlo simulations of enzyme reactions in two dimensions: fractal kinetics and spatial segregation, Biophys. J. 2002, 83, 1891-1901.
58. P.J. Woolf, J.J. Linderman, Untangling ligand induced activation and desensitization of G-protein-coupled receptors, Biophys. J. 2003, 84, 3-13.
59. M.J. Saxton, K. Jacobson, Single-particle tracking: applications to membrane dynamics, Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 373-399.
60. L.D. Shea, J.J. Linderman, Compartmentalization of receptors and enzymes affects activation for a collision coupling mechanism, J. Theor. Biol. 1998, 191, 249-258.
61. K. Ritchie, X. Shan, J. Kondo, K. Iwasawa, T. Fujiwara, A. Kusumi, Detection of non-brownian diffusion in the cell membrane in single molecule tracking, Biophys. J. 2005, 88, 2266-2277.
62. D. Bray, Intracellular signaling as a parallel distributed process, J. Theor. Biol. 1990, 143, 215-231.
63. B.N. Kholodenko, J.B. Hoek, H.V. Westerhoff, G.C. Brown, Quantification of information transfer via cellular signal transduction pathways, FEBS Lett. 1997, 414, 430-434.
64. A. Goldbeter, D.E. Koshland Jr, An amplified sensitivity arising from covalent modification in biological systems, Proc. Natl. Acad. Sci. U.S.A. 1981, 78, 6840-6844.
65. A. Goldbeter, D.E. Koshland Jr, Ultrasensitivity in biochemical systems controlled by covalent modification: interplay between zero-order and multistep effects, J. Biol. Chem. 1984, 259, 14441-14447.
66. J.E. Ferrell Jr, Tripping the switch fantastic: how a protein kinase cascade can convert graded inputs into switch-like outputs, Trends Biochem. Sci. 1996, 21, 460-466.
67. J.M. Haugh, D.A. Lauffenburger, Physical modulation of intracellular signaling processes by locational regulation, Biophys. J. 1997, 72, 2014-2031.
68. J.E. Ferrell Jr, How regulated protein translocation can produce switch-like responses, Trends Biochem. Sci. 1998, 23, 461-465.
69. A. Levchenko, J. Bruck, P.W. Sternberg, Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 5818-5823.
70. R. Heinrich, B.G. Neel, T.A. Rapoport, Mathematical models of protein
18
Genome and Proteome Studies
18.1
Genome-wide Gene Expression Analysis: Practical Considerations and
Application to the Analysis of T-cell Subsets in Inflammatory Diseases
Outlook
The scope of this chapter is twofold. We will first review some important
conceptual and technical issues related to experiment design that we feel
should be addressed while designing studies using microarrays. In the second
part, we will illustrate how this technology can be employed practically to
promote insight into a specific biological field, by reviewing several studies
that address the molecular basis of inflammatory diseases using gene profiling.
We will focus on the gene expression analysis of T-lymphocyte subsets, the
key players in several inflammatory diseases.
18.1.1
Introduction
the expression level of many genes in parallel. The technology can reveal the
physiology of cells and tissues on an unprecedented scale by quantitating the
mRNA levels of tens of thousands of genes [1].
The amount of data generated by microarray experiments cannot be handled
by simple sorting in spreadsheets or by plotting on graphs. Microarray data
analysis has recently developed as a separate field with increasing impact of
mathematicians generating dedicated algorithms and tools [2-4]. Sophisticated
computational tools are now available, but it should be noted that a basic
understanding of these tools is required for meaningful data analysis.
18.1.2
History/Development
18.1.3
General Considerations
Fig. 18.1-1 Global gene expression studies rely mainly on two technologies: spotted complementary DNA (cDNA) microarrays (a) and oligonucleotide microarrays (b). The first type of microarray is generated by robotic spotting of cDNA fragments for defined genes on a glass slide, in an ordered fashion. In general, each gene is represented by a double-stranded DNA probe (up to 1 kb) that is usually generated by polymerase chain reaction (PCR) amplification. Current technology allows the deposition of more than 10 000 genes on a single slide. High-density oligonucleotide arrays are generated by in situ synthesis of short oligonucleotides (25-mers) on a glass slide. A sophisticated process developed in the semiconductor industry, termed photolithography, is used to synthesize approximately 1 300 000 distinct oligonucleotide features in defined places on a chip. In contrast to spotted cDNA arrays, each gene is represented by 11 to 20 pairs of oligonucleotides on a single chip. This allows the design of oligonucleotide probes that hybridize to a specific exon of a given gene. More recently, custom-designed and commercial platforms using "long" oligonucleotides (60-mers) are increasingly used.
To generate hybridization targets, RNA is extracted from the tissue of interest and mRNA is reverse transcribed into cDNA. In protocols used mainly for spotted cDNA arrays, fluorescently labeled nucleotides are incorporated into the cDNA during this step. In other protocols used mainly for high-density oligonucleotide arrays, a biotin-labeled cRNA target is generated by transcribing the double-stranded cDNA target with T7 RNA polymerase. This last step also results in a linear amplification (approximately 50-fold) of the material. In both cases, the labeled target cDNA or cRNA is hybridized to the array, and the intensity of hybridization to individual cDNA fragments or oligonucleotides on the array is revealed by a high-resolution scanner. The hybridization signal is then used to determine the expression level of each gene represented on the array.
the same array) with a common reference measurement (where each sample is
hybridized to a separate array), Park et al. found a high correlation between the
two settings, suggesting that multiple comparisons of experimental conditions
using a common control can achieve a satisfactory degree of accuracy [16].
of solid tissues (Refs. 32-36 and references therein). King et al. found
that gene expression measurements from small sample RNA are not
really equivalent to measurements from standard sample RNA, possibly
because of amplification failure of low-abundance transcripts and sequence-
specific differences in amplification efficiency. They, however, concluded
that biological variability in gene expression between independent samples
is greater than the technical variability associated with the amplification
process [36]. Some amplification methods have been shown to have
reproducible bias (such as overrepresentation of T-rich sequences), related
to the amount of starting material and to the number of amplification
cycles. Underrepresentation of mRNA with extensive secondary structure
may be partially resolved by performing the reverse transcription step at
higher temperatures [37]. Comparisons between amplified and nonamplified
samples show that the best correlations of expression levels are obtained for
abundant transcripts [38].
The choice of the amplification protocol may be important in determining the
quality and robustness of the results, as even small variations in methodology
introduce considerable distortion of gene expression profiles. Klur et al. have
focused on procedures in which a double-stranded cDNA produced from total
RNA is used as a template to generate a labeled cRNA, and have compared
random PCR amplification, which includes a PCR amplification step at the
double-stranded cDNA level, and linear amplification, consisting of two cycles
of cDNA synthesis followed by in vitro transcription. The authors found that
brain microdissections prepared with either method gave similar expression
results, in their ability to identify differentially expressed genes. Analysis of
technical replicates, however, suggests that random PCR amplification may
be more reproducible, requires smaller RNA input, and generates cRNA of
higher quality than linear amplification [39]. Several comparisons between
amplification procedures are available in the literature [40-431.
pared. There are several normalization methods commonly used, and they
can be either based on the complete set of arrayed genes, or on endogenous
(housekeeping) or exogenous (spiked-in) control genes. All normalization
methods are based on some assumptions, such as that most gene expression
levels do not change across conditions or that total RNA levels in a sample
do not change. When relying on housekeeping genes for normalization, it
is useful to refer to a large number of genes, since expression of many of
the housekeeping genes can actually vary among different biological settings.
For more detailed discussion of data preprocessing, see Refs. 4, 44. These
first steps of data transformation are required to organize the data into a gene
expression matrix, a table where each row represents a gene and each column
an experimental condition. In addition to information on gene expression
levels, the table ideally contains information on the variability and accuracy
of measurement (e.g., standard deviations among replicates). Data organized
in such a way can then be used for analysis: the simplest is the identification
of differentially expressed genes. Many publications still characterize differen-
tially expressed genes as those whose expression ratios, or “fold changes” are
above an arbitrary set level; however, more complex algorithms that take into
account the intrinsic variability of the dataset are possible (see Refs. 4, 45, 46
for an overview of current methods). To further biological insight, additional
analytical methods can be applied to simplify the dataset and produce an
overview of the data. These analysis approaches can be “unsupervised”, that
is, based exclusively on the information intrinsic to the data (Figs. 18.1-2 and
18.1-3), or “supervised”, such as class prediction, which assigns new samples
to known classes, on the basis of already acquired biological information
(Figs. 18.1-4 and 18.1-5). Examples of unsupervised analyses are the various
“clustering” algorithms that create categories of similar data, either by group-
ing genes into classes with similar expression profiles, or by grouping samples
in classes defined by similarly expressed genes. Microarray analysis can also
be used to delineate the biological pathways involved in a process, by analyzing
whether certain functional classes of genes are overrepresented in a cluster.
There is a current effort to develop informatic tools that provide informative
gene annotation and correlation with biological pathways. Many of these,
such as ArrayXPath (http://www.snubi.org/software/ArrayXPath/), GoMiner
(http://discover.nci.nih.gov/gominer), MAPPfinder
(http://www.genmapp.org/MAPPFinder.html), or Onto-tools [47], use the organizing principles of
Gene Ontology, which characterize genes on the basis of molecular function,
biological process, and cellular component (http://www.geneontology.org). We
will be unable to discuss here the many algorithms that have been formulated
to aid both in unsupervised and supervised analysis. For an introduction, we
refer the reader to Refs. 4, 45, 46. For links to analysis software the reader can
refer to further websites for array databases:
http://genopole.toulouse.inra.fr/bioinfo/microarray/;
http://www.rockefeller.edu./genearray/links.php;
Fig. 18.1-2 In the unsupervised approach, pattern-recognition algorithms are used to identify subgroups of samples that have related gene expression profiles. A commonly used method, termed hierarchical clustering [2], calculates the similarity in expression of two different genes across a set of samples. Using this similarity measure, genes can be ordered hierarchically, leading to the identification of genes that are regulated in a similar fashion (coregulation). This method can also be used to determine the similarity in gene expression between different samples, such as hierarchical clustering of groups of genes with similar patterns of expression in a set of tumor samples. These so-called gene expression signatures may include genes expressed in a specific cell type or stage of differentiation, or genes expressed during a particular biological response, such as activation of a specific intracellular signaling pathway or cell proliferation. Typical graphic representations of data clustering are a dendrogram and a “heat map”, which usually color codes the levels of gene expression.
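The hierarchical clustering idea in this caption can be illustrated with a small, self-contained sketch. The choices made here are assumptions for illustration, not prescribed by the text: similarity is measured as one minus the Pearson correlation, clusters are joined by average linkage, and the gene names and expression values are invented.

```python
import math

def pearson_distance(a, b):
    """1 - Pearson correlation of two expression profiles (0 means perfectly coregulated).

    Profiles must not be constant, otherwise the variance term is zero.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

def hierarchical_clustering(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping gene name -> list of expression values across samples.
    Returns the merges in order, as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = {name: [name] for name in profiles}  # cluster label -> member genes
    merges = []
    while len(clusters) > 1:
        best = None
        labels = sorted(clusters)
        for i, ci in enumerate(labels):
            for cj in labels[i + 1:]:
                # Average linkage: mean distance over all cross-cluster gene pairs.
                pair_sum = sum(
                    pearson_distance(profiles[g], profiles[h])
                    for g in clusters[ci]
                    for h in clusters[cj]
                )
                d = pair_sum / (len(clusters[ci]) * len(clusters[cj]))
                if best is None or d < best[0]:
                    best = (d, ci, cj)
        d, ci, cj = best
        clusters["(" + ci + "," + cj + ")"] = clusters.pop(ci) + clusters.pop(cj)
        merges.append((ci, cj, d))
    return merges

# Invented profiles: geneA/geneB rise together, geneC/geneD fall together.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [9.0, 7.0, 4.0, 2.0],
}
merges = hierarchical_clustering(profiles)
```

The order of the merges is what a dendrogram draws: the earliest merges join the most similar profiles, and the final merge joins the two most dissimilar groups.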
cluster centers (“centroids”, in black) are chosen
randomly among the samples. The algorithm iteratively
assigns samples (in white) to the nearest (most similar)
centroid’s cluster and recalculates the centroid based on
the new inclusion. The process is repeated until all
samples are assigned and centroids no longer change.
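The centroid-based procedure described in this caption corresponds to what is commonly called k-means clustering. The sketch below is an illustration under stated assumptions: it uses the batch variant, in which all samples are reassigned before the centroids are recomputed, rather than updating a centroid after each single inclusion; squared Euclidean distance stands in for "most similar"; and the sample coordinates are invented.

```python
import random

def k_means(samples, k, seed=0, max_iter=100):
    """Batch k-means on a list of equal-length numeric tuples.

    Returns (assignment, centroids), where assignment[i] is the cluster index
    of samples[i].
    """
    rng = random.Random(seed)
    # Initial centroids are chosen randomly among the samples.
    centroids = [list(s) for s in rng.sample(samples, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every sample to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(s, centroids[c])),
            )
            for s in samples
        ]
        if new_assignment == assignment:
            break  # assignments, and hence centroids, no longer change
        assignment = new_assignment
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Two well-separated groups of invented two-dimensional samples.
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
assignment, centroids = k_means(samples, k=2)
```

Unlike hierarchical clustering, the number of clusters k has to be chosen in advance, and with less clearly separated data the result can depend on the random initial centroids.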
http://www.stat.uni-muenchen.de/~strimmer/rexpress.html;
http://nslij-genetics.org/microarray/soft.html;
ihome.cuhk.edu.hk/~b400559/arraysoft.html
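The data preprocessing steps discussed in this section (normalization, the gene expression matrix, and the fold-change criterion) can be sketched in a few lines. Everything specific here is an assumption for illustration: median scaling is only one of several normalization methods and relies on most genes being unchanged, the twofold threshold is the arbitrary cut-off the text cautions about, and the intensities are invented.

```python
import math
from statistics import mean, median, stdev

def median_scale(matrix):
    """Scale each column (one array) so that all columns share the same median.

    matrix: list of rows, one row per gene, one column per array.
    """
    medians = [median(col) for col in zip(*matrix)]
    target = mean(medians)
    return [[v * target / m for v, m in zip(row, medians)] for row in matrix]

def log2_fold_change(row, group_a, group_b):
    """log2 ratio of mean expression in condition B over condition A for one gene."""
    return math.log2(mean(row[j] for j in group_b) / mean(row[j] for j in group_a))

def welch_t(row, group_a, group_b):
    """Welch t statistic, which takes the variability among replicates into account."""
    a = [row[j] for j in group_a]
    b = [row[j] for j in group_b]
    standard_error = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / standard_error

# Invented gene expression matrix: 3 genes x 6 arrays (columns 0-2: A, 3-5: B).
matrix = [
    [100.0, 110.0, 90.0, 400.0, 420.0, 380.0],  # strongly induced in B
    [200.0, 210.0, 190.0, 205.0, 195.0, 200.0],  # unchanged
    [300.0, 280.0, 320.0, 310.0, 290.0, 300.0],  # unchanged
]
group_a, group_b = [0, 1, 2], [3, 4, 5]
normalized = median_scale(matrix)
changed = [
    i for i, row in enumerate(normalized)
    if abs(log2_fold_change(row, group_a, group_b)) >= 1.0  # at least twofold
]
```

A fold-change filter alone ignores replicate variability; a statistic such as `welch_t` is one simple way to bring that variability into the decision.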
Experiment design
• Goal of the experiment
• Description of the experiment (e.g., abstract from a related publication)
disease state)
• Manipulation of biological samples and protocols used
• For each reporter, unambiguous characteristics of the reporter molecule, including the sequence for
oligonucleotide based reporters, the source, preparation and database accession number for long
reporters, and primers for PCR-based reporters
• Appropriate biological annotation for each reporter
18.1.4
Applications and Practical Examples
Fig. 18.1-7 Control of T helper cell differentiation. Following the identification of T-bet as the master transcription factor inducing Th1 development, a model of T helper cell differentiation has been proposed [68]. According to this model, IL-12 signals through high-affinity IL-12 receptors via STAT4 to activate expression of T-bet. Subsequently, T-bet activates expression of IFN-γ and represses expression of the Th2 cytokines IL-4, IL-5, and IL-13. Consistent with previous findings from several laboratories (reviewed in Refs. 65, 70), IL-4 directs Th2 differentiation by a mechanism that involves STAT6-dependent activation of GATA-3 expression. GATA-3 is the “mirror image” of T-bet in that it activates expression of Th2 cytokines and represses the Th1 cytokine, IFN-γ. The main feature of this model is that cytokine receptor signaling and STAT activation are placed upstream of the master T helper lineage-determining transcription factors T-bet and GATA-3. This model also infers that T-bet and GATA-3 antagonize each other. Subsequent studies have shown that following stimulation of naïve CD4+ T cells, expression of T-bet is strongly induced by IFN-γ signaling and STAT1 activation [71, 72], indicating a positive feedback loop similar to Th2 cell differentiation. This figure also indicates that in addition to TCR and cytokine receptor signaling, costimulatory molecules (such as CD28), adhesion molecules (such as LFA-1), and signaling through other cell surface receptors (e.g., CD40-CD40 ligand interactions) can influence T helper cell differentiation.
Fig. 18.1-8 Gene expression profiles of human Th1 and Th2 cells generated from
five independent donors were analyzed using high-density oligonucleotide
arrays. Genes were selected if differential expression between Th1 and Th2
cells was determined at a confidence level of 95% on the basis of t-test
statistics performed on a dataset derived from five independent experiments
and if at least a twofold change in expression level was observed. Bars
represent “fold change” of the mRNA level of a particular gene when comparing
Th1 versus Th2 cells (mean of five experiments). Positive values indicate that
the transcript is more abundant in Th1 than in Th2 cells and negative values
indicate the opposite. Colors indicate the “absolute” expression level of a
gene (arbitrary fluorescence units). Black: high level of expression (>1000);
grey: medium level of expression (200-1000); white: low transcript abundance
(<200). The column next to the bar diagram indicates the P value obtained from
the result of a paired t-test performed with the data from independently
derived Th1 and Th2 cell lines from five donors. Genes were grouped according
to their presumed function, based on information available in public databases
or in the literature (from Ref. 73).
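The two selection criteria named in this caption (a paired t-test at the 95% confidence level across five donors, plus at least a twofold change) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: signals are taken to be log2-transformed, the test is applied as a comparison of the paired t statistic against the two-sided 5% critical value for 4 degrees of freedom (2.776), and the gene names and numbers are invented toy data, not values from Ref. 73.

```python
from math import log2, sqrt
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t for 4 degrees of freedom
# (five paired donors); an assumption tied to the five-donor design.
T_CRIT_DF4 = 2.776


def select_genes(log2_signals, min_fold=2.0, t_crit=T_CRIT_DF4):
    """Return {gene: mean Th1/Th2 fold change} for genes passing both filters.

    log2_signals maps a gene name to (th1_values, th2_values), where the two
    lists hold log2 signals from the same donors in the same order.
    """
    selected = {}
    for gene, (th1, th2) in log2_signals.items():
        diffs = [a - b for a, b in zip(th1, th2)]  # per-donor log2 ratios
        mean_diff = mean(diffs)
        if abs(mean_diff) < log2(min_fold):
            continue  # less than the required fold change
        spread = stdev(diffs)
        t_stat = float("inf") if spread == 0 else abs(mean_diff) / (spread / sqrt(len(diffs)))
        if t_stat > t_crit:
            selected[gene] = 2 ** mean_diff  # values below 1 mean higher in Th2
    return selected


# Invented toy data: (Th1 log2 signals, Th2 log2 signals) for five donors.
TOY_DATA = {
    "GENE_A": ([10.0, 10.2, 9.8, 10.1, 9.9], [8.0, 8.1, 7.9, 8.2, 7.8]),   # consistent, about fourfold
    "GENE_B": ([9.0, 9.1, 8.9, 9.0, 9.1], [8.8, 9.0, 8.6, 8.8, 8.9]),      # consistent but small
    "GENE_C": ([10.0, 7.0, 11.0, 8.0, 12.0], [8.0, 8.0, 8.0, 8.0, 8.0]),   # large but inconsistent
}
selected = select_genes(TOY_DATA)
```

Only GENE_A passes both filters here; GENE_B fails the fold-change criterion and GENE_C fails the t-test. A real analysis would also need to address multiple testing across thousands of genes, which this sketch does not attempt.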
Well-established marker genes for Th1 cells, such as IFN-γ and IL-12Rβ2,
were found at much higher levels in Th1 than in Th2 cells (Fig. 18.1-8). In
addition, some genes that had previously not been implicated in the process
of T helper cell differentiation, such as oncostatin M (OSM), were found to
be overexpressed in Th1 cells (Fig. 18.1-8). The gene expression profiles of
Th1 and Th2 cells also revealed differential expression of genes encoding
transcription factors, some of which (GATA-3 and IRF-1) had previously been
characterized in the context of T helper cell differentiation [69, 75-77]. In
addition, several transcription factors that had not been associated with T
helper cell polarization were also identified, including ETS-1, RORα2,
IRF-7A, and c-fos. Although the target genes of these factors in regulating the
gene expression patterns specific to each T helper cell subset are not known,
it is possible that some of these factors control individual cytokine
gene expression, as GATA-3 and T-bet control IL-4 and IFN-γ production,
respectively. In fact, the recent analysis of Ets-1-deficient mice demonstrated
that this transcription factor is an important cofactor of T-bet in promoting
IFN-γ production and is essential for the efficient development of Th1
responses [78].
Th1 cells are more susceptible to activation-induced cell death (AICD), a
mechanism for downregulation of an immune response and maintenance of
T-cell tolerance, and are important mediators of tissue damage in inflammatory
and autoimmune diseases. Results from our gene expression analysis
suggested a potential mechanism for the increased susceptibility of Th1 cells
to AICD and for their cytotoxic effects [73]. Compared with Th2 cells, Th1
cells expressed higher levels of TRAIL, an apoptosis-inducing molecule; BAK, a
proapoptotic Bcl-2 family member; and proapoptotic caspase-8, perforin, and
granzyme B (Fig. 18.1-8). The functional program of Th1 and Th2 lymphocytes
requires these cells to home to different sites. Th1 cells have been shown to
preferentially express the chemokine receptors CCR5 and CXCR3, whereas
Th2 cells were reported to preferentially express CCR3, CCR4, CCR8, and
the chemoattractant receptor CRTh2 [79]. Other gene expression changes
identified in our study were consistent with previous experiments defining
differential recruitment of Th1 and Th2 cells to sites of inflammation. We
reported an increased expression of mRNA for fucosyltransferase VII
(FucT-VII), which codes for an enzyme that mediates the fucosylation of selectin
ligands on the surface of T cells (Fig. 18.1-8). This fucosylation is required
for the first step of lymphocyte adhesion to endothelial cells, “rolling”.
Recent in vivo observations have validated the biological relevance of this
finding: FucT-VII was in fact found to be upregulated in Th1 cells infiltrating
the inflamed joints of patients affected by either RA or juvenile idiopathic
arthritis (JIA) [73, 80]. Moreover, FucT-VII expression and increased
P-selectin binding capacity of T cells were associated with a more severe
course of the disease [80]. These data indicate a critical role of FucT-VII in
the enhanced homing of T cells to the inflamed synovium and suggest that
inhibitors of FucT-VII enzyme activity may be of significant therapeutic value
in the treatment of chronic arthritis.
IL-12 also induced two chemokine receptors, CCR5 and CCR1, both of
which promote increased responsiveness of Th1, but not Th2, cells to MIP-1α
or RANTES. The activity of RANTES and other chemokines is regulated
by CD26 (dipeptidyl-peptidase IV)-mediated cleavage. The DPP4 (encoding
CD26) mRNA was found upregulated in Th1 cells compared to Th2 cells
(Fig. 18.1-8). The inactivation of chemokines by CD26 may contribute to the
fine control of chemotactic migration of Th1 cells by providing a stop signal
that keeps cells at the site of inflammation. Finally, higher expression of
integrin α6β1 on Th1 cells suggested that adhesion and extravasation of Th1
cells into tissues triggered by inflammatory chemokines might be mediated by
higher surface levels of integrin α6β1 binding to laminin in basal membranes
and extracellular matrix.
Of the 215 genes that we found differentially expressed in Th1 and Th2
cells, 157 genes were expressed at higher levels in Th1 cells and 58 genes
were overexpressed in Th2 cells. There are several possible explanations for
the apparent Th1 bias of our study. Previous studies have demonstrated that
Th2 cells may require more time to acquire their effector functions than Th1
cells [81, 82].
Hamalainen et al. have used an oligonucleotide microarray specifically
designed to screen for 250 inflammation-related genes to identify those
differentially expressed in human, cord blood-derived Th1 and Th2 lines,
2 weeks after initial stimulation [83]. Although the experimental protocol
to generate Th1 and Th2 cells used in the study by Hamalainen et al. was
quite distinct from our protocol [73], there was a large overlap of the genes
identified in both studies. In addition to the Th1/Th2 signature cytokines,
several chemokines (MIP-1α, MIP-1β, RANTES) and chemokine receptors
(CCR1, CCR2, CCR4, CCR5) were found differentially expressed in human
Th1 and Th2 cells [83]. These results further emphasize the importance of
correct homing of polarized effector T cells to eradicate pathogens.
T helper cells. Two independent experiments were performed, and genes that
showed greater than twofold changes in both were chosen for further analysis.
A global hierarchical clustering analysis revealed that the expression pattern
of day 1 or day 2 Th1 cells is closer to that of day 1 or day 2 Th2 cells than
to that of Th1 cells harvested on day 3 or 4. A similar relationship was also
observed for Th2 cells, indicating that at the global gene expression level,
Th1 and Th2 cells begin to diverge at day 3 after primary stimulation [92].
These findings correlate with previous studies that analyzed the kinetics of
changes in the chromatin structure at the IFN-γ and IL-4 cytokine loci. Histone
hyperacetylation at the IL-4 locus was observed in both Th1 and Th2 cultures
during the first 2 days of T helper cell differentiation. However, at later
time points of T helper cell differentiation, histone acetylation was
selectively detected at the IL-4 locus in Th2 cells and at the IFN-γ locus in
Th1 cells [93].
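The kind of hierarchical clustering referred to above can be sketched as follows. This is a generic illustration, assuming average linkage and a distance of 1 minus the Pearson correlation; the clustering method and distance actually used in Ref. 92 are not stated here, and the sample names and log-ratio values are invented toy data chosen so that the two early samples resemble each other.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(a, b):
    """1 - Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average distance over all cross-cluster pairs of samples.
            left, right = pair
            dists = [pearson_distance(profiles[i], profiles[j])
                     for i in clusters[left] for j in clusters[right]]
            return sum(dists) / len(dists)

        left, right = min(combinations(sorted(clusters), 2), key=linkage)
        clusters[left + "+" + right] = clusters.pop(left) + clusters.pop(right)
        merges.append((left, right))
    return merges


# Invented toy log ratios for four samples over four genes.
TOY_PROFILES = {
    "Th1_d1": [1.0, 1.2, 0.1, 0.0],
    "Th2_d1": [0.9, 1.1, 0.0, 0.2],
    "Th1_d4": [0.2, 0.1, 2.0, -1.0],
    "Th2_d4": [0.1, 0.3, -1.0, 2.0],
}
merges = average_linkage(TOY_PROFILES)
```

In this toy example the first merge joins the two day 1 samples, which is the pattern a dendrogram would show when early Th1 and Th2 cultures are more similar to each other than to later cultures of the same subset.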
The above studies have provided insight into the mechanisms that control
the development of polarized helper T-cell subsets and have given important
information about previously unknown effector functions of these cells.
However, the in vitro systems used to generate polarized Th1 and Th2 cells
might not reproduce the conditions that lead to the differentiation of these
subsets in vivo. In addition, a critical issue that could not be addressed in
these studies concerns the interaction of differentiated Th1 and Th2 cells
with the tissues during an infection or in the setting of an inflammatory
disease.
Infection with the parasite Schistosoma mansoni is a well-established model
to study Th2 responses in vivo [64]. Intravenous injection of S. mansoni
eggs, which are retained in the lung, results in a strong Th2 response and
granuloma formation in the lung. This model has been widely used to study
basic mechanisms of asthma, allergy, and other Th2-mediated inflammatory
diseases. Neutralization of IL-4 in this model results in a reduced granuloma
size and a diminished Th2 response, whereas neutralization of IL-12 results
in increased granuloma size and Th2 cytokine production. In the absence of
the immunoregulatory cytokine IL-10, enhanced levels of IL-4 and IL-12 are
secreted, compared to wild-type mice [94]. IL-4/IL-10 and IL-10/IL-12 double
knockout mice develop highly polarized Th1 and Th2 responses, respectively,
after infection with S. mansoni eggs [95]. Sandler et al. have recently analyzed
gene expression profiles of lung tissue from wild-type, IL-4/IL-10, and
IL-10/IL-12 double-deficient mice at several time points after challenge with
S. mansoni eggs [96]. They found that Th1-polarized mice developed only small
granulomas and expressed genes that are characteristic of tissue damage. In
addition to genes known to be associated with Th1 responses (IFN-γ-induced
genes and TNF-α-induced protein 2), Th1-polarized mice expressed several
chemokines (IFN-γ-inducible protein 10 and RANTES), as well as Natural Killer
(NK) cell ligands. Activation of macrophages, a hallmark of Th1 responses,
was reflected by the upregulation of MIP-3α, macrophage-expressed gene 1,
macrosialin, and macrophage C-type lectin. Th1-polarized mice also showed
features of the acute-phase response, as levels of both IL-1β and its activator
there is now good evidence that CD4+ CD25+ Treg constitute a separate
lineage that develops in the thymus (see Refs. 100-102 for recent reviews).
The recent identification of FOXP3 as a transcription factor essential for the
development and function of CD4+ CD25+ Treg has provided an important
breakthrough for the analysis of this subpopulation of peripheral CD4+ T cells
[103-105]. Evidence that this forkhead/winged-helix transcription factor is
essential for Treg development comes from the analysis of scurfy mice. These
mice carry a mutated Foxp3 gene and are characterized by a massive activation
and expansion of CD4+ T cells resulting in gross enlargement of secondary
lymphoid organs, severe dermatitis, lymphocytic infiltration of multiple
organs, hypergammaglobulinemia, and autoimmune hemolytic anemia [106].
The analysis of scurfy mice demonstrated that the disease is mediated by CD4+
T cells. This finding was confirmed by the analysis of FOXP3-deficient mice,
which display polyclonal activation of CD4+ T cells as early as 7 days after
birth [103]. By knock-in of a GFP-FOXP3 reporter allele into the murine FOXP3
locus, the Rudensky laboratory has now provided compelling evidence that
Treg constitute a separate lineage that develops in the thymus and that FOXP3
is in fact the lineage-specification factor of these cells [107].
Importantly, FOXP3 mutations are also responsible for the pathogenesis
of immune dysregulation, polyendocrinopathy, enteropathy, X-linked (IPEX),
a fatal human X-linked disorder characterized by extensive multiorgan
lymphocyte infiltration and abnormal activation of effector CD4+ T cells. At a
very young age, IPEX patients present with massive lymphoproliferation,
early onset IDDM, thyroiditis, eczema, severe enteropathy, food allergies
preventing normal food intake, and additional autoimmune pathologies such
as autoimmune hemolytic anemia and thrombocytopenia, as well as severe
infection [108-110]. Affected males succumb to the IPEX syndrome between
3 and 4 weeks of age. Altogether, there is compelling evidence that FOXP3
is necessary for development of CD4+ CD25+ Treg in mice, and the
identification of FOXP3 mutations in IPEX patients suggests that this
transcription factor plays a similar role in humans. Although the
identification of FOXP3 as the lineage-specification factor of Treg has
provided a precious tool to understand the ontogeny and function of this
lineage, the important question of how Treg acquire and exert their
suppressive action remains unresolved [111]. In particular, the target genes
of Foxp3 have not been identified and nothing is known about the molecular
mechanism by which this transcription factor downregulates the activity of
CD4+ T cells. Given the accumulating evidence that the immunosuppressive
potential of Treg could be used therapeutically to treat autoimmune diseases
and facilitate transplant tolerance, or could be targeted to elicit tumor
immunotherapy, it is not surprising that many laboratories are currently
trying to unravel the molecular basis of Treg-mediated immunosuppression.
Several labs have performed large-scale gene expression studies to identify
molecules mediating the suppressive effects of Treg. Most of these studies
have been performed in mice and, given the current pace of the field, human
studies are sure to follow soon.
Gavin et al. have purified resting CD4+ CD25+ and CD4+ CD25- T cells
from normal B6 mice by cell sorting and have analyzed their gene expression
profiles using Affymetrix Mu11K and Mu19K oligonucleotide arrays [112].
In the first experiment, biotinylated cRNA was amplified directly from
cDNA, whereas in the second experiment two sequential rounds of in
vitro transcription were used to obtain enough cRNA for analysis. With a
few exceptions, only transcripts that were differentially expressed in both
experiments were considered for confirmation by real-time RT-PCR. A
comforting finding was the strong upregulation of CD25 in Treg when
compared to CD4+ CD25- T cells. Additional cell surface receptors that were
upregulated in Treg included cytotoxic T lymphocyte-associated protein 4
(CTLA4), a molecule that has been implicated in the suppressive effects
of Treg, and several members of the TNF receptor superfamily, including
glucocorticoid-induced tumor necrosis factor receptor (GITR, also called
Tnfrsf18), OX40 (also called Tnfrsf4), 4-1BB (also called Tnfrsf9), and TNFR2
(also called Tnfrsf1b, the p75 chain of the TNF receptor). Together with the
overexpression of FAS-associated phosphatase 1 (FAP-1), these data point
to a prolonged survival of Treg by restriction of TCR-induced apoptosis.
Furthermore, the authors found higher transcript levels of TGF-βRI, the
signal-transducing receptor subunit for TGF-β, an important negative regulator
of cell growth and inflammation. Additional transcripts that were found
overexpressed in Treg include the suppressors of cytokine signaling SOCS1
and SOCS2, as well as RGS1, a molecule that inhibits chemokine-induced
signaling through heterotrimeric G proteins [112]. The authors concluded that
the interplay of several pathways, such as increased T-cell survival and blockage
of TCR and cytokine signaling, may account for the unique characteristics of
Treg [112].
The characteristics of mouse Treg and CD4+ CD25- T cells were also
analyzed in a similar study by McHugh et al. [113]. As in the previous report,
only two biological replicates were performed; however, this study also analyzed
the gene expression profiles of Treg and CD4+ CD25- T cells that had been
stimulated for 12 and 48 h with anti-CD3 antibodies. Gene expression profiling
was performed using Affymetrix Mu11K oligonucleotide arrays. Only 29 genes
were found to be differentially expressed when comparing resting Treg and
CD4+ CD25- T cells. For unknown reasons, the “positive control” of this
experiment, CD25, was not detected in Treg in this study [113]. Although the
use of only two replicate experiments in both studies certainly does not allow
major conclusions to be drawn, the fact that 50% of the genes found by McHugh
et al. were also detected in the study by Gavin et al. provides some
cross-validation of the results [112, 113]. McHugh et al. focused their study
on the functional role of GITR for the suppressive functions of Treg and
demonstrated that agonistic antibodies against GITR could abrogate
Treg-mediated suppression in in vitro T-cell suppression assays [113].
Additional microarray-based studies have identified neuropilin-1 (Nrp1) [114]
and lymphocyte activation gene-3 (Lag-3) [115] as Treg-specific cell surface
molecules. With respect to Lag-3, it should be noted that this receptor is
also highly expressed on activated Th1 cells [116].
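The cross-validation argument above rests on a simple overlap between two gene lists. A minimal sketch of that calculation is shown below; the function name is hypothetical and the gene identifiers are placeholders, not the genes reported in Refs. 112 and 113.

```python
def overlap_fraction(reference_genes, other_genes):
    """Fraction of reference_genes that also appear in other_genes."""
    reference, other = set(reference_genes), set(other_genes)
    if not reference:
        raise ValueError("reference gene list is empty")
    return len(reference & other) / len(reference)


# Placeholder identifiers standing in for two published gene lists.
STUDY_A = ["gene_1", "gene_2", "gene_3", "gene_4"]
STUDY_B = ["gene_2", "gene_4", "gene_5", "gene_6", "gene_7", "gene_8"]

shared = overlap_fraction(STUDY_A, STUDY_B)  # 2 of 4 reference genes are shared
```

The fraction depends on which list is used as the reference, so a statement such as "50% of the genes found in one study" should always name the study whose list forms the denominator. Gene identifiers from different array designs also need to be mapped to a common nomenclature before such a comparison.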
Herman et al. have recently analyzed the function of Treg in a type 1
diabetes model in mice [117]. Type 1 diabetes models are particularly useful for
the study of autoimmune diseases because mice spontaneously develop the
disease and their pathology is very similar to the human counterpart, IDDM.
The disease develops in two stages: in the BDC2.5 model, cells invade the
pancreas and set up a massive infiltrate in the islets at 15-18 days of age
(insulitis). Subsequently, only 10-20% of animals develop diabetes resulting
from the massive destruction of pancreatic β-cells at around 20 weeks of age.
The authors studied whether the relatively long prediabetic period and low
incidence of diabetes in this model may be explained by the presence of Treg
in the pancreas. They show that both Treg and effector T cells coexist within
the pancreatic lesion before the onset of diabetes. To assess the potential
roles of Treg within the lesion, they sorted CD4+ CD25+ CD69- Treg cells
from the pancreas of prediabetic mice and compared their gene expression
profile to effector T-cell populations, also isolated by cell sorting from pancreas
preparations. Since only small cell numbers could be obtained with these
procedures, the authors used commercial kits to amplify RNA. Three to
five independent experiments were performed for each cell population and
statistical algorithms were used for data analysis [117]. In addition to genes
overexpressed in Treg, such as GITR, CD103, Nrp-1, IL-10, and CTLA-4, the
authors identified several molecules that had previously not been associated
with Treg functions [117]. One of these molecules, inducible costimulator
(ICOS), was shown to be specifically upregulated on Treg purified from
pancreas but not on Treg that had been purified from peripheral lymph nodes.
The authors showed that blockade of ICOS results in a rapid progression
from insulitis to diabetes, giving a strong indication that this molecule may
play an important role in the maintenance of the prediabetic stage [117]. This
study provides an excellent example of how increased understanding of the
molecular and cellular basis of regulatory events in the pancreatic islets could
lead to the development of therapies that promote long-term tolerance even
after an immune response has been established in the lesion.
18.1.5
Future Development
Genome-wide gene expression analysis has become a tool that is widely used
in biology and biomedical research. Technological improvements are likely to
occur with respect to reduced sample input and/or more robust protocols for
the preamplification of RNA, an increase of sensitivity, a better signal-to-noise
ratio, the development of exon-specific probes to tackle the important issue
of differentially spliced transcripts and of probes allowing the analysis of
micro-RNAs. An equally important, although certainly more difficult, issue
References
H. Xiao, K.E. Rogers, J.S. Wan, M.R. Jackson, M.G. Erlander, et al. Gene expression profiles of laser-captured adjacent neuronal subtypes, Nat. Med. 1999, 5, 117-122.
33. C. Leethanakul, V. Patel, J. Gillespie, M. Pallente, J.F. Ensley, S. Koontongkaew, L.A. Liotta, M. Emmert-Buck, J.S. Gutkind, Distinct pattern of expression of differentiation and growth-related genes in squamous cell carcinomas of the head and neck revealed by the use of laser capture microdissection and cDNA arrays, Oncogene 2000, 19, 3220-3224.
34. L.V. Hooper, M.H. Wong, A. Thelin, L. Hansson, P.G. Falk, J.I. Gordon, Molecular analysis of commensal host-microbial relationships in the intestine, Science 2001, 291, 881-884.
35. V. Luzzi, M. Mahadevappa, R. Raja, J.A. Warrington, M.A. Watson, Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis, J. Mol. Diagn. 2003, 5, 9-14.
36. C. King, N. Guo, G.M. Frampton, N.P. Gerry, M.E. Lenburg, C.L. Rosenberg, Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays, J. Mol. Diagn. 2005, 7, 57-64.
37. T. Ernst, M. Hergenhahn, M. Kenzelmann, C.D. Cohen, M. Bonrouhi, A. Weninger, R. Klaren, E.F. Grone, M. Wiesel, C. Gudemann, J. Kuster, W. Schott, G. Staehler, M. Kretzler, M. Hollstein, H.-J. Grone, Decrease and gain of gene expression are equally discriminatory markers for prostate carcinoma: A gene expression analysis on total and microdissected prostate tissue, Am. J. Pathol. 2002, 160, 2169-2180.
38. D.J. Kelly, S. Ghosh, RNA profiling for biomarker discovery: practical considerations for limiting sample sizes, Dis. Markers 2005, 21, 43-48.
39. S. Klur, K. Toy, M.P. Williams, U. Certa, Evaluation of procedures for amplification of small-size samples for hybridization on microarrays, Genomics 2004, 83, 508-517.
40. J. McClintick, R. Jerome, C. Nicholson, D. Crabb, H. Edenberg, Reproducibility of oligonucleotide arrays using small samples, BMC Genomics 2003, 4, 4.
41. R. Singh, R.J. Maganti, S.V. Jabba, M. Wang, G. Deng, J.D. Heath, N. Kurn, P. Wangemann, Microarray based comparison of three amplification methods for nanogram amounts of total RNA, Am. J. Physiol. Cell Physiol. 2005, 288, 1179-1189.
42. L. Li, J. Roden, B.E. Shapiro, B.J. Wold, S. Bhatia, S.J. Forman, R. Bhatia, Reproducibility, fidelity, and discriminant validity of mRNA amplification for microarray analysis from primary hematopoietic cells, J. Mol. Diagn. 2005, 7, 48-56.
43. J. J. Upson, R. Stoyanova, H.S. Cooper, C. Patriotis, E.A. Ross, B. Boman, M.L. Clapper, A.G. Knudson, A. Bellacosa, Optimized procedures for microarray analysis of histological specimens processed by laser capture microdissection, J. Cell. Physiol. 2004, 201, 366-373.
44. B.M. Bolstad, F. Collin, K.M. Simpson, R.A. Irizarry, T.P. Speed, Experimental Design and Low-Level Analysis of Microarray Data, Int. Rev. Neurobiol. 2004, 60, 25-58.
45. N.J. Armstrong, M.A. van de Wiel, Microarray data analysis: From hypotheses to conclusions using gene expression data, Cell. Oncol. 2004, 26, 279-290.
46. D.K. Slonim, From patterns to pathways: gene expression data analysis comes of age, Nat. Genet. 2002, 32, 502-508.
47. P. Khatri, P. Bhavsar, G. Bawa, S. Draghici, Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments, Nucleic Acids Res. 2004, 32, W449-W456.
central role for IL-10 in polarizing both T helper cell 1- and T helper cell 2-type cytokine responses in vivo, J. Immunol. 1997, 159, 5014-5023.
95. K.F. Hoffmann, S.L. James, A.W. Cheever, T.A. Wynn, Studies with double cytokine-deficient mice reveal that highly polarized Th1- and Th2-Type cytokine and antibody responses contribute equally to vaccine-induced immunity to schistosoma mansoni, J. Immunol. 1999, 163, 927-938.
96. N.G. Sandler, M.M. Mentink-Kane, A.W. Cheever, T.A. Wynn, Global gene expression profiles during acute pathogen-induced pulmonary inflammation reveal divergent roles for Th1 and Th2 responses in tissue repair, J. Immunol. 2003, 171, 3655-3667.
97. S. Sakaguchi, N. Sakaguchi, M. Asano, M. Itoh, M. Toda, Immunologic self-tolerance maintained by activated T cells expressing IL-2 receptor alpha-chains (CD25). Breakdown of a single mechanism of self-tolerance causes various autoimmune diseases, J. Immunol. 1995, 155, 1151-1164.
98. E.M. Shevach, CD4+ CD25+ suppressor T cells: more questions than answers, Nat. Rev. Immunol. 2002, 2, 389-400.
99. S. Sakaguchi, Naturally arising CD4+ regulatory t cells for immunologic self-tolerance and negative control of immune responses, Annu. Rev. Immunol. 2004, 22, 531-562.
100. R.H. Schwartz, Natural regulatory T cells and self-tolerance, Nat. Immunol. 2005, 6, 327-330.
101. S. Sakaguchi, Naturally arising Foxp3-expressing CD25+CD4+ regulatory T cells in immunological tolerance to self and non-self, Nat. Immunol. 2005, 6, 345-352.
102. J.D. Fontenot, A.Y. Rudensky, A well adapted regulatory contrivance: regulatory T cell development and the forkhead family transcription factor Foxp3, Nat. Immunol. 2005, 6, 331-337.
103. J.D. Fontenot, M.A. Gavin, A.Y. Rudensky, Foxp3 programs the development and function of CD4+CD25+ regulatory T cells, Nat. Immunol. 2003, 4, 330-336.
104. R. Khattri, T. Cox, S.A. Yasayko, F. Ramsdell, An essential role for Scurfin in CD4+CD25+ T regulatory cells, Nat. Immunol. 2003, 4, 337-342.
105. S. Hori, T. Nomura, S. Sakaguchi, Control of regulatory T cell development by the transcription factor Foxp3, Science 2003, 299, 1057-1061.
106. M.E. Brunkow, E.W. Jeffery, K.A. Hjerrild, B. Paeper, L.B. Clark, S.A. Yasayko, J.E. Wilkinson, D. Galas, S.F. Ziegler, F. Ramsdell, Disruption of a new forkhead/winged-helix protein, scurfin, results in the fatal lymphoproliferative disorder of the scurfy mouse, Nat. Genet. 2001, 27, 68-73.
107. J.D. Fontenot, J.P. Rasmussen, L.M. Williams, J.L. Dooley, A.G. Farr, A.Y. Rudensky, Regulatory T cell lineage specification by the forkhead transcription factor foxp3, Immunity 2005, 22, 329-341.
108. T.A. Chatila, F. Blaeser, N. Ho, H.M. Lederman, C. Voulgaropoulos, C. Helms, A.M. Bowcock, JM2, encoding a fork head-related protein, is mutated in X-linked autoimmunity-allergic disregulation syndrome, J. Clin. Invest. 2000, 106, R75-R81.
109. R.S. Wildin, F. Ramsdell, J. Peake, F. Faravelli, J.L. Casanova, N. Buist, E. Levy-Lahad, M. Mazzella, O. Goulet, L. Perroni, F.D. Bricarelli, G. Byrne, M. McEuen, S. Proll, M. Appleby, M.E. Brunkow, X-linked neonatal diabetes mellitus, enteropathy and endocrinopathy syndrome is the human equivalent of mouse scurfy, Nat. Genet. 2001, 27, 18-20.
110. C.L. Bennett, J. Christie, F. Ramsdell, M.E. Brunkow, P.J. Ferguson, L. Whitesell, T.E. Kelly, F.T. Saulsbury, P.F. Chance, H.D. Ochs, The immune dysregulation,
18.2
Scanning the Proteome for Targets of Organic Small Molecules Using
Bifunctional Receptor Ligands
Nikolai Kley
Outlook
18.2.1
Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
achieve an optimal target spectrum and a therapeutic index for a given drug
candidate. Alternatively, the identification of proteins with known function
as novel molecular targets could reveal a previously unrecognized therapeutic
application(s) for a drug candidate or a marketed drug. In some instances,
this could also present an opportunity to resurrect drug candidates that failed
to progress in the discovery or development process due to the lack of a
good understanding of their mechanisms of action (MoA). With regard to
drug development, target discovery may also lead to the identification of
novel surrogate markers for therapeutic efficacy, permitting an assessment
of the extent to which a putative therapeutic drug might yield a satisfactory
clinical result. This is particularly important for the development of the new
generation of mechanism-based drugs. Thus, the identification of protein
targets of organic small molecules is of fundamental importance in many
areas of biomedical research.
Recent chemical proteomic initiatives have resulted in the emergence of
various alternatives to classical protein activity profiling (e.g., in vitro kinase
assays using purified enzymes) for small molecule target identification. One
such alternative method utilizes a variety of chemically reactive probes to
profile and identify enzymes or other protein targets in complex mixtures
based on their catalytic or ligand-binding activities. This approach, known as
activity-based protein profiling (ABPP), is designed to address subproteomes,
such as a discrete enzyme family [1-7]. Depending on the spectrum of targets
recognized by a “pan-active” chemical probe, competitive profiling provides
information on the selectivity profile of a compound. Because the number of
suitable reactive probes is steadily growing, ABPP promises to become a more
widely used methodology in chemical proteomics [7].
Another alternative that has been recently described is based on monitoring
the interaction of a small molecule with proteins expressed as fusions to T7
bacteriophage [8, 9]. This approach has been applied successfully in target and
selectivity profiling of kinase inhibitors [9]. It is conceivable that this approach
could be adapted to support cDNA library screening, which would expand
its application to proteins other than kinases, although it would be limited
to proteins that function as monomers. Several other methods for detecting
small molecule-protein interactions have been described, including ribosome
display, drug-far western, and protein or small molecule microarray-based
methods [10-14], but these reports were primarily proof-of-principle
studies using known interaction partners. In contrast, the other alternatives
noted above have already been successfully applied to the profiling of specific
subproteomes, and have resulted in the discovery of many novel molecular
interactions.
Traditionally, the identification of protein targets of small molecules has
relied on in vitro biochemical methods, such as photocross-linking, radiolabeled
ligand binding, and affinity chromatography. Affinity chromatography is still a
widely used method and can be used to identify targets present in any cell extract
of choice. Therefore, it is, in principle, not restricted to an analysis of specific
18.2.2
History and Development
The yeast three-hybrid (Y3H) system is a cellular assay system designed for
the identification and characterization of small molecule-protein interactions
in intact cells [25]. It uses yeast Saccharomyces cerevisiae as a host system
and combines aspects of the yeast two-hybrid (Y2H) system [26] with recent
developments in chemical dimerizer technology [27, 28].
The discovery of the MoA of the immunosuppressive macrocyclic lactone
lactams FK506 and rapamycin marked the beginning of our current
understanding of chemical dimerizers [29]. These bifunctional molecules
are able to simultaneously interact with two different proteins through
distinct structural elements, promoting the formation of a ternary complex
(Fig. 18.2-1). In the case of FK506, the ternary complex consists of
FKBP12-FK506-calcineurin. Recruitment of calcineurin, a Ca²⁺/calmodulin-
dependent protein phosphatase, to the FKBP12-FK506 complex inhibits
its function. This results in impaired signaling of the T-cell antigen
receptor (TCR) and subsequent immunosuppression [30]. Rapamycin forms
an FKBP12-rapamycin-FRAP ternary complex (FRAP: FKBP12-rapamycin-
associated protein, also named RAFT1, RAPT1, or TOR). Recruitment of
18.2 Scanning the Proteome for Targets of Organic Small Molecules
Fig. 18.2-1 (a) Chemical structures of the immunosuppressants FK506 and
rapamycin. (b) Ribbon diagram of the FKBP-FK506-calcineurin complex (adapted
from Griffith et al., Cell 1995, 82, 507-522, with permission from Elsevier).
Color coding is as follows: calcineurin A (blue), calcineurin B (green),
FKBP12 (red), FK506 (white). (c) Schematic representation of how rapamycin
may be used to induce a signal transduction through dimerization of two
fusion proteins containing FKBP and FRB (FKBP12-rapamycin binding domain of
FRAP) fused to specific signaling domains. DD - "docking domain", which could
be a DNA-binding domain or a sequence causing membrane localization of the
FKBP fusion protein. ED - "effector domain", which could be a transcription
activation domain or some other signaling domain (e.g., kinase).
Fig. 18.2-2 The Y2H system: interaction of bait and prey fusion proteins
activates the expression of a reporter gene. DBD - DNA-binding domain.
AD - transcription activation domain. RE - promoter response element.
Reporters: HIS3 (an auxotrophic marker, the induction of which enables yeast
cells to grow in the absence of histidine in the culture medium), LacZ (can
be detected in a colorimetric assay). Inset shows an array of yeast cells
that has been generated using an appropriate robot. As shown, LacZ reporter
induction (blue/green colored yeast cells) reflects a productive
protein-protein interaction.
Fig. 18.2-3 Y3H system. (a) Components of the Y3H system. A MTX-based hybrid
ligand associates with a DBD-fusion protein and AD-fusion protein. Formation
of a complex induces activation of a reporter gene. In the example shown
here, the MTX-fusion compound (MFC) incorporates a PEG linker and the small
molecule kinase inhibitor purvalanol B (PurvB). (b) Example of a Y3H
interaction. The DNA-binding domain fusion protein is a LexA (DBD)-DHFR
fusion. The AD-fusion protein is a GR*-Gal4 (AD) fusion. GR* represents a
mutant form of glucocorticoid receptor (GR) with high affinity for
dexamethasone (DEX). Activation of gene expression is reflected in positive
yeast growth (HIS3 marker). Alternatively, induction of the LacZ reporter is
detected by a colorimetric assay. (c) Example of outgrowth of yeast cells in
which a positive interaction has taken place in the presence of a MFC. Such
yeast colonies, typically formed during cDNA library screens, can be picked
and subjected to subsequent analysis, as described in Fig. 18.2-4.
Fig. 18.2-4 Y3H-cDNA library screening workflow, as recently described
(adapted from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission
from Elsevier). Screening involves transformation of yeast cells with a cDNA
library, selection of yeast colonies (HIS3 selection), picking of yeast
cells, rearraying of these yeast cells, interrogation of arrays with test MFC
and other hybrid ligands (and MTX-PEG), picking of positives, isolation of
plasmid DNA, and sequencing. Plasmids are then retransformed into yeast cells
and arrays are interrogated once again with the test MFC and control
compounds (96-well format assay). Each 96-well plate represents the effects
of one particular compound. Images from each array screen are then clustered
to yield a composite image, as shown. The composite image shows an example of
the interaction of MFCs of kinase inhibitors, and variants thereof, with
their respective protein kinase targets (adapted from Becker et al., Chem.
Biol. 2005, 11, 211-223, with permission from Elsevier).
Fig. 18.2-5 A Y3H competition assay. The competition assay provides a measure
of cellular uptake/functionality of a test MFC. Also shown is an example of
experimental results showing a dose-dependent competitive inhibition of HIS3
reporter activation induced by a “reference” MFC (reflected in the decrease
in yeast growth in response to increasing concentration of test MFCs)
(adapted from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission
from Elsevier).
Y3H may not be suitable for lead discovery, it could prove particularly useful
in tracing an observed therapeutic/physiological effect of a small molecule on
one or more molecular targets or, alternatively, reveal molecular targets that
could suggest an alternative therapeutic potential for a particular drug, drug
candidate, or chemical class.
18.2.3
General Considerations
As outlined above, Y3H offers a promising alternative to other methods for the
identification and characterization of small molecule-target interactions. It
provides a means to rapidly screen complex cDNA libraries encoding candidate
target proteins. The identification of an interaction is directly associated with
the availability of a cDNA clone encoding a target protein, which enables
rapid secondary validation experiments. Furthermore, once a clone has been
identified, it becomes a permanent resource that can be interrogated in a
reiterative fashion with any small molecule hybrid ligand of interest. Another
advantage of Y3H is that it is a binding assay that does not require a priori
knowledge of the biochemical activity of candidate target proteins. Thus, it
also makes possible the identification and characterization of targets whose
biological functions are unknown.
Compared to Y2H, Y3H boasts the advantage that the DBD-fusion protein
for a given system (e.g., LexA-DHFR, see Fig. 18.2-3) remains invariant. Many
a rational basis for the positioning of PEG linker in the test molecule, is
not available, positional scans may have to be performed. In that respect,
Y3H has constraints similar to those seen with affinity purification methods,
which require modification and solid-phase immobilization of a test molecule.
MTX-based hybrid ligands that cause growth inhibition or cell death in yeast
cells would also be unsuitable, although we have not yet encountered such
a case. One complication, which we encountered once, involved a MTX
ligand that autoactivated the Y3H system. This appeared to be due to the
interaction of the test molecule with a yeast protein that, when recruited
to the promoter region of the reporter gene, causes transcription activation
(manuscript in preparation). This supposition is based on the findings that
the same hybrid ligand was not autoactivating in a yeast strain that was
made deficient in the gene encoding that particular yeast protein (which was
identified by screening of a yeast cDNA library). Alternatively, autoactivation
could be suppressed by adding 3-amino-1,2,4-triazole (3AT) to the culture
medium (as frequently done in Y2H experiments that utilize baits that
are autoactivating [41, 43]). Another arguable limitation of the Y3H system
is that robust screening requires, ideally, robotic handling of yeast cells
and the generation of yeast cell arrays. This technical capability may not
be available to every laboratory, in which case more labor-intensive and
error-prone manual handling and spotting of yeast cells would have to be
performed.
In summary, although the application of Y3H may be limited in some
scenarios, most of these are likely to be rare events. The most limiting
factor is likely the requirement for expression of fusion proteins that
are able to translocate into the nucleus of yeast cells while retaining a
properly folded small molecule binding domain. This may, however, not
be an issue with many proteins, because of their modular structure. A
modular structure favors proper folding of a binding domain, even when
it is expressed in isolation or as part of a hybrid fusion protein. Thus, the
use of complex cDNA libraries, which contain multiple fusion variants of
a particular protein, is preferable and will decrease the occurrence of false
negatives.
18.2.4
Applications and Practical Examples
18.2.5
Future Developments
Y3H is the first 3H system that has been successfully applied to large scale
screening for small molecule targets. Future developments of 3H systems that
operate in mammalian cells rather than in yeast cells should further expand
the range of applications of the 3H concept. As already discussed, Y3H relies
on the expression of hybrid proteins in yeast cells and their translocation into
the nucleus. Furthermore, yeast cells are generally less permeable to small
molecules than mammalian cells, with the previously noted exception of MTX
heterodimers. These drawbacks render it difficult to perform competition
experiments, in which the ability of a test compound to compete with a hybrid
ligand for binding to a specific target protein is determined. This would be less
of an issue in a mammalian 3H (M3H) system. Furthermore, a M3H system
may facilitate the detection of interactions that require accessory proteins or
posttranslational modifications of the target protein.
Several 2H systems that enable the detection of protein-protein interactions
in mammalian cells have been described, for example: (a) the ubiquitin-split-
protein-sensor (USPS) technology [59], (b) two-component protein fragment
complementation assays (PCAs) [60, 61] (e.g., systems based on reconstitution
of split-DHFR, split-β-lactamase, and split-GFP), and (c) interaction
technologies based on resonance energy transfer between reporter proteins with
either fluorescent or bioluminescent properties (FRET: fluorescence resonance
energy transfer; BRET: bioluminescence resonance energy transfer). These systems
have been used to monitor specific known protein-protein interactions in intact
cells or to determine whether one protein would be able to interact with
another protein (direct interaction tests). They have not been applied to random
screening of protein-protein interactions using cDNA library screening
paradigms, with the exception of a recent report on the use of split-GFP [62].
How broadly applicable this system is remains to be determined. One potential
drawback of PCA assays is susceptibility to steric constraints imposed on the
assembly of two reporter protein fragments when these are fused to other
proteins or protein fragments of varying sizes and properties. Limited sensi-
tivity and dynamic range might also be an issue in some instances. Thus, even
if these 2H systems could be adapted to a 3H version for the detection and
characterization of defined small molecule-protein interactions (as has been
described for some of these [60, 61]), it remains uncertain whether they would
be suitable for random, large scale cDNA library screening and for de novo
target identification. On the other hand, a recently described M2H method,
termed mammalian protein-protein interaction trap (MAPPIT) [63], has already
Fig. 18.2-7 The MAPPIT and MASPIT systems. (a) Events occurring in response
to ligand-induced activation of a type 1 cytokine receptor. Ligand-binding
results in conformational changes in the receptor complex, ultimately leading
to juxtaposition and activation of a receptor-associated Janus kinase (JAK).
JAK then phosphorylates the cytoplasmic part of the receptor, leading to
recruitment of signaling molecules, including signal transducers and
activators of transcription (STATs). JAK phosphorylates STAT, which causes
STAT to dissociate from the receptor, form a homodimer, translocate to the
nucleus and activate transcription of a STAT-response gene (or reporter
gene). STAT3-activation can be monitored using a STAT3-responsive reporter
gene, which uses the pancreatitis associated protein 1 (rPAP1) promoter.
(b) MAPPIT. This 2H system is based on the concept described in (a). It
employs a signaling-deficient leptin receptor F3 (lepRF3) variant that cannot
recruit STAT3. An interaction of the bait and prey proteins results in the
recruitment of a gp130 protein fragment containing STAT3 recruitment sites.
STAT3 can now be recruited and subsequently phosphorylated by JAK2, leading
to its activation. (c) MASPIT. In this system, the recruitment of the gp130
protein fragment is triggered by the interaction of a prey protein with the
test compound moiety of an MFC.
Finally, we have recently successfully applied MASPIT to the screening of
cDNA libraries and to the identification of novel small molecule-protein
interactions [64]. These studies mark the beginning of the development of a
broadly applicable M3H system that holds promise for future use in target
identification and drug discovery.
18.2.6
Conclusions
Acknowledgments
I thank Dr. Margaret Lee Kley for a critical reading of the manuscript and
many helpful comments.
References
Chemical Biology. From Small Molecules to System Biology and Drug Design.
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-31150-7
19
Chemical Biology - An Outlook
Günther Wess
Outlook
19.1
The Evolving Concept of Chemical Biology
Almost 20 years ago Arthur Kornberg stated in his famous article “The Two
Cultures: Chemistry and Biology” the following: “. . . we now have the paradox
of the two cultures, Chemistry and Biology, growing farther apart even as they
discover more common ground . . .” [1].
This statement was made at a time when it had already become apparent that the
1980s had ushered in a new era in biomedical research with new technologies
providing previously undreamed-of opportunities. Ten years later S.L. Schreiber
and K.C. Nicolaou commented on the emerging concept of Chemical Biology as
“. . . the perhaps most exciting development. . .”, “. . . that biological problems
are increasingly well defined from a chemist’s point of view . . .” and . . .
“while Molecular Biology allows the function of biological molecules such
as proteins and nucleic acids to be altered by mutation, Chemical Biology
directly alters the function of biological molecules by chemical means . . .”.
Finally they defined the core of the field of chemical biology as “. . . using
small molecules or designed molecules as ligands to directly alter the function
of biological molecules . . .” [2]. The next milestone came in 2005: The
Nature Publishing Group launched the new journal Nature Chemical Biology
with the statement that “. . . Chemical Biology has emerged as a field grounded
in technical advances brought about by the close collaborations of Chemists
19.2
Chemical Biology in Academia
Although there is not yet a precise definition of chemical biology, the common
understanding among many scientists is that chemical biology directly alters,
activates, perturbs or inhibits the function of biological macromolecules by
chemical means, that is, small-molecule ligands. In future, this leitmotiv
should be extended to higher levels of complexity and should also include
biological systems and pathways, regulatory networks, cellular processes, and
even whole organisms. The scientific questions will range from basic science,
purely academic in nature, to questions of life science, drug discovery, and
future medicine. It will also include plant biology and even ecosystems and
their evolution.
Chemical biology brings small molecules into play. It will give significant
new insight into how things function at various levels. Needless to say,
this will require the fruitful interplay of many disciplines and technologies,
such as biology, chemistry, medicine, and mathematics, as well as screening,
in vivo models, and metabolomics. Such an approach will not only give new insight
into fundamental biological processes but will also create new opportunities
for new products and businesses.
At this point, some remarks on the future role of chemistry in the context
of chemical biology seem to be required. With some oversimplification,
chemistry was traditionally concerned with structure and synthesis, and biology
more with function (with the exception of structural biology of biological
macromolecules). Research into structure-activity relationships was always
19.3
Chemical Biology in Industry
Both areas comprise very complex challenges. The first one deals with the
question of what the molecule does to the biological system with regard to
activity and specificity, for example, inhibiting an enzyme or activating a
receptor. The second one deals with the question of what the system does to
the molecule, for example, getting metabolized by an enzyme of the liver or
being transported through a membrane.
Although pharmaceutical companies will optimize these areas by
applying new technologies and management processes [8], there are typical
critical success elements that chemical biology can contribute. These elements
are primarily based on knowledge of targets and molecules and particularly
on target families and privileged molecular scaffolds, recognition patterns,
and binding motifs. This knowledge has to be accumulated over time and
needs validation in vivo to become more valuable. In addition, this knowledge
on target classes and privileged drug-like molecules will be complemented
by further insight into the ADMET rules and the correlation to the human
system.
Chemical biology in drug discovery would also address how drugs really work
in interdependent systems including pleiotropic effects of drugs [9]. Emphasis
would also be placed on the characterization of compounds in distinguished
transgenic cellular and in vivo models to get a comprehensive set of data on
the whole biological profile. Such a systematic science-driven strategy would
lead into a new science of drug discovery. New types of targets require new
approaches that are much more knowledge-based and see the molecules in
their complex environment of interdependent biological networks. Needless
to say, the intention is definitely not to replace the classical pharmacology
approach. The question is simply how to reach the next level and get the most
relevant success-critical information as soon as possible (Fig. 19-1).
Mechanisms of health and diseases and the complex interaction with the
environment at macroscopic and microscopic levels will become another
central theme in the context of future medicine that will be much more
focused on the question of prevention rather than classical treatment
and “polypharmacy” strategies. Other aspects are how to induce repair
mechanisms and how to cope with the question of personalized medicine.
It is apparent that these complex future questions will require much more
interaction between academic research and industry. The grand challenges
in drug discovery require new types of interaction, networks, and clusters of
knowledge. Chemical biology will not only be a major contributor but also a
key driver.
19.4
Chemical Biology and Translational Medicine
19.5
Knowledge and Networks, Education and Training
that knowledge can flow and that there are no hierarchical or bureaucratic
boundaries. There is also a component that has to do with values and behavior:
sharing of knowledge across organizations and disciplines. Networks should
have in place mechanisms that encourage and reward knowledge sharing.
The networks should not be limited to academia. They should also include
partners from industry. This is a great chance to approach new fields with
grand challenges and to use the complementary capabilities of academia and
industry. In the precompetitive area, it’s just a question of commitment and
real interest. In the competitive area, it should be possible to find adequate
legal frames that respect the interest of the different stakeholders. In addition,
through joint efforts these partners will find more common ground than
previously expected.
How should chemical biologists be trained and educated? Is this training
on the job, a new curriculum or branch at the chemistry departments, or a
graduate program? Currently, there are all kinds of approaches and a clear
answer cannot be given at present. As the field is emerging, the requirements
and necessary skills will become defined. In the end, there might perhaps be
fewer traditional chemistry departments but more chemical biologists working
at different places.
19.6
Conclusion
Acknowledgment
References